Construct SQL expressions for a design matrix

Description

Expands formula terms into SQL SELECT expressions, handling factor one-hot encoding and interaction terms. Only the right-hand side of the formula is processed; the response variable (LHS) is ignored.

Usage

sql_model_matrix(
  formula,
  conn,
  table,
  expand = c("all", "interactions"),
  sep = "_x_",
  fe_vars = character()
)

Arguments

formula A formula (or Formula) object. Only RHS terms are expanded; the LHS (response variable) is ignored.
conn Database connection
table Table name or FROM clause
expand Character: ‘"all"’ expands factors and interactions, ‘"interactions"’ only expands interaction terms (factors in main effects kept as-is for grouping)
sep Character separator for interaction term names. Default is ‘"x"’. Use ‘":"’ for standard R naming convention.
fe_vars Character vector of fixed effect variable names. These are treated as "in the model" for determining whether to drop reference levels in interactions.

Value

List with: - ‘select_exprs’: character vector of SQL expressions - ‘col_names’: corresponding column names - ‘factor_levels’: list of factor levels by variable (for reference)

Examples

library("dbreg")

library(DBI)
library(duckdb)
con = dbConnect(duckdb())
duckdb_register(con, "test", data.frame(x1 = 1:3, x2 = c("a", "b", "c")))
sql_model_matrix(~ x1 + x2, con, "test")
$select_exprs
[1] "x1"                                      
[2] "CASE WHEN x2 = 'b' THEN 1.0 ELSE 0.0 END"
[3] "CASE WHEN x2 = 'c' THEN 1.0 ELSE 0.0 END"

$col_names
[1] "x1"  "x2b" "x2c"

$factor_levels
$factor_levels$x2
[1] "a" "b" "c"
sql_model_matrix(~ x1:x2, con, "test")
$select_exprs
[1] "(x1) * (CASE WHEN x2 = 'a' THEN 1.0 ELSE 0.0 END)"
[2] "(x1) * (CASE WHEN x2 = 'b' THEN 1.0 ELSE 0.0 END)"
[3] "(x1) * (CASE WHEN x2 = 'c' THEN 1.0 ELSE 0.0 END)"

$col_names
[1] "x1_x_x2a" "x1_x_x2b" "x1_x_x2c"

$factor_levels
$factor_levels$x2
[1] "a" "b" "c"
sql_model_matrix(~ x1:x2, con, "test", sep = ":")
$select_exprs
[1] "(x1) * (CASE WHEN x2 = 'a' THEN 1.0 ELSE 0.0 END)"
[2] "(x1) * (CASE WHEN x2 = 'b' THEN 1.0 ELSE 0.0 END)"
[3] "(x1) * (CASE WHEN x2 = 'c' THEN 1.0 ELSE 0.0 END)"

$col_names
[1] "x1:x2a" "x1:x2b" "x1:x2c"

$factor_levels
$factor_levels$x2
[1] "a" "b" "c"
dbDisconnect(con)