Covariate adjustment using generalized linear working model

Estimate treatment-group-specific response means and (optionally) treatment group contrasts using a generalized linear working model.

robincar_glm(
  df,
  treat_col,
  response_col,
  formula = NULL,
  car_strata_cols = NULL,
  car_scheme = "simple",
  g_family = stats::gaussian,
  g_accuracy = 7,
  contrast_h = NULL,
  contrast_dh = NULL,
  variance_type = 1
)

Arguments

df: A data.frame with the required columns
treat_col: Name of column in df with treatment variable
response_col: Name of the column in df with response variable
formula: The formula to use for adjustment specified using as.formula("..."). This overrides car_strata_cols and covariate_cols.
car_strata_cols: Names of columns in df with car_strata variables
car_scheme: Name of the type of covariate-adaptive randomization scheme. One of: "simple", "pocock-simon", "biased-coin", "permuted-block".
g_family: Family that would be supplied to glm(...), e.g., binomial. If no link specified, will use default link, like behavior in glm. If you wish to use a negative binomial working model with an unknown dispersion parameter, then use `g_family="nb"`.
g_accuracy: Level of accuracy to check prediction un-biasedness.
contrast_h: An optional function to specify a desired contrast
contrast_dh: An optional jacobian function for the contrast (otherwise use numerical derivative)
variance_type: The type of variance estimator to use, type 1, 2, or 3. All three are asymptotically equivalent. See details for more.

Value

If `contrast_h` argument is used, outputs a `main` and a `contrast` object. The `main` object has the following structure:

result: A dplyr::tibble() with the treatment label, treatment mean estimate using AIPW, estimated SE, and p-value based on a z-test with estimate and SE.
varcov: The variance-covariance matrix for the treatment mean estimates.
settings: List of model settings used in covariate adjustment.
original_df: The original dataset provided by the user.
mod: The fit from the glm() working model used for covariate adjustment.
mu_a: Predicted potential outcomes for each treatment category (columns) and individual (rows). These are the $\hat{\mu}_a$

g.estimate: The G-computation estimate based only on $\frac{1}{n} \sum_{i=1}^{n} \hat{\mu}_a(X_i)$. This is equivalent to the AIPW estimate when a canonical link function is used.
data: Attributes about the dataset.

The `contrast` object has a structure that is documented in RobinCar::robincar_contrast().

Details

The output is the AIPW estimator given by (for each treatment group $a$):

$$\frac{1}{n} \sum_{i=1}^{n} \hat{\mu}_a(X_i) + \frac{1}{n_a} \sum_{i:A_i=a} \{Y_i - \hat{\mu}(X_i)\}$$

where $Y_i$ is the outcome, $A_i$ is the treatment assignment, $X_i$ are the covariates, $n_a = \sum_{i=1}^{n} A_i=a$, and $\hat{\mu}_a$ is the estimated conditional mean function based on the GLM working model. This working model has treatment $a$-specific coefficients if `adj_method` is "heterogeneous". Otherwise, they are shared across the treatment arms. Alternatively, if `formula` is used, the working model can be specified according to the user.

Importantly, the estimated variance accounts for misspecification of the working model, and for covariate-adaptive randomization. The variance estimator is given by $$\hat{V} = \hat{V}_{\rm SR} - \hat{V}_{\Omega}$$ where $\hat{V}_{\rm SR}$ is the contribution to the variance under simple randomization, and $\hat{V}_{\Omega}$ is a term that only appears when a covariate-adaptive randomization scheme is used. The $\hat{V}_{\Omega}$ is the second line of $\hat{V}$ in Bannick et al. (2025).

There are three different estimators available for $\hat{V}_{\rm SR}$, which the user can choose with the argument variance_type. We describe these here.

The three variance types are given as follows:

Type 1 (default): $$\mathrm{diag}\left[\hat{\pi}_a^{-1} (\mathrm{Var}_a(Y_i) - 2\hat{Q}_{a,a} + \hat{\Sigma}_{a,a} ), a = 1, \dots, K \right] + \hat{Q} + \hat{Q}^T - \hat{\Sigma}$$
Type 2: $$\mathrm{diag}\left[\hat{\pi}_a^{-1} (\mathrm{Var}_a(Y_i - \hat{\mu}_a(X_i)) - 2\hat{Q}_{a,a} + \hat{\Sigma}_{a,a} ), a = 1, \dots, K \right] + \hat{Q} + \hat{Q}^T - \hat{\Sigma}$$
Type 3: $$\mathrm{diag}\left[\hat{\pi}_a^{-1} E_a([Y_i - \hat{\mu}_a(X_i)]^2), a = 1, \dots, K \right] + \hat{A}$$

where $\hat{\pi}_a$ is the treatment proportion for group a, $\hat{Q}_{a,b} = \mathrm{Cov}_a(Y_i, \hat{\mu}_b(X_i))$, $\hat{\Sigma}_{a,b} = \mathrm{Cov}(\hat{\mu}_a(X_i), \hat{\mu}_b(X_i))$, and the matrix $\hat{A}$ has diagonal entries for $(a, a)$ given by $$2\mathrm{Cov}_a(Y_i - \hat{\mu}_a(X_i), \hat{\mu}_a(X_i)) + \mathrm{Var}_a(\hat{\mu}_a(X_i))$$ and off-diagonal entries for $(a, b)$ given by $$\mathrm{Cov}_a(Y_i, \hat{\mu}_b(X_i)) + \mathrm{Cov}_b(Y_i, \hat{\mu}_a(X_i)) - (1/2) \left[\mathrm{Cov}_a(\hat{\mu}_a(X_i), \hat{\mu}_b(X_i)) + \mathrm{Cov}_b(\hat{\mu}_a(X_i), \hat{\mu}_b(X_i)) \right].$$ We use $E_a$, $\mathrm{Var}_a$, and $\mathrm{Cov}_a$ to refer to the empirical expectation, variance, and covariance among observations in group a only, and $\mathrm{Cov}$ is the covariance within the entire sample.

Please see the Supplemental Material Sect. H of Bannick et al. (2025) for a discussion of the merits of each type of variance estimator. Briefly, we recommend variance types 1 generally, and variance type 3 if it is anticipated that the distribution of $X$ varies substantially over treatment groups.