Estimate treatment-group-specific response means and (optionally) treatment group contrasts using a generalized linear working model.

robincar_glm(
  df,
  treat_col,
  response_col,
  formula = NULL,
  car_strata_cols = NULL,
  car_scheme = "simple",
  g_family = stats::gaussian,
  g_accuracy = 7,
  contrast_h = NULL,
  contrast_dh = NULL,
  variance_type = 1
)

Arguments

df

A data.frame with the required columns

treat_col

Name of column in df with treatment variable

response_col

Name of the column in df with response variable

formula

The formula to use for adjustment specified using as.formula("..."). This overrides car_strata_cols and covariate_cols.

car_strata_cols

Names of columns in df with car_strata variables

car_scheme

Name of the type of covariate-adaptive randomization scheme. One of: "simple", "pocock-simon", "biased-coin", "permuted-block".

g_family

Family that would be supplied to glm(...), e.g., binomial. If no link specified, will use default link, like behavior in glm. If you wish to use a negative binomial working model with an unknown dispersion parameter, then use `g_family="nb"`.

g_accuracy

Level of accuracy to check prediction un-biasedness.

contrast_h

An optional function to specify a desired contrast

contrast_dh

An optional jacobian function for the contrast (otherwise use numerical derivative)

variance_type

The type of variance estimator to use, type 1, 2, or 3. All three are asymptotically equivalent. See details for more.

Value

If `contrast_h` argument is used, outputs a `main` and a `contrast` object. The `main` object has the following structure:

result

A dplyr::tibble() with the treatment label, treatment mean estimate using AIPW, estimated SE, and p-value based on a z-test with estimate and SE.

varcov

The variance-covariance matrix for the treatment mean estimates.

settings

List of model settings used in covariate adjustment.

original_df

The original dataset provided by the user.

mod

The fit from the glm() working model used for covariate adjustment.

mu_a

Predicted potential outcomes for each treatment category (columns) and individual (rows). These are the \(\hat{\mu}_a\)

.

g.estimate

The G-computation estimate based only on \(\frac{1}{n} \sum_{i=1}^{n} \hat{\mu}_a(X_i)\). This is equivalent to the AIPW estimate when a canonical link function is used.

data

Attributes about the dataset.

The `contrast` object has a structure that is documented in RobinCar::robincar_contrast().

Details

The output is the AIPW estimator given by (for each treatment group \(a\)):

$$\frac{1}{n} \sum_{i=1}^{n} \hat{\mu}_a(X_i) + \frac{1}{n_a} \sum_{i:A_i=a} \{Y_i - \hat{\mu}(X_i)\}$$

where \(Y_i\) is the outcome, \(A_i\) is the treatment assignment, \(X_i\) are the covariates, \(n_a = \sum_{i=1}^{n} A_i=a\), and \(\hat{\mu}_a\) is the estimated conditional mean function based on the GLM working model. This working model has treatment \(a\)-specific coefficients if `adj_method` is "heterogeneous". Otherwise, they are shared across the treatment arms. Alternatively, if `formula` is used, the working model can be specified according to the user.

Importantly, the estimated variance accounts for misspecification of the working model, and for covariate-adaptive randomization. The variance estimator is given by $$\hat{V} = \hat{V}_{\rm SR} - \hat{V}_{\Omega}$$ where \(\hat{V}_{\rm SR}\) is the contribution to the variance under simple randomization, and \(\hat{V}_{\Omega}\) is a term that only appears when a covariate-adaptive randomization scheme is used. The \(\hat{V}_{\Omega}\) is the second line of \(\hat{V}\) in Bannick et al. (2025).

There are three different estimators available for \(\hat{V}_{\rm SR}\), which the user can choose with the argument variance_type. We describe these here.

The three variance types are given as follows:

  • Type 1 (default): $$\mathrm{diag}\left[\hat{\pi}_a^{-1} (\mathrm{Var}_a(Y_i) - 2\hat{Q}_{a,a} + \hat{\Sigma}_{a,a} ), a = 1, \dots, K \right] + \hat{Q} + \hat{Q}^T - \hat{\Sigma}$$

  • Type 2: $$\mathrm{diag}\left[\hat{\pi}_a^{-1} (\mathrm{Var}_a(Y_i - \hat{\mu}_a(X_i)) - 2\hat{Q}_{a,a} + \hat{\Sigma}_{a,a} ), a = 1, \dots, K \right] + \hat{Q} + \hat{Q}^T - \hat{\Sigma}$$

  • Type 3: $$\mathrm{diag}\left[\hat{\pi}_a^{-1} E_a([Y_i - \hat{\mu}_a(X_i)]^2), a = 1, \dots, K \right] + \hat{A}$$

where \(\hat{\pi}_a\) is the treatment proportion for group a, \(\hat{Q}_{a,b} = \mathrm{Cov}_a(Y_i, \hat{\mu}_b(X_i))\), \(\hat{\Sigma}_{a,b} = \mathrm{Cov}(\hat{\mu}_a(X_i), \hat{\mu}_b(X_i))\), and the matrix \(\hat{A}\) has diagonal entries for \((a, a)\) given by $$2\mathrm{Cov}_a(Y_i - \hat{\mu}_a(X_i), \hat{\mu}_a(X_i)) + \mathrm{Var}_a(\hat{\mu}_a(X_i))$$ and off-diagonal entries for \((a, b)\) given by $$\mathrm{Cov}_a(Y_i, \hat{\mu}_b(X_i)) + \mathrm{Cov}_b(Y_i, \hat{\mu}_a(X_i)) - (1/2) \left[\mathrm{Cov}_a(\hat{\mu}_a(X_i), \hat{\mu}_b(X_i)) + \mathrm{Cov}_b(\hat{\mu}_a(X_i), \hat{\mu}_b(X_i)) \right].$$ We use \(E_a\), \(\mathrm{Var}_a\), and \(\mathrm{Cov}_a\) to refer to the empirical expectation, variance, and covariance among observations in group a only, and \(\mathrm{Cov}\) is the covariance within the entire sample.

Please see the Supplemental Material Sect. H of Bannick et al. (2025) for a discussion of the merits of each type of variance estimator. Briefly, we recommend variance types 1 generally, and variance type 3 if it is anticipated that the distribution of \(X\) varies substantially over treatment groups.