fit_IVDML {IVDML}    R Documentation

Fitting Double Machine Learning Models with Instrumental Variables and Potentially Heterogeneous Treatment Effects

Description

This function fits a Double Machine Learning (DML) model with Instrumental Variables (IV) in order to perform inference on potentially heterogeneous treatment effects. The model under study is Y = \beta(A)D + g(X) + \epsilon. Here the error \epsilon may be correlated with the treatment D, but there is an IV Z satisfying \mathbb E[\epsilon|Z,X] = 0. The object of interest is the effect \beta of the treatment D on the response Y. The treatment effect \beta is either constant or depends on the univariate quantity A, which is typically a component of the covariates X.
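The structure of the model determines which nuisance functions are needed. Assume, as with the default A_deterministic_X = TRUE, that A is a deterministic function of X. Then \mathbb E[\epsilon|Z,X] = 0 implies \mathbb E[\epsilon|X] = 0 and \mathbb E[\beta(A)D|X] = \beta(A)\mathbb E[D|X]. Hence, for any square-integrable function h of (Z, X):

```latex
\begin{aligned}
\mathbb E[Y|X] &= \beta(A)\,\mathbb E[D|X] + g(X),\\
Y - \mathbb E[Y|X] &= \beta(A)\bigl(D - \mathbb E[D|X]\bigr) + \epsilon,\\
0 &= \mathbb E\Bigl[\bigl(h(Z,X) - \mathbb E[h(Z,X)|X]\bigr)\,\bigl(Y - \mathbb E[Y|X] - \beta(A)\,(D - \mathbb E[D|X])\bigr)\Bigr].
\end{aligned}
```

The last line holds because the second factor equals \epsilon, and \mathbb E[\epsilon|Z,X] = 0. Taking h(Z,X) = Z gives the linear instrument Z - \mathbb E[Z|X]. Taking h(Z,X) = \mathbb E[D|Z,X] gives the machine learning instrument \mathbb E[D|Z,X] - \mathbb E[D|X], since \mathbb E[\mathbb E[D|Z,X]|X] = \mathbb E[D|X]. Because A is a function of X, the moment also holds after multiplying by any weight that depends on A, which makes it possible to localize \beta at a value a. This is a sketch of the identification argument, not a statement of the exact estimator used by fit_IVDML().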

Usage

fit_IVDML(
  Y,
  D,
  Z,
  X = NULL,
  A = NULL,
  ml_method,
  ml_par = list(),
  A_deterministic_X = TRUE,
  K_dml = 5,
  iv_method = c("linearIV", "mlIV"),
  S_split = 1
)

Arguments

Y

Numeric vector. Response variable.

D

Numeric vector. Treatment variable.

Z

Matrix, vector, or data frame. Instrumental variables.

X

Matrix, vector, or data frame. Additional covariates (default: NULL).

A

Numeric vector. Variable with respect to which treatment effect heterogeneity is considered. It is usually equal to a column of X. In that case it can also be left as NULL here and supplied later to the inference functions, for example coef.IVDML() (default: NULL).

ml_method

Character. Machine learning method to use. Options are "gam", "xgboost", and "randomForest".

ml_par

List. Parameters for the machine learning method:

  • If ml_method == "gam", can specify ind_lin_Z and ind_lin_X for components of Z and X to be modeled linearly.

  • If ml_method == "xgboost", can specify max_nrounds, k_cv, early_stopping_rounds, and vectors eta and max_depth.

  • If ml_method == "randomForest", can specify num.trees, num_mtry (number of different mtry values to try out) or a vector mtry, a vector max.depth, num_min.node.size (number of different min.node.size values to try out) or a vector min.node.size.

  • To specify different parameters for the different nuisance function regressions, ml_par should be a list of lists with the following elements: ml_par_D_XZ (parameters for the nuisance function \mathbb E[D|Z, X], needed for iv_method "mlIV" and "mlIV_direct"); ml_par_D_X (parameters for \mathbb E[D|X], needed for "linearIV", "mlIV" and "mlIV_direct"); ml_par_f_X (parameters for \mathbb E[\widehat{\mathbb E}[D|Z, X]|X], needed for "mlIV"); ml_par_Y_X (parameters for \mathbb E[Y|X], needed for "linearIV", "mlIV" and "mlIV_direct"); ml_par_Z_X (parameters for \mathbb E[Z|X], needed for "linearIV").
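As an illustration of the list-of-lists form, the following sketch builds per-nuisance settings for ml_method = "xgboost". The inner element names are taken from the xgboost bullet above, and the outer names come from the last bullet. The numeric values, the helper object xgb_default and the choice to give E[D|Z, X] deeper trees are assumptions for illustration only, not recommended defaults. All five elements are filled in so that every iv_method option is covered.

```r
# Hypothetical per-nuisance tuning for ml_method = "xgboost".
# Inner names follow the ml_par documentation; values are illustrative.
xgb_default <- list(
  max_nrounds = 200,
  k_cv = 5,
  early_stopping_rounds = 10,
  eta = c(0.05, 0.1),     # candidate learning rates
  max_depth = c(2, 4)     # candidate tree depths
)

ml_par <- list(
  # E[D | Z, X]: allow deeper trees for the first-stage regression
  ml_par_D_XZ = utils::modifyList(xgb_default, list(max_depth = c(3, 5))),
  ml_par_D_X  = xgb_default,   # E[D | X]
  ml_par_f_X  = xgb_default,   # E[ E^[D | Z, X] | X ]
  ml_par_Y_X  = xgb_default,   # E[Y | X]
  ml_par_Z_X  = xgb_default    # E[Z | X]
)

str(ml_par, max.level = 1)
```

Pass the list as ml_par = ml_par, together with ml_method = "xgboost", in a call to fit_IVDML().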

A_deterministic_X

Logical. Whether A is a deterministic function of X (default: TRUE).

K_dml

Integer. Number of cross-fitting folds (default: 5).
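Cross-fitting with K_dml folds means that each observation's nuisance prediction comes from a model fitted on the other K_dml - 1 folds. The minimal base-R sketch below shows this for a single regression \mathbb E[V|X]. The helper cross_fit_predict() is hypothetical, and the learner (lm() on a natural-spline basis from the splines package) is an assumption standing in for "gam", "xgboost" or "randomForest".

```r
library(splines)

# Hypothetical helper: out-of-fold predictions of E[v | x] via K-fold cross-fitting.
cross_fit_predict <- function(v, x, K = 5, df = 5) {
  n <- length(v)
  folds <- sample(rep_len(seq_len(K), n))  # random, balanced fold labels
  pred <- numeric(n)
  for (k in seq_len(K)) {
    held_out <- folds == k
    train_dat <- data.frame(v = v[!held_out], x = x[!held_out])
    fit <- lm(v ~ ns(x, df = df), data = train_dat)   # fit on the other folds
    pred[held_out] <- predict(fit, newdata = data.frame(x = x[held_out]))
  }
  pred
}

set.seed(1)
n <- 1000
x <- rnorm(n)
v <- sin(x) + rnorm(n, sd = 0.5)
v_hat <- cross_fit_predict(v, x, K = 5)
residual <- v - v_hat   # cross-fitted residual, e.g. D - E^[D|X]
```

Because no observation is used to fit the model that predicts it, overfitting of a flexible learner does not leak into the residuals.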

iv_method

Character vector. Instrumental variables estimation method(s). Options: "linearIV", "mlIV", "mlIV_direct" (default: c("linearIV", "mlIV")). "linearIV" uses the instruments linearly, and "mlIV" uses machine learning instruments. "mlIV_direct" is a variant of "mlIV" that uses the same estimate of \mathbb E[D|X] for both residuals D - \mathbb E[D|X] and \mathbb E[D|Z, X] - \mathbb E[D|X]. By contrast, "mlIV" forms the instrument residual by subtracting a two-stage estimate of \mathbb E[\widehat{\mathbb E}[D|Z, X]|X] from \widehat{\mathbb E}[D|Z, X].
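To make the difference between the instrument choices concrete, here is a minimal base-R sketch of cross-fitted partialling-out IV estimation of a constant effect. It is not the IVDML implementation. The following are all assumptions:

- the additive natural-spline learners (lm() with splines::ns());
- the ratio form of the estimator for a single instrument;
- the names nuis and iv_ratio().

The simulation reuses the data-generating process of the Examples with a constant \beta = 1. There D depends on Z only through Z^2, and a short Gaussian calculation shows that Z - \mathbb E[Z|X] is uncorrelated with D - \mathbb E[D|X]. The linearIV-style estimate is therefore essentially unidentified here, while the machine learning instrument captures the nonlinear first stage.

```r
library(splines)

set.seed(2)
n <- 2000
beta <- 1                                   # constant treatment effect
Z <- rnorm(n); X <- Z + rnorm(n); H <- rnorm(n)   # H: hidden confounder
D <- Z^2 + sin(X) + H + rnorm(n)
Y <- beta * D + cos(X) - H + rnorm(n)
dat <- data.frame(Y, D, Z, X)

# Cross-fitted nuisance estimates with K = 5 folds
K <- 5
folds <- sample(rep_len(seq_len(K), n))
nuis <- matrix(NA_real_, n, 4,
               dimnames = list(NULL, c("Y_X", "D_X", "Z_X", "D_XZ")))
for (k in seq_len(K)) {
  tr <- dat[folds != k, ]
  te <- dat[folds == k, ]
  nuis[folds == k, "Y_X"]  <- predict(lm(Y ~ ns(X, 5), tr), te)
  nuis[folds == k, "D_X"]  <- predict(lm(D ~ ns(X, 5), tr), te)
  nuis[folds == k, "Z_X"]  <- predict(lm(Z ~ ns(X, 5), tr), te)
  nuis[folds == k, "D_XZ"] <- predict(lm(D ~ ns(Z, 5) + ns(X, 5), tr), te)
}

r_Y <- Y - nuis[, "Y_X"]   # response residual
r_D <- D - nuis[, "D_X"]   # treatment residual

# IV ratio estimate: sum(instrument * r_Y) / sum(instrument * r_D)
iv_ratio <- function(inst) sum(inst * r_Y) / sum(inst * r_D)

beta_linearIV <- iv_ratio(Z - nuis[, "Z_X"])              # linear instrument
beta_mlIV     <- iv_ratio(nuis[, "D_XZ"] - nuis[, "D_X"]) # "mlIV_direct"-style
c(linearIV = beta_linearIV, mlIV_direct = beta_mlIV)
```

Subtracting the same \widehat{\mathbb E}[D|X] in both residuals mirrors the "mlIV_direct" description above. The "mlIV" variant would instead regress the cross-fitted \widehat{\mathbb E}[D|Z, X] on X in a second stage.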

S_split

Integer. Number of sample splits for cross-fitting (default: 1).

Value

An object of class IVDML, containing:

References

Cyrill Scheidegger, Zijian Guo and Peter Bühlmann. Inference for heterogeneous treatment effects with efficient instruments and machine learning. Preprint, arXiv:2503.03530, 2025.

See Also

Inference for a fitted IVDML object is done with the functions coef.IVDML(), se(), standard_confint() and robust_confint().

Examples

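set.seed(1)
# Simulate data: Z is the instrument, X a covariate, H a hidden confounder
Z <- rnorm(100)
X <- Z + rnorm(100)
H <- rnorm(100)
D <- Z^2 + sin(X) + H + rnorm(100)
# Heterogeneous effect beta(A) = tanh(A) with A = X
A <- X
Y <- tanh(A) * D + cos(X) - H + rnorm(100)
# Fit the nuisance functions with generalized additive models
fit <- fit_IVDML(Y = Y, D = D, Z = Z, X = X, A = A, ml_method = "gam")
# Estimate the treatment effect at A = 0 (true value tanh(0) = 0) with "mlIV",
# using a boxcar kernel with bandwidth 0.2
coef(fit, iv_method = "mlIV", a = 0, A = A, kernel_name = "boxcar", bandwidth = 0.2)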
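For a heterogeneous effect, as in the coef() call above, the estimate at A = a can be viewed as localizing the IV moment around a. The sketch below assumes a kernel-weighted ratio of cross-fitted residuals and a boxcar kernel 1{|u| ≤ 1}. These choices, and the helper names boxcar() and local_iv_ratio(), are hypothetical and may differ from what coef.IVDML() computes. Synthetic residuals with \beta(A) = tanh(A) stand in for fitted ones.

```r
# Hypothetical boxcar kernel: weight 1 inside |u| <= 1, 0 outside
boxcar <- function(u) as.numeric(abs(u) <= 1)

# Kernel-localized IV ratio at A = a with bandwidth h (illustrative sketch)
local_iv_ratio <- function(a, A, r_Y, r_D, inst, h) {
  w <- boxcar((A - a) / h)
  sum(w * inst * r_Y) / sum(w * inst * r_D)
}

# Synthetic residuals with beta(A) = tanh(A)
set.seed(3)
n <- 5000
A <- rnorm(n)
inst <- rnorm(n)                  # stands in for the instrument residual
r_D <- inst + rnorm(n)            # treatment residual, driven by the instrument
r_Y <- tanh(A) * r_D + rnorm(n)   # response residual

est <- sapply(c(-1, 0, 1), local_iv_ratio,
              A = A, r_Y = r_Y, r_D = r_D, inst = inst, h = 0.2)
round(est, 2)   # close to tanh(c(-1, 0, 1))
```

A smaller bandwidth reduces the smoothing bias in \beta(a) but leaves fewer observations in the window, which increases the variance.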


[Package IVDML version 1.0.0 Index]