DoubleMLSSM {DoubleML}    R Documentation
Double machine learning for sample selection models
Description
Double machine learning for sample selection models.
Format
R6::R6Class object inheriting from DoubleML.
Super class
DoubleML::DoubleML
-> DoubleMLSSM
Active bindings
trimming_rule
(character(1))
A character(1) specifying the trimming approach.

trimming_threshold
(numeric(1))
The threshold used for trimming.
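For example, given a fitted object such as dml_ssm from the Examples below (the object name is illustrative), both bindings can be read directly; the values shown are the defaults:

dml_ssm$trimming_rule        # "truncate"
dml_ssm$trimming_threshold   # 1e-12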
Methods
Public methods
Inherited methods
DoubleML::DoubleML$bootstrap()
DoubleML::DoubleML$confint()
DoubleML::DoubleML$fit()
DoubleML::DoubleML$get_params()
DoubleML::DoubleML$learner_names()
DoubleML::DoubleML$p_adjust()
DoubleML::DoubleML$params_names()
DoubleML::DoubleML$print()
DoubleML::DoubleML$set_sample_splitting()
DoubleML::DoubleML$split_samples()
DoubleML::DoubleML$summary()
Method new()
Creates a new instance of this R6 class.
Usage
DoubleMLSSM$new(
  data,
  ml_g,
  ml_pi,
  ml_m,
  n_folds = 5,
  n_rep = 1,
  score = "missing-at-random",
  normalize_ipw = FALSE,
  trimming_rule = "truncate",
  trimming_threshold = 1e-12,
  dml_procedure = "dml2",
  draw_sample_splitting = TRUE,
  apply_cross_fitting = TRUE
)
Arguments
data
(DoubleMLData)
The DoubleMLData object providing the data and specifying the variables of the causal model.

ml_g
(LearnerRegr, Learner, character(1))
A learner of the class LearnerRegr, which is available from mlr3 or its extension packages mlr3learners or mlr3extralearners. Alternatively, a Learner object with public field task_type = "regr" can be passed, for example of class GraphLearner. The learner can possibly be passed with specified parameters, for example lrn("regr.cv_glmnet", s = "lambda.min").
ml_g refers to the nuisance function g_0(S,D,X) = E[Y|S,D,X].

ml_pi
(LearnerClassif, Learner, character(1))
A learner of the class LearnerClassif, which is available from mlr3 or its extension packages mlr3learners or mlr3extralearners. Alternatively, a Learner object with public field task_type = "classif" can be passed, for example of class GraphLearner. The learner can possibly be passed with specified parameters, for example lrn("classif.cv_glmnet", s = "lambda.min").
ml_pi refers to the nuisance function pi_0(D,X) = Pr[S=1|D,X].

ml_m
(LearnerRegr, LearnerClassif, Learner, character(1))
A learner of the class LearnerClassif, which is available from mlr3 or its extension packages mlr3learners or mlr3extralearners. Alternatively, a Learner object with public field task_type = "classif" can be passed, for example of class GraphLearner. The learner can possibly be passed with specified parameters, for example lrn("classif.cv_glmnet", s = "lambda.min").
ml_m refers to the nuisance function m_0(X) = Pr[D=1|X].

n_folds
(integer(1))
Number of folds. Default is 5.

n_rep
(integer(1))
Number of repetitions for the sample splitting. Default is 1.

score
(character(1), function())
A character(1) ("missing-at-random" or "nonignorable") specifying the score function. Default is "missing-at-random".

normalize_ipw
(logical(1))
Indicates whether the inverse probability weights are normalized. Default is FALSE.

trimming_rule
(character(1))
A character(1) ("truncate" is the only choice) specifying the trimming approach. Default is "truncate".

trimming_threshold
(numeric(1))
The threshold used for trimming. Default is 1e-12.

dml_procedure
(character(1))
A character(1) ("dml1" or "dml2") specifying the double machine learning algorithm. Default is "dml2".

draw_sample_splitting
(logical(1))
Indicates whether the sample splitting should be drawn during initialization of the object. Default is TRUE.

apply_cross_fitting
(logical(1))
Indicates whether cross-fitting should be applied. Default is TRUE.
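A minimal construction sketch illustrating the non-default options normalize_ipw and trimming_threshold; the particular learners and the simulated data are chosen for illustration only and mirror the Examples below, not a prescribed setup:

library(DoubleML)
library(mlr3)
library(mlr3learners)
set.seed(1)
df = make_ssm_data(n_obs = 1000, mar = TRUE, return_type = "data.table")
dml_data = DoubleMLData$new(df, y_col = "y", d_cols = "d", s_col = "s")
dml_ssm = DoubleMLSSM$new(dml_data,
  ml_g = lrn("regr.cv_glmnet", s = "lambda.min"),
  ml_pi = lrn("classif.cv_glmnet", s = "lambda.min"),
  ml_m = lrn("classif.cv_glmnet", s = "lambda.min"),
  score = "missing-at-random",
  normalize_ipw = TRUE,         # normalize the inverse probability weights
  trimming_threshold = 0.01)    # trim propensity estimates away from 0 and 1
dml_ssm$fit()
dml_ssm$summary()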
Method set_ml_nuisance_params()
Set hyperparameters for the nuisance models of DoubleML models.
Note that in the current implementation, either all parameters have to be set globally or all parameters have to be provided fold-specific.
Usage
DoubleMLSSM$set_ml_nuisance_params(
  learner = NULL,
  treat_var = NULL,
  params,
  set_fold_specific = FALSE
)
Arguments
learner
(character(1))
The nuisance model/learner (see method params_names()).

treat_var
(character(1))
The treatment variable (hyperparameters can be set treatment-variable specific).

params
(named list())
A named list() with estimator parameters. Parameters are used for all folds by default. Alternatively, parameters can be passed in a fold-specific way if option set_fold_specific is TRUE. In this case, the outer list needs to be of length n_rep and the inner list of length n_folds.

set_fold_specific
(logical(1))
Indicates if the parameters passed in params should be passed in a fold-specific way. Default is FALSE. If TRUE, the outer list needs to be of length n_rep and the inner list of length n_folds. Note that in the current implementation, either all parameters have to be set globally or all parameters have to be provided fold-specific.
Returns
self
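A hedged sketch of setting global parameters for one nuisance learner; the learner key "ml_pi", the treatment variable "d" and the rpart parameters are assumptions for illustration and should be checked against dml_ssm$params_names() for the concrete object:

# assumes dml_ssm was built with rpart learners as in the ## Not run example below
dml_ssm$set_ml_nuisance_params(
  learner = "ml_pi",    # assumed learner key; verify via dml_ssm$params_names()
  treat_var = "d",      # treatment variable the parameters refer to
  params = list(cp = 0.01, minsplit = 20))
dml_ssm$fit()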
Method tune()
Hyperparameter-tuning for DoubleML models.
The hyperparameter-tuning is performed using the tuning methods provided in the mlr3tuning package. For more information on tuning in mlr3, we refer to the section on parameter tuning in the mlr3 book.
Usage
DoubleMLSSM$tune(
  param_set,
  tune_settings = list(
    n_folds_tune = 5,
    rsmp_tune = mlr3::rsmp("cv", folds = 5),
    measure = NULL,
    terminator = mlr3tuning::trm("evals", n_evals = 20),
    algorithm = mlr3tuning::tnr("grid_search"),
    resolution = 5),
  tune_on_folds = FALSE
)
Arguments
param_set
(named list())
A named list() with a parameter grid for each nuisance model/learner (see method learner_names()). The parameter grid must be an object of class ParamSet.

tune_settings
(named list())
A named list() with arguments passed to the hyperparameter-tuning with mlr3tuning to set up TuningInstance objects. tune_settings has entries
- terminator (Terminator)
A Terminator object. Specification of terminator is required to perform tuning.
- algorithm (Tuner or character(1))
A Tuner object (recommended) or key passed to the respective dictionary to specify the tuning algorithm used in tnr(). algorithm is passed as an argument to tnr(). If algorithm is not specified by the user, default is set to "grid_search". If set to "grid_search", then additional argument "resolution" is required.
- rsmp_tune (Resampling or character(1))
A Resampling object (recommended) or option passed to rsmp() to initialize a Resampling for parameter tuning in mlr3. If not specified by the user, default is set to "cv" (cross-validation).
- n_folds_tune (integer(1), optional)
If rsmp_tune = "cv", number of folds used for cross-validation. If not specified by the user, default is set to 5.
- measure (NULL, named list(), optional)
Named list containing the measures used for parameter tuning. Entries in list must either be Measure objects or keys to be passed to msr(). The names of the entries must match the learner names (see method learner_names()). If set to NULL, default measures are used, i.e., "regr.mse" for continuous outcome variables and "classif.ce" for binary outcomes.
- resolution (character(1))
The key passed to the respective dictionary to specify the tuning algorithm used in tnr(). resolution is passed as an argument to tnr().

tune_on_folds
(logical(1))
Indicates whether the tuning should be done fold-specific or globally. Default is FALSE.
Returns
self
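A sketch of a fuller tune_settings with explicit resampling and tuning measures; it assumes the param_grid and the learner names ml_g, ml_pi and ml_m from the Examples below, and the measure keys are standard mlr3 measures:

tune_settings = list(
  rsmp_tune = mlr3::rsmp("cv", folds = 3),
  measure = list(
    "ml_g" = "regr.mse",       # continuous outcome nuisance
    "ml_pi" = "classif.ce",    # selection nuisance
    "ml_m" = "classif.ce"),    # treatment nuisance
  terminator = mlr3tuning::trm("evals", n_evals = 10),
  algorithm = mlr3tuning::tnr("grid_search"),
  resolution = 5)
dml_ssm$tune(param_set = param_grid, tune_settings = tune_settings)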
Method clone()
The objects of this class are cloneable with this method.
Usage
DoubleMLSSM$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
Other DoubleML: DoubleML, DoubleMLIIVM, DoubleMLIRM, DoubleMLPLIV, DoubleMLPLR
Examples
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
set.seed(2)
ml_g = lrn("regr.ranger",
num.trees = 100, mtry = 20,
min.node.size = 2, max.depth = 5)
ml_m = lrn("classif.ranger",
num.trees = 100, mtry = 20,
min.node.size = 2, max.depth = 5)
ml_pi = lrn("classif.ranger",
num.trees = 100, mtry = 20,
min.node.size = 2, max.depth = 5)
n_obs = 2000
df = make_ssm_data(n_obs = n_obs, mar = TRUE, return_type = "data.table")
dml_data = DoubleMLData$new(df, y_col = "y", d_cols = "d", s_col = "s")
dml_ssm = DoubleMLSSM$new(dml_data, ml_g = ml_g, ml_pi = ml_pi, ml_m = ml_m,
                          score = "missing-at-random")
dml_ssm$fit()
print(dml_ssm)
## Not run:
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(mlr3tuning)
library(data.table)
set.seed(2)
ml_g = lrn("regr.rpart")
ml_m = lrn("classif.rpart")
ml_pi = lrn("classif.rpart")
dml_data = make_ssm_data(n_obs = n_obs, mar = TRUE)
dml_ssm = DoubleMLSSM$new(dml_data, ml_g = ml_g, ml_m = ml_m, ml_pi = ml_pi,
score = "missing-at-random")
param_grid = list(
"ml_g" = paradox::ps(
cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
minsplit = paradox::p_int(lower = 1, upper = 2)),
"ml_m" = paradox::ps(
cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
minsplit = paradox::p_int(lower = 1, upper = 2)),
"ml_pi" = paradox::ps(
cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
minsplit = paradox::p_int(lower = 1, upper = 2)))
# minimum requirements for tune_settings
tune_settings = list(
terminator = mlr3tuning::trm("evals", n_evals = 5),
algorithm = mlr3tuning::tnr("grid_search", resolution = 5))
dml_ssm$tune(param_set = param_grid, tune_settings = tune_settings)
dml_ssm$fit()
dml_ssm$summary()
## End(Not run)