plugin_delta_s {survivalsurrogate} | R Documentation |
Residual treatment effect estimation using the plug-in estimator
Description
Estimates the residual treatment effect using the plug-in estimator for evaluating a longitudinal surrogate marker.
Usage
plugin_delta_s(data, folds, id, x, g, a = NULL, y, s, binary_lrnr = NULL, cont_lrnr
= NULL, t0 = length(s), e = NULL, gamma1 = NULL, gamma0 = NULL, mu1 = NULL, mu0 = NULL,
pi = NULL, pistar = NULL, Q1 = NULL, Q0 = NULL, truncate_e = 1e-12, truncate_pi = 1e-12,
se_type = "asymptotic", n_boot = NULL, alpha = 0.05, verbose = FALSE, retain_data
= FALSE)
Arguments
data |
A dataframe containing all necessary variables for estimation. The functions in this package require the dataframe to be in a specific form. Therefore, you will need to reformat your dataset as follows. At a minimum, you will need variables in your dataframe that indicate (1) the folds for crossfitting; (2) a unique observation identifier; (3) baseline covariates, if there are none you must have a variable with all values equal to 1 to provide to the argument x below; (4) a variable indicating treatment group which should be 1 for treatment and 0 for control; (5) a set of variables that contain the surrogate marker value at each time point up to and including the landmark time, denoted t0; (6) a set of variables that indicate primary outcome status at each time point where the surrogate marker is measured, in addition to measurements beyond t0 up to a final time point t > t0. Optionally, if there is censoring, you may also include a set of variables that indicate observation status (1 if not censored, 0 if censored) at all time points. See the exampledata for an example where t0=4 and t=5. |
folds |
A vector defining crossfitting folds for nuisance estimation. |
id |
A string giving the name of the column with unique unit IDs (e.g., individual ID). |
x |
A character vector of covariate names to adjust for in nuisance estimation. At a minimum this must have one covariate equal to 1 for all individuals. |
g |
A string indicating the column name of the treatment indicator with 1 for treatment and 0 for control. |
a |
(Optional) A character vector of the column names indicating observation status at each time point. Specifically, A_t = 1 if the individual is still under observation (i.e., uncensored) at time t, meaning their outcome status could in principle be measured at that time. A value of 0 means the individual was censored prior to t. If not provided, assumed to be 1 for all time points. |
y |
A character vector of the column names indicating primary outcome status at each time point. Specifically, Y_t = 1 if the primary event (e.g., failure, relapse, death) has not yet occurred by time t, and the individual is still at risk. A value of 0 means the primary event occurred on or before time t. |
s |
A character vector of the column names indicating the surrogate marker values measured at each time point, where the value is NA if the individual is no longer observable at that time point. |
binary_lrnr |
Learner object or specification used for estimating binary nuisance components (e.g., propensity scores, censoring). |
cont_lrnr |
Learner object used for estimating continuous-valued outcome regressions. |
t0 |
Landmark time after which the surrogate is not observed. |
e |
(Optional) Column name of propensity score estimates. If not provided, these will be estimated. |
gamma1 |
(Optional) Vector of column names for censoring probabilities under treatment. If not provided, these will be estimated. |
gamma0 |
(Optional) Vector of column names for censoring probabilities under control. If not provided, these will be estimated. |
mu1 |
(Optional) Names of estimated hazards under treatment. If not provided, these will be estimated. |
mu0 |
(Optional) Names of estimated hazards under control. If not provided, these will be estimated. |
pi |
(Optional) Vector of column names giving estimated 'propensity score' for treatment conditional on future values of the surrogate up to time k. See Agniel and Parast (2025+) for full definition. If not provided, these will be estimated. |
pistar |
(Optional) Vector of column names giving estimated 'propensity score' for treatment conditional on future values of the surrogate up to time k-1. See Agniel and Parast (2025+) for full definition. If not provided, these will be estimated. |
Q1 |
(Optional) Names of estimated conditional means under treatment. If not provided, these will be estimated. |
Q0 |
(Optional) Names of estimated conditional means under control.If not provided, these will be estimated. |
truncate_e |
Numeric truncation level for propensity scores to avoid division by near-zero values. Default is |
truncate_pi |
Truncation threshold for surrogate-conditional propensity scores to avoid instability. Default is |
se_type |
Type of standard error estimation, choices are "asymptotic" or "bootstrap". |
n_boot |
(Optional unless se_type = "bootstrap") Number of boostrap samples. |
alpha |
Significance level for confidence intervals. Default is |
verbose |
Logical; if TRUE, print progress messages. |
retain_data |
Logical; if TRUE, retains full dataset with estimated components in the output. |
Value
A dataframe with the following components:
plugin_est |
The plug-in estimate of the residual treatment effect at the landmark time. |
plugin_se |
Standard error of the estimate, based on influence function or bootstrap. |
ci_l |
Lower bound of the confidence interval. |
ci_h |
Upper bound of the confidence interval. |
if_data |
A list containing the dataset with the efficient influence function if retain_data = TRUE. |
References
Agniel D and Parast L (2025). "Robust Evaluation of Longitudinal Surrogate Markers with Censored Data." Journal of the Royal Statistical Society: Series B; doi:10.1093/jrsssb/qkae119.
Examples
data(exampledata)
names(exampledata)
library(glue)
library(rpart)
library(mlr3)
library(dplyr)
library(mlr3learners)
tt <- 5
t0 <- 4
yvars <- paste0('Y_', 0:tt)
lrnc <- glue('regr.rpart')
lrnb <- glue('classif.log_reg')
p_deltahat_s <- plugin_delta_s(
data = exampledata,
folds = 'ff',
id = 'ID',
x = 'X_0',
g = 'G_0',
a = paste0('A_', 0:tt),
y = yvars,
s = paste0('S_', 0:t0),
binary_lrnr = lrn(lrnb, predict_type = 'prob'),
cont_lrnr = lrn(lrnc),
t0=t0,
truncate_pi = 0.005,
truncate_e = 0.005,
verbose = FALSE
)
p_deltahat_s