dataexample.stratified {CaseCohortCoxSurvival}    R Documentation

Example of a case-cohort with stratified sampling of the subcohort, and a set of auxiliary variables

Description

A list with two elements, cohort and A.

cohort is a simulated cohort with 20 000 subjects. It contains:

id is the subject identifier.

X1 is a continuous baseline covariate. Its measurements are only available for subjects in the case-cohort, i.e., on subjects with subcohort = 1 and/or status = 1.

X2 is a categorical baseline covariate, with categories 0, 1, and 2. It is measured on all cohort subjects.

X3 is a continuous baseline covariate. Its measurements are only available for subjects in the case-cohort.

W is a baseline categorical variable, with categories 0, 1, 2, and 3. It depends on predictors of X1 and X2. It is measured on all cohort subjects. The stratified sampling of the subcohort was based on the 4 strata defined by W.

status indicates case status, i.e., whether the subject experienced the event of interest (status = 1) or was censored.

event.time gives the event or censoring time.

subcohort indicates the subjects included in the subcohort. The subcohort was obtained by sampling 97, 294, 300, and 380 subjects from the 4 strata, respectively, independently of case status. The stratified case-cohort (phase-two sample) consists of the subcohort and all cases not in the subcohort.

strata.n gives the number of cohort subjects in each of the 4 strata.

strata.m gives the number of subjects sampled from each of the 4 strata (i.e., 97, 294, 300, or 380). strata.m and strata.n would be used to compute the stratum-specific design weights of non-cases. Because all the cases were included in the case-cohort, they would be assigned a design weight of 1.
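The weighting rule described above can be sketched on a small example. This is a minimal illustration, assuming that non-cases receive the inverse of their stratum sampling fraction, strata.n / strata.m. The data frame toy, its stratum sizes, and the column name weight are hypothetical and are not taken from dataexample.stratified.

```r
# Hypothetical toy cohort with 2 strata (sizes are made up for illustration).
toy <- data.frame(
  id        = 1:10,
  W         = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  status    = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0),
  subcohort = c(1, 0, 0, 1, 1, 0, 1, 0, 0, 0),
  strata.n  = c(4, 4, 4, 4, 6, 6, 6, 6, 6, 6),  # stratum sizes in the cohort
  strata.m  = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2)   # subcohort sizes per stratum
)

# Phase-two sample: the subcohort plus any cases outside it.
phase2 <- toy[toy$subcohort == 1 | toy$status == 1, ]

# Cases get weight 1; non-cases get the inverse sampling fraction
# (assumed form of the design weight).
phase2$weight <- ifelse(phase2$status == 1, 1,
                        phase2$strata.n / phase2$strata.m)

print(phase2[, c("id", "W", "status", "subcohort", "weight")])
```

In this toy example, the subcohort non-cases in stratum 0 each represent 2 cohort subjects and those in stratum 1 each represent 3, while both cases keep a weight of 1.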

strata.n.cases gives the number of cases in each of the 4 strata.

n.cases gives the number of cases in the entire cohort.

X1.proxy is a continuous baseline covariate. It is a proxy of X1, with 0.8 correlation. It is measured on all cohort subjects. It can be used for design weights calibration in the argument predictors.cox.phase2 of function caseCohortCoxSurvival, as one would need to predict X1 on the entire cohort.

X3.proxy is a continuous baseline covariate. It is a proxy of X3, with 0.8 correlation. It is measured on all cohort subjects. It can be used for design weights calibration in the argument predictors.cox.phase2 of function caseCohortCoxSurvival, as one would need to predict X3 on the entire cohort.

X1.pred is a prediction of X1, available for all cohort subjects. The predictions were obtained by weighted linear regression on X1.proxy and W, with the design weights.

X3.pred is a prediction of X3, available for all cohort subjects. The predictions were obtained by weighted linear regression on X1.proxy, X2, and X3.proxy, with the design weights.
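The way such predictions can be produced is sketched below: a weighted linear regression is fitted on the phase-two subjects, then applied to the whole cohort. This is only an illustration on simulated data. The data frame toy, the sampling scheme, the coefficients, and the choice to treat W as a factor are assumptions, not a description of how X1.pred and X3.pred were actually computed.

```r
set.seed(1)
n <- 200

# Hypothetical cohort: W and X1.proxy are known for everyone.
toy <- data.frame(
  W        = rep(0:3, length.out = n),
  X1.proxy = rnorm(n)
)
toy$X1 <- 0.8 * toy$X1.proxy + rnorm(n, sd = 0.6)

# Hypothetical phase-two sample (every third subject); X1 is missing elsewhere.
in.phase2 <- seq_len(n) %% 3 == 0
toy$X1[!in.phase2] <- NA

# Stratum-specific design weights: stratum size over number sampled.
toy$strata.n <- ave(rep(1, n), toy$W, FUN = sum)
toy$strata.m <- ave(as.numeric(in.phase2), toy$W, FUN = sum)
toy$weight   <- toy$strata.n / toy$strata.m

# Weighted linear regression on the phase-two subjects only.
fit <- lm(X1 ~ X1.proxy + factor(W), data = toy[in.phase2, ], weights = weight)

# Predictions are available for every cohort subject.
toy$X1.pred <- predict(fit, newdata = toy)
summary(toy$X1.pred)
```

The fitted model only uses variables measured on the entire cohort as regressors, which is what makes the prediction available for subjects outside the case-cohort.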

A contains auxiliary variables, obtained as proposed by Breslow et al. (2009) and Shin et al. (2020). A can be used with argument aux.var of function caseCohortCoxSurvival.

First, predictions of X1 were obtained by weighted linear regression on X1.proxy and W, with the design weights. Predictions of X3 were obtained by weighted linear regression on X1.proxy, X2, and X3.proxy, with the design weights. Then the Cox model with X2 and the predicted values of X1 and X3 (available for all cohort subjects) was fitted. A.X1, A.X2, and A.X3 contain the influences on the estimated log relative hazards (log-RHs), available for all cohort subjects.

Second, the design weights were calibrated on A.1, A.X1, A.X2, and A.X3, where A.1 is identically equal to 1. The log-RH parameter was then estimated from the case-cohort data with these calibrated weights. Finally, the linear predictor formed from this log-RH estimate, X2, and the predicted values of X1 and X3 (available for all cohort subjects) was exponentiated. A.Shin contains the product of this quantity with the total follow-up time on interval (0,8].
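To give an idea of what weight calibration does, here is a minimal sketch of one common approach, raking, in which each design weight is multiplied by exp(eta' A) and eta is chosen so that the weighted phase-two totals of the auxiliary variables match their cohort totals. Whether caseCohortCoxSurvival uses exactly this form is an assumption here. The auxiliary matrix A, the sampling scheme, and all object names below are hypothetical.

```r
set.seed(2)
n <- 300

# Hypothetical auxiliary variables, known for every cohort subject.
# A.1 is identically 1, so calibration also matches the cohort size.
A <- cbind(A.1 = 1, A.X = rnorm(n))

# Hypothetical phase-two sample (every third subject) with design weight 3.
in.phase2 <- seq_len(n) %% 3 == 0
A2 <- A[in.phase2, , drop = FALSE]
w  <- rep(3, nrow(A2))

target <- colSums(A)  # cohort totals that the calibrated weights must match

# Newton-Raphson iterations for the raking equations.
eta <- rep(0, ncol(A))
for (iter in 1:50) {
  w.cal <- w * exp(drop(A2 %*% eta))
  gap   <- colSums(w.cal * A2) - target   # calibration equations
  if (max(abs(gap)) < 1e-8) break
  J   <- crossprod(A2, w.cal * A2)        # Jacobian of the equations
  eta <- eta - solve(J, gap)
}
w.cal <- w * exp(drop(A2 %*% eta))

# Weighted phase-two totals before and after calibration, next to the targets.
rbind(design = colSums(w * A2), calibrated = colSums(w.cal * A2), cohort = target)
```

After calibration, the weighted totals of A.1 and A.X in the phase-two sample agree with the cohort totals, which is the property exploited to gain efficiency when the auxiliary variables are informative.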

References

Etievant, L., Gail, M. H. (2024). Cox model inference for relative hazard and pure risk from stratified weight-calibrated case-cohort data. Lifetime Data Analysis, 30, 572-599.

Etievant, L., Gail, M. H. (2024). Software Application Profile: CaseCohortCoxSurvival: an R package for case-cohort inference for relative hazard and pure risk under the Cox model. Submitted.

Shin, Y. E., Pfeiffer, R. M., Graubard, B. I., Gail, M. H. (2020). Weight calibration to improve the efficiency of pure risk estimates from case-control samples nested in a cohort. Biometrics, 76, 1087-1097.

Breslow, N.E., Lumley, T., Ballantyne, C.M., Chambless, L.E. and Kulich, M. (2009). Improved Horvitz-Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology. Statistics in Biosciences, 1, 32-49.

Examples


 data(dataexample.stratified, package="CaseCohortCoxSurvival")

 # Display some of the data
 dataexample.stratified$cohort[1:5, ]

 dataexample.stratified$A[1:5, ] # auxiliary variable values in the cohort

[Package CaseCohortCoxSurvival version 0.0.36 Index]