BBC_dichotom {Qindex} | R Documentation |
Bootstrap-based Optimism Correction for Dichotomization
Description
Multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors.
Usage
BBC_dichotom(formula, data, ...)
optimism_dichotom(fom, X, data, R = 100L, ...)
coef_dichotom(fom, X., data)
Arguments
formula | formula, e.g.,
data |
... | additional parameters, currently not in use
fom | formula, e.g.,
X | numeric matrix of
R | positive integer scalar, number of bootstrap replicates
X. | logical matrix
Details
Function BBC_dichotom obtains a multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors. Specifically,
1. Obtain the dichotomizing rules \mathbf{\mathcal{D}} of predictors x_1,\cdots,x_k based on response y (via m_rpartD). Multivariable regression (with additional predictors z, if any) with dichotomized predictors \left(\tilde{x}_1,\cdots,\tilde{x}_k\right) = \mathcal{D}\left(x_1,\cdots,x_k\right) (via helper function coef_dichotom) is the apparent performance.
2. Obtain the bootstrap-based optimism based on R copies of bootstrap samples (via helper function optimism_dichotom). The median of bootstrap-based optimism over R bootstrap copies is the optimism-correction of the dichotomized predictors \tilde{x}_1,\cdots,\tilde{x}_k.
3. Subtract the optimism-correction (in Step 2) from the apparent performance estimates (in Step 1), only for \tilde{x}_1,\cdots,\tilde{x}_k. The apparent performance estimates for additional predictors z's, if any, are not modified. Neither the variance-covariance (vcov) estimates nor the other regression diagnostics, e.g., residuals, log-likelihood, etc., of the apparent performance are modified for now. This coefficient-only, partially-modified regression model is the optimism-corrected performance.
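Steps 2 and 3 above can be sketched in base R. This is only an illustration of the arithmetic, not the package's internal code: the vector `apparent` and the matrix `optimism` hold made-up numbers, and both names are hypothetical.

```r
# Hypothetical numbers for illustration only; not output from the package.
# Apparent coefficients of k = 2 dichotomized predictors (Step 1)
apparent <- c(x1 = 0.80, x2 = 0.50)

# Bootstrap-based optimism: R = 5 replicates (rows) by k = 2 predictors (columns)
optimism <- rbind(
  c(0.10, 0.02),
  c(0.30, 0.08),
  c(0.20, 0.05),
  c(0.15, 0.04),
  c(0.25, 0.06)
)
colnames(optimism) <- names(apparent)

# Step 2: the optimism-correction is the column-wise median over replicates
correction <- apply(optimism, MARGIN = 2L, FUN = median)

# Step 3: subtract the correction from the apparent estimates
corrected <- apparent - correction
print(corrected)
```

Coefficients of any additional predictors z would be left out of this subtraction, as described in Step 3.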
Value
Function BBC_dichotom returns a coxph, glm or lm regression model, with attributes,
attr(,'optimism')
the returned object from optimism_dichotom
attr(,'apparent_cutoff')
a double vector, cutoff thresholds for the k predictors in the apparent model
Details on Helper Functions
Bootstrap-Based Optimism
Helper function optimism_dichotom computes the bootstrap-based optimism of the dichotomized predictors. Specifically,
1. R copies of bootstrap samples are generated. In the j-th bootstrap sample,
   a. obtain the dichotomizing rules \mathbf{\mathcal{D}}^{(j)} of predictors x_1^{(j)},\cdots,x_k^{(j)} based on response y^{(j)} (via m_rpartD);
   b. multivariable regression (with additional predictors z^{(j)}, if any) coefficient estimates \mathbf{\hat{\beta}}^{(j)} = \left(\hat{\beta}_1^{(j)},\cdots,\hat{\beta}_k^{(j)}\right)^t of the dichotomized predictors \left(\tilde{x}_1^{(j)},\cdots,\tilde{x}_k^{(j)}\right) = \mathcal{D}^{(j)}\left(x_1^{(j)},\cdots,x_k^{(j)}\right) (via coef_dichotom) are the bootstrap performance estimate.
2. Dichotomize x_1,\cdots,x_k in the entire data using each of the bootstrap rules \mathcal{D}^{(1)},\cdots,\mathcal{D}^{(R)}. Multivariable regression (with additional predictors z, if any) coefficient estimates \mathbf{\hat{\beta}}^{[j]} = \left(\hat{\beta}_1^{[j]},\cdots,\hat{\beta}_k^{[j]}\right)^t of the dichotomized predictors \left(\tilde{x}_1^{[j]},\cdots,\tilde{x}_k^{[j]}\right) = \mathcal{D}^{(j)}\left(x_1,\cdots,x_k\right) (via coef_dichotom) are the test performance estimate.
3. The difference between the bootstrap and test performance estimates, an R\times k matrix of \left(\mathbf{\hat{\beta}}^{(1)},\cdots,\mathbf{\hat{\beta}}^{(R)}\right) minus another R\times k matrix of \left(\mathbf{\hat{\beta}}^{[1]},\cdots,\mathbf{\hat{\beta}}^{[R]}\right), is the bootstrap-based optimism.
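The bootstrap procedure above can be sketched in base R for a Gaussian response. This is a minimal sketch under stated assumptions, not the package's implementation: `find_cutoff` is a hypothetical stand-in for m_rpartD (it searches a coarse grid of deciles instead of fitting a recursive partitioning tree), `coef_dich` is a hypothetical stand-in for coef_dichotom, the data are simulated, and no additional predictors z are included.

```r
set.seed(1)
n <- 200L
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))    # k = 2 continuous predictors
y <- 1.0 * (x[, "x1"] > 0.3) + rnorm(n)     # simulated response; only x1 matters

# Hypothetical stand-in for the dichotomizing rule (the package uses m_rpartD):
# among the inner deciles of x, pick the cutoff that minimizes the
# residual sum of squares of the two-group mean model
find_cutoff <- function(x, y) {
  cand <- quantile(x, probs = seq(0.1, 0.9, by = 0.1), names = FALSE)
  rss <- vapply(cand, function(cc) {
    g <- x > cc
    sum((y[g] - mean(y[g]))^2) + sum((y[!g] - mean(y[!g]))^2)
  }, numeric(1))
  cand[which.min(rss)]
}

# Hypothetical stand-in for coef_dichotom, linear model only:
# slopes of the 0/1 dichotomized predictors, intercept dropped
coef_dich <- function(y, xd) unname(coef(lm(y ~ xd))[-1L])

R <- 50L
k <- ncol(x)
optimism <- matrix(NA_real_, nrow = R, ncol = k,
                   dimnames = list(NULL, colnames(x)))
for (j in seq_len(R)) {
  id <- sample.int(n, replace = TRUE)       # j-th bootstrap sample
  xb <- x[id, , drop = FALSE]
  yb <- y[id]
  # dichotomizing rule D^(j), learned on the bootstrap sample only
  cut_j <- vapply(seq_len(k), function(i) find_cutoff(xb[, i], yb), numeric(1))
  # bootstrap performance: D^(j) applied to the bootstrap sample
  beta_boot <- coef_dich(yb, sweep(xb, 2L, cut_j, FUN = ">") + 0)
  # test performance: D^(j) applied to the entire data
  beta_test <- coef_dich(y, sweep(x, 2L, cut_j, FUN = ">") + 0)
  optimism[j, ] <- beta_boot - beta_test
}
dim(optimism)                               # R rows, k columns
```

Each row of `optimism` corresponds to one bootstrap replicate, so a column-wise median over the rows gives one correction per dichotomized predictor.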
Multivariable Regression Coefficient Estimates of Dichotomized Predictors \tilde{x}'s
Helper function coef_dichotom fits a multivariable Cox proportional hazards (coxph) model for Surv response, logistic (glm) regression model for logical response, or linear (lm) regression model for Gaussian response, with the dichotomized predictors \tilde{x}_1,\cdots,\tilde{x}_k as well as the additional predictors z's.
It is almost inevitable to have duplicates among the dichotomized predictors \tilde{x}_1,\cdots,\tilde{x}_k. In such a case, the multivariable model is fitted using the unique \tilde{x}'s.
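The choice of model by response type can be illustrated with a small dispatcher. This is a hedged sketch, not the source of coef_dichotom: the function name `fit_by_response` and its arguments are hypothetical, and the Surv branch assumes the survival package is installed.

```r
# Hypothetical helper: pick the regression model from the type of the response
fit_by_response <- function(y, X) {
  X <- as.data.frame(X)
  if (inherits(y, what = "Surv")) {
    # time-to-event response: Cox proportional hazards model
    survival::coxph(y ~ ., data = X)
  } else if (is.logical(y)) {
    # logical response: logistic regression
    glm(y ~ ., data = X, family = binomial(link = "logit"))
  } else if (is.numeric(y)) {
    # numeric response: linear regression
    lm(y ~ ., data = X)
  } else {
    stop("unsupported response type")
  }
}

set.seed(2)
# one dichotomized predictor (logical) and one additional predictor z
X <- data.frame(x1 = rnorm(50) > 0, z = rnorm(50))
m_lm <- fit_by_response(rnorm(50), X)
m_glm <- fit_by_response(runif(50) > 0.5, X)
class(m_lm)[1L]
class(m_glm)[1L]
```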
Returns of Helper Functions
Of helper function optimism_dichotom
Helper function optimism_dichotom returns an R\times k double matrix of bootstrap-based optimism, with attributes
attr(,'cutoff')
an R\times k double matrix, the R copies of bootstrap cutoff thresholds for the k predictors. See attribute 'cutoff' of function m_rpartD
Of helper function coef_dichotom
Helper function coef_dichotom returns a double vector of the regression coefficients of dichotomized predictors \tilde{x}'s, with attributes
In the case of duplicated \tilde{x}'s, the regression coefficients of the unique \tilde{x}'s are duplicated for those duplicates in \tilde{x}'s.
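The handling of duplicated \tilde{x}'s can be sketched as follows, for a linear model on simulated data. This is an illustration under assumptions rather than the package's code: the object names (`Xd`, `Xu`, `first_match`) are hypothetical, and the column-matching approach shown is only one possible way to copy coefficients back.

```r
set.seed(3)
n <- 100L
x1 <- rnorm(n)
# logical matrix of dichotomized predictors; column 'c' duplicates column 'a'
Xd <- cbind(a = x1 > 0, b = rnorm(n) > 0, c = x1 > 0)
y <- 0.5 * Xd[, "a"] + rnorm(n)

# fit the model using the unique columns only
dup <- duplicated(Xd, MARGIN = 2L)
Xu <- Xd[, !dup, drop = FALSE]
beta_u <- coef(lm(y ~ I(Xu + 0)))[-1L]   # slopes of the unique 0/1 columns

# map every original column to the unique column it is identical to
key_all <- apply(Xd, MARGIN = 2L, FUN = paste, collapse = "")
key_unique <- apply(Xu, MARGIN = 2L, FUN = paste, collapse = "")
first_match <- match(key_all, key_unique)

# copy the coefficients of the unique columns to their duplicates
beta <- setNames(beta_u[first_match], colnames(Xd))
print(beta)
```

The returned vector has one entry per original column, so it stays aligned with the k predictors even when fewer than k columns enter the fitted model.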
References
For helper function optimism_dichotom
Ewout W. Steyerberg (2009) Clinical Prediction Models. doi:10.1007/978-0-387-77244-8
Frank E. Harrell Jr., Kerry L. Lee, Daniel B. Mark. (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4
Examples
library(survival)
data(flchain, package = 'survival') # see more details from ?survival::flchain
head(flchain2 <- within.data.frame(flchain, expr = {
  mgus = as.logical(mgus)
}))
dim(flchain3 <- subset(flchain2, futime > 0)) # required by ?rpart::rpart
dim(flchain_Circulatory <- subset(flchain3, chapter == 'Circulatory'))
m1 = BBC_dichotom(Surv(futime, death) ~ age + sex + mgus ~ kappa + lambda,
                  data = flchain_Circulatory, R = 1e2L)
summary(m1)
matrixStats::colMedians(BBC_cutoff(m1)) # median bootstrap cutoff
attr(m1, 'apparent_cutoff')