cv_logistic2ph {sleev}    R Documentation

Cross-validation log-likelihood prediction for logistic2ph

Description

Performs cross-validation to calculate the average predicted log-likelihood for the logistic2ph function. This function can be used to select the B-spline basis that yields the largest average predicted log-likelihood. See the package vignette for code examples.

Usage

cv_logistic2ph(
  y_unval = NULL,
  y = NULL,
  x_unval = NULL,
  x = NULL,
  z = NULL,
  data,
  nfolds = 5,
  tol = 1e-04,
  max_iter = 1000,
  verbose = FALSE
)

Arguments

y_unval

Column name of the error-prone or unvalidated binary outcome. This argument is optional. If y_unval = NULL (the default), y is treated as error-free.

y

Column name that stores the validated value of y_unval in the second phase. Subjects with missing values of y are considered as those not selected in the second phase. This argument is required.

x_unval

Specifies the columns of the error-prone covariates. This argument is required.

x

Specifies the columns that store the validated values of x_unval in the second phase. Subjects with missing values of x are considered as those not selected in the second phase. This argument is required.

z

Specifies the columns of the accurately measured covariates. Subjects with missing values of z are omitted from the analysis. This argument is optional.

data

Specifies the name of the dataset. This argument is required.

nfolds

Specifies the number of cross-validation folds. The default value is 5. Although nfolds can be as large as the sample size (leave-one-out cross-validation), leave-one-out is not recommended for large datasets. The smallest allowable value is 3.

tol

Specifies the convergence criterion in the EM algorithm. The default value is 1E-4. This argument is optional.

max_iter

Specifies the maximum number of iterations in the EM algorithm. The default value is 1000. This argument is optional.

verbose

If TRUE, then show details of the analysis. The default value is FALSE.

Details

cv_logistic2ph gives the predicted log-likelihood for models and data like those in logistic2ph. Therefore, the arguments of cv_logistic2ph are analogous to those of logistic2ph.
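
The idea of a cross-validated predicted log-likelihood can be illustrated with a minimal base R sketch. It does not call sleev and is not the package's implementation: it uses an ordinary logistic regression (glm) on simulated data as a stand-in for logistic2ph, and it assumes that the per-fold value is the summed Bernoulli log-likelihood of the held-out outcomes and that the reported average is the mean over folds. The object names mirror the components listed under Value only for readability.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; not from sleev)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))
dat <- data.frame(y = y, x = x)

nfolds <- 5
# Randomly assign each row to one of nfolds equally sized folds
fold <- sample(rep(seq_len(nfolds), length.out = n))

pred_loglike <- numeric(nfolds)
for (k in seq_len(nfolds)) {
  train <- dat[fold != k, ]
  test <- dat[fold == k, ]
  # Fit on the training folds only
  fit <- glm(y ~ x, family = binomial, data = train)
  # Predicted probabilities for the held-out fold
  p <- predict(fit, newdata = test, type = "response")
  # Log-likelihood of the held-out outcomes under the fitted model
  pred_loglike[k] <- sum(dbinom(test$y, size = 1, prob = p, log = TRUE))
}

# Average over folds; larger (less negative) values indicate better prediction
avg_pred_loglike <- mean(pred_loglike)
```

In cv_logistic2ph the model being refitted in each fold is the two-phase logistic model, so a fit can fail to converge within max_iter iterations; this is why the convergence status is reported alongside the log-likelihoods.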

Value

cv_logistic2ph() returns a list that includes the following components:

avg_pred_loglike

Stores the average predicted log likelihood.

pred_loglike

Stores the predicted log likelihood in each fold.

converge

Stores the convergence status of the EM algorithm in each run.
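
As a rough sketch of how these components might be used to compare B-spline sizes, the following base R code picks the size with the largest average predicted log-likelihood among the runs in which every fold converged. The result lists are built by hand with arbitrary placeholder numbers, not output from sleev. The helper make_result is hypothetical, and the sketch assumes that converge is a logical vector with one entry per fold and that avg_pred_loglike is the mean of pred_loglike.

```r
# Hypothetical helper that mimics the shape of the returned list
make_result <- function(pred_loglike, converge) {
  list(avg_pred_loglike = mean(pred_loglike),
       pred_loglike = pred_loglike,
       converge = converge)
}

# Candidate B-spline sizes and placeholder results (arbitrary values)
sns <- c(15, 20, 25)
cv_results <- list(
  make_result(c(-62, -60, -63, -61, -64), rep(TRUE, 5)),
  make_result(c(-59, -58, -60, -57, -61), rep(TRUE, 5)),
  make_result(c(-55, -56, -54, -57, -53), c(TRUE, TRUE, FALSE, TRUE, TRUE))
)
names(cv_results) <- sns

# Extract the average predicted log-likelihood for each size
avg <- vapply(cv_results, function(r) r$avg_pred_loglike, numeric(1))

# Flag sizes for which every fold converged
all_converged <- vapply(cv_results, function(r) all(r$converge), logical(1))

# Exclude sizes with a non-converged fold, then take the maximum
avg[!all_converged] <- NA
best_size <- sns[which.max(avg)]
```

Excluding non-converged runs is a design choice of this sketch; depending on the analysis, rerunning those sizes with a larger max_iter may be preferable to dropping them.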

Examples

## Not run: 
data("mock.vccc")
# different B-spline sizes
sns <- c(15, 20, 25, 30, 35, 40)
# vector to hold mean log-likelihood
pred_loglike.1 <- rep(NA, length(sns))
# specify number of folds in the cross validation
k <- 5
for (i in seq_along(sns)) {
  # constructing B-spline basis using the same process as in Section 4.3.1
  sn <- sns[i]
  data.sieve <- spline2ph(x = "CD4_unval", size = sn, degree = 3,
                          data = mock.vccc, group = "Prior_ART",
                          split_group = TRUE)
  # cross validation, produce mean log-likelihood
  start.time <- Sys.time()
  res.1 <- cv_logistic2ph(y = "ADE_val", y_unval = "ADE_unval",
                          x = "CD4_val", x_unval = "CD4_unval",
                          z = "Prior_ART", data = data.sieve, nfolds = k,
                          tol = 1e-04, max_iter = 1000, verbose = FALSE)

  # save mean log-likelihood result
  pred_loglike.1[i] <- res.1$avg_pred_loglike
}
# Print predicted log-likelihood for different B-spline sizes
print(pred_loglike.1)


## End(Not run)


[Package sleev version 1.1.4 Index]