cv_linear2ph {sleev}    R Documentation

Cross-validation log-likelihood prediction for linear2ph

Description

Performs cross-validation to calculate the average predicted log likelihood for the linear2ph function. This function can be used to select the B-spline basis that yields the largest average predicted log likelihood. See the package vignette for code examples.

Usage

cv_linear2ph(
  y_unval = NULL,
  y = NULL,
  x_unval = NULL,
  x = NULL,
  z = NULL,
  data = NULL,
  nfolds = 5,
  max_iter = 2000,
  tol = 1e-04,
  verbose = FALSE
)

Arguments

y_unval

Specifies the column of the error-prone outcome that is continuous. Subjects with missing values of y_unval are omitted from the analysis. This argument is required.

y

Specifies the column that stores the validated value of y_unval in the second phase. Subjects with missing values of y are considered as those not selected in the second phase. This argument is required.

x_unval

Specifies the columns of the error-prone covariates. Subjects with missing values of x_unval are omitted from the analysis. This argument is required.

x

Specifies the columns that store the validated values of x_unval in the second phase. Subjects with missing values of x are considered as those not selected in the second phase. This argument is required.

z

Specifies the columns of the accurately measured covariates. Subjects with missing values of z are omitted from the analysis. This argument is optional.

data

Specifies the name of the dataset. This argument is required.

nfolds

Specifies the number of cross-validation folds. The default value is 5. Although nfolds can be as large as the sample size (leave-one-out cross-validation), this is not recommended for large datasets. The smallest allowable value is 3.

max_iter

Specifies the maximum number of iterations in the EM algorithm. The default number is 2000. This argument is optional.

tol

Specifies the convergence criterion in the EM algorithm. The default value is 1E-4. This argument is optional.

verbose

If TRUE, then show details of the analysis. The default value is FALSE.
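
To make the role of nfolds concrete, here is a minimal sketch of a generic k-fold assignment in base R. It is an illustration only and is not taken from the sleev source, so cv_linear2ph may assign folds differently. The sample size n, the seed, and the names fold_id, test_idx, and train_idx are hypothetical.

```r
set.seed(2024)   # hypothetical seed, used only to make the sketch reproducible
n <- 23          # hypothetical number of subjects
nfolds <- 5      # same default as cv_linear2ph

# Repeat the labels 1..nfolds up to length n, then shuffle them
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

# Fold sizes differ by at most one subject
fold_sizes <- table(fold_id)
print(fold_sizes)

# Subjects held out in the first fold, and those used for fitting
test_idx <- which(fold_id == 1)
train_idx <- which(fold_id != 1)
```

Setting nfolds equal to n in this sketch gives leave-one-out cross-validation, where each fold holds exactly one subject and the model would be refit n times.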

Details

cv_linear2ph gives log-likelihood predictions for the same models and data as linear2ph. Therefore, the arguments of cv_linear2ph are analogous to those of linear2ph.

Value

cv_linear2ph() returns a list that includes the following components:

avg_pred_loglike

The average predicted log likelihood across all folds.

pred_loglike

The predicted log likelihood in each fold.

converge

The convergence status of the EM algorithm in each run.
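
As a rough sketch of how the returned list might be inspected, the code below builds a mock object with the three documented components instead of calling cv_linear2ph. All values are placeholders, and treating converge as a logical vector with one entry per fold is an assumption, not something this page confirms.

```r
# Mock result with the three documented components (placeholder values only)
res <- list(
  avg_pred_loglike = -1195.2,
  pred_loglike     = c(-1190.1, -1201.5, -1193.8, -1197.0, -1193.6),
  converge         = c(TRUE, TRUE, TRUE, FALSE, TRUE)  # assumed logical, one per fold
)

# Identify folds whose EM run did not converge
not_converged <- which(!res$converge)

if (length(not_converged) > 0) {
  message("EM did not converge in fold(s): ",
          paste(not_converged, collapse = ", "))
}

# Per-fold values for the converged folds only
converged_loglike <- res$pred_loglike[res$converge]
```

Checking converge before comparing avg_pred_loglike across bases is a cautious habit, since a fold that stopped at max_iter may contribute an unreliable value.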

Examples

## Not run: 
  data("mock.vccc")
  # different B-spline sizes
  sns <- c(15, 20, 25, 30, 35, 40)
  # vector to hold mean log-likelihood
  pred_loglike.1 <- rep(NA, length(sns))
  # specify number of folds in the cross validation
  k <- 5
  for (i in seq_along(sns)) {
    # constructing B-spline basis using the same process as in Section 4.3.1
    sn <- sns[i]
    data.sieve <- spline2ph(x = "VL_unval", data = mock.vccc, size = sn,
                            degree = 3, group = "Sex")

    # cross validation, produce mean log-likelihood
    start.time <- Sys.time()
    res.1 <- cv_linear2ph(y = "CD4_val", y_unval = "CD4_unval",
                          x = "VL_val", x_unval = "VL_unval", z = "Sex",
                          data = data.sieve, nfolds = k, max_iter = 2000,
                          tol = 1e-04, verbose = FALSE)
    # save mean log-likelihood result
    pred_loglike.1[i] <- res.1$avg_pred_loglike
  }
  # Print predicted log-likelihood for different B-spline sizes
  print(pred_loglike.1)

## End(Not run)
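
Once the loop has filled pred_loglike.1, the basis size with the largest average predicted log likelihood can be picked with which.max. The sketch below is self-contained and uses placeholder numbers in place of real cross-validation output; they are not results from mock.vccc. The names ok and best_size are hypothetical.

```r
# Candidate B-spline sizes, as in the example
sns <- c(15, 20, 25, 30, 35, 40)

# Placeholder values standing in for the avg_pred_loglike results (not real output)
pred_loglike.1 <- c(-1210.4, -1198.7, -1195.2, -1196.9, -1201.3, -1207.8)

# Skip sizes for which no value was stored
ok <- !is.na(pred_loglike.1)

# Size with the largest average predicted log likelihood
best_size <- sns[ok][which.max(pred_loglike.1[ok])]
print(best_size)
```

The selected size would then be passed to spline2ph to build the basis used in the final linear2ph fit.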


[Package sleev version 1.1.4 Index]