cv {easy.glmnet}    R Documentation

Conduct cross-validation

Description

Function to easily conduct cross-validation (including fold assignment, merging of fold outputs, etc.).

Usage

cv(x, y, family = c("binomial", "cox", "gaussian"), fit_fun, predict_fun, site = NULL,
covar = NULL, nfolds = 10, pred.format = NA, verbose = TRUE, ...)

Arguments

x

input matrix for glmnet of dimension nobs x nvars; each row is an observation vector. It can be easily obtained with data.frame2glmnet.matrix.

y

response to be predicted. A binary vector for "binomial", a "Surv" object for "cox", or a numeric vector for "gaussian".

family

distribution of y: "binomial", "cox", or "gaussian".

fit_fun

function to create the prediction model using the training subsets. It can have between two and four arguments (the first two are compulsory): x_training (training X data.frame), y_training (training Y outcomes), site_training (training site names), and covar_training (training covariates). It must return the overall prediction model, which may be a list of the different submodels used in different steps and/or derived from different imputations.

predict_fun

function to apply the prediction model to the test sets. It can have between two and four arguments (the first two are compulsory): model (the overall prediction model), x_test (test X data.frame), site_test (test site names), and covar_test (test covariates). It must return the predictions.

site

vector with the sites' names, or NULL for studies conducted in a single site.

covar

other covariates that can be passed to fit_fun and predict_fun.

...

other arguments that can be passed to fit_fun and predict_fun.

nfolds

number of folds.

pred.format

format of the predictions returned by each fold. E.g., if the prediction is an array, use NA.

verbose

(optional) logical, whether to print some messages during execution.
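
The argument contracts of fit_fun and predict_fun described above can be illustrated with a four-argument pair. This is only a sketch under assumptions: base R glm() stands in for glmnet_fit(), the names fit_fun_sketch and predict_fun_sketch and the toy data are hypothetical, and the functions are called directly here rather than through cv.

```r
# Hypothetical four-argument fit_fun; glm() is a stand-in for glmnet_fit()
fit_fun_sketch <- function(x_training, y_training, site_training, covar_training) {
  # Combine predictors and covariates in one data.frame for glm()
  # (site_training is accepted but not used in this simple sketch)
  d <- data.frame(y = y_training, x_training, covar_training)
  # Return a list so that further submodels could be stored alongside
  list(glm = glm(y ~ ., data = d, family = binomial()))
}

# Matching predict_fun: returns one prediction per test row
predict_fun_sketch <- function(model, x_test, site_test, covar_test) {
  d <- data.frame(x_test, covar_test)
  as.numeric(predict(model$glm, newdata = d, type = "response"))
}

# Toy data (assumed, for illustration only)
set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3, dimnames = list(NULL, c("v1", "v2", "v3")))
covar <- data.frame(age = rnorm(200, 40, 10))
site <- rep(c("A", "B"), each = 100)
y <- rbinom(200, 1, plogis(x[, 1] - x[, 2]))

# Manual split, only to exercise the two functions
train <- 1:150
test <- 151:200
m <- fit_fun_sketch(x[train, ], y[train], site[train], covar[train, , drop = FALSE])
p <- predict_fun_sketch(m, x[test, ], site[test], covar[test, , drop = FALSE])
head(p)
```

Returning a list from fit_fun keeps the interface stable if more submodels (for example, one per imputation) are added later, since predict_fun can pick the elements it needs by name.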

Details

This function iteratively divides the dataset into a training dataset, with which it fits the model using the function fit_fun, and a test dataset, to which it applies the model using the function predict_fun. It saves the models fitted with the training datasets and the predictions obtained in the test datasets. The folds are assigned automatically using assign.folds, accounting for the site if this is not NULL.
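
The procedure described above can be sketched in base R. This is a simplified stand-in, not the package's implementation: cv_sketch is a hypothetical name, the folds are drawn at random instead of with assign.folds (so sites are ignored), the element name models and the column fold are assumptions, and lm() is used only so that the sketch runs without glmnet.

```r
# Hypothetical helper: a simplified stand-in for cv(), NOT the package's code
cv_sketch <- function(x, y, fit_fun, predict_fun, nfolds = 10) {
  n <- nrow(x)
  # Random, roughly balanced fold labels (assign.folds may work differently)
  folds <- sample(rep(seq_len(nfolds), length.out = n))
  models <- vector("list", nfolds)
  y.pred <- rep(NA_real_, n)
  for (k in seq_len(nfolds)) {
    test <- folds == k
    # Fit on the training rows only
    models[[k]] <- fit_fun(x[!test, , drop = FALSE], y[!test])
    # Apply the fitted model to the held-out rows
    y.pred[test] <- predict_fun(models[[k]], x[test, , drop = FALSE])
  }
  # Merge the fold outputs into one table and keep the per-fold models
  list(predictions = data.frame(fold = folds, y = y, y.pred = y.pred),
       models = models)
}

# Toy gaussian data (assumed, for illustration only)
set.seed(42)
x <- matrix(rnorm(300 * 2), ncol = 2, dimnames = list(NULL, c("a", "b")))
y <- 1 + 2 * x[, 1] - x[, 2] + rnorm(300, 0, 0.5)

fit_fun <- function(x_training, y_training) {
  lm(y ~ ., data = data.frame(y = y_training, x_training))
}
predict_fun <- function(model, x_test) {
  as.numeric(predict(model, newdata = data.frame(x_test)))
}

res <- cv_sketch(x, y, fit_fun, predict_fun, nfolds = 5)
# Correlation between observed and out-of-fold predicted outcomes
cor(res$predictions$y, res$predictions$y.pred)
```

Because every observation is predicted by a model that never saw it during fitting, the merged predictions give an out-of-sample estimate of performance.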

Value

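A list with the predictions and the models used. In the example below, the predictions are read from res$predictions, whose columns y and y.pred hold the observed and predicted outcomes.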

Author(s)

Joaquim Radua

See Also

glmnet_predict for obtaining predictions.

Examples

# Create random x (predictors) and y (binary)
x = matrix(rnorm(25000), ncol = 50)
y = 1 * (plogis(apply(x[,1:5], 1, sum) + rnorm(500, 0, 0.1)) > 0.5)

# Predict y via cross-validation
fit_fun = function (x_training, y_training) {
  list(
    lasso = glmnet_fit(x_training, y_training, family = "binomial")
  )
}
predict_fun = function (m, x_test) {
  glmnet_predict(m$lasso, x_test)
}
# Only 2 folds to ensure the example runs quickly
res = cv(x, y, family = "binomial", fit_fun = fit_fun, predict_fun = predict_fun, nfolds = 2)

# Show sensitivity, specificity and balanced accuracy
se = mean(res$predictions$y.pred[res$predictions$y == 1] > 0.5)
sp = mean(res$predictions$y.pred[res$predictions$y == 0] < 0.5)
bac = (se + sp) / 2
cat("Sensitivity:", round(se, 2), "\n")
cat("Specificity:", round(sp, 2), "\n")
cat("Balanced accuracy:", round(bac, 2), "\n")
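
The three metrics computed above could also be wrapped in a small reusable helper. The following is a minimal sketch; binary_metrics is a hypothetical function that is not part of easy.glmnet, and the toy vectors are invented for illustration.

```r
# Hypothetical helper, not part of easy.glmnet
binary_metrics <- function(y, y.pred, threshold = 0.5) {
  # Sensitivity: share of true positives predicted above the threshold
  se <- mean(y.pred[y == 1] > threshold)
  # Specificity: share of true negatives predicted at or below the threshold
  # (a prediction exactly at the threshold counts as negative here)
  sp <- mean(y.pred[y == 0] <= threshold)
  c(sensitivity = se, specificity = sp, balanced_accuracy = (se + sp) / 2)
}

# Toy observed outcomes and predicted probabilities (assumed)
y <- c(1, 1, 1, 1, 0, 0, 0, 0)
y.pred <- c(0.9, 0.8, 0.6, 0.3, 0.1, 0.4, 0.7, 0.2)
metrics <- binary_metrics(y, y.pred)
print(round(metrics, 2))
```

With the result of cv, the same helper could be called as binary_metrics(res$predictions$y, res$predictions$y.pred).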

[Package easy.glmnet version 1.0 Index]