assign.folds {easy.glmnet}    R Documentation

Assign observations to folds in a balanced way

Description

Function to assign observations to folds, ensuring a similar distribution across folds (and sites).

Usage

assign.folds(y, family = c("binomial", "cox", "gaussian"), site = NULL, nfolds = 10)

Arguments

y

response to be predicted. A binary vector for "binomial", an object of class "Surv" for "cox", or a numeric vector for "gaussian".

family

distribution of y: "binomial", "cox", or "gaussian".

site

vector with the sites' names, or NULL for studies conducted in a single site.

nfolds

number of folds.

Details

If family is "binomial", the function randomly assigns the folds separately for the two outcomes. If family is "gaussian", the function randomly assigns the folds separately for ranges of the outcome. If family is "cox", the function randomly assigns the folds separately for ranges of time and censorship. If site is not NULL, the function randomly assigns the folds separately for each site.
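The stratified assignment described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper sketch_assign_folds, the site labels, and the use of quartiles as outcome ranges are all assumptions made for illustration. The idea is to split the observations into strata (outcome class, outcome range, and optionally site) and shuffle a balanced set of fold labels within each stratum.

```r
# Hypothetical helper (not part of easy.glmnet): balanced fold assignment
# within strata, sketched in base R.
sketch_assign_folds <- function(strata, nfolds = 10) {
  fold <- integer(length(strata))
  for (idx in split(seq_along(strata), strata)) {
    # Recycle the labels 1..nfolds over the stratum, then shuffle them
    labels <- rep_len(seq_len(nfolds), length(idx))
    fold[idx] <- labels[sample.int(length(idx))]
  }
  fold
}

set.seed(1)
y <- rbinom(200, 1, 0.3)                          # binary outcome
site <- sample(c("A", "B"), 200, replace = TRUE)  # hypothetical site labels

# "binomial"-like case: one stratum per outcome and site
strata_bin <- interaction(y, site, drop = TRUE)
fold_bin <- sketch_assign_folds(strata_bin, nfolds = 4)
table(fold_bin, y)

# "gaussian"-like case: strata are ranges of the outcome
# (quartiles are an assumption; the package may use other ranges)
y_num <- rnorm(200)
ranges <- cut(y_num, quantile(y_num, 0:4 / 4), include.lowest = TRUE)
fold_num <- sketch_assign_folds(ranges, nfolds = 4)
table(fold_num, ranges)
```

Within each stratum the fold sizes differ by at most one observation, which is what keeps the outcome distribution similar across folds. For a "cox"-like case, the same helper could be applied to strata built from ranges of time crossed with censoring status, again as an assumption about how such strata might be formed.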

Value

A numeric vector with the fold assigned to each observation.
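As a rough illustration of how such a fold vector is typically consumed, the sketch below loops over folds, fits a model on the remaining observations, and predicts the held-out ones. The fold vector here is a stand-in built with base R rather than a call to assign.folds, and lm is used only as a placeholder model; both are assumptions for the sake of a self-contained example.

```r
set.seed(1)
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

nfolds <- 5
# Stand-in fold vector (assumption); in practice it would be the
# value returned by assign.folds()
fold <- sample(rep_len(seq_len(nfolds), n))

pred <- numeric(n)
for (k in seq_len(nfolds)) {
  test <- fold == k
  fit <- lm(y ~ x, data = dat[!test, ])          # train on the other folds
  pred[test] <- predict(fit, newdata = dat[test, ])  # predict the held-out fold
}

# Cross-validated mean squared error
cv_mse <- mean((dat$y - pred)^2)
cv_mse
```

Each observation is predicted exactly once, by a model that never saw it during fitting.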

Author(s)

Joaquim Radua and Aleix Solanes

References

Solanes, A., Mezquida, G., Janssen, J., Amoretti, S., Lobo, A., Gonzalez-Pinto, A., Arango, C., Vieta, E., Castro-Fornieles, J., Berge, D., Albacete, A., Gine, E., Parellada, M., Bernardo, M.; PEPs group (collaborators); Pomarol-Clotet, E., Radua, J. (2022) Combining MRI and clinical data to detect high relapse risk after the first episode of psychosis. Schizophrenia, 8, 100, doi:10.1038/s41537-022-00309-w.

See Also

cv for conducting a cross-validation.

Examples

# Create random y (numeric)
y = rnorm(200, sample(c(1, 10), 200, replace = TRUE))

# Assign folds
fold = assign.folds(y, "gaussian", nfolds = 4)

# Check that the distribution of y is similar across folds
oldpar = par(mfrow = c(2, 2))
for (i in 1:4) {
  hist(y[which(fold == i)], main = paste("Fold", i), xlab = "y")
}
par(oldpar)

[Package easy.glmnet version 1.0 Index]