repeatcv {nestedcv} | R Documentation |
Repeated nested CV
Description
Performs repeated calls to a nestedcv
model to determine performance across
repeated runs of nested CV.
Usage
repeatcv(
  expr,
  n = 5,
  repeat_folds = NULL,
  keep = FALSE,
  extra = FALSE,
  progress = TRUE,
  rep_parallel = "mclapply",
  rep.cores = 1L
)
Arguments
expr |
An expression containing a call to |
n |
Number of repeats |
repeat_folds |
Optional list containing fold indices to be applied to the outer CV folds. |
keep |
Logical whether to save repeated outer CV fitted models for variable importance, SHAP etc. Note this can make the resulting object very large. |
extra |
Logical whether additional performance metrics are gathered for
binary classification models. See |
progress |
Logical whether to show progress. |
rep_parallel |
Either "mclapply" or "future". This determines which parallel backend to use. |
rep.cores |
Integer specifying number of cores/threads to invoke.
Ignored if |
Details
We recommend using this with the R pipe |> (see examples).
When comparing models, we recommend fixing the set of outer CV folds used in
each repeat, so that every model is evaluated on the same splits. The
function repeatfolds() can be used to create a fixed set of outer CV folds
for each repeat.
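The recommendation above can be sketched as follows. This is a minimal illustration, not the package's own example: it assumes the iris data and compares a lasso fit (alphaSet = 1) against a ridge fit (alphaSet = 0), a choice of models made here purely for demonstration. The object names res_lasso and res_ridge are hypothetical.

```r
library(nestedcv)

data("iris")
y <- iris$Species
x <- iris[, 1:4]

# One fixed set of outer CV folds per repeat, shared by both models
set.seed(123, "L'Ecuyer-CMRG")
folds <- repeatfolds(y, repeats = 3, n_outer_folds = 4)

# Model 1: lasso (illustrative choice)
res_lasso <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                           n_outer_folds = 4) |>
  repeatcv(3, repeat_folds = folds)

# Model 2: ridge (illustrative choice), evaluated on the same folds
res_ridge <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 0,
                           n_outer_folds = 4) |>
  repeatcv(3, repeat_folds = folds)

summary(res_lasso)
summary(res_ridge)
```

Because both calls receive the same folds object, differences between the two summaries should reflect the models rather than the random assignment of samples to folds.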
Parallelisation over repeats is performed using parallel::mclapply (not
available on Windows) or future, depending on how rep_parallel is set.
Beware that cv.cores can still be set within calls to nestedcv models
(nested parallelisation). This means that rep.cores x cv.cores
processes/forks will be spawned, so be careful not to overload your CPU. In
general, parallelisation of repeats using rep.cores is faster than
parallelisation using cv.cores. rep.cores is ignored if you are using
future. Set the number of workers for future using future::plan().
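The rep.cores x cv.cores arithmetic can be checked before launching a run. The sketch below uses only base R and the parallel package that ships with R. The helper name check_core_budget is hypothetical and not part of nestedcv, and the core counts are example values.

```r
# Hypothetical helper (not part of nestedcv): total processes that nested
# parallelisation would spawn, with a warning if it exceeds detected cores.
check_core_budget <- function(rep.cores, cv.cores) {
  total <- rep.cores * cv.cores
  avail <- parallel::detectCores()  # can return NA on some platforms
  if (!is.na(avail) && total > avail) {
    warning(total, " processes requested but only ", avail,
            " cores detected")
  }
  total
}

# Example values: 2 cores over repeats, 2 cores within each nestedcv model
total_procs <- check_core_budget(rep.cores = 2L, cv.cores = 2L)
total_procs
```

If the total exceeds the available cores, one option consistent with the advice above is to leave cv.cores at 1 and put the cores into rep.cores.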
Value
List of S3 class 'repeatcv' containing:
call |
the model call |
result |
matrix of performance metrics |
output |
a matrix or dataframe containing the outer CV predictions from each repeat |
roc |
(binary classification models only) a ROC curve object based on
predictions across all repeats as returned in |
fits |
(if |
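As a rough illustration of the returned list, the components named above can be inspected directly. This sketch assumes the iris data and a default sequential run. It makes no claim about the exact layout of each component beyond what is listed above.

```r
library(nestedcv)

data("iris")
y <- iris$Species
x <- iris[, 1:4]

res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                     n_outer_folds = 4) |>
  repeatcv(3)

res$call          # the model call
res$result        # matrix of performance metrics
head(res$output)  # outer CV predictions from each repeat
```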
Examples
data("iris")
dat <- iris
y <- dat$Species
x <- dat[, 1:4]
res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                     n_outer_folds = 4) |>
  repeatcv(3, rep.cores = 2)
res
summary(res)
## set up fixed fold indices
set.seed(123, "L'Ecuyer-CMRG")
folds <- repeatfolds(y, repeats = 3, n_outer_folds = 4)
res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                     n_outer_folds = 4) |>
  repeatcv(3, repeat_folds = folds, rep.cores = 2)
res
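A further sketch shows the future backend described in Details. It assumes the future package (and any backend packages nestedcv relies on for futures) is installed. The multisession strategy with 2 workers is an illustrative choice, not a recommendation from this page. rep.cores is omitted because it is ignored with future.

```r
library(nestedcv)
library(future)

data("iris")
y <- iris$Species
x <- iris[, 1:4]

# Number of workers is set through plan(), not rep.cores
plan(multisession, workers = 2)

res_future <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                            n_outer_folds = 4) |>
  repeatcv(3, rep_parallel = "future")
res_future

# Return to sequential processing afterwards
plan(sequential)
```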