rSplit {Qindex} | R Documentation |
Stratified Random Split Sampling
Description
Random split sampling, stratified based on the type of the response.
Usage
rSplit(y, nsplit, stratify = TRUE, s_ratio = 0.8, ...)
Arguments
y |
the response: a double vector, a logical vector, a factor, or a Surv object |
nsplit |
positive integer scalar, number of replicates of random splits to be performed |
stratify |
logical scalar, whether to stratify the split based on the response (default TRUE) |
s_ratio |
double scalar between 0 and 1, split ratio, i.e., proportion of training subjects (default 0.8) |
... |
additional parameters, currently not in use |
Details
Function rSplit performs random split sampling, with or without stratification. Below, p denotes the split ratio (argument s_ratio), so the odds of training versus test subjects are p/(1-p). Specifically,
If stratify = FALSE, or if the response y is a double vector, then split the sample into a training and a test set by odds p/(1-p), without stratification.
Otherwise, if y is a Surv response, split stratified by its censoring status. Specifically, split the subjects with an observed event into a training and a test set by odds p/(1-p), and split the censored subjects into a training and a test set by odds p/(1-p). Then combine the two training sets, and combine the two test sets.
Otherwise, if y is a logical response, split stratified by y itself. Specifically, split the subjects with a TRUE response into a training and a test set by odds p/(1-p), and split the subjects with a FALSE response into a training and a test set by odds p/(1-p). Then combine the training sets, and the test sets, in the same fashion as described above.
Otherwise, if y is a factor response, split stratified by its levels. Specifically, split the subjects in each level of y into a training and a test set by odds p/(1-p). Then combine the training sets, and the test sets, from all levels of y.
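The stratified case for a logical or factor response can be sketched in base R as follows. This is only an illustration, not the code used by rSplit: the helper name stratified_split_sketch is hypothetical, and the rule that each stratum contributes round(p * n) training subjects is an assumption, since the rounding rule is not documented here.

```r
# Hypothetical helper, not part of Qindex: one stratified split in base R.
# Assumption: each stratum contributes round(p * n_stratum) training subjects.
stratified_split_sketch <- function(y, p = 0.8) {
  train <- logical(length(y))            # all FALSE, i.e., all test subjects
  for (idx in split(seq_along(y), y)) {  # subject indices within each stratum
    n_train <- round(p * length(idx))
    # sample positions within idx, which is safe even if idx has length 1
    train[idx[sample.int(length(idx), size = n_train)]] <- TRUE
  }
  train
}

set.seed(1)
y <- rep(c(TRUE, FALSE), times = c(20, 30))
train <- stratified_split_sketch(y, p = 0.8)
table(y, train)  # cross-tabulate strata against training membership
```

With p = 0.8 the odds p/(1-p) are 4 to 1, so each stratum keeps four training subjects for every test subject. For a Surv response, the strata would be defined by the censoring status rather than by y itself.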
Value
Function rSplit returns a length-nsplit list of logical vectors. In each logical vector, the TRUE elements indicate training subjects and the FALSE elements indicate test subjects.
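A list of logical vectors of this shape can be used to subset a data set into training and test parts. The sketch below does not call rSplit. The object splits is a stand-in list built with base R, and the data frame d and its columns are made up for illustration.

```r
# Stand-in for a length-3 list as described above (TRUE = training subject).
# Built with base R here; this is not actual rSplit output.
set.seed(2)
d <- data.frame(x = rnorm(10), y = rep(c(TRUE, FALSE), each = 5))
splits <- replicate(
  3,
  sample(rep(c(TRUE, FALSE), times = c(8, 2))),
  simplify = FALSE
)

# One training/test pair of data frames per replicate
parts <- lapply(splits, function(s) {
  list(train = d[s, , drop = FALSE], test = d[!s, , drop = FALSE])
})

nrow(parts[[1]]$train)  # number of training subjects in the first replicate
nrow(parts[[1]]$test)   # number of test subjects in the first replicate
```

Because each element is a logical vector as long as the response, negating it with ! selects the complementary test subjects directly.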
Note
caTools::sample.split is not what we need.
See Also
split, caret::createDataPartition
Examples
rSplit(y = rep(c(TRUE, FALSE), times = c(20, 30)), nsplit = 3L)