shadow_vimp {shadowVIMP} | R Documentation |
Select influential covariates in random forests using multiple testing control
Description
shadow_vimp() performs variable selection and determines whether each
covariate is influential based on unadjusted, FDR-adjusted, and FWER-adjusted
p-values.
Usage
shadow_vimp(
  alphas = c(0.3, 0.1, 0.05),
  niters = c(30, 120, 1500),
  data,
  outcome_var,
  num.trees = max(2 * (ncol(data) - 1), 10000),
  num.threads = NULL,
  importance = "permutation",
  save_vimp_history = c("all", "last", "none"),
  to_show = c("FWER", "FDR", "unadjusted"),
  method = c("pooled", "per_variable"),
  ...
)
Arguments
alphas |
Numeric vector, significance level values for each step of the
procedure, default |
niters |
Numeric vector, number of permutations to be performed in each
step of the procedure, default |
data |
Input data frame. |
outcome_var |
Character, name of the column containing the outcome variable. |
num.trees |
Numeric, number of trees. Passed to |
num.threads |
Numeric. The number of threads used by |
importance |
Character, the type of variable importance to be calculated
for each variable. Argument passed to |
save_vimp_history |
Character, specifies which variable importance measures to save. Possible values are:
|
to_show |
Character, one of
|
method |
Character, one of
|
... |
Additional parameters passed to |
Details
By default, shadow_vimp() performs variable selection in multiple steps.
In the pre-selection stage, it prunes the set of predictors using relaxed
(higher) alpha thresholds and fewer permutations. Variables that pass this
stage then undergo a final evaluation using the target (lower) alpha
threshold and more permutations. This stepwise approach distinguishes
informative from uninformative covariates based on their variable importance
measures (VIMPs) and reduces computation, because only the surviving
covariates are evaluated with the largest number of permutations. The user
can also perform variable selection in a single step, without a
pre-selection phase.
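The stepwise idea described above can be illustrated with a toy sketch in base R. This is not the package's implementation: absolute correlation stands in for the random forest VIMP, the helper names `toy_vimp()` and `toy_pooled_pvalues()` are hypothetical, and the pooled p-value shown here (comparing each observed importance with the shadow importances of all candidates) is only one plausible reading of the "pooled" method.

```r
# Toy illustration only; all names are hypothetical and the importance
# measure is a stand-in for the random forest VIMP.
set.seed(42)
n <- 200
x <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
names(x) <- paste0("x", 1:6)
y <- 2 * x$x1 - 1.5 * x$x2 + rnorm(n)  # only x1 and x2 are informative

# Importance stand-in: absolute correlation with the outcome
toy_vimp <- function(x, y) {
  abs(vapply(x, function(col) cor(col, y), numeric(1)))
}

# Pooled permutation p-values: each observed importance is compared with the
# shadow (row-permuted) importances of all candidates across all iterations
toy_pooled_pvalues <- function(x, y, niter) {
  observed <- toy_vimp(x, y)
  shadow <- replicate(niter, toy_vimp(x[sample(nrow(x)), , drop = FALSE], y))
  vapply(
    observed,
    function(v) (sum(shadow >= v) + 1) / (length(shadow) + 1),
    numeric(1)
  )
}

alphas <- c(0.3, 0.1, 0.05)   # relaxed thresholds first, target threshold last
niters <- c(30, 120, 1500)    # few permutations first, many in the final step
candidates <- names(x)

for (step in seq_along(alphas)) {
  p <- toy_pooled_pvalues(x[candidates], y, niters[step])
  candidates <- names(p)[p <= alphas[step]]
  if (length(candidates) == 0) break  # every covariate was removed
}

candidates  # covariates that survived every step
```

Only the covariates that pass the cheap early steps are carried into the final step with 1500 permutations, which is where the computational saving comes from.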
Value
Object of the class "shadow_vimp" with the following entries:
- call - the call formula used to generate the output.
- alpha - numeric, significance level used in the algorithm.
- step_all_covariates_removed - integer. If > 0, the step number at which all candidate covariates were deemed insignificant and removed. If 0, at least one covariate survived the pre-selection and reached the last step of the procedure.
- final_dec_pooled (the default) or final_dec_per_variable - a data frame that contains, depending on the value of the to_show parameter, p-values and the corresponding decisions (in columns with names ending in confirmed) on whether each variable is deemed informative at the final step of the procedure: 1 = covariate considered informative in the last step; 0 = not informative. If all covariates were dropped in the pre-selection, i.e. none reached the final step, then all p-values are NA and all decisions are set to 0.
- vimp_history - if save_vimp_history is set to "all" or "last", a data frame with VIMPs of covariates and their shadows from the last step of the procedure. If save_vimp_history is set to "none", it is NULL.
- time_elapsed - list containing the runtime of each step and the total time taken to execute the code.
- pre_selection - list in which the results of the pre-selection are stored. The exact form of this element depends on the chosen value of the save_vimp_history parameter.
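As a minimal sketch of how the entries listed above might be inspected, the snippet below fits a small model and looks at the decision columns. It assumes the shadowVIMP package is installed (the call is skipped otherwise) and uses deliberately small niters and num.trees values to keep the runtime short; the object name `out` is only illustrative.

```r
# Sketch only: runs when shadowVIMP is available, otherwise does nothing
if (requireNamespace("shadowVIMP", quietly = TRUE)) {
  data(mtcars)
  out <- shadowVIMP::shadow_vimp(
    data = mtcars, outcome_var = "vs",
    niters = c(10, 20, 30), num.trees = 30, num.threads = 1
  )

  if (out$step_all_covariates_removed > 0) {
    # No covariate reached the final step: p-values are NA, decisions are 0
    message(
      "All covariates were removed at step ",
      out$step_all_covariates_removed
    )
  } else {
    decisions <- out$final_dec_pooled
    # Decision columns are those whose names end in "confirmed"
    confirmed_cols <- grep("confirmed$", names(decisions), value = TRUE)
    print(decisions[, confirmed_cols, drop = FALSE])
  }

  # Runtime of each step and of the whole procedure
  str(out$time_elapsed)
}
```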
Examples
data(mtcars)
# When working with real data, use higher values for the niters and num.trees
# parameters --> here these parameters are set to small values to reduce the
# runtime.
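# Helper that caps the requested number of threads at the number of
# available cores
safe_num_threads <- function(n) {
  available <- parallel::detectCores()
  # detectCores() can return NA; fall back to a single thread in that case
  if (is.na(available)) return(1L)
  min(n, available)
}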
# Standard use
out1 <- shadow_vimp(
  data = mtcars, outcome_var = "vs",
  niters = c(10, 20, 30), num.trees = 30, num.threads = safe_num_threads(1)
)
# `num.threads` sets the number of threads for multithreading in
# `ranger::ranger`. By default, the `shadow_vimp` function uses half the
# available CPU threads.
out2 <- shadow_vimp(
  data = mtcars, outcome_var = "vs",
  niters = c(10, 20, 30), num.threads = safe_num_threads(2),
  num.trees = 30
)
# Save variable importance measures only from the final step of the
# procedure
out4 <- shadow_vimp(
  data = mtcars, outcome_var = "vs",
  niters = c(10, 20, 30), save_vimp_history = "last", num.trees = 30,
  num.threads = safe_num_threads(1)
)
# Print unadjusted and FDR-adjusted p-values together with the corresponding
# decisions
out5 <- shadow_vimp(
  data = mtcars, outcome_var = "vs",
  niters = c(10, 20, 30), to_show = "FDR", num.trees = 30,
  num.threads = safe_num_threads(1)
)
# Use per-variable p-values to decide in the final step whether a covariate
# is informative or not. Note that pooled p-values are always used in the
# pre-selection (first two steps).
out6 <- shadow_vimp(
  data = mtcars, outcome_var = "vs",
  niters = c(10, 20, 30), method = "per_variable", num.trees = 30,
  num.threads = safe_num_threads(1)
)
# Perform variable selection in a single step, without a pre-selection phase
out7 <- shadow_vimp(
  data = mtcars, outcome_var = "vs", alphas = c(0.05),
  niters = c(30), num.trees = 30,
  num.threads = safe_num_threads(1)
)