opt_importance {optRF}    R Documentation

Optimise random forest for estimation of variable importance

Description

Optimises random forest for estimating the importance of variables by calculating the variable importance stability for a given set of numbers of trees.
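
As a rough illustration of what "stability" means here, the sketch below repeats a random forest several times for two numbers of trees and checks how closely the importance estimates of the repetitions agree. It is not the package's implementation: the helper importance_agreement is hypothetical, the mean pairwise Pearson correlation is used purely as a stand-in for the stability measure that opt_importance computes, and the built-in mtcars data replace a real data set. It assumes the ranger package is installed.

```r
library(ranger)

# Hypothetical helper (not part of optRF): repeat a random forest and
# summarise how well the importance estimates of the repetitions agree.
importance_agreement <- function(y, X, num.trees, repetitions = 5) {
  # One column of permutation importances per repetition
  imp <- vapply(
    seq_len(repetitions),
    function(i) {
      ranger(x = X, y = y, num.trees = num.trees,
             importance = "permutation")$variable.importance
    },
    numeric(ncol(X))
  )
  # Assumption: mean pairwise Pearson correlation as a simple agreement score
  pairwise <- cor(imp)
  mean(pairwise[upper.tri(pairwise)])
}

set.seed(123)
X <- mtcars[, -1]
y <- mtcars$mpg

# Compare a small and a larger forest
agreement <- sapply(c(50, 500), function(n) importance_agreement(y, X, n))
names(agreement) <- c("50", "500")
agreement
```

Forests with more trees would typically be expected to give more similar importance estimates across repetitions, at the cost of longer computation time.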

Usage

opt_importance(
  y,
  X,
  number_repetitions = 10,
  alpha = 0.05,
  num.trees_values = c(250, 500, 750, 1000, 2000),
  importance = c("permutation", "impurity", "impurity_corrected"),
  visualisation = c("none", "importance", "selection"),
  recommendation = c("importance", "selection", "none"),
  rec_thresh = 1e-06,
  round_recommendation = c("thousand", "hundred", "ten", "none"),
  verbose = TRUE,
  ...
)

Arguments

y

A vector containing the response variable.

X

A data frame containing the explanatory variables. The number of rows must be equal to the number of elements in y.

number_repetitions

Number of repetitions of random forest to estimate the stability. It needs to be at least 2. Default is 10.

alpha

The number of most important variables to be selected based on their estimated variable importance. If < 1, alpha is interpreted as the proportion of variables in the data set. Default is 0.05.

num.trees_values

A vector containing the numbers of trees to be analysed. If not specified, 250, 500, 750, 1000, and 2000 trees will be analysed.

importance

Variable importance mode, one of "permutation" (default), "impurity" or "impurity_corrected". The "impurity" measure is the Gini index for classification and the variance of the responses for regression.

visualisation

Can be set to "importance" to draw a plot of the variable importance stability or to "selection" to draw a plot of the selection stability for the analysed numbers of trees. Default is "none", in which case no plot is drawn.

recommendation

If set to "importance" (default) or "selection", a recommendation will be given based on optimised variable importance or selection stability. If set to "none", the function will analyse the stability of random forest with the specified numbers of trees without giving a recommendation.

rec_thresh

If the number of trees leads to an increase of stability smaller than or equal to the value specified, this number of trees will be recommended. Default is 1e-6.

round_recommendation

The unit to which the recommended number of trees is rounded. Options: "none", "ten", "hundred", "thousand" (default).

verbose

Logical. If TRUE (default), the computation status is shown.

...

Any other argument accepted by the ranger function.
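
The two interpretations of alpha can be illustrated with a short calculation. The helper n_selected below is hypothetical and not part of optRF, and the rounding rule (nearest whole variable, at least one) is an assumption; the package may round differently.

```r
# Hypothetical helper, not part of optRF: translate alpha into the number
# of variables that would be selected from a data set.
n_selected <- function(alpha, n_variables) {
  if (alpha < 1) {
    # Assumption: round to the nearest whole variable, keeping at least one
    max(1, round(alpha * n_variables))
  } else {
    # Values of 1 or more are taken as an absolute number of variables
    alpha
  }
}

n_selected(0.05, 5000)  # 5 % of 5000 variables: 250
n_selected(25, 5000)    # absolute number: 25
```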

Value

An opt_importance_object containing: the recommended number of trees; the measure on which the recommendation was based (importance or selection); a matrix summarising the estimated stability and computation time of a random forest with the recommended number of trees; a matrix containing the calculated stability and computation time for each analysed number of trees; and the parameters used to model the relationship between stability and number of trees.

Examples

## Not run: 
data(SNPdata)
set.seed(123)
result_optimp <- opt_importance(y = SNPdata[, 1], X = SNPdata[, -1]) # optimise random forest
summary(result_optimp)

## End(Not run)
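
A further call that sets some of the optional arguments might look like the sketch below. It uses only arguments listed under Usage; the specific values (three numbers of trees, alpha = 0.1, rounding to the nearest hundred) are arbitrary choices for illustration, not recommendations.

```r
library(optRF)

data(SNPdata)
set.seed(123)

# Base the recommendation on selection stability instead of importance
# stability, analyse fewer numbers of trees, and plot the result.
result_sel <- opt_importance(
  y = SNPdata[, 1],
  X = SNPdata[, -1],
  num.trees_values = c(250, 500, 1000),
  alpha = 0.1,                       # select the top 10 % of variables
  recommendation = "selection",
  visualisation = "selection",
  round_recommendation = "hundred",
  verbose = FALSE                    # suppress the computation status
)
summary(result_sel)
```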


[Package optRF version 1.2.1 Index]