measure_stability {optRF}    R Documentation

Measure the stability of random forest

Description

Measure the stability of random forest for a given data set and a given number of trees.

Usage

measure_stability(
  y,
  X,
  num.trees = 500,
  method = c("prediction", "importance"),
  X_Test = NULL,
  alpha = NULL,
  select_for = c("high", "low", "zero"),
  importance = c("permutation", "impurity", "impurity_corrected"),
  number_repetitions = 10,
  verbose = TRUE,
  ...
)

Arguments

y

A vector containing the response variable in the training data set.

X

A data frame containing the explanatory variables in the training data set. The number of rows must be equal to the number of elements in y.

num.trees

Either a single value or a vector containing the numbers of trees for which the stability should be analysed (default = 500).

method

Either "prediction" (default) or "importance", specifying whether random forest should be used for prediction or to estimate the variable importance.

X_Test

If method is "prediction", a data frame containing the explanatory variables of the test data set. If not supplied, the out-of-bag data will be used.

alpha

If method is "prediction", the number of best individuals to be selected in the test data set (default = 0.15). If method is "importance", the number of most important variables to be selected (default = 0.05).

select_for

If method is "prediction", what should be selected? In random forest classification, this must be set to a vector containing the values of the desired classes. In random forest regression, this can be set to "high" (default) to select the individuals with the highest predicted value, "low" to select the individuals with the lowest predicted value, or "zero" to select the individuals whose predicted value is closest to zero.

importance

If method is "importance", the variable importance mode, one of "permutation" (default), "impurity" or "impurity_corrected".

number_repetitions

Number of repetitions of random forest to estimate the stability. It needs to be at least 2. Default is 10.

verbose

Logical. If TRUE (default), show computation status.

...

Any other argument to be passed on to the ranger function.
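
As an illustration of how the arguments above combine, the following sketch measures the stability of the variable importance estimates and forwards one extra argument to ranger. It assumes the SNPdata example data set shipped with optRF, with the response in the first column. The mtry value is an arbitrary choice for illustration, not a recommendation.

```r
library(optRF)

data(SNPdata)
set.seed(123)

# Stability of the variable importance estimates with 500 trees.
# importance = "permutation" is the documented default, spelled out here.
# mtry is not an argument of measure_stability itself; it is assumed to be
# handed on to ranger through `...`, and 100 is an arbitrary value.
importance_stability <- measure_stability(
  y = SNPdata[, 1],
  X = SNPdata[, -1],
  num.trees = 500,
  method = "importance",
  importance = "permutation",
  mtry = 100
)
importance_stability
```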

Value

A data frame summarising the estimated stability for the given num.trees values.

Examples

## Not run: 
library(optRF)
data(SNPdata)
set.seed(123)
stability_result <- measure_stability(y = SNPdata[, 1], X = SNPdata[, -1], num.trees = 500)
stability_result # Stability of random forest with 500 trees

## End(Not run)
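
Because num.trees also accepts a vector, several forest sizes can be compared in a single call. The following is a minimal sketch, again assuming the SNPdata example data set. The tree numbers and the alpha value are arbitrary illustrative choices.

```r
library(optRF)

data(SNPdata)
set.seed(123)

# Compare the prediction stability across three forest sizes.
# No X_Test is supplied, so the out-of-bag data are used.
# select_for = "low" selects the individuals with the lowest predicted value.
stability_by_size <- measure_stability(
  y = SNPdata[, 1],
  X = SNPdata[, -1],
  num.trees = c(250, 500, 1000),
  method = "prediction",
  alpha = 0.1,
  select_for = "low"
)
stability_by_size
```

The returned data frame summarises the estimated stability for the requested num.trees values, so the forest sizes can be compared directly.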


[Package optRF version 1.2.1 Index]