opt_prediction {optRF} | R Documentation |
Optimise random forest for prediction
Description
Optimises random forest predictions by calculating the prediction stability for a given set of numbers of trees.
Usage
opt_prediction(
y,
X,
X_Test = NULL,
number_repetitions = 10,
alpha = 0.15,
num.trees_values = c(250, 500, 750, 1000, 2000),
visualisation = c("none", "prediction", "selection"),
select_for = c("high", "low", "zero"),
recommendation = c("prediction", "selection", "none"),
rec_thresh = 1e-06,
round_recommendation = c("thousand", "hundred", "ten", "none"),
verbose = TRUE,
...
)
Arguments
y |
A vector containing the response variable in the training data set. |
X |
A data frame containing the explanatory variables in the training data set. The number of rows must be equal to the number of elements in y. |
X_Test |
A data frame containing the explanatory variables of the test data set. If not specified, the out-of-bag data will be used. |
number_repetitions |
Number of times random forest is repeated to estimate the stability. Must be at least 2. Default is 10. |
alpha |
The number of best individuals to be selected in the test data set based on their predicted response values. If < 1, alpha is interpreted as the proportion of individuals in the test data set to be selected. Default is 0.15. |
num.trees_values |
A vector containing the numbers of trees to be analysed. If not specified, 250, 500, 750, 1000, and 2000 trees will be analysed. |
visualisation |
Can be set to "prediction" to draw a plot of the prediction stability or "selection" to draw a plot of the selection stability for the numbers of trees to be analysed. Default is "none", in which case no plot is drawn. |
select_for |
What should be selected? In random forest classification, this must be set to a vector containing the values of the desired classes. In random forest regression, this can be set to "high" (default) to select the individuals with the highest predicted values, "low" to select the individuals with the lowest predicted values, or "zero" to select the individuals whose predicted values are closest to zero. |
recommendation |
If set to "prediction" (default) or "selection", a recommendation will be given based on optimised prediction or selection stability, respectively. If set to "none", the function will analyse the stability of random forest with the specified numbers of trees without giving a recommendation. |
rec_thresh |
If a number of trees leads to an increase in stability smaller than or equal to the value specified, this number of trees will be recommended. Default is 1e-6. |
round_recommendation |
Sets the precision to which the recommended number of trees is rounded. Options: "none", "ten", "hundred", "thousand" (default). |
verbose |
Logical. If TRUE (default), the computation status is shown. |
... |
Any other arguments to be passed on to the ranger function. |
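The interplay of alpha and select_for in regression can be illustrated with a small base R sketch. The helper select_individuals() below is hypothetical and is not part of optRF. It only mirrors the behaviour described above, and it assumes that a proportion (alpha < 1) is rounded to the nearest whole number of individuals, which the package may handle differently.

```r
# Hypothetical helper illustrating the documented meaning of `alpha` and
# `select_for` for regression; this is NOT the optRF implementation.
select_individuals <- function(predicted, alpha = 0.15,
                               select_for = c("high", "low", "zero")) {
  select_for <- match.arg(select_for)
  n <- length(predicted)

  # alpha < 1 is read as a proportion, otherwise as a count.
  # Assumption: a proportion is rounded to the nearest whole individual.
  n_selected <- if (alpha < 1) round(alpha * n) else alpha
  n_selected <- max(1, min(n, n_selected))

  # Order the individuals so that the most desirable ones come first
  ranking <- switch(select_for,
    high = order(predicted, decreasing = TRUE),
    low  = order(predicted),
    zero = order(abs(predicted))
  )
  ranking[seq_len(n_selected)]
}

predicted <- c(2.5, -0.1, 4.0, 0.3, -3.2, 1.1, 0.9, -1.4, 3.3, 0.05)
select_individuals(predicted, alpha = 0.2, select_for = "high") # 3 9
select_individuals(predicted, alpha = 3, select_for = "zero")   # 10 2 4
```

With ten predicted values, alpha = 0.2 selects two individuals, whereas alpha = 3 selects exactly three.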
Value
An opt_prediction_object containing the recommended number of trees, the measure on which the recommendation was based (prediction or selection), a matrix summarising the estimated stability and computation time of a random forest with the recommended number of trees, a matrix containing the calculated stability and computation time for each analysed number of trees, and the parameters used to model the relationship between stability and the number of trees.
Examples
## Not run:
data(SNPdata)
set.seed(123)
result_optpred <- opt_prediction(y = SNPdata[, 1], X = SNPdata[, -1]) # optimise random forest
summary(result_optpred)
## End(Not run)
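A further sketch shows how a separate test data set and a selection-based recommendation might be requested. The wrapper optimise_for_selection() and its test_fraction argument are hypothetical names introduced here for illustration. Only the arguments passed to opt_prediction() come from the Usage section above, and the random split is an assumption about how the data might be divided.

```r
# Hypothetical wrapper: holds out part of the data as test set and asks
# opt_prediction() for a recommendation based on selection stability.
# Requires the optRF package to be installed when the function is called.
optimise_for_selection <- function(y, X, test_fraction = 0.2, alpha = 0.15) {
  stopifnot(length(y) == nrow(X))

  # Randomly choose the rows that form the test data set
  n <- length(y)
  test_rows <- sample(seq_len(n), size = round(test_fraction * n))
  stopifnot(length(test_rows) >= 1, length(test_rows) < n)

  optRF::opt_prediction(
    y = y[-test_rows],                      # training response
    X = X[-test_rows, , drop = FALSE],      # training explanatory variables
    X_Test = X[test_rows, , drop = FALSE],  # test explanatory variables
    alpha = alpha,
    select_for = "high",
    recommendation = "selection",
    visualisation = "selection"
  )
}
```

With the package loaded, this could be called as optimise_for_selection(y = SNPdata[, 1], X = SNPdata[, -1]) after data(SNPdata) and set.seed(123), and the result inspected with summary() as in the example above.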