hmda.best.models {HMDA} | R Documentation |
Select Best Models Across All Models in HMDA Grid
Description
Scans an HMDA grid analysis data frame for H2O performance
metric columns and, for each metric, selects the top n_models
best-performing models based on the proper optimization direction
(i.e., lower values are better for some metrics and higher values
are better for others). The function then returns a summary data frame
showing the union of these best models (without duplication) along with
the corresponding metric values that led to their selection.
Usage
hmda.best.models(df, n_models = 1)
Arguments
df |
A data frame of class |
n_models |
Integer. The number of top models to select per metric. Default is 1. |
Details
The function uses a predefined set of H2O performance metrics along with their desired optimization directions:
- logloss, mae, mse, rmse, rmsle, mean_per_class_error
Lower values are better.
- auc, aucpr, r2, accuracy, f1, mcc, f2
Higher values are better.
For each metric in the predefined list that exists in df and is not
entirely NA, the function orders the values (using order()) according
to whether lower or higher values indicate better performance. It then selects
the top n_models model IDs for that metric. The union of these model IDs
is used to subset the original data frame. The returned data frame includes
the model_ids column and the performance metric columns (from the
predefined list) that were found in the input data frame.
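The selection steps described above can be sketched in base R. This is a minimal illustration written from the description only, not the package source: the function name sketch_best_models, the two metric vectors, and the toy data frame are hypothetical, and the handling of NA values within a partly filled metric column is an assumption.

```r
# Hypothetical sketch of the selection logic; NOT the HMDA implementation.
lower_is_better  <- c("logloss", "mae", "mse", "rmse", "rmsle",
                      "mean_per_class_error")
higher_is_better <- c("auc", "aucpr", "r2", "accuracy", "f1", "mcc", "f2")

sketch_best_models <- function(df, n_models = 1) {
  metrics <- c(lower_is_better, higher_is_better)

  # Metrics from the predefined list that are present in df
  present <- metrics[metrics %in% names(df)]
  # Of those, keep only the ones that are not entirely NA
  usable <- present[vapply(present,
                           function(m) !all(is.na(df[[m]])),
                           logical(1))]

  best_ids <- character(0)
  for (m in usable) {
    # Sort descending when higher is better, ascending otherwise
    idx <- order(df[[m]], decreasing = m %in% higher_is_better)
    # Assumption: rows with NA for this metric are never selected
    idx <- idx[!is.na(df[[m]][idx])]
    best_ids <- union(best_ids, df$model_ids[head(idx, n_models)])
  }

  # Subset to the union of best models, without duplicated rows
  df[df$model_ids %in% best_ids, c("model_ids", present), drop = FALSE]
}

# Toy input: three models, two usable metrics, one all-NA metric
toy <- data.frame(
  model_ids = c("m1", "m2", "m3"),
  logloss   = c(0.40, 0.35, 0.50),
  auc       = c(0.90, 0.85, 0.80),
  rmsle     = NA_real_
)
best <- sketch_best_models(toy, n_models = 1)
```

With this toy input, m2 is chosen for logloss (lowest value) and m1 for auc (highest value), so the result contains those two rows; the all-NA rmsle column contributes no model.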
Value
A data frame containing the rows corresponding to the union of
best model IDs (across all metrics) and the columns for model_ids
plus the performance metrics that are present in the data frame.
Author(s)
E. F. Haghish
Examples
## Not run:
# Example: Create a hyperparameter grid for GBM models.
predictors <- c("var1", "var2", "var3")
response <- "target"
# Define hyperparameter ranges
hyper_params <- list(
  ntrees = seq(50, 150, by = 25),
  max_depth = c(5, 10, 15),
  learn_rate = c(0.01, 0.05, 0.1),
  sample_rate = c(0.8, 1.0),
  col_sample_rate = c(0.8, 1.0)
)
# Run the grid search
grid <- hmda.grid(
  algorithm = "gbm",
  x = predictors,
  y = response,
  training_frame = h2o.getFrame("hmda.train.hex"),
  hyper_params = hyper_params,
  nfolds = 10,
  stopping_metric = "AUTO"
)
# Assess the performances of the models
grid_performance <- hmda.grid.analysis(grid)
# Return the best 2 models according to each metric
hmda.best.models(grid_performance, n_models = 2)
## End(Not run)