hmda.partition {HMDA} | R Documentation
Partition Data for HMDA Analysis
Description
Partition a data frame into training, testing, and optionally
validation sets, and upload these sets to a local H2O server. If an
outcome column y is provided and is a factor or character, stratified
splitting is used; otherwise, a random split is performed. The
proportions must sum to 1.
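The sum-to-1 requirement can be checked before calling the function. The helper below is a minimal sketch in base R; check_proportions() is a hypothetical name and is not part of HMDA, and the package's own validation may differ.

```r
# Hypothetical helper, not part of HMDA: validates split proportions.
check_proportions <- function(train, test, validation = NULL) {
  # c() drops NULL, so `validation` only appears when it is supplied.
  p <- c(train = train, test = test, validation = validation)

  if (any(p <= 0)) {
    stop("All proportions must be positive.")
  }

  # Compare with a tolerance instead of `==`, since sums of decimal
  # fractions are not guaranteed to be exactly 1 in floating point.
  if (!isTRUE(all.equal(sum(p), 1))) {
    stop("Proportions must sum to 1, got ", sum(p))
  }

  p
}

check_proportions(train = 0.8, test = 0.2)
check_proportions(train = 0.7, test = 0.15, validation = 0.15)
```

Using all.equal() here avoids rejecting a valid three-way split because of rounding error.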
Usage
hmda.partition(
  df,
  y = NULL,
  train = 0.8,
  test = 0.2,
  validation = NULL,
  seed = 2025
)
Arguments
df | A data frame to partition.
y | A string with the name of the outcome column. Must match a column in df.
train | A numeric value for the proportion of the training set.
test | A numeric value for the proportion of the testing set.
validation | Optional numeric value for the proportion of the validation set. Default is NULL.
seed | A numeric seed for reproducibility. Default is 2025.
Details
This function uses the splitTools package to perform the partition.
When y is provided and is a factor or character, a stratified split is
performed to preserve class proportions. Otherwise, a basic random
split is used. The partitions are then converted to H2O frames using
h2o::as.h2o().
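The two split types described above can be sketched by calling splitTools::partition() directly. This is an illustration under stated assumptions, not the function's actual source: the object names (idx_strat, idx_basic, train_df, test_df) are made up for the example, and the H2O conversion is left as a comment because it needs a running H2O server.

```r
library(splitTools)

data(iris)
p <- c(train = 0.8, test = 0.2)

# Stratified split: the outcome vector is passed so that class
# proportions are preserved in every partition.
idx_strat <- partition(iris$Species, p = p, type = "stratified", seed = 2025)

# Basic random split: only the length of the first argument matters.
idx_basic <- partition(seq_len(nrow(iris)), p = p, type = "basic", seed = 2025)

# partition() returns a named list of row indices, one element per name in p.
train_df <- iris[idx_strat$train, ]
test_df  <- iris[idx_strat$test, ]

# Class shares in each stratified partition.
prop.table(table(train_df$Species))
prop.table(table(test_df$Species))

# With an H2O server running, each data frame could then be uploaded, e.g.:
# train_hex <- h2o::as.h2o(train_df)
```

Comparing the two prop.table() results is a quick way to confirm that stratification kept the Species shares close to those of the full data set.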
Value
A named list containing the partitioned data frames and their
corresponding H2O frames:
- hmda.train: Training set (data frame).
- hmda.test: Testing set (data frame).
- hmda.validation: Validation set (data frame), if any.
- hmda.train.hex: Training set as an H2O frame.
- hmda.test.hex: Testing set as an H2O frame.
- hmda.validation.hex: Validation set as an H2O frame, if applicable.
Author(s)
E. F. Haghish
Examples
## Not run:
library(HMDA)

# These examples assume a local H2O server is already reachable,
# for instance one started beforehand with h2o::h2o.init().

# Example: Random split (80% train, 20% test) using iris data
data(iris)
splits <- hmda.partition(
  df = iris,
  train = 0.8,
  test = 0.2,
  seed = 2025
)
train_data <- splits$hmda.train
test_data <- splits$hmda.test

# Example: Stratified split (70% train, 15% test, 15% validation)
# using iris data, stratified by Species
splits_strat <- hmda.partition(
  df = iris,
  y = "Species",
  train = 0.7,
  test = 0.15,
  validation = 0.15,
  seed = 2025
)
train_strat <- splits_strat$hmda.train
test_strat <- splits_strat$hmda.test
valid_strat <- splits_strat$hmda.validation

## End(Not run)