create_onehot {lgspline} | R Documentation
Create One-Hot Encoded Matrix
Description
Converts a categorical vector into a one-hot encoded matrix where each unique value becomes a binary column.
Usage
create_onehot(x)
Arguments
x | A vector containing categorical values (factors, character, etc.)
Details
The function uses model.matrix() to create one dummy (0/1) column for each unique value in the input vector. Column names are cleaned by removing the 'x' prefix added by model.matrix(). Dropping one column of the result, as in the Examples, gives dummy-intercept coding.
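The behaviour described in Details can be sketched in base R as follows. This is a hypothetical re-implementation for illustration only, not the lgspline source: the name `onehot_sketch` and the use of the formula `~ x - 1` are assumptions about how one column per level might be obtained.

```r
# Illustrative sketch only; `onehot_sketch` is a hypothetical name and
# this is not the actual implementation of create_onehot().
onehot_sketch <- function(x) {
  d <- data.frame(x = factor(x))
  # Assumption: "~ x - 1" drops the intercept, so every level of x
  # gets its own 0/1 indicator column.
  m <- model.matrix(~ x - 1, data = d)
  # model.matrix() names the columns "x<level>"; strip the leading "x".
  colnames(m) <- sub("^x", "", colnames(m))
  as.data.frame(m)
}

enc <- onehot_sketch(c("A", "B", "C", "A"))
print(enc)
```

Each row of `enc` contains exactly one 1, located in the column named after that row's category.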
Value
A data frame containing the one-hot encoded binary columns, with cleaned column names.
Examples
## lgspline will not accept "catvar" in this format, because inputting data
# this way can cause difficult-to-diagnose issues in formula parsing;
# all variables must be numeric
df <- data.frame(numvar = rnorm(100),
                 catvar = rep(LETTERS[1:4], 25))
print(head(df))
## Instead, replace with dummy-intercept coding by
# 1) applying one-hot encoding
# 2) dropping the first column
# 3) appending to our data
dummy_intercept_coding <- create_onehot(df$catvar)[,-1]
df$catvar <- NULL
df <- cbind(df, dummy_intercept_coding)
print(head(df))
[Package lgspline version 0.2.0 Index]