bake {recipes} | R Documentation |
Apply a trained preprocessing recipe
Description
For a recipe with at least one preprocessing operation that has been trained
by prep(), apply the computations to new data.
Usage
bake(object, ...)
## S3 method for class 'recipe'
bake(object, new_data, ..., composition = "tibble")
Arguments
object |
A trained object such as a |
... |
One or more selector functions to choose which variables will be
returned by the function. See |
new_data |
A data frame, tibble, or sparse matrix from the |
composition |
Either |
Details
bake() takes a trained recipe and applies its operations to a data set to
create a design matrix. If you are using a recipe as a preprocessor for
modeling, we highly recommend that you use a workflow() instead of
manually applying a recipe (see the example in recipe()).
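As a rough illustration of the workflow-based alternative, the sketch below bundles an untrained recipe with a model so that preprocessing is estimated and applied automatically at fit and predict time. It assumes the parsnip and workflows packages are installed; the mtcars data, the choice of step_normalize(), and the linear model are illustrative choices, not part of this help page.

```r
library(recipes)
library(parsnip)
library(workflows)

# Untrained recipe: the workflow calls prep() and bake() internally
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors())

# Bundle the preprocessor and a model specification (illustrative choice)
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(linear_reg())

# Fitting trains the recipe on the data and then fits the model
wf_fit <- fit(wf, data = mtcars)

# Predicting applies the trained recipe to new data before predicting
preds <- predict(wf_fit, new_data = head(mtcars))
```

With this approach there is no need to call bake() by hand for modeling; manual baking remains useful for inspecting the processed data.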
If the data set is not too large, time can be saved by using the retain = TRUE
option of prep(). This stores the processed version of the training
set. With this option set, bake(object, new_data = NULL) will return it
without any recomputation.
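A minimal sketch of the retain = TRUE pattern follows. The mtcars data and the step_normalize() step are stand-ins chosen for illustration; any trained recipe behaves the same way.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors())

# retain = TRUE stores the processed training set inside the trained recipe
trained <- prep(rec, training = mtcars, retain = TRUE)

# Returns the stored, already processed training set
train_baked <- bake(trained, new_data = NULL)

# Re-applying the recipe to the raw training data gives the same values,
# but has to redo the computations
rebaked <- bake(trained, new_data = mtcars)
```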
Also, any steps with skip = TRUE will not be applied to the data when
bake() is invoked with a data set in new_data. bake(object, new_data = NULL)
will always have all of the steps applied.
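The effect of skip = TRUE can be seen in a small sketch. Here step_log() is given skip = TRUE purely for demonstration (its default is skip = FALSE), and mtcars is an illustrative data set.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) |>
  step_log(disp, skip = TRUE) |>
  prep(training = mtcars, retain = TRUE)

# new_data = NULL: all steps were applied when the training set was processed
train_baked <- bake(rec, new_data = NULL)

# new_data supplied: the skipped step is not applied, so disp stays on its raw scale
new_baked <- bake(rec, new_data = mtcars)

all.equal(train_baked$disp, log(mtcars$disp))
all.equal(new_baked$disp, mtcars$disp)
```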
Value
A tibble, matrix, or sparse matrix that may have different columns than the
original columns in new_data.
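The following sketch shows both points: the returned columns can differ from those in new_data, and the composition argument changes the return type. The iris data and step_dummy() are illustrative choices. Note that composition = "matrix" only works here because every column is numeric after the dummy-variable step.

```r
library(recipes)

rec <- recipe(Sepal.Length ~ ., data = iris) |>
  step_dummy(Species) |>
  prep()

# Default composition: a tibble, with Species replaced by indicator columns
baked_tbl <- bake(rec, new_data = head(iris))
names(baked_tbl)

# Same processed data returned as a numeric matrix
baked_mat <- bake(rec, new_data = head(iris), composition = "matrix")
class(baked_mat)
```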
See Also
Examples
library(recipes)
library(dplyr) # provides mutate()

data(ames, package = "modeldata")
ames <- mutate(ames, Sale_Price = log10(Sale_Price))

# train the recipe on all rows except the first six
ames_rec <-
  recipe(Sale_Price ~ ., data = ames[-(1:6), ]) |>
  step_other(Neighborhood, threshold = 0.05) |>
  step_dummy(all_nominal()) |>
  step_interact(~ starts_with("Central_Air"):Year_Built) |>
  step_ns(Longitude, Latitude, deg_free = 2) |>
  step_zv(all_predictors()) |>
  prep()

# return the training set (already embedded in ames_rec)
bake(ames_rec, new_data = NULL)
# apply processing to other data:
bake(ames_rec, new_data = head(ames))
# only return selected variables:
bake(ames_rec, new_data = head(ames), all_numeric_predictors())
bake(ames_rec, new_data = head(ames), starts_with(c("Longitude", "Latitude")))