distantia_model_frame {distantia}    R Documentation

Dissimilarity Model Frame

Description

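This function generates a model frame for statistical or machine learning analysis from a dissimilarity data frame (response_df), a data frame with predictors describing each time series (predictors_df), and an optional list of composite predictors (composite_predictors).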

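The resulting data frame has one row per pair of time series in response_df. Its columns are the time series names x and y, the response, one column per predictor with the distance between the predictor values of each pair, any composite predictors, and, when predictors_df is an sf data frame, geographic_distance.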

This function supports a parallelization setup via future::plan().
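A minimal sketch of such a setup, assuming the future package is installed. The multisession strategy and the worker count are illustrative choices, not requirements of this function.

```r
# Run only if the future package is available (an assumption of this sketch).
if (requireNamespace("future", quietly = TRUE)) {

  # Start two background R sessions; the number of workers is arbitrary.
  future::plan(future::multisession, workers = 2)

  # ... call distantia_model_frame() here ...

  # Return to sequential processing when done.
  future::plan(future::sequential)
}
```

Resetting the plan to sequential at the end avoids leaving idle worker sessions behind.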

Usage

distantia_model_frame(
  response_df = NULL,
  predictors_df = NULL,
  composite_predictors = NULL,
  scale = TRUE,
  distance = "euclidean"
)

Arguments

response_df

(required, data frame) output of distantia(), distantia_ls(), distantia_dtw(), or distantia_time_delay(). Default: NULL

predictors_df

(required, data frame or sf data frame) data frame with numeric predictors for the model frame. Must have a column with the time series names in response_df$x and response_df$y. If predictors_df is an sf data frame, the column "geographic_distance" with distances between pairs of time series is added to the model frame. Default: NULL

composite_predictors

(optional, list) list defining composite predictors. For example, composite_predictors = list(a = c("b", "c")) uses the columns "b" and "c" from predictors_df to generate the predictor a as the multivariate distance between "b" and "c" for each pair of time series in response_df. Default: NULL
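The idea behind a composite predictor can be sketched in base R as follows. This is an illustration only, not the package's implementation: the toy data, the helper composite_distance(), and the use of an unscaled Euclidean distance are assumptions.

```r
# Hypothetical predictors for three time series (toy values).
predictors_df <- data.frame(
  name = c("s1", "s2", "s3"),
  b = c(0, 3, 6),
  c = c(0, 4, 8),
  stringsAsFactors = FALSE
)

# Pairs of time series, mimicking the columns x and y of response_df.
pairs <- data.frame(
  x = c("s1", "s1", "s2"),
  y = c("s2", "s3", "s3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: Euclidean distance between two time series
# across several predictor columns.
composite_distance <- function(x, y, df, columns) {
  vx <- unlist(df[df$name == x, columns])
  vy <- unlist(df[df$name == y, columns])
  sqrt(sum((vx - vy)^2))
}

# Composite predictor "a" built from columns "b" and "c".
pairs$a <- mapply(
  FUN = composite_distance,
  x = pairs$x,
  y = pairs$y,
  MoreArgs = list(df = predictors_df, columns = c("b", "c")),
  USE.NAMES = FALSE
)

pairs
```

Each value of a summarizes how far apart two time series are across "b" and "c" together, so the model receives one predictor instead of two.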

scale

(optional, logical) if TRUE, all predictors are scaled and centered with scale(). Default: TRUE
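As a small illustration of what scale() does to predictors measured in very different units, consider the sketch below. The values are made up; only the column names come from the example data used in this page.

```r
# Toy predictors with very different magnitudes (hypothetical values).
predictors <- data.frame(
  median_income = c(40000, 55000, 70000),
  poverty_percentage = c(20, 15, 10)
)

# scale() centers each column to mean 0 and scales it to sd 1.
scaled <- as.data.frame(scale(predictors))

colMeans(scaled)
apply(scaled, 2, sd)
```

After scaling, no single predictor dominates a multivariate distance merely because of its units.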

distance

(optional, string) method used to compute the distance between predictor values for all pairs of time series in response_df. Default: "euclidean"
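For a single numeric predictor, the Euclidean distance between two time series reduces to the absolute difference of their values. The sketch below shows this with stats::dist(), used here as a stand-in; whether the package relies on stats::dist() internally is not stated in this page, and the values are hypothetical.

```r
# Hypothetical values of one predictor for three time series.
poverty <- c(s1 = 10, s2 = 15, s3 = 22)

# Pairwise Euclidean distances, as a matrix labelled by time series name.
d <- as.matrix(stats::dist(poverty, method = "euclidean"))

# Distance between time series "s1" and "s3": abs(10 - 22).
d["s1", "s3"]
```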

Value

Data frame with the attributes "predictors", "response", and "formula".
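The sketch below mimics this structure on a toy data frame to show how such attributes can drive a model fit. The column names, the values, and the use of stats::reformulate() to build an additive formula are assumptions for illustration; the object returned by distantia_model_frame() may store its formula differently.

```r
# Toy model frame (hypothetical values and column names).
toy_frame <- data.frame(
  psi = c(1.2, 2.5, 3.1, 4.8),
  economy = c(0.5, 1.1, 1.4, 2.3),
  geographic_distance = c(10, 30, 20, 50)
)

# Store the response and predictor names as attributes.
attr(toy_frame, "response") <- "psi"
attr(toy_frame, "predictors") <- c("economy", "geographic_distance")

# Build an additive formula from the stored names.
attr(toy_frame, "formula") <- stats::reformulate(
  termlabels = attr(toy_frame, "predictors"),
  response = attr(toy_frame, "response")
)

# Fit a linear model using only the information in the attributes.
model <- stats::lm(
  formula = attr(toy_frame, "formula"),
  data = toy_frame
)

attr(toy_frame, "formula")
```

Keeping these names on the data frame means modelling code does not need to hard-code column names.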

See Also

Other distantia_support: distantia_aggregate(), distantia_boxplot(), distantia_cluster_hclust(), distantia_cluster_kmeans(), distantia_matrix(), distantia_spatial(), distantia_stats(), distantia_time_delay(), utils_block_size(), utils_cluster_hclust_optimizer(), utils_cluster_kmeans_optimizer(), utils_cluster_silhouette()

Examples


#covid prevalence in California counties
tsl <- tsl_initialize(
  x = covid_prevalence,
  name_column = "name",
  time_column = "time"
) |>
  #subset to shorten example runtime
  tsl_subset(
    names = 1:5
  )

#dissimilarity analysis
df <- distantia_ls(tsl = tsl)

#combine several predictors
#into a new one
composite_predictors <- list(
  economy = c(
    "poverty_percentage",
    "median_income",
    "domestic_product"
  )
)

#generate model frame
model_frame <- distantia_model_frame(
  response_df = df,
  predictors_df = covid_counties,
  composite_predictors = composite_predictors,
  scale = TRUE
)

head(model_frame)

#names of response and predictors
#and an additive formula
#are stored as attributes
attributes(model_frame)$predictors

#if response_df is output of distantia():
attributes(model_frame)$response
attributes(model_frame)$formula

#example of linear model
# model <- lm(
#   formula = attributes(model_frame)$formula,
#   data = model_frame
# )
#
# summary(model)


[Package distantia version 2.0.2 Index]