vif_filter {ClimaRep} | R Documentation
Filter SpatRaster Layers based on Variance Inflation Factor (VIF)
Description
This function iteratively filters layers from a SpatRaster object by removing the one with the highest Variance Inflation Factor (VIF) that exceeds a specified threshold (th).
Usage
vif_filter(x, th = 5)
Arguments
x | A SpatRaster object with at least two layers.
th | A numeric value giving the VIF threshold. Default is 5.
Details
This function implements a common iterative procedure to reduce multicollinearity among raster layers by removing variables with high Variance Inflation Factor (VIF).
The VIF for a specific predictor indicates how much the variance of its estimated coefficient is inflated due to its linear relationships with all other predictors in the model.
Conceptually, it is based on the proportion of variance that predictor shares with the other independent variables.
A high VIF value suggests a high degree of collinearity with other predictors (values exceeding 5 or 10 are often considered problematic; see O'Brien, 2007).
In this context, the function also provides the Pearson correlation matrix between all initial variables.
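As an illustration of the definition above, the VIF of predictor j can be written as 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing predictor j on all other predictors. The following is a minimal base R sketch under that definition; the helper name vif_from_r2 and the toy data are assumptions made for illustration and are not part of ClimaRep.

```r
# Hypothetical helper (not part of ClimaRep): VIF of each column of a
# numeric data.frame, from the R^2 of regressing it on all other columns.
vif_from_r2 <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    fit <- lm(reformulate(others, response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Toy data: x3 is almost a linear combination of x1 and x2
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
toy <- data.frame(x1 = x1, x2 = x2, x3 = x1 + x2 + rnorm(n, sd = 0.1))

vif_values <- vif_from_r2(toy)               # one VIF per column
cor_matrix <- cor(toy, method = "pearson")   # Pearson correlation matrix
print(round(vif_values, 2))
print(round(cor_matrix, 2))
```

The sketch assumes complete, numeric columns with non-zero variance and no exact linear dependence.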
Key steps:
Validate inputs: ensure that x is a SpatRaster with at least two layers and that th is a valid numeric value.
Convert the input SpatRaster (x) to a data.frame, retaining only unique rows if x has many cells and few unique climate values.
Remove rows containing any NA values across all variables from the data.frame.
In each iteration, calculate the VIF for all variables currently remaining in the dataset.
Identify the variable with the highest VIF among the remaining variables.
If this highest VIF value is greater than the threshold (th), remove the variable with the highest VIF from the dataset; the loop then continues with the remaining variables.
This iterative process repeats until the highest VIF among the remaining variables is less than or equal to th, or until only one variable remains in the dataset.
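The key steps can be sketched in base R on a plain data.frame whose columns stand in for raster layers. This is only an illustration of the procedure described above, not the ClimaRep source; the names vif_all and vif_filter_sketch and the returned elements kept and excluded are hypothetical, and the conversion from a SpatRaster is omitted.

```r
# Hypothetical sketch (not the ClimaRep source): iterative VIF filtering.
# Assumes numeric columns with non-zero variance.
vif_all <- function(df) {
  sapply(names(df), function(v) {
    fit <- lm(reformulate(setdiff(names(df), v), response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

vif_filter_sketch <- function(df, th = 5) {
  # Validate inputs
  stopifnot(is.data.frame(df), ncol(df) >= 2, is.numeric(th), length(th) == 1)
  df <- unique(df)                                     # keep unique rows only
  df <- df[stats::complete.cases(df), , drop = FALSE]  # drop rows with any NA
  excluded <- character(0)
  while (ncol(df) > 1) {
    vifs <- vif_all(df)
    worst <- names(vifs)[which.max(vifs)]
    if (vifs[[worst]] <= th) break   # stop once the highest VIF is <= th
    excluded <- c(excluded, worst)
    df <- df[, setdiff(names(df), worst), drop = FALSE]
  }
  list(kept = names(df), excluded = excluded)
}

# Toy data: x3 is almost a linear combination of x1 and x2
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
toy <- data.frame(x1 = x1, x2 = x2, x3 = x1 + x2 + rnorm(n, sd = 0.1))
res <- vif_filter_sketch(toy, th = 5)
print(res)
```

Recomputing every VIF after each removal matters because dropping one variable changes the VIFs of all the others.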
vif_filter returns a list object containing a filtered SpatRaster object and a statistics summary.
The SpatRaster object contains only the variables that were kept. A comprehensive summary is also printed to the console.
The summary list includes:
The original Pearson's correlation matrix between all initial variables.
The names of the variables that were kept and of those that were excluded.
The final VIF values for the variables retained after the process.
The internal VIF calculation includes checks to handle potential numerical instability, such as columns with zero or near-zero variance and cases of perfect collinearity among variables, which could otherwise lead to errors (e.g., infinite VIFs or issues with matrix inversion). Variables identified as having infinite VIF due to perfect collinearity are prioritized for removal.
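One way such guards could look is sketched below in base R. This is an assumption-laden illustration, not the package's internal code: the function name safe_vif, the tolerance tol, and the choice to report NA for near-constant columns are all hypothetical.

```r
# Hypothetical sketch (not the ClimaRep source): VIF with guards for
# near-constant columns and perfect collinearity. Assumes no NA values.
safe_vif <- function(df, tol = 1e-8) {
  sds <- sapply(df, stats::sd)
  out <- setNames(rep(NA_real_, ncol(df)), names(df))
  usable <- names(df)[sds > tol]     # near-zero-variance columns stay NA
  for (v in usable) {
    others <- setdiff(usable, v)
    if (length(others) == 0) {
      out[v] <- 1                    # a single usable column has no collinearity
      next
    }
    y <- df[[v]]
    fit <- lm(reformulate(others, response = v), data = df)
    r2 <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
    # Treat R^2 numerically equal to 1 as perfect collinearity
    out[v] <- if (1 - r2 < tol) Inf else 1 / (1 - r2)
  }
  out
}

set.seed(1)
x1 <- rnorm(100)
toy <- data.frame(x1 = x1, x2 = 2 * x1, x3 = rnorm(100), k = 3)
vifs <- safe_vif(toy)
print(vifs)

# Variables with infinite VIF would be first in line for removal
first_out <- names(vifs)[which.max(vifs)]
```

Here x1 and x2 are exact multiples of each other, so both get an infinite VIF, while the constant column k is flagged as NA instead of causing a division by zero.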
References: O'Brien (2007) A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality & Quantity, 41: 673–690. doi:10.1007/s11135-006-9018-6
Value
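A list with two elements: filtered_raster, a SpatRaster object containing only the layers retained by the VIF filtering process, and summary, the statistics summary described in Details.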
Examples
library(terra)
library(sf)

# Simulate a 100 x 100 raster with seven partly collinear layers
set.seed(2458)
n_cells <- 100 * 100
r_clim <- terra::rast(ncols = 100, nrows = 100, nlyrs = 7)
values(r_clim) <- c(
  (rowFromCell(r_clim, 1:n_cells) * 0.2 + rnorm(n_cells, 0, 3)),
  (rowFromCell(r_clim, 1:n_cells) * 0.9 + rnorm(n_cells, 0, 0.2)),
  (colFromCell(r_clim, 1:n_cells) * 0.15 + rnorm(n_cells, 0, 2.5)),
  (colFromCell(r_clim, 1:n_cells) +
    (rowFromCell(r_clim, 1:n_cells)) * 0.1 + rnorm(n_cells, 0, 4)),
  (colFromCell(r_clim, 1:n_cells) /
    (rowFromCell(r_clim, 1:n_cells)) * 0.1 + rnorm(n_cells, 0, 4)),
  (colFromCell(r_clim, 1:n_cells) *
    (rowFromCell(r_clim, 1:n_cells) + 0.1 + rnorm(n_cells, 0, 4))),
  (colFromCell(r_clim, 1:n_cells) *
    (colFromCell(r_clim, 1:n_cells) + 0.1 + rnorm(n_cells, 0, 4)))
)
names(r_clim) <- c("varA", "varB", "varC", "varD", "varE", "varF", "varG")
terra::crs(r_clim) <- "EPSG:4326"
terra::plot(r_clim)

# Filter the layers using a VIF threshold of 5
vif_result <- ClimaRep::vif_filter(r_clim, th = 5)
print(vif_result$summary)

# Plot only the layers that were retained
r_clim_filtered <- vif_result$filtered_raster
terra::plot(r_clim_filtered)