get_simpsons_paradox_c {covalchemy} | R Documentation |
Simpson's Paradox Transformation with Copula and Simulated Annealing
Description
Simulates Simpson's Paradox by transforming the data with Gaussian copulas and then refining the transformed data with simulated annealing. The returned data contain the original, transformed, and annealed values so that the results can be compared.
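The Gaussian-copula step can be pictured with a small base-R sketch. This is not the package's implementation; it only shows the general technique of drawing correlated normals and mapping them onto the empirical quantiles of existing data. The variable names, the simulated marginals, and the reading of "quantile_7" as `stats::quantile(type = 7)` are assumptions made for illustration.

```r
set.seed(1)
n <- 500
x <- rexp(n)     # arbitrary marginal for X (illustrative data)
y <- runif(n)    # arbitrary marginal for Y (illustrative data)
rho <- -0.8      # correlation imposed on the latent normal scale

# Correlated standard normals (2 x 2 Cholesky factor written out by hand)
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# Normals -> uniforms via pnorm(), then uniforms -> empirical quantiles.
# Assumption: "quantile_7" corresponds to stats::quantile(type = 7).
x_new <- quantile(x, probs = pnorm(z1), type = 7, names = FALSE)
y_new <- quantile(y, probs = pnorm(z2), type = 7, names = FALSE)

# Rank correlation of the re-paired data; the marginals stay within
# the range of the original x and y
cor(x_new, y_new, method = "spearman")
```

The new values stay inside the observed range of each variable, while their dependence follows the chosen `rho`.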
Usage
get_simpsons_paradox_c(
  x,
  y,
  z,
  corr_vector,
  inv_cdf_type = "quantile_7",
  sd_x = 0.05,
  sd_y = 0.05,
  lambda1 = 1,
  lambda2 = 1,
  lambda3 = 1,
  lambda4 = 1,
  max_iter = 1000,
  initial_temp = 1,
  cooling_rate = 0.99,
  order_vec = NA,
  degree = 5
)
Arguments
x |
A numeric vector of data points for variable X. |
y |
A numeric vector of data points for variable Y. |
z |
A categorical variable representing groups (e.g., factor or character vector). |
corr_vector |
A numeric vector of correlations, one for each category of z. |
inv_cdf_type |
Type of inverse CDF transformation ("quantile_1", "quantile_4", "quantile_7", "quantile_8", "linear", "akima", "poly"). Default is "quantile_7". |
sd_x |
Standard deviation for perturbations on X (default is 0.05). |
sd_y |
Standard deviation for perturbations on Y (default is 0.05). |
lambda1 |
Regularization parameter for simulated annealing (default is 1). |
lambda2 |
Regularization parameter for simulated annealing (default is 1). |
lambda3 |
Regularization parameter for simulated annealing (default is 1). |
lambda4 |
Regularization parameter for simulated annealing (default is 1). |
max_iter |
Maximum iterations for simulated annealing (default is 1000). |
initial_temp |
Initial temperature for simulated annealing (default is 1.0). |
cooling_rate |
Cooling rate for simulated annealing (default is 0.99). |
order_vec |
Manual ordering of grids. Default is NA, in which case the ordering is calculated automatically. |
degree |
Degree of polynomial used for polynomial inverse CDF (default is 5). |
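How max_iter, initial_temp, and cooling_rate typically interact can be sketched in base R. The geometric cooling schedule and the Metropolis acceptance rule below are the textbook form of simulated annealing and are assumed here; the page does not state which schedule or acceptance rule the function uses, and `accept_prob` is a hypothetical helper, not part of the package.

```r
max_iter     <- 1000
initial_temp <- 1
cooling_rate <- 0.99

# Assumed geometric schedule: temperature at iterations 0, 1, ..., max_iter - 1
temp <- initial_temp * cooling_rate^(0:(max_iter - 1))

# Metropolis rule (assumed): a move that worsens the objective by `delta`
# is accepted with probability exp(-delta / temp); improvements always pass
accept_prob <- function(delta, temp) pmin(1, exp(-delta / temp))

# Chance of accepting the same worsening move early, midway, and late
accept_prob(0.1, temp[c(1, 100, 500)])
```

Under these assumptions, a cooling_rate closer to 1 keeps the search exploratory for longer, while a smaller value makes it settle sooner.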
Value
A list containing:
df_all |
The final dataset with original, transformed, and annealed data. |
df_res |
A simplified version with only the optimized data. |
Examples
set.seed(123)
n <- 300
# Group labels, a predictor, and a response that increases with x
z <- sample(c("A", "B", "C"), prob = c(0.3, 0.4, 0.3), size = n, replace = TRUE)
x <- rnorm(n, 10, sd = 5) + 5 * rbeta(n, 5, 3)
y <- 2 * x + rnorm(n, 5, sd = 4)
# One correlation per category of z
t <- c(-0.8, 0.8, -0.8)
res <- get_simpsons_paradox_c(x, y, z, t, sd_x = 0.07, sd_y = 0.07, lambda4 = 5)
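To see what the call has to change, it can help to look at the correlations already present in the simulated input. The sketch below uses only base R and regenerates the same data; `group_cor` is an illustrative name. It does not inspect the returned object, whose column names are not documented on this page.

```r
set.seed(123)
n <- 300
z <- sample(c("A", "B", "C"), prob = c(0.3, 0.4, 0.3), size = n, replace = TRUE)
x <- rnorm(n, 10, sd = 5) + 5 * rbeta(n, 5, 3)
y <- 2 * x + rnorm(n, 5, sd = 4)

# Pearson correlation of x and y within each level of z
group_cor <- sapply(split(data.frame(x, y), z), function(d) cor(d$x, d$y))
group_cor

# Correlation when all groups are pooled
cor(x, y)
```

Because y is built as 2 * x plus noise, every group starts out with a strong positive correlation, in contrast to the requested values c(-0.8, 0.8, -0.8).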