generate_randomizations_mc {fastrerandomize} | R Documentation |
Draws a random sample of acceptable randomizations from all possible complete randomizations using Monte Carlo sampling
Description
This function performs sampling with replacement to generate randomizations in a memory-efficient way. It processes randomizations in batches to avoid memory issues and filters them based on covariate balance. The function uses JAX for fast computation and memory management.
Usage
generate_randomizations_mc(
n_units,
n_treated,
X,
randomization_accept_prob = 1,
threshold_func = NULL,
max_draws = 1e+05,
batch_size = 1000,
approximate_inv = TRUE,
verbose = TRUE,
conda_env = "fastrerandomize",
conda_env_required = TRUE
)
Arguments
n_units |
An integer specifying the total number of experimental units. |
n_treated |
An integer specifying the number of units to be assigned to treatment. |
X |
A numeric matrix of covariates used for balance checking. Cannot be NULL. |
randomization_accept_prob |
A numeric value between 0 and 1 specifying the probability threshold for accepting randomizations based on balance. Default is 1 |
threshold_func |
A JAX function that computes a balance measure for each randomization. Must be vectorized using |
max_draws |
An integer specifying the maximum number of randomizations to draw. |
batch_size |
An integer specifying how many randomizations to process at once. Lower values use less memory but may be slower. |
approximate_inv |
A logical value indicating whether to use an approximate inverse
(diagonal of the covariance matrix) instead of the full matrix inverse when computing
balance metrics. This can speed up computations for high-dimensional covariates.
Default is |
verbose |
A logical value indicating whether to print detailed information about batch processing progress, and GPU memory usage. Default is |
conda_env |
A character string specifying the name of the conda environment to use
via |
conda_env_required |
A logical indicating whether the specified conda environment
must be strictly used. If |
Details
The function works by:
Generating batches of random permutations.
Computing balance measures for each permutation using the provided threshold function.
Keeping only the top permutations that meet the acceptance probability threshold.
Managing memory by clearing unused objects and caches between batches.
The function uses smaller data types (int8, float16) where possible to reduce memory usage. It also includes assertions to verify array shapes and dimensions throughout.
Value
The function returns a list with two elements:
candidate_randomizations
: an array of randomization vectors
M_candidate_randomizations
: an array of their balance measures.
See Also
generate_randomizations
for full randomization generation function.
generate_randomizations_exact
for the exact version.
Examples
## Not run:
# Generate synthetic data
X <- matrix(rnorm(100*5), 100, 5) # 5 covariates
# Generate 1000 randomizations for 100 units with 50 treated
rand_less_strict <- generate_randomizations_mc(
n_units = 100,
n_treated = 50,
X = X,
randomization_accept_prob=0.01,
max_draws = 100000,
batch_size = 1000)
# Use a stricter balance criterion
rand_more_strict <- generate_randomizations_mc(
n_units = 100,
n_treated = 50,
X = X,
randomization_accept_prob=0.001,
max_draws = 1000000,
batch_size = 1000)
## End(Not run)