auxsurvey {AuxSurvey} | R Documentation |
Auxiliary Variables in Survey Analysis
Description
This function provides a user-friendly interface to several survey estimators for use with discretized auxiliary variables. Probability surveys often use continuous data from administrative records as auxiliary variables, but these data become less useful once they are discretized for confidentiality purposes. This package offers different estimators that handle discretized auxiliary variables effectively.
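As a rough illustration of what "discretized" means here, the sketch below bins a continuous variable into ten quantile groups using base R. The names x and x_10 are hypothetical and are not objects provided by AuxSurvey; the package's own discretization may differ.

```r
# Hypothetical illustration; x and x_10 are not part of AuxSurvey.
set.seed(1)
x <- rnorm(1000)                                   # continuous auxiliary variable

# Decile break points, then integer bin labels 1..10.
breaks <- quantile(x, probs = seq(0, 1, by = 0.1))
x_10 <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(x_10)                                        # number of cases in each bin
```

Only the bin label survives discretization, so an estimator sees x_10 rather than x.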
Usage
auxsurvey(
  formula,
  auxiliary = NULL,
  samples,
  population = NULL,
  subset = NULL,
  family = gaussian(),
  method = c("sample_mean", "rake", "postStratify", "MRP", "GAMP", "linear", "BART"),
  weights = NULL,
  levels = c(0.95, 0.8, 0.5),
  stan_verbose = TRUE,
  show_plot = TRUE,
  nskip = 1000,
  npost = 1000,
  nchain = 4,
  HPD_interval = FALSE,
  seed = NULL
)
Arguments
formula |
A string or formula specifying the outcome model. For non-model-based methods (e.g., sample mean, raking, post-stratification), only include the outcome variable (e.g., "~Y"). For model-based methods (e.g., MRP, GAMP, linear regression), additional fixed effect predictors can be specified, such as "Y ~ X1 + X2 + I(X^2)". For GAMP, smooth functions can be specified as "Y ~ X1 + s(X2, 10) + s(X3, by = X1)". Categorical variables are automatically treated as dummy variables in model-based methods. |
auxiliary |
A string specifying the formula for the auxiliary variables. For sample mean and BART, this should be NULL. |
samples |
A dataframe or tibble containing all variables specified in |
population |
A dataframe or tibble containing all variables specified in |
subset |
A character vector representing filtering conditions to select subsets of |
family |
The distribution family of the outcome variable. Supported options are:
|
method |
A string specifying the model to use. Options include "sample_mean", "rake", "postStratify", "MRP", "GAMP", "linear", and "BART". |
weights |
A numeric vector of case weights. The length should match the number of cases in samples. |
levels |
A numeric vector specifying the confidence levels for the confidence intervals (CIs). Multiple values can be specified to calculate multiple CIs. |
stan_verbose |
A logical scalar; if |
show_plot |
A logical scalar; if |
nskip |
An integer specifying the number of burn-in iterations for each chain in MCMC for Stan models. Default is 1000. |
npost |
An integer specifying the number of posterior sampling iterations for each chain in MCMC for Stan models. Default is 1000. |
nchain |
An integer specifying the number of MCMC chains for Stan models. Default is 4. |
HPD_interval |
A logical scalar; if |
seed |
An integer specifying the random seed for reproducibility. Default is NULL. |
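The levels and HPD_interval arguments both concern interval estimates. As a minimal sketch of the general idea, and not the package's implementation, the base R code below computes equal-tailed and highest posterior density (HPD) intervals at several levels from a vector of simulated draws. The helpers equal_tailed and hpd and the gamma draws are hypothetical stand-ins.

```r
# Hypothetical helpers; these are not AuxSurvey functions.
equal_tailed <- function(draws, level) {
  alpha <- 1 - level
  unname(quantile(draws, probs = c(alpha / 2, 1 - alpha / 2)))
}

hpd <- function(draws, level) {
  sorted <- sort(draws)
  n <- length(sorted)
  k <- ceiling(level * n)                 # number of draws the interval must cover
  starts <- seq_len(n - k + 1)
  widths <- sorted[starts + k - 1] - sorted[starts]
  i <- which.min(widths)                  # shortest window of k consecutive draws
  c(sorted[i], sorted[i + k - 1])
}

set.seed(42)
draws <- rgamma(4000, shape = 2, rate = 1)   # skewed stand-in for posterior draws
levels <- c(0.95, 0.8, 0.5)

# One row per level; columns are the lower and upper bounds.
et <- t(sapply(levels, function(l) equal_tailed(draws, l)))
hp <- t(sapply(levels, function(l) hpd(draws, l)))
```

For a skewed distribution such as this one, the HPD interval at a given level is typically shorter than the equal-tailed interval and sits closer to the mode.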
Details
The available estimators include:
Weighted or unweighted sample mean
Weighted or unweighted raking
Weighted or unweighted post-stratification
Bayesian methods:
BART (Bayesian Additive Regression Trees)
MRP (Multilevel Regression with Poststratification)
GAMP (Generalized Additive Model of Response Propensity)
Weighted linear regression
These Bayesian models are implemented using the rstan and rstanarm packages.
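To make the simplest estimators in the list above concrete, here is a small base R sketch of an unweighted sample mean and a post-stratified mean on toy data. All object names and values are hypothetical, and the code illustrates the general technique rather than how auxsurvey computes its estimates internally.

```r
# Toy data; all names here are hypothetical, not AuxSurvey objects.
population <- data.frame(Z = rep(c("a", "b"), times = c(700, 300)))
samples <- data.frame(
  Z = c("a", "a", "b", "b", "b", "b"),
  Y = c(1, 3, 10, 12, 14, 12)
)

# Unweighted sample mean: ignores that stratum "b" is over-represented.
naive <- mean(samples$Y)

# Post-stratification: average the stratum means using population shares.
cell_means <- tapply(samples$Y, samples$Z, mean)
pop_share <- prop.table(table(population$Z))
strata <- names(pop_share)
post_strat <- sum(as.numeric(cell_means[strata]) * as.numeric(pop_share[strata]))

# Equivalent weighted mean: weight = population count / sample count per stratum.
w <- as.numeric(table(population$Z)[samples$Z] / table(samples$Z)[samples$Z])
weighted <- weighted.mean(samples$Y, w)
```

The post-stratified and weighted versions agree because each sampled case is weighted by the number of population units it represents in its stratum.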
Value
A list containing the sample mean estimates and CIs for the subset and/or the whole dataset.
Each element in the list includes:
- estimate: The point estimate of the sample mean.
- CI: Confidence intervals for the sample mean.
- Other elements for each confidence level specified in levels.
Examples
## Simulate data with nonlinear association (setting 3).
data = simulate(N = 3000, discretize = 10, setting = 3, seed = 123)
population = data$population
samples = data$samples
ipw = 1 / samples$true_pi
true_mean = mean(population$Y1)
## IPW Sample Mean
IPW_sample_mean = auxsurvey("~Y1", auxiliary = NULL, weights = ipw,
                            samples = samples, population = population,
                            subset = c("Z1 == 1 & Z2 == 1"), method = "sample_mean",
                            levels = 0.95)
## Raking
rake = auxsurvey("~Y1", auxiliary = "Z1 + Z2 + Z3 + auX_10", samples = samples,
                 population = population, subset = c("Z1 == 1", "Z1 == 1 & Z2 == 1"),
                 method = "rake", levels = 0.95)
## MRP
MRP = auxsurvey("Y1 ~ 1 + Z1", auxiliary = "Z2 + Z3:auX_10", samples = samples,
                population = population, subset = c("Z1 == 1", "Z1 == 1 & Z2 == 1"),
                method = "MRP", levels = 0.95, nskip = 4000, npost = 4000,
                nchain = 1, stan_verbose = FALSE, HPD_interval = TRUE)
## GAMP
GAMP = auxsurvey("Y1 ~ 1 + Z1 + Z2 + Z3", auxiliary = "s(auX_10) + s(logit_true_pi, by = Z1)",
                 samples = samples, population = population, method = "GAMP",
                 levels = 0.95, nskip = 4000, npost = 4000, nchain = 1,
                 stan_verbose = FALSE, HPD_interval = TRUE)
## BART
BART = auxsurvey("Y1 ~ Z1 + Z2 + Z3 + auX_10", auxiliary = NULL, samples = samples,
                 population = population, method = "BART", levels = 0.95,
                 nskip = 4000, npost = 4000, nchain = 1, HPD_interval = TRUE)