generate_qualitative_data_did {causalQual}    R Documentation

Generate Qualitative Data (Difference-in-Differences)

Description

Generate a synthetic data set with qualitative outcomes under a difference-in-differences design. The data include two time periods, a binary treatment indicator (applied only in the second period), and a matrix of covariates. The time shifts in the outcome probabilities are the same in the treated and control groups across the two periods (parallel trends on the probability mass functions).
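To make the parallel-trends condition concrete, the R sketch below works through it with made-up probability mass functions over three categories; none of the numbers come from the package. It assumes that the probability of shift on the treated for class m is P(Y_t(1) = m | D = 1) - P(Y_t(0) = m | D = 1), and that parallel trends lets the unobserved P(Y_t(0) = m | D = 1) be recovered by adding the control group's change in P(Y = m) to the treated group's pre-period probability.

```r
# Hypothetical PMFs over categories 1, 2, 3 (illustrative numbers only).
p_control_pre  <- c(0.50, 0.30, 0.20)  # P(Y_{t-1} = m | D = 0)
p_control_post <- c(0.40, 0.35, 0.25)  # P(Y_t = m | D = 0)
p_treated_pre  <- c(0.45, 0.35, 0.20)  # P(Y_{t-1} = m | D = 1)
p_treated_post <- c(0.20, 0.40, 0.40)  # P(Y_t = m | D = 1), observed under treatment

# Parallel trends on the PMFs: absent treatment, the treated group's PMF would
# change over time by the same amount as the control group's PMF.
p_treated_post_0 <- p_treated_pre + (p_control_post - p_control_pre)
stopifnot(all(p_treated_post_0 >= 0), isTRUE(all.equal(sum(p_treated_post_0), 1)))

# Assumed definition: probabilities of shift on the treated, one per category.
pshift_treated <- p_treated_post - p_treated_post_0
pshift_treated  # approximately c(-0.15, 0, 0.15)
```

The shifts sum to zero because both PMFs sum to one. With estimated rather than true probabilities, the counterfactual PMF built this way can fall outside [0, 1], which is why the sketch checks it.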

Usage

generate_qualitative_data_did(n, assignment, outcome_type)

Arguments

n

Sample size.

assignment

String controlling treatment assignment. Must be either "randomized" (random assignment) or "observational" (assignment based on covariates).

outcome_type

String controlling the outcome type. Must be either "multinomial" or "ordered".

Details

Outcome type

Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_did computes linear predictors for each class using the covariates:

\eta_{mi} (d, s) = \beta_{m1}^d X_{i1} + \beta_{m2}^d X_{i2} + \beta_{m3}^d X_{i3}, \quad d = 0, 1, \quad s = t-1, t,

and then transforms \eta_{mi} (d, s) into valid probability distributions using the softmax function:

P(Y_{is}(d) = m | X_i) = \frac{\exp(\eta_{mi} (d, s))}{\sum_{m'} \exp(\eta_{m'i}(d, s))}, \quad d = 0, 1, \quad s = t-1, t.
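A minimal R sketch of the softmax step for a single (d, s) pair. The coefficient matrix `beta` is hypothetical (the documentation does not report the values the package uses). The maximum is subtracted from each vector of linear predictors before exponentiating, a standard trick that leaves the probabilities unchanged while avoiding overflow.

```r
# Numerically stable softmax: subtracting max(eta) does not change the result.
softmax <- function(eta) {
  z <- exp(eta - max(eta))
  z / sum(z)
}

# Hypothetical coefficients: rows are classes m = 1, 2, 3, columns are X_1, X_2, X_3.
beta <- matrix(c( 0.5, -1.0,  0.2,
                  1.0,  0.3, -0.5,
                 -0.2,  0.8,  1.0),
               nrow = 3, byrow = TRUE)

set.seed(1986)
X <- matrix(runif(5 * 3), nrow = 5)  # five units, three U(0, 1) covariates

eta   <- X %*% t(beta)               # 5 x 3 matrix: eta[i, m] = sum_j beta[m, j] * X[i, j]
probs <- t(apply(eta, 1, softmax))   # row i is the PMF of unit i over {1, 2, 3}
rowSums(probs)                       # each entry equals 1 (up to rounding)
```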

It then generates potential outcomes Y_{i,t-1}(1), Y_{it}(1), Y_{i,t-1}(0), and Y_{it}(0) by sampling from \{1, 2, 3\} using P(Y_{is}(d) = m | X_i), \quad d = 0, 1, \quad s = t-1, t.
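The sampling step can be sketched in R as follows. Following the linear predictor above, the hypothetical coefficients depend on d only, so each period receives an independent draw from the same class probabilities. The random coefficient values and the helper `draw_potential` are illustrative assumptions, not the package's internals.

```r
softmax <- function(eta) {
  z <- exp(eta - max(eta))
  z / sum(z)
}

set.seed(1986)
n <- 500
X <- matrix(runif(n * 3), nrow = n)  # three independent U(0, 1) covariates

# Hypothetical 3 x 3 coefficient matrices (classes x covariates), one per d.
beta <- list(d0 = matrix(rnorm(9), 3, 3), d1 = matrix(rnorm(9), 3, 3))

# Draw one category per unit from its softmax probabilities under treatment d.
draw_potential <- function(d) {
  probs <- t(apply(X %*% t(beta[[paste0("d", d)]]), 1, softmax))
  apply(probs, 1, function(p) sample(1:3, size = 1, prob = p))
}

Y_pre_1  <- draw_potential(1)  # Y_{i,t-1}(1)
Y_post_1 <- draw_potential(1)  # Y_{it}(1)
Y_pre_0  <- draw_potential(0)  # Y_{i,t-1}(0)
Y_post_0 <- draw_potential(0)  # Y_{it}(0)
table(Y_post_0)
```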

If instead outcome_type == "ordered", generate_qualitative_data_did first generates latent potential outcomes:

Y_i^* (d, s) = \tau d + X_{i1} + X_{i2} + X_{i3} + N (0, 1), \quad d = 0, 1, \quad s = t-1, t,

with \tau = 2. It then constructs Y_i(d, s) by discretizing Y_i^*(d, s) using the threshold parameters \zeta_1 = 2 and \zeta_2 = 4, with \zeta_0 = -\infty and \zeta_3 = \infty. Then,

P(Y_i(d, s) = m | X_i) = P(\zeta_{m-1} < Y_i^*(d, s) \leq \zeta_m | X_i) = \Phi (\zeta_m - \sum_j X_{ij} - \tau d) - \Phi (\zeta_{m-1} - \sum_j X_{ij} - \tau d), \quad d = 0, 1, \quad s = t-1, t,

where \Phi(\cdot) is the standard normal cumulative distribution function. This closed form allows us to compute the true probabilities of shift on the treated analytically.
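The ordered design can be checked numerically. The R sketch below (helper names `ordered_probs` and `draw_ordered` are hypothetical) discretizes the latent outcome with `cut()`, whose default right-closed intervals match \zeta_{m-1} < Y^* \leq \zeta_m, and compares simulated class frequencies with the closed-form probabilities for a unit whose covariates all equal 0.5.

```r
tau  <- 2
zeta <- c(-Inf, 2, 4, Inf)  # zeta_0, zeta_1, zeta_2, zeta_3

# Closed-form P(Y(d, s) = m | X = x) for m = 1, 2, 3.
ordered_probs <- function(x, d) {
  mu <- sum(x) + tau * d
  pnorm(zeta[-1] - mu) - pnorm(zeta[-length(zeta)] - mu)
}

# Latent outcome with N(0, 1) noise, discretized into categories 1, 2, 3.
draw_ordered <- function(X, d) {
  y_star <- tau * d + rowSums(X) + rnorm(nrow(X))
  cut(y_star, breaks = zeta, labels = FALSE)
}

set.seed(1986)
x <- c(0.5, 0.5, 0.5)
X <- matrix(x, nrow = 1e5, ncol = 3, byrow = TRUE)
simulated <- tabulate(draw_ordered(X, d = 1), nbins = 3) / nrow(X)
analytic  <- ordered_probs(x, d = 1)
round(rbind(simulated, analytic), 3)
```

Averaging `ordered_probs(x, 1) - ordered_probs(x, 0)` over the covariates of treated units is one plausible way to obtain the probabilities of shift on the treated; whether the package computes them exactly this way is an assumption.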

Treatment assignment

Treatment is always assigned as D_i \sim \text{Bernoulli}(\pi(X_i)). If assignment == "randomized", then the propensity score is specified as \pi(X_i) = P(D_i = 1 | X_i) = 0.5. If instead assignment == "observational", then \pi(X_i) = (X_{i1} + X_{i3}) / 2.
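A short R sketch of both assignment mechanisms; the function name `propensity` is hypothetical. Because the covariates lie in (0, 1), the observational propensity score (X_{i1} + X_{i3}) / 2 also lies strictly in (0, 1), so every unit has a positive probability of being in either group.

```r
# Hypothetical helper returning pi(X_i) for each row of X.
propensity <- function(X, assignment = c("randomized", "observational")) {
  assignment <- match.arg(assignment)
  if (assignment == "randomized") {
    rep(0.5, nrow(X))
  } else {
    (X[, 1] + X[, 3]) / 2
  }
}

set.seed(1986)
n <- 1000
X <- matrix(runif(n * 3), nrow = n)     # three independent U(0, 1) covariates
pi_x <- propensity(X, "observational")
D <- rbinom(n, size = 1, prob = pi_x)   # D_i ~ Bernoulli(pi(X_i))
mean(D)                                 # close to E[pi(X)] = 0.5
```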

Other details

The function always generates three independent covariates from U(0,1). Observed outcomes Y_{is} are always constructed using the usual observational rule.
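The documentation does not spell out the observational rule. Assuming it is the standard switching equation Y_{is} = D_i Y_{is}(1) + (1 - D_i) Y_{is}(0), applied in each period, it reduces to one line of R; the potential outcomes below are made up for illustration.

```r
# Hypothetical potential outcomes and treatment for four units in one period s.
Y1 <- c(3L, 2L, 1L, 3L)  # Y_is(1)
Y0 <- c(1L, 2L, 2L, 1L)  # Y_is(0)
D  <- c(1L, 0L, 1L, 0L)  # D_i

# Assumed switching rule: treated units reveal Y_is(1), controls reveal Y_is(0).
observe <- function(D, Y1, Y0) D * Y1 + (1L - D) * Y0

Y_obs <- observe(D, Y1, Y0)
Y_obs  # 3 2 1 1
```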

Value

A list storing a data frame with the observed data, the true propensity score, and the true probabilities of shift on the treated.

Author(s)

Riccardo Di Francesco

See Also

generate_qualitative_data_soo, generate_qualitative_data_iv, generate_qualitative_data_rd

Examples

## Generate synthetic data.
set.seed(1986)

data <- generate_qualitative_data_did(100,
                                      assignment = "observational",
                                      outcome_type = "ordered")

data$pshifts_treated


[Package causalQual version 1.0.0 Index]