
Generate Data for Lewbel (2012) Triangular Model
Source: R/data-generation.R
generate_lewbel_data.Rd
Creates a dataset from a triangular model with a single-factor error structure that satisfies the identifying assumptions of Lewbel (2012). The data generating process gives the two structural errors a common factor so that the covariance restriction \(\mathrm{Cov}(Z, \epsilon_1 \epsilon_2) = 0\) holds.
Arguments
- n_obs
Integer. Sample size.
- params
List. Parameters for the data generating process containing:
beta1_0, beta1_1: Parameters for first equation (beta1_1 can be a vector for multiple X)
beta2_0, beta2_1: Parameters for second equation (beta2_1 can be a vector for multiple X)
gamma1: Coefficient on the endogenous regressor Y2 (key parameter of interest)
alpha1, alpha2: Factor loadings for common factor U
delta_het: Heteroskedasticity strength parameter
- n_x
Integer. Number of exogenous X variables to generate (default: 1). If n_x > 1, beta1_1 and beta2_1 should be vectors of length n_x.
Value
A data.frame with columns Y1, Y2, epsilon1, epsilon2, and:
If n_x = 1: Xk, Z
If n_x > 1: X1, X2, ..., Z1, Z2, ... (one Z per X)
Details
The triangular model consists of two equations: $$Y_1 = X'\beta_1 + \gamma_1 Y_2 + \epsilon_1$$ $$Y_2 = X'\beta_2 + \epsilon_2$$
where \(Y_1\) is the outcome variable, \(Y_2\) is the endogenous regressor, \(X\) is a vector of exogenous variables (including a constant, whose coefficients are the intercepts beta1_0 and beta2_0), and \((\epsilon_1, \epsilon_2)\) are the structural errors.
The error structure follows a single-factor model: $$\epsilon_1 = \alpha_1 U + V_1$$ $$\epsilon_2 = \alpha_2 U + V_2$$
where \(U\), \(V_1\), and \(V_2\) are mutually independent, and heteroskedasticity is introduced through the variance of \(V_2\).
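A short derivation shows why this structure delivers the covariance restriction. It is a sketch under assumptions that the text above does not spell out in full: \(U\), \(V_1\), and the pair \((Z, V_2)\) are mutually independent, \(E[V_1] = 0\), and \(E[V_2 \mid Z] = 0\). Expanding the product of the errors gives $$\epsilon_1 \epsilon_2 = \alpha_1 \alpha_2 U^2 + \alpha_1 U V_2 + \alpha_2 U V_1 + V_1 V_2$$
Each term is uncorrelated with \(Z\): the first because \(U\) is independent of \(Z\), the second because \(\mathrm{Cov}(Z, U V_2) = E[U]\,\mathrm{Cov}(Z, V_2) = 0\), and the last two because \(V_1\) has mean zero and is independent of everything else. Hence $$\mathrm{Cov}(Z, \epsilon_1 \epsilon_2) = \alpha_1 \alpha_2 \mathrm{Cov}(Z, U^2) + \alpha_1 \mathrm{Cov}(Z, U V_2) + \alpha_2 \mathrm{Cov}(Z, U V_1) + \mathrm{Cov}(Z, V_1 V_2) = 0$$
By the same argument, the only part of \(\epsilon_2^2\) that can covary with \(Z\) is \(V_2^2\): $$\mathrm{Cov}(Z, \epsilon_2^2) = \mathrm{Cov}(Z, V_2^2) = \mathrm{Cov}(Z, \mathrm{Var}(V_2 \mid Z))$$
which is nonzero whenever the conditional variance of \(V_2\) varies with \(Z\). This is the heteroskedasticity that Lewbel's approach relies on for identification.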
The data generating process uses a unified approach for both single and multiple X:
- For each j = 1, ..., n_x:
  - Z_raw_j ~ Uniform(0, 1), drawn independently
  - X_j = Z_raw_j
  - Z_j = Z_raw_j - mean(Z_raw_j) (centered for use as instrument)
- For heteroskedasticity:
  - If n_x = 1: Z_het = Z_raw
  - If n_x > 1: Z_het = mean(Z_raw_1, ..., Z_raw_k), with k = n_x
  - V_2 | Z_het ~ N(0, 0.5 + 2 Z_het), i.e. the conditional variance equals 0.5 + 2 Z_het
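The steps above can be sketched in a few lines of base R for the single-X case. This is an illustration, not the package's implementation: the helper name sketch_lewbel_dgp is hypothetical, the standard normal distributions for U and V_1 are assumptions (the text does not state them), and delta_het is left unused because the variance formula above does not involve it.

```r
# Hypothetical helper illustrating the documented steps for n_x = 1.
# It is NOT generate_lewbel_data(); distributions of U and V_1 are assumed.
sketch_lewbel_dgp <- function(n_obs, params) {
  z_raw <- runif(n_obs)            # Z_raw ~ Uniform(0, 1)
  x <- z_raw                       # X = Z_raw
  z <- z_raw - mean(z_raw)         # centered instrument
  z_het <- z_raw                   # n_x = 1: Z_het = Z_raw

  u  <- rnorm(n_obs)               # assumption: U ~ N(0, 1)
  v1 <- rnorm(n_obs)               # assumption: V_1 ~ N(0, 1)
  # rnorm() takes a standard deviation, so pass sqrt of the variance
  v2 <- rnorm(n_obs, mean = 0, sd = sqrt(0.5 + 2 * z_het))

  eps1 <- params$alpha1 * u + v1   # single-factor errors
  eps2 <- params$alpha2 * u + v2

  # Triangular system: Y2 first, then Y1 as a function of Y2
  y2 <- params$beta2_0 + params$beta2_1 * x + eps2
  y1 <- params$beta1_0 + params$beta1_1 * x + params$gamma1 * y2 + eps1

  data.frame(Y1 = y1, Y2 = y2, Xk = x, Z = z,
             epsilon1 = eps1, epsilon2 = eps2)
}

set.seed(123)
params <- list(
  beta1_0 = 0.5, beta1_1 = 1.5, gamma1 = -0.8,
  beta2_0 = 1.0, beta2_1 = -1.0,
  alpha1 = -0.5, alpha2 = 1.0, delta_het = 1.2
)
sim <- sketch_lewbel_dgp(5000, params)

# Sample analogues of the two moments that matter for identification
cov_restriction <- cov(sim$Z, sim$epsilon1 * sim$epsilon2)  # should be near 0
cov_relevance   <- cov(sim$Z, sim$epsilon2^2)               # should be away from 0
```

In a sample of this size, cov_restriction should be close to zero up to sampling noise, while cov_relevance should be clearly positive because the variance of V_2 increases with Z_het.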
References
Lewbel, A. (2012). Using heteroscedasticity to identify and estimate mismeasured and endogenous regressor models. Journal of Business & Economic Statistics, 30(1), 67-80. doi:10.1080/07350015.2012.643126
See also
verify_lewbel_assumptions for testing the assumptions, run_single_lewbel_simulation for using this data in simulations
Examples
if (FALSE) { # \dontrun{
# Single X variable (backward compatible)
params <- list(
beta1_0 = 0.5, beta1_1 = 1.5, gamma1 = -0.8,
beta2_0 = 1.0, beta2_1 = -1.0,
alpha1 = -0.5, alpha2 = 1.0, delta_het = 1.2
)
data <- generate_lewbel_data(1000, params)
# Multiple X variables
params_multi <- list(
beta1_0 = 0.5, beta1_1 = c(1.5, 3.0), gamma1 = -0.8,
beta2_0 = 1.0, beta2_1 = c(-1.0, 0.7),
alpha1 = -0.5, alpha2 = 1.0, delta_het = 1.2
)
data_multi <- generate_lewbel_data(1000, params_multi, n_x = 2)
} # }