
Generates a dataset from a triangular model with a single-factor error structure that satisfies Lewbel's identifying assumptions. The errors share a common factor, which ensures that the covariance restriction Cov(Z, \(\epsilon_1 \epsilon_2\)) = 0 holds.

Usage

generate_lewbel_data(n_obs, params, n_x = 1)

Arguments

n_obs

Integer. Sample size.

params

List. Parameters for the data generating process containing:

  • beta1_0, beta1_1: Intercept and slope parameters for the first equation (beta1_1 can be a vector for multiple X)

  • beta2_0, beta2_1: Intercept and slope parameters for the second equation (beta2_1 can be a vector for multiple X)

  • gamma1: Coefficient on the endogenous regressor Y2 in the first equation (key parameter of interest)

  • alpha1, alpha2: Factor loadings for common factor U

  • delta_het: Heteroscedasticity strength parameter

n_x

Integer. Number of exogenous X variables to generate (default: 1). If n_x > 1, beta1_1 and beta2_1 should be vectors of length n_x.

Value

A data.frame with columns Y1, Y2, epsilon1, epsilon2, and:

  • If n_x = 1: Xk, Z

  • If n_x > 1: X1, X2, ..., Z1, Z2, ... (one Z per X)

Details

The triangular model consists of two equations: $$Y_1 = X'\beta_1 + \gamma_1 Y_2 + \epsilon_1$$ $$Y_2 = X'\beta_2 + \epsilon_2$$

where \(Y_1\) is the outcome variable, \(Y_2\) is the endogenous regressor, \(X\) is a vector of exogenous variables (including a constant, which corresponds to the intercepts beta1_0 and beta2_0), and \((\epsilon_1, \epsilon_2)\) are the structural errors.

The error structure follows a single-factor model: $$\epsilon_1 = \alpha_1 U + V_1$$ $$\epsilon_2 = \alpha_2 U + V_2$$

where \(U\), \(V_1\), and \(V_2\) are mutually independent. Heteroskedasticity is introduced through \(V_2\), whose variance depends on the instruments (see the data generating process below).
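
As a sketch of why this factor structure delivers the covariance restriction, assume (this is not stated explicitly above) that \(U\) and \(V_1\) have mean zero and are independent of \(Z\), and that \(V_2\) has mean zero conditional on \(Z\). Expanding the product of the errors gives:

$$\epsilon_1 \epsilon_2 = \alpha_1 \alpha_2 U^2 + \alpha_1 U V_2 + \alpha_2 U V_1 + V_1 V_2$$

Under those assumptions every cross term has conditional mean zero, so

$$E[\epsilon_1 \epsilon_2 \mid Z] = \alpha_1 \alpha_2 E[U^2]$$

which does not depend on \(Z\), and therefore Cov(Z, \(\epsilon_1 \epsilon_2\)) = 0. By the same expansion,

$$E[\epsilon_2^2 \mid Z] = \alpha_2^2 E[U^2] + Var(V_2 \mid Z)$$

so Cov(Z, \(\epsilon_2^2\)) = Cov(Z, \(Var(V_2 \mid Z)\)), which is nonzero whenever the variance of \(V_2\) varies with \(Z\).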

The data generating process uses a unified approach for both single and multiple X:

  • For each j: Z_raw_j ~ Uniform(0, 1) independently

  • X_j = Z_raw_j

  • Z_j = Z_raw_j - mean(Z_raw_j) (centered for use as instrument)

  • For heteroskedasticity:

    • If n_x = 1: Z_het = Z_raw

    • If n_x > 1: Z_het = mean(Z_raw_1, ..., Z_raw_{n_x}), the observation-wise average across the n_x draws

  • V_2|Z_het ~ N(0, 0.5 + 2Z_het) (variance equals 0.5 + 2Z_het)
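
The steps above can be sketched in a few lines of R. This is a minimal illustration, not the package's implementation: the function name `sketch_lewbel_dgp` is hypothetical, the standard normal draws for U and V_1 are an assumption (their distributions are not specified above), and delta_het is left out because the listed steps do not reference it. Columns are named X1, Z1, ... even when n_x = 1, which is a simplification.

```r
# Illustrative sketch of the steps listed above; not the package's source code.
sketch_lewbel_dgp <- function(n_obs, params, n_x = 1) {
  # Independent Uniform(0, 1) draws, one column per exogenous variable
  z_raw <- matrix(runif(n_obs * n_x), nrow = n_obs, ncol = n_x)
  x <- z_raw                               # X_j = Z_raw_j
  z <- sweep(z_raw, 2, colMeans(z_raw))    # subtract column means (centered Z_j)
  z_het <- rowMeans(z_raw)                 # equals Z_raw when n_x = 1

  # Assumption: U and V_1 are standard normal
  u  <- rnorm(n_obs)
  v1 <- rnorm(n_obs)
  # V_2 | Z_het ~ N(0, 0.5 + 2 * Z_het); rnorm() takes a standard deviation
  v2 <- rnorm(n_obs, mean = 0, sd = sqrt(0.5 + 2 * z_het))

  # Single-factor error structure
  eps1 <- params$alpha1 * u + v1
  eps2 <- params$alpha2 * u + v2

  # Triangular system: Y2 first, then Y1 as a function of Y2
  y2 <- params$beta2_0 + drop(x %*% params$beta2_1) + eps2
  y1 <- params$beta1_0 + drop(x %*% params$beta1_1) +
    params$gamma1 * y2 + eps1

  colnames(x) <- paste0("X", seq_len(n_x))
  colnames(z) <- paste0("Z", seq_len(n_x))
  data.frame(Y1 = y1, Y2 = y2, epsilon1 = eps1, epsilon2 = eps2, x, z)
}

set.seed(123)
params <- list(
  beta1_0 = 0.5, beta1_1 = 1.5, gamma1 = -0.8,
  beta2_0 = 1.0, beta2_1 = -1.0,
  alpha1 = -0.5, alpha2 = 1.0
)
sim <- sketch_lewbel_dgp(20000, params)

# Sample analogues of the two moments that matter for identification
cov_restriction <- cov(sim$Z1, sim$epsilon1 * sim$epsilon2)  # should be near 0
cov_relevance   <- cov(sim$Z1, sim$epsilon2^2)               # should be clearly positive
```

Under the assumptions of this sketch with n_x = 1, the population value of Cov(Z, \(\epsilon_2^2\)) is 2 Var(Z_raw) = 2/12 = 1/6, because the conditional variance 0.5 + 2 Z_raw is linear in Z_raw and a Uniform(0, 1) variable has variance 1/12. The sample value should be close to that figure, while the sample covariance between Z and \(\epsilon_1 \epsilon_2\) should be close to zero.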

References

Lewbel, A. (2012). Using heteroscedasticity to identify and estimate mismeasured and endogenous regressor models. Journal of Business & Economic Statistics, 30(1), 67-80. doi:10.1080/07350015.2012.643126

See also

verify_lewbel_assumptions for testing the assumptions, run_single_lewbel_simulation for using this data in simulations

Examples

if (FALSE) { # \dontrun{
# Single X variable (backward compatible)
params <- list(
  beta1_0 = 0.5, beta1_1 = 1.5, gamma1 = -0.8,
  beta2_0 = 1.0, beta2_1 = -1.0,
  alpha1 = -0.5, alpha2 = 1.0, delta_het = 1.2
)
data <- generate_lewbel_data(1000, params)

# Multiple X variables
params_multi <- list(
  beta1_0 = 0.5, beta1_1 = c(1.5, 3.0), gamma1 = -0.8,
  beta2_0 = 1.0, beta2_1 = c(-1.0, 0.7),
  alpha1 = -0.5, alpha2 = 1.0, delta_het = 1.2
)
data_multi <- generate_lewbel_data(1000, params_multi, n_x = 2)
} # }