
A simulated dataset for testing the heteroskedasticity-based identification method of Lewbel (2012). The data are drawn from a triangular system with an endogenous regressor and heteroskedastic errors, which makes the dataset suitable for testing the hetid package functions.

Usage

lewbel_sim

Format

A data frame with 1000 rows and 5 variables:

id

Observation identifier

y

Dependent variable (Y1 in the triangular system)

P

Endogenous regressor (Y2 in the triangular system)

X1

First exogenous regressor

X2

Second exogenous regressor

Source

Simulated data generated with the generate_lewbel_data() function.

Details

The data was generated using the following triangular system:

$$y = 0.5 + 1.5 \cdot X1 + 3.0 \cdot X2 - 0.8 \cdot P + \epsilon_1$$

$$P = 1.0 - 1.0 \cdot X1 + 0.7 \cdot X2 + \epsilon_2$$

The error structure follows a single-factor model with heteroskedasticity:

$$\epsilon_1 = -0.5 \cdot U + V_1$$

$$\epsilon_2 = 1.0 \cdot U + V_2$$

where \(V_2 \sim N(0, \exp(1.2 \cdot Z))\) with \(Z = X2^2 - E[X2^2]\).
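The system above can be sketched in base R as follows. This is an illustration only, not the code behind generate_lewbel_data(): the function name simulate_triangular() is hypothetical, and the sketch assumes that X1, X2, U and V1 are independent standard normal draws, that \(\exp(1.2 \cdot Z)\) is the variance (not the standard deviation) of \(V_2\), and that \(E[X2^2]\) can be replaced by its sample mean. None of these choices is stated in this documentation.

```r
# Hypothetical helper: a minimal sketch of the triangular system in Details.
# ASSUMPTIONS (not given in the documentation): X1, X2, U and V1 are
# independent N(0, 1) draws; exp(1.2 * Z) is treated as the variance of V2;
# E[X2^2] is replaced by the sample mean of X2^2.
simulate_triangular <- function(n = 1000, seed = 123) {
  set.seed(seed)

  # Exogenous regressors (distribution assumed)
  X1 <- rnorm(n)
  X2 <- rnorm(n)

  # Heteroskedasticity driver: Z = X2^2 - E[X2^2] (sample analogue)
  Z <- X2^2 - mean(X2^2)

  # Single-factor error structure
  U  <- rnorm(n)                                   # common factor (assumed N(0, 1))
  V1 <- rnorm(n)                                   # idiosyncratic error (assumed N(0, 1))
  V2 <- rnorm(n, mean = 0, sd = sqrt(exp(1.2 * Z)))  # variance depends on Z
  eps1 <- -0.5 * U + V1
  eps2 <-  1.0 * U + V2

  # Triangular system: P is determined first, then enters the y equation
  P <- 1.0 - 1.0 * X1 + 0.7 * X2 + eps2
  y <- 0.5 + 1.5 * X1 + 3.0 * X2 - 0.8 * P + eps1

  data.frame(id = seq_len(n), y = y, P = P, X1 = X1, X2 = X2)
}

sim <- simulate_triangular()
str(sim)

# Naive OLS for comparison with the structural coefficient of -0.8 on P
coef(lm(y ~ X1 + X2 + P, data = sim))
```

Because \(\epsilon_1\) and \(\epsilon_2\) share the factor \(U\), P is correlated with \(\epsilon_1\), so the OLS coefficient on P should not be expected to recover \(-0.8\). The dependence of the variance of \(V_2\) on \(Z\) is the source of heteroskedasticity that this kind of dataset is meant to exercise.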

References

Lewbel, A. (2012). Using heteroscedasticity to identify and estimate mismeasured and endogenous regressor models. Journal of Business & Economic Statistics, 30(1), 67-80. doi:10.1080/07350015.2012.643126