A simulated dataset for testing the heteroskedasticity-based identification methods of Lewbel (2012). The data are generated from a triangular system with an endogenous regressor and heteroskedastic errors, which makes them suitable for testing the hetid package functions.
Format
A data frame with 1000 rows and 5 variables:
- id
Observation identifier
- y
Dependent variable (Y1 in the triangular system)
- P
Endogenous regressor (Y2 in the triangular system)
- X1
First exogenous regressor
- X2
Second exogenous regressor
Details
The data was generated using the following triangular system: $$y = 0.5 + 1.5 \cdot X1 + 3.0 \cdot X2 - 0.8 \cdot P + \epsilon_1$$ $$P = 1.0 - 1.0 \cdot X1 + 0.7 \cdot X2 + \epsilon_2$$
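Substituting the second equation into the first gives the reduced form for y. This is a purely algebraic step, shown here as a reference for what a regression of y on X1 and X2 alone would be expected to target: $$y = 0.5 + 1.5 \cdot X1 + 3.0 \cdot X2 - 0.8 \cdot (1.0 - 1.0 \cdot X1 + 0.7 \cdot X2 + \epsilon_2) + \epsilon_1 = -0.3 + 2.3 \cdot X1 + 2.44 \cdot X2 + (\epsilon_1 - 0.8 \cdot \epsilon_2)$$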
The error structure follows a single-factor model with heteroskedasticity: $$\epsilon_1 = -0.5 \cdot U + V_1$$ $$\epsilon_2 = 1.0 \cdot U + V_2$$
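where \(U\) is the common factor that enters both errors, which makes \(\epsilon_1\) and \(\epsilon_2\) correlated and P endogenous, and \(V_2 \sim N(0, \exp(1.2 \cdot Z))\) with \(Z = X2^2 - E[X2^2]\), so the spread of \(V_2\) varies with X2.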
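The data-generating process above can be sketched in a few lines. The following is a minimal, language-agnostic illustration in Python, not the code used to build the dataset. The documentation does not state the distributions of X1, X2, U, or V1, so the sketch assumes independent standard normals (which gives \(E[X2^2] = 1\)), and it reads \(\exp(1.2 \cdot Z)\) as the variance of \(V_2\). The names `simulate_triangular` and `sample_cov` are hypothetical. The last lines compute sample analogues of the two moments that Lewbel (2012) relies on, \(Cov(Z, \epsilon_2^2) \neq 0\) and \(Cov(Z, \epsilon_1 \epsilon_2) = 0\).

```python
import math
import random


def simulate_triangular(n=1000, seed=0):
    """Draw n rows from the documented triangular system.

    ASSUMPTIONS (not stated in the documentation): X1, X2, U and V1 are
    independent standard normals, and exp(1.2 * Z) is the variance of V2.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(1, n + 1):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u, v1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = x2 ** 2 - 1.0  # E[X2^2] = 1 under the assumed N(0, 1) draw
        v2 = rng.gauss(0.0, math.sqrt(math.exp(1.2 * z)))  # sd = sqrt(variance)
        eps1 = -0.5 * u + v1  # single-factor errors: U enters both equations
        eps2 = 1.0 * u + v2
        p = 1.0 - 1.0 * x1 + 0.7 * x2 + eps2
        y = 0.5 + 1.5 * x1 + 3.0 * x2 - 0.8 * p + eps1
        rows.append({"id": i, "y": y, "P": p, "X1": x1, "X2": x2,
                     "Z": z, "eps1": eps1, "eps2": eps2})
    return rows


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


rows = simulate_triangular()
z = [r["Z"] for r in rows]
relevance = sample_cov(z, [r["eps2"] ** 2 for r in rows])
exclusion = sample_cov(z, [r["eps1"] * r["eps2"] for r in rows])
print(f"Cov(Z, eps2^2)    = {relevance:.3f}")
print(f"Cov(Z, eps1*eps2) = {exclusion:.3f}")
```

Because the variance of \(V_2\) is exponential in \(X2^2\), individual draws can be very large under the assumed normal X2, so both sample moments are noisy and depend on the seed. The extra columns `Z`, `eps1`, and `eps2` are kept only for these checks; the dataset itself has the five variables listed under Format.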
References
Lewbel, A. (2012). Using heteroscedasticity to identify and estimate mismeasured and endogenous regressor models. Journal of Business & Economic Statistics, 30(1), 67-80. doi:10.1080/07350015.2012.643126