Simulated Lewbel Test Data — lewbel_sim

A simulated dataset for testing Lewbel (2012) identification methods. The data are drawn from a triangular system with an endogenous regressor and heteroskedastic errors, which makes them suitable for testing the functions in the hetid package.

Usage

lewbel_sim

Format

A data frame with 1000 rows and 5 variables:

id: Observation identifier
y: Dependent variable (Y1 in the triangular system)
P: Endogenous regressor (Y2 in the triangular system)
X1: First exogenous regressor
X2: Second exogenous regressor
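
As a quick check of the format described above, the following sketch loads the dataset and confirms the expected columns. It assumes the hetid package is installed and ships lewbel_sim as documented here; the guard skips the check otherwise, and the name expected_cols is illustrative only.

```r
# Columns listed in the Format section of this page.
expected_cols <- c("id", "y", "P", "X1", "X2")

# Assumption: hetid is installed and provides the lewbel_sim dataset.
if (requireNamespace("hetid", quietly = TRUE)) {
  data("lewbel_sim", package = "hetid")
  str(lewbel_sim)  # print column types and the first few values
  stopifnot(
    all(expected_cols %in% names(lewbel_sim)),
    nrow(lewbel_sim) == 1000
  )
}
```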

Source

Simulated with the generate_lewbel_data() function.

Details

The data was generated using the following triangular system:

$$y = 0.5 + 1.5 \cdot X1 + 3.0 \cdot X2 - 0.8 \cdot P + \epsilon_1$$

$$P = 1.0 - 1.0 \cdot X1 + 0.7 \cdot X2 + \epsilon_2$$

The error structure follows a single-factor model with heteroskedasticity:

$$\epsilon_1 = -0.5 \cdot U + V_1$$

$$\epsilon_2 = 1.0 \cdot U + V_2$$

where $V_2 \sim N(0, \exp(1.2 \cdot Z))$ with $Z = X2^2 - E[X2^2]$.
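
The system above can be sketched in base R as follows. This is an illustration under stated assumptions, not the implementation of generate_lewbel_data(): it assumes X1, X2, U and V_1 are independent standard normals, reads $\exp(1.2 \cdot Z)$ as the variance of V_2, replaces $E[X2^2]$ with its sample mean, and uses an arbitrary seed. The function name simulate_lewbel_sketch is hypothetical. The second half builds a Lewbel-style generated instrument, the centered Z times the first-stage residual, and compares a just-identified IV estimate with OLS.

```r
# Sketch of the data-generating process described above (base R only).
# ASSUMPTIONS (not stated on this page): X1, X2, U, V1 are independent
# standard normals; exp(1.2 * Z) is the VARIANCE of V2; E[X2^2] is replaced
# by its sample mean; the seed and the function name are arbitrary.
simulate_lewbel_sketch <- function(n = 1000, seed = 123) {
  set.seed(seed)
  X1 <- rnorm(n)
  X2 <- rnorm(n)
  Z  <- X2^2 - mean(X2^2)                  # centered heteroskedasticity driver
  U  <- rnorm(n)                           # common factor causing endogeneity
  V1 <- rnorm(n)
  V2 <- rnorm(n, mean = 0, sd = sqrt(exp(1.2 * Z)))
  eps1 <- -0.5 * U + V1
  eps2 <-  1.0 * U + V2
  P <- 1.0 - 1.0 * X1 + 0.7 * X2 + eps2
  y <- 0.5 + 1.5 * X1 + 3.0 * X2 - 0.8 * P + eps1
  data.frame(id = seq_len(n), y = y, P = P, X1 = X1, X2 = X2)
}

sim <- simulate_lewbel_sketch()

# Lewbel-style generated instrument: centered Z times first-stage residual.
first_stage <- lm(P ~ X1 + X2, data = sim)
Z_centered  <- sim$X2^2 - mean(sim$X2^2)
sim$iv      <- Z_centered * resid(first_stage)

# Just-identified IV estimator: solve (Z'W) b = Z'y.
W    <- cbind(1, sim$X1, sim$X2, sim$P)    # regressors
Zmat <- cbind(1, sim$X1, sim$X2, sim$iv)   # instruments
b_iv <- drop(solve(crossprod(Zmat, W), crossprod(Zmat, sim$y)))
names(b_iv) <- c("(Intercept)", "X1", "X2", "P")

# OLS for comparison; it ignores the correlation between P and eps1.
b_ols <- coef(lm(y ~ X1 + X2 + P, data = sim))
print(rbind(OLS = b_ols, Lewbel_IV = b_iv))
```

The coefficient on P in the structural equation is -0.8. Because both errors load on U, OLS is not expected to recover it, whereas the generated instrument draws on the dependence of the variance of V_2 on Z. Under the distributional assumptions made here the variance of V_2 is heavy-tailed, so individual draws can be noisy.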

References

Lewbel, A. (2012). Using heteroscedasticity to identify and estimate mismeasured and endogenous regressor models. Journal of Business & Economic Statistics, 30(1), 67-80. doi:10.1080/07350015.2012.643126