
GMM Estimation for Lewbel (2012) Heteroskedasticity-Based Identification
Source:vignettes/lewbel-gmm.Rmd
lewbel-gmm.Rmd
Introduction
This vignette demonstrates how to use the GMM (Generalized Method of Moments) implementation for Lewbel’s (2012) heteroskedasticity-based identification method. The GMM approach offers several advantages over traditional 2SLS:
- Efficiency: GMM provides optimal weighting of moment conditions
- Flexibility: Supports both triangular and simultaneous equation systems
- Robustness: Multiple variance-covariance matrix specifications available
Basic Usage
Triangular System
The triangular system is the most common case:
\begin{aligned} Y_1 &= X'\beta_1 + \gamma_1 Y_2 + \epsilon_1 \\ Y_2 &= X'\beta_2 + \epsilon_2 \end{aligned}
# Generate example data
set.seed(123)
n <- 1000
params <- list(
beta1_0 = 0.5, beta1_1 = 1.5, gamma1 = -0.8,
beta2_0 = 1.0, beta2_1 = -1.0,
alpha1 = -0.5, alpha2 = 1.0, delta_het = 1.2
)
data <- generate_lewbel_data(n, params)
# Estimate using GMM
gmm_result <- lewbel_gmm(data, system = "triangular")
summary(gmm_result)
Simultaneous System
For systems with bidirectional causality:
\begin{aligned} Y_1 &= X'\beta_1 + \gamma_1 Y_2 + \epsilon_1 \\ Y_2 &= X'\beta_2 + \gamma_2 Y_1 + \epsilon_2 \end{aligned}
Note: Requires \gamma_1 \gamma_2 \neq 1 for identification.
# Generate data with multiple $X$ variables
n_sim <- 1000
data_sim <- data.frame(
X1 = rnorm(n_sim),
X2 = rnorm(n_sim),
X3 = rnorm(n_sim)
)
# Create endogenous variables
gamma1_true <- -0.6
gamma2_true <- 0.3
beta1 <- c(0.5, 1.0, -0.5, 0.8)
beta2 <- c(1.0, -0.5, 1.2, -0.3)
# Generate with heteroskedasticity
u1 <- rnorm(n_sim) * exp(0.5 * data_sim$X1)
u2 <- rnorm(n_sim) * exp(-0.5 * data_sim$X1)
# Solve system
x_mat <- as.matrix(cbind(1, data_sim[, c("X1", "X2", "X3")]))
gamma_mat <- matrix(c(0, gamma1_true, gamma2_true, 0), 2, 2)
beta_mat <- cbind(beta1, beta2)
u_mat <- cbind(u1, u2)
i_minus_gamma_inv <- solve(diag(2) - gamma_mat)
y_constructed <- (x_mat %*% beta_mat + u_mat) %*% t(i_minus_gamma_inv)
data_sim$Y1 <- y_constructed[, 1]
data_sim$Y2 <- y_constructed[, 2]
# Estimate
gmm_sim <- lewbel_gmm(
data_sim,
system = "simultaneous",
y1_var = "Y1",
y2_var = "Y2",
x_vars = c("X1", "X2", "X3")
)
print(gmm_sim)
GMM Types
Three types of GMM estimators are available:
Two-Step GMM (Default)
Fast and reliable for most applications:
gmm_twostep <- lewbel_gmm(data, gmm_type = "twostep")
Iterative GMM
May improve finite sample properties:
gmm_iter <- lewbel_gmm(data, gmm_type = "iterative")
Continuous Updating Estimator (CUE)
Theoretically optimal but computationally intensive:
gmm_cue <- lewbel_gmm(data, gmm_type = "cue")
Custom Heteroskedasticity Drivers
By default, centered X variables are used as heteroskedasticity drivers (Z). You can specify custom Z variables:
# Add custom $Z$ variables
data$Z1 <- data$Xk^2
data$Z2 <- abs(data$Xk)
# Estimate with custom $Z$
gmm_custom <- lewbel_gmm(
data,
z_vars = c("Z1", "Z2")
)
coef(gmm_custom)["gamma1"]
Variance-Covariance Options
HAC (Default)
Heteroskedasticity and Autocorrelation Consistent:
gmm_hac <- lewbel_gmm(data, vcov = "HAC")
IID
Assumes independent and identically distributed errors:
gmm_iid <- lewbel_gmm(data, vcov = "iid")
Clustered
For grouped data:
# Add cluster variable
data$cluster <- rep(1:10, each = n_obs / 10)
gmm_cluster <- lewbel_gmm(
data,
vcov = "cluster",
cluster_var = "cluster"
)
Comparing GMM with 2SLS
The package provides a convenience function to compare GMM and 2SLS estimates:
comparison <- compare_gmm_2sls(data)
print(comparison)
Diagnostics
First-Stage Strength
Check instrument relevance:
# Manually construct Lewbel instruments
e2_hat <- residuals(lm(Y2 ~ Xk, data = data))
z_centered <- scale(data$Xk, center = TRUE, scale = FALSE)
lewbel_iv <- z_centered * e2_hat
# First-stage regression
first_stage <- lm(Y2 ~ Xk + lewbel_iv, data = data)
f_stat <- summary(first_stage)$fstatistic[1]
cat("First-stage F-statistic:", round(f_stat, 2), "\n")
Bootstrap Inference
For small samples, bootstrap inference may be preferred:
library(boot)
# Bootstrap function
bootstrap_function <- function(data, indices) {
gmm_boot <- lewbel_gmm(
data = data[indices, ],
y1_var = "Y1", y2_var = "Y2",
x_vars = c("X1", "X2", "X3"),
model_type = "simultaneous",
add_intercept = TRUE,
compute_se = FALSE,
verbose = FALSE
) # Suppress messages within bootstrap
coef(gmm_boot)["gamma1"]
}
# Run bootstrap (reduce B for speed in example)
set.seed(123)
boot_results <- boot(data, bootstrap_function, R = 100)
# Confidence interval
boot_ci <- boot.ci(boot_results, type = "perc")
print(boot_ci)
Recommendations
-
Choice of GMM Type:
- Use two-step GMM for most applications
- Consider iterative GMM for small samples
- Use CUE when optimal efficiency is crucial
-
Heteroskedasticity Drivers:
- Default (centered X) works well in most cases
- Custom Z variables can improve efficiency if theory suggests specific drivers
-
Variance-Covariance Matrix:
- Use HAC (default) for time series or when autocorrelation is suspected
- Use IID only when errors are truly independent
- Use clustered for grouped data
-
Sample Size:
- GMM performs well with n > 500
- Consider bootstrap for n < 500
- Check first-stage F-statistic for weak instruments
-
Model Selection:
- Use triangular system unless bidirectional causality is theoretically justified
- For simultaneous systems, ensure identification condition holds
See Also
- Theory and Methods - Mathematical foundations
- Getting Started - Basic usage examples
- Klein & Vella Method - Semiparametric control function approach
- Package Comparison - Validation against other software
- Rigobon Method - Alternative regime-based approach
References
Lewbel (2012)