Module 5.6

How Would We Model This Today?

Prework

Install the fixest package (install.packages("fixest")) and review the documentation
Install the lme4 package (install.packages("lme4")) for mixed effects models
Install the sandwich package documentation for robust standard errors (install.packages("sandwich"))

Overview

Fearon and Laitin’s groundbreaking 2003 study of civil war onset helped establish many conventions for conflict research. But that was over 20 years ago! How would we approach the same research question today with modern econometric tools and a deeper understanding of causal identification challenges?

In this module, we’ll explore how contemporary data scientists would tackle the Fearon and Laitin analysis differently. We’ll discover why clustered standard errors are now standard practice for panel data and how they can change our conclusions about statistical significance. We’ll learn when random effects models help account for panel structure while preserving key theoretical variables. We’ll understand how temporal fixed effects control for confounding trends that might bias our estimates. Finally, we’ll get a preview of modern causal identification strategies and why simple correlational approaches often fall short of causal claims.

By the end of this module, you’ll understand the importance of accounting for non-independence in panel data, know when random effects models are valuable for testing theories about structural factors, recognize how temporal confounds can bias estimates, and appreciate the challenges of making causal claims from observational data. Most importantly, you’ll be prepared for the more sophisticated analytical approaches you’ll encounter in advanced courses.

Setting Up the Modern Analysis

Let’s start by recreating our Fearon and Laitin dataset, but this time we’ll structure it properly for modern panel data analysis:

library(peacesciencer)
library(tidyverse)
library(fixest)
library(lme4)
library(modelsummary)
library(ggplot2)

# Recreate the conflict dataset
conflict_df <- create_stateyears(system = 'gw') |>
  filter(year %in% c(1946:1999)) |>
  add_ucdp_acd(type=c("intrastate"), only_wars = FALSE) |>
  add_democracy() |>
  add_creg_fractionalization() |>
  add_sdp_gdp() |>
  add_rugged_terrain() |>
  select(-ucdpongoing, -maxintensity, -conflict_ids, -sdpest) |>
  drop_na() |>
  # Add variables for modern approaches
  mutate(
    # Create a "democratization" treatment variable
    democracy_change = ave(v2x_polyarchy, gwcode, FUN = function(x) c(0, diff(x))),
    # Flag for democratization episodes
    democratizing = ifelse(democracy_change > 0.1, 1, 0)
  )

# Quick data overview
cat("Dataset:", nrow(conflict_df), "country-years from", 
    min(conflict_df$year), "to", max(conflict_df$year))

Dataset: 6091 country-years from 1946 to 1999

The Original Approach (2003)

Let’s start with the original Fearon and Laitin approach for comparison:

# The original approach - pooled logistic regression
original_model <- glm(ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + 
                     wbgdppc2011est + ethfrac + relfrac,
                     data = conflict_df, family = "binomial")

summary(original_model)


Call:
glm(formula = ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + 
    wbgdppc2011est + ethfrac + relfrac, family = "binomial", 
    data = conflict_df)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -5.53894    1.38149  -4.009 6.09e-05 ***
v2x_polyarchy  -0.54129    0.51336  -1.054 0.291690    
newlmtnest      0.12696    0.07256   1.750 0.080166 .  
wbpopest        0.26463    0.07029   3.765 0.000167 ***
wbgdppc2011est -0.36465    0.12180  -2.994 0.002755 ** 
ethfrac         0.81768    0.38874   2.103 0.035427 *  
relfrac        -0.37589    0.41629  -0.903 0.366543    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1172.3  on 6090  degrees of freedom
Residual deviance: 1113.6  on 6084  degrees of freedom
AIC: 1127.6

Number of Fisher Scoring iterations: 7

This is the approach that was standard in 2003: treat each country-year as an independent observation and run a pooled logistic regression. But we now know this approach has serious problems.

Problem #1: Non-Independence and Clustered Standard Errors

The biggest issue with the original approach is that it assumes each observation is independent. But countries are observed multiple times, and observations within the same country are likely correlated. A country that experiences conflict in one year is more likely to experience conflict in subsequent years. Economic and political characteristics also persist over time within countries.

Modern practice requires clustered standard errors that account for this non-independence:

# Modern approach with clustered standard errors
clustered_model <- feglm(ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + 
                        wbgdppc2011est + ethfrac + relfrac,
                        data = conflict_df, family = "binomial",
                        cluster = "gwcode")

summary(clustered_model)

GLM estimation, family = binomial, Dep. Var.: ucdponset
Observations: 6,091
Standard-errors: Clustered (gwcode) 
                Estimate Std. Error   z value   Pr(>|z|)    
(Intercept)    -5.538940   1.493886 -3.707740 0.00020912 ***
v2x_polyarchy  -0.541294   0.486385 -1.112893 0.26575459    
newlmtnest      0.126958   0.061822  2.053621 0.04001243 *  
wbpopest        0.264634   0.071146  3.719608 0.00019953 ***
wbgdppc2011est -0.364648   0.139672 -2.610745 0.00903453 ** 
ethfrac         0.817682   0.323638  2.526530 0.01151954 *  
relfrac        -0.375894   0.429018 -0.876175 0.38093513    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood:  -556.8   Adj. Pseudo R2: 0.039802
           BIC: 1,174.6     Squared Cor.: 0.010612

Notice how the standard errors change! This is not just a technical detail—it can fundamentally alter which variables we consider statistically significant and how confident we are in our estimates.

Code

# Compare original vs clustered
clustering_comparison <- list(
  "Original (2003)" = original_model,
  "Clustered SE" = clustered_model
)

modelplot(clustering_comparison, coef_omit = "Intercept") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray") +
  labs(title = "Impact of Clustered Standard Errors",
       subtitle = "Comparing to 2003 baseline approach",
       x = "Coefficient Estimate (Log-Odds Scale)") +
  theme_minimal()

Problem #2: Panel Structure and Random Effects

Beyond the standard errors issue, pooled models ignore the panel structure of our data entirely. Each country has its own baseline conflict risk due to unobserved factors like historical legacies, cultural characteristics, or geographic features not captured in our variables.

Random effects models account for this panel structure while still allowing us to estimate effects of time-invariant variables like terrain and ethnic fractionalization:

# Random effects model using lme4
re_model <- glmer(ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + 
                  wbgdppc2011est + ethfrac + relfrac + (1|gwcode),
                  data = conflict_df, family = "binomial")

summary(re_model)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + wbgdppc2011est +  
    ethfrac + relfrac + (1 | gwcode)
   Data: conflict_df

     AIC      BIC   logLik deviance df.resid 
  1129.6   1183.4   -556.8   1113.6     6083 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.2976 -0.1582 -0.1282 -0.0918 15.5768 

Random effects:
 Groups Name        Variance  Std.Dev. 
 gwcode (Intercept) 4.778e-18 2.186e-09
Number of obs: 6091, groups:  gwcode, 151

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -5.53894    1.38087  -4.011 6.04e-05 ***
v2x_polyarchy  -0.54129    0.51332  -1.054 0.291659    
newlmtnest      0.12696    0.07256   1.750 0.080168 .  
wbpopest        0.26463    0.07028   3.765 0.000166 ***
wbgdppc2011est -0.36465    0.12178  -2.994 0.002750 ** 
ethfrac         0.81768    0.38872   2.104 0.035422 *  
relfrac        -0.37589    0.41627  -0.903 0.366520    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) v2x_pl nwlmtn wbppst wb2011 ethfrc
v2x_plyrchy  0.247                                   
newlmtnest  -0.040  0.101                            
wbpopest    -0.667 -0.047 -0.226                     
wbgdppc2011 -0.597 -0.474  0.083 -0.143              
ethfrac     -0.177  0.112  0.087 -0.172  0.245       
relfrac     -0.151 -0.059  0.101 -0.007  0.107 -0.216
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

Code

# Compare original vs random effects
panel_comparison <- list(
  "Original (2003)" = original_model,
  "Random Effects" = re_model
)

modelplot(panel_comparison, coef_omit = "Intercept") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray") +
  labs(title = "Impact of Accounting for Panel Structure",
       subtitle = "Random effects vs. 2003 baseline approach",
       x = "Coefficient Estimate (Log-Odds Scale)") +
  theme_minimal()

Why Random Effects Instead of Fixed Effects?

Fixed effects would absorb all time-invariant variables—terrain, ethnic fractionalization, religious fractionalization—which are central to Fearon and Laitin’s theory! Random effects allows us to test these key hypotheses while still accounting for unobserved country-specific factors.

The trade-off is that random effects assumes these country-specific factors are uncorrelated with our predictors, which may be questionable for conflict research.

Problem #3: Temporal Confounds and Year Fixed Effects

Another major issue is temporal confounding. The world changed dramatically during 1946-1999: the Cold War ended, international norms shifted, economic systems evolved. Without controlling for these temporal trends, we might attribute to our variables effects that are really due to changing global conditions.

Year fixed effects control for any factors that affect all countries in a given year:

# Add year fixed effects to control for temporal trends
year_fe_model <- feglm(ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + 
                      wbgdppc2011est + ethfrac + relfrac | year,
                      data = conflict_df, family = "binomial",
                      cluster = "gwcode")

summary(year_fe_model)

GLM estimation, family = binomial, Dep. Var.: ucdponset
Observations: 4,827
Fixed-effects: year: 43
Standard-errors: Clustered (gwcode) 
                Estimate Std. Error   z value   Pr(>|z|)    
v2x_polyarchy  -0.659361   0.509249 -1.294771 0.19539927    
newlmtnest      0.128998   0.064177  2.010049 0.04442602 *  
wbpopest        0.274511   0.073877  3.715794 0.00020257 ***
wbgdppc2011est -0.354237   0.144387 -2.453384 0.01415190 *  
ethfrac         0.800075   0.332117  2.409015 0.01599566 *  
relfrac        -0.368224   0.430174 -0.855987 0.39200483    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood:  -508.0   Adj. Pseudo R2: 0.003872
           BIC: 1,431.6     Squared Cor.: 0.0296

Code

# Compare original vs year fixed effects
temporal_comparison <- list(
  "Original (2003)" = original_model,
  "Year Fixed Effects" = year_fe_model
)

modelplot(temporal_comparison, coef_omit = "Intercept") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray") +
  labs(title = "Impact of Controlling for Temporal Trends",
       subtitle = "Year fixed effects vs. 2003 baseline approach",
       x = "Coefficient Estimate (Log-Odds Scale)") +
  theme_minimal()

Robustness Check: Are the Effects Stable Over Time?

Given the major global changes during this period, we should check whether relationships were consistent across time. Let’s compare the Cold War period (1946-1989) with the post-Cold War period (1990-1999):

# Split sample analysis
cold_war <- conflict_df |> filter(year <= 1989)
post_cold_war <- conflict_df |> filter(year > 1989)

cold_war_model <- feglm(ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + 
                       wbgdppc2011est + ethfrac + relfrac,
                       data = cold_war, family = "binomial",
                       cluster = "gwcode")

post_cold_war_model <- feglm(ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + 
                            wbgdppc2011est + ethfrac + relfrac,
                            data = post_cold_war, family = "binomial",
                            cluster = "gwcode")

cat("Cold War period:", nrow(cold_war), "observations")

Cold War period: 4627 observations

cat("Post-Cold War period:", nrow(post_cold_war), "observations")

Post-Cold War period: 1464 observations

Code

# Compare time periods to original baseline
robustness_comparison <- list(
  "Original (Full Sample)" = original_model,
  "Cold War (1946-1989)" = cold_war_model,
  "Post-Cold War (1990-1999)" = post_cold_war_model
)

modelplot(robustness_comparison, coef_omit = "Intercept") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray") +
  labs(title = "Temporal Robustness: Are Effects Stable Over Time?",
       subtitle = "Comparing time periods to full-sample baseline",
       x = "Coefficient Estimate (Log-Odds Scale)") +
  theme_minimal()

A Preview of Causal Identification: The Democratization “Treatment”

Modern research increasingly thinks in terms of causal identification. Instead of asking “Are democratic countries less likely to experience conflict?” we might ask “Does democratization reduce conflict risk?” This shifts focus from correlation to causation.

Let’s explore this by treating democratization episodes as a “treatment”:

# Look for democratization episodes
cat("Democratization episodes in the data:", sum(conflict_df$democratizing))

Democratization episodes in the data: 174

# What predicts democratization?
democratization_predictors <- feglm(democratizing ~ lag(wbgdppc2011est, 1) + 
                                   lag(wbpopest, 1) + newlmtnest | year,
                                   data = conflict_df, family = "binomial",
                                   cluster = "gwcode")

# Does democratization reduce subsequent conflict?
democratization_effects <- feglm(ucdponset ~ lag(democratizing, 1) + 
                                lag(democratizing, 2) + wbgdppc2011est + 
                                wbpopest + newlmtnest | year,
                                data = conflict_df, family = "binomial",
                                cluster = "gwcode")

summary(democratization_effects)

GLM estimation, family = binomial, Dep. Var.: ucdponset
Observations: 4,827
Fixed-effects: year: 43
Standard-errors: Clustered (gwcode) 
                       Estimate Std. Error   z value   Pr(>|z|)    
lag(democratizing, 1)  0.345479   0.428179  0.806855 4.1975e-01    
wbgdppc2011est        -0.512646   0.102713 -4.991036 6.0056e-07 ***
wbpopest               0.294163   0.077796  3.781236 1.5605e-04 ***
newlmtnest             0.136666   0.060943  2.242506 2.4929e-02 *  
... 1 variable was removed because of collinearity (lag(democratizing, 2))
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood:  -511.1   Adj. Pseudo R2: 0.001974
           BIC: 1,420.8     Squared Cor.: 0.027457

Your Turn!

The democratization analysis likely shows weak or non-significant effects. This isn’t a failure—it’s a valuable lesson! Consider:

Why might democratization episodes be rare in this dataset?
What does it mean that we can’t easily identify causal effects of democratization?
How might researchers today approach this question differently?
What additional data or research designs might help identify causal effects?

Model Comparison Summary

Let’s create a summary table to see how our conclusions change across specifications:

Code

# Create comparison table focusing on key models
key_models <- list(
  "Original" = original_model,
  "Clustered SE" = clustered_model,
  "Random Effects" = re_model,
  "Year FE" = year_fe_model
)

coef_order <- c(
  "v2x_polyarchy" = "Democracy (Polyarchy)",
  "newlmtnest" = "Mountainous Terrain", 
  "wbpopest" = "Population (log)",
  "wbgdppc2011est" = "GDP per Capita (log)",
  "ethfrac" = "Ethnic Fractionalization",
  "relfrac" = "Religious Fractionalization",
  "(Intercept)" = "Constant"
)

modelsummary(
  key_models,
  coef_map = coef_order,
  stars = TRUE,
  gof_map = c("nobs", "aic"),
    output = "gt",
  title = "Table 1: How Modern Methods Change Fearon & Laitin Results",
  notes = list(
    "Standard errors clustered by country for models 2 and 4.",
    "Random effects model includes country-level random intercepts.", 
    "Year FE model includes fixed effects for each year."
  )
)

Table 1: How Modern Methods Change Fearon & Laitin Results
	Original	Clustered SE	Random Effects	Year FE
Democracy (Polyarchy)	-0.541	-0.541	-0.541	-0.659
	(0.513)	(0.486)	(0.513)	(0.509)
Mountainous Terrain	0.127+	0.127*	0.127+	0.129*
	(0.073)	(0.062)	(0.073)	(0.064)
Population (log)	0.265***	0.265***	0.265***	0.275***
	(0.070)	(0.071)	(0.070)	(0.074)
GDP per Capita (log)	-0.365**	-0.365**	-0.365**	-0.354*
	(0.122)	(0.140)	(0.122)	(0.144)
Ethnic Fractionalization	0.818*	0.818*	0.818*	0.800*
	(0.389)	(0.324)	(0.389)	(0.332)
Religious Fractionalization	-0.376	-0.376	-0.376	-0.368
	(0.416)	(0.429)	(0.416)	(0.430)
Constant	-5.539***	-5.539***	-5.539***
	(1.381)	(1.494)	(1.381)
Num.Obs.	6091	6091	6091	4827
AIC	1127.6	1127.6	1129.6	1114.0
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001
Standard errors clustered by country for models 2 and 4.
Random effects model includes country-level random intercepts.
Year FE model includes fixed effects for each year.

What We’ve Learned About Modern Practice

This analysis reveals several crucial insights about how empirical research has evolved:

Standard errors matter enormously. Clustering can dramatically increase uncertainty around estimates, potentially changing which variables we consider significant. Ignoring non-independence can lead to false confidence in our results.

Panel structure provides both opportunities and challenges. Random effects allow us to account for unobserved country-specific heterogeneity while preserving the ability to test theories about time-invariant factors like geography and social structure. But this comes with assumptions about how unobserved factors relate to our predictors.

Temporal confounds are everywhere. Year fixed effects help us separate the effects of our variables from broader historical trends. Without these controls, we might attribute to democracy or development effects that really reflect global changes in conflict patterns.

Causal identification is hard. Simply including control variables doesn’t solve the problem of confounding. Modern research increasingly relies on natural experiments, instrumental variables, and other identification strategies that go beyond the standard regression approach.

Robustness matters. Findings that depend critically on specific modeling choices or sample restrictions should be viewed with skepticism. Robust results survive multiple reasonable specifications.

Looking Ahead: Modern Causal Identification

While this module focused on improving traditional regression approaches, contemporary conflict research increasingly employs more sophisticated identification strategies:

Instrumental variables for democracy (e.g., using regional democratization waves)
Regression discontinuity designs around arbitrary thresholds
Difference-in-differences approaches comparing similar countries before/after policy changes
Natural experiments exploiting random variation in treatment assignment

These approaches represent the cutting edge of empirical research and will be covered in advanced econometrics courses. But the foundations we’ve built here—understanding panel data, clustering, and the challenges of causal inference—prepare you for these more sophisticated methods.

Summary

Twenty years after Fearon and Laitin’s influential study, we would approach the same research question with greater appreciation for the complexities of panel data and causal identification. While their core insights about the importance of state capacity, terrain, and economic development remain influential, modern methods reveal greater uncertainty around specific estimates and highlight the challenges of making causal claims from observational data.

This evolution reflects the broader maturation of empirical social science. As our methods become more sophisticated, we become more humble about what we can conclude from correlational evidence—but also more creative about finding ways to identify causal relationships in complex social phenomena.