Module 5.4

Multiple Predictors and Interactions

Overview

Building on our bivariate interpretation skills, this module extends logistic regression to realistic research scenarios involving multiple predictors and interaction effects. Real-world research rarely involves simple bivariate relationships. Instead, we need to control for multiple confounding variables and often discover that the effect of one variable depends on the level of another variable.

In this module, we will explore how controlling for additional variables changes our interpretation of individual predictors, understand when and why to include interaction terms in our models, learn to interpret complex conditional relationships step-by-step, and master both manual calculation and automated tools for understanding interactions. We will also discover how to visualize complex relationships using predicted probabilities and recognize when continuous versus dichotomous measures affect our analytical conclusions.

By the end of this module, you’ll understand how multiple predictors work together in logistic regression, be able to interpret interaction effects manually and using specialized R tools, know how to visualize complex conditional relationships, and recognize the theoretical and practical considerations that guide interaction modeling decisions.

Models with Multiple Predictors

In our previous modules, we focused on bivariate relationships to build solid interpretive foundations. However, real research requires us to control for multiple potential confounding variables simultaneously. One of the most influential examples of multiple predictor logistic regression in political science comes from Fearon and Laitin’s (2003) groundbreaking study of civil war onset.

Fearon and Laitin argued that civil wars are not primarily caused by ethnic or religious grievances, but rather by conditions that favor insurgency: weak state capacity, difficult terrain, and economic underdevelopment. Their comprehensive model included multiple predictors to control for alternative explanations while testing their core theoretical arguments.

Here we will loosely replicate their approach using the peacesciencer package and examine how our interpretation changes when we move from bivariate to multiple predictor models. First we will use the peacesciencer package to create a dataset similar to Fearon and Laitin’s. Notice how you can add multiple predictors using the add_* functions, which automatically handle data cleaning and merging. Notice also that we are creating a dichotomous democracy measure (democracy) based on the continuous V-Dem v2x_polyarchy score, which we will use in our interaction models later.

library(peacesciencer)
library(tidyverse)

# Recreate the conflict dataset, adding a dichotomous democracy measure
conflict_df <- create_stateyears(system = 'gw') |>
  filter(year %in% c(1946:1999)) |>
  add_ucdp_acd(type=c("intrastate"), only_wars = FALSE) |>
  add_democracy() |>
  add_creg_fractionalization() |>
  add_sdp_gdp() |>
  add_rugged_terrain() |>
  mutate(democracy = ifelse(v2x_polyarchy > 0.5, 1, 0)) |> # binary democracy measure
  select(-ucdpongoing, -maxintensity, -conflict_ids, -sdpest) |>
  drop_na() # need to drop NAs for logistic regression

glimpse(conflict_df)
Rows: 6,091
Columns: 17
$ gwcode         <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ statename      <chr> "United States of America", "United States of America",…
$ year           <dbl> 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1…
$ ucdponset      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ v2x_polyarchy  <dbl> 0.605, 0.587, 0.599, 0.599, 0.587, 0.602, 0.601, 0.594,…
$ polity2        <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
$ xm_qudsest     <dbl> 1.259180, 1.259180, 1.252190, 1.252190, 1.270106, 1.259…
$ ethfrac        <dbl> 0.2226323, 0.2248701, 0.2271561, 0.2294918, 0.2318781, …
$ ethpol         <dbl> 0.4152487, 0.4186156, 0.4220368, 0.4255134, 0.4290458, …
$ relfrac        <dbl> 0.4980802, 0.5009111, 0.5037278, 0.5065309, 0.5093204, …
$ relpol         <dbl> 0.7769888, 0.7770017, 0.7770303, 0.7770729, 0.7771274, …
$ wbgdp2011est   <dbl> 28.539, 28.519, 28.545, 28.534, 28.572, 28.635, 28.669,…
$ wbpopest       <dbl> 18.744, 18.756, 18.781, 18.804, 18.821, 18.832, 18.848,…
$ wbgdppc2011est <dbl> 9.794, 9.762, 9.764, 9.730, 9.752, 9.803, 9.821, 9.857,…
$ rugged         <dbl> 1.073, 1.073, 1.073, 1.073, 1.073, 1.073, 1.073, 1.073,…
$ newlmtnest     <dbl> 3.214868, 3.214868, 3.214868, 3.214868, 3.214868, 3.214…
$ democracy      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…

Let’s now start out with a baseline model that includes multiple predictors. This model will allow us to see how controlling for other factors changes our interpretation of the relationship between democracy and conflict onset.

# Fit the multiple predictor baseline model
baseline_model <- glm(ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + 
                     wbgdppc2011est + ethfrac + relfrac,
                     data = conflict_df,
                     family = "binomial")

# Display the model summary
summary(baseline_model)

Call:
glm(formula = ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + 
    wbgdppc2011est + ethfrac + relfrac, family = "binomial", 
    data = conflict_df)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -5.53894    1.38149  -4.009 6.09e-05 ***
v2x_polyarchy  -0.54129    0.51336  -1.054 0.291690    
newlmtnest      0.12696    0.07256   1.750 0.080166 .  
wbpopest        0.26463    0.07029   3.765 0.000167 ***
wbgdppc2011est -0.36465    0.12180  -2.994 0.002755 ** 
ethfrac         0.81768    0.38874   2.103 0.035427 *  
relfrac        -0.37589    0.41629  -0.903 0.366543    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1172.3  on 6090  degrees of freedom
Residual deviance: 1113.6  on 6084  degrees of freedom
AIC: 1127.6

Number of Fisher Scoring iterations: 7

This baseline model reveals several important patterns. First, notice how some variables that might have been significant in bivariate models may lose significance when we control for other factors. This is a classic result in multiple regression: apparent relationships can disappear when we account for confounding variables.

The model tells us that population size (wbpopest) and GDP per capita (wbgdppc2011est) remain significant predictors even after controlling for other factors. Ethnic fractionalization (ethfrac) shows some evidence of increasing conflict risk, while the continuous democracy measure (v2x_polyarchy) is not statistically significant in this specification.

The key insight for interpretation is that each coefficient now represents the effect of that variable holding all other variables constant. When we say that the coefficient for GDP per capita is -0.365, we mean that a one-unit increase in log GDP per capita decreases the log-odds of conflict onset by 0.365 (or about 31% when we exponentiate the coefficient) controlling for democracy levels, terrain, population, ethnic fractionalization, and religious fractionalization*.

Your Turn!

The peacesciencer package provides multiple different measures of democracy, terrain and ethnic and religious divisions. Try running the baseline model using different combinations of these variables. How do the coefficients and significance levels change? What does this tell you about the importance of controlling for confounding variables in regression analysis?

Why Interaction Effects Matter in Conflict Research

The baseline model assumes that the effect of each predictor is the same regardless of the values of other predictors. This is called an additive model because the effects simply add up. But what if the effect of one variable depends on the level of another variable? What if mountainous terrain only increases conflict risk in non-democratic countries? What if economic development has a stronger peace-promoting effect in democratic societies?

These are questions about interaction effects, where the impact of one variable is conditional on the value of another variable. In conflict research, there are compelling theoretical reasons to expect such interactions. Democracy might provide institutional mechanisms for resolving grievances that would otherwise lead to violence. Democratic institutions might also change how other risk factors operate.

Consider two theoretical scenarios:

Democracy and Terrain: In non-democratic countries, mountainous terrain might facilitate insurgency because remote populations have no voice in government and rebels can hide in difficult terrain. In democratic countries, these same remote populations might have political representation and peaceful means of addressing grievances, reducing the conflict-promoting effect of terrain.

Democracy and Economic Development: Economic growth might reduce conflict risk in all countries, but this effect might be amplified in democracies where the benefits of growth are more widely shared and people have more say about how their tax dollars are spent.

Testing these theoretical expectations requires interaction terms in our logistic regression models. An interaction term allows the effect of one variable to vary depending on the level of another variable.

Interaction Effect #1: Democracy and Mountainous Terrain

Let’s start with the democracy-terrain interaction using our dichotomous democracy measure. This provides an excellent opportunity to understand interaction effects through manual calculation before moving to more complex tools.

# Fit model with democracy-terrain interaction
terrain_interaction_model <- glm(ucdponset ~ democracy * newlmtnest + wbpopest + 
                                wbgdppc2011est + ethfrac + relfrac,
                                data = conflict_df,
                                family = "binomial")

# Display the model summary
summary(terrain_interaction_model)

Call:
glm(formula = ucdponset ~ democracy * newlmtnest + wbpopest + 
    wbgdppc2011est + ethfrac + relfrac, family = "binomial", 
    data = conflict_df)

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)          -6.31995    1.40870  -4.486 7.24e-06 ***
democracy             0.19601    0.48195   0.407 0.684224    
newlmtnest            0.18651    0.07909   2.358 0.018355 *  
wbpopest              0.26762    0.07068   3.786 0.000153 ***
wbgdppc2011est       -0.29958    0.11970  -2.503 0.012324 *  
ethfrac               0.78512    0.38998   2.013 0.044091 *  
relfrac              -0.33550    0.41231  -0.814 0.415809    
democracy:newlmtnest -0.41501    0.20300  -2.044 0.040914 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1172.3  on 6090  degrees of freedom
Residual deviance: 1106.5  on 6083  degrees of freedom
AIC: 1122.5

Number of Fisher Scoring iterations: 7

The interaction term democracy:newlmtnest has a coefficient of -0.415 and is statistically significant (p = 0.041). But what does this mean substantively?

With a dichotomous interaction variable like our democracy measure, we can calculate the conditional effects manually. The model equation is:

\[\text{logit}(p) = \beta_0 + \beta_1(\text{democracy}) + \beta_2(\text{terrain}) + \beta_3(\text{democracy} \times \text{terrain}) + \text{other controls}\]

This gives us two different equations depending on the democracy level:

For non-democratic countries (democracy = 0) the democracy coefficient gets zeroed out, the interaction term drops out and the model simplifies to:

\[\text{logit}(p) = \beta_0 + \beta_2(\text{terrain}) + \text{other controls}\]

In other words, the effect of terrain is simply the value of the terrain coefficient \(\beta_2 = 0.187\).

For democratic countries (democracy = 1) we include the democracy coefficient and the interaction term (1 + terrain):

\[\text{logit}(p) = \beta_0 + \beta_1(1) + \beta_2(\text{terrain}) + \beta_3(\text{democracy} \times \text{terrain}) + \text{other controls}\] This simplifies to:

\[\text{logit}(p) = \beta_0 + \beta_1 + (\beta_2 + \beta_3)(\text{terrain}) + \text{other controls}\]

So then the marginal effect of terrain is \(\beta_2 + \beta_3 = 0.187 + (-0.415) = -0.228\).

Here is how it would look if we did that with R code.

library(marginaleffects)
library(broom)

conditional_effects_terrain <- slopes(terrain_interaction_model, 
                                      variables = "newlmtnest",
                                      by = "democracy",
                                      type = "link") |>
  tidy() |> # convert to tidy format
  mutate(odds.ratio = exp(estimate)) |> # add column with odds ratios
  select(
    term:std.error, 
    conf.low:odds.ratio
  )

conditional_effects_terrain
# A tibble: 2 × 7
  term       democracy estimate std.error conf.low conf.high odds.ratio
  <chr>          <dbl>    <dbl>     <dbl>    <dbl>     <dbl>      <dbl>
1 newlmtnest         0    0.187    0.0791   0.0315     0.342      1.21 
2 newlmtnest         1   -0.228    0.189   -0.599      0.142      0.796
Understanding the Code

The slopes() function in marginaleffects can calculate marginal effects on different scales. By default, it uses type = "response", which gives us marginal effects on the probability scale that show how much the probability of conflict changes per unit increase in terrain. These effects are typically very small because probabilities are constrained between 0 and 1.

When we specify type = "link", we get marginal effects on the log-odds scale instead. This matches the scale of our model coefficients and our manual calculations above. Using type = “link” is particularly helpful for understanding interactions because it directly corresponds to the coefficients we see in our model summary, making it easier to connect the automated results to the underlying mathematical relationships.

After we run the slopes() function, we use tidy() to convert the results into a tidy format that is easier to read and manipulate. We then calculate the odds ratios by exponentiating the estimates, which allows us to interpret the effects in terms of odds rather than log-odds. Then we select only the relevant columns for clarity.

These results reveal a fascinating pattern! In non-democratic countries, mountainous terrain increases the odds of conflict onset by about 21% for each unit increase in terrain roughness (odds ratio = 1.206). However, in democratic countries, mountainous terrain actually decreases the odds of conflict onset by about 20% for each unit increase (odds ratio = 0.796).

This interaction suggests that democracy fundamentally changes how terrain affects conflict risk. In authoritarian systems, difficult terrain may indeed facilitate insurgency by providing rebels with safe havens and making government control difficult. In democratic systems, perhaps the same geographical isolation is less problematic because remote populations have political voice and peaceful means of addressing grievances.

Your Turn!

Now practice interpreting interaction effects by examining the democracy main effect:

  1. Calculate the effect of democracy when terrain = 0 using the model coefficients
  2. Calculate the effect of democracy when terrain = 1 (one standard deviation above mean)
  3. Convert both to odds ratios and interpret the results
  4. What does this tell us about when democracy has stronger or weaker effects on conflict risk?

Remember: the interaction works both ways! If terrain’s effect depends on democracy, then democracy’s effect also depends on terrain.

Interaction Effect #2: Democracy and Economic Development

Our second interaction examines how democracy moderates the relationship between economic development and conflict risk. For this analysis, we’ll use the continuous democracy measure (v2x_polyarchy) to illustrate why we need more sophisticated tools when dealing with continuous interaction variables.

# Fit model with continuous democracy-wealth interaction
wealth_interaction_model <- glm(ucdponset ~ v2x_polyarchy * wbgdppc2011est + newlmtnest + 
                               wbpopest + ethfrac + relfrac,
                               data = conflict_df,
                               family = "binomial")

# Display the model summary
summary(wealth_interaction_model)

Call:
glm(formula = ucdponset ~ v2x_polyarchy * wbgdppc2011est + newlmtnest + 
    wbpopest + ethfrac + relfrac, family = "binomial", data = conflict_df)

Coefficients:
                             Estimate Std. Error z value Pr(>|z|)    
(Intercept)                  -8.34699    1.77914  -4.692 2.71e-06 ***
v2x_polyarchy                 9.39357    4.04517   2.322 0.020224 *  
wbgdppc2011est               -0.05136    0.17087  -0.301 0.763742    
newlmtnest                    0.15972    0.07470   2.138 0.032496 *  
wbpopest                      0.26812    0.06999   3.831 0.000128 ***
ethfrac                       0.75002    0.38459   1.950 0.051155 .  
relfrac                      -0.24191    0.41891  -0.577 0.563613    
v2x_polyarchy:wbgdppc2011est -1.13001    0.46270  -2.442 0.014598 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1172.3  on 6090  degrees of freedom
Residual deviance: 1107.3  on 6083  degrees of freedom
AIC: 1123.3

Number of Fisher Scoring iterations: 8

The interaction term v2x_polyarchy:wbgdppc2011est has a coefficient of -1.130 and is statistically significant (p = 0.015). This negative coefficient suggests that democracy amplifies the conflict-reducing effect of economic development. But with a continuous moderator variable, we can’t simply calculate “democracy = 0” and “democracy = 1” effects as we did before.

This is where the marginaleffects package becomes invaluable. We can specify at = list() to calculate the conditional effect of economic development at different levels of democracy, taking into account the continuous nature of our democracy measure:

conditional_effects_wealth <- slopes(wealth_interaction_model, 
                       variables = "wbgdppc2011est",
                       newdata = datagrid(v2x_polyarchy = c(0.1, 0.3, 0.5, 0.7, 0.9)),
                       type = "link") |>  
  tidy() |> 
  mutate(odds.ratio = exp(estimate)) |> 
  select(v2x_polyarchy, estimate, std.error, conf.low, conf.high, odds.ratio)


conditional_effects_wealth
# A tibble: 5 × 6
  v2x_polyarchy estimate std.error conf.low conf.high odds.ratio
          <dbl>    <dbl>     <dbl>    <dbl>     <dbl>      <dbl>
1           0.1   -0.164     0.141   -0.440     0.112      0.848
2           0.3   -0.390     0.117   -0.620    -0.161      0.677
3           0.5   -0.616     0.157   -0.925    -0.308      0.540
4           0.7   -0.842     0.230   -1.29     -0.391      0.431
5           0.9   -1.07      0.314   -1.68     -0.454      0.344

These results reveal how democracy progressively amplifies the peace-promoting effects of economic development. In highly authoritarian countries (polyarchy = 0.1), economic development has a modest negative effect on conflict risk. As democracy levels increase, this peace-promoting effect becomes much stronger. In highly democratic countries (polyarchy = 0.9), the conflict-reducing effect of economic development is substantially amplified.

The theoretical interpretation is compelling: economic development may reduce conflict risk in all political systems, but democratic institutions help ensure that the benefits of growth are more widely shared and that economic grievances can be addressed through peaceful political processes rather than violence.

Model Comparison and Selection Considerations

When should we include interaction terms in our models? This decision should be guided by both theoretical considerations and empirical evidence. Interactions should be theoretically motivated rather than the result of fishing expeditions through possible variable combinations.

In our conflict example, both interactions have strong theoretical foundations. The democracy-terrain interaction tests whether democratic institutions provide alternative channels for addressing grievances that might otherwise lead to insurgency in remote areas. The democracy-wealth interaction examines whether democratic institutions help translate economic development into political stability more effectively than authoritarian institutions.

Let’s compare our models to see how interactions affect overall model fit:

# Compare model fit using AIC
models_comparison <- data.frame(
  Model = c("Baseline", "Democracy × Terrain", "Democracy × Wealth"),
  AIC = c(AIC(baseline_model), AIC(terrain_interaction_model), AIC(wealth_interaction_model)),
  Deviance = c(deviance(baseline_model), deviance(terrain_interaction_model), deviance(wealth_interaction_model))
)

models_comparison
                Model      AIC Deviance
1            Baseline 1127.636 1113.636
2 Democracy × Terrain 1122.476 1106.476
3  Democracy × Wealth 1123.313 1107.313

Both models reduce the AIC compared to the baseline model. We can confirm this using likelihood ratio tests, which compare nested models to see if the additional parameters significantly improve fit:

# Test significance of interaction terms using likelihood ratio tests
library(lmtest)

# Test terrain interaction
lrtest(baseline_model, terrain_interaction_model)
Likelihood ratio test

Model 1: ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + wbgdppc2011est + 
    ethfrac + relfrac
Model 2: ucdponset ~ democracy * newlmtnest + wbpopest + wbgdppc2011est + 
    ethfrac + relfrac
  #Df  LogLik Df  Chisq Pr(>Chisq)   
1   7 -556.82                        
2   8 -553.24  1 7.1604   0.007453 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# For wealth interaction, need to adjust baseline model for fair comparison
baseline_continuous <- glm(ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + 
                          wbgdppc2011est + ethfrac + relfrac,
                          data = conflict_df, family = "binomial")

lrtest(baseline_continuous, wealth_interaction_model)
Likelihood ratio test

Model 1: ucdponset ~ v2x_polyarchy + newlmtnest + wbpopest + wbgdppc2011est + 
    ethfrac + relfrac
Model 2: ucdponset ~ v2x_polyarchy * wbgdppc2011est + newlmtnest + wbpopest + 
    ethfrac + relfrac
  #Df  LogLik Df  Chisq Pr(>Chisq)  
1   7 -556.82                       
2   8 -553.66  1 6.3229    0.01192 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The way we read this results is that if the p-value is less than 0.05, then we can conclude that the interaction term significantly improves model fit compared to the baseline model. If the p-value is greater than 0.05, then we cannot conclude that the interaction term significantly improves model fit.

Based on these tests, we can conclude that both interaction terms significantly improve model fit. However, remember that model fit alone should not drive interaction decisions. The theoretical motivation for expecting conditional effects should be the primary consideration.

Summary

This module has equipped you with essential skills for handling realistic logistic regression analyses involving multiple predictors and interaction effects. We discovered how moving from bivariate to multiple predictor models changes coefficient interpretation by controlling for confounding variables. We learned to identify when interaction effects are theoretically justified and how they allow the effect of one variable to depend on another. We mastered both manual calculation techniques for simple interactions and automated tools for complex conditional relationships. Most importantly, we developed skills in visualizing and communicating complex interaction patterns to diverse audiences.

The Fearon-Laitin replication demonstrated how democracy can fundamentally alter the relationships between other variables and conflict risk. Mountainous terrain appears to promote conflict only in non-democratic settings, while economic development has stronger peace-promoting effects in democratic contexts. These findings illustrate why interaction effects are crucial for understanding real-world political phenomena.

Our progression from manual calculation with dichotomous variables to automated tools with continuous variables provides a solid foundation for tackling any interaction scenario. The visualization techniques we learned help communicate complex statistical relationships to policy audiences who need to understand not just whether interactions exist, but what they mean in practice.

References

Fearon, James D., and David D. Laitin. 2003. “Ethnicity, Insurgency, and Civil War.” American Political Science Review 97 (1): 75–90. https://doi.org/10.1017/S0003055403000534.