Moderated multiple regression in R

Jose M Sallan 2021-11-18 8 min read

In this post, I will introduce the moderation relationship in linear regression. After defining moderation, I am presenting two examples with a categorical and continuous moderating variables, respectively. I am taking advantage of the possiblities of the tidyverse to plot interactions with ggplot, and to define a function to compute simple slopes using dplyr programmatically. Finally, I brielfy introduce the interactions package to visualize interactions in linear regression.

A moderator $z$ is a variable that affects the direction and/or strength of the relationship between an independent variable $x$ and a dependent variable $y$ . We often express this relationship in terms of interaction between $x$ and $z$ respect to its relationship with $y$ . In van Vegchel et al. (2005) we can find several possible modelisations of variable interaction.

The most common modelisation of moderation is to assume a linear evolution of the influence of the moderating variable. This linear interaction occurs when the regression coefficient of the product of dependent and moderator is significant.

$\begin{aligned} y & = β_{0} + (β_{1} + β_{2} z) + ε \\ y & = β_{0} + β_{1} x + β_{2} x z + ε \end{aligned}$

The most common way of estimating a linear moderation effect is through moderated multiple regression (Aguinis & Gottfredson, 2010):

$\begin{aligned} y & = β_{0} + β_{1} x + β_{2} z + β_{3} x z + ε \end{aligned}$

To confirm the existence of a moderating variable, we need to check if the regression coefficient of the product (interaction) term $β_{3}$ is significant. Note that in this model the distinction between dependent and moderating variable is theoretical, as ariables $x$ and $z$ are treated similarly.

The moderated multiple regression model can be called from R using a formula like y ~ x * z in the lm function call. This syntax generates regression variables x, z and x:z, the later representing the interaction term. If we wanted to enter the interaction term alone, we just specify a formula like y ~ x:z.

The workflow of the moderation analysis is slightly different depending if the moderator is categorical or continuous. Let’s examine an example of each.

Categorical moderator

We will use the mtcars dataset, that includes fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). My hypothesis is that the influence of weight wt on fuel consumption mpg in miles per gallon is moderated by the categorical variable type of transmission am (0 = automatic, 1 = manual).

Let’s build the moderated multiple regression model:

mt_model <- lm(mpg ~ wt * am, mtcars)

Let’s examine the summary of mt_model:

summary(mt_model)

## 
## Call:
## lm(formula = mpg ~ wt * am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6004 -1.5446 -0.5325  0.9012  6.0909 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  31.4161     3.0201  10.402 4.00e-11 ***
## wt           -3.7859     0.7856  -4.819 4.55e-05 ***
## am           14.8784     4.2640   3.489  0.00162 ** 
## wt:am        -5.2984     1.4447  -3.667  0.00102 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.591 on 28 degrees of freedom
## Multiple R-squared:  0.833,  Adjusted R-squared:  0.8151 
## F-statistic: 46.57 on 3 and 28 DF,  p-value: 5.209e-11

The interaction term wt:am is significant, so we can assert that am moderates the relationship between wt and mpg. I have chosen am as moderator instead of wt on theoretical grounds alone, as the moderated multiple regression model treats both variables equally.

To observe how the interaction works, we can examine the effect of wt on mpg for the two values of am using ggplot:

ggplot(mtcars, aes(wt, mpg, color = factor(am))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_manual(name = "transmission", labels = c("automatic", "manual"), values = c("#FF6666", "#6666FF")) +
  theme_classic()

This plot allows us to see how it is the relationship between dependent and independent variables for each value of the categorical moderator. Fuel consumption always increases with weight, but this increase is higher for cars with manual transmission than for cars with automatic transmission. We learn this because the slope for manual transmission is steeper than for automatic transmission.

Continuous moderator

To examine how to deal with a continuous moderator, we will use the depress dataset, obtained from Zhang and Wang (2016-2020):

depress <- read.csv("depress.csv")
head(depress)

##   stress support depress
## 1      7       5      32
## 2      8       7      20
## 3      2       2      30
## 4      7       6      25
## 5      6       9      19
## 6      2       8      25

The theoretical guess made by authors is that the influence of stress on depression depress is moderated by social support. Let’s examine the results of the moderated multiple regression model.

depress_model <- lm(depress ~ support * stress, depress)
summary(depress_model)

## 
## Call:
## lm(formula = depress ~ support * stress, data = depress)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7322 -0.9035 -0.1127  0.8542  3.6089 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     29.2583     0.6909  42.351   <2e-16 ***
## support         -0.2356     0.1109  -2.125   0.0362 *  
## stress           1.9956     0.1161  17.185   <2e-16 ***
## support:stress  -0.3902     0.0188 -20.754   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.39 on 96 degrees of freedom
## Multiple R-squared:  0.9638, Adjusted R-squared:  0.9627 
## F-statistic:   853 on 3 and 96 DF,  p-value: < 2.2e-16

We observe that the interaction term support:stress is significant. To know how is the effect of the moderator on the relationship between the dependent and independent variables, we use simple slopes plots. These plots present the relationship between dependent and independent variable for three subsets of data:

Observations with high values of the moderator $z > \bar{z} + s_{z}$ .
Observations with low values of the moderator $z < \bar{z} - s_{z}$ .
The rest of medium values of the moderator.

I have defined a simple_slopes function, taking as inputs the dataset and character strings with the names of dependent, independent and moderator variables.

simple_slopes <- function(data, dependent, independent, moderator){
  
  dataset <- data %>% select(.data[[dependent]], .data[[independent]], .data[[moderator]])
  names(dataset) <- c("dep", "ind", "mod")
  
  sd_mod <- sd(dataset$mod)
  mean_mod <- mean(dataset$mod)
  
  dataset <- dataset %>%
    mutate(level_mod = case_when(mod < mean_mod - sd_mod ~ "low",
                                mod > mean_mod + sd_mod ~ "high",
                                TRUE ~ "medium")) %>%
    mutate(level_mod = factor(level_mod, levels = c("high", "medium", "low")))
  
  plot <- ggplot(dataset, aes(ind, dep, color = level_mod)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    scale_color_manual(name = moderator, values = c("#336600", "#66CC00", "#B2FF66")) +
    labs(x = independent, y = dependent) + 
    theme_bw()
  
  return(plot)
  
}

The result of applying the function to the dataset is:

simple_slopes(depress, "depress", "stress", "support")

The simple slopes plot tells us a complex moderating relationship. For high values of support, the relationship between stress and depression is negative, while the relationship is positive for low values of social support.

The interactions package

Instead of the function above, we can use the interactions package (Long, 2021).

library(interactions)

The function interact_plot produces simple slopes plots by specifying the model and the names of the dependent and moderating variables:

interact_plot(depress_model, "stress", "support", plot.points = TRUE)

The package also works with categorical moderators:

interact_plot(mt_model, wt, am, plot.points = TRUE)

In Long (2021) can be found other functionalities of this package, designed to visualize interactions between variables in linear regression.

References

Aguinis, H., & Gottfredson, R. K. (2010). Best-practice recommendations for estimating interaction effects using moderated multiple regression. Journal of Organizational Behavior, 31(6), 776–786. https://doi.org/10.1002/job.686
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/3806354
Long, J. (2021). Exploring interactions with continuous predictors in regression models. https://cran.r-project.org/web/packages/interactions/vignettes/interactions.html
Programming with dplyr https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
van Vegchel, N., de Jonge, J., & Landsbergis, P. a. (2005). Occupational stress in (inter)action: the interplay between job demands and job resources. Journal of Organizational Behavior, 26(5), 535–560. https://doi.org/10.1002/job.327
Zhang, Z. & Lijuan Wang, L. (2016-2020). Moderation analysis, in Advanced statistics using R. https://advstats.psychstat.org/book/moderation/index.php

Session info

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 10 (buster)
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
## 
## locale:
##  [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] interactions_1.1.5 forcats_0.5.1      stringr_1.4.0      dplyr_1.0.7       
##  [5] purrr_0.3.4        readr_2.0.2        tidyr_1.1.4        tibble_3.1.5      
##  [9] ggplot2_3.3.5      tidyverse_1.3.1   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.6        lubridate_1.8.0   lattice_0.20-45   assertthat_0.2.1 
##  [5] digest_0.6.27     utf8_1.2.1        R6_2.5.0          cellranger_1.1.0 
##  [9] backports_1.2.1   reprex_2.0.1      evaluate_0.14     highr_0.9        
## [13] httr_1.4.2        blogdown_1.5      pillar_1.6.4      rlang_0.4.12     
## [17] readxl_1.3.1      rstudioapi_0.13   jquerylib_0.1.4   Matrix_1.3-4     
## [21] rmarkdown_2.9     labeling_0.4.2    splines_4.1.2     pander_0.6.4     
## [25] munsell_0.5.0     broom_0.7.10      compiler_4.1.2    modelr_0.1.8     
## [29] xfun_0.23         pkgconfig_2.0.3   mgcv_1.8-38       htmltools_0.5.1.1
## [33] tidyselect_1.1.1  bookdown_0.24     fansi_0.5.0       crayon_1.4.1     
## [37] tzdb_0.1.2        dbplyr_2.1.1      withr_2.4.2       grid_4.1.2       
## [41] nlme_3.1-153      jsonlite_1.7.2    gtable_0.3.0      lifecycle_1.0.0  
## [45] DBI_1.1.1         magrittr_2.0.1    scales_1.1.1      cli_3.0.1        
## [49] stringi_1.7.3     farver_2.1.0      fs_1.5.0          xml2_1.3.2       
## [53] bslib_0.2.5.1     jtools_2.1.4      ellipsis_0.3.2    generics_0.1.0   
## [57] vctrs_0.3.8       tools_4.1.2       glue_1.4.2        hms_1.1.1        
## [61] yaml_2.2.1        colorspace_2.0-1  rvest_1.0.2       knitr_1.33       
## [65] haven_2.4.3       sass_0.4.0

The Jose M Sallan static website