In this post, I introduce moderation in linear regression. After defining moderation, I present two examples, one with a categorical and one with a continuous moderating variable. I take advantage of the tidyverse to plot interactions with ggplot2 and to define, programmatically with dplyr, a function that plots simple slopes. Finally, I briefly introduce the interactions package for visualizing interactions in linear regression.
A moderator \(z\) is a variable that affects the direction and/or strength of the relationship between an independent variable \(x\) and a dependent variable \(y\). We often express this relationship as an interaction between \(x\) and \(z\) with respect to their relationship with \(y\). van Vegchel et al. (2005) discuss several possible ways of modeling the interaction between variables.
The most common way of modeling moderation is to assume that the effect of \(x\) on \(y\) changes linearly with the moderating variable. This linear interaction is present when the regression coefficient of the product of the independent variable and the moderator is significant.
\[\begin{align} y &= \beta_0 + \left( \beta_1 + \beta_2z \right)x + \varepsilon \\ y &= \beta_0 + \beta_1x + \beta_2xz + \varepsilon \end{align}\]
The most common way of estimating a linear moderation effect is through moderated multiple regression (Aguinis & Gottfredson, 2010):
\[\begin{align} y &= \beta_0 + \beta_1x + \beta_2z + \beta_3xz + \varepsilon \end{align}\]
To confirm the existence of a moderating variable, we need to check whether the regression coefficient \(\beta_3\) of the product (interaction) term is significant. Note that in this model the distinction between independent and moderating variable is purely theoretical, as variables \(x\) and \(z\) are treated symmetrically.
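A short worked step, derived directly from the moderated multiple regression equation above, makes the role of \(\beta_3\) explicit. Grouping the terms that contain \(x\) and differentiating with respect to \(x\) gives:

\[\begin{align} y &= \beta_0 + \beta_2z + \left( \beta_1 + \beta_3z \right)x + \varepsilon \\ \frac{\partial y}{\partial x} &= \beta_1 + \beta_3z \end{align}\]

The slope of \(x\) is therefore \(\beta_1\) when \(z = 0\) and changes by \(\beta_3\) for each unit increase in \(z\). If \(\beta_3 = 0\), the slope of \(x\) does not depend on \(z\) and there is no linear moderation.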
The moderated multiple regression model can be fitted in R using a formula like y ~ x * z in the lm function call. This syntax generates the regression variables x, z and x:z, the latter representing the interaction term. If we wanted to enter the interaction term alone, we would specify a formula like y ~ x:z.
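The formula syntax can be checked with a small sketch. The data below are simulated purely for illustration (the variable names x, z, y, the sample size and the coefficients are my own assumptions, not part of the examples in this post):

```r
# Simulated data, for illustration only
set.seed(1)
n <- 50
x <- rnorm(n)
z <- rnorm(n)
y <- 1 + 2 * x - z + 0.5 * x * z + rnorm(n)
sim <- data.frame(x, z, y)

# y ~ x * z is shorthand for y ~ x + z + x:z
full <- lm(y ~ x * z, data = sim)
expanded <- lm(y ~ x + z + x:z, data = sim)
print(names(coef(full)))

# Both formulas yield the same coefficients
same_fit <- all.equal(coef(full), coef(expanded))
print(same_fit)

# y ~ x:z enters only the product term (plus the intercept)
only_product <- lm(y ~ x:z, data = sim)
print(names(coef(only_product)))
```

Note that the last model omits the main effects of x and z, so it is rarely what we want when testing for moderation.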
The workflow of the moderation analysis is slightly different depending on whether the moderator is categorical or continuous. Let’s examine an example of each.
Categorical moderator
We will use the mtcars dataset, which includes fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). My hypothesis is that the influence of weight (wt) on fuel consumption (mpg, in miles per gallon) is moderated by the type of transmission (am: 0 = automatic, 1 = manual), a categorical variable.
Let’s build the moderated multiple regression model:
mt_model <- lm(mpg ~ wt * am, mtcars)
Let’s examine the summary of mt_model:
summary(mt_model)
##
## Call:
## lm(formula = mpg ~ wt * am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6004 -1.5446 -0.5325 0.9012 6.0909
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.4161 3.0201 10.402 4.00e-11 ***
## wt -3.7859 0.7856 -4.819 4.55e-05 ***
## am 14.8784 4.2640 3.489 0.00162 **
## wt:am -5.2984 1.4447 -3.667 0.00102 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.591 on 28 degrees of freedom
## Multiple R-squared: 0.833, Adjusted R-squared: 0.8151
## F-statistic: 46.57 on 3 and 28 DF, p-value: 5.209e-11
The interaction term wt:am is significant, so we can assert that am moderates the relationship between wt and mpg. I have chosen am as moderator instead of wt on theoretical grounds alone, as the moderated multiple regression model treats both variables equally.
To observe how the interaction works, we can examine the effect of wt on mpg for the two values of am using ggplot:
library(ggplot2)

# One regression line of mpg on wt per transmission type
ggplot(mtcars, aes(wt, mpg, color = factor(am))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_manual(name = "transmission",
                     labels = c("automatic", "manual"),
                     values = c("#FF6666", "#6666FF")) +
  theme_classic()
This plot shows the relationship between the independent and dependent variables for each value of the categorical moderator. Miles per gallon always decrease with weight (that is, fuel consumption increases), but the change is larger for cars with manual transmission than for cars with automatic transmission: the regression line for manual transmission is steeper.
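The two slopes drawn in the plot can also be recovered from the coefficients of the model. The following is a minimal sketch using only base R and the mtcars data; the object names slope_automatic, slope_manual and the two cross-check variables are my own:

```r
# mtcars ships with base R, so no extra packages are needed
mt_model <- lm(mpg ~ wt * am, data = mtcars)
b <- coef(mt_model)

# Slope of wt when am = 0 (automatic): the wt coefficient alone
slope_automatic <- unname(b["wt"])
# Slope of wt when am = 1 (manual): wt coefficient plus the interaction
slope_manual <- unname(b["wt"] + b["wt:am"])

print(round(c(automatic = slope_automatic, manual = slope_manual), 2))

# Cross-check: separate regressions per transmission type
auto_fit <- lm(mpg ~ wt, data = subset(mtcars, am == 0))
manual_fit <- lm(mpg ~ wt, data = subset(mtcars, am == 1))
slope_auto_check <- unname(coef(auto_fit)["wt"])
slope_manual_check <- unname(coef(manual_fit)["wt"])
```

With the estimates reported in the summary above, the slope is about -3.79 for automatic transmission and about -3.79 - 5.30 = -9.08 for manual transmission. Because the model includes both am and wt:am, these slopes coincide with those of the two separate regressions.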
Continuous moderator
To examine how to deal with a continuous moderator, we will use the depress dataset, obtained from Zhang and Wang (2016-2020):
depress <- read.csv("depress.csv")
head(depress)
## stress support depress
## 1 7 5 32
## 2 8 7 20
## 3 2 2 30
## 4 7 6 25
## 5 6 9 19
## 6 2 8 25
The authors’ hypothesis is that the influence of stress (stress) on depression (depress) is moderated by social support (support). Let’s examine the results of the moderated multiple regression model.
depress_model <- lm(depress ~ support * stress, depress)
summary(depress_model)
##
## Call:
## lm(formula = depress ~ support * stress, data = depress)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7322 -0.9035 -0.1127 0.8542 3.6089
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.2583 0.6909 42.351 <2e-16 ***
## support -0.2356 0.1109 -2.125 0.0362 *
## stress 1.9956 0.1161 17.185 <2e-16 ***
## support:stress -0.3902 0.0188 -20.754 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.39 on 96 degrees of freedom
## Multiple R-squared: 0.9638, Adjusted R-squared: 0.9627
## F-statistic: 853 on 3 and 96 DF, p-value: < 2.2e-16
We observe that the interaction term support:stress is significant. To see how the moderator affects the relationship between the independent and dependent variables, we use simple slopes plots. These plots present that relationship for three subsets of the data:
- Observations with high values of the moderator \(z > \bar{z} + s_z\).
- Observations with low values of the moderator \(z < \bar{z} - s_z\).
- The remaining observations, with medium values of the moderator \(\bar{z} - s_z \leq z \leq \bar{z} + s_z\).
I have defined a simple_slopes function that takes as inputs the dataset and character strings with the names of the dependent, independent and moderator variables.
library(dplyr)
library(ggplot2)

simple_slopes <- function(data, dependent, independent, moderator) {
  # Keep the three variables of interest, selected by name
  dataset <- data %>%
    select(all_of(c(dependent, independent, moderator)))
  names(dataset) <- c("dep", "ind", "mod")

  sd_mod <- sd(dataset$mod)
  mean_mod <- mean(dataset$mod)

  # Classify each observation by the level of the moderator
  dataset <- dataset %>%
    mutate(level_mod = case_when(mod < mean_mod - sd_mod ~ "low",
                                 mod > mean_mod + sd_mod ~ "high",
                                 TRUE ~ "medium")) %>%
    mutate(level_mod = factor(level_mod, levels = c("high", "medium", "low")))

  # One regression line per level of the moderator
  p <- ggplot(dataset, aes(ind, dep, color = level_mod)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    scale_color_manual(name = moderator, values = c("#336600", "#66CC00", "#B2FF66")) +
    labs(x = independent, y = dependent) +
    theme_bw()

  return(p)
}
The result of applying the function to the dataset is:
simple_slopes(depress, "depress", "stress", "support")
The simple slopes plot reveals a more complex moderating relationship: for high values of social support the relationship between stress and depression is negative, whereas for low values of social support it is positive.
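The change of sign can also be read off the fitted coefficients. The sketch below does not need depress.csv: it copies the estimates for stress and support:stress from the summary output above, and evaluates the slope of stress at three support levels (2, 5 and 8) that I picked for illustration within the range shown by head(depress). The names b_stress, b_interaction and stress_slope are my own:

```r
# Estimates copied from the summary(depress_model) output above
b_stress <- 1.9956
b_interaction <- -0.3902

# Slope of stress on depress for a given value of support
stress_slope <- function(support) b_stress + b_interaction * support

# Illustrative support levels (chosen by hand, not estimated from the data)
support_levels <- c(low = 2, medium = 5, high = 8)
slopes <- stress_slope(support_levels)
print(round(slopes, 3))

# Value of support at which the slope of stress changes sign
crossover <- -b_stress / b_interaction
print(round(crossover, 2))
```

Under these estimates the slope of stress is positive for support values below roughly 5.1 and negative above that value, which agrees with the pattern in the simple slopes plot.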
The interactions package
Instead of the function above, we can use the interactions package (Long, 2021).
library(interactions)
The function interact_plot produces simple slopes plots from the model and the names of the independent (predictor) and moderating variables:
interact_plot(depress_model, "stress", "support", plot.points = TRUE)
The package also works with categorical moderators:
interact_plot(mt_model, wt, am, plot.points = TRUE)
Long (2021) describes other functionalities of this package, which is designed to visualize interactions between variables in linear regression.
References
- Aguinis, H., & Gottfredson, R. K. (2010). Best-practice recommendations for estimating interaction effects using moderated multiple regression. Journal of Organizational Behavior, 31(6), 776–786. https://doi.org/10.1002/job.686
- Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/3806354
- Long, J. (2021). Exploring interactions with continuous predictors in regression models. https://cran.r-project.org/web/packages/interactions/vignettes/interactions.html
- Programming with dplyr https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
- van Vegchel, N., de Jonge, J., & Landsbergis, P. A. (2005). Occupational stress in (inter)action: the interplay between job demands and job resources. Journal of Organizational Behavior, 26(5), 535–560. https://doi.org/10.1002/job.327
- Zhang, Z., & Wang, L. (2016-2020). Moderation analysis, in Advanced statistics using R. https://advstats.psychstat.org/book/moderation/index.php
Session info
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 10 (buster)
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
##
## locale:
## [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=es_ES.UTF-8
## [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8
## [7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] interactions_1.1.5 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
## [5] purrr_0.3.4 readr_2.0.2 tidyr_1.1.4 tibble_3.1.5
## [9] ggplot2_3.3.5 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 lubridate_1.8.0 lattice_0.20-45 assertthat_0.2.1
## [5] digest_0.6.27 utf8_1.2.1 R6_2.5.0 cellranger_1.1.0
## [9] backports_1.2.1 reprex_2.0.1 evaluate_0.14 highr_0.9
## [13] httr_1.4.2 blogdown_1.5 pillar_1.6.4 rlang_0.4.12
## [17] readxl_1.3.1 rstudioapi_0.13 jquerylib_0.1.4 Matrix_1.3-4
## [21] rmarkdown_2.9 labeling_0.4.2 splines_4.1.2 pander_0.6.4
## [25] munsell_0.5.0 broom_0.7.10 compiler_4.1.2 modelr_0.1.8
## [29] xfun_0.23 pkgconfig_2.0.3 mgcv_1.8-38 htmltools_0.5.1.1
## [33] tidyselect_1.1.1 bookdown_0.24 fansi_0.5.0 crayon_1.4.1
## [37] tzdb_0.1.2 dbplyr_2.1.1 withr_2.4.2 grid_4.1.2
## [41] nlme_3.1-153 jsonlite_1.7.2 gtable_0.3.0 lifecycle_1.0.0
## [45] DBI_1.1.1 magrittr_2.0.1 scales_1.1.1 cli_3.0.1
## [49] stringi_1.7.3 farver_2.1.0 fs_1.5.0 xml2_1.3.2
## [53] bslib_0.2.5.1 jtools_2.1.4 ellipsis_0.3.2 generics_0.1.0
## [57] vctrs_0.3.8 tools_4.1.2 glue_1.4.2 hms_1.1.1
## [61] yaml_2.2.1 colorspace_2.0-1 rvest_1.0.2 knitr_1.33
## [65] haven_2.4.3 sass_0.4.0