A Dummbell Plot of the Evolution of EU Gini Index

Jose M Sallan 2023-05-26 6 min read

In this post I present how to visualize the evolution of two sets of variables from the same individuals. I have used data of the Gini index obtained from the World Bank with the wbstats package to show the evolution of inequality of the current members of the European Union between 2010 and 2018. I rely on the tidyverse for data handling and visualization, with additional elements from ggimage.

library(tidyverse)
library(wbstats)
library(ggimage)

I start retrieving the Gini index from the World Bank, which is presented in the series SI.POV.GINI, with wbstats::wb_data. The vector eu_iso2c contains the Alpha 2 codes (ISO 3166) of the current members of the European Union.

gini <- wb_data("SI.POV.GINI", start_date = 2000, end_date = 2020)

eu_iso2c <- c("AT", "BE", "BG", "HR", "CY", "CZ",
              "DK", "EE", "FI", "FR", "DE", "GR",
              "HU", "IE", "IT", "LV", "LU", "MT",
              "NL", "PL", "PT", "RO", "SK", "SI",
              "ES", "SE")

gini_eu <- gini |>
  filter(iso2c %in% eu_iso2c, date %in% c(2010, 2018)) |>
  select(iso2c, date, SI.POV.GINI)

The gini_eu table contains the Gini indices of EU countries of 2010 and 2018:

gini_eu
## # A tibble: 52 × 3
##    iso2c  date SI.POV.GINI
##    <chr> <dbl>       <dbl>
##  1 AT     2018        30.8
##  2 AT     2010        30.3
##  3 BE     2018        27.2
##  4 BE     2010        28.4
##  5 BG     2018        41.3
##  6 BG     2010        35.7
##  7 HR     2018        29.7
##  8 HR     2010        32.4
##  9 CY     2018        32.7
## 10 CY     2010        31.5
## # ℹ 42 more rows

With this table, I have created an horizontal bar chart presenting the Gini indices for each country in 2010 and 2018. The character variable iso2c is mutated into a factor, with levels reordered with forcats::fct_reorder2 by the value of two variables: year and Gini index. Then, countries are ordered by value of Gini index in 2018.

gini_eu |>
  mutate(iso2c = fct_reorder2(iso2c, date, -SI.POV.GINI)) |>
  ggplot(aes(SI.POV.GINI, iso2c, fill = factor(date))) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  scale_fill_manual(name = "year", values = c("#66B2FF", "#004C99")) +
  theme(legend.position = "top") +
  labs(title = "Evolution of inequality in EU", x = "Gini index", y = NULL)

This bar chart is useful to present absolute values of Gini index, but can become somewhat cluttered to observe variation. Let’s present the same information with a dumbbell plot.

To create the dumbbell plot, we need to plot a segment for each of the countries with geom_segment(). As I need the values of Gini index for 2010 and 2018 for each country in the same row, I have created a gini_eu_wide with tidyr::pivot_wider(). I have also ordered countries by decreasing value of Gini index in 2018 using forcats::fct_reorder().

As I am focusing in visualizing variations, it can be of interest to show if Gini index has increased or decreased. That’s why I have created an ev variable, equal to the year when Gini index is higher.

gini_eu_wide <- gini_eu |>
  select(iso2c, date, SI.POV.GINI) |>
  pivot_wider(names_from = "date", values_from = "SI.POV.GINI") |>
  mutate(iso2c = as.factor(iso2c)) |>
  mutate(iso2c = fct_reorder(iso2c, `2018`)) |>
  mutate(ev = ifelse(`2018` > `2010`, "2018", "2010"))

That’s how gini_eu_wide looks like:

gini_eu_wide
## # A tibble: 26 × 4
##    iso2c `2018` `2010` ev   
##    <fct>  <dbl>  <dbl> <chr>
##  1 AT      30.8   30.3 2018 
##  2 BE      27.2   28.4 2010 
##  3 BG      41.3   35.7 2018 
##  4 HR      29.7   32.4 2010 
##  5 CY      32.7   31.5 2018 
##  6 CZ      25     26.6 2010 
##  7 DK      28.2   27.2 2018 
##  8 EE      30.3   32   2010 
##  9 FI      27.3   27.7 2010 
## 10 FR      32.4   33.7 2010 
## # ℹ 16 more rows

Let’s do a first draft of the dumbbell plot:

  • I have ordered the levels of iso2c in the same way as in the previous plot, which is also the ordering of gini_eu_wide.
  • In the x axis is presented the Gini index value SI.POV.GINI, and in the y axis the countries.
  • The plates of the dumbbell are created with geom_point(). I have used different colors for values of 2010 and 2018 so we can see if the index increases or decreases.
  • The bars of the dumbbell are created with geom_segment(), and with the gini_eu_wide table. The color of the segments is assigned with the ev variable.
  • To give an additional cue for Gini index increasing or decreasing, I have added an arrow to the segment, pointing at the value of 2018.
gini_eu |>
  mutate(iso2c = fct_reorder2(iso2c, date, -SI.POV.GINI)) |>
  ggplot(aes(SI.POV.GINI, iso2c)) +
  geom_point(aes(color = as.factor(date))) +
  geom_segment(data = gini_eu_wide, 
               mapping = aes(y = as.numeric(iso2c), 
                             yend = as.numeric(iso2c), 
                             x = `2010`, 
                             xend = `2018`,
                             color = ev), arrow = arrow(length = unit(0.02, "npc"))) +
  theme_minimal()

The resulting plot presents the required information: the red points are values of Gini index in 2010 and the blue ones of 2018. Blue segments present an increase of Gini index and red segments a decrease. Let’s do some additional aesthetic improvements:

  • Title and caption are added with labs(). I have also removed the label of the x axis.
  • Countries axis: I have replaced country names with flags. I have removed default labels and axis name with scale_y_discrete()and added the flags with geom_flag() from ggimage.
  • Colors: I have changed the default colors with the same blues of the barplot with scale_color_manual().
  • Background color and legend position: in the theme() I have changed the position of the legend, and changed the color background default by a light yellow, so that the flags can be perceived better.
gini_eu |>
  mutate(iso2c = fct_reorder2(iso2c, date, -SI.POV.GINI)) |>
  ggplot(aes(SI.POV.GINI, iso2c)) +
  geom_point(aes(color = as.factor(date))) +
  geom_segment(data = gini_eu_wide, 
               mapping = aes(y = as.numeric(iso2c), 
                             yend = as.numeric(iso2c), 
                             x = `2010`, 
                             xend = `2018`,
                             color = ev), arrow = arrow(length = unit(0.02, "npc"))) +
  labs(title = "Evolution of inequality in EU",
       caption = "Source: World Bank",
       x = NULL) +
  geom_flag(mapping = aes(x = 24, y = as.numeric(iso2c), image = iso2c),
            data = gini_eu_wide, size = 0.03) +
  scale_color_manual(name = "year", values = c("#66B2FF", "#004C99")) +
  scale_y_discrete(name = element_blank(), labels = element_blank()) +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#FFFFDD"), legend.position = "top")

An horizontal bar chart is useful to compare absolute values of a set of observations, although it can become too cluttered to present the evolution of two sets of observations. The dumbbell plot is a good alternative to visualize this evolution. It is not included by default in ggplot, but here I have presented how to draw it using plots and segments.

References

Session Info

## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Linux Mint 21.1
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Madrid
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggimage_0.3.2   wbstats_1.0.4   lubridate_1.9.2 forcats_1.0.0  
##  [5] stringr_1.5.0   dplyr_1.1.2     purrr_1.0.1     readr_2.1.4    
##  [9] tidyr_1.3.0     tibble_3.2.1    ggplot2_3.4.2   tidyverse_2.0.0
## 
## loaded via a namespace (and not attached):
##  [1] yulab.utils_0.0.6  sass_0.4.5         utf8_1.2.3         generics_0.1.3    
##  [5] ggplotify_0.1.0    blogdown_1.16      stringi_1.7.12     hms_1.1.3         
##  [9] digest_0.6.31      magrittr_2.0.3     evaluate_0.20      grid_4.3.0        
## [13] timechange_0.2.0   bookdown_0.33      fastmap_1.1.1      jsonlite_1.8.4    
## [17] httr_1.4.5         fansi_1.0.4        scales_1.2.1       jquerylib_0.1.4   
## [21] cli_3.6.1          rlang_1.1.0        munsell_0.5.0      withr_2.5.0       
## [25] cachem_1.0.7       yaml_2.3.7         tools_4.3.0        tzdb_0.3.0        
## [29] colorspace_2.1-0   curl_5.0.0         gridGraphics_0.5-1 vctrs_0.6.2       
## [33] R6_2.5.1           magick_2.7.4       lifecycle_1.0.3    ggfun_0.0.9       
## [37] pkgconfig_2.0.3    pillar_1.9.0       bslib_0.4.2        gtable_0.3.3      
## [41] Rcpp_1.0.10        glue_1.6.2         highr_0.10         xfun_0.39         
## [45] tidyselect_1.2.0   rstudioapi_0.14    knitr_1.42         farver_2.1.1      
## [49] htmltools_0.5.5    labeling_0.4.2     rmarkdown_2.21     compiler_4.3.0