Fuzzy sets for qualitative comparative analysis (QCA) using R

Jose M Sallan 2021-10-05 7 min read

Qualitative comparative analysis (QCA) is a data analysis technique that examines the relationships between an outcome and a set of explanatory variables. Unlike techniques such as linear regression, QCA focuses mainly on examining complex combinations of explanatory variables as antecedents of the outcome. Also unlike linear regression and structural equation modelling, QCA is grounded in Boolean algebra rather than in correlation or covariance structures.

In this post, I will introduce how to define fuzzy memberships using the QCA package for qualitative comparative analysis. This is intended to be the first of a series of posts about how to perform QCA in R.

I will also use the tidyverse and patchwork for data manipulation and plotting.

library(QCA)
library(tidyverse)
library(patchwork)

The meaning of variables in QCA is different from regression or correlation analysis. A variable in QCA establishes whether a case belongs to a set. Depending on the type of data we are analyzing, we have two variants of QCA:

  • Crisp-set QCA (csQCA), in which variables can only take values of zero (false) and one (true). It can be extended to multi-value variables, thus we have multi-value QCA (mvQCA). In csQCA and mvQCA each case can belong or not to the set defined by each variable.
  • Fuzzy-set QCA (fsQCA), where variables have too many values to be treated as multi-value. Fuzzy-set variables represent a degree of membership in a set. They can take any real value from zero (no inclusion) to one (full inclusion).

Obtaining fuzzy memberships with calibration

The calibration of a variable is the process of assigning a fuzzy membership score to each case. Fuzzy membership can take values between zero (no membership) and one (full membership). Intermediate values represent the continuum of grades of a case’s membership in a set.

Let’s examine how calibration works with the LR dataset of Lipset’s (1959) indicators for the survival of democracy during the inter-war period, included in the QCA package.

head(LR)
##     DEV  URB  LIT  IND STB SURV
## AU  720 33.4 98.0 33.4  10   -9
## BE 1098 60.5 94.4 48.9   4   10
## CZ  586 69.0 95.9 37.4   6    7
## EE  468 28.5 95.0 14.0   6   -6
## FI  590 22.0 99.1 22.0   9    4
## FR  983 21.2 96.2 34.8   5   10

Calibrating LR consists of assigning a fuzzy membership to each case for each of the six variables. In this case, variables are monotonic. For instance, high values of DEV represent a high level of development, so cases with high values of DEV will tend to belong to the set of highly-developed countries. For this kind of variable we will define an S-shaped calibration using three parameters:

  • e: the threshold for non-membership or exclusion,
  • c: the crossover point,
  • i: the threshold for full membership or inclusion.

These values are crucial to replicate an fsQCA analysis, so it is important to report the calibration method and the values of the calibration parameters.

The calibrate function of the QCA package allows logistic and linear S-shaped calibration by setting the logistic parameter to TRUE or FALSE, respectively. Let’s calibrate the DEV variable in both ways:

# Calibrate DEV with the same thresholds, using logistic and linear S-shaped functions
dev <- LR %>%
  select(DEV) %>%
  mutate(logistic = calibrate(DEV, type = "fuzzy", thresholds = "e=500, c=700, i=900", logistic = TRUE),
         linear = calibrate(DEV, type = "fuzzy", thresholds = "e=500, c=700, i=900", logistic = FALSE))
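To make the role of e, c and i more concrete, here is a minimal base R sketch of what an S-shaped calibration does. The helpers calib_linear and calib_logistic are hypothetical names written for this illustration, not the code of the QCA package. The logistic version assumes that the thresholds e, c and i map to memberships of 0.05, 0.5 and 0.95; the value returned by calibrate may differ in detail.

```r
# Hypothetical helpers for illustration only, not the QCA package's own code.

# Piecewise-linear S shape: 0 up to e, 0.5 at c, 1 from i onwards
calib_linear <- function(x, e, c, i) {
  ifelse(x <= e, 0,
  ifelse(x <= c, 0.5 * (x - e) / (c - e),
  ifelse(x <= i, 1 - 0.5 * (i - x) / (i - c), 1)))
}

# Logistic S shape. Assumption: membership is idm at i and 1 - idm at e
calib_logistic <- function(x, e, c, i, idm = 0.95) {
  log_odds <- log(idm / (1 - idm))
  # Distance from the crossover to the threshold on the same side as x
  width <- ifelse(x < c, c - e, i - c)
  1 / (1 + exp(-(x - c) * log_odds / width))
}

x <- c(400, 500, 600, 700, 800, 900, 1000)
linear_scores <- calib_linear(x, e = 500, c = 700, i = 900)
logistic_scores <- calib_logistic(x, e = 500, c = 700, i = 900)
round(data.frame(x, linear_scores, logistic_scores), 3)
```

In both sketches a value at the crossover c gets a membership of 0.5, values below e are out of the set (or almost out, in the logistic case) and values above i are in (or almost in).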

Let’s plot the calibration results to see the difference between the two methods, and the role of e, c and i parameters:

On some occasions, full membership in a set is associated with intermediate values of the variable. This would be the case if we wanted to define a set of mid-developed countries from DEV. To do this we need to calibrate variables using a bell-shaped curve. For these functions, we need to define in calibrate the parameters e1, c1, i1 for the left-hand side of the bell, and e2, c2, i2 for the right-hand side. The below and above parameters allow a non-linear fit when assigning membership scores. Let’s see some examples with artificial data going from zero to 1000. I am defining a triangular calibration by making i1 equal to i2, and a trapezoidal calibration by making i1 and i2 different.

bell_test <- data.frame(X = seq(0, 1000, 50)) %>%
  mutate(tr_linear = calibrate(X, thresholds = "e1=100, c1=250, i1=500, i2=500, c2=750, e2=900"),
         tr_curve = calibrate(X, thresholds = "e1=100, c1=250, i1=500, i2=500, c2=750, e2=900", below = 3, above = 3),
         tp_linear = calibrate(X, thresholds = "e1=100, c1=250, i1=400, i2=600, c2=750, e2=900"),
         tp_curve = calibrate(X, thresholds = "e1=100, c1=250, i1=400, i2=600, c2=750, e2=900", below = 3, above = 3))
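As a rough illustration of the logic behind these curves, the following base R sketch builds a bell-shaped membership as the minimum of a rising and a falling S-shaped function. The helper calib_bell is a hypothetical name, and treating below and above as exponents applied under and over the crossover is my reading of these parameters, so take it as an approximation rather than the package's implementation.

```r
# Hypothetical helper for illustration only, not the QCA package's own code.
# below and above are assumed to be exponents bending the segments under and
# over the crossover; a value of 1 keeps those segments linear.
calib_bell <- function(x, e1, c1, i1, i2, c2, e2, below = 1, above = 1) {
  # Left-hand side: membership rises from e1 through c1 up to i1
  rising <- ifelse(x <= e1, 0,
            ifelse(x <= c1, 0.5 * ((x - e1) / (c1 - e1))^below,
            ifelse(x <= i1, 1 - 0.5 * ((i1 - x) / (i1 - c1))^above, 1)))
  # Right-hand side: membership falls from i2 through c2 down to e2
  falling <- ifelse(x >= e2, 0,
             ifelse(x >= c2, 0.5 * ((e2 - x) / (e2 - c2))^below,
             ifelse(x >= i2, 1 - 0.5 * ((x - i2) / (c2 - i2))^above, 1)))
  pmin(rising, falling)
}

x <- seq(0, 1000, 50)
trapezoid <- calib_bell(x, e1 = 100, c1 = 250, i1 = 400, i2 = 600, c2 = 750, e2 = 900)
triangle_curved <- calib_bell(x, e1 = 100, c1 = 250, i1 = 500, i2 = 500, c2 = 750, e2 = 900,
                              below = 3, above = 3)
round(data.frame(x, trapezoid, triangle_curved), 3)
```

With i1 different from i2 the sketch gives full membership across the whole interval between them, while with i1 equal to i2 only the single value 500 reaches a membership of one.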

The following plot shows how each calibration works:

The QCA package provides LF, a version of the Lipset dataset calibrated to fuzzy sets:

head(LF)
##     DEV  URB  LIT  IND  STB SURV
## AU 0.81 0.12 0.99 0.73 0.43 0.05
## BE 0.99 0.89 0.98 1.00 0.98 0.95
## CZ 0.58 0.98 0.98 0.90 0.91 0.89
## EE 0.16 0.07 0.98 0.01 0.91 0.12
## FI 0.58 0.03 0.99 0.08 0.58 0.77
## FR 0.98 0.03 0.99 0.81 0.95 0.95

An important thing to consider is that you should avoid calibration values of exactly 0.5. A case with a score of 0.5 is neither more in nor more out of the set, and this will be problematic in further steps of the analysis.
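A quick way to spot such scores is to look for them right after calibrating. The sketch below uses a small made-up data frame (the values and case names are hypothetical, with a 0.5 placed on purpose) and base R only. The same check could be applied to any data frame of calibrated scores.

```r
# Hypothetical membership scores; 0.5 is placed on purpose in column B
scores <- data.frame(
  A = c(0.81, 0.99, 0.58),
  B = c(0.12, 0.50, 0.98),
  row.names = c("case1", "case2", "case3")
)

# Row and column positions of every score sitting exactly at the crossover
ambiguous <- which(scores == 0.5, arr.ind = TRUE)
ambiguous
```

If the result has any rows, one option is to revisit the thresholds so that no case falls exactly on the crossover point.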

Boolean algebra with fuzzy memberships

In QCA we can examine relationships between an outcome and complex combinations of antecedents. In the Lipset dataset, we can explore the relationship between the outcome survival of democracy SURV and level of development DEV, but we can also explore the relationship between the outcome and the combination of high development and high literacy DEV*LIT. We can do that by defining a Boolean algebra for fuzzy sets:

  • Set intersection or AND operator: \(A*B = \min\left(A,B\right)\).
  • Set union or OR operator: \(A+B = \max\left(A,B\right)\).
  • Set negation or NOT operator: \(\sim A = 1-A\).

A complex logical expression like \(DEV*URB + LIT*\sim IND\) returns a fuzzy membership score, similar to the ones obtained from calibration of the original explanatory variables. We can use the compute function from the admisc package (loaded when calling QCA) to obtain fuzzy memberships for expressions written as a sum of products, where the product stands for logical AND / set intersection, and the sum for logical OR / set union. We need to write admisc::compute to differentiate the function from the compute function of dplyr.

admisc::compute("DEV*URB + LIT*~IND", data = LF)
##  [1] 0.27 0.89 0.58 0.98 0.92 0.19 0.79 0.13 0.88 0.98 0.41 0.98 0.59 0.01 0.17
## [16] 0.09 0.33 0.98
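To see how the three operators combine, we can reproduce the first six scores by hand. The sketch below copies the DEV, URB, LIT and IND values of the six cases printed by head(LF) into a small data frame (lf_head is a name made up for this example) and applies min, max and negation with base R, without calling compute.

```r
# First six cases of LF, copied from the head(LF) output
lf_head <- data.frame(
  DEV = c(0.81, 0.99, 0.58, 0.16, 0.58, 0.98),
  URB = c(0.12, 0.89, 0.98, 0.07, 0.03, 0.03),
  LIT = c(0.99, 0.98, 0.98, 0.98, 0.99, 0.99),
  IND = c(0.73, 1.00, 0.90, 0.01, 0.08, 0.81),
  row.names = c("AU", "BE", "CZ", "EE", "FI", "FR")
)

# DEV*URB + LIT*~IND: AND is pmin, OR is pmax, NOT is 1 minus the score
by_hand <- with(lf_head, pmax(pmin(DEV, URB), pmin(LIT, 1 - IND)))
round(by_hand, 2)
```

The rounded values agree with the first six elements returned by admisc::compute. For Austria (AU), for instance, DEV*URB is min(0.81, 0.12) = 0.12, LIT*~IND is min(0.99, 0.27) = 0.27, and the union of both is 0.27.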

Fuzzy sets and qualitative comparative analysis

Zadeh (1965) defined fuzzy sets as classes of objects with a continuum of grades of membership. Rather than belonging or not to a set, the membership of an object is defined by a continuous function between zero and one. We can define a Boolean algebra for fuzzy sets with AND, OR and NOT operations, based on intersection, union and negation of sets. Among many other applications, fuzzy sets and fuzzy logic allow applying qualitative comparative analysis to continuous variables.

Bibliography and resources

  • Dușa, Adrian (2021). QCA with R: A Comprehensive Resource. https://bookdown.org/dusadrian/QCAbook/
  • Lipset, S. M. (1959). Some Social Requisites of Democracy: Economic Development and Political Legitimacy. American Political Science Review, 53:69–105.
  • Zadeh, L. A. (1965). Fuzzy Sets. Information and Control, 8:338–353.

Built with R 4.1.1, patchwork 1.1.1, QCA 3.12 and tidyverse 1.3.1