Qualitative comparative analysis (QCA) is a data analysis technique that examines the relationships between an outcome and a set of explanatory variables using Boolean algebra, rather than analysis of correlation or covariance structures. Unlike other techniques like linear regression, QCA is focused mainly on examining complex combinations of explanatory variables as antecedents of the outcome.
The minimization of the truth table rows associated with the outcome yields a solution covering all observed sufficient conditions for the outcome. We can usually obtain a similar solution for the negated outcome. As QCA is asymmetric, the latter is not the negation of the former. As we will see in this post, both solutions must be considered simultaneously.
Social science phenomena have limited diversity: many of the rows of the truth table have no observations, or not enough of them, even for large datasets. These rows without enough observations are called remainders. The solutions considering only rows related to the outcome are complex or conservative solutions, as they assume that none of the remainders is associated with the outcome. Through the Standard Analysis (SA) (Ragin & Sonnett, 2005), the Enhanced Standard Analysis (ESA) and the Theory-Guided Standard Analysis (TESA) (Schneider & Wagemann, 2013) we can obtain more parsimonious (simpler) sufficient solutions that are easier to interpret theoretically. To do so, we use counterfactual analysis, which allows us to add some of the remainders to the solution.
In this post, I will discuss the details of the implementation of SA, ESA and TESA using the R QCA package for fuzzy-set QCA (fsQCA) analysis.
library(QCA)
Throughout this post, I will be using the fuzzy-set version of Lipset’s (1959) indicators for the survival of democracy during the inter-war period, included in the QCA package.
head(LF)
## DEV URB LIT IND STB SURV
## AU 0.81 0.12 0.99 0.73 0.43 0.05
## BE 0.99 0.89 0.98 1.00 0.98 0.95
## CZ 0.58 0.98 0.98 0.90 0.91 0.89
## EE 0.16 0.07 0.98 0.01 0.91 0.12
## FI 0.58 0.03 0.99 0.08 0.58 0.77
## FR 0.98 0.03 0.99 0.81 0.95 0.95
The first step to obtain a sufficient solution in fsQCA is to build the truth table. We assign each case to the truth table row closest to its membership scores, and we assign to the outcome the rows with a consistency score above the minimum consistency threshold incl.cut. We can also set a minimum number of cases n.cut for a row to be considered. Here I obtain the truth table of LF for incl.cut = 0.8 and the default n.cut = 1:
tt_surv <- truthTable(LF, outcome = "SURV", conditions = "DEV, URB, LIT, IND, STB", complete = TRUE, show.cases = TRUE, incl.cut = 0.8)
tt_surv
##
## OUT: output value
## n: number of cases in configuration
## incl: sufficiency inclusion score
## PRI: proportional reduction in inconsistency
##
## DEV URB LIT IND STB OUT n incl PRI cases
## 1 0 0 0 0 0 0 3 0.216 0.000 GR,PT,ES
## 2 0 0 0 0 1 0 2 0.278 0.000 IT,RO
## 3 0 0 0 1 0 ? 0 - -
## 4 0 0 0 1 1 ? 0 - -
## 5 0 0 1 0 0 0 2 0.521 0.113 HU,PL
## 6 0 0 1 0 1 0 1 0.529 0.228 EE
## 7 0 0 1 1 0 ? 0 - -
## 8 0 0 1 1 1 ? 0 - -
## 9 0 1 0 0 0 ? 0 - -
## 10 0 1 0 0 1 ? 0 - -
## 11 0 1 0 1 0 ? 0 - -
## 12 0 1 0 1 1 ? 0 - -
## 13 0 1 1 0 0 ? 0 - -
## 14 0 1 1 0 1 ? 0 - -
## 15 0 1 1 1 0 ? 0 - -
## 16 0 1 1 1 1 ? 0 - -
## 17 1 0 0 0 0 ? 0 - -
## 18 1 0 0 0 1 ? 0 - -
## 19 1 0 0 1 0 ? 0 - -
## 20 1 0 0 1 1 ? 0 - -
## 21 1 0 1 0 0 ? 0 - -
## 22 1 0 1 0 1 1 2 0.804 0.719 FI,IE
## 23 1 0 1 1 0 0 1 0.378 0.040 AU
## 24 1 0 1 1 1 0 2 0.709 0.634 FR,SE
## 25 1 1 0 0 0 ? 0 - -
## 26 1 1 0 0 1 ? 0 - -
## 27 1 1 0 1 0 ? 0 - -
## 28 1 1 0 1 1 ? 0 - -
## 29 1 1 1 0 0 ? 0 - -
## 30 1 1 1 0 1 ? 0 - -
## 31 1 1 1 1 0 0 1 0.445 0.050 DE
## 32 1 1 1 1 1 1 4 0.904 0.886 BE,CZ,NL,UK
Looking at the truth table, we observe that:
- rows 22 and 32 have value OUT = 1, so they are associated with the outcome.
- rows 1, 2, 5, 6, 23, 24 and 31 have value OUT = 0, so they have at least n.cut cases but are not associated with the outcome.
- the other 23 rows have fewer than n.cut cases, so they have OUT = ?. These are the remainders of the truth table.
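The assignment of cases to truth table rows can be sketched in base R. This is only an illustration, not the code of truthTable: it uses the six cases printed by head(LF) above, typed in by hand as the hypothetical object lf6, and it assumes that a case belongs to the row whose pattern it matches with membership above 0.5, with rows numbered as in the table (DEV as the most significant digit).

```r
# The six cases shown by head(LF), copied by hand (not the full dataset)
lf6 <- data.frame(
  DEV = c(0.81, 0.99, 0.58, 0.16, 0.58, 0.98),
  URB = c(0.12, 0.89, 0.98, 0.07, 0.03, 0.03),
  LIT = c(0.99, 0.98, 0.98, 0.98, 0.99, 0.99),
  IND = c(0.73, 1.00, 0.90, 0.01, 0.08, 0.81),
  STB = c(0.43, 0.98, 0.91, 0.91, 0.58, 0.95),
  row.names = c("AU", "BE", "CZ", "EE", "FI", "FR")
)

# Crisp pattern of each case: 1 if membership is above 0.5, else 0
crisp <- (as.matrix(lf6) > 0.5) * 1

# Read each pattern as a binary number and add 1 to get the row number
weights <- 2^((ncol(crisp) - 1):0)
tt_row <- as.vector(crisp %*% weights) + 1
names(tt_row) <- rownames(lf6)
tt_row
```

The resulting row numbers can be compared with the cases column of the truth table: AU falls in row 23, BE and CZ in row 32, EE in row 6, FI in row 22 and FR in row 24.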
Let’s examine the truth table of the negated outcome:
tt_not_surv <- truthTable(LF, outcome = "~SURV", conditions = "DEV, URB, LIT, IND, STB", complete = TRUE, show.cases = TRUE, incl.cut = 0.8)
tt_not_surv
##
## OUT: output value
## n: number of cases in configuration
## incl: sufficiency inclusion score
## PRI: proportional reduction in inconsistency
##
## DEV URB LIT IND STB OUT n incl PRI cases
## 1 0 0 0 0 0 1 3 1.000 1.000 GR,PT,ES
## 2 0 0 0 0 1 1 2 0.982 0.975 IT,RO
## 3 0 0 0 1 0 ? 0 - -
## 4 0 0 0 1 1 ? 0 - -
## 5 0 0 1 0 0 1 2 0.855 0.732 HU,PL
## 6 0 0 1 0 1 1 1 0.861 0.772 EE
## 7 0 0 1 1 0 ? 0 - -
## 8 0 0 1 1 1 ? 0 - -
## 9 0 1 0 0 0 ? 0 - -
## 10 0 1 0 0 1 ? 0 - -
## 11 0 1 0 1 0 ? 0 - -
## 12 0 1 0 1 1 ? 0 - -
## 13 0 1 1 0 0 ? 0 - -
## 14 0 1 1 0 1 ? 0 - -
## 15 0 1 1 1 0 ? 0 - -
## 16 0 1 1 1 1 ? 0 - -
## 17 1 0 0 0 0 ? 0 - -
## 18 1 0 0 0 1 ? 0 - -
## 19 1 0 0 1 0 ? 0 - -
## 20 1 0 0 1 1 ? 0 - -
## 21 1 0 1 0 0 ? 0 - -
## 22 1 0 1 0 1 0 2 0.498 0.281 FI,IE
## 23 1 0 1 1 0 1 1 0.974 0.960 AU
## 24 1 0 1 1 1 0 2 0.495 0.366 FR,SE
## 25 1 1 0 0 0 ? 0 - -
## 26 1 1 0 0 1 ? 0 - -
## 27 1 1 0 1 0 ? 0 - -
## 28 1 1 0 1 1 ? 0 - -
## 29 1 1 1 0 0 ? 0 - -
## 30 1 1 1 0 1 ? 0 - -
## 31 1 1 1 1 0 1 1 0.971 0.950 DE
## 32 1 1 1 1 1 0 4 0.250 0.106 BE,CZ,NL,UK
For this truth table we observe that:
- rows 1, 2, 5, 6, 23 and 31 are associated with the negated outcome and have OUT = 1.
- rows 22, 24 and 32 have OUT = 0, so they are not associated with the negated outcome.
- the remainders are the same 23 rows with OUT = ? as in the SURV truth table.
Row 24 has OUT = 0 in both tables, so it is associated neither with the outcome nor with its negation.
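The incl column of both tables is a sufficiency consistency score. A minimal sketch of the usual fuzzy-set formula, sum(min(x, y)) / sum(x), follows. The helper incl_suf and the object lf6 are hypothetical names introduced here, and lf6 holds only the six cases printed by head(LF), so the result will not match the 0.904 reported for row 32, which is computed on the full dataset.

```r
# The six cases shown by head(LF), copied by hand (not the full dataset)
lf6 <- data.frame(
  DEV  = c(0.81, 0.99, 0.58, 0.16, 0.58, 0.98),
  URB  = c(0.12, 0.89, 0.98, 0.07, 0.03, 0.03),
  LIT  = c(0.99, 0.98, 0.98, 0.98, 0.99, 0.99),
  IND  = c(0.73, 1.00, 0.90, 0.01, 0.08, 0.81),
  STB  = c(0.43, 0.98, 0.91, 0.91, 0.58, 0.95),
  SURV = c(0.05, 0.95, 0.89, 0.12, 0.77, 0.95),
  row.names = c("AU", "BE", "CZ", "EE", "FI", "FR")
)

# Membership in the configuration DEV*URB*LIT*IND*STB (row 32):
# the minimum across the five conditions (fuzzy AND)
x <- apply(as.matrix(lf6[, c("DEV", "URB", "LIT", "IND", "STB")]), 1, min)

# Sufficiency consistency of a configuration x for an outcome y
incl_suf <- function(x, y) sum(pmin(x, y)) / sum(x)

incl_row32 <- incl_suf(x, lf6$SURV)       # configuration -> SURV
incl_row32_neg <- incl_suf(x, 1 - lf6$SURV)  # configuration -> ~SURV
c(SURV = incl_row32, not_SURV = incl_row32_neg)
```

Computing the score against both SURV and 1 - SURV shows why the two truth tables have to be built separately: the two values are not complements of each other.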
The complex or conservative solution
We obtain the complex or conservative solution using the default values of the minimize function:
com_surv <- minimize(tt_surv, details = TRUE, show.cases = TRUE)
com_surv
##
## M1: DEV*URB*LIT*IND*STB + DEV*~URB*LIT*~IND*STB -> SURV
##
## inclS PRI covS covU cases
## -----------------------------------------------------------------
## 1 DEV*URB*LIT*IND*STB 0.904 0.886 0.454 0.393 BE,CZ,NL,UK
## 2 DEV*~URB*LIT*~IND*STB 0.804 0.719 0.265 0.204 FI,IE
## -----------------------------------------------------------------
## M1 0.870 0.843 0.658
In this case, the minimization does not have any effect, and the solution is the expressions of rows 32 and 22 joined by an OR operator.
This solution is called conservative because it is the most restrictive: we consider that none of the remainders is associated with the outcome. It is the most complex because it is the one using the fewest rows, and so it is the hardest to minimize.
Let’s also obtain the conservative solution for the negated outcome:
com_not_surv <- minimize(tt_not_surv, details = TRUE, show.cases = TRUE)
com_not_surv
##
## M1: ~DEV*~URB*~IND + DEV*LIT*IND*~STB -> ~SURV
##
## inclS PRI covS covU cases
## ---------------------------------------------------------------------------
## 1 ~DEV*~URB*~IND 0.886 0.854 0.678 0.582 GR,PT,ES; IT,RO; HU,PL; EE
## 2 DEV*LIT*IND*~STB 0.981 0.973 0.220 0.124 AU; DE
## ---------------------------------------------------------------------------
## M1 0.897 0.871 0.803
The parsimonious solution
While in the conservative solution we treat the remainders as not belonging to the solution, in the parsimonious solution we treat them as don’t-care rows. We can include any of the remainders, as long as it contributes to a simpler solution, by setting include = "?" in the minimize function.
par_surv <- minimize(tt_surv, details = TRUE, show.cases = TRUE, include = "?")
par_surv
##
## M1: DEV*~IND + URB*STB -> SURV
##
## inclS PRI covS covU cases
## ----------------------------------------------------
## 1 DEV*~IND 0.815 0.721 0.284 0.194 FI,IE
## 2 URB*STB 0.874 0.845 0.520 0.430 BE,CZ,NL,UK
## ----------------------------------------------------
## M1 0.850 0.819 0.714
The parsimonious solution DEV*~IND + URB*STB is simpler (more parsimonious) than the complex solution DEV*URB*LIT*IND*STB + DEV*~URB*LIT*~IND*STB. As the former has been formed by adding rows to the latter, the parsimonious solution contains the complex solution. In other words:
- The conservative solution is a subset of the parsimonious solution.
- The parsimonious solution is a superset of the complex solution.
We can see which remainders we have used to obtain the parsimonious solution doing:
par_surv$SA$M1
## DEV URB LIT IND STB
## 10 0 1 0 0 1
## 12 0 1 0 1 1
## 14 0 1 1 0 1
## 16 0 1 1 1 1
## 17 1 0 0 0 0
## 18 1 0 0 0 1
## 21 1 0 1 0 0
## 25 1 1 0 0 0
## 26 1 1 0 0 1
## 28 1 1 0 1 1
## 29 1 1 1 0 0
## 30 1 1 1 0 1
Only 12 out of the 23 remainders have been added to obtain the parsimonious solution.
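The subset relation and the list of simplifying assumptions can be checked by brute force in base R. This is a sketch that treats the two solutions as crisp Boolean expressions over the 32 truth table rows; it does not use the QCA package, and the object names are illustrative.

```r
# All 32 configurations, ordered as in the truth table
# (STB varies fastest, DEV slowest)
rows <- expand.grid(STB = 0:1, IND = 0:1, LIT = 0:1, URB = 0:1, DEV = 0:1)
rows <- rows[, c("DEV", "URB", "LIT", "IND", "STB")]

# Rows covered by each solution, evaluated as Boolean expressions
conservative <- with(rows, (DEV & URB & LIT & IND & STB) |
                           (DEV & !URB & LIT & !IND & STB))
parsimonious <- with(rows, (DEV & !IND) | (URB & STB))

# Subset check: every row of the conservative solution
# is also a row of the parsimonious solution
all(parsimonious[conservative])

# Rows added by the parsimonious solution: the simplifying assumptions
added <- setdiff(which(parsimonious), which(conservative))
added
```

The rows in added are the same 12 remainders listed in par_surv$SA$M1.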
Let’s also obtain the parsimonious solution for the negated outcome:
par_not_surv <- minimize(tt_not_surv, details = TRUE, show.cases = TRUE, include = "?")
par_not_surv
##
## M1: ~DEV + ~STB -> ~SURV
##
## inclS PRI covS covU cases
## ---------------------------------------------------------------
## 1 ~DEV 0.837 0.798 0.783 0.231 GR,PT,ES; IT,RO; HU,PL; EE
## 2 ~STB 0.902 0.871 0.657 0.105 GR,PT,ES; HU,PL; AU; DE
## ---------------------------------------------------------------
## M1 0.849 0.816 0.888
par_not_surv$SA$M1
## DEV URB LIT IND STB
## 3 0 0 0 1 0
## 4 0 0 0 1 1
## 7 0 0 1 1 0
## 8 0 0 1 1 1
## 9 0 1 0 0 0
## 10 0 1 0 0 1
## 11 0 1 0 1 0
## 12 0 1 0 1 1
## 13 0 1 1 0 0
## 14 0 1 1 0 1
## 15 0 1 1 1 0
## 16 0 1 1 1 1
## 17 1 0 0 0 0
## 19 1 0 0 1 0
## 21 1 0 1 0 0
## 25 1 1 0 0 0
## 27 1 1 0 1 0
## 29 1 1 1 0 0
The parsimonious and conservative solutions are the two extremes of a continuum: the conservative solution makes no assumptions but can be hard to interpret, while the parsimonious solution is easier to interpret, but to obtain it we are making many assumptions that can be untenable or contrary to previous theoretical knowledge.
The enhanced parsimonious solution
The Enhanced Standard Analysis (ESA) considers the inclusion of some rows of the truth table as untenable assumptions. For instance, it is contradictory to use the same remainders to simplify both the solution for the outcome and the solution for the negated outcome. In our example, this happens in the following rows:
intersect(row.names(par_not_surv$SA$M1), row.names(par_surv$SA$M1))
## [1] "10" "12" "14" "16" "17" "21" "25" "29"
We can find these same rows with the findRows function:
findRows(obj = tt_surv, type= 2)
## [1] 10 12 14 16 17 21 25 29
These rows are contradictory simplifying assumptions. To exclude them from the minimization process, we can proceed as follows:
contradictory_rows <- findRows(obj = tt_surv, type= 2)
eps_surv <- minimize(tt_surv, details = TRUE, show.cases = TRUE, include = "?", exclude = contradictory_rows)
eps_surv
##
## M1: DEV*URB*STB + DEV*~IND*STB -> SURV
##
## inclS PRI covS covU cases
## --------------------------------------------------------
## 1 DEV*URB*STB 0.901 0.879 0.468 0.378 BE,CZ,NL,UK
## 2 DEV*~IND*STB 0.814 0.721 0.282 0.191 FI,IE
## --------------------------------------------------------
## M1 0.866 0.839 0.660
The solution eps_surv is the enhanced parsimonious solution: it is a subset of the parsimonious solution, and a superset of the complex solution. To build this solution, we have excluded 8 of the 12 remainders that we used to build the parsimonious solution, so only four remain:
eps_surv$SA$M1
## DEV URB LIT IND STB
## 18 1 0 0 0 1
## 26 1 1 0 0 1
## 28 1 1 0 1 1
## 30 1 1 1 0 1
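As a cross-check for this example, the same four rows can be recovered in base R by enumerating the truth table, without calling the QCA package. The sketch below hard-codes the observed rows and the solution expressions reported above; the object names are illustrative.

```r
# All 32 configurations, ordered as in the truth table
rows <- expand.grid(STB = 0:1, IND = 0:1, LIT = 0:1, URB = 0:1, DEV = 0:1)

# Rows with observations (OUT = 0 or OUT = 1 in tt_surv)
observed <- c(1, 2, 5, 6, 22, 23, 24, 31, 32)

# Remainders used by each parsimonious solution
sa_surv     <- setdiff(which(with(rows, (DEV & !IND) | (URB & STB))), observed)
sa_not_surv <- setdiff(which(with(rows, !DEV | !STB)), observed)

# Contradictory simplifying assumptions: remainders used by both
contradictory <- intersect(sa_surv, sa_not_surv)
contradictory

# Remainders used by the enhanced parsimonious solution
sa_enhanced <- setdiff(
  which(with(rows, (DEV & URB & STB) | (DEV & !IND & STB))),
  observed
)
sa_enhanced
```

In this dataset the remainders of the enhanced solution are exactly the parsimonious ones minus the contradictory ones; that equality is a property of this example and should not be assumed in general.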
There are other situations in which we need to remove rows of the truth table representing untenable assumptions:
- Combinations of variables that are logical impossibilities. A hypothetical study described in Schneider and Wagemann (2013) contains three variables: whether a person is biologically female (A), pregnant (B) and sober (C). It is easy to see that the condition ~A*B is a logical impossibility in our current state of affairs. Logical impossibilities must be detected by the researcher and added to the exclude parameter of minimize.
- Simultaneous subset relations. When used with type = 3, the findRows function returns those rows with at least n.cut observations that are above the consistency threshold for both the outcome and the negated outcome. Note that these rows are not remainders, but rows with observations that are included in both solutions. This situation is known as simultaneous subset relations in the QCA package.
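For the logical impossibility example, the row numbers to exclude can be worked out by hand or with a few lines of base R. The sketch below assumes a hypothetical three-condition truth table with conditions A, B and C in that order, numbered like the tables in this post; the resulting vector is the kind of value one would pass to exclude.

```r
# All 8 configurations of the hypothetical conditions A, B, C,
# ordered as in a truth table (C varies fastest, A slowest)
rows <- expand.grid(C = 0:1, B = 0:1, A = 0:1)[, c("A", "B", "C")]

# Rows matching ~A*B: not biologically female and pregnant
impossible <- which(rows$A == 0 & rows$B == 1)
impossible

# Show the configurations behind those row numbers
rows[impossible, ]
```

Both values of C appear in the result, because the impossibility does not depend on sobriety.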
The intermediate solution
Once we have removed the untenable assumptions, we are ready to obtain an intermediate solution incorporating theory-driven counterfactuals. The intermediate solution must be a superset of the conservative solution, and a subset of the enhanced parsimonious solution.
Establishing a counterfactual means defining which remainders to include in the simplification process based on extant theoretical knowledge. Counterfactuals can be expressed in terms of directional expectations, hypothesized relationships of sufficiency between combinations of explanatory variables and the outcome.
Let’s consider directional expectations stating that the presence of each explanatory variable is a sufficient condition for the outcome:
- DEV => SURV
- URB => SURV
- LIT => SURV
- IND => SURV
- STB => SURV
Once the directional expectations are established, we need to control for incoherent counterfactuals: if a condition is necessary for an outcome, its negation cannot be a sufficient condition for the same outcome. Let’s check the necessity of the negated expressions of the directional expectations with the pof function:
pof(setms = "~DEV+~URB+~LIT+~IND+~STB", outcome = "SURV", relation = "necessity", data = LF)
##
## inclN RoN covN
## ----------------------------------
## 1 ~DEV 0.285 0.587 0.274
## 2 ~URB 0.568 0.452 0.402
## 3 ~LIT 0.096 0.764 0.168
## 4 ~IND 0.417 0.576 0.367
## 5 ~STB 0.218 0.687 0.269
## 6 expression 0.623 0.337 0.387
## ----------------------------------
None of the values of consistency in the inclN column is high enough to consider that the negated counterfactuals are necessary conditions of the outcome, so we have no incoherent counterfactuals.
To obtain the intermediate solution, we add the counterfactuals defined above with the dir.exp parameter.
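The inclN column is a necessity consistency score. A minimal sketch of the usual formula, sum(min(x, y)) / sum(y), follows. The helper incl_nec and the object lf6 are hypothetical names, and lf6 holds only the six cases printed by head(LF), so the value is not expected to reproduce the 0.285 reported by pof for ~DEV on the full dataset.

```r
# DEV and SURV for the six cases shown by head(LF), copied by hand
lf6 <- data.frame(
  DEV  = c(0.81, 0.99, 0.58, 0.16, 0.58, 0.98),
  SURV = c(0.05, 0.95, 0.89, 0.12, 0.77, 0.95),
  row.names = c("AU", "BE", "CZ", "EE", "FI", "FR")
)

# Necessity consistency of a condition x for an outcome y:
# how much of the outcome is contained in the condition
incl_nec <- function(x, y) sum(pmin(x, y)) / sum(y)

# Negation of a fuzzy set is 1 minus the membership score
not_dev <- 1 - lf6$DEV

incl_not_dev <- incl_nec(not_dev, lf6$SURV)
incl_not_dev
```

The denominator is the sum of outcome memberships, which is what distinguishes this score from the sufficiency consistency reported in the truth table.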
int_surv <- minimize(tt_surv, details = TRUE, show.cases = TRUE, include = "?", exclude = contradictory_rows, dir.exp = c(DEV, URB, LIT, IND, STB))
int_surv
##
## From C1P1:
##
## M1: DEV*URB*LIT*STB + DEV*LIT*~IND*STB -> SURV
##
## inclS PRI covS covU cases
## ------------------------------------------------------------
## 1 DEV*URB*LIT*STB 0.901 0.879 0.468 0.378 BE,CZ,NL,UK
## 2 DEV*LIT*~IND*STB 0.814 0.721 0.282 0.191 FI,IE
## ------------------------------------------------------------
## M1 0.866 0.839 0.660
To see how we have obtained the intermediate solution, let’s recap the conservative and the enhanced parsimonious solutions:
com_surv$solution[[1]]
## [1] "DEV*URB*LIT*IND*STB" "DEV*~URB*LIT*~IND*STB"
eps_surv$solution[[1]]
## [1] "DEV*URB*STB" "DEV*~IND*STB"
Then, we examine which terms we can eliminate from the conservative solution to obtain a subset of the enhanced parsimonious solution compatible with the directional expectations:
- The term ~IND appears in the enhanced parsimonious solution, while IND does not. So we need to remove the counterfactual related to IND.
- The first term of the enhanced parsimonious solution, DEV*URB*STB, is a superset of the term DEV*URB*LIT*IND*STB of the complex solution. From the latter term we can only remove IND, so we get DEV*URB*LIT*STB.
- The second term of the enhanced parsimonious solution, DEV*~IND*STB, is a superset of the term DEV*~URB*LIT*~IND*STB. From the latter we can only remove ~URB, so we obtain DEV*LIT*~IND*STB.
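The term-by-term reasoning in the list above can be written as a small base R helper. This is only a sketch of that reasoning, not the algorithm used by minimize: drop_literals is a hypothetical function, and it assumes that a literal of a conservative term is kept when it also appears in the matching enhanced parsimonious term or when it agrees with a retained directional expectation (here DEV, URB, LIT and STB, once the one for IND is set aside).

```r
# Keep a literal of the conservative term if it is part of the
# parsimonious term or if it agrees with a directional expectation
drop_literals <- function(conservative, parsimonious, expected) {
  cons <- strsplit(conservative, "*", fixed = TRUE)[[1]]
  pars <- strsplit(parsimonious, "*", fixed = TRUE)[[1]]
  keep <- cons %in% pars | cons %in% expected
  paste(cons[keep], collapse = "*")
}

# Directional expectations retained after setting aside IND
expected <- c("DEV", "URB", "LIT", "STB")

term1 <- drop_literals("DEV*URB*LIT*IND*STB", "DEV*URB*STB", expected)
term2 <- drop_literals("DEV*~URB*LIT*~IND*STB", "DEV*~IND*STB", expected)
paste(term1, term2, sep = " + ")
```

The two terms obtained this way are the ones reported in int_surv.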
Obtaining a simplified solution in QCA analysis.
In this post, I have presented a workflow to obtain an intermediate solution in QCA analysis. This solution is a superset of the complex solution, and a subset of the enhanced parsimonious solution:
- The complex solution is obtained under the assumption that no remainders are related to the outcome.
- The enhanced parsimonious solution is obtained under the assumption that any remainder whose inclusion is not an untenable assumption can be treated as related to the outcome.
We obtain the intermediate solution incorporating theory-driven directional expectations. The resulting solution should be compatible with extant theoretical knowledge, and easier to interpret than the complex solution.
To deduce how to obtain the intermediate solution from directional expectations, I have adapted the reasoning of Ragin & Sonnett (2005) to the LF dataset. The discussion about untenable assumptions and the enhanced parsimonious solution comes from Schneider & Wagemann (2013). I refer the reader to these references to learn about other concepts related to this workflow, like easy and difficult counterfactuals, and for more examples of application.
Bibliography and resources
- Dușa, Adrian (2021). QCA with R: A Comprehensive Resource. https://bookdown.org/dusadrian/QCAbook/
- Lipset, S. M. (1959). Some Social Requisites of Democracy: Economic Development and Political Legitimacy. American Political Science Review, 53:69-105.
- Ragin, C. C., & Sonnett, J. (2005). Between complexity and parsimony: Limited diversity, counterfactual cases, and comparative analysis. In Kropp, S. & Minkenberg, M. (eds.) Vergleichen in der Politikwissenschaft (pp. 180-197). VS Verlag für Sozialwissenschaften. https://escholarship.org/uc/item/1zf567tt
- Schneider, C. Q., & Wagemann, C. (2013). Doing justice to logical remainders in QCA: Moving beyond the standard analysis. Political Research Quarterly, 211-220.
Built with R 4.1.1 and QCA 3.12