Intermediate solutions in QCA analysis using R

Jose M Sallan 2021-10-26 16 min read

Qualitative comparative analysis (QCA) is a data analysis technique that examines the relationships between an outcome and a set of explanatory variables using Boolean algebra, rather than analysis of correlation or covariance structures. Unlike other techniques like linear regression, QCA is focused mainly on examining complex combinations of explanatory variables as antecedents of the outcome.

The minimization of the truth table rows associated with the outcome yields a solution covering all observed sufficient conditions for the outcome. We can obtain an analogous solution for the negated outcome. As QCA is asymmetric, the latter is not the negation of the former. As we will see in this post, both solutions must be considered simultaneously.

Social science phenomena exhibit limited diversity: many of the rows of the truth table have few or no observations, even for large datasets. The rows without enough observations are called remainders. The solutions considering only rows related to the outcome are complex or conservative solutions, as they assume that none of the remainders is associated with the outcome. Through the Standard Analysis (SA) (Ragin & Sonnett, 2005), the Enhanced Standard Analysis (ESA) and the Theory-Guided Enhanced Standard Analysis (TESA) (Schneider & Wagemann, 2013) we can obtain more parsimonious (simpler) sufficient solutions that are easier to interpret theoretically. To do so, we use counterfactual analysis, which allows us to add some of the remainders to the solution.

In this post, I will discuss the details of the implementation of SA, ESA and TESA using the R QCA package for fuzzy-set QCA (fsQCA) analysis.

library(QCA)

Throughout this post, I will be using the fuzzy-set version of Lipset’s (1959) indicators for the survival of democracy during the inter-war period, included in the QCA package.

head(LF)
##     DEV  URB  LIT  IND  STB SURV
## AU 0.81 0.12 0.99 0.73 0.43 0.05
## BE 0.99 0.89 0.98 1.00 0.98 0.95
## CZ 0.58 0.98 0.98 0.90 0.91 0.89
## EE 0.16 0.07 0.98 0.01 0.91 0.12
## FI 0.58 0.03 0.99 0.08 0.58 0.77
## FR 0.98 0.03 0.99 0.81 0.95 0.95

The first step to obtain a sufficient solution in fsQCA is to build the truth table. Each case is assigned to the row of the truth table closest to its membership scores, and the rows with a consistency score above the minimum consistency threshold incl.cut are assigned to the outcome. We can also set a minimum number of cases n.cut for a row to be considered. Here I obtain the truth table of LF for incl.cut = 0.8 and the default n.cut = 1:

tt_surv <- truthTable(LF, outcome = "SURV", conditions = "DEV, URB, LIT, IND, STB", complete = TRUE, show.cases = TRUE, incl.cut = 0.8)
tt_surv
## 
##   OUT: output value
##     n: number of cases in configuration
##  incl: sufficiency inclusion score
##   PRI: proportional reduction in inconsistency
## 
##      DEV URB LIT IND STB   OUT    n  incl  PRI   cases      
##  1    0   0   0   0   0     0     3  0.216 0.000 GR,PT,ES   
##  2    0   0   0   0   1     0     2  0.278 0.000 IT,RO      
##  3    0   0   0   1   0     ?     0    -     -              
##  4    0   0   0   1   1     ?     0    -     -              
##  5    0   0   1   0   0     0     2  0.521 0.113 HU,PL      
##  6    0   0   1   0   1     0     1  0.529 0.228 EE         
##  7    0   0   1   1   0     ?     0    -     -              
##  8    0   0   1   1   1     ?     0    -     -              
##  9    0   1   0   0   0     ?     0    -     -              
## 10    0   1   0   0   1     ?     0    -     -              
## 11    0   1   0   1   0     ?     0    -     -              
## 12    0   1   0   1   1     ?     0    -     -              
## 13    0   1   1   0   0     ?     0    -     -              
## 14    0   1   1   0   1     ?     0    -     -              
## 15    0   1   1   1   0     ?     0    -     -              
## 16    0   1   1   1   1     ?     0    -     -              
## 17    1   0   0   0   0     ?     0    -     -              
## 18    1   0   0   0   1     ?     0    -     -              
## 19    1   0   0   1   0     ?     0    -     -              
## 20    1   0   0   1   1     ?     0    -     -              
## 21    1   0   1   0   0     ?     0    -     -              
## 22    1   0   1   0   1     1     2  0.804 0.719 FI,IE      
## 23    1   0   1   1   0     0     1  0.378 0.040 AU         
## 24    1   0   1   1   1     0     2  0.709 0.634 FR,SE      
## 25    1   1   0   0   0     ?     0    -     -              
## 26    1   1   0   0   1     ?     0    -     -              
## 27    1   1   0   1   0     ?     0    -     -              
## 28    1   1   0   1   1     ?     0    -     -              
## 29    1   1   1   0   0     ?     0    -     -              
## 30    1   1   1   0   1     ?     0    -     -              
## 31    1   1   1   1   0     0     1  0.445 0.050 DE         
## 32    1   1   1   1   1     1     4  0.904 0.886 BE,CZ,NL,UK

Looking at the truth table, we observe that:

  • Rows 22 and 32 have OUT = 1, so they are associated with the outcome.
  • Rows 1, 2, 5, 6, 23, 24 and 31 have OUT = 0: they have at least n.cut cases but are not associated with the outcome.
  • The other 23 rows have fewer than n.cut cases, so they have OUT = ?. These are the remainders of the truth table.
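The inclusion scores in the truth table follow the standard fsQCA sufficiency formula, incl = Σ min(Xi, Yi) / Σ Xi, computed over the membership of all cases in a row's configuration. As a sketch (assuming the LF data from the QCA package), we can reproduce the score of row 32 by hand:

```r
library(QCA)  # provides the LF data

# membership of each case in the row 32 configuration DEV*URB*LIT*IND*STB
x <- pmin(LF$DEV, LF$URB, LF$LIT, LF$IND, LF$STB)

# sufficiency inclusion of this configuration for SURV:
# incl = sum(min(X, Y)) / sum(X)
sum(pmin(x, LF$SURV)) / sum(x)
```

This should reproduce the 0.904 reported for row 32 above.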

Let’s examine the truth table of the negated outcome:

tt_not_surv <- truthTable(LF, outcome = "~SURV", conditions = "DEV, URB, LIT, IND, STB", complete = TRUE, show.cases = TRUE, incl.cut = 0.8)
tt_not_surv
## 
##   OUT: output value
##     n: number of cases in configuration
##  incl: sufficiency inclusion score
##   PRI: proportional reduction in inconsistency
## 
##      DEV URB LIT IND STB   OUT    n  incl  PRI   cases      
##  1    0   0   0   0   0     1     3  1.000 1.000 GR,PT,ES   
##  2    0   0   0   0   1     1     2  0.982 0.975 IT,RO      
##  3    0   0   0   1   0     ?     0    -     -              
##  4    0   0   0   1   1     ?     0    -     -              
##  5    0   0   1   0   0     1     2  0.855 0.732 HU,PL      
##  6    0   0   1   0   1     1     1  0.861 0.772 EE         
##  7    0   0   1   1   0     ?     0    -     -              
##  8    0   0   1   1   1     ?     0    -     -              
##  9    0   1   0   0   0     ?     0    -     -              
## 10    0   1   0   0   1     ?     0    -     -              
## 11    0   1   0   1   0     ?     0    -     -              
## 12    0   1   0   1   1     ?     0    -     -              
## 13    0   1   1   0   0     ?     0    -     -              
## 14    0   1   1   0   1     ?     0    -     -              
## 15    0   1   1   1   0     ?     0    -     -              
## 16    0   1   1   1   1     ?     0    -     -              
## 17    1   0   0   0   0     ?     0    -     -              
## 18    1   0   0   0   1     ?     0    -     -              
## 19    1   0   0   1   0     ?     0    -     -              
## 20    1   0   0   1   1     ?     0    -     -              
## 21    1   0   1   0   0     ?     0    -     -              
## 22    1   0   1   0   1     0     2  0.498 0.281 FI,IE      
## 23    1   0   1   1   0     1     1  0.974 0.960 AU         
## 24    1   0   1   1   1     0     2  0.495 0.366 FR,SE      
## 25    1   1   0   0   0     ?     0    -     -              
## 26    1   1   0   0   1     ?     0    -     -              
## 27    1   1   0   1   0     ?     0    -     -              
## 28    1   1   0   1   1     ?     0    -     -              
## 29    1   1   1   0   0     ?     0    -     -              
## 30    1   1   1   0   1     ?     0    -     -              
## 31    1   1   1   1   0     1     1  0.971 0.950 DE         
## 32    1   1   1   1   1     0     4  0.250 0.106 BE,CZ,NL,UK

For this truth table we observe that:

  • Rows 1, 2, 5, 6, 23 and 31 are associated with the negated outcome, so they have OUT = 1.
  • Rows 22, 24 and 32 have OUT = 0, so they are not associated with the negated outcome.
  • The remainders are the same 23 rows with OUT = ? as in the SURV truth table.

Row 24 has OUT = 0 in both tables, so it is associated neither with the outcome nor with its negation.

The complex or conservative solution

We obtain the complex or conservative solution using the default values of the minimize function:

com_surv <- minimize(tt_surv, details = TRUE, show.cases = TRUE)
com_surv
## 
## M1: DEV*URB*LIT*IND*STB + DEV*~URB*LIT*~IND*STB -> SURV
## 
##                           inclS   PRI   covS   covU   cases 
## ----------------------------------------------------------------- 
## 1    DEV*URB*LIT*IND*STB  0.904  0.886  0.454  0.393  BE,CZ,NL,UK 
## 2  DEV*~URB*LIT*~IND*STB  0.804  0.719  0.265  0.204  FI,IE 
## ----------------------------------------------------------------- 
##                       M1  0.870  0.843  0.658

In this case, the minimization has no effect: the solution is simply the expressions of rows 32 and 22 joined by an OR operator.

This solution is called conservative because it is the most restrictive: we assume that none of the remainders is associated with the outcome. It is also called complex because it is built from the fewest rows, which makes it the hardest to minimize.

Let’s also obtain the conservative solution for the negated outcome:

com_not_surv <- minimize(tt_not_surv, details = TRUE, show.cases = TRUE)
com_not_surv
## 
## M1: ~DEV*~URB*~IND + DEV*LIT*IND*~STB -> ~SURV
## 
##                      inclS   PRI   covS   covU   cases 
## --------------------------------------------------------------------------- 
## 1    ~DEV*~URB*~IND  0.886  0.854  0.678  0.582  GR,PT,ES; IT,RO; HU,PL; EE 
## 2  DEV*LIT*IND*~STB  0.981  0.973  0.220  0.124  AU; DE 
## --------------------------------------------------------------------------- 
##                  M1  0.897  0.871  0.803

The parsimonious solution

While in the conservative solution we treat the remainders as not belonging to the solution, in the parsimonious solution we treat them as don’t cares: any remainder can be included, as long as it contributes to a simpler solution. We do so by setting include = "?" in the minimize function.

par_surv <- minimize(tt_surv, details = TRUE, show.cases = TRUE, include = "?")
par_surv
## 
## M1: DEV*~IND + URB*STB -> SURV
## 
##              inclS   PRI   covS   covU   cases 
## ---------------------------------------------------- 
## 1  DEV*~IND  0.815  0.721  0.284  0.194  FI,IE 
## 2   URB*STB  0.874  0.845  0.520  0.430  BE,CZ,NL,UK 
## ---------------------------------------------------- 
##          M1  0.850  0.819  0.714

The parsimonious solution DEV*~IND + URB*STB is simpler (more parsimonious) than the complex DEV*URB*LIT*IND*STB + DEV*~URB*LIT*~IND*STB. As the former has been formed by adding rows to the latter, the parsimonious solution contains the complex solution. In other words:

  • The conservative solution is a subset of the parsimonious solution.
  • The parsimonious solution is a superset of the complex solution.
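This subset relation can be checked directly on the fuzzy memberships (a sketch, again assuming the LF data from the QCA package): every case's membership in the conservative solution should not exceed its membership in the parsimonious one.

```r
library(QCA)  # provides the LF data

# fuzzy membership in the conservative solution
con <- pmax(pmin(LF$DEV, LF$URB, LF$LIT, LF$IND, LF$STB),
            pmin(LF$DEV, 1 - LF$URB, LF$LIT, 1 - LF$IND, LF$STB))

# fuzzy membership in the parsimonious solution DEV*~IND + URB*STB
par <- pmax(pmin(LF$DEV, 1 - LF$IND), pmin(LF$URB, LF$STB))

all(con <= par)  # TRUE: the conservative solution is a subset of the parsimonious one
```

The result is TRUE by construction: each conservative term is the same as a parsimonious term with extra conditions added, so its minimum can only be lower.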

We can see which remainders have been used to obtain the parsimonious solution with:

par_surv$SA$M1
##    DEV URB LIT IND STB
## 10   0   1   0   0   1
## 12   0   1   0   1   1
## 14   0   1   1   0   1
## 16   0   1   1   1   1
## 17   1   0   0   0   0
## 18   1   0   0   0   1
## 21   1   0   1   0   0
## 25   1   1   0   0   0
## 26   1   1   0   0   1
## 28   1   1   0   1   1
## 29   1   1   1   0   0
## 30   1   1   1   0   1

Only 12 out of the 23 remainders have been added to obtain the parsimonious solution.

Let’s also obtain the parsimonious solution for the negated outcome:

par_not_surv <- minimize(tt_not_surv, details = TRUE, show.cases = TRUE, include = "?")
par_not_surv
## 
## M1: ~DEV + ~STB -> ~SURV
## 
##          inclS   PRI   covS   covU   cases 
## --------------------------------------------------------------- 
## 1  ~DEV  0.837  0.798  0.783  0.231  GR,PT,ES; IT,RO; HU,PL; EE 
## 2  ~STB  0.902  0.871  0.657  0.105  GR,PT,ES; HU,PL; AU; DE 
## --------------------------------------------------------------- 
##      M1  0.849  0.816  0.888
par_not_surv$SA$M1
##    DEV URB LIT IND STB
## 3    0   0   0   1   0
## 4    0   0   0   1   1
## 7    0   0   1   1   0
## 8    0   0   1   1   1
## 9    0   1   0   0   0
## 10   0   1   0   0   1
## 11   0   1   0   1   0
## 12   0   1   0   1   1
## 13   0   1   1   0   0
## 14   0   1   1   0   1
## 15   0   1   1   1   0
## 16   0   1   1   1   1
## 17   1   0   0   0   0
## 19   1   0   0   1   0
## 21   1   0   1   0   0
## 25   1   1   0   0   0
## 27   1   1   0   1   0
## 29   1   1   1   0   0

The parsimonious and conservative solutions are the two extremes of a continuum: the conservative solution makes no assumptions but can be hard to interpret, while the parsimonious solution is easier to interpret, but to obtain it we are making many assumptions that can be untenable or contrary to previous theoretical knowledge.

The enhanced parsimonious solution

The Enhanced Standard Analysis (ESA) treats the inclusion of some rows of the truth table as untenable assumptions. For instance, it is contradictory to use the same remainders to simplify the sufficiency solutions of both the outcome and the negated outcome. In our example, this happens in the following rows:

intersect(row.names(par_not_surv$SA$M1), row.names(par_surv$SA$M1))
## [1] "10" "12" "14" "16" "17" "21" "25" "29"

We can find these same rows with the findRows function:

findRows(obj = tt_surv, type= 2)
## [1] 10 12 14 16 17 21 25 29

These rows are contradictory simplifying assumptions. To exclude them from the minimization process, we can proceed as follows:

contradictory_rows <- findRows(obj = tt_surv, type= 2)
eps_surv <- minimize(tt_surv, details = TRUE, show.cases = TRUE, include = "?", exclude = contradictory_rows)
eps_surv
## 
## M1: DEV*URB*STB + DEV*~IND*STB -> SURV
## 
##                  inclS   PRI   covS   covU   cases 
## -------------------------------------------------------- 
## 1   DEV*URB*STB  0.901  0.879  0.468  0.378  BE,CZ,NL,UK 
## 2  DEV*~IND*STB  0.814  0.721  0.282  0.191  FI,IE 
## -------------------------------------------------------- 
##              M1  0.866  0.839  0.660

The solution eps_surv is the enhanced parsimonious solution: it is a subset of the parsimonious solution, and a superset of the complex solution. To build this solution, we have excluded 8 of the 12 remainders that we used to build the parsimonious solution:

eps_surv$SA$M1
##    DEV URB LIT IND STB
## 18   1   0   0   0   1
## 26   1   1   0   0   1
## 28   1   1   0   1   1
## 30   1   1   1   0   1

There are other situations in which we need to remove rows of the truth table representing untenable assumptions:

Combinations of variables that are logical impossibilities. A hypothetical study described in Schneider and Wagemann (2013) contains three variables: whether a person is biologically female A, whether she is pregnant B, and whether she is sober C. It is easy to see that the condition ~A*B (a pregnant person who is not biologically female) is a logical impossibility, at least in our current state of affairs. Logical impossibilities must be detected by the researcher and added to the exclude parameter of minimize.

When used with type = 3, the findRows function excludes rows with n.cut or more observations that are above the consistency threshold for both the outcome and the negated outcome. Note that these rows are not remainders, but observed rows that would be included in both solutions. This situation is known as simultaneous subset relations in the QCA package.
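As a sketch, assuming findRows accepts the same arguments as in the type = 2 call above, we could list any such rows in our truth table with:

```r
# rows whose cases are consistent with both SURV and ~SURV
# (none are expected in the LF data, since no row has OUT = 1 in both tables)
findRows(obj = tt_surv, type = 3)
```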

The intermediate solution

Once we have removed the untenable assumptions, we are ready to obtain an intermediate solution incorporating theory-driven counterfactuals. The intermediate solution must be a superset of the conservative solution, and a subset of the enhanced parsimonious solution.

Establishing a counterfactual means defining which remainders to include in the simplification process based on extant theoretical knowledge. Counterfactuals can be expressed in terms of directional expectations, hypothesized relationships of sufficiency between combinations of explanatory variables and the outcome.

Let’s consider directional expectations stating that the presence of each explanatory variable is a sufficient condition for the outcome:

DEV => SURV
URB => SURV
LIT => SURV
IND => SURV
STB => SURV

Once the directional expectations are established, we need to check for incoherent counterfactuals: if a condition is necessary for an outcome, its negation cannot be a sufficient condition for the same outcome. Let’s check whether the negations of the directional expectations are necessary conditions with the pof function:

pof(setms = "~DEV+~URB+~LIT+~IND+~STB", outcome = "SURV", relation = "necessity", data = LF)
## 
##                inclN   RoN   covN  
## ---------------------------------- 
## 1        ~DEV  0.285  0.587  0.274 
## 2        ~URB  0.568  0.452  0.402 
## 3        ~LIT  0.096  0.764  0.168 
## 4        ~IND  0.417  0.576  0.367 
## 5        ~STB  0.218  0.687  0.269 
## 6  expression  0.623  0.337  0.387 
## ----------------------------------

None of the consistency values in the inclN column is high enough to consider the negated conditions necessary for the outcome, so we have no incoherent counterfactuals.
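The inclN scores come from the standard necessity formula, inclN = Σ min(Xi, Yi) / Σ Yi. As a sketch (assuming the LF data from the QCA package), the score for ~DEV can be reproduced by hand:

```r
library(QCA)  # provides the LF data

# necessity inclusion of ~DEV for SURV:
# inclN = sum(min(X, Y)) / sum(Y)
sum(pmin(1 - LF$DEV, LF$SURV)) / sum(LF$SURV)
```

This should match the 0.285 reported for ~DEV above.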

To obtain the intermediate solution, we add the directional expectations defined above with the dir.exp parameter.

int_surv <- minimize(tt_surv, details = TRUE, show.cases = TRUE, include = "?", exclude = contradictory_rows, dir.exp = c(DEV, URB, LIT, IND, STB))
int_surv
## 
## From C1P1: 
## 
## M1:    DEV*URB*LIT*STB + DEV*LIT*~IND*STB -> SURV 
## 
##                      inclS   PRI   covS   covU   cases 
## ------------------------------------------------------------ 
## 1   DEV*URB*LIT*STB  0.901  0.879  0.468  0.378  BE,CZ,NL,UK 
## 2  DEV*LIT*~IND*STB  0.814  0.721  0.282  0.191  FI,IE 
## ------------------------------------------------------------ 
##                  M1  0.866  0.839  0.660

To see how we have obtained the intermediate solution, let’s recap the conservative and the enhanced parsimonious solutions:

com_surv$solution[[1]]
## [1] "DEV*URB*LIT*IND*STB"   "DEV*~URB*LIT*~IND*STB"
eps_surv$solution[[1]]
## [1] "DEV*URB*STB"  "DEV*~IND*STB"

Then, we examine which conditions we can eliminate from each term of the conservative solution to obtain a subset of the enhanced parsimonious solution compatible with the directional expectations:

  • The term ~IND appears in the enhanced parsimonious solution, while IND does not, so we must drop the directional expectation involving IND.
  • The first term of the enhanced parsimonious solution, DEV*URB*STB, is a superset of the conservative term DEV*URB*LIT*IND*STB. From the latter we can only remove IND, so we get DEV*URB*LIT*STB.
  • The second term of the enhanced parsimonious solution, DEV*~IND*STB, is a superset of the conservative term DEV*~URB*LIT*~IND*STB. From the latter we can only remove ~URB, so we obtain DEV*LIT*~IND*STB.
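We can verify that the intermediate solution sits between the conservative and the enhanced parsimonious solutions (a sketch on the fuzzy memberships, assuming the LF data from the QCA package):

```r
library(QCA)  # provides the LF data

# conservative solution: DEV*URB*LIT*IND*STB + DEV*~URB*LIT*~IND*STB
con <- pmax(pmin(LF$DEV, LF$URB, LF$LIT, LF$IND, LF$STB),
            pmin(LF$DEV, 1 - LF$URB, LF$LIT, 1 - LF$IND, LF$STB))

# intermediate solution: DEV*URB*LIT*STB + DEV*LIT*~IND*STB
int <- pmax(pmin(LF$DEV, LF$URB, LF$LIT, LF$STB),
            pmin(LF$DEV, LF$LIT, 1 - LF$IND, LF$STB))

# enhanced parsimonious solution: DEV*URB*STB + DEV*~IND*STB
eps <- pmax(pmin(LF$DEV, LF$URB, LF$STB),
            pmin(LF$DEV, 1 - LF$IND, LF$STB))

all(con <= int) && all(int <= eps)  # TRUE by construction
```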

Obtaining a simplified solution in QCA analysis

In this post, I have presented a workflow to obtain an intermediate solution in QCA analysis. This solution is a superset of the complex solution, and a subset of the enhanced parsimonious solution:

  • The complex solution is obtained under the assumption that no remainders are related to the outcome.
  • The enhanced parsimonious solution is obtained under the assumption that every remainder whose inclusion is not an untenable assumption is related to the outcome.

We obtain the intermediate solution incorporating theory-driven directional expectations. The resulting solution should be compatible with extant theoretical knowledge, and easier to interpret than the complex solution.

To deduce how to obtain the intermediate solution from directional expectations, I have adapted the reasoning of Ragin & Sonnett (2005) to the LF dataset. The discussion of untenable assumptions and the enhanced parsimonious solution comes from Schneider & Wagemann (2013). I refer the reader to these references to learn about other concepts related to this workflow, such as easy and difficult counterfactuals, and for more examples of application.

Bibliography and resources

  • Dușa, Adrian (2021). QCA with R: A Comprehensive Resource. https://bookdown.org/dusadrian/QCAbook/
  • Lipset, S. M. (1959). Some Social Requisites of Democracy: Economic Development and Political Legitimacy. American Political Science Review, 53(1), 69-105.
  • Ragin, C. C., & Sonnett, J. (2005). Between complexity and parsimony: Limited diversity, counterfactual cases, and comparative analysis. In Kropp, S. & Minkenberg, M. (eds.), Vergleichen in der Politikwissenschaft (pp. 180-197). VS Verlag für Sozialwissenschaften. https://escholarship.org/uc/item/1zf567tt
  • Schneider, C. Q., & Wagemann, C. (2013). Doing justice to logical remainders in QCA: Moving beyond the standard analysis. Political Research Quarterly, 211-220.

Built with R 4.1.1 and QCA 3.12