21  Chi-square

This chapter will cover the chi-square test, a statistical method used to examine the relationship between categorical variables. Unlike regression, which focuses on continuous dependent variables, the chi-square test assesses whether there is an association between two categorical variables. But why do researchers need to examine associations between categorical variables?

Understanding relationships between categorical variables is essential in many fields of research. Real-world behaviors, traits, and classifications are often categorical—such as gender, education level, voting preferences, or disease status. The chi-square test allows researchers to determine whether observed frequencies in different categories differ significantly from what would be expected by chance. By doing so, we can identify patterns and relationships that might not be immediately apparent.

In short, it’s another tool to add to your statistical toolbox.

Tip

Note that the chi-square test can be applied to more than two categorical variables. However, in this chapter we will primarily deal with two variables.

21.1 Some Additional Details

The chi-square test is particularly useful when researchers want to examine whether two categorical variables are independent or related. For example, a researcher might investigate whether gender is associated with voting preference or whether treatment group membership affects recovery rates.

The general form of the chi-square test statistic is:

$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where:

  • $O_{ij}$ represents the observed frequency for cell $ij$ (actual counts in each category),
  • $E_{ij}$ represents the expected frequency for cell $ij$ (counts that would occur under the assumption of independence),
  • $\chi^2$ is the chi-square test statistic, which follows a chi-square distribution.

21.1.1 Key Assumptions

Like all of our analyses thus far, a chi-square test is valid only under certain assumptions, some of which we have already explored:

1. Independence of Observations
Each observation should belong to only one category, and observations should not be related to one another.

2. Expected Frequency Rule
Expected counts in each category should generally be 5 or more for the chi-square approximation to be valid. When expected counts are low, alternative methods (e.g., Fisher’s Exact Test) may be needed.

3. Large Sample Size
The chi-square test performs best with a sufficiently large sample, as small sample sizes may produce unreliable results.

4. Categorical Data
Both variables should be measured at the categorical level (e.g., nominal or ordinal scales) rather than continuous.
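The expected frequency rule can be checked directly in R. The following is a minimal sketch in base R, assuming the gender-by-candidate counts used in the next section; the object names (`observed`, `rule_met`, and so on) are illustrative choices, not part of any package.

```r
# Observed counts from the gender-by-candidate example in this chapter
observed <- matrix(c(40, 60,
                     50, 50),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Male", "Female"),
                                   Candidate = c("A", "B")))

# chisq.test() stores the expected counts under independence in $expected
expected <- chisq.test(observed, correct = FALSE)$expected
min_expected <- min(expected)

# Rule of thumb from the text: every expected count should be 5 or more
rule_met <- all(expected >= 5)

# Fall back to Fisher's exact test when the rule is not met
test_result <- if (rule_met) {
  chisq.test(observed, correct = FALSE)
} else {
  fisher.test(observed)
}

print(min_expected)
print(rule_met)
```

Here every expected count is well above 5, so the chi-square approximation is reasonable and the fallback is not used.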

21.2 Contingency Tables and Expected Frequencies

Before conducting a chi-square test, it is important to organize the data into a contingency table. A contingency table, also known as a cross-tabulation or crosstab, displays the frequencies of observations of the two categorical variables. This table allows researchers to compare observed frequencies with expected frequencies under the assumption of independence.

A simple contingency table for two categorical variables (e.g., Gender and Voting Preference) might look like this:

| | Candidate A (j=1) | Candidate B (j=2) | Row total |
|---|---|---|---|
| Male (i=1) | 40 | 60 | 100 |
| Female (i=2) | 50 | 50 | 100 |
| Column total | 90 | 110 | 200 (total sample size) |

While a contingency table may display only the observed frequency in each cell (block), it is helpful to also write the row totals, column totals, and grand total, as in the table above. It is also helpful to think of each row ($i$) and each column ($j$) as having a number. Combining a row number and a column number identifies a cell of interest. For example, $n_{i=1,j=1}$ refers to row 1, column 1; this is the cell of the table representing males who voted for Candidate A, where $n = 40$.

Note

What is the cell frequency for the cell in the second row, first column: $n_{2,1}$?

Continuing, to determine whether the variables are independent, we need to calculate the expected frequency for each cell using the formula:

$$E_{i,j} = \frac{(\text{Row Total}_i) \times (\text{Column Total}_j)}{\text{Grand Total}}$$

For example, the expected frequency for Male/Candidate A would be:

$$E_{1,1} = \frac{(100 \times 90)}{200} = 45$$

We need to do this for each cell in our contingency table. Doing so, we would get the following. This first table represents the observed frequencies:

Observed Frequencies

| | Candidate A (j=1) | Candidate B (j=2) |
|---|---|---|
| Male (i=1) | 40 | 60 |
| Female (i=2) | 50 | 50 |

This second table represents the expected frequencies:

Expected Frequencies

| | Candidate A (j=1) | Candidate B (j=2) |
|---|---|---|
| Male (i=1) | 45 | 55 |
| Female (i=2) | 45 | 55 |

Combining Both

You may find it easier to see discrepancies between observed and expected frequencies, and to carry out any calculations, by combining both tables into one. Here, expected frequencies are in parentheses following the observed frequencies:

Observed (Expected)

| | Candidate A (j=1) | Candidate B (j=2) |
|---|---|---|
| Male (i=1) | 40 (45) | 60 (55) |
| Female (i=2) | 50 (45) | 50 (55) |

Comparing these expected frequencies with the observed counts allows us to determine whether any differences are statistically significant.
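The expected frequencies can also be computed for every cell at once. This is a small base R sketch of the formula above, using the observed counts from the example; `outer()` multiplies each row total by each column total, and the object names are illustrative.

```r
# Observed counts from the gender-by-candidate example
observed <- matrix(c(40, 60,
                     50, 50),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Male", "Female"),
                                   Candidate = c("A", "B")))

row_totals  <- rowSums(observed)   # 100, 100
col_totals  <- colSums(observed)   # 90, 110
grand_total <- sum(observed)       # 200

# E[i, j] = (row total i) * (column total j) / grand total
expected <- outer(row_totals, col_totals) / grand_total

print(expected)
```

The result reproduces the expected frequencies table: 45 and 55 in each row.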

The next step is to compute the chi-square test statistic and assess its significance using the chi-square distribution.

21.3 Calculating the Chi-Square Test Statistic

After obtaining the observed and expected frequencies, we compute the chi-square test statistic using the formula:

$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

For our example, the chi-square test statistic is calculated as follows:

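$$\chi^2 = \frac{(40-45)^2}{45} + \frac{(60-55)^2}{55} + \frac{(50-45)^2}{45} + \frac{(50-55)^2}{55}$$

Computing each term:

$$\chi^2 = \frac{(-5)^2}{45} + \frac{(5)^2}{55} + \frac{(5)^2}{45} + \frac{(-5)^2}{55}$$

$$\chi^2 = \frac{25}{45} + \frac{25}{55} + \frac{25}{45} + \frac{25}{55}$$

$$\chi^2 \approx 0.56 + 0.45 + 0.56 + 0.45 = 2.02$$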
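The same arithmetic can be reproduced in a few lines of base R. This is a sketch rather than a required workflow: it computes each cell's contribution by hand and then cross-checks the total against `chisq.test()` with the continuity correction turned off (`correct = FALSE`), since the hand calculation applies no correction.

```r
# Observed counts from the gender-by-candidate example
observed <- matrix(c(40, 60,
                     50, 50),
                   nrow = 2, byrow = TRUE)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Each cell's contribution: (O - E)^2 / E
contributions <- (observed - expected)^2 / expected

# The test statistic is the sum of the contributions
chi_square <- sum(contributions)

# Cross-check against the built-in test (no Yates correction)
builtin <- unname(chisq.test(observed, correct = FALSE)$statistic)

print(round(contributions, 3))
print(chi_square)
```

Without intermediate rounding the statistic is about 2.0202, which matches the hand calculation of 2.02.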

21.4 Determining Statistical Significance

Once we calculate the chi-square test statistic, we compare it to the critical value from the chi-square distribution table, or we compute a p-value.

The degrees of freedom (df) for a chi-square test are calculated as:

$$df = (\text{Number of Rows} - 1) \times (\text{Number of Columns} - 1)$$

For our example:

$$df = (2 - 1) \times (2 - 1) = 1$$

Using a chi-square table or statistical software, we determine the critical value for our chosen significance level (e.g., $\alpha = .05$). If our calculated chi-square statistic exceeds the critical value, we reject the null hypothesis, concluding that an association this strong would be unlikely if the null hypothesis were true.

You can find critical chi-square tables online. Additionally, there are websites that can calculate an exact p-value for a given $\chi^2$ and df. However, most statistical software packages will provide exact p-values, residuals, and effect sizes.
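Both routes to a decision, the critical value and the p-value, are available in base R. The sketch below assumes the statistic from the worked example (about 2.0202 with 1 degree of freedom) and $\alpha = .05$; the variable names are illustrative.

```r
chi_square <- 2.020202          # statistic from the worked example
df <- (2 - 1) * (2 - 1)         # (rows - 1) * (columns - 1)
alpha <- 0.05

# Critical value: the point that cuts off the upper 5% of the distribution
critical_value <- qchisq(1 - alpha, df = df)

# p-value: probability of a statistic at least this large under the null
p_value <- pchisq(chi_square, df = df, lower.tail = FALSE)

reject_null <- chi_square > critical_value

print(critical_value)
print(p_value)
print(reject_null)
```

For this example the statistic falls below the critical value of about 3.84 and the p-value is about .155, so we would not reject the null hypothesis of independence.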

21.5 Effect Size

It’s important to assess the strength of the association between the variables. One common measure of effect size for chi-square tests is Cramer’s V. Cramer’s V provides a standardized measure of association and is calculated as:

$$V = \sqrt{\frac{\chi^2}{n \times \min(r-1,\, c-1)}}$$

Where:

  • $\chi^2$ is the chi-square statistic,
  • $n$ is the total sample size,
  • $r$ is the number of rows in the contingency table,
  • $c$ is the number of columns in the contingency table.

For example, for our 2x2 table, the effect size can be computed as follows:

$$V = \sqrt{\frac{2.02}{200 \times (1)}} = \sqrt{\frac{2.02}{200}} \approx 0.101$$

Interpretation of Cramér’s V:

  • Small effect: $0.1 \le V < 0.3$
  • Medium effect: $0.3 \le V < 0.5$
  • Large effect: $V \ge 0.5$

In this case, the effect size of 0.101 suggests a small association between the variables.
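Cramer's V is simple enough to compute by hand in R. This is a minimal sketch of the formula above, assuming the unrounded statistic from the worked example; `n_rows` and `n_cols` are illustrative names.

```r
chi_square <- 2.020202   # unrounded statistic from the worked example
n <- 200                 # total sample size
n_rows <- 2
n_cols <- 2

# V = sqrt(chi-square / (n * min(r - 1, c - 1)))
cramers_v <- sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

print(round(cramers_v, 3))
```

Using the unrounded statistic gives a value of about 0.1005, which sits right at the lower bound of the small range.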

21.6 Post-hoc Analyses: Residuals

Residuals in a chi-square test help us understand the magnitude of discrepancies between observed and expected frequencies. They are calculated as:

$$\text{Residual} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

The residuals give us an indication of how much each observed frequency deviates from its expected frequency in terms of standard deviations. For each cell, a large residual indicates a large difference between observed and expected frequencies, which could be important for identifying patterns in the data.

For our example:

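For Male/Candidate A:

$$\frac{40 - 45}{\sqrt{45}} = \frac{-5}{6.708} \approx -0.745$$

For Male/Candidate B:

$$\frac{60 - 55}{\sqrt{55}} = \frac{5}{7.416} \approx 0.674$$

For Female/Candidate A:

$$\frac{50 - 45}{\sqrt{45}} = \frac{5}{6.708} \approx 0.745$$

For Female/Candidate B:

$$\frac{50 - 55}{\sqrt{55}} = \frac{-5}{7.416} \approx -0.674$$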

These residuals can help us determine which specific categories contribute to the overall chi-square statistic.
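The residuals for all cells can be computed in one step. The sketch below uses base R with the example's observed counts; it also shows that squaring the residuals and summing them recovers the chi-square statistic, which is why large residuals point to the cells that contribute most.

```r
# Observed counts from the gender-by-candidate example
observed <- matrix(c(40, 60,
                     50, 50),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Male", "Female"),
                                   Candidate = c("A", "B")))

expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Residual for each cell: (O - E) / sqrt(E)
pearson_residuals <- (observed - expected) / sqrt(expected)

# Squared residuals sum to the chi-square statistic
chi_square_from_residuals <- sum(pearson_residuals^2)

print(round(pearson_residuals, 3))
print(chi_square_from_residuals)
```

The same values are stored in the `residuals` element of the object returned by `chisq.test()`.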

Chi-square in R

I have found that the best function in R for Chi-square is CrossTable() from the gmodels package. It is comprehensive.

To calculate Cramer’s V, you can use the cramersv() function from the confintr package.
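As a rough sketch of how these functions might be called for the gender-by-candidate table: the argument choices below are assumptions about one reasonable configuration, not necessarily the exact call used to produce the output in this chapter. The `ci_cramersv()` function is assumed here for the confidence interval, and both package calls are guarded so the script still runs if the packages are not installed.

```r
# Observed counts from the gender-by-candidate example
observed <- matrix(c(40, 60,
                     50, 50),
                   nrow = 2, byrow = TRUE)

# Base R test, used as input for the effect size below
base_test <- chisq.test(observed, correct = FALSE)

# Assumed CrossTable() settings: counts, expected values, chi-square
# contributions, and standardized residuals in SPSS-style layout
if (requireNamespace("gmodels", quietly = TRUE)) {
  gmodels::CrossTable(observed,
                      expected = TRUE,
                      prop.r = FALSE, prop.c = FALSE, prop.t = FALSE,
                      prop.chisq = TRUE,
                      chisq = TRUE,
                      sresid = TRUE,
                      format = "SPSS")
}

# Cramer's V with a confidence interval, passing the chisq.test() result
if (requireNamespace("confintr", quietly = TRUE)) {
  print(confintr::ci_cramersv(base_test))
}
```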

Our formal test would result in:


   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|            Std Residual |
|-------------------------|

Total Observations in Table:  200 

             |  
             |     [,1]  |     [,2]  | Row Total | 
-------------|-----------|-----------|-----------|
        [1,] |       40  |       60  |      100  | 
             |   45.000  |   55.000  |           | 
             |    0.556  |    0.455  |           | 
             |   -0.745  |    0.674  |           | 
-------------|-----------|-----------|-----------|
        [2,] |       50  |       50  |      100  | 
             |   45.000  |   55.000  |           | 
             |    0.556  |    0.455  |           | 
             |    0.745  |   -0.674  |           | 
-------------|-----------|-----------|-----------|
Column Total |       90  |      110  |      200  | 
-------------|-----------|-----------|-----------|

 
Statistics for All Table Factors


Pearson's Chi-squared test 
------------------------------------------------------------
Chi^2 =  2.020202     d.f. =  1     p =  0.1552185 

Pearson's Chi-squared test with Yates' continuity correction 
------------------------------------------------------------
Chi^2 =  1.636364     d.f. =  1     p =  0.2008251 

 
       Minimum expected frequency: 45 

and for Cramer’s V:


    Two-sided 95% chi-squared confidence interval for the population
    Cramer's V

Sample estimate: 0.1005038 
Confidence interval:
     2.5%     97.5% 
0.0000000 0.2493303 

Let’s now explore a full example relevant to the study of psychology.

21.7 Predominantly effective?: Another Example

You want to investigate whether teenagers with different ADHD subtypes will prefer various forms of treatment. You have reason to believe, based on a review of the literature, that individuals may prefer psychosocial treatments as opposed to medication treatments; however, results are mixed (e.g., Schatz et al., 2015). You decide to formally investigate the topic.

21.8 Step 1. Generate Hypotheses

The main null and alternative hypotheses for this chi-square test can be stated as follows:

  • Null Hypothesis ($H_0: O_{ij} = E_{ij}$):
    • Therapy preference is independent of ADHD subtype.
    • In other words, there is no relationship between ADHD subtype and therapy preference.
  • Alternative Hypothesis ($H_A: O_{ij} \neq E_{ij}$):
    • Therapy preference is dependent on ADHD subtype.
    • That is, different ADHD subtypes are associated with different therapy preferences.

Any post-hoc analyses will use standardized residuals beyond ±2 to identify particularly influential cells.

21.9 Step 2. Designing the Study

You and your team plan a research study. The method follows:

Participants:

A power analysis using an effect size of $\phi = .2828$ (derived from the literature) was used to determine the sample needed to achieve a power of $1 - \beta = .8$. The results of the power analysis suggested a required sample size of $n = 300$.

Power analysis can be completed in R. The pwr.chisq.test() function from the pwr package is a sound method. It does, however, require Cohen’s W rather than Cramer’s ϕ. This is easily calculated. Per Cohen (1988), W is:

$$W = \sqrt{\sum_{i=1}^{m} \frac{(P_{Ai} - P_{0i})^2}{P_{0i}}}$$

where $P_{0i}$ is the proportion in cell $i$ under the null hypothesis $H_0$, $P_{Ai}$ is the proportion in cell $i$ posited by the alternative hypothesis $H_A$, and $m$ is the number of cells.

A major difference in this and the typical analyses we have been doing is that these are proportions, not frequencies.

This may seem taxing, particularly because you don’t have proportions. Well, we can approximate W using:

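$$W \approx \frac{\phi}{\sqrt{k-1}}$$

where $k$ is the smaller of the number of rows and the number of columns. So for our power analysis:

$$W \approx \frac{\phi}{\sqrt{k-1}} = \frac{.2828}{\sqrt{2}} = 0.20$$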

We can then use R to compute our power analysis:


     Chi squared power calculation 

              w = 0.2
              N = 298.3821
             df = 4
      sig.level = 0.05
          power = 0.8

NOTE: N is the number of observations

This suggests a sample of $n \approx 298.38$, which we would round up to 300.
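For readers without the pwr package, the same calculation can be sketched in base R. The approach below takes $W = 0.2$, $df = 4$, $\alpha = .05$, and a target power of .8 as given in the text, and searches for the sample size at which the noncentral chi-square distribution (noncentrality $N \times W^2$) reaches that power. The helper `power_at()` is a hypothetical name introduced for illustration. With the package installed, `pwr::pwr.chisq.test(w = 0.2, df = 4, power = 0.8)` should report the same N.

```r
w <- 0.2              # Cohen's W, as given in the text
df <- 4               # (3 - 1) * (3 - 1)
alpha <- 0.05
target_power <- 0.80

# Critical value of the central chi-square distribution
critical_value <- qchisq(1 - alpha, df = df)

# Power for a given sample size: probability that a noncentral
# chi-square variate (ncp = n * w^2) exceeds the critical value
power_at <- function(n) {
  pchisq(critical_value, df = df, ncp = n * w^2, lower.tail = FALSE)
}

# Solve power_at(n) = target_power for n
required_n <- uniroot(function(n) power_at(n) - target_power,
                      interval = c(2, 5000), tol = 1e-8)$root

print(required_n)
print(ceiling(required_n))
```

The minimum whole number of participants is 299; rounding to 300 simply gives a more convenient recruitment target.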

Participants were recruited from local ADHD support groups and clinical settings. Flyers and online advertisements were used to reach individuals diagnosed with ADHD. Eligible participants were required to have a confirmed ADHD diagnosis of one of the three subtypes: Predominantly Inattentive (PI), Predominantly Hyperactive-Impulsive (PHI), or Combined Type (CT). The target sample was 300 participants, as indicated by the power analysis.

Materials:

A structured questionnaire was used to collect self-reported therapy preferences. Participants selected their preferred treatment from three options: Cognitive Behavioral Therapy (CBT), Behavioral Therapy, or Medication.

Procedure:

Participants completed an online survey that collected demographic information, ADHD subtype (based on a clinical diagnosis), and their preferred therapy type. Informed consent was obtained before participation. The ethics review board at Grenfell Campus reviewed and approved the study.

21.10 Step 3. Conducting the Study

The study was completed as described, and a total of 350 participants provided data. The responses were summarized in the following contingency table:

| ADHD Subtype | CBT | Behavioral Therapy | Medication | Total |
|---|---|---|---|---|
| PI | 50 | 30 | 20 | 100 |
| PHI | 30 | 50 | 70 | 150 |
| CT | 20 | 40 | 40 | 100 |
| Total | 100 | 120 | 130 | 350 |
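To work with these counts in R, the table can be entered as a matrix. This is a small base R sketch with illustrative object names; `addmargins()` appends the row and column totals so they can be checked against the table. The nine cell counts sum to 350.

```r
# Observed counts: rows are ADHD subtypes, columns are preferred therapies
observed <- matrix(c(50, 30, 20,
                     30, 50, 70,
                     20, 40, 40),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(Subtype = c("PI", "PHI", "CT"),
                                   Therapy = c("CBT", "Behavioral Therapy",
                                               "Medication")))

# Append row and column totals (labelled "Sum")
with_totals <- addmargins(as.table(observed))

print(with_totals)
```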

21.11 Step 4. Analysing the Data

A chi-square test of independence was conducted to determine whether there was a significant relationship between ADHD subtype and therapy preference. The results are as follows:


   Cell Contents
|-------------------------|
|                   Count |
| Chi-square contribution |
|            Std Residual |
|-------------------------|

Total Observations in Table:  350 

             |  
             |      CBT  |       BT  |      Med  | Row Total | 
-------------|-----------|-----------|-----------|-----------|
          PI |       50  |       30  |       20  |      100  | 
             |   16.071  |    0.536  |    7.912  |           | 
             |    4.009  |   -0.732  |   -2.813  |           | 
-------------|-----------|-----------|-----------|-----------|
         PHI |       30  |       50  |       70  |      150  | 
             |    3.857  |    0.040  |    3.663  |           | 
             |   -1.964  |   -0.199  |    1.914  |           | 
-------------|-----------|-----------|-----------|-----------|
          CT |       20  |       40  |       40  |      100  | 
             |    2.571  |    0.952  |    0.220  |           | 
             |   -1.604  |    0.976  |    0.469  |           | 
-------------|-----------|-----------|-----------|-----------|
Column Total |      100  |      120  |      130  |      350  | 
-------------|-----------|-----------|-----------|-----------|

 
Statistics for All Table Factors


Pearson's Chi-squared test 
------------------------------------------------------------
Chi^2 =  35.82265     d.f. =  4     p =  3.147259e-07 


 
       Minimum expected frequency: 28.57143 

    Two-sided 95% chi-squared confidence interval for the population
    Cramer's V

Sample estimate: 0.2262194 
Confidence interval:
     2.5%     97.5% 
0.1592337 0.3015782 

Our overall chi-square test was statistically significant, indicating that the observed frequencies would be unlikely if ADHD subtype and therapy preference were independent. We can further explore which cells are driving this result by inspecting the standardized residuals. Two cells stand out: individuals with predominantly inattentive type (PI) preferred CBT much more often than expected (4.009) and medication much less often than expected (−2.813).
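The key numbers in this analysis can be reproduced with base R alone. The sketch below is one possible workflow, not necessarily the one used to generate the output above: it runs the test, pulls out the residuals, flags cells whose residual exceeds 2 in absolute value, and computes Cramer's V by hand. Object names are illustrative.

```r
# Observed counts: rows are ADHD subtypes, columns are preferred therapies
observed <- matrix(c(50, 30, 20,
                     30, 50, 70,
                     20, 40, 40),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(Subtype = c("PI", "PHI", "CT"),
                                   Therapy = c("CBT", "BT", "Med")))

# Chi-square test of independence
test <- chisq.test(observed)
chi_square <- unname(test$statistic)

# Residuals, (O - E) / sqrt(E), as stored by chisq.test()
std_residuals <- test$residuals

# Cells whose residual is beyond +/- 2
flagged <- which(abs(std_residuals) > 2, arr.ind = TRUE)

# Cramer's V
cramers_v <- sqrt(chi_square / (sum(observed) * (min(dim(observed)) - 1)))

print(test)
print(round(std_residuals, 3))
print(flagged)
print(round(cramers_v, 3))
```

Only the PI/CBT and PI/Medication cells are flagged, which agrees with the interpretation above.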

21.12 Step 5: Write up your results

A chi-square test of independence was conducted to examine the relationship between ADHD subtype (PI, PHI, CT) and therapy type (CBT, Behavioral Therapy, Medication). The results of the chi-square test were statistically significant, $\chi^2(4) = 35.82$, $p < .001$, $V = .226$, 95% CI [.159, .302], indicating that the distribution of therapy types differs significantly across ADHD subtypes.

To further explore these results, we examined the standardized residuals for each cell. The standardized residuals indicated that individuals with predominantly inattentive type (PI) were more likely to prefer CBT (standardized residual = 4.009) and less likely to prefer medication (standardized residual = −2.813) than expected. The remaining cells showed minor deviations from expected frequencies, with standardized residuals less than 2 in absolute value.

These findings suggest a strong preference for CBT among individuals with PI. Further research may be necessary to explore the underlying factors contributing to these preferences.

21.13 Conclusion

The chi-square test is a powerful tool for analyzing relationships between categorical variables. By comparing observed and expected frequencies, we can determine whether a meaningful association exists. While straightforward to compute, the test has key assumptions that must be met for valid results. Understanding and applying the chi-square test correctly is an essential skill for researchers working with categorical data.