Chi-squared tests of independence determine whether a relationship exists between two categorical variables. Do the values of one categorical variable depend on the value of the other categorical variable? If the two variables are independent, knowing the value of one variable provides no information about the value of the other variable.
I’ve previously written about Pearson’s chi-square test of independence using a fun Star Trek example. Are the uniform colors related to the chances of dying? You can test the notion that the infamous red shirts have a higher likelihood of dying. In that post, I focus on the purpose of the test, applied it to this example, and interpreted the results.
In this post, I’ll take a bit of a different approach. I’ll show you the nuts and bolts of how to calculate the expected values, chi-square value, and degrees of freedom. Then you’ll learn how to use the chi-squared distribution in conjunction with the degrees of freedom to calculate the p-value.
I’ve used the same approach to explain how:
Of course, you’ll usually just let your statistical software perform all calculations. However, understanding the underlying methodology helps you fully comprehend the analysis.
Chi-Squared Example Dataset
For the Star Trek example, uniform color and status are the two categorical variables. The contingency table below shows the combination of variable values, frequencies, and percentages.
|Column total||136||55||239||N = 430|
|Column percentage (Dead)||5.15%||16.36%||10.04%|
If uniform color and fatality rates are independent, we’d expect the column percentage in the bottom row to be roughly equal for all uniform colors. After all, if there is no connection between these variables, there’s no reason for the fatality rates to be different.
However, our fatality rates are not equal. Gold has the highest fatality rate at 16.36%, while Blue has the lowest at 5.15%. Red is in the middle at 10.04%. Does this inequality in our sample suggest that the fatality rates are different in the population? Does a relationship exist between uniform color and fatalities?
Thanks to random sampling error, our sample’s fatality rates don’t exactly equal the population’s rates. If the population rates are equal, we’d likely still see differences in our sample. So, the question becomes, after factoring in sampling error, are the fatality rates in our sample different enough to conclude that they’re different in the population? In other words, we want to be confident that the observed differences represent a relationship in the population rather than merely random fluctuations in the sample. That’s where Pearson’s chi-squared test for independence comes in!
The two hypotheses for the chi-squared test of independence are the following:
- Null: The variables are independent. No relationship exists.
- Alternative: A relationship between the variables exists.
Calculating the Expected Frequencies for the Chi-squared Test of Independence
The chi-squared test of independence compares our sample data in the contingency table to the distribution of values we’d expect if the null hypothesis is correct. Let’s construct the contingency table we’d expect to see if the null hypothesis is true for our population.
For chi-squared tests, the term “expected frequencies” refers to the values we’d expect to see if the null hypothesis is true. To calculate the expected frequency for a specific combination of categorical variables (e.g., blue shirts who died), multiply the column total (Blue) by the row total (Dead), and divide by the sample size.
Row total X Column total / Sample Size = Expected value for one table cell
To calculate the expected frequency for the Dead/Blue cell in our dataset, do the following:
- Find the row total for Dead (40)
- Find the column total for Blue (136)
- Multiply those two values and divide by the sample size (430)
40 * 136 / 430 = 12.65
If the null hypothesis is true, we’d expect to see 12.65 fatalities for wearers of the Blue uniforms in our sample. Of course, we can’t have a fraction of a death, but that doesn’t affect the results.
Contingency Table with the Expected Values
I’ll calculate the expected values for all six cells that represent the combinations of the three uniform colors and two statuses. I’ll also include the observed values in our sample. Expected values are in parentheses.
|Dead||7 (12.65)||9 (5.12)||24 (22.23)||40|
|Alive||129 (123.35)||46 (49.88)||215 (216.77)||390|
|Column% (Expected Dead)||9.3%||9.3%||9.3%|
In this table, notice how the column percentages for the expected dead are all 9.3%. This equality occurs when the null hypothesis is valid, which is the condition that the expected values represent.
Using this table, we can also compare the values we observe in our sample to the frequencies we’d expect if the null hypothesis that the variables are not related is correct.
For example, the observed frequency for Blue/Dead is less than the expected value (7 < 12.65). In our sample, deaths of those in blue uniforms occurred less frequently than we’d expect if the variables are independent. On the other hand, the observed frequency for Gold/Dead is greater than the expected value (9 > 5.12). Meanwhile, the observed frequency for Red/Dead approximately equals the expected value. This interpretation matches what we concluded by assessing the column percentages in the first contingency table.
Pearson’s chi-squared test works by mathematically comparing observed frequencies to the expected values, and boiling all those differences down into one number. Let’s see how it does that!
Related post: Using Contingency Tables to Calculate Probabilities
Calculating the Chi-Squared Statistic
Most hypothesis tests calculate a test statistic. For example, t-tests use t-values and F-tests use F-values as their test statistics. These statistical tests compare your observed sample data to what you would expect if the null hypothesis is true. The calculations reduce your sample data down to one value that represents how different your data are from the null.
For chi-squared tests, the test statistic is, unsurprisingly, chi-squared, or χ2.
The chi-squared calculations involve a familiar concept in statistics—the sum of the squared differences between the observed and expected values. This concept is similar to how regression models assess goodness-of-fit using the sum of the squared differences.
Here’s the formula for chi-squared.
Let’s walk through it!
To calculate the chi-squared statistic, take the difference between a pair of observed (O) and expected values (E), square the difference, and divide that squared difference by the expected value. Repeat this process for all cells in your contingency table and sum those values. The resulting value is χ2. We’ll calculate it for our example data shortly!
Important Considerations about the Chi-Squared Statistic
Notice several important considerations about chi-squared values:
Zero represents the null hypothesis. If all your observed frequencies equal the expected frequencies exactly, the chi-squared value for each cell equals zero, and the overall chi-squared statistic equals zero. Zero indicates your sample data exactly match what you’d expect if the null hypothesis is correct.
Squaring the differences ensures both that cell values must be non-negative and that larger differences are weighted more than smaller differences. A cell can never subtract from the chi-squared value.
Larger values represent a greater difference between your sample data and the null hypothesis. Chi-squared tests are one-tailed tests rather than the more familiar two-tailed tests. The test determines whether the entire set of differences exceeds a significance threshold. If your χ2 passes the limit, your results are statistically significant! You can reject the null hypothesis and conclude that the variables are dependent–a relationship exists.
Related post: One-tailed and Two-tailed Hypothesis Tests
Calculating Chi-Squared for our Example Data
Let’s calculate the chi-squared statistic for our example data! To do that, I’ll rearrange the contingency table, making it easier to illustrate how to calculate the sum of the squared differences.
The first two columns indicate the combination of categorical variable values. The next two are the observed and expected values that we calculated before. The last column is the squared difference divided by the expected value for each row. The bottom line sums those values.
Our chi-squared test statistic is 6.17. Ok, great. What does that mean? Larger values indicate a more substantial divergence between our observed data and the null hypothesis. However, the number by itself is not useful because we don’t know if it’s unusually large. We need to place it into a broader context to determine whether it is an extreme value.
Using the Chi-Squared Distribution to Test Hypotheses
One chi-squared test produces a single chi-squared value. However, imagine performing the following process.
First, assume the null hypothesis is valid for the population. At the population level, there is no relationship between the two categorical variables. Now, we’ll repeat our study many times by drawing many random samples from this population using the same design and sample size. Next, we perform the chi-squared test of independence on all the samples and plot the distribution of the chi-squared values. This distribution is known as a sampling distribution, which is a type of probability distribution.
If we follow this procedure, we create a graph that displays the distribution of chi-squared values for a population where the null hypothesis is true. We use sampling distributions to calculate probabilities for how unlikely our sample statistic is if the null hypothesis is correct. Chi-squared tests use the chi-square distribution.
Fortunately, we don’t need to collect many random samples to create this graph! Statisticians understand the properties of chi-squared distributions so we can estimate the sampling distribution using the details of our design.
Our goal is to determine whether our sample chi-squared value is so rare that it justifies rejecting the null hypothesis for the entire population. The chi-squared distribution provides the context for making that determination. We’ll calculate the probability of obtaining a chi-squared value that is at least as high as the value that our study found (6.17).
This probability has a name—the P-value! A low probability indicates that our sample data are unlikely when the null hypothesis is true.
Graphing the Chi-Squared Test Results for Our Example
For chi-squared tests, the degrees of freedom define the shape of the chi-squared distribution for a design. Chi-square tests use this distribution to calculate p-values. The graph below displays several chi-square distributions with differing degrees of freedom.
For a table with r rows and c columns, the method for calculating degrees of freedom for a chi-square test is (r-1) (c-1). For our example, we have two rows and three columns: (2-1) * (3-1) = 2 df.
Read my post about degrees of freedom to learn about this concept along with a more intuitive way of understanding degrees of freedom in chi-squared tests of independence.
Below is the chi-squared distribution for our study’s design.
The distribution curve displays the likelihood of chi-squared values for a population where there is no relationship between uniform color and status at the population level. I shaded the region that corresponds to chi-square values greater than or equal to our study’s value (6.17). When the null hypothesis is correct, chi-square values fall in this area approximately 4.6% of the time, which is the p-value (0.046). With a significance level of 0.05, our sample data are unusual enough to reject the null hypothesis.
The sample evidence suggests that a relationship between the variables exists in the population. While this test doesn’t indicate red shirts have a higher chance of dying, there is something else going on with red shirts. Read my other post chi-squared to learn about that!
Pearson’s chi-squared test for independence doesn’t tell you the effect size. To understand the strength of the relationship, you’d need to use something like Cramér’s V, which is a measure of association like Pearson’s correlation—except for categorical variables. That’s the topic of a future post!