The Chi-square test of independence determines whether there is a statistically significant relationship between categorical variables. It is a hypothesis test that answers the question—do the values of one categorical variable depend on the value of other categorical variables?

As you no doubt guessed, I’m a huge fan of statistics. I’m also a big Star Trek fan. Consequently, it’s not surprising that I’m writing a blog post about both! In the *Star Trek* TV series, Captain Kirk and the crew wear different colored uniforms to identify the crewmember’s work area. Those who wear red shirts have the unfortunate reputation of dying more often that those who wear gold or blue shirts.

In this post, I’ll show you how the Chi-square test of independence works. Then, I’ll show you how to perform the analysis and interpret the results by working through the example. I’ll use this test to determine whether wearing the dreaded red shirt in Star Trek is the kiss of death!

If you need a primer on the basics, read my hypothesis testing overview.

## Overview of the Chi-Square Test of Independence

The Chi-square test of association evaluates relationships between categorical variables. Like any statistical hypothesis test, the Chi-square test has both a null hypothesis and an alternative hypothesis.

- Null hypothesis: There are no relationships between the categorical variables. If you know the value of one variable, it does not help you predict the value of another variable.
- Alternative hypothesis: There are relationships between the categorical variables. Knowing the value of one variable
*does*help you predict the value of another variable.

The Chi-square test of independence works by comparing the distribution that you observe to the distribution that you expect if there is no relationship between the categorical variables. In the Chi-square context, the word “expected” is equivalent to what you’d expect if the null hypothesis is true. If your observed distribution is sufficiently different than the expected distribution (no relationship), you can reject the null hypothesis and infer that the variables are related.

For a Chi-square test, a p-value that is less than or equal to your significance level indicates there is sufficient evidence to conclude that the observed distribution is not the same as the expected distribution. You can conclude that a relationship exists between the categorical variables.

## Star Trek Fatalities by Uniform Colors

We’ll perform a Chi-square test of independence to determine whether there is a statistically significant association between shirt color and deaths. We need to use this test because these variables are both categorical variables. Shirt color can be only blue, gold, or red. Fatalities can be only dead or alive.

The color of the uniform represents each crewmember’s work area. We will statistically assess whether there is a connection between uniform color and the fatality rate. Believe it or not, there are “real” data about the crew from authoritative sources and the show portrayed the deaths onscreen. The table below shows how many crewmembers are in each area and how many have died.

Color | Areas | Crew | Fatalities |

Blue | Science and Medical | 136 | 7 |

Gold | Command and Helm | 55 | 9 |

Red | Operations, Engineering, and Security | 239 | 24 |

Ship’s total | All | 430 | 40 |

## Performing the Chi-Square Test of Independence for Uniform Color and Fatalities

For our example, we are going to determine whether the observed counts of deaths by uniform color is different from the distribution that we’d expect if there is no association between the two variables.

The table below shows how I’ve entered the data into the worksheet. You can also download the CSV dataset for StarTrekFatalities.

Color | Status | Frequency |

Blue | Dead | 7 |

Blue | Alive | 129 |

Gold | Dead | 9 |

Gold | Alive | 46 |

Red | Dead | 24 |

Red | Alive | 215 |

You can use the dataset to perform the analysis in your preferred statistical software. The Chi-squared test of independence results are below. As an aside, I use this example in my post about degrees of freedom in statistics. Learn why there are two degrees of freedom for the table below.

In our statistical results, both p-values are less than 0.05. We can reject the null hypothesis and conclude there is a relationship between shirt color and deaths. The next step is to define that relationship.

Describing the relationship between categorical variables involves comparing the observed count to the expected count in each cell of the Dead column. I’ve annotated this comparison in the statistical output above. Additionally, you can graph the contribution of each table cell to the Chi-square statistic, which is below.

Surprise! It’s the blue and gold uniforms that contribute the most to the Chi-square statistic and produce the statistical significance! Red shirts add almost nothing. In the statistical output, the comparison of observed counts to expected counts shows that blue shirts die less frequently than expected, gold shirts die more often than expected, and red shirts die at the expected rate.

The graph below reiterates these conclusions by displaying the percentage of fatalities by uniform color along with the overall death rate.

The Chi-square test indicates that red shirts don’t die more frequently than expected. Hold on. There’s more to this story!

Time for a bonus lesson and a bonus analysis in this blog post!

## 2 Proportions test to compare Security Red-Shirts to Non-Security Red-Shirts

The bonus lesson is that is vital to include the truly pertinent variables in the analysis. Perhaps the color of the shirt is not the important variable but rather the crewmember’s work area. Crewmembers in Security, Engineering, and Operations all wear red shirts. Maybe only security guards have a higher death rate?

We can test this theory using the 2 Proportions test. We’ll compare the fatality rates of red-shirts in security to red-shirts who are not in security.

The summary data are below. In the table, the events represent the counts of deaths while the trials are the number of personnel.

Events | Trials | |

Security | 18 | 90 |

Not security | 6 | 149 |

The p-value of 0.000 signifies that the difference between the two proportions is statistically significant. Security has a mortality rate of 20% while the other red-shirts are only at 4%.

Security officers have the highest mortality rate on the ship, closely followed by the gold-shirts. Red-shirts that are not in security have a fatality rate similar to the blue-shirts.

As it turns out, it’s not the color of the shirt that has an effect; it’s the duty area. That makes more sense.

## Risk by Work Area Summary

The Chi-square test of independence and the 2 Proportions test both indicate that the death rate varies by work area on the U.S.S. Enterprise. Doctors, scientists, engineers, and those in ship operations are the safest with about a 5% fatality rate. Crewmembers that are in command or security have death rates that exceed 15%!

MA says

Excellent Example, Thank you.

Jim Frost says

You’re very welcome. I’m glad it was helpful!

priyanka adhikary says

Helpful post. I can understand now

K.V.S.Sarma says

A good presentation. My experience with researchers in health sciences and clinical studies is that very often people do not bother about the hypotheses (null and alternate) but run after a p-value, more so with Chi-Square test of independence!! Your narration is excellent.

Palavardhan says

Nice post.

Jim Frost says

Thank you!

Jessica Escorcia says

Amazing post!! In the tabulated statistics section, you ran a Pearson Chi Square and a Likelihood Ratio Chi Square test. Are both of these necessary and do BOTH have to fall below the significance level for the null to be rejected? I’m assuming so. I don’t know what the difference is between these two tests but I will look it up. That was the only part that lost me:)

Jim Frost says

Thanks again, Jessica! I really appreciate your kind words!

When the two p-values are in agreement (e.g., both significant or insignificant), that’s easy. Fortunately, in my experience, these two p-values usually do agree. And, as the sample size increases, the agreement between them also increases.

I’ve looked into what to do when they disagree and have not found any clear answers. This paper suggests that as long as all expected frequencies are at least 5, use the Pearson Chi-Square test. When it is less than 5, the article recommends an adjusted Chi-square test, which is neither of the displayed tests!

These tests are most likely to disagree when you have borderline results to begin with (near your significance level), and particularly when you have a small sample. Either of these conditions alone make the results questionable. If these tests disagree, I’d take it as a big warning sign that more research is required!

I hope this helps!

Aftab Siddiqui says

very nice, thanks