What is a Contingency Table?
A contingency table displays frequencies for combinations of two categorical variables. Analysts also refer to contingency tables as crosstabulations or two-way tables.
Contingency tables classify outcomes for one variable in rows and the other in columns. The values at the row and column intersections are frequencies for each unique combination of the two variables.
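To make the counting of combinations concrete, here is a minimal Python sketch that builds a two-way table from raw observations with `collections.Counter`. The `crosstab` function and the five records are hypothetical, for illustration only. In practice, `pandas.crosstab` does the same job.

```python
from collections import Counter

def crosstab(records):
    """Count each unique (row_value, column_value) pair.

    Returns sorted row labels, sorted column labels, and a nested dict
    where table[row][col] is the frequency of that combination.
    """
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    # Combinations that never occur get an explicit 0 instead of a missing cell.
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table

# Hypothetical observations: (gender, computer type) for five customers.
records = [
    ("Female", "Mac"),
    ("Male", "PC"),
    ("Female", "Mac"),
    ("Female", "PC"),
    ("Male", "PC"),
]

rows, cols, table = crosstab(records)
print("".ljust(8) + "".join(c.ljust(6) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(table[r][c]).ljust(6) for c in cols))
```

Each observation lands in exactly one cell, which is why the categories must be mutually exclusive.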
Use contingency tables to understand the relationship between categorical variables. For example, is there a relationship between gender (male/female) and type of computer (Mac/PC)?
I love these tables because they organize your data and allow you to answer diverse questions. In this post, learn about contingency tables, including how to interpret, graph, and analyze them.
Example Contingency Table
The contingency table example below displays computer sales at our fictional store. Specifically, it describes sales frequencies by the customer’s gender and the type of computer purchased. It is a two-way table (2 X 2). I cover the naming conventions at the end.
In this contingency table, columns represent computer types and rows represent genders. Cell values are frequencies for each combination of gender and computer type. Totals are in the margins. Notice the grand total in the bottom-right margin.
At a glance, it’s easy to see how two-way tables both organize your data and paint a picture of the results. You can easily see the frequencies for all possible subset combinations along with totals for males, females, PCs, and Macs.
For example, 66 males bought PCs, while 87 females bought Macs. Furthermore, there are 117 females, 106 males, 96 PC sales, 127 Mac sales, and a grand total of 223 observations in the study.
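Because every cell count for this example appears in the post (66 and 87 above, plus 40 males with Macs and 30 females with PCs from the conditional distributions), the table and its margins can be rebuilt in a few lines. This Python sketch uses plain dictionaries; the names `cells` and `add_margins` are illustrative, not from any library.

```python
# Cell counts from the computer-store example: rows are genders, columns are computer types.
cells = {
    "Male":   {"PC": 66, "Mac": 40},
    "Female": {"PC": 30, "Mac": 87},
}
columns = ["PC", "Mac"]

def add_margins(cells, columns):
    """Return row totals, column totals, and the grand total."""
    row_totals = {row: sum(counts[c] for c in columns) for row, counts in cells.items()}
    col_totals = {c: sum(counts[c] for counts in cells.values()) for c in columns}
    grand_total = sum(row_totals.values())
    return row_totals, col_totals, grand_total

row_totals, col_totals, grand_total = add_margins(cells, columns)

# Print the table with totals in the right and bottom margins.
print(f"{'':8}{'PC':>6}{'Mac':>6}{'Total':>7}")
for row, counts in cells.items():
    print(f"{row:8}{counts['PC']:>6}{counts['Mac']:>6}{row_totals[row]:>7}")
print(f"{'Total':8}{col_totals['PC']:>6}{col_totals['Mac']:>6}{grand_total:>7}")
```

The row totals, column totals, and grand total it prints match the 106, 117, 96, 127, and 223 quoted above.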
Marginal and Conditional Distributions in Contingency Tables
Contingency tables are a fantastic way of finding marginal and conditional distributions. These two distributions are types of frequency distributions. Learn more about Frequency Tables: How to Make and Interpret.
Marginal Distribution
These distributions represent the frequency distribution of one categorical variable without regard for other variables. Unsurprisingly, you can find these distributions in the margins of a contingency table.
The following marginal distribution examples correspond to the blue highlights.
For example, the marginal distribution of gender without considering computer type is the following:
- Males: 106
- Females: 117
Alternatively, the marginal distribution of computer types is the following:
- PC: 96
- Mac: 127
Learn more about Marginal Distributions.
Conditional Distribution
For these distributions, you specify the value for one of the variables in the contingency table and then assess the distribution of frequencies for the other variable. In other words, you condition the frequency distribution for one variable by setting a value of the other variable. That might sound complicated, but it’s easy using a contingency table. Just look across one row or down one column.
The following conditional distribution examples correspond to the green highlights.
For example, the conditional distribution of computer type for females is the following:
- PC: 30
- Mac: 87
Alternatively, the conditional distribution of gender for Macs is the following:
- Males: 40
- Females: 87
Learn more about Conditional Distributions.
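Reading across one row or down one column translates directly into code. This is a minimal Python sketch using the computer-store counts; the helper names `conditional_on_row` and `conditional_on_column` are hypothetical.

```python
cells = {
    "Male":   {"PC": 66, "Mac": 40},
    "Female": {"PC": 30, "Mac": 87},
}

def conditional_on_row(cells, row):
    """Distribution of the column variable for one fixed row value (read across a row)."""
    return dict(cells[row])

def conditional_on_column(cells, column):
    """Distribution of the row variable for one fixed column value (read down a column)."""
    return {row: counts[column] for row, counts in cells.items()}

print(conditional_on_row(cells, "Female"))   # {'PC': 30, 'Mac': 87}
print(conditional_on_column(cells, "Mac"))   # {'Male': 40, 'Female': 87}
```

Dividing each value by the sum of the selected row or column turns these counts into conditional proportions.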
Finding Relationships in a Contingency Table
In the contingency table below, the two categorical variables are gender and ice cream flavor preference. This is a two-way table (2 X 3) where each cell represents the number of males or females who prefer a particular ice cream flavor. The CSV datasheet shows one format you can use to enter the data into your software: Flavor Preference.
How do we go about identifying a relationship between gender and flavor preference?
If there is a relationship between ice cream preference and gender, we’d expect the conditional distribution of flavors in the two gender rows to differ. From the contingency table, females are more likely to prefer chocolate (37 vs. 21), while males prefer vanilla (32 vs. 12). Both genders have a roughly equal preference for strawberry. Overall, the two-way table suggests that males and females have different ice cream preferences.
The Total column indicates the researchers surveyed 66 females and 71 males. Because we have roughly equal numbers, we can compare the raw counts directly. However, when you have unequal groups, use percentages to compare them.
Row and Column Percentages in Contingency Tables
Row and column percentages help you draw conclusions when you have unequal numbers in the margins. In the contingency table example above, more women than men prefer chocolate, but when group sizes differ, how do we know a difference like that isn’t simply due to one group being larger? Use percentages to adjust for unequal group sizes. Percentages are relative frequencies. Learn more about Relative Frequencies and their Distributions.
Here’s how to calculate row and column percentages in a two-way table.
- Row Percentage: Take a cell value and divide by the cell’s row total.
- Column Percentage: Take a cell value and divide by the cell’s column total.
For example, the row percentage of females who prefer chocolate is simply the number of observations in the Female/Chocolate cell divided by the row total for women: 37 / 66 = 56%.
The column percentage for the same cell is the frequency of the Female/Chocolate cell divided by the column total for chocolate: 37 / 58 = 63.8%.
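Both calculations can be sketched in Python for the whole ice cream table. The strawberry counts are not stated directly in the text; this sketch assumes 17 females and 18 males, obtained by subtracting the chocolate and vanilla counts from the row totals (66 − 37 − 12 and 71 − 21 − 32). The function names are illustrative.

```python
# Ice cream counts; strawberry values are derived from the row totals of 66 and 71.
cells = {
    "Female": {"Chocolate": 37, "Strawberry": 17, "Vanilla": 12},
    "Male":   {"Chocolate": 21, "Strawberry": 18, "Vanilla": 32},
}
flavors = ["Chocolate", "Strawberry", "Vanilla"]

def row_percentages(cells):
    """Divide each cell by its row total, so every row sums to 100%."""
    result = {}
    for row, counts in cells.items():
        total = sum(counts.values())
        result[row] = {col: 100 * n / total for col, n in counts.items()}
    return result

def column_percentages(cells, columns):
    """Divide each cell by its column total, so every column sums to 100%."""
    col_totals = {c: sum(counts[c] for counts in cells.values()) for c in columns}
    return {
        row: {c: 100 * counts[c] / col_totals[c] for c in columns}
        for row, counts in cells.items()
    }

rows_pct = row_percentages(cells)
cols_pct = column_percentages(cells, flavors)
print(f"Female/Chocolate row %:    {rows_pct['Female']['Chocolate']:.1f}")  # 56.1
print(f"Female/Chocolate column %: {cols_pct['Female']['Chocolate']:.1f}")  # 63.8
```

The only difference between the two functions is the denominator: the row total for row percentages and the column total for column percentages.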
Interpreting Percentages in a Contingency Table
The contingency table below uses the same raw data as the previous table and displays both row and column percentages. Note how the row percentages sum to 100% in the right margin while the column percentages sum to 100% at the bottom.
Whether you focus on row percentages or column percentages in a contingency table depends on the question you’re answering. In our case, we want to know whether flavor preference depends on gender. Because the two genders appear in separate rows, we’ll look for differences in the row percentages.
56% of females prefer chocolate versus only 29.6% of males. Conversely, 45% of males prefer vanilla, while only 18.2% of females prefer it. These results reconfirm our previous findings using the raw counts.
How to Graph a Contingency Table
You can use bar charts to display a contingency table. The following clustered bar chart shows the row percentages for the previous two-way table. I’ve set the graph to cluster the female and male pairs of bars together for each flavor, making comparisons easier. I think it gives a nice oomph to the tabular results.
This bar chart reiterates our conclusions from the contingency table. Women in this sample prefer chocolate, men favor vanilla, and both genders have a roughly equal preference for strawberry.
Learn more about Bar Charts: Using, Examples and Interpreting.
How to Analyze a Contingency Table
We’ve already looked at various ways to analyze a contingency table. Here are two more methods that take it to another level.
Contingency tables are a fantastic way to display and find various types of probabilities. Use these tables to calculate joint, marginal, and conditional probabilities. I’ve written an article about calculating probabilities using two-way tables, and it includes all the definitions, notation, and formulas you need. Read about Using Contingency Tables to Calculate Probabilities.
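As a minimal sketch of the idea, joint, marginal, and conditional probabilities for the computer-store table can be computed with exact fractions in Python. The helper names are hypothetical, and the sketch covers only these three basic probability types.

```python
from fractions import Fraction

cells = {
    "Male":   {"PC": 66, "Mac": 40},
    "Female": {"PC": 30, "Mac": 87},
}
grand_total = sum(sum(counts.values()) for counts in cells.values())  # 223

def joint(row, col):
    """P(row and col): one cell divided by the grand total."""
    return Fraction(cells[row][col], grand_total)

def marginal_row(row):
    """P(row): the row total divided by the grand total."""
    return Fraction(sum(cells[row].values()), grand_total)

def marginal_col(col):
    """P(col): the column total divided by the grand total."""
    return Fraction(sum(counts[col] for counts in cells.values()), grand_total)

def conditional_col_given_row(col, row):
    """P(col | row): the joint probability divided by the row's marginal probability."""
    return joint(row, col) / marginal_row(row)

print(float(joint("Female", "Mac")))               # P(Female and Mac) = 87/223
print(float(marginal_col("Mac")))                  # P(Mac) = 127/223
print(conditional_col_given_row("Mac", "Female"))  # 29/39, i.e., 87/117
```

The conditional probability reduces to the cell count divided by its row total, which is the same calculation as a row percentage.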
In this post, we looked for a relationship between gender and ice cream preference by noting the differences between counts and row percentages in the contingency table. If we’re using this sample to draw inferences about the entire population of ice cream consumers, we’ll need to use a hypothesis test to evaluate the relationship.
In other words, are the differences we noticed in the sample large enough to support the notion that a relationship exists in the population? Or can we chalk up the differences to random sampling error? Learn how the chi-square test of independence can help us out by analyzing contingency tables!
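To preview what the chi-square test of independence computes, here is a hand-rolled Python sketch of Pearson’s chi-square statistic for the ice cream table. It assumes strawberry counts of 17 (females) and 18 (males), which follow from subtracting the chocolate and vanilla counts from the row totals of 66 and 71. The p-value helper handles only 1 degree of freedom or an even number of degrees of freedom, where closed forms exist. For real work, `scipy.stats.chi2_contingency` is the usual tool.

```python
import math

# Ice cream counts; strawberry values are derived from the row totals of 66 and 71.
observed = [
    [37, 17, 12],  # Female: Chocolate, Strawberry, Vanilla
    [21, 18, 32],  # Male:   Chocolate, Strawberry, Vanilla
]

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_p_value(stat, df):
    """Upper-tail p-value; closed forms cover df = 1 and even df only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df % 2 == 0:
        # For even df, P(X > x) = exp(-x/2) * sum over i < df/2 of (x/2)**i / i!
        half = stat / 2
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    raise ValueError("odd df > 1 needs a general routine such as scipy.stats.chi2.sf")

stat, df = chi_square_independence(observed)
p = chi_square_p_value(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```

Each expected count is the row total times the column total divided by the grand total, the count you’d expect if gender and flavor were unrelated. The chi-square approximation assumes expected counts aren’t too small (a common rule of thumb is at least 5 per cell), which holds for this table. Note that `scipy.stats.chi2_contingency` applies Yates’ continuity correction to 2 X 2 tables by default, so its results for those tables differ slightly from this uncorrected statistic.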
Naming Conventions for Contingency Tables
Contingency tables come in a variety of flavors. The key considerations for naming the types are the number of categorical variables and the number of values for each categorical variable.
Number of Categorical Variables
You must have at least two categorical variables to create a contingency table. When you have two variables, it’s a two-way table. If you have three, it’s a three-way table, and so on.
For example, suppose we run a computer store and record the sales using the two categorical variables of gender and computer type. Those variables create a two-way contingency table. If we add a third categorical variable for store location, it becomes a three-way table.
How do you present a three-way table?
Because contingency tables display in two dimensions, you need multiple tables to represent anything more than a two-way table.
If there are four store locations in our three-way example, we’ll need to use four two-way tables. Each table displays gender and computer type for one location.
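Splitting a three-way table into one two-way table per location can be sketched in Python by grouping the records on the third variable first. The records and the location names "North" and "South" are hypothetical, and only two locations are used to keep the example short; `split_three_way` is an illustrative name.

```python
from collections import Counter, defaultdict

def split_three_way(records):
    """Group (location, gender, computer) records into one two-way table per location."""
    by_location = defaultdict(Counter)
    for location, gender, computer in records:
        by_location[location][(gender, computer)] += 1
    return by_location

# Hypothetical sales records: (store location, gender, computer type).
records = [
    ("North", "Female", "Mac"),
    ("North", "Male", "PC"),
    ("South", "Female", "PC"),
    ("South", "Female", "Mac"),
    ("North", "Female", "Mac"),
]

genders = ["Female", "Male"]
computers = ["PC", "Mac"]
for location, counts in sorted(split_three_way(records).items()):
    # Each location gets its own gender x computer-type table.
    print(f"Location: {location}")
    print(f"{'':8}" + "".join(f"{c:>5}" for c in computers))
    for g in genders:
        print(f"{g:8}" + "".join(f"{counts[(g, c)]:>5}" for c in computers))
```

Each printed table holds the third variable (location) at one value, which is how a three-way table is presented in two dimensions.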
Number of Rows and Columns
The rows represent values of one categorical variable, while the columns denote the values of another. Analysts indicate the number of values for each variable by describing these tables as an A X B contingency table, where A represents the number of rows and B signifies the number of columns.
For example, at our computer store, Gender has two possible values and computer type has two values (PC and Mac). Hence, we have a 2 X 2 contingency table. If we add a third type of computer as a new column, it becomes a 2 X 3 table.
Qaseem Siddiqui says
Very nice analysis
Nick says
Hi Jim,
Thank you for your awesome explanation; I learned a lot. Can we extend a 2×2 contingency table to a 2×3 contingency table with 2×2 data under some assumptions and constraints? Let’s say we are introducing another type of computer (maybe IBM, for example). Thanks
Jim Frost says
Hi Nick,
Thanks! So glad it was helpful.
Yes, you can definitely extend it beyond 2 X 2 and your 2 X 3 example is a good one. They can be even larger. You just need two categorical variables where each variable has mutually exclusive categories. Those variables can have more than 2 values.
yvette.mwiza says
What if I want to compare data from two different universities, such as the number of females and males, along with their numbers of admissions and dropouts?
Jim Frost says
Hi Yvette,
You can definitely make contingency tables out of your data because you have categorical data where the values are mutually exclusive. However, you have three variables: university, gender, and admission status. That means you can’t create a regular two-way table to see all the results at once. You’d need three dimensions to create one three-way table with all those variables.
Instead, you might create separate two-way contingency tables for each university. Each university could have a two-way table with columns for male and female (possibly non-binary options as well) and rows for admission status (enrolled, dropped out). Something like that.
Archana says
Can a contingency table be prepared for 3 variables?
Jim Frost says
Hi Archana,
You can, but you’ll need to use multiple tables. Each individual table displays the values of two variables, while the third variable is held at one specific value for that table.
Imagine you’re evaluating variables X, Y, Z. And Z can have the values of 1, 2, and 3.
So, you’ll have three tables, one for each value of Z. Each table shows X and Y for the rows and columns. One table will show the X and Y values for Z = 1, the next one for Z = 2, and then Z = 3.
Pedro says
Hi Jim,
Thank you for your great job explaining statistics.
Recently, I was using contingency tables to compare two populations and their symptoms.
I was wondering whether the best way to do it is to analyze each symptom individually or to make a 2×4 table, i.e.:
Symptom    Pneumonia (n=10)    No Pneumonia (n=8)
Cough      9                   1
Fever      2                   3
Fatigue    1                   4
Sweats     2                   3
Because a single patient can have more than one of these symptoms, I wonder whether a 2×4 table would count some observations twice. Is this something that may affect the outcomes?
Jim Frost says
Hi Pedro,
Unfortunately, you can’t create a contingency table with those data. You do have two categorical variables, disease status and symptoms. However, the symptoms are not mutually exclusive, and all categories need to be mutually exclusive. In your example, a patient could have both cough and fever as symptoms, or other combinations of multiple symptoms.