Contingency tables are a great way to classify outcomes and calculate different types of probabilities. These tables contain rows and columns that display bivariate frequencies of categorical data. Analysts also refer to contingency tables as crosstabulations (cross tabs), two-way tables, and frequency tables.
Statisticians use contingency tables for a variety of reasons. I love these tables because they both organize your data and allow you to answer a diverse set of questions. In this post, I focus on using them to calculate different types of probabilities. These probabilities include joint, marginal, and conditional probabilities.
Contingency Table Basics
Contingency tables classify outcomes in rows and columns. Table cells at the intersections of rows and columns indicate frequencies of both events coinciding.
For example, the table below displays events for computer sales at a fictional store. Specifically, it describes the frequencies of sales by the customer’s gender and the type of computer purchased. The cells’ counts represent the number of PCs and Macs purchased by both genders. Additionally, the table contains sums for each row and column, along with the grand total of all observations.

| | PC | Mac | Total |
| --- | --- | --- | --- |
| Male | 66 | 40 | 106 |
| Female | 30 | 87 | 117 |
| Total | 96 | 127 | 223 |
At first glance, it’s easy to see how these tables both organize your data and paint a picture of the results. For example, 66 males bought PCs, while 87 females bought Macs. Furthermore, there are a total of 117 females, 106 males, 96 PC sales, 127 Mac sales, and a grand total of 223 observations in the study.
Note that this study assesses completed sales only. However, we could include an additional column for No Sales if we wanted to include that outcome.
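As a minimal sketch, the table can be held in a plain Python dictionary keyed by (gender, computer). The text states only two cells directly (66 and 87); the other two cells (40 and 30) are an inference, derived here by subtracting from the stated row totals. All variable names are illustrative.

```python
# Row totals stated in the text: 106 males, 117 females.
row_totals = {"Male": 106, "Female": 117}

# Two cells are stated in the text (66 and 87). The other two are
# derived by subtraction from the row totals (an inference).
counts = {
    ("Male", "PC"): 66,
    ("Male", "Mac"): row_totals["Male"] - 66,      # 40
    ("Female", "PC"): row_totals["Female"] - 87,   # 30
    ("Female", "Mac"): 87,
}

# Recompute the column totals and the grand total from the cells.
col_totals = {
    computer: sum(n for (_, c), n in counts.items() if c == computer)
    for computer in ("PC", "Mac")
}
grand_total = sum(counts.values())

print(col_totals)   # {'PC': 96, 'Mac': 127}
print(grand_total)  # 223
```

The recomputed column totals and grand total match the figures quoted above, which is a useful consistency check on the derived cells.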
These tables are more flexible than they first appear because they allow you to answer a diverse set of probability questions. What are the joint, marginal, and conditional probabilities of events occurring?
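As you work through the different types of probabilities, keep in mind that, in a general sense, all probabilities equal the following ratio:

$$\text{Probability} = \frac{\text{Number of outcomes of interest}}{\text{Total number of outcomes}}$$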
When using a contingency table to calculate different types of probabilities, it’s just a matter of determining which table values go in the numerator and denominator. All the information you need is right there in the table!
Throughout this post, I’ll first walk you through each type of probability and how to calculate it using a contingency table, allowing you to understand it intuitively. Then, I’ll show you the formal notation and equations so you become familiar with them.
Related post: Probability Fundamentals
How to Calculate Joint Probabilities in Contingency Tables
Joint probabilities are the probabilities that events occur together. For example, what is the joint probability of a Mac purchase by a female?
Contingency tables really shine at highlighting joint probabilities because each cell displays the number of times events occurred together. Those cell values are the joint events for the numerator. The grand total is the number of outcomes for the denominator.
Consequently, to calculate joint probabilities in a contingency table, take each cell count and divide by the grand total.
For our example, the joint probability of females buying Macs equals the value in that cell (87) divided by the grand total (223): 87 / 223 ≈ 0.390.
Joint Probability Notation and Calculations
P(A ∩ B) denotes the joint probability of events A and B occurring together.
For our example, we determined that:
P(Female ∩ Mac) = 0.390
The equation for calculating joint probabilities in a contingency table for a cell in row i, column j is the following:

$$P(\text{Row}_i \cap \text{Column}_j) = \frac{\text{Cell count}_{ij}}{\text{Grand total}}$$
The process for calculating joint probabilities using a contingency table is the following:
- The numerator equals the count of occurrences for the specific combination of events in which you’re interested.
- The denominator equals the grand total number of observations.
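In the table below, the values in parentheses are the joint probabilities for the cells. Joint probabilities for an entire table always sum to 1.

| | PC | Mac | Total |
| --- | --- | --- | --- |
| Male | 66 (0.296) | 40 (0.179) | 106 |
| Female | 30 (0.135) | 87 (0.390) | 117 |
| Total | 96 | 127 | 223 |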
Easy peasy, right?
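The joint probability calculation can be sketched in a few lines of Python. This is an illustration rather than a canonical implementation: the dictionary layout and names are assumptions, and the Male/Mac and Female/PC counts are derived from the stated totals rather than quoted from the text.

```python
# Cell counts keyed by (gender, computer).
counts = {
    ("Male", "PC"): 66,
    ("Male", "Mac"): 40,     # derived: 106 males - 66 PCs
    ("Female", "PC"): 30,    # derived: 117 females - 87 Macs
    ("Female", "Mac"): 87,
}
grand_total = sum(counts.values())  # 223

# Joint probability of each cell = cell count / grand total.
joint = {cell: n / grand_total for cell, n in counts.items()}

for (gender, computer), p in joint.items():
    print(f"P({gender} and {computer}) = {p:.3f}")

# Joint probabilities for a complete table sum to 1
# (up to floating-point rounding).
total_probability = sum(joint.values())
print(round(total_probability, 10))
```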
How to Calculate Marginal Probabilities in Contingency Tables
Marginal probabilities are the probabilities that a single event occurs with no regard to other events in the table. These probabilities do not depend on the condition of another outcome. This lack of dependency differs from joint probabilities (above) and conditional probabilities (below). In our table, the single events are gender (male or female) and computer type (PC or Mac).
In contingency tables, you can locate the marginal probabilities in the row and column totals. Statisticians refer to them as marginal probabilities because you find them in the margins of contingency tables!
Choose the individual event you’re interested in and use the corresponding row or column total in the numerator. Then, use the grand total for the denominator.
For example, if you want to determine the probability for a Mac purchase and disregard gender, you simply take the column total for Mac (127) and divide it by the grand total (223), which gives 127 / 223 ≈ 0.570. Or, if you want to determine the probability of a female purchasing a computer and not consider the type of computer, take the row total for Female (117) and divide by the grand total (223), which gives 117 / 223 ≈ 0.525.
Marginal Probability Notation and Calculations
P(A) denotes the probability of event A occurring. For our example, we determined that:
P(Mac) = 0.570
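The equations for calculating marginal probabilities in a contingency table for row i and column j are the following:

$$P(\text{Row}_i) = \frac{\text{Row}_i \text{ total}}{\text{Grand total}} \qquad P(\text{Column}_j) = \frac{\text{Column}_j \text{ total}}{\text{Grand total}}$$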
The process for calculating marginal probabilities using a contingency table is the following:
- The numerator equals the row or column total for the individual event in which you’re interested.
- The denominator equals the grand total number of observations.
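In the table below, the values in parentheses are marginal probabilities for each condition. The column marginal probabilities (PC and Mac) sum to 1. Similarly, the row marginal probabilities (Male and Female) also sum to 1.

| | PC | Mac | Total |
| --- | --- | --- | --- |
| Male | 66 | 40 | 106 (0.475) |
| Female | 30 | 87 | 117 (0.525) |
| Total | 96 (0.430) | 127 (0.570) | 223 |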
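Here is a hedged sketch of the marginal calculation in Python. It rebuilds the row and column totals from the cell counts and then divides each by the grand total. The names are illustrative, and the Male/Mac and Female/PC counts are derived from the stated totals.

```python
counts = {
    ("Male", "PC"): 66,
    ("Male", "Mac"): 40,     # derived: 106 males - 66 PCs
    ("Female", "PC"): 30,    # derived: 117 females - 87 Macs
    ("Female", "Mac"): 87,
}
grand_total = sum(counts.values())  # 223

# Accumulate the margins: row totals by gender, column totals by computer.
row_totals = {}
col_totals = {}
for (gender, computer), n in counts.items():
    row_totals[gender] = row_totals.get(gender, 0) + n
    col_totals[computer] = col_totals.get(computer, 0) + n

# Marginal probability = row or column total / grand total.
p_gender = {g: t / grand_total for g, t in row_totals.items()}
p_computer = {c: t / grand_total for c, t in col_totals.items()}

print(f"P(Mac) = {p_computer['Mac']:.3f}")      # P(Mac) = 0.570
print(f"P(Female) = {p_gender['Female']:.3f}")  # P(Female) = 0.525
```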
How to Calculate Conditional Probabilities in Contingency Tables
A conditional probability is the probability that an event occurs given that another event has occurred. For example, given that a customer is female, what is the probability she’ll purchase a Mac?
These probabilities sound a bit more complicated, but they are easy to calculate using contingency tables. Let’s answer the following conditional probability questions.
- What is the probability that the purchase will be a Mac given that the customer is female?
- Given a purchase of a PC, what is the probability that the purchaser is a male?
Both of these are conditional probabilities because they provide a “given” event. Assuming that a particular event occurs, what is the probability of the other event occurring?
Fortunately, using contingency tables to calculate conditional probabilities is straightforward. It’s merely a matter of dividing a cell value by a row or column total.
As with a joint probability, we are interested in a particular combination of events that the table records in a cell. Use the cell value of interest in the numerator.
However, unlike joint and marginal probabilities, we do not use the grand total in the denominator. Instead, we’re conditioning the probability on a particular outcome rather than the entire sample space. Consequently, we use the row or column total for the condition event (the “given” in the problem statement) in the denominator.
Let’s determine the probability that the purchase will be a Mac given that the customer is female.
We need to use the female/Mac cell value (87) in the numerator and the female row total (117) in the denominator: 87 / 117 ≈ 0.744.
Let’s try another one. Given a PC sale, what is the probability that the purchaser is male?
We need to use the male/PC cell value (66) in the numerator and the PC column total (96) in the denominator: 66 / 96 ≈ 0.688.
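Both worked examples can be sketched with one small helper. The function name `conditional` and its `given_index` parameter are hypothetical, chosen only for this illustration. The helper divides a cell count by the total of the row or column named as the "given" event.

```python
counts = {
    ("Male", "PC"): 66,
    ("Male", "Mac"): 40,     # derived: 106 males - 66 PCs
    ("Female", "PC"): 30,    # derived: 117 females - 87 Macs
    ("Female", "Mac"): 87,
}

def conditional(counts, cell, given_index):
    """Return the cell count divided by the total for the 'given' category.

    given_index=0 conditions on the row (gender);
    given_index=1 conditions on the column (computer type).
    """
    given_value = cell[given_index]
    given_total = sum(
        n for key, n in counts.items() if key[given_index] == given_value
    )
    return counts[cell] / given_total

# P(Mac | Female): cell 87 over the Female row total 117.
p_mac_given_female = conditional(counts, ("Female", "Mac"), given_index=0)
# P(Male | PC): cell 66 over the PC column total 96.
p_male_given_pc = conditional(counts, ("Male", "PC"), given_index=1)

print(f"{p_mac_given_female:.3f}")  # 0.744
print(f"{p_male_given_pc:.3f}")     # 0.688
```

Note that the denominator changes with the question being asked, which is the key difference from the joint and marginal cases where it is always the grand total.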
Conditional Probability Notation and Calculations
P(A|B) denotes the conditional probability of A occurring given that B has occurred.
For our two examples of conditional probabilities, we determined the following:
P(Mac|Female) = 0.744
P(Male|PC) = 0.688
The equation for the conditional probability of A given B is the following:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Calculating a conditional probability involves using a joint probability in the numerator and a marginal probability in the denominator.
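As a quick check of that statement against the first example, and assuming the standard definition of conditional probability, the grand total (223) cancels, leaving the cell count over the row total:

```latex
P(\text{Mac} \mid \text{Female})
  = \frac{P(\text{Female} \cap \text{Mac})}{P(\text{Female})}
  = \frac{87/223}{117/223}
  = \frac{87}{117}
  \approx 0.744
```

This is why dividing a cell count by a row or column total gives the same answer as dividing a joint probability by a marginal probability.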
The process for calculating conditional probabilities using a contingency table is the following:
- The numerator equals the count of occurrences for the specific combination of events in which you’re interested. This count is in a cell.
- The denominator equals the count of occurrences for the “given” portion of the question. This value can be either a row total or a column total, whichever one includes the cell used in the numerator.
In this post, I use the counts in the cells, row and column totals, and the grand total to calculate probabilities. However, if you have a table that displays probabilities rather than frequencies, you can use the same methodology. Simply enter the probabilities into the ratios rather than the counts. You’ll get the same answers!
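To illustrate that claim, the sketch below computes P(Mac|Female) twice: once from counts and once from a table of joint probabilities. The variable names are illustrative, and the Female/PC and Male/Mac counts are derived from the stated totals.

```python
counts = {
    ("Male", "PC"): 66,
    ("Male", "Mac"): 40,     # derived: 106 males - 66 PCs
    ("Female", "PC"): 30,    # derived: 117 females - 87 Macs
    ("Female", "Mac"): 87,
}
grand_total = sum(counts.values())

# A table of probabilities instead of frequencies.
joint = {cell: n / grand_total for cell, n in counts.items()}

# From counts: cell count over the Female row total.
female_total = counts[("Female", "PC")] + counts[("Female", "Mac")]
from_counts = counts[("Female", "Mac")] / female_total

# From probabilities: joint probability over the marginal P(Female).
p_female = joint[("Female", "PC")] + joint[("Female", "Mac")]
from_probs = joint[("Female", "Mac")] / p_female

print(f"{from_counts:.3f} {from_probs:.3f}")
```

The two results agree up to floating-point rounding because the grand total appears in both the numerator and the denominator and cancels.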
Contingency tables are deceptively simple tools. They display frequency counts for pairs of categorical variables and summarize the relationship between those variables. However, you can also use them to calculate joint, marginal, and conditional probabilities!
Related post: Detecting Relationships in a Contingency Table