Using Contingency Tables to Calculate Probabilities

By Jim Frost 18 Comments

Contingency tables are a great way to classify outcomes and calculate different types of probabilities. These tables contain rows and columns that display bivariate frequencies of categorical data. Analysts also refer to contingency tables as crosstabulations (crosstabs), two-way tables, and frequency tables.

Statisticians use contingency tables for a variety of reasons. I love these tables because they both organize your data and allow you to answer a diverse set of questions. In this post, I focus on using them to calculate different types of probabilities. These probabilities include joint, marginal, and conditional probabilities.

Contingency Table Basics

Contingency tables classify outcomes in rows and columns. Table cells at the intersections of rows and columns indicate frequencies of both events coinciding.

For example, the table below displays events for computer sales at a fictional store. Specifically, it describes the frequencies of sales by the customer’s gender and the type of computer purchased. The cell counts represent the number of PCs and Macs purchased by each gender. Additionally, the table contains sums for each row and column, along with the grand total of all observations.

Gender | PC | Mac | Total
Male | 66 | 40 | 106
Female | 30 | 87 | 117
Total | 96 | 127 | 223

At first glance, it’s easy to see how these tables both organize your data and paint a picture of the results. For example, 66 males bought PCs, while 87 females bought Macs. Furthermore, there are 117 females, 106 males, 96 PC sales, 127 Mac sales, and a grand total of 223 observations in the study.
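A minimal Python sketch can make the table concrete by storing the cell counts and recomputing the totals quoted above. The `counts` dictionary layout and the helper names `row_totals` and `column_totals` are hypothetical illustrations, not part of any library. The text states two cell counts directly (66 males bought PCs, 87 females bought Macs); the other two are derived from the row totals: 106 - 66 = 40 males bought Macs, and 117 - 87 = 30 females bought PCs.

```python
# Cell counts for the fictional store: rows are genders, columns are computer types.
# Male/Mac (40) and Female/PC (30) are derived from the row totals in the text.
counts = {
    "Male": {"PC": 66, "Mac": 40},
    "Female": {"PC": 30, "Mac": 87},
}


def row_totals(table):
    """Sum each row (gender) across all columns (computer types)."""
    return {row: sum(cells.values()) for row, cells in table.items()}


def column_totals(table):
    """Sum each column (computer type) across all rows (genders)."""
    totals = {}
    for cells in table.values():
        for col, n in cells.items():
            totals[col] = totals.get(col, 0) + n
    return totals


rows = row_totals(counts)
cols = column_totals(counts)
grand_total = sum(rows.values())

print(rows)         # {'Male': 106, 'Female': 117}
print(cols)         # {'PC': 96, 'Mac': 127}
print(grand_total)  # 223
```

The row totals and the column totals must add up to the same grand total, which is a quick sanity check when you transcribe a table by hand.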

Note that this study assesses completed sales only. However, we could include an additional column for No Sales if we wanted to include that outcome.

These tables are more flexible than they first appear because they allow you to answer a diverse set of probability questions. What are the joint, marginal, and conditional probabilities of events occurring?

As you work through the different types of probabilities, keep in mind that, in a general sense, all probabilities equal the following ratio:

$$P = \frac{\text{number of outcomes of interest}}{\text{total number of possible outcomes}}$$

When using a contingency table to calculate different types of probabilities, it’s just a matter of determining which table values go in the numerator and denominator. All the information you need is right there in the table!

Throughout this post, I’ll first walk you through each type of probability and how to calculate it using a contingency table, allowing you to understand it intuitively. Then, I’ll show you the formal notation and equations so you become familiar with them.

Related post: Probability Fundamentals

How to Calculate Joint Probabilities in Contingency Tables

Joint probabilities are the probabilities that events occur together. For example, what is the joint probability of a Mac purchase by a female?

Contingency tables really shine at highlighting joint probabilities because each cell displays the number of times events occurred together. Those cell values are the joint events for the numerator. The grand total is the number of outcomes for the denominator.

Consequently, to calculate joint probabilities in a contingency table, take each cell count and divide by the grand total.

For our example, the joint probability of females buying Macs equals the value in that cell (87) divided by the grand total (223): 87 / 223 = 0.390.
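The same arithmetic as a short Python sketch, assuming the cell counts from the table (with Male/Mac = 40 and Female/PC = 30 derived from the row totals). The function name `joint_probability` is a hypothetical helper for illustration: it divides one cell count by the grand total.

```python
counts = {
    "Male": {"PC": 66, "Mac": 40},
    "Female": {"PC": 30, "Mac": 87},
}


def joint_probability(table, row, col):
    """P(row and col): one cell count divided by the grand total."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    return table[row][col] / grand_total


p = joint_probability(counts, "Female", "Mac")
print(f"P(Female and Mac) = {p:.3f}")  # P(Female and Mac) = 0.390
```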

Joint Probability Notation and Calculations

P(A ⋂ B) is the notation for the joint probability of event “A” and “B” occurring together.

For our example, we determined that:

P(Female ⋂ Mac) = 0.390

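The equation for calculating joint probabilities in a contingency table for a cell in row i, column j is the following:

$$P(A_i \cap B_j) = \frac{n_{ij}}{N}$$

where $A_i$ and $B_j$ are the events for row i and column j, $n_{ij}$ is the count in that cell, and $N$ is the grand total.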

The process for calculating joint probabilities using a contingency table is the following:

The numerator equals the count of occurrences for the specific combination of events in which you’re interested.
The denominator equals the grand total number of observations.

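In the table below, the values in parentheses are the joint probabilities for the cells. Joint probabilities for an entire table always sum to 1.

Gender | PC | Mac | Total
Male | 66 (0.296) | 40 (0.179) | 106
Female | 30 (0.135) | 87 (0.390) | 117
Total | 96 | 127 | 223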
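To check the claim that the joint probabilities sum to 1, this sketch divides every cell by the grand total and adds the results. It assumes the same four cell counts as above (40 and 30 derived from the row totals); the variable names are illustrative only. Floating-point division can leave a tiny rounding error, so the comparison uses a tolerance rather than exact equality.

```python
import math

counts = {
    "Male": {"PC": 66, "Mac": 40},
    "Female": {"PC": 30, "Mac": 87},
}
grand_total = sum(sum(cells.values()) for cells in counts.values())

# Joint probability table: each cell count divided by the grand total.
joint = {
    row: {col: n / grand_total for col, n in cells.items()}
    for row, cells in counts.items()
}

for row, cells in joint.items():
    print(row, {col: round(p, 3) for col, p in cells.items()})
# Male {'PC': 0.296, 'Mac': 0.179}
# Female {'PC': 0.135, 'Mac': 0.39}

total = sum(p for cells in joint.values() for p in cells.values())
print(math.isclose(total, 1.0))  # True
```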

Easy peasy, right?

How to Calculate Marginal Probabilities in Contingency Tables

Marginal probabilities are the probabilities that a single event occurs with no regard to other events in the table. These probabilities do not depend on the condition of another outcome. This lack of dependency differs from joint probabilities (above) and conditional probabilities (below). In our table, the single events are gender (male or female) and computer type (PC or Mac).

In contingency tables, you can locate the marginal probabilities in the row and column totals. Statisticians refer to them as marginal probabilities because you find them in the margins of contingency tables!

Choose the individual event you’re interested in and use the corresponding row or column total in the numerator. Then, use the grand total for the denominator.

For example, if you want to determine the probability of a Mac purchase regardless of gender, you simply take the column total for Mac (127) and divide it by the grand total (223): 127 / 223 = 0.570. Or, if you want to determine the probability that a purchaser is female, regardless of computer type, take the row total for Female (117) and divide by the grand total (223): 117 / 223 = 0.525.
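A sketch of the marginal calculation in Python, under the same assumed cell counts. Because a marginal probability only needs a row or column total, the hypothetical helpers `marginal_row` and `marginal_col` rebuild that total from the cells before dividing by the grand total.

```python
counts = {
    "Male": {"PC": 66, "Mac": 40},
    "Female": {"PC": 30, "Mac": 87},
}


def grand_total(table):
    """Total number of observations in the table."""
    return sum(sum(cells.values()) for cells in table.values())


def marginal_row(table, row):
    """P(row): the row total divided by the grand total."""
    return sum(table[row].values()) / grand_total(table)


def marginal_col(table, col):
    """P(col): the column total divided by the grand total."""
    return sum(cells[col] for cells in table.values()) / grand_total(table)


print(f"P(Mac) = {marginal_col(counts, 'Mac'):.3f}")        # P(Mac) = 0.570
print(f"P(Female) = {marginal_row(counts, 'Female'):.3f}")  # P(Female) = 0.525
```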

Marginal Probability Notation and Calculations

P(A) denotes the probability of event A occurring. For our example, we determined that:

P(Mac) = 0.570

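The equations for calculating marginal probabilities in a contingency table for row i and column j are the following:

$$P(A_i) = \frac{n_{i\cdot}}{N} \qquad P(B_j) = \frac{n_{\cdot j}}{N}$$

where $A_i$ and $B_j$ are the events for row i and column j, $n_{i\cdot}$ is the total for row i, $n_{\cdot j}$ is the total for column j, and $N$ is the grand total.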

The process for calculating marginal probabilities using a contingency table is the following:

The numerator equals the row or column total for the individual event in which you’re interested.
The denominator equals the grand total number of observations.

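In the table below, the values in parentheses are marginal probabilities for each condition. The column marginal probabilities (PC and Mac) sum to 1. Similarly, the row marginal probabilities (Male and Female) also sum to 1.

Gender | PC | Mac | Total
Male | 66 | 40 | 106 (0.475)
Female | 30 | 87 | 117 (0.525)
Total | 96 (0.430) | 127 (0.570) | 223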
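This sketch computes every marginal probability at once and confirms that each set sums to 1, because each set covers the whole sample space. The cell counts are the same assumed values as above, and the names are illustrative.

```python
import math

counts = {
    "Male": {"PC": 66, "Mac": 40},
    "Female": {"PC": 30, "Mac": 87},
}
grand_total = sum(sum(cells.values()) for cells in counts.values())

# Row marginals (gender): each row total over the grand total.
row_marginals = {row: sum(cells.values()) / grand_total for row, cells in counts.items()}

# Column marginals (computer type): each column total over the grand total.
col_marginals = {
    col: sum(cells[col] for cells in counts.values()) / grand_total
    for col in ("PC", "Mac")
}

print({k: round(v, 3) for k, v in row_marginals.items()})  # {'Male': 0.475, 'Female': 0.525}
print({k: round(v, 3) for k, v in col_marginals.items()})  # {'PC': 0.43, 'Mac': 0.57}
print(math.isclose(sum(row_marginals.values()), 1.0))      # True
print(math.isclose(sum(col_marginals.values()), 1.0))      # True
```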

How to Calculate Conditional Probabilities in Contingency Tables

A conditional probability is the probability that an event occurs given that another event has occurred. For example, given that a customer is female, what is the probability she’ll purchase a Mac?

These probabilities sound a bit more complicated, but they are easy to calculate using contingency tables. Let’s answer the following conditional probability questions.

What is the probability that the purchase will be a Mac given that the customer is female?
Given a purchase of a PC, what is the probability that the purchaser is a male?

Both of these are conditional probabilities because they provide a “given” event. Assuming that a particular event occurs, what is the probability of the other event occurring?

Fortunately, using contingency tables to calculate conditional probabilities is straightforward. It’s merely a matter of dividing a cell value by a row or column total.

As with a joint probability, we are interested in a particular combination of events that the table records in a cell. Use the cell value of interest in the numerator.

However, unlike joint and marginal probabilities, we do not use the grand total in the denominator. Because we’re conditioning the probability on a particular outcome rather than the entire sample space, we use the row or column total for the conditioning event (the “given” in the problem statement) in the denominator.

Let’s determine the probability that the purchase will be a Mac given that the customer is female.

We need to use the female/Mac cell value (87) in the numerator and the female row total (117) in the denominator: 87 / 117 = 0.744.

Let’s try another one. Given a PC sale, what is the probability that the purchaser is male?

We need to use the male/PC cell value (66) in the numerator and the PC column total (96) in the denominator: 66 / 96 = 0.688.
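Both conditional probabilities can be sketched in Python, again assuming the same cell counts (40 and 30 derived from the row totals). The hypothetical helpers differ only in which total they use as the denominator: `conditional_given_row` divides by the row total of the "given" event, and `conditional_given_col` divides by the column total.

```python
counts = {
    "Male": {"PC": 66, "Mac": 40},
    "Female": {"PC": 30, "Mac": 87},
}


def conditional_given_row(table, col, given_row):
    """P(col | given_row): cell count divided by the given row's total."""
    return table[given_row][col] / sum(table[given_row].values())


def conditional_given_col(table, row, given_col):
    """P(row | given_col): cell count divided by the given column's total."""
    column_total = sum(cells[given_col] for cells in table.values())
    return table[row][given_col] / column_total


p_mac_given_female = conditional_given_row(counts, "Mac", "Female")
p_male_given_pc = conditional_given_col(counts, "Male", "PC")

print(f"P(Mac | Female) = {p_mac_given_female:.3f}")  # P(Mac | Female) = 0.744
print(f"P(Male | PC) = {p_male_given_pc:.3f}")        # P(Male | PC) = 0.688
```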

Conditional Probability Notation and Calculations

P(A|B) denotes the conditional probability of A occurring given that B has occurred.

For our two examples of conditional probabilities, we determined the following:

P(Mac|Female) = 0.744

P(Male|PC) = 0.688

The equation for the conditional probability of A given B is the following:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Calculating a conditional probability involves using a joint probability in the numerator and a marginal probability in the denominator.
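A quick numeric check of that equation, assuming the same cell counts: computing P(Mac|Female) as a joint probability divided by a marginal probability gives the same answer as dividing the cell count by the row total, because the grand total appears in both and cancels. Variable names are illustrative.

```python
import math

counts = {
    "Male": {"PC": 66, "Mac": 40},
    "Female": {"PC": 30, "Mac": 87},
}
grand_total = sum(sum(cells.values()) for cells in counts.values())

p_female_and_mac = counts["Female"]["Mac"] / grand_total  # joint: 87 / 223
p_female = sum(counts["Female"].values()) / grand_total   # marginal: 117 / 223

# Conditional probability as joint / marginal.
p_mac_given_female = p_female_and_mac / p_female

# Same result as cell count / row total, since the grand total cancels.
print(f"{p_mac_given_female:.3f}")                 # 0.744
print(math.isclose(p_mac_given_female, 87 / 117))  # True
```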

The process for calculating conditional probabilities using a contingency table is the following:

The numerator equals the count of occurrences for the specific combination of events in which you’re interested. This count is in a cell.
The denominator equals the count of occurrences for the “given” portion of the question. This value is either the row total or the column total that includes the cell from the numerator.

Related post: Conditional Probabilities

Wrapping Up!

In this post, I use the counts in the cells, row and column totals, and the grand total to calculate probabilities. However, if you have a table that displays probabilities rather than frequencies, you can use the same methodology. Simply enter the probabilities into the ratios rather than the counts. You’ll get the same answers!
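To illustrate, this sketch starts from a table of joint probabilities rather than counts (each count divided by 223, using the same assumed cell counts) and applies the same ratios. The grand total of a probability table is 1, up to floating-point rounding, and the marginal and conditional answers match the count-based results.

```python
# Joint probabilities instead of counts: each cell is its count divided by 223.
prob_table = {
    "Male": {"PC": 66 / 223, "Mac": 40 / 223},
    "Female": {"PC": 30 / 223, "Mac": 87 / 223},
}

# For a probability table, the "grand total" is 1 (up to rounding error).
grand_total = sum(sum(cells.values()) for cells in prob_table.values())

# Marginal: column total over grand total, exactly as with counts.
p_mac = sum(cells["Mac"] for cells in prob_table.values()) / grand_total

# Conditional: cell value over the "given" row total, exactly as with counts.
p_mac_given_female = prob_table["Female"]["Mac"] / sum(prob_table["Female"].values())

print(f"P(Mac) = {p_mac:.3f}")                        # P(Mac) = 0.570
print(f"P(Mac | Female) = {p_mac_given_female:.3f}")  # P(Mac | Female) = 0.744
```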

Contingency tables are deceptively simple tools. They display frequency counts for pairs of categorical variables and summarize the relationship between them. However, you can also use them to calculate joint, marginal, and conditional probabilities!

Comments

Abdoul Seck says

January 29, 2022 at 5:35 pm

I have a question on probabilities for normally distributed variables. Suppose I have an outcome occurring at the 99th percentile of a Monte Carlo simulation, and it is contingent on another event that has a 95% probability of occurring (or a 5% probability of failing to happen).
What would be the new quantile of the distribution, contingent on the other event? What is the formula?
I used it a few times but couldn’t derive it again. Thanks for the help

Collinz says

January 19, 2022 at 11:44 am

Hello Jim, I hope you are doing well. Sometimes life gets so busy here in Israel, where I have come as a visiting research student from Africa. Straight to my inquiry: I am doing a survey on 5 different dairy farms in Israel to assess the rate of occurrence of mastitis during different seasons (winter and summer). I thought that summarizing this data in a contingency table would give more insight. But the challenge is that I am looking at data recorded during the previous 3 years, which no longer makes sense in the contingency table. The 3 years (2019, 2020, 2021) are acting like replicates. The rows I have are summer and winter, while the columns are the 5 different dairy farms in South Israel. I will be glad to receive your view. Thank you

Kelly Papapavlou says

April 29, 2021 at 4:57 am

Thank you very much for clarifying these points!

Kelly Papapavlou says

April 27, 2021 at 3:24 pm

Dear Jim

Following your advice, I have estimated that when a group of people enters a mall on a Sunday, the conditional probability of ordering a coffee (versus not ordering a coffee) given that they have entered on a Sunday is 0.15, whereas the conditional probability of ordering a coffee given that they have entered on a Saturday is 0.18.

Am I correct to further estimate that if 300 people enter on a Sunday, then 300*0.15 of them are expected to order a coffee?

Many thanks for providing your valuable knowledge

Kelly

- Jim Frost says
  
  April 28, 2021 at 10:29 pm
  
  Hi Kelly,
  
  Yes, that would be the expected value. Of course, there are some assumptions. A key assumption is that you took a representative sample. Also, as with any sample, there is sampling error. This method of calculating probabilities differs from other methods for things like coin flips, drawing cards, etc., where you know the long-term average. This way of estimating the relative frequency of events has additional error because it’s a sample. And, even with coin flips, you won’t get exactly the predicted value. So, just be aware there is that additional source of error using this method, but sometimes it’s the only method you can use!
  
  What you’d conclude is that using the best available information you have, you’d expect 300 * 0.15 = 45 people to order coffee on Sunday.
  
Joe says

March 27, 2021 at 5:44 am

Hi Jim,

Thank you for your enlightening tutorials! I love them. I have a question about your eBooks. Will your eBooks be updated when you release new topics? Are they lifetime eBooks?

- Jim Frost says
  
  March 29, 2021 at 3:21 pm
  
  Hi Joe,
  
  Yes, when I update my ebooks, you get the updates for free. Although, just FYI, my ebooks have well developed content, so there are not frequent updates at this point. But any time they are updated, you get them for free for life!
  
Hameed M. Attom says

February 23, 2021 at 8:07 am

Thanks. Very easy and straightforward method

Ramesh Chandra Das says

February 18, 2021 at 2:34 pm

Thanks a lot. Nice post and nicely explained

Bal Ram Bhui says

February 17, 2021 at 9:01 am

Thanks, Joe, for raising very practical issues. I also get confused by this. In fact, the biggest difficulty is understanding the question and figuring out whether it is joint, marginal, or conditional. Without looking at the contingency table for computer type and gender, what are the possible questions that could come to mind? I can think of an example below:
What is the probability that a purchase is a Mac by a female? What is the probability that a female would buy a Mac? Are these two questions the same? I guess these two are different. Both are conditional; in the former, the purchase is known to be a Mac, and we would like to know whether the buyer is a female. So it would have the value at the intersection of Mac and female in the numerator and total Mac purchases in the denominator. In the latter question, the first event, female, is known, and whether or not the purchase is a Mac is to be determined. In this case, the numerator is still the same, but the denominator would be total females. I stand to be corrected. I am trying to be clear.

What would a layman’s question look like for a joint probability?

joe says

February 16, 2021 at 10:32 pm

Thanks Jim!
Your reply makes it ++clear.

Yes,
the way the question is worded
is truly important.

I imagine that a Mgr. (a ‘biz type) –
might ask such a question
but WITHOUT any of the words:
“Joint”, “Conditional” or “Marginal”.

Then,
to answer that Mgr. ,
you really need to categorize her question
into one of the 3 types
as described in this article.

For example,
you give a nice Tip:

“If the question
involves the pointer words:
…GIVEN THAT…,
then it’s a CONDITIONAL Probability Question!.
Very clear language pointer!.

I wonder
if there are any similar, typical Tip words,
that would indicate
the question being
a Joint VS a Marginal Probability?.

These “pointer words”
in a question,
would really help determine
which specific Probability type to calculate!.

Hope my suggestion makes sense…
(sorry for the long comment).

Joe SF

joe says

February 16, 2021 at 2:35 pm

Thanks Jim,
for yet another clear post!.

for the Q:
“what is the joint probability of a Mac purchase by a female?”
would it not be:
87 / 127
(the total # of Macs sold)…?

because:
223 in the denominator,
is the TOTAL number of PCs and Macs sold.
(not only the total number of Macs sold).

I guess
one has to be careful how the Question is worded,
to interpret it carefully.

You see my confusion,
I hope.

- Jim Frost says
  
  February 16, 2021 at 4:51 pm
  
  Hi Joe,
  
  Thanks for writing. It is easy to get these confused! What you’re asking about gets to the difference between joint probabilities and conditional probabilities.
  
  A joint probability is out of all possible outcomes. So, in this example, it answers the question of: Out of all purchases, what is the probability of a Mac purchase by a female? That’s out of all purchases, so we use the grand total of 223 outcomes in the denominator. For joint probabilities, we’re NOT assuming that one condition occurred, which is why we use the grand total. Instead, joint probabilities are the probability of both occurring together.
  
  However, if we were to use the number of Mac sales in the denominator, we are conditioning the probability on Mac sales, which is what you’re asking about. In that case, we’re answering the question: Given that the purchase is a Mac, what is the probability that it was purchased by a female? In that case, we use the female/Mac cell in the numerator (87) and the total Macs sold (the “given”) in the denominator (127). 87/127 = 0.685. Or, P(Female|Mac) = 0.685. In other words, we’re reducing our sample space down to only Mac sales, hence we use the total Macs sold instead of the grand total.
  
  The trick is to know whether you’re considering joint events out of all possible outcomes (all sales), in which case it is a joint probability. Or, are you conditioning on a specific “given” condition, in which case it is a conditional probability that uses a subset of all possible outcomes (a row or column total).
  
  Wording is important! Also, the notation helps keep it clear as long as you’re familiar with the notation.
  
Bal Ram Bhui says

February 16, 2021 at 10:30 am

Thank you, Jim. Your blogs are so useful and intuitive. I studied it in my academic work, passed the exams well. Reading this from you, it makes me wonder what did I learn and understand in those days. This is very practical and will stamp a permanent knowledge in the brain. Thank you. I have bought your book on Intro to Stat and Regression. Thanks

- Jim Frost says
  
  February 16, 2021 at 4:58 pm
  
  Hi Bal Ram, thanks so much! I’m so happy to hear that my website and books have been helpful! And, thank you for supporting my books!! 🙂
  
Solomon Yemidi says

February 16, 2021 at 7:42 am

interesting and straightforward

Denny S. Fernandez says

February 15, 2021 at 1:16 pm

Thank you Jim. I am just going to teach probabilities this week and am going to try your method. Denny S.

ZAK says

February 15, 2021 at 12:29 pm

Thank you Jim. It is quite informative. Though I am not a statistician and have not studied statistics, it is motivating.
