What is Joint Probability?
Joint probability is the likelihood that two or more events will coincide. Knowing how to calculate joint probabilities allows you to solve problems such as the following. What is the probability of:
- Getting two heads in two coin tosses?
- Consecutively drawing two aces from a deck of cards?
- The next customer being a woman who buys a Mac computer?
- A bike rental customer getting both a flat front tire and a flat rear tire?
Statisticians use the notation P(A ∩ B) to indicate the joint probability of events “A” and “B” occurring together. For example, P(F ∩ Mac) denotes the likelihood of a female buying a Mac. Equivalent variants of this notation include P(A and B) and P(A, B).
The notation includes the symbol “∩,” which signifies an intersection. This intersection specifies how two or more events, such as A and B, coincide. Consequently, joint probability is also known as the intersection of events.
A Venn diagram is a useful visual tool for understanding intersections because it shows the overlap between sets or events.
There are several ways to find joint probabilities. The following sections discuss three standard methods: joint probability tables, the formula for independent events, and the formula for dependent events.
Related post: Probability Definition and Fundamentals
Joint Probability Table
When dealing with multiple events, creating a table to organize the likelihoods can be helpful. A joint probability table lists the chances of event combinations at each row and column intersection.
Remember how ∩ represents an intersection? That makes sense in a table!
For example, suppose a survey asks people about their favorite color and animal. The researchers organize the results in the table below:
| | Cat | Dog | Other |
| Red | 0.10 | 0.05 | 0.03 |
| Green | 0.08 | 0.12 | 0.05 |
| Blue | 0.03 | 0.06 | 0.02 |
We want to find the likelihood that someone chooses red as their favorite color and a dog as their favorite animal. We can locate the intersection of the “Red” row and the “Dog” column, which is 0.05.
Therefore, the joint probability is:
P(Red ∩ Dog) = 0.05
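The table lookup can be sketched in a few lines of Python. This is only an illustration: the nested dictionary and the helper name joint_probability are assumptions made for this example, while the values come from the survey table above.

```python
# Joint probability table from the survey example:
# rows are favorite colors, columns are favorite animals.
joint = {
    "Red":   {"Cat": 0.10, "Dog": 0.05, "Other": 0.03},
    "Green": {"Cat": 0.08, "Dog": 0.12, "Other": 0.05},
    "Blue":  {"Cat": 0.03, "Dog": 0.06, "Other": 0.02},
}


def joint_probability(table, color, animal):
    """Return P(color ∩ animal) by reading the row/column intersection."""
    return table[color][animal]


p_red_and_dog = joint_probability(joint, "Red", "Dog")
print(p_red_and_dog)  # 0.05
```

Reading a cell is all the "calculation" a joint probability table requires, which is why the intersection symbol maps so naturally onto rows and columns.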
If you have a contingency table that displays frequencies rather than likelihoods, you can use it to calculate joint probabilities. For instance, the previous table might have started as a regular contingency table. Learn how in my post, Using Contingency Tables to Calculate Probabilities.
Now let’s see how to calculate joint probabilities when you know the event likelihoods.
Joint Probability Formula for Independent Events
When two events are independent, the occurrence of one event does not affect the chances of the other event occurring. In this case, we can find the joint probability by multiplying the likelihood of one event by the likelihood of another.
The joint probability formula for independent events is the following:
P(A ∩ B) = P(A) * P(B)
For example, suppose we have a coin that we flip twice. We want to find the chances of getting heads on both the first and second flips. Because each flip is independent, the probability of the first heads is 1/2, and the likelihood of heads on the second flip is also 1/2. Therefore, the joint probability is the following:
P(H1 ∩ H2) = P(heads on first flip) x P(heads on second flip)
= 1/2 x 1/2
= 1/4
Similarly, the joint probability of rolling two sixes on six-sided dice is the following:
P(6₁ ∩ 6₂) = P(6 on first roll) x P(6 on second roll)
= 1/6 x 1/6 = 1/36
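As a quick check of both results, here is a minimal Python sketch of the multiplication rule for independent events. The function name joint_independent is hypothetical, and exact fractions are used so the output matches the hand calculation.

```python
from fractions import Fraction


def joint_independent(*probabilities):
    """Multiply the probabilities of events assumed to be independent."""
    result = Fraction(1)
    for p in probabilities:
        result *= p
    return result


# Two heads in two coin flips.
two_heads = joint_independent(Fraction(1, 2), Fraction(1, 2))
# Two sixes in two rolls of a six-sided die.
two_sixes = joint_independent(Fraction(1, 6), Fraction(1, 6))

print(two_heads)  # 1/4
print(two_sixes)  # 1/36
```

The function is only valid when the events really are independent. For dependent events, the second factor must be a conditional probability.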
Related post: Independent Events
Formula for Dependent Events
We can use the general multiplication rule to calculate joint probabilities for dependent events. This rule allows us to factor in how the occurrence of one event affects the likelihood of the other event. Learn more about the Multiplication Rule.
The joint probability formula for dependent events is the following:
P(A ∩ B) = P(A) * P(B|A)
Here, P(A) represents the chances of event A occurring, while P(B|A) represents the conditional probability of event B occurring, given that event A has already happened. By multiplying these two likelihoods, we can calculate the joint probability of both events coinciding.
To solve this type of problem, you must know how the first event affects the likelihood of the second event.
Related post: Conditional Probability: Definition, Formula & Examples
Example
Suppose we need to calculate the likelihood of drawing two aces consecutively from a standard deck of 52 cards when we don’t replace the cards. Initially, the deck contains four aces, so the likelihood of drawing an ace on the first draw is 4/52 or 1/13. If we draw an ace (event A1), only three aces and 51 cards remain in the deck. Consequently, the conditional probability of drawing another ace (event A2) is now 3/51.
Using the general multiplication rule, we can find the joint probability of drawing two aces in a row:
P(A1 ∩ A2) = P(A1) * P(A2|A1)
P(A1) = 4/52 = 1/13
P(A2|A1) = 3/51
P(A1 ∩ A2) = (1/13) * (3/51) = 3/663 = 1/221
So, the joint probability of drawing two aces in a row is 1/221 or 0.0045.
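The same calculation can be sketched in Python. The helper name joint_dependent and the variable names are assumptions for illustration; the numbers are those of the two-ace example.

```python
from fractions import Fraction


def joint_dependent(p_a, p_b_given_a):
    """General multiplication rule: P(A ∩ B) = P(A) * P(B|A)."""
    return p_a * p_b_given_a


aces, cards = 4, 52

# Probability of an ace on the first draw.
p_first_ace = Fraction(aces, cards)
# After one ace is removed, 3 aces remain among 51 cards.
p_second_ace_given_first = Fraction(aces - 1, cards - 1)

p_two_aces = joint_dependent(p_first_ace, p_second_ace_given_first)

print(p_two_aces)                   # 1/221
print(round(float(p_two_aces), 4))  # 0.0045
```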
In conclusion, joint probabilities are a powerful tool in statistics. They can model complex systems and help us make more informed decisions. Choosing the correct method to calculate them depends on the specific problem at hand.
Kirsten says
OK, think I got it. Your point of “And, another point is that you can’t just declare yourself to be a member of a population and expect it to have an effect on outcomes.” is what I was always stuck on trying to find an “official” way of explaining. Don’t worry, I won’t join the population. After all, Dexter teaches us you are MORE likely to be killed if you are a serial killer! 😀
Kirsten says
I definitely get the main part of this argument and so thank you for it. Glad to find a question you haven’t come across before!
I think where I am stuck is not understanding this part of what you said: “It wouldn’t have to do with the sizes of the populations, because probabilities apply the same to small and large populations.” That flummoxes me! It seems like the probability of some event happening (getting killed by a serial killer) on a specific population would have to change based on how large that population was? If I am a population of 1 based on some variable the chances of X happening to me must be less than the chance of that event happening to someone in a population of 20 million. Like the chance of a person who has a tattoo that says their exact social security number being a victim vs. the chance that a person who has brown hair is a victim. Or to put it another way: if on a roulette wheel there is one instance of the number 20 and 20 instances of the number 1, isn’t it much more likely that the number 1 will come up? I hope I’m explaining this properly. What am I missing here?
Jim Frost says
Hi Kirsten,
I’m sure I didn’t explain that aspect well about size. Let’s take two towns. One has a population of 10,000 and the other has a population of 1,000,000. There are two outcomes, A and B. They’re independent and apply everywhere. Outcome A has a probability of 0.1 and B has a probability of 0.25. The probability of both outcomes occurring to an individual is 0.1 x 0.25 = 0.025. That probability applies to both the larger and smaller town at the individual level. However, of course, the large town will have more people experiencing both outcomes in sheer numbers: 25,000 vs. 250 in the small town. So, you need to distinguish between the individual probability, which is a consistent 0.025, and the sheer numbers, which do change.
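A small Python sketch makes the distinction concrete. The town labels and variable names are made up for illustration; the probabilities and populations are the ones from the two-town example.

```python
from fractions import Fraction

# Individual-level probabilities, assumed independent as in the example.
p_a = Fraction(1, 10)   # 0.1
p_b = Fraction(1, 4)    # 0.25
p_both = p_a * p_b      # 1/40, i.e. 0.025

# Hypothetical town labels with the populations from the example.
towns = {"small town": 10_000, "large town": 1_000_000}

# The probability is the same everywhere; only the expected counts differ.
expected_counts = {name: int(pop * p_both) for name, pop in towns.items()}

print(float(p_both))     # 0.025
print(expected_counts)   # {'small town': 250, 'large town': 25000}
```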
And, another point is that you can’t just declare yourself to be a member of a population and expect it to have an effect on outcomes. When various populations have different outcomes, it’s usually because they have different traits. The population of professional basketball players is very different from the general population in terms of height.
For your example, you need to determine which population you legitimately belong to for any given research question. You’re speculating about being the victim of a serial killer, and you’d have to look at properties that make you more or less likely to be a victim. You’re supposing that serial killers are less likely to be victims of serial killers. Their individual probabilities of being a victim are different based on the characteristics of the serial killer population (not its size like in my example). All hypothetical of course!
Sadly, in your example, you couldn’t just say you’re a serial killer, you’d have to take on their relevant characteristics! Don’t do it Kirsten! It’s not worth it! 😉
Kirsten says
I have a ridiculous question that I have never known how to explain properly in the context of probability. Here goes: The chance of being killed by a serial killer is a small one. Given that serial killers are not a large population, the chances that a serial killer randomly chooses a fellow serial killer to kill would be an even smaller chance. If I decide to become a serial killer to reduce my chances of being killed by a serial killer, can you explain why this would not work? I know it wouldn’t, obviously, to try to place yourself in a different population to alter your chances, but I can’t explain why properly! I told you it was ridiculous! I guess you could use any other characteristic that you could change willingly to sub for becoming a serial killer.
Jim Frost says
Hi Kirsten,
If I had a dollar for every time someone asked me that question, I’d have . . . a dollar! That’s a unique question!
But it’s not actually that ridiculous. Different populations have different characteristics. And those characteristics can affect various outcomes.
If we are to assume that serial killers are less likely to be killed by serial killers (I’m not sure that’s true but let’s make that assumption), there must be some aspect of being a serial killer that changes their probabilities of being a victim of a serial killer. It wouldn’t have to do with the sizes of the populations, because probabilities apply the same to small and large populations.
Instead, it would have something to do with the characteristics of the serial killer population. Perhaps a serial killer is less likely to be killed by one because they know their habits, traits, methods, etc. They can recognize one when they see one. Hence, they’re less likely to be killed by one. It would be something like that.
There are many other more realistic examples you can point to, which is why your question isn’t ridiculous. Consider the population of sedentary people with high BMIs and high cholesterol. They’re more likely to be killed by heart disease (that’s a different type of serial killer I suppose). However, if someone changed their characteristics from those of the less healthy population to those of a more active population with a healthier weight and diet, then they’re less likely to die from heart disease.
Populations have traits that affect outcomes. If you can effectively change your population by some traits (e.g., health risk factors), you can change the probability of various outcomes. Of course, you can’t always change your population. But sometimes you can!
I hope that answers your question!
Woody says
Hi Jim
I am a bit confused about the representation of the probabilities of two independent events. If two events are independent, isn’t the intersection of their sample spaces supposed to be an empty set? Then the corresponding probability should always be 0 ? P(A|B) = P(A) is much understandable …
Jim Frost says
When we say that two events, A and B, are independent, it doesn’t imply that their intersection is empty. In fact, if the intersection were empty (meaning A and B cannot happen at the same time), they would be mutually exclusive, not independent.
Independence between two events means that the occurrence of one event does not affect the probability of occurrence of the other event. Mathematically, we express this as:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵)
The formula you mentioned:
P(A∣B)=P(A)
Is a way to express the independence of two events. It means that the probability of A occurring, given that B has occurred, is just the probability of A occurring by itself—further evidence that B’s occurrence has no effect on A.
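To make the difference between independent and mutually exclusive events concrete, here is a minimal Python sketch using a single roll of a fair six-sided die. The events chosen (even, at most four, odd) and the helper names are assumptions for illustration only.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event (a set of outcomes) with equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


def is_independent(a, b):
    """Check the definition P(A ∩ B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
odd = {1, 3, 5}

# Independent: the intersection {2, 4} is not empty, and 1/3 == 1/2 * 2/3.
print(is_independent(even, at_most_four))  # True

# Mutually exclusive: the intersection is empty, so P = 0, but 1/2 * 1/2 != 0.
print(is_independent(even, odd))  # False

# Equivalent check for independence: P(A|B) == P(A).
p_even_given_at_most_four = prob(even & at_most_four) / prob(at_most_four)
print(p_even_given_at_most_four == prob(even))  # True
```

The same check explains why P(A) x P(B) can differ from an observed joint proportion: the product formula applies only when the events are independent.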
Tuan Vu says
Hi,
I have a problem comparing the probability calculated from a Venn diagram with the one from the joint probability formula for a survey. In the survey, 100 students were asked about their favorite color and pet. Among them, 15 students like red, 25 students like dogs, and 5 students like both red and dogs. Using the joint formula, I get the probability of picking a student who likes both red and dogs: P(A and B) = P(A) x P(B) = 0.15 x 0.25 = 0.0375. Meanwhile, using the Venn diagram, I get P(A and B) = 5/100 = 0.05. Please help me understand why the results are different. Thank you very much.
Best regards,
Tuan
Tom Kendall says
Hi Jim,
Not sure that this is the proper venue for this question. On December 8, 2023 two jackpot winning tickets in the Mega Millions lottery game were sold at the same retail location, specifically a Chevron gas station in Los Angeles. Would you discuss the statistics involved in such an occurrence? As background, the odds of one ticket winning is one in 302,575,350. While having two winning tickets in a draw is a rather remote possibility, having them both being sold at the same retail location is beyond incredible.
Harmeet says
Hi, when did you write this article, I need to reference in my assignment, could you please help me with the year?
Jim Frost says
Hi!
When citing online resources, you typically use an “Accessed” date rather than a publication date because online content can change over time. For more information, read Purdue University’s Citing Electronic Resources.
Dr Peter Altman says
Hi – I have a question related to the Birthday Paradox. In a room of 200 people, what are the chances that 2 people will share the same specific birthday and year, for example 10 October 1941? Many thanks.
Jim Frost says
Hello, that’s a rather complicated question. I can tell you that the probability of having at least two people having a given birthday of month and day (e.g., October 10) in a group of 200 is 0.42. We’ll use that value later.
However, to add in the year is difficult. I’d need to know the distribution of birth years in the overall population. And you’d have to assume that the people in the room were randomly selected. However, a group of 200 will often have some reason for the years being non-random in birth years, such as reunions, anniversary parties, and classes. In those cases, you’d expect truncated distributions centered around different values.
I would think using the birth year distribution for the general population, 1941 would be relatively rare to begin with. So having two people from that year is more rare but then also share the Month and Day would be even rarer. However, if it was during say a class reunion, then most people might be born that year for a certain reunion!
Let’s go with the general population distribution for the US. I did some quick research and back of the envelope calculations. Apparently, there were about 2.7 million people born in the US in 1941. About half are alive today. So, given the current population of the US is 332 million, 1.35m / 332m = 0.004. Hence, approximately 0.4% of the current population were born in 1941. So, to calculate the probability we just multiply 0.004 * 0.004 * 0.42 = 0.00000672. That’s your probability assuming you drew randomly from the US population. That’s minuscule!
However, if you’re not working with a situation where you’d expect the general population distribution or you’re outside the US, the results can be markedly different. For instance, if you’re working with a situation where you’d expect more older people (e.g., a class reunion or wedding anniversary for an older couple), then the probability might be notably higher. Also some states have older populations. Maine and Florida have the highest percentages of those over 65. If you’re in one of those states, the probability will be higher.
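For anyone who wants to retrace the arithmetic, here is a rough Python sketch of the back of the envelope calculation. It only reproduces the multiplication described above: the 0.42 value is taken as given from earlier in this reply, the inputs are the approximate figures quoted, and the variable names are made up for illustration.

```python
import math

# Approximate figures quoted in the reply (not verified here).
births_1941 = 2_700_000
fraction_alive = 0.5
us_population = 332_000_000
p_shared_day = 0.42  # taken as given from the reply, not recomputed

# Share of the current population born in 1941, rounded as in the reply.
p_born_1941 = births_1941 * fraction_alive / us_population
p_born_1941_rounded = round(p_born_1941, 3)  # 0.004

# Two people born in 1941 (treated as independent draws) sharing the day.
estimate = p_born_1941_rounded * p_born_1941_rounded * p_shared_day

print(f"{estimate:.8f}")  # 0.00000672
print(math.isclose(estimate, 6.72e-6))  # True
```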
So, I can’t give you a precise answer but you have an idea of the factors at play!
Jennie says
Why would you multiply 0.004 twice?
Jim Frost says
Hi Jennie,
That’s because 0.004 is the probability that one person drawn randomly from the U.S. population was born in 1941. The original question asks about two people being born that year, so we need to multiply the two values.