What is the Poisson Distribution?
The Poisson distribution is a discrete probability distribution that describes probabilities for counts of events that occur in a specified observation space. It is named after Siméon Denis Poisson.
In statistics, count data represent the number of events or characteristics over a given length of time, area, volume, etc. For example, you can count the number of cigarettes smoked per day, meteors seen per hour, the number of defects in a batch, and the occurrence of a particular crime by county.
Ladislaus Bortkiewicz, a Russian economist and statistician, used this probability distribution to analyze the annual count of Prussian army soldiers killed by horse kicks from 1875 to 1894.
Count data have discrete values comprised of non-negative integers (0, 1, 2, 3, etc.), and their distributions are frequently skewed. These characteristics make using statistical analyses designed for continuous data (e.g., t-tests, least squares regression) potentially problematic.
The distribution below reflects a study area that averages 2.24 counts during the observation period. You can see the distribution itself consists of discrete counts and is right-skewed.
If only we had a special probability distribution designed for this type of data . . . cue the Poisson distribution! The Poisson distribution is described by a Probability Mass Function (PMF) because it assigns probabilities to the values of a discrete random variable.
The Poisson distribution is defined by a single parameter, lambda (λ), which is the mean number of occurrences during an observation unit. A rate of occurrence is simply the mean count per standard observation period. For example, a call center might receive an average of 32 calls per hour.
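To make the role of lambda concrete, here is a minimal sketch of the standard Poisson PMF, P(X = k) = e^(-λ) λ^k / k!, using only Python's standard library. The function name poisson_pmf is my own choice for illustration, and the lambda of 2.24 is the average from the study area example.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the mean count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.24  # mean count per observation period (study area example)

# Print the probability of each count from 0 through 7.
for k in range(8):
    print(f"P(X = {k}) = {poisson_pmf(k, lam):.4f}")
```

Because lambda is the only input besides the count itself, changing it reshapes the entire distribution.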
To estimate lambda, simply calculate the sample’s mean rate of occurrence. Lambda is also a parameter for the exponential and gamma distributions. These three distributions all model different aspects of a Poisson process. Read my posts about the exponential distribution and gamma distribution to learn about their relationship with the Poisson distribution.
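As a quick sketch of that estimate, the snippet below averages a set of hourly call counts. The counts are hypothetical numbers I made up for illustration, and the helper estimate_rate is an assumed name that handles the case where observation periods differ in length by dividing total events by total exposure.

```python
from statistics import mean

# Hypothetical call counts for eight one-hour periods (made-up data).
calls_per_hour = [29, 35, 31, 34, 30, 33, 32, 32]

# With equal-length periods, the estimate of lambda is the sample mean.
lambda_hat = mean(calls_per_hour)
print(f"Estimated lambda: {lambda_hat} calls per hour")

def estimate_rate(counts, exposures):
    """Events per unit of exposure when observation periods differ in length."""
    return sum(counts) / sum(exposures)

# Hypothetical example: 64 calls in 2 hours, then 96 calls in 3 hours.
print(f"Pooled rate: {estimate_rate([64, 96], [2, 3])} calls per hour")
```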
Related post: Understanding Probability Distributions
Using the Poisson Distribution in Statistical Analyses
Analysts frequently use this probability distribution for quality control, survival analysis, and insurance analysis.
The Poisson distribution can help you estimate probabilities for counts of occurrences. For example, it can calculate the likelihood of horse kicks killing three or more Prussian soldiers in a year.
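A probability such as "three or more" is one minus the cumulative probability of two or fewer. Here is a minimal sketch of that calculation. The lambda of 0.7 deaths per year is an illustrative assumption on my part, not a value taken from this post, and the function names are hypothetical.

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k): sum the PMF from 0 through k (0 when k is negative)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k), computed as 1 - P(X <= k - 1)."""
    return 1.0 - poisson_cdf(k - 1, lam)

lam = 0.7  # assumed mean deaths per year, for illustration only
print(f"P(X >= 3) = {prob_at_least(3, lam):.4f}")
```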
Hypothesis tests that use the Poisson distribution assess the rate of occurrence. For example, Poisson Rate Tests can determine whether the difference between the count of customer complaints per day at two stores is statistically significant.
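One common way to compare two Poisson rates is the conditional (exact binomial) approach: if both stores share the same rate, then given the total number of complaints, the count from the first store follows a binomial distribution. The sketch below implements that idea with the standard library. It is one possible approach, not necessarily what any particular statistical package does for its Poisson rate test, and the complaint counts are made up.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_rate_test(count1: int, days1: float, count2: int, days2: float) -> float:
    """Two-sided p-value for H0: both rates are equal (conditional exact test)."""
    n = count1 + count2
    p0 = days1 / (days1 + days2)  # expected share of events under H0
    probs = [binom_pmf(k, n, p0) for k in range(n + 1)]
    observed = probs[count1]
    # Sum the probabilities of all outcomes at least as unlikely as the observed one.
    p_value = sum(pr for pr in probs if pr <= observed * (1 + 1e-7))
    return min(1.0, p_value)

# Hypothetical data: complaints at two stores, each observed for 30 days.
print(f"p-value = {poisson_rate_test(42, 30, 25, 30):.4f}")
```

A small p-value suggests that the difference in complaint rates is larger than chance alone would typically produce.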
Poisson regression models determine how changes in the independent variables correspond to changes in the counts of events that the dependent variable measures. For example, these models can evaluate how multiple independent variables predict the count of gold medals that countries win in the Olympics.
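To give a feel for what a Poisson regression model estimates, here is a bare-bones sketch with a single predictor and the usual log link, log(mean count) = b0 + b1 * x, fit by Newton's method. The function name, the data, and the fixed iteration count are all assumptions for illustration. In practice you would use a statistical package, which also reports standard errors and fit diagnostics.

```python
import math

def fit_poisson_regression(x, y, iterations=25):
    """Maximum likelihood fit of log(mu) = b0 + b1 * x using Newton's method."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only model
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix entries (negative Hessian).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system directly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: one predictor and an event count for six units.
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 6, 9]
b0, b1 = fit_poisson_regression(x, y)
print(f"Intercept: {b0:.3f}, slope: {b1:.3f}")
print(f"Each one-unit increase in x multiplies the expected count by {math.exp(b1):.3f}")
```

Because of the log link, the exponentiated slope is a multiplicative effect on the expected count, and the predicted counts can never be negative.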
Normal Approximation of the Poisson Distribution
The normal distribution can adequately approximate the Poisson distribution when the mean (λ) is ~20 or more. The normal approximation uses lambda as its mean and the square root of lambda as its standard deviation. In general, as lambda increases, the distribution becomes less skewed and increasingly approximates the normal distribution, as shown below.
The probability plot below shows a normal distribution that closely follows a Poisson distribution with a lambda of 25.
Related post: Normal Distribution
Requirements for the Poisson Distribution
A variable follows a Poisson distribution when the following conditions are true:
- Data are counts of events.
- All events are independent.
- The average rate of occurrence does not change during the period of interest.
The last two points relate to an assumption that statisticians refer to as Independent and Identically Distributed (IID) Data.
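One informal check follows from a known property of the Poisson distribution: its mean and variance both equal lambda. The sketch below computes the ratio of the sample variance to the sample mean, which should be near 1 for Poisson data. The counts are hypothetical, the function name is my own, and a ratio near 1 is only consistent with a Poisson distribution rather than proof of one.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return variance(counts) / mean(counts)

# Hypothetical daily defect counts (made-up data).
defects = [2, 0, 3, 1, 4, 2, 1, 3, 2, 2]

ratio = dispersion_index(defects)
print(f"Mean: {mean(defects)}, variance: {variance(defects):.3f}, ratio: {ratio:.3f}")
```

A ratio well above 1 suggests overdispersion, which often happens when events are not independent or the rate changes over time.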
Comparing the Poisson and Binomial Distributions
The Poisson and binomial distributions are similar because they both model the occurrence of events. However, the Poisson distribution places no upper bound on the count per observation unit. For example, while the number of meteors observed per hour might fall within a typical range, the Poisson distribution does not impose an upper limit.
Conversely, the binomial distribution calculates the probability of an event occurring a particular number of times in a set number of trials. Specifically, it calculates the likelihood of X events happening within N trials. For the binomial distribution, the number of events (X) cannot be greater than the number of trials. For example, it can calculate the probability of getting seven heads during ten coin tosses. Obviously, the number of heads cannot exceed the number of coin tosses.
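Here is a short sketch of the coin toss example using the standard binomial formula. The function name binomial_pmf is my own. Note how the probability drops to zero once the number of events exceeds the number of trials, which is the upper bound the Poisson distribution lacks.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x events in n trials, each with probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# Probability of seven heads in ten tosses of a fair coin: 120/1024.
print(binomial_pmf(7, 10, 0.5))

# Eleven heads in ten tosses is impossible, so the probability is zero.
print(binomial_pmf(11, 10, 0.5))
```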
Related post: Binomial and other Distributions for Binary Data
I enjoy reading your posts. Thanks for sharing!
To check whether a distribution is a Poisson distribution, I am confused by a requirement mentioned in this article. Does the third requirement, “The average rate of occurrence does not change during the period of interest,” mean that the variance and mean are the same for the data over a certain period?
Thank you and have a nice day!
Jim Frost says
That just means that the average rate is stable in the population. After all, if the rate is changing in the population, it’s going to affect samples drawn from that population. You can see an example of this in my post about using control charts with hypothesis tests. Control charts test whether certain properties are stable. I don’t use a Poisson rate in the examples, but the same principles apply. Read that article and I think the idea will make much more sense!
Can the Poisson distribution be used to model the number of patients who arrive at a hospital’s emergency department?
Jim Frost says
It sounds like a reasonable distribution to try. To learn how to determine whether your data fits a Poisson distribution, read my post about Goodness-of-Fit Tests for Discrete Distributions.
Elias Greece says
I would like to add that the Poisson distribution is the limit of the binomial distribution for rare events, i.e., when the a priori probability of a single event, p, is very small and the number of trials, N, is large (N > 100). In this case, we use Poisson with λ = Np.
Brion Hurley says
Last year, I was trying to model 3-pointers made in a game by a basketball player. I was trying to predict whether he would break a record by the end of the season (he didn’t break it, as I predicted). When I tried to model the data in Minitab, it wouldn’t let me use the Poisson or binomial distribution. I used the 3-parameter Weibull instead to get a pretty good distribution fit, but I think Poisson is the correct distribution. The only assumption violation is that there would be a limit on how many 3-pointers could be made in a game (given the number of shots possible with a time clock).
Have you run across that before in Minitab? I can randomly generate data from Binomial and Poisson, but cannot fit those distributions to real data using a histogram or their Individual Distribution Identification option.
Great job Jim! I always enjoy your posts.
Jim Frost says
Thanks so much, Shawn!
Gemechu Asfaw says
Why is linear regression problematic for count data?
Jim Frost says
Linear regression is designed for continuous data. Using it with count data might produce predictions for non-integers and negative values, which can’t exist with count data. Additionally, while linear regression does not require the DV to follow the normal distribution (only the residuals need to be normal), it can be more challenging to obtain normal residuals when the DV is skewed. Poisson regression is designed to handle a DV that is count data.