What is the Poisson Distribution?
The Poisson distribution is a discrete probability distribution that describes probabilities for counts of events that occur in a specified observation space. It is named after Siméon Denis Poisson.
In statistics, count data represent the number of events or characteristics over a given length of time, area, volume, etc. For example, you can count the number of cigarettes smoked per day, meteors seen per hour, the number of defects in a batch, and the occurrence of a particular crime by county.
Ladislaus Bortkiewicz, a Russian economist, used this probability distribution to analyze the annual count of Prussian army officer deaths caused by horse kicks from 1875 to 1894.
Count data have discrete values consisting of non-negative integers (0, 1, 2, 3, etc.), and their distributions are frequently skewed. These characteristics make using statistical analyses designed for continuous data (e.g., t-tests, least squares regression) potentially problematic.
The distribution below reflects a study area that averages 2.24 counts during the observation period. You can see the unimodal distribution itself consists of discrete counts and is right-skewed.
If only we had a special probability distribution designed for this type of data . . . cue the Poisson distribution! Because it assigns probabilities to the values of a discrete random variable, this distribution is described by a Probability Mass Function (PMF).
The Poisson distribution is defined by a single parameter, lambda (λ), which is the mean number of occurrences during an observation unit. A rate of occurrence is simply the mean count per standard observation period. For example, a call center might receive an average of 32 calls per hour.
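To make the role of lambda concrete, here is a minimal Python sketch of the standard Poisson PMF, P(X = k) = λ^k e^(-λ) / k!. It uses the mean of 2.24 from the study-area example; the function name `poisson_pmf` is just an illustrative choice, not part of any particular library.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the mean count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 2.24  # mean count from the study-area example
probabilities = [poisson_pmf(k, lam) for k in range(9)]

# Print the probability of each count from 0 through 8.
for k, p in enumerate(probabilities):
    print(f"P(X = {k}) = {p:.4f}")
```

Because lambda is the only input besides the count itself, changing that one number reshapes the entire distribution.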
To estimate lambda, simply calculate the sample’s mean rate of occurrence. Lambda is also a parameter for the exponential and gamma distributions. These three distributions all model different aspects of a Poisson process. Read my posts about the exponential distribution and gamma distribution to learn about their relationship with the Poisson distribution.
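As a rough illustration of that estimate, the sketch below averages a set of hourly call counts. The data are hypothetical values made up for this example, and `estimate_lambda` is an illustrative helper name. It assumes every observation covers the same length of time.

```python
def estimate_lambda(counts):
    """Estimate lambda as the mean count per observation period.

    Assumes each count covers an observation period of the same length.
    """
    return sum(counts) / len(counts)

# Hypothetical data: calls received in each of ten one-hour periods.
calls_per_hour = [31, 35, 28, 33, 30, 36, 29, 34, 32, 32]

lambda_hat = estimate_lambda(calls_per_hour)
print(f"Estimated lambda: {lambda_hat:.1f} calls per hour")
```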
Lambda is also the expected value of the Poisson distribution. Learn more about Expected Values: Definition, Formula & Finding.
Related post: Understanding Probability Distributions
Using the Poisson Distribution in Statistical Analyses
Analysts frequently use this probability distribution for quality control, survival analysis, and insurance analysis.
The Poisson distribution can help you estimate probabilities for counts of occurrences. For example, it can calculate the likelihood of horse kicks killing three or more Prussian officers in a year.
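Here is a small sketch of how a "three or more" probability can be calculated. It uses the complement rule: subtract the probabilities of 0, 1, and 2 events from 1. The lambda of 0.7 is a hypothetical value chosen only for illustration; it is not taken from Bortkiewicz's data.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the mean count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def prob_at_least(k_min: int, lam: float) -> float:
    """P(X >= k_min), computed as 1 minus the probability of fewer events."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min))

lam = 0.7  # hypothetical mean number of deaths per year (illustration only)
p_three_or_more = prob_at_least(3, lam)
print(f"P(X >= 3) = {p_three_or_more:.4f}")
```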
Hypothesis tests that use the Poisson distribution assess the rate of occurrence. For example, Poisson Rate Tests can determine whether the difference between the counts of customer complaints per day at two stores is statistically significant.
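One way to compare two rates is a conditional exact test: if both stores share the same rate, then given the combined total, the first store's count follows a binomial distribution. The sketch below implements that idea with the standard library. It is one possible approach, not necessarily the method your statistical software uses for its Poisson rate tests, and the complaint counts are hypothetical.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_two_sample_test(count1, exposure1, count2, exposure2):
    """Two-sided conditional exact test that two Poisson rates are equal.

    Under the null hypothesis, count1 given the combined total is binomial
    with success probability exposure1 / (exposure1 + exposure2).
    """
    total = count1 + count2
    p_null = exposure1 / (exposure1 + exposure2)
    observed = binomial_pmf(count1, total, p_null)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    p_value = sum(
        prob
        for prob in (binomial_pmf(k, total, p_null) for k in range(total + 1))
        if prob <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)

# Hypothetical data: complaints at two stores, each observed for 30 days.
p_value = poisson_two_sample_test(45, 30, 25, 30)
print(f"p-value = {p_value:.4f}")
```

The exposure arguments let you compare stores observed for different numbers of days, because the test works with rates rather than raw counts.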
Poisson regression models determine how changes in the independent variables correspond to changes in the counts of events that the dependent variable measures. For example, these models can evaluate how multiple independent variables predict the count of gold medals that countries win in the Olympics.
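To show the mechanics, the sketch below fits a Poisson regression with a single independent variable using Newton-Raphson steps on the log-likelihood. It assumes the usual log link, so log(mean count) = b0 + b1 * x. The data and the function name `fit_poisson_regression` are hypothetical. In practice, you would use a statistical package that also handles multiple predictors and reports standard errors.

```python
import math

def fit_poisson_regression(x, y, iterations=25):
    """Fit log(mean count) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix (the negative Hessian of the log-likelihood).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system for the coefficient updates.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: one predictor and the event count observed at each value.
predictor = [0, 1, 2, 3, 4, 5, 6, 7]
counts = [1, 0, 2, 3, 3, 6, 8, 11]

b0, b1 = fit_poisson_regression(predictor, counts)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"Each one-unit increase multiplies the expected count by {math.exp(b1):.3f}")
```

Because of the log link, the exponentiated slope is a multiplicative effect on the expected count rather than an additive one.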
Normal Approximation of the Poisson Distribution
The normal distribution can adequately approximate the Poisson distribution when the mean (λ) is about 20 or more. Because the variance of a Poisson distribution equals its mean, the normal approximation uses lambda for its mean and the square root of lambda for its standard deviation. In general, as lambda increases, the distribution becomes less skewed and increasingly approximates the normal distribution, as shown below.
The probability plot below shows a normal distribution that closely follows a Poisson distribution with a lambda of 25.
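You can also check the approximation numerically. The sketch below compares exact Poisson cumulative probabilities with the normal approximation for a lambda of 25. It adds a continuity correction (evaluating the normal curve at k + 0.5), which is a common refinement and an assumption of this example rather than something the plot above depends on.

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """Exact P(X <= k) for a Poisson distribution with mean lam."""
    return sum(lam ** i * math.exp(-lam) / math.factorial(i) for i in range(k + 1))

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(Y <= x) for a normal distribution with the given mean and sd."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

lam = 25.0
for k in (15, 20, 25, 30, 35):
    exact = poisson_cdf(k, lam)
    # Normal approximation: mean = lambda, sd = sqrt(lambda), with a
    # continuity correction of 0.5 because the Poisson counts are discrete.
    approx = normal_cdf(k + 0.5, lam, math.sqrt(lam))
    print(f"k = {k}: exact = {exact:.4f}, normal approx = {approx:.4f}")
```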
Related post: Normal Distribution
Requirements for the Poisson Distribution
A variable follows a Poisson distribution when the following conditions are true:
- Data are counts of events.
- All events are independent.
- The average rate of occurrence does not change during the period of interest.
The last two points relate to an assumption that statisticians refer to as Independent and Identically Distributed (IID) Data.
Comparing the Poisson and Binomial Distributions
The Poisson and binomial distributions are similar because they both model the occurrence of events. However, the Poisson distribution places no upper bound on the count per observation unit. For example, while the number of meteors observed per hour might fall within a typical range, the Poisson distribution does not impose an upper limit.
Conversely, the binomial distribution calculates the probability of an event occurring a particular number of times in a set number of trials. Specifically, it calculates the likelihood of X events happening within N trials. For the binomial distribution, the number of events (X) cannot be greater than the number of trials (N). For example, it can calculate the probability of getting seven heads during ten coin tosses. Obviously, the number of heads cannot exceed the number of coin tosses.
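The contrast is easy to see in a short sketch. The binomial function below calculates the coin-toss example and returns zero for any count above the number of trials, while the Poisson function still assigns a small positive probability to large counts. The Poisson mean of 5 is a hypothetical value picked to match the expected number of heads in ten tosses.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x events in n trials with event probability p."""
    if x > n:
        return 0.0  # more events than trials is impossible
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the mean count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Seven heads in ten fair coin tosses.
print(f"Binomial P(X = 7 | N = 10)  = {binomial_pmf(7, 10, 0.5):.4f}")

# Eleven events: impossible for ten trials, but allowed by the Poisson.
print(f"Binomial P(X = 11 | N = 10) = {binomial_pmf(11, 10, 0.5):.4f}")
print(f"Poisson  P(X = 11 | mean 5) = {poisson_pmf(11, 5.0):.4f}")
```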
Related post: Binomial and other Distributions for Binary Data