Statistics By Jim

Making statistics intuitive

Poisson Distribution: Definition & Uses

By Jim Frost 11 Comments

What is the Poisson Distribution?

The Poisson distribution is a discrete probability distribution that describes probabilities for counts of events that occur in a specified observation space. It is named after Siméon Denis Poisson.

In statistics, count data represent the number of events or characteristics over a given length of time, area, volume, etc. For example, you can count cigarettes smoked per day, meteors seen per hour, defects in a batch, and occurrences of a particular crime by county.

Ladislaus Bortkiewicz, a Russian economist, used this probability distribution to analyze the annual count of Prussian army officer deaths caused by horse kicks from 1875-1894.

Count data have discrete values consisting of non-negative integers (0, 1, 2, 3, etc.), and their distributions are frequently skewed. These characteristics make using statistical analyses designed for continuous data (e.g., t-tests, least squares regression) potentially problematic.

The distribution below reflects a study area that averages 2.24 counts during the observation period. You can see the distribution itself consists of discrete counts and is right-skewed.

Graph of a Poisson distribution example.

If only we had a special probability distribution designed for this type of data . . . cue the Poisson distribution! This distribution is described by a Probability Mass Function (PMF) because it assigns probabilities to the values of a discrete random variable.
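
For reference, the Poisson PMF can be written out as follows. The symbols X (the random count) and k (a particular count value) are notation I am assuming here for illustration; λ is the mean count.

```latex
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
```

Both the mean and the variance of this distribution equal λ.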

The Poisson distribution is defined by a single parameter, lambda (λ), which is the mean number of occurrences during an observation unit. A rate of occurrence is simply the mean count per standard observation period. For example, a call center might receive an average of 32 calls per hour.
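
To make the role of lambda concrete, here is a minimal Python sketch that evaluates the Poisson probabilities for the mean of 2.24 counts used in the example graph. The function name `poisson_pmf` and the choice to stop at a count of 10 are my own assumptions for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson distribution with mean lam."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 2.24  # mean count from the example graph in this post

# Probabilities for counts 0 through 10 (the cutoff of 10 is arbitrary).
probabilities = {k: poisson_pmf(k, lam) for k in range(11)}

for k, p in probabilities.items():
    print(f"P(X = {k}) = {p:.4f}")
```

Because lambda is the only input besides the count itself, changing that one number reshapes the entire distribution.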

To estimate lambda, simply calculate the sample’s mean rate of occurrence. Lambda is also a parameter for the exponential and gamma distributions. These three distributions all model different aspects of a Poisson process. Read my posts about the exponential distribution and gamma distribution to learn about their relationship with the Poisson distribution.
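
As a small sketch of that estimate, the Python snippet below averages a set of hourly call counts. The data are made up for illustration and the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical hourly call counts (made-up data for illustration only).
calls_per_hour = [29, 35, 31, 34, 30, 33, 32, 32]

# The sample mean of the counts serves as the estimate of lambda.
lambda_hat = mean(calls_per_hour)

print(f"Estimated lambda: {lambda_hat} calls per hour")
```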

Related post: Understanding Probability Distributions

Using the Poisson Distribution in Statistical Analyses

Analysts frequently use this probability distribution for quality control, survival analysis, and insurance analysis.

The Poisson distribution can help you estimate probabilities for counts of occurrences. For example, it can calculate the likelihood of horse kicks killing three or more Prussian officers in a year.
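
A probability such as "three or more" is easiest to get from the complement: one minus the probability of zero, one, or two events. The Python sketch below shows the arithmetic. The rate of 0.61 is an assumed value that I chose for illustration, not a figure from this post, and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def prob_at_least(k_min: int, lam: float) -> float:
    """Return P(X >= k_min) using the complement 1 - P(X <= k_min - 1)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min))


lam = 0.61  # assumed mean deaths per observation unit (illustrative only)

print(f"P(X >= 3) = {prob_at_least(3, lam):.4f}")
```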

Hypothesis tests that use the Poisson distribution assess the rate of occurrence. For example, Poisson Rate Tests can determine whether the difference between the count of customer complaints per day at two stores is statistically significant.
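
One way to compare two Poisson rates is a conditional exact test: if both stores share the same rate, then given the combined total, the count from the first store follows a binomial distribution. The Python sketch below implements that idea. It is one possible approach under that assumption and not necessarily the procedure your statistical software uses. The complaint counts and the function names are hypothetical.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Return the binomial probability of k events in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_two_rate_test(count_a: int, days_a: float,
                          count_b: int, days_b: float) -> float:
    """Two-sided p-value for H0: both stores have the same complaint rate.

    Given the combined total, store A's count is binomial with success
    probability days_a / (days_a + days_b) when the two rates are equal.
    """
    total = count_a + count_b
    p_null = days_a / (days_a + days_b)
    observed = binom_pmf(count_a, total, p_null)
    # Sum the probabilities of every outcome no more likely than the observed one.
    p_value = sum(
        prob
        for prob in (binom_pmf(k, total, p_null) for k in range(total + 1))
        if prob <= observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, p_value)


# Hypothetical data: complaints counted over 30 days at each store.
p_value = poisson_two_rate_test(count_a=45, days_a=30, count_b=28, days_b=30)
print(f"p-value = {p_value:.4f}")
```

The observation lengths enter only through their ratio, so the same function handles stores that were observed for different numbers of days.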

Poisson regression models determine how changes in the independent variables correspond to changes in the counts of events that the dependent variable measures. For example, these models can evaluate how multiple independent variables predict the count of gold medals that countries win in the Olympics.
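
To give a feel for what a Poisson regression model estimates, here is a minimal Python sketch with a single independent variable. It assumes the common log link, log(mean count) = b0 + b1 × x, and fits the coefficients by Newton's method. The data and variable names are hypothetical, and real analyses would normally use a statistics package instead of hand-rolled code.

```python
import math


def fit_poisson_regression(x, y, max_iter=50, tol=1e-10):
    """Fit log(mean count) = b0 + b1 * x by Newton's method (maximum likelihood)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a symmetric 2 x 2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times the score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: one predictor and a count outcome for six countries.
predictor = [1, 2, 3, 4, 5, 6]
gold_medals = [1, 2, 2, 4, 6, 9]

b0, b1 = fit_poisson_regression(predictor, gold_medals)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"Each one-unit increase multiplies the expected count by {math.exp(b1):.3f}")
```

With a log link, the coefficients act multiplicatively on the expected count, which is why the last line exponentiates b1.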

Normal Approximation of the Poisson Distribution

The normal distribution can adequately approximate the Poisson distribution when the mean (λ) is approximately 20 or more. The normal approximation uses lambda and the square root of lambda for its mean and standard deviation, respectively. In general, as lambda increases, the distribution becomes less skewed and increasingly approximates the normal distribution, as shown below.
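
As a rough numerical check of the approximation, the Python sketch below compares the exact Poisson probability of 30 or fewer events with the normal approximation when lambda is 25. The cutoff of 30 is an arbitrary choice of mine, and the continuity correction (adding 0.5) is a standard adjustment that I am assuming here.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Exact P(X <= k) for a Poisson distribution with mean lam."""
    return sum(lam ** i * math.exp(-lam) / math.factorial(i) for i in range(k + 1))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(Z <= x) for a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


lam = 25  # mean of the Poisson distribution
k = 30    # arbitrary cutoff chosen for illustration

exact = poisson_cdf(k, lam)
# Normal approximation: mean = lambda, standard deviation = sqrt(lambda).
# The + 0.5 is a continuity correction for approximating a discrete count.
approx = normal_cdf(k + 0.5, mean=lam, sd=math.sqrt(lam))

print(f"Exact Poisson P(X <= {k}): {exact:.4f}")
print(f"Normal approximation:      {approx:.4f}")
```

Rerunning the comparison with a small lambda, such as 2, is a quick way to see the approximation degrade as the distribution becomes more skewed.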

Graphs showing how the shape of the Poisson distribution changes based on the value of lambda.

The probability plot below shows a normal distribution that closely follows a Poisson distribution with a lambda of 25.

Graph that shows how the normal distribution approximates the Poisson distribution with a lambda of 25.

Related post: Normal Distribution

Requirements for the Poisson Distribution

A variable follows a Poisson distribution when the following conditions are true:

  • Data are counts of events.
  • All events are independent.
  • The average rate of occurrence does not change during the period of interest.

The last two points relate to an assumption that statisticians refer to as Independent and Identically Distributed (IID) Data.
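
A Poisson distribution has a variance equal to its mean, so one rough screening step is to compare the two in your sample. The Python sketch below computes that ratio for made-up counts. Treat it as an informal heuristic under that assumption, not a formal test, and note that the data and variable names are hypothetical.

```python
from statistics import mean, variance

# Hypothetical counts per observation period (made-up data for illustration).
counts = [2, 0, 3, 1, 4, 2, 1, 3, 2, 5, 0, 2]

sample_mean = mean(counts)
sample_var = variance(counts)  # sample variance (n - 1 denominator)

# A ratio near 1 is consistent with a Poisson distribution. Ratios well above 1
# suggest overdispersion, and ratios well below 1 suggest underdispersion.
dispersion = sample_var / sample_mean

print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}, ratio = {dispersion:.3f}")
```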

Comparing the Poisson and Binomial Distributions

The Poisson and binomial distributions are similar because they both model the occurrence of events. However, the Poisson distribution places no upper bound on the count per observation unit. For example, while the number of meteors observed per hour might fall within a typical range, the Poisson distribution does not impose an upper limit.

Conversely, the binomial distribution calculates the probability of an event occurring a particular number of times in a set number of trials. Specifically, it calculates the likelihood of X events happening within N trials. For the binomial distribution, the number of events (X) cannot be greater than the number of trials. For example, it can calculate the probability of getting seven heads during ten coin tosses. Obviously, the number of heads cannot exceed the number of coin tosses.
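
The coin toss example can be worked through in a few lines of Python. This sketch assumes a fair coin (probability of heads = 0.5), which the example above implies but does not state, and the function name is hypothetical.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) events in n independent trials with event probability p."""
    if x < 0 or x > n:
        return 0.0  # the count can never exceed the number of trials
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


# Seven heads in ten tosses of a fair coin.
print(f"P(7 heads in 10 tosses)  = {binomial_pmf(7, 10, 0.5):.4f}")

# Eleven heads in ten tosses is impossible, so the probability is zero.
print(f"P(11 heads in 10 tosses) = {binomial_pmf(11, 10, 0.5):.4f}")
```

The explicit bound check is the key difference from the Poisson case, where any non-negative count has a positive probability.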

Related post: Binomial and other Distributions for Binary Data

Filed Under: Probability Tagged With: conceptual, distributions, graphs

Comments

  1. Vincent says

    May 24, 2022 at 6:12 am

    Hi Jim,

    I enjoy reading your posts. Thanks for sharing!

    To check whether a distribution is a Poisson distribution or not, I am confused by the requirement mentioned in this article. Does the 3rd requirement, “The average rate of occurrence does not change during the period of interest,” mean that the variance and mean are the same for the data over a certain period?

    Thank you and have a nice day!

    • Jim Frost says

      May 25, 2022 at 3:15 pm

      Hi Vincent,

      That just means that the average rate is stable in the population. After all, if the rate is changing in the population, it’s going to affect samples drawn from that population. You can see an example of this in my post about using control charts with hypothesis tests. Control charts test whether certain properties are stable. I don’t use a Poisson rate in the examples, but the same principles apply. Read that article and I think the idea will make much more sense!

  2. danellrapozo says

    August 11, 2021 at 12:22 pm

    Hi Jim,

    Can the Poisson Distribution be used to model the number of patients that arrive at a Hospital’s Emergency?

    • Jim Frost says

      August 11, 2021 at 5:45 pm

      Hi Danell,

      It sounds like a reasonable distribution to try. To learn how to determine whether your data fits a Poisson distribution, read my post about Goodness-of-Fit Tests for Discrete Distributions.

  3. Elias Greece says

    August 11, 2021 at 7:25 am

    I would like to add that the Poisson is the limit of the binomial distribution for rare events, i.e., when the a priori probability of a single event, p, is small and the number of trials, N, is large. In this case we use Poisson with λ = Np.

  4. Brion Hurley says

    August 8, 2021 at 7:23 pm

    Last year, I was trying to model 3-pointers made in a game by a basketball player. I was trying to predict whether he would break a record by the end of the season (he didn’t break it, as I predicted). When I tried to model the data in Minitab, it wouldn’t let me use the Poisson or Binomial distribution. I used the 3-parameter Weibull instead to get a pretty good distribution fit, but I think Poisson is the correct distribution. The only assumption violation is that there would be a limit on how many 3-pointers could be made in a game (given the number of shots possible with a time clock).

    Have you run across that before in Minitab? I can randomly generate data from Binomial and Poisson, but cannot fit those distributions to real data using a histogram or their Individual Distribution Identification option.

  5. Shawn says

    August 6, 2021 at 8:31 am

    Great job Jim! I always enjoy your posts.

    • Jim Frost says

      August 6, 2021 at 11:41 pm

      Thanks so much, Shawn!

  6. Gemechu Asfaw says

    August 6, 2021 at 5:48 am

    Thanks

  7. Gemechu Asfaw says

    August 6, 2021 at 3:51 am

    Why is linear regression problematic for count data?

    • Jim Frost says

      August 6, 2021 at 4:03 am

      Hi Gemechu,

      Linear regression is designed for continuous data. Using it with count data might produce predictions of non-integer and negative values, which can’t exist with count data. Additionally, while linear regression does not require the DV to follow the normal distribution (only the residuals need to be normal), it can be more challenging to obtain normal residuals when the DV is skewed. Poisson regression is designed to handle a DV that is count data.

    Copyright © 2023 · Jim Frost · Privacy Policy