
Statistics By Jim

Making statistics intuitive


Chi-Square Goodness of Fit Test: Uses & Examples

By Jim Frost 4 Comments

The chi-square goodness of fit test evaluates whether proportions of categorical or discrete outcomes in a sample follow a population distribution with hypothesized proportions. In other words, when you draw a random sample, do the observed proportions follow the values that theory suggests?

Analysts frequently use the chi-square goodness of fit test to determine whether the proportions of categorical outcomes are all equal. Or the analyst can specify the set of proportions to include in the test. Alternatively, this test can evaluate whether observed outcomes follow a discrete probability distribution, such as the Poisson distribution.

As a hypothesis test, the chi-square goodness of fit test allows you to use your sample to draw conclusions about an entire population. For example, use this test to answer the following questions. Was your sample drawn from a population where the proportions of:

  • Red, green, blue, and yellow candies are equal?
  • Cards from a bottomless deck in an online poker game follow the expectations for a fair game?
  • Local car colors follow the global distribution?
  • Monthly car accident counts at an intersection follow the Poisson distribution?
  • Leading digits of numbers in a dataset follow Benford’s law?

Photograph of a heterogeneous set of jelly beans by color.

Suppose we theorize that a candy’s manufacturing process produces equal numbers of red, green, blue, and yellow candies. If this theory is correct, each color comprises 25% of the population. However, if we randomly sample the candy, our sample almost certainly won’t match the population proportions exactly, thanks to random sampling error. We might find 35% red candies, 15% green, 22% blue, and 28% yellow.

Is this difference from our expectations large enough to disprove our hypothesis? Or can we chalk up the difference to random sampling error? The chi-square goodness of fit test can help us out!

Related post: Sampling Error: Definition, Sources & Minimizing

Chi-Square Goodness of Fit Test Details

The chi-square goodness of fit test takes counts of observed and expected outcomes and evaluates the differences between them. When you specify hypothesized proportions, the procedure multiplies each one by the sample size to obtain the expected count for that outcome.

When the differences between the observed and expected counts are sufficiently large, the test results are statistically significant. In that case, you can conclude that you did not draw the sample from a population with the hypothesized proportions.
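
For readers who want to see how those differences are combined, here is the standard form of the test statistic. The symbols are notation introduced for this illustration, not from the post: O_i and E_i are the observed and expected counts for outcome i, p_i is its hypothesized proportion, n is the sample size, and k is the number of outcomes. The degrees of freedom shown apply when you specify all the proportions yourself.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n\,p_i,
\qquad \text{df} = k - 1
```

Larger gaps between observed and expected counts produce a larger chi-square value and, in turn, a smaller p-value.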

The null and alternative hypotheses for the chi-square goodness of fit test are the following:

  • Null: The sample data follow the hypothesized distribution.
  • Alternative: The sample data do not follow the hypothesized distribution.

When the p-value for the chi-square goodness of fit test is less than your significance level, reject the null hypothesis. Your data favor the hypothesis that the sample does not follow the hypothesized distribution.
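
To make the decision rule concrete, here is a minimal Python sketch that applies it to the candy percentages from earlier in this post. It assumes a sample of 100 candies, which the post does not state, so the counts are illustrative only. The helper `chi2_sf` is a hypothetical name for a hand-written upper-tail probability that uses only the standard library and works for whole-number degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum of (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (k + 0.5)
    return total

# ASSUMPTION: a sample of 100 candies, so the percentages become counts.
observed = [35, 15, 22, 28]           # red, green, blue, yellow
hypothesized = [0.25, 0.25, 0.25, 0.25]
n = sum(observed)
expected = [n * p for p in hypothesized]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi_square, df)

alpha = 0.05
reject_null = p_value < alpha
print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.3f}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

Under this assumed sample size, the p-value falls below 0.05. The same percentages from a smaller sample would produce a smaller chi-square value, so the conclusion depends on how many candies you actually count.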

Let’s work through two examples using the chi-square goodness of fit test! In one example, we’ll specify the test proportions, and in the other, we’ll see whether our data follow the Poisson distribution. The data for both examples are available in this CSV file: DiscreteGOF.

Learn more about how Chi Square Tests Work.

Chi-Square Goodness of Fit Test Example

PPG Industries researched global new car colors in 2012. In this example, we want to determine whether a random sample of local car colors follows the global distribution. To perform this chi-square goodness of fit test, we need to know the global proportions and our regional sample proportions.

In this form of the test, the global proportions are the expected values, while the local sample proportions are the observed values.

The table below contains the data.

Worksheet that contains the data for car color discrete distribution.

The OurState column contains the count of car colors we observed. The global proportions are from PPG Industries.

The chi-square goodness of fit test determines whether our local distribution differs from the global distribution.

This table displays a frequency distribution and a relative frequency distribution. For more information, read my post, Relative Frequencies and Their Distributions.

Interpreting the Test Results

Chi-squared goodness of fit test results for the discrete distribution of car colors.

The chi-square goodness of fit test assesses the differences between the observed and expected proportions. Because the p-value is less than the significance level, we reject the null hypothesis and conclude that these differences are statistically significant. We conclude that we did not draw our local random sample from a population that follows the global proportions.

The Contribution to Chi-square column indicates that the largest differences occur with gray and red cars. By comparing the observed to expected counts, we can see that our sample has a higher proportion of gray cars and a lower proportion of red cars than the global test proportions.
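
A contribution column of this kind is typically each category's squared difference divided by its expected count. Because the car color counts appear only in the worksheet image, this rough sketch reuses the candy percentages from the start of the post with an assumed sample of 100. The variable names are illustrative.

```python
# ASSUMPTION: 100 candies sampled, so the post's percentages become counts.
categories = ["red", "green", "blue", "yellow"]
observed = [35, 15, 22, 28]
n = sum(observed)
expected = [0.25 * n for _ in categories]

# Each category's share of the chi-square statistic: (observed - expected)^2 / expected
contributions = {
    name: (o - e) ** 2 / e
    for name, o, e in zip(categories, observed, expected)
}

# Direction of each difference, which the contribution alone does not show.
directions = {
    name: "higher than expected" if o > e else "lower than expected"
    for name, o, e in zip(categories, observed, expected)
}

for name in categories:
    print(f"{name:>6}: contribution = {contributions[name]:.2f} ({directions[name]})")

chi_square = sum(contributions.values())
print(f"chi-square = {chi_square:.2f}")
```

The contributions add up to the chi-square statistic, so the categories with the largest values are the ones driving a significant result. Comparing observed to expected counts then tells you the direction of each difference.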

Chi-Square Goodness of Fit Test for the Poisson Distribution

The chi-square goodness of fit test can evaluate a sample and see if it follows the Poisson distribution.

The Poisson distribution is a discrete probability distribution that can model counts of events or attributes in a fixed observation space. Many but not all count processes follow this distribution. Consequently, analysts often need to verify whether a set of counts follows the Poisson distribution.

The Poisson distribution is discrete because its values must be non-negative integers. Because it uses discrete counts, we can use the chi-square goodness of fit test to evaluate whether data follow the Poisson distribution.

For the Poisson version of this test, the null and alternative hypotheses are the following:

  • Null: The sample data follow the Poisson distribution.
  • Alternative: The sample data do not follow the Poisson distribution.

The test uses the same process as the previous example. However, instead of the analyst specifying the expected counts and proportions, the procedure uses values that the Poisson distribution expects. Typically, your software calculates them for you.
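
If you are curious what the software does behind the scenes, the sketch below shows one common approach: estimate the Poisson mean from the sample, convert Poisson probabilities into expected counts, and lump the upper tail into a single category. The monthly counts are made up for illustration and are not the dataset from this post. The cutoff of 4 for the tail category is also an assumption.

```python
import math
from collections import Counter

# HYPOTHETICAL monthly accident counts, not the article's 50-month dataset.
counts = [0, 1, 2, 1, 3, 2, 0, 1, 4, 2, 1, 0, 2, 3, 1, 2, 5, 1, 0, 2]
n = len(counts)
mean = sum(counts) / n  # the Poisson mean is estimated from the sample

def poisson_pmf(k, lam):
    """Probability of exactly k events when the Poisson mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

tail_start = 4  # ASSUMPTION: outcomes of 4 or more share one category

# Probabilities for 0, 1, 2, 3, then everything from tail_start upward.
probabilities = [poisson_pmf(k, mean) for k in range(tail_start)]
probabilities.append(1.0 - sum(probabilities))
expected = [n * p for p in probabilities]

# Observed counts per category, with the tail lumped the same way.
tally = Counter(min(c, tail_start) for c in counts)
observed = [tally.get(k, 0) for k in range(tail_start + 1)]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One degree of freedom is lost for the total and one for the estimated mean.
df = len(observed) - 1 - 1

labels = [str(k) for k in range(tail_start)] + [f">={tail_start}"]
for label, o, e in zip(labels, observed, expected):
    print(f"{label:>3}: observed = {o}, expected = {e:.2f}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

Note that the degrees of freedom equal the number of categories minus one, minus one more for the mean estimated from the data. With five categories, that leaves three. This toy sample is small, so treat the output as a demonstration of the mechanics, not as a reliable test.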

Let’s work through an example where a safety inspector monitors car accidents at a bustling intersection. The inspector enters the counts of monthly accidents as shown below.

Example worksheet that contains the number of accidents per month in each cell.

Each cell signifies the count of accidents for a month. The full dataset covers 50 months.

Now, let’s perform the test!

Related post: Using the Poisson Distribution

Interpreting the Poisson Test Results

The statistical results for the chi-square goodness of fit test for the Poisson distribution.

The statistical output with its observed and expected counts looks similar to the previous example. In this example, the software calculates the expected counts using the Poisson distribution.

Because the p-value is greater than our significance level of 0.05, we fail to reject the null hypothesis. For distribution tests, failing to reject the null means the data are consistent with the specified distribution, although it does not prove that they follow it. We have insufficient evidence to conclude that our count data depart from the Poisson distribution, so it is reasonable to treat them as Poisson.

Various analyses assume the data follow the Poisson distribution, including Poisson rate analyses and the U chart. Our data are suitable for these analyses.

Finally, the examples in this post involve comparing the p-value to the significance level. As an alternative to using the p-value, you can compare the chi-square value, which is the test statistic, to the critical value in a chi-square table to determine statistical significance. For the same significance level and degrees of freedom, the two approaches always agree and allow you to draw the same conclusions.
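
The agreement between the two approaches can be checked with a short sketch. It assumes 3 degrees of freedom and a significance level of 0.05, for which chi-square tables list a critical value of 7.815. The test statistics in the loop are arbitrary illustrative values, and `chi2_sf_df3` is a hypothetical helper that is valid only for 3 degrees of freedom.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability for a chi-square variable with exactly 3 df."""
    half = x / 2.0
    return math.erfc(math.sqrt(half)) + math.exp(-half) * math.sqrt(half) / math.gamma(1.5)

alpha = 0.05
critical_value = 7.815  # tabulated chi-square critical value for df = 3, alpha = 0.05

decisions = []
for statistic in (2.0, 7.0, 8.72, 15.0):  # arbitrary illustrative test statistics
    p_value = chi2_sf_df3(statistic)
    by_p_value = p_value < alpha
    by_critical_value = statistic > critical_value
    decisions.append((statistic, by_p_value, by_critical_value))
    print(f"statistic = {statistic:5.2f}, p-value = {p_value:.4f}, "
          f"reject by p-value: {by_p_value}, reject by critical value: {by_critical_value}")
```

Both columns match for every statistic because the critical value is simply the chi-square value whose upper-tail probability equals the significance level.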


Filed Under: Hypothesis Testing Tagged With: analysis example, conceptual, distributions, interpreting results

Reader Interactions

Comments

  1. Schmoo says

    July 1, 2022 at 6:42 am

    Hello,

    This post is very helpful and very nicely written, thank you!

    I was wondering what test to use to identify the distribution of other types of categorical data that are not integers and might be following a different distribution.

    Thank you,
    Schmoo

  2. BK says

    June 25, 2022 at 10:15 pm

    Thanks for your post. I would like to ask, with regards to the accidents example, why is the df 3 instead of 4?
    Since there are 5 categories – 0,1,2,3, > 4.. df = 5-1 = 4. Also could you advise what N* is?
    Thank you

  3. Nik Sahni says

    May 4, 2022 at 11:09 pm

    Hi Jim – I am running into trouble with zeros in a chi-squared test. My data has the usage of four sites for a type of surgery, and I have a 100% distribution today all in site 1. We then have a distribution based on expectations, as certain alternative sites can be used as technology evolves. How do I statistically test if distribution 2 is different than distribution 1? The expectations represent an average of 100 different people’s perspectives, each of which was a row of data (so the 75% for example in Site 1 is the mean, and has a stdev as well).

              Today    Expected
    Site 1    100%     75%
    Site 2    0%       10%
    Site 3    0%       5%
    Site 4    0%       10%

  4. kanchan Singh says

    April 11, 2022 at 9:47 am

    Dr. Jim,
    Your post has been very valuable for me. I have enjoyed reading it. Written in simple English, I could learn it better. Moreover, your examples are befitting the situation. I have all appreciation for you.
    Please continue sending your posts to me in future as well.


    Copyright © 2023 · Jim Frost · Privacy Policy