Chi-Square Goodness of Fit Test: Uses & Examples

What is the Chi Square Goodness of Fit Test?

The chi-square goodness of fit test evaluates whether proportions of categorical or discrete outcomes in a sample follow a population distribution with hypothesized proportions. In other words, when you draw a random sample, do the observed proportions follow the values that theory suggests?

Analysts frequently use the chi-square goodness of fit test to determine whether the proportions of categorical outcomes are all equal. Or the analyst can specify the set of proportions to include in the test. Alternatively, this test can evaluate whether observed outcomes follow a discrete probability distribution, such as the Poisson distribution.

When to Use the Chi Squared Test for Goodness of Fit?

As a hypothesis test, the chi-square goodness of fit test allows you to use your sample to draw conclusions about an entire population. For example, use this test to answer the following questions. Was your sample drawn from a population where the proportions of:

Red, green, blue, and yellow candies are equal?
Cards from a bottomless deck in an online poker game follow the expectations for a fair game?
Local car colors follow the global distribution?
Monthly car accident counts at an intersection follow the Poisson distribution?
Leading digits of numbers in a dataset follow Benford’s law?

Suppose we theorize that a candy’s manufacturing process produces equal numbers of red, green, blue, and yellow candies. If this suspicion is correct, each color comprises 25% of the population. However, if we were to randomly sample the candy, our sample won’t match the population proportions exactly, thanks to random sampling error. We might find 35% red candies, 15% green, 22% blue, and 28% yellow.

Is this difference from our expectations large enough to disprove our hypothesis? Or can we chalk up the difference to random sampling error? The chi-square goodness of fit test can help us out!

Chi-Square Goodness of Fit Test Details

The chi-square goodness of fit test takes counts of observed and expected outcomes and evaluates the differences between them. The process converts the count for each outcome into a proportion of all outcomes.

When the differences between the observed and expected counts are sufficiently large, the test results are statistically significant. In that case, the data provide evidence that you did not draw the sample from a population with the hypothesized proportions.
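
For readers who like to see the arithmetic, the comparison of observed and expected counts is usually written as the statistic below. The notation is an assumption on my part, since the post doesn’t define symbols: O_i is the observed count in category i, E_i is the expected count, n is the sample size, p_i is the hypothesized proportion, and k is the number of categories. The degrees of freedom shown apply when you specify all the proportions yourself rather than estimating them from the data.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n\,p_i,
\qquad \text{df} = k - 1
```

Larger gaps between observed and expected counts produce a larger statistic, which in turn produces a smaller p-value.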

The null and alternative hypotheses for the chi-square goodness of fit test are the following:

Null: The sample data follow the hypothesized distribution.
Alternative: The sample data do not follow the hypothesized distribution.

When the p-value for the chi-square goodness of fit test is less than your significance level, reject the null hypothesis. Your data favor the hypothesis that the sample does not follow the hypothesized distribution.
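
To make that decision rule concrete, here is a minimal sketch in Python using the candy percentages from earlier. The post doesn’t state a sample size, so the code assumes 100 candies (35 red, 15 green, 22 blue, 28 yellow); that number is purely illustrative. The helper names chi_square_gof and chi_square_sf are hypothetical, and the p-value uses a closed-form expression that holds only for whole-number degrees of freedom.

```python
import math


def chi_square_gof(observed, expected_props):
    """Return (statistic, degrees of freedom) for fully specified proportions."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df only."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum
    tail = sum(
        half ** (i - 0.5) / math.gamma(i + 0.5)
        for i in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


# ASSUMPTION: a sample of 100 candies matching the percentages in the text.
observed = [35, 15, 22, 28]            # red, green, blue, yellow
hypothesized = [0.25, 0.25, 0.25, 0.25]
alpha = 0.05

statistic, df = chi_square_gof(observed, hypothesized)
p_value = chi_square_sf(statistic, df)

print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.3f}")
print("Reject the null" if p_value < alpha else "Fail to reject the null")
```

Under that assumed sample size, the statistic is 8.72 with 3 degrees of freedom and the p-value is roughly 0.03, so the test would reject equal proportions at the 0.05 level. A smaller sample with the same percentages could easily give a different answer. In practice, statistical software reports these values for you; in Python, scipy.stats.chisquare is the usual choice.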

Let’s work through two examples using the chi-square goodness of fit test! In one example, we’ll specify the test proportions, and in the other, we’ll see whether our data follow the Poisson distribution. The data for both examples are available in this CSV file: DiscreteGOF.

Learn more about how Chi Square Tests Work.

Chi Squared Test for Goodness of Fit Example

PPG Industries researched global new car colors in 2012. In this example, we want to determine whether a random sample of local car colors follows the global distribution. To perform this chi-square goodness of fit test, we need to know the global proportions and our regional sample proportions.

In this form of the test, the global proportions are the expected values, while the local sample proportions are the observed values.

The table below contains the data.

The OurState column contains the count of car colors we observed. The global proportions are from PPG Industries.

The chi-square goodness of fit test determines whether our local distribution differs from the global distribution.

This table displays a frequency distribution and a relative frequency distribution. For more information, read my post, Relative Frequencies and Their Distributions.

Interpreting the Test Results

The chi-square goodness of fit test assesses the differences between the observed and expected proportions. Because the p-value is less than the significance level, we reject the null hypothesis and conclude that these differences are statistically significant. We conclude that we did not draw our local random sample from a population that follows the global proportions.

The Contribution to Chi-square column indicates that the largest differences occur with gray and red cars. By comparing the observed to expected counts, we can see that our sample has a higher proportion of gray cars and a lower proportion of red cars than the global test proportions.
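
A contribution column like that one is simple to compute yourself: each category contributes (observed − expected)² / expected to the total. The sketch below illustrates the idea in Python. The car counts aren’t reproduced in this text, so it reuses the candy percentages from earlier with an assumed sample of 100 candies; the function name chi_square_contributions is hypothetical.

```python
def chi_square_contributions(labels, observed, expected_props):
    """Return (label, observed, expected, contribution) for each category."""
    n = sum(observed)
    rows = []
    for label, o, p in zip(labels, observed, expected_props):
        e = n * p
        rows.append((label, o, e, (o - e) ** 2 / e))
    return rows


# ASSUMPTION: 100 candies matching the percentages in the candy example.
labels = ["red", "green", "blue", "yellow"]
observed = [35, 15, 22, 28]
rows = chi_square_contributions(labels, observed, [0.25] * 4)

# List the categories that drive the statistic first.
for label, o, e, contrib in sorted(rows, key=lambda r: r[3], reverse=True):
    direction = "above" if o > e else "below"
    print(f"{label:<7} observed={o:>3} expected={e:>5.1f} "
          f"contribution={contrib:.2f} ({direction} expected)")
```

The contributions sum to the chi-square statistic, so the largest ones point to the categories that depart most from the hypothesized proportions. Comparing observed to expected then tells you the direction of each departure.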

Chi-Square Goodness of Fit Test for the Poisson Distribution

The chi-square goodness of fit test can evaluate a sample and see if it follows the Poisson distribution.

The Poisson distribution is a discrete probability distribution that can model counts of events or attributes in a fixed observation space. Many but not all count processes follow this distribution. Consequently, analysts often need to verify whether a set of counts follows the Poisson distribution.

The Poisson distribution is discrete because its values must be integers. Because it uses discrete counts, we can use the chi-square goodness of fit test to evaluate whether data follow the Poisson distribution.

For the Poisson version of this test, the null and alternative hypotheses are the following:

Null: The sample data follow the Poisson distribution.
Alternative: The sample data do not follow the Poisson distribution.

The test uses the same process as the previous example. However, instead of the analyst specifying the expected counts and proportions, the procedure uses values that the Poisson distribution expects. Typically, your software calculates them for you.
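
If you’re curious what the software does behind the scenes, here is a rough sketch in Python. It estimates the Poisson mean from the sample, converts Poisson probabilities into expected counts, and pools the upper tail into a single category. The monthly counts in the example are made up for illustration and are not the inspector’s data, and the function names are hypothetical. Because the mean is estimated from the data, the test loses one extra degree of freedom: categories minus 1 minus 1.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_expected_counts(values, max_category):
    """Build observed and expected counts for categories
    0, 1, ..., max_category - 1, and '>= max_category'."""
    n = len(values)
    lam = sum(values) / n                      # sample mean estimates the Poisson mean

    probs = [poisson_pmf(k, lam) for k in range(max_category)]
    probs.append(1.0 - sum(probs))             # pooled upper tail

    observed = [sum(1 for v in values if v == k) for k in range(max_category)]
    observed.append(sum(1 for v in values if v >= max_category))

    expected = [n * p for p in probs]
    df = len(probs) - 1 - 1                    # one df lost to the estimated mean
    return lam, observed, expected, df


# HYPOTHETICAL data: accident counts for 10 months, invented for this sketch.
monthly_accidents = [0, 1, 2, 1, 3, 2, 0, 4, 1, 2]
lam, observed, expected, df = poisson_expected_counts(monthly_accidents, max_category=4)

print(f"estimated mean = {lam:.2f}, df = {df}")
for k, (o, e) in enumerate(zip(observed, expected)):
    name = f">={k}" if k == len(observed) - 1 else str(k)
    print(f"{name:>3}: observed={o}, expected={e:.2f}")
```

With five categories, this gives 3 degrees of freedom. A common rule of thumb is that the chi-square approximation works best when the expected counts are not too small (often quoted as at least 5), which is one reason the sparse upper categories are pooled.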

Let’s work through an example where a safety inspector monitors car accidents at a bustling intersection. The inspector enters the counts of monthly accidents as shown below.

Each cell signifies the count of accidents for a month. The full dataset covers 50 months.

Now, let’s perform the test!

Related post: Using the Poisson Distribution

Interpreting the Poisson Test Results

The statistical output with its observed and expected counts looks similar to the previous example. In this example, the software calculates the expected counts using the Poisson distribution.

Because the p-value is greater than our significance level of 0.05, we fail to reject the null hypothesis. For distribution tests, failing to reject the null means the sample provides insufficient evidence that the data depart from the specified distribution. That doesn’t prove the fit, but we can reasonably proceed as though our count data follow the Poisson distribution.

Various analyses assume the data follow the Poisson distribution, including Poisson rate analyses and the U chart. Our data are suitable for these analyses.

Finally, the examples in this post involve comparing the p-value to the significance level. As an alternative to using the p-value, you can compare the chi-square value, which is the test statistic, to the critical value in a chi-square table to determine statistical significance. The two methodologies always agree and allow you to draw the same conclusions.
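
Here is a small sketch of the critical value approach in Python. The lookup values are the standard upper-tail critical values for a 0.05 significance level found in chi-square tables. The statistic comes from the candy percentages, assuming a sample of 100 candies, which is an illustrative number rather than one given in the post. The function name is hypothetical.

```python
# Upper-tail chi-square critical values for alpha = 0.05, keyed by degrees of freedom.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def is_significant(statistic, df, table=CRITICAL_VALUES_05):
    """Return True when the statistic exceeds the critical value for df."""
    return statistic > table[df]


# ASSUMPTION: 100 candies matching the percentages in the candy example.
observed = [35, 15, 22, 28]
expected = [25.0, 25.0, 25.0, 25.0]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(f"chi-square = {statistic:.2f}, critical value = {CRITICAL_VALUES_05[df]}")
print("Significant" if is_significant(statistic, df) else "Not significant")
```

A statistic above the critical value corresponds to a p-value below the significance level, which is why the two approaches lead to the same conclusion.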

Comments

Sonia says

March 6, 2024 at 11:22 am

I’m so confused about what kind of test I can use for my data. Basically, we have a database of people and need to compare the results obtained from our database to published results. This isn’t the real scenario, but let’s say I have a database of people who had hip replacements over the past 20 years with data about their demographic information, etc. I would need to determine if aggregate counts from the database match, or are “close enough”, to the data from, say, a government data source to conclude the database is comprehensive. Say I made a table of the number of hip replacements per year and need to compare this to a table of hip replacements per year generated from another “gold standard” data source that I don’t have access to. How can I determine if my results are “close enough” to that gold standard?

- Jim Frost says
  
  March 6, 2024 at 5:00 pm
  
  Hi Sonia,
  
  There are several issues here that will determine the appropriate analysis. I have questions about the nature of your database and the “gold standard” data. Is your database a sample of a larger population or is it its own subpopulation? Perhaps a hospital database that provides full coverage of the population of hospital patients?
  
  And, the gold standard data is probably some governmental data that attempts to use a representative sample to infer the properties of an entire population (perhaps a country).
  
  If your DB and the gold standard are both samples from a population, then you can use a proportions Z or t test or the chi-square goodness of fit test to determine whether the proportions or frequencies differ between the two populations. These are inferential statistical tests designed to work with samples drawn from different populations and determine whether the proportions or frequencies are different. For example, are the proportions of hip replacements in your DB sample and the gold standard sample different?
  
  On the other hand, if your DB represents an entire population (e.g., patients from a specific hospital), you wouldn’t use an inferential test because you have a population value for your DB’s population. If the gold standard data is a representative sample, there should be a confidence interval for the proportion of the population with hip replacements. You’d just check and see if your DB proportion falls within that range.
  
  That’s probably more info than you were hoping for, but the details matter. You need to determine whether you’re comparing two samples or a population with a sample.
  
  There are other complications. You have 20 years of data in your DB. You’ll need to consider whether the hip replacement rate might have changed during that time and what time frame the gold standard covers.
  
  So, lots going on there to consider!
  
Schmoo says

July 1, 2022 at 6:42 am

Hello,

This post is very helpful and very nicely written, thank you!

I was wondering what test to use to identify the distribution of other types of categorical data that are not integers and might be following a different distribution.

Thank you,
Schmoo

BK says

June 25, 2022 at 10:15 pm

Thanks for your post. I would like to ask, with regards to the accidents example, why is the df 3 instead of 4?
Since there are 5 categories – 0,1,2,3, > 4.. df = 5-1 = 4. Also could you advise what N* is?
Thank you

Nik Sahni says

May 4, 2022 at 11:09 pm

Hi Jim – I am running into trouble with zeros in a chi-squared test. My data has the usage of four sites for a type of surgery, and I have a 100% distribution today all in site 1. We then have a distribution based on expectations as certain alternative sites can be used as technology evolves. How do I statistically test if distribution 2 is different than distribution 1? The expectations represent an average of 100 different people’s perspectives, each of which was a row of data (so the 75% for example in Site 1 is the mean, and has a stdev as well).

Site    Today  Expected
Site 1  100%   75%
Site 2  0%     10%
Site 3  0%     5%
Site 4  0%     10%

kanchan Singh says

April 11, 2022 at 9:47 am

Dr. Jim,
Your post has been very valuable for me. I have enjoyed reading it. Written in simple English, I could learn it better. Moreover, your examples are befitting the situation. I have all appreciation for you.
Please continue sending your posts to me in future as well.
