Use Control Charts with Hypothesis Tests

By Jim Frost 17 Comments

Typically, quality improvement analysts use control charts to assess business processes and don’t have hypothesis tests in mind. Did you know that control charts also provide tremendous benefits in other settings, including hypothesis testing? Spoiler: control charts check an assumption that we often forget about for hypothesis tests!

Before we get to using control charts with hypothesis tests, bear with me while I quickly explain their standard usage in statistical process control (SPC) and quality improvement initiatives.

Control charts plot process data and help you identify common cause and special cause variation. These graphs can determine whether the process is stable and if variability is a problem. If variability is problematic, control charts can determine whether the variability is intrinsic to the process or related to specific sources. By identifying the different sources of variation, you can keep your process stable without over-correction. Control charts guide your remedial actions.

When control charts determine that a process is stable, you can perform additional analyses to draw conclusions about the process. However, an unstable process is unpredictable, and you can’t draw reliable conclusions about its behavior. Any conclusions that you draw today might not be correct tomorrow.

Control charts are linked to business processes, but I’ll make the case that these plots provide tremendous benefits for processes and hypothesis testing that fall outside the realm of quality improvement. I’ll show a real example where control charts gave me a clear answer that would’ve been hard to find otherwise.

Related post: Control Charts: Uses, Example, and Types

Control Charts Can Assess Non-Business Processes

The trick to seeing how control charts work in a wide variety of settings is to enlarge your notion of processes to include non-business processes. After all, instability and variability are problems in many other environments. For instance:

- Teaching is the process of transferring knowledge that is measured by testing.
- People with diabetes have a process for maintaining blood sugar at a stable level.
- I had a process for causing research participants to experience impacts of 6 times their body weight.

These processes can be unstable or stable, have some natural variability, and might have special causes of variability. Assessing these issues can help you improve the processes. Just as with business processes, if your data aren’t stable, the conclusions that you draw using hypothesis tests are unreliable.

Let’s start by showing how control charts can provide crucial information in a non-business process.

The third bullet point above recounts a study that I was a part of. Our study had middle school participants jumping off 24-inch steps 30 times on alternating school days. The research goal was to determine whether these impacts would cause their bone density to increase. We defined the treatment as impacts of six times their body weight. Unfortunately, not all subjects experienced impacts of this magnitude initially.

While the mean of the impacts was above six times the body weight, and a hypothesis test confirmed this, we knew this was not good enough. All subjects should achieve the target impact force.

Using a Control Chart in My Research Study

To devise a solution, I conducted a pilot study and plotted the data from this process on an Xbar-S control chart.

To interpret this chart, you start by looking at the S chart on the bottom, which displays the variability of each subject’s landing impacts. There are no points outside of the control limits. Consequently, this graph indicates that each subject has their own consistent landing style. This variability is in control.

The Xbar chart on the top shows that the overall mean (6.141) is greater than our target. Unfortunately, data points fall outside of the control limits, which indicates that this process is out of control. Different subjects have dramatically different average landing impacts.

Taken together, the control chart indicates that some participants have large impacts consistently while others have lower impacts consistently. This variability is not intrinsic to the process (common cause variation) but is assignable to differences between the participants (special cause variation).
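To make the arithmetic behind an Xbar-S chart concrete, here is a minimal Python sketch. It rests on several assumptions: the impact values are invented for illustration and are not the study data, the function names are hypothetical, and the limits use the textbook S-bar method with equal subgroup sizes, which may differ from the default in a given statistics package.

```python
import math
import statistics


def c4(n):
    """Unbiasing constant for the standard deviation of n normal observations."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


def xbar_s_chart(subgroups):
    """3-sigma limits for Xbar and S charts (assumes equal-size subgroups)."""
    n = len(subgroups[0])
    if n < 2 or any(len(g) != n for g in subgroups):
        raise ValueError("this sketch needs equal-size subgroups of at least 2 values")
    means = [statistics.fmean(g) for g in subgroups]
    sds = [statistics.stdev(g) for g in subgroups]
    grand_mean = statistics.fmean(means)
    s_bar = statistics.fmean(sds)
    c = c4(n)
    a3 = 3.0 / (c * math.sqrt(n))          # Xbar chart factor (A3)
    k = 3.0 * math.sqrt(1.0 - c * c) / c   # S chart half-width as a fraction of s_bar
    return {
        "means": means,
        "sds": sds,
        "xbar_limits": (grand_mean - a3 * s_bar, grand_mean + a3 * s_bar),
        "s_limits": (max(0.0, 1.0 - k) * s_bar, (1.0 + k) * s_bar),
    }


def flag(values, limits):
    """Indices of points outside the (lower, upper) control limits."""
    lower, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical landing impacts (multiples of body weight), one row per subject.
subjects = [
    [5.1, 5.3, 5.0, 5.2, 5.4],
    [6.0, 6.2, 6.1, 5.9, 6.3],
    [7.4, 7.2, 7.5, 7.3, 7.1],
    [6.2, 6.0, 6.4, 6.1, 6.3],
]
chart = xbar_s_chart(subjects)
print("S chart flags:   ", flag(chart["sds"], chart["s_limits"]))
print("Xbar chart flags:", flag(chart["means"], chart["xbar_limits"]))
```

With these made-up numbers, the S chart flags no one while the Xbar chart flags the first and third subjects. That is the same pattern as in the study: each subject lands consistently, but the subjects differ from one another.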

This information guided the corrective measures that we implemented. Had the variability been inherent in the process, we probably would have built higher steps. However, because we could attribute the variability to the subjects, we decided to teach the subjects how to land and have a nurse watch all jumping sessions to provide feedback on the spot. This combination lessened the variability enough so that all impacts were greater than six body weights.

Success! Even though this was not a business process, a control chart provided invaluable information.

This study occurred early in my scientific research career. To learn more about the experiences and challenges I faced in this study, read my post about using applied statistics to expand human knowledge.

Use Control Charts to Test Assumptions for Hypothesis Tests

Control charts verify the assumption that a process is stable. We don’t usually think of applying this assumption to hypothesis tests. However, data for a hypothesis test must also be stable; otherwise, the conclusions aren’t reliable.

To illustrate this point, suppose we need to compare test scores between two groups. You can download the CSV data file: 2TControlCharts. To compare the means, we’ll perform a 2-sample t-test. The results are below.

Related post: How T-Tests Work

The statistical output shows that group A has a higher mean than group B. Furthermore, the p-value of 0.000 (that is, less than 0.0005) indicates that this difference is statistically significant. Group B’s standard deviation is slightly higher, but this test does not assume they are equal. If you perform normality tests on the samples, you’ll find that both groups are consistent with a normal distribution. In any case, our sample sizes are large enough that we don’t have to worry about this assumption. It all looks good, right?
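A 2-sample t-test that does not assume equal variances is commonly known as Welch’s t-test. The following is a minimal sketch of its test statistic and degrees of freedom using only the Python standard library. The scores are invented, not the 2TControlCharts data, and the function name `welch_t` is hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard error contributed by each group.
    se2_a = statistics.variance(a) / na
    se2_b = statistics.variance(b) / nb
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical test scores for two groups.
group_a = [82, 85, 88, 84, 86, 83]
group_b = [78, 74, 80, 76, 79, 75]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")

# Reordering the observations leaves the result unchanged: the test never
# looks at the order in which the data were collected.
t_sorted, _ = welch_t(sorted(group_a), sorted(group_b, reverse=True))
print(f"t after reordering = {t_sorted:.3f}")
```

Turning the statistic into a p-value requires the t distribution. If SciPy is available, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` runs the same test and reports the p-value. The t-test ignores the order of the observations, so it cannot detect instability over time on its own.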

The I-MR charts tell another story!

The I-MR chart for group A indicates that these scores are in control. However, group B has many points that are out of control. Group B is unstable, and you can see the negative trend. It is not valid to draw conclusions from the unstable group even though the data satisfy all of the other assumptions. The difference between the two groups is not constant and depends on when you take your measurements.
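For readers who want to see how an I-MR chart can flag a downward trend, here is a rough Python sketch. It assumes the usual constants for moving ranges of two consecutive points (2.66 and 3.267) and checks only for points beyond the 3-sigma limits, whereas statistical software typically applies additional run tests. Both series are invented and the function name is hypothetical.

```python
import statistics

# Standard constants for moving ranges of span 2:
# 2.66 = 3 / d2 with d2 = 1.128, and 3.267 = D4.
I_FACTOR = 2.66
MR_FACTOR = 3.267


def imr_chart(values):
    """Flag points beyond the 3-sigma limits of the I chart and the MR chart.

    `values` must be in the order the observations were collected.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = statistics.fmean(values)
    mr_bar = statistics.fmean(moving_ranges)
    lower, upper = centre - I_FACTOR * mr_bar, centre + I_FACTOR * mr_bar
    mr_upper = MR_FACTOR * mr_bar
    return {
        "i_limits": (lower, upper),
        "i_flags": [i for i, v in enumerate(values) if v < lower or v > upper],
        "mr_upper": mr_upper,
        # A moving range belongs to the later of its two observations.
        "mr_flags": [i + 1 for i, r in enumerate(moving_ranges) if r > mr_upper],
    }


# Hypothetical scores in time order: one stable series, one drifting downward.
stable = [80, 82, 79, 81, 80, 78, 81, 80, 82, 79,
          80, 81, 79, 80, 82, 78, 80, 81, 79, 80]
drifting = [90 - i + (1 if i % 2 == 0 else -1) for i in range(20)]

print("stable I flags:  ", imr_chart(stable)["i_flags"])
print("drifting I flags:", imr_chart(drifting)["i_flags"])
```

The limits come from the average moving range, which stays small during a steady drift. As a result, the early and late points of the drifting series land outside limits that the stable series stays comfortably inside.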

The I-MR chart for group B displays just one of many types of problems that control charts can detect. Control charts are a valuable addition to your toolbox because other methods can miss these problems.

Using the Different Types of Control Charts

This blog post highlights only the tip of the iceberg for the capabilities of control charts. There are different kinds of control charts you can use based on your data and whether you have subgroups in your data.

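For the examples in this post, the Xbar-S and I-MR charts both assess the mean but look at different forms of variability. The Xbar-S chart assesses data that are in subgroups and tracks the standard deviation within each subgroup, while the I-MR chart assesses individual observations and tracks the moving range between consecutive observations.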

There are other types of control charts for other kinds of data. For example, if you are assessing:

- Proportions, consider using the P Chart before performing a 1 Proportion or 2 Proportion hypothesis test.
- Count data, consider using the U chart before conducting a 1-Sample or 2-Sample Poisson Rate hypothesis test.
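As a hedged illustration of the two attribute charts named in the list above, the sketch below computes the usual 3-sigma limits for a P chart (binomial proportions) and a U chart (Poisson rates per unit). The counts and sample sizes are invented, and the function names are hypothetical.

```python
import math


def p_chart_flags(defectives, sizes):
    """Indices of samples whose proportion falls outside the P chart limits."""
    p_bar = sum(defectives) / sum(sizes)
    flags = []
    for i, (d, n) in enumerate(zip(defectives, sizes)):
        half_width = 3.0 * math.sqrt(p_bar * (1.0 - p_bar) / n)
        lower = max(0.0, p_bar - half_width)   # a proportion cannot be below 0
        upper = min(1.0, p_bar + half_width)   # or above 1
        if not lower <= d / n <= upper:
            flags.append(i)
    return flags


def u_chart_flags(counts, units):
    """Indices of samples whose rate per unit falls outside the U chart limits."""
    u_bar = sum(counts) / sum(units)
    flags = []
    for i, (c, n) in enumerate(zip(counts, units)):
        half_width = 3.0 * math.sqrt(u_bar / n)
        lower = max(0.0, u_bar - half_width)   # a rate cannot be below 0
        if not lower <= c / n <= u_bar + half_width:
            flags.append(i)
    return flags


# Hypothetical data: defective items per sample of 100, in time order.
defectives = [4, 6, 5, 3, 7, 20, 5, 4]
sizes = [100] * 8

# Hypothetical data: defect counts per batch of 10 inspected units.
counts = [3, 5, 4, 2, 6, 4, 15, 3]
units = [10] * 8

print("P chart flags:", p_chart_flags(defectives, sizes))
print("U chart flags:", u_chart_flags(counts, units))
```

If a chart like this flags points, the proportions or rates are not stable, and a subsequent proportion or Poisson rate test on those data deserves the same caution as the t-test example above.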

Learn more about control charts even if you’re not working in the field of quality improvement. They can be tremendously helpful when you’re analyzing data and performing hypothesis tests!

Comments

Gregory C. Alexander says

February 10, 2023 at 9:18 am

I love your practical approach and explanations. And it always frustrated me that too often control charts are taught in control phase of six sigma training. It’s one reason I bifurcated all the tools from the DMAIC general approach and methodology in my training material reference. This is because many of the tools (including control charts) are valuable in different phases. It avoids that “when did we learn that tool?” problem when searching for the tool in the PowerPoint slide dump printouts often used (which I also have abandoned in favor of actual reference material using the Information Mapping (R) methodology).

When I teach hypothesis testing, I emphasize that you always want to visually assess the four “S sounding words” of Shape, Centering, Spread, and Stability for continuous data. 3 of the 4 you can see with a histogram, but stability requires time series. So I introduce control charts in measure phase, and then reinforce it in control phase.

So I have been working hard for some years to counter the situation you mention: “Controls charts verify the assumption that a process is stable. We don’t usually think of applying this assumption to hypothesis tests.”

Kind regards

Terry says

May 13, 2022 at 5:09 pm

Yes. You would use a p chart.

Ilya says

August 12, 2020 at 10:02 am

Is it possible to use control charts for errors generation process evaluation? For example, a web server processes requests from clients, N requests per day. M requests per day fail, i.e. M is number of errors, N>>M. In other words, the server generates a number of errors (failed requests) per day. Can I apply control charts to numbers of failed requests to investigate the error generation process?

Stan Aleeman says

August 6, 2020 at 3:51 pm

Hi Jim, The article is a good demonstration of the use of control charts to test IID. It is also a good demonstration of the need to plot data. A histogram of both A and B on a common axis clearly demonstrates the difference between the two groups. Interestingly, the Shapiro-Wilk test of both groups for normality gives p=0.3845 for group B and p=0.082 for group A. This suggests the groups may be considered normal. But normality plots of both groups show that group A is not normal. The lesson here is that p-values close to the hurdle rate should be carefully scrutinized. If you send me an email address, I will send you the normality plots.

Also, the power curve reports that with 80% power, a difference of +/- 3 units cannot be detected by these data.

- Jim Frost says
  
  August 6, 2020 at 4:01 pm
  
  Hi Stan, I entirely agree with all your points. Check your email for an email from me. Thanks!
  
Michael says

August 3, 2020 at 8:16 am

Great post, thanks for writing it up! When creating a control chart do we have to plot the observations in the order they were recorded?

Thinking about the sample T-test example in this post, couldn’t we make either population look in control or out of control depending on how we sort and plot the data?

Does the order have to have some intrinsic meaning for a control chart to apply?

- Jim Frost says
  
  August 5, 2020 at 12:29 am
  
  Hi Michael,
  
  You do need to record some time related information about when the data are recorded or when the subject/item completed the process (e.g., manufactured). Basically, there is a need to record time data along with the observations. If it’s an individuals control chart, you’ll need to know the order they were measured/produced. If the chart uses subgroups, you’ll need to record the subgroup to which each observation belongs.
  
  In a nutshell, if you don’t have that time information for each item/subject, you can’t use control charts. That information is crucial for these charts.
  
  Thanks for asking the great question!
  
Mary A Marion says

November 30, 2019 at 11:57 am

Can you expand on whether to run two tailed or one tailed hypothesis tests when using control charts?

- Jim Frost says
  
  November 30, 2019 at 5:44 pm
  
  Hi Mary,
  
  Control charts don’t help you decide whether to use a one-tailed or two-tailed test. That decision is based on theory and the goals of the analysis. For more information, read my post about one- and two-tailed hypothesis tests, and a follow-up post about why you should only rarely use one-tailed tests.
  
Toni Segovia says

May 26, 2019 at 12:07 pm

Hi Jim,

Thank you for sharing your knowledge.

I am doing a retrospective study of a process using historical data. These are individual measurements.

1- I assessed the normal distribution of the data.
2- Took the first 100 measurements and plotted them in an I-MR chart. Called it my baseline data. Calculated control limit values. Locked the control limits for next step.
3- I plotted the whole series of historical data (500 values = 100 (previously used in step 2) + 400 (following values)) in a longer I-MR chart.

Then I visually inspect the second I-MR chart and see two things:
For the next 100 values (until value 200), the values remain within the control limits. After that, values exceed the upper control limit on and off for the rest of the series. On the other hand, if I look at the general trend of the whole series, I seem to perceive an upward trend from the beginning to the end of the series. The type of regression model that best fits a plot of Response (Y): data vs. Predictor (X): time is a quadratic one with an R-Sq = 34%.

MY QUESTION: In order to look for an assignable cause, how can I discriminate if the process (a) became unstable at a specific point in time (value 200) OR actually (b) was drifting upwards from earlier time until it crossed the upper control limit? Are there any statistical tools that could help me determine if I am faced with one situation or the other?

Thank you very much in advance and keep up your good work.

Best regards,
Toni

R says

February 15, 2019 at 1:18 pm

Hi Jim,

I would like to know the difference between hypothesis testing and statistical process control results. For example, assume there are a number of out-of-control observations in a certain period of a control chart (e.g., annual revenues). If I use classic hypothesis testing to compare the mean of these out-of-control points (e.g., 2017 and 2018 revenues) with the mean of the other points in the same chart that are in control (e.g., 2005-2016 revenues), would I get a statistically significant result? And why?

In other words, are classic hypothesis testing and SPC both based on the same statistical theory? Thank you.

- Jim Frost says
  
  February 15, 2019 at 5:08 pm
  
  Hi,
  
  You use control charts and hypothesis tests for different purposes. While there are some graphs, such as histograms and boxplots, that you can use to illustrate the results of hypothesis tests, that’s not the case with control charts.
  
  The purpose of control charts is to determine whether you obtained your sample from a stable population. If the population isn’t stable (e.g., its mean and/or variance are changing), then any hypothesis test you perform using those data is meaningless. For hypothesis test results to be generalizable beyond the sample, you must be drawing your samples from stable populations. When the populations are unstable, the results only apply to the moment in time that you obtain the sample because the population is changing. In other words, if you obtained your sample at a different time, the results would be different due to the instability.
  
  So, hypothesis testing and control charts serve different purposes. You can think of control charts as testing one of the underlying assumptions (that’s often unstated) for hypothesis tests.
  
  It’s true that control charts use some hypothesis-test-like procedures when they perform various tests on the data to identify out-of-control points. But those tests allow the chart to serve its main function of assessing the stability of a population (i.e., whether it’s in control).
  
  Looking at your example, you would not use control charts to determine whether the mean revenue is different. However, you could use control charts to determine whether the populations are stable. If they are stable, you can then use a hypothesis test, such as a t-test, to determine whether the difference between means is statistically significant.
  
VamC (@vamc789) says

January 24, 2019 at 6:31 pm

Thanks for the response Jim!

So, does the same apply to any control chart (variable/attribute) – comparison of stable processes only to tell the differences are significant?

I know usually for stable processes, variation would be little and will be difficult to tell if there is significant difference. So more sample size the better for picking a timeframe?

I did an ANOVA on 3 periods of different sample sizes and they are significantly different from each other as the variation is more and they are not stable processes.

- Jim Frost says
  
  January 24, 2019 at 11:53 pm
  
  Hi,
  
  Yes, while it’s not often taught (it should be!), hypothesis tests assume that you are comparing stable samples, meaning that the mean, proportion, variance, etc., are not changing.
  
  A stable process doesn’t necessarily have a small variability. It just means that the measure of central tendency (whichever one) and the variability are not changing over the course of the time frame. You can have a sample that has a relatively large variability still be in control as long as it’s stable.
  
  When you do have low variability, you actually have relatively more statistical power. In other words, for a given sample size and effect size, when variability is low, you’ll have a greater chance of detecting an effect if it exists in the population (i.e., greater statistical power).
  
  If you’re performing ANOVA and the variability is notably different, use Welch’s ANOVA. With unequal sample sizes, a good rule of thumb is that if any group has twice the variability of another group, you’re already experiencing results that you can’t trust. When sample sizes are equal, the problems are just starting to occur when you get to that threshold, but you’re past it with unequal sample sizes. So, check the magnitude of the differences between group standard deviations. You can read more about this problem in my post about Welch’s ANOVA. In that post, the comments from December 20 refer to what I write about when problems start occurring.
  
  But, if those groups aren’t stable, it’s hard to trust the results to begin with!
  
VamC (@vamc789) says

January 23, 2019 at 6:30 pm

Can we use this method the other way around? Meaning, from a control chart, say a p-chart, select 2 different periods and conduct a test for proportions to see if there is any significant difference? For example, comparing an out-of-control period to an in-control period? Thanks!

- Jim Frost says
  
  January 24, 2019 at 9:45 am
  
  Hi, you can use a proportions test to compare the difference in proportion between two timeframes. However, you’d still need to be sure that the proportions are in control during each of those timeframes. The proportion can vary between timeframes, but each timeframe itself should be in control. If the proportion is out of control in one or both timeframes, then you’re not comparing two stable processes, each with its own proportion. And statistical significance, or lack thereof, does not indicate whether these timeframes are in control.
  
  So, really you should use both in conjunction with each other to get the full picture. If one or both timeframes are out of control, your data are not satisfying a basic requirement of the proportions test.
  
  I hope this helps!
  
Char Paul says

September 25, 2017 at 2:57 am

That was very interesting. I can see the usefulness for continuous data; am going to test it using “means” of Likert scales.
