One-tailed hypothesis tests offer the promise of more statistical power compared to an equivalent two-tailed design. While there is some debate about when you can use a one-tailed test, the general consensus among statisticians is that you should use two-tailed tests unless you have concrete reasons for using a one-tailed test.

In this post, I discuss when you should and should not use one-tailed tests. I’ll cover the different schools of thought and offer my own opinion.

If you need to learn the basics about these two types of test, please read my previous post: One-Tailed and Two-Tailed Hypothesis Tests Explained.

## Two-Tailed Tests are the Default Choice

The vast majority of hypothesis tests that analysts perform are two-tailed because they can detect effects in both directions. This fact is generally the clincher. In most studies, you are interested in determining whether there is a positive effect or a negative effect. In other words, results in either direction provide essential information. If this statement describes your study, you must use a two-tailed test. There’s no need to read any further. Typically, you need a strong reason to move away from using two-tailed tests.

On the other hand, there are some cases where one-tailed tests are not only a valid option, but truly are a requirement.

Consequently, there is a spectrum that ranges from cases where one-tailed tests are definitely not appropriate to cases where they are required. In the middle of this spectrum, there are cases where analysts might disagree. The breadth of opinions extends from those who believe you should use one-tailed tests for only a few specific situations when they are required to those who are more lenient about their usage.

## A Concrete Rule about Choosing Between One- and Two-Tailed Tests

Despite this disagreement, there is a hard and fast rule about the decision process itself upon which all statisticians agree. You must decide whether you will use a one-tailed or two-tailed test at the beginning of your study before you look at your data. You must not perform a two-tailed analysis, obtain non-significant results, and then try a one-tailed test to see if that is statistically significant. If you plan to use a one-tailed test, make this decision at the beginning of the study and explain why it is the proper choice.

The approach I take is to assume you’ll use a two-tailed test and then move away from that only after carefully determining that a one-tailed test is appropriate for your study. The following are potential reasons for why you might use a one-tailed hypothesis test.

**Related post**: 5 Steps for Conducting Scientific Studies with Statistical Analyses

## One-Tailed Tests Can Be the Only Option

For some hypothesis tests, the mechanics of how a test functions dictate using a one-tailed methodology. Chi-squared tests and F-tests are often one-tailed for this reason.

### Chi-squared tests

Analysts often use chi-squared tests to determine whether data fit a theoretical distribution and whether categorical variables are independent. For these tests, when the chi-squared value exceeds the critical threshold, you have sufficient evidence to conclude that the data do not follow the distribution or that the categorical variables are dependent. The chi-squared value either reaches this threshold or it does not. For all values below the threshold, you fail to reject the null hypothesis. There is no other interpretation for very low chi-squared values. Hence, these tests are one-tailed by their nature.
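To see this in action, here’s a quick sketch in Python using SciPy (the observed counts below are hypothetical). Notice that the reported p-value is the area in the right tail only; low chi-squared values never count as evidence of anything.

```python
from scipy import stats

# Hypothetical observed counts for a four-category variable,
# tested against a uniform expected distribution.
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]

stat, p = stats.chisquare(observed, f_exp=expected)

# The p-value is the right-tail area of the chi-squared
# distribution with (categories - 1) degrees of freedom.
df = len(observed) - 1
print(stat, p)
```

There is no left-tail version to choose here, which is exactly why the software never asks.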

### F-tests

F-tests are highly flexible tests that analysts use in a wide variety of scenarios. Some of these scenarios exclude the possibility of a two-tailed test. For instance, F-tests in ANOVA and the overall test of significance for linear models are similar to the chi-squared example. The F-ratio either reaches the significance threshold or it does not. In one-way ANOVA, if the F-value surpasses the threshold, you can conclude that not all group means are equal. On the other hand, all F-values below the threshold yield the same interpretation—the sample provides insufficient evidence to conclude that the group means are unequal. No other effect or interpretation exists for very low F-values.
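Here’s a similar sketch for a one-way ANOVA F-test, again with made-up data. The p-value SciPy reports is the right-tail area of the F-distribution, because only large F-values are informative.

```python
from scipy import stats

# Hypothetical measurements for three groups.
group_a = [4.1, 5.0, 4.6, 4.8, 5.2]
group_b = [5.5, 6.1, 5.8, 6.0, 5.7]
group_c = [4.9, 5.3, 5.1, 5.0, 5.4]

f_stat, p = stats.f_oneway(group_a, group_b, group_c)

# Degrees of freedom: dfn = groups - 1, dfd = observations - groups.
dfn, dfd = 2, 12
print(f_stat, p)
```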

When a one-tailed version of the test is the only meaningful possibility, statistical software won’t ask you to make a choice. That’s why you’ll never need to choose between a one-tailed or two-tailed ANOVA F-test or chi-squared test.

In some cases, the nature of the test itself requires using a one-sided methodology, and it does not depend on the study area.

## Effects can Occur in Only One Direction

On the other hand, other hypothesis tests can legitimately have one and two-tailed versions, and you need to choose between them based on the study area. Tests that fall in this category include t-tests, proportion tests, Poisson rate tests, variance tests, and some nonparametric tests for the median. In these cases, base the decision on subject-area knowledge about the possible effects.

For some study areas, the effect can exist in only one direction. It simply can’t exist in the other direction. To make this determination, you need to use your subject-area knowledge and understanding of physical limitations. In this case, if there were a difference in the untested direction, you would attribute it to random error regardless of how large it is. In other words, only chance can produce an observed effect in the other direction. If you have even the smallest notion that an observed effect in the other direction could be a real effect rather than random error, use a two-tailed test.

For example, imagine we are comparing an herbicide’s ability to kill weeds to no treatment. We randomly apply the herbicide to some patches of grass and no herbicide to other patches. It is inconceivable that the herbicide can promote weed growth. In the worst-case scenario, it is entirely ineffective, and the herbicide patches should be equivalent to the control group. If the herbicide patches ultimately have more weeds than the control group, we’ll chalk that up to random error regardless of the difference—even if it’s substantial. In this case, we are wholly justified in using a one-tailed test to determine whether the herbicide is better than no treatment.
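If you’d like to try this, here’s a minimal sketch with hypothetical weed counts. It uses SciPy’s `alternative` parameter (available in SciPy 1.6 and later) to specify the single tail we test.

```python
from scipy import stats

# Hypothetical weed counts per patch.
herbicide = [12, 9, 14, 10, 8, 11, 13, 9, 10, 12]
control   = [18, 15, 20, 17, 16, 19, 14, 18, 21, 17]

# One-tailed test: we can only detect herbicide < control.
# Any excess weeds in the herbicide patches counts as noise.
t_stat, p = stats.ttest_ind(herbicide, control, alternative='less')
print(t_stat, p)
```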

## No Controversy So Far!

The preceding two reasons fall entirely on safe ground. Using a one-tailed test because of a test’s mechanics or because an effect can occur in only one direction should be acceptable to all statisticians. In fact, some statisticians believe that these are the only valid reasons for using one-tailed hypothesis tests. I happen to fall within this school of thought myself.

In the next section, I’ll discuss a scenario where some analysts believe you can choose between one and two-tailed tests, but others disagree with that notion.

## You Only Need to Know About Effects in One Direction

In this scenario, effects can exist in both directions, but you only care about detecting an effect in one direction. Analysts use the one-tailed approach in this situation to boost the statistical power of the hypothesis test.

To even consider using a one-tailed test for this reason, you must be entirely sure there is no need to detect an effect in the other direction. While you gain more statistical power in one direction, the test has absolutely no power in the other direction.

Suppose you are testing a new vaccine and want to determine whether it’s better than the current vaccine. You use a one-tailed test to improve the test’s ability to learn whether the new vaccine is better. However, that’s unethical because the test cannot determine whether it is less effective. You risk missing valuable information by testing in only one direction.

However, there might be occasions where you, or science, genuinely don’t need to detect an effect in the untested direction. For example, suppose you are considering a new part that is cheaper than the current part. Your primary motivation for switching is the price reduction. The new part doesn’t have to be better than the current part, but it cannot be worse. In this case, it might be appropriate to perform a one-tailed test that determines whether the new part is worse than the old part. You won’t know if it is better, but you don’t need to know that.
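As a sketch of this scenario (with hypothetical lifetimes, in hours), you would test only whether the new part is worse and ignore the other direction entirely:

```python
from scipy import stats

# Hypothetical lifetimes (hours) for the current and cheaper parts.
current_part = [1040, 1010, 1080, 1060, 1030, 1070, 1050, 1020]
new_part     = [1035, 1055, 1025, 1045, 1065, 1015, 1040, 1050]

# One-tailed test: we only need to detect whether the new part
# has SHORTER lifetimes (is worse). Being better changes nothing.
t_stat, p = stats.ttest_ind(new_part, current_part, alternative='less')
print(t_stat, p)
```

With these made-up numbers, the test fails to reject the null, so there is no evidence the cheaper part is worse.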

As I mentioned, many statisticians don’t think you should use a one-tailed test for this type of scenario. My position is that you should set up a two-tailed test that produces the same power benefits as a one-tailed test because that approach will accurately capture the underlying fact that effects can occur in both directions.

However, before explaining this alternate approach, I need to describe an additional problem with the above scenario.

## Beware of the Power that One-Tailed Tests Provide

The promise of extra statistical power in the direction of interest is tempting. After all, if you don’t care about effects in the opposite direction, what’s the problem? It turns out there is an additional penalty that comes with the extra power.

First, let’s see why one-tailed tests are more powerful than two-tailed tests with the same significance level. The graphs below display the t-distributions for two t-tests with the same sample size. I show the critical t-values for both tests. As you can see, the one-tailed test requires a less extreme t-value (1.725) to produce a statistically significant result in the right tail than the two-tailed test (2.086). In other words, a smaller effect is statistically significant in the one-tailed test.
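The quoted critical values correspond to a t-distribution with 20 degrees of freedom (my assumption, based on the numbers above), and you can reproduce them directly:

```python
from scipy import stats

df, alpha = 20, 0.05

# One-tailed: the entire 5% lands in the right tail.
one_tailed_crit = stats.t.ppf(1 - alpha, df)
# Two-tailed: the right tail holds only 2.5%.
two_tailed_crit = stats.t.ppf(1 - alpha / 2, df)

print(round(one_tailed_crit, 3))   # 1.725
print(round(two_tailed_crit, 3))   # 2.086
```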

Both tests have the same Type I error rate because we defined the significance level as 0.05. This type of error occurs when the test rejects a true null hypothesis—a false positive. This error rate corresponds to the total percentage of the shaded areas under the curve. While both tests have the same overall Type I error rate, the distribution of these errors is different.

To understand why, keep in mind that the critical regions also represent where the Type I errors occur. For a two-tailed test, these errors are split equally between the left and right tails. However, for a one-tailed test, all of these errors arise specifically in the one direction that you are interested in. Unfortunately, the error rate doubles in that direction compared to a two-tailed test. In the graphs above, the right tail has an error rate of 5% in the one-tailed test compared to 2.5% in the two-tailed test.
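A quick simulation under a true null hypothesis makes the doubling concrete. This is a sketch assuming a one-sample t-test with 21 observations (so 20 degrees of freedom):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, trials, alpha = 21, 100_000, 0.05

# Draw many samples from a population where the null is true.
data = rng.normal(0.0, 1.0, size=(trials, n))
t_stats = data.mean(axis=1) / (data.std(axis=1, ddof=1) / np.sqrt(n))

one_crit = stats.t.ppf(1 - alpha, n - 1)      # ≈ 1.725
two_crit = stats.t.ppf(1 - alpha / 2, n - 1)  # ≈ 2.086

# Share of false positives landing in the right tail.
print(np.mean(t_stats > one_crit))   # roughly 0.05  (one-tailed)
print(np.mean(t_stats > two_crit))   # roughly 0.025 (two-tailed)
```

The overall error rate is 5% either way; the one-tailed test simply piles all of it into the direction you care about.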

**Related Post**: Types of Errors in Hypothesis Tests

### You Haven’t Changed Anything of Substance

By switching to a one-tailed test, you haven’t changed anything of substance to gain this extra power. All you’ve done is to redraw the critical region so that a smaller effect in the direction of interest is statistically significant. In this light, it’s not surprising that merely labeling smaller effects as being statistically significant also produces more false positives in that direction! And, the graphs reflect that fact.

If you want to increase the test’s power without increasing the Type I error rate, you’ll need to make a more fundamental change to your study’s design, such as increasing your sample size or more effectively controlling the variability.

### Is the Higher False Positive Rate Worthwhile?

To use a one-tailed test to gain more power, you can’t care about detecting an effect in the other direction, and you have to be willing to accept twice the false positives in the direction you are interested in. Remember, a false positive means that you will not obtain the benefits you expect.

Should you accept double the false positives in the direction of interest? Answering that question depends on the actions that a significant result will prompt. If you’re considering changing to a new production line, that’s a very costly decision. Doubling the false positives is problematic. Your company will spend a lot of money for a new manufacturing line, but it might not produce better products. However, if you’re changing suppliers for a part based on the test result, and their parts don’t cost more, a false positive isn’t an expensive problem.

Think carefully about whether the additional power is worth the extra false positives in your direction of interest! If you decide that the added power is worth the risk, consider my alternative approach below. It produces an equivalent amount of statistical power as the one-tailed approach. However, it uses a methodology that more accurately reflects the underlying reality of the study area and the goals of the analyst.

## Alternative: Use a Two-Tailed Test with a Higher Significance Level

In my view, determining the possible directions of an effect and the statistical power of the analysis are two independent issues. Using a one-tailed test to boost power can obscure these matters and their ramifications. My recommendation is to use the following process:

- Identify the directions that an effect can occur, and then choose a one-tailed or two-tailed test accordingly.
- Choose the significance level to correctly set the sensitivity and false-positive rate based on your specific requirements.

This process breaks down the questions you need to answer into two separate issues, which allows you to consider each more carefully.

Now, let’s apply this process to the scenario where you’re studying an effect that can occur in both directions, but the following are both true:

- You care about effects in only one direction.
- Increasing the power of the test is worth a higher risk of false positives in that direction.

In this situation, using a one-tailed test to gain extra power seems like an acceptable solution. However, that approach attempts to solve the right problem by using the wrong methodology. Here’s my alternative method.

Instead of using a one-tailed test, consider using a two-tailed test and doubling the significance level, such as from 0.05 to 0.10. This approach increases your power while allowing the test methodology to match the reality of the situation better. It also increases the transparency of your goals as the analyst.

**Related Post**: Significance Levels and P-values

## How the Two-Tailed Approach with a Higher Significance Level Works

To understand this approach, compare the graphs below. The top graph is one-sided and uses a significance level of 0.05. The bottom graph is two-sided and uses a significance level of 0.10.

As you can see in the graphs, the critical region on the right side of both distributions starts at the same critical t-value (1.725). Consequently, both the one- and two-tailed tests provide the same power in that direction. Additionally, there is a critical region in the other tail, which means that the test can detect effects in the opposite direction as well.
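You can confirm this equivalence numerically (again assuming 20 degrees of freedom). Doubling the significance level puts exactly the same 5% in the right tail as the one-tailed test:

```python
from scipy import stats

df = 20

# One-tailed test at alpha = 0.05: right tail holds 5%.
one_tailed = stats.t.ppf(1 - 0.05, df)

# Two-tailed test at alpha = 0.10: each tail holds 5%.
two_tailed_doubled = stats.t.ppf(1 - 0.10 / 2, df)

# Identical right-tail cutoff (≈ 1.725), so identical power
# in the direction of interest.
print(one_tailed, two_tailed_doubled)
```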

The end result is that the two-tailed test has the same power and an equal probability of a Type I error in the direction of interest. Great! And, you can detect effects in the other direction even though you might not need to know about them. Okay, that’s not a bad thing.

## This Approach Is More Transparent

What’s so great about this approach? It makes your methodology choices more explicit while accurately reflecting a study area where effects can occur in both directions. Here’s how.

The significance level is an evidentiary standard for the amount of sample evidence required to reject the null hypothesis. By increasing the significance level from 0.05 to 0.10, you’re explicitly stating that you are lowering the amount of evidence necessary to reject the null, which logically increases the power of the test. Additionally, as you raise the significance level, the Type I error rate also increases by definition. This approach produces the same power gains as a one-tailed test. However, it more clearly indicates how the analyst set up a more sensitive test in exchange for a higher risk of false positives.

The problem with gaining the additional power by switching to a one-tailed test is that it obscures the fact that you’re weakening the evidentiary standard. After all, you’re not explicitly changing the significance level. That’s why the increase in the Type I error rate in the direction of interest can be surprising!

## Decision Guidelines

We covered a lot in this post. Here’s a brief recap of when to use each type of test. For some tests, you don’t have to worry about this choice. However, if you do need to decide between using a one-tailed and a two-tailed test, follow these guidelines. If the effect can occur in:

- One direction: Use a one-tailed test and choose the correct alternative hypothesis.
- Both directions: Use a two-tailed test.
- Both directions, but you care about only one direction and you need the higher statistical power: Use a two-tailed test and double the significance level. Be aware that you are doubling the probability of a false positive.

Tom says

Nice article, thank you Mr. Frost. I am a statistician and I run in this problem regularly and I am still not clear with it. With “Both directions but you care about only one direction” I use the approach that I do 2-tail test on 5 % sig. level and if this is significant and my client is interested only in one direction, then I interpret that the one-sided effect is significant at 5 % level. Which may look weird, but it is a correct statement. Basically, I avoid stating that one sided effect is significant at 5 % level in the situation where the 2-sided p-value is e.g. 0.07 and 1-sided is 0.035. This I don’t interpret as significant on 5 % even if my client is interested only in one direction.


Sreekumar says

It’s a good article Mr. Jim. It gives clarification to the one-tailed and two-tailed tests that we commonly use in research.

Jim Frost says

Thanks, Sreekumar! I’m glad it was helpful!