Nonparametric tests don’t require that your data follow the normal distribution. They’re also known as distribution-free tests and can provide benefits in certain situations. Typically, people who perform statistical hypothesis tests are more comfortable with parametric tests than nonparametric tests.
You’ve probably heard it’s best to use nonparametric tests if your data are not normally distributed—or something along these lines. That seems like an easy way to choose, but there’s more to the decision than that.
In this post, I’ll compare the advantages and disadvantages to help you decide between using the following types of statistical hypothesis tests:
 Parametric analyses to assess group means
 Nonparametric analyses to assess group medians (sometimes)
In particular, I’d like you to focus on one key reason to perform a nonparametric test that doesn’t get the attention it deserves! If you need a primer on the basics, read my hypothesis testing overview.
Related Pairs of Parametric and Nonparametric Tests
Nonparametric tests are a shadow world of parametric tests. In the table below, I show linked pairs of statistical hypothesis tests.
Parametric tests of means | Nonparametric tests of medians
1-sample t-test, Paired t-test | Sign test, Wilcoxon signed rank test
2-sample t-test | Mann-Whitney U test
One-Way ANOVA | Kruskal-Wallis test, Mood’s median test
Factorial DOE with a factor and a blocking variable | Friedman test
Additionally, Spearman’s correlation is a nonparametric alternative to Pearson’s correlation. Use Spearman’s correlation for nonlinear, monotonic relationships and for ordinal data. For more information, read my post Spearman’s Correlation Explained!
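To see why Spearman’s correlation suits monotonic but nonlinear relationships, here is a minimal sketch that uses only Python’s standard library. The data (y = exp(x)) and the helper names `ranks`, `pearson`, and `spearman` are assumptions made up for illustration. Spearman’s coefficient is computed here as Pearson’s correlation applied to the ranks.

```python
import math
import statistics


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Return Pearson's correlation coefficient for paired data."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman(x, y):
    """Return Spearman's rho as Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: a perfectly monotonic but strongly nonlinear relationship.
x = [0.5 * i for i in range(1, 21)]
y = [math.exp(v) for v in x]

pearson_r = pearson(x, y)
spearman_rho = spearman(x, y)
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```

Because y always increases with x, the ranks line up perfectly and Spearman’s rho reaches its maximum, while Pearson’s r is pulled down by the curvature.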
For this topic, it’s crucial you understand the concept of robust statistical analyses. Learn more in my post, What are Robust Statistics?
Advantages of Parametric Tests
Advantage 1: Parametric tests can provide trustworthy results with distributions that are skewed and nonnormal
Many people aren’t aware of this fact, but parametric analyses can produce reliable results even when your continuous data are nonnormally distributed. You just have to be sure that your sample size meets the requirements for each analysis in the table below. Simulation studies have identified these requirements. Read here for more information about these studies.
Parametric analyses | Sample size requirements for nonnormal data
1-sample t-test | Greater than 20
2-sample t-test | Each group should have more than 15 observations
One-Way ANOVA | For 2-9 groups, each group should have more than 15 observations
You can use these parametric tests with nonnormally distributed data thanks to the central limit theorem. For more information about it, read my post: Central Limit Theorem Explained.
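If you want to check this kind of claim yourself, a small simulation is one way to do it. The sketch below is only an illustration and is not the simulation study mentioned above. It assumes exponentially distributed (right-skewed) data, a sample size of 25, and a hard-coded critical t-value for 24 degrees of freedom. Because the null hypothesis is true in every replicate, the share of rejections estimates the Type I error rate, which you can compare to the 0.05 significance level.

```python
import math
import random
import statistics

random.seed(1)

N = 25              # sample size, above the 1-sample guideline of 20
T_CRITICAL = 2.064  # two-sided 5% critical value of Student's t with 24 df
TRUE_MEAN = 1.0     # mean of an exponential distribution with rate 1
REPLICATES = 5000

false_positives = 0
for _ in range(REPLICATES):
    sample = [random.expovariate(1.0) for _ in range(N)]
    standard_error = statistics.stdev(sample) / math.sqrt(N)
    t_stat = (statistics.fmean(sample) - TRUE_MEAN) / standard_error
    # The null hypothesis is true, so every rejection is a Type I error.
    if abs(t_stat) > T_CRITICAL:
        false_positives += 1

type_1_rate = false_positives / REPLICATES
print(f"Observed Type I error rate: {type_1_rate:.3f} (nominal 0.05)")
```

How close the observed rate comes to 0.05 depends on the distribution and the sample size, so try other values of `N` and other distributions to see how the test behaves.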
Related posts: The Normal Distribution and How to Identify the Distribution of Your Data.
Advantage 2: Parametric tests can provide trustworthy results when the groups have different amounts of variability
It’s true that nonparametric tests don’t require data that are normally distributed. However, nonparametric tests have the disadvantage of an additional requirement that can be very hard to satisfy. The groups in a nonparametric analysis typically must all have the same variability (dispersion). Nonparametric analyses might not provide accurate results when variability differs between groups.
Conversely, parametric analyses, like the 2-sample t-test or one-way ANOVA, allow you to analyze groups with unequal variances. In most statistical software, it’s as easy as checking the correct box! You don’t have to worry about groups having different amounts of variability when you use a parametric analysis.
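For the 2-sample case, the unequal-variances option usually corresponds to Welch’s t-test. That mapping is an assumption about typical software rather than a statement about any particular package. Here is a minimal standard-library sketch of the calculation. The function name `welch_t` and the two small groups are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group squared standard errors; the variances are NOT pooled.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    mean_diff = statistics.fmean(sample_a) - statistics.fmean(sample_b)
    t_stat = mean_diff / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


group_a = [1, 2, 3, 4, 5]    # hypothetical group with low variability
group_b = [2, 4, 6, 8, 10]   # hypothetical group with higher variability
t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The degrees of freedom come out below the pooled value of 8, which is how the test accounts for the unequal variances.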
Related post: Measures of Variability
Advantage 3: Parametric tests have greater statistical power
In most cases, parametric tests have more power. If an effect actually exists, a parametric analysis is more likely to detect it.
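You can get a feel for the power difference with a quick simulation. This is a rough sketch under stated assumptions: normally distributed data, a true mean of 0.5 standard deviations, 20 observations per sample, and a hard-coded critical t-value for 19 degrees of freedom. It compares the 1-sample t-test with the sign test from the table above, and the helper `sign_test_p_value` is written just for this example.

```python
import math
import random
import statistics

random.seed(2)

N = 20
T_CRITICAL = 2.093   # two-sided 5% critical value of Student's t with 19 df
EFFECT = 0.5         # assumed true mean, in standard-deviation units
REPLICATES = 4000


def sign_test_p_value(positives, n):
    """Exact two-sided sign test p-value under H0: median = 0."""
    tail = min(positives, n - positives)
    lower_tail = sum(math.comb(n, k) for k in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


t_rejections = 0
sign_rejections = 0
for _ in range(REPLICATES):
    sample = [random.gauss(EFFECT, 1.0) for _ in range(N)]
    t_stat = statistics.fmean(sample) / (statistics.stdev(sample) / math.sqrt(N))
    if abs(t_stat) > T_CRITICAL:
        t_rejections += 1
    positives = sum(1 for value in sample if value > 0)
    if sign_test_p_value(positives, N) < 0.05:
        sign_rejections += 1

# The effect is real in every replicate, so each rejection is a correct detection.
t_power = t_rejections / REPLICATES
sign_power = sign_rejections / REPLICATES
print(f"1-sample t-test power: {t_power:.2f}")
print(f"Sign test power:       {sign_power:.2f}")
```

The size of the gap depends on the distribution and on which nonparametric test you choose, so treat the output as an illustration rather than a general result.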
Related post: Statistical Power and Sample Size
Advantages of Nonparametric Tests
Advantage 1: Nonparametric tests assess the median, which can be better for some study areas
Now we’re coming to my preferred reason for when to use a nonparametric test. The one that practitioners don’t discuss frequently enough!
For some datasets, nonparametric analyses provide an advantage because they assess the median rather than the mean. The mean is not always the better measure of central tendency for a sample. Even though you can perform a valid parametric analysis on skewed data, that doesn’t necessarily equate to being the better method. Let me explain using the distribution of salaries.
Salaries tend to be a right-skewed distribution. The majority of wages cluster around the median, which is the point where half are above and half are below. However, there is a long tail that stretches into the higher salary ranges. This long tail pulls the mean far away from the central median value. The two distributions are typical for salary distributions.
In these distributions, if several very high-income individuals join the sample, the mean increases by a significant amount despite the fact that incomes for most people don’t change. They still cluster around the median.
In this situation, parametric and nonparametric tests can give you different results, and they both can be correct! For the two distributions, if you draw a large random sample from each population, the difference between the means is statistically significant. Despite this, the difference between the medians is not statistically significant. Here’s how this works.
For skewed distributions, changes in the tail affect the mean substantially. Parametric tests can detect this mean change. Conversely, the median is relatively unaffected, and a nonparametric analysis can legitimately indicate that the median has not changed significantly.
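A tiny numeric example makes the point concrete. The salary figures below are invented for illustration (in thousands). The second list is the same sample after three very high earners join it.

```python
import statistics

# Hypothetical annual salaries (in thousands) for a small right-skewed sample.
salaries = [32, 35, 38, 40, 41, 43, 45, 47, 50, 52, 55, 60, 68, 80, 120]
# The same sample after three very high earners join it.
with_high_earners = salaries + [400, 650, 900]

mean_shift = statistics.fmean(with_high_earners) - statistics.fmean(salaries)
median_shift = statistics.median(with_high_earners) - statistics.median(salaries)
print(f"Mean shift:   {mean_shift:.1f}")
print(f"Median shift: {median_shift:.1f}")
```

The mean moves by far more than the median even though none of the original salaries changed.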
You need to decide whether the mean or median is best for your study and which type of difference is more important to detect.
Related posts: Determining which Measure of Central Tendency is Best for Your Data and Median: Definition and Uses
Advantage 2: Nonparametric tests are valid when your sample size is small and your data are potentially nonnormal
Use a nonparametric test when your sample size isn’t large enough to satisfy the requirements in the table above and you’re not sure that your data follow the normal distribution. With small sample sizes, be aware that normality tests can have insufficient power to produce useful results.
This situation is difficult. Nonparametric analyses tend to have lower power at the outset, and a small sample size only exacerbates that problem.
Advantage 3: Nonparametric tests can analyze ordinal data, ranked data, and outliers
Parametric tests can analyze only continuous data and the findings can be overly affected by outliers. Conversely, nonparametric tests can also analyze ordinal and ranked data, and not be tripped up by outliers. Learn more about Ordinal Data: Definition, Examples & Analysis.
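Rank-based tests resist outliers because they use only the ordering of the values. As a minimal sketch, the function below computes the Mann-Whitney U statistic by counting pairs. The function name `mann_whitney_u` and both data sets are hypothetical, and a full test would also need a p-value, which is omitted here.

```python
import statistics


def mann_whitney_u(sample_a, sample_b):
    """Return U for sample_a: pairs where a > b, with ties counted as 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


group_a = [1.1, 2.3, 2.9, 3.8, 4.0]
group_b = [3.1, 4.5, 5.2, 6.7, 7.4]
# Same data, but the largest value in group_b becomes an extreme outlier.
group_b_outlier = [3.1, 4.5, 5.2, 6.7, 740.0]

u_original = mann_whitney_u(group_a, group_b)
u_outlier = mann_whitney_u(group_a, group_b_outlier)
mean_original = statistics.fmean(group_b)
mean_outlier = statistics.fmean(group_b_outlier)
print(f"U: {u_original} -> {u_outlier}")
print(f"Mean of group_b: {mean_original:.2f} -> {mean_outlier:.2f}")
```

The outlier leaves U unchanged because the ordering of the observations is the same, while the group mean that a t-test relies on changes dramatically.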
Sometimes you can legitimately remove outliers from your dataset if they represent unusual conditions. However, sometimes outliers are a genuine part of the distribution for a study area, and you should not remove them.
You should verify the assumptions for nonparametric analyses because the various tests can analyze different types of data and have differing abilities to handle outliers.
If you’re using a Likert scale and you want to compare two groups, read my post about which analysis you should use to analyze Likert data.
Related posts: Data Types and How to Use Them and 5 Ways to Find Outliers in Your Data
Advantages and Disadvantages of Parametric and Nonparametric Tests
Many people believe that choosing between parametric and nonparametric tests depends on whether your data follow the normal distribution. If you have a small dataset, the distribution can be a deciding factor. However, in many cases, this issue is not critical because of the following:
 Parametric analyses can analyze nonnormal distributions for many datasets.
 Nonparametric analyses have other firm assumptions that can be harder to meet.
The answer is often contingent upon whether the mean or median is a better measure of central tendency for the distribution of your data.
 If the mean is a better measure and you have a sufficiently large sample size, a parametric test usually is the better, more powerful choice.
 If the median is a better measure, consider a nonparametric test regardless of your sample size.
Lastly, if your sample size is tiny, you might be forced to use a nonparametric test. It would make me ecstatic if you collect a larger sample for your next study! As the table shows, the sample size requirements aren’t too large. If you have a small sample and need to use a less powerful nonparametric analysis, it doubly lowers the chance of detecting an effect.
If you’re learning about hypothesis testing and like the approach I use in my blog, check out my Hypothesis Testing book! You can find it at Amazon and other retailers.
Dipesh Patel says
Hi Jim!
Thank you for creating this great website. I have never found it so easy to understand such a complex topic as statistics.
My query is: could you please write more about how to interpret the Friedman test and the important related terminology, such as whether it is the right test and when to use it, just as you have discussed every other topic?
I shall be grateful to you. Honestly, even my uni stats team was not able to explain it to me as easily as you have taught the concepts of statistics.
Jim Frost says
Hi Dipesh!
Thanks so much for writing. I’m so glad to hear that my website has been helpful!
I will definitely write about the Friedman test. I put it on my list!
Be sure to join my email list if you haven’t already so you’ll know when it’s published. It might be several weeks.
Natalya says
Dear Jim, Many thanks for your great work!
JD says
Hello, my research has pretest, posttest and a delayed posttest. I have 2 groups (control and treatment) of 10 participants each. Based on your page, you mentioned that it’s possible to carry out parametric tests if there are more than 20 participants:
1. Does that mean that 20 participants are not enough to carry out a parametric test, given that my data are not normally distributed? So, can I carry out a paired-sample t-test, or should I just use the Wilcoxon Signed Rank Test?
2. How do I go about testing for delayed posttest? The reason I am doing a delayed posttest was to ensure the reliability of the posttest results.
Rana says
Thanks a lot for your valuable website and information. If I used 10 animals as a sample size and I have a high partial eta squared, can I apply parametric ANOVA?
Thanks again.
Jim Frost says
Hi Rana,
If you have a sample size of 10 per group and you are sure they follow a normal distribution, you can use parametric ANOVA. Your sample size is small, which means you must satisfy the normality assumption.
Rohith Venkatakrishnan says
Hey Jim,
Thanks for your article. I am confronted with a similar situation where I have 4 conditions (20 subjects per condition, one of which is a control group). I see that this meets the 15 subjects requirement for 2-9 groups but what I want to know is, when would you consider the data to be extremely skewed and unfit for parametric analysis?
Any thresholds to determine “extreme skew”?
Jim Frost says
Hi Rohith,
That’s kind of a trick question because there is no clear-cut dividing line. In most cases, you’ll be fine given the number of subjects per group. If you really want to check, you can do a resampling method to see what kind of distribution it produces. Does the resulting distribution look fairly normal? To see what I’m talking about, read my post about the central limit theorem. I show examples of sampling distributions that do and do not converge on the normal distribution for different distributions and sample sizes. You can try it with your data to see what it looks like. There’s no statistical test but if your sampling distribution looks fairly normal, you’re safe.
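As a rough illustration of that resampling check, the sketch below bootstraps the sample mean using only Python’s standard library. The data values are invented, and the `skewness` helper is a simple moment-based measure chosen for this example. In practice you would also plot a histogram of the resampled means.

```python
import random
import statistics

random.seed(3)


def skewness(values):
    """Return the population skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed sample of 20 observations.
data = [1.2, 0.4, 3.1, 0.9, 0.3, 7.8, 1.5, 0.7, 2.2, 0.5,
        0.2, 4.6, 1.1, 0.8, 12.4, 0.6, 1.9, 0.4, 2.8, 1.0]

# Resample with replacement and record the mean of each resample.
boot_means = [
    statistics.fmean(random.choices(data, k=len(data)))
    for _ in range(5000)
]

data_skew = skewness(data)
boot_skew = skewness(boot_means)
print(f"Skewness of raw data:        {data_skew:.2f}")
print(f"Skewness of bootstrap means: {boot_skew:.2f}")
```

If the resampled means are still clearly skewed, that suggests the sample is too small for the sampling distribution to look approximately normal.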
Filip says
Hello Jim, thank you for this article. I have a problem, the images don’t load.
Jim Frost says
Hi Filip, I just checked and I’m seeing the image in this post with no problem (there’s only one image).
Tom says
Hello Jim,
I recently discovered your site and it is extremely helpful. Thanks! I have been struggling to figure out how to report data. Say I am analyzing the response to a medication in 3 groups of patients, and looking at response vs blood concentrations of the drug. I am trying to come up with a reference range that says: These patients will respond to symptoms when in the following blood concentrations (e.g., 5-25 mcg/mL). My total n= ~900 patients. One group has ~80 patients (responders), one group has ~600 cases (partial responders) and one group has ~150 (nonresponders). The data are not normally distributed based on several normality tests. In order to establish the reference range, I need to capture the central 95% of patient blood concentrations when looking at the responders group (ideally just those that fully respond, but also those that fully respond + partially respond). If mean +/- 2SD is used, then I end up at a negative blood concentration, which obviously isn’t possible. However, if I use the median, the box-and-whisker plot seems to capture a good range and indicates outliers. Is this latter way the correct way to go?
I hope this makes sense
Thanks!
Sanjeda Tamanna says
Thank you so much for your reply.
Sanjeda Tamanna says
Hello Jim,
Thank you so much for writing wonderful articles. Your articles helped me a lot to understand statistics. They are making my data analysis easier. I will be very grateful if you would provide me with some suggestions regarding my sample size in ANOVA and parametric analysis. I have 4 groups with sizes of 50, 16, 54, and 70, respectively. I checked their distributions. They don’t follow normal distributions. I did ordinary one-way ANOVA or Welch’s ANOVA depending on the difference in their SD values. Among these 4 groups, the first one is the control group, the 2nd one is the experimental group comprising two types of patients, and the remaining two groups are the two types of patient groups that each make up the 2nd group. Am I doing the right form of analysis?
Jim Frost says
Hi Sanjeda,
If you look at the table in this post, you’ll see that when you use one-way ANOVA and have 2-9 groups, you typically don’t need to worry about normality when each group has at least 15 observations. You have four groups, so this applies to you. And all of your groups have at least 15 observations. Although one group is very close. I think you’re safe using one-way ANOVA unless your data are extremely skewed.
However, because you have unequal sample sizes across your groups, the equal variances assumption is particularly relevant. If your variances are not equal, definitely use Welch’s ANOVA. If they’re roughly equal, the regular one-way ANOVA should be fine.
If you have significant results, you should perform a post hoc analysis to see which groups are different. Because you have a control and treatment group, I recommend Dunnett’s method. Click the link to learn about them and I include an example that uses Dunnett’s.
I don’t know what you mean when you say that some groups are made up of two types of patients. In a one-way ANOVA, all subjects should be a random sample from the same population. The primary difference between groups should be the grouping variable in your ANOVA, which is experimental group in your case.
Baris says
Hi Jim, I found the solution. I’m going to do an ordinal logistic regression analysis! I just wanted to let you know so you have more time to answer other questions. Thank you!
Jim Frost says
Hi Baris,
Sorry for the delay in replying but I’m glad you found your answer. One thing I wasn’t sure about from your original question was about your IVs and DV. Ordinal logistic regression is a good choice when your DV is ordinal, like Likert scale data. However, I’m not sure what variables you’ll use as IVs? If they’re also ordinal, then you’ll need to enter them either as continuous or categorical. Ordinal has characteristics of both, but you’ll have to choose one or the other for each IV. Although, you don’t have to make the same decision for all IVs. The correct choice depends on the nature and amount of your data along with the goals of your study.
Baris says
Hi Jim,
Below you can find the survey question that tried to measure the impact of cognitive biases induced by marketing messages on consumer decision making to purchase in ecommerce.
“When you shop online, which one of these sales aspects impacts your decision-making to purchase?”
——————
Stock Availability: (1) Not at all (2) Rarely impacted (3) Sometimes impacted (4) Usually impacted (5) Highly
Reviews of people: (1) Not at all (2) Rarely impacted (3) Sometimes impacted (4) Usually impacted (5) Highly
Countdown timer: (1) Not at all (2) Rarely impacted (3) Sometimes impacted (4) Usually impacted (5) Highly
Nr. of likes: (1) Not at all (2) Rarely impacted (3) Sometimes impacted (4) Usually impacted (5) Highly
(Likert scale)
Based on the existing literature and other online sources, the following marketing messages are used to induce cognitive biases of consumers in ecommerce. Each marketing message in a way manifests a cognitive bias.
Stock availability –> Scarcity Bias
Reviews –> Bandwagon effect
Countdown timer –> Loss aversion
Nr. of likes on the product –> Bandwagon effect
390 people responded to the survey and my hypotheses are as follows:
Ho: Stock availability has no impact on consumer decision making to purchase
H1: Stock availability impacts the consumer decision making to purchase
Ho: Reviews of people have no impact on consumer decision making to purchase
H1: Reviews of people impact the consumer decision making to purchase
Ho: Countdown timer has no impact on consumer decision making to purchase
H1: Countdown timer impacts the consumer decision making to purchase
Ho: Nr. of likes on a product have no impact on consumer decision making to purchase
H1: Nr. of likes on a product impact the consumer decision making to purchase
Given this information:
1. What kind of hypothesis testing should I use?
Ps: sorry for my long comment, I tried to be as clear as possible
Thank you in advance!
Vijay S Pawar says
Hi Jim,
You mentioned above the sample size requirements for nonnormal data. How did you establish those requirements: on a practical basis or from a reference?
Jim Frost says
Hi Vijay,
Those sample size requirements come from a simulation study conducted by some smart people I used to work with. You’ll find a link to it in the Advantage #1 section under Advantages of Parametric Tests. Click the link to read the study.
Gabriel says
Dear Jim,
Thank you so much for your prompt reply. Indeed, your answers have reassured me! Thank you! In short, one should not outright reject the application of parametric approaches under the nonnormal distribution of data. It is still accurate and valid under that condition. The keyword here is “robustness” of the parametric approach even though it is used to analyse highly skewed data. Robustness here of course relies on several factors such as sample size, the confidence interval set, or the p-value. Am I right?
By the way, I feel reluctant to use Spearman rank correlation although my data (both continuous) are not normally distributed. Many articles and experts say we should use Spearman in this case, but I feel unsure because Spearman, by its name and the intention of the analysis, should be used on rank/ordinal data like Likert scale data. However, as I mentioned, many scholars, not just several, recommend applying Spearman to continuous or interval data (not ranked). I am confused as I have read some articles (from old to new) suggesting that Spearman is strictly meant for analysing ranked data. Therefore, my question is: should we really be concerned about the data type when deciding how to use Spearman correlation?
Thanks in advance
Jim Frost says
Hi Gabriel,
Robustness indicates that the test performs satisfactorily even with nonnormal data. Specifically, all hypothesis tests have a Type I error rate. That’s basically a false positive. Imagine the null hypothesis is true. You perform a hypothesis test, get a low p-value, reject the null hypothesis, and conclude that the effect/relationship exists in the population. In our thought experiment, we know that the test result is incorrect but in the real world you never know that for sure.
But, we know how often Type I errors occur. When a test performs correctly, the Type I error rate equals the significance level you use (e.g., 0.05). When a test is robust to departures from normality, the Type I error rate equals the significance level even with nonnormal data. When a test is NOT robust, then nonnormal data will cause the Type I error rate to NOT equal the significance level. The simulation studies have found that when you satisfy the sample size guidelines, the listed tests are robust to departures from normality.
If Spearman’s correlation is appropriate for data and Pearson’s is not, you really need to use Spearman’s. It’s NOT just for rank and ordinal data. It’s also for nonlinear relationships that are monotonic. Read my post about Spearman’s correlation to understand what that means and the types of relationships for which you should use it. You’ll need to graph your data to make that determination. There are definitely cases where you have continuous data and Spearman’s will be the appropriate type of correlation to use. That may or may not be the case for your data but you need to make that determination. Again, read my post about Spearman’s.
Gabriel says
Hi Jim, Thank you so much for the clarification on the use of parametric approaches for nonnormally distributed data, provided that other requirements, like a reasonably large sample size, are met. I noticed that you have provided some rules of thumb in terms of sample size for the t-test and ANOVA under the nonnormal distribution of means, but may I know what about Pearson’s correlation? What would be an adequate sample size under the nonnormal distribution?
If, by the above rule of thumb, the parametric approach is valid (e.g., the sample size is 150 or 200), do we still need to perform a normality test (skewness and kurtosis)? Or can we assume that it should be fine? Or, even if the latter contradicts the former, will the latter prevail over the former?
FYI, I am not a statistician; however, I came across an article by Professor Geoff Norman debunking various myths about statistics, such as the claim by many so-called experts that once data are nonnormal or categorical/ordinal, or the sample size is too small, you have no choice but to use nonparametric approaches.
Thank you very much. Look forward to hearing from you.
Jim Frost says
Hi Gabriel,
Thanks for the great question! Yes, you’re quite right, there are similar guidelines for Pearson’s correlation.
In general, the sample size for correlation should be greater than 25. There’s no formal rule for this number, but you need a certain number of observations to identify patterns such as correlation.
In terms of normality, it’s not necessarily an issue for the correlation coefficient itself but it is for the p-value. However, in some cases, the nature of the relationship will require you to use a different type of correlation, such as Spearman correlation. Fortunately, Pearson and Spearman correlation are robust to nonnormal data when you have more than 25 paired observations. One caveat. The confidence intervals for the Pearson’s correlation coefficient remain sensitive to nonnormality regardless of the sample size. The p-values for Spearman’s correlation are even robust to nonnormal data because it’s a nonparametric method that uses ranks.
Your sample sizes of 150 or 200 are so much larger than the guideline value that you don’t need to worry about normality.
As for the article by Professor Norman, which I have not read, it’s inaccurate to say that you can’t use parametric methods with nonnormal data. Thanks to the central limit theorem, you can use parametric methods with nonnormal data when your sample size is large enough. The sample size guidelines I present are based on simulation studies that compare simulated test results to known correct results for various distributions and sample sizes. These studies find that when you satisfy these sample size guidelines, the tests work correctly even with nonnormal data. However, if you have nonnormal data and a small sample size, then you might need to use a nonparametric test, which I discuss in this article.
I hope that helps!
Simon Tanios says
Thanks so much!!
Zeb says
Thanks for the perfect explanation, sir. I have a question regarding my data analysis. I have conducted a study and I want to compare the present situation with the previous one. All participants (male and female) have already experienced the pre and post situation. They compare the present situation with the previous one using the options (Not Available), (Worst Condition), (Average Condition), (Better Condition). For example, they compare the “Drinking water facility” using the above scale/options. Any suggestion on how to analyze this, or what kind of statistical test can be used for this kind of data?
I will strongly appreciate your valuable inputs.
Thanks
Zeb
Adrian says
Dear Jim,
Let me add a few notes from my 10-year practice in the clinical research biostatistics.
1) by simulations, I only rarely obtained the assumed type-1 error and assumed power when using parametric tests on highly skewed (like lognormal) and multimodal data of different dispersion across groups, with such small data sizes. Not rarely I work with data so specific that even N=300 doesn’t give reliable results. This is unacceptable in the industry I work in. It is interesting, however, to see how the outcomes vary depending on our experience.
This was visible especially on the ANCOVA on change-from-baseline adjusted for the baseline (the standard of analysis recommended by guidelines in RCTs) in more complex designs and multiple repetitions over time (fit either via GLS or a mixed model). But then I either switch to the (weighted) GEE estimation or choose quantile regression with random effects and run a set of the LR tests over it to get the assumption-free ANOVA over the underlying model.
Moreover, it makes entirely no statistical sense to compare means in skewed data. These are wrong measures of the central tendency most of the time. Why? Because the arithmetic mean is by definition an additive measure, which has nothing to do with multiplicative processes or processes that can be described with the lognormal distribution. The two are incompatible. For exactly this reason it makes no sense to bootstrap the difference in means or to run a permutation test over the means – because still, however technically possible, it makes no statistical sense to use means to describe such data.
Sure, one could log-transform the data, but transforming isn’t the best option here, because it changes too many things: the hypothesis, biases the back-transformed CIs (Jensen’s inequality), affects the underlying model errors, affects the mean-variance relationship and more. Instead, we need to use the generalized model, which properly deals with the conditional expected value (rather than raw data) linked to the predictor, or by employing quantile regression followed by the LR tests to get the main and interaction effects.
2) Almost neither (except maybe 35) of the nonparametric tests (out of about 320 I know) requires formally equal variance (or more generally, dispersion). And neither assesses the medians in general (sadly, this is repeated even in many textbooks, luckily not all, and the awareness grows quickly). It holds IF and ONLY IF the distributions are equal (IID): same shape, same dispersion AND both are symmetric. Otherwise it practically never happens. Mann-Whitney, Kruskal-Wallis are about stochastic equivalence, assessed via pseudomedian. They all fail entirely as tests of equality of medians just by the definition of the pseudomedian and its properties. Lots of the literature is available on this, also the simulations confirm it. It’s very easy to have numerically equal medians and the test report significant results due to the difference in shape of dispersions. And that’s OK, because it was designed as stochastic equivalence and not median tests. Sure, if we want to restrict ourselves to equal dispersions AND symmetry of the distributions (must be by the definition of pseudomedian), then we can treat it as asymptotic tests of medians, but then this is a perfect situation for the CLT and, actually, the standard t-test (median approaches the arithmetic mean here).
3) By the way, there are also modern tests, like the ATS (ANOVA-Type Statistics), WTS (Wald-Type Statistics), permuted WTS and ART ANOVA (Aligned-Rank Transform), which are much more flexible (handle up to 35, depending on implementation, main effects + interaction + repeated observations) and powerful. They use so-called relative effects.
Dr Rakesh Ranjan Pathak says
Sir, while comparing parametric and nonparametric methods we miss the two real questions
1) what if we use nonparametric tests in parametric conditions ?
2) what if we use parametric tests in nonparametric conditions ?
Please detail on the error in outcome as the real life deterrent, Thanks
Jim Frost says
Hi, I touch on those issues in this post. Specifically:
1) Typically, nonparametric tests have less power than their parametric counterparts. For power reasons, you’ll want to use a parametric test when it’s valid. Using a nonparametric test in these conditions increases the Type II error rate (false negatives).
2) If you use a parametric test when a nonparametric test is appropriate, you’ll obtain inaccurate results. The Type I error rate won’t necessarily equal the significance level you define for the test. I’m not sure if there is a consistent direction of change in that error rate. I suspect that the Type I error rate can be higher or lower than the significance level depending on the nature of the violation.
I hope that helps.
Maria says
Great article, thank you
But may I ask when it’s better to choose the mean or the median as the best measure of central tendency for my data? Is there any guide?
Jim Frost says
Hi Maria,
Thanks for writing! In my post about measures of central tendencies, I write about which measure is best for different situations, including choosing between the mean and median. I’d recommend reading it. In a nutshell, the mean is better when your data are symmetric, or at least not extremely skewed, while the median is better when your data are fairly skewed. In my other post, I show why that’s the case.
Rafi Mohammed says
Hi Jim. Very informative article. I would like know one more thing.
Can we use parametric tests to analyse ordinal data? If so, in what circumstances? Please advise.
Jim Frost says
Hi Rafi,
That question has been behind many debates in statistics! In some cases, yes! In this post, I have a link near the end for an article I wrote about analyzing Likert scaled data. Likert scale is an ordinal scale. And for those data, you can use the parametric 2-sample t-test. That’s based on a thorough simulation study. However, I would not say that means you can always use parametric tests for all scenarios where you have ordinal data. There are probably requirements for sample sizes and number of ordinal levels. At any rate, read that one post about analyzing Likert data to get an idea of some of the issues and how it works out for 2-sample t-tests.
I hope that helps!
Kenny L says
Hi Jim,
Thanks for the very informative Article. It looks great to see all Hypothesis tests in one article, and appreciate the details and depth of the explanation.
One thing that I have been stuck on is making the best choice between parametric and nonparametric tests when there are many varying features, and under their influence the distribution becomes highly uneven, making it hard to compare and harder to draw inferences.
But this is the actual case in practical application when you want to do A/B Testing. Real life A/B testing involves dealing with distributions that vary largely due to a high number of features (columns or variables).
For doing A/B Testing with varying distributions in the 2 experiments under conditions of multiple features involved, would you recommend Parametric Statistical Hypothesis Tests or Non-Parametric Statistical Hypothesis Tests?
( I have tried Parametric Statistical Hypothesis Tests but it was getting hard to meet the statistical significance, as there are multiple features involved. If I remove/ignore most of the variables I may end up getting the statistical significance, but that may not be the intended purpose of A/B testing though.)
Can you throw some light, please?
MahNoor Ashrif says
Hi Jim!
A researcher conducted a study claiming that the majority of people who died during the pandemic bought a new phone during the last year. What type of research is this? If his assumption is correct, which statistical test would be appropriate to analyze the data?
Please answer this question in detail. I will be really thankful to you.
Jim Frost says
Hi MahNoor, apparently this is a question from a test because someone else recently asked the identical question. I’m not going to do your test for you. However, I will point you towards a 2-sample proportions test, which will allow you to determine whether there is a difference in the proportion of fatalities between those who bought a new phone and those who didn’t.
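For readers unfamiliar with the 2-sample proportions test, here is a minimal sketch using only the Python standard library. It uses the common pooled z-test with a normal approximation, which is an assumption on my part about which variant is meant. The function name `two_proportion_ztest` and the counts are hypothetical and are not data from the question.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error of the difference under the null hypothesis.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: number of events out of the group size, per group.
z, p_value = two_proportion_ztest(x1=30, n1=200, x2=18, n2=220)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The normal approximation is generally considered reasonable only when each group has enough events and non-events. With small counts, an exact test may be a better choice.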
Ben Craggnon says
Amazing thanks!
Ben Craggnon says
Hi Jim,
Thanks so much for explaining this all!
I want to compare the ages of two groups I have (one is only 17 people and one is 51 people). Because the first group is <20 people, do I need a Mann-Whitney U test, or can I just use a t-test here?
Many thanks!
Ben
Jim Frost says
Hi Ben,
Do you have any theoretical reasons or empirical data that suggest the population for the smaller group follows a nonnormal distribution? If you can reasonably assume that it follows a normal distribution, you can probably use a t-test. However, if you have any doubts about that, it’s best to go with Mann-Whitney.
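Both options are short to run in practice. The sketch below uses SciPy on made-up ages. Only the group sizes (17 and 51) come from the question, and the choice of `equal_var=False` (Welch's version of the t-test) is my assumption rather than something stated above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical ages; only the group sizes match the question above.
ages_small = rng.normal(loc=45, scale=10, size=17)
ages_large = rng.normal(loc=50, scale=12, size=51)

# Welch's t-test compares means without assuming equal variances.
t_res = stats.ttest_ind(ages_small, ages_large, equal_var=False)
# Mann-Whitney U does not assume normality.
u_res = stats.mannwhitneyu(ages_small, ages_large, alternative="two-sided")

print(f"Welch t-test:   t = {t_res.statistic:.3f}, p = {t_res.pvalue:.4f}")
print(f"Mann-Whitney U: U = {u_res.statistic:.1f}, p = {u_res.pvalue:.4f}")
```

Deciding between the two should rest on what you know about the population, as described above, not on which p-value looks better.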
aruna says
Hi Jim, thank you for the wonderful article. Can you tell me the special features of factorial design? It would be very helpful.
ELZED LIEW says
Thanks heaps for this excellent overview.
However, I am a bit confused by ‘The groups in a nonparametric analysis typically must all have the same variability (dispersion).’
As far as I can remember, ANOVA, as a parametric test, assumes equal variances across the samples that will be tested.
Do you think I should stick to ANOVA if the samples are normally distributed but have unequal variances?
Jim Frost says
Hi Elzed,
If you have unequal variances, you can use Welch’s ANOVA. Click the link to read my post about it!
Lisa says
Thanks a bunch Jim !
Lisa says
Hi Jim,
Thanks for this article! I would like to kindly seek your advice.
I’m currently looking to filter out variables that are highly correlated so that I can remove one or the other for an analysis. I was thinking of using the nonparametric Spearman’s rank correlation. Would that be correct? The data are in equal-sized groups, each with >20 observations, and are continuous.
Jim Frost says
Hi Lisa,
You can use that or even just the regular Pearson’s correlation. If you’re performing regression analysis and worried about multicollinearity, you can fit the model with the variables and then check the VIFs.
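As a rough sketch of both suggestions, the code below computes Pearson's and Spearman's correlations for a pair of predictors and then the VIFs for a small set of predictors. The data and variable names are hypothetical. The VIFs are taken from the diagonal of the inverse correlation matrix, which matches the usual definition 1 / (1 - R²) from regressing each predictor on the others.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100
# Hypothetical predictors: x2 is built to be strongly related to x1.
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

pearson_r, _ = stats.pearsonr(x1, x2)
spearman_rho, _ = stats.spearmanr(x1, x2)

# VIF_j = 1 / (1 - R_j^2); it equals the j-th diagonal entry of the
# inverse of the predictors' correlation matrix.
corr = np.corrcoef(X, rowvar=False)
vif = np.diag(np.linalg.inv(corr))

print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
print("VIFs:", np.round(vif, 2))
```

Pairwise correlations only catch relationships between two variables at a time, while VIFs also reflect a predictor that is explained by a combination of several others.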
Heather says
Hi Jim.
Thank you for your article; it was very helpful. I was wondering if you could help me. I’m currently doing my thesis and am carrying out a few statistical tests. One is an independent samples t-test with 1 categorical independent variable (PP group 1, N = 57; PP group 2, N = 45) and one continuous dependent variable. However, my data violate the assumptions of normality and homogeneity of variance, and have a few outliers. In this case, should I bootstrap my t-test or use the alternative nonparametric test (Mann-Whitney)? How would I make this decision? What would the criteria be for using bootstrapping over the alternative nonparametric test?
Thanks in advance for any insight you can offer! 🙂
Jim Frost says
Hi Heather,
In your case, I would strongly consider using the t-test. In fact, there are specific reasons for not using a nonparametric test in your case.
Specifically, you have a large enough sample size in each group so that the central limit theorem kicks in (see the table in the post for sample size requirements). Even though the data in your groups are nonnormal, the sampling distributions should follow a normal distribution, which gives you valid results. Additionally, t-tests can handle unequal variances. Just be sure that your statistical software uses the version of the t-test that does NOT assume equal variances.
While nonparametric tests don’t assume that your data follow a particular distribution, they do assume that the spread of the data in each group is the same. Because your data have different variances, they violate that assumption for nonparametric tests.
I’d use the t-test! You could also use bootstrapping, but a t-test should work fine.
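The central limit theorem point can be illustrated with a quick simulation. This sketch assumes exponentially distributed data purely as a convenient example of strong skew, because the real distribution in the question is unknown. The number of simulated samples is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_per_group = 45      # the smaller group size from the question above
n_samples = 20000     # number of simulated samples (arbitrary)

# Exponential data are strongly right-skewed (population skewness = 2).
raw = rng.exponential(scale=1.0, size=(n_samples, n_per_group))
# One sample mean per simulated sample of 45 observations.
sample_means = raw.mean(axis=1)

skew_raw = stats.skew(raw.ravel())
skew_means = stats.skew(sample_means)
print(f"skewness of raw data:     {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```

For exponential data, the skewness of the sample mean shrinks as 2 / sqrt(n), so at n = 45 the sampling distribution is already much closer to symmetric than the data themselves. Heavier tails or extreme outliers would need larger samples.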
Benny Zuse Rousso says
Hi Jim, very good post (along with many others on your blog). Could you please provide a formal reference for the table of minimum sample sizes?
Thanks a lot!
Ben
vivian says
Thanks a lot for the valuable information, but may I ask what you mean by a tiny sample size? Is it less than 30?
Mukhles says
Thank you.
Mukhles says
Hello Jim, when did you publish this article? I would like to cite it for my school work
Jim Frost says
Hi Mukhles,
I’m glad this article was helpful for you! When you cite web pages, you actually use the date you accessed the article. See this link at Purdue University for Electronic Citations. Look in the section for “A Page on a Website.” Thanks!
Akshat Garg says
Hi Jim, would you please answer one of my doubts? I’m badly stuck in
Jim Frost says
Hi Akshat,
Please find the blog post that is closest to the topic of your question. There is a search box in the right hand column part way down that can help you. Ask your question in the comments of the appropriate post and I’ll answer it!
Subhabrata Chakraborti says
Just wanted to add that the book “Nonparametric Statistical Inference, fifth edition” by Gibbons and Chakraborti (2010; CRC Press) has discussions about the power of some nonparametric tests, including Minitab macro code to simulate power. The updated edition (work in progress) will discuss R code. Hope this helps.
Julia Kirchner says
Hi Jim! Great article, it really helped me for my study.
Only problem now is that I need scientific papers for the statements made in your text, to refer to them in my study.
Specifically, I was wondering if you could provide me with the paper you used to draw this conclusion: “parametric tests have more power. If an effect actually exists, a parametric analysis is more likely to detect it”
Thanks a lot!
Jim Frost says
Hi Julia,
Thanks for your kind words. I’m glad it was helpful!
It’s generally recognized that nonparametric tests have somewhat lower power compared to a similar parametric test. In other words, to have the same power as a similar parametric test, you’d need a somewhat larger sample size for the nonparametric test. That’s the tendency.
However, calculating the power for a nonparametric test and understanding the difference in power between a specific pair of parametric and nonparametric tests is difficult. The problem arises because the specific difference in power depends on the precise distribution of your data. That makes it impossible to state a constant power difference by test. In other words, the power difference doesn’t just depend on the tests themselves but also on the properties of your data.
For more information about these considerations, look at the following texts:
Walsh, J.E. (1962). Handbook of Nonparametric Statistics, New York: D. Van Nostrand.
Conover, W.J. (1980). Practical Nonparametric Statistics, New York: Wiley & Sons.
Andrea says
Jim, do you have anything which describes how to estimate the power of a nonparametric test?
Jim Frost says
Hi Andrea,
Calculating power for nonparametric tests can be a bit complicated. For one thing, while nonparametric tests don’t require particular distributions, you need to know the distribution to be able to calculate statistical power for these tests. I don’t think many statistical packages have built in analyses for this type of power analysis. I’ve also heard of people using bootstrap methods or Monte Carlo simulations to come up with an answer. For these methods, you’ll still need either representative data or knowledge about the distribution.
Apparently, the pwr.boot function in R uses the bootstrap method to calculate power for nonparametric tests. Unfortunately, I have not used it myself, but it could be something to try. The problem is that you should not use data from a hypothesis test to calculate the power for that hypothesis test. If the test was statistically significant, power will be high. If the test was not significant, the power is low. You don’t know the real power. So, I’m not sure about the rationale for using this command, but it is one approach.
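The Monte Carlo approach mentioned above can be sketched in a few lines of Python. This is not the R function referred to, and everything specific here is an assumption for illustration: the lognormal population, the size of the shift, the sample size, and the helper name `simulated_power`. The key requirement is the one already stated: you must supply a distribution (or representative data) to simulate from.

```python
import numpy as np
from scipy import stats

def simulated_power(n_per_group, shift, n_sims=1000, alpha=0.05, seed=0):
    """Estimate Mann-Whitney power by simulating from an assumed distribution."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Assumed population: lognormal (skewed); group b is shifted upward.
        a = rng.lognormal(mean=0.0, sigma=0.5, size=n_per_group)
        b = rng.lognormal(mean=0.0, sigma=0.5, size=n_per_group) + shift
        p = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
        rejections += int(p < alpha)
    # Power is the share of simulated studies that reject the null.
    return rejections / n_sims

power = simulated_power(n_per_group=30, shift=0.5)
no_effect_rate = simulated_power(n_per_group=30, shift=0.0)
print(f"estimated power: {power:.3f}")
print(f"rejection rate with no shift: {no_effect_rate:.3f}")
```

Running the same function with `shift=0.0` is a useful sanity check, because the rejection rate should then land near the significance level.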
John says
Hi. I wanted to leave a comment . . .
Jim Frost says
Hi John,
Thanks for the heads-up. I tried sending you an email but it bounced.
Jovana says
Hi Jim,
Thank you for this nice explanation. I would like to consult you about my data. I have 10 data sets (10 different metals), each consisting of 20 values (5 values in each of 4 seasons). These are measurements of metal concentrations in fish liver, and I want to assess whether there are seasonal variations. I tested for normality and found a normal distribution for 7 metals and a nonnormal distribution for 3. I also tested the homogeneity of variance (Levene’s test) and found that 6 of the metals have homogeneous variances, while the other 4 (3 of which have a nonnormal distribution) do not. Finally, my question is: should I use a parametric test (one-way ANOVA) for all 10 data sets, since the majority of samples have a normal distribution and homogeneous variance? Should I use a nonparametric test (Kruskal-Wallis H) since my data sets are not large (20 values)? Or should I test normally distributed data with a parametric test and nonnormally distributed data with a nonparametric test?
Thank you in advance,
Kind regards,
Jovana
Pam says
Hi again Jim,
This time my query is about missing data when the sample size is low. How do we deal with missing dependent variables in a continuous data set observed at different time intervals?
Is multiple imputation a good option when data (samples) are missing at some time points and some values were not detected due to method limitations? Some suggest replacing undetected data with the lowest possible value, such as 1/2 of the limit of detection, instead of using zero. Can undetected data be treated as missing data?
I have looked up some multiple imputation methods in SPSS but am not sure how acceptable they are or how to report the results if they are acceptable.
Please enlighten me with your expertise.
Thank you in advance!
Jim Frost says
Hi Pam,
Generally speaking, the less data you have the more difficult it is to estimate missing data. The missing values also play a larger role because they’re part of a smaller set. I don’t have personal experience using SPSS’ missing data imputation. I’ve read about it and it sounds good, but I’m not sure about limitations.
I’m not really sure about the detection limits issue. For one thing, I’d imagine that it depends on whether the lowest detectable value is still large enough to be important to your study. In other words, if it is so low that you’re not missing anything important, it might not be a problem. Perhaps the lowest detectable value is so low that in practical terms it’s not different from zero. But, that might not be the case. Additionally, I’d imagine it also depends on how much of your data fall in that region. If you’re obtaining lots of missing values or zeroes because many of the observations fall within that range, it becomes more problematic. Consequently, how to address it becomes very context sensitive and I wouldn’t be able to give you a good answer. I’d consult with subject-area specialists and see how similar studies have handled it. Sorry I couldn’t give you a more specific answer.
Pam says
Great! Thanks Jim. This is really helpful.
Cheers!
Brittney says
Thank you so much for this article! I wasn’t planning on using statistics in my research, but my research took a turn and my committee wanted to see testable hypotheses…for paleontology! Ugh. But, this article and your website are incredibly useful in dusting off the stats in my brain!
Jim Frost says
Your kind words mean so much to me. Thank you, Brittney!
Pam says
Hi Jim,
Thank you for making statistics a lot easier to understand. I now understand that parametric tests can be performed on nonnormal data if the sample size is big enough as indicated.
I’m confused about when to, and when not to, perform a log transformation of skewed data.
When do the data have to be log transformed to perform statistical analysis? Can parametric tests be done on log-transformed data, and how do we report the results after log transformation?
Do you have a blog post regarding this? Please provide your expert insights on these when possible.
Thank you
Jim Frost says
Hi Pam,
Yes, you can log transform data and use parametric analyses, although it does change a key aspect of the test. You can present the results by saying that the difference between the log-transformed means is statistically significant. Then, back transform those values to the natural units and present those as well. Also, note that using log-transformed data changes the nature of the test so that it compares geometric means rather than the usual arithmetic means. Be sure that is acceptable. Also, check that the transformed data follow the normal distribution.
However, you generally don’t need to do this if you have a large enough sample size per group–as I point out in this post. Consider using transformations only when the data are severely skewed and/or you have a smaller sample size. Unfortunately, I don’t have a blog post on this process. However, unless you have a strong need to transform your data, I would not use that approach.
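To make the geometric-mean point concrete, here is a small sketch using only the Python standard library. The two groups of measurements are hypothetical. It shows that back-transforming the mean of the logs gives the geometric mean, and that a difference between log means back-transforms to a ratio of geometric means rather than a difference of arithmetic means.

```python
import math
from statistics import mean, geometric_mean

# Hypothetical right-skewed measurements for two groups.
group_a = [1.2, 1.9, 2.4, 3.1, 4.8, 7.5, 12.0]
group_b = [2.0, 3.3, 4.1, 5.6, 8.2, 13.4, 21.0]

log_a = [math.log(x) for x in group_a]
log_b = [math.log(x) for x in group_b]

# Back-transforming the mean of the logs gives the geometric mean.
gm_a = math.exp(mean(log_a))
gm_b = math.exp(mean(log_b))

# A difference of log means back-transforms to a ratio of geometric means.
ratio = math.exp(mean(log_b) - mean(log_a))

print(f"geometric means:  {gm_a:.3f}, {gm_b:.3f}")
print(f"library check:    {geometric_mean(group_a):.3f}, {geometric_mean(group_b):.3f}")
print(f"arithmetic means: {mean(group_a):.3f}, {mean(group_b):.3f}")
print(f"ratio of geometric means (B / A): {ratio:.3f}")
```

When reporting, this means a significant result on the log scale is naturally described as one group's typical value being some multiple of the other's.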
I hope this helps!
Mrinali says
Very helpful article. Nice explanation
Lynn says
Jim, your site in general and this page have helped me understand statistics so much better as a novice. Regarding the Wilcoxon, although the post is super helpful in understanding the basics, I’m still unsure about how I can relate this to my study. It’s been loosely suggested to me by a peer that I use the Wilcoxon test, but I’m not sure how to confirm this.
I have 13 participants. They each watched Video 1 and answered 16 corresponding questions (8 for construct A and 8 for construct B). They then watched Video 2 and answered the same 16 questions (8 for construct A and 8 for construct B). The questions were 3, 5, and 7 Likert scale questions.
I want to find the differences in ratings between Videos 1 and 2 for construct A, the differences in ratings between Videos 1 and 2 for construct B, and the highest rated Video in total (combining both constructs). Any advice? Thanks
Asmat says
It is a really helpful article. I learned a lot. Thanks for posting.
Jim Frost says
You’re very welcome. I’m glad it was helpful!
Asmat says
Thanks Jim. Which post hoc test would you suggest in this case? I really appreciate it. Thanks.
Jim Frost says
The post hoc test I’m most familiar with is the Games-Howell test, which is similar to Tukey’s test. I’m sure there are others, but I’m not familiar with them. For more information and an example of Welch’s with this post hoc test, read my post on Welch’s ANOVA.
Asmat says
Hi Jim,
I am dealing with 6 groups of a data set with different sample sizes. The minimum sample size of one group is 56 and the maximum is 350, and the other groups’ sample sizes are in between these two points. My data are not normal, and through Levene’s test I found that the variances are not equal. I think a comparison of means is somewhat more meaningful than a comparison of medians. Could you please guide me in selecting between Welch’s ANOVA and the Kruskal-Wallis test?
Thanks
Jim Frost says
Hi Asmat,
Given your large sample sizes, unequal variances, and the fact that you want to compare means, I’d use Welch’s ANOVA.
Best of luck with your analysis!
Ferhat says
Hi from Turkey
I have followed your posts for 6 months. Every article is better than the last. Thank you for loving statistics.
Jim Frost says
Hi Ferhat, thank you so much! That means a lot to me!
John says
Hi Jim,
This is really an insightful article. I have a question though regarding my study. Can I still use a parametric test even if the distribution is not normal and the variances aren’t homogeneous? I checked those assumptions via the Shapiro-Wilk test and Levene’s F-test, and the results suggested that both assumptions were violated. Other online articles mentioned that if this is the case, I should use a nonparametric test, but I also read somewhere that one-way ANOVA would do. By the way, I have 3 groups with an equal number of observations, i.e., 21 for each group.
Thanks for your time.
Jim Frost says
Hi John,
If your sample size per group meets the requirements that I present in Advantage #1 for parametric tests, then nonnormal data are not a problem. These tests are robust to departures from normality as long as you have a sufficient number of observations per group.
As for unequal variances, you often have stricter requirements when you use nonparametric tests. This fact isn’t discussed much, but nonparametric tests typically require the same spread across groups. For t-tests and ANOVA, you have options that allow you to use them when variances are not equal. For example, for ANOVA you can use Welch’s ANOVA. For details on that method, read my post about Welch’s ANOVA.
Based on your sample size per group, you should be able to use ANOVA regardless of whether the data are normally distributed. If you suspect that the variances are not equal, you can use Welch’s ANOVA.
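For readers whose software lacks a built-in Welch's ANOVA, the sketch below implements the standard textbook formulas with NumPy and SciPy. It is an illustration, not necessarily the procedure any particular package uses. The function name `welch_anova` and the simulated data are hypothetical, and only the group count and size (3 groups of 21) come from the question.

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA: returns (F, df1, df2, p_value)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])

    w = n / variances                       # precision weights
    w_sum = w.sum()
    grand_mean = np.sum(w * means) / w_sum  # variance-weighted grand mean
    lam = np.sum((1 - w / w_sum) ** 2 / (n - 1))

    numerator = np.sum(w * (means - grand_mean) ** 2) / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k**2 - 1) * lam
    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k**2 - 1) / (3 * lam)
    p_value = stats.f.sf(f_stat, df1, df2)
    return f_stat, df1, df2, p_value

rng = np.random.default_rng(3)
# Hypothetical data: 3 groups of 21 with clearly unequal spread.
g1 = rng.normal(10, 1, size=21)
g2 = rng.normal(11, 3, size=21)
g3 = rng.normal(12, 6, size=21)

f_stat, df1, df2, p_value = welch_anova(g1, g2, g3)
print(f"F({df1}, {df2:.1f}) = {f_stat:.3f}, p = {p_value:.4f}")
```

A convenient check on the formulas is that with exactly two groups, the F statistic equals the square of Welch's t statistic and the p-values agree.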
I hope this helps.
John says
Thanks a lot for your prompt response, Jim. Really appreciate it. I’ll check on Welch’s ANOVA, then. Again, many thanks!
jain says
My data of 350 observations don’t follow a normal distribution. Which one should I use, the median or the mean? How should it be reported? Should I report the mean, SD, CV, etc.?
Jim Frost says
Hi Jain,
The answer to this question depends on which measure best represents the middle of your distribution and what is important to the subject area. In general, the more skewed your distribution, the more you should consider using the median. Graph your data to help answer this question. Also, I’ve written a post about the different measures of central tendency that you should read!
I hope this helps!
Muhammad Nazir says
Thanks Respected Sir
I got your point. You are great.
Jim Frost says
You’re welcome. I’m glad I could help!
Muhammad Nazir says
There is no significant difference in the pre-intervention scores of the groups (p-value > 0.05), but when we look at the mean scores of the groups, there are minor differences among them. In this case, can I use ANCOVA?
Jim Frost says
ANCOVA allows you to include a covariate (a continuous variable that might be correlated with the dependent variable) in the analysis along with your categorical variables (factors). Telling me about the means of the groups is not applicable to whether you should use ANCOVA specifically. Do you have a continuous independent variable to include in the analysis?
I’m not sure why you’re analyzing the pre-intervention scores. However, it is entirely normal to see differences between the group means when the p-value is greater than 0.05. That issue does not relate to whether you should use ANCOVA.
If you have only the 5 groups and there are no other variables in your analysis, then no, you can’t use ANCOVA because you don’t have a covariate. It seems like you should use one-way ANOVA. You can subtract the pretest scores from the posttest scores so you’re analyzing the differences by group. This process will tell you how the changes in the experimental groups compare to the change in the control group.
Muhammad Nazir says
Respected Sir, please answer my last two questions too.
Jim Frost says
I will, Muhammad. Please keep in mind that the website is something I do in my spare time. I try to answer all questions but sometimes it will take a day or two depending on what else I have going on.
Muhammad Nazir says
Thanks Great Sir
Muhammad Nazir says
Dear Jim Frost thanks for your kind reply,
Please also answer two more of my questions:
1. NO significant difference was found among the Covariates with p>0.05 before intervention. But there is minor difference in their mean score. In this case, Can I use ANCOVA for analysis with covariates having significant score with p>0.05?
Is it okay that using ANCOVA will remove the initial differences found in mean score of covariate though there was No significant difference found in terms of p>0.05 before intervention?
2. In my experimental study, the sample size is 50. There are 5 groups (4 experimental and 1 control). I am using a randomized pretest-posttest control group design, but some people say this research design is not appropriate. Please advise whether this design is okay. If not, what would be the appropriate design?
I am giving different interventions to the 4 experimental groups and no intervention to the control group. Please reply immediately.
Jim Frost says
Hi Muhammad,
I’m a bit confused by your first question. Covariates are continuous variables so there are not any significant differences. Covariates don’t assess the differences between the means of the levels of a categorical variable. Instead, you use the p-value to determine whether there is a significant relationship between the covariate and dependent variable in the same manner as for linear regression. Usually, if it is not significant, you don’t include it in the model. However, if theory strongly suggests that it should be in the model, it is ok to include it even when the p-value is greater than 0.05.
I don’t see why a pretest-posttest design would not be OK. But, I don’t have much information to go by. Why did they say it was not appropriate?
Muhammad Nazir says
Actually, I have only 10 subjects in each group, which is not greater than 15. That’s why I asked.
Jim Frost says
Hi Muhammad,
That size limit is only important when your data don’t follow a normal distribution. You said that your data do follow the normal distribution. So, it shouldn’t be a problem!
Muhammad Nazir says
I have 5 groups in an experimental study (4 experimental and 1 control). The sample size is 50, with 10 subjects in each group. All groups have a normal distribution. Can I use a parametric test? Please reply immediately.
Jim Frost says
Hi Muhammad, given what you state, I see no reason why you couldn’t use a parametric test.
sam says
Hi Jim, thanks for the overview! Do you happen to have a source/reference I can refer to when using the claims you make as argumentation in my paper?
Jim Frost says
Hi Sam, I include a link in this post to a white paper about the sample size claims. You’ll find your answers there!
Mohammad Hasan says
Wonderful article…love all your articles…😃
Jim Frost says
Thank you, Mohammad! That means a lot to me!
david okurut says
I have benefited from your information. May God bless You.
Jim Frost says
Thank you, David! It makes me happy to hear that this has been helpful for you!
Anitha Suseelan.s. says
Very nice explanation of central tendencies
Jim Frost says
Thank you, Anitha!
Mosbah says
How can I cite this article?
Jim Frost says
Hi, there are several standard formats for electronic sources, such as MLA, APA, and Chicago style. You’ll need to check with your institution to determine which one you should use.
BIRUK AYALEW Wondem says
very nice
Lucas says
Great article. This is one of those statistical tests that took a while to understand. But you explained it very nicely!
Jim Frost says
Thank you so much Lucas!
Jim Frost says
Thanks, Lucas!