How do you analyze Likert scale data? Likert scales are the most broadly used method for scaling responses in survey studies. Survey questions that ask you to indicate your level of agreement, from strongly agree to strongly disagree, use the Likert scale. The data in the worksheet are five-point Likert scale data for two groups.

Likert data seem ideal for survey items, but there is a huge debate over how to analyze these data. The general question centers on whether you should use a parametric or nonparametric test to analyze Likert data.

Read my post that compares parametric and nonparametric hypothesis tests.

Most people are more familiar with using parametric tests. Unfortunately, Likert data are ordinal, discrete, and have a limited range. These properties violate the assumptions of most parametric tests. The highlights of the debate over using each type of test with Likert data are as follows:

- Parametric tests assume that the data are continuous and follow a normal distribution. With a large enough sample, however, parametric tests are valid with nonnormal data. The 2-sample t-test is a parametric test.
- Nonparametric tests are accurate with ordinal data and do not assume a normal distribution. However, there is a concern that nonparametric tests have a lower probability of detecting an effect that actually exists. The Mann-Whitney test is an example of a nonparametric test.
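To make the contrast concrete, here is a minimal Python sketch of what each test actually computes: the 2-sample t statistic compares group means, while the Mann-Whitney U statistic compares ranks. The two samples are invented for illustration, and the helper names (`t_statistic`, `midranks`, `mann_whitney_u`) are my own. In practice, a statistics package would also supply the p-values.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Pooled-variance 2-sample t statistic (assumes equal variances)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

def midranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """U statistic for sample a: its rank sum minus the smallest possible rank sum."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2

# Hypothetical five-point Likert responses (1 = strongly disagree, 5 = strongly agree).
group_a = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
group_b = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5]

print("t statistic:", t_statistic(group_a, group_b))
print("Mann-Whitney U:", mann_whitney_u(group_a, group_b))
```

For these made-up samples, t works out to about -1.5 and U to 32. Note how heavily tied the ranks are: with only five possible responses, ties are the rule rather than the exception.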

What is the best way to analyze Likert scale data? This choice can be a tough one for survey researchers to make.

## Which Test is Better for Analyzing Likert Scale Data?

Studies have attempted to resolve this debate once and for all. Unfortunately, many of these studies assessed a small number of Likert distributions, which limits the generalizability of the results. Recently, more powerful computers have allowed simulation studies to meticulously analyze a broad spectrum of distributions.

In this post, I highlight a study by de Winter and Dodou*. Their study is a simulation study that assesses the capabilities of the Mann-Whitney test and the 2-sample t-test to analyze five-point Likert scale data for two groups. Let’s find out if one of these statistical tests is better to use!

The investigators assessed a group of 14 distributions of Likert data that cover the gamut. The computer simulation generated independent pairs of random samples covering all possible combinations of the 14 distributions. The study produced 10,000 random samples for each of the 98 combinations of distributions. Whew! That’s a lot of data!
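The sampling step of a simulation like this can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the three distributions are stand-ins I made up, not the 14 from the study, and the way pairs are enumerated (unordered pairs, including a distribution paired with itself) may differ from how the authors arrived at their 98 combinations.

```python
import random
from itertools import combinations_with_replacement

# Hypothetical response probabilities for the values 1..5.
# Illustrative stand-ins only, not the distributions used in the study.
DISTRIBUTIONS = {
    "uniform": [0.20, 0.20, 0.20, 0.20, 0.20],
    "agree_skewed": [0.05, 0.10, 0.15, 0.30, 0.40],
    "bimodal": [0.40, 0.08, 0.04, 0.08, 0.40],
}

def draw_sample(probs, n, rng):
    """Draw n five-point Likert responses with the given probabilities."""
    return rng.choices([1, 2, 3, 4, 5], weights=probs, k=n)

def simulate_pairs(n_per_group, n_reps, seed=0):
    """Yield (name_a, name_b, sample_a, sample_b) for every pair of distributions."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    for name_a, name_b in combinations_with_replacement(DISTRIBUTIONS, 2):
        for _ in range(n_reps):
            sample_a = draw_sample(DISTRIBUTIONS[name_a], n_per_group, rng)
            sample_b = draw_sample(DISTRIBUTIONS[name_b], n_per_group, rng)
            yield name_a, name_b, sample_a, sample_b

# 3 distributions give 6 unordered pairs; 100 replications each (the study used 10,000).
pairs = list(simulate_pairs(n_per_group=30, n_reps=100))
print(len(pairs), "pairs of samples generated")
```

Each generated pair would then be handed to both hypothesis tests, which is where the real computational cost of the study comes from.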

The study statistically analyzed each pair of samples with both the 2-sample t-test and the Mann-Whitney test. The goal was to calculate the error rates and statistical power of both tests to determine whether one of the analyses is better for Likert data. The project also looked at different sample sizes to see if that made a difference.

## Comparing Error Rates and Power When Analyzing Likert Scale Data

After analyzing all pairs of distributions, the results indicate that both types of analyses produce type I error rates that are nearly equal to the target value. A type I error is essentially a false positive: the test results are statistically significant but, unbeknownst to the investigator, the null hypothesis is actually true. The rate at which these errors occur should equal the significance level.

The 2-sample t-test and Mann-Whitney test produce nearly equal false positive rates for Likert scale data. Further, the error rates for both analyses are close to the significance level target. Excessive false positives are not a concern for either hypothesis test.
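In symbols, the two quantities compared in this section can be written as follows. This is the standard textbook notation rather than anything specific to the study, with $\alpha$ as the significance level and $\beta$ as the type II error rate:

```latex
\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}),
\qquad
\text{power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false})
```

A well-behaved test keeps its observed false positive rate close to $\alpha$ while making power as high as possible.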

Regarding statistical power, the simulation study shows that there is a minute difference between these two tests. Apprehensions about the Mann-Whitney test being underpowered were unsubstantiated. In most cases, if there is an actual difference between populations, the two tests have an equal probability of detecting it.
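If you want to see how such error rates and power values are estimated, here is a rough Python sketch of the idea. It is not the authors' code, and it rests on several assumptions that I want to flag: both p-values use a normal approximation (reasonable at 200 observations per group, not at 10), the Mann-Whitney version has a tie correction but no continuity correction, the two distributions are hypothetical, and it runs 2,000 replications rather than 10,000. Drawing both samples from the same distribution estimates the type I error rate, while drawing them from different distributions estimates power.

```python
import random
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05  # significance level
STD_NORMAL = NormalDist()
RESPONSES = [1, 2, 3, 4, 5]

def mean_and_variance(x):
    n = len(x)
    m = sum(x) / n
    return m, sum((v - m) ** 2 for v in x) / (n - 1)

def t_test_p(a, b):
    """Two-sided p-value for the pooled t statistic (normal approximation)."""
    na, nb = len(a), len(b)
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    if pooled == 0:
        return 1.0  # every response identical: no evidence of a difference
    t = (mean_a - mean_b) / sqrt(pooled * (1 / na + 1 / nb))
    return 2 * (1 - STD_NORMAL.cdf(abs(t)))

def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney p-value (normal approximation, tie-corrected)."""
    na, nb = len(a), len(b)
    n = na + nb
    counts = {v: 0 for v in RESPONSES}
    for v in list(a) + list(b):
        counts[v] += 1
    # Midrank of each response value in the combined sample.
    rank_of, seen = {}, 0
    for v in RESPONSES:
        rank_of[v] = seen + (counts[v] + 1) / 2
        seen += counts[v]
    u = sum(rank_of[v] for v in a) - na * (na + 1) / 2
    tie_term = sum(c ** 3 - c for c in counts.values()) / (n * (n - 1))
    var_u = na * nb / 12 * ((n + 1) - tie_term)
    if var_u <= 0:
        return 1.0
    z = (u - na * nb / 2) / sqrt(var_u)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def rejection_rates(probs_a, probs_b, n_per_group=200, n_reps=2000, seed=1):
    """Share of simulated sample pairs in which each test rejects the null."""
    rng = random.Random(seed)
    t_hits = mw_hits = 0
    for _ in range(n_reps):
        a = rng.choices(RESPONSES, weights=probs_a, k=n_per_group)
        b = rng.choices(RESPONSES, weights=probs_b, k=n_per_group)
        if t_test_p(a, b) < ALPHA:
            t_hits += 1
        if mann_whitney_p(a, b) < ALPHA:
            mw_hits += 1
    return t_hits / n_reps, mw_hits / n_reps

# Hypothetical distributions: one symmetric, one shifted toward agreement.
symmetric = [0.10, 0.20, 0.40, 0.20, 0.10]
shifted = [0.05, 0.15, 0.35, 0.25, 0.20]

type1_t, type1_mw = rejection_rates(symmetric, symmetric)  # null is true
power_t, power_mw = rejection_rates(symmetric, shifted)    # null is false
print("Type I error rate (t, Mann-Whitney):", type1_t, type1_mw)
print("Power (t, Mann-Whitney):", power_t, power_mw)
```

The exact numbers depend on the seed and on the distributions you plug in, so treat this as a way to explore the idea rather than as a replication of the published results.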

There is one qualification. A power difference between the two tests exists for several specific combinations of distribution pairs. The difference in power affects only a small portion of the possible combinations of distributions. My suggestion is to perform both tests on your Likert data. If the test results disagree, look at the article to determine whether a difference in power might be the cause.

In most cases, it doesn’t matter which of the two statistical analyses you use to analyze your Likert data. If you have two groups and you’re analyzing five-point Likert data, both the 2-sample t-test and Mann-Whitney test have nearly equivalent type I error rates and power. These results are consistent across group sizes of 10, 30, and 200.

Sometimes it’s just nice to know when you don’t have to stress over something!

## Reference

*de Winter, J.C.F. and D. Dodou (2010), Five-Point Likert Items: t test versus Mann-Whitney-Wilcoxon, *Practical Assessment, Research and Evaluation*, 15(11).

Naveen Kumar S says

Hi sir,

I am Naveen Kumar S, from India. GST was implemented across India on 1 July 2017, and I am writing a research paper on GST and the issues that respondents (both CAs and taxpayers) have faced since its implementation. I collected the responses using Likert scale questions, and I am now stuck on analyzing the data. I don’t know from which perspective to begin (the main theme is the issues they faced after GST implementation), and, as a learner, I am unable to frame the null and alternative hypotheses…

Please help me in this regard and give me a hint or solution as early as possible…

thanks in advance

with regards

Naveen S


niaz hussain ghumro says

Good and very informative

Jim Frost says

Thank you!

Bokossa Sidoine says

My understanding is that we can use the 2-sample t-test or the Mann-Whitney test if we have two groups and are analyzing five-point Likert data. I have one question.

What if we have more than two groups and a scale with more than five points?

Very interesting. Thank you so much.

Jim Frost says

That’s correct. As for the other cases you mention, it looks promising but we can’t say definitively from this research. However, as you increase the number of values (e.g., a 7-point scale), the data become more like a continuous variable, which is good. And the F-test in ANOVA is a generalization of the t-test. So, the results should be applicable to these other cases. The question in my mind is that as you increase the number of groups with ANOVA, you’d need to be sure to keep the number of observations per group at a good number. So, it looks promising for these other cases that you mention, but I can’t state definitively that it’s true based on the specific research that I’ve read.