How do you analyze Likert scale data? Likert scales are the most broadly used method for scaling responses in survey studies. Survey questions that ask you to indicate your level of agreement, from strongly agree to strongly disagree, use the Likert scale. The data in the worksheet are five-point Likert scale data for two groups.

Likert data seem ideal for survey items, but there is a huge debate over how to analyze these data. The general question centers on whether you should use a parametric or nonparametric test to analyze Likert data.

Read my post that compares parametric and nonparametric hypothesis tests.

Most people are more familiar with using parametric tests. Unfortunately, Likert data are ordinal, discrete, and have a limited range. These properties violate the assumptions of most parametric tests. The highlights of the debate over using each type of test with Likert data are as follows:

- Parametric tests assume that the data are continuous and follow a normal distribution. However, with a large enough sample, parametric tests are valid with nonnormal data. The 2-sample t-test is a parametric test.
- Nonparametric tests are accurate with ordinal data and do not assume a normal distribution. However, there is a concern that nonparametric tests have a lower probability of detecting an effect that actually exists. The Mann-Whitney test is an example of a nonparametric test.

What is the best way to analyze Likert scale data? This choice can be a tough one for survey researchers to make.

## Which Test is Better for Analyzing Likert Scale Data

Studies have attempted to resolve this debate once and for all. Unfortunately, many of these studies assessed a small number of Likert distributions, which limits the generalizability of the results. Recently, more powerful computers have allowed simulation studies to meticulously analyze a broad spectrum of distributions.

In this post, I highlight a study by de Winter and Dodou*. Their study is a simulation study that assesses the capabilities of the Mann-Whitney test and the 2-sample t-test to analyze five-point Likert scale data for two groups. Let’s find out if one of these statistical tests is better to use!

The investigators assessed a group of 14 distributions of Likert data that cover the gamut. The computer simulation generated independent pairs of random samples that contained all possible combinations of the 14 distributions. The study produced 10,000 random samples for each of the 98 combinations of distributions. Whew! That’s a lot of data!

The study statistically analyzed each pair of samples with both the 2-sample t-test and the Mann-Whitney test. The goal was to calculate the error rates and statistical power of both tests to determine whether one of the analyses is better for Likert data. The project also looked at different sample sizes to see if that made a difference.

## Comparing Error Rates and Power When Analyzing Likert Scale Data

After analyzing all pairs of distributions, the results indicate that both types of analyses produce type I error rates that are nearly equal to the target value. A type I error is essentially a false positive: the test results are statistically significant but, unbeknownst to the investigator, the null hypothesis is actually true. The type I error rate should equal the significance level.

The 2-sample t-test and Mann-Whitney test produce nearly equal false positive rates for Likert scale data. Further, the error rates for both analyses are close to the significance level target. Excessive false positives are not a concern for either hypothesis test.
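To make the idea of a false positive rate concrete, here is a minimal simulation sketch in Python. It is not the code from the de Winter and Dodou study. The response probabilities, group size, number of simulations, and all function names are assumptions chosen for illustration, and both p-values use a large-sample normal approximation rather than the exact distributions a statistics package would use.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def t_test_pvalue(a, b):
    """Two-sided p-value for a 2-sample t statistic (unpooled standard error).

    Assumption: the p-value uses a normal approximation, which is only
    reasonable for fairly large groups.
    """
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    if se == 0:  # both samples are constant
        return 1.0 if mean_a == mean_b else 0.0
    t = (mean_a - mean_b) / se
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(t)))


def mann_whitney_pvalue(a, b):
    """Two-sided Mann-Whitney p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = sorted(a + b)
    midrank = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j] == combined[i]:
            j += 1
        midrank[combined[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u1 = sum(midrank[x] for x in a) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every response is identical
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def rejection_rates(probs_a, probs_b, n_per_group=200, n_sims=2000,
                    alpha=0.05, seed=1):
    """Share of simulated sample pairs that each test declares significant."""
    rng = random.Random(seed)
    scale = [1, 2, 3, 4, 5]
    t_rejects = mw_rejects = 0
    for _ in range(n_sims):
        a = rng.choices(scale, weights=probs_a, k=n_per_group)
        b = rng.choices(scale, weights=probs_b, k=n_per_group)
        if t_test_pvalue(a, b) < alpha:
            t_rejects += 1
        if mann_whitney_pvalue(a, b) < alpha:
            mw_rejects += 1
    return t_rejects / n_sims, mw_rejects / n_sims


# Hypothetical skewed distribution: probabilities of responses 1 through 5.
skewed = [0.05, 0.10, 0.15, 0.30, 0.40]

# Both groups share one distribution, so every rejection is a false positive.
t_rate, mw_rate = rejection_rates(skewed, skewed)
print(f"t-test false positive rate:       {t_rate:.3f}")
print(f"Mann-Whitney false positive rate: {mw_rate:.3f}")
```

When both groups are drawn from the same distribution, the rejection rates estimate the type I error rate and should land near the 0.05 significance level. Passing two different distributions instead turns the same loop into a rough estimate of statistical power.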

Regarding statistical power, the simulation study shows that there is a minute difference between these two tests. Apprehensions about the Mann-Whitney test being underpowered were unsubstantiated. In most cases, if there is an actual difference between populations, the two tests have an equal probability of detecting it.

There is one qualification. A power difference between the two tests exists for several specific combinations of distribution pairs. The difference in power affects only a small portion of the possible combinations of distributions. My suggestion is to perform both tests on your Likert data. If the test results disagree, look at the article to determine whether a difference in power might be the cause.

In most cases, it doesn’t matter which of the two statistical analyses you use to analyze your Likert data. If you have two groups and you’re analyzing five-point Likert data, both the 2-sample t-test and Mann-Whitney test have nearly equivalent type I error rates and power. These results are consistent across group sizes of 10, 30, and 200.

Sometimes it’s just nice to know when you don’t have to stress over something!

## Reference

*de Winter, J.C.F. and D. Dodou (2010), Five-Point Likert Items: t test versus Mann-Whitney-Wilcoxon, *Practical Assessment, Research and Evaluation*, 15(11).

Mohamed says

Hi Jim,

Thanks for this informative website, I went through fruitful ideas but I didn’t find exactly how to deal with my current case.

I have a questionnaire for satisfaction and to check which factors contribute more to the customer satisfaction. along with Age, Gender and type of service, I have many factors that reviewed by customers in an ordinal response (Extremely poor,Poor,Need improvement,Acceptable,Good,Excellent) and the satisfaction (is either satisfied or not) so which model and analysis method I can use to predict satisfaction given these different type of factors?

AM thinking to use Categorical PCA and for modelling am not sure which to use? should I scale it (1-6) and use K-means?

Appreciate your support

Sridhar V says

Hi Jim:

I managed to locate the reference for my above clarification on substituting likert scale with other values. Here it is:

https://amp.reddit.com/r/statistics/comments/82perc/how_to_analyze_ranking_data_eg_1st_2nd_3rd/

There is a further link therein:

https://statmodeling.stat.columbia.edu/2015/07/13/dont-do-the-wilcoxon/

I value your comments on the above at your convenience.

Thanks in anticipation.

Sridhar says

Thanks Jim for your clarification.

Sridhar V says

Hi Jim. I am Sridhar from Bangalore, India. I have been following some of your blogs recently and thank you for such a lucid explanation of statistics.

I have a request to make in connection with analysis of data collected in response to Likert Scale based questionnaire.

I have noted down that the best way to convert 5-point likert scales in to numerical values is by using -1.28,0.52,0.0,5.2,1.28. Unfortunately, I haven’t noted down the article or the book chapter.

I was wondering if this way of converting data rings a bell with you or anyone else seeing this message. Can you help me with the context of this way of converting the responses into numbers?

Thanks

Jim Frost says

Hi Sridhar,

I’m not familiar with that approach of converting 5-point Likert scale items. The two main problems with the Likert scale:

I’m not sure that the recoding you describe solves those problems. Perhaps there is a rationale behind it that I’m not aware of. Even if there is, I’d be leery about assuming it is applicable to all data.

Will says

Hi Jim,

I am struggling with WHICH type of regression analysis I should use and HOW. I created a questionnaire in order to understand the relationship of 4 (IVs) factors and 1(DV).

Each of the (IVs) has Five questions with 5 Likert Scale Items. For example:

(First Factor)

Q1 (strongly disagree=1, disagree=2, neutral=3, agree=4, strongly agree= 5)

Q2 (strongly disagree=1, disagree=2, neutral=3, agree=4, strongly agree= 5) and so on.

The (DV) also has Five questions with 5 Likert Scale Items. For example:

Q1 (strongly dissatisfied =1, dissatisfied =2, neutral=3, satisfied =4, strongly satisfied = 5)

WHICH type of regression analysis or statistical test I should use and HOW?

Thanks in advance,

Will

Jim Frost says

Hi Will,

Because your DV is ordinal, you’ll need to use ordinal logistic regression. Read my post about choosing the correct type of regression analysis and look for it in there.

Ordinal IVs can be tricky. I don’t have a post to direct you towards but I strongly recommend getting my eBook about regression analysis. In it, I talk about how to handle ordinal IVs. You’ll need to use them either as categorical or continuous variables because ordinal variables have a mix of traits.

From there you just fit the model, check the residuals, and interpret the results. Read my post about fitting the correct model.

Irena says

Hello

I’m currently working on my master’s thesis and I have to find out if the student evaluations are biased or not. I have a questionnaire data made with likert scale from 7 faculties. Can you help me with which model should I use? Thank you.

Jim Frost says

Hi Irena,

There’s not enough information in those several sentences to be able to understand your research project goals, data collection, etc., and provide recommendations. I’d recommend consulting with a statistician or advisor at your institution who can give your study the time it deserves.

Dale says

Hi Jim! I’m all too aware that my cognitive strengths do not lie in the stats area…

I have done a survey to determine elements of professional identity of a group of analysts. The study is exploratory mixed methods (the first study of this specific profession), and I need help to make sense of my data.

I have used five-point Likert scale questions for 2 elements of professional identity that I have taken from scholars who have developed the instruments (1 = strongly agree, 5 = strongly disagree). I have already done descriptive analysis where I id’ed the % of respondents on each scale. I don’t want to test reliability of the instrument etc but would like to compare the different demographic variables like genders/professional org membership, country, etc for these 2 elements. N=75.

I have run Mann-Whitney for the 2 factor variables, and thought about using Kruskal-Wallis H tests for those variables with more than 2 possibilities eg countries.

The results in SPSS Mann-Whitney show for instance that for one sub-element the distribution is not the same across the genders (Asymp sig of .013). The mean rank for males (n=57) is 34.86 and that of females (n=18) is 47.94. Sum of ranks is male: 1987.00 and female: 863.00.

But now to analyse this and understand and write this? What does this mean? Are the men more positive about this aspect than women? What is the “so what?” here? I need to see the value if what I’m doing here, otherwise, if you can show me to a more appropriate method?

Other people have “just” used a table to compare the mean scores of the different demographic groupings, for instance Gender: F=2.19 M= 2.76, Qualification: Doctorate=1.16, Masters degree=3.12, bachelors=2.76 etc. This is sufficient for me, as I need to know if there is a big difference between the genders? What is “big difference”. This is why I thought Mann-Whitney is a good method to say that “the genders share the same opinion about 8 of the 9 elements, but in the 9th element, men feel more positive than females.”

Please help!

Dale

Jim Frost says

Hi Dale,

Personally, I find Likert data to be aggravating. I know it’s easy to ask those types of questions in a survey. It’s easy for respondents to figure out how to answer. However, Likert scale data are ordinal data, which presents analysis problems because they’re a bit like continuous data and a bit like categorical data. How do you treat and analyze them? There’s been a long-standing debate over whether you should use parametric or nonparametric analyses for them. The study I cite suggests that when you’re looking at two-sample analyses, such as for male versus female, it doesn’t matter much as long as you have at least 10 observations per group. Your data qualify, so I’d say you could use either approach. If others in your field use means, it’s probably OK to go that route and not fight the current! And, you can cite this article to support the decision.

For cases where there are more than two groups, such as qualification, I don’t know if there is similar research. I would expect it would show similar results, although I don’t know what the minimum sample sizes per group would be. If you do go with the nonparametric analyses, analysts will often report the medians for each group.

Finally, I think your question about “big difference” touches upon the difference between statistical significance versus practical significance. Click that link for my post on that topic. In a nutshell, a statistically significant result doesn’t necessarily guarantee that the effect or difference is important in the real world.
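One way to put a size on the difference is to convert the reported rank sums into Mann-Whitney U values and then into the share of female/male pairs in which the female response has the higher code. The sketch below uses only the figures quoted in the question (57 males with a rank sum of 1987, 18 females with a rank sum of 863). The function name is hypothetical, and the interpretation assumes the coding described in the question, where 1 is strongly agree and 5 is strongly disagree.

```python
def u_from_rank_sum(rank_sum, n):
    """Mann-Whitney U for one group, from its rank sum and group size."""
    return rank_sum - n * (n + 1) / 2


n_male, n_female = 57, 18
rank_sum_male, rank_sum_female = 1987.0, 863.0

u_male = u_from_rank_sum(rank_sum_male, n_male)
u_female = u_from_rank_sum(rank_sum_female, n_female)

# Sanity check: the two U values always add up to n1 * n2.
assert u_male + u_female == n_male * n_female

# Share of (female, male) pairs where the female response has the higher
# code, with tied pairs counted as half.
share_female_higher = u_female / (n_male * n_female)

print(f"U (male) = {u_male:.0f}, U (female) = {u_female:.0f}")
print(f"Share of pairs with the higher female code: {share_female_higher:.2f}")
```

A share of 0.50 would mean neither group tends to give higher codes. Here the share is about 0.67. Because higher codes mean more disagreement under the assumed coding, that points to women tending to agree less with this item than men. Whether that gap matters in practice is a subject-area judgment.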

Best of luck with your analysis!

UMA MAHESH KUMMARI says

Sir, If you had made a video on it, please post it here.

Thank you.

Jim Frost says

Hi Uma,

Currently, I don’t have videos. In the future, I plan to create video lessons.

Antehun Atanaw says

Sir, the note is fine but I am still not clear about how to analyze a Likert scale (with five choices) using SPSS data analysis. I do believe that SPSS can only show frequency, mean, median, and standard deviation for a Likert scale, and that other analyses such as the t-test, ANOVA, and correlation cannot be done by SPSS.

Meaning that the Likert scale doesn’t show basic statistics.

Mrs Helen E. Pearson says

Hello,

I’m analysing data from someone else’s questionnaire, I have 148 respondents for a 40 statement questionnaire. It uses a four point Likert scale. I’m using Wilcoxon Signed ranks to analyse significant improvement in scores between test/retest. I’d also like to correlated some of the variables.

There are four main areas of interest, and each of those is divided into sub-sections. My problem is that some of the sub-sections are negatively scored, yet the section they relate to is totalled.

is it reasonable to manipulate the reversed scored items so all the ‘unhelpful’ high scores become low ‘helpful’ scores?

Jim Frost says

Hi Helen,

I think what you want to do makes sense. That’ll allow you to obtain high, positive correlations when helpful scores correlate with other helpful scores. Otherwise, you’ll obtain negative correlations. You’d still obtain the same information, but it’s less intuitive and potentially confusing to others.
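Reverse scoring is a simple recode. As a minimal sketch, assuming the four-point items are coded 1 through 4 (the coding and the function name are assumptions, not details from the questionnaire):

```python
def reverse_score(score, scale_min=1, scale_max=4):
    """Flip a score so the lowest value becomes the highest and vice versa."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} is outside {scale_min}..{scale_max}")
    return scale_min + scale_max - score


# Hypothetical responses to one negatively scored item.
original = [1, 2, 3, 4, 4]
reversed_scores = [reverse_score(s) for s in original]
print(reversed_scores)
```

Apply the recode only to the negatively scored items, and do it before totalling each section so that a high total means the same thing across all items.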

Nadine says

I am conducting an experimental research to test the effectiveness of task-based approach on improving the students’ literacy skills. I had two groups: experimental and control and each answered a questionnaire of 17 items using likert scale to express their attitude towards the lesson given (with and without task-based approach). What statistical test should I use to test the effectiveness of this approach on improving the students’ literacy skills?

I appreciate your answer asap.

Jim Frost says

Based on the information and research that I present in this blog post, it should be clear that you can use either the two-sample t-test or the Mann-Whitney test. Unfortunately, many reviewers and advisors might have strong opinions about one being more appropriate than the other, but the research shows that either test is valid overall.

Samuel kobina otu says

Hello sir, please do we have anything called ‘test value’ when analyzing data from a likert scale? What is the meaning of that term?

Qaisar Sohail says

Hi jim

it was really informative post. sir i have a data set of 32 variable which is filled from 180 respondent. my all variable are on nominal scale and likert scale. their is no response variable. can i took gender variable as a response variable and use logistic regression?? if it is not suitable than tell me which type of analysis can i done rather than of cross table???

i am waiting for your response.

Syed Abbas says

Hi Jim,

Thanks for another great article.

I have a question follow to that of NA that you explained here. How we should treat ‘don’t know’ response, do the ‘don’t know’ suppose to be treated the same as NA. usually we exclude ‘Don’t know’ and replaced them with SYS MIS in SPSS when we use agree scales (10 point) in regressions. Is this the right way of doing or we should assign some code to them???

Kaushal Kumar Bhagat says

Dear Jim,

Thanks a lot for your prompt reply. I am just wondering how to do the chi-square test of independence. In your example you have two IVs (color and area) but in my case I have only items. Kindly elaborate on the process of the chi-square test of independence for my case.

I have a sincere request. Please make video lessons on statistical analysis using any software like SPSS/Mplus/R.

Waiting for your kind reply.

Jim Frost says

Hi Kaushal,

You could use one item that uses the Likert scale for one variable and another Likert scale item as the other variable. If you collapse it to three values as you describe, this would give you a 3X3 grid. If you perform a chi-square analysis on this, you’d learn whether the two items are independent or if there is sufficient evidence to conclude that they are associated. If they’re associated, you might find that those who agree on one item are more likely to agree on the other item, or maybe they’d be more likely to disagree. Or, perhaps, there is no association between the two items. How they respond to one item does not correlate with how they respond on the other. It’s very similar to the example I use, but you’ll have a 3X3 grid that has all the combinations of agree/agree, agree/neutral, agree/disagree, disagree/agree, disagree/neutral, disagree/disagree, and so on. Do respondents fall into those cells randomly or does their response on one item correlate to their response on the other item?
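Here is a rough sketch of that calculation in plain Python. The counts in the 3X3 table are made up purely for illustration, and the function names are hypothetical. The p-value helper uses a closed-form expression that is exact only when the degrees of freedom are even, which holds for a 3X3 grid (4 degrees of freedom).

```python
import math


def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two items were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_p_value_even_df(stat, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive, even df")
    half = stat / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


# Hypothetical counts. Rows are item 1 (agree, neutral, disagree) and
# columns are item 2 (agree, neutral, disagree).
table = [
    [40, 10, 5],
    [10, 20, 10],
    [5, 10, 30],
]

stat, df = chi_square_independence(table)
p_value = chi_square_p_value_even_df(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4g}")
```

A small p-value suggests the two items are associated. Looking at which cells hold more or fewer respondents than expected shows the direction of that association.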

I’ve actually done exactly that with survey results years ago. I surveyed faculty about their comfort in using technology personally and in the classroom. Each item used a five-point Likert scale. I didn’t convert to a three-point scale like you’re considering. Not only does the table show where they fall for both items, you can also see how their responses compare for both items. I saw a pattern that showed faculty were less comfortable using technology in the classroom than on their own.

I am planning to create courses as you describe. So, those are coming! However, I have two books planned first. Hopefully, I can begin creating the courses in 2020. It’s definitely something I want to do!

Kaushal Kumar Bhagat says

Hi Jim,

First of all, I would like to thank you.I always read your post. It is very informative and helpful. Following is my query:

Objective: To check for significant differences between the proportions of disagreeing, neutral, and agreeing students

Let us suppose that i58, i59, i60 belongs to Factor A. ( strongly agree=1, agree=2, neutral=3, disagree=4, strongly disagree=5).

1.How to collapse the five-point scale into three categories: ‘disagree’ (i.e. strongly disagree and disagree), ‘neutral’ (i.e. neither agree nor disagree) and ‘agree’ (i.e. agree and strongly agree).

2. Please illustrate how to find significant differences between the proportions of disagreeing, neutral, and agreeing students using chi-square test in excel or SPSS.

Item Strongly agree Agree Neutral Disagree Strongly disagree

i58 270 440 63 11 8

i59 400 354 28 5 5

i60 239 428 104 15 5

Jim Frost says

Hi Kaushal,

I’m so happy to hear that my posts have been helpful.

I’d recommend using one of the methods I discuss in this post for analyzing five-point Likert scale data. Both are shown to work effectively.

If you’re set on collapsing categories, that’s just a recoding issue. With your coding, all 1s and 2s become “agree” and all 4s and 5s become “disagree.” The 3s are neutral. With this three-point scale, you might not be able to use t-tests or Mann-Whitney as I discuss in this post. I haven’t heard of using a proportions test on this type of data. You could try ordinal logistic regression or chi-square test of independence.
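As a small sketch of that recoding, the snippet below collapses the response counts from the table in the question, using the coding given there (1 = strongly agree through 5 = strongly disagree). The variable and function names are just illustrative.

```python
# Map each five-point code to a three-category label.
COLLAPSE = {1: "agree", 2: "agree", 3: "neutral", 4: "disagree", 5: "disagree"}

# Counts per item, ordered from strongly agree (1) to strongly disagree (5).
counts = {
    "i58": [270, 440, 63, 11, 8],
    "i59": [400, 354, 28, 5, 5],
    "i60": [239, 428, 104, 15, 5],
}


def collapse_counts(five_point_counts):
    """Sum five-point response counts into agree / neutral / disagree."""
    collapsed = {"agree": 0, "neutral": 0, "disagree": 0}
    for code, count in enumerate(five_point_counts, start=1):
        collapsed[COLLAPSE[code]] += count
    return collapsed


collapsed = {item: collapse_counts(c) for item, c in counts.items()}
for item, result in collapsed.items():
    print(item, result)
```

The same `COLLAPSE` mapping can be applied to individual responses rather than counts if you have the raw data.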

Best of luck with your analysis!

Voon Teng says

Dear Sir, greeting.

I examined satisfaction level of something with likert scale (5 ratings), is it possible to run simple independent t test for this with Age/Gender variable? I have 7 questions under satisfaction level section.

I read that independent T test just for interval scale instead of ordinal scale.

Very appreciated if can get your reply. Thank you

Jim Frost says

Hi Voon,

As I write in this post, yes, you can use a 2-sample t-test, which is for independent samples, with 5 point Likert scale items.

Deidre Whitfield says

How do you analyze NA data when computing the average using the Likert? Should you use zero as the value or can you assign it a value? If so, is there a best value to give NA responses?

Jim Frost says

Hi Deidre,

NA responses can be difficult to include in your analysis. There’s no one size fits all answer. You’ll need to determine if NA fits in logically with your scale, and what value it represents. That’ll vary based on the subject area and the scale.

In some cases, NA values may need to be excluded. For example, in a strongly agree to strongly disagree scale, if NA truly means not applicable, the respondent is indicating that the item does not apply to them. In those cases, you should consider excluding their response from the dataset for that item. You’ll have to think about whether NA is different than, say, “Neither agree nor disagree,” or whatever the middle value is.

However, if you can take NA to represent some sort of middle value, or something else, you can use it for that. You have to be very careful, though. In fact, I’d say that if you can use NA to represent some other value for that question, it represents bad survey design because you have two different options for the item that are equivalent. If NA maps to another option, it’s probably best not to include NA as an option for that item in the first place. For example, suppose the question is, “How strongly does an issue affect you?” and the scale ranges from “very strongly affects me” to “does not affect me at all.” An NA response probably corresponds to “does not affect me at all.” But why include both when they’re redundant?

For that reason, my guess is that NA does not map directly to another option most of the time. But, you’ll have to consider the scale and whether a value on it maps to NA. I can see cases where NA might equal no opinion.
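If you decide that NA should be excluded, the average is simply computed over the remaining responses. Here is a minimal sketch with made-up responses, where NA is stored as `None` (the data and the function name are hypothetical):

```python
from statistics import mean


def likert_mean_excluding_na(responses):
    """Mean of the valid responses and how many there were; None marks NA."""
    valid = [r for r in responses if r is not None]
    if not valid:
        return None, 0
    return mean(valid), len(valid)


# Hypothetical responses to one five-point item, with two NA answers.
responses = [4, 5, None, 3, None, 4]

average, n_valid = likert_mean_excluding_na(responses)
print(f"mean = {average} based on {n_valid} valid responses")

# For contrast: coding NA as zero drags the average down, because zero is
# not a point on the 1-to-5 scale.
mean_with_zeros = mean(0 if r is None else r for r in responses)
print(f"mean with NA coded as zero = {mean_with_zeros:.2f}")
```

Reporting the number of valid responses alongside the mean makes it clear how many people the average actually represents.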

Traci says

Jim,

Love you website, it is easy to understand and has helped me a lot. I have a question. I have 2 sets of survey data. One from patients (n=42) and one from staff (n=12). There are two sub-scales that I want to compare the two groups on but am concerned about the difference in sample sizes. Would I use a Mann-Whitney just on the two sub-scales?? The sub-scales were measured on a 5-point Likert scale. Your help is appreciated.

Jim Frost says

Hi Traci, thanks for the kind words. I really appreciate them!

I wouldn’t worry about the unequal sample sizes as long as your smallest group has more than 10, which it does.

The benefit of equal-sized groups comes in the form of statistical power, which is the ability to detect a difference. It appears that you have 54 observations. Now, if you had two groups each with 27 observations to produce that total of 54, your test would have more power than what you’ve actually got. However, reality isn’t always nice and neat, and you have to work with what you’ve got. So, it’s fine to test those two groups. The statistical power is somewhat less than what it would’ve been with equal-sized groups, but it’s not inherently problematic.
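To get a feel for how much is lost, note that in a 2-sample t-test with equal group variances, the standard error of the difference in means is proportional to the square root of 1/n1 + 1/n2. The sketch below compares the 42/12 split with a balanced 27/27 split. The function names are hypothetical, and the equal-variance setup is an assumption made for illustration.

```python
import math


def se_factor(n1, n2):
    """Multiplier on the common standard deviation in the standard error."""
    return math.sqrt(1 / n1 + 1 / n2)


def effective_n_per_group(n1, n2):
    """Balanced group size that gives the same standard error."""
    return 2 * n1 * n2 / (n1 + n2)


unbalanced = se_factor(42, 12)
balanced = se_factor(27, 27)

print(f"SE factor for 42 vs 12: {unbalanced:.3f}")
print(f"SE factor for 27 vs 27: {balanced:.3f}")
print(f"Ratio: {unbalanced / balanced:.2f}")
print(f"Effective balanced group size: {effective_n_per_group(42, 12):.1f}")
```

Under these assumptions the unbalanced design behaves roughly like two groups of about 19 each instead of 27 each. That is a loss of precision, but not a reason to avoid the test.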

Shuchi says

Hi Jim,

I m a learner and doing a correlational research on job satisfaction and attitude of teachers,for job satisfaction I have used a likert scale.In this scale there are 8 factors of job satisfaction and there are fixed number of statements for each factor like factor A has 7 statements.Also the scale has 52 statements.Before data analysis can I use mean score of each participant i.e. raw score /52.And same approach for factor analysis too i.e. raw score of factor A /7

Jim Frost says

Hi Shuchi,

There’s a general rule of thumb that if you have a discrete variable that has 10 equally spaced values or more, and the data are spread across those values, you can treat it as a continuous variable. If you satisfy that, I think you’re safe. And, it sounds like if you’re summing the scores for those statements you’ll be OK. Using the average is probably OK too because you are effectively using the same amount of information.

Joseph says

Mine is not a response to the question asked but a request for more clarification on this issue. Assuming one uses a four-point scale, can we still use a t-test? Also, can we call a four-point scale Likert?

Jim Frost says

Hi Joseph,

Unfortunately, I don’t know for sure. I haven’t read research about it that says one way or another. The research that is the basis for this post only assessed 5-point scales. Honestly, I was surprised that t-tests worked as well as they did for 5-point scales! Four-point scales satisfy the t-test assumptions even less than 5-point scales. So, I think that would be risky, but I can’t say that I know for sure.

As for the terminology, I’ve seen it argued both ways. Some say that a Likert scale specifically refers to a 5-point ordinal scale, while others say that it doesn’t have to be 5 points. My own take is that there are probably other more important considerations for what constitutes a Likert scale. Namely, the values need to be balanced between positive and negative relative to a neutral value. Additionally, the distances between values need to be equal.

In other words, a Likert scale is a special type of ordinal data scale. Ordinal data don’t require those properties (balance, neutral value, and equal spacing), but in my mind, Likert scales do require those properties, but don’t require specifically 5-points. With four points, I’m not sure that your data can satisfy all of those requirements.

I hope this helps!

Haris says

Hi Jim, I know this is a long time since your post. I had a query.

I am involved with analysing some days where an educational intervention was performed for students. Likert scale was used to assess pre- and post- intervention changes in knowledge, confidence level etc.

I do not think one can use the scale data like Likert as they are, to assess significance with t-test or the Mann-Whitney test.

Do you think we should assign numerical values to the scale data before using the tests of statistical significance?

For example 0.2 for a Likert reading of 1 on a scale of 5, 0.8 for 4/5 etc. ?

Bokossa Sidoine says

In my understanding, we can use the 2-sample t-test or Mann-Whitney if we have two groups and are analyzing five-point Likert data. I have one question.

what about if we have more than two groups and more than five-point Likert?

very intersting thanks you so much..

Jim Frost says

That’s correct. As for the other cases you mention, it looks promising but we can’t say definitively from this research. However, as you increase the number of values (e.g., a 7-point scale), the data are becoming more like a continuous variable, which is good. And the F-test in ANOVA is a generalization of the t-test. So, the results should be applicable to these other cases. The question in my mind is that as you increase the number of groups with ANOVA, you’d need to be sure to keep the number of observations per group at a good number. So, it looks promising for these other cases that you mention, but I can’t state definitively that it’s true based on the specific research that I’ve read.

niaz hussain ghumro says

Good and very informative

Jim Frost says

Thank you!

Naveen Kumar S says

Hi sir,

am Naveen Kumar S, from india. recently on 1st july 2017 GST was implemented across India and am writing a research paper on GST and the issues faced by the respondents (both CAs and tax payers) after GST implementation. for this i had received the responses through likert scale based questions and now stuck in analyzing the data. dont know in which perspective i have to initiate (the main theme is-issues faced by them in post GST implementation) and also as a learner cant able to frame the null and altenate hypothesis…

pls help me in this regard and give some hint/ solution for the same as early as possible…

thanks in advance

with regards

Naveen S
