How do you analyze Likert scale data? Likert scales are the most widely used method for scaling responses in survey studies. Survey questions that ask you to indicate your level of agreement, from strongly agree to strongly disagree, use the Likert scale. The data in the worksheet are five-point Likert scale data for two groups.

Likert data seem ideal for survey items, but there is a huge debate over how to analyze these data. The general question centers on whether you should use a parametric or nonparametric test to analyze Likert data.

Read my post that compares parametric and nonparametric hypothesis tests.

Most people are more familiar with using parametric tests. Unfortunately, Likert data are ordinal, discrete, and have a limited range. These properties violate the assumptions of most parametric tests. The highlights of the debate over using each type of test with Likert data are as follows:

- Parametric tests assume that the data are continuous and follow a normal distribution. However, with a large enough sample, parametric tests are valid with nonnormal data. The 2-sample t-test is a parametric test.
- Nonparametric tests are accurate with ordinal data and do not assume a normal distribution. However, there is a concern that nonparametric tests have a lower probability of detecting an effect that actually exists. The Mann-Whitney test is an example of a nonparametric test.
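
To make the two candidates concrete, here is a minimal sketch that computes both test statistics for two small, made-up samples of five-point responses. The sample values and function names are illustrative assumptions, not data from any study. The sketch uses only Python's standard library, so it stops at the test statistics; a statistics package would normally supply the p-values.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(x, y):
    """2-sample t statistic without assuming equal variances (Welch form)."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


def mann_whitney_u(x, y):
    """Mann-Whitney U for x: pairs where x exceeds y, counting ties as half."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5
    return u


# Hypothetical five-point responses (1 = strongly disagree, 5 = strongly agree).
group_a = [1, 2, 2, 3, 3, 4]
group_b = [3, 3, 4, 4, 5, 5]

t = t_statistic(group_a, group_b)
u = mann_whitney_u(group_a, group_b)
print(f"t = {t:.3f}")
print(f"U = {u} out of {len(group_a) * len(group_b)} pairs")
```

The t statistic compares the group means, while U depends only on how the responses are ordered relative to each other, which is why the Mann-Whitney test is comfortable with ordinal data.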

What is the best way to analyze Likert scale data? This choice can be a tough one for survey researchers to make.

If you want to find correlations between Likert items, be sure to read my post about Spearman’s correlation because that analysis is designed for ordinal data.

Learn more about Ordinal Data: Definition, Examples & Analysis.

## Which Test is Better for Analyzing Likert Scale Data

Studies have attempted to resolve this debate once and for all. Unfortunately, many of these studies assessed a small number of Likert distributions, which limits the generalizability of the results. Recently, more powerful computers have allowed simulation studies to meticulously analyze a broad spectrum of distributions.

In this post, I highlight a study by de Winter and Dodou*. Their study is a simulation study that assesses the capabilities of the Mann-Whitney test and the 2-sample t-test to analyze five-point Likert scale data for two groups. Let’s find out if one of these statistical tests is better to use!

The investigators assessed a group of 14 distributions of Likert data that cover the gamut. The computer simulation generated independent pairs of random samples that contained all possible combinations of the 14 distributions. The study produced 10,000 random samples for each of the 98 combinations of distributions. Whew! That’s a lot of data!
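
As a rough illustration of what generating one pair of samples could look like, the sketch below draws independent five-point responses from two probability distributions. The two distributions, the group size, and the seed are assumptions made up for this example; they are not the 14 distributions from the study.

```python
import random

SCALE = [1, 2, 3, 4, 5]

# Two hypothetical response distributions: probabilities for options 1..5.
# These are illustrative only, not the distributions used in the study.
SKEWED_AGREE = [0.05, 0.10, 0.15, 0.30, 0.40]
UNIFORM = [0.20, 0.20, 0.20, 0.20, 0.20]


def draw_sample(probabilities, n, rng):
    """Draw n independent Likert responses from the given distribution."""
    return rng.choices(SCALE, weights=probabilities, k=n)


def draw_pair(dist_a, dist_b, n, rng):
    """Draw one independent pair of samples, one from each distribution."""
    return draw_sample(dist_a, n, rng), draw_sample(dist_b, n, rng)


rng = random.Random(42)  # fixed seed so the run is repeatable
sample_a, sample_b = draw_pair(SKEWED_AGREE, UNIFORM, n=30, rng=rng)
print(sample_a[:10], sample_b[:10])
```

Repeating `draw_pair` many times for every combination of distributions gives the kind of dataset a simulation study like this one analyzes.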

The study statistically analyzed each pair of samples with both the 2-sample t-test and the Mann-Whitney test. The goal was to calculate the error rates and statistical power of both tests to determine whether one of the analyses is better for Likert data. The project also looked at different sample sizes to see if that made a difference.

## Comparing Error Rates and Power When Analyzing Likert Scale Data

After analyzing all pairs of distributions, the results indicate that both types of analyses produce type I error rates that are nearly equal to the target value. A type I error is essentially a false positive: the test results are statistically significant but, unbeknownst to the investigator, the null hypothesis is actually true. The type I error rate should equal the significance level.

The 2-sample t-test and Mann-Whitney test produce nearly equal false positive rates for Likert scale data. Further, the error rates for both analyses are close to the significance level target. Excessive false positives are not a concern for either hypothesis test.
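
If you want to see how a false positive rate can be estimated, here is a small simulation sketch. Both groups are drawn from the same distribution, so the null hypothesis is true and every significant result is a false positive. The distribution, group size of 30, number of simulations, and seed are all assumptions for illustration, and this is not the study's code. To stay within the standard library, the t-test uses a hard-coded approximate critical value for 58 degrees of freedom, and the Mann-Whitney test uses the tie-corrected normal approximation.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

SCALE = [1, 2, 3, 4, 5]
WEIGHTS = [0.10, 0.20, 0.40, 0.20, 0.10]  # hypothetical shared distribution
N = 30                                    # responses per group
T_CRIT = 2.0017  # approx. two-sided 5% critical value, t with 2N - 2 = 58 df
Z_CRIT = NormalDist().inv_cdf(0.975)      # two-sided 5% critical value, normal


def t_rejects(x, y):
    """Equal-variance 2-sample t-test at the 5% level (equal group sizes)."""
    pooled = (variance(x) + variance(y)) / 2
    if pooled == 0:
        return False
    t = (mean(x) - mean(y)) / sqrt(pooled * 2 / len(x))
    return abs(t) > T_CRIT


def mw_rejects(x, y):
    """Mann-Whitney test at the 5% level, tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    count_x = [x.count(v) for v in SCALE]
    count_y = [y.count(v) for v in SCALE]
    # U counts pairs where x exceeds y, plus half of the tied pairs.
    u = 0.0
    y_below = 0  # number of y values strictly below the current scale value
    for a, b in zip(count_x, count_y):
        u += a * (y_below + 0.5 * b)
        y_below += b
    ties = sum((a + b) ** 3 - (a + b) for a, b in zip(count_x, count_y))
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u == 0:
        return False
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    return abs(z) > Z_CRIT


def type_i_error_rates(n_sims, rng):
    """Share of simulated null comparisons each test flags as significant."""
    t_hits = mw_hits = 0
    for _ in range(n_sims):
        x = rng.choices(SCALE, weights=WEIGHTS, k=N)
        y = rng.choices(SCALE, weights=WEIGHTS, k=N)
        t_hits += t_rejects(x, y)
        mw_hits += mw_rejects(x, y)
    return t_hits / n_sims, mw_hits / n_sims


t_rate, mw_rate = type_i_error_rates(2000, random.Random(1))
print(f"t-test: {t_rate:.3f}  Mann-Whitney: {mw_rate:.3f}")
```

With a 5% significance level, both rates should land in the neighborhood of 0.05, give or take simulation noise. Note that `T_CRIT` only applies to this group size; change `N` and it needs updating.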

Regarding statistical power, the simulation study shows that there is a minute difference between these two tests. Apprehensions about the Mann-Whitney test being underpowered were unsubstantiated. In most cases, if there is an actual difference between populations, the two tests have an equal probability of detecting it.

There is one qualification. A power difference between the two tests exists for several specific combinations of distribution pairs. The difference in power affects only a small portion of the possible combinations of distributions. My suggestion is to perform both tests on your Likert data. If the test results disagree, look at the article to determine whether a difference in power might be the cause.

In most cases, it doesn’t matter which of the two statistical analyses you use to analyze your Likert data. If you have two groups and you’re analyzing five-point Likert data, both the 2-sample t-test and Mann-Whitney test have nearly equivalent type I error rates and power. These results are consistent across group sizes of 10, 30, and 200.

Sometimes it’s just nice to know when you don’t have to stress over something!

## Reference

*de Winter, J.C.F. and D. Dodou (2010), Five-Point Likert Items: t test versus Mann-Whitney-Wilcoxon, *Practical Assessment, Research and Evaluation*, 15(11).

Dena Eaton-Colles says

Hello! It has been 18 years since I have done any stats. I am trying to work out what method is best to analyze the following: We have about 50 responses to a likert (1-7) scale for 10 questions.

An example question is: On a scale of 1-7 (1-least 7 -most), how important is flexible seating in your classroom?

Would I assign each person a “score” by adding their scores on the items together. For the question example above, the lowest score possible would be 0, the highest 7.

Compute the average and standard deviation of the bias scores for the group.

Conduct a one-sample t-test, testing the hypothesis that the item in the question is most (Ho: Mu = 2N) against the hypothesis that they are least important?

I’d love your thoughts.

Liza says

Hi Jim.

I want to correlate data that have Likert-scale items and mostly dichotomous. two of the questions in dichotomous have 4 choices and 3 choices, respectively. Looking at the previous posts here, I know that if they are both Likert-type items, then I can use Spearman’s. But I got confused with the questions which involve mostly dichotomous ones.

Appreciate your inputs.

Thanks.

Liza

Mutuku Peter says

Hello Jim,

I have a 10-item likert questionnaire for attitude assessment. I want to group my results into poor, fair, good, & excellent attitude. I have scored each respondents and some have 10/10, 7/10, 6/10, 4/10 etc. Unfortunately I do not know how to categorize these scores into poor, fair, good and excellent attitude.

Your advice is highly appreciated,

Jim Frost says

Hi Mutuku,

Converting a 10-point scale into a 4-point scale can be difficult because the old categories will have to be unevenly divided into the new categories. They can’t all be consistent ranges. If you were starting with an 8-point scale, you could’ve evenly divided those down to a 4-point scale. 1-2 would be poor, 3-4 fair, and so on.

Even better would’ve been to start with the 4-point scale! That would have eliminated any “translation” issue from the 10-point scale to your descriptive 4-point scale. Would your respondents agree with the descriptive terms you applied to those ranges?

I don’t know what to suggest. You’ll need to use some scheme that makes sense for your subject area. Unfortunately, I don’t know what that is. I’d suggest looking into similar research to see what other researchers have done.

Aleena says

Hey I have a question… I am using the career decision making self-efficacy scale where there are 10 points from 0 – 9, the higher the number higher is the self-efficacy. The total score is obtained by adding scores of each ten items from (0-9). There are 50 items in scale so max range is 0-450. The total score obtained by my participant is 227 now what should i call it high self efficacy or low?

Jim Frost says

Hi Aleena,

While that score is half the maximum possible, which would lead you to think that it’s low, that’s not necessarily the case. Perhaps while, theoretically, it is half the maximum value, it might be at the high-end of what someone can realistically score. You’d need to look at the percentile corresponding to that value to really know how that score compares to other scores. Click the link to learn more!
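
As a quick sketch of that idea, the snippet below computes the percentage of reference scores at or below a given score. The reference scores are invented for illustration; in practice you would use scores from a real comparison group or published norms for the scale.

```python
def percentile_rank(score, reference_scores):
    """Percentage of reference scores that fall at or below the given score."""
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100 * at_or_below / len(reference_scores)


# Hypothetical total scores from other respondents on the same 0-450 scale.
reference = [150, 175, 180, 190, 200, 205, 210, 215, 220, 240]

print(percentile_rank(227, reference))
```

In this made-up reference group, a score of 227 sits near the top even though it is only about half of the theoretical maximum.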

A A M ALMADHEM says

Hi Dear,

Could you please help me for this question

Where data are collected through Likert scales, the validity of (for example) “averaging” those scores needs to be justified.

Jim Frost says

Hi, please click the link to read my reply to this question in another comment.

Daniel Poor says

Hi Jim,

This article and your book(pp 337-341) left off just where my questions begin. I am dusting off my very dusty stats training to help a not-for-profit hiking group analyze membership data. I have a large (4K response – 32% response rate) survey response population, which is of course not a random sample. I want to compare a series of binary independent variables (gender, LGQBT, veteran, etc.) against Likert-like scale attitudinal question/responses (ordinal), and unevenly worded so sometimes they are uni-directional and sometimes bi-directional answer scales. Simply XLS histograms show variance in independent variable responses distributions to various questions. I want to test if these are likely non-random variance.

I read your book and several other books and on-line resources and I cannot find a clear dummies-friendly “Here is how you should test Categorical x Ordinal non-normal distribution but large data set size” guidance. I am beyond the edge of the envelope of my understanding when I read articles about Kendall’s tau vs. Mann-Whitney vs. Kruskal-Wallis as the proper method for this space.

Adding to this is I have the constraint that I have Excel (highly skilled) and am a rank beginner in RStudio and I cannot afford SPSS, as I am unaffiliated w/an ed institution.

Your guidance would be most appreciated in helping my group derive actionable insights from our big data set vs. MythBuster’s like (erroneous) “commonsense” conclusions you point out in your book.

DP

Ayesha says

Hi Jim i have a question for you if you answer kindly,

I want to know what statistical test is useful when i have a pre and post study survey. Furthermore, Questions type in questionnaire is likert scale with five points and some questions with yes, Neutral and No options.

Tanisha says

Hi Jim.

I have a 7-point satisfaction scale and need to group them to report satisfied neutral dissatisfied.

1- strongly dissatisfied

2 – dissatisfied

3 – somewhat dissatisfied

4- neutral

5- somewhat satisfied

6- satisfied

7-strongly satisfied

What is the best way of doing this? Do I group 1, 2 & 3 for dissatisfied; 4 neutral and then 5,6 & 7 for satisfied. Or should it be ok 1&2, then 3,4&5, and 6&7?

Thank you!

ai xuan says

Hi Sir,

I have conducted a survey using 5-point Likert scale. The survey includes 5 sections.

Section 1 is to understand the intention of leaving of the current employment of the respondents with 4 questions. Section 2-5 focuses on different factors that may impact on the employee turnover with 6 questions individually. The weighting of the scales will be 1 = strongly disagree and 5 = strongly agree.

My research objective is to know whether the intention of leaving has a linear relationship with the factors respectively. How should I screen the valid respondent? Can I take the average of the 4 questions from section 1 and the average score more than 4.5 as my valid respondent?

For the r/s of intention to leave with other factors, can I take the sum of each section and analyses through the pearson coefficient of correlation?

Appreciate your kind help for the lost student.

Regards,

Ai xuan.

Adam Lee says

Hi, I just want to ask your opinion. I have a 5 point Likert scale on Attitude

5 – strongly agree

4 – agree

3 – not sure

2 – disagree

1 – strongly disagree

Basically i want to score positive attitude with 1 point and negative attitude with 0 point :

strongly agree 1 point

agree 1 point

not sure 0 point

disagree 0 point

strongly disagree 0 point

Is it ok to do in such style? Personally i have not encounter anyone using this method. Thanks.

Jim Frost says

Hi Adam,

It appears like you want to be able to group the response so you can assess those who agree at least somewhat. What I’d do instead is use the regular Likert scoring method. After data collection is complete, you can rescore them to form different groups. Such as agree, not sure, disagree. In that manner, you can still compare them how you’d like but you’ll also be retaining the full spectrum of the Likert responses in case you need that for some reason.
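
Here is a minimal sketch of that rescoring step, assuming the responses were recorded with the regular 1 to 5 coding. The response values and the function name are made up for illustration.

```python
from collections import Counter


def collapse(code):
    """Map a 1-5 Likert code to a coarser three-way group."""
    if code >= 4:
        return "agree"
    if code == 3:
        return "not sure"
    return "disagree"


responses = [5, 4, 4, 3, 2, 5, 1, 3, 4]  # hypothetical raw responses

# Three-way grouping built after data collection.
groups = Counter(collapse(code) for code in responses)

# The 0/1 scoring from the question, also derived from the raw codes.
positive = [1 if code >= 4 else 0 for code in responses]

print(groups)
print(sum(positive), "of", len(responses), "responses are positive")
```

Because the raw 1 to 5 codes are kept, you can build either grouping later without losing the original detail.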

Terry Crandall says

Hi Jim. Thanks for this!

What if I wanted to match 2 individuals based on their Likert scores?

Imagine a 3 question dating app:

Q1) I like long walks on the beach. Strongly Disagree – Disagree – Neutral – Agree – Strongly Agree

Q2) I always know where I want to eat. Strongly Disagree – Disagree – Neutral – Agree – Strongly Agree

Q3) I will be 100% faithful. Strongly Disagree – Disagree – Neutral – Agree – Strongly Agree

Assuming both answer truthfully and that the 3 questions have equal weights, What is their % match for each question and overall? How would you calculate it?

Example Answers:

Lucy’s answers:

1) Strongly Agree

2) Strongly Disagree

3) Agree

Ricky’s answers:

1) Agree

2) Strongly Agree

3) Strongly Disagree

What is I wanted to change the weight of the questions?

Thanks!

Terry

Jim Frost says

Hi Terry,

You could use Spearman’s rank correlation.

Nurye Mohammed says

thank you

Claire says

Hi Jim,

Thank you so much for such an interesting article! Are you aware of a nonparametric test equivalent to the two-way mixed ANOVAs that would work to compare two groups (patients vs controls) over time (pre vs post) when the dependent variable has only got three possible values (either 0, 1 or 2) and is not normally distributed (but non-significant Box’s and Levene’s tests)? (sample size is 19 participants per group). Or would a two-way mixed ANOVA be suitable in this case?

Many thanks

Temesgen says

Hi M/R Jim my name is Toni

My thesis title is “factor affecting the performances of small businesses enterprises.”i worked on descriptive research. But

My adviser told me to change the title so how can I make amendment to change this title in to descriptive researchers . ( this title is explanatory research he said )

Which one is more closed to descriptive research title or if you have any more please

1. Assessments of factors affecting the performance of small enterprises

2. What factors affecting the performance of small enterprises.

Kirollos says

Dear Jim,

Thank you for saving me with my thesis! This problem is hunting me since I started.

I am using separate Likert Items for my quantitative research about construction supply chains, to gain professionals’ opinions about critical factors affecting the degree of resilience, and I am confused about which test to use for the inferential analysis.I believe that I need a non-parametric test, as the sample is not normally distributed. could you please advise?

Regards,

Kirollos

Nwankwo Tony says

i appreciate your write up. i actually need your opinion. am carrying out a study involving one independent variable and one dependent variable. however, each variable is operationalized using four different constructs each. i want to find out the correlation between the constructs. is spearman the best fit? again i have one moderating variable which i also want to test its effect on the dependent and independent variable. i intend to use partial correlation to test the influence. is partial correlation the best fit for it ?

all data where measured on five point likert scale

thanks

Jim Frost says

Hi Nwankwo,

If you’re summing multiple Likert items for each variable, you can often treat them as continuous variables. It sounds like that is what you’re doing. It gets more tricky if you’re just using individual Likert items for each variable.

If they were individual Likert items, I’d recommend Spearman’s correlation but it sounds like you’re either summing or averaging multiple items together. In that case, you might be able to use regular Pearson’s correlation. However, to really know, graph the two variables in a scatterplot and see if the relationship follows a straight line.

You could use partial correlation, but really I’d recommend putting them into a regression model and assessing their effects and significance that way. By including the IV and your moderator, the model estimates the effect of each one while controlling for the other.

Here are a couple of posts you might find helpful. First, read about Spearman’s correlation and when you want to use it. And, learn more about using moderators in a regression model. In that article, I refer to them as interaction effects, but that’s the same thing as a moderating variable.

Anouk says

Hi Jim,

I hope you are doing fine! We are planning a study to increase satisfaction through systematic/better pre-operative information. We will assess satisfaction in a before and after cohort. Satisfaction is measured on a 5 point scale. I have a question concerning type of test and calculation of a priori statistical power.

1) power calculation: how many patients to include in the two cohorts? I generally use Gpower. I tried:

a) MW-U with expected means (most frequent response, a bit out of the blue but for example purpose) of 3 and 2 in cohort 1 and 2 respectively (SD 0.5, power 80%, aplha 0.05, two tailed), this resulted in 6 patients to be included in each cohort

b) z-test, proportion, difference between two independent proportions (chi-2?). Assuming satisfaction first two answer options=satisfied, other three answer options=not satisfied). When entering proportion satisfied cohort 1 0.50 and cohort 2 0.75, and again power 80%, aplha 0.05, two tailed, I get 58 patients to be included in each cohort

=> which one makes most sense? Or do I need another type of analysis?

Hope you have time to answer my question.

Kind regards,

Anouk

Hans says

Hi Jim,

First thanks a lot for your statistical help through these blogs and comments section which are a great help for me in past 6- 7 months.

For below questions, I think chi square test should be used knowing both variables have ordinal scores. We need to perform independence test but confused how to group the data or form the contingency table …

—————————-

We are interested if loudness of a flight attendant was related to passenger enjoyment. and below collected ordinal scores:

Participant Loudness Enjoyment

1 1 4

2 1 3

3 1 4

4 2 5

5 2 4

6 2 6

7 3 6

8 3 5

9 3 6

10 4 6

11 4 7

12 4 6

—————————-

Thanks a lot and regards,

Jim Frost says

Hi Hans,

For these data, you should perform a Spearman’s rank order correlation because they are ordinal data rather than categorical.

I hope that helps!
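
As a rough sketch of the calculation, the snippet below computes Spearman’s rho for the twelve loudness and enjoyment scores listed in the question, using only Python’s standard library. The function names are my own for this example; a statistics package would also report a p-value.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1          # 1-based position of first tie
        count = ordered.count(v)
        rank_of[v] = first + (count - 1) / 2  # average position across ties
    return [rank_of[v] for v in values]


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Scores for participants 1-12, copied from the question.
loudness = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
enjoyment = [4, 3, 4, 5, 4, 6, 6, 5, 6, 6, 7, 6]

rho = spearman(loudness, enjoyment)
print(f"Spearman's rho = {rho:.3f}")
```

Ties get average ranks here, which matters for data like these where many participants share the same score.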

Leslie says

Hello Jim –

Thank you for your reply. Yes, the groups have different samples of people, but, there is overlap in respondents. The total sample represents responses from different companies. Our interest is knowing if there is a significant to Agreement on an issue (a 4 [agree] or 5 [strongly agree]) vs. all other responses.

If I use Excel’s feature to run a t-Test: Two-Sample assuming Unequal Variances do I use the range for Variable 1 as all responses in the pre and then Variable 2 is all responses in the post with the different sample sizes. This would represent a parametric test, correct?

Should I also run a chi-square test as well. If I use that method, can I bundle the 4 and 5 responses vs. the 1,2,3 responses in the Observed and Expected pre/post?

Thanks!

David Jenkins says

Hey Jeff,

Thanks for a great write up!

Before I get to my question, I am a bit out of practice as far as business analytic is concerned because my position hasn’t required it, so forgive me if I can accurately articulate what I am seeking.

I am working on building a 5 point, likert scale survey for work. I will be polling the same population quarterly, however it is optional so I anticipate the sample group (I would expect >20) varying in size and sentiment for each iteration. My question is how do I remove any bias that may be presented in the data because of these variances. Or is simply averaging the data sufficient in this case.

Thank you for the help.

David Jenkins

JImmy Mims says

Hi Jim – I am analyzing a dataset. The DV is likert scale, there are 3 IVs that are all likert scale data and control variables that are categorical. I am stuck between analyzing the data using multiple linear regression or ordinal regression. It seems the argument is made both ways. Any thoughts would be great! Thank you for the article!

Jim Frost says

Hi Jimmy,

Because your DV uses the Likert scale, you should use ordinal logistic regression. The exception would be if the DV is a sum or average of multiple Likert items, in which case it might be OK to treat it as a continuous variable and use multiple linear regression–but be extra sure to check those residual plots! However, if the DV is just a single Likert item, you should use ordinal logistic regression.

GJ Hagenaars says

If you want to have 5 points, two of which express a negative response, two a positive response and one a “meh” or “not applicable” (and you want to treat both of those answers essentially the same), you can code your scale as -2, -1, 0, 1, 2, and you can still use the same descriptions for the points (strongly disagree, disagree, neutral, agree, strongly agree).

It doesn’t change the issue of such a small set of answers not being continuous, but it DOES drop the “I have no strong opinion about this” answers in a very natural way (because they’re coded as zero in between the stronger opinions).

Jim Frost says

The scaling doesn’t really matter. Your five point scale can be -2 to +2 as you suggest or from 1 to 5, where 3 is the neutral value. You just need to know the scale to interpret the results. If the mode is 0 for the -2 to +2 scale, that equates to a mode of 3 on the 1 to 5 scale. Although, I do like the -2 to +2 as the results are more intuitive. But, it’s no problem if you use a different scale.
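
A tiny sketch shows the equivalence, using a handful of made-up responses. Recoding from 1 to 5 onto -2 to +2 just subtracts 3 from every value, so the mode and mean shift by the same amount.

```python
from statistics import mean, mode

responses_1_to_5 = [1, 2, 3, 3, 3, 4, 5, 5]             # hypothetical responses
responses_centered = [r - 3 for r in responses_1_to_5]  # same answers on -2..+2

print(mode(responses_1_to_5), mode(responses_centered))
print(mean(responses_1_to_5), mean(responses_centered))
```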

Leslie Damesek Litsky says

Hi Jim – Thank you for this information. Glad to see so many others struggling with their Likert Scales. I have a pre/post survey of an educational training. I coded as Strongly Agree (5), Agree (4), Neither Agree nor disagree (3), Disagree (2), Strongly Disagree (1).

There are 2400+ pre-surveys and 248 post-surveys. A single question shows a 40% change in the Agree+Strongly Agree VS. the Disagree+Strongly Disagree from pre to post so I am hopeful to see a significant change.

Using Excel, the t-Test: two sample assuming equal variances P two-tail is 4.13739E-39. That’s a crazy number. Should I be using a non-parametric test, are the distribution not normal?

Thanks,

Leslie

Jim Frost says

Hi Leslie,

You’re using pre and post surveys, but I’m gathering from the differences in numbers that it’s not the same group of people taking the survey? I’m asking because if they’re the same people, then you’d use a paired t-test. But, you’d also have to have the same numbers before and after! Usually when I see a pretest/posttest analysis, it uses the same subjects in both groups. But, it just affects the analysis you use. If they’re different people, using the 2-sample t-test is a good choice.

Yes, Likert scale data are confusing! It seems to be an intuitive and easy method for people taking the survey. But that type of data presents various statistical challenges!

With such a large sample size, you don’t really have to worry about the data following the normal distribution. The very low p-value is likely the result of the effect size and the fairly large sample size (even in the smaller group). I can’t say for sure there’s no error in the calculations obviously, but I do think it’s unlikely.

One thing you should do is check that your variances are actually “equal.” They don’t have to be exactly equal but if one variance is twice the other (or more), you should start to worry. If you want to play it safe, don’t assume equal variances. Your results should still be significant but somewhat less so.
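
Here is a minimal sketch of that rule-of-thumb check. The responses are made up for illustration, and the threshold of 2 simply encodes the "twice the other" guideline above rather than a formal test.

```python
from statistics import variance


def variance_ratio(x, y):
    """Ratio of the larger sample variance to the smaller one."""
    vx, vy = variance(x), variance(y)
    return max(vx, vy) / min(vx, vy)  # assumes neither variance is zero


def equal_variances_plausible(x, y, threshold=2.0):
    """Rule of thumb: worry once one variance is about twice the other."""
    return variance_ratio(x, y) < threshold


pre = [1, 2, 2, 3, 3, 3, 4, 4, 5]    # hypothetical pre-survey responses
post = [3, 4, 4, 4, 5, 5, 5, 5, 5]   # hypothetical post-survey responses

print(f"variance ratio = {variance_ratio(pre, post):.2f}")
print("OK to assume equal variances:", equal_variances_plausible(pre, post))
```

When the check fails, as it does for these made-up data, the unequal variances version of the t-test is the safer option.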

As I mention in this post, there is a debate about the best way to analyze Likert scale data. There are some strong feelings on both sides. So, if you have an advisor/reviewer to worry about, take them into consideration when choosing. But, the article I reference suggests that t-tests and nonparametric alternatives are nearly equally good. If you were to try this analysis using a nonparametric test, I’d imagine it would also be significant. For more information, read my post about nonparametric vs. parametric tests.

I hope that helps!

Emily Benett says

Hi Jim,

Thanks for a great post. I am stuck on how to analyse my Likert data and would greatly appreciate any help.

I have likert scale questions measure whether CSR can influence customer loyalty. I have three behavioural questions and six attitudinal questions.

One example of a question is “The CSR activities that a business carries out would influence my choice to switch”. All questions are on a 5-point likert scale.

Do you have any tips on how I could analyse this likert scale data with inferential statistics please?

Any help would be much appreciated.

(from a very confused and stressed researcher!)

Best wishes,

Emily

Jim Frost says

Hi Emily,

If you have multiple variables and you want to see how they collectively influence the choice to switch, you could use multiple regression analysis. If it’s just one variable and its relation to the choice to switch, you could use simple regression or correlation.

I’m not sure what form your dependent variable (presumably choice to switch) takes. If it’s binary, then you could use binary logistic regression. You could also perform a 2-sample t-test to determine whether the means of the other variables are significantly different between the group that switches versus the group that doesn’t.

I hope that gives you some ideas!

Joe Daniels says

Hi Jim!

Thanks for this excellent post. I came upon it while in the midst of a real roadblock with my research. Might you be able to guide me?

I’m using 7-point Likert scales in a linguistics study to gauge whether or not participants notice a particular cue inserted in a question and then having them rate a response using the scales. There are only two cues (and thus two conditions) but all other elements in the question are identical. In condition 1, the cue primes a certain interpretation response which should, theoretically, yield a lower rating in comparison to the second condition. In condition 2, the cue does not contain this priming effect so it should (again theoretically) be rated higher. In short, my question to you is how can I separate people based on the differentiation in response behavior between “differentiators” and “non-differentiators”. The differentiators would be the ones who, overall, had higher ratings in condition 2 vs. condition 1. However, I really have no idea how I can credibly argue a line of demarcation. For example, if the mean in condition 1 is 3.5 and the mean in condition 2 is 3.75 (a +0.25 difference), is this enough to categorize them as a “differentiator”? That’s one method. Another method I thought of is that there must be at least a 1 point difference in rating between the mean of condition 1 and the mean of condition 2 in order for the participant to be a “differentiator”. Finally, I thought of creating a “differentiator” group if the mean rating in condition 2 was at least 1 standard deviation above the mean of condition 1.

This is a really troublesome problem to me and I was wondering if you point me in the direction of some literature I can read or use as a source in my research? Or is the demarcation up to the researcher and, thus, essentially arbitrary?

Thank you very much and keep producing excellent content please!

Joe Daniels

Anthony Babbitt says

I had a quick question that I would like some input on and perhaps you are the best source. It may even be a good topic for an article on your site!

In survey research, there is something of a debate on how to code Likert surveys. For instance, a 5-option response (Strongly Disagree, Disagree, No Opinion, Agree, Strongly Agree), can be coded in a few different ways.

1 – Strongly Disagree

2 – Disagree

3 – No Opinion

4 – Agree

5 – Strongly Agree

The problem with this coding is that “No Opinion (3)” negates the disagree options when calculating means. A single “3-No Opinion” can offset a response of 1 or 2 on other questions. The result is that the mean is higher than it would be if the No Opinions were removed. This skews the results toward the agree side.

Alternately, you can code:

1 – Strongly Disagree

2 – Disagree

0 – No Opinion (or removed from the data set)

3 – Agree

4 – Strongly Agree

This coding method makes the disagree/agree median 2.5 on the bell curve. This is my favored approach, but have I overlooked something? You mentioned in other comments that more points is preferable, but we don’t always get to design the surveys we use. Some are validated on a 4 or 5 point Likert scale.

An alternative would be to code:

1 – Strongly Disagree

2 – Disagree

Removed – No Opinion

4 – Agree

5 – Strongly Agree

This coding makes the median 3, but I believe it would make a normal distribution curve into a two-humped camel curve, no? It also raises the question of why not code “9 – Agree” & “10 – Strongly Agree”, on a 10-point scale. Certainly, that would result in major differences between responses and perhaps be easier to find a statistically significant result. But this would be quite a long body on a 2-humped camel! Am I correct in my thinking? What have I overlooked?

Anyway, Likert developed his scale to make qualitative data into quantitative data. I see in your other responses that you seem to think it is non-parametric by nature. Likert believed that you HAD to offer No Opinion, simply because forcing a response increased the amount of noise in a sample. By offering “No Opinion” as an option, you could remove those responses because they offered no information. After all, on an opinion survey, what good does including data from people without opinions? So researchers should include the No Opinion option, but how should they code it best to keep those “No Opinions” from skewing the results?

As an alternative, should you simply give the “No Opinions” a score of the mean of the scores on the other answered questions? So instead of coding “No Opinion” as 2.5 or 3, you would use the mean of the 16 answered questions on a 1-4 scale? If the mean was 3.2, then no opinions would get this value for that one survey. If the mean was 1.3, then no opinions would all have this value, at least on the particular survey. However, if you have 100 surveys, then “no opinion” could have 100 different mean values (assuming all the surveys had at least 1 “No Opinion” response).

I’m thinking, as an example, of a 20-question Likert survey where some of the responses include opinions and some state “No Opinion”. If 10 of the 20 responses were No Opinion, and the researcher removes them, this unfairly weights the remaining responses. So there is certainly a threshold where too many “No Opinions” mean the entire survey is worthless. What is that threshold? 20%? 10%?

If you set your threshold at 20%, what value do you assign to those 4 (4 out of 20) questions with “No Opinion” so that the rest of the data is not skewed? The scales above become much more important, no? If you use the 1-4 scale, no opinions must be set to 2.5. If you use the 1-5 scale with no opinion included, then the “3” coding would negate much of the negative opinions, correct?

It’s really an interesting rabbit hole, and I can’t think of anyone better to answer it than you! Please let me know your thoughts. If you have already printed something that addresses this, please let me know. I have read your books, but don’t recall seeing this question raised before. Thank you in advance for your help, suggestions, and ideas!

Jim Frost says

Hi Anthony,

Yes, there’s quite a debate surrounding Likert scale data! In addition to all the issues you mention, there’s the issue I talk about in this post: the large debate over how to analyze Likert data! There are debates over whether the mean makes any sense in the context of ordinal data in general and Likert data in particular. The options must be an equal distance apart for the mean to have any real meaning.

I’ve only done a little bit of survey research and analysis. I don’t have anything earth-shattering to add but I’ll give you my two cents! First, because of all the debates surrounding coding and analyzing Likert data, I’d avoid using it whenever possible. It feels like a natural way to ask questions, but it really creates a lot of problems. There are some alternatives, which I read about long ago. Of course, I’m not remembering them now but Google would be your friend. Essentially, you’re finding ways to record similar types of responses with continuous values instead of ordinal. You can avoid lots of headaches that way.

On to your comments and questions!

I don’t see a problem with using either 0 or 3 as No Opinion. All that does is shift your distribution left or right. You just need to know how to interpret it. Suppose you had a mean of 4. If you use 3 as no opinion, then the mean corresponds to Agree. However, if you use zero for no opinion, four corresponds with Strongly Agree. You just need to know the scaling and interpret the results accordingly. I don’t see a problem either way. By the way, you mention the bell-curve, and I think Likert data very frequently don’t fit that curve!

As for whether to include No Opinion, that’s another tricky one. I agree with the school of thought that you should include that option. As you say, forcing respondents to have an opinion when they don’t introduces noise into the data. Plus, it might be nice to have it recorded which questions had more people with no opinions.

However, there is another related issue between no opinion and neutral. My ideal arrangement would be:

Strongly Disagree

Disagree

Neutral (Neither Agree nor Disagree)

Agree

Strongly Agree

No Opinion (Don’t know enough about the topic)

That makes sense to me because I believe that there is a difference between having a neutral opinion versus no opinion. Neutral is where you neither agree nor disagree but you are informed on the topic. No opinion would be for those who didn’t have enough information to have an opinion. But, I think that’s likely to cause confusion among readers! The text in parentheses might help explain that and could be better labels. Maybe. In that arrangement, I’d remove the No Opinion responses. And, it provides a useful indicator of how many people are relatively uninformed on the topic.

However, in the end, I could be talked into using Neutral/Neither Agree nor Disagree for both of those cases and having that as a midpoint. One way or another, they just don’t have strong feelings about the topic. You won’t know if that’s because they don’t know enough to form an opinion or know enough and decided they don’t care! In this scenario, the center of the values seems like a good place to me. In that case, I’d include respondents who choose the neutral, neither agree nor disagree option in the data. I think that’s what I would do if push came to shove! While I personally like having both a neutral and a don’t know option, it probably introduces too many questions!

I’ve probably made this as clear as mud! Please take this just as my two cents! I have not written anything on this topic other than this post about how to analyze it.

Sofia says

Hi Jim, I have a similar question as Helen. So is it ok to do 2 way mixed ANOVA on the likert scale dependent variable?

Thank you!

Samer says

Dear Colleagues

A paper by “Geoff Norman” said that parametric statistics are robust with respect to violations of the following assumptions:

(a) the sample size is too small, (b) the data may not be normally distributed, or (c) The data are from Likert scales, which are ordinal, so parametric statistics cannot be used

So, I think we can use parametric tests

DOI 10.1007/s10459-010-9222-y

Jim Frost says

Hi Samer,

Thanks for sharing. The article I refer to in this post also indicates that it’s ok to use parametric tests. I’ll need to read Norman’s article to compare findings.

Melody Schumann says

The Likert scale is widely used in social work research, and is commonly constructed with four to seven points. It is usually treated as an interval scale, but strictly speaking it is an ordinal scale, where arithmetic operations cannot be conducted. There are pros and cons in using the Likert scale as an interval scale, but the controversy can be handled by increasing the number of points. Several researchers have suggested bringing the number up to eleven, on the basis of empirical data. In this article the authors explore this rational and share the same view, but simulate artificial data from both symmetrical normal and skewed distributions where the underlying metric is known in advance. Results show that more Likert scale points will result in a closer approach to the underlying distribution, and hence normality and interval scales. To increase generalizability social work practitioners are encouraged to use 11-point Likert scales from 0 to 10, a natural and easily comprehensible range.

Source: https://www.tandfonline.com/doi/abs/10.1080/01488376.2017.1329775#:~:text=The%20Likert%20scale%20is%20widely,arithmetic%20operations%20cannot%20be%20conducted.

Dr. Schumann

Jim Frost says

While I think the article I cite is conclusive in showing that you can use t-tests to analyze Likert data, I have to admit that I do not like Likert data. It seems to create problems unnecessarily from an analysis stand point. I love the suggestion of using an 11-point Likert scale. It’s been a longstanding recommendation that when you have discrete data with at least 10 values, you can generally treat it as continuous data. That fits in with the study you cite. Thanks for sharing! I’ll read it with great interest!

Dr. Fendi says

Dear Jim

With regard to your reply to Brenda on October 13, 2020 at 12:57 pm, you said “I’d imagine that if your sample size is large enough, ANOVA should be fine because it’s a generalization of the t-test.”

But we know in statistics that in order to apply ANOVA you should fulfill 4 assumptions, one of which is that the scale of the data must be interval. Likert scale scores are not interval (they are ordinal), so could you explain that?

Jim Frost says

Hi,

As you hopefully read about in this post, there has been much debate over how to analyze Likert scale data. Some thought nonparametric was the way to go while others thought parametric tests (such as t-tests and ANOVA) would be ok. All hypothesis tests have assumptions. However, some assumptions are more stringent while others can be waived. For example, the normality assumption is one that can be waived for many parametric tests when you exceed a relatively small sample size. Simulation studies are useful for determining the degree to which assumption violations affect hypothesis test decisions.

As you point out, one of the assumptions for t-tests and ANOVA is that the data are continuous. Clearly, Likert data violate that assumption. However, the simulation study I cite in this post finds that this violation does not affect the results much at all. The conclusion of the study is that t-tests and Mann-Whitney tests are nearly equivalent for comparing groups of Likert scores. In other words, this study finds that t-test results are valid even though the data use an ordinal scale. I recommend you read the original article I cite if you still have questions.

Unfortunately, I don’t have a study that shows that the same results apply to ANOVA. However, because ANOVA is an extension of t-tests, I’d assume that the same findings apply. Indeed, an ANOVA that compares two groups produces results identical to those of a 2-sample t-test (the standard version that assumes equal variances). I would feel more confident about that if I had a study which assessed ANOVA and Likert data directly. However, I strongly suspect it’s true.

I hope that clarifies it for you!
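To illustrate the point about a two-group ANOVA matching the 2-sample t-test, here is a minimal sketch using only the Python standard library. The two groups of five-point scores are made-up illustration data, and the function names `pooled_t` and `one_way_f` are hypothetical helpers, not taken from any package. The equivalence shown (F equals t squared) applies to the equal-variance t-test, not to Welch's version.

```python
from statistics import mean


def pooled_t(a, b):
    """Student's 2-sample t statistic using a pooled (equal) variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def one_way_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical five-point Likert responses for two groups.
group_a = [1, 2, 2, 3, 3, 4, 4, 5]
group_b = [2, 3, 3, 4, 4, 5, 5, 5]

t_stat = pooled_t(group_a, group_b)
f_stat = one_way_f(group_a, group_b)

# With exactly two groups, F is the square of t (up to rounding error).
print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

Because the test statistics are tied together this way, the two procedures also give the same p-value for two groups.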

Melody Herb says

Greetings Jim

I am a doctoral candidate researching 2 groups, homeschooled students and public school students in the US. The dependent variable is a “motivation” score obtained from participants when they respond to a 28-question survey, The Academic Motivation Scale (AMS) by Vallerand. This uses a 7-point Likert scale. I am confused as to what SPSS tests I should run. The research question is:

What, if any, statistically significant difference exists in self-determination index scores between public school and homeschooled students as measured by The Academic Motivation Scale that includes intrinsic motivation, extrinsic motivation, and amotivation subscales.

I intend to only calculate the mean score of each participant and then calculate the mean score of each group (homeschool and public school).

I would appreciate any suggestions that you can share with me. Thank you

Sincerely

Melody Herb

AC Lopez says

Hi Jim!

I just want to ask, we are looking for the effectiveness of Social media as an aid to News and information for the students, we have 10 questions 8 of those are answered through (Strongly Agree, Agree, Neutral,Disagree,Strongly Disagree) 1 is (Facebook, Twitter, Instagram, YouTube) and last is their Age (18-21, 22-25)

What kind of formula can we use to measure their answers through likert scale? or in SPSS?

Thank you in advance!

Priya Mohan says

Hi Jim,

I am assessing awareness of cancer among patients. I used a 5-point Likert-like scale, and the responses were strongly agree, agree, unsure, disagree, and strongly disagree. I would like to assess awareness across various socio-demographic factors such as age, gender, socio-economic status, education, etc. Do I first have to divide awareness into a binary variable (high and low awareness) and then do contingency tables, or what do you suggest? Do I have to use logistic regression as well?

Thank you,

Priya

Jim Frost says

Hi Priya,

The nature of your dependent variable isn’t clear to me, so I can’t really answer. Is your DV the five point Likert scale item, binary awareness, or something else?

Ogboy Smart says

Dear Sir, I am working on the topic of the effect of COVID-19 on secondary schools using a four-point Likert scale. Which statistical test is better? Should I pair the means and compare them? Please reply. Thanks, Smart, Nigeria

Laura says

Hello Jim,

Thank you for the great article!

I am doing my first research project and I am having trouble with the analysis.

I used a 5-point scale (with only two end labels 1:not descriptive and 5:descriptive – numbers 2,3, and 4 had no labels associated with them) on a within-subjects study looking at 11 watches and asking the participants (n=20) to rate how well a set of 20 personalities (adjectives) described each watch. The 20 personalities came from a product personality scale that was published in a journal.

My goal is to find at least 4 watches that have distinct personalities (I will use these in another study).

In SPSS, I ran 19 paired t-tests between the personality with the highest average rating (ex. cheerful) and the remaining 19 personalities (ex. interesting, relaxed, serious, cute, lively, modest, etc). I determined that a watch’s personality was the combination of personalities that were not* statistically significantly different (i.e. p>0.05) to the personality with the highest average rating. So for example, a watch ended up being cheerful, relaxed, cute, and lively.

Is this procedure acceptable for determining the personalities that describe each watch?

I think this process is okay, but I am not sure how to explain that it is okay…

If this process is not acceptable, what process do you recommend?

I sincerely appreciate your advice!!

Kind regards,

Laura

P.S. I am sorry for the long message. In general I am also struggling with the lingo in statistics so I figured I be thorough with the information I provided to you.

Ahmed Abdel-Ghany says

Hi

I was involved in problems like that. What I’ve done myself is to take the average of those questions related to a given subject, as long as all of them have the same Likert scale. The resultant average, though quantitative, can then be classified on the same Likert scale, and therefore you get the trend of the whole story. Since in most of these questionnaires the number of respondents is somewhat high, far beyond 30, I analyse them all as quantitative variables.

I hope that would help.

Ayesha Javed says

Hi Jim!

I am a research student and I would like a little help with statistical test suggestions. My research topic is a comparative analysis of visualizations, so I performed tasks on it and filled in a questionnaire (five-point Likert scale). I divided the data from 100 participants into two groups. Now I am confused about which statistical test I should apply. Kindly guide me on it.

Thank You In Advance.

Jim Frost says

Hi,

You should be able to legitimately perform either a 2-sample t-test or a Mann-Whitney test. The study I reference in this post indicates that either test is valid.
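For readers wondering what the Mann-Whitney test actually measures on Likert data, here is a minimal sketch in plain Python. It computes only the U statistic (not the p-value), and the two groups are made-up illustration data. The helper name `mann_whitney_u` is hypothetical. The pairwise-counting definition used here handles the heavy ties that five-point data always produce by counting each tie as half a win.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a: the number of (a, b) pairs in which the
    value from a is larger, with each tied pair counted as one half."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )


# Hypothetical five-point Likert responses for two groups.
group_a = [1, 2, 2, 3, 3, 4, 4, 5]
group_b = [2, 3, 3, 4, 4, 5, 5, 5]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)
n_pairs = len(group_a) * len(group_b)

# The two U values always add up to the number of pairs.
print(f"U_a = {u_a}, U_b = {u_b}, pairs = {n_pairs}")

# U divided by the number of pairs estimates the probability that a random
# response from group_a exceeds a random response from group_b (ties split).
print(f"P(a > b) is roughly {u_a / n_pairs:.3f}")
```

Statistical software turns U into a p-value. The sketch is only meant to show that the test compares how responses are ordered across the groups instead of comparing their means.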

Choi Aera says

Eh sorry for the mistake, so the Q1+Q2= 3+4=7, so the total practice score is 7?

Jim Frost says

Yes, that sounds correct. Summing Likert items makes them more like continuous data.

Choi Aera says

Hi Sir, I’m Choi Aera from Malaysia. I’m so confused about how to calculate the total practice score for each respondent. For example,

(strongly disagree=1, disagree=2, neutral=3, agree=4, strongly agree = 5)

Q1= i feel responsible to dump the rubbish in the bin.

Q2= i think preventive measures of environmental pollution should be tighten by laws.

Respondent 1 (he/she answer) : Q1= 3 Q2= 4, total practice score=Q1+Q2= 5.

Is this a correct way to calculate the total practice score?

….

I saw some articles calculated total practice by giving 0 score for (strongly disagree/disagree), 1 score for neutral and 3 score for ( agree/strongly agree).

And how about the scoring when you decide your own fixed answer for a question? For example, my fixed answer for Q1 is strongly agree=5. If a respondent answers something else, should I give them a score according to the Likert scale score? Or, if the respondent answered strongly disagree, should I just give him/her 0 for the incorrect answer, or a score of 1 according to the Likert scale?

So basically I am so confused about which one is the correct way to calculate the total practice/attitude/knowledge score when using a Likert scale with more than 3 points.

Samgab says

Hi Jim,

I have benefitted immensely from the knowledge shared by you, may God continue to bless you with deeper insight statistically. I need clarity on recoding 5 point Likert scale to dichotomous variable. I did my recoding thus, Strongly agree (5) and agree (4) = 1; while strongly disagree (1), disagree (2) and neutral (3) = 0. Is there anything wrong with this recoding? Can you assist with any relevant literature specifically on recoding? —- Samgab

DR. OBIAMAKA says

I love your website, it is very informative. Good job!

Jim Frost says

Thank you very much!

Brenda says

Dear Jim,

Your site is incredibly helpful, congratulations. I am working on the data analysis for my dissertation. I have surveyed school principals, and my data consist of Likert scale items (5-point scale) for four different categories which I am interested in: the first one consists of 10 items, the second of 6 items, the third of 6 items, and the fourth of 6 items. I am interested in running ANOVAs in order to determine if there is a difference in responses between groups (e.g. elementary schools, middle schools, high schools). First, would it be better to find the mean of the responses, or the sum, in order to run the test? Second, what is the best way to test the assumptions of normality and homogeneity for Likert scale data? What assumptions are critical to test before running the statistical test?

Thank you, in advance, for your help.

Brenda

Jim Frost says

Hi Brenda,

This is a bit of a tricky issue. As I write in this post, there has been a longstanding debate about the best methods to test Likert scale data. When you have two groups, the article I cite indicates that either the parametric (2-sample t-test) or the nonparametric (Mann-Whitney test) are acceptable. However, with three or more independent groups, you’d need to use ANOVA (parametric) or a nonparametric equivalent, such as the Kruskal-Wallis test. I don’t have a good citation to know the answer to that. I’d imagine that if your sample size is large enough, ANOVA should be fine because it’s a generalization of the t-test. But that’s a hunch. So, I’m afraid I can’t give you a concrete answer.

I can say that you’d be on more solid ground if you either summed multiple Likert items or averaged multiple items because they become more like continuous data. Either summing or averaging would be equally as good from a statistical perspective.

Additionally, please note in my article about parametric vs. nonparametric tests, that normality isn’t an issue for ANOVA when you have a sufficiently large sample size (see table in that article). However, you do have to worry about homogeneity of variances. If your variances aren’t roughly equal, I’d strongly recommend using Welch’s ANOVA. Click the link to read about that analysis.

I hope this helps!

Erik Wood says

Thank you for this forum. Is one obligated to process data from a Likert-type survey using either parametric or nonparametric methods? If, for example, the simplicity of a short, 5-question national survey to fire chiefs is best structured using a 5 level Likert style method, can the raw statistic compilation be used in academic journals? This might mean just showing that (again for example) “53% of US Fire Chiefs agreed strongly agreed that their community was not prepared for a natural disaster.” For the purpose of a study I am considering, I like the Likert structure directed at one large group via emailed survey but my data presentation does not need to be overly analyzed to convey results. Thank you in advance. – Erik

Jim Frost says

Hi Erik,

This might be a case of just using descriptive statistics. If you just want to say that 53% of US Fire Chiefs who took the survey agreed strongly, then you might not need to use an inferential procedure. For more details about the difference, read my post about the difference between descriptive and inferential statistics.

If you are just describing the sample and not trying to infer beyond the sample, use descriptive statistics. However, if you need to generalize the results from the respondents to the population, you’d need to use inferential procedures, which can get a bit tricky with Likert scale items. Also, you’d need to consider whether your sample even approximates a random sample. If it doesn’t approximate a random sample, then using the sample to generalize to the population might be invalid. There’s no reason to expect that a non-random sample will adequately represent the population.

If you can just go the descriptive route, that might be the way to go.

I hope this helps!
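As a rough illustration of the descriptive route, the sketch below tabulates the percentage of respondents choosing each option on one item, plus the combined "agree or strongly agree" share. The response counts are made up for illustration and do not come from any real survey.

```python
from collections import Counter

LABELS = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

# Hypothetical responses to a single five-point item (20 respondents).
responses = (
    ["Strongly Disagree"] * 2
    + ["Disagree"] * 3
    + ["Neutral"] * 5
    + ["Agree"] * 6
    + ["Strongly Agree"] * 4
)

counts = Counter(responses)
n = len(responses)

# Percentage of respondents choosing each option, in scale order.
percentages = {label: 100 * counts[label] / n for label in LABELS}
for label in LABELS:
    print(f"{label:<18} {counts[label]:>3}  {percentages[label]:5.1f}%")

# "Top two box" share: respondents who agreed or strongly agreed.
top_two = percentages["Agree"] + percentages["Strongly Agree"]
print(f"Agree or Strongly Agree: {top_two:.1f}% of respondents")
```

These figures describe only the people who responded. Saying anything about the wider population would need an inferential procedure and a sample that approximates a random one.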

mamerto says

Good day, can I use normal distribution after I use two t-test in my research?

Tatv says

Hi Jim,

Thank you so much .Your advice has solved the most crucial problem in my work.

Regards

Tatv

Jim Frost says

You’re very welcome, Tatv!

Yuchen says

Hi Jim,

Really appreciate your advice! my samples sizes can be considered quite large, I will go for repeated measures ANOVA design.

It is so great to have someone like you sharing your knowledge of stats with people so selflessly!

Have a nice day!

Yuchen

Jim Frost says

You’re very welcome, Yuchen! Best of luck with your analysis!

Tatv says

Hi Jim,

Thank you so much for the reply which has helped a lot.

Kindly allow me to confirm the other part of my query. Can I run Kruskal-Wallis on the summated score of Likert items?

Thanks and regards

Tatv

Jim Frost says

Hi, yes, you can use Kruskal-Wallis on summed Likert items assuming the groups have the same general shape and spread (a common assumption for nonparametric tests that compare groups).
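For anyone curious how Kruskal-Wallis treats summed Likert scores, here is a minimal plain-Python sketch of the H statistic with the usual tie correction. The helper names `average_ranks` and `kruskal_wallis_h` are hypothetical, and the three groups of summed scores are made-up illustration data. The sketch stops at the statistic and leaves the p-value to statistical software.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with tie correction.
    Assumes the pooled data are not all the same value."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction


# Hypothetical summed scores (10 items, 5 points each) for three groups.
group_1 = [22, 25, 28, 30, 31, 33]
group_2 = [27, 30, 32, 35, 36, 38]
group_3 = [31, 34, 37, 39, 41, 44]

h_stat = kruskal_wallis_h(group_1, group_2, group_3)
print(f"H = {h_stat:.3f} with {3 - 1} degrees of freedom")
```

H is compared against a chi-squared distribution with (number of groups minus 1) degrees of freedom, which is an approximation that works better with larger groups.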

Yuchen says

Hi Jim, it is great to see this sharing article about stats.

May I ask what kind of test should I use in such research study outcome analysis?

I have 2 groups including control and experimental. Participants were given questionnaires which provided me continuous data. data collection happened at 3 time points, before intervention, after intervention and 12 weeks after intervention.

Now, I would like to compare:

1. the changes within each group (either control or experimental) at 3 time points

2. compare control and experimental at baseline, post intervention and 12 week after

What kind of test should I use? (already tested some groups of data were not normal, I guess non-parametric test should be used?)

Thank you for you time, it would be really helpful…

Jim Frost says

Hi Yuchen,

This sounds like a repeated measures ANOVA design to me. The nonnormal distribution can be problematic if you have small sample sizes. However, read this post about nonparametric vs. parametric tests and look for the table with recommended sample sizes. If you meet the recommendation, then nonnormality should not be a problem.

However, if your samples are too small, or the distributions are extremely nonnormal, you can use the nonparametric Friedman test. I believe you use that procedure with a repeated measures design, although I haven’t used it that way myself.

I hope that helps point you in the right direction!

Tatv says

Hi Jim,

I have to analyze Likert scale data with 9 to 10 items/statements answered on a 5-point Likert scale, i.e. from strongly disagree (1) to strongly agree (5). There are three groups independent of each other. I wish to compare the groups and find out if there is any difference among them. Can I sum the individual responses to the scale items to find one score per individual, and also the mean of the responses? For example, if an individual has responded to the scale items with 1, 3, 2, 5, 4, 1, 5, 3, 3, 5, can I find the total of these responses as 32 and do the same for all respondents? Thereafter, can I run a one-way ANOVA in SPSS, or should I do Kruskal-Wallis?

What inference can I draw?

Kindly urgently advise on analyzing the data..

Jim Frost says

Hi,

Yes, analysts will often sum or average multiple Likert scale items. This creates data that are similar to continuous data. However, be aware that ordinal data, such as Likert scale items, have some inherent limitations. For example, you can’t be sure that the difference between each value is constant. Suppose you code responses as 1, 2, 3, etc. The differences 3 − 2 and 2 − 1 are both 1 unit. However, those differences of one might not represent the same change. For example, in a race, the time difference between the first and second place finishers might not be the same as the time difference between the second and third place finishers. If that’s the case, calculating group means becomes less reliable. Just something to be aware of.

When you sum or average multiple Likert items, the data become more like continuous data and it should be fine to use ANOVA to assess the differences between group means–at least from the standpoint of data assumptions. The previous caveat still applies.

I hope that helps!
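Here is a small sketch of the scoring step, using the ten responses from the question above for the first respondent. The other two respondents are made up for illustration. It also shows why summing and averaging are interchangeable when every respondent answers the same number of items: the average is just the sum divided by a constant.

```python
# Each row holds one respondent's answers to ten five-point items.
# The first row is the example from the question; the others are hypothetical.
respondents = [
    [1, 3, 2, 5, 4, 1, 5, 3, 3, 5],
    [4, 4, 5, 3, 4, 5, 4, 4, 3, 5],
    [2, 1, 2, 3, 2, 2, 1, 3, 2, 2],
]

n_items = len(respondents[0])

# One summed score and one average score per respondent.
sum_scores = [sum(row) for row in respondents]
mean_scores = [total / n_items for total in sum_scores]

for total, avg in zip(sum_scores, mean_scores):
    print(f"sum = {total:>2}, mean = {avg:.1f}")
```

Note that this only works as written when no items are missing. With skipped items, sums and averages stop being equivalent, and you would need to decide how to handle the gaps first.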

matthew chung says

Hi Jim,

Many thanks for your informative posts.

I have no experience in using statistics and was hoping whether you could provide some advice as I have been getting rather confused after doing some research online.

I have conducted a likert survey of 10 questions looking at patients anxiety to attending hospital in the period of COVID lockdown. I have a sample size of over 300 with the 5 point scale ranging from relaxed to very worried.

I want to analyse answers to each questions separately without comparing groups – is it best to present the data as descriptive via charts/tables or is there a place for statistics here – if so what test would you recommend?

I may also want to then compare anxiety before and after attending the hospital and may also compare responses to this between genders. Could you please advise what test you would use in this case?

Many thanks

Matthew

Mani says

Thanks so much! That does help

Manini says

This is terrific, Jim. Thanks so much. I have a question, JIm.

If I am interested in understanding whether the difference between a pre and post score on a questionnaire that uses a Likert-type scale is significantly different — what statistical test should I use? (e.g. I am thinking of a context where a person takes a test before an intervention and once again after completing it and their ratings on the likert-type scale is compared).

So grateful for your help, Jim.

Jim Frost says

Hi Manini,

You should be able to use a paired t-test for that. The article I reference in this post shows that it is ok to use a 2-sample t-test to compare two independent groups. The article doesn’t discuss using a paired t-test for before and after scores. However, given that a t-test is ok in one scenario, I don’t see that it wouldn’t be OK for another. However, I don’t have a reference to provide for that.

I hope that helps!
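As a minimal sketch of what the paired t-test computes for before-and-after ratings, the plain-Python example below works on the per-person differences. The ratings are made up for illustration and the helper name `paired_t` is hypothetical. It returns only the t statistic and degrees of freedom, leaving the p-value to statistical software.

```python
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic: mean of the per-person differences divided by
    its standard error. Returns (t, degrees of freedom)."""
    if len(before) != len(after):
        raise ValueError("before and after must have one value per person")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / n ** 0.5
    return mean(diffs) / standard_error, n - 1


# Hypothetical five-point ratings from the same eight people.
before = [2, 3, 3, 2, 4, 3, 2, 3]
after = [3, 4, 3, 3, 5, 4, 3, 3]

t_stat, dof = paired_t(before, after)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

The key design point is that each person serves as their own control, so the order of the two lists must match person by person.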

Adam says

Perfect. Thanks for the help, Jim!

Adam says

Hi Jim,

Thanks so much, this has helped me loads with my dissertation! But with my survey there was also a ranking question. I asked participants to rank 8 different categoric threats to their honeybee hives. There were 8 threats in total, so I basically asked which they think is the 1st, 2nd, 3rd….8th, most dangerous threat for their bees . Can I use a Mann-Whitney U test to see if there are significant differences between the median ranks of each threat? The survey has 331 responses in total so there is enough repetition

Thanks for all the help,

Adam

Jim Frost says

Hi Adam,

Because you have more than two categories, you’d need to use Kruskal-Wallis. That’s the nonparametric alternative to a one-way ANOVA. Mann-Whitney can compare only two groups, like a 2-sample t-test.

Best of luck with your dissertation!

Sebastian Elias Zavala Marin says

Hi! First of all I appreciate your information, thank you very much.

Second, I wanted to ask you the following if possible: I am preparing a survey with responses likert scale (1-5) on the impact of the pandemic on rural workers, identified by area of work

An example of some questions would be this:

Q: The pandemic has affected the way you access your work area

with answers like this:

strongly disagree = 1, disagree = 2, neutral = 3, agree = 4, strongly agree = 5

So I would like to know what statistical method you recommend, the idea is to identify if there is an impact of the pandemic on each topic addressed in each question.

Thank you very much in advance!

ps: sorry for my english im not native speaker

Johanna S. says

Dear Jim,

thank you for saving me with my thesis! This problem has been haunting me since I started.

I analyse three IVs (1-4 Likert items) and several DVs (1-5 Likert items). However, the variables are not single item variables (only one is) but I construct them from several items (two-item index). To do so I calculate the mean for every participant. Does this mean I treat them as continuous anyways?

And if I do so, can I use OLS since the DVs are calculated as indices, or do I use ordered logit regression? (Or both to compare the results?) Can I even use ordered logit with the indices I created?

Any helping word would mean a lot to me!

All the very best from Sweden

Johanna

Jim Frost says

Hi Johanna,

When researchers have Likert items and they either sum or average multiple items together, they can often use them as continuous variables at that point. However, just be aware that ordinal data (which includes Likert items) can be tricky because a one unit change doesn’t necessarily always represent a consistent amount of change, and that can do weird things to the model. Just be sure to check those residual plots. If the residual plots look good, then you can reasonably trust your model.

Sophie says

Dear Jim, thank you for this comprehensive blog!

I am currently working with ISSP survey data and I cannot wrap my head around this problem. Every source I read seems to warn me against treating my data as continuous, while every social science paper I read is doing precisely that.

My question to you: If I combine two survey items (1-4 Likert scales) into an index to operationalise a concept (by calculating the mean), I automatically treat the data as continuous, right? In that case, I might just as well continue treating the data as such and run OLS with it?

I have been recommended to standardise the data, but I don’t fully understand why (since my items all have the same 1-4 Likert scale). Can you explain that to me? If I do standardise the variables I am struggling with my descriptive statistics since the interpretation of the standardised indices is rather difficult (who counts as agreeing and who doesn’t). My aim was to show frequencies that divide the sample into people who agree (including agree and strongly agree) and those who don’t.

Any help would be very much appreciated!

Kind regards

Sophie

Ugan Collins says

Hello Jim

I am actually using a 4-point Likert scale for my data analysis. I tried using chi-square for hypothesis testing but I can’t quite get it right. I wanted to ask if there is another method of hypothesis testing that suits Likert data.

Jim Frost says

Hi Ugan, if you’re comparing means like I talk about in this article, I think a 4 point Likert scale violates the assumptions more than a 5 point Likert scale. I’d lean towards using nonparametric methods.

One thing I have done in the past with Likert scale data is to take two items and use them in a chi-squared test of independence. That can tell you if there is a relationship between the two. Click the link for an example. For your case, you’d use one Likert item for each variable.
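To make the chi-squared idea concrete, here is a minimal plain-Python sketch that computes the test statistic from a cross-tabulation of two four-point Likert items. The table of counts is made up for illustration and the helper name `chi_square_independence` is hypothetical. The sketch assumes every row and column has at least one response, and it stops at the statistic and degrees of freedom.

```python
def chi_square_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts.
    Assumes every row total and column total is greater than zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two items were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows are answers to item 1, columns are answers
# to item 2, both on a four-point scale (1 = lowest, 4 = highest).
observed = [
    [12, 6, 3, 1],
    [7, 14, 6, 3],
    [3, 8, 15, 6],
    [1, 4, 7, 13],
]

chi2_stat, dof = chi_square_independence(observed)
print(f"chi-squared = {chi2_stat:.2f} with {dof} degrees of freedom")
```

One caution with Likert items is that the extreme categories are often sparse. A common guideline is that the chi-squared approximation becomes unreliable when expected cell counts are small, so combining adjacent categories is sometimes necessary.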

Khenrab says

greeting Jim

I am very new to statistics and in the process of doing my research. I am focusing on students’ attitudes towards learning biology using a mixed method. The tools used are survey questionnaires (five-point Likert type) and interviews. After entering all the raw data in SPSS, I am now stuck on how to go on with the analyses. Your immediate help with this matter will be highly appreciated.

thank you

Tesfaye says

I have a Likert response type questionnaire to analyse for my research data. How can I enter it in Stata to analyse it?

Francesca says

Hi Jim,

I mistyped the skew value (which is 1.559). I have read the article and it has been very useful.

Thank you very much for your help.

Francesca

Helen says

Hi Jim

I am also working with x2 different questionnaires data (pre and post) and an intervention and a control group and a 4 point likert type scale for both questionnaires. N=85 for each group.

I am planning to run a mixed way anova (2×2) within groups for time and between groups for intervention and non intervention but before I do that – the data needs to meet certain assumptions – which tests for normality would you recommend here?

I would be very grateful for some advice.

Helen

Jim Frost says

Hi Helen,

Typically, for ANOVA (and regression) you assess the distribution of the residuals using residual plots rather than the dependent variable itself. You can just plot the residuals in a normal probability plot to see if they are normal.

Francesca says

Hi Jim,

I am currently working on my thesis and I have some doubts how to analyze questionnaire data.

I am looking at whether control of attention differs between two groups of infants (Typical Likelihood and Elevated likelihood) at 10 and 14 months old using a behavioural paradigm. I am also looking at whether control of attention at 14 months is correlated to Regulation Capacity variable.

Regulation Capacity is a measure of a temperament trait within the Infant Behavioural questionnaire (which uses a 7-point Likert scale response). I have looked at the descriptive statistics (means and SD) and the data for the elevated likelihood group have a skew value of -1559. My question is whether the Regulation Capacity variable should follow a normal distribution in order to correlate the variable with the measure of control of attention (behavioural data). Should I look into transforming this variable?

Thank you in advance.

Francesca

Jim Frost says

Hi Francesca,

Is the skew value really -1559? That’s an extremely skewed value. Typically, we’d say that a skew value greater than +1 or less than -1 is very large. -1559 is off the charts. I’m doubtful about that value.

Also, be aware the descriptive statistics are less useful for ordinal data, such as Likert scales.

I’d follow the guidelines of this article. If your sample size is large enough, you can probably use either a 2-sample t-test or Mann-Whitney to compare your two groups. However, look into that off-the-scale skew value. Graph the data to make sure you understand what it is telling you visually.

I’m not sure about transforming a Likert scale variable. The changes between individual values might not represent a consistent change, which is why descriptive statistics are less useful and might make transformation invalid. I’m not sure what the literature says about transforming ordinal data.
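To make the advice about graphing the data and checking the skew value concrete, here is a minimal sketch. The 7-point responses are invented, and the skewness uses the plain moment formula with no small-sample correction, so statistical packages may report a slightly different number.

```python
from collections import Counter

# Hypothetical 7-point responses (made up for illustration).
responses = [7, 6, 7, 5, 7, 6, 7, 7, 4, 6, 7, 2, 7, 6, 5, 7]

# Text bar chart: one row per scale point.
counts = Counter(responses)
for value in range(1, 8):
    print(f"{value}: {'#' * counts[value]}")

# Moment-based sample skewness: m3 / m2**1.5 (no small-sample correction).
n = len(responses)
mean = sum(responses) / n
m2 = sum((x - mean) ** 2 for x in responses) / n
m3 = sum((x - mean) ** 3 for x in responses) / n
skewness = m3 / m2 ** 1.5
print(f"skewness = {skewness:.2f}")
```

A long tail toward the low end of the scale gives a negative value, and by the rule of thumb above anything beyond ±1 counts as strongly skewed.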

Mayank Jain says

Thank You so much for your help! Your explanations are extremely simple and quite effective!

Mayank Jain says

Thank You for considering my doubt

But then, why is the p value extremely small? This clearly leads to Alpha being greater than p (p< .05)

Jim Frost says

A small p-value like that indicates the difference between your two groups is statistically significant. You can reject the null hypothesis and conclude that the population means are different. Read my post about t-tests to learn more about how to interpret them.

Mayank Jain says

Hello Jim

This is Mayank Jain from India. Your article was extremely helpful in clearing my doubts as to which test to apply but I do have a doubt.

I am working on a research paper with a Likert scale rating (Most preferred, Preferred, Neutral, Not preferred and Least preferred), and I gave the ratings quantitative values of 5, 4, 3, 2, 1 respectively. I applied a t-Test for Two-Sample Assuming Unequal Variances in MS Excel and got the p value as 4.976e-79 (which is extremely small). I wanted to inquire whether I am using the right test or should use another statistic.

Thank You.

Jim Frost says

Hi Mayank,

Yes, it sounds like you have used the right test! The article I reference at the bottom of this post supports the idea for using the 2-sample t-test in case you need a citation. Assuming unequal variances is the safe choice.
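For readers who want to see what the unequal-variances version of the test computes, here is a minimal sketch of Welch’s t statistic and its Welch-Satterthwaite degrees of freedom. The two groups of ratings are made up, and `welch_t` is a hypothetical helper name, not something from Excel or any library.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical ratings coded 5 = Most preferred ... 1 = Least preferred.
group_a = [5, 4, 5, 3, 4, 5, 4, 5, 4, 5]
group_b = [2, 3, 1, 2, 3, 2, 4, 2, 3, 2]

def welch_t(x, y):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n).
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The p-value then comes from a t distribution with those degrees of freedom. In practice you would let software do the whole thing, for example `scipy.stats.ttest_ind` with `equal_var=False`.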

Amit sharma says

Hi Jim..

Amit this side from India.

Jim, I am doing an organizational research to understand the relationship between sales and marketing teams using a 5-point Likert scale.

My respondents have varied responses. Suggest me which test to use ?

Thank you

Amit Sharma

Mohamed says

Hi Jim,

Thanks for this informative website, I went through fruitful ideas but I didn’t find exactly how to deal with my current case.

I have a questionnaire for satisfaction and to check which factors contribute more to the customer satisfaction. along with Age, Gender and type of service, I have many factors that reviewed by customers in an ordinal response (Extremely poor,Poor,Need improvement,Acceptable,Good,Excellent) and the satisfaction (is either satisfied or not) so which model and analysis method I can use to predict satisfaction given these different type of factors?

I am thinking of using Categorical PCA, and for modelling I am not sure which method to use. Should I scale it (1-6) and use K-means?

Appreciate your support

Sridhar V says

Hi Jim:

I managed to locate the reference for my above clarification on substituting likert scale with other values. Here it is:

https://amp.reddit.com/r/statistics/comments/82perc/how_to_analyze_ranking_data_eg_1st_2nd_3rd/

There is a further link therein:

https://statmodeling.stat.columbia.edu/2015/07/13/dont-do-the-wilcoxon/

I value your comments on the above at your convenience.

Thanks in anticipation.

Sridhar says

Thanks Jim for your clarification.

Sridhar V says

Hi Jim. I am Sridhar from Bangalore, India. I have been following some of your blogs recently and thank you for such a lucid explanation of statistics.

I have a request to make in connection with analysis of data collected in response to Likert Scale based questionnaire.

I have noted down that the best way to convert 5-point likert scales in to numerical values is by using -1.28,0.52,0.0,5.2,1.28. Unfortunately, I haven’t noted down the article or the book chapter.

I was wondering if this way of converting data rings a bell with you or anyone else seeing this message. Can you help me with the context of this way of converting the responses into numbers?

Thanks

Jim Frost says

Hi Sridhar,

I’m not familiar with that approach of converting 5-point Likert scale items. The two main problems with the Likert scale:

I’m not sure that the recoding you describe solves those problems. Perhaps there is a rationale behind it that I’m not aware of. Even if there is, I’d be leery about assuming it is applicable to all data.

Will says

Hi Jim,

I am struggling with WHICH type of regression analysis I should use and HOW. I created a questionnaire in order to understand the relationship of 4 (IVs) factors and 1(DV).

Each of the (IVs) has Five questions with 5 Likert Scale Items. For example:

(First Factor)

Q1 (strongly disagree=1, disagree=2, neutral=3, agree=4, strongly agree= 5)

Q2 (strongly disagree=1, disagree=2, neutral=3, agree=4, strongly agree= 5) and so on.

The (DV) also has Five questions with 5 Likert Scale Items. For example:

Q1 (strongly dissatisfied =1, dissatisfied =2, neutral=3, satisfied =4, strongly satisfied = 5)

WHICH type of regression analysis or statistical test I should use and HOW?

Thanks in advance,

Will

Jim Frost says

Hi Will,

Because your DV is ordinal, you’ll need to use ordinal logistic regression. Read my post about choosing the correct type of regression analysis and look for it in there.

Ordinal IVs can be tricky. I don’t have a post to direct you towards but I strongly recommend getting my eBook about regression analysis. In it, I talk about how to handle ordinal IVs. You’ll need to use them either as categorical or continuous variables because ordinal variables have a mix of traits.

From there you just fit the model, check the residuals, and interpret the results. Read my post about fitting the correct model.

Irena says

Hello

I’m currently working on my master’s thesis and I have to find out if the student evaluations are biased or not. I have a questionnaire data made with likert scale from 7 faculties. Can you help me with which model should I use? Thank you.

Jim Frost says

Hi Irena,

There’s not enough information in those several sentences to be able to understand your research project goals, data collection, etc., and provide recommendations. I’d recommend consulting with a statistician or advisor at your institution who can give your study the time it deserves.

Dale says

Hi Jim! I’m all too aware that my cognitive strengths do not lie in the stats area…

I have done a survey to determine elements of professional identity of a group of analysts. The study is exploratory mixed methods (the first study of this specific profession), and I need help to make sense of my data.

I have used 5-point Likert scale questions for 2 elements of professional identity that I have taken from scholars who have developed the instruments (1 = strongly agree, 5 = strongly disagree). I have already done descriptive analysis where I id’ed the % of respondents on each scale. I don’t want to test reliability of the instrument etc but would like to compare the different demographic variables like genders/professional org membership, country, etc for these 2 elements. N=75.

I have run Mann-Whitney for the 2 factor variables, and thought about using Kruskal-Wallis H tests for those variables with more than 2 possibilities eg countries.

The results in SPSS Mann-Whitney show for instance that for one sub-element the distribution is not the same across the genders (Asymp sig of .013). The mean rank for males (n=57) is 34.86 and that of females (n=18) is 47.94. Sum of ranks is male: 1987.00 and female: 863.00.

But now to analyse this and understand and write this? What does this mean? Are the men more positive about this aspect than women? What is the “so what?” here? I need to see the value of what I’m doing here, otherwise, can you point me to a more appropriate method?

Other people have “just” used a table to compare the mean scores of the different demographic groupings, for instance Gender: F=2.19 M= 2.76, Qualification: Doctorate=1.16, Masters degree=3.12, bachelors=2.76 etc. This is sufficient for me, as I need to know if there is a big difference between the genders? What is “big difference”. This is why I thought Mann-Whitney is a good method to say that “the genders share the same opinion about 8 of the 9 elements, but in the 9th element, men feel more positive than females.”

Please help!

Dale

Jim Frost says

Hi Dale,

Personally, I find Likert data to be aggravating. I know it’s easy to ask those types of questions in a survey. It’s easy for respondents to figure out how to answer. However, Likert scale data are ordinal data, which presents analysis problems because they’re a bit like continuous data and a bit like categorical data. How do you treat and analyze them? There’s been a long-standing debate over whether you should use parametric or nonparametric analyses for them. The study I cite suggests that when you’re looking at two sample analyses, such as for male versus female, it doesn’t matter much as long as you have at least 10 observations per group. Your data qualifies, so I’d say you could use either approach. If others in your field use means, it’s probably OK to go that route and not fight the current! And, you can cite this article to support the decision.

For cases where there are more than two groups, such as qualification, I don’t know if there is similar research. I would expect it would show similar results, although I don’t know what the minimum sample sizes per group would be. If you do go with the nonparametric analyses, analysts will often report the medians for each group.
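If you stay with Mann-Whitney, the rank sums from the SPSS output can be turned into something easier to read. The sketch below uses only the numbers reported in the question (group sizes and sums of ranks) to recover the U statistics and the share of male/female pairs in which the female gave the higher-numbered response. The variable names are my own, and this is a descriptive summary rather than a replacement for the test itself.

```python
# Figures reported in the question (SPSS output).
n_male, n_female = 57, 18
rank_sum_male, rank_sum_female = 1987.0, 863.0

# Mann-Whitney U for each group: rank sum minus the smallest possible rank sum.
u_male = rank_sum_male - n_male * (n_male + 1) / 2
u_female = rank_sum_female - n_female * (n_female + 1) / 2

# Sanity check: the two U values always add up to n1 * n2.
assert u_male + u_female == n_male * n_female

# Share of male/female pairs in which the female response is the higher
# number (ties counted as half).
prob_female_higher = u_female / (n_male * n_female)
print(u_male, u_female, round(prob_female_higher, 3))  # 334.0 692.0 0.674
```

A higher mean rank means that group tended to choose higher-numbered responses. Whether that reads as more positive or more negative depends on how the item is coded, so check the direction of the scale before writing it up.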

Finally, I think your question about “big difference” touches upon the difference between statistical significance versus practical significance. Click that link for my post on that topic. In a nutshell, a statistically significant result doesn’t necessarily guarantee that the effect or difference is important in the real world.

Best of luck with your analysis!

UMA MAHESH KUMMARI says

Sir, If you had made a video on it, please post it here.

Thank you.

Jim Frost says

Hi Uma,

Currently, I don’t have videos. In the future, I plan to create video lessons.

Antehun Atanaw says

Sir, the note is fine but I am still not clear about how to analyze a Likert scale (with five choices) using SPSS. I believe that SPSS can only show frequency, mean, median, and standard deviation for a Likert scale, and that other analyses such as the t-test, ANOVA, and correlation cannot be done in SPSS.

That means the Likert scale doesn’t show basic statistics.

Mrs Helen E. Pearson says

Hello,

I’m analysing data from someone else’s questionnaire. I have 148 respondents for a 40 statement questionnaire. It uses a four point Likert scale. I’m using Wilcoxon Signed ranks to analyse significant improvement in scores between test/retest. I’d also like to correlate some of the variables.

There are four main areas of interest, and each of those is divided into sub-sections. My problem is that some of the sub-sections are negatively scored, yet the section they relate to is totalled.

Is it reasonable to manipulate the reversed scored items so all the ‘unhelpful’ high scores become low ‘helpful’ scores?

Jim Frost says

Hi Helen,

I think what you want to do makes sense. That’ll allow you to obtain high, positive correlations when helpful scores correlate with other helpful scores. Otherwise, you’ll obtain negative correlations. You’d still obtain the same information, but it’s less intuitive and potentially confusing to others.
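Reverse scoring is a simple recode. Here is a minimal sketch for a four-point scale, assuming the scale runs from 1 to 4; `reverse_score` is a hypothetical helper name and the responses are made up.

```python
SCALE_MIN, SCALE_MAX = 1, 4  # assumed coding of the four-point scale

def reverse_score(score, low=SCALE_MIN, high=SCALE_MAX):
    """Flip a score so the top of the scale becomes the bottom and vice versa."""
    return low + high - score

# Hypothetical responses to one negatively scored item.
raw = [4, 3, 1, 2, 4]
flipped = [reverse_score(s) for s in raw]
print(flipped)  # [1, 2, 4, 3, 1]
```

Apply it only to the negatively scored items, and do it before totalling the section so that every item points in the same direction.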

Nadine says

I am conducting an experimental research to test the effectiveness of task-based approach on improving the students’ literacy skills. I had two groups: experimental and control and each answered a questionnaire of 17 items using likert scale to express their attitude towards the lesson given (with and without task-based approach). What statistical test should I use to test the effectiveness of this approach on improving the students’ literacy skills?

I appreciate your answer asap.

Jim Frost says

Based on the information and research that I present in this blog post, it should be clear that you can use either the two-sample t-test or the Mann-Whitney test. Unfortunately, many reviewers and advisors might have strong opinions about one being more appropriate than the other, but the research shows that either test is valid overall.

Samuel kobina otu says

Hello sir, please do we have anything called ‘test value’ when analyzing data from a likert scale? What is the meaning of that term?

Qaisar Sohail says

Hi jim

It was a really informative post. Sir, I have a data set of 32 variables collected from 180 respondents. All my variables are on nominal and Likert scales. There is no response variable. Can I take the gender variable as a response variable and use logistic regression? If that is not suitable, please tell me which type of analysis I can do other than cross tables.

i am waiting for your response.

Syed Abbas says

Hi Jim,

Thanks for another great article.

I have a follow-up question to the one about NA that you explained here. How should we treat a ‘don’t know’ response? Is ‘don’t know’ supposed to be treated the same as NA? Usually we exclude ‘Don’t know’ responses and replace them with SYS MIS in SPSS when we use agree scales (10 point) in regressions. Is this the right way of doing it, or should we assign some code to them?

Kaushal Kumar Bhagat says

Dear Jim,

Thanks a lot for your prompt reply. I am just wondering how to do the chi-square test of independence. In your example you have two IVs (color and area) but in my case I have only items. Kindly elaborate on the process of the chi-square test of independence for my case.

I have a sincere request. Please make video lessons on statistical analysis using any software like SPSS/Mplus/R.

Waiting for your kind reply.

Jim Frost says

Hi Kaushal,

You could use one item that uses the Likert scale for one variable and another Likert scale item as the other variable. If you collapse it to three values as you describe, this would give you a 3X3 grid. If you perform a chi-square analysis on this, you’d learn whether the two items are independent or if there is sufficient evidence to conclude that they are associated. If they’re associated, you might find that those who agree on one item are more likely to agree on the other item, or maybe they’d be more likely to disagree. Or, perhaps, there is no association between the two items. How they respond to one item does not correlate with how they respond on the other. It’s very similar to the example I use, but you’ll have a 3X3 grid that has all the combinations of agree/agree, agree/neutral, agree/disagree, disagree/agree, disagree/neutral, disagree/disagree, and so on. Do respondents fall into those cells randomly or does their response on one item correlate to their response on the other item?

I’ve actually done exactly that with survey results years ago. I surveyed faculty about their comfort in using technology personally and in the classroom. Each item used a five point Likert scale. I didn’t convert to a 3 point scale like you’re considering. Not only does the table show where they fall for both items, you can see how their responses compare for both items. I saw a pattern that showed faculty were less comfortable using technology in the classroom than on their own.
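Here is a minimal sketch of the arithmetic behind a chi-square test of independence on a 3X3 grid of two collapsed Likert items. The counts are invented for illustration and do not come from any survey discussed here.

```python
# Hypothetical counts: rows = item A (agree, neutral, disagree),
# columns = item B (agree, neutral, disagree).
observed = [
    [40, 10, 5],
    [12, 20, 8],
    [6, 9, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi_square:.2f}, df = {dof}")
```

With 4 degrees of freedom, a statistic above roughly 9.49 is significant at the 0.05 level. Statistical software reports the p-value directly; `scipy.stats.chi2_contingency` is one option if you work in Python.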

I am planning to create courses as you describe. So, those are coming! However, I have two books planned first. Hopefully, I can begin creating the courses in 2020. It’s definitely something I want to do!

Kaushal Kumar Bhagat says

Hi Jim,

First of all, I would like to thank you.I always read your post. It is very informative and helpful. Following is my query:

Objective: To check for significant differences between the proportions of disagreeing, neutral, and agreeing students

Let us suppose that i58, i59, i60 belongs to Factor A. ( strongly agree=1, agree=2, neutral=3, disagree=4, strongly disagree=5).

1.How to collapse the five-point scale into three categories: ‘disagree’ (i.e. strongly disagree and disagree), ‘neutral’ (i.e. neither agree nor disagree) and ‘agree’ (i.e. agree and strongly agree).

2. Please illustrate how to find significant differences between the proportions of disagreeing, neutral, and agreeing students using chi-square test in excel or SPSS.

Item Strongly agree Agree Neutral Disagree Strongly disagree

i58 270 440 63 11 8

i59 400 354 28 5 5

i60 239 428 104 15 5

Jim Frost says

Hi Kaushal,

I’m so happy to hear that my posts have been helpful.

I’d recommend using one of the methods I discuss in this post for analyzing five-point Likert scale data. Both are shown to work effectively.

If you’re set on collapsing categories, that’s just a recoding issue. All 1s and 2s become “agree” and all 4s and 5s become “disagree.” The 3s are neutral. With this three-point scale, you might not be able to use t-tests or Mann-Whitney as I discuss in this post. I haven’t heard of using a proportions test on this type of data. You could try ordinal logistic regression or chi-square test of independence.
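The recoding can be sketched in a few lines. This example uses the counts from the table in the question; `collapse` is a hypothetical helper name.

```python
# Counts from the table in the question, ordered
# strongly agree, agree, neutral, disagree, strongly disagree.
counts = {
    "i58": [270, 440, 63, 11, 8],
    "i59": [400, 354, 28, 5, 5],
    "i60": [239, 428, 104, 15, 5],
}

def collapse(five_point_counts):
    """Merge the two agree categories and the two disagree categories."""
    sa, a, n, d, sd = five_point_counts
    return {"agree": sa + a, "neutral": n, "disagree": d + sd}

for item, item_counts in counts.items():
    print(item, collapse(item_counts))
```

For item i58, for example, this gives 710 agree, 63 neutral, and 19 disagree.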

Best of luck with your analysis!

Voon Teng says

Dear Sir, greeting.

I examined satisfaction level of something with likert scale (5 ratings), is it possible to run simple independent t test for this with Age/Gender variable? I have 7 questions under satisfaction level section.

I read that the independent t-test is just for interval scales rather than ordinal scales.

Very appreciated if can get your reply. Thank you

Jim Frost says

Hi Voon,

As I write in this post, yes, you can use a 2-sample t-test, which is for independent samples, with 5 point Likert scale items.

Deidre Whitfield says

How do you analyze NA data when computing the average using the Likert? Should you use zero as the value or can you assign it a value? If so, is there a best value to give NA responses?

Jim Frost says

Hi Deidre,

NA responses can be difficult to include in your analysis. There’s no one size fits all answer. You’ll need to determine if NA fits in logically with your scale, and what value it represents. That’ll vary based on the subject area and the scale.

In some cases, NA values may need to be excluded. For example, in a strongly agree to strongly disagree scale, if NA truly means not applicable, the respondent is indicating that the item does not apply to them. In those cases, you should consider excluding their response from the dataset for that item. You’ll have to think about whether NA is different than say Neither agree nor disagree, or whatever the middle value is.

However, if you can take NA to represent some sort of middle value, or something else, you can use it for that. However, you have to be very careful. And, in fact, I’d say that if you can use NA to represent some other value for that question, it represents bad survey design because you have two different options for an item that are equivalent. If NA maps to another option, it’s probably best to not even include NA as an option for that item in the first place. For example, if the question is, how strongly does an issue affect you? And the scale ranges from very strongly affects me to does not affect me at all, an NA response probably corresponds to does not affect me at all. But, why include both because they’re redundant?

For that reason, my guess is that NA does not map directly to another option most of the time. But, you’ll have to consider the scale and whether a value on it maps to NA. I can see cases where NA might equal no opinion.
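To show why the choice matters, here is a minimal sketch comparing an item average that excludes NA responses with one that codes NA as zero. The responses are made up, and `None` stands in for an NA answer.

```python
from statistics import mean

# Hypothetical responses on a 1-5 scale; None marks an NA answer.
responses = [4, 5, None, 3, 4, None, 2, 5]

# Option 1: exclude NA responses for this item.
answered = [r for r in responses if r is not None]
mean_excluding_na = mean(answered)

# Option 2: code NA as zero, a value that is not even on the 1-5 scale.
mean_na_as_zero = mean([0 if r is None else r for r in responses])

n_excluded = len(responses) - len(answered)
print(f"excluding NA: {mean_excluding_na:.2f} ({n_excluded} excluded)")
print(f"NA coded as 0: {mean_na_as_zero:.2f}")
```

Coding NA as zero drags the average down even though those respondents expressed no low opinion, which is why excluding is usually the safer default when NA truly means not applicable.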

Traci says

Jim,

Love your website, it is easy to understand and has helped me a lot. I have a question. I have 2 sets of survey data. One from patients (n=42) and one from staff (n=12). There are two sub-scales that I want to compare the two groups on but am concerned about the difference in sample sizes. Would I use a Mann-Whitney just on the two sub-scales? The sub-scales were measured on a 5-point Likert scale. Your help is appreciated.

Jim Frost says

Hi Traci, thanks for the kind words. I really appreciate them!

I wouldn’t worry about the unequal sample sizes as long as your smallest group has more than 10, which it does.

The benefit of equal-sized groups comes in the form of statistical power, which is the ability to detect a difference. It appears like you have 54 observations. Now, if you had two groups each with 27 observations to produce that total of 54, your test would have more power than what you’ve actually got. However, reality isn’t always nice and neat, and you have to work with what you’ve got. So, it’s fine to test those two groups. The statistical power is somewhat less than what it would’ve been with equal-sized groups, but it’s not inherently problematic.
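One way to see the cost of unequal groups is through the standard error of the difference between two means, which is proportional to the square root of 1/n1 + 1/n2 when both groups share the same standard deviation (an assumption made here for simplicity). A minimal sketch with the group sizes from the question; `se_factor` is a hypothetical helper name:

```python
from math import sqrt

def se_factor(n1, n2):
    """Multiplier on the common standard deviation in the standard error
    of a difference between two group means."""
    return sqrt(1 / n1 + 1 / n2)

unequal = se_factor(42, 12)   # patients vs. staff, as in the question
balanced = se_factor(27, 27)  # the same 54 observations split evenly
print(f"unequal: {unequal:.3f}, balanced: {balanced:.3f}")
```

The unequal split gives a larger standard error than the balanced one, which translates into somewhat lower power for the same total sample size.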

Shuchi says

Hi Jim,

I’m a learner doing correlational research on job satisfaction and attitude of teachers. For job satisfaction I have used a Likert scale. In this scale there are 8 factors of job satisfaction and a fixed number of statements for each factor; for example, factor A has 7 statements. In total the scale has 52 statements. Before data analysis, can I use the mean score of each participant, i.e. raw score / 52? And can I use the same approach for factor analysis too, i.e. raw score of factor A / 7?

Jim Frost says

Hi Shuchi,

There’s a general rule of thumb that if you have a discrete variable that has 10 equally spaced values or more, and the data are spread across those values, you can treat it as a continuous variable. If you satisfy that, I think you’re safe. And, it sounds like if you’re summing the scores for those statements you’ll be ok. Using the average is probably ok too because you are effectively using the same amount of information.

Joseph says

Mine is not a response to the question asked but more of a clarification on this issue. Assuming one uses a four-point scale, can we still use the t-test? Also, can we call a four-point scale Likert?

Jim Frost says

Hi Joseph,

Unfortunately, I don’t know for sure. I haven’t read research about it that says one way or another. The research that is the basis for this post only assessed 5-point scales. Honestly, I was surprised that t-tests worked as well as they did for 5-point scales! Four-point scales satisfy the t-test assumptions even less than 5-point scales. So, I think that would be risky, but I can’t say that I know for sure.

As for the terminology, I’ve seen it argued both ways. Some say that a Likert scale specifically refers to a 5-point ordinal scale. While I’ve seen others say that it doesn’t have to be 5-points. My own take is that there are probably other more important considerations for what constitutes a Likert scale. Namely, that the values need to be balanced between positive and negative relative to a neutral value. Additionally, the distances between values are equal.

In other words, a Likert scale is a special type of ordinal data scale. Ordinal data don’t require those properties (balance, neutral value, and equal spacing), but in my mind, Likert scales do require those properties, but don’t require specifically 5-points. With four points, I’m not sure that your data can satisfy all of those requirements.

I hope this helps!

Haris says

Hi Jim, I know this is a long time since your post. I had a query.

I am involved with analysing some days where an educational intervention was performed for students. Likert scale was used to assess pre- and post- intervention changes in knowledge, confidence level etc.

I do not think one can use the scale data like Likert as they are, to assess significance with t-test or the Mann-Whitney test.

Do you think we should assign numerical values to the scale data before using the tests of statistical significance?

For example 0.2 for a Likert reading of 1 on a scale of 5, 0.8 for 4/5 etc. ?

Bokossa Sidoine says

As I understand it, we can use the 2-sample t-test or Mann-Whitney if we have two groups and are analyzing five-point Likert data. I have one question.

What if we have more than two groups and a scale with more than five points?

Very interesting, thank you so much.

Jim Frost says

That’s correct. As for the other cases you mention, it looks promising but we can’t say definitively from this research. However, as you increase the number of values (e.g., a 7-point scale), the data are becoming more like a continuous variable, which is good. And the F-test in ANOVA is a generalization of the t-test. So, the results should be applicable to these other cases. The question in my mind is that as you increase the number of groups with ANOVA, you’d need to be sure to keep the number of observations per group at a good number. So, it looks promising for these other cases that you mention, but I can’t state definitively that it’s true based on the specific research that I’ve read.
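The link between the two tests is easy to check numerically: with exactly two groups, the one-way ANOVA F statistic equals the square of the equal-variance (pooled) t statistic. A minimal sketch with made-up 7-point responses; `pooled_t` and `one_way_f` are hypothetical helper names, and the identity does not hold for the unequal-variances version of the t-test.

```python
from statistics import mean

# Hypothetical 7-point responses for two groups.
group_1 = [5, 6, 4, 7, 5, 6, 5, 4]
group_2 = [3, 4, 5, 2, 4, 3, 4, 5]

def pooled_t(x, y):
    """Equal-variance (pooled) two-sample t statistic."""
    ss_x = sum((v - mean(x)) ** 2 for v in x)
    ss_y = sum((v - mean(y)) ** 2 for v in y)
    pooled_var = (ss_x + ss_y) / (len(x) + len(y) - 2)
    return (mean(x) - mean(y)) / (pooled_var * (1 / len(x) + 1 / len(y))) ** 0.5

def one_way_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

t_stat = pooled_t(group_1, group_2)
f_stat = one_way_f(group_1, group_2)
print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

The same `one_way_f` function accepts three or more groups, which is the case the question asks about.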

niaz hussain ghumro says

Good and very informative

Jim Frost says

Thank you!

Naveen Kumar S says

Hi sir,

I am Naveen Kumar S, from India. On 1st July 2017, GST was implemented across India, and I am writing a research paper on GST and the issues faced by the respondents (both CAs and taxpayers) after GST implementation. For this I received responses through Likert scale based questions and am now stuck in analyzing the data. I don’t know from which perspective to start (the main theme is issues faced by them after GST implementation) and, as a learner, I am also unable to frame the null and alternate hypotheses…

pls help me in this regard and give some hint/ solution for the same as early as possible…

thanks in advance

with regards

Naveen S
