Benefits of Welch’s ANOVA Compared to the Classic One-Way ANOVA
Welch’s ANOVA is an alternative to the traditional analysis of variance (ANOVA), and it offers some serious benefits. One-way analysis of variance determines whether differences between the means of at least three groups are statistically significant. For decades, introductory statistics classes have taught the classic Fisher’s one-way ANOVA that uses the F-test. It’s a standard statistical analysis, and you might think it’s pretty much set in stone by now. Surprise, there’s a significant change occurring in the world of one-way analysis of variance!
When is Easter this Year?
When is Easter in 2024? I ask this question every year! The next Easter occurs on March 31, 2024. And then, in the next year, Easter falls on April 20, 2025. I have a hard time remembering when it occurs in any given year. I think that March Easters are both early and unusual. Is that true?
Being a statistician, my first thought is to study the distribution of Easter dates. By analyzing the distribution, we can determine which dates are rare and which are common. How unusual are Easter dates in March? Are there patterns in the dates?
How to Analyze Likert Scale Data
How do you analyze Likert scale data? Likert scales are the most widely used method for scaling responses in survey studies. Survey questions that ask you to indicate your level of agreement, from strongly agree to strongly disagree, use the Likert scale. The data in the worksheet are five-point Likert scale data for two groups.
Standard Error of the Regression vs. R-squared
The standard error of the regression (S) and R-squared are two key goodness-of-fit measures for regression analysis. While R-squared is the best known of the goodness-of-fit statistics, I think it is a bit overhyped. The standard error of the regression is also known as the residual standard error.
Statistics, Exoplanets, and the Search for Earthlike Planets
I love astronomy! The discovery of thousands of exoplanets has only made it more exciting. You often hear about the really weird planets in the news. You know, things like low-density puffballs, hot Jupiters, rogue planets, planets that orbit their star in hours, and even a Jupiter-mass planet that is one huge diamond! As neat as these discoveries are, I also want to know how Earth fits in.
Inferential statistics
Inferential statistics use a random sample to draw conclusions about the population. Typically, it is not practical to obtain data from every member of a population. Instead, we collect a random sample from a small proportion of the population. From the sample, statistical procedures can infer the likely properties of the population.
For example, it is impractical to measure the height of every adult woman, but you can measure the heights of a random sample and use that information to make generalizations about the heights of all women. For instance, a confidence interval provides a range within which the population mean height is likely to fall.
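The height example can be sketched in a few lines of code. This is a minimal illustration using simulated data; the sample size and population parameters below are hypothetical, not values from the text.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 50 adult women's heights in centimeters.
rng = np.random.default_rng(42)
heights = rng.normal(loc=162, scale=7, size=50)

n = heights.size
mean = heights.mean()
sem = heights.std(ddof=1) / np.sqrt(n)   # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)    # two-sided 95% critical value

ci_lower = mean - t_crit * sem
ci_upper = mean + t_crit * sem
print(f"95% CI for the population mean height: ({ci_lower:.1f}, {ci_upper:.1f}) cm")
```

The interval is built from the sample alone, yet it lets us make a probabilistic statement about the entire population — that is the inferential step.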
Descriptive statistics
Descriptive statistics are numbers that summarize data, such as the mean, standard deviation, percentages, rates, counts, and range. Descriptive statistics simply describe the data but do not try to generalize beyond the data.
For example, we can describe starting salaries of college majors by calculating the mean salary and the range for each type of major. We can also describe the percentage of college graduates by major who obtain jobs within six months of graduation.
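The salary example above can be computed with nothing more than the standard library. The salary figures and employment counts here are made up for illustration.

```python
import statistics

# Hypothetical starting salaries (in $1,000s) for two majors.
salaries = {
    "Engineering": [68, 72, 75, 64, 80, 71],
    "History":     [42, 38, 51, 45, 40, 47],
}

for major, data in salaries.items():
    mean = statistics.mean(data)
    data_range = max(data) - min(data)   # range: maximum minus minimum
    print(f"{major}: mean = {mean:.1f}, range = {data_range}")

# Percentage employed within six months (hypothetical counts).
employed, graduates = 54, 60
print(f"Employed within six months: {100 * employed / graduates:.0f}%")
```

Note that these numbers describe only the data in hand; nothing here claims anything about majors or graduates beyond this sample.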
To be able to generalize beyond the sample, you need to use inferential statistics.
Correlation
A correlation between variables indicates that as one variable changes in value, the other variable tends to change in a specific direction. A correlation coefficient measures both the direction and the strength of this tendency to vary together.
- A positive correlation indicates that as one variable increases, the other variable tends to increase.
- A correlation near zero indicates that as one variable increases, there is no tendency in the other variable to either increase or decrease.
- A negative correlation indicates that as one variable increases, the other variable tends to decrease.
The correlation coefficient can range from -1 to 1. The extreme values of -1 and 1 indicate a perfectly linear relationship where a change in one variable is accompanied by a perfectly consistent change in the other. In practice, you won’t see either type of perfect relationship.
The two most common types of correlation coefficients are Pearson’s product moment correlation and the Spearman rank-order correlation.
Pearson product moment correlation
The Pearson correlation evaluates the linear relationship between two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable.
Spearman rank-order correlation
Also called Spearman’s rho, the Spearman correlation evaluates the monotonic relationship between two continuous or ordinal variables. In a monotonic relationship, the variables tend to change together, but not necessarily at a constant rate. The Spearman correlation coefficient is based on the ranked values for each variable rather than the raw data.
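The distinction between the two coefficients is easy to see on data that rise together but not at a constant rate. This sketch uses made-up data (y = x³, a monotonic but nonlinear relationship):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# A monotonic but nonlinear relationship: y always rises with x,
# but not at a constant rate.
x = np.arange(1, 11)
y = x ** 3

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)

print(f"Pearson r:    {pearson_r:.3f}")    # less than 1: not a linear relationship
print(f"Spearman rho: {spearman_rho:.3f}") # 1.0: a perfectly monotonic relationship
```

Because Spearman works on ranks, any strictly increasing relationship gives rho = 1, while Pearson penalizes the departure from a straight line.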
Confidence interval of the prediction
A confidence interval of the prediction provides a range of values for the mean response associated with specific predictor settings. For example, if a 95% confidence interval of the prediction is [7, 8], you can be 95% confident that the mean response will fall within this range.
The prediction interval is always wider than the confidence interval because of the added uncertainty involved in predicting a single response versus the mean response.
Prediction intervals (PI)
A prediction interval is a range of values that is likely to contain the value of a single new observation given specified settings of the predictors. For example, if a 95% prediction interval is [5, 10], you can be 95% confident that the next new observation will fall within this range.
After you fit a regression model that provides an adequate fit to the data, you can use the model to generate predictions based on specific predictor values. However, predictions are not as simple as a single predicted value. The predicted value is actually the mean response value. Like any mean, there is variability around that mean.
Prediction intervals account for the variability around the mean response inherent in any prediction. Like confidence intervals, prediction intervals have a confidence level and can be a two-sided range, or an upper or lower bound. Unlike confidence intervals, prediction intervals predict the spread for individual observations rather than the mean.
Note that a prediction interval is different from a confidence interval of the prediction. The prediction interval is always wider than the confidence interval of the prediction because of the added uncertainty involved in predicting a single response versus the mean response.
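The two intervals can be computed side by side for a simple linear regression. This is a sketch on simulated data (the slope, intercept, noise level, and predictor setting x0 are all hypothetical), using the standard textbook formulas: the prediction interval adds a "1 +" term under the square root, which is why it is always wider.

```python
import numpy as np
from scipy import stats

# Hypothetical data: fit a simple linear regression by least squares.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=x.size)

n = x.size
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx   # slope
b0 = y.mean() - b1 * x.mean()                        # intercept
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))            # standard error of the regression
t_crit = stats.t.ppf(0.975, df=n - 2)

x0 = 5.0                  # specific predictor setting
y_hat = b0 + b1 * x0      # predicted value = mean response at x0

# Confidence interval of the prediction (mean response at x0).
ci_half = t_crit * s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)
# Prediction interval (single new observation at x0): adds the "1 +" term.
pi_half = t_crit * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)

print(f"95% CI of the prediction: ({y_hat - ci_half:.2f}, {y_hat + ci_half:.2f})")
print(f"95% prediction interval:  ({y_hat - pi_half:.2f}, {y_hat + pi_half:.2f})")
```

Running this always yields a prediction interval strictly wider than the confidence interval of the prediction, because the new observation carries its own random error on top of the uncertainty in estimating the mean response.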
Glossary
Are you puzzled by strange statistical terms or abbreviations? Are you looking for a statistical dictionary that explains these statistical terms in plain English? You’re at the right place! Jim’s Statistics Glossary lists and explains the most commonly used terms in statistics. This is the best place for those learning statistics to start and familiarize themselves with statistical jargon. If you would like for me to explain something that is not listed here, please contact me.
- Alternative hypothesis
- Binary logistic regression
- Binary variables
- Categorical variables
- Qualitative variables
- Attribute variables
- Confidence interval of the prediction
- Continuous variables
- Correlation
- Pearson product moment correlation
- Spearman rank-order correlation
- Descriptive statistics
- Effect
- Estimator
- Biased estimator
- Unbiased estimator
- Factors
- Fitted line plots
- Fitted values
- Predicted values
- Fixed and Random factors
- Random factors
- Inferential statistics
- Statistical inference
- Mode
- Nominal logistic regression
- Nominal variables
- Ordinal logistic regression
- Ordinary least squares
- Linear least squares
- OLS
- Outliers
- P-value
- Parameter
- Poisson variables
- Population
- Prediction intervals
- PI
- R-squared
- Coefficient of determination
- Regression analysis
- Regression coefficients
- Coefficients
- Reliability
- Residuals
- Sample
- Significance level
- Alpha
- Standard error of the regression
- Standardization
- Standard scores
- Statistics
- Validity
About Me
I’m Jim Frost, and I have extensive experience in academic research and consulting projects. In addition to my statistics website, I am a regular columnist for the American Society for Quality’s Statistics Digest. Additionally, my most recent journal publication as a coauthor is The Neutral Gas Properties of Extremely Isolated Early-Type Galaxies III (2019) for the American Astronomical Society (abstract).
I’ve been the “data/stat guy” for research projects that range from osteoporosis prevention to analysis of online user behavior. My role has been to design the proper research settings, collect a large number of valid measurements, and figure out what it all means. Typically, I’m the first person on the project to learn about new findings while interpreting the results of the statistical analysis. Even if the findings are not newsworthy, that thrill of discovery is an awesome job perk!
I love statistics and analyzing data! I’ve been performing statistical analysis on the job for 20 years and helping people learn statistics for over ten years at a statistical software company. I love talking and writing about statistics.
My Approach to Teaching Statistics
I want to help you learn statistics. But I’m not talking about learning all the equations. Don’t get me wrong. Equations are necessary. Equations are the framework that makes the magic, but the truly fascinating aspects are what it all means. I want you to learn the true essence of statistics. I’ll help you intuitively understand statistics by focusing on concepts and graphs. That said, there might be a few equations!
I’ve spent over a decade working at a major statistical software company. When you work on research projects, you generally use a regular group of statistical analyses. However, when you work at a statistical software company, you need to know all the analyses in the software! I helped people use our software to gain insights and maximize the value of their own data regardless of their field.
While working at the statistical software company, I learned how to present statistics in a manner that makes statistics more intuitive. I’ll be writing about my experiences and useful information about statistics. However, I’ll focus on teaching the concepts in an intuitive way and deemphasize the formulas. After all, you use statistical software so you don’t have to worry about the formulas and instead focus on understanding the results.
Statistics is an Amazing Field!
Statistics is the field of learning from data. That’s amazing. It gets to the very essence of discovery. Statistics facilitates the creation of new knowledge. Bit by bit, we push back the frontier of what is known. That is what I want to teach you! The goal of my website is to help you to see statistics through my eyes―as a key that can unlock discoveries that are in your data.
The best thing about being a statistician is that you get to play in everyone’s backyard.—John Tukey
I enthusiastically agree! If you have an inquisitive mind, statistical knowledge, and data, the potential is boundless. You can play in a broad range of intriguing backyards!
That interface between a muddled reality and obtaining orderly, valid data is an exciting place. This place ties together the lofty goals of scientists to the nitty-gritty nature of the real world. It’s an interaction that I’ve written about extensively on my blog, and I plan to continue to do so. It’s where the rubber meets the road.
One of the coolest things about statistical analysis is that it provides you with a toolkit for exploring the unknown. Christopher Columbus needed many tools to navigate to the New World and make his discoveries. Statistics are the equivalent tools for the scientific explorer because they help you navigate the sea of data that you collect.
Why You Need to Understand Statistics
The world is becoming an increasingly data-driven place, and to draw trustworthy conclusions, you must analyze your data properly. It’s surprisingly easy to make a costly mistake. Even if you’re not performing your own studies, you’ll undoubtedly see statistical analyses conducted by others. Can you trust their results, or do they have their own agenda?
Just like there were many wrong ways for Columbus to use his tools, things can go awry with statistical analyses. I’m going to teach you how to use the tools correctly, to draw the proper conclusions, and to recognize the conclusions that should make you wary!
You’ll be increasingly thankful for these tools when you see a worksheet filled with numbers and you’re responsible for telling everyone what it all means.