Blog

New eBook Release! Regression Analysis: An Intuitive Guide

I’m thrilled to announce the release of my first book! Regression Analysis: An Intuitive Guide for Using and Interpreting Linear Models.

If you like the clear writing style I use on this website, you’ll love this book! The end of the post displays the entire table of contents! [Read more…] about New eBook Release! Regression Analysis: An Intuitive Guide

Using Confidence Intervals to Compare Means

By Jim Frost 66 Comments

To determine whether the difference between two means is statistically significant, analysts often compare the confidence intervals for those groups. If those intervals overlap, they conclude that the difference between groups is not statistically significant. If there is no overlap, the difference is significant.

While this visual method of assessing the overlap is easy to perform, regrettably it comes at the cost of reducing your ability to detect differences. Fortunately, there is a simple solution to this problem that allows you to perform a simple visual assessment and yet not diminish the power of your analysis.

In this post, I’ll start by showing you the problem in action and explain why it happens. Then, we’ll proceed to an easy alternative method that avoids this problem. [Read more…] about Using Confidence Intervals to Compare Means

Can High P-values Be Meaningful?

By Jim Frost 33 Comments

Can high p-values be helpful? What do high p-values mean?

Typically, when you perform a hypothesis test, you want to obtain low p-values that are statistically significant. Low p-values are sexy. They represent exciting findings and can help you get articles published.

However, you might be surprised to learn that higher p-values, the ones that are not statistically significant, are also valuable. In this post, I’ll show you the potential value of a p-value that is greater than 0.05, or whatever significance level you’re using. [Read more…] about Can High P-values Be Meaningful?

Using Histograms to Understand Your Data

By Jim Frost 25 Comments

Histograms are graphs that display the distribution of your continuous data. They are fantastic exploratory tools because they reveal properties about your sample data in ways that summary statistics cannot. For instance, while the mean and standard deviation can numerically summarize your data, histograms bring your sample data to life.

In this blog post, I’ll show you how histograms reveal the shape of the distribution, its central tendency, and the spread of values in your sample data. You’ll also learn how to identify outliers, how histograms relate to probability distribution functions, and why you might need to use hypothesis tests with them.
[Read more…] about Using Histograms to Understand Your Data

Using Post Hoc Tests with ANOVA

By Jim Frost 138 Comments

Post hoc tests are an integral part of ANOVA. When you use ANOVA to test the equality of at least three group means, statistically significant results indicate that not all of the group means are equal. However, ANOVA results do not identify which particular differences between pairs of means are significant. Use post hoc tests to explore differences between multiple group means while controlling the experiment-wise error rate.

In this post, I’ll show you what post hoc analyses are, the critical benefits they provide, and help you choose the correct one for your study. Additionally, I’ll show why failure to control the experiment-wise error rate will cause you to have severe doubts about your results. [Read more…] about Using Post Hoc Tests with ANOVA

When Can I Use One-Tailed Hypothesis Tests?

By Jim Frost 16 Comments

One-tailed hypothesis tests offer the promise of more statistical power compared to an equivalent two-tailed design. While there is some debate about when you can use a one-tailed test, the general consensus among statisticians is that you should use two-tailed tests unless you have concrete reasons for using a one-tailed test.

In this post, I discuss when you should and should not use one-tailed tests. I’ll cover the different schools of thought and offer my own opinion. [Read more…] about When Can I Use One-Tailed Hypothesis Tests?

One-Tailed and Two-Tailed Hypothesis Tests Explained

By Jim Frost 61 Comments

Choosing whether to perform a one-tailed or a two-tailed hypothesis test is one of the methodology decisions you might need to make for your statistical analysis. This choice can have critical implications for the types of effects it can detect, the statistical power of the test, and potential errors.

In this post, you’ll learn about the differences between one-tailed and two-tailed hypothesis tests and their advantages and disadvantages. I include examples of both types of statistical tests. In my next post, I cover the decision between one and two-tailed tests in more detail.
[Read more…] about One-Tailed and Two-Tailed Hypothesis Tests Explained

Central Limit Theorem Explained

By Jim Frost 107 Comments

The central limit theorem in statistics states that, given a sufficiently large sample size, the sampling distribution of the mean for a variable will approximate a normal distribution regardless of that variable’s distribution in the population.

Unpacking the meaning from that complex definition can be difficult. That’s the topic for this post! I’ll walk you through the various aspects of the central limit theorem (CLT) definition, and show you why it is vital in statistics. [Read more…] about Central Limit Theorem Explained

Introduction to Bootstrapping in Statistics with an Example

By Jim Frost 118 Comments

Bootstrapping is a statistical procedure that resamples a single dataset to create many simulated samples. This process allows you to calculate standard errors, construct confidence intervals, and perform hypothesis testing for numerous types of sample statistics. Bootstrap methods are alternative approaches to traditional hypothesis testing and are notable for being easier to understand and valid for more conditions.

In this blog post, I explain bootstrapping basics, compare bootstrapping to conventional statistical methods, and explain when it can be the better method. Additionally, I’ll work through an example using real data to create bootstrapped confidence intervals. [Read more…] about Introduction to Bootstrapping in Statistics with an Example

Confounding Variable: Definition & Examples

By Jim Frost 86 Comments

Confounding Variable Definition

In studies examining possible causal links, a confounding variable is an unaccounted factor that impacts both the potential cause and effect and can distort the results. Recognizing and addressing these variables in your experimental design is crucial for producing valid findings. Statisticians also refer to confounding variables that cause bias as confounders, omitted variables, and lurking variables. [Read more…] about Confounding Variable: Definition & Examples

Assessing Normality: Histograms vs. Normal Probability Plots

By Jim Frost 8 Comments

Because histograms display the shape and spread of distributions, you might think they’re the best type of graph for determining whether your data are normally distributed. However, I’ll show you how histograms can trick you! Normal probability plots are a better choice for this task and they are easy to use. Normal probability plots are also known as quantile-quantile plots, or Q-Q Plots for short!
[Read more…] about Assessing Normality: Histograms vs. Normal Probability Plots

Sample Statistics Are Always Wrong (to Some Extent)!

By Jim Frost 11 Comments

Here’s some shocking information for you—sample statistics are always wrong! When you use samples to estimate the properties of populations, you never obtain the correct values exactly. Don’t worry. I’ll help you navigate this issue using a simple statistical tool! [Read more…] about Sample Statistics Are Always Wrong (to Some Extent)!

Luck and Statistics: Do You Feel Lucky, Punk?

By Jim Frost 9 Comments

Clint Eastwood asking the punk if he was lucky. — Do you feel lucky, Punk?

Luck, statistics, and probabilities go together hand-in-hand. Clint Eastwood, playing Dirty Harry, famously asked a bad guy who was about to reach for his rifle whether he felt lucky. I’m quite sure that the crook carefully pondered the nature of luck, probabilities, and expected outcomes before deciding not to grab his rifle!

A while ago, I did something shocking . . . something that I hadn’t done for several decades. Just like the thief in the Dirty Harry movie, I started thinking about luck. Yes, you guessed it: I bought a lottery ticket for the record-breaking Mega Millions Jackpot. This purchase is shocking for someone like me who knows statistics and is fully aware of how unlikely it is to win. Did I feel lucky? Or was I just a punk? [Read more…] about Luck and Statistics: Do You Feel Lucky, Punk?

Populations, Parameters, and Samples in Inferential Statistics

By Jim Frost 25 Comments

Inferential statistics lets you draw conclusions about populations by using small samples. Consequently, inferential statistics provide enormous benefits because typically you can’t measure an entire population.

However, to gain these benefits, you must understand the relationship between populations, subpopulations, population parameters, samples, and sample statistics.

In this blog post, learn the differences between population vs. sample, parameter vs. statistic, and how to obtain representative samples using random sampling.

Types I & Type II Errors in Hypothesis Testing

By Jim Frost 11 Comments

In hypothesis testing, a Type I error is a false positive while a Type II error is a false negative. In this blog post, you will learn about these two types of errors, their causes, and how to manage them.

Hypothesis tests use sample data to make inferences about the properties of a population. You gain tremendous benefits by working with random samples because it is usually impossible to measure the entire population.

However, there are tradeoffs when you use samples. The samples we use are typically a minuscule percentage of the entire population. Consequently, they occasionally misrepresent the population severely enough to cause hypothesis tests to make Type I and Type II errors. [Read more…] about Types I & Type II Errors in Hypothesis Testing

Practical vs. Statistical Significance

By Jim Frost 27 Comments

You’ve just performed a hypothesis test and your results are statistically significant. Hurray! These results are important, right? Not so fast. Statistical significance does not necessarily mean that the results are practically significant in a real-world sense of importance.

In this blog post, I’ll talk about the differences between practical significance and statistical significance, and how to determine if your results are meaningful in the real world.
[Read more…] about Practical vs. Statistical Significance

The Gauss-Markov Theorem and BLUE OLS Coefficient Estimates

By Jim Frost 32 Comments

The Gauss-Markov theorem states that if your linear regression model satisfies the first six classical assumptions, then ordinary least squares (OLS) regression produces unbiased estimates that have the smallest variance of all possible linear estimators. [Read more…] about The Gauss-Markov Theorem and BLUE OLS Coefficient Estimates

7 Classical Assumptions of Ordinary Least Squares (OLS) Linear Regression

By Jim Frost 161 Comments

Ordinary Least Squares (OLS) is the most common estimation method for linear models—and that’s true for a good reason. As long as your model satisfies the OLS assumptions for linear regression, you can rest easy knowing that you’re getting the best possible estimates. [Read more…] about 7 Classical Assumptions of Ordinary Least Squares (OLS) Linear Regression

Normal Distribution in Statistics

By Jim Frost 184 Comments

The normal distribution, also known as the Gaussian distribution, is the most important probability distribution in statistics for independent, random variables. Most people recognize its familiar bell-shaped curve in statistical reports.

The normal distribution is a continuous probability distribution that is symmetrical around its mean, most of the observations cluster around the central peak, and the probabilities for values further away from the mean taper off equally in both directions. Extreme values in both tails of the distribution are similarly unlikely. While the normal distribution is symmetrical, not all symmetrical distributions are normal. For example, the Student’s t, Cauchy, and logistic distributions are symmetric.

As with any probability distribution, the normal distribution describes how the values of a random variable are distributed. It is the most important probability distribution in statistics because it accurately describes the distribution of values for many natural phenomena. Characteristics that are the sum of many independent processes frequently follow normal distributions. For example, heights, blood pressure, measurement error, and IQ scores follow the normal distribution.

In this blog post, learn how to use the normal distribution, about its parameters, the Empirical Rule, and how to calculate Z-scores to standardize your data and find probabilities.

Probability Distribution: Definition & Calculations

By Jim Frost 73 Comments

What is a Probability Distribution?

A probability distribution is a statistical function that describes the likelihood of obtaining all possible values that a random variable can take. In other words, the values of the variable vary based on the underlying probability distribution. Typically, analysts display probability distributions in graphs and tables. There are equations to calculate probability distributions.

Suppose you draw a random sample and measure the heights of the subjects. As you measure heights, you create a distribution of heights. This type of distribution is useful when you need to know which outcomes are most likely, the spread of potential values, and the likelihood of different results.

In this blog post, you’ll learn about probability distributions for both discrete and continuous variables. I’ll show you how they work and examples of how to use them. [Read more…] about Probability Distribution: Definition & Calculations