assumptions

Omitted Variable Bias: Definition, Avoiding & Example

What is Omitted Variable Bias?

Omitted variable bias (OVB) occurs when a regression model excludes a relevant variable, meaning one that correlates with both the dependent variable and at least one independent variable in the model. Leaving out such a variable can skew the estimated relationships between the variables that remain in the model, potentially leading to erroneous interpretations. This bias can exaggerate, mask, or entirely flip the direction of the estimated relationship between an independent and dependent variable.
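To make the bias concrete, here is a minimal simulation sketch. It assumes a hypothetical true model y = 1 + 2·x1 + 3·x2 + noise, where x1 is correlated with x2. Fitting y on x1 alone (omitting x2) pushes the x1 slope away from 2. The helper names `centered_sum`, `slope_short`, and `slopes_full` are illustrative, not from any library.

```python
import random

def centered_sum(a, b):
    """Sum of cross-products of deviations from the means (S_ab)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

def slope_short(y, x1):
    """OLS slope of y on x1 alone: the model that omits x2."""
    return centered_sum(x1, y) / centered_sum(x1, x1)

def slopes_full(y, x1, x2):
    """OLS slopes of y on x1 and x2 from the 2x2 normal equations."""
    s11, s22, s12 = centered_sum(x1, x1), centered_sum(x2, x2), centered_sum(x1, x2)
    s1y, s2y = centered_sum(x1, y), centered_sum(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(1)
n = 5000
x2 = [rng.gauss(0, 1) for _ in range(n)]              # the relevant variable we will omit
x1 = [0.7 * v + rng.gauss(0, 1) for v in x2]          # x1 is correlated with x2
y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]  # true slopes: 2 and 3

b1_full, b2_full = slopes_full(y, x1, x2)
b1_short = slope_short(y, x1)
# In-sample, the short slope equals b1_full + b2_full * S12 / S11: the bias is
# x2's effect on y multiplied by how strongly x2 moves with x1.
print(f"full model: {b1_full:.2f}, x2 omitted: {b1_short:.2f}")
```

With these assumed settings, theory puts the short-model slope near 2 + 3 × 0.7 / 1.49 ≈ 3.41, which exaggerates the true effect of 2. Flipping the sign of x2's coefficient would instead mask the effect.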

One Way ANOVA Overview & Example

By Jim Frost

What is One Way ANOVA?

Use one way ANOVA to compare the means of three or more groups. This analysis is an inferential hypothesis test that uses samples to draw conclusions about populations. Specifically, it tells you whether your sample provides sufficient evidence to conclude that the groups’ population means are not all equal. ANOVA stands for analysis of variance.
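The "analysis of variance" name comes from splitting variation into between-group and within-group parts. A minimal sketch of the F statistic follows; `one_way_anova_f` is a hypothetical helper, and in practice `scipy.stats.f_oneway` returns the F statistic along with its p-value.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    means = [sum(g) / len(g) for g in groups]
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group variation: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: spread of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = len(groups) - 1, n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

print(one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10]))  # (12.0, 2, 6)
```

A large F means the group means are spread out relative to the noise inside each group. You then compare F to an F distribution with (df_between, df_within) degrees of freedom.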

One Sample T Test: Definition, Using & Example

By Jim Frost

What is a One Sample T Test?

Use a one sample t test to evaluate a population mean using a single sample. Usually, you conduct this hypothesis test to determine whether a population mean differs from a hypothesized value you specify. The hypothesized value can be theoretically important in the study area, a reference value, or a target.
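The test statistic divides the gap between the sample mean and the hypothesized value by the standard error of the mean. The sketch below uses a hypothetical helper name (`one_sample_t`) and computes only t and its degrees of freedom. For a p-value, `scipy.stats.ttest_1samp` performs the full test.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)); returns (t, degrees of freedom)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # stdev uses the n - 1 denominator
    return (statistics.mean(sample) - mu0) / se, n - 1

t, df = one_sample_t([10, 12, 14], mu0=10)
```

Here the sample mean is 12 and s is 2, so t = 2 / (2 / √3) = √3 ≈ 1.73 with 2 degrees of freedom.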

T Test Overview: How to Use & Examples

By Jim Frost

What is a T Test?

A t test is a statistical hypothesis test that assesses sample means to draw conclusions about population means. Frequently, analysts use a t test to determine whether the population means for two groups are different. For example, it can determine whether the difference between the treatment and control group means is statistically significant.

Wilcoxon Signed Rank Test Explained

By Jim Frost

What is the Wilcoxon Signed Rank Test?

The Wilcoxon signed rank test is a nonparametric hypothesis test that can do the following:

Evaluate the median difference between two paired samples.
Compare a 1-sample median to a reference value.
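Both uses reduce to the same statistic. The sketch below assumes the paired case, with differences computed as after minus before and zero differences dropped (the usual convention). For the 1-sample case, subtract the reference value from each observation instead. `average_ranks` and `signed_rank_w_plus` are hypothetical helper names. `scipy.stats.wilcoxon` runs the complete test, including p-values.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_w_plus(before, after):
    """W+ = sum of the ranks of |d| over positive differences d = after - before."""
    diffs = [a - b for a, b in zip(after, before) if a != b]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    return sum(r for r, d in zip(ranks, diffs) if d > 0)

w_plus = signed_rank_w_plus([10, 12, 9, 15, 11], [12, 11, 13, 15, 14])
```

In this hypothetical data, the nonzero differences are 2, -1, 4, and 3. The positive ones hold ranks 2, 4, and 3, so W+ = 9 out of a possible 10. That lopsided split is what the test evaluates.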

Kruskal Wallis Test Explained

By Jim Frost

What is the Kruskal Wallis Test?

The Kruskal Wallis test is a nonparametric hypothesis test that compares three or more independent groups. Statisticians also refer to it as one-way ANOVA on ranks. This analysis extends the Mann Whitney U test, a nonparametric test that can compare only two groups.
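"ANOVA on ranks" is fairly literal: pool all observations, rank them, and check whether the groups' rank sums differ more than chance allows. A minimal sketch of the H statistic follows. It omits the tie correction and uses hypothetical helper names. `scipy.stats.kruskal` applies the tie correction and returns a p-value.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), without the tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])   # this group's share of the ranks
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

For these fully separated groups, H = 7.2. Analysts typically compare H to a chi-square distribution with k − 1 degrees of freedom.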

Mann Whitney U Test Explained

By Jim Frost

What is the Mann Whitney U Test?

The Mann Whitney U test is a nonparametric hypothesis test that compares two independent groups. Statisticians also refer to it as the Wilcoxon rank sum test. The Kruskal Wallis test extends this analysis so that it can compare more than two groups.
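One intuitive way to compute U is to count how often an observation from one group beats an observation from the other, with ties counting as one half. The sketch below does exactly that. `mann_whitney_u` is a hypothetical helper. `scipy.stats.mannwhitneyu` adds the p-value.

```python
def mann_whitney_u(x, y):
    """U for sample x: number of (x, y) pairs where x is larger, ties counting 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1
            elif a == b:
                u += 0.5
    return u

u_x = mann_whitney_u([3, 5, 7], [1, 2, 5])   # 7.5
u_y = mann_whitney_u([1, 2, 5], [3, 5, 7])   # 1.5
```

The two U values always sum to n_x × n_y (here 9). The pairwise loop costs O(n_x × n_y). A rank-sum formula is faster for large samples, but the pair count shows what U measures.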

Trimmed Mean: Definition, Calculating & Benefits

By Jim Frost

What is a Trimmed Mean?

The trimmed mean is a statistical measure that calculates a dataset’s average after removing a certain percentage of extreme values from both ends of the distribution. Because it excludes these extreme values, which can include outliers, this statistic can better represent a dataset’s typical or central values when outliers are present. Usually, you’ll trim a percentage of values, such as 10% or 20%.
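A minimal sketch, assuming the percentage applies to each end. This matches how `scipy.stats.trim_mean(a, proportiontocut)` interprets its argument. `trimmed_mean` is a hypothetical helper name.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping int(proportion * n) values from EACH end of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    cut = int(proportion * len(data))          # values removed from each tail
    kept = data[cut:len(data) - cut]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sum(data) / len(data), trimmed_mean(data, 0.10))   # 14.5 5.5
```

The single extreme value (100) drags the ordinary mean to 14.5. The 10% trimmed mean drops one value from each end and returns 5.5.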

ANCOVA: Uses, Assumptions & Example

By Jim Frost

What is ANCOVA?

ANCOVA, or the analysis of covariance, is a powerful statistical method that analyzes the differences between two or more group means while controlling for the effects of at least one continuous covariate.
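For a single covariate, the group test can be sketched as a comparison of two nested regressions. The full model gives each group its own intercept but a common slope. The reduced model uses the covariate alone. This is a hand-rolled sketch under the equal-slopes assumption, and `ancova_f` is a hypothetical helper. Statistical packages, such as statsmodels' formula `ols` with `anova_lm`, handle the general case.

```python
def ancova_f(groups):
    """groups: list of (x_values, y_values). Returns (F, df1, df2) for the group effect."""
    def s(a, b):  # sum of cross-products of deviations from the means
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        return sum((u - ma) * (v - mb) for u, v in zip(a, b))

    # Within-group sums: deviations from each group's own means (separate intercepts)
    sxx_w = sum(s(x, x) for x, _ in groups)
    sxy_w = sum(s(x, y) for x, y in groups)
    syy_w = sum(s(y, y) for _, y in groups)
    all_x = [v for x, _ in groups for v in x]
    all_y = [v for _, y in groups for v in y]
    sxx_t, sxy_t, syy_t = s(all_x, all_x), s(all_x, all_y), s(all_y, all_y)

    sse_full = syy_w - sxy_w ** 2 / sxx_w      # group intercepts + common covariate slope
    sse_reduced = syy_t - sxy_t ** 2 / sxx_t   # one intercept + covariate slope
    k, n = len(groups), len(all_x)
    df1, df2 = k - 1, n - k - 1
    return ((sse_reduced - sse_full) / df1) / (sse_full / df2), df1, df2

# Hypothetical data: y = 2x + group offset (0, 5, 10) + small alternating noise
groups = [
    ([1, 2, 3, 4], [2.1, 3.9, 6.1, 7.9]),
    ([2, 3, 4, 5], [9.1, 10.9, 13.1, 14.9]),
    ([3, 4, 5, 6], [16.1, 17.9, 20.1, 21.9]),
]
f, df1, df2 = ancova_f(groups)
```

Because the groups also differ in their covariate values, comparing raw group means would mix the covariate's effect with the group effect. The nested-model F isolates the group differences after adjusting for x.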

Z Test: Uses, Formula & Examples

By Jim Frost

What is a Z Test?

Use a Z test when you need to compare group means and you know the population standard deviation. Use the 1-sample analysis to determine whether a population mean is different from a hypothesized value. Or use the 2-sample version to determine whether two population means differ.
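A minimal sketch of the 1-sample version follows. The known σ replaces the sample standard deviation, so the standard normal distribution supplies the p-value. `one_sample_z` and the sample values are hypothetical. `statistics.NormalDist` is in the Python standard library (3.8+).

```python
import math
from statistics import NormalDist, mean

def one_sample_z(sample, mu0, sigma):
    """1-sample Z test assuming the population standard deviation sigma is known."""
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

sample = [97, 109] * 18                     # 36 hypothetical observations, mean 103
z, p = one_sample_z(sample, mu0=100, sigma=12)
```

With n = 36 and σ = 12, the standard error is 2, so z = 1.5 and the two-sided p-value is about 0.134.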

Paired T Test: Definition & When to Use It

By Jim Frost

What is a Paired T Test?

Use a paired t test when each subject has a pair of measurements, such as a before and after score. A paired t test determines whether the mean change for these pairs is significantly different from zero. This test is an inferential statistics procedure because it uses samples to draw conclusions about populations.

Paired t tests are also known as paired samples t tests or dependent samples t tests. These names reflect the fact that the two samples are paired, or dependent, because they contain the same subjects. Conversely, an independent samples t test uses different subjects in the two samples.
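Because the test focuses on the change within each pair, it amounts to a one sample t test on the differences against zero. A minimal sketch with a hypothetical helper (`paired_t`) and made-up scores follows. `scipy.stats.ttest_rel` performs the full test with a p-value.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: a one sample t test of the differences against zero."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

t, df = paired_t(before=[10, 12, 9, 11], after=[12, 15, 10, 13])
```

The differences are 2, 3, 1, and 2 (mean 2, s = √(2/3)), giving t ≈ 4.90 with 3 degrees of freedom.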

Independent Samples T Test: Definition, Using & Interpreting

By Jim Frost

What is an Independent Samples T Test?

Use an independent samples t test when you want to compare the means of precisely two groups, no more and no less! Typically, you perform this test to determine whether two population means are different. This procedure is an inferential statistical hypothesis test, meaning it uses samples to draw conclusions about populations. The independent samples t test is also known as the two sample t test.
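The sketch below uses Welch's version of the test. This variant does not assume the two groups have equal variances, so it differs from the classic pooled-variance two sample t test. `welch_t` is a hypothetical helper. In SciPy, `scipy.stats.ttest_ind(x, y, equal_var=False)` gives Welch's test, and the default `equal_var=True` gives the pooled version.

```python
import math
import statistics

def welch_t(x, y):
    """Welch's two sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = statistics.variance(x) / len(x)   # squared standard error of mean(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t, df = welch_t([5, 7, 9], [1, 2, 3, 4, 5])
```

When the groups have equal sizes and equal sample variances, Welch's t and degrees of freedom match the pooled test. For example, [1, 2, 3] versus [4, 5, 6] gives df = 4 either way.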

Variance Inflation Factors (VIFs)

By Jim Frost

Variance Inflation Factors (VIFs) measure the correlation among independent variables in least squares regression models and how much that correlation inflates the variance of each coefficient estimate. Statisticians refer to this type of correlation as multicollinearity. Excessive multicollinearity can cause problems for regression models.

In this post, I focus on VIFs: how they detect multicollinearity, why they’re better than pairwise correlations, how to calculate VIFs yourself, and how to interpret them. If you need a refresher about the types of problems that multicollinearity causes and how to fix them, read my post: Multicollinearity: Problems, Detection, and Solutions.
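For predictor j, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing that predictor on all the other predictors. An equivalent shortcut, used in the sketch below, reads each VIF off the diagonal of the inverse of the predictors' correlation matrix. The helper names are hypothetical. In statsmodels, `variance_inflation_factor` in `statsmodels.stats.outliers_influence` does this job.

```python
def correlation_matrix(columns):
    """Pearson correlations between every pair of predictor columns."""
    def corr(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
        saa = sum((u - ma) ** 2 for u in a)
        sbb = sum((v - mb) ** 2 for v in b)
        return sab / (saa * sbb) ** 0.5
    return [[corr(a, b) for b in columns] for a in columns]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fails on perfect collinearity)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(columns):
    """VIF_j equals the j-th diagonal entry of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]        # correlation with x1 is 0.8
print(vifs([x1, x2]))      # both near 1 / (1 - 0.8**2), about 2.78
```

With only two predictors, R_j² is simply the squared pairwise correlation. With three or more predictors, a VIF can be high even when no single pairwise correlation looks alarming, which is why VIFs are more informative than pairwise correlations alone.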

Independent and Identically Distributed Data (IID)

By Jim Frost

Having independent and identically distributed (IID) data is a common assumption for statistical procedures and hypothesis tests. But what does that mouthful of words actually mean? That’s the topic of this post! And, I’ll provide helpful tips for determining whether your data are IID.

Guidelines for Removing and Handling Outliers in Data

By Jim Frost

Outliers are unusual values in your dataset, and they can distort statistical analyses and violate their assumptions. Unfortunately, all analysts will confront outliers and be forced to make decisions about what to do with them. Given the problems they can cause, you might think that it’s best to remove them from your data. But that’s not always the case. Removing outliers is legitimate only for specific reasons.
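Before deciding anything, you first have to find candidates. The sketch below uses Tukey's fences (values beyond 1.5 × IQR from the quartiles), a common convention that is assumed here rather than prescribed by this post. It only *flags* values for review. Whether to remove a flagged value still depends on the specific reasons above. `iqr_flags` is a hypothetical helper name.

```python
import statistics

def iqr_flags(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] so you can investigate them."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

print(iqr_flags([10, 12, 11, 13, 12, 11, 10, 95]))   # [95]
```

A flagged value might be a data entry error worth correcting or a legitimate extreme observation that belongs in the analysis. The code cannot tell which.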

When Can I Use One-Tailed Hypothesis Tests?

By Jim Frost

One-tailed hypothesis tests offer the promise of more statistical power compared to an equivalent two-tailed design. While there is some debate about when you can use a one-tailed test, the general consensus among statisticians is that you should use two-tailed tests unless you have concrete reasons for using a one-tailed test.

In this post, I discuss when you should and should not use one-tailed tests. I’ll cover the different schools of thought and offer my own opinion.
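The extra power comes from placing the entire significance level in one tail. The sketch below assumes a z statistic, and `z_p_values` is a hypothetical helper. It shows that the one-tailed p-value is half the two-tailed one when the effect points in the predicted direction.

```python
from statistics import NormalDist

def z_p_values(z):
    """Return (two-tailed p, upper one-tailed p) for a standard normal test statistic."""
    two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    upper_tailed = 1 - NormalDist().cdf(z)     # tests only for an increase
    return two_tailed, upper_tailed

two, one = z_p_values(1.8)     # two is about 0.072, one is about 0.036
```

At α = 0.05, z = 1.8 is significant one-tailed but not two-tailed. The price is that an effect in the opposite direction goes undetected. For example, z = −1.8 gives an upper one-tailed p-value near 0.96.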

Central Limit Theorem Explained

By Jim Frost

The central limit theorem in statistics states that, given a sufficiently large sample size, the sampling distribution of the mean for a variable will approximate a normal distribution regardless of that variable’s distribution in the population, provided that distribution has a finite variance.

Unpacking the meaning from that complex definition can be difficult. That’s the topic for this post! I’ll walk you through the various aspects of the central limit theorem (CLT) definition, and show you why it is vital in statistics.
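A quick simulation can show the theorem at work. This sketch assumes an exponential population with mean 1 and standard deviation 1, which is strongly right-skewed (skewness 2). It draws many samples of size 50 and summarizes their means. The population, sample size, and `skewness` helper are all illustrative choices.

```python
import random
import statistics

def skewness(xs):
    """Population skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean([((x - m) / sd) ** 3 for x in xs])

rng = random.Random(0)
n, reps = 50, 4000
raw = [rng.expovariate(1.0) for _ in range(reps)]     # individual draws from the population
means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

print(f"skewness of raw draws: {skewness(raw):.2f}")
print(f"skewness of sample means (n={n}): {skewness(means):.2f}")
print(f"mean of means: {statistics.fmean(means):.3f}, SD of means: {statistics.stdev(means):.3f}")
```

Theory predicts that the means center near 1.0 with a standard deviation near 1/√50 ≈ 0.141, and that their skewness shrinks to about 2/√50 ≈ 0.28. The sampling distribution is much closer to normal than the population it came from.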

Introduction to Bootstrapping in Statistics with an Example

By Jim Frost

Bootstrapping is a statistical procedure that resamples a single dataset with replacement to create many simulated samples. This process allows you to calculate standard errors, construct confidence intervals, and perform hypothesis testing for numerous types of sample statistics. Bootstrap methods are alternative approaches to traditional hypothesis testing and are notable for being easier to understand and valid for more conditions.

In this blog post, I explain bootstrapping basics, compare bootstrapping to conventional statistical methods, and explain when it can be the better method. Additionally, I’ll work through an example using real data to create bootstrapped confidence intervals.
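The core loop is short: resample with replacement, recompute the statistic, and read the interval from the percentiles of the results. Below is a minimal percentile-bootstrap sketch. The function name, the 5,000 resamples, and the data values are hypothetical, not the real data from the post. Other bootstrap interval methods, such as BCa, refine this approach.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.fmean, reps=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any statistic of one sample."""
    rng = random.Random(seed)
    # Each simulated sample has the original size and is drawn WITH replacement
    boot = sorted(stat(rng.choices(data, k=len(data))) for _ in range(reps))
    lower = boot[int((alpha / 2) * reps)]
    upper = boot[int((1 - alpha / 2) * reps) - 1]
    return lower, upper

data = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]
print(bootstrap_ci(data))                           # CI for the mean
print(bootstrap_ci(data, stat=statistics.median))   # same procedure, different statistic
```

Swapping in a different `stat` function is all it takes to bootstrap a median or another statistic without a textbook standard error formula. This flexibility is a large part of bootstrapping's appeal.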

Confounding Variable: Definition & Examples

By Jim Frost

Confounding Variable Definition

In studies examining possible causal links, a confounding variable is an unaccounted-for factor that affects both the potential cause and the effect and can distort the results. Recognizing and addressing these variables in your experimental design is crucial for producing valid findings. Statisticians also refer to confounding variables that cause bias as confounders, omitted variables, and lurking variables.

The Gauss-Markov Theorem and BLUE OLS Coefficient Estimates

By Jim Frost

The Gauss-Markov theorem states that if your linear regression model satisfies the first six classical assumptions, then ordinary least squares (OLS) regression produces unbiased estimates that have the smallest variance of all linear unbiased estimators.