Median Definition and Uses
In statistics, the median is the value that splits an ordered list of data values in half. Half the values are below it and half are above—it’s right in the middle of the dataset. The median is the same as the second quartile or the 50th percentile. It is one of several measures of central tendency.
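A minimal Python sketch of finding the median; the income values below are invented, and the comparison with the mean simply shows how the median resists an extreme value:

```python
# Hypothetical incomes (in thousands); the single extreme value is deliberate.
from statistics import median

incomes = [32, 35, 41, 44, 47, 52, 58, 61, 240]

print("Ordered:", sorted(incomes))                      # the middle entry is the median
print("Median:", median(incomes))                       # 47: half below, half above
print("Mean:", round(sum(incomes) / len(incomes), 1))   # 67.8, pulled up by 240
```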
Independent and Dependent Variables: Differences & Examples
Independent variables and dependent variables are the two fundamental types of variables in statistical modeling and experimental designs. Analysts use them to understand the relationships between variables and to estimate effect sizes. What effect does one variable have on another?
In this post, learn the definitions of independent and dependent variables, how to identify each type, how their roles differ across types of studies, and see examples of them in use.
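As a rough illustration only (the study hours, test scores, and linear fit below are my own invented example, not from the post), here is how the two variable types typically map onto a simple model:

```python
# Hours studied is the independent variable (x); test score is the dependent variable (y).
import numpy as np

hours = np.array([1, 2, 3, 4, 5, 6])           # independent variable: the presumed input
scores = np.array([58, 63, 70, 74, 79, 85])    # dependent variable: the measured outcome

slope, intercept = np.polyfit(hours, scores, 1)
print(f"Each additional study hour is associated with about {slope:.1f} more points.")
```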
Standard Deviation: Interpretations and Calculations
The standard deviation (SD) is a single number that summarizes the variability in a dataset. It represents the typical distance between each data point and the mean. Smaller values indicate that the data points cluster closer to the mean—the values in the dataset are relatively consistent. Conversely, higher values signify that the values spread out further from the mean. Data values become more dissimilar, and extreme values become more likely.
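A short sketch contrasting two invented datasets that share the same mean but have very different standard deviations:

```python
from statistics import mean, stdev

consistent = [48, 49, 50, 51, 52]
spread_out = [30, 40, 50, 60, 70]

for name, data in [("consistent", consistent), ("spread out", spread_out)]:
    # stdev() uses the sample formula: sqrt(sum((x - mean)^2) / (n - 1))
    print(f"{name}: mean = {mean(data)}, SD = {stdev(data):.2f}")
```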
What is the Mean and How to Find It: Definition & Formula
What is the Mean?
The mean in math and statistics summarizes an entire dataset with a single number representing the data’s center point or typical value. It is also known as the arithmetic mean, and it is the most common measure of central tendency. It is frequently called the “average.”
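A minimal sketch of the arithmetic mean, using invented test scores:

```python
# The arithmetic mean: sum of the values divided by how many there are.
scores = [85, 90, 78, 92, 88]   # hypothetical test scores

average = sum(scores) / len(scores)
print(average)   # (85 + 90 + 78 + 92 + 88) / 5 = 86.6
```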
Gamma Distribution: Uses, Parameters & Examples
What is the Gamma Distribution?
The gamma distribution is a continuous probability distribution that models right-skewed data. Statisticians have used this distribution to model cancer rates, insurance claims, and rainfall. Additionally, the gamma distribution is similar to the exponential distribution, and you can use it to model the same types of phenomena: failure times, wait times, service times, etc.
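A small SciPy sketch, assuming arbitrary example values for the shape and scale parameters:

```python
from scipy import stats

shape, scale = 2.0, 3.0                  # often called k (or alpha) and theta; example values
gamma = stats.gamma(a=shape, scale=scale)

print("Mean:", gamma.mean())             # shape * scale = 6.0
print("P(X <= 10):", gamma.cdf(10))      # cumulative probability
print("Density at x = 4:", gamma.pdf(4)) # right-skewed density
```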
Exponential Distribution: Uses, Parameters & Examples
What is the Exponential Distribution?
The exponential distribution is a right-skewed continuous probability distribution that models variables in which small values occur more frequently than large values. It is a unimodal distribution where small values have relatively high probabilities, which consistently decline as data values increase. Statisticians use the exponential distribution to model the amount of change in people’s pockets, the length of phone calls, and sales totals for customers. In all these cases, small values are more likely than larger values.
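A brief SciPy sketch; the assumed mean call length of 5 minutes is an example value, not from the post:

```python
from scipy import stats

mean_call_length = 5                          # minutes (assumed for illustration)
calls = stats.expon(scale=mean_call_length)   # scale = 1 / rate

print("P(call <= 2 min):", calls.cdf(2))      # short calls are relatively likely
print("P(call > 15 min):", calls.sf(15))      # long calls are increasingly rare
```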
Weibull Distribution: Uses, Parameters & Examples
What is a Weibull Distribution?
The Weibull distribution is a continuous probability distribution that can fit an extensive range of distribution shapes. Like the normal distribution, the Weibull distribution is unimodal and describes probabilities associated with continuous data. However, unlike the normal distribution, it can also model skewed data. In fact, its extreme flexibility allows it to model both left- and right-skewed data.
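A quick SciPy sketch showing how changing the shape parameter reshapes the density; the shape and scale values are arbitrary examples:

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 3, 4)                    # a few evaluation points
for shape in (0.8, 1.5, 5.0):                 # shape < 1, > 1, and much larger
    dist = stats.weibull_min(c=shape, scale=1.0)
    # The skew of the density changes as the shape parameter changes.
    print(f"shape={shape}: density at {np.round(x, 2)} -> {np.round(dist.pdf(x), 3)}")
```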
Poisson Distribution: Definition & Uses
What is the Poisson Distribution?
The Poisson distribution is a discrete probability distribution that describes probabilities for counts of events that occur in a specified observation space. It is named after Siméon Denis Poisson.
In statistics, count data represent the number of events or characteristics over a given length of time, area, volume, etc. For example, you can count the number of cigarettes smoked per day, meteors seen per hour, defects in a batch, or occurrences of a particular crime by county.
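A short SciPy sketch, assuming an example rate of 4 defects per batch:

```python
from scipy import stats

defects = stats.poisson(mu=4)                 # average count per batch (assumed value)

for k in range(8):
    print(f"P(exactly {k} defects) = {defects.pmf(k):.3f}")

print("P(more than 6 defects) =", round(defects.sf(6), 3))
```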
Standard Error of the Mean (SEM)
The standard error of the mean (SEM) is a bit mysterious. You’ll frequently find it in your statistical output. Is it a measure of variability? How does the standard error of the mean compare to the standard deviation? How do you interpret it?
In this post, I answer all these questions about the standard error of the mean, show how it relates to sample size considerations and statistical significance, and explain the general concept of other types of standard errors. In fact, I view standard errors as the doorway from descriptive statistics to inferential statistics. You’ll see how that works!
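A minimal sketch of the calculation SEM = sample standard deviation / √n, using invented measurements:

```python
from math import sqrt
from statistics import stdev

sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]   # hypothetical measurements

sem = stdev(sample) / sqrt(len(sample))
print(f"SEM = {sem:.3f}")   # larger samples shrink the SEM, tightening the estimate of the mean
```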
Autocorrelation and Partial Autocorrelation in Time Series Data
Autocorrelation is the correlation between two observations at different points in a time series. For example, values that are separated by an interval might have a strong positive or negative correlation. When these correlations are present, they indicate that past values influence the current value. Analysts use the autocorrelation and partial autocorrelation functions to understand the properties of time series data, fit the appropriate models, and make forecasts.
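A small simulation sketch (the series below is invented, with each value partly driven by the previous one) that estimates the lag-1 autocorrelation:

```python
import numpy as np

rng = np.random.default_rng(42)
series = np.zeros(200)
for t in range(1, 200):
    series[t] = 0.7 * series[t - 1] + rng.normal()   # past values influence the present

# Correlate the series with itself shifted by one time step.
lag1 = np.corrcoef(series[:-1], series[1:])[0, 1]
print(f"Lag-1 autocorrelation: {lag1:.2f}")           # close to 0.7 for this process
```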
Using Combinations to Calculate Probabilities
Combinations in probability theory and other areas of mathematics refer to selections of outcomes where the order does not matter. For example, when you’re ordering a pizza, it doesn’t matter whether you order it with ham, mushrooms, and olives or olives, mushrooms, and ham. You’re getting the same pizza!
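A tiny counting sketch using Python’s math.comb; the number of available toppings is an assumed example:

```python
from math import comb

toppings = 8      # assumed number of available toppings
chosen = 3

# Order does not matter, so count "n choose k" combinations.
print(comb(toppings, chosen))   # 56 distinct three-topping pizzas
```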
Law of Large Numbers
What is the Law of Large Numbers in Statistics?
The Law of Large Numbers is a cornerstone concept in statistics and probability theory. This law asserts that as the number of trials or samples increases, the average of the observed outcomes tends to converge to the expected value.
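A simulation sketch of the law in action: the average of fair die rolls gets closer to the expected value of 3.5 as the number of rolls grows. The seed and sample sizes are arbitrary:

```python
import random

random.seed(1)
for n in (10, 100, 1_000, 10_000, 100_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(f"n = {n:>6}: average roll = {sum(rolls) / n:.3f}")
```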
Using Permutations to Calculate Probabilities
Permutations in probability theory and other branches of mathematics refer to sequences of outcomes where the order matters. For example, 9-6-8-4 is a permutation of a four-digit PIN because the order of numbers is crucial. When calculating probabilities, it’s frequently necessary to calculate the number of possible permutations to determine an event’s probability.
In this post, I explain permutations and show how to calculate the number of permutations both with repetition and without repetition. Finally, we’ll work through a step-by-step example problem that uses permutations to calculate a probability.
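A counting sketch with Python’s math.perm, reusing the PIN scenario; the without-repetition case assumes the four digits must all be distinct:

```python
from math import perm

# Without repetition: a 4-digit PIN using 4 distinct digits drawn from 0-9.
print(perm(10, 4))      # 10 * 9 * 8 * 7 = 5040 ordered arrangements

# With repetition allowed, each of the 4 positions has 10 choices.
print(10 ** 4)          # 10000

# Probability of guessing one specific PIN (digits may repeat) on the first try.
print(1 / 10 ** 4)      # 0.0001
```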
Spearman’s Correlation Explained
Spearman’s correlation in statistics is a nonparametric alternative to Pearson’s correlation. Use Spearman’s correlation for data that follow curvilinear, monotonic relationships and for ordinal data. Statisticians also refer to Spearman’s rank order correlation coefficient as Spearman’s ρ (rho).
In this post, I’ll cover what all that means so you know when and why you should use Spearman’s correlation instead of the more common Pearson’s correlation.
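A small SciPy sketch on an invented monotonic but curved relationship, showing why the two coefficients can differ:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]           # monotonic but nonlinear

print("Pearson r:   ", round(stats.pearsonr(x, y)[0], 3))   # below 1 because the line bends
print("Spearman rho:", round(stats.spearmanr(x, y)[0], 3))  # exactly 1: the ranks agree perfectly
```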
Effect Sizes in Statistics
Effect sizes in statistics quantify the differences between group means and the relationships between variables. While analysts often focus on statistical significance using p-values, effect sizes determine the practical importance of the findings.
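As one concrete example, here is a sketch of Cohen’s d, a common effect size for the difference between two group means; the group scores are invented:

```python
from math import sqrt
from statistics import mean, stdev

control   = [72, 75, 70, 74, 73, 71, 76]
treatment = [78, 80, 77, 82, 79, 81, 76]

# Pooled standard deviation across the two groups.
n1, n2 = len(control), len(treatment)
pooled_sd = sqrt(((n1 - 1) * stdev(control) ** 2 + (n2 - 1) * stdev(treatment) ** 2)
                 / (n1 + n2 - 2))

# Cohen's d: the mean difference expressed in standard deviation units.
d = (mean(treatment) - mean(control)) / pooled_sd
print(f"Cohen's d = {d:.2f}")
```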
Proxy Variables: The Good Twin of Confounding Variables
Proxy variables are easily measurable variables that analysts include in a model in place of a variable that cannot be measured or is difficult to measure. A proxy variable may not be of any great interest itself, but it has a close correlation with the variable of interest.
Multiplication Rule for Calculating Probabilities
The multiplication rule in probability allows you to calculate the joint probability of multiple events occurring together using known probabilities of those events individually. There are two forms of this rule: the specific and the general multiplication rules.
In this post, learn about when and how to use both the specific and general multiplication rules. Additionally, I’ll use and explain the standard notation for probabilities throughout, helping you learn how to interpret it. We’ll work through several example problems so you can see them in action. There’s even a bonus problem at the end!
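A sketch of both forms using a standard 52-card deck; the two-hearts scenario is my own example, not from the post:

```python
# Specific rule (independent events): P(A and B) = P(A) * P(B)
# e.g., drawing a heart, replacing it, then drawing a heart again.
p_two_hearts_with_replacement = (13 / 52) * (13 / 52)

# General rule (dependent events): P(A and B) = P(A) * P(B | A)
# e.g., drawing two hearts in a row without replacement.
p_two_hearts_without_replacement = (13 / 52) * (12 / 51)

print(round(p_two_hearts_with_replacement, 4))     # 0.0625
print(round(p_two_hearts_without_replacement, 4))  # 0.0588
```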
Using Contingency Tables to Calculate Probabilities
Contingency tables are a great way to classify outcomes and calculate different types of probabilities. These tables contain rows and columns that display bivariate frequencies of categorical data. Analysts also refer to contingency tables as crosstabulations (cross tabs), two-way tables, and frequency tables.
Statisticians use contingency tables for a variety of reasons. I love these tables because they both organize your data and allow you to answer a diverse set of questions. In this post, I focus on using them to calculate different types of probabilities. These probabilities include joint, marginal, and conditional probabilities.
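A small sketch computing joint, marginal, and conditional probabilities from an invented 2×2 table of counts:

```python
import numpy as np

#                  owns pet   no pet
counts = np.array([[40,        20],     # female
                   [30,        10]])    # male
total = counts.sum()                     # 100 people

joint = counts / total                           # joint probabilities
marginal_rows = joint.sum(axis=1)                # P(female), P(male)
conditional = counts[0, 0] / counts[0].sum()     # P(owns pet | female)

print("P(female and owns pet):", joint[0, 0])        # 0.4
print("P(female):", marginal_rows[0])                # 0.6
print("P(owns pet | female):", round(conditional, 3))  # 0.667
```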
Probability Definition and Fundamentals
What is Probability?
Probability is the likelihood of an event happening. Probability theory analyzes the chances of events occurring. You can think of probabilities as being the following:
- The long-term proportion of times an event occurs during a random process.
- The propensity for a particular outcome to occur.
Common terms for describing probabilities include likelihood, chances, and odds.
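A simulation sketch of the long-run-proportion view: estimate the probability that two dice sum to 7 and compare it with the exact value of 6/36. The scenario and seed are my own example:

```python
import random

random.seed(7)
trials = 100_000
hits = sum(1 for _ in range(trials)
           if random.randint(1, 6) + random.randint(1, 6) == 7)

print("Simulated proportion:", hits / trials)   # settles near the exact value
print("Exact probability:   ", 6 / 36)          # about 0.1667
```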
Variance Inflation Factors (VIFs)
Variance Inflation Factors (VIFs) measure the correlation among independent variables in least squares regression models. Statisticians refer to this type of correlation as multicollinearity. Excessive multicollinearity can cause problems for regression models.
In this post, I focus on VIFs and how they detect multicollinearity, why they’re better than pairwise correlations, how to calculate VIFs yourself, and interpreting VIFs. If you need a refresher about the types of problems that multicollinearity causes and how to fix them, read my post: Multicollinearity: Problems, Detection, and Solutions.
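A sketch of the calculation VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the other predictors; the data below are simulated and deliberately correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.5, size=100)   # strongly correlated with x1
x3 = rng.normal(size=100)                          # unrelated predictor

def vif(target, others):
    # R^2 from an ordinary least squares fit of `target` on the other predictors plus an intercept.
    X = np.column_stack([np.ones(len(target))] + others)
    fitted = X @ np.linalg.lstsq(X, target, rcond=None)[0]
    r_squared = 1 - np.sum((target - fitted) ** 2) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r_squared)

print("VIF for x1:", round(vif(x1, [x2, x3]), 2))   # well above 1 because of the x1-x2 correlation
print("VIF for x3:", round(vif(x3, [x1, x2]), 2))   # close to 1
```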