A z-score measures the distance between a data point and the mean using standard deviations. Z-scores can be positive or negative. The sign tells you whether the observation is above or below the mean. For example, a z-score of +2 indicates that the data point falls two standard deviations above the mean, while a -2 signifies it is two standard deviations below the mean. A z-score of zero indicates that the data point equals the mean. Statisticians also refer to z-scores as standard scores, and I’ll use those terms interchangeably. [Read more…] about Z-score: Definition, Formula, and Uses
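Here is a minimal sketch of that calculation: subtract the mean from each value and divide by the standard deviation. The function name `z_scores` and the sample data are made up for illustration, and the sketch assumes you want the population standard deviation (`pstdev`) rather than the sample version.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the z-score (standard score) of each value in the list."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(x - mu) / sigma for x in values]

# Made-up data with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
scores = z_scores(data)

for x, z in zip(data, scores):
    print(f"value={x}  z={z:+.2f}")
```

Values below the mean get negative z-scores, values above it get positive ones, and a value equal to the mean gets zero.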
The law of large numbers states that as the number of trials increases, sample values tend to converge on the expected result. The two forms of this law lay the foundation for both statistics and probability theory.
In this post, I explain both forms of the law, simulate them in action, and explain why they’re crucial for statistics and probability! [Read more…] about Law of Large Numbers
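As a rough illustration of the idea, the sketch below rolls a fair six-sided die many times and tracks the running average, which should drift toward the expected value of 3.5. The die example, the function name `running_mean_of_die_rolls`, and the fixed seed are all assumptions chosen for this sketch, not necessarily the simulation used in the post.

```python
import random

def running_mean_of_die_rolls(n_rolls, seed=0):
    """Roll a fair die n_rolls times and return the running mean after each roll."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # randint includes both endpoints
        means.append(total / i)
    return means

means = running_mean_of_die_rolls(100_000)

# Compare the running mean at a few sample sizes with the expected value 3.5.
for n in (10, 1_000, 100_000):
    print(f"after {n:>7} rolls: mean = {means[n - 1]:.3f}")
```

Small samples can land well away from 3.5, while the mean after many rolls should sit very close to it.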
Chebyshev’s Theorem estimates the minimum proportion of observations that fall within a specified number of standard deviations from the mean. This theorem applies to a broad range of probability distributions. Chebyshev’s Theorem is also known as Chebyshev’s Inequality. [Read more…] about Chebyshev’s Theorem in Statistics
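The bound itself is short enough to sketch in code. Chebyshev's inequality says that at least 1 - 1/k² of the observations fall within k standard deviations of the mean, for any k greater than 1. The function name `chebyshev_min_proportion` is just an illustrative choice.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the bound is zero or negative, so it tells you nothing.
        raise ValueError("k must be greater than 1 for a useful bound")
    return 1 - 1 / k**2

for k in (1.5, 2, 3, 4):
    print(f"within {k} standard deviations: at least {chebyshev_min_proportion(k):.1%}")
```

Because the theorem makes so few assumptions about the distribution, these minimums are conservative. For a specific distribution such as the normal, the actual proportions are typically higher.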
In my post about how to interpret p-values, I emphasize that p-values are not an error rate. The number one misinterpretation of p-values is that they are the probability of the null hypothesis being correct.
The correct interpretation is that p-values indicate the probability of observing your sample data, or more extreme, when you assume the null hypothesis is true. If you don’t solidly grasp that correct interpretation, please take a moment to read that post first.
Hopefully, that’s clear.
Unfortunately, one part of that blog post confuses some readers. In that post, I explain how p-values are not a probability, or error rate, of a hypothesis. I then show how that misinterpretation is dangerous because it overstates the evidence against the null hypothesis. [Read more…] about P-Values, Error Rates, and False Positives
The Birthday Problem in statistics asks, how many people do you need in a group to have a 50% chance that at least two people will share a birthday? Go ahead and think about that for a moment. The answer surprises many people. We’ll get to that shortly.
In this post, I’ll not only answer the birthday paradox, but I’ll also show you how to calculate the probabilities for any size group, run a computer simulation of it, and explain why the answer to the Birthday Problem is so surprising. [Read more…] about Answering the Birthday Problem in Statistics
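For a sense of how the calculation can work, here is a minimal sketch. It computes the probability that everyone in the group has a different birthday and subtracts that from one. It assumes 365 equally likely birthdays and ignores leap years. The function names are hypothetical.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes every day is equally likely and ignores leap years.
    """
    p_all_distinct = 1.0
    for i in range(n):
        # Person i must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

def smallest_group_for(target=0.5, days=365):
    """Smallest group size whose shared-birthday probability reaches the target."""
    n = 1
    while prob_shared_birthday(n, days) < target:
        n += 1
    return n

print("Smallest group for a 50% chance:", smallest_group_for(0.5))
for n in (10, 20, 30, 50):
    print(f"{n} people: {prob_shared_birthday(n):.1%}")
```

The probability climbs quickly because each new person is compared against everyone already in the group, so the number of pairs grows much faster than the group size.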
Luck, statistics, and probabilities go together hand-in-hand. Clint Eastwood, playing Dirty Harry, famously asked a bad guy who was about to reach for his rifle whether he felt lucky. I’m quite sure that the crook carefully pondered the nature of luck, probabilities, and expected outcomes before deciding not to grab his rifle!
A while ago, I did something shocking . . . something that I hadn’t done for several decades. Just like the thief in the Dirty Harry movie, I started thinking about luck. Yes, you guessed it: I bought a lottery ticket for the record-breaking Mega Millions Jackpot. This purchase is shocking for someone like me who knows statistics and is fully aware of how unlikely it is to win. Did I feel lucky? Or was I just a punk? [Read more…] about Luck and Statistics: Do You Feel Lucky, Punk?
The normal distribution, also known as the Gaussian distribution, is the most important probability distribution in statistics for independent, random variables. Most people recognize its familiar bell-shaped curve in statistical reports.
The normal distribution is a continuous probability distribution that is symmetrical around its mean. Most of the observations cluster around the central peak, and the probabilities for values further away from the mean taper off equally in both directions. Extreme values in both tails of the distribution are similarly unlikely. While the normal distribution is symmetrical, not all symmetrical distributions are normal. For example, the Student’s t, Cauchy, and logistic distributions are symmetric.
As with any probability distribution, the normal distribution describes how the values of a variable are distributed. It is the most important probability distribution in statistics because it accurately describes the distribution of values for many natural phenomena. Characteristics that are the sum of many independent processes frequently follow normal distributions. For example, heights, blood pressure, measurement error, and IQ scores tend to follow the normal distribution.
In this blog post, learn how to use the normal distribution, about its parameters, the Empirical Rule, and how to calculate Z-scores to standardize your data and find probabilities. [Read more…] about Normal Distribution in Statistics
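As a quick sketch of those ideas, the code below uses Python's built-in `statistics.NormalDist` to check the Empirical Rule proportions and to turn a value into a z-score and a probability. The mean of 100 and standard deviation of 15 are illustrative assumptions, and the helper name `proportion_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Empirical Rule: roughly 68%, 95%, and 99.7% within 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")

# Standardize a value from a hypothetical normal variable (mean 100, SD 15).
mu, sigma = 100, 15
x = 130
z = (x - mu) / sigma
p_below = standard_normal.cdf(z)  # probability of a value below x
print(f"x={x}  z={z:.2f}  P(X < x)={p_below:.4f}")
```

Standardizing first means a single standard normal distribution can answer probability questions for any mean and standard deviation.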
What is a Probability Distribution?
A probability distribution is a statistical function that describes the likelihood of obtaining all possible values that a random variable can take. In other words, the values of the variable vary based on the underlying probability distribution. Typically, analysts display probability distributions in graphs and tables. There are equations to calculate probability distributions.
Suppose you draw a random sample and measure the heights of the subjects. As you measure heights, you create a distribution of heights. This type of distribution is useful when you need to know which outcomes are most likely, the spread of potential values, and the likelihood of different results.
In this blog post, you’ll learn about probability distributions for both discrete and continuous variables. I’ll show you how they work and examples of how to use them. [Read more…] about Probability Distribution: Definition & Calculations
Binary data occur when you can place an observation into only two categories. It tells you that an event occurred or that an item has a particular characteristic. For instance, an inspection process produces binary pass/fail results. Or, when a customer enters a store, there are two possible outcomes—sale or no sale. In this post, I show you how to use the binomial, geometric, negative binomial, and the hypergeometric probability distributions to glean more information from your binary data. [Read more…] about Maximize the Value of Your Binary Data with the Binomial and Other Probability Distributions
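To make the binomial case concrete, here is a minimal sketch that computes the probability of a given number of events in a fixed number of independent trials. The inspection scenario (20 items, 5% failure rate) and the function names are hypothetical, chosen only to echo the pass/fail example above.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent trials with event probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_at_least(k, n, p):
    """Probability of k or more events in n trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

# Hypothetical inspection: 20 items, each failing independently with probability 0.05.
n_items, p_fail = 20, 0.05
print(f"P(no failures)         = {binomial_pmf(0, n_items, p_fail):.4f}")
print(f"P(exactly one failure) = {binomial_pmf(1, n_items, p_fail):.4f}")
print(f"P(two or more)         = {binomial_at_least(2, n_items, p_fail):.4f}")
```

The sketch assumes a constant event probability and independent trials, which are the conditions the binomial distribution requires.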
T-tests are statistical hypothesis tests that you use to analyze one or two sample means. Depending on the t-test that you use, you can compare a sample mean to a hypothesized value, the means of two independent samples, or the difference between paired samples. In this post, I show you how t-tests use t-values and t-distributions to calculate probabilities and test hypotheses.
As usual, I’ll provide clear explanations of t-values and t-distributions using concepts and graphs rather than formulas! If you need a primer on the basics, read my hypothesis testing overview. [Read more…] about How t-Tests Work: t-Values, t-Distributions, and Probabilities
Analysis of variance (ANOVA) uses F-tests to statistically assess the equality of means when you have three or more groups. In this post, I’ll answer several common questions about the F-test.
- How do F-tests work?
- Why do we analyze variances to test means?
I’ll use concepts and graphs to answer these questions about F-tests in the context of a one-way ANOVA example. I’ll use the same approach that I use to explain how t-tests work. If you need a primer on the basics, read my hypothesis testing overview.
Happy Saint Patrick’s Day! This holiday got me thinking about four-leaf clovers and probability theory. Now, I know that four-leaf clovers are not shamrocks. And, it is shamrocks that are actually associated with St. Patrick’s Day. A shamrock is a young patch of three-leaf white clover that grows in winter. Nonetheless, the holiday started me thinking about four-leaf clovers and probabilities. [Read more…] about How Probability Theory Can Help You Find More Four-Leaf Clovers
Who would’ve thought that an old TV game show could inspire a statistical problem that has tripped up mathematicians and statisticians with Ph.D.s? The Monty Hall problem has confused people for decades. In the game show Let’s Make a Deal, Monty Hall asks you to guess which closed door a prize is behind. The answer is so puzzling that people often refuse to accept it! The problem occurs because our statistical assumptions are incorrect.
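If you would rather test the puzzle than argue about it, a simulation is one option. The sketch below assumes the standard formulation of the problem: three doors, one prize, the host always opens an unchosen door that has no prize, and you then either stay with your door or switch to the other closed one. The function names and the fixed seed are illustrative.

```python
import random

def play(switch, rng):
    """Play one round of the standard three-door game; return True on a win."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability for always switching or always staying."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials

print(f"stay:   {win_rate(switch=False):.3f}")
print(f"switch: {win_rate(switch=True):.3f}")
```

Under these assumptions, staying wins about one time in three and switching wins about two times in three, which is the result many people find hard to accept.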