
Statistics By Jim

Making statistics intuitive


Tag: graphs

Using Histograms to Understand Your Data

By Jim Frost 7 Comments

Histogram that displays a multimodal distribution.

Histograms are graphs that display the distribution of your continuous data. They are fantastic exploratory tools because they reveal properties of your sample data in ways that summary statistics cannot. For instance, while the mean and standard deviation can numerically summarize your data, histograms bring your sample data to life.

In this blog post, I’ll show you how histograms reveal the shape of the distribution, its central tendency, and the spread of values in your sample data. You’ll also learn how to identify outliers, how histograms relate to probability distribution functions, and why you might need to use hypothesis tests with them.
[Read more…] about Using Histograms to Understand Your Data
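To see the idea in miniature, a histogram is just binned counts. Even a crude text version (using hypothetical data here) can expose a second cluster that the mean alone would hide:

```python
from collections import Counter

# Hypothetical sample: two clusters, suggesting a multimodal distribution
data = [2.1, 2.4, 2.5, 2.6, 3.0, 7.8, 8.1, 8.3, 8.5, 8.9]

bin_width = 1.0
# Assign each value to the lower edge of its bin
counts = Counter(int(x // bin_width) * bin_width for x in data)

# Print a simple text histogram; two separate peaks appear
for edge in sorted(counts):
    print(f"{edge:4.1f}-{edge + bin_width:4.1f} | {'#' * counts[edge]}")
```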

Filed Under: Basics Tagged With: choosing analysis, data types, graphs

Boxplots vs. Individual Value Plots: Graphing Continuous Data by Groups

By Jim Frost 6 Comments

Example of a boxplot that displays scores by teaching method.

Graphing your data before performing statistical analysis is a crucial step. Graphs bring your data to life in a way that statistical measures do not because they display the relationships and patterns. In this blog post, you’ll learn about using boxplots and individual value plots to compare distributions of continuous measurements between groups. You’ll also learn why you need to pair these plots with hypothesis tests when you want to make inferences about a population. [Read more…] about Boxplots vs. Individual Value Plots: Graphing Continuous Data by Groups

Filed Under: Basics Tagged With: choosing analysis, data types, graphs

Using Post Hoc Tests with ANOVA

By Jim Frost 8 Comments

Graph that displays the simultaneous confidence intervals produced by Tukey's post hoc test.

Post hoc tests are an integral part of ANOVA. When you use ANOVA to test the equality of at least three group means, statistically significant results indicate that not all of the group means are equal. However, ANOVA results do not identify which particular differences between pairs of means are significant. Use post hoc tests to explore differences between multiple group means while controlling the experiment-wise error rate.

In this post, I’ll show you what post hoc analyses are, the critical benefits they provide, and help you choose the correct one for your study. Additionally, I’ll explain why failing to control the experiment-wise error rate can cast severe doubt on your results. [Read more…] about Using Post Hoc Tests with ANOVA

Filed Under: ANOVA Tagged With: analysis example, choosing analysis, conceptual, graphs, interpreting results

Central Limit Theorem Explained

By Jim Frost 20 Comments

The central limit theorem produces approximately normal sampling distributions in this histogram.

The central limit theorem in statistics states that, given a sufficiently large sample size, the sampling distribution of the mean for a variable will approximate a normal distribution regardless of that variable’s distribution in the population.

Unpacking the meaning from that complex definition can be difficult. That’s the topic for this post! I’ll walk you through the various aspects of the central limit theorem (CLT) definition, and show you why it is so important in the field of statistics. [Read more…] about Central Limit Theorem Explained
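A quick simulation makes the theorem concrete. Here is a minimal sketch: draw samples from a heavily skewed exponential population and watch the means of those samples cluster around the population mean with the spread the CLT predicts:

```python
import random
import statistics

random.seed(42)

# The exponential distribution with rate 1 is strongly skewed,
# with population mean 1.0 and standard deviation 1.0
sample_size = 30

# Compute many sample means; by the CLT their distribution is
# approximately normal even though the population is skewed
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
    for _ in range(2000)
]

# Center is near the population mean (1.0); spread is near sigma/sqrt(n)
print(statistics.mean(sample_means))
print(statistics.stdev(sample_means))  # theory predicts about 0.18
```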

Filed Under: Basics Tagged With: assumptions, conceptual, distributions, graphs

Introduction to Bootstrapping in Statistics with an Example

By Jim Frost 22 Comments

Photograph of a cowboy boot with straps.

Bootstrapping is a statistical procedure that resamples a single dataset to create many simulated samples. This process allows you to calculate standard errors, construct confidence intervals, and perform hypothesis testing for numerous types of sample statistics. Bootstrap methods are alternative approaches to traditional hypothesis testing and are notable for being easier to understand and valid for more conditions.

In this blog post, I explain bootstrapping basics, compare bootstrapping to conventional statistical methods, and explain when it can be the better method. Additionally, I’ll work through an example using real data to create bootstrapped confidence intervals. [Read more…] about Introduction to Bootstrapping in Statistics with an Example
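The core resampling loop is short enough to sketch directly. This is a percentile bootstrap for the mean with hypothetical data; the function name and constants are mine, not from the post:

```python
import random
import statistics

random.seed(1)

# Hypothetical sample of 25 measurements
sample = [random.gauss(100, 15) for _ in range(25)]

def bootstrap_ci(data, stat=statistics.mean, reps=5000, alpha=0.05):
    """Percentile bootstrap: resample with replacement, collect the
    statistic, and take the middle (1 - alpha) span of the results."""
    stats = sorted(
        stat(random.choices(data, k=len(data))) for _ in range(reps)
    )
    lo = stats[int(reps * alpha / 2)]
    hi = stats[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

lo, hi = bootstrap_ci(sample)
print(f"95% bootstrap CI for the mean: ({lo:.1f}, {hi:.1f})")
```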

Filed Under: Hypothesis Testing Tagged With: analysis example, assumptions, choosing analysis, conceptual, distributions, graphs, interpreting results

Assessing Normality: Histograms vs. Normal Probability Plots

By Jim Frost 1 Comment

Normal probability plot that displays data that are normally distributed.

Because histograms display the shape and spread of distributions, you might think they’re the best type of graph for determining whether your data are normally distributed. However, I’ll show you how histograms can trick you! Normal probability plots are a better choice for this task and they are easy to use.
[Read more…] about Assessing Normality: Histograms vs. Normal Probability Plots

Filed Under: Basics Tagged With: distributions, graphs

Normal Distribution in Statistics

By Jim Frost 51 Comments

Graph that displays a normal distribution with areas divided by standard deviations.

The normal distribution is the most important probability distribution in statistics because it fits many natural phenomena. For example, heights, blood pressure, measurement error, and IQ scores follow the normal distribution. It is also known as the Gaussian distribution and the bell curve.

The normal distribution is a probability function that describes how the values of a variable are distributed. It is a symmetric distribution where most of the observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions. Extreme values in both tails of the distribution are similarly unlikely.

In this blog post, you’ll learn how to use the normal distribution, its parameters, and how to calculate Z-scores to standardize your data and find probabilities. [Read more…] about Normal Distribution in Statistics
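As a small taste of the standardization step, Python's standard library includes `statistics.NormalDist`. Using the textbook convention that IQ scores follow a normal distribution with mean 100 and standard deviation 15:

```python
from statistics import NormalDist

# Textbook convention: IQ ~ Normal(mean=100, sd=15)
iq = NormalDist(mu=100, sigma=15)

score = 130
z = (score - iq.mean) / iq.stdev   # z-score: distance from the mean in SDs
p_below = iq.cdf(score)           # probability of observing a score below 130

print(f"z = {z:.1f}")                # z = 2.0
print(f"P(X < 130) = {p_below:.4f}")  # about 0.9772
```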

Filed Under: Basics Tagged With: conceptual, distributions, graphs, probability

Understanding Probability Distributions

By Jim Frost 18 Comments

Probability distribution plot that displays the distribution of body fat values for teenage girls.

A probability distribution is a function that describes the likelihood of obtaining the possible values that a random variable can assume. In other words, the values of the variable vary based on the underlying probability distribution.

Suppose you draw a random sample and measure the heights of the subjects. As you accumulate measurements, a distribution of heights emerges. This type of distribution is useful when you need to know which outcomes are most likely, the spread of potential values, and the likelihood of different results.

In this blog post, you’ll learn about probability distributions for both discrete and continuous variables. I’ll show you how they work and examples of how to use them. [Read more…] about Understanding Probability Distributions

Filed Under: Basics Tagged With: conceptual, data types, distributions, graphs, interpreting results, probability

Understanding Correlation in Statistics

By Jim Frost 22 Comments

This scatterplot displays a positive correlation between height and weight.

A correlation between variables indicates that as one variable changes in value, the other variable tends to change in a specific direction.  Understanding that relationship is useful because we can use the value of one variable to predict the value of the other variable. For example, height and weight are correlated—as height increases, weight also tends to increase. Consequently, if we observe an individual who is unusually tall, we can predict that his weight is also above the average. [Read more…] about Understanding Correlation in Statistics
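The strength and direction of such a relationship is usually summarized with Pearson's correlation coefficient, which is just the covariance scaled by both standard deviations. A minimal sketch with hypothetical height/weight pairs:

```python
import statistics

# Hypothetical paired data: heights (cm) and weights (kg)
heights = [160, 165, 170, 175, 180, 185]
weights = [55, 60, 66, 70, 76, 82]

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the
    standard deviations, yielding a value between -1 and +1."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")  # strongly positive: taller tends to mean heavier
```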

Filed Under: Basics Tagged With: conceptual, graphs, interpreting results

Estimating a Good Sample Size for Your Study Using Power Analysis

By Jim Frost 15 Comments

Power curve graph for the 2-sample t-test.

Determining a good sample size for a study is always an important issue. After all, using the wrong sample size can doom your study from the start. Fortunately, power analysis can find the answer for you. Power analysis combines statistical analysis, subject-area knowledge, and your requirements to help you derive the optimal sample size for your study.

Statistical power in a hypothesis test is the probability that the test will detect an effect that actually exists. As you’ll see in this post, both under-powered and over-powered studies are problematic. Let’s learn how to find a good sample size for your study! [Read more…] about Estimating a Good Sample Size for Your Study Using Power Analysis
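For a rough sense of how the pieces combine, here is the standard normal-approximation formula for a two-group comparison. This is a sketch of the textbook calculation, not the exact procedure any particular software uses (t-based answers run slightly larger):

```python
import math
from statistics import NormalDist

def sample_size_two_groups(effect_size, alpha=0.05, power=0.80):
    """Per-group n for a two-sample comparison of means (normal
    approximation). effect_size is Cohen's d: the difference in SD units."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)

# Medium effect (d = 0.5) at the conventional alpha with 80% power
print(sample_size_two_groups(0.5))  # 63 per group
```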

Filed Under: Hypothesis Testing Tagged With: analysis example, conceptual, graphs, interpreting results

Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation

By Jim Frost 28 Comments

Graph that shows two distributions with more and less variability.

A measure of variability is a summary statistic that represents the amount of dispersion in a dataset. How spread out are the values? While a measure of central tendency describes the typical value, measures of variability define how far away the data points tend to fall from the center. We talk about variability in the context of a distribution of values. A low dispersion indicates that the data points tend to be clustered tightly around the center. High dispersion signifies that they tend to fall further away.

In statistics, variability, dispersion, and spread are synonyms that denote the width of the distribution. Just as there are multiple measures of central tendency, there are several measures of variability. In this blog post, you’ll learn why understanding the variability of your data is critical. Then, I explore the most common measures of variability—the range, interquartile range, variance, and standard deviation. I’ll help you determine which one is best for your data. [Read more…] about Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation
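All four measures are available in Python's standard library. A quick sketch with a hypothetical dataset chosen so the answers come out cleanly:

```python
import statistics

# Hypothetical dataset
data = [4, 7, 8, 5, 9, 6, 10, 5, 7, 9]

value_range = max(data) - min(data)       # range: max minus min
q = statistics.quantiles(data, n=4)       # quartiles [Q1, Q2, Q3]
iqr = q[2] - q[0]                         # interquartile range: middle 50%
variance = statistics.variance(data)      # sample variance (n - 1 divisor)
std_dev = statistics.stdev(data)          # sample standard deviation

print(value_range, iqr, variance, std_dev)  # 6 4.0 4.0 2.0
```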

Filed Under: Basics Tagged With: conceptual, distributions, graphs

Measures of Central Tendency: Mean, Median, and Mode

By Jim Frost 24 Comments

A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures indicate where most values in a distribution fall and are also referred to as the central location of a distribution. You can think of it as the tendency of data to cluster around a middle value. In statistics, the three most common measures of central tendency are the mean, median, and mode. Each of these measures calculates the location of the central point using a different method.

Choosing the best measure of central tendency depends on the type of data you have. In this post, I explore these measures of central tendency, show you how to calculate them, and how to determine which one is best for your data. [Read more…] about Measures of Central Tendency: Mean, Median, and Mode
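The classic illustration of why the choice matters is skewed data. With a hypothetical salary list containing one large outlier, the three measures disagree in an instructive way:

```python
import statistics

# Hypothetical right-skewed data: salaries in $1000s, with one outlier
salaries = [35, 38, 40, 42, 45, 45, 48, 50, 55, 120]

print(statistics.mean(salaries))    # 51.8 -- pulled upward by the outlier
print(statistics.median(salaries))  # 45.0 -- resistant to the outlier
print(statistics.mode(salaries))    # 45   -- the most frequent value
```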

Filed Under: Basics Tagged With: conceptual, distributions, graphs

Guide to Data Types and How to Graph Them in Statistics

By Jim Frost 10 Comments

Pie chart that displays categorical data of new car colors.

In the field of statistics, data are vital. Data are the information that you collect to learn, draw conclusions, and test hypotheses. After all, statistics is the science of learning from data. However, there are different types of variables, and they record various kinds of information. Crucially, the type of information determines what you can learn from it, and, importantly, what you cannot learn from it. Consequently, it’s essential that you understand the different types of data. [Read more…] about Guide to Data Types and How to Graph Them in Statistics

Filed Under: Basics Tagged With: data types, graphs

Maximize the Value of Your Binary Data with the Binomial and Other Probability Distributions

By Jim Frost 3 Comments

Photo of a coin toss to represent binary data.

Binary data occur when you can place an observation into only two categories. They tell you that an event occurred or that an item has a particular characteristic. For instance, an inspection process produces binary pass/fail results. Or, when a customer enters a store, there are two possible outcomes—sale or no sale. In this post, I show you how to use the binomial, geometric, negative binomial, and the hypergeometric distributions to glean more information from your binary data. [Read more…] about Maximize the Value of Your Binary Data with the Binomial and Other Probability Distributions
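The binomial probability mass function is compact enough to write from scratch. Here is a sketch applied to the hypothetical pass/fail inspection scenario, assuming 10 independent items that each pass with probability 0.9:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with
    success probability p): C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical inspection: 10 items, each passes with probability 0.9
p_all_pass = binomial_pmf(10, 10, 0.9)
p_at_least_9 = sum(binomial_pmf(k, 10, 0.9) for k in (9, 10))

print(round(p_all_pass, 3))    # 0.9**10, about 0.349
print(round(p_at_least_9, 3))
```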

Filed Under: Basics Tagged With: distributions, graphs, probability

Understanding Interaction Effects in Statistics

By Jim Frost 205 Comments

Interactions plot for the taste test ANOVA design.


Interaction effects occur when the effect of one variable depends on the value of another variable. Interaction effects are common in regression analysis, ANOVA, and designed experiments. In this blog post, I explain interaction effects, how to interpret them in statistical designs, and the problems you will face if you don’t include them in your model. [Read more…] about Understanding Interaction Effects in Statistics

Filed Under: Regression Tagged With: analysis example, conceptual, graphs, interpreting results

Using Log-Log Plots to Determine Whether Size Matters

By Jim Frost Leave a Comment

Brian Cox showing a log-log plot.

Log-log plots display data in two dimensions where both axes use logarithmic scales. When one variable changes as a constant power of another, a log-log graph shows the relationship as a straight line. In this post, I’ll show you why these graphs are valuable and how to interpret them. [Read more…] about Using Log-Log Plots to Determine Whether Size Matters
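The straight-line property follows directly from taking logarithms: if y = a·x^b, then log y = log a + b·log x, so the exponent b is the slope on log scales. A minimal numeric sketch with a hypothetical power law:

```python
import math

# Hypothetical power-law relationship: y = 3 * x**2
xs = [1, 2, 4, 8, 16]
ys = [3 * x**2 for x in xs]

# On log scales the relationship is linear: log y = log 3 + 2 * log x
log_points = [(math.log(x), math.log(y)) for x, y in zip(xs, ys)]

# The slope between any two points recovers the exponent (2)
(x0, y0), (x1, y1) = log_points[0], log_points[-1]
slope = (y1 - y0) / (x1 - x0)
print(round(slope, 6))  # 2.0
```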

Filed Under: Regression Tagged With: analysis example, graphs, interpreting results

Flu Shots, How Effective Are They?

By Jim Frost

Diagram of the influenza virus.

With the arrival of fall in the Northern Hemisphere, it’s flu season again.

Do you debate getting a flu shot every year? I do get flu shots every year. I realize that they’re not perfect, but I figure they’re a low-cost way to reduce my chances of a crummy week suffering from the flu.

The media report that flu shots have an effectiveness of approximately 68%. But, what does that mean exactly? What is the absolute reduction in risk? Are there long-term benefits?

In this blog post, I explore the effectiveness of flu shots from a statistical viewpoint. We’ll statistically analyze the data ourselves so we can go beyond the simplified accounts that the media presents. I’ll also model the long-term outcomes you can expect with regular flu vaccinations. By the time you finish this post, you’ll have a crystal clear picture of flu shot effectiveness. Some of the results surprised me! [Read more…] about Flu Shots, How Effective Are They?
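The gap between relative and absolute risk reduction is simple arithmetic. Here is a hedged sketch using the ~68% effectiveness figure cited above and a purely hypothetical 10% seasonal attack rate (the real baseline varies by season and population):

```python
# Relative vs. absolute risk reduction for a vaccine.
effectiveness = 0.68   # relative risk reduction reported by the media
baseline_risk = 0.10   # HYPOTHETICAL chance of catching the flu unvaccinated

vaccinated_risk = baseline_risk * (1 - effectiveness)
absolute_reduction = baseline_risk - vaccinated_risk

print(f"Risk if vaccinated:      {vaccinated_risk:.3f}")    # 0.032
print(f"Absolute risk reduction: {absolute_reduction:.3f}")  # 0.068
```

A 68% relative reduction sounds large, but with a 10% baseline it translates to avoiding the flu in roughly 7 extra seasons per 100 people vaccinated.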

Filed Under: Hypothesis Testing Tagged With: analysis example, distributions, graphs, interpreting results

Use Control Charts with Hypothesis Tests

By Jim Frost 7 Comments

Xbar-S control charts that show that the impact variability is in control but the impact means are not in control.

Typically, quality improvement analysts use control charts to assess business processes and don’t have hypothesis tests in mind. But did you know that control charts provide tremendous benefits in other settings, including hypothesis testing? Spoilers—control charts check an assumption that we often forget about for hypothesis tests! [Read more…] about Use Control Charts with Hypothesis Tests

Filed Under: Hypothesis Testing Tagged With: assumptions, graphs, quality improvement

Understand Precision in Predictive Analytics to Avoid Costly Mistakes

By Jim Frost Leave a Comment

A picture of a very worried man who just made a bad decision by misunderstanding precision.

Precision in predictive analytics refers to how close the model’s predictions are to the observed values. The more precise the model, the closer the data points are to the predictions. When you have an imprecise model, the observations tend to be further away from the predictions, thereby reducing the usefulness of the predictions. If you have a model that is not sufficiently precise, you risk making costly mistakes! [Read more…] about Understand Precision in Predictive Analytics to Avoid Costly Mistakes

Filed Under: Regression Tagged With: analysis example, conceptual, graphs, interpreting results

Heteroscedasticity in Regression Analysis

By Jim Frost 19 Comments

Residuals by fitted values plot that displays heteroscedasticity.

Heteroscedasticity means unequal scatter. In regression analysis, we talk about heteroscedasticity in the context of the residuals or error term. Specifically, heteroscedasticity is a systematic change in the spread of the residuals over the range of measured values. Heteroscedasticity is a problem because ordinary least squares (OLS) regression assumes that all residuals are drawn from a population that has a constant variance (homoscedasticity).

To satisfy the regression assumptions and be able to trust the results, the residuals should have a constant variance. In this blog post, I show you how to identify heteroscedasticity, explain what produces it, the problems it causes, and work through an example to show you several solutions. [Read more…] about Heteroscedasticity in Regression Analysis

Filed Under: Regression Tagged With: assumptions, conceptual, graphs


Copyright © 2019 · Jim Frost