A correlation between variables indicates that as one variable changes in value, the other variable tends to change in a specific direction. Understanding that relationship is useful because we can use the value of one variable to predict the value of the other variable. For example, height and weight are correlated—as height increases, weight also tends to increase. Consequently, if we observe an individual who is unusually tall, we can predict that his weight is also above the average. [Read more…] about Interpreting Correlation Coefficients

# conceptual

## Estimating a Good Sample Size for Your Study Using Power Analysis

Determining a good sample size for a study is always an important issue. After all, using the wrong sample size can doom your study from the start. Fortunately, power analysis can find the answer for you. Power analysis combines statistical analysis, subject-area knowledge, and your requirements to help you derive the optimal sample size for your study.

Statistical power in a hypothesis test is the probability that the test will detect an effect that actually exists. As you’ll see in this post, both under-powered and over-powered studies are problematic. Let’s learn how to find a good sample size for your study! [Read more…] about Estimating a Good Sample Size for Your Study Using Power Analysis

## Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation

A measure of variability is a summary statistic that represents the amount of dispersion in a dataset. How spread out are the values? While a measure of central tendency describes the typical value, measures of variability define how far away the data points tend to fall from the center. We talk about variability in the context of a distribution of values. A low dispersion indicates that the data points tend to be clustered tightly around the center. High dispersion signifies that they tend to fall further away.

In statistics, variability, dispersion, and spread are synonyms that denote the width of the distribution. Just as there are multiple measures of central tendency, there are several measures of variability. In this blog post, you’ll learn why understanding the variability of your data is critical. Then, I explore the most common measures of variability—the range, interquartile range, variance, and standard deviation. I’ll help you determine which one is best for your data. [Read more…] about Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation

## Measures of Central Tendency: Mean, Median, and Mode

A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures indicate where most values in a distribution fall and are also referred to as the central location of a distribution. You can think of it as the tendency of data to cluster around a middle value. In statistics, the three most common measures of central tendency are the mean, median, and mode. Each of these measures calculates the location of the central point using a different method.

Choosing the best measure of central tendency depends on the type of data you have. In this post, I explore these measures of central tendency, show you how to calculate them, and how to determine which one is best for your data. [Read more…] about Measures of Central Tendency: Mean, Median, and Mode

## Difference between Descriptive and Inferential Statistics

Descriptive and inferential statistics are two broad categories in the field of statistics. In this blog post, I show you how both types of statistics are important for different purposes. Interestingly, some of the statistical measures are similar, but the goals and methodologies are very different. [Read more…] about Difference between Descriptive and Inferential Statistics

## Learn How Anecdotal Evidence Can Trick You!

Anecdotal evidence is a story told by individuals. It comes in many forms that can range from product testimonials to word of mouth. It’s often testimony, or a short account, about the truth or effectiveness of a claim. Typically, anecdotal evidence focuses on individual results, is driven by emotion, and presented by individuals who are not subject area experts. [Read more…] about Learn How Anecdotal Evidence Can Trick You!

## The Importance of Statistics

The field of statistics is the science of learning from data. Statistical knowledge helps you use the proper methods to collect the data, employ the correct analyses, and effectively present the results. Statistics is a crucial process behind how we make discoveries in science, make decisions based on data, and make predictions. Statistics allows you to understand a subject much more deeply. [Read more…] about The Importance of Statistics

## Statistical Hypothesis Testing Overview

In this blog post, I explain why you need to use statistical hypothesis testing and help you navigate the essential terminology. Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables. [Read more…] about Statistical Hypothesis Testing Overview

## Understanding Interaction Effects in Statistics

Interaction effects occur when the effect of one variable depends on the value of another variable. Interaction effects are common in regression analysis, ANOVA, and designed experiments. In this blog post, I explain interaction effects, how to interpret them in statistical designs, and the problems you will face if you don’t include them in your model. [Read more…] about Understanding Interaction Effects in Statistics

## When Should I Use Regression Analysis?

Use regression analysis to describe the relationships between a set of independent variables and the dependent variable. Regression analysis produces a regression equation where the coefficients represent the relationship between each independent variable and the dependent variable. You can also use the equation to make predictions.

As a statistician, I should probably tell you that I love all statistical analyses equally—like parents with their kids. But, shhh, I have secret! Regression analysis is my favorite because it provides tremendous flexibility, which makes it useful in so many different circumstances. In fact, I’ve described regression analysis as taking correlation to the next level!

In this blog post, I explain the capabilities of regression analysis, the types of relationships it can assess, how it controls the variables, and generally why I love it! You’ll learn when you should consider using regression analysis. [Read more…] about When Should I Use Regression Analysis?

## Degrees of Freedom in Statistics

In statistics, the degrees of freedom (DF) indicate the number of independent values that can vary in an analysis without breaking any constraints. It is an important idea that appears in many contexts throughout statistics including hypothesis tests, probability distributions, and regression analysis. Learn how this fundamental concept affects the power and precision of your statistical analysis!

In this blog post, I bring this concept to life in an intuitive manner. I’ll start by defining degrees of freedom. However, I’ll quickly move on to practical examples in a variety of contexts because they make this concept easier to understand. [Read more…] about Degrees of Freedom in Statistics

## Why Are There No P Values in Nonlinear Regression?

Nonlinear regression analysis cannot calculate P values for the independent variables in your model. Why not? And, what do you use instead? Those are the topics of this blog post. [Read more…] about Why Are There No P Values in Nonlinear Regression?

## Five Regression Analysis Tips to Avoid Common Problems

Regression is a very powerful statistical analysis. It allows you to isolate and understand the effects of individual variables, model curvature and interactions, and make predictions. Regression analysis offers high flexibility but presents a variety of potential pitfalls. Great power requires great responsibility!

In this post, I offer five tips that will not only help you avoid common problems but also make the modeling process easier. I’ll close by showing you the difference between the modeling process that a top analyst uses versus the procedure of a less rigorous analyst. [Read more…] about Five Regression Analysis Tips to Avoid Common Problems

## What is the Relationship Between the Reproducibility of Experimental Results and P Values?

The ability to reproduce experimental results should be related to P values. After all, both of these statistical concepts have similar foundations.

- P values help you separate the signal of population level effects from the noise in sample data.
- Reproducible results support the notion that the findings can be generalized to the population rather than applying only to a specific sample.

So, P values are related to reproducibility in theory. But, does this relationship exist in the real world? In this blog post, I present the findings of an exciting study that answers this question! [Read more…] about What is the Relationship Between the Reproducibility of Experimental Results and P Values?

## Understand Precision in Predictive Analytics to Avoid Costly Mistakes

Precision in predictive analytics refers to how close the model’s predictions are to the observed values. The more precise the model, the closer the data points are to the predictions. When you have an imprecise model, the observations tend to be further away from the predictions, thereby reducing the usefulness of the predictions. If you have a model that is not sufficiently precise, you risk making costly mistakes! [Read more…] about Understand Precision in Predictive Analytics to Avoid Costly Mistakes

## Heteroscedasticity in Regression Analysis

Heteroscedasticity means unequal scatter. In regression analysis, we talk about heteroscedasticity in the context of the residuals or error term. Specifically, heteroscedasticity is a systematic change in the spread of the residuals over the range of measured values. Heteroscedasticity is a problem because ordinary least squares (OLS) regression assumes that all residuals are drawn from a population that has a constant variance (homoscedasticity).

To satisfy the regression assumptions and be able to trust the results, the residuals should have a constant variance. In this blog post, I show you how to identify heteroscedasticity, explain what produces it, the problems it causes, and work through an example to show you several solutions. [Read more…] about Heteroscedasticity in Regression Analysis

## How to Choose Between Linear and Nonlinear Regression

As you fit regression models, you might need to make a choice between linear and nonlinear regression models. The field of statistics can be weird. Despite their names, both forms of regression can fit curvature in your data. So, how do you choose? In this blog post, I show you how to choose between linear and nonlinear regression models. [Read more…] about How to Choose Between Linear and Nonlinear Regression

## Statistics, Old Love Letters, and Changing Times

Have you ever seen your present reflected in an object from the past? This summer I’ve discovered glimpses of my daily life working with statistical software in words written more than 70 years ago. Bear with me because this blog post takes the scenic route to arrive at modern statistics. [Read more…] about Statistics, Old Love Letters, and Changing Times

## Why Are P Values Misinterpreted So Frequently?

P values are commonly misinterpreted. It’s a very slippery concept that requires a lot of background knowledge to understand. Not surprisingly, I’ve received many questions about P values in statistical hypothesis testing over the years. However, one question stands out. *Why* are P value misinterpretations so prevalent? I answer that question in this blog post, and help you avoid making the same mistakes. [Read more…] about Why Are P Values Misinterpreted So Frequently?

## Model Specification: Choosing the Correct Regression Model

Model specification is the process of determining which independent variables to include and exclude from a regression equation. How do you choose the best regression model? The world is complicated, and trying to explain it with a small sample doesn’t help. In this post, I’ll show you how to select the correct model. I’ll cover statistical methods, difficulties that can arise, and provide practical suggestions for selecting your model. Often, the variable selection process is a mixture of statistics, theory, and practical knowledge. [Read more…] about Model Specification: Choosing the Correct Regression Model