Precision in predictive analytics refers to how close the model’s predictions are to the observed values. The more precise the model, the closer the data points are to the predictions. When you have an imprecise model, the observations tend to be further away from the predictions, thereby reducing the usefulness of the predictions. If you have a model that is not sufficiently precise, you risk making costly mistakes! [Read more…] about Understand Precision in Predictive Analytics to Avoid Costly Mistakes

# graphs

## Heteroscedasticity in Regression Analysis

Heteroscedasticity means unequal scatter. In regression analysis, we talk about heteroscedasticity in the context of the residuals or error term. Specifically, heteroscedasticity is a systematic change in the spread of the residuals over the range of measured values. Heteroscedasticity is a problem because ordinary least squares (OLS) regression assumes that all residuals are drawn from a population that has a constant variance (homoscedasticity).

To satisfy the regression assumptions and be able to trust the results, the residuals should have a constant variance. In this blog post, I show you how to identify heteroscedasticity, explain what produces it, the problems it causes, and work through an example to show you several solutions. [Read more…] about Heteroscedasticity in Regression Analysis

## World Travel, Rough Roads, and Manually Adjusting Graph Scales!

As my family and I were being rattled around in a four-wheel drive vehicle in the remote Osa Peninsula in Costa Rica, it struck me that traveling to exotic locations is just like manually adjusting the scales on graphs! That’s probably not what you were expecting, but let me explain! Unlike most of my statistical blog posts, this one gets a bit philosophical! [Read more…] about World Travel, Rough Roads, and Manually Adjusting Graph Scales!

## How to Interpret Regression Models that have Significant Variables but a Low R-squared

Does your regression model have a low R-squared? That seems like a problem—but it might not be. Learn what a low R-squared does and does not mean for your model. [Read more…] about How to Interpret Regression Models that have Significant Variables but a Low R-squared

## How to Identify the Distribution of Your Data

You’re probably familiar with data that follow the normal distribution. The normal distribution is that nice, familiar bell-shaped curve. Unfortunately, not all data are normally distributed or as intuitive to understand. You can picture the symmetric normal distribution, but what about the Weibull or Gamma distributions? This uncertainty might leave you feeling unsettled. In this post, I show you how to identify the probability distribution of your data. [Read more…] about How to Identify the Distribution of Your Data

## How Hypothesis Tests Work: Significance Levels (Alpha) and P values

Hypothesis testing is a vital process in inferential statistics where the goal is to use sample data to draw conclusions about an entire population. In the testing process, you use significance levels and p-values to determine whether the test results are statistically significant.

You hear about results being statistically significant all of the time. But, what do significance levels, P values, and statistical significance actually represent? Why do we even need to use hypothesis tests in statistics? [Read more…] about How Hypothesis Tests Work: Significance Levels (Alpha) and P values

## How Hypothesis Tests Work: Confidence Intervals and Confidence Levels

A confidence interval is calculated from a sample and provides a range of values that likely contains the unknown value of a population parameter. In this post, I demonstrate how confidence intervals and confidence levels work using graphs and concepts instead of formulas. In the process, you’ll see how confidence intervals are very similar to P values and significance levels. [Read more…] about How Hypothesis Tests Work: Confidence Intervals and Confidence Levels

## How t-Tests Work: t-Values, t-Distributions, and Probabilities

T-tests are statistical hypothesis tests that you use to analyze one or two sample means. Depending on the t-test that you use, you can compare a sample mean to a hypothesized value, the means of two independent samples, or the difference between paired samples. In this post, I show you how t-tests use t-values and t-distributions to calculate probabilities and test hypotheses.

As usual, I’ll provide clear explanations of t-values and t-distributions using concepts and graphs rather than formulas! If you need a primer on the basics, read my hypothesis testing overview. [Read more…] about How t-Tests Work: t-Values, t-Distributions, and Probabilities

## How to Interpret the Constant (Y Intercept) in Regression Analysis

The constant term in regression analysis is the value at which the regression line crosses the y-axis. The constant is also known as the y-intercept. That sounds simple enough, right? Mathematically, the regression constant really is that simple. However, the difficulties begin when you try to interpret the *meaning* of the y-intercept in your regression output. [Read more…] about How to Interpret the Constant (Y Intercept) in Regression Analysis

## How F-tests work in Analysis of Variance (ANOVA)

Analysis of variance (ANOVA) uses F-tests to statistically assess the equality of means when you have three or more groups. In this post, I’ll answer several common questions about the F-test.

- How do F-tests work?
- Why do we analyze
*variances*to test*means*?

I’ll use concepts and graphs to answer these questions about F-tests in the context of a one-way ANOVA example. I’ll use the same approach that I use to explain how t-tests work. If you need a primer on the basics, read my hypothesis testing overview. [Read more…] about How F-tests work in Analysis of Variance (ANOVA)

## Check Your Residual Plots to Ensure Trustworthy Regression Results!

Use residual plots to check the assumptions of an OLS linear regression model. If you violate the assumptions, you risk producing results that you can’t trust. Residual plots display the residual values on the y-axis and fitted values, or another variable, on the x-axis. After you fit a regression model, it is crucial to check the residual plots. If your plots display unwanted patterns, you can’t trust the regression coefficients and other numeric results.

In this post, I explain the conceptual reasons why residual plots help ensure that your regression model is valid. I’ll also show you what to look for and how to fix the problems. [Read more…] about Check Your Residual Plots to Ensure Trustworthy Regression Results!

## When is Easter this Year?

When is Easter this year? I ask this question every year! This year, Easter occurs on April 16, 2017. Next year, Easter falls on April 1, 2018. I have a hard time remembering when it occurs in any given year. I think that March Easters are both early and unusual. Is that true?

Being a statistician, my first thought is to study the distribution of Easter dates. By analyzing the distribution, we can determine which dates are rare and which are common. How unusual are Easter dates in March? Are there patterns in the dates? [Read more…] about When is Easter this Year?

## Statistics, Exoplanets, and the Search for Earthlike Planets

I love astronomy! The discovery of thousands of exoplanets has made it only more exciting. You often hear about the really weird planets in the news. You know, things like low density puffballs, hot Jupiters, rogue planets, planets that orbit their star in hours, and even a Jupiter mass planet that is one huge diamond! As neat as these discoveries are, I also want to know how Earth fits in. [Read more…] about Statistics, Exoplanets, and the Search for Earthlike Planets