Data mining and regression seem to go together naturally. I’ve described regression as a seductive analysis because it is so tempting and so easy to add more variables in the pursuit of a larger R-squared. In this post, I’ll begin by illustrating the problems that data mining creates. To do this, I’ll show how data mining with regression analysis can take randomly generated data and produce a misleading model that appears to have significant variables and a good R-squared. Then, I’ll explain how data mining creates these deceptive results and how to avoid them. [Read more…] about Using Data Mining to Select Regression Models Can Create Serious Problems

# interpreting results

## Guide to Stepwise Regression and Best Subsets Regression

Automatic variable selection procedures are algorithms that pick the variables to include in your regression model. Stepwise regression and Best Subsets regression are two of the more common variable selection methods. In this post, I compare how these methods work and which one provides better results. [Read more…] about Guide to Stepwise Regression and Best Subsets Regression

## Goodness-of-Fit Tests for Discrete Distributions

Discrete probability distributions are based on discrete variables, which have a finite or countable number of values. In this post, I show you how to perform goodness-of-fit tests to determine how well your data fit various discrete probability distributions. [Read more…] about Goodness-of-Fit Tests for Discrete Distributions

## Examples of Hypothesis Tests: Busting Myths about the Battle of the Sexes

In my house, we love the Mythbusters TV show on the Discovery Channel. The Mythbusters conduct scientific investigations in their quest to test myths and urban legends. In the process, the show provides some fun examples of when and how you should use statistical hypothesis tests to analyze data. [Read more…] about Examples of Hypothesis Tests: Busting Myths about the Battle of the Sexes

## Making Predictions with Regression Analysis

If you were able to make predictions about something important to you, you’d probably love that, right? It’s even better if you know that your predictions are sound. In this post, I show how to use regression analysis to make predictions and determine whether they are both unbiased and precise. [Read more…] about Making Predictions with Regression Analysis

## Curve Fitting using Linear and Nonlinear Regression

In regression analysis, curve fitting is the process of specifying the model that provides the best fit to the specific curves in your dataset. Curved relationships between variables are not as straightforward to fit and interpret as linear relationships. [Read more…] about Curve Fitting using Linear and Nonlinear Regression

## Interpreting P values

P values determine whether your hypothesis test results are statistically significant. Statisticians use them all over the place. You’ll find P values in t-tests, distribution tests, ANOVA, and regression analysis. P values have become so important that they’ve taken on a life of their own. They can determine which studies are published, which projects receive funding, and which university faculty members become tenured!

Ironically, despite being so influential, P values are misinterpreted very frequently. What *is* the correct interpretation of P values? What do P values *really* mean? That’s the topic of this post! [Read more…] about Interpreting P values

## How To Interpret R-squared in Regression Analysis

R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0 – 100% scale. [Read more…] about How To Interpret R-squared in Regression Analysis

## How Hypothesis Tests Work: Significance Levels (Alpha) and P values

Hypothesis testing is a vital process in inferential statistics where the goal is to use sample data to draw conclusions about an entire population. In the testing process, you use significance levels and p-values to determine whether the test results are statistically significant.

You hear about results being statistically significant all of the time. But, what do significance levels, P values, and statistical significance actually represent? Why do we even need to use hypothesis tests in statistics? [Read more…] about How Hypothesis Tests Work: Significance Levels (Alpha) and P values

## How to Interpret P-values and Coefficients in Regression Analysis

P values and coefficients in regression analysis work together to tell you which relationships in your model are statistically significant and the nature of those relationships. The linear regression coefficients describe the mathematical relationship between each independent variable and the dependent variable. The p values for the coefficients indicate whether these relationships are statistically significant. [Read more…] about How to Interpret P-values and Coefficients in Regression Analysis

## Hypothesis Testing and Confidence Intervals

Confidence intervals and hypothesis testing are closely related because both methods use the same underlying methodology. Additionally, there is a close connection between significance levels and confidence levels. Indeed, there is such a strong link between them that hypothesis tests and the corresponding confidence intervals always agree about statistical significance.

A confidence interval is calculated from a sample and provides a range of values that likely contains the unknown value of a population parameter. To learn more about confidence intervals in general, how to interpret them, and how to calculate them, read my post about Understanding Confidence Intervals.

In this post, I demonstrate how confidence intervals work using graphs and concepts instead of formulas. In the process, I compare and contrast significance and confidence levels. You’ll learn how confidence intervals are similar to significance levels in hypothesis testing. You can even use confidence intervals to determine statistical significance.

Read the companion post for this one: How Hypothesis Tests Work: Significance Levels (Alpha) and P-values. In that post, I use the same graphical approach to illustrate why we need hypothesis tests, how significance levels and P-values can determine whether a result is statistically significant, and what that actually means.

**Significance Level vs. Confidence Level**

Let’s delve into how confidence intervals incorporate the margin of error. As in the previous post, I’ll use the same type of sampling distribution that showed us how hypothesis tests work. This sampling distribution is based on the t-distribution, our sample size, and the variability in our sample. Download the CSV data file: FuelsCosts.

There are two critical differences between the sampling distribution graphs for significance levels and confidence intervals: the value that the distribution centers on and the portion we shade.

The significance level chart centers on the null value, and we shade the outside 5% of the distribution.

Conversely, the confidence interval graph centers on the sample mean, and we shade the center 95% of the distribution.

The shaded range of sample means [267 394] covers 95% of this sampling distribution. This range is the 95% confidence interval for our sample data. We can be 95% confident that the population mean for fuel costs falls between 267 and 394.

**Confidence Intervals and the Inherent Uncertainty of Using Sample Data**

The graph emphasizes the role of uncertainty around the point estimate. This graph centers on our sample mean. If the population mean equals our sample mean, the means of random samples from this population (N=25) will fall within this range 95% of the time.

We don’t know whether our sample mean is near the population mean. However, we know that the sample mean is an unbiased estimate of the population mean. An unbiased estimate does not tend to be too high or too low. It’s correct on average. Confidence intervals are correct on average because they use sample estimates that are correct on average. Given what we know, the sample mean is the most likely value for the population mean.

Given the sampling distribution, it would not be unusual for other random samples drawn from the same population to have means that fall within the shaded area. In other words, given that we did, in fact, obtain the sample mean of 330.6, it would not be surprising to get other sample means within the shaded range.

If these other sample means would not be unusual, we must conclude that these other values are also plausible candidates for the population mean. There is inherent uncertainty when using sample data to make inferences about the entire population. Confidence intervals help gauge the degree of uncertainty, also known as the margin of error.
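To make the idea of a 95% confidence level more concrete, here is a minimal simulation sketch. It is not based on the fuel cost data: the population mean and standard deviation are hypothetical values chosen for illustration, and the critical t-value for 24 degrees of freedom is taken from a standard t-table. The sketch draws many random samples of N=25 and counts how often the interval built from each sample contains the population mean.

```python
import random
import statistics

# Hypothetical population; these values are illustrative assumptions,
# not figures from the fuel cost dataset.
POPULATION_MEAN = 300.0
POPULATION_SD = 150.0
N = 25

# Assumption: two-sided 95% critical t-value for N - 1 = 24 degrees of
# freedom, taken from a standard t-table.
T_CRITICAL = 2.0639


def t_interval(sample):
    """95% CI for the mean: sample mean -/+ critical t * standard error."""
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - T_CRITICAL * sem, mean + T_CRITICAL * sem


def coverage(trials, seed=1):
    """Fraction of simulated intervals that contain the population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(N)]
        lower, upper = t_interval(sample)
        if lower <= POPULATION_MEAN <= upper:
            hits += 1
    return hits / trials


print(f"coverage over 5000 samples: {coverage(5000):.3f}")
```

With normally distributed data, the printed fraction should land close to 0.95. Any single interval either contains the population mean or it doesn't; the 95% describes how the procedure performs over many samples.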

**Related post**: Sampling Distributions

**Confidence Intervals and Statistical Significance**

If you want to determine whether your hypothesis test results are statistically significant, you can use either P-values with significance levels or confidence intervals. These two approaches always agree.

The relationship between the confidence level and the significance level for a hypothesis test is as follows:

Confidence level = 1 – Significance level (alpha)

For example, if your significance level is 0.05, the equivalent confidence level is 95%.

Both of the following conditions represent statistically significant results:

- The P-value in a hypothesis test is smaller than the significance level.
- The confidence interval excludes the null hypothesis value.

Further, it is always true that when the P-value is less than your significance level, the interval excludes the value of the null hypothesis.

In the fuel cost example, our hypothesis test results are statistically significant because the P-value (0.03112) is less than the significance level (0.05). Likewise, the 95% confidence interval [267 394] excludes the null hypothesis value (260). Using either method, we draw the same conclusion.

**Hypothesis Testing and Confidence Intervals Always Agree**

The hypothesis testing and confidence interval results always agree. To understand the basis of this agreement, remember how confidence levels and significance levels function:

- A confidence level determines the distance between the sample mean and the confidence limits.
- A significance level determines the distance between the null hypothesis value and the critical regions.

Both of these concepts specify a distance from the mean to a limit. Surprise! These distances are precisely the same length.

A 1-sample t-test calculates this distance as follows:

The critical t-value * standard error of the mean

Interpreting these statistics goes beyond the scope of this article. But, using this equation, the distance for our fuel cost example is $63.57.
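As a rough check on that arithmetic, the sketch below rebuilds the interval from the summary numbers quoted in this post. The post does not state the critical t-value or the standard error, so both are assumptions here: the critical value (2.0639 for 24 degrees of freedom) comes from a standard t-table, and the standard error is backed out from the $63.57 distance rather than computed from the data file.

```python
# Summary values quoted in the post.
SAMPLE_MEAN = 330.6
N = 25
MARGIN = 63.57  # critical t-value * standard error of the mean

# Assumption: two-sided 95% critical t-value for N - 1 = 24 degrees of
# freedom, taken from a standard t-table (not stated in the post).
T_CRITICAL = 2.0639

# Back out the standard error implied by the published distance.
implied_sem = MARGIN / T_CRITICAL


def confidence_interval(mean, t_critical, sem):
    """Return (lower, upper) limits: mean -/+ t_critical * sem."""
    margin = t_critical * sem
    return mean - margin, mean + margin


lower, upper = confidence_interval(SAMPLE_MEAN, T_CRITICAL, implied_sem)
print(f"implied SEM: {implied_sem:.2f}")
print(f"95% CI: [{lower:.0f}, {upper:.0f}]")
```

Rounded to whole dollars, the limits match the [267 394] interval shown earlier.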

**P-value and significance level approach**: If the sample mean is more than $63.57 from the null hypothesis mean, the sample mean falls within the critical region, and the difference is statistically significant.

**Confidence interval approach**: If the null hypothesis mean is more than $63.57 from the sample mean, the interval does not contain this value, and the difference is statistically significant.

Of course, they always agree!

The two approaches always agree as long as the same hypothesis test generates the P-values and confidence intervals and uses equivalent confidence levels and significance levels.
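The agreement can also be sketched in a few lines of code. The example below uses the sample mean (330.6) and the distance ($63.57) from the fuel cost example. The function names and the list of alternative null values are hypothetical, added only to show that both rules measure the same distance from opposite ends.

```python
SAMPLE_MEAN = 330.6  # sample mean quoted in the post
MARGIN = 63.57       # critical t-value * standard error, from the post


def significant_by_critical_region(null_value):
    """Significance-level view: distance is measured from the null value."""
    return abs(SAMPLE_MEAN - null_value) > MARGIN


def significant_by_interval(null_value):
    """Confidence-interval view: distance is measured from the sample mean."""
    lower, upper = SAMPLE_MEAN - MARGIN, SAMPLE_MEAN + MARGIN
    return null_value < lower or null_value > upper


# Hypothetical null values, chosen only to exercise both sides of the interval.
# 260 is the null value actually used in the fuel cost example.
for null_value in (200, 260, 300, 330, 390, 400, 450):
    by_region = significant_by_critical_region(null_value)
    by_interval = significant_by_interval(null_value)
    print(null_value, by_region, by_interval)
```

Both columns of the output match for every null value, because each rule asks whether the sample mean and the null value are more than $63.57 apart.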

**Related posts**: Standard Error of the Mean and Critical Values

**I Really Like Confidence Intervals!**

In statistics, analysts often emphasize using hypothesis tests to determine statistical significance. Unfortunately, a statistically significant effect might not always be practically meaningful. For example, a significant effect can be too small to be important in the real world. Confidence intervals help you navigate this issue!

Similarly, the margin of error in a survey tells you how near you can expect the survey results to be to the correct population value.

Learn more about this distinction in my post about Practical vs. Statistical Significance.

Learn how to use confidence intervals to compare group means!

Finally, learn about bootstrapping in statistics to see an alternative to traditional confidence intervals that does not use probability distributions and test statistics. In that post, I create bootstrapped confidence intervals.

## How to Interpret Adjusted R-Squared and Predicted R-Squared in Regression Analysis

R-squared is a goodness-of-fit measure that tends to reward you for including too many independent variables in a regression model, and it doesn’t provide any incentive to stop adding more. Adjusted R-squared and predicted R-squared use different approaches to help you fight that impulse to add too many. The protection that adjusted R-squared and predicted R-squared provide is critical because too many terms in a model can produce results that you can’t trust. These statistics help you include the correct number of independent variables in your regression model. [Read more…] about How to Interpret Adjusted R-Squared and Predicted R-Squared in Regression Analysis

## How to Interpret the Constant (Y Intercept) in Regression Analysis

The constant term in regression analysis is the value at which the regression line crosses the y-axis. The constant is also known as the y-intercept. That sounds simple enough, right? Mathematically, the regression constant really is that simple. However, the difficulties begin when you try to interpret the *meaning* of the y-intercept in your regression output. [Read more…] about How to Interpret the Constant (Y Intercept) in Regression Analysis

## How to Interpret the F-test of Overall Significance in Regression Analysis

The F-test of overall significance indicates whether your linear regression model provides a better fit to the data than a model that contains no independent variables. In this post, I look at how the F-test of overall significance fits in with other regression statistics, such as R-squared. R-squared tells you how well your model fits the data, and the F-test is related to it. [Read more…] about How to Interpret the F-test of Overall Significance in Regression Analysis

## Multicollinearity in Regression Analysis: Problems, Detection, and Solutions

Multicollinearity occurs when independent variables in a regression model are correlated. This correlation is a problem because independent variables should be *independent*. If the degree of correlation between variables is high enough, it can cause problems when you fit the model and interpret the results. [Read more…] about Multicollinearity in Regression Analysis: Problems, Detection, and Solutions

## Benefits of Welch’s ANOVA Compared to the Classic One-Way ANOVA

Welch’s ANOVA is an alternative to the traditional analysis of variance (ANOVA) and it offers some serious benefits. One-way analysis of variance determines whether differences between the means of at least three groups are statistically significant. For decades, introductory statistics classes have taught the classic Fisher’s one-way ANOVA that uses the F-test. It’s a standard statistical analysis, and you might think it’s pretty much set in stone by now. Surprise, there’s a significant change occurring in the world of one-way analysis of variance! [Read more…] about Benefits of Welch’s ANOVA Compared to the Classic One-Way ANOVA

## Standard Error of the Regression vs. R-squared

The standard error of the regression (S) and R-squared are two key goodness-of-fit measures for regression analysis. While R-squared is the most well-known amongst the goodness-of-fit statistics, I think it is a bit over-hyped. The standard error of the regression is also known as residual standard error.

[Read more…] about Standard Error of the Regression vs. R-squared

## Chi-Square Test of Independence and an Example

The Chi-square test of independence determines whether there is a statistically significant relationship between categorical variables. It is a hypothesis test that answers the question: do the values of one categorical variable depend on the values of other categorical variables? This test is also known as the chi-square test of association.

[Read more…] about Chi-Square Test of Independence and an Example

## Multivariate ANOVA (MANOVA) Benefits and When to Use It

Multivariate ANOVA (MANOVA) extends the capabilities of analysis of variance (ANOVA) by assessing multiple dependent variables simultaneously. ANOVA statistically tests the differences between three or more group means. For example, if you have three different teaching methods and you want to evaluate the average scores for these groups, you can use ANOVA. However, ANOVA does have a drawback. It can assess only one dependent variable at a time. This limitation can be an enormous problem in certain circumstances because it can prevent you from detecting effects that actually exist. [Read more…] about Multivariate ANOVA (MANOVA) Benefits and When to Use It

## Repeated Measures Designs: Benefits and an ANOVA Example

Repeated measures designs, also known as within-subjects designs, can seem like oddball experiments. When you think of a typical experiment, you probably picture an experimental design that uses mutually exclusive, independent groups. These experiments have a control group and treatment groups that have clear divisions between them. Each subject is in only one of these groups. [Read more…] about Repeated Measures Designs: Benefits and an ANOVA Example