What are Degrees of Freedom?
The degrees of freedom (DF) in statistics indicate the number of independent values that can vary in an analysis without breaking any constraints. It is an essential idea that appears in many contexts throughout statistics including hypothesis tests, probability distributions, and linear regression. Learn how this fundamental concept affects the power and precision of your analysis!
In this post, I bring this concept to life in an intuitive manner. You’ll learn the degrees of freedom definition and know how to find degrees of freedom for various analyses, such as linear regression, t-tests, and chi-square. I’ll start by defining degrees of freedom and providing the formula. However, I’ll quickly move on to practical examples in the context of various statistical analyses because they make this concept easier to understand.
Degrees of Freedom Definition
What are degrees of freedom in statistics? Degrees of freedom are the number of independent values that a statistical analysis can estimate. You can also think of it as the number of values that are free to vary as you estimate parameters. I know, it’s starting to sound a bit murky!
DF encompasses the notion that the amount of independent information you have limits the number of parameters that you can estimate. Typically, the degrees of freedom equals your sample size minus the number of parameters you need to calculate during an analysis. It is usually a positive whole number.
Degrees of freedom is a combination of how much data you have and how many parameters you need to estimate. It indicates how much independent information goes into a parameter estimate. In this vein, it’s easy to see that you want a lot of information to go into parameter estimates to obtain more precise estimates and more powerful hypothesis tests. So, you want many DF!
Independent Information and Constraints on Values
The degrees of freedom definitions talk about independent information. You might think this refers to the sample size, but it’s a little more complicated than that. To understand why, we need to talk about the freedom to vary. The best way to illustrate this concept is with an example.
Suppose we collect the random sample of observations shown below. Now, imagine we know the mean, but we don’t know the value of an observation—the X in the table below.
The mean is 6.9, and it is based on 10 values. So, we know that the values must sum to 69 based on the equation for the mean.
Using simple algebra (64 + X = 69), we know that X must equal 5.
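If you’d like to check that arithmetic yourself, here is a minimal Python sketch. The variable names are my own; the only inputs are the numbers already given above (a mean of 6.9, 10 values, and a sum of 64 for the nine known observations).

```python
# Numbers taken from the example above; variable names are illustrative.
n = 10            # sample size
mean = 6.9        # known sample mean
known_sum = 64    # sum of the nine observed values

# The mean fixes the total, so the last value has no freedom to vary.
required_total = n * mean
x = round(required_total - known_sum, 10)  # round away floating-point noise

print(x)  # 5.0
```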
How to Find the Degrees of Freedom in Statistics
As you can see, that last number has no freedom to vary. It is not an independent piece of information because it cannot be any other value. Estimating the parameter, the mean in this case, imposes a constraint on the freedom to vary. The last value and the mean are entirely dependent on each other. Consequently, after estimating the mean, we have only 9 independent pieces of information, even though our sample size is 10.
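Here is a rough Python sketch of the same idea from a different angle. The ten values are hypothetical: the original table isn’t reproduced here, so I made up numbers that match its totals (nine values summing to 64, a tenth value of 5, and a mean of 6.9). The sketch shows that the deviations from the sample mean sum to zero, so the last deviation is fixed by the other nine. It also shows the familiar consequence that the sample variance divides by n – 1 rather than n.

```python
import statistics

# Hypothetical sample, invented to match the totals in the example
# (n = 10, mean = 6.9, first nine values sum to 64).
sample = [3, 8, 5, 4, 9, 7, 10, 6, 12, 5]

n = len(sample)
mean = statistics.fmean(sample)
deviations = [value - mean for value in sample]

# Deviations from the sample mean sum to zero (up to rounding), so once
# nine of them are known, the tenth is determined.
last_deviation_implied = -sum(deviations[:-1])

df = n - 1  # one parameter (the mean) was estimated

# The sample variance divides the sum of squared deviations by the DF.
variance_by_hand = sum(d ** 2 for d in deviations) / df

print(df)
print(abs(last_deviation_implied - deviations[-1]) < 1e-9)
print(abs(variance_by_hand - statistics.variance(sample)) < 1e-9)
```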
That’s the basic idea for DF in statistics. In a general sense, DF are the number of observations in a sample that are free to vary while estimating statistical parameters. You can also think of it as the amount of independent data that you can use to estimate a parameter.
Degrees of Freedom Formula
The degrees of freedom formula is straightforward. In many analyses, the degrees of freedom equal the sample size minus the number of parameters you’re estimating:
DF = N – P
- N = sample size
- P = the number of parameters or relationships
For example, the degrees of freedom formula for a 1-sample t test equals N – 1 because you’re estimating one parameter, the mean. To calculate degrees of freedom for a standard (pooled-variance) 2-sample t-test, use N – 2, where N is the combined size of both samples, because there are now two means to estimate.
The degrees of freedom formula for a table in a chi-square test is (r-1) (c-1), where r = the number of rows and c = the number of columns.
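The formulas in this section are simple enough to write as small Python helpers. This is only a sketch: the function names are hypothetical, and the 2-sample version assumes the pooled-variance t-test, where N is the total number of observations across both groups.

```python
def df_one_sample_t(n):
    """DF for a 1-sample t test: one parameter (the mean) is estimated."""
    return n - 1


def df_two_sample_t(n_total):
    """DF for a pooled 2-sample t-test: two means are estimated.

    n_total is the combined size of both samples (an assumption of this sketch).
    """
    return n_total - 2


def df_chi_square_table(rows, cols):
    """DF for a chi-square test on a table with the given rows and columns."""
    return (rows - 1) * (cols - 1)


print(df_one_sample_t(10))        # 9
print(df_two_sample_t(30))        # 28
print(df_chi_square_table(3, 2))  # 2
```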
DF and Probability Distributions
Degrees of freedom also define the probability distributions for the test statistics of various hypothesis tests. For example, hypothesis tests use the t-distribution, F-distribution, and the chi-square distribution to determine statistical significance. Each of these probability distributions is a family of distributions where the DF define the shape. Hypothesis tests use these distributions to calculate p-values. So, the DF directly link to p-values through these distributions!
Next, let’s look at how these distributions work for several hypothesis tests.
Degrees of Freedom for t Tests
T tests are hypothesis tests for the mean and use the t-distribution to determine statistical significance.
A 1-sample t test determines whether the difference between the sample mean and the null hypothesis value is statistically significant. Let’s go back to our example of the mean above. We know that when you have a sample and estimate the mean, you have n – 1 degrees of freedom, where n is the sample size. Consequently, for a 1-sample t test, use n – 1 to calculate degrees of freedom.
The DF define the shape of the t-distribution that your t-test uses to calculate the p-value. The graph below shows the t-distribution for several different degrees of freedom. Because the degrees of freedom are so closely related to sample size, you can see the effect of sample size. As the DF decrease, the t-distribution has thicker tails. This property allows for the greater uncertainty associated with small sample sizes.
The degrees of freedom chart below displays t-distributions.
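To see the thicker tails numerically, here is a small Python sketch that evaluates the t-distribution’s density far out in the tail for a few DF values and compares it with the standard normal. The evaluation point (3.0) and the DF values are arbitrary choices of mine, and `t_pdf` is a hand-written helper based on the standard density formula rather than a library function.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


x = 3.0  # a point well out in the tail (arbitrary choice)
tail_density = {df: t_pdf(x, df) for df in (2, 5, 30)}

for df, density in tail_density.items():
    print(f"DF = {df:>2}: density at {x} = {density:.4f}")

# The standard normal is the limit as DF grows, and it has the thinnest tail.
normal_density = NormalDist().pdf(x)
print(f"Normal : density at {x} = {normal_density:.4f}")
```

Fewer DF put more density in the tails, which is why small samples need more extreme t-values to reach significance.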
To dig into t-tests, read my post about How t-Tests Work. I show how the different t-tests calculate t-values and use t-distributions to calculate p-values.
The F-test in ANOVA also tests group means. It uses the F-distribution, which is defined by the DF. However, you calculate degrees of freedom in ANOVA differently because you need to find the numerator and denominator DF. For more information, read my post about How F-tests Work in ANOVA.
Degrees of Freedom Table
You’ll often find degrees of freedom in statistical tables along with their critical values. Statisticians use the DF in these tables to determine whether the test statistic for their hypothesis test falls in the critical region, indicating statistical significance.
For example, in a t-table, you’ll find the degrees of freedom in the first column of the table. You must know the degrees of freedom to find the corresponding critical values. In the example below, the t-table indicates that for a two-tailed t-test with 20 DF and an alpha of 0.05, the critical values are -2.086 and +2.086.
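If you don’t have a t-table handy, you can approximate the same lookup numerically. The sketch below is one way to do it with only the Python standard library. It is not how statistical software necessarily computes critical values: it integrates the t density with Simpson’s rule and then searches for the critical value by bisection. The function names, step count, and search bracket are my own assumptions.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_value, df, steps=2000):
    """P(|T| >= |t_value|), using Simpson's rule for the area from 0 to |t_value|."""
    a = abs(t_value)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)


def critical_value(alpha, df, tol=1e-6):
    """Positive t where the two-tailed p-value equals alpha, found by bisection."""
    lo, hi = 0.0, 50.0  # assumed bracket; wide enough for common alphas and DF
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid  # tail area still larger than alpha, so move outward
        else:
            hi = mid
    return (lo + hi) / 2


cv = critical_value(alpha=0.05, df=20)
print(round(cv, 3))  # should agree with the t-table entry for 20 DF
```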
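Other hypothesis tests, such as chi-square tests and F-tests, have their own tables where you can find degrees of freedom and the corresponding critical values. Z-tests are the exception: the standard normal distribution has no DF, so a z-table lists critical values without them.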
How to Find Degrees of Freedom for Tables in Chi-Square Tests
The chi-square test of independence determines whether there is a statistically significant relationship between categorical variables in a table. Just like other hypothesis tests, this test incorporates DF. To find the chi-square DF for a table with r rows and c columns, use this formula to calculate degrees of freedom: (r-1) (c-1).
However, we can create tables to understand how to find degrees of freedom more intuitively. The DF for a chi-square test of independence is the number of cells in the table that can vary before you can calculate all the other cells. In a chi-square table, the cells represent the observed frequency for each combination of categorical variables. The constraints are the totals in the margins.
Chi-Square 2 X 2 Table
For example, to find the degrees of freedom in a 2 X 2 table, after you enter one value in the table, you can calculate all the remaining cells.
In the table above, I entered the bold 15, and then I can calculate the remaining three values in parentheses. Therefore, this table has 1 DF.
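Here is a short Python sketch of that fill-in process. The free cell is the 15 from the example, but the row and column totals are hypothetical because the original table’s margins aren’t reproduced here.

```python
# Margins are hypothetical; only the free cell (15) comes from the example.
row_totals = [40, 60]
col_totals = [35, 65]

top_left = 15  # the one cell that is free to vary

# Every other cell is forced by the margins.
top_right = row_totals[0] - top_left
bottom_left = col_totals[0] - top_left
bottom_right = row_totals[1] - bottom_left

table = [[top_left, top_right],
         [bottom_left, bottom_right]]
print(table)  # [[15, 25], [20, 40]]

# One free cell matches the formula (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1.
df = (len(row_totals) - 1) * (len(col_totals) - 1)
print(df)  # 1
```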
Chi-Square 3 X 2 Table
Now, let’s try finding degrees of freedom for a 3 X 2 table. The table below illustrates the example that I use in my post about the chi-square test of independence. In that post, I determine whether there is a statistically significant relationship between uniform color and deaths on the original Star Trek TV series.
In the table, one categorical variable is shirt color, which can be blue, gold, or red. The other categorical variable is status, which can be dead or alive. After I enter the two bolded values, I can calculate all the remaining cells. Consequently, this table has 2 DF, which matches the formula: (3 – 1)(2 – 1) = 2.
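The same fill-in logic generalizes to any table size. The Python sketch below assumes the free cells sit in the upper-left block of the table, and it uses made-up counts rather than the actual Star Trek data. The function name `fill_table` is hypothetical.

```python
def fill_table(row_totals, col_totals, free_cells):
    """Complete a contingency table from its margins and its free cells.

    free_cells holds the upper-left (r - 1) x (c - 1) block of the table.
    """
    assert len(free_cells) == len(row_totals) - 1
    # Last column of each free row is forced by that row's total.
    table = [row[:] + [row_totals[i] - sum(row)]
             for i, row in enumerate(free_cells)]
    # The whole last row is forced by the column totals.
    last_row = [col_totals[j] - sum(r[j] for r in table)
                for j in range(len(col_totals))]
    table.append(last_row)
    return table


# Hypothetical counts: rows are blue, gold, red; columns are dead, alive.
row_totals = [50, 30, 70]
col_totals = [20, 130]
free_cells = [[5], [4]]  # the two values entered by hand

table = fill_table(row_totals, col_totals, free_cells)
print(table)  # [[5, 45], [4, 26], [11, 59]]

df = len(free_cells) * len(free_cells[0])
print(df)  # 2
```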
Read my post, Chi-Square Test of Independence and an Example, to see how this test works and how to interpret the results using the Star Trek example.
Like the t-distribution, the chi-square distribution is a family of distributions where the DF define the shape. Chi-square tests use this distribution to calculate p-values. The degrees of freedom chart below displays several chi-square distributions.
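To see how the DF change the p-value, here is a small Python sketch that evaluates the same chi-square statistic under 1 DF and 2 DF, the two cases from the tables above. It relies on two closed-form results that hold only for these specific DF: with 1 DF the tail probability is erfc(sqrt(x/2)), and with 2 DF it is exp(-x/2). The test statistic of 4.0 is an arbitrary value I chose for illustration.

```python
import math


def chi2_p_value_df1(x):
    """Upper-tail probability of a chi-square with 1 DF (a squared standard normal)."""
    return math.erfc(math.sqrt(x / 2))


def chi2_p_value_df2(x):
    """Upper-tail probability of a chi-square with 2 DF (exponential with mean 2)."""
    return math.exp(-x / 2)


statistic = 4.0  # arbitrary illustrative value

p_df1 = chi2_p_value_df1(statistic)
p_df2 = chi2_p_value_df2(statistic)

print(round(p_df1, 4))  # 0.0455
print(round(p_df2, 4))  # 0.1353
```

The same statistic is significant at the 0.05 level with 1 DF but not with 2 DF, because the distribution shifts to the right as the DF increase.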
Linear Regression Degrees of Freedom
Calculating degrees of freedom in linear regression is a bit more complicated, and I’ll keep it on the simple side. In a linear regression model, each term is an estimated parameter that uses one degree of freedom. In the regression output below, you can see how each linear regression term requires a DF. There are n = 29 observations, and the two independent variables use a total of two DF.
The formula for the total DF is n – 1, which is 29 – 1 = 28 in our example. The formula for the error DF is n – P – 1, where P is the number of coefficients not counting the constant. In our example, that is 29 – 2 – 1 = 26. The output displays the remaining 26 degrees of freedom in Error.
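The regression DF arithmetic can be sketched in a few lines of Python, using the n = 29 and two independent variables from the example. The variable names are illustrative.

```python
n = 29  # observations
p = 2   # coefficients, not counting the constant

total_df = n - 1
regression_df = p          # one DF per model term
error_df = n - p - 1

# The regression and error DF always add up to the total DF.
assert regression_df + error_df == total_df

print(total_df, regression_df, error_df)  # 28 2 26
```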
In linear regression, the error DF are the independent pieces of information that are available for estimating your coefficients. For precise coefficient estimates and powerful hypothesis tests in regression, you must have many error degrees of freedom, which equates to having many observations for each model term.
As you add terms to the model, the error degrees of freedom decrease. You have fewer pieces of information available to estimate the coefficients. This situation reduces the precision of the estimates and the power of the tests. When you have too few remaining DF, you can’t trust the regression results. If you use all your linear regression degrees of freedom, the error DF drop to zero and the procedure can’t calculate the p-values.
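To make the zero-DF case concrete, here is a minimal Python sketch that fits a straight line to just two points. The data are hypothetical. With n = 2 and one slope, the error DF are 2 – 1 – 1 = 0: the line passes through both points exactly, so there is no leftover information for estimating the error variance that standard errors and p-values depend on.

```python
def error_df(n, p):
    """Error DF for a regression with p coefficients plus a constant."""
    return n - p - 1


# Two hypothetical observations and a simple line y = intercept + slope * x.
xs, ys = [1.0, 3.0], [2.0, 8.0]
slope = (ys[1] - ys[0]) / (xs[1] - xs[0])
intercept = ys[0] - slope * xs[0]

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
ss_error = sum(r ** 2 for r in residuals)

df = error_df(n=len(xs), p=1)

# Mean squared error is SS error / error DF, which is undefined when DF = 0.
mse = ss_error / df if df > 0 else None

print(residuals)  # [0.0, 0.0]
print(df)         # 0
print(mse)        # None
```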
For more information about the problems that occur when you use too many DF and how many observations you need, read my blog post about overfitting your model.
Even though the definition of degrees of freedom might seem murky, DF are essential to any statistical analysis! In a nutshell, DF define the amount of information you have relative to the number of parameters that you want to estimate. If you don’t have enough data for what you want to do, you’ll have imprecise estimates and low statistical power.