In statistics, the degrees of freedom (DF) indicate the number of independent values that can vary in an analysis without breaking any constraints. It is an important idea that appears in many contexts throughout statistics, including hypothesis tests, probability distributions, and regression analysis. Learn how this fundamental concept affects the power and precision of your statistical analysis!

In this blog post, I bring this concept to life in an intuitive manner. I’ll start by defining degrees of freedom. However, I’ll quickly move on to practical examples in a variety of contexts because they make this concept easier to understand.

## Definition of Degrees of Freedom

Degrees of freedom are the number of independent values that a statistical analysis can estimate. You can also think of it as the number of values that are free to vary as you estimate parameters. I know, it’s starting to sound a bit murky!

Degrees of freedom encompass the notion that the amount of independent information you have limits the number of parameters that you can estimate. Typically, the degrees of freedom equal your sample size minus the number of parameters you need to calculate during an analysis. The result is usually a positive whole number.

Degrees of freedom is a combination of how much data you have and how many parameters you need to estimate. It indicates how much independent information goes into a parameter estimate. In this vein, it’s easy to see that you want a lot of information to go into parameter estimates to obtain more precise estimates and more powerful hypothesis tests. So, you want many degrees of freedom!

## Independent Information and Constraints on Values

The definitions talk about independent information. You might think this refers to the sample size, but it’s a little more complicated than that. To understand why, we need to talk about the freedom to vary. The best way to illustrate this concept is with an example.

Suppose we collect the random sample of observations shown below. Now, imagine that we know the mean, but we don’t know the value of an observation—the X in the table below.

The mean is 6.9, and it is based on 10 values. So, we know that the values must sum to 69 based on the equation for the mean.

Using simple algebra (64 + X = 69), we know that X must equal 5.
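The same algebra can be sketched in a few lines of Python. The nine known values here are hypothetical; I chose them only so that they sum to 64, matching the arithmetic in the text. The function name `missing_value` is likewise just for illustration.

```python
def missing_value(known_values, mean, n):
    """Return the one value that is forced once the mean of n values is fixed."""
    required_total = mean * n          # the mean pins down the total
    return required_total - sum(known_values)


# Hypothetical observations (not the original table); they sum to 64.
known = [3, 8, 5, 4, 5, 9, 10, 7, 13]

x = missing_value(known, mean=6.9, n=10)
print(round(x, 6))  # 5.0
```

Whatever nine values you plug in, the tenth is fully determined by the mean, so it carries no independent information.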

## Estimating Parameters Imposes Constraints on the Data

As you can see, that last number has no freedom to vary. It is not an independent piece of information because it cannot be any other value. Estimating the parameter, the mean in this case, imposes a constraint on the freedom to vary. The last value and the mean are entirely dependent on each other. Consequently, after estimating the mean, we have only 9 independent pieces of information even though our sample size is 10.

That’s the basic idea for degrees of freedom in statistics. In a general sense, DF are the number of observations in a sample that are free to vary while estimating statistical parameters. You can also think of it as the amount of independent data that you can use to estimate a parameter.

## Degrees of Freedom and Probability Distributions

Degrees of freedom also define the probability distributions for the test statistics of various hypothesis tests. For example, hypothesis tests use the t-distribution, F-distribution, and the chi-square distribution to determine statistical significance. Each of these probability distributions is a family of distributions where the degrees of freedom define the shape. Hypothesis tests use these distributions to calculate p-values. So, the DF are directly linked to p-values through these distributions!

Next, let’s look at how these distributions work for several hypothesis tests.

**Related posts**: Understanding Probability Distributions and A Graphical Look at Significance Levels (Alpha) and P values

## Degrees of Freedom for t-Tests and the t-Distribution

T-tests are hypothesis tests for the mean and use the t-distribution to determine statistical significance.

A 1-sample t-test determines whether the difference between the sample mean and the null hypothesis value is statistically significant. Let’s go back to our example of the mean above. We know that when you have a sample and estimate the mean, you have n – 1 degrees of freedom, where n is the sample size. Consequently, for a 1-sample t-test, the degrees of freedom is n – 1.
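To make the n – 1 rule concrete, here is a minimal sketch of a 1-sample t-test written with only the Python standard library. The sample and the null hypothesis value of 5 are hypothetical, and the p-value comes from a simple numerical integration of the t-distribution's density rather than from a statistics package, so treat it as an illustration, not a reference implementation.

```python
import math
import statistics


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t_value, df, steps=2000):
    """Two-sided p-value using Simpson's rule (steps must be even)."""
    a = abs(t_value)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total   # area between -|t| and +|t|
    return 1.0 - central_area


def one_sample_t(sample, null_value):
    """Return (t-value, degrees of freedom, two-sided p-value)."""
    n = len(sample)
    df = n - 1                            # estimating the mean uses one DF
    std_err = statistics.stdev(sample) / math.sqrt(n)
    t_value = (statistics.mean(sample) - null_value) / std_err
    return t_value, df, two_sided_p(t_value, df)


sample = [3, 8, 5, 4, 5, 9, 10, 7, 13, 5]   # hypothetical data, mean 6.9
t_value, df, p_value = one_sample_t(sample, null_value=5)
print(f"t = {t_value:.3f}, df = {df}, p = {p_value:.3f}")

# The same t-value evaluated against more degrees of freedom gives a smaller p-value.
print(f"p with 30 df = {two_sided_p(t_value, 30):.3f}")
```

The last line shows why the DF matter: the t-value alone does not determine the p-value. You also need to know which member of the t-distribution family to compare it to.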

The DF define the shape of the t-distribution that your t-test uses to calculate the p-value. The graph below shows the t-distribution for several different degrees of freedom. Because the degrees of freedom are so closely related to sample size, you can see the effect of sample size. As the degrees of freedom decrease, the t-distribution has thicker tails. This property allows for the greater uncertainty associated with small sample sizes.
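If you want to check the thicker-tails claim numerically, the sketch below evaluates the t-distribution's density at a point far out in the tail (t = 3) for a few degrees of freedom and compares it to the standard normal density. The choice of t = 3 and of 2, 5, and 30 DF is arbitrary and only for illustration.

```python
import math


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


tail_point = 3.0
for df in (2, 5, 30):
    print(f"df = {df:>2}: density at t = 3 is {t_pdf(tail_point, df):.5f}")
print(f"normal : density at z = 3 is {normal_pdf(tail_point):.5f}")
```

The density in the tail shrinks as the DF increase and approaches the normal value, which is the numerical counterpart of the curves getting thinner tails.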

To dig into t-tests, read my post about How t-Tests Work. I show how the different t-tests calculate t-values and use t-distributions to calculate p-values.

The F-test in ANOVA also tests group means. It uses the F-distribution, which is defined by the degrees of freedom. However, you calculate the DF for an F-distribution differently. For more information, read my post about How F-tests Work in ANOVA.

**Related post**: How to Interpret P-values Correctly

## Degrees of Freedom for the Chi-Square Test of Independence

The chi-square test of independence determines whether there is a statistically significant relationship between categorical variables. Just like other hypothesis tests, this test incorporates degrees of freedom. For a table with r rows and c columns, the general rule for calculating degrees of freedom for a chi-square test is (r – 1)(c – 1).
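As a quick sketch, the rule translates directly into a small Python function. The name `chi_square_df` is just an illustrative choice.

```python
def chi_square_df(n_rows, n_cols):
    """Degrees of freedom for a chi-square test of independence."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("the table needs at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)


print(chi_square_df(2, 2))  # 1
print(chi_square_df(3, 2))  # 2
```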

However, we can create tables to understand it more intuitively. The degrees of freedom for a chi-square test of independence is the number of cells in the table that can vary before you can calculate all the other cells. In a chi-square table, the cells represent the observed frequency for each combination of categorical variables. The constraints are the totals in the margins.

### Chi-Square 2 X 2 Table

For example, in a 2 X 2 table, after you enter one value in the table, you can calculate the remaining cells.

In the table above, I entered the bold 15, and then I can calculate the remaining three values in parentheses. Therefore, this table has 1 DF.

### Chi-Square 3 X 2 Table

Now, let’s try a 3 X 2 table. The table below illustrates the example that I use in my post about the chi-square test of independence. In that post, I determine whether there is a statistically significant relationship between uniform color and deaths on the original *Star Trek* TV series.

In the table, one categorical variable is shirt color, which can be blue, gold, or red. The other categorical variable is status, which can be dead or alive. After I enter the two bolded values, I can calculate all the remaining cells. Consequently, this table has 2 DF.
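Here is a minimal sketch of that fill-in process. Given the marginal totals and only the free cells, which I take to be the top-left (r – 1) by (c – 1) block, the function computes every other cell. All the counts and totals below are made up for illustration, apart from the 15 that I entered in the 2 X 2 example; they are not the actual *Star Trek* data.

```python
def complete_table(free_cells, row_totals, col_totals):
    """Fill in an r x c table from its margins and (r-1) x (c-1) free cells."""
    if sum(row_totals) != sum(col_totals):
        raise ValueError("row and column totals must have the same grand total")
    n_rows, n_cols = len(row_totals), len(col_totals)

    table = []
    for i in range(n_rows - 1):
        row = list(free_cells[i])
        row.append(row_totals[i] - sum(row))   # last column is forced by the row total
        table.append(row)

    # The last row is forced by the column totals.
    last_row = [col_totals[j] - sum(row[j] for row in table) for j in range(n_cols)]
    table.append(last_row)
    return table


# 2 X 2 table: one free cell (the 15), hypothetical margins.
print(complete_table([[15]], row_totals=[50, 50], col_totals=[40, 60]))
# [[15, 35], [25, 25]]

# 3 X 2 table: two free cells, hypothetical counts and margins.
print(complete_table([[10], [12]], row_totals=[100, 60, 40], col_totals=[30, 170]))
# [[10, 90], [12, 48], [8, 32]]
```

The number of values you have to supply in `free_cells` before everything else is determined is exactly the DF for the table.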

Read my post, Chi-Square Test of Independence and an Example, to see how this test works and how to interpret the results using the *Star Trek* example.

Like the t-distribution, the chi-square distribution is a family of distributions where the degrees of freedom define the shape. Chi-square tests use this distribution to calculate p-values. The graph below displays several chi-square distributions.
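To see how the DF change the shape, the sketch below evaluates the chi-square density on a grid and reports where each curve peaks. The grid spacing and the DF values (3, 5, and 10) are arbitrary choices for illustration. For more than 2 DF, the peak of a chi-square distribution sits at DF – 2, so the whole curve shifts to the right as the DF increase.

```python
import math


def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom (x > 0)."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    log_pdf = ((df / 2 - 1) * math.log(x) - x / 2
               - (df / 2) * math.log(2) - math.lgamma(df / 2))
    return math.exp(log_pdf)


grid = [i / 100 for i in range(1, 3001)]   # 0.01, 0.02, ..., 30.00

peaks = {}
for df in (3, 5, 10):
    peaks[df] = max(grid, key=lambda x: chi2_pdf(x, df))
    print(f"df = {df:>2}: density peaks near x = {peaks[df]:.2f}")
```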

## Degrees of Freedom in Regression Analysis

Degrees of freedom in regression are a bit more complicated, so I’ll keep it on the simple side. In a regression model, each term is an estimated parameter that uses one degree of freedom. In the regression output below, you can see how each term requires a DF. There are 28 total degrees of freedom, and the two independent variables use two of them. The remaining 26 degrees of freedom are displayed in Error.
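The bookkeeping can be sketched as follows. This assumes the model includes a constant (intercept), which the text does not state explicitly. Under that assumption, the constant uses one degree of freedom as well, so the total DF is the number of observations minus one, and an Error DF of 26 with two predictors corresponds to 29 rows of data. The function name and the dictionary keys are illustrative.

```python
def regression_df(n_obs, n_predictors):
    """DF breakdown for a regression model that includes a constant (assumed)."""
    total = n_obs - 1                 # the constant uses one DF
    regression = n_predictors         # one DF per model term
    error = total - regression        # what is left over for the error
    if error < 0:
        raise ValueError("more terms than the data can support")
    return {"regression": regression, "error": error, "total": total}


print(regression_df(n_obs=29, n_predictors=2))
# {'regression': 2, 'error': 26, 'total': 28}
```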

The error degrees of freedom are the independent pieces of information that are available for estimating your coefficients. For precise coefficient estimates and powerful hypothesis tests in regression, you must have many error degrees of freedom. This equates to having many observations for each model term.

As you add terms to the model, the error degrees of freedom decrease. You have fewer pieces of information available to estimate the coefficients. This situation reduces the precision of the estimates and the power of the tests. When you have too few remaining degrees of freedom, you can’t trust the regression results. If you use all your degrees of freedom, the p-values can’t be calculated.
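A tiny example shows what happens when a model uses every degree of freedom. The sketch below fits a straight line (a constant plus one slope) by least squares. With only two data points, the line passes through both of them, the error DF is zero, and there is nothing left to estimate the error variance, which is why no p-values can be computed. The data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line; returns slope, intercept, SSE, error DF, and MSE."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    error_df = n - 2                                  # constant + slope use two DF
    mse = sse / error_df if error_df > 0 else None    # undefined with 0 error DF
    return slope, intercept, sse, error_df, mse


# Two points: the model uses all the degrees of freedom.
saturated = fit_line([1, 3], [2, 7])
print(saturated[3], saturated[4])    # 0 None

# Five points: three error DF remain to estimate the error variance.
roomy = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])
print(roomy[3], roomy[4] is not None)    # 3 True
```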

For more information about the problems that occur when you use too many degrees of freedom and how many observations you need, read my blog post about overfitting your model.

Even though they might seem murky, degrees of freedom are essential to any statistical analysis! In a nutshell, DF define the amount of information you have relative to the number of properties that you want to estimate. If you don’t have enough information for what you want to do, you’ll have imprecise estimates and low statistical power.

Teng Li says

thank you Jim, I always find your article is of great value to me.

Jim Frost says

You’re very welcome, Teng. I’m very happy to hear that you find them to be helpful!

Muhammad Arif says

In simple words, we can say that the total sample size minus the number of parameters to be estimated in a series is called the DF. Am I right, dear Jim? Which software did you use for the graphs?

Jim Frost says

Hi Muhammad! Yes, that’s a good general sense of the term. However, it’s not always exactly correct. For instance, take a look at the chi-square examples. I used Minitab software for the graphs.

Best wishes to you!

Jim

nadeem malik says

thanks Dr jim so nice concept

Jim Frost says

Thank you!

Tavsief says

Wow. Superb. Thank you so much. All I can do.

Dhruv govil says

The topic is explained very clearly. But please explain this through R programming so that we can feel confident when making predictions.

Jim Frost says

Hi Dhruv, I’m glad you found the topic helpful! My blog is designed to teach statistical concepts, analyses, interpretation, etc rather than teaching a specific software package. You’ll find that degrees of freedom is inherent to statistics regardless of the software you use. The software package can supply the documentation that describes how to obtain the specific results that you need.

Akhilesh kumar Gupta says

Is the Minitab software you are using free or paid? If it is free, please provide me with its link. Thank you.

Jim Frost says

Hi Akhilesh, Minitab is not free. However, if you’re looking for free statistical software, I recommend PSPP, which is freeware (fully functional, no time limits) that is very similar to SPSS. Download PSPP here.

salman shah says

Dear sir, please tell me what the difference is between a statistic and a test statistic.

Jim Frost says

Hi Salman,

A statistic is a piece of information based on data. For example, the crime rate, median income, mean height, etc.

A test statistic is a statistic that summarizes the sample data and is used in hypothesis testing to determine whether the results are statistically significant. The hypothesis test takes all of the sample data, reduces it to a single value, and then calculates probabilities based on that value to determine significance. For more information about how test statistics work, read my posts about t-values and F-values. Both of those are test statistics.

I hope this helps!

salman shah says

And also we are confused about the difference between sample size and degrees of freedom.

Jim Frost says

Sample size is the number of data points in your study. Degrees of freedom are often closely related to sample size yet are never quite the same. The relationship between sample size and degrees of freedom depends on the specific test. Hypothesis tests actually use the degrees of freedom in the calculations for statistical significance. Typically, DF define the probability distribution of the test statistic.

Arindam says

Very Very informative. Thank you very much.

Jim Frost says

Hi Arindam, you’re very welcome! I’m glad it was helpful!

Ali Munsir says

Many thanks for such valuable knowledge sharing

Jim Frost says

My pleasure, Ali!

Indranil says

Jim, thanks for the core areas in stats that you always cover. I downloaded hurd-0.9.tar.gz. Is it the right file? If not, could you please suggest which one is right and which application file has to be run? Thanks.

Jim Frost says

Hi Indranil, I’m not sure which file you’re referring to. Are you referring to the PSPP software? If so, I believe the correct file for Windows is pspp-20170909-daily-64bits-setup.exe. That is a file you can run to install the program.

Eajaz Ahmad Dar says

Thanks Jim, I have probably found the first person with such clear basics. Hope to learn much more with you.

Jim Frost says

Hi Eajaz, thanks so much for the kind words! You made my day because I strive to find ways to teach statistics using easy to understand language!

Edson Chiwenga says

Thanks a lot . your explanation makes the job easier even for us who are not good in math and stats

Jim Frost says

Hi Edson, you’re very welcome. I’m glad it has helped!

payeng says

Hi Dr Jim,

How do I find values not given in t-distribution?

I am using the statistical table “Statistical Tables – J.A. Barnes|J. Murdoch – Macmillan International”. Let’s say I want to find the t-value for alpha = 0.05 with 39 degrees of freedom. The table only provides values for 30 and 40 degrees of freedom. Which one should I choose?

What is the general rule for this problem? Either round up the degree of freedom or round down?

Thanks

Jim Frost says

Hi,

In the t-distribution, after you get past about 30 df, the differences between the t-values for different probabilities become minuscule. You often have to go out to three decimal places before you’ll find a difference in the t-values.

Consequently, you won’t be too far off using the standard rounding rules: rounding up for >= 5 and rounding down for < 5. In your case, I'd use 40. You can also use a more precise table of t-values, such as this one that lists 39 df specifically.

I hope this helps!

payeng says

Thanks Dr Jim for the reply.

But I have another question. Can the standard rounding rules be used for the F-table as well?

I have read a statistics textbook about finding the F-value which is not given in the table.

The author wrote: “When the degree of freedom values cannot be found in the table, the closed value of the smaller side should be used. For example, if d.f.N =14, this value is between the given table values of 12 and 15, therefore 12 should be used, to be on the safe side.”

May I have your opinion? And what does “safe side” mean?

Thanks again.

Jim Frost says

Hi, that’s a good point to consider. It’s not always crucial, but it’s one that I ultimately agree with.

What the author means by “safe side” is to pick the DF that requires stronger evidence to be statistically significant. For any given test statistic distribution (t-values, F-values, etc.), if you can’t pick the exact DF that you require from a table, you should pick the DF that requires stronger evidence. For a test statistic, this is equivalent to picking the DF that is associated with a larger absolute critical value of the statistic, and that means choosing a lower DF.

In other words, you are in a situation where you need to make a choice because you can’t use the exactly correct value. The choice you make should require stronger evidence rather than weaker evidence to be statistically significant. That is “being safe.” It’s equivalent to lowering alpha from 5% to, say, 4.95%. You wouldn’t want to go the other direction and raise it to 5.05%.

To be honest, it has been decades since I’ve thought about the practical realities of using tables given the use of statistical software. But, you raise an excellent point. In some cases, such as how I described 39 DF for the t-distribution, the difference is minute. You have to go out three decimal places to see a difference.

Additionally, hypothesis test results that are borderline significant (right around p = 0.05 when alpha = 0.05) are not particularly strong results. To see why, read my post about correctly interpreting p-values. Near the end of that post, I discuss strength of evidence. In a nutshell, I would not consider results with a p-value of 0.049 to be any stronger than 0.051. In either case, both results are fairly weak evidence to build a case on. Changing the DF affects these borderline cases. So, this approach of choosing lower DF requires “stronger” evidence to be significant–but borderline cases still don’t constitute strong evidence when you use the typical significance level of 0.05.

However, I do agree with the approach of choosing the DF that requires stronger evidence to produce statistically significant results. If you have to make a choice, make a choice in the direction of requiring stronger evidence. That approach indicates choosing the lower DF. Thanks for raising this issue! It was good to think through this!