Degrees of Freedom simply explained
Degrees of freedom, v, is a very important concept in statistics. Estimates of statistical parameters such as the mean and standard deviation, and the quantities built on them such as measurement uncertainty and the F-test and Student’s t-test statistics, are based on different amounts of information or data. The number of independent pieces of information that go into the estimation of a parameter is called the degrees of freedom.
This definition may sound rather abstract, but the following illustration makes it concrete.
Suppose we had a sample of six numbers and their average (mean) value was 4. The sum of these six numbers must have been 24, otherwise the mean would not have been 4.
So, now let us think about each of the six numbers in turn, as if we were placing them one at a time into six empty boxes.
If we allow the numbers to be any real values, positive or negative, how many values could the first number take? Any value that we could think of would do the job. Suppose it was a 4.
How many values could the next number take? Again, it could be anything. Say it was a 5.
And the third number? Anything too. Suppose it was a 3.
The fourth and fifth numbers could also be anything. Say they were 6 and 4.
Now we see that the very last number had to be 2 and nothing else: the first five numbers add up to 22, and all six had to add up to 24 for their mean to be 4.
So, we had total freedom in selecting the first number. The same is true of the second, third, fourth and fifth numbers. But we had no choice at all in selecting the sixth number. That means we had 5 degrees of freedom when we considered six numbers with a fixed mean value (the mean being a statistical parameter).
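The box-filling argument above can be sketched in a few lines of Python. This is only an illustration using the numbers from the example; the variable names are chosen here for readability and do not come from any library.

```python
# Five values chosen freely, as in the illustration above.
free_values = [4, 5, 3, 6, 4]

n = 6             # sample size
target_mean = 4   # the mean we are told the sample has

# The mean fixes the total, so the last value is no longer free.
required_sum = n * target_mean
last_value = required_sum - sum(free_values)

sample = free_values + [last_value]
degrees_of_freedom = n - 1  # one value was used up by fixing the mean

print("forced last value:", last_value)
print("mean of full sample:", sum(sample) / n)
print("degrees of freedom:", degrees_of_freedom)
```

Changing any of the five free values changes the forced sixth value, but never the count of free choices.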
Generally speaking, once we have estimated the sample mean from a sample of size n, only n-1 of the deviations from that mean are free to vary, so we work with n-1 degrees of freedom. This is the divisor we use in our estimation of the sample standard deviation and other statistics.
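As a rough sketch of where n-1 appears in practice, the snippet below computes the sample standard deviation of the six numbers from the illustration by hand and compares it with Python's standard-library statistics.stdev, which also divides by n-1. The variable names are illustrative assumptions.

```python
import math
import statistics

sample = [4, 5, 3, 6, 4, 2]  # the six numbers from the illustration
n = len(sample)

mean = sum(sample) / n
deviations = [x - mean for x in sample]

# The deviations always sum to zero, so only n - 1 of them are independent.
print("sum of deviations:", sum(deviations))

dof = n - 1
sample_variance = sum(d * d for d in deviations) / dof
sample_sd = math.sqrt(sample_variance)

print("sample standard deviation (by hand):", sample_sd)
print("statistics.stdev:", statistics.stdev(sample))
```

Dividing by n instead of n-1 would give the population form (statistics.pstdev), which is smaller for the same data.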
So, we define it as follows: the degrees of freedom, v, is the sample size, n, minus the number of parameters, p, estimated from the data, that is, v = n - p.
In the case of linear regression, where we consider the linear equation y = a + bx, we have two statistical parameters to estimate, i.e., the y-intercept, a, and the slope or gradient, b. Hence, if we have 7 data points on a linear calibration curve, the degrees of freedom are 7 - 2 = 5. In general, a linear regression study has n-2 degrees of freedom.
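The regression case can be sketched in the same way. The seven calibration points below are invented purely for illustration (they are not data from this article), and the fit uses the ordinary least-squares formulas for a straight line. The two sums printed at the end are both zero up to rounding; these are the two constraints that use up two degrees of freedom.

```python
import math

# Hypothetical calibration data (7 points), made up for this sketch only.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9, 12.2]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Ordinary least-squares estimates of slope b and intercept a.
s_xx = sum((xi - x_mean) ** 2 for xi in x)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = y_mean - b * x_mean

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

p = 2          # parameters estimated from the data: a and b
dof = n - p    # 7 - 2 = 5

# Residual standard deviation uses n - 2 as the divisor.
residual_sd = math.sqrt(sum(r * r for r in residuals) / dof)

print("degrees of freedom:", dof)
print("residual standard deviation:", residual_sd)
print("sum of residuals:", sum(residuals))
print("sum of x * residual:", sum(xi * r for xi, r in zip(x, residuals)))
```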