Training and consultancy for testing laboratories.

Archive for November, 2018

R in testing sample variances


Before carrying out a statistical test to compare two sample means in a comparative study, we first have to test whether the two sample variances are significantly different or not. The inferential test used is Fisher’s F ratio-test, devised by the famous statistician Sir Ronald Fisher, and it is widely used as a test of statistical significance.
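As a sketch of this step in R (the two sets of replicate measurements below are hypothetical, purely for illustration), the built-in var.test() function carries out Fisher’s F ratio-test:

```r
# Hypothetical replicate measurements from two analysts
a <- c(10.2, 10.5, 10.1, 10.4, 10.3, 10.6)
b <- c(10.0, 10.8, 10.3, 10.9, 10.1, 10.7)

# var.test() performs the F ratio-test of equal variances;
# its F statistic is simply var(a) / var(b)
result <- var.test(a, b)
result$p.value   # if p > 0.05, the variances are not significantly different
```

If the p-value is above the chosen significance level (commonly 0.05), we may treat the variances as equal when moving on to the two-sample t-test.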

R and F-test

 

R and Student’s t-distribution


R and Student t distribution

 

Application of R in standardizing normal distribution


In the last blog, we discussed how to use R to plot a normal distribution with actual data in hand.  Surely there are plenty of different possible normal distributions, since the mean can take any value at all, and so can the standard deviation. Therefore, it is useful to find a way to standardize the normal distribution so that several normal distributions can be compared on the same basis…

Using R to standardize normal distribution.docx

 

Using R in Normal Distribution study


Using R to study the Normal Probability Distribution

 

Degrees of freedom


Degrees of Freedom simply explained

Degrees of freedom, ν, is a very important subject in statistics. As we know, estimates of statistical parameters such as the mean, standard deviation, measurement uncertainty, F-test, Student’s t-test, etc., are based upon different amounts of information or data available.  The number of independent pieces of information that go into the estimation of a parameter is called the degrees of freedom.

This explanation may sound rather abstract. We can explain this concept easily by the following illustrations.

Suppose we had a sample of six numbers and their average (mean) value was 4.  The sum of these six numbers must have been 24, otherwise the mean would not have been 4.

So, now let us think about each of the six numbers in turn and put them in each of the six boxes as shown below.

If we allowed that the numbers could be positive or negative real numbers, how many values could the first number take? Of course, any value for the first number that we could think of would do the job.  Suppose it was a 4.

4

How many values could the next number take?  It could be again anything. Say, it was a 5.

4   5

And the third number?  Anything too. Suppose it was a 3.

4   5   3

The fourth and fifth numbers could also be anything. Say they were 6 and 4:

4   5   3   6   4

Now, we see that the very last number had to be just 2 and nothing else because the numbers had to add up to 24 to have the mean of the six numbers as 4.

4   5   3   6   4   2

So, we had total freedom in selecting the first number. The same was true for the second, third, fourth and fifth numbers. But we had no choice at all in selecting the sixth number.  That means we had 5 degrees of freedom when we considered six numbers for their mean value (the mean being a statistical parameter).

Generally speaking, we work on n-1 degrees of freedom if we estimate the sample mean from a sample of size n.  We use it in our estimation of sample standard deviation and other statistics.
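Indeed, R’s own sd() function already divides by n − 1 rather than n, reflecting the one degree of freedom spent on estimating the mean. Using the six illustrative numbers from above:

```r
x <- c(4, 5, 3, 6, 4, 2)
n <- length(x)

# sd() uses the n - 1 divisor (5 degrees of freedom here) ...
sd(x)

# ... which matches the hand calculation with n - 1 in the denominator
sqrt(sum((x - mean(x))^2) / (n - 1))
```

The two results are identical, confirming that the sample standard deviation is estimated on n − 1 degrees of freedom.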

So, we define it as follows: the degrees of freedom, ν, is the sample size, n, minus the number of parameters, p, estimated from the data.

In the case of linear regression, where we fit the linear equation y = a + bx, we have two statistical parameters to estimate, i.e., the y-intercept, a, and the slope or gradient, b.  Hence, if we have 7 data points on a linear calibration curve, the degrees of freedom are 7 – 2 = 5.  In general, a linear regression study has n – 2 degrees of freedom.
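R reports these residual degrees of freedom directly when fitting a straight line with lm(). The 7-point calibration data below are hypothetical, chosen only to illustrate the count:

```r
# Hypothetical 7-point calibration data roughly following y = a + b*x
x <- 1:7
y <- c(2.1, 4.0, 6.2, 7.9, 10.1, 12.0, 14.2)

# lm() estimates two parameters: the intercept a and the slope b
fit <- lm(y ~ x)

df.residual(fit)   # 7 - 2 = 5 degrees of freedom
```

These 5 degrees of freedom are what we would use, for example, when looking up the Student’s t critical value for confidence limits on the calibration line.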