May 25, 2020

Must a linear regression line always pass through the origin?

A linear regression line showing the linear relationship between independent variables (x’s), such as concentrations of working standards, and dependent variables (y’s), such as instrumental signals, is represented by the equation y = a + bx, where a is the y-intercept (the value of y when x = 0) and b is the slope, or gradient, of the line. When the straight line passes through the origin (0,0) of the graph, the intercept is zero and the slope reduces to y/x. The questions are: when should you allow the linear regression line to pass through the origin? Why not let the intercept float naturally based on the best-fit data? How can you justify this decision?

In theory, you would use a zero-intercept model only if you knew that the model line had to go through zero. Most spectrophotometer calculation software produces an equation of the form y = bx, assuming the line passes through the origin. In my opinion, this might be true only when the reference cell is filled with a reagent blank, rather than a pure solvent or distilled-water blank, for background correction in the calibration process. However, we must also bear in mind that all instrument measurements carry inherent analytical errors.
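To see how the two models differ in practice, here is a minimal Python sketch, assuming numpy is available and using made-up calibration data, that fits both the free-intercept model y = a + bx and the zero-intercept model y = bx:

import numpy as np

# Hypothetical calibration data: concentrations (x) and instrument signals (y)
x = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([0.05, 0.41, 0.80, 1.22, 1.58, 2.02])

# Ordinary least squares with a free intercept: y = a + bx
b_free, a_free = np.polyfit(x, y, 1)

# Least squares forced through the origin: y = bx
b_zero = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0][0]

print("free intercept: y = %.4f + %.4fx" % (a_free, b_free))
print("through origin: y = %.4fx" % b_zero)

Comparing the two fitted lines against the data points is a quick visual check before any formal test of the intercept.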

One approach to evaluating whether the y-intercept, a, is statistically significant is to conduct a hypothesis test using a Student’s t-test. This is illustrated in the example below.

Another approach is to evaluate whether there is a significant difference between the standard deviation of the slope for y = a + bx and that of the slope for y = bx (where a = 0) using an F-test.
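A sketch of this second approach in Python, under the assumption that the comparison is carried out as an extra-sum-of-squares F-test on the residuals of the two fits (numpy and scipy assumed):

import numpy as np
from scipy import stats

def intercept_f_test(x, y, alpha=0.05):
    # Full model y = a + bx versus reduced model y = bx
    n = len(x)
    b1, a1 = np.polyfit(x, y, 1)
    rss_full = np.sum((y - (a1 + b1 * x)) ** 2)
    b0 = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0][0]
    rss_reduced = np.sum((y - b0 * x) ** 2)
    # F statistic with (1, n - 2) degrees of freedom
    f_stat = (rss_reduced - rss_full) / (rss_full / (n - 2))
    f_crit = stats.f.ppf(1 - alpha, 1, n - 2)
    return f_stat, f_crit  # f_stat < f_crit -> intercept not significant

For a single intercept parameter, this F-test is equivalent to the t-test: the F statistic is simply t squared.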

Example:

In a study on the determination of calcium oxide in a magnesite material, Hazel and Eglog, in an Analytical Chemistry article, reported the following results using the alcohol method they developed:

The graph below shows the linear relationship between the mg CaO taken and the mg CaO found experimentally, with equation y = -0.2281 + 0.99476x for the 10 data points.

The following equations, the standard least-squares formulas, were applied to calculate the various statistical parameters:

b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
a = ȳ − b·x̄
s_y/x = SQRT[ Σ(yi − ŷi)² / (n − 2) ], where ŷi = a + bxi
s_a = s_y/x · SQRT[ Σxi² / (n·Σ(xi − x̄)²) ]

Thus, by calculation, we have a = -0.2281 and b = 0.9948, with the standard error of y on x, s_y/x = 0.2067, and the standard deviation of the y-intercept, s_a = 0.1378.

Let’s conduct a hypothesis test with null hypothesis H0 and alternative hypothesis H1:

H0: the intercept a is equal to zero
H1: the intercept a is not equal to zero

The Student’s t-test statistic for the y-intercept is:

t = |a| / s_a = 0.2281 / 0.1378 = 1.655

The critical t-value for 10 − 2 = 8 degrees of freedom at a significance level α = 0.05 (two-tailed) is 2.306.

Conclusion: Since 1.655 < 2.306, H0 is not rejected at 95% confidence, indicating that the calculated a-value is not significantly different from zero. In other words, there is insufficient evidence to claim that the intercept differs from zero by more than can be accounted for by the analytical errors. Hence, this linear regression line can be allowed to pass through the origin.
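The whole test is easy to script. Below is a sketch in Python (numpy and scipy assumed); the x and y arrays are placeholders, since the original data table is not reproduced here:

import numpy as np
from scipy import stats

def intercept_t_test(x, y, alpha=0.05):
    # Test H0: intercept a = 0 for the fit y = a + bx
    n = len(x)
    b, a = np.polyfit(x, y, 1)
    y_hat = a + b * x
    s_yx = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))  # standard error of y on x
    s_a = s_yx * np.sqrt(np.sum(x ** 2) / (n * np.sum((x - x.mean()) ** 2)))
    t_stat = abs(a) / s_a
    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)  # two-tailed critical value
    return a, t_stat, t_crit, t_stat < t_crit  # True -> do not reject H0

With 10 data points this reproduces the comparison above: t = |a|/s_a is tested against the critical value for 8 degrees of freedom, 2.306.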

April 24, 2020

Improving weighing precision by Hotelling’s method

I recently came across an interesting article published in 1944 by Harold Hotelling, a renowned American mathematical statistician and influential economic theorist, well known for Hotelling’s law, Hotelling’s lemma and Hotelling’s rule in economics, as well as Hotelling’s T-squared distribution in statistics. The article is titled: Some improvements in weighing and other experimental techniques.

Seeking to improve the accuracy and cost-effectiveness of physical and chemical investigations through better-designed experiments grounded in the theory of statistical inference, Hotelling took an interest in a published work of F. Yates (Complex Experiments, Jour. Roy. Stat. Soc., Supp., vol. 2 (1935), pp. 181-247) and suggested various improvements to the weighing process.

When we are given several objects to weigh, our normal practice is to weigh each and every one of them in turn on a balance whose error, in terms of standard deviation, is, say, s.

Let’s assume we had two objects (A and B) to weigh, with true masses ma and mb. The reported weights Ma and Mb are then:

Ma = ma +/- s

Mb = mb +/- s

If ma and mb had values of 20 and 10 grams respectively, and s was 0.1 gram, then:

Ma = 20 +/- 0.1g
Mb = 10 +/- 0.1g

Hotelling noticed, however, that greater precision could be obtained for the same effort if both objects were included in each weighing. He carried out a first weighing with objects A and B together on the same pan, followed by a second weighing with A on one pan and B on the other, giving the difference of their weights. He then deduced the masses ma and mb as follows:

ma + mb = p1
ma – mb = p2

Hence, ma = (p1 + p2)/2
and mb = (p1 − p2)/2

By the law of propagation of error, the variance of a sum or difference of independent measurements is the sum of their variances, with no covariance term to consider, and the variance of a measurement multiplied by a constant c is c² times the variance of that measurement. Here each mass estimate is half the sum or difference of the two weighings, so c = 1/2. Therefore,

Var(ma) = [Var(p1) + Var(p2)]/4 = [s² + s²]/4 = s²/2
Var(mb) = [s² + s²]/4 = s²/2

Hence, by this method, the variances of ma and mb are both equal to s²/2, half the value obtained when the two objects were weighed separately.

The standard error of each result is therefore s/SQRT(2), or about 0.707s, and no longer s, so that:

Ma = 20 +/- 0.07g
Mb = 10 +/- 0.07g
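A quick Monte Carlo sketch in Python (numpy assumed; the seed and number of trials are arbitrary) confirms the factor of 1/SQRT(2):

import numpy as np

rng = np.random.default_rng(1)
s, trials = 0.1, 100000
ma, mb = 20.0, 10.0

# Method 1: weigh object A on its own (error s)
ma_direct = ma + rng.normal(0, s, trials)

# Method 2 (Hotelling): p1 = A and B together, p2 = A minus B
p1 = (ma + mb) + rng.normal(0, s, trials)
p2 = (ma - mb) + rng.normal(0, s, trials)
ma_paired = (p1 + p2) / 2

print(ma_direct.std())  # about 0.100
print(ma_paired.std())  # about 0.071, i.e. s/SQRT(2)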

An important inference from this illustration is that, in designing experiments, it is best to include all variables, or all factors, in each trial, with the same number of repeats, in order to achieve improved precision.