Must linear regression always pass through its origin?

Training and consultancy for testing laboratories.

May 25, 2020

A linear regression line showing the linear relationship between independent variables (x’s), such as the concentrations of working standards, and dependent variables (y’s), such as instrumental signals, is represented by the equation y = a + bx, where a is the y-intercept when x = 0 and b is the slope or gradient of the line. When the straight line passes through the origin (0,0) of the graph, the intercept is zero and the slope of the line becomes y/x. The questions are: when do you allow the linear regression line to pass through the origin? Why not let the intercept float naturally based on the best fit to the data? How can you justify this decision?

In theory, you would use a zero-intercept model if you knew that the model line had to go through zero. The calculation software of most spectrophotometers produces an equation of the form y = bx, assuming the line passes through the origin. In my opinion, this might be true only when the reference cell holds the reagent blank, instead of a pure solvent or distilled water blank, for background correction in the calibration process. However, we must also bear in mind that all instrument measurements carry inherent analytical errors as well.
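
To make the two models concrete, here is a minimal Python sketch of the least-squares slope for y = bx (b = Σxy / Σx²) alongside the ordinary fit y = a + bx. The function names and the calibration data are hypothetical, chosen only for illustration; they are not taken from any instrument software.

```python
def fit_through_origin(x, y):
    """Least-squares slope b for the zero-intercept model y = b*x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def fit_with_intercept(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


# Hypothetical calibration data (not from the article): standard
# concentrations and instrument signals with a small positive offset.
conc = [0.0, 2.0, 4.0, 6.0, 8.0]
signal = [0.02, 0.21, 0.41, 0.62, 0.80]

b0 = fit_through_origin(conc, signal)
a, b = fit_with_intercept(conc, signal)
print(f"y = bx:     b = {b0:.4f}")
print(f"y = a + bx: a = {a:.4f}, b = {b:.4f}")
```

Forcing the line through the origin changes the slope whenever the floating intercept is not exactly zero, which is why the choice needs a justification.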

One approach to evaluating whether the y-intercept, a, is statistically significant is to conduct a hypothesis test using Student’s t-test. This is illustrated in the example below.

Another approach is to use an F-test to evaluate whether the standard deviation of the slope for y = a + bx differs significantly from that of the slope for y = bx (a = 0).

Example:

In a study on the determination of calcium oxide in a magnesite material, Hazel and Eglog reported the following results in an Analytical Chemistry article, obtained with the alcohol method they developed:

The graph below shows the linear relationship between the mg CaO taken and the mg CaO found experimentally, with the equation y = -0.2281 + 0.99476x for 10 sets of data points.

The following equations were applied to calculate the various statistical parameters:
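
The equations themselves are not reproduced in this copy. For reference, the usual textbook expressions for an unweighted least-squares line through n points are given below; it is an assumption that these are the ones applied here, although they are the standard definitions of s_y/x, s_b and s_a.

```latex
\begin{aligned}
b &= \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2},
\qquad a = \bar{y} - b\,\bar{x} \\
s_{y/x} &= \sqrt{\frac{\sum_i (y_i-\hat{y}_i)^2}{n-2}},
\qquad \hat{y}_i = a + b\,x_i \\
s_b &= \frac{s_{y/x}}{\sqrt{\sum_i (x_i-\bar{x})^2}} \\
s_a &= s_{y/x}\sqrt{\frac{\sum_i x_i^2}{n\sum_i (x_i-\bar{x})^2}}
\end{aligned}
```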

Thus, by calculation, we have a = -0.2281; b = 0.9948; the standard error of y on x, s_y/x = 0.2067; and the standard deviation of the y-intercept, s_a = 0.1378.

Let’s conduct a hypothesis test with null hypothesis H₀ and alternative hypothesis H₁:

H₀ : The intercept a equals zero

H₁ : The intercept a does not equal zero

The Student’s t-test statistic for the y-intercept is

t = |a - 0| / s_a = 0.2281 / 0.1378 = 1.655

The critical t-value for 10 minus 2 or 8 degrees of freedom with alpha error of 0.05 (two-tailed) = 2.306

Conclusion: As 1.655 < 2.306, H₀ is not rejected at the 5% significance level, indicating that the calculated a-value is not significantly different from zero. In other words, there is insufficient evidence to claim that the intercept differs from zero by more than can be accounted for by the analytical errors. Hence, this linear regression can be allowed to pass through the origin.
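
The test can be reproduced with a few lines of Python. This is a minimal sketch using only the values quoted in this example (a, s_a and the tabulated critical t-value); the function name is hypothetical, and the critical value is typed in from the table rather than computed.

```python
def intercept_t_statistic(a, s_a):
    """t statistic for H0: intercept = 0, i.e. |a - 0| / s_a."""
    return abs(a) / s_a


# Values quoted in the example above.
a = -0.2281          # y-intercept
s_a = 0.1378         # standard deviation of the y-intercept
t_critical = 2.306   # two-tailed, alpha = 0.05, 8 degrees of freedom

t_calc = intercept_t_statistic(a, s_a)
reject_h0 = t_calc > t_critical

print(f"t = {t_calc:.3f}, critical t = {t_critical}")
if reject_h0:
    print("Intercept differs significantly from zero")
else:
    print("Intercept is not significantly different from zero")
```

For a different number of calibration points, the critical value must be looked up again for n - 2 degrees of freedom.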

Category:

Basic statistics

Tagged with:

Linear regression

Comments on: "Must linear regression always pass through its origin?" (11)

Everic Lee said:

August 22, 2020 at 7:09 pm

Thanks for your introduction. Could you please tell me whether there is any difference in uncertainty evaluation in the situations below:
(1) Single-point calibration (forcing through zero, just getting the linear equation without regression);
(2) Multi-point calibration (forcing through zero, with linear least squares fit);
(3) Multi-point calibration (no forcing through zero, with linear least squares fit).
Looking forward to your reply!
- Everic Lee said:
  
  August 22, 2020 at 9:49 pm
  
  In addition, interpolation is another similar case, which might be discussed together.
- GLP Consulting Singapore said:
  
  August 24, 2020 at 9:05 am
  
  The situations mentioned are bound to differ in their uncertainty estimation because of the differences in their respective gradients (or slopes).
  - Everic Lee said:
    
    August 24, 2020 at 11:20 am
    
    Sorry, maybe I did not express my concern very clearly.
    
    In situation (3), multi-point calibration (ordinary linear regression), we have an equation to calculate the uncertainty, as in your blog (Linear regression for calibration – Part 1). The slope, the intercept and the variation of Y all contribute to the uncertainty.
    For situation (2), the intercept is set to zero, so how should the intercept uncertainty be considered?
    For situation (1), with only one point measured multiple times and no regression, that equation is inapplicable. Should only the contribution of the variation of Y be considered? I think the assumption of a zero intercept may itself introduce uncertainty, so how should it be considered?
    For situation (4), interpolation, there is also no regression, so that equation is again inapplicable. How should the uncertainty be considered?
    I really appreciate your help!
GLP Consulting Singapore said:

August 24, 2020 at 9:21 pm

You are right. In situation (2), where the linear curve is forced through zero, there is no uncertainty for the y-intercept. For the zero y-intercept assumption to be correct, one must ensure that the reagent blank is used as the reference against the calibration standard solutions. If, say, a plain solvent or water is used in the reference cell of a UV-Visible spectrometer, then the reagent blank might show some absorbance and would serve as another point of calibration. I notice that some brands of spectrometer produce a calibration curve as y = bx without a y-intercept. This is because the reagent blank is supposed to be used in the reference cell instead.

One-point calibration in routine work is used to check whether the calibration curve prepared earlier is still reliable. So one has to ensure that the y-value of the one-point calibration falls within the +/- variation range of the curve as determined. In my opinion, we do not need to talk about the uncertainty of this one-point calibration.
- Everic Lee said:
  
  August 24, 2020 at 9:58 pm
  
  Thanks for your reply.
  
  For one-point calibration, it is indeed used for concentration determination in the Chinese Pharmacopoeia.
  
  In my opinion, an equation like “y=ax+b” is more reliable than “y=ax”, because the assumption of a zero intercept should carry some uncertainty, but I don’t know how to quantify it.
  
  Maybe one-point calibration is not a usual case in your experience, but I think you have gone deep into the uncertainty field, so would you please give me a direction for dealing with such a case?
  
  Thanks again.
GLP Consulting Singapore said:

August 25, 2020 at 9:06 am

One-point calibration is used when the concentration of the analyte in the sample is about the same as that of the calibration standard. In this case, the analyte concentration in the sample is calculated directly from the relative instrument responses. You may consider the following way to estimate the standard uncertainty of the analyte concentration without looking at the linear calibration regression:

Say the standard calibration concentration used for one-point calibration = c, with standard uncertainty = u(c). After going through the sample preparation procedure and instrumental analysis, the instrument response of this standard solution = R1, and the instrument repeatability standard uncertainty, expressed as a standard deviation, = u1.

Let the instrument response for the analyzed sample = R2 and its repeatability standard uncertainty = u2. The calculated analyte concentration is therefore Cs = (c/R1) x R2.

Then, if the standard uncertainty of Cs is u(s), u(s) can be calculated from the following equation, where SQ[ ] denotes the square of the bracketed term:

SQ[u(s)/Cs] = SQ[u(c)/c] + SQ[u1/R1] + SQ[u2/R2]
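
As a minimal sketch of this calculation in Python, the relative standard uncertainties are combined in quadrature and scaled by Cs. The function names and the numerical inputs are hypothetical, for illustration only.

```python
import math


def one_point_concentration(c, r1, r2):
    """Analyte concentration Cs = (c / R1) * R2."""
    return c / r1 * r2


def one_point_uncertainty(c, u_c, r1, u1, r2, u2):
    """Standard uncertainty u(s) of Cs from the three relative uncertainties."""
    cs = one_point_concentration(c, r1, r2)
    relative = math.sqrt((u_c / c) ** 2 + (u1 / r1) ** 2 + (u2 / r2) ** 2)
    return cs * relative


# Hypothetical inputs (not from the discussion above).
c, u_c = 10.0, 0.05      # standard concentration and its standard uncertainty
r1, u1 = 0.500, 0.004    # response of the standard and its repeatability
r2, u2 = 0.480, 0.004    # response of the sample and its repeatability

cs = one_point_concentration(c, r1, r2)
u_s = one_point_uncertainty(c, u_c, r1, u1, r2, u2)
print(f"Cs = {cs:.3f}, u(s) = {u_s:.3f}")
```

Note that this budget contains no term for a possible non-zero intercept; it only covers the standard concentration and the two instrument responses.
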
- Everic Lee said:
  
  August 25, 2020 at 12:39 pm
  
  Thanks for your detailed explanation.
  
  What if I want to compare the uncertainties from one-point calibration and linear regression? As I mentioned before, I think one-point calibration may have a larger uncertainty than linear regression, but some papers reached the opposite conclusion, using the same method you described above to evaluate the one-point calibration uncertainty.
  
  In one-point calibration, the uncertainty of the zero-intercept assumption was not considered, but the uncertainty of the standard calibration concentration was.
  In linear regression, the uncertainty of the standard calibration concentration was omitted, but the uncertainty of the intercept was considered.
  So it’s hard for me to tell which real uncertainty is larger.
  
  Two more questions:
  1. For one-point calibration, is there any way to account for the uncertainty of the zero-intercept assumption?
  2. For linear regression, can I simply combine the uncertainty of the standard calibration concentration with the uncertainty of the regression, as EURACHEM QUAM says? Negligible uncertainty in the standard concentrations is the basic assumption of linear least squares regression, so if that uncertainty is not negligible, I doubt whether linear least squares regression is still applicable. My knowledge does not go that deep, so maybe you could help me make it clear.
  
  Sorry to bother you so many times. Thanks!
  - GLP Consulting Singapore said:
    
    August 26, 2020 at 9:14 am
    
    Reply to your Paragraphs 2 and 3
    I think you may want to conduct a study comparing the average of the standard uncertainties of results obtained by one-point calibration against the average of those from linear regression, on the same sample of course. An F-test on the ratio of their variances will show whether the two variances are significantly different.
    
    Reply to your Paragraph 4
    1. For one-point calibration, one cannot be sure whether it has a zero intercept.
    2. Least squares regression makes the important assumption that the uncertainties of the standard concentrations used to plot the graph are negligible compared with the variations of the instrument responses (i.e. y-values).
- Everic Lee said:
  
  August 25, 2020 at 1:10 pm
  
  Another question not related to this topic:
  
  Is there any relationship between the factor d2 (typically 1.128 for n=2), used in control charts for ranges with the moving range to estimate the standard deviation (σ=R/d2), and the critical range factor f(n) in ISO 5725-6, used to calculate the critical range (CR=f(n)*σ)?
  I found they are linearly correlated, but I want to know why.
  
  Thanks again and again!
  - GLP Consulting Singapore said:
    
    August 26, 2020 at 10:10 am
    
    Both the control chart estimate of standard deviation based on the moving range and the critical range factor f in ISO 5725-6 assume the same underlying normal distribution.
    
    The critical range is usually fixed at 95% confidence, for which the normal-distribution factor is 1.96. For the difference between two test results, the combined standard deviation is sigma x SQRT(2). Therefore the critical range R = 1.96 x SQRT(2) x sigma, or 2.77 x sigma, which is the maximum bound of variation with 95% confidence.
    
    In a control chart, when we have a series of data, the first range is the second data point minus the first, the second range is the third data point minus the second, and so on. We can then calculate the mean of these moving ranges, say MR(Bar). The standard deviation of this set of data = MR(Bar)/1.128, where 1.128 is the d2 factor given in ISO 8258.
    
    If sigma is derived from this whole set of data, we then have R/2.77 = MR(Bar)/1.128. Therefore R = 2.46 x MR(Bar). So the critical range and the moving range are clearly related.
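
The arithmetic in this reply can be checked with a short Python sketch. The constants 1.96 and 1.128 are the ones quoted above; the function name and the idea of feeding it a plain list of results are assumptions made for illustration.

```python
import math

z_95 = 1.96   # two-sided 95% normal-distribution factor, as quoted above
d2 = 1.128    # d2 factor for n = 2, as quoted above

f_2 = z_95 * math.sqrt(2)   # critical range factor for two test results
ratio = f_2 / d2            # critical range expressed in units of MR(Bar)


def critical_range_from_moving_ranges(values):
    """Estimate sigma as MR(Bar)/d2, then return the critical range f_2 * sigma."""
    moving_ranges = [abs(later - earlier)
                     for earlier, later in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sigma = mr_bar / d2
    return f_2 * sigma


print(f"f(2) = {f_2:.2f}, R / MR(Bar) = {ratio:.2f}")
```

Both factors are fixed constants derived from the same normal distribution, so their ratio is a constant as well, which is why the two quantities appear linearly related.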
