### Must a linear regression line always pass through the origin?

A linear regression line describes the linear relationship between an independent variable (*x*), such as the concentration of working standards, and a dependent variable (*y*), such as the instrumental signal. It is represented by the equation *y* = *a* + *bx*, where *a* is the *y*-intercept (the value of *y* when *x* = 0) and *b* is the slope or gradient of the line. When the straight line passes through the origin (0, 0) of the graph, the intercept is zero and the slope is simply *y*/*x*. The questions are: when should you force the linear regression line through the origin? Why not let the intercept float naturally based on the best fit to the data? How can you justify this decision?

In theory, you would use a zero-intercept model if you knew that the line had to pass through the origin. The calculation software of most spectrophotometers produces an equation of the form *y* = *bx*, which assumes that the line passes through the origin. In my opinion, this assumption may hold only when the reference cell contains a reagent blank, rather than a pure solvent or distilled water blank, for background correction during calibration. However, we must also bear in mind that all instrumental measurements carry inherent analytical errors.
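To make the difference between the two models concrete, here is a minimal sketch of both least-squares fits in plain Python. The function names and the calibration data are hypothetical and serve only as an illustration; they are not taken from any instrument software.

```python
def fit_with_intercept(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def fit_through_origin(xs, ys):
    """Least-squares fit of y = b*x (intercept forced to zero); returns b."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical calibration data (concentration vs. signal), for illustration only
conc = [1.0, 2.0, 3.0, 4.0, 5.0]
signal = [0.11, 0.20, 0.31, 0.39, 0.52]

a, b = fit_with_intercept(conc, signal)
b0 = fit_through_origin(conc, signal)
print(f"y = {a:.4f} + {b:.4f}x   versus   y = {b0:.4f}x")
```

The two slopes generally differ whenever the fitted intercept is not exactly zero, which is why the choice of model needs a justification.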

One approach to evaluating whether the *y*-intercept, *a*, is statistically significant is to conduct a hypothesis test using Student's *t*-test. This is illustrated in the example below.

Another approach is to use an *F*-test to evaluate whether the standard deviation of the slope for *y* = *a* + *bx* differs significantly from that of the slope for *y* = *bx* (where *a* = 0).

**Example:**

In a study on the determination of calcium oxide in a magnesite material, Hazel and Eglog reported the following results in an *Analytical Chemistry* article for the alcohol method they developed:

The graph below shows the linear relationship between the mg of CaO taken and the mg of CaO found experimentally, with the equation *y* = -0.2281 + 0.99476*x* for 10 sets of data points.

The following equations were applied to calculate the various statistical parameters:
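The equations themselves do not appear in this text. Assuming the standard unweighted least-squares expressions were used (an assumption, since the originals are not shown), they take the form below, where *n* is the number of data points, x̄ and ȳ are the means of the *x* and *y* values, and ŷ_i is the fitted value *a* + *bx*_i:

```latex
\begin{align*}
b &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
&
a &= \bar{y} - b\,\bar{x}, \\[4pt]
s_{y/x} &= \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}},
&
s_b &= \frac{s_{y/x}}{\sqrt{\sum_i (x_i - \bar{x})^2}}, \\[4pt]
s_a &= s_{y/x}\,\sqrt{\frac{\sum_i x_i^2}{n \sum_i (x_i - \bar{x})^2}}.
\end{align*}
```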

Thus, by calculation, we have *a* = -0.2281; *b* = 0.9948; the standard error of *y* on *x*, *s*_{y/x} = 0.2067; and the standard deviation of the *y*-intercept, *s*_{a} = 0.1378.

Let’s conduct a hypothesis test with null hypothesis H_{o} and alternative hypothesis H_{1}:

H_{o}: the intercept *a* equals zero

H_{1}: the intercept *a* does not equal zero

The Student’s *t*-test statistic for the *y*-intercept is *t* = |*a* - 0| / *s*_{a} = 0.2281 / 0.1378 = 1.655.

The critical *t*-value for 10 - 2 = 8 degrees of freedom at a significance level of 0.05 (two-tailed) is 2.306.

**Conclusion**: As 1.655 < 2.306, H_{o} is not rejected at the 5% significance level (95% confidence), indicating that the calculated value of *a* is not significantly different from zero. In other words, there is insufficient evidence to claim that the intercept differs from zero by more than can be accounted for by the analytical errors. Hence, this linear regression line can be allowed to pass through the origin.
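The arithmetic of this test can be checked with a short sketch. It uses only the values of *a*, *s*_{a}, the number of data points, and the tabulated critical *t*-value quoted in the example; the function and variable names are illustrative.

```python
def intercept_t_statistic(a, s_a):
    """t statistic for testing H0: intercept = 0, i.e. |a - 0| / s_a."""
    return abs(a - 0.0) / s_a


# Values quoted in the worked example
a = -0.2281      # fitted y-intercept
s_a = 0.1378     # standard deviation of the y-intercept
n = 10           # number of data points
t_crit = 2.306   # tabulated two-tailed critical t, alpha = 0.05, n - 2 = 8 degrees of freedom

t_stat = intercept_t_statistic(a, s_a)
reject_h0 = t_stat > t_crit

print(f"t = {t_stat:.3f}, df = {n - 2}, reject H0: {reject_h0}")
# prints: t = 1.655, df = 8, reject H0: False
```

The critical value is hard-coded from the text rather than computed, so a different data set size or significance level would need a fresh lookup in a *t*-table.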
