## Training and consultancy for testing laboratories.

### Are your linear regression data homoscedastic or heteroscedastic?

In instrumental analysis, there must be a measurement model: an equation that relates the quantities to be measured to the instrument response, such as absorbance, transmittance, peak area, peak height, potential or current. From this model we can then derive the calibration equation.

It is our usual practice to design the experiment so that the concentration of the measurand in the standards and the instrument response follow a simple linear relationship, i.e.,

$$
y = a + bx \qquad [1]
$$

where:

- $y$ is the indication of the instrument (i.e., the instrument response);
- $x$ is the independent variable (for our purposes, mostly the concentration of the measurand); and
- $a$ and $b$ are the coefficients of the model, known as the intercept and the slope (or gradient) of the line, respectively.
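Once $a$ and $b$ have been estimated, Equation [1] is simply rearranged into the calibration equation that converts the measured response $y_0$ of a sample into its concentration $x_0$ (the symbols $x_0$ and $y_0$ are introduced here for illustration):

$$
x_0 = \frac{y_0 - a}{b}
$$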

Therefore, for a series of $x_i$ values we obtain the corresponding instrument responses, $y_i$, and we then fit the model of Equation [1] to these data.

As usual, any particular instrumental measurement $y_i$ is subject to a measurement error, $e_i$, that is,

$$
y_i = a + bx_i + e_i \qquad [2]
$$

To obtain this linear model, we have to find the line that best fits the data points obtained experimentally. We use the ordinary least squares (OLS) approach, which chooses the model parameters that minimize the residual sum of squares (RSS) between the predicted and the experimental y values. A residual (sometimes called an error) here is the difference between the experimental $y_i$ value and the $y_i$ value predicted by the fitted equation.
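As a sketch of the OLS calculation (not necessarily the routine any particular instrument software uses), the slope and intercept follow from the standard closed-form expressions $b = \sum(x_i - \bar{x})(y_i - \bar{y}) / \sum(x_i - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$. The function name `ols_fit` and the calibration values below are hypothetical, for illustration only.

```python
from statistics import fmean


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Unweighted least-squares fit of y = a + b*x; returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired (x, y) points")
    x_bar, y_bar = fmean(x), fmean(y)
    # Slope: sum of cross-products about the centroid divided by the
    # sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # The OLS line passes through the centroid (x_bar, y_bar)
    a = y_bar - b * x_bar
    return a, b


# Hypothetical standards (concentration) and instrument responses
conc = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
resp = [0.009, 0.158, 0.301, 0.472, 0.577, 0.739]
a, b = ols_fit(conc, resp)
print(f"intercept a = {a:.4f}, slope b = {b:.4f}")
```

These are the values of $a$ and $b$ that make the residual sum of squares as small as possible; any other straight line through the same points gives a larger RSS.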

So, if the linear model is correct, the residuals of all the points $(x_i, y_i)$ on the plot, positive and negative, should add up arithmetically to zero.

It must be stressed, however, that for the above statement to hold we make an important assumption: the uncertainty in the independent variable, $x_i$, is very much smaller than that in the instrument response. Only one error term, $e_i$ in $y_i$, is therefore considered, because the uncertainty in $x_i$ is small enough to be neglected. This assumption is valid for most of our laboratory analytical work, and it greatly simplifies the estimation of measurement error.

What is another important assumption made in this OLS method?

It is that the data are homoscedastic, which means that the errors in y are assumed to be independent of the concentration. In other words, the variance of y remains constant from one $x_i$ value to the next over the range of x values studied. This also means that all the points carry equal weight when the slope and intercept of the line are calculated. The following plot illustrates this important point.

However, in many of our chemical analyses this assumption is not likely to be valid. Many data sets are in fact heteroscedastic, i.e., the standard deviation of the y-values increases with the concentration of the analyte instead of having a constant value at all concentrations. In other words, the errors are approximately proportional to the analyte concentration, and we often find that their relative standard deviations (standard deviations divided by the mean values) are roughly constant. The following plot illustrates this particular scenario.
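Whether a data set looks more homoscedastic or heteroscedastic can be judged from replicate measurements at each standard: compute the standard deviation (SD) and the relative standard deviation (RSD) at every level and see which of the two stays roughly constant. The sketch below assumes replicate responses are available for each concentration; the function name `spread_by_level` and the replicate values are hypothetical.

```python
from statistics import mean, stdev


def spread_by_level(replicates: dict[float, list[float]]) -> list[tuple[float, float, float]]:
    """Return (concentration, SD, RSD in %) for each standard, sorted by concentration.

    Each standard needs at least two replicate responses with a non-zero mean.
    """
    rows = []
    for conc, ys in sorted(replicates.items()):
        m = mean(ys)
        s = stdev(ys)  # sample standard deviation (n - 1 denominator)
        rows.append((conc, s, 100.0 * s / m))
    return rows


# Hypothetical replicate responses at each standard concentration
replicates = {
    2.0: [0.157, 0.160, 0.154],
    4.0: [0.306, 0.298, 0.311],
    6.0: [0.462, 0.478, 0.455],
    8.0: [0.621, 0.598, 0.630],
}

for conc, s, rsd in spread_by_level(replicates):
    print(f"x = {conc:4.1f}  SD = {s:.4f}  RSD = {rsd:.1f} %")
```

If the SD column grows with x while the RSD column stays roughly level, the data behave heteroscedastically in the way described above; a roughly constant SD points to homoscedastic data. With only a few replicates per level these estimates are themselves uncertain, so treat the comparison as indicative.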

In this case, weighted regression should be applied. The regression line must be calculated so as to give additional weight to the points where the errors are smallest, i.e., it is more important for the calculated line to pass close to such points than to pass close to the points at higher concentrations, which carry the largest errors.

This is achieved by giving each point a weighting inversely proportional to the corresponding y-direction variance, $s_i^2$. Without going through the details of the calculations, which are more tedious and complex than those of the unweighted method, suffice it to say that in instrumental calibration, where the experimental points normally fit a straight line very well, the slope (b) and y-intercept (a) of the weighted line are remarkably similar to those of the unweighted line. The two approaches therefore give very similar values for the concentrations of samples within the linear range of the calibration line.

So, does this mean that, on the face of it, weighted regression calculations have little value to us?

The answer is no.

Although weighted regression gives concentrations very similar to those obtained from the simpler unweighted method, its value lies in giving more realistic estimates of the errors, or confidence limits, of the sample concentrations under study. It can be shown by calculation that weighted regression gives narrower confidence limits at low concentrations, and that the confidence limits widen as the instrument signal, such as absorbance, increases. The general form of the confidence limits for a concentration determined using a weighted regression line is shown in the sketch below:
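A sketch of how the standard deviation of a concentration read from a weighted line can be estimated, using the expression given in common chemometrics texts: $s_{x_0 w} = \frac{s_{(y/x)w}}{b}\sqrt{\frac{1}{w_0} + \frac{1}{n} + \frac{(y_0 - \bar{y}_w)^2}{b^2 \sum_i w_i (x_i - \bar{x}_w)^2}}$, where $s_{(y/x)w} = \sqrt{\sum_i w_i (y_i - \hat{y}_i)^2 / (n-2)}$ and $w_0$ is the weight appropriate to the sample response $y_0$. The sketch assumes the analyst supplies $s_0$, the response SD at $y_0$ (for example interpolated from the standards), and takes the t value for $n - 2$ degrees of freedom from tables; the function name `weighted_x0` and all data values are hypothetical.

```python
import math


def weighted_x0(x, y, s, y0, s0):
    """Concentration x0 for response y0 and its standard deviation from a weighted line.

    s are the response SDs of the standards; s0 is the (assumed known) response SD
    at y0. Weights are 1/s**2 rescaled to sum to n. Returns (x0, s_x0).
    """
    n = len(x)
    if not (n == len(y) == len(s)) or n < 3:
        raise ValueError("need at least three standards with matching x, y, s")
    inv_var = [1.0 / si ** 2 for si in s]
    scale = n / sum(inv_var)
    w = [scale * v for v in inv_var]
    w0 = scale / s0 ** 2  # weight for the sample response, same scaling
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / n
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / n
    sxx = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    b = sum(wi * (xi - x_w) * (yi - y_w) for wi, xi, yi in zip(w, x, y)) / sxx
    a = y_w - b * x_w
    # Weighted residual standard deviation s_(y/x)w with n - 2 degrees of freedom
    ss_res = sum(wi * (yi - (a + b * xi)) ** 2 for wi, xi, yi in zip(w, x, y))
    s_yx_w = math.sqrt(ss_res / (n - 2))
    x0 = (y0 - a) / b
    s_x0 = (s_yx_w / b) * math.sqrt(1 / w0 + 1 / n + (y0 - y_w) ** 2 / (b ** 2 * sxx))
    return x0, s_x0


# Hypothetical standards and response SDs (illustration only)
conc = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
resp = [0.009, 0.158, 0.301, 0.472, 0.577, 0.739]
sd = [0.001, 0.004, 0.010, 0.013, 0.017, 0.022]
t_95 = 2.776  # two-sided 95 % t value for n - 2 = 4 degrees of freedom

for y0, s0 in [(0.10, 0.003), (0.60, 0.018)]:
    x0, s_x0 = weighted_x0(conc, resp, sd, y0, s0)
    print(f"y0 = {y0:.2f}: x0 = {x0:.3f} +/- {t_95 * s_x0:.3f}")
```

Both the $1/w_0$ term and the $(y_0 - \bar{y}_w)^2$ term grow for large responses, because $\bar{y}_w$ is pulled towards the heavily weighted low standards; that is why the limits widen with increasing signal.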

These observations emphasize the particular importance of using weighted regression when the results of interest include those at low concentrations.  Similarly, detection limits may be more realistically assessed using the intercept and standard deviation obtained from a weighted regression graph.
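One widely used convention (an assumption here, since the criterion is not specified above) takes the limit of detection as the response lying three standard deviations of the blank above the blank signal. Taking the intercept $a$ of the weighted line as the blank signal and $s_B$ as the standard deviation of the blank response gives:

$$
y_{\mathrm{LOD}} = a + 3s_B, \qquad x_{\mathrm{LOD}} = \frac{y_{\mathrm{LOD}} - a}{b} = \frac{3s_B}{b}
$$

How $s_B$ is best estimated from the weighted fit, for example from the response standard deviation at zero concentration, remains a judgment for the analyst.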
