## Training and consultancy for testing laboratories.

### Linear regression for calibration – Part 1

Most analytical methods require us to determine the content of a constituent (the analyte) in a given sample as a concentration. This is often expressed as a percentage: the ratio of the amount of analyte to the weight of sample taken for analysis, multiplied by 100. For trace levels, we may express the result in mg/kg, mg/L or even µg/L and pg/L, depending on how low the level is.
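In symbols, with the usual conversion between percentage and trace-level units (the symbols $m_{\text{analyte}}$ and $m_{\text{sample}}$ are notation introduced here for illustration):

$$
c\ (\%,\ m/m) = \frac{m_{\text{analyte}}}{m_{\text{sample}}} \times 100
$$

$$
1\ \%\ (m/m) = 10\ \text{g/kg} = 10\,000\ \text{mg/kg}, \qquad 1\ \text{mg/kg} = 1000\ \mu\text{g/kg}
$$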

If w1, w2, …, wn are the weights of a series of samples and y1, y2, …, yn the corresponding measurements made on these samples, we usually calculate the ratios:

$$r_i = \frac{y_i}{w_i}, \qquad i = 1, 2, \dots, n$$

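The average of these ratios is then taken to represent the data. But we must remember that all measurements of y carry the same constant error or bias, say a, so that each recorded value is effectively y_i + a, where y_i is the bias-free value. The ratios then become:

$$r_i' = \frac{y_i + a}{w_i}, \qquad i = 1, 2, \dots, n$$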

We would expect the effect of a constant error in y on the value of the ratio to depend on the magnitudes of w and y. We can avoid, or rather conceal, this disturbance in the ratio, r, by taking approximately the same weight of every sample for analysis. If the samples do cover an appreciable range of weights, the variation among the values of the ratio not only reflects the inevitable random errors in y and w, but also depends on the weight of sample taken.
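Splitting the biased ratio into two terms makes this dependence explicit, with w_i the sample weight, y_i the bias-free measurement and a the constant bias:

$$
r_i' = \frac{y_i + a}{w_i} = \frac{y_i}{w_i} + \frac{a}{w_i}
$$

The second term, a/w_i, shrinks as the sample weight grows. When all the w_i are equal it is the same for every sample, so it shifts the average ratio without showing up as a trend: the bias is concealed, not removed.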

Indeed, checking whether the results depend on the weight of sample taken is one of the important criteria in a method development process.

Let’s assume we take test portions of several weights (0.5 g, 1.0 g, 2.5 g, 5.0 g, 7.5 g and 10.0 g) of a certified reference fertilizer material containing 15.0% (m/m) K2O for analysis. The absolute amounts of K2O present in these test portions are thus 0.075 g, 0.15 g, 0.375 g, 0.75 g, 1.125 g and 1.50 g, respectively. Upon analysis, the following results were obtained:

Figure 1 below shows the relationship between several paired values of the certified amount (x) and the analyzed value (y) graphically:

If there is no constant error in the analytical process, the points should lie closely along a line passing through the origin. But if there is an unavoidable constant error or bias, all the points are displaced upwards or downwards by the same amount. The point where the line intercepts the y-axis corresponds to the ‘blank’ (i.e. zero analyte). Hence, the slope (or gradient) of the line, which shows the change in y for a unit change in x, is independent of the presence of a bias, and this gives it an advantage over the ratio, r.
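The argument can be checked numerically. The sketch below is a hypothetical simulation, not the analysis results of the example. It reuses the example's sample weights and 15.0 % (m/m) K2O content, adds an assumed constant bias of +0.02 g to every "analysed" amount, and compares the per-sample ratios with the slope of an ordinary least-squares line. For simplicity it regresses y on the sample weight w, so that the slope is directly comparable with the ratio y/w. Plotting against the certified amount instead, as in Figure 1, would give a slope of 1 under the same assumption. The helper `ols_fit` is a hypothetical name written for this sketch.

```python
# Hypothetical illustration: the weights and the 15.0 % (m/m) K2O content come
# from the example; the constant bias of +0.02 g is an assumed value.

WEIGHTS_G = [0.5, 1.0, 2.5, 5.0, 7.5, 10.0]   # sample weights taken, g
K2O_FRACTION = 0.150                          # certified content, 15.0 % (m/m)
BIAS_G = 0.02                                 # assumed constant error in y, g


def ols_fit(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Certified (true) amount of K2O in each test portion, g
certified = [w * K2O_FRACTION for w in WEIGHTS_G]
# Simulated "analysed" amounts: true amount plus the same bias for every sample
analysed = [x + BIAS_G for x in certified]

# Ratio approach: % K2O = y / w * 100 for each sample
ratios_pct = [y / w * 100 for y, w in zip(analysed, WEIGHTS_G)]

# Regression approach: analysed amount (y) against sample weight (w)
slope, intercept = ols_fit(WEIGHTS_G, analysed)

for w, r in zip(WEIGHTS_G, ratios_pct):
    print(f"w = {w:5.1f} g  ->  ratio = {r:.2f} % K2O")
print(f"slope = {slope:.4f} (i.e. {slope * 100:.2f} % K2O), intercept = {intercept:.4f} g")
```

With these assumed values the ratio drifts from 19.0 % at 0.5 g down to 15.2 % at 10.0 g, while the slope stays at 0.150 (15.0 %) and the intercept recovers the 0.02 g bias.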

So, to take advantage of this, we should consider using the slope of the line as an alternative means of combining the results of several determinations, provided the relationship remains linear over a reasonable range. This is also why the samples must cover a range of weights: if they all had the same weight, no information would be forthcoming about the slope, and we would not be able to detect the presence of a constant error.

The same reasoning shows the advantage of linear regression in instrument calibration, where we plot the instrument signals against the concentrations of a series of working standards. By doing so, we are able to estimate the slope and y-intercept of the calibration curve together with their errors, as well as the concentrations of sample solutions obtained from the calibration.
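As a rough sketch of what such software computes, the Python below fits a calibration line by OLS, estimates the standard errors of the slope and intercept, and reads the concentration of a sample back from the line. The standards, signals and sample reading are invented numbers, and the function names `calibrate` and `inverse_predict` are hypothetical. The error formulas are the common textbook ones, based on the residual standard deviation s(y/x) with n - 2 degrees of freedom.

```python
import math

# Hypothetical working standards (mg/L) and instrument signals (absorbance);
# these numbers are invented for illustration only.
conc_std = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
signal_std = [0.003, 0.102, 0.198, 0.305, 0.398, 0.502]


def calibrate(x, y):
    """OLS calibration line y = a + b*x with the usual error estimates."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = y_mean - b * x_mean            # y-intercept (the 'blank' signal)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation s(y/x), with n - 2 degrees of freedom
    s_yx = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    s_b = s_yx / math.sqrt(sxx)                                   # std error of slope
    s_a = s_yx * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))  # std error of intercept
    return {"a": a, "b": b, "s_yx": s_yx, "s_a": s_a, "s_b": s_b,
            "n": n, "y_mean": y_mean, "sxx": sxx, "residuals": residuals}


def inverse_predict(cal, y0, m=1):
    """Concentration x0 for a mean sample signal y0 (from m replicate readings)
    and its approximate standard deviation s(x0)."""
    a, b = cal["a"], cal["b"]
    x0 = (y0 - a) / b
    s_x0 = (cal["s_yx"] / b) * math.sqrt(
        1 / m + 1 / cal["n"] + (y0 - cal["y_mean"]) ** 2 / (b ** 2 * cal["sxx"])
    )
    return x0, s_x0


cal = calibrate(conc_std, signal_std)
x0, s_x0 = inverse_predict(cal, y0=0.250)   # hypothetical sample signal
print(f"slope b = {cal['b']:.5f} +/- {cal['s_b']:.5f}")
print(f"intercept a = {cal['a']:.5f} +/- {cal['s_a']:.5f}")
print(f"sample concentration x0 = {x0:.3f} +/- {s_x0:.3f} mg/L")
```

Here m is the number of replicate readings averaged to give the sample signal y0; more replicates shrink s(x0), but never below the contribution from the calibration line itself.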

To start with, we apply the Ordinary Least Squares (OLS) formulae to fit a straight line through a series of points on the graph. The line must lie at an ‘equal distance’ among all the points, in the sense that the sum of their vertical ‘distances’ from it, taken with their positive and negative signs, is zero. Here, the distance (residual) means the difference between the experimental y-value and the y-value calculated from the linear equation. Of all the lines that could be drawn, the least-squares line is the one for which the sum of the squares of these differences (deviations) is a minimum.
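For reference, minimizing the sum of squared residuals leads to the standard closed-form estimates of the slope b and intercept a of the fitted line $\hat{y} = a + bx$ (in the bias model discussed earlier, this intercept estimates the constant error):

$$
b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}
$$

$$
\sum_{i=1}^{n}(y_i-\hat{y}_i) = 0, \qquad \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = \text{minimum}
$$

The first condition is the ‘equal distance’ property; it holds automatically whenever the line is fitted with an intercept.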

It must be emphasized that an important, and usually valid, assumption underlies such a regression line: the values recorded on the x-axis, the sample weights in this case, are known exactly, without error. In instrument calibration, it is likewise assumed that the working standard concentrations on the x-axis have negligible error. In practice, it is sufficient that the errors in x are small compared with the experimental errors in y (which carry the random variation from all the steps of the procedure), and that the x-values cover an adequate range.

Software such as Microsoft Excel or the R language will perform all these calculations, whereas most scientific calculators will not be adequate.

(to be continued in Part 2)
