Training and consultancy for testing laboratories.

Sharing my PPT presented at one of the Webinars.

A linear regression line describing the linear relationship between an independent variable x (such as the concentrations of working standards) and a dependent variable y (such as instrumental signals) is represented by the equation y = a + bx, where a is the y-intercept when x = 0 and b is the slope or gradient of the line. When the straight line passes through the origin (0,0) of the graph, the intercept is zero and the slope becomes y/x.  The questions are: when do you allow the linear regression line to pass through the origin? Why not let the intercept float naturally based on the best fit to the data? How can you justify this decision?

In theory, you would use a zero-intercept model only if you knew that the line had to go through zero. Most spectrophotometer calculation software produces an equation of the form y = bx, assuming the line passes through the origin. In my opinion, this might be true only when the reference cell is filled with a reagent blank, instead of a pure solvent or distilled water blank, for background correction in the calibration process.  However, we must also bear in mind that all instrument measurements have inherent analytical errors as well.

One approach to evaluating whether the y-intercept, a, is statistically significant is to conduct a hypothesis test using Student’s t-test.  This is illustrated in an example below.

Another approach is to use an F-test to evaluate any significant difference between the standard deviation of the slope for y = a + bx and that of the slope for y = bx when a = 0.

Example:

In a study on the determination of calcium oxide in a magnesite material, Hazel and Eglog in an Analytical Chemistry article reported the following results with their alcohol method developed:

The graph below shows the linear relationship between the mg CaO taken and the mg CaO found experimentally, with the equation y = -0.2281 + 0.99476x for 10 sets of data points.

The following equations were applied to calculate the various statistical parameters:

Thus, by calculation, we have a = -0.2281; b = 0.9948; the standard error of y on x, sy/x = 0.2067; and the standard deviation of the y-intercept, sa = 0.1378.

Let’s conduct a hypothesis test with null hypothesis Ho and alternative hypothesis H1:

                Ho :  Intercept a equals zero

                H1 :  Intercept a does not equal zero

The Student’s t-test statistic for the y-intercept is:

                t = |a - 0| / sa = 0.2281 / 0.1378 = 1.655

The critical t-value for 10 minus 2, or 8, degrees of freedom at an alpha error of 0.05 (two-tailed) is 2.306.

Conclusion:  As 1.655 < 2.306, Ho is not rejected at the 95% confidence level, indicating that the calculated a-value is not significantly different from zero.  In other words, there is insufficient evidence to claim that the intercept differs from zero by more than can be accounted for by the analytical errors.  Hence, this linear regression can be allowed to pass through the origin.
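The comparison above can be checked with a few lines of Python. This is only a sketch using the values quoted in the text; the statistic is taken as |a|/sa, which reproduces the 1.655 quoted in the conclusion, and the variable names are my own.

```python
# Significance test of the y-intercept, using the values quoted in the text.
a = -0.2281      # fitted y-intercept
s_a = 0.1378     # standard deviation of the y-intercept
t_crit = 2.306   # two-tailed critical t, alpha = 0.05, 10 - 2 = 8 degrees of freedom

t_stat = abs(a - 0.0) / s_a      # test statistic for Ho: intercept equals zero
reject_h0 = t_stat > t_crit      # True would mean the intercept differs from zero

print(f"t = {t_stat:.3f}, critical t = {t_crit}, reject Ho: {reject_h0}")
```

Because the statistic stays below the critical value, the script reports that Ho is not rejected, in line with the conclusion above.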

Equations (7) and (8) stated in the previous Part 1 article carry an important message: the closer the measured value of y is to the mean value of y, the narrower the confidence limits become, approaching a minimum at the mean. Hence, in practice, a calibration experiment of this type will give the most precise results when the measured instrument signal corresponds to a point close to the centroid of the regression line. 

This implies that we should always aim to prepare a calibration curve whose range places the signal of the sample solution near the centroid of the regression line, for better precision.   This can be illustrated in the following worked example.

A series of standard aqueous solutions of fluorescein was used to calibrate a fluorescence spectrometer. The following fluorescence intensities (in arbitrary units) were obtained:

The least-squares regression equation was found to be y = 1.518 + 1.930x and its calibration graph plotted in Figure 1 is shown below:

The important parameter for estimating the confidence intervals, i.e. the standard error of y on x, sy/x, was found to be 0.433, with the number of points, n, being 7.  The concentrations (x-values) are easily calculated by substituting the various intensities (y-values) into the regression equation, and their corresponding confidence intervals are calculated using equation (7) as follows:

The changes of magnitude in confidence intervals against the calculated concentrations can be visualized by the following Figures 2 and 3:

It is thus obvious from both graphs that the confidence intervals become narrowest (+/- 0.62 pg/ml) at the centroid of the calibration curve, indicating a better measurement precision at this point.

One approach to improving the confidence limits in this calibration experiment is to increase n, the number of calibration points on the regression line. Another is to make more than one measurement of the y-value, i.e. to use equation (8) with R greater than 1. Of course, we need to weigh the benefits of making more replicate measurements (assuming that sufficient sample is available) or of increasing the number of calibration points against the additional costs and time involved.
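As a rough check of the +/- 0.62 pg/ml quoted for the centroid, the sketch below uses only the summary values given above (sy/x = 0.433, slope 1.930, n = 7). It assumes that equations (7) and (8) of Part 1, which are not reproduced here, are the usual textbook expressions in which the (y0 - ymean) term vanishes at the centroid. The critical t-value of 2.571 for 5 degrees of freedom comes from standard tables, and the name `replicates` is my own, standing in for the number of repeat readings of the sample.

```python
import math

# Summary values quoted in the text for the fluorescein calibration.
s_yx = 0.433     # standard error of y on x
b = 1.930        # slope of the regression line
n = 7            # number of calibration points
t_crit = 2.571   # two-tailed critical t, alpha = 0.05, n - 2 = 5 degrees of freedom

def half_width_at_centroid(replicates=1):
    """Half-width of the confidence interval for x0 when y0 equals the mean of y.

    At the centroid the (y0 - y_mean) term is zero, so only the
    1/replicates and 1/n terms remain under the square root.
    """
    return t_crit * (s_yx / b) * math.sqrt(1.0 / replicates + 1.0 / n)

single = half_width_at_centroid()                  # one reading of the sample
quadruple = half_width_at_centroid(replicates=4)   # mean of four readings

print(f"1 reading : +/- {single:.2f} pg/ml")
print(f"4 readings: +/- {quadruple:.2f} pg/ml")
```

The single-reading case reproduces the 0.62 pg/ml figure, and averaging four readings narrows the interval further, which is the trade-off between precision and effort discussed above.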

Most analytical methods are required to determine the content of a constituent (analyte) in a given sample in terms of concentration, expressed as a percentage by calculating the ratio of the amount of analyte to the weight of sample taken for analysis and multiplying by 100.  For trace levels, we may calculate and express the result in mg/kg, mg/L or even ug/L and pg/L, depending on how low the level is.

If w1, w2, …, wn are the weights of a series of samples and y1,y2, …, yn the corresponding measurements made on these samples, we usually calculate the ratios:

The average of these ratios is taken to represent the data collated. But we must remember that all measurements of y may carry a constant error or bias, say, a.  Hence, the ratios then become

We would expect the effect of a constant error in the measurement of y on the value of the ratio to depend upon the magnitudes of w and y.  We can avoid, or rather conceal, this disturbance in the ratio value, r, by taking approximately the same weight of every sample for analysis.  If the samples do cover an appreciable range of weights, the variation among the several values of the ratio not only reflects the inevitable random errors in y and w but is also dependent upon the weight of sample taken.

Indeed, this is one of the important criteria in a method development process.

Let’s assume we take several weights (0.5g, 1.0g, 2.5g, 5.0g, 7.5g and 10.0g) of a certified reference fertilizer material containing 15.0% (m/m) K2O content for analysis. The absolute amounts of K2O present in these test samples are thus 0.075g, 0.15g, 0.375g, 0.75g, 1.125g and 1.50g, respectively. Upon analysis, the following results were obtained:

Figure 1 below shows the relationship between several paired values of the certified amount (x) and the analyzed value (y) graphically:

It is obvious that if there is no constant error in the analytical process, the various points should tend to lie closely along a line passing through the origin.  But if there is an unavoidable constant error or bias, all the points are displaced upwards or downwards by the same amount.  The point where the line intercepts the y-axis corresponds to the ‘blank’ (i.e. zero analyte).  Hence, the slope (or gradient) of the line, which shows the change in y for a unit change in x, is independent of the presence of a bias and has an advantage over the ratio, r.

So, to take advantage of this, we should consider using the slope of the line as an alternative means of combining the results of several determinations, provided the samples cover a reasonable range of weights and the response remains linear over that range. If all samples have the same weight, no information is forthcoming about the slope and we would not be able to detect the presence of a constant error.
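The contrast between the ratio and the slope can be sketched numerically. The certified amounts below are those of the K2O example; the analysed values are simulated, assuming a perfect response plus a hypothetical constant bias of 0.02 g, so they are not the results of the study.

```python
# Certified K2O amounts from the example (g); the bias is a hypothetical value.
amounts = [0.075, 0.15, 0.375, 0.75, 1.125, 1.50]
bias = 0.02                                   # assumed constant error (g)
found = [x + bias for x in amounts]           # simulated analysed values (g)

# Ratio approach: the same bias distorts each ratio differently, depending on x.
ratios = [y / x for x, y in zip(amounts, found)]

# Slope approach: least-squares slope of found amount against certified amount.
n = len(amounts)
x_mean = sum(amounts) / n
y_mean = sum(found) / n
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(amounts, found))
s_xx = sum((x - x_mean) ** 2 for x in amounts)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean           # recovers the constant bias

print("ratios   :", [round(r, 3) for r in ratios])
print("slope    :", round(slope, 3))
print("intercept:", round(intercept, 3))
```

In this simulation the ratios drift from about 1.27 for the smallest sample towards 1.01 for the largest, while the slope stays at 1 and the bias appears separately as the intercept.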

Hence, we also see the advantage of linear regression in instrument calibration process where we plot the instrument signals against various concentrations of working standards.  By so doing, we are able to estimate the errors of the slope & the y-intercept of the calibration curve, as well as the concentrations of sample solutions obtained from the calibration.

To start with, we apply the Ordinary Least Squares (OLS) formulae for fitting a straight line to a series of points on the graph. The fitted line lies ‘evenly’ among all the points, in the sense that the sum of their ‘distances’ from it, with positive and negative signs, is zero.  Here, the distance means the difference between the experimental y-value and the y-value calculated from the linear equation.  When we square these differences (i.e. deviations), their sum is smaller than for any other straight line through the data.
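A minimal sketch of these two properties is given below. The calibration data are hypothetical and the function names are my own; the fit uses the usual OLS formulae for the slope and intercept.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    return a, b

def residuals(xs, ys, a, b):
    """Experimental y minus the y calculated from the line, for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical calibration data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
sum_res = sum(res)                     # signed deviations cancel out
ssr = sum(r ** 2 for r in res)         # sum of squared deviations for the fitted line

# A line with a slightly different slope gives a larger sum of squares.
ssr_other = sum(r ** 2 for r in residuals(xs, ys, a, b + 0.05))

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"sum of residuals = {sum_res:.1e}")
print(f"sum of squares: fitted {ssr:.4f}, perturbed {ssr_other:.4f}")
```

The signed residuals sum to zero within floating-point rounding, and changing the slope increases the sum of squares, which is what ‘least squares’ means.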

It must be emphasized that an important assumption has been made for such a regression line: the values recorded on the x-axis, sample weights in this case, are known exactly without error. For instrument calibration, it is assumed that the working standard concentrations on the x-axis have negligible error. In practice, it is sufficient that the errors in x are small compared with the experimental errors in y (which have all the procedure steps as a source of random variation), and that the x’s cover an adequate range.

A simple computer program such as Microsoft Excel or the R language will perform all these calculations, but most scientific calculators will not be adequate.

(to be continued in Part 2)

To determine the concentration of copper in treated mine waste water samples by atomic absorption spectrometry, we can prepare a series of aqueous solutions containing a pure copper salt to calibrate the spectrometer and then use the resulting calibration graph in the determination of the copper in the test samples.

This approach is valid only if a pure aqueous solution of copper and a waste water sample containing the same concentration of copper give similar absorbance values. In other words, by doing so we are assuming that there is no reduction or enhancement of the copper absorbance signal by other constituents present in the test sample. In many areas of analysis, this assumption is not always true.  Matrix effects can have a significant influence on the final answer, even with methods such as plasma spectrometry (say ICP-AES), which is widely known for being relatively free from interferences.

These are so-called proportional effects, as they are normally proportional to the analyte signal, resulting in a change of the slope of the calibration curve.

One way to overcome this is to prepare the calibration standards in a matrix that is similar to the test sample but free of the targeted analyte, by adding known amounts of a copper salt to it, as in this discussion.  However, this matrix-matching approach is often impractical. It will not eliminate matrix effects that differ in magnitude from one sample to another, and it may not even be possible to obtain a sample of the copper mine waste water matrix that contains no such analyte.

So, a better solution to this problem is for all the analytical measurements, including the establishment of the calibration graph, to be performed in some way using the sample itself.  Hence, the method of standard additions is proposed.  It is widely practiced in atomic-absorption and emission spectrometry, and has also been applied in electrochemical analysis and many other areas.

The method of standard additions involves taking six or more equal volumes of the sample solution, ‘spiking’ them individually with known and different amounts of the analyte, and diluting all to the same volume. The instrument signals are then determined for all these solutions and the results are plotted as a linear calibration graph, with the signals on the y-axis and the x-axis graduated in terms of the amounts of analyte added (either as an absolute weight or as a concentration).

When this linear regression is extrapolated to the point on the x-axis at which y = 0, we get a negative intercept on the x-axis whose magnitude corresponds to the amount of the analyte in the test sample.  So, when the linear calibration equation is expressed in the form y = a + bx, where a is the y-intercept when x = 0 and b is the slope or gradient of the line, simple geometry shows that the expected amount of the analyte in the test sample, xE, is given by a/b in absolute terms, which is the ratio of the intercept to the slope of the regression line.

Since both a and b are subject to error, the calculated concentration is clearly subject to error as well. However, as this concentration is not predicted from a single measured value of y, the standard deviation, sxE, of the extrapolated x-value, xE, is given by:

sxE = (sy/x / b) × SQRT[ 1/n + ymean² / (b² × Σ(xi - xmean)²) ]

where sy/x is the standard error of y on x, n is the number of points plotted on the regression, and xmean and ymean are the mean values of the xi and yi. The standard error of y on x, sy/x, of the linear regression is given by the equation:

sy/x = SQRT[ Σ(yexp,i - ycal,i)² / (n - 2) ]

where yexp,i is the instrument signal value observed for standard concentration xi, and ycal,i is the ‘fitted’ yi-value calculated from the linear regression equation for that xi value. This equation rests on some important assumptions: that the y-values obtained have a normal (Gaussian) error distribution and that the magnitude of the random errors in the y-values is independent of the analyte concentration (i.e. x-values).

Subsequently, we need to determine the confidence limits for xE as xE +/- t(n-2) × sxE at an alpha (α) error of 0.05, or 95% confidence. Increasing the value of n improves the precision of the estimated concentration. In general, at least six points should be used in a standard-additions experiment.

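You may have noticed that the above equation differs slightly from the familiar expression for the standard deviation, sx, of an x-value obtained from a known or measured y-value, y0, by linear regression:

sx = (sy/x / b) × SQRT[ 1 + 1/n + (y0 - ymean)² / (b² × Σ(xi - xmean)²) ]

where ymean and xmean are the mean values of the calibration yi and xi.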

A worked example

The copper content in a sample of treated mining waste water was determined by FAAS with the method of standard additions.  The following results were obtained:

Added Cu salt in moles/L 0.000, 0.003, 0.006, 0.009, 0.012, 0.015

Absorbance recorded 0.312, 0.430, 0.584, 0.718, 0.838, 0.994

Let’s determine the copper concentration in the sample, and estimate the 95% confidence interval for this concentration.

Apply the following equations for b, the gradient or slope, and a, the y-intercept when x = 0, of the least-squares straight line expressed as y = a + bx:

b = Σ[(xi - xmean)(yi - ymean)] / Σ(xi - xmean)²

a = ymean - b × xmean

These two equations yielded a = 0.3054 and b = 45.4095.  The ratio a/b gave the expected copper concentration in the test sample as 0.0067 moles Cu per liter.

By further calculation, we have sy/x = 0.01048 and sxE = 0.00028.  Using the Student’s t critical value of 2.78 for 6 - 2 = 4 degrees of freedom, the 95% confidence interval is found to be 0.0067 M +/- 0.0008 M.
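The whole calculation can be retraced with the short sketch below. It takes a as the y-intercept and b as the slope of y = a + bx, uses the usual least-squares formulae and the standard-additions expression for sxE, and takes the second addition as 0.003 moles/L, which is the value that reproduces the quoted results. The variable names are my own.

```python
import math

# Data from the worked example (second addition taken as 0.003 mol/L).
added = [0.000, 0.003, 0.006, 0.009, 0.012, 0.015]        # Cu added, mol/L
absorbance = [0.312, 0.430, 0.584, 0.718, 0.838, 0.994]   # instrument signal
t_crit = 2.78   # two-tailed critical t, alpha = 0.05, 6 - 2 = 4 degrees of freedom

n = len(added)
x_mean = sum(added) / n
y_mean = sum(absorbance) / n
s_xx = sum((x - x_mean) ** 2 for x in added)
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(added, absorbance))

b = s_xy / s_xx              # slope
a = y_mean - b * x_mean      # y-intercept
x_e = a / b                  # extrapolated concentration in the test sample

# Standard error of y on x, from the residuals about the fitted line.
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(added, absorbance))
s_yx = math.sqrt(ss_res / (n - 2))

# Standard deviation of the extrapolated value and the 95% half-width.
s_xe = (s_yx / b) * math.sqrt(1.0 / n + y_mean ** 2 / (b ** 2 * s_xx))
half_width = t_crit * s_xe

print(f"a = {a:.4f}, b = {b:.4f}, xE = {x_e:.4f} mol/L")
print(f"sy/x = {s_yx:.5f}, sxE = {s_xe:.5f}")
print(f"95% confidence interval: {x_e:.4f} +/- {half_width:.4f} mol/L")
```

Running it gives an intercept of 0.3054, a slope of 45.4095 and an interval of 0.0067 +/- 0.0008 mol/L, matching the figures in the worked example.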

Although the method of standard additions is a good approach to cater for the common problem of matrix interference effects, it has certain disadvantages too: 

  • As each sample requires its own calibration graph, in contrast to conventional calibration experiments, where one graph can provide concentration values for many test samples, the workload for analysis is increased.
  • This approach uses larger quantities of sample than other methods.  Sometimes, the customer may not be able to provide that much sample for analysis.
  • Since it is an extrapolation method, it should in principle be statistically less precise than an interpolation method. But such loss of precision has been found to be not so serious.

You may also read our earlier article  https://consultglp.com/2018/12/23/std-additions-in-instrumental-calibration/

This worked example illustrates the use of a certified wheat flour reference material to evaluate analytical bias (recovery) of a Kjeldahl nitrogen method for its crude protein content expressed as K-N.

Analysts familiar with the Kjeldahl method can testify that it is quite a tedious and lengthy method. It consists of three major steps: digestion of the test sample with concentrated sulfuric acid and a catalyst to form ammonium sulfate; addition of sodium hydroxide to make the digest alkaline, which evolves a lot of steam and ammonia; and steam distillation of the ammonia into a boric acid solution in the distillate collector, followed by titration of the trapped ammonia with standard acid.

During these processes, there is always a chance of losing some of the nitrogen in the sample through volatilization, and one can therefore expect to find a lower K-N content in the sample than is actually present. Of course, many modern digestion/distillation systems on the market have minimized such losses through apparatus design, but significantly low recoveries do happen, depending on the technical competence of the laboratory concerned and the extent of the precautions it takes.

Here, a certified wheat flour reference material with an N-value of 1.851 g/100g +/- 0.017 g/100g was used. It was subjected to duplicate analysis by four analysts on different days, using the same digestion/distillation set of the laboratory.  The test data are summarized below, and the pooled standard deviation of the exercise was calculated together with its overall mean value:

Although a 97.3% recovery of the K-N content in the certified reference material looks reasonably close to 100%, we still have to ask whether the grand mean value of 1.802 g/100g was significantly different from the certified value of 1.851 g/100g.  If it were significantly different, we could conclude that the average test result in this exercise was biased.

A significance (hypothesis) test using the Student’s t-test statistic was carried out as described below:

Let Ho :  Grand observed mean value = certified value (1.851 g/100g)

       H1 :  Grand observed mean value ≠ certified value

By calculation, the t-value was found to be 3.532, whilst the critical t-value obtained with the MS Excel® function “=T.INV.2T(0.05,3)” was 3.182. Since 3.532 > 3.182, the null hypothesis Ho was rejected, indicating that the difference between the observed mean value and the certified value was significant. We can also use the Excel function “=T.DIST.2T(3.532,3)” to calculate the probability p-value, giving p = 0.039, which is less than 0.05. Values below 0.05 indicate significance (at the 95% level).
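For readers without Excel, the p-value can be approximated with the Python standard library alone. The sketch below integrates the Student’s t density numerically with Simpson’s rule; the function names are my own and this is not how Excel computes T.DIST.2T internally.

```python
import math

def t_pdf(u, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + u * u / df) ** (-(df + 1) / 2)

def two_tailed_p(t_value, df, steps=2000):
    """Two-tailed p-value, integrating the density from 0 to |t| by Simpson's rule."""
    t_abs = abs(t_value)
    h = t_abs / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3       # probability that -|t| < T < |t|
    return 1 - central_area

t_value = 3.532     # t statistic quoted in the text
df = 3              # degrees of freedom used in the text
p_value = two_tailed_p(t_value, df)

print(f"p = {p_value:.3f}; significant at 0.05: {p_value < 0.05}")
```

The result agrees with the p = 0.039 quoted above and is below 0.05, so the same conclusion follows.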

Now, you have to exercise your professional judgement as to whether to correct the routine wheat flour sample analyses performed within the validity period of your CRM checks, by multiplying the test results of actual samples by a correction factor of 1/0.973 or 1.027.

Introduction

The ISO/IEC 17025:2017 accreditation standard requires laboratories to adopt standard test methods after verification of their performance, or in-house/non-standard methods after full validation.  The main difference between verification and validation lies in the approach taken to make sure the test methods adopted are fit for their intended purpose. 

Method verification is usually carried out on standardized or internationally recognized methods which have been duly studied for their suitability.  So, the laboratory needs only to show its technical competence, in that it can meet the repeatability and reproducibility criteria laid down by the standard method concerned. Method validation, on the other hand, has to be conducted with many statistical parameters to confirm the suitability of the test method used.

Irrespective of whether it is verification or validation, we must satisfy ourselves that the test methods adopted are precise and accurate. As we never know the true or native value of an analyte in a given sample sent for analysis, how can we be sure that the results presented to our customer are accurate or correct? How can we have confidence that the test method used in our laboratory is reliable enough for its purpose?

Routinely you may have carried out duplicates or triplicates in the analysis, but by doing so, you are actually studying the precision of the method based on the spread of test results in these replicated analyses. To know the accuracy of the method, you need to carry out analysis on some kind of samples with known or assigned value of the analyte to see if the recovery data are acceptable statistically. You may use a certified reference material (CRM) for this purpose.

Certified reference materials

ISO defines certified reference material (CRM) as a reference material characterized by a metrologically valid procedure for one or more specified properties, accompanied by a certificate that provides the value of the specified property, its associated uncertainty, and a statement of metrological traceability, while reference material (RM) is a material, sufficiently homogeneous and stable with respect to one or more specified properties, which has been established to be fit for its intended use in a measurement process.

Hence, one of its important uses in method validation is to assess the trueness (bias) of a method, although with careful planning of experiments, other useful information such as method precision can also be collected at the same time.

Knowing the accuracy of a test method means monitoring its bias and recovery. Ideal samples in which analyte levels are well characterized, e.g. matrix CRMs, are necessary.  This is because pure analyte standards do not really test the method in the same way that matrix-based samples do. However, matrix CRMs may not always be available on the market. If they are not, a reference material prepared in-house is our next best option.

In the absence of suitable reference materials, it is also possible to carry out recovery on spiked samples where known amounts of analyte are added to so-called ‘blank’ samples. However, the analyte in this case tends to be bound less closely in spiked samples than in real samples for analysis, and consequent recoveries tend to be over-estimated.

Measurement errors

To understand the bias associated with an analytical method, we first need to discuss measurement errors, which comprise random error and systematic error; the latter leads us to bias (trueness).

We notice that repeated laboratory analyses always generate different results.  This is due to many uncontrollable random effects during the experimentation. They can be assessed through replicate testing.  However, experimental work is invariably subject to possible systematic effects too.  A method can be considered ‘validated’ only if any systematic effects are duly studied and confirmed.

It is important to point out that under the current ISO definitions, accuracy is a property of a result and comprises both bias and precision, whilst the trueness is the closeness of agreement between the average value obtained from a large set of test results and an accepted reference value.  ISO further notes that the measure of trueness is normally expressed in terms of bias.

The following figure gives an illustration of analytical bias and its relationship with precision of replicate analysis.

How to measure bias against CRMs?

From the definitions of bias given, we know that:

  • any measure of bias should constitute an average reading
  • a test for bias must be made on a test item with a known or accepted reference value, e.g.: a CRM

It therefore follows that tests to measure bias need:

  • sufficient precision, achieved through replicate testing, to detect a bias of practical significance, i.e. the maximum bias acceptable for the method to be fit for purpose
  • use of the most appropriate reference materials and certified values available
  • tests covering the scope of the method adequately (i.e. range of analyte concentrations and matrices specified in the scope).

Bias can be expressed in one of the two ways:

  • As an absolute value, i.e.  x – xo, where a positive bias means a higher observed value
  • As a fraction or percentage for analytical recovery, x/xo or 100x/xo

The difference between a single test result and its certified reference value does not tell us much about result bias. To know whether the difference is significant, we have to carry out a series of replicate experiments and apply statistical treatment to the test data collected.

When conducting a bias study comparing the certified value for a reference material with the results obtained with the particular test method, we use the Student’s t-test statistic to interpret the results.  We apply the mean value, x, and the standard deviation, s, of n replicate results, together with the reference value, xo, in the following equation:

t = |x - xo| / (s / SQRT(n))

If the t-value is greater than the critical t-value at an alpha (α) error of 0.05, the bias is statistically significant with 95% confidence.
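The test can be sketched as follows. The replicate results are hypothetical and serve only to show the arithmetic; the certified value of the wheat flour CRM is reused as the reference, and the statistic is the ordinary one-sample t, which takes no account of the uncertainty of the certified value. The critical value of 2.571 for 5 degrees of freedom is taken from standard tables.

```python
import math
import statistics

# Hypothetical replicate results on a CRM, for illustration only (g/100g).
results = [1.80, 1.82, 1.79, 1.81, 1.78, 1.80]
certified = 1.851      # reference value, xo
t_crit = 2.571         # two-tailed critical t, alpha = 0.05, n - 1 = 5 degrees of freedom

n = len(results)
mean = statistics.mean(results)
sd = statistics.stdev(results)              # sample standard deviation (n - 1 divisor)

bias = mean - certified                     # absolute bias, x - xo
recovery = 100 * mean / certified           # analytical recovery, 100x/xo
t_stat = abs(bias) / (sd / math.sqrt(n))    # one-sample t statistic
significant = t_stat > t_crit

print(f"mean = {mean:.3f}, bias = {bias:.3f}, recovery = {recovery:.1f}%")
print(f"t = {t_stat:.2f}, critical t = {t_crit}, significant bias: {significant}")
```

With these made-up numbers the bias is small in absolute terms but the replicates are tight, so the test flags it as significant; whether it matters in practice is a separate judgement.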

How to use bias information?

Bias information obtained during method development and validation is primarily intended to initiate any further method development and study. If a significant effect is found, action is normally required to reduce it to an insignificant level.  Typically, further study and corrective actions are taken to identify the source(s) of the error and bring it under control. Corrective actions might involve, for example, some changes to the test procedure or additional training.

However, it is quite unusual to correct an entire analytical method just for the sake of an observed bias. If minor changes to the test protocols are not able to improve the accuracy of the results, we may resort to a correction for recovery.  If R is the average recovery noted in the experiments, a recovery correction factor of 1/R can be applied to the test results in order to bring them back to a 100% recovery level.  However, there is no current consensus on such correction for recovery. 

The Harmonized IUPAC Guidelines for the Use of Recovery Information in Analytical Measurement have recognized a rationale for either standardizing on an uncorrected method or adopting a corrected method, depending on the end use.  Its recommendation is: It is of over-riding importance that all data, when reported, should (a) be clearly identified as to whether or not a recovery correction has been applied and (b) if a recovery correction has been applied, the amount of the correction and the method by which it was derived should be included with the report. This will promote direct comparability of data sets. Correction functions should be established on the basis of appropriate statistical considerations, documented, archived and available to the client.

I have recently come across an interesting article published in 1944 by Harold Hotelling, a renowned American mathematical statistician and an influential economic theorist, well known for Hotelling’s law, Hotelling’s lemma and Hotelling’s rule in economics, as well as Hotelling’s T-squared distribution in statistics.  The article is titled: Some improvements in weighing and other experimental techniques.

In an effort to seek improvement of physical and chemical investigations in terms of accuracy and costs through better designed experiments, based on the theory of statistical inference, Hotelling was interested in a published work of F. Yates in Complex Experiments, Jour. Roy. Stat. Soc., Supp., vol 2 (1935) pp 181-247, and suggested various improvements for a weighing process.

When we are given several objects to weigh, our normal practice is to weigh each of them in turn on a balance which has an error, in terms of standard deviation, of, say, s.

Let’s assume we had two objects (A and B) for weighing and obtained weights ma and mb. So, we have the reported weights Ma and Mb as:

Ma = ma +/- s

Mb = mb +/- s

If ma and mb had values of 20 and 10 grams respectively, and s was 0.1 gram, then:

Ma = 20 +/- 0.1g

Mb = 10 +/- 0.1g

Hotelling noticed, however, that greater precision could be obtained for the same effort if both objects were included in each weighing.  He carried out a first weighing with objects A and B on the same weighing pan to obtain the sum of their weights, followed by a second weighing with A and B on opposite pans to obtain the difference in weights.  He then deduced the masses ma and mb as follows:

ma + mb = p1

ma - mb = p2

Hence,                  ma = (p1 + p2)/2

and                        mb = (p1 - p2)/2

By the law of propagation of error, the variance of a sum or difference of independent measurements is the sum of their variances, with no covariance term to be considered.  The variance of a constant c times a measurement is c² times the variance of that measurement. In this case, c = 1/2. Therefore,

Var(ma) = [Var(p1) + Var(p2)]/4 = [s² + s²]/4 = s²/2

Var(mb) = [s² + s²]/4 = s²/2

Hence, by this method, the variances of ma and mb were both equal to s²/2, half the value obtained when the two objects were weighed separately.

The error of the results of these measurements was therefore s/SQRT(2) or 0.7s, and no longer s, so that:

Ma = 20 +/- 0.07g

Mb = 10 +/- 0.07g

An important inference from this illustration is that in designing experiments, it is best to include all variables, or all factors, in each trial for the same number of repeats in order to achieve improved precision.
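The gain in precision can also be seen in a small simulation. The sketch below assumes normally distributed weighing errors with standard deviation s, and takes the second weighing to give the difference ma - mb; the seed, the number of trials and the function name are arbitrary choices of mine.

```python
import random
import statistics

random.seed(1)             # fixed seed so that the run is repeatable
m_a, m_b = 20.0, 10.0      # true masses from the example (g)
s = 0.1                    # standard deviation of a single weighing (g)
trials = 20000

def weigh(true_value):
    """One balance reading: the true value plus a normally distributed error."""
    return random.gauss(true_value, s)

separate = []   # estimates of ma from weighing A on its own
combined = []   # estimates of ma from the sum and difference weighings
for _ in range(trials):
    separate.append(weigh(m_a))
    p1 = weigh(m_a + m_b)              # A and B weighed together (sum)
    p2 = weigh(m_a - m_b)              # A against B (difference)
    combined.append((p1 + p2) / 2)

sd_separate = statistics.stdev(separate)
sd_combined = statistics.stdev(combined)
ratio = sd_combined / sd_separate

print(f"separate weighing : sd = {sd_separate:.3f} g")
print(f"combined weighings: sd = {sd_combined:.3f} g (ratio {ratio:.2f})")
```

The ratio of the two standard deviations comes out close to 1/SQRT(2), in line with the propagation-of-error result above. Note that both designs use two weighings in total to obtain the two masses.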