## Training and consultancy for testing laboratories.

### Theory behind decision rule simply explained – Part B

There are a few approaches for decision making leading to a conformity statement after testing.

Simple approaches for binary decision rule involving comments of pass/fail, compliant/non-compliant:

• A result implies non-compliance with an upper limit if the measured value exceeds the limit by the expanded uncertainty. See Figure 1A.  This is a clear-cut case.
• A result equal to or above the upper limit implies non-compliance and a result below the limit implies compliance – provided that uncertainty is below a specified value or assumed zero.  This is normally used where the uncertainty is so small compared with the limit that the risk of making a wrong decision is acceptable. For example, the relative uncertainty of the measured value is 1-2% whilst the Type I error that you are prepared to take is 5%.  See Figure 1B.

However, to use such a rule without specifying the maximum permitted value of the uncertainty would mean the risk (probability) of making a wrong decision would not be known.

More complicated approaches for decision rule by use of Guard Bands:

Many learned organizations such as ILAC and Eurachem have suggested incorporating tolerance limits (or intervals), known as guard bands (say, ±g), around the nominal specification when making risk-based decisions.  In this instance, a rejection zone can be defined as starting from the specification limit L plus or minus an amount g (the Guard Band).  The purpose of establishing such an “expanded” or “conservative” error on the specification value is to draw “safe” conclusions concerning whether measurement errors are within acceptable limits, with a calculated risk as agreed by both the customer and the laboratory concerned.

The value of g is chosen so that for a measurement result greater than or equal to L + g, the probability of false rejection is less than or equal to alpha (Type I error) which is the accepted risk level.

In general, g will be a multiple of the standard uncertainty of the test parameter, u. The multiplying factor can be 1.645 or 1.65 (95% confidence) or 3.3 (>99% confidence).  That is to say that the amount of uncertainty in the measurement process and where the measurement result lies with respect to the tolerance limit set help to determine the probability of an incorrect decision.

For example, suppose you set your guard band g equal to the expanded uncertainty of the measurement, that is U = 2u, above the upper limit of specification.  In this case, your estimated critical measurement result plus 1.645u (95% confidence) lies well inside the L + g zone, and hence your risk of making a wrong decision is no more than 5%.  This is shown graphically in Figure 2A below:

Often it is the customer who would specify such a tolerance limit as in Figure 2A, indicating that he would be happy to accept when such tolerance level or guard band is above the upper specification limit or below the lower specification limit in the rejection zones.  Hence, the risk is at the customer’s side.  It is also known as ‘relax rejection zone’ which covers the Type II (beta) error.

However, if the laboratory operator is to set his own risk limit, it is best for him to set the tolerance limit or guard band below the upper specification level or above the lower specification level to safeguard his own interest.  It is known as ‘conservative or stringent acceptance zone’, leading to the Type I (alpha) error.

How to estimate the critical value for acceptance?

Let’s illustrate it via a worked example.

One of the toxic elements in soil is cadmium (Cd).  Let the upper acceptable limit on the total Cd in soil required by the environmental consultant client be 2.0 mg/kg on dried matter.  The measurand is therefore the total Cd content in soil by the ICP-OES method.

Upon analysis, the average value of Cd content in soil samples, say, was found to be 1.81 mg/kg on dried basis, and the uncertainty of measurement U was 0.20 mg/kg with a coverage factor of 2 (95% confidence). Hence, the standard uncertainty of the measurement = 0.20 / 2 = 0.10 mg/kg. This standard uncertainty included both sampling and analytical uncertainties.

Our decision rule

The critical value or the decision limit was the Cd concentration where it could be decided with a confidence of approximately 95% (alpha=0.05) that the sample batch had a concentration below the set upper limit of 2 mg/kg.

The guard band g is then calculated as:

1.645 x u = 1.645 x 0.10 = 0.165 mg/kg

where k = z = 1.645 for one-tailed value of normal probability distribution at 95% confidence.

The decision (critical) limit therefore = 2.0 – 0.165 = 1.84 mg/kg.

The client would then be duly informed and agreed that all reported values below this critical limit value of 1.84 mg/kg were in the acceptance zone.  Hence, the test result of 1.81 mg/kg in this study was in compliance with the Cd specification limit of 2.0 mg/kg maximum.
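The arithmetic of this worked example can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library; the variable names are illustrative and not part of any standard, and the one-tailed z-value is computed rather than read from a table.

```python
from statistics import NormalDist

# Values taken from the cadmium worked example above
upper_limit = 2.0        # mg/kg, specification limit L
expanded_u = 0.20        # mg/kg, expanded uncertainty U
coverage_factor = 2.0    # k used when U was reported
alpha = 0.05             # accepted Type I risk
result = 1.81            # mg/kg, mean Cd content found

u = expanded_u / coverage_factor            # standard uncertainty
z = NormalDist().inv_cdf(1 - alpha)         # one-tailed z, about 1.645
guard_band = z * u                          # g = z x u
decision_limit = upper_limit - guard_band   # critical limit L - g

print(f"guard band g   = {guard_band:.4f} mg/kg")
print(f"decision limit = {decision_limit:.2f} mg/kg")
print("in acceptance zone" if result < decision_limit else "cannot state compliance")
```

Changing `alpha` shows directly how a stricter or more relaxed risk level moves the decision limit towards or away from the specification limit.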

Suggested types of guard bands

The guard band is often based on a multiple, r, of the expanded measurement uncertainty, U where g = rU

For a binary decision rule, a measurement result below the acceptance limit AL = (L-g) is accepted.

The above example of g = U is quite commonly used, but there may be cases where a multiplier other than 1 is more appropriate.  ILAC Guide G08:09/2019 titled “Guidelines on decision rules and statements of conformity” provides a table showing examples of different guard bands to achieve certain levels of specific risks, based on the customer application, as reproduced in Figure 3 Table 1 below.  Note that probability of False Accept PFA refers to false positive or Type I error.

It may be noted that the multiplying factor of 0.83 in the guard band of 0.83U as given by ISO 14253-1:2017 is derived by calculation of 1.65/2, where 1.645 has been approximated to 1.65 and 2 is the coverage factor of 1.96 rounded up to the nearest integer, for 95% confidence interval.
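As a small illustration of g = rU, the sketch below computes the acceptance limit AL = L – g for two multipliers. The function name `acceptance_limit` is hypothetical, and the inputs (L = 2.0 mg/kg, U = 0.20 mg/kg) are simply borrowed from the cadmium example for illustration.

```python
def acceptance_limit(upper_limit, expanded_u, r=1.0):
    """Return AL = L - g for an upper limit, with guard band g = r * U."""
    return upper_limit - r * expanded_u

# Multiplier discussed above: 1.65 / 2 = 0.825, usually quoted as 0.83
r_from_z = 1.65 / 2

# Inputs borrowed from the cadmium example (illustrative only)
al_full = acceptance_limit(2.0, 0.20, r=1.0)    # g = U
al_083 = acceptance_limit(2.0, 0.20, r=0.83)    # g = 0.83 U

print(f"r = {r_from_z:.3f}")
print(f"AL with g = U     : {al_full:.3f} mg/kg")
print(f"AL with g = 0.83 U: {al_083:.3f} mg/kg")
```

With g = 0.83U the acceptance limit is close to the 1.84 mg/kg decision limit derived from 1.645u, as expected, whereas g = U is more conservative.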

### A worked example for decision rule on conformity statement

Consider a measurement value y = 2.70ppm with a standard uncertainty of u(y) = 0.20ppm.  (Its expanded uncertainty = k x 0.20ppm = 2 x 0.20ppm = 0.40ppm where coverage factor k = 2 at 95% confidence). It is also given that the single tolerance or specification upper limit of Tu = 3.0ppm.

Assuming the normal probability distribution data and a type I error alpha = 0.05 (5%), we are to make a statement of specification conformity at probability of (1-alpha) or 0.95 (95%).

Our decision rule is: “Acceptance if the hypothesis Ho: P(y < 3.0ppm) > 0.95” is true.

Use Microsoft Excel spreadsheet function: “= 1-NORM.DIST(2.7,3.0,0.2,TRUE)” to calculate P(y< 3.0ppm) to get 0.933 or 93.3%.  Note that the function “=NORM.DIST(2.7,3.0,0.2,TRUE)” gives the cumulative area under the curve from far left to right for a value of 0.067 approximately.

Alternatively, we can also calculate a normalized z -value as (2.7 – 3.0)/0.2 = – 1.50, and look up the one-tailed normal distribution table for cumulative probability under the curve with z =|1.5| which gives 0.5000 + 0.4332 = 0.9332, as a normal distribution curve is symmetrical in shape. See Appendix A for the normal distribution cumulative table. In fact, we would get the same answer if we were to use the Excel function “=1- NORM.DIST(-1.5,0,1,TRUE)” as well.

Since 93.3% < 95.0%, the Ho is rejected, i.e. the sample result of 2.70ppm can be declared non-compliant with the specification limit, or put it more mildly, “not possible to state compliance” or “conditional pass” or some other qualification wordings!

Suppose, for the sake of discussion, that the measured value was 2.60ppm instead. Would it be within the upper specification limit of 3.0ppm by the above evaluation?

Indeed, by following the above reasoning, we would find that the normalized z-value is (2.6 – 3.0)/0.2 = – 2.0 and the cumulative area under the curve is 0.5000 + 0.4772 = 0.977, which is larger than 0.950.  Therefore, the Ho is not rejected, i.e. the sample or test item is declared in compliance with the specification limit.
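For readers who prefer a script to a spreadsheet, the same probabilities can be obtained with Python's standard library. This is a minimal sketch under the normality assumption stated above; the function names `prob_below_limit` and `decide` are illustrative only.

```python
from statistics import NormalDist

def prob_below_limit(y, u, upper_limit):
    """P(value < upper_limit) for a normal distribution centred on y with SD u."""
    return NormalDist(mu=y, sigma=u).cdf(upper_limit)

def decide(y, u, upper_limit, alpha=0.05):
    """Apply the rule 'accept if P(y < limit) > 1 - alpha'."""
    p = prob_below_limit(y, u, upper_limit)
    verdict = "compliant" if p > 1 - alpha else "cannot state compliance"
    return p, verdict

# The two measured values discussed above, u(y) = 0.20 ppm, Tu = 3.0 ppm
for y in (2.70, 2.60):
    p, verdict = decide(y, 0.20, 3.0)
    print(f"y = {y:.2f} ppm: P = {p:.4f} -> {verdict}")
```

The call `NormalDist(mu=y, sigma=u).cdf(limit)` plays the same role as the Excel expression `1-NORM.DIST(y,limit,u,TRUE)` because the normal curve is symmetrical.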

What is the critical acceptable value Xppm in order not to get Ho rejected?

The task will be simple if we know how to find the critical z -value in a normal distribution curve where the area under the curve on the right tail is 0.05 out of 1.00, or 5%, as we have fixed our Type I (alpha) risk as 5%.

Reading from the normal distribution cumulative table in Appendix A, we note that when z = 1.645, the area under the curve is 0.5000 + 0.4500 = 0.9500.  Similarly, the absolute value of Excel function “=NORM.INV(0.05,0,1)” also gives a |z|-value of 1.645.

The critical acceptable value X is then obtained by setting the normalized z-value equal to – 1.645:

(X – 3.0)/0.2 = – 1.645

which gives X = 3.0 – (1.645 x 0.2) = 2.67ppm.

The conclusion therefore is that any test value found to be less than or equal to 2.67ppm will be declared as in compliance with the specification of 3.0ppm maximum with 95% confidence (or 5% error risk).  Any value found larger than 2.67ppm will be assessed for compliance by considering the higher than 5% risk that the test laboratory is willing to undertake, probably based on some commercial reason.  In other words, where a confidence level of less than 95% is acceptable to the laboratory, a compliance statement may be possible.  Decision is entirely yours!
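To see what "higher than 5% risk" means for a particular result, the sketch below computes both the critical value and the risk implied by declaring a given result compliant. It assumes the same normal model; `critical_value` and `implied_risk` are hypothetical helper names.

```python
from statistics import NormalDist

def critical_value(upper_limit, u, alpha=0.05):
    """Largest result that can still be declared compliant at risk alpha."""
    return upper_limit - NormalDist().inv_cdf(1 - alpha) * u

def implied_risk(y, u, upper_limit):
    """Probability of lying above the limit for a normal curve centred on y."""
    return 1 - NormalDist(mu=y, sigma=u).cdf(upper_limit)

x_crit = critical_value(3.0, 0.20)
risk_at_270 = implied_risk(2.70, 0.20, 3.0)

print(f"critical value   = {x_crit:.2f} ppm")
print(f"risk at 2.70 ppm = {risk_at_270:.1%}")
```

A laboratory that reports 2.70ppm as compliant is therefore accepting a risk of roughly 6.7% rather than 5%.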

Appendix A

### Theory behind decision rule simply explained – Part A

All testing and calibration laboratories accredited under ISO/IEC 17025:2017 are required to prepare and implement a set of decision rules when the customer requests a statement of conformity in the test or calibration report issued.

As the word “conformity” is defined as “compliance with standards, rules and laws”, a statement of conformity is an expression that clearly describes the state of compliance or non-compliance to a specification, standard, regulatory limits or requirements, after calibration or testing.

Like any decision made, you have to assume a certain amount of risk as you might make a wrong decision. So, how much is a risk that you can comfortably undertake when you issue a statement of conformity in your test or calibration report?

Generally, decision rules give a prescription for the acceptance or rejection of a product based on:

• the measurement result
• its uncertainty due to inherent errors (random and/or systematic)
• the specification (or regulatory) limit or limits, and,
• the acceptable risk level based on the probability of making a wrong decision

Certainly, you want to minimize your risk of issuing a statement of conformity that is later proven wrong by others.  But what type of risk are you addressing when making such a decision rule?  In short, it is either

• the supplier’s (laboratory’s) risk (statistically speaking, false positive or Type I error, alpha) or
• the consumer’s (customer’s) risk (false negative or Type II error, beta).

From the laboratory service point of view, you should be interested in the Type I (alpha) error to protect your own interest.

Before going further in the discussion, let’s take note of an important assumption: the uncertainty of measurement is represented by a normal (Gaussian) probability distribution function, which is consistent with typical measurement results (assuming that the Central Limit Theorem applies).

After calibration or testing an item with its measurement uncertainty known, our subsequent statement of conformance with a specification or regulatory limits can lead us to 2 possible outcomes:

• We are right
• We are wrong

The decision rule made is related to statistical hypothesis testing where we propose a null hypothesis Ho for a situation and an alternative hypothesis H1 should Ho be rejected after some test statistics.   In this case, we can make either a Type I (false POSITIVE or false ALARM, i.e. rejecting null hypothesis Ho when in fact Ho is true) or Type II (false NEGATIVE, i.e. not rejecting Ho when in fact Ho is actually false) errors.

It follows that the probabilities of making the correct decisions are (1 – alpha) and (1 – beta), respectively.  Generally we would take a 5% Type I risk; hence we have alpha = 0.05 and would claim that we have 95% confidence in making this statement of conformity.

In layman’s language:

• Type I :  Deciding that something is NOT OK when it actually is OK,  given the probability (risk):  alpha
• Type II:  Deciding something is OK when it really was NOT OK, given the probability (risk):  beta

Figure 1 shows the matrix of such decision making and potential errors involved:

The statistical basis of the decision rules is to determine where the “Acceptance zone” and the “Rejection zone” are, such that if the measurement result lies in the acceptance zone, the product is declared compliant, and, if in the rejection zone, it is declared non-compliant.  Graphically, it can be shown as in Figure 2 below:

We should not have any issue in deciding the conformity in Case 1 and non-conformity in Case 4 due to a clear cut situation as shown in Figure 2 above, but we need to assess if Cases 2 and 3 are in conformity or not, as illustrated in Figure 3 below for an upper specification limit:

For the situations in Cases 2 and 3, we may include the following thoughts in the decision rule making before considering the amount of risk to be taken in deciding conformity:

• Making a request for additional measurement(s)
• Re-evaluating measurement uncertainty to narrow the range, if possible
• A manufactured (and tested) product must be compared with an alternative specification to decide on possible sale at a discounted price, as a rejected goods

Part B of this article will discuss both simple and more complicated decision rules that can be made during issuing statement of conformance after testing or calibration. Before that, we shall study a practical worked example.

### R evaluation of Measurement uncertainty

At the recent Eurachem/PUC ISO 17025 training course in Nicosia, Cyprus on 20-21 February 2020, I learnt something new from Dr Stephen Ellison’s presentation.

There is a measurement uncertainty package in the R Language, named “metRology”.  You can download this library when you are in the R environment.

For example, if we were asked to evaluate the uncertainty of the following expression:

expr = A + 2xB + 3xC + D/2

where A = 1, B = 3, C = 2, D = 11.  The sensitivity coefficients, c’s, from the above expression are thus 1, 2, 3 and ½ for A, B, C and D, respectively.

Assuming the standard uncertainties of these parameters are constant at 1/10th of their values, the following steps demonstrate how the combined standard uncertainty can be evaluated.

> library("metRology")

Attaching package: ‘metRology’

The following objects are masked from ‘package:base’:

cbind, rbind

> expr<-expression(A+B*2+C*3+D/2)

> x=list(A=1,B=3,C=2,D=11)

> u=lapply(x,function(x) x/10)

> u

\$A

[1] 0.1

\$B

[1] 0.3

\$C

[1] 0.2

\$D

[1] 1.1

>

> u.expr<-uncert(expr,x,u,method="NUM")

> u.expr

Uncertainty evaluation

Call:

uncert.expression(obj = expr, x = x, u = u, method = "NUM")

Expression: A + B * 2 + C * 3 + D/2

Evaluation method:  NUM

Uncertainty budget:

x    u      c     u.c

A   1   0.1   1.0   0.10

B   3   0.3   2.0   0.60

C   2   0.2   3.0   0.60

D  11  1.1   0.5  0.55

y:  18.5

u(y):  1.01612
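The budget above can be cross-checked outside R. The following Python sketch propagates the uncertainties to first order, estimating each sensitivity coefficient by a central finite difference and assuming uncorrelated inputs. It illustrates the principle only and is not a description of how metRology implements its "NUM" method; `combined_uncertainty` and `expr` are names chosen for this example.

```python
import math

def combined_uncertainty(func, x, u, rel_step=1e-6):
    """First-order propagation with finite-difference sensitivity coefficients.

    Assumes the inputs are uncorrelated.
    """
    y = func(**x)
    budget = {}
    for name, value in x.items():
        step = abs(value) * rel_step or rel_step   # avoid a zero step
        up = dict(x, **{name: value + step})
        down = dict(x, **{name: value - step})
        c = (func(**up) - func(**down)) / (2 * step)   # sensitivity coefficient
        budget[name] = (c, c * u[name])                # (c, u.c)
    u_y = math.sqrt(sum(uc ** 2 for _, uc in budget.values()))
    return y, u_y, budget

def expr(A, B, C, D):
    return A + 2 * B + 3 * C + D / 2

x = {"A": 1, "B": 3, "C": 2, "D": 11}
u = {name: value / 10 for name, value in x.items()}   # 1/10th of each value

y, u_y, budget = combined_uncertainty(expr, x, u)
for name, (c, uc) in budget.items():
    print(f"{name}: c = {c:.2f}, u.c = {uc:.2f}")
print(f"y = {y}, u(y) = {u_y:.5f}")
```

Because the expression is linear, the finite-difference coefficients agree with the analytical values 1, 2, 3 and ½ to within rounding.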

### Uncertainty of measurement – “Bottom-up” vs “Top-down”

At the recently concluded Eurachem/PUC training workshop on “Accreditation of analytical, microbiological and medical laboratories – ISO/IEC 17025:2017 and ISO 15189:2012”, the following important pointers were noted during the presentation of Dr Steve Ellison of LGC UK on the subject: Measurement Uncertainty – “Bottom-up” vs “Top-down”:

1. Measurement uncertainty assessed in analytical chemistry is either through the use of the Law of Propagation of Uncertainty from uncertainty budgets (or inputs or contributors) as per GUM (bottom-up) method, or adopting the method performance (or validation) data (top-down);
2. Using the GUM approach with a mathematical model, the laboratory is to assess and sufficiently quantify significant uncertainty contributors in the test procedures.  This can be done by (a) using descriptive statistical data through repeated experiments (Type A), or (b) any other means, such as certificates of analysis by a third party, theory or professional judgement (Type B);
3. It has been stated that testing laboratories tend to underestimate measurement uncertainty using the GUM method in almost all measurement fields, as one cannot comprehensively identify and quantify all important uncertainty inputs;
4. Use of any one of the top-down approaches with the use of validation data is a better bet in the evaluation of measurement uncertainty because the actual dispersion of test results in extended experiments can be observed; the major uncertainty source data can come from (a) long term precision (intermediate reproducibility), (b) bias uncertainty based on reliable certified reference materials, and (c) any other additional important effects which are not part of the method’s mathematical equation;
5. By definition, uncertainty is a range which includes the true value.  Therefore, any significant bias should not be ignored.
6. Empirical methods are operationally defined.  In the top-down approach, relevant  reference material should be used to estimate laboratory bias as an input of uncertainty.  In this case, only matrix bias is to be taken care of and method bias is not relevant.
7. Eurachem opines that the bottom-up GUM method is appropriate for metrology laboratories, whilst the top-down approaches are best for testing laboratories.

### Expressing MU for qualitative testing?

I would like to share some of the ideas picked up at the 2-day Eurachem / Pancyprian Union of Chemists (PUC) joint training workshop on 20-21 February 2020, titled “Accreditation of analytical, microbiological and medical laboratories – ISO/IEC 17025:2017 and ISO 15189:2012”, after flying all the way from Singapore to Nicosia of Cyprus via a stop-over at Istanbul.

Today, let’s see whether there is a requirement for an expression of uncertainty in qualitative analysis. In other words, are there quantitative reports of uncertainties in qualitative test results?

Qualitative chemical and microbiological testing usually fall under the following binary classifications with two outcomes only:

• Pass/Fail for a targeted measurand
• Positive/Negative
• Presence/Absence
• “Above” or “Below” a limit
• Red or yellow colour
• Classification into ranges (<2; 2 – 5; 5 – 10; >10)
• Authentic or non-authentic

Many learned professional organizations have set up working groups to study the expression of uncertainty for such types of qualitative analysis for many years, but have yet to officially publish guidance in this respect.

The current thinking refers to the following common approaches:

1. Using false positive and negative response rates

In a binary test, the result can be a true positive (TP) or a true negative (TN).  There are two kinds of errors associated with such testing, giving rise to a false positive (FP) or a false negative (FN) situation.

A false positive error occurs in data reporting when a test result improperly indicates presence of a condition, such as a harmful pathogen in food, when in reality it is not present (being a Type I error, statistically speaking), while a false negative is an error in which a test result improperly indicates absence of a condition when in reality it is present (i.e. a Type II error).

Consequently, the false positive response rate, which corresponds to the significance level (Greek letter alpha, α) in statistical hypothesis testing, is the ratio of those negatives that still yield positive test outcomes to the total number of samples that are actually negative. The specificity of the test is then equal to 1−α.

Complementarily, the false negative rate is the proportion of positives which yield negative test outcomes with the test.  In statistical hypothesis testing, we can express this fraction by the letter beta, β (for a Type II error), and the “power” or the “sensitivity” of the test is equal to 1−β.
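These rates follow directly from the four counts of a validation study. The sketch below uses made-up counts purely for illustration, and the function name `response_rates` is hypothetical.

```python
def response_rates(tp, fp, tn, fn):
    """Return false response rates, specificity and sensitivity from counts."""
    false_positive_rate = fp / (fp + tn)   # denominator: all actual negatives
    false_negative_rate = fn / (fn + tp)   # denominator: all actual positives
    return {
        "false_positive_rate": false_positive_rate,
        "specificity": 1 - false_positive_rate,
        "false_negative_rate": false_negative_rate,
        "sensitivity": 1 - false_negative_rate,
    }

# Hypothetical validation counts (not from any real study)
rates = response_rates(tp=95, fp=4, tn=196, fn=5)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```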

See table below:

2.     Alternative performance indicators (single laboratory)

The alternative performance indicators are actually reliability measures involving several formulae, as summarized below:

There are many challenges in evaluating qualitative “uncertainty”.  Although the idea of estimating uncertainty for such binary results is sound, the most problematic issue is how to collect the hundreds of experimental data needed to make reasonable statistical estimates for low false response rates.  Another challenge is how to confidently estimate the population probabilities without being biased.  A sensible suggestion is to ask laboratories to follow published codes of best practice in qualitative testing where they are available and to ensure the conditions of testing are under adequate control.
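The need for hundreds of results can be made concrete with a simple binomial argument: if n independent tests show no false responses at all, the one-sided upper confidence bound on the true false response rate is 1 – (1 – confidence)^(1/n). The sketch below applies this relation; it assumes independent trials and zero observed false responses, and the function names are illustrative.

```python
import math

def upper_bound_zero_failures(n, confidence=0.95):
    """Upper confidence bound on a rate when 0 false responses occur in n trials."""
    return 1 - (1 - confidence) ** (1 / n)

def trials_needed(max_rate, confidence=0.95):
    """Smallest n (all correct) needed to demonstrate rate <= max_rate."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_rate))

print(f"30 clean results only show a rate below {upper_bound_zero_failures(30):.1%}")
print(f"to show a rate below 1% requires {trials_needed(0.01)} clean results")
```

Even with a perfect record, demonstrating a false response rate below 1% at 95% confidence requires about three hundred tests.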

At this moment, quantitative (i.e. numerical) reports of uncertainties in qualitative test results, involving strict metrological and statistical calculations, are not generally expected by the accreditation bodies.

### What’s Internal Quality Control (IQC)?

A professionally run test laboratory must have a set of internal quality control or check (IQC) procedures in place. Regrettably, I have noticed that many accredited chemical laboratories do not institute such an IQC system in their routine work.

The purpose of IQC is to ensure as far as possible that the magnitude of errors affecting the analytical system is not changing during its routine use since method validation or verification process. By not having any IQC system in place, the analyst would not be able to state with confidence that the test results generated for that particular batch of samples are precise, accurate and fit for purpose.

During method validation, we have estimated the uncertainty of the method and showed that it is fit for purpose. Therefore, when the method is put in routine use, every run of analysis should be checked to show that the errors of measurement are probably no larger than they were at validation time.  Even when a standardized method is used for analysis, we have to demonstrate that our laboratory’s precision is no worse than the stated repeatability of the method.

For this IQC purpose, we can employ the concept of statistical control, which means in general that some critical feature of the system is behaving like a normally distributed variable.  How are we going to do it?

For chemical analysis, we can add one or more “control materials or samples” to the run of test methods.  These control materials are treated throughout in exactly the same manner as the test materials, from the weighing of the test portion to the final measurement.  Of course, the control materials ideally must be of the same type as the materials for which the analytical system was validated, in respect of matrix composition and analyte concentration.

By doing so, we treat the control materials as a surrogate and their behavior is a proper indicator of the performance of the system.  We can plot the results obtained in successive runs on a control chart for visual inspection on its moving trend over time.  The control lines are determined by run-to-run intermediate precision of the data collected.  Intermediate precision, by definition, is the pooled standard deviation of a number of successive runs in the same laboratory with inevitable changing measurement conditions (such as different analysts, instruments, newly prepared reagents, environmental variations, etc.) over time.
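As a rough illustration, control lines can be derived from a set of past control-material results. The sketch below assumes the common Shewhart convention of warning lines at ±2s and action lines at ±3s around the mean, which is an assumption not stated above; the data are invented, and a single standard deviation of the run results stands in for the intermediate precision.

```python
from statistics import mean, stdev

def control_limits(results):
    """Centre line with warning (2s) and action (3s) limits from past runs."""
    centre = mean(results)
    s = stdev(results)   # run-to-run standard deviation
    return {
        "centre": centre,
        "warning": (centre - 2 * s, centre + 2 * s),
        "action": (centre - 3 * s, centre + 3 * s),
    }

def classify(value, limits):
    """Place a new control result relative to the control lines."""
    low_a, high_a = limits["action"]
    low_w, high_w = limits["warning"]
    if value < low_a or value > high_a:
        return "action"
    if value < low_w or value > high_w:
        return "warning"
    return "in control"

# Hypothetical control-material results from ten successive runs
history = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0]
limits = control_limits(history)
for new_result in (10.1, 10.3, 10.5):
    print(new_result, classify(new_result, limits))
```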

### The uncertainty of measuring instruments

In addition to classical analytical methods, we have several instruments that are helpful in our routine laboratory analysis.  Examples are aplenty, such as pH meter, dissolved oxygen meter, turbidity meter, conductivity meter, UV-visible spectrometer, FT-IR spectrophotometer, etc.  Some are being used for in-situ measurements in the field.  Hence, it is important to estimate their respective measurement uncertainty.

Most measuring instruments are generally characterized by:

• Class (depending on the precision of its measurement grading, such as Class A and Class B of burette, etc)
• Sensitivity on instrument response
• Discrimination threshold in identification
• Resolution of displaying device
• Stability as measured by drifting of its graded measurement

To evaluate the uncertainty of readings from a measuring instrument, we look for two basic uncertainty contributors, namely:

1. The maximum permissible error provided by the supplier.
2. The repeatability of measuring instrument

Maximum permissible error (MPE)

By VIM definition, MPE is an extreme value of measurement error, with respect to a known reference quantity value, permitted by specifications or regulations for a given measurement, measuring instrument, or measuring system.  It is the ‘best’ accuracy confirmed by a calibration and specified by the manufacturer of the instrument during the warranty period.

MPE data can always be found in the manufacturer’s manual under the instrument specification. It is usually expressed in one of the following manners:

1. When the MPE is constant throughout the instrument indications, it is expressed as:

MPE = +/-a

where a is a given value for its unit.

For example, for a glass thermometer with a measuring range of 0 – 50 °C and sub-divided units of 0.1 °C, MPE = +/-0.2 °C

2. When MPE varies with a change of instrument indications following a regression line, the maximum error tolerance can be a given relation as follows:

MPE = +/-(a + bx)

where x is a measured value.

3. When the measuring instrument uses a constant relative standard deviation RSD, its MPE can be expressed as:

MPE = +/-RSD.x

Repeatability of measuring instrument

Repeatability is the closeness of agreement between the results of successive measurements of the same measurand, taken by a single person or instrument on the same item, under the same conditions of measurement, and over a short period of time. Indeed, repeatability is a measure of the variation of the instrument’s indication under successive measurements.  It is expressed as sr, the standard deviation of a series of repeated measurements.

Example

A breathalyzer is a device for estimating blood alcohol content (BAC) from a breath sample. A given brand breathalyzer has the following performance data:

1. Maximum permissible error

BAC  < 0.20 g/100ml               MPE = +/- 0.025 g/100ml

BAC  0.20 – 0.40 g/100ml       MPE = +/- 0.04 g/100ml

2. Measurement repeatability expressed as standard deviation

sr = +/- 0.006 g/100ml

Evaluating measurement uncertainty of the breathalyzer

1. The standard uncertainty of the MPE is calculated by MPE/SQRT(3) using the rectangular probability factor for a maximum bound of error estimation.  Hence, we have:

BAC  < 0.20 g/100ml               u(E) = +/- 0.014(4) g/100ml

BAC  0.20 – 0.40 g/100ml       u(E)  = +/- 0.023(1) g/100ml

2. Measurement repeatability

sr = +/- 0.006 g/100ml

The combined standard uncertainty is u(Comb) = SQRT(u(E)^2 + sr^2), and the expanded uncertainty is U = 2 x u(Comb) with 95% confidence.  For the two ranges, this gives:

BAC  < 0.20 g/100ml               U = +/- 0.031 g/100ml

BAC  0.20 – 0.40 g/100ml       U = +/- 0.048 g/100ml
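The breathalyzer calculation can be sketched in Python as follows, using only the MPE and repeatability figures quoted above. The function name `instrument_uncertainty` is illustrative, and a rectangular distribution is assumed for the MPE as described.

```python
import math

def instrument_uncertainty(mpe, s_r, k=2.0):
    """Combine MPE (rectangular distribution) with repeatability s_r."""
    u_mpe = mpe / math.sqrt(3)                  # standard uncertainty of MPE
    u_comb = math.sqrt(u_mpe ** 2 + s_r ** 2)   # combined standard uncertainty
    return u_mpe, u_comb, k * u_comb            # last item: expanded uncertainty

s_r = 0.006   # g/100ml, repeatability standard deviation
ranges = {"BAC < 0.20": 0.025, "BAC 0.20 - 0.40": 0.04}   # MPE in g/100ml

results = {}
for label, mpe in ranges.items():
    u_mpe, u_comb, expanded = instrument_uncertainty(mpe, s_r)
    results[label] = expanded
    print(f"{label}: u(E) = {u_mpe:.4f}, u(Comb) = {u_comb:.4f}, "
          f"U = {expanded:.3f} g/100ml")
```

In both ranges the MPE term dominates; the repeatability contributes only a small fraction of the combined uncertainty.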

### Can we estimate uncertainty by replicates?

The method traditionally practiced by most test laboratories in the estimation of measurement uncertainty is by the ISO GUM (ISO/IEC Guide 98-3) approach, which is quite tedious and time consuming to study and gather uncertainty contributions from each and every step of the test method.  An alternative way of looking at uncertainty is to attempt to study the overall performance of the analytical procedure by involving replication of the whole procedure to give a direct estimate of the uncertainty for the final test result. This is the so-called ‘top-down’ approach.

We may use the data from an inter-laboratory study, in-house validation or ongoing quality control. This approach is particularly appropriate where individual effects are poorly understood, so that no quantitative theoretical model is capable of predicting the behavior of analytical results for particular sample types.  By this approach, it suffices to consider reproducibility from inter-laboratory data or long-term within-laboratory precision as recommended by ISO 21748, ISO 11352 and ASTM D 6299.

However, one must be aware that repeatedly analyzing a given sample several times will not give a good estimate of the uncertainty unless the following conditions are fulfilled:

1. There must be no perceptible bias or systematic error in the procedure.  That is to say, the difference between the expected result and the true or reference value must be negligible in relation to twice the standard deviation (95% confidence). This condition is usually (but not always) fulfilled in analytical chemistry.
2. The replication has to explore all of the possible variations in the execution of the method by engaging different analysts on different days using different equipment on a similar sample. If not, at least all of the variations of important magnitude must be considered. Such a condition may not be easily met by replication under repeatability conditions (i.e. repeated testing within laboratory), because such variations would be laboratory-specific to a great extent.

The conclusion is that replicated data by a single analyst on the same equipment over a short period of time are not sufficient for uncertainty estimation. If the top-down approach is to be followed, we must obtain a good estimate of the long-term precision of the analytical method.  This can be done, for example, by studying the precision obtained on a typical test material used as a QC material over a reasonable period of time. We may also use a published reproducibility standard deviation for the method in use, provided we document proof that we are able to follow the procedure closely and competently.