A professionally run test laboratory must have a set of
internal quality control (IQC) procedures in place. Regrettably, I have
noticed that many accredited chemical laboratories do not institute such an IQC system
in their routine work.

The purpose of IQC is to ensure, as far as possible, that the magnitude of the errors affecting the analytical system has not changed since the method validation or verification process. Without an IQC system in place, the analyst cannot state with confidence that the test results generated for a particular batch of samples are precise, accurate and fit for purpose.

During method validation, we estimated the uncertainty
of the method and showed that it is fit for purpose. Therefore, when the method
is put into routine use, every analytical run should be checked to show that the
errors of measurement are probably no larger than they were at validation
time. Even when a standardized method is
used for analysis, we have to demonstrate that our laboratory's precision is no
worse than the stated repeatability of the method.

For this IQC purpose, we can employ the concept of
statistical control, which in general means that some critical feature of the
system behaves like a normally distributed variable. How do we go about it?

For chemical analysis, we can add one or more "control materials"
(or control samples) to each run of the test method.
These control materials are treated throughout in exactly the same
manner as the test materials, from the weighing of the test portion to the
final measurement. Ideally, the
control materials must be of the same type as the materials for which
the analytical system was validated, in respect of matrix composition and
analyte concentration.

By doing so, we treat the control materials as surrogates,
and their behavior is a proper indicator of the performance of the system. We can plot the results obtained in
successive runs on a control chart for visual inspection of trends
over time. The control lines are determined
by the run-to-run intermediate precision of the data collected. Intermediate precision, by definition, is the pooled
standard deviation of a number of successive runs in the same laboratory under inevitably
changing measurement conditions (different analysts, instruments, freshly
prepared reagents, environmental variations, etc.) over time.
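As a minimal sketch of how control lines might be set from such QC data (the numbers and the conventional two-sigma/three-sigma lines are illustrative assumptions of mine, not from a specific laboratory):

```python
import statistics

# Hypothetical QC-material results (mg/kg), one per analytical run
qc_runs = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 9.7, 10.4, 10.0, 10.2]

mean = statistics.mean(qc_runs)
s = statistics.stdev(qc_runs)  # run-to-run (intermediate) precision

# Conventional Shewhart control lines around the mean
warning_lines = (mean - 2 * s, mean + 2 * s)  # ~95% of in-control points
action_lines = (mean - 3 * s, mean + 3 * s)   # ~99.7% of in-control points

print(f"mean = {mean:.2f}, s = {s:.3f}")
print(f"warning lines: {warning_lines[0]:.2f} to {warning_lines[1]:.2f}")
print(f"action lines:  {action_lines[0]:.2f} to {action_lines[1]:.2f}")
```

A point falling outside the action lines, or repeated points outside the warning lines, would signal that the system is no longer in statistical control.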

In addition to classical analytical methods, we use several instruments in our routine laboratory analysis. Examples are aplenty: the pH meter, dissolved oxygen meter, turbidity meter, conductivity meter, UV-visible spectrometer, FT-IR spectrophotometer, etc. Some are used for in-situ measurements in the field. Hence, it is important to estimate their respective measurement uncertainties.

Most measuring instruments are generally characterized by:

- Class (depending on the precision grading of its measurement, such as Class A and Class B burettes)

- Sensitivity of the instrument response

- Discrimination threshold in identification

- Resolution of the displaying device

- Stability, as measured by the drift of its graded measurement

To evaluate the uncertainty of readings from a measuring instrument, we look for two basic uncertainty contributors, namely:

- the maximum permissible error (MPE) provided by the supplier, and

- the repeatability of the measuring instrument.

Maximum permissible error (MPE)

By VIM definition, MPE is an extreme value of measurement error, with respect to a known reference quantity value, permitted by specifications or regulations for a given measurement, measuring instrument, or measuring system. It is the ‘best’ accuracy confirmed by a calibration and specified by the manufacturer of the instrument during the warranty period.

MPE data can always be found in the
manufacturer’s manual under the instrument specification. It is usually expressed
in one of the following manners:

When the MPE is constant throughout the instrument's indications, it is expressed as:

MPE = ±a

where a is a given value in the relevant unit.

For example, a glass thermometer with a measuring range of 0–50 °C, sub-divided in units of 0.1 °C, may have MPE = ±0.2 °C.

When the MPE varies with the instrument indication, following a regression line, the maximum error tolerance can be given by the relation:

MPE = ±(a + bx)

where x is the measured value.

When the measuring instrument has a constant relative standard deviation (RSD), its MPE can be expressed as:

MPE = ±RSD·x

Repeatability of measuring instrument

Repeatability is the closeness of agreement between the results of successive measurements of the same measurand carried out under the same conditions of measurement (by a single person or instrument, on the same item, in a short period of time). In effect, repeatability is a measure of the variation of the instrument's indication under successive measurements. It is expressed as s_r, the standard deviation of a series of repeated measurements.

Example

A breathalyzer is a device for estimating blood alcohol content (BAC) from a breath sample. A given brand breathalyzer has the following performance data:

Maximum permissible error:

BAC < 0.20 g/100 ml: MPE = ±0.025 g/100 ml
BAC 0.20–0.40 g/100 ml: MPE = ±0.04 g/100 ml

Measurement repeatability, expressed as a standard deviation:

s_r = 0.006 g/100 ml

Evaluating measurement uncertainty of
the breathalyzer

The standard uncertainty of the MPE, u(E), is calculated as MPE/√3, using the rectangular probability distribution appropriate for a maximum bound of error.

The combined standard uncertainty is u(comb) = √(u(E)² + s_r²), and the expanded uncertainty for each of the two ranges is 2 × u(comb), at 95% confidence.
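The calculation above can be reproduced in a short script (a sketch; the function name is mine, while the MPE and s_r figures are those stated for the breathalyzer):

```python
import math

def expanded_uncertainty(mpe, s_r, k=2):
    """Combine MPE (rectangular distribution) with repeatability, then expand by k."""
    u_e = mpe / math.sqrt(3)            # standard uncertainty of the MPE
    u_comb = math.sqrt(u_e**2 + s_r**2)  # combined standard uncertainty
    return k * u_comb

s_r = 0.006  # g/100 ml
U_low = expanded_uncertainty(0.025, s_r)   # BAC < 0.20 g/100 ml
U_high = expanded_uncertainty(0.04, s_r)   # BAC 0.20-0.40 g/100 ml
print(f"U (BAC < 0.20): ±{U_low:.3f} g/100 ml")
print(f"U (BAC 0.20-0.40): ±{U_high:.3f} g/100 ml")
```

Note that the repeatability contribution is small here; the MPE term dominates both ranges.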

The method traditionally practiced by most test laboratories for estimating measurement uncertainty is the ISO GUM (ISO/IEC Guide 98-3) approach, which is quite tedious and time-consuming, as it requires studying and gathering uncertainty contributions from each and every step of the test method. An alternative is to study the overall performance of the analytical procedure by replicating the whole procedure, giving a direct estimate of the uncertainty of the final test result. This is the so-called 'top-down'
approach.

We may use data from inter-laboratory studies, in-house validation or ongoing quality control. This approach is particularly appropriate where the individual effects are poorly understood in terms of quantitative theoretical models capable of predicting the behavior of analytical results for particular sample types. By this approach, it suffices to consider reproducibility from inter-laboratory data, or long-term within-laboratory precision, as recommended by ISO 21748, ISO 11352 and ASTM D6299.

However, one must be aware that repeatedly analyzing a given sample
several times will not give a good estimate of the uncertainty unless the
following conditions are fulfilled:

There must be no perceptible bias or systematic
error in the procedure. That is to say,
the difference between the expected result and the true or reference
value must be negligible in relation to twice the standard deviation (95%
confidence). This condition is usually (but not always) fulfilled in
analytical chemistry.

The replication has to explore all of the
possible variations in the execution of the method, by engaging different
analysts on different days using different equipment on similar samples; if
not, at least all of the variations of important magnitude must be covered. This
condition may not be easily met by replication under repeatability conditions
(i.e. repeated testing within one laboratory), because such variations would be
laboratory-specific to a great extent.

The conclusion is that replicated
data produced by a single analyst on the same equipment over a short period of time are not
sufficient for uncertainty estimation. If the top-down approach is to be
followed, we must obtain a good estimate of the long-term precision of the
analytical method. This can be done, for
example, by studying the precision of a QC material run with the test method
over a reasonable period of time. We may also use a published
reproducibility standard deviation for the method in use, provided we can document
proof that we are able to follow the procedure closely and competently.

The revised ILAC G8 document, which gives general guidelines on decision rules for issuing a statement of conformance to a specification or compliance with regulatory limits, was published in September 2019. Being a guideline document, it provides various decision options for consideration, but the final mode of application is entirely governed by our own decision, with calculated risk in mind.

Section 4.2 of the document gives a series of decision rules for consideration. Sub-section 4.2.1 considers a binary statement (either pass or fail) under a simple acceptance rule: a test result is given a clear-cut pass or fail, without taking into account any risk of making a wrong decision, as long as the mean measured value falls inside the acceptance zone (as shown graphically in its Figure 3); the reverse is also true.

In this manner, my view is that the maximum risk the
laboratory assumes when declaring conformity to a specification limit is
50%, when the test result sits exactly on the specification limit. Would this be too high a risk for the test
laboratory to take?

When guard bands (w)
are used to reduce the probability of making an incorrect conformance decision,
they are placed inside the upper and lower tolerance (specification) limit (TL) values, so that the
range between the upper and lower acceptance limits (AL) becomes narrower. We can simply let w = TL − AL = U, where U is the expanded
uncertainty of the measurement.
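A sketch of this guard-banded binary decision (the tolerance limits, the U value and the function name are all hypothetical, chosen only to illustrate w = U):

```python
# Guard-banded acceptance limits with w = U (hypothetical figures)
TL_lower, TL_upper = 14.0, 16.0   # tolerance (specification) limits, % w/w
U = 0.3                           # expanded uncertainty (k = 2), % w/w

w = U                             # guard band width
AL_lower, AL_upper = TL_lower + w, TL_upper - w

def decide(x):
    """Binary decision with guard bands: pass only inside the acceptance zone."""
    return "pass" if AL_lower <= x <= AL_upper else "fail"

print(decide(15.0))  # well inside the acceptance zone
print(decide(15.8))  # inside TL but within the guard band, so it fails
```

With these numbers the acceptance zone shrinks to 14.3–15.7% w/w, so a result of 15.8% fails even though it lies numerically inside the specification limits.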

By doing so, we can have one of two situations: for a binary statement, see Figure 4 of ILAC G8, reproduced below; for a non-binary statement, where multiple terms may be expressed, see Figure 5 of the ILAC document.

In my opinion, the decision to give a pass for a
measurement falling within the acceptance zone in Figure 4 is to the full
advantage of the laboratory (zero risk, as long as the laboratory is confident
of its measurement uncertainty U). But stating a clear "fail" when
the measurement falls within the w-zone
may not be received well by the customer, who would
expect a "pass" based on the numerical value alone, as has been done all
this while. Shouldn't the laboratory determine
and bear a certain percentage of risk, working out with the customer an
acceptable critical measurement value for which a certain portion of U lies outside
the upper or lower specification limit?

Similarly, the "Conditional Pass / Fail" in Figure 5 also needs
further clarification and explanation with the customer, after considering a
certain percentage of risk to be borne for the critical measurement values
reported by the test laboratory. A
statement to the effect of "a conditional pass/fail with 95% confidence" might
be necessary to clarify the situation.

From a commercial point of view, however, a local banker clearing a shipment's letter of credit for payment, where a certificate of analysis is required to certify conformance to a quality specification laid down by the overseas buyer, might not appreciate such a statement format, and might hold back payment to the local exporter until the overseas principal agrees with it. Hence, it is advisable for the contracted laboratory service provider to explain the decision rule used in reporting conformity to the local exporter and obtain written agreement, so that the exporter in turn can discuss this mode of reporting with the overseas buyer during the negotiation of a sales contract.

In my training workshops on decision rules for making statements of conformity after laboratory analysis of a product, some participants have found the subject of hypothesis testing rather abstract. But in my opinion, an understanding of the significance of type I and type II errors in hypothesis testing does help in formulating a decision rule based on the acceptable risk to be taken by the laboratory in declaring whether a tested product conforms with specification.

As we know well, a hypothesis is a statement that might, or
might not, be true until we put it to some statistical test. As an analogy, a
graduate studying for a Ph.D. degree carries out research work on a
certain hypothesis given by his or her supervisor, and that hypothesis may or may
not be proven true at the conclusion. Of
course, a breakthrough in the research at hand means that the original
hypothesis, called the null hypothesis, is not rejected.

In statistics, we set up the hypothesis in such a way that
it is possible to calculate the probability (p) of the data, or of a test
statistic (such as Student's t) calculated from the data, given the
hypothesis, and then to decide whether this hypothesis is to be
accepted (high p) or rejected (low p).

In conformity testing, we treat the given specification or regulatory
limit as the 'true' or certified value, and our measured value is the
data from which we decide whether the sample conforms with the specification. Hence, our null hypothesis H_o can
be put forward as: there is no real difference between the measurement and
the specification; any observed difference arises from random effects only.

To make a decision rule on conformance in significance
testing, a choice must be made about the probability below which the null
hypothesis is rejected and a significant difference concluded. This
is the probability of making an error of judgement in the decision.

If the probability that the data are consistent with the
null hypothesis H_o falls below a pre-determined low value (say, alpha
= 0.05 or 0.01), then the hypothesis is rejected at that probability. Therefore, p < 0.05 means that
we reject H_o at the 95% level of confidence (i.e. 5% error): if H_o
were indeed correct, fewer than 1 in 20 repeated experiments would give a
test statistic falling outside the limits. Hence, when we reject H_o, we conclude that there is a
significant difference between the measurement and the specification limit.

Gone are the days when we provided a conformance statement
when the measurement result fell exactly on the specification value. By doing so, we were exposed to a 50% risk of
being found wrong, because we had either assumed zero uncertainty in our
measurement (which cannot be true), or assumed that the specification value
itself encompassed its own uncertainty, which again is unlikely to be true.

Now, in our routine testing, we would have established the
measurement uncertainty (MU) of test parameters such as the contents of oil,
moisture, protein, etc. Our MU, as an expanded uncertainty, is evaluated by
multiplying the combined standard uncertainty by a coverage factor (normally
k = 2), giving 95% confidence. Assuming the MU is constant over the range of
values tested, we can easily determine, by the use of Student's t-test, the
critical value that is not significantly different from the specification
value or regulatory limit. This is Case B
in Fig. 1 below.

So, if the specification has an upper or maximum limit, any test value smaller than the critical value estimated by the Student's t-test below that limit can be 'safely' claimed to be within specification (Case A). On the other hand, any test value larger than this critical value reduces our confidence in claiming it is within specification (Case C). Do you want to claim that the test value does not meet the specification limit although, numerically, it is smaller than that limit? This is the dilemma we face today.
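One way to sketch such a critical value for an upper limit (all figures are hypothetical, and the one-tailed Student's t value is taken from standard tables):

```python
import math

# Hypothetical upper specification limit and method precision
TL = 15.0        # upper specification limit, % w/w
s = 0.20         # standard deviation of the method, % w/w
n = 4            # number of replicate determinations
t_95 = 2.353     # one-tailed Student's t for df = 3 at 95% confidence (tables)

# Largest mean result still significantly below the limit (boundary of Case A)
x_crit = TL - t_95 * s / math.sqrt(n)
print(f"critical value = {x_crit:.2f} % w/w")
```

Any mean result below about 14.76% w/w could then be claimed within specification with 95% confidence; results between 14.76 and 15.0% w/w fall into the grey zone of Case C.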

The ILAC Guide G8:2009 suggested stating "not possible
to state compliance" in such a situation.
Certainly, the client is not going to be pleased about it, as he is used
to receiving your positive compliance statements even when the measurement result
is exactly on the dot of the upper limit.

That is why the ISO/IEC 17025:2017 standard requires
accredited laboratory personnel to discuss the decision rule with their clients
and obtain written consent on the manner of reporting.

To minimize this awkward situation, one remedy is to reduce
your measurement uncertainty range as much as possible, pushing the critical
value nearer to the specification value. However, there is always a limit to doing
so, because measurement uncertainty always exists; in the above example, the
critical reporting value will always be numerically smaller than the upper
limit.

Alternatively, you can discuss with the client and let him
provide you with his acceptance limits. In this case, your laboratory's risk is
greatly minimized, as long as your reported value with its associated measurement
uncertainty is well within the documented acceptance limit, because your client
has taken over the risk of errors in the product specification (i.e. the customer's
risk).

Thirdly, you may want to take a certain calculated commercial risk by allowing the upper uncertainty limit to extend into the fail zone above the upper specification limit, for commercial reasons such as keeping a good relationship with an important customer. You may even choose to report a measurement value that sits exactly on the specification limit as conformance. However, by doing so, you are taking a 50% risk of being found in error in the issued statement of conformance. Is it worth taking such a risk? Always remember the actual meaning of measurement uncertainty (MU): it provides a range of values around the reported test result that covers the true value of the test parameter with 95% confidence.
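The 50% figure can be illustrated with a small normal-distribution sketch (the function name and the numbers are mine; it assumes the measurement error is normally distributed with standard uncertainty u = U/k):

```python
import math

def risk_of_nonconformance(x, TL, U, k=2):
    """Probability that the true value exceeds the upper limit TL,
    assuming a normal distribution with standard uncertainty u = U/k."""
    u = U / k
    z = (TL - x) / u
    return 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)

print(f"{risk_of_nonconformance(15.0, 15.0, 0.4):.0%}")  # result exactly on the limit
print(f"{risk_of_nonconformance(14.8, 15.0, 0.4):.0%}")  # result one u below the limit
```

A result sitting exactly on the limit gives a risk of exactly 50%, while a result one standard uncertainty below the limit still carries roughly a 16% risk that the true value lies above it.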

In metrology, error is defined as “the result of measurement
minus a given true value of the measurand”.

What is ‘true value’?

ISO 3534-2:2006 (3.2.5) defines it as the "value which characterizes
a quantity or quantitative characteristic perfectly defined in the conditions
which exist when that quantity or quantitative characteristic is considered",
and the Note 1 that follows suggests that this true value is a theoretical
concept and generally cannot be known exactly.

In other words, when you are asked to analyze a certain analyte
concentration in a given sample, the analyte present has a value in the sample,
and what we do in the experiment is merely try to determine that particular value.
No matter how accurate your method is, and no matter how many repeats you have done on the
sample to get an average value, we can never be 100% sure at the end that
this average value is exactly the true value in the sample. We are bound to have a measurement error!

Actually, in our routine analytical work, we encounter
three types of error, known as gross, random and systematic errors.

A gross error, leading to a serious outcome with an
unacceptable measurement, is committed through a serious mistake in the
analysis process, such as titrating with a reagent of the wrong concentration.
It is so serious that there is no alternative but to abandon the
experiment and make a completely fresh start.

Such blunders, however, are easily recognized if there is a
robust QA/QC program in place, as the laboratory quality-check samples with
known or reference values (i.e. true values) will produce erratic results.

Secondly, when the analysis of a test method is repeated a
large number of times, we get a set of variable data spread around the
average value of these results. It is
interesting to see that data further away from the average value occur
less frequently. This
is the characteristic of random error.

There are many factors that can contribute to random error:
the ability of the analyst to reproduce the testing conditions exactly,
fluctuations in the environment (temperature, pressure, humidity, etc.),
rounding in arithmetic calculations, electronic signals of the instrument
detector, and so on. The variation of
these repeated results is referred to as the precision of the method.

Systematic error, on the other hand, is a permanent deviation
from the true result; no number of repeated analyses will improve
the situation. It is also known as bias.

A technician with a color-vision deficiency might persistently
overestimate the end point in a titration; the extraction of an analyte from a
sample may be only 90% efficient; or the on-line derivatization step before
analysis by gas chromatography may not be complete. In each of these cases, if
the results were not corrected for the problem, they would always be wrong,
and always wrong by about the same amount for a particular experiment.

How do we know that we have a systematic error in our
measurement?

It can be easily estimated by measuring a reference material
a large number of times. The difference
between the average of the measurements and the certified value of the
reference material is the systematic error. It is important to know the sources
of systematic error in an experiment and try to minimize and/or correct for
them as much as possible.

If you have tried your very best and the final average
result is still significantly different from the reference or true value, you
have to correct the reported result by multiplying it by a correction
factor. If R is the recovery factor, which is
calculated by dividing your average test result by the reference or true value,
then the correction factor is 1/R.
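A small numerical sketch of this correction (all figures are hypothetical):

```python
# Bias correction via the recovery factor (hypothetical figures)
reference_value = 50.0   # certified value of the reference material, mg/kg
mean_result = 45.0       # average of repeated measurements, mg/kg

R = mean_result / reference_value   # recovery factor
correction_factor = 1 / R

reported = 38.7                     # a routine test result, mg/kg
corrected = reported * correction_factor
print(f"R = {R:.2f}, corrected result = {corrected:.1f} mg/kg")
```

Here the method recovers only 90% of the analyte (R = 0.90), so every routine result is scaled up by 1/0.90 before reporting.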

Today, there is another statistical term in use: 'trueness'.

The measure of trueness is usually expressed in terms of bias.

Trueness is defined in ISO 3534-2:2006 as "the closeness of
agreement between the expectation of a test result or a measurement result and
a true value", whilst ISO 15195:2018 defines trueness as "closeness of
agreement between the average value obtained from a large series of results of
measurements and a true value". The ISO 15195 definition is quite similar
to those of ISO 15971:2008 and ISO 19003:2006.
The ISO 3534-2 definition includes a note that, in practice, an "accepted
reference value" can be substituted for the true value.

Is there a difference between ‘accuracy’ and ‘trueness’?

The difference between ‘accuracy’ and ‘trueness’ is shown in
their respective ISO definition.

ISO 3534-2:2006 (3.3.1) defines ‘accuracy’ as “closeness of
agreement between a test result or measurement result and true
value”, whilst the same standard in (3.2.5) defines ‘trueness’ as “closeness of
agreement between the expectation of a test result or measurement
result and true value”. What does the
word ‘expectation’ mean here? It
actually refers to the average of the test result, as given in the definition
of ISO 15195:2018.

Hence, accuracy is a qualitative parameter whilst trueness
can be quantitatively estimated through repeated analysis of a sample with
certified or reference value.

References:

ISO 3534-2:2006 “Statistics – Vocabulary and symbols – Part 2:
Applied statistics”

ISO 15195:2018 “Laboratory medicine – Requirements for the
competence of calibration laboratories using reference measurement procedures”

In the next blog, we shall discuss how the uncertainty of bias is evaluated. It is an uncertainty component which cannot be overlooked in our measurement uncertainty evaluation, if present.

Why is measurement uncertainty important in analytical chemistry?

We conduct a laboratory analysis in order to make informed
decisions about the samples drawn. The
result of an analytical measurement can be deemed incomplete without a
statement (or at least an implicit knowledge) of its uncertainty. This is because we cannot make a valid
decision based on the result alone, and nearly all analysis is conducted to
inform a decision.

We know that the uncertainty of a result is a parameter that
describes a range within which the value of the quantity being measured is
expected to lie, taking into account all sources of error, with a stated degree
of confidence (usually 95%). It characterizes
the extent to which the unknown value of the targeted analyte is known after
measurement, taking account of the given information from the measurement.

With a knowledge of uncertainty in hand, we can make the
following typical decisions based on analysis:

Does this particular laboratory have the capacity
to perform analyses of legal and statutory significance?

Does this batch of pesticide formulation contain
less than the maximum allowed concentration of an impurity?

Does this batch of animal feed contain at least
the minimum required concentration of profat (protein + fat)?

How pure is this batch of precious metal?

The figure below shows a variety of instances affecting decisions about compliance with externally imposed limits or specifications. The error bars can be taken as expanded uncertainties, effectively intervals containing the true value of the concentration of the analyte with 95% confidence.

We can make the following observations from the above illustration:

Result A clearly indicates that the test
result is below the limit, as even the extremity of the uncertainty interval is
below the limit.

Result B is below the limit, but the upper
end of the uncertainty interval is above it, so we are not sure that the true value is
below the limit.

Result C is above the limit, but the lower
end of the uncertainty interval is below it, so we are not sure that the true
value is above the limit.

What conclusions can we draw from the equal
results D and E? Both results are above the limit but, while D
is clearly above it, E is not, because its greater uncertainty
interval extends below the limit.

In short, we have to decide how to act upon
results B, C and E.
What level of risk can we afford in assuming that the test result
is in conformity with the stated specification, or in compliance with the
regulatory limit?

In making such a decision rule, we must be serious about the evaluation of measurement uncertainty, making sure that the uncertainty obtained is reasonable. If not, any decision made on conformity or compliance will be meaningless.

Sampling is the
process of selecting a portion of material (statistically termed the
'population') to represent or provide information about a larger body of
material. It is essential to the whole
testing and calibration process.

The old ISO/IEC 17025:2005 standard defines sampling as “a defined
procedure whereby a part of a substance, material or product is taken to
provide for testing or calibration of a representative sample of the whole. Sampling may also be required by the
appropriate specification for which the substance, material or product is to be
tested or calibrated. In certain cases (e.g. forensic analysis), the sample may
not be representative but is determined by availability.”

In other words, sampling should generally be carried out in a random manner, but so-called judgement sampling is also allowed in specific cases. Judgement sampling involves using knowledge about the material to be sampled, and about the reason for sampling, to select specific samples for testing. For example, an insurance loss adjuster acting on behalf of a cargo insurer, inspecting a shipment of cargo damaged in transit, will apply a judgement sampling procedure, selecting the worst-damaged samples from the lot in order to determine the cause of damage.

2. Types of samples to be differentiated

Field sample – Random sample(s) taken from the material in the field. Several random samples may be drawn and composited in the field before being sent to the laboratory for analysis.

Laboratory sample – Sample(s) as prepared for sending to the laboratory, intended for inspection or testing.

Test sample – A sub-sample, i.e. a selected portion of the laboratory sample, taken for laboratory analysis.

3. Principles of sampling

Randomization

Generally
speaking, random sampling is a method of selection whereby each possible member
of a population has an equal chance of being selected, so that unintended bias
is minimized. It provides an unbiased estimate of the population parameters
of interest (e.g. the mean), normally in terms of analyte concentration.

Representative samples

"Representative"
means something like "sufficiently like the population to allow inferences
about the population". Taking a
single sample through a random process will not necessarily give a
representative composition of the bulk;
it is entirely possible that the composition of a particular sample
randomly selected is completely unlike the bulk composition, unless the
population is very homogeneous in its composition distribution (such as
drinking water).

Remember
the saying that a test result is no better than the sample it is based
upon. The sample taken for analysis should
be as representative of the sampling target as possible. Therefore, we must take the sampling variance
into serious consideration: the larger the sampling variance, the more likely
it is that individual samples will be very different from the bulk.

Hence, in
practice, we must carry out representative sampling, which involves obtaining
samples that are not only unbiased but also have sufficiently small
variance for the task in hand. In other words, in addition to choosing
randomization procedures that provide unbiased results, we need to decide on
the number of random samples to be collected in the field to keep the sampling
variance small. This is normally decided upon
information such as the specification limits and the uncertainty expected.

Composite samples

Often it
is useful to combine a collection of field samples into a single homogenized
laboratory sample for analysis. The measured value for the composite laboratory
sample is then taken as an estimate of the mean value for the bulk material.

It is important to note that the importance of a sound sub-sampling process in the laboratory cannot be over-emphasized. Hence, there must be an SOP to guide the laboratory analyst in drawing the test sample for measurement from the sample that arrives at the laboratory.

4. Sampling uncertainty

Today, sampling
uncertainty is recognized as an important contributor to the measurement uncertainty
associated with the reported results.

Note that sampling uncertainty cannot be estimated as a
standalone entity; the analytical uncertainty has to be evaluated at the same
time. For a fairly homogeneous population, a
one-factor ANOVA (analysis of variance) will suffice to estimate the
overall measurement uncertainty from the between- and within-sample
variances. See https://consultglp.com/2018/02/19/a-worked-example-to-estimate-sampling-precision-measuremen-uncertainty/
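As a minimal sketch of the one-factor ANOVA idea (duplicate analyses of several field samples; all numbers are hypothetical, and the linked worked example gives the full treatment):

```python
import statistics

# Hypothetical duplicate analyses of 5 field samples (mg/kg)
samples = [(10.1, 10.3), (11.0, 10.8), (9.7, 9.9), (10.5, 10.4), (10.9, 11.1)]

n = 2                                   # replicates per sample
means = [statistics.mean(s) for s in samples]

# Within-sample (analytical) variance: pooled variance of the duplicates
var_within = statistics.mean([statistics.variance(s) for s in samples])

# Between-sample mean square, and the sampling variance it implies
ms_between = n * statistics.variance(means)
var_sampling = max((ms_between - var_within) / n, 0.0)

u_meas = (var_sampling + var_within) ** 0.5   # combined standard uncertainty
print(f"analytical s = {var_within**0.5:.3f}, sampling s = {var_sampling**0.5:.3f}")
print(f"combined measurement s = {u_meas:.3f} mg/kg")
```

With these numbers the sampling variance dominates the analytical variance, which is typical for field materials that are not perfectly homogeneous.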

However, for a
heterogeneous population, such as the soil of contaminated land, the sampling-location
variance has to be taken into account in addition to the sampling variance. The more
complicated calculations involve the application of the two-way ANOVA
technique. A EURACHEM worked example
can be found at: https://consultglp.com/2017/10/10/verifying-eurachems-example-a1-on-sampling-uncertainty/

Today
there is a dilemma for an ISO/IEC 17025-accredited laboratory service provider
in issuing a statement of conformity with specification to its clients after
testing, particularly when the analytical result of the test sample is close to
the specified value, with its upper or lower measurement uncertainty crossing
over the limit. The laboratory manager has to decide on the level of risk he is
willing to take in stating such conformity.

However, certain trades buy goods and commodities with a given tolerance allowance against the buying specification. A good example is the trading of granular or pelletized compound fertilizers which contain multiple primary nutrients (e.g. N, P, K) in each individual granule. A buyer usually allows a permissible 2 to 5% tolerance below the buying specification as a lower limit to the declared value, to allow for variation in the manufacturing process. Some government departments of agriculture even allow a lower tolerance limit of up to 10% in their procurement of compound fertilizers, which are re-sold to their farmers at a discount.

Given the permissible lower tolerance limit, the fertilizer buyer has taken on his own risk of receiving a consignment that might be below his buying specification. Eurolab's Technical Report No. 01/2017, "Decision rule applied to conformity assessment", rightly points out that by allowing a tolerance limit above the upper specification limit, or below the lower specification limit, we can classify this as the customer's or consumer's risk. In the hypothesis-testing context, this is a type II (beta) error.

What should the test laboratory's decision rule be when issuing its conformity statement in such a situation?

Let’s
discuss this through an example.

A government procurement department purchased a consignment of 3000 bags of granular compound fertilizer with a guarantee of available plant nutrients expressed as a percentage by weight; e.g. an NPK marking of 15-15-15 on a bag indicates the presence of 15% nitrogen (N), 15% phosphorus (P2O5) and 15% potash (K2O) nutrients. Representative samples were drawn and analyzed in the department's own fertilizer laboratory.

In the case of the potash (K2O) content of 15% w/w, a permissible tolerance limit of 13.5% w/w is stated in the tender document, indicating that a fertilizer chemist can declare conformity at this tolerance level. The successful supplier of the tender will be charged a calculated fee for any specification non-conformity.

Our conventional approach to decision rules has been based on comparing single or interval conformity limits with single measurement results. Today, we recognize that each test result has its own measurement variability, normally expressed as a measurement uncertainty at the 95% confidence level.

Therefore, it is obvious that the conventional approach of stating conformity based on a single measurement result exposes the laboratory to a 50% risk of having the true (actual) value of the test parameter fall outside the given tolerance limit, rendering it non-conforming! Is a 50% risk bearable by the test laboratory?

Let's say the average test result of the K2O content of this fertilizer sample was found to be 13.8 ± 0.55% w/w. What is the critical value for deciding on conformity in this particular case at the usual 95% confidence level? Can we declare the result of 13.8% w/w to be in conformity with the specification, with reference to its given tolerance limit of 13.5% w/w?

Let us first see how the critical value is estimated. In hypothesis testing, we make the following hypotheses:

H0 : true K2O content ≥ 13.5% w/w

H1 : true K2O content < 13.5% w/w

Use the following equation, with the assumption that the variation of the laboratory analysis result follows the normal (Gaussian) probability distribution:

x̄ = μ − z × u

where

μ is the tolerance value of the specification, i.e. 13.5% w/w,

x̄ is the critical value at 95% confidence (α = 0.05),

z is the z-score of −1.645 for the one-tailed test under H1, and

u is the standard uncertainty of the test, i.e. U/2 = 0.55/2 = 0.275% w/w.

By calculation, we have the critical value x̄ = 13.5 − (−1.645)(0.275) = 13.95% w/w; statistically speaking, any measured result below this value is not significantly greater than 13.5% w/w at 95% confidence.
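The arithmetic above can be sketched in a few lines, using the values of this worked example:

```python
mu = 13.5   # lower tolerance limit, % w/w
U = 0.55    # expanded uncertainty (coverage factor k = 2), % w/w
u = U / 2   # standard uncertainty, 0.275% w/w
z = -1.645  # one-tailed z-score for alpha = 0.05 under H1

# Critical reportable mean: x_bar = mu - z * u
x_crit = mu - z * u
print(round(x_crit, 2))  # 13.95
```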

Assuming the measurement uncertainty remains constant in this measurement region, 13.95% w/w minus its expanded uncertainty U of 0.55% w/w gives 13.40% w/w, which is (13.5 − 13.4) = 0.1% w/w of K2O below the lower tolerance limit, thus exposing some 0.1/(2 × 0.55) = 9.1% risk.

When the reported test result of 13.8% w/w has an expanded uncertainty U of 0.55% w/w, the range of measured values is 13.25 to 14.35% w/w, indicating that (13.50 − 13.25) = 0.25% w/w of K2O lies below the lower tolerance limit, thus exposing some 0.25/(2 × 0.55) = 22.7% risk in claiming conformity to the specification with reference to the given tolerance limit.

Visually, we can present these situations in the following sketch with U = 0.55%w/w:

The fertilizer laboratory manager thus has to make an informed decision rule on what level of risk is bearable in making a statement of conformity. Even the critical value of 13.95% w/w estimated by hypothesis testing carries a 9.1% risk exposure instead of the expected 5% error, or risk. Why?

The reason is that the measurement uncertainty was traditionally evaluated by a two-tailed test (α/2 = 0.025) under the normal probability distribution with a coverage factor of 2, whilst the hypothesis testing was based on a one-tailed test (α = 0.05) with a z-score of 1.645.

To reduce the test laboratory's risk in issuing a statement of conformity to practically zero, the laboratory manager may want to take a safe bet by setting his critical reporting value at (13.5% + 0.55%) = 14.05% w/w, so that its lower uncertainty value is exactly 13.5% w/w. Barring any error in the evaluation of its measurement uncertainty, this conservative approach lets the test laboratory issue its conformity statement with practically zero risk.

It may be noted that ISO/IEC 17025:2017 requires the laboratory to communicate with its customers and clearly spell out its decision rule before undertaking the analytical task. This avoids any unnecessary misunderstanding after the issuance of a test report with a statement of conformity or non-conformity.

Dilemmas in making decision rules for conformance testing

In carrying out routine
testing on samples of commodities and products, we normally encounter requests
by clients to issue a statement on the conformity of the test results against
their stated specification limits or regulatory limits, in addition to standard
reporting.

Conformance testing, as the term suggests, is testing to determine whether a product or medium complies with the requirements of a product specification, contract, standard or safety regulation limit. It refers to the issuance of a compliance statement to customers by the test or calibration laboratory after testing. Examples of such statements are: Pass/Fail; Positive/Negative; On specification/Off specification.

Generally, such statements of conformance are issued after testing, against a target value with a certain degree of confidence. This is because there is always an element of measurement uncertainty associated with the test result obtained, normally expressed as X ± U with 95% confidence.

It has been our usual practice over the years to make a direct comparison of the measured value with the specification or regulatory limits, without realizing the risk involved in making such a conformance statement.

For example, if the minimum specification limit of the fat content in a product is 10% m/m, we would without hesitation issue a statement of conformity to the client when the sample test result is reported as exactly 10.0% m/m, little realizing that there is a 50% chance that the true value of the analyte in the sample analyzed lies outside the limit! See Figure 1 below.

Here, we might have assumed that the specification limit already takes measurement uncertainty into account (which is not normally true), or that our measured value has zero uncertainty (which is also untrue). Hence, knowing that uncertainty is present in all measurements, we are actually taking some 50% risk that the actual true value of the test parameter lies outside the specification when making such a conformity statement.

Guides published by learned professional organizations such as ILAC, Eurolab and Eurachem have suggested various ways to set decision rules for this situation. Some have proposed adding a certain estimated amount of error to the measurement uncertainty of a test result, and stating the result as a pass only when the result, reduced by this enlarged uncertainty, still exceeds the minimum acceptance limit. Similarly, a 'fail' statement is made when the result, increased by this enlarged uncertainty, still falls below the minimum acceptance limit.

The aim of adding an additional estimated error is to ensure 'safe' conclusions as to whether measurement errors are within acceptable limits. See Figure 2 below.

Others have suggested basing the decision only on the measurement uncertainty associated with the test result, without adding an estimated error. See Figure 3 below:

This is to ensure that if another laboratory is tasked with taking the same measurements and applies the same decision rule, it will come to the same conclusion about a 'pass' or 'fail', so as to avoid any undesirable implication.

However, by doing so, we face a dilemma in explaining to a lay client the rationale for making such a pass/fail statement.

For discussion's sake, let's say we obtained a mean fat content of 10.30 ± 0.45% m/m, indicating that the true value lies in the range 9.85 to 10.75% m/m with 95% confidence. A simple calculation tells us that there is about a 15% chance that the true value lies below the 10% m/m minimum mark. Do we want to take this risk by stating that the result conforms with the specification? In the past, we used to do so.

In fact, if we were to carry out a hypothesis (or significance) test, we would find that the mean value of 10.30% m/m, with a standard uncertainty of 0.225% m/m (obtained by dividing 0.45% by a coverage factor of 2), is not significantly different from the target value of 10.0% m/m, given a set type I error (α) of 0.05. So, statistically speaking, this is a pass situation. In this sense, are we safe to make this conformity statement? The decision is yours!
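The significance test here reduces to a one-tailed z-comparison; a minimal sketch using the numbers above:

```python
mean, u, limit = 10.30, 0.225, 10.0
z = (mean - limit) / u  # 0.30 / 0.225 = 1.333
z_crit = 1.645          # one-tailed critical value for alpha = 0.05

# z < z_crit: the mean is not significantly different from the 10.0% limit.
print(z < z_crit)       # True
```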

Now, the opposite is also very true.

Still on the same example, a hypothesis test would show that an average result of 9.7% m/m with a standard uncertainty of 0.225% m/m is also not significantly different from the target specification of 10.0% m/m at 95% confidence. But do you want to declare that this test result conforms with the minimum specification limit of 10.0% m/m? Traditionally we don't, and that is a very safe statement on your side. But if you claim it to be off-specification, your client may not be happy if he understands hypothesis testing; he may even challenge you for failing his shipment.

In fact, hypothesis testing gives a critical value of 9.63% m/m, below which the sample analyzed is significantly different from 10.0%. That means any figure lower than 9.63% m/m can then be confidently claimed to be off-specification!
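The 9.63% m/m figure follows directly from the same one-tailed test, sketched with the example's values:

```python
mu = 10.0   # minimum specification limit, % m/m
u = 0.225   # standard uncertainty, % m/m
z = 1.645   # one-tailed z-score for alpha = 0.05

# Any mean below this critical value is significantly lower than 10.0%.
x_crit = mu - z * u
print(round(x_crit, 2))  # 9.63
```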

Indeed, these are the challenges faced by third party testing providers today with the implementation of new ISO/IEC 17025:2017 standard.

To ‘inch’ the mean measured result nearer to the specification limit from either direction, you may want to review your measurement uncertainty evaluation associated with the measurement. If you can ‘improve’ the uncertainty by narrowing the uncertainty range, your mean value will come closer to the target value. Of course, there is always a limit for doing so.

Therefore, you have to make decision rules that address the risk you can afford to take in making such statements of conformance or compliance as requested. Also, before starting the sample analysis and implementing these rules, you must communicate with your client and obtain a written agreement, as required by the revised ISO/IEC 17025 accreditation standard.
