Random measurement data, particularly from a large number of observations, usually follow a normal distribution, which is symmetric about the mean value of the data. In graphic form, it appears as a bell-shaped curve, showing that data nearer to the mean occur more frequently than those far away from the mean. The width of the curve depends on the standard deviation of the data: the curve is ‘flatter’ and wider when the standard deviation is larger, indicating that the set of data is less precise.
An MS Excel spreadsheet is a good tool to show the above. For example, we can use Excel to construct normal curves describing the flash points (in deg C) of a certain solvent with a mean of 72 deg C and three different standard deviations, 2.6, 1.6 and 1.1 deg C, obtained by analysts A, B and C, respectively. The Excel solution is shown below.
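First, we enter numbers from 67 deg C to 77 deg C at intervals of 0.5 deg C into column A, starting from cell A4. With the mean and standard deviation of each analyst entered in rows 1 and 2 (cells B1 and B2 for Analyst A, C1 and C2 for Analyst B, D1 and D2 for Analyst C), we key in the Excel function “=NORM.DIST(A4,$B$1,$B$2,0)” for Analyst A in cell B4. Similarly, we enter the functions “=NORM.DIST(A4,$C$1,$C$2,0)” and “=NORM.DIST(A4,$D$1,$D$2,0)” for Analysts B and C, respectively. A click-and-drag is then performed on cells B4 to D4 to copy the formulas down to all the rows of temperature values.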
What does the Excel function NORM.DIST do? The function “=NORM.DIST(x,mean,standard_dev,cumulative)” with cumulative = 0 tells Excel to calculate the height of the curve (the probability density) at the number x. So, the expression “=NORM.DIST(67,72,2.6,0)” keyed into any cell gives 0.024. We may also key in “FALSE” instead of “0” in the fourth position of the function to get the same outcome.
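For readers who prefer to check the worksheet outside Excel, the sketch below rebuilds the same table in Python. It assumes that Python's standard `statistics.NormalDist.pdf` gives the same curve height as `NORM.DIST(x, mean, sd, FALSE)`; the names `curve_height`, `MEAN` and `SDS` are illustrative only.

```python
from statistics import NormalDist

MEAN = 72.0                            # mean flash point, deg C
SDS = {"A": 2.6, "B": 1.6, "C": 1.1}   # analysts' standard deviations, deg C

def curve_height(x, mean, sd):
    # Equivalent of =NORM.DIST(x, mean, sd, FALSE): probability density at x
    return NormalDist(mu=mean, sigma=sd).pdf(x)

# Column A of the worksheet: 67 to 77 deg C in steps of 0.5 deg C (21 rows)
temps = [67 + 0.5 * i for i in range(21)]

# One row per temperature: (x, height for A, height for B, height for C)
table = [(t, *(curve_height(t, MEAN, sd) for sd in SDS.values())) for t in temps]

print(round(curve_height(67, 72, 2.6), 3))  # → 0.024, same as =NORM.DIST(67,72,2.6,0)
for row in table:
    print("  ".join(f"{v:8.4f}" for v in row))
```

At 72 deg C the height for Analyst C (smallest standard deviation) is the largest, which is the narrower-but-taller effect described in the text.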
We can then plot the normal curves with different standard deviations in the usual manner by clicking Insert –> Scatter (X, Y) with Smooth Lines, as shown below:
The above diagram shows that more precise data (i.e. smaller standard deviations) give rise to a narrower but taller curve.
On the other hand, suppose we wish to know the probability (or chance), expressed as a percentage, of flash points being smaller than 76 deg C. We key “=NORM.DIST(76,72,2.6,1)” into any cell to give 0.938, i.e. 93.8% of the values are smaller than or equal to 76 deg C. Notice that the “1” or “TRUE” in the fourth position of the NORM.DIST function tells Excel to accumulate the area under the curve from 76 deg C to the left, i.e. over all values up to 76 deg C. Strictly, this area starts from minus infinity, not from 67 deg C; about 2.7% of the area lies below 67 deg C in this instance.
This can also be interpreted as the probability of finding a value less than or equal to 76 deg C. There is therefore a 1 – 0.938 = 0.062, or 6.2%, chance of finding a value larger than 76 deg C.
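The cumulative form can be checked the same way. This is a minimal sketch assuming `statistics.NormalDist.cdf` corresponds to `NORM.DIST(x, mean, sd, TRUE)`; it also prints the small area below 67 deg C to show that the cumulative total runs from minus infinity.

```python
from statistics import NormalDist

flash = NormalDist(mu=72, sigma=2.6)  # Analyst A's flash-point distribution, deg C

p_le_76 = flash.cdf(76)  # like =NORM.DIST(76,72,2.6,1): area from minus infinity to 76
p_gt_76 = 1 - p_le_76    # chance of a value above 76 deg C
p_lt_67 = flash.cdf(67)  # area below 67 deg C, also included in the cumulative total

print(f"P(x <= 76) = {p_le_76:.3f}")
print(f"P(x > 76)  = {p_gt_76:.3f}")
print(f"P(x < 67)  = {p_lt_67:.3f}")
```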
A professionally run test laboratory must have a set of internal quality control or check (IQC) procedures in place. Regrettably, I have noticed that many accredited chemical laboratories do not institute such an IQC system in their routine work.
The purpose of IQC is to ensure, as far as possible, that the magnitude of errors affecting the analytical system has not changed during its routine use since the method was validated or verified. Without an IQC system in place, the analyst cannot state with confidence that the test results generated for a particular batch of samples are precise, accurate and fit for purpose.
During method validation, we have estimated the uncertainty of the method and shown that it is fit for purpose. Therefore, when the method is put into routine use, every analytical run should be checked to show that the errors of measurement are probably no larger than they were at validation time. Even when a standardized method is used for analysis, we have to demonstrate that our laboratory’s precision is no worse than the stated repeatability of the method.
For this IQC purpose, we can employ the concept of statistical control, which in general means that some critical feature of the system behaves like a normally distributed variable. How are we going to do it?
For chemical analysis, we can add one or more “control materials or samples” to the run of test samples. These control materials are treated throughout in exactly the same manner as the test materials, from the weighing of the test portion to the final measurement. Of course, the control materials should ideally be of the same type as the materials for which the analytical system was validated, in respect of matrix composition and analyte concentration.
By doing so, we treat the control materials as surrogates, and their behavior is a proper indicator of the performance of the system. We can plot the results obtained in successive runs on a control chart to inspect visually for any trend over time. The control lines are determined by the run-to-run intermediate precision of the data collected. Intermediate precision is the pooled standard deviation of a number of successive runs in the same laboratory under inevitably changing measurement conditions (such as different analysts, instruments, newly prepared reagents, environmental variations, etc.) over time.
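A control chart can be prototyped in a few lines. The sketch below assumes the conventional Shewhart layout of warning lines at ±2s and action lines at ±3s around the mean of historical control-material results; the text above only says the lines come from intermediate precision, so these multipliers, the function names and the data are assumptions for illustration.

```python
import statistics

def control_limits(qc_results):
    """Centre line, warning (+/-2s) and action (+/-3s) limits from past QC results.
    The 2s/3s multipliers are the conventional Shewhart choice (an assumption here)."""
    centre = statistics.mean(qc_results)
    s = statistics.stdev(qc_results)  # run-to-run standard deviation
    return {
        "centre": centre,
        "warning": (centre - 2 * s, centre + 2 * s),
        "action": (centre - 3 * s, centre + 3 * s),
    }

def check_run(value, limits):
    """Classify a new control-material result against the chart lines."""
    lo_a, hi_a = limits["action"]
    lo_w, hi_w = limits["warning"]
    if not lo_a <= value <= hi_a:
        return "out of control (beyond action limit)"
    if not lo_w <= value <= hi_w:
        return "warning"
    return "in control"

# Hypothetical control-material results (mg/kg) from 10 earlier runs
history = [5.02, 4.97, 5.05, 4.99, 5.01, 4.95, 5.04, 5.00, 4.98, 5.03]
limits = control_limits(history)
for new_value in (5.01, 5.08, 5.15):
    print(new_value, check_run(new_value, limits))
```

In practice the limits are usually fixed from a validation-period data set and then held constant, rather than recalculated with every new point.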
In addition to classical analytical methods, we use several instruments that are helpful in routine laboratory analysis. Examples are aplenty: pH meter, dissolved oxygen meter, turbidity meter, conductivity meter, UV-visible spectrometer, FT-IR spectrophotometer, etc. Some are used for in-situ measurements in the field. Hence, it is important to estimate their respective measurement uncertainties.
Most measuring instruments are generally characterized by:
(a) Accuracy class (depending on the precision of its measurement grading, such as Class A and Class B burettes, etc.)
(b) Discrimination threshold in identification
(c) Resolution of the displaying device
(d) Stability, measured by the drift of its graded measurement

To evaluate the uncertainty of readings from a measuring instrument, we look for two basic uncertainty components:
(a) the maximum permissible error provided by the supplier; and
(b) the repeatability of the measuring instrument.

Maximum permissible error (MPE)
By the VIM definition, MPE is the extreme value of measurement error, with respect to a known reference quantity value, permitted by specifications or regulations for a given measurement, measuring instrument or measuring system. It is the ‘best’ accuracy confirmed by calibration and specified by the instrument manufacturer during the warranty period.
MPE data can always be found in the manufacturer’s manual under the instrument specification. It is usually expressed in one of the following manners:

(a) When the MPE is constant throughout the instrument indications, it is expressed as:
MPE = +/-a
where a is a given value in its unit. For example, for a glass thermometer with a measuring range of 0 – 50 °C and sub-divisions of 0.1 °C, MPE = +/-0.2 °C.

(b) When the MPE varies with the instrument indication following a regression line, the maximum error tolerance can be given by the relation:
MPE = +/-(a + bx)
where a and b are constants and x is a measured value.

(c) When the measuring instrument uses a constant relative standard deviation RSD, its MPE can be expressed as:
MPE = +/-RSD × x
Repeatability of measuring instrument
Repeatability is the closeness of agreement between the results of successive measurements of the same measurand carried out under the same conditions of measurement, i.e. by a single person or instrument on the same item, under the same conditions and within a short period of time. Indeed, repeatability is a measure of the variation of the instrument’s indication over successive measurements. It is expressed as sr, the standard deviation of a series of repeated measurements.
A breathalyzer is a device for estimating blood alcohol content (BAC) from a breath sample. A breathalyzer of a given brand has the following performance data:
Maximum permissible error:
BAC < 0.20 g/100ml: MPE = +/- 0.025 g/100ml
BAC 0.20 – 0.40 g/100ml: MPE = +/- 0.04 g/100ml
Repeatability, expressed as standard deviation:
sr = 0.006 g/100ml

Evaluating the measurement uncertainty of the breathalyzer
The standard uncertainty of the MPE is calculated as MPE/SQRT(3), using the rectangular probability distribution for a maximum bound of error. This component is then combined with the repeatability sr.
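A minimal sketch of the breathalyzer calculation follows. It assumes the two components are independent and combined by root-sum-of-squares, and that the expanded uncertainty uses a coverage factor k = 2; the function name is illustrative.

```python
import math

def instrument_uncertainty(mpe, s_r, k=2.0):
    """Return (u_MPE, combined standard uncertainty, expanded uncertainty)."""
    u_mpe = mpe / math.sqrt(3)              # rectangular distribution: MPE/SQRT(3)
    u_c = math.sqrt(u_mpe ** 2 + s_r ** 2)  # root-sum-of-squares (assumed independent)
    return u_mpe, u_c, k * u_c              # expanded uncertainty with coverage factor k

S_R = 0.006  # repeatability standard deviation, g/100ml

for bac_range, mpe in (("BAC < 0.20", 0.025), ("BAC 0.20 - 0.40", 0.04)):
    u_mpe, u_c, U = instrument_uncertainty(mpe, S_R)
    print(f"{bac_range}: u(MPE) = {u_mpe:.4f}, u_c = {u_c:.4f}, U (k = 2) = {U:.3f} g/100ml")
```

For the lower BAC range this gives an expanded uncertainty of about 0.031 g/100ml; the MPE term dominates the repeatability term.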
The method traditionally practiced by most test laboratories for estimating measurement uncertainty is the ISO GUM (ISO/IEC Guide 98-3) approach, in which studying and gathering uncertainty contributions from each and every step of the test method is quite tedious and time-consuming. An alternative way of looking at uncertainty is to study the overall performance of the analytical procedure by replicating the whole procedure, giving a direct estimate of the uncertainty of the final test result. This is the so-called ‘top-down’ approach.
We may use data from inter-laboratory studies, in-house validation or ongoing quality control. This approach is particularly appropriate where individual effects are poorly understood, i.e. where there are no quantitative theoretical models capable of predicting the behavior of analytical results for particular sample types. By this approach, it is sufficient to consider the reproducibility from inter-laboratory data or the long-term within-laboratory precision, as recommended by ISO 21748, ISO 11352 and ASTM D 6299.
However, one must be aware that repeatedly analyzing a given sample several times will not give a good estimate of the uncertainty unless the following conditions are fulfilled:
There must be no perceptible bias or systematic error in the procedure. That is to say, the difference between the expected result and the true or reference value must be negligible in relation to twice the standard deviation (i.e. at 95% confidence). This condition is usually (but not always) fulfilled in
The replication has to explore all of the possible variations in the execution of the method by engaging different analysts on different days, using different equipment on a similar sample. If not, at least all variations of important magnitude must be covered. Such a condition may not be easily met by replication under repeatability conditions (i.e. repeated testing within the laboratory), because such variations would be laboratory-specific to a great extent.
The conclusion is that replicated data obtained by a single analyst on the same equipment over a short period of time are not sufficient for uncertainty estimation. If the top-down approach is to be followed, we must obtain a good estimate of the long-term precision of the analytical method. This can be done, for example, by studying the precision of a typical test material used as a QC material over a reasonable period of time. We may also use a published reproducibility standard deviation for the method in use, provided we document proof that we are able to follow the procedure closely and competently.
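The long-term precision route can be sketched as below, assuming one QC result per run collected over several months and an expanded uncertainty U = k × s with k = 2. The reported bias-to-2s ratio only helps judge the "negligible bias" condition above; it is not a formal significance test, and ISO 11352 and ISO 21748 describe fuller procedures. The function name and data are hypothetical.

```python
import statistics

def top_down_uncertainty(qc_results, reference_value, k=2.0):
    """Long-term precision from QC results and expanded uncertainty U = k * s."""
    mean = statistics.mean(qc_results)
    s_long = statistics.stdev(qc_results)  # between-run spread, one result per run
    bias = mean - reference_value
    return {
        "mean": mean,
        "s_long_term": s_long,
        "bias": bias,
        # How large the bias is relative to 2s; small values support the
        # "no perceptible bias" condition, but this is only a screening figure
        "bias_to_2s_ratio": abs(bias) / (2 * s_long),
        "U": k * s_long,
    }

# Hypothetical QC results, one per run, on a QC material valued at 10.0 mg/kg
qc = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 10.0, 9.9, 10.2, 9.8]
print(top_down_uncertainty(qc, reference_value=10.0))
```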
When we repeat the analysis of a sample several times, we get a spread of results around its average value. This spread reflects the precision of the data, but provides no clue as to how close the results are to the true concentration of the analyte in the sample.
However, it is possible for a test method to produce precise results that are in very close agreement with one another but are consistently lower or higher than they should be. How do we know that? This can be observed when we carry out replicate analysis of a sample with a certified analyte value. In this situation, we know we have encountered a systematic error in the analysis.
The term “trueness” generally refers to the closeness of agreement between the expectation of a test result or measurement result and a true value or an accepted reference value. Trueness is normally expressed in terms of bias. Hence, bias can be evaluated by comparing the mean of measurement results with an accepted reference value, as shown in the figure below.
Therefore, bias can be evaluated by carrying out repeat analysis of a suitable material containing a known amount of the analyte (i.e. the reference value) μ, and is calculated as the difference between the average of the test results, x̄, and the reference value:
$$\text{bias} = \bar{x} - \mu$$
We often express bias in a relative form, such as a percentage:
$$\text{bias}\ (\%) = \frac{\bar{x} - \mu}{\mu} \times 100$$
or as a ratio when we assess ‘recovery’ in an experiment:
$$R = \frac{\bar{x}}{\mu}$$
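The three expressions of bias can be computed together from replicate results, as in this small sketch. The reference value and replicate data are hypothetical, and recovery is returned as a ratio (multiply by 100 for a percentage).

```python
import statistics

def bias_summary(results, reference):
    """Bias, relative bias (%) and recovery R from replicates on a reference material."""
    mean = statistics.mean(results)        # x-bar
    bias = mean - reference                # bias = x-bar - mu
    return {
        "mean": mean,
        "bias": bias,
        "relative_bias_pct": 100 * bias / reference,
        "recovery_R": mean / reference,    # R = x-bar / mu
    }

# Hypothetical replicates on a reference material with mu = 50.0 mg/kg
replicates = [48.6, 49.1, 48.8, 49.3, 48.7, 49.0]
for name, value in bias_summary(replicates, 50.0).items():
    print(f"{name}: {value:.4f}")
```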
The revised ILAC G8 document, which gives general guidelines on decision rules for issuing a statement of conformance to a specification or compliance with regulatory limits, was published in September 2019. Being a guideline document, it provides various decision options for consideration, but the final mode of application is entirely governed by our own decision, with calculated risk in mind.
Section 4.2 of the document gives a series of decision rules for consideration. Sub-section 4.2.1 considers a binary statement (either pass or fail) for a simple acceptance rule. It suggests that a test result be given a clear-cut pass, without taking the risk of a wrong decision into account, as long as the mean measured value falls inside the acceptance zone, as graphically shown in its Figure 3; the reverse is also true for a fail.
In this manner, my view is that the maximum risk that the
laboratory is assuming when declaring conformity to a specification limit is
50% when the test result is on the dot of the specification limit. Would this be too high a risk for the test
laboratory to take?
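The 50% figure can be checked with a short calculation. This sketch assumes the true value is normally distributed around the measured result with standard uncertainty u; the limit and uncertainty values are hypothetical.

```python
from statistics import NormalDist

def risk_above_limit(measured, upper_limit, u):
    """Probability that the true value exceeds an upper limit, assuming the true
    value is normally distributed about the measured result with standard uncertainty u."""
    return 1 - NormalDist(mu=measured, sigma=u).cdf(upper_limit)

USL, u = 10.0, 0.25  # hypothetical limit and standard uncertainty (U = 0.5 at k = 2)
for x in (10.0, 9.75, 9.5):
    print(f"result {x}: risk of a wrong conformity claim = {risk_above_limit(x, USL, u):.1%}")
```

A result exactly on the limit carries a 50% risk; a result one expanded uncertainty (k = 2) below the limit carries about 2.3%.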
When guard bands (w) are used to reduce the probability of making an incorrect conformance decision, they are placed inside the upper and lower tolerance (specification) limits (TL) so that the range between the upper and lower acceptance limits (AL) is narrower. We can simply let w = TL – AL = U for the upper limit, where U is the expanded uncertainty of the measurement.
By doing so, we can have one of two situations: for a binary statement, see Figure 4 of ILAC G8 reproduced below; and for a non-binary statement, where multiple terms may be expressed, see Figure 5 of the ILAC document.
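The two reporting schemes can be sketched for an upper limit only. The sketch assumes a guard band w = U, a binary rule of pass at or below AL = TL − U, and four non-binary zones separated at TL − U, TL and TL + U; the treatment of values exactly on a boundary is an assumption that a laboratory would fix in its own decision rule.

```python
def binary_decision(x, upper_tl, U):
    """Binary rule with guard band w = U on an upper limit: AL = TL - w."""
    acceptance_limit = upper_tl - U
    return "pass" if x <= acceptance_limit else "fail"

def non_binary_decision(x, upper_tl, U):
    """Four-zone statement, assumed to be split at TL - U, TL and TL + U."""
    if x <= upper_tl - U:
        return "pass"
    if x <= upper_tl:
        return "conditional pass"
    if x <= upper_tl + U:
        return "conditional fail"
    return "fail"

TL, U = 10.0, 0.5  # hypothetical upper tolerance limit and expanded uncertainty
for x in (9.3, 9.8, 10.3, 10.7):
    print(x, binary_decision(x, TL, U), non_binary_decision(x, TL, U))
```

The result 9.8 illustrates the dilemma discussed next: it is numerically within the limit, yet the binary rule reports a fail.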
In my opinion, the decision to give a pass to a measurement found within the acceptance zone in Figure 4 is to the full advantage of the laboratory (minimal risk, as long as the laboratory is confident of its measurement uncertainty, U). However, stating a clear “fail” where the measurement falls within the guard band (the w-zone between the acceptance limit and the tolerance limit) may not be well received by the customer, who would expect a “pass” based on the numerical value alone, as has been done all this while. Shouldn’t the laboratory determine and bear a certain percentage of risk by working out with the customer an acceptable critical measurement value, where a certain portion of U lies outside the upper and lower specification limits?
Similarly, the “Conditional Pass / Fail” in Figure 5 also needs further clarification and explanation with the customer, after considering a certain percentage of risk to be borne for the critical measurement values reported by the test laboratory. A statement to the effect of “a conditional pass / fail with 95% confidence” might be necessary to clarify the situation.
But from a commercial point of view, a local banker clearing a shipment’s letter of credit for payment, which requires a certificate of analysis certifying conformance to a quality specification laid down by the overseas buyer, might not appreciate such a statement format and might hold back payment to the local exporter until his overseas principal agrees to it. Hence, it is advisable for the contracted laboratory service provider to explain the decision rule for reporting conformity to the local exporter and obtain written agreement, so that the exporter in turn can discuss this mode of reporting with the overseas buyers when negotiating a sales contract.
In my training workshops on decision rules for making statements of conformity after laboratory analysis of a product, some participants have found the subject of hypothesis testing rather abstract. But in my opinion, an understanding of the significance of Type I and Type II errors in hypothesis testing does help to formulate a decision rule based on the acceptable risk to be taken by the laboratory in declaring whether a tested product conforms to specification.
As we know well, a hypothesis is a statement that might, or might not, be true until we put it to some statistical tests. As an analogy, a graduate student studying for a Ph.D. degree always carries out research work on a certain hypothesis given by his or her supervisor. Such a hypothesis may or may not be proven true at the conclusion. Of course, a breakthrough in the research in hand means that the original hypothesis is supported by the evidence.
In statistics, we set up the hypothesis in such a way that it is possible to calculate the probability (p) of the data, or of the test statistic (such as Student’s t) calculated from the data, given the hypothesis, and then to decide whether the hypothesis is to be retained (high p) or rejected (low p).
In conformity testing, we treat the given specification or regulatory limit as the ‘true’ or certified value, and the measurement value we obtain is the data for deciding whether the product conforms to the specification. Hence, our null hypothesis Ho can be put forward as: there is no real difference between the measurement and the specification, and any observed difference arises from random effects only.
To make a decision rule on conformance by significance testing, we must choose the value of the probability below which the null hypothesis is rejected and a significant difference concluded. This is the probability of making an error of judgement in the decision, namely wrongly rejecting a null hypothesis that is actually true (a Type I error).
If the probability of obtaining data at least as extreme as those observed, given that the null hypothesis Ho is true, falls below a pre-determined low value (say, alpha = 0.05 or 0.01), then the hypothesis is rejected at that probability. Therefore, p < 0.05 means that we reject Ho at the 95% level of confidence (or 5% error), because the probability of the test statistic, given the truth of Ho, falls below 0.05. In other words, if Ho were indeed correct, fewer than 1 in 20 repeated experiments would fall outside the limits. Hence, when we reject Ho, we conclude that there is a significant difference between the measurement and the specification limit.
Gone are the days when we could provide a conformance statement when the measurement result was exactly on the specification value. By doing so, we are exposed to a 50% risk of being found wrong. This is because we have either assumed zero uncertainty in our measurement (which cannot be true) or assumed that the specification value itself encompasses its own uncertainty, which again is not likely.
Now, in our routine testing, we would have established the measurement uncertainty (MU) of test parameters such as the contents of oil, moisture, protein, etc. Our MU, as an expanded uncertainty, has been evaluated by multiplying the combined standard uncertainty by a coverage factor (normally k = 2), giving approximately 95% confidence. Assuming the MU is constant over the range of values tested, we can easily determine the critical value that is not significantly different from the specification value or regulatory limit by using Student’s t-test. This is Case B in Fig 1 below.
So, if the specification has an upper or maximum limit, any test value smaller than the critical value below the specification, as estimated by Student’s t-test, can be ‘safely’ claimed to be within specification (Case A). On the other hand, any test value larger than this critical value reduces our confidence in claiming it to be within specification (Case C). Do you want to claim that the test value does not meet the specification limit although numerically it is smaller than the limit? This is the dilemma that we are facing today.
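The critical value can be sketched as follows, under stated assumptions: the standard uncertainty is taken as u = U/k, the test is one-sided against the upper limit, and, when no Student's t value is supplied, the large-sample normal quantile (about 1.645 at 95%) stands in for it. For a finite number of degrees of freedom, pass the tabulated one-sided t value instead (for example 1.833 for 9 degrees of freedom). This is one plausible reading of the procedure, not necessarily the author's exact calculation.

```python
from statistics import NormalDist

def critical_value(upper_limit, U, k=2.0, t_crit=None, confidence=0.95):
    """Result below which a value is significantly smaller than an upper limit."""
    u = U / k  # standard uncertainty from the expanded uncertainty
    if t_crit is None:
        # Large-sample stand-in for the one-sided Student's t value
        t_crit = NormalDist().inv_cdf(confidence)
    return upper_limit - t_crit * u

def conformity_statement(x, upper_limit, x_crit):
    if x < x_crit:
        return "within specification (Case A)"
    if x <= upper_limit:
        return "not possible to state compliance (Case C)"
    return "does not conform"

USL, U = 10.0, 0.5  # hypothetical upper limit and expanded uncertainty (k = 2)
x_crit = critical_value(USL, U)
print(f"critical value = {x_crit:.3f}")
for x in (9.5, 9.8, 10.2):
    print(x, conformity_statement(x, USL, x_crit))
```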
The ILAC Guide G8:2009 suggests stating “not possible to state compliance” in such a situation. Certainly, the client is not going to be pleased about it, as he is used to receiving your positive compliance comments even when the measurement result is exactly on the dot of the upper limit.
That is why the ISO/IEC 17025:2017 standard requires the accredited laboratory to agree its decision rule on the manner of reporting with the clients, and it is prudent to get their consent in writing.
To minimize this awkward situation, one remedy is to reduce your measurement uncertainty as much as possible, pushing the critical value nearer to the specification value. However, there is a limit to how far this can be done, because uncertainty of measurement always exists. The critical reporting value will always be numerically smaller than the upper limit in the above situation.
Alternatively, you can discuss with the client and let him provide you with his acceptance limits. In this case, your laboratory’s risk is greatly minimized as long as your reported value, with its associated measurement uncertainty, is well within the documented acceptance limit, because your client has taken over the risk of errors in the product specification (i.e. the customer’s risk).
Thirdly, you may want to take a certain calculated commercial risk by letting the upper uncertainty limit extend into the fail zone above the upper specification limit, for commercial reasons such as keeping a good relationship with an important customer. You may even choose to report a measurement value that is exactly on the specification limit as conforming. However, by doing so, you are taking a 50% risk of being found in error in the issued statement of conformance. Is it worth taking such a risk? Always remember the actual meaning of measurement uncertainty (MU): it provides a range of values around the reported test result that covers the true value of the test parameter with approximately 95% confidence.
Notes on outlier test statistics in
When an analytical method is repeated several times on a given sample, the measured values nearer to the mean (or average) of the data set tend to occur more often than those found further away from the mean. This is characteristic of analytical data following the normal probability distribution, and the phenomenon is known as central tendency.
However, time and again we notice some extremely low or high value(s) that are visibly distant from the remainder of the data. These values can be suspected to be outliers, which may be defined as observations in a set of data that appear to be inconsistent with the remainder of that set.
It is obvious that outlying values generally have an appreciable influence on the calculated mean value, and even more influence on the calculated standard deviation, if they are not examined carefully and removed where necessary.
We must remember that random variation in analysis does generate occasional extreme values by chance. If so, these values are indeed part of the valid data and should generally be included in any statistical calculations. However, undesirable human error or other deviations in the analytical process, such as instrument failure, may cause outliers to appear from such faulty procedures.
Hence, it is important to minimize the effect of outliers. To do so, we have to find ways to identify outliers and distinguish them from chance variation. There are many outlier tests available which allow analysts to inspect suspect data and, if necessary, correct or remove erroneous values. These test statistics assume an underlying normal distribution and a relatively homogeneous test sample.
Outlier testing therefore needs careful consideration where the population characteristics are not known or, worse, are known to be non-normal. For example, if the data were Poisson distributed, many valid high values might be incorrectly rejected because they appear inconsistent with a normal distribution. It is also crucial to consider whether outlying values might represent genuine features of the population. An alternative approach is to use robust statistics, which are not greatly affected by the presence of occasional extreme values and still perform well when no outliers are present.
The tests are aplenty at your disposal: Dixon’s, Grubbs’ and Thompson’s tests for outlying values, Cochran’s test for outlying variances, and Levene’s, Bartlett’s, Hartley’s and Brown-Forsythe’s tests for homogeneity of variances, etc. They are quite simple to apply to a set of analytical data. However, for the outcome to be meaningful, the number of data examined should be large rather than just a few.
The outlier tests only provide us with objective criteria, or a signal, to investigate the cause; usually, outliers should not be removed from the data set solely because of the result of a statistical test. Instead, the tests highlight the need to inspect the data more closely in the first instance.
The guidelines for acting on outlier tests on analytical data, based on the outlier testing and inspection procedure in ISO 5725-2 Accuracy (trueness and precision) of measurement methods and results – Part 2: Basic method for the determination of repeatability and reproducibility of a standard measurement method, are as follows:
(a) Test at the 95% and the 99% confidence levels.
(b) All outliers should be investigated and any errors corrected.
(c) Outliers significant at the 99% level may be rejected unless there is a technical reason to retain them.
(d) Outliers significant only at the 95% level (normally called ‘stragglers’) should be rejected only if there is an additional technical reason to do so.
(e) Successive testing and rejection are permissible, but not to the extent of rejecting a large proportion of the data.
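As an illustration of rules (a) to (d), the sketch below computes the single-outlier Grubbs statistic G = max|xᵢ − x̄| / s (Grubbs’ test is one of the tests used in ISO 5725-2) and applies the 95% / 99% actions. The critical values shown for n = 6 are as commonly tabulated and must be verified against the standard before use; the data are hypothetical and the function names are illustrative.

```python
import statistics

def grubbs_statistic(data):
    """Single-outlier Grubbs statistic G = max|x_i - mean| / s, plus the suspect value."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    suspect = max(data, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / s, suspect

def iso5725_action(g, crit_5pct, crit_1pct):
    """Apply the 95 % / 99 % rules listed above to a Grubbs statistic."""
    if g > crit_1pct:
        return "outlier: may be rejected unless there is a technical reason to retain it"
    if g > crit_5pct:
        return "straggler: reject only with an additional technical reason"
    return "no outlier indicated; retain"

data = [10.1, 10.2, 10.0, 10.3, 10.1, 12.5]  # hypothetical replicate results
g, suspect = grubbs_statistic(data)
# Critical values for n = 6 as commonly tabulated; verify against ISO 5725-2 before use
CRIT_5PCT, CRIT_1PCT = 1.887, 1.973
print(f"G = {g:.3f} for suspect value {suspect}: {iso5725_action(g, CRIT_5PCT, CRIT_1PCT)}")
```

Whatever the verdict, rule (b) still applies: the suspect value should be investigated before any decision to remove it.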
This procedure leads to results that are not seriously biased by the rejection of chance extreme values, but rather are relatively insensitive to outliers at the frequency commonly encountered in measurement work. The application of robust statistics might be a better choice.
In statistics, the average (mean) and sample standard deviation are known as “estimators” of the population mean and standard deviation. These estimates improve as the number of data collected increases. As we know, the use of these statistics requires data that are normally distributed, and for confidence intervals employing the standard deviation of the mean, this tends to be so. However, although real experimental data may be normally distributed, the distribution often contains seriously flawed data, which can be extremely low or high values. If we can identify such data and remove them from further consideration, then all is well and good. Sometimes this is possible, but not always. This is a problem, as a single rogue value can seriously upset our calculations of the mean and standard deviation.
Estimators that can tolerate a certain amount of ‘bad’ data are called robust estimators, and can be used when it is not possible to ensure that the data being processed have the correct distribution. For example, we can use the middle value of a set of ascending data (called the median) as a robust estimator of the mean, and the interquartile range scaled to the normal distribution (called the normalized interquartile range, NIQR) as a robust estimator of the standard deviation.
By definition, the median is the middle value of a set of data when arranged in ascending order. If there is an odd number of data, then the median is the unique middle datum. If there is an even number, then the median is the average of the two middle data.
The median is robust because, no matter how outrageous one or more extreme values are, they are only individual values at the end of the ordered list; their magnitude is immaterial.
The interquartile range (IQR) is a measure of where the “middle fifty” is in a data set, i.e. the range of values that spans the middle 50% of the data. The IQR multiplied by 0.7413 (roughly three quarters), known as the normalized IQR, is an estimate of the standard deviation for normally distributed data. The IQR itself is the first quartile Q1 subtracted from the third quartile Q3:
IQR = Q3 – Q1
A problem with the IQR is that it is unrealistic to use it for small data sets, as we must have sufficient data to define the quartiles (sections of the ordered data that each contain one-quarter of the data).
Another robust estimator of the standard deviation is the median absolute deviation (MAD). It is a fairly simple estimate that can be implemented easily in a spreadsheet. The MAD from the data set median, x̃, is calculated by:
$$\mathrm{MAD} = \operatorname{median}_{i=1,2,\ldots,n}\left(\lvert x_i - \tilde{x} \rvert\right)$$
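The sketch below compares classical and robust estimates on a small data set with one suspect high value. It assumes the ‘inclusive’ quartile rule of Python's `statistics.quantiles` (other quartile conventions give slightly different IQRs) and the usual normal-consistency factors 0.7413 for the IQR and 1.4826 for the MAD; the data are hypothetical.

```python
import statistics

def robust_estimates(data):
    """Median, IQR, normalized IQR and MAD-based estimates of location and spread."""
    med = statistics.median(data)
    # Quartiles by the 'inclusive' interpolation rule (one of several conventions)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    mad = statistics.median([abs(x - med) for x in data])
    return {
        "median": med,
        "IQR": iqr,
        "NIQR": 0.7413 * iqr,    # IQR scaled to estimate sigma for normal data
        "MAD": mad,
        "MAD_sd": 1.4826 * mad,  # MAD scaled to estimate sigma for normal data
    }

data = [10.1, 10.2, 10.0, 10.3, 10.1, 12.5]  # hypothetical, one suspect high value
print("classical:", round(statistics.mean(data), 3), round(statistics.stdev(data), 3))
print("robust:   ", {name: round(v, 3) for name, v in robust_estimates(data).items()})
```

The single value of 12.5 inflates the classical standard deviation to almost 1, whereas the robust spread estimates stay near 0.1 to 0.15.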
Robust methods have their place, particularly
when we must keep all the data together in, for example, an interlaboratory
comparison study where an outlying result from a laboratory cannot simply be
ignored. They are less strongly affected
by extreme values.
However, robust estimators are not really the
best statistics, and wherever possible the statistics appropriate to the
distribution of the data should be used.
So, when can we use these robust estimators?
Robust estimators can be considered to provide good estimates of the parameters for the ‘good’ data in an outlier-contaminated data set. They are appropriate when:
(a) the data are expected to be normally distributed. Here, robust statistics give answers very close to ordinary statistics;
(b) the data are expected to be normally distributed, but contaminated with occasional spurious values which are regarded as unrepresentative or erroneous and are approximately symmetrically distributed around the population mean. Here, robust estimators are less affected by these extreme values.
Remember that robust estimators are not recommended where the data set shows evidence of multi-modality or heavy skewing, especially when the data are expected to follow non-normal or skewed distributions, such as binomial and Poisson with low counts, chi-squared, etc., which generate extreme values with reasonable likelihood.
In metrology, error is defined as “the result of measurement
minus a given true value of the measurand”.
What is ‘true value’?
ISO 3534-2:2006 (3.2.5) defines it as “Value which characterizes a quantity or quantitative characteristic perfectly defined in the conditions which exist when that quantity or quantitative characteristic is considered.”, and Note 1 that follows suggests that this true value is a theoretical concept and generally cannot be known exactly.
In other words, when you are asked to analyze the concentration of a certain analyte in a given sample, the analyte present has a value in the sample, but all we can do in the experiment is try to determine that particular value. No matter how accurate your method is and how many repeats you have done on the sample to get an average value, we can never be 100% sure that this average value is exactly the true value in the sample. We are bound to have a measurement error!
Actually, in our routine analytical work, we encounter three types of error, known as gross, random and systematic errors.
Gross errors, which lead to serious outcomes with unacceptable measurements, are committed through serious mistakes in the analysis process, such as using a titrant of the wrong concentration for titration. They are so serious that there is no alternative but to abandon the experiment and make a completely fresh start.
Such blunders, however, are easily recognized if there is a robust QA/QC program in place, as the laboratory’s quality check samples with known or reference values (i.e. true values) will produce erratic results.
Secondly, when a test method is repeated a large number of times, we get a set of variable data spreading around the average value of these results. It is interesting to see that data further away from the average value occur less and less frequently. This is the characteristic of a random error.
There are many factors that can contribute to random error: the ability of the analyst to reproduce the testing conditions exactly, fluctuations in the environment (temperature, pressure, humidity, etc.), rounding of arithmetic calculations, electronic signals of the instrument detector, and so on. The variation of these repeated results is referred to as the precision of the method.
Systematic error, on the other hand, is a permanent deviation from the true result, and no number of repeat analyses will improve the situation. It is also known as bias.
For example, a technician with color vision deficiency might persistently overestimate the end point in a titration, the extraction of an analyte from a sample may be only 90% efficient, or the on-line derivatization step before analysis by gas chromatography may not be complete. In each of these cases, if the results were not corrected for the problems, they would always be wrong, and always wrong by about the same amount for a particular experiment.
How do we know that we have a systematic error in our analysis? It can easily be estimated by measuring a reference material a large number of times. The difference between the average of the measurements and the certified value of the reference material is the systematic error. It is important to know the sources of systematic error in an experiment and try to minimize and/or correct for them as much as possible.
If you have tried your very best and the final average result is still significantly different from the reference or true value, you have to correct the reported result by multiplying it by a correction factor. If R is the recovery factor, calculated by dividing your average test result by the reference or true value, the correction factor is 1/R.
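One common way to decide whether the average is "significantly different" is a two-sided t-test of the mean against the certified value, t = |x̄ − μ| / (s/√n). The sketch below uses that test (a technique named here for illustration, not prescribed by the text), then applies the 1/R correction. The replicate data, sample result and function names are hypothetical; 2.262 is the two-sided 95% Student’s t value for 9 degrees of freedom.

```python
import math
import statistics

def bias_t_test(results, reference, t_crit):
    """Two-sided t-test of the mean of replicates against a reference value."""
    n = len(results)
    mean = statistics.mean(results)
    s = statistics.stdev(results)
    t = abs(mean - reference) / (s / math.sqrt(n))
    return mean, t, t > t_crit

def recovery_corrected(value, mean, reference):
    """Correct a test result by the factor 1/R, where R = mean / reference."""
    recovery = mean / reference
    return value * (1 / recovery)

crm = [19.2, 19.5, 19.1, 19.4, 19.3, 19.6, 19.2, 19.4, 19.3, 19.5]  # hypothetical, reference 20.0
T_CRIT = 2.262  # two-sided 95 % Student's t for n - 1 = 9 degrees of freedom
mean, t, significant = bias_t_test(crm, 20.0, T_CRIT)
print(f"mean = {mean:.2f}, t = {t:.1f}, significant bias: {significant}")
if significant:
    print("corrected sample result:", round(recovery_corrected(12.0, mean, 20.0), 3))
```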
Today, there is another statistical term in use: ‘trueness’. The measure of trueness is usually expressed in terms of bias.
Trueness in ISO 3534-2:2006 is defined as “The closeness of
agreement between the expectation of a test result or a measurement result and
a true value.” whilst ISO 15195:2018 defines trueness as “Closeness of
agreement between the average value obtained from a large series of results of
measurements and a true value.”. The definition of ISO 15195 is quite similar
to those of ISO 15971:2008 and ISO 19003:2006.
The ISO 3534-2 definition includes a note that in practice, an “accepted
reference value” can be substituted for the true value.
Is there a difference between ‘accuracy’ and ‘trueness’?
The difference between ‘accuracy’ and ‘trueness’ can be seen in their respective ISO definitions.
ISO 3534-2:2006 (3.3.1) defines ‘accuracy’ as “closeness of agreement between a test result or measurement result and true value”, whilst the same standard defines ‘trueness’ as “closeness of agreement between the expectation of a test result or measurement result and true value”. What does the word ‘expectation’ mean here? It actually refers to the average of the test results, as given in the definition of ISO 15195:2018.
Hence, accuracy is a qualitative parameter whilst trueness
can be quantitatively estimated through repeated analysis of a sample with
certified or reference value.
ISO 3534-2:2006 “Statistics – Vocabulary and symbols – Part 2: Applied statistics”
ISO 15195:2018 “Laboratory medicine – Requirements for the competence of calibration laboratories using reference measurement procedures”
In the next blog, we shall discuss how the uncertainty of bias is evaluated. It is an uncertainty component which cannot be overlooked in our measurement uncertainty evaluation, if present.