Training and consultancy for testing laboratories.

Archive for the ‘Basic statistics’ Category

Dilemmas in making decision rules for conformance testing

In carrying out routine testing on samples of commodities and products, we normally encounter requests by clients to issue a statement on the conformity of the test results against their stated specification limits or regulatory limits, in addition to standard reporting.

Conformance testing, as the term suggests, is testing to determine whether a product or a medium complies with the requirements of a product specification, contract, standard or safety regulation limit.  It refers to the issuance of a compliance statement to customers by the test / calibration laboratory after testing.  Examples of such statements are:  Pass/Fail; Positive/Negative; On specification/Off specification. 

Generally, such statements of conformance are issued after testing, against a target value with a certain degree of confidence.  This is because there is always an element of measurement uncertainty associated with the test result obtained, normally expressed as X +/- U with 95% confidence.

For years, our usual practice has been to compare the measured value directly with the specification or regulatory limits, without realizing the risk involved in making such conformance statements.

For example, if the specification minimum limit of the fat content in a product is 10%m/m, we would without hesitation issue a statement of conformity to the client when the sample test result is reported exactly as 10.0%m/m, little realizing that there is a 50% chance that the true value of the analyte in the sample analyzed lies outside the limit!  See Figure 1 below.

Here, we may have assumed that the specification limit has taken measurement uncertainty into account (which is not normally true), or that our measured value has zero uncertainty, which is also untrue. Hence, knowing that there is uncertainty in all measurements, we are actually taking roughly a 50% risk that the true value of the test parameter lies outside the specification when making such a conformity statement.

Various guides published by learned professional organizations such as ILAC, EuroLab and Eurachem have suggested ways to frame decision rules for this situation. Some propose adding a certain estimated amount of error on top of the measurement uncertainty of a test result, and stating the result as a ‘pass’ only when the result, reduced by this combined uncertainty and estimated error, still lies above the minimum acceptance limit.  Similarly, a ‘fail’ statement is made only when the result, increased by the combined uncertainty and estimated error, lies below the minimum acceptance limit. 

The aim of adding an additional estimated error is to ensure “safe” conclusions about whether measurement errors are within acceptable limits.   See Figure 2 below.

Others have suggested basing the decision only on the measurement uncertainty associated with the test result, without adding an estimated error.  See Figure 3 below:

This is to ensure that if another laboratory were tasked with making the same measurements and applying the same decision rule, it would reach a similar “pass” or “fail” conclusion, avoiding any undesirable implications.

However, by doing so, we face a dilemma: how do we explain the rationale for such a pass/fail statement to a client who is a layman?

For discussion’s sake, let us say we obtained a mean fat content of 10.30 +/- 0.45%m/m, indicating that the true value of the fat lies between 9.85 and 10.75%m/m with 95% confidence. A simple calculation tells us that there is about a 9% chance that the true value lies below the 10%m/m minimum mark.  Do we want to take this risk by stating that the result conforms with the specification? In the past, we used to do so.
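Assuming the true value is normally distributed about the measured mean with standard uncertainty u = U/2 (the variable names below are illustrative), this tail probability can be checked with a few lines of Python:

```python
from statistics import NormalDist

mean = 10.30    # reported fat content, %m/m
u = 0.45 / 2    # standard uncertainty = expanded uncertainty / k, with k = 2
limit = 10.0    # specification minimum, %m/m

# Probability that the true value lies below the minimum limit,
# assuming a normal distribution centred on the measured value.
p_below = NormalDist(mu=mean, sigma=u).cdf(limit)
print(f"P(true value < {limit}%m/m) = {p_below:.3f}")  # about 0.09
```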

In fact, if we were to carry out a hypothesis (or significance) test, we would find that the mean value of 10.30%m/m, with a standard uncertainty of 0.225%m/m (obtained by dividing 0.45%m/m by a coverage factor of 2), is not significantly different from the target value of 10.0%m/m at a set type I error (alpha) of 0.05.  So, statistically speaking, this is a pass situation.  In this sense, are we safe to make this conformity statement?  The decision is yours!
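The two-sided test above can be sketched quickly in Python (illustrative numbers from the example):

```python
from statistics import NormalDist

mean, target, u = 10.30, 10.0, 0.225   # %m/m
alpha = 0.05                           # type I error

z = (mean - target) / u                       # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, ~1.96

print(f"z = {z:.2f} vs critical value {z_crit:.2f}")
print("significantly different" if abs(z) > z_crit else "not significantly different")
```

Since z = 1.33 is below 1.96, the result is not significantly different from the target.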

Now, consider the opposite situation.

Still on the same example, a hypothesis test would show that an average result of 9.7%m/m with a standard uncertainty of 0.225%m/m is not significantly different from the 10.0%m/m specification target at 95% confidence. But do you want to declare that this test result conforms with the specification limit of 10.0%m/m minimum? Traditionally we don’t, and that is a very safe statement on your side.  But if you claim it to be off-specification, your client may not be happy if he understands hypothesis testing; he may even challenge you for failing his shipment.

In fact, a one-sided hypothesis test gives a critical value of 9.63%m/m, below which the sample result is significantly different from 10.0%m/m.  That means any figure lower than 9.63%m/m can confidently be claimed to be off-specification!
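The 9.63%m/m cut-off can be reproduced with a one-sided test at alpha = 0.05 (variable names are my own):

```python
from statistics import NormalDist

target = 10.0   # specification minimum, %m/m
u = 0.225       # standard uncertainty, %m/m
alpha = 0.05

# One-sided lower critical value: a mean below this is significantly
# lower than the 10.0%m/m target at the 95% confidence level.
x_crit = target + NormalDist().inv_cdf(alpha) * u   # inv_cdf(0.05) ~ -1.645
print(f"critical value = {x_crit:.2f}%m/m")         # 9.63
```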

Indeed, these are the challenges faced by third-party testing providers today with the implementation of the new ISO/IEC 17025:2017 standard.

To ‘inch’ the mean measured result nearer to the specification limit from either direction, you may want to review the measurement uncertainty evaluation associated with the measurement. If you can ‘improve’ the uncertainty by narrowing its range, your mean value will come closer to the target value. Of course, there is always a limit to doing so.

Therefore, you have to set decision rules that address the risk you can afford to take in making such statements of conformance or compliance. Also, before starting the sample analysis and implementing these rules, you must communicate them to your client and obtain a written agreement, as required by the revised ISO/IEC 17025 accreditation standard.

Sharing Excel calculations for type I error by Student’s t-test

Sharing Excel calculations for type I and II errors by z-test

Basis of decision rule on conformity testing

There are three fundamental types of risk associated with the uncertainty approach when making conformity or compliance decisions for tests against specification intervals or regulatory limits.  Conformity decision rules can then be applied accordingly.

In summary, they are:

  1. Risk of false acceptance of a test result
  2. Risk of false rejection of a test result
  3. Shared risk

The basis of the decision rule is to determine an “Acceptance zone” and a “Rejection zone”, such that if the measurement result lies in the acceptance zone, the product is declared compliant, and, if it is in the rejection zone, it is declared non-compliant.  Hence, a decision rule documents the method of determining the location of acceptance and rejection zones, ideally including the minimum acceptable level of the probability that the value of the targeted analyte lies within the specification limits.

A straightforward decision rule that is widely used today is one where a measurement implies non-compliance with an upper or lower specification limit if the measured value exceeds the limit by more than its expanded uncertainty, U.
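As a minimal sketch of such a rule (the function and example limits are hypothetical, not taken from any guide), a result is declared compliant only when it clears the limit by at least U, non-compliant only when it is beyond the limit by more than U, and inconclusive otherwise:

```python
def conformity_statement(x, U, lower=None, upper=None):
    """Illustrative decision rule based on the expanded uncertainty U."""
    if (lower is not None and x < lower - U) or (upper is not None and x > upper + U):
        return "non-compliant"   # whole interval x +/- U lies outside the limit
    if (lower is None or x >= lower + U) and (upper is None or x <= upper - U):
        return "compliant"       # whole interval x +/- U lies inside the limit
    return "inconclusive"        # result within U of a limit

# Fat-content example: lower specification limit 10.0 %m/m, U = 0.45 %m/m
print(conformity_statement(10.50, 0.45, lower=10.0))  # compliant
print(conformity_statement(10.30, 0.45, lower=10.0))  # inconclusive
print(conformity_statement(9.50, 0.45, lower=10.0))   # non-compliant
```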

In adopting this approach, it should be emphasized that it rests on the assumption that the measurement uncertainty is represented by a normal (Gaussian) probability distribution function (PDF), which is consistent with typical measurement results (assuming the Central Limit Theorem applies).

Current practices

When performing a measurement and subsequently making a statement of conformity, for example, in or out-of-specification to manufacturer’s specifications or Pass/Fail to a particular requirement, there can be only two possible outcomes:

  • The result is reported as conforming with the specification
  • The result is reported as not conforming with the specification

Currently, the decision rule is often based on a direct comparison of the measured value with the specification or regulatory limits.  So, when the test result falls exactly on the specification limit, we would gladly state its conformity with the specification. The reason may be that these limits are deemed to have taken measurement uncertainty into account (which is not normally true), or it has been assumed that the laboratory’s measured value has zero uncertainty!  But, recognizing that there is always uncertainty in every measurement, we are actually taking a 50% risk that the actual or true value of the test parameter lies outside the specification.  Do we really want to undertake such high-risk reporting? If not, how are we going to minimize our exposure when making such statements?

Decision rule and conformity testing

What is conformity testing?

Conformance testing is testing to determine whether a product, system or just a medium complies with the requirements of a product specification, contract, standard or safety regulation limit.  It refers to the issuance of a compliance statement to customers after testing.  Examples are:  Pass/Fail; Positive/Negative; On specs/Off specs, etc. 

Generally, statements of conformance are issued after testing, against a target value of the specification with a certain degree of confidence. This is usually applied in the forensic, food, medical, pharmaceutical and manufacturing fields. Most QC laboratories in manufacturing industries (such as petroleum oils, foods and pharmaceutical products) and laboratories of government regulatory bodies regularly check the quality of an item against the stated specification and regulatory safety limits.

Decision rule involves measurement uncertainty

Why must measurement uncertainty be involved in the discussion of decision rule? 

To answer this, let us first be clear about the ISO definition of a decision rule.  ISO/IEC 17025:2017 clause 3.7 defines it as a “rule that describes how measurement uncertainty is accounted for when stating conformity with a specified requirement.”

Therefore, a decision rule gives a prescription for the acceptance or rejection of a product based on the measurement result, its associated uncertainty, and the specification limit or limits.  Where product testing and calibration provide for reporting measured values, levels of measurement decision risk acceptable to both the customer and the supplier must be agreed. Statistical tools such as hypothesis testing, covering both type I and type II errors, can be applied in the decision risk assessment.

Decision rule and ISO/IEC 17025:2017

Notes on decision rule as per ISO/IEC 17025:2017 requirements


The revised ISO/IEC 17025:2017 laboratory accreditation standard introduces a new concept, i.e., “risk-based thinking”, which requires the operator of an accredited laboratory to plan and implement actions to address possible risks and opportunities associated with the laboratory activities, including the issuance of a statement of conformity to a product specification or a compliance statement against regulatory limits.

The risk-based approach to management system implementation is one in which the breadth and depth of the implementation of particular clauses is varied to best suit the perceived risk involved for that particular laboratory activity.

Indeed, the laboratory is responsible for deciding which risks and opportunities need to be addressed. The aims as stated in the ISO standard clause 8.5.1 are:

  1. to give assurance that the management system achieves its intended results;
  2. to enhance opportunities to achieve the purpose and objectives of the laboratory;
  3. to prevent, or minimize, undesired impacts and potential failures in the laboratory activities; and
  4. to achieve improvement of the activities.

The decision rule as required in ISO/IEC 17025:2017

On the subject of decision rules for conformity testing, ‘risk’ appears in the following relevant clauses of this international standard:

Clause 7.1.3

“When the customer requests a statement of conformity to a specification or standard for the test or calibration (e.g. pass/fail, in-tolerance/out-of-tolerance), the specification or standard and the decision rule shall be clearly defined.  Unless inherent in the requested specification or standard, the decision rule selected shall be communicated to, and agreed with, the customer.”


Clause 7.8.6.1

“When a statement of conformity to a specification or standard is provided, the laboratory shall document the decision rule employed, taking into account the level of risk (such as false accept and false reject and statistical assumptions) associated with the decision rule employed, and apply the decision rule.”


Clause 7.8.6.2

“The laboratory shall report on the statement of conformity, such that the statement clearly identifies:

  1. to which results the statement of conformity applies;
  2. which specifications, standards or parts thereof are met or not met;
  3. the decision rule applied (unless it is inherent in the requested specification or standard).”

From these specified requirements, it is obvious that clearly defined decision rules must be in place whenever a customer requests the inclusion of a statement of conformity in the test report.  The tasks facing the accredited laboratory operator are therefore how to formulate the decision rules for a tested commodity or product, based on the laboratory’s own estimated measurement uncertainty, and how to communicate and convince customers of its choice of reporting limits against the given specification or regulatory limits when issuing such a conformity statement.

Proficiency testing – 2

Proficiency testing – what, why and how (Part II)

The Part 1 of this article series discussed the rationale to conduct proficiency testing (PT) programs and various proficiency assessment tools such as setting an assigned value and estimation of its standard deviation to reflect inter-laboratory variations. Let’s see how the scoring of PT results is made.

  1. The z-Score

The most common scoring system is the z-score for a proficiency test result xi, calculated as:

z = (xi – xa) / sigmapt

where

xa is the assigned value and

sigmapt is the standard deviation for proficiency assessment.

Those readers who are familiar with the normal probability distribution function should appreciate the use of this z-score, which standardizes all randomly distributed data to a standard normal distribution N(0,1) with mean = 0 and variance sigma² = 1.

The conventional interpretation of z-scores is as follows:

  • A result that gives | z | ≤ 2 is considered to be acceptable;
  • A result that gives 2 < | z | < 3 is considered to give a warning signal;
  • A result that gives | z | ≥ 3 is considered to be unacceptable or unsatisfactory performance (or action signal).

Assuming all participants perform exactly in accordance with the performance requirements, then by the normal distribution, about 95% of values are expected to be within two standard deviations of the mean value. In other words, there is only a 5% chance that a valid result would fall further than two standard deviations from the mean.

The probability of finding a valid result more than three standard deviations away from the mean is very low (approximately 0.3% for a normal distribution). Therefore, a score of  | z | ≥ 3 is considered unsatisfactory performance. Also, participants should be advised to check their measurement procedures following warning signals in case they indicate an emerging or recurrent problem.
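The three interpretation bands above can be written as a small helper function (illustrative only):

```python
def interpret_z(z):
    """Conventional interpretation of a proficiency-test z-score."""
    z = abs(z)
    if z <= 2:
        return "acceptable"
    if z < 3:
        return "warning signal"
    return "unsatisfactory (action signal)"

print(interpret_z(1.5))    # acceptable
print(interpret_z(-2.4))   # warning signal
print(interpret_z(3.1))    # unsatisfactory (action signal)
```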

  2. The z’-Score

When there is concern about the uncertainty of the assigned value, u(xa) (e.g. when u(xa) > 0.3 sigmapt), the uncertainty can be taken into account by expanding the denominator of the performance score.

This statistic, called a z’-score, is calculated as follows:

z’ = (xi – xa) / sqrt(sigmapt² + u(xa)²)

The criteria for assessing the laboratory’s performance by z’-scores are the same as those for z-scores.
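Assuming z’ = (xi – xa) / sqrt(sigmapt² + u(xa)²), a short sketch shows how the extra uncertainty term can move a score across a boundary (the numbers and function names are illustrative):

```python
import math

def z_score(x, x_a, sigma_pt):
    return (x - x_a) / sigma_pt

def z_prime_score(x, x_a, sigma_pt, u_xa):
    # The uncertainty of the assigned value widens the denominator.
    return (x - x_a) / math.sqrt(sigma_pt**2 + u_xa**2)

# Example: result 102.2, assigned value 100.0, sigma_pt = 1.0, u(xa) = 0.5
print(f"z  = {z_score(102.2, 100.0, 1.0):.2f}")             # 2.20 -> warning signal
print(f"z' = {z_prime_score(102.2, 100.0, 1.0, 0.5):.2f}")  # 1.97 -> acceptable
```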

  3. The Q-Score

An alternative scoring system is the Q-score, where:

Q = (xi – xa) / xa

The above equation is essentially a relative measure of the laboratory’s bias and does not take the target standard deviation into account. Ideally, the distribution of Q-scores will be centred on zero when there is no significant bias in the participants’ measurement results.

Any interpretations of results are based on the set criteria of acceptability by the PT program organizer, such as setting acceptable percentage deviation from the target value. 

Issues to be considered by a laboratory found to have an unsatisfactory score:

  1. Look at the overall performance of all participants in this round.  If a large number of them obtained unsatisfactory results, it may indicate that the problem does not lie within your laboratory.
  2. Did you use a test method that had very different performance criteria as compared with the others?
  3. Also look at the test sample factor.  Did the test material sent by the PT organizer differ significantly from the scope of the laboratory’s normal operation? Was there any sample storage condition compromised in this round of test?
  4. On the PT scheme itself, were there enough participants in this comparison exercise?  A small number of results may render the statistical conclusions unreliable.

If none of the above applies, the laboratory shall initiate a corrective action to investigate the cause of this unsatisfactory result, and implement and document any appropriate corrective actions taken.

There are many possible causes of unsatisfactory performance. Some of these are listed below:

  • Incorrect calibration of instrument;
  • Analytical instrument performance not optimized, such as the choice of inappropriate wavelength of element in the ICP analysis due to other elemental interference
  • Analytical errors such as too many or too few dilutions
  • Error in some critical steps during sample pre-treatment such as incomplete analyte extraction from the sample or improper handling of its clean-up process
  • Improper choice of test method as compared with the methods of others
  • Calculation errors; transcription errors
  • Results reported in incorrect units

In conclusion, participation in well-run PT schemes lets us gain information on how our measurement results compare with those of others, whether our own measurements improve or deteriorate with time, and how our laboratory’s performance compares with an external quality standard.  In short, the aim of such schemes is the evaluation of the competence of analytical laboratories. A PT program indeed plays an important part in a laboratory’s QA/QC system.

Proficiency testing -1

Proficiency testing – what, why and how (Part I)

We may want to claim to data users that our analytical results are reliable and accurate, but there is no better proof than the reports of the proficiency testing (PT) programs we have participated in, testifying to our good quality standing. So, what is proficiency testing?

ISO 13528:2015 defines proficiency testing as “evaluation of participant performance against pre-established criteria by means of interlaboratory comparisons”.

Therefore, a proficiency testing program typically involves the simultaneous distribution of sufficiently homogeneous and stable test samples to laboratory participants. It is usually organized by an independent PT provider which is an organization that takes responsibility for all tasks in the development and operation of a proficiency testing scheme.

The participants are to analyze the samples using either a method of their choice or a specified standard method, and submit their results to the scheme organizers, who will then carry out statistical analysis of all the data and prepare a final PT report showing the ‘scores’ of all participants to allow them to judge their own performance in that particular round. Ideally, all participants should conduct the testing with the same standard method for more meaningful comparison of results.

The scores reflect the difference between each participant’s result and a target or assigned value for that round, relative to a quality target, usually expressed as a standard deviation. Such comparison is important as it makes allowance for measurement error. The scoring system should set acceptability criteria to allow participants to evaluate their performance.

The primary aim of PT is therefore to allow participating laboratories to monitor and improve the quality of their routine analytical measurements. Note that PT is concerned with the assessment of participant performance and as such does not specifically address bias or precision.

There are several international guidelines and standards for organizing a PT round and for the statistical analysis of these inter-laboratory comparison data. The well-known ones are ISO 13528:2015 Statistical methods for use in proficiency testing by interlaboratory comparison, which provides statistical support for the implementation of ISO/IEC 17043:2010 Conformity assessment — General requirements for proficiency testing, which describes the general methods used in proficiency testing schemes. 

How is a PT program organized?

There are two important steps in organizing a PT scheme: first, specifying the assigned value for the samples to be analyzed, and secondly, setting the standard deviation for the proficiency assessment.

The organizer has to decide whether the assigned value and the criterion for assessing deviations should be independent of participant results, or should be derived from the results submitted. In general, choosing assigned values and assessment criteria independently of participant results offers advantages.

Indeed, these two directly affect the scores that participants receive and therefore how they will interpret their performance in the scheme.

  1. Setting the assigned value

The assigned value is the value attributed to the particular quantity being measured, accepted by the scheme organizer as having a suitably small uncertainty appropriate for the given purpose.  

There are a number of approaches to obtain the assigned value:

  • Obtained by formulation, by adding a known concentration of the target analyte to a base material containing none of that analyte (or a trace but well-characterized amount).
  • Being a certified reference value, where the test material is a certified reference material (CRM).
  • Being a reference value determined by a single expert laboratory using a primary or classical method of analysis (e.g., gravimetry, titrimetry, isotope dilution mass spectrometry), or a fully validated test method which has been calibrated with CRMs.
  • Obtained from the consensus of a number of expert laboratories after having analyzed the material using suitable methods.
  • Taken from the consensus of the particular program’s participants. The consensus value in this case is usually based on a robust estimate of the mean of all the participating laboratories, in order to minimize the effect of extreme values in the data set.

  2. Setting the standard deviation for proficiency assessment

The standard deviation for proficiency assessment is set by the scheme organizer. It is intended to represent the uncertainty regarded as fit for purpose for a particular type of analysis. Ideally, the basis for setting the standard deviation should remain the same over successive rounds of the PT program so that interpretation of performance scores is consistent over different rounds.

Also, due allowance for changes in performance at different analyte concentrations is usually made to make it easier for participants to monitor their performance over time.

There are a number of different approaches for performance evaluation.

a)   Using the repeatability and reproducibility standard deviations from a previous collaborative study of precision of a test method

This approach to defining the standard deviation for performance evaluation is based on the results from a previous reproducibility experiment via a collaborative study using the same analytical method, if one exists.  In this case, we can take the reproducibility and repeatability estimates from that study.

The standard deviation for proficiency assessment, spt, is given by

spt = sqrt( sR² – sr²(1 – 1/n) )

where sR and sr are the reproducibility and repeatability standard deviations from the study, and n is the number of replicate measurements averaged in each reported result.
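One form given in ISO 13528:2015 for this is spt = sqrt( sR² – sr²(1 – 1/n) ); a quick sketch (the function name is my own):

```python
import math

def sigma_pt_from_precision(s_R, s_r, n=1):
    """Standard deviation for proficiency assessment derived from the
    precision estimates of a prior collaborative study (sketch):
    s_R = reproducibility SD, s_r = repeatability SD,
    n = number of replicates averaged per reported result."""
    return math.sqrt(s_R**2 - s_r**2 * (1 - 1 / n))

print(sigma_pt_from_precision(2.0, 1.0, n=1))           # 2.0 (single results: just sR)
print(f"{sigma_pt_from_precision(2.0, 1.0, n=2):.3f}")  # 1.871
```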
b)   By experience from previous rounds of a proficiency testing scheme

This is determined by experience with previous rounds of proficiency testing for the same analyte with comparable concentrations, and where participants use compatible measurement procedures. Such evaluations will be based on reasonable performance expectations.

c)   By ‘perception’

The standard deviation is chosen to ensure that laboratories that obtain a satisfactory score are producing results that are fit for a particular purpose, such as being related to a legislative requirement.  It can also be set to reflect the perceived performance of laboratories or to reflect the performance that the PT organizer and participants would like to be able to achieve.

d)   From data obtained in the same round of a PT program

With this approach, the standard deviation for proficiency assessment, spt, is calculated from the results of participants in the same round of the proficiency testing scheme.

This approach is relatively simple and has been conventionally accepted due to successful use in many situations. The data from the PT program are assessed using the robust mean of participant results as the assigned value.

When this approach is used it is usually most convenient to use a performance score such as the z score.

e)   From a general model: the Horwitz function

The Horwitz function is an empirical relationship based on statistics from a very large number of collaborative studies of chemical analyses over an extended period of time.  It describes how the reproducibility standard deviation varies with the analyte concentration:

sigmaR = 0.02 c^0.8495

where c is the concentration of the chemical species to be determined, expressed as a mass fraction, 0 ≤ c ≤ 1 (e.g. 1 mg/kg = 10^-6).

This approach, however, does not reflect the true reproducibility for certain test materials, but it is useful when the number of participants is too small for any more meaningful statistical comparison.
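The Horwitz function sigmaR = 0.02 c^0.8495 is equivalent to a relative standard deviation of about 2 c^(-0.1505) in percent, with c a mass fraction; a quick sketch (function names are my own):

```python
def horwitz_sigma_R(c):
    """Horwitz reproducibility standard deviation (as a mass fraction)
    for analyte concentration c expressed as a mass fraction, 0 < c <= 1."""
    return 0.02 * c**0.8495

def horwitz_rsd_percent(c):
    """The same relationship expressed as a relative standard deviation in %."""
    return 100 * horwitz_sigma_R(c) / c     # equivalently 2 * c**(-0.1505)

print(f"{horwitz_rsd_percent(1e-6):.1f}%")  # ~16% at 1 mg/kg
print(f"{horwitz_rsd_percent(0.01):.1f}%")  # ~4% at 1 %m/m
```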

The next article will discuss the assessment scoring systems commonly adopted in proficiency testing programs.

What are the types of precision estimates?

Accuracy, Precision & Trueness


When we evaluate the validity of a test result, we are mostly concerned with whether the performance of the test method used is precise and reproducible enough to be fit for a particular purpose or to meet the customer’s requirements.  That concern also includes, in some cases, whether the method detection limit is low enough to meet the required regulatory or specification limits.  Read on …. Types of precision estimates

How to determine significant systematic error?


When, over time, you carry out several batches of analysis on a certified reference material (CRM) and find that the mean values appear to be consistently different from the expected value given in the CRM’s certificate of analysis, your analytical method may have a systematic error. Such an error affects the method’s trueness and accuracy.

Determining significant systematic error