Proficiency testing – what, why and how (Part I)
We may want to claim to data users that our analytical results are reliable and accurate, but such claims carry weight only if we can back them with reports from the proficiency testing (PT) programs in which we have participated, testifying to our good quality standing. So, what is proficiency testing?
ISO 13528:2015 defines proficiency testing as “evaluation of participant performance against pre-established criteria by means of interlaboratory comparisons”.
A proficiency testing program therefore typically involves the simultaneous distribution of sufficiently homogeneous and stable test samples to participating laboratories. It is usually organized by an independent PT provider, an organization that takes responsibility for all tasks in the development and operation of a proficiency testing scheme.
Participants analyze the samples using either a method of their choice or a specified standard method and submit their results to the scheme organizers. The organizers then carry out a statistical analysis of all the data and prepare a final PT report showing the ‘scores’ of all participants, so that each can judge its own performance in that particular round. Ideally, all participants should use the same standard method so that results can be compared more meaningfully.
The scores reflect the difference between each participant’s result and a target or assigned value for that round, scaled against a quality target that is usually expressed as a standard deviation. This scaling is important because it makes allowance for measurement error. The scoring system should also set acceptability criteria so that participants can evaluate their performance.
The primary aim of PT is therefore to allow participating laboratories to monitor and optimize the quality of their routine analytical measurements. Note that PT is concerned with assessing overall participant performance and, as such, does not specifically address bias or precision.
There are several international guidelines and standards covering the organization of a PT round and the statistical analysis of the resulting interlaboratory comparison data. The best known are ISO/IEC 17043:2010, Conformity assessment – General requirements for proficiency testing, which describes the general methods used in proficiency testing schemes, and ISO 13528:2015, Statistical methods for use in proficiency testing by interlaboratory comparison, which provides statistical support for its implementation.
How is a PT program organized?
There are two important steps in the organization of a PT scheme: first, specifying the assigned value for the samples to be analyzed, and second, setting the standard deviation for the proficiency assessment.
The organizer has to decide whether the assigned value and the criterion for assessing deviations should be independent of participant results or should be derived from the results submitted. In general, choosing assigned values and assessment criteria independently of participant results offers advantages.
Both choices directly affect the scores that participants receive and therefore how they will interpret their performance in the scheme.
1. Setting the assigned value
The assigned value is the value attributed to a particular quantity being measured. It is accepted by the scheme organizer as having a suitably small uncertainty, appropriate for a given purpose.
There are a number of approaches to obtain the assigned value:
2. Setting the standard deviation for proficiency assessment
The standard deviation for proficiency assessment is set by the scheme organizer. It is intended to represent the uncertainty regarded as fit for purpose for a particular type of analysis. Ideally, the basis for setting the standard deviation should remain the same over successive rounds of the PT program so that the interpretation of performance scores is consistent from round to round.
Also, due allowance for changes in performance at different analyte concentrations is usually made to make it easier for participants to monitor their performance over time.
There are a number of different approaches to setting the standard deviation for performance evaluation.
One approach bases the standard deviation on the results of a previous collaborative study (a reproducibility experiment) that used the same analytical method, where such a study exists. In this case, the reproducibility and repeatability estimates can be taken from that study.
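The standard deviation for proficiency assessment, $\sigma_{pt}$, is given in ISO 13528:2015 by

$$\sigma_{pt} = \sqrt{\sigma_R^2 - \sigma_r^2\left(1 - \frac{1}{m}\right)}$$

where $\sigma_R$ is the reproducibility standard deviation and $\sigma_r$ is the repeatability standard deviation from the collaborative study, and $m$ is the number of replicate measurements each participant is to perform in the PT round.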
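As a rough illustration of the collaborative-study approach, the sketch below computes the standard deviation for proficiency assessment from a reproducibility standard deviation, a repeatability standard deviation and the number of replicates. It assumes the ISO 13528:2015 relationship σ_pt = √(σ_R² − σ_r²(1 − 1/m)); the function name and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def sigma_pt_from_collaborative_study(sigma_R, sigma_r, m):
    """Standard deviation for proficiency assessment from a previous study.

    sigma_R: reproducibility standard deviation from the collaborative study
    sigma_r: repeatability standard deviation from the same study
    m:       number of replicate measurements per participant in the PT round
    Assumes sigma_pt = sqrt(sigma_R^2 - sigma_r^2 * (1 - 1/m)).
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if sigma_r < 0 or sigma_R < sigma_r:
        raise ValueError("expected 0 <= sigma_r <= sigma_R")
    return math.sqrt(sigma_R ** 2 - sigma_r ** 2 * (1.0 - 1.0 / m))


# Hypothetical study estimates, in the same units as the measurand.
for replicates in (1, 2, 4):
    value = sigma_pt_from_collaborative_study(0.5, 0.3, replicates)
    print(f"m = {replicates}: sigma_pt = {value:.3f}")
```

With a single measurement per participant (m = 1) the result equals the reproducibility standard deviation; asking for more replicates shrinks the repeatability contribution and so tightens the criterion.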
Another approach sets the standard deviation from experience with previous rounds of proficiency testing for the same analyte at comparable concentrations, where participants use compatible measurement procedures. Such evaluations are based on reasonable performance expectations.
Alternatively, the standard deviation may be chosen so that laboratories obtaining a satisfactory score are producing results that are fit for a particular purpose, such as meeting a legislative requirement. It can also be set to reflect the perceived performance of laboratories, or the performance that the PT organizer and participants would like to be able to achieve.
In a further approach, the standard deviation for proficiency assessment, $\sigma_{pt}$, is calculated from the results of participants in the same round of the proficiency testing scheme.
This approach is relatively simple and has been conventionally accepted due to successful use in many situations. The data from the PT program are assessed using the robust mean of participant results as the assigned value.
When this approach is used it is usually most convenient to use a performance score such as the z score.
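To make the same-round approach concrete, here is a minimal sketch that derives a robust mean and robust standard deviation from participant results and then computes z scores in the conventional form z = (x − assigned value) / σ_pt. It assumes the iterative winsorisation procedure described as Algorithm A in ISO 13528; a given scheme may use a different robust estimator. The function names and the data are hypothetical.

```python
import statistics


def robust_mean_and_sd(results, tol=1e-9, max_iter=200):
    """Robust mean and standard deviation by iterated winsorisation
    (assumed here to follow Algorithm A of ISO 13528)."""
    x = [float(v) for v in results]
    if len(x) < 2:
        raise ValueError("need at least two results")
    # Starting values: median and scaled median absolute deviation.
    x_star = statistics.median(x)
    s_star = 1.483 * statistics.median([abs(v - x_star) for v in x])
    for _ in range(max_iter):
        delta = 1.5 * s_star
        low, high = x_star - delta, x_star + delta
        # Pull extreme results in to the limits instead of discarding them.
        w = [min(max(v, low), high) for v in x]
        new_x = statistics.fmean(w)
        new_s = 1.134 * statistics.stdev(w)
        scale = max(abs(new_x), new_s)
        done = (abs(new_x - x_star) <= tol * scale
                and abs(new_s - s_star) <= tol * scale)
        x_star, s_star = new_x, new_s
        if done:
            break
    return x_star, s_star


def z_scores(results, assigned_value, sigma_pt):
    """z = (x - assigned value) / standard deviation for proficiency assessment."""
    if sigma_pt <= 0:
        raise ValueError("sigma_pt must be positive")
    return [(v - assigned_value) / sigma_pt for v in results]


# Hypothetical round: six consistent results and one outlying result.
results = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 12.5]
x_pt, sigma_pt = robust_mean_and_sd(results)
for value, z in zip(results, z_scores(results, x_pt, sigma_pt)):
    print(f"{value:6.2f}  z = {z:+.2f}")
```

Because extreme results are pulled in to the limits rather than left as they are, the outlying laboratory has only a limited influence on the assigned value and on the standard deviation used to score everyone else.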
The Horwitz function is an empirical relationship based on statistics from a very large number of collaborative studies for chemical applications over an extended period of time. It describes how the reproducibility standard deviation, $\sigma_R$, varies with the analyte concentration level:

$$\sigma_R = 0.02\,c^{0.8495}$$

where $c$ is the concentration of the chemical species to be determined, expressed as a mass fraction, $0 \le c \le 1$ (e.g. 1 mg/kg = $10^{-6}$).
This approach, however, does not reflect the true reproducibility of certain test materials, but it is useful when there are too few participants for a more meaningful statistical comparison.
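As a small worked example of the general-model approach, the sketch below evaluates the Horwitz function in its commonly quoted form, σ_R = 0.02 · c^0.8495, with c as a mass fraction. The function names are hypothetical, and the unit-conversion helper is included only to show how a concentration in mg/kg maps onto a mass fraction.

```python
def mg_per_kg_to_mass_fraction(concentration_mg_per_kg):
    """Convert mg/kg to a dimensionless mass fraction (1 mg/kg = 1e-6)."""
    return concentration_mg_per_kg * 1e-6


def horwitz_sigma_R(c):
    """Predicted reproducibility standard deviation (as a mass fraction)
    for an analyte at mass fraction c, using sigma_R = 0.02 * c**0.8495."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must be a mass fraction between 0 and 1")
    return 0.02 * c ** 0.8495


c = mg_per_kg_to_mass_fraction(1.0)
sigma_R = horwitz_sigma_R(c)
print(f"sigma_R = {sigma_R:.3e} (mass fraction)")
print(f"relative SD = {sigma_R / c:.1%}")  # about 16 % at 1 mg/kg
```

The relative standard deviation grows as the concentration falls, which is why a single fixed percentage is rarely a suitable criterion across widely different analyte levels.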
The next article will discuss the assessment scoring systems commonly adopted in proficiency testing programs.