Training and consultancy for testing laboratories.


Proficiency testing – what, why and how (Part II)

Part I of this series discussed the rationale for conducting proficiency testing (PT) programs and the tools used for proficiency assessment, such as setting an assigned value and estimating the standard deviation that reflects inter-laboratory variation. This part looks at how PT results are scored.

  1. The z-Score

The most common scoring system is the z-score. For a proficiency test result $x_i$, it is calculated as:

$$z_i = \frac{x_i - x_a}{\sigma_{pt}}$$

where $x_a$ is the assigned value and $\sigma_{pt}$ is the standard deviation for proficiency assessment.

Readers familiar with the normal probability distribution will recognize the z-score as the usual standardization: it maps normally distributed data onto the standard normal distribution N(0,1), with mean 0 and variance $\sigma^2 = 1$.

The conventional interpretation of z-scores is as follows:

  • A result that gives | z | ≤ 2 is considered to be acceptable;
  • A result that gives 2 < | z | < 3 is considered to give a warning signal;
  • A result that gives | z | ≥ 3 is considered to be unacceptable or unsatisfactory performance (or action signal).
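The calculation and the conventional limits above can be sketched in a few lines of Python. This is only an illustration: the function names `z_score` and `classify` and the numbers in the example are assumptions made here, not part of any particular PT scheme.

```python
def z_score(x_i, x_a, sigma_pt):
    """Return z = (x_i - x_a) / sigma_pt for one participant result."""
    if sigma_pt <= 0:
        raise ValueError("sigma_pt must be positive")
    return (x_i - x_a) / sigma_pt


def classify(z):
    """Apply the conventional |z| limits of 2 and 3."""
    if abs(z) <= 2:
        return "acceptable"
    if abs(z) < 3:
        return "warning"
    return "action"


# Hypothetical round: assigned value 50.0, sigma_pt 2.0, reported result 55.0
z = z_score(55.0, 50.0, 2.0)
print(z, classify(z))  # prints: 2.5 warning
```

The sign of z is kept on purpose: a run of scores with the same sign over several rounds can point to a consistent bias even when each individual score is acceptable.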

If all participants perform exactly in accordance with the performance requirements, then under a normal distribution about 95% of results are expected to lie within two standard deviations of the mean. In other words, there is only about a 5% chance that a valid result would fall more than two standard deviations from the mean.

The probability of finding a valid result more than three standard deviations away from the mean is very low (approximately 0.3% for a normal distribution). Therefore, a score of | z | ≥ 3 is considered unsatisfactory performance. Participants should also be advised to check their measurement procedures after a warning signal, in case it indicates an emerging or recurrent problem.
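The two percentages quoted above can be checked with the Python standard library. The sketch below uses the identity P(|Z| > k) = erfc(k/√2) for a standard normal variable; the helper name `prob_outside` is an assumption made for this illustration.

```python
import math


def prob_outside(k):
    """Two-sided probability that a standard normal value has |z| > k."""
    return math.erfc(k / math.sqrt(2))


# Percentage of valid results expected beyond 2 and 3 standard deviations
print(round(prob_outside(2) * 100, 2))  # prints: 4.55
print(round(prob_outside(3) * 100, 2))  # prints: 0.27
```

These are the exact figures behind the rounded "about 5%" and "approximately 0.3%" used in the text.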

  2. The z’-Score

When there is concern about the uncertainty of the assigned value, $u(x_a)$ (e.g. when $u(x_a) > 0.3\,\sigma_{pt}$), the uncertainty can be taken into account by expanding the denominator of the performance score.

This statistic, called a z’-score, is calculated as follows:

$$z'_i = \frac{x_i - x_a}{\sqrt{\sigma_{pt}^2 + u^2(x_a)}}$$

The criteria for assessing a laboratory’s performance with z’-scores are the same as those for z-scores.
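As a rough sketch, the z’-score can be computed as below, assuming the usual form in which the uncertainty of the assigned value is combined in quadrature with the standard deviation for proficiency assessment. The function names and example values are hypothetical.

```python
import math


def uncertainty_is_significant(sigma_pt, u_xa):
    """Check the rule of thumb u(xa) > 0.3 * sigma_pt."""
    return u_xa > 0.3 * sigma_pt


def z_prime_score(x_i, x_a, sigma_pt, u_xa):
    """Return z' = (x_i - x_a) / sqrt(sigma_pt**2 + u_xa**2)."""
    return (x_i - x_a) / math.sqrt(sigma_pt**2 + u_xa**2)


# Hypothetical round: result 55.0, assigned value 50.0, sigma_pt 4.0, u(xa) 3.0
print(uncertainty_is_significant(4.0, 3.0))  # prints: True
print(z_prime_score(55.0, 50.0, 4.0, 3.0))   # prints: 1.0
```

With these numbers the plain z-score would be 1.25, so the expanded denominator makes the score smaller in magnitude. That is always the case when u(xa) is non-zero.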

  3. The Q-Score

An alternative scoring system is the Q-score, where:

$$Q_i = \frac{x_i - x_a}{x_a}$$

The above equation is essentially a relative measure of a laboratory’s bias and does not take the target standard deviation into account. Ideally, when there is no significant bias in the participants’ measurement results, the distribution of Q-scores is centred on zero.

Interpretation of the results depends on the acceptability criteria set by the PT program organizer, such as an acceptable percentage deviation from the target value.
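A minimal sketch of this scoring follows, assuming the Q-score is the relative deviation (xi − xa)/xa. The 10% acceptance limit in the example is purely hypothetical; a real limit would come from the PT organizer.

```python
def q_score(x_i, x_a):
    """Return the relative deviation Q = (x_i - x_a) / x_a."""
    if x_a == 0:
        raise ValueError("assigned value must be non-zero")
    return (x_i - x_a) / x_a


def within_limit(q, max_relative_deviation):
    """Compare |Q| with an organizer-defined limit (as a fraction, e.g. 0.10)."""
    return abs(q) <= max_relative_deviation


# Hypothetical round: result 53.0 against an assigned value of 50.0
q = q_score(53.0, 50.0)
print(round(q * 100, 1))      # relative deviation in percent
print(within_limit(q, 0.10))  # hypothetical 10 % acceptance limit
```

Because no standard deviation enters the calculation, Q-scores from analytes with very different precision are not directly comparable in the way z-scores are.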

Issues to be considered by a laboratory that receives an unsatisfactory score:

  1. Look at the overall performance of all participants in this round. If a large number of them obtained unsatisfactory results, the problem might not lie within your laboratory.
  2. Did you use a test method with very different performance criteria from those used by the other participants?
  3. Consider the test sample as well. Did the test material sent by the PT organizer differ significantly from the scope of the laboratory’s normal operation? Were any sample storage conditions compromised in this round?
  4. On the PT scheme itself, were there enough participants in this comparison exercise? A small number of results may render the statistical conclusions unreliable.

If none of the above applies, the laboratory shall investigate the cause of the unsatisfactory result, then implement and document the appropriate corrective actions.

There are many possible causes of unsatisfactory performance. Some of these are listed below:

  • Incorrect calibration of the instrument;
  • Analytical instrument performance not optimized, such as choosing an inappropriate wavelength for an element in ICP analysis, where it suffers interference from other elements;
  • Analytical errors, such as too many or too few dilutions;
  • Errors in critical steps of sample pre-treatment, such as incomplete extraction of the analyte from the sample or improper handling of the clean-up process;
  • Improper choice of test method compared with the methods used by others;
  • Calculation errors and transcription errors;
  • Results reported in incorrect units.

In conclusion, participation in well-run PT schemes lets us learn how our measurement results compare with those of others, whether our own measurements improve or deteriorate with time, and how our laboratory’s performance compares with an external quality standard. In short, the aim of such schemes is to evaluate the competence of analytical laboratories. A PT program indeed plays an important part in a laboratory’s QA/QC system.
