Training and consultancy for testing laboratories.

Archive for September, 2017

Practical statistical tools a lab analyst needs to know


The other day, someone in an analytical laboratory asked me what minimum statistical knowledge a chemist needs in order to handle routine laboratory data without much difficulty.

In my opinion, most statistical calculations involve no more than high school arithmetic and algebra, although some topics are a bit more abstract and complicated, and getting acquainted with them is not really a big deal. The most challenging part of learning statistics is understanding the principles and reasoning behind these calculations. A very basic point is to appreciate that statistics and probability are not two separate subjects but one combined subject.

Indeed, statistics cannot stand alone without the concept of probability. That is because in most cases we work only with samples, which are subsets of a population, while what we really want to know is the bigger picture in the population. We therefore need to make inferences and estimates from the sample data collected, using appropriate probability distribution(s) and statistical tests. Probabilities underlie everything in the field of statistics, and a clear sense of the basic ideas of probability theory makes the more advanced statistical ideas easier to digest.

An analyst working in a laboratory routinely faces a pool of data that need to be analyzed, from the calculation of simple arithmetic mean and standard deviation to standard calibration curve, precision, accuracy, detection limit, measurement uncertainty, method validation, and so forth. He or she has to be equipped with basic statistical knowledge in order to carry out the duties assigned satisfactorily.

Appended below is a list of general statistical subjects that are of value in laboratory data analysis in the first instance:

  1. Basic probability concepts: outcomes, events, continuous and discrete probability distribution functions, etc.
  2. Descriptive statistics: error, mean, median, mode, standard deviation, variance, coefficient of variation, relative standard deviation, standard error of the mean, linear and non-linear regression, data transformation, etc.
  3. Inferential statistics: statistical modelling, confidence intervals, Central Limit Theorem, model validation and prediction, outlier tests, hypothesis tests, randomization, analysis of variance (ANOVA), and statistical tests such as Fisher’s F-test, Student’s t-test, chi-square test, Anderson-Darling test, Shapiro-Wilk test, etc.
  4. Graphical presentations: histogram, QQ-plots, scatter plots, residual plots, control charts, etc.
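
To give a flavour of how light the arithmetic is, here is a minimal R sketch of a few of the descriptive and inferential quantities listed above. The replicate results are hypothetical values made up for illustration, and the variable names are our own; the confidence interval assumes the results are roughly normally distributed.

```r
# Hypothetical replicate results (mg/kg) for one test sample; not real data
x <- c(10.2, 10.5, 9.9, 10.1, 10.4, 10.3)

n   <- length(x)
m   <- mean(x)          # arithmetic mean
s   <- sd(x)            # sample standard deviation (n - 1 divisor)
rsd <- 100 * s / m      # relative standard deviation, in percent
sem <- s / sqrt(n)      # standard error of the mean

# 95 % confidence interval for the mean, using Student's t with n - 1
# degrees of freedom (assumes approximately normal results)
half_width <- qt(0.975, df = n - 1) * sem
ci <- c(lower = m - half_width, upper = m + half_width)

print(round(c(mean = m, sd = s, rsd_percent = rsd, sem = sem), 3))
print(round(ci, 3))
```

The same interval can be cross-checked with the built-in t.test(x)$conf.int.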




Sampling uncertainty: Between-group & within-group variation

Sampling uncertainty – Between-group and within-group variation


Sampling Theory: What is a sampling distribution?

Sampling Theory and sampling distribution

Using R to generate a random sampling table


The open source R programming language is a free software environment for statistical computing and graphics, and is easy to master. The official website is . It runs on a wide variety of UNIX platforms, Windows and MacOS.

On September 24, 2016, this blog published an article on how to use R to generate random numbers ( . In light of the newly revised ISO/IEC 17025 accreditation standard embracing sampling as another important criterion for technical competence assessment, the random number function of R becomes very handy for cargo surveyors and samplers preparing sampling plans for cargo shipments.

We can use the random number function of R to create a random number table to suit the needs in randomly selecting samples for laboratory quality analysis.

For example, a shipment of 1000 bags of coffee beans in a warehouse is to be surveyed before being dispatched to port. The buyer requires 5% sampling for laboratory quality testing. That means 50 bags have to be randomly selected, and a portion from each bag composited and then reduced to a suitably sized test sample through a quartering sub-sampling process.

The sampling plan, therefore, can be the following process:

1.  Label each bag with a sequential number

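2.  Create 50 numbers in a random number table with the following R commands. The first argument of sample() is the number of bags to choose from, so it is set to 1000 for this shipment:

> RandSampling=sample(1000,50)   # draw 50 distinct bag numbers from 1 to 1000
> dim(RandSampling)=c(10,5)      # arrange them as a table of 10 rows by 5 columns
> RandSampling

The numbers differ on every run. The table below was printed from a run made with sample(500,50), so none of its entries exceeds 500:

      [,1] [,2] [,3] [,4] [,5]
 [1,]  154  424   84  486   82
 [2,]   78  214  275  498  388
 [3,]   93  104  478  148  258
 [4,]  229  283   96  479  489
 [5,]  487  211  216   59  263
 [6,]   94  450   47  201  105
 [7,]  330  121  130  276   56
 [8,]   11  415  303  240  407
 [9,]  427   60   71  142  409
[10,]  101  238  228  441  355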


3.  Take a portion (say, 500 g) of coffee beans from each of the bags with the selected numbers and combine the portions in a large sampling bag.

4.  Conduct a sample quartering process on site to reduce the composite sample to about 2.5 kg before sending it to the laboratory for analysis.
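
The plan above can also be written as a short, repeatable R script. What follows is only a sketch under stated assumptions: the seed value and the variable names are our own choices, the draw is made from all 1000 bags, and fixing a seed is simply one way to let a surveyor reproduce the same table later.

```r
set.seed(2017)                        # hypothetical seed, so the table can be reproduced

n_bags   <- 1000                      # bags in the shipment
fraction <- 0.05                      # 5 % sampling required by the buyer
n_sample <- round(n_bags * fraction)  # number of bags to open

# Draw distinct bag numbers from 1..n_bags (sampling without replacement),
# sorted so the sampler can walk through the stacks in order
selected <- sort(sample(n_bags, n_sample))

# Arrange the numbers as a 10 x 5 table for the sampling worksheet
plan <- matrix(selected, nrow = 10, ncol = 5)
print(plan)

# Mass of the composite before quartering, assuming 500 g taken per bag
composite_kg <- n_sample * 0.5
cat("Composite mass before quartering:", composite_kg, "kg\n")
```

Since sample() draws without replacement by default, no bag number can appear twice in the table.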




How to ensure your random sampling process is really random?

How to ensure your simple random sampling is really random

Random sampling strategies


New FDIS 17025 version on subject of sampling


By definition, sampling involves selecting a portion of material (i.e. sample or samples) from a substance, material, product or even a consignment of goods to represent or provide information about that larger body of material (i.e. population).

Although the ISO/IEC 17025 accreditation standard and its latest revision are still concerned with the technical competence of organizations that conduct laboratory activities, it is recognized that the reliability of a test result depends on how representative the analyzed sample is of the bulk material of interest. As the saying goes: “The result is not better than the sample that it is based on”.

A question has thus been asked: Can sampling be considered a stand-alone activity, or should sampling activities always be associated with testing or calibration?

It is obvious that, although the scope of this accreditation standard is laboratory activities, sampling has an inevitable connection to the laboratory’s analytical process, which produces the test result that matters to end users. It is usually impossible to analyze the whole bulk (or lot) of material (statistically called the ‘population’ or ‘sampling target’). Therefore, proper sampling plays an essential role in ensuring the validity of the final test result.

The FDIS requires a laboratory to have a sampling plan and documented procedures for sampling in its field of testing. The laboratory is allowed to state “analyzed as received” in the report if it has not been responsible for the sampling stage. Also, a laboratory performing sampling or testing activities shall evaluate measurement uncertainty; that is, the uncertainty of the sampling process is to be evaluated as an additional contributor to the measurement uncertainty of the whole testing or calibration process.

Even if a laboratory is not involved in sampling a population outside its premises, it often carries out a sub-sampling process before starting the analytical procedure. Therefore, the subject of sampling cannot be ignored in such a laboratory either.
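
As an illustration of sampling acting as an additional uncertainty contributor, one common approach is to combine the sampling and analytical standard uncertainties as the root sum of squares. The R sketch below assumes the two contributions are independent and uses hypothetical figures; it is not a procedure prescribed by the FDIS.

```r
# Hypothetical standard uncertainties, in the same units as the result (mg/kg)
u_sampling <- 0.8   # contribution from the sampling process
u_analysis <- 0.3   # contribution from the analytical process

# Root-sum-of-squares combination, assuming independent contributions
u_combined <- sqrt(u_sampling^2 + u_analysis^2)

# Expanded uncertainty with coverage factor k = 2 (about 95 % confidence)
k <- 2
U_expanded <- k * u_combined

cat("Combined standard uncertainty:", round(u_combined, 3), "mg/kg\n")
cat("Expanded uncertainty (k = 2): ", round(U_expanded, 3), "mg/kg\n")
```

With figures like these, the combined uncertainty is dominated by the sampling term, which is why an unrepresentative sample cannot be rescued by a precise analysis.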

As said earlier, since the main purpose of laboratory analysis is to estimate the analyte concentration in a sampling target, the sample taken should be as representative of the sampling target as possible. This requires that each sample has the same probability of being drawn from the population as any other sample.

So, in order to optimize the whole measurement process, including taking a good sample for analysis, the sampling planner needs to gather information about the sampling target and decide on appropriate sampling protocols.
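
The equal-probability property can be checked by simulation. The R sketch below is only an illustration with assumed figures (1000 units, 50 drawn per round, 2000 rounds, and an arbitrary seed): it counts how often each unit is selected and compares the observed frequency with the expected 50/1000 = 0.05.

```r
set.seed(1)          # hypothetical seed for reproducibility

n_units  <- 1000     # units in the population (e.g. bags)
n_sample <- 50       # units drawn in each round
n_runs   <- 2000     # number of simulated sampling rounds

# Each column of 'draws' holds one simple random sample without replacement
draws <- replicate(n_runs, sample(n_units, n_sample))

# Count how many times each unit was selected over all rounds
counts <- tabulate(as.vector(draws), nbins = n_units)
freq   <- counts / n_runs

print(summary(freq))
cat("Expected inclusion probability:", n_sample / n_units, "\n")
```

The observed frequencies scatter closely around 0.05, with no unit systematically favoured.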

Ask the following questions during information gathering:

  • What are the analyte(s) to be determined?
  • Is the measurand in the bulk material homogeneously or heterogeneously distributed?
  • What is the kind of average sample required: hourly, daily, by shift, batch, shipment?
  • Are all the necessary personnel and equipment available?
  • What is the uncertainty level of the analyte allowed in the specification, if any? – this information is particularly important for deciding on the number of samples to be taken for analysis
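
To show how the allowed uncertainty feeds into the number of samples, here is a rough R sketch based on a common textbook approximation, n = (z s / E)^2. It assumes approximately normal results and a prior estimate of the between-sample standard deviation; the figures for s and E are hypothetical.

```r
s <- 1.2    # hypothetical prior estimate of the between-sample standard deviation
E <- 0.5    # hypothetical allowed uncertainty of the mean (half-width), same units

z <- qnorm(0.975)                     # two-sided 95 % normal quantile

# Smallest whole number of samples whose mean meets the allowed uncertainty
n_required <- ceiling((z * s / E)^2)

cat("Samples required:", n_required, "\n")
```

Halving the allowed uncertainty E roughly quadruples the number of samples, which is why this question should be settled early in the planning.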

To decide on the appropriate sampling protocols, one must try to deal with the following subjects:

  • Manual vs automatic sampling
  • Sampling frequency (how often samples are to be drawn)
  • Sample sizes (volume, weight)
  • Number of samples to be drawn
  • Sampling locations (ship’s tanks, silos, warehouses)
  • Individual vs composite samples
  • Which sampling strategy
    • Random sampling
    • Stratified random sampling
    • Systematic sampling

We shall discuss these sampling strategies in more detail in the next blog post.