Training and consultancy for testing laboratories.

Archive for the ‘Basic statistics’ Category

Use of statistical tools in Excel

Statistical Tools in EXCEL

I have come across course participants who are not aware that we can add the Analysis ToolPak plug-in to the popular Microsoft Excel® spreadsheet. This tool package is very useful for statistical analysis. It uses appropriate statistical macro functions to calculate and display the results of your data analysis in an output table. Some tools even generate charts in addition to output tables.

To do so, simply open the Excel spreadsheet, click the “File” tab, click “Options”, and then click “Add-ins”, which shows a list of Inactive Application Add-ins under the “Manage” box. Check the “Analysis ToolPak” check box and click “OK”. The spreadsheet will automatically install this statistical package for you. You can then find the “Data Analysis” button when you click the “Data” tab on the spreadsheet tool bar, as shown below:

Excel analysis toolpak

When the “Data Analysis” button is clicked, the menu shown below appears. You can choose the relevant statistical tool you want to work with by selecting it with a click.

Excel data analysis

The statistical tools regularly used in analyzing data collected from chemical experiments are analysis of variance (ANOVA), linear regression and correlation, descriptive statistics, histogram plotting, F-tests for variances and the various Student’s t-tests.
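For readers who prefer scripting, the same kinds of tests offered by the Analysis ToolPak can be run programmatically, for example in Python with SciPy. A minimal sketch with illustrative (made-up) data for two batches of measurements:

```python
# Descriptive statistics, F-test ratio, two-sample t-test and one-way
# ANOVA on two hypothetical batches of replicate results.
from scipy import stats

batch_a = [10.2, 10.5, 10.3, 10.8, 10.4]
batch_b = [10.9, 11.1, 10.7, 11.0, 10.8]

# Descriptive statistics for one batch
print(stats.describe(batch_a))

# Ratio of sample variances, as used in an F-test for equal variances
f_ratio = stats.tvar(batch_a) / stats.tvar(batch_b)

# Two-sample Student's t-test assuming equal variances
t_stat, p_value = stats.ttest_ind(batch_a, batch_b, equal_var=True)

# One-way ANOVA across the two batches (F equals t-squared here)
f_stat, p_anova = stats.f_oneway(batch_a, batch_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, F = {f_stat:.2f}")
```

For two groups the ANOVA F statistic is simply the square of the t statistic, which is a handy cross-check between the two tools.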


Gosset and Student’s t-distribution

W.S. Gosset (aka Student) and t-distribution

Whilst attending a Eurachem Scientific Workshop on June 14-15, 2018 in Dublin, Ireland, I found that the workshop organizer had arranged the Workshop Banquet at the renowned Guinness St. James's Gate Brewery, one of whose employees was William Sealy Gosset, a chemist cum statistician.

Gosset was interested in analyzing quality data obtained from small sample sizes in his routine work on quality control of raw materials, as he noticed that it was neither practical nor economical to analyze hundreds of samples.

At that time, making statistical inferences from small sample-sized data to their population was unthinkable. The generally accepted idea was that if you had a large sample size, say well over 30 observations, you could use the Gaussian normal distribution to describe your data.

In 1906, Gosset was sent on sabbatical to Karl Pearson’s laboratory at University College London. Pearson was then one of the well-known scientific figures of his time, and was later credited with establishing the field of statistics.

At the laboratory, Gosset discovered the “Student’s t-distribution”, an important pillar of modern statistics which uses small sample-sized data to infer what we could expect from the population out there. It is the origin of the concept of “statistical significance testing”.

Why didn’t Gosset name the distribution as Gosset’s instead of Student’s?

It is interesting to note that this was because his employer, Guinness, objected to his proposal to publish the findings: the brewery did not want its competitors to know the advantage it had gained in using this unique procedure to select the best varieties of barley and hops for its popular beer, in a way that no other business could.

So Gosset finally published his article in Pearson’s journal Biometrika in 1908 under the pseudonym “Student”, leading to the famous “Student’s t-distribution”.

In statistics and probability studies, the t-distribution is a probability distribution used when dealing with a normally distributed population whilst the sample size is not large. It uses the sample standard deviation (s) to estimate the population standard deviation (σ), which is unknown. For small samples, the confidence limits of the population mean are given by:

μ = x̄ ± t·s/√n

where t is the Student’s t value at the chosen confidence level for (n – 1) degrees of freedom.
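A minimal sketch of this calculation in Python with SciPy, using five hypothetical replicate results at the 95% confidence level:

```python
# Confidence limits of the population mean from a small sample,
# using the Student's t value for n - 1 degrees of freedom.
import math
from scipy import stats

data = [4.28, 4.32, 4.25, 4.30, 4.27]   # five hypothetical replicates
n = len(data)
mean = sum(data) / n
s = stats.tstd(data)                     # sample standard deviation
t = stats.t.ppf(0.975, df=n - 1)         # two-tailed, 95% confidence
half_width = t * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width
print(f"95% CI: {mean:.3f} +/- {half_width:.3f}")
```

For n = 5 the t value (2.776) is noticeably larger than the normal-distribution value of 1.96, which is exactly the extra allowance the t-distribution makes for small samples.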

As the story goes, Gosset’s published paper was mostly ignored by statistical researchers until a young mathematician called R.A. Fisher discovered its importance and popularized it, particularly in estimating the random chance involved in considering a result “significant”.

Today, the t-distribution is routinely used in t-statistic tests for checking results for significant bias from a true value, or for comparing two sets of measurement results and their means, and is also important for calculating confidence intervals.

This t-distribution is symmetric and resembles the normal distribution except for its heavier “tails”; it is more spread out because of the extra variability in smaller sample sizes.



Homogeneity and stability of PT samples

Importance of Homogeneity and stability of PT samples

To run a successful proficiency testing (PT) program, the importance of homogeneity and stability of PT samples prepared for an inter-laboratory comparison study cannot be overemphasized, as these two factors can adversely affect the evaluation of performance.

The PT provider must ensure that the measurand (i.e. the targeted analyte) in the batch of samples is evenly distributed and is stable enough before laboratory analysis at the participants’ premises. Therefore an assessment of homogeneity and stability for a bulk preparation of PT items must be done prior to the conduct of the program.

Checks for sample stability are best carried out prior to circulation of PT items. The uncertainty contributors to be considered include the effects of transport conditions and any variation occurring during the PT program period.

A common model for testing stability in PT is to test a small sample of PT items before and after a PT round, to assure that no change has occurred over the time of the round. One may check for any effect of transport conditions by additionally exposing the PT samples retained for the study to conditions representative of transport.

A simple procedure for a homogeneity check

The homogeneity check aims to obtain a sufficiently small repeatability standard deviation (sr) after replicated analyses.  The general procedure is as follows:

  1. Select a competent laboratory to carry out this exercise
  2. Take a number k of the PT samples from the final packaged bulk preparation through a random process
  3. Prepare at least m = 2 test portions randomly from each PT sample
  4. Analyze the k x m test portions in a random order, carrying out a single measurement of the targeted analyte concentration on each
  5. Use 1-way ANOVA to analyze the data generated from (4)
  6. Homogeneity of the prepared bulk is achieved when the F ratio of the mean square between samples to the mean square within samples is smaller than the F critical value for the given degrees of freedom, i.e. (k-1) and k(m-1), respectively.
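The steps above can be sketched in Python with SciPy. The data here are hypothetical: k = 5 PT samples with m = 2 test portions each, measured once in random order:

```python
# Homogeneity check (steps 2-6): one-way ANOVA on k samples x m portions.
from scipy import stats

portions = [
    [5.1, 5.2],   # PT sample 1, two test portions
    [5.0, 5.1],   # PT sample 2
    [5.2, 5.1],   # PT sample 3
    [5.1, 5.0],   # PT sample 4
    [5.2, 5.2],   # PT sample 5
]
k, m = len(portions), len(portions[0])

# F = mean square between samples / mean square within samples
f_stat, p_value = stats.f_oneway(*portions)

# Critical F value at 95% confidence, with (k-1) and k(m-1) df
f_crit = stats.f.ppf(0.95, dfn=k - 1, dfd=k * (m - 1))

homogeneous = f_stat < f_crit
print(f"F = {f_stat:.2f}, F_crit = {f_crit:.2f}, homogeneous: {homogeneous}")
```

If F exceeds the critical value, the between-sample variation is too large relative to the analytical repeatability and the bulk cannot be considered homogeneous.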

The assigned value of a PT program


The assigned value of a PT program

A critical step in the organization of a proficiency testing (PT) scheme is specifying the assigned value for the participating laboratories. The purpose is to compare the deviation of each participant’s reported result from the assigned value (i.e. the measurement error) against a statistical scoring criterion, which is used to decide whether or not the deviation represents significant cause for concern about its performance.

ISO 13528:2015 defines the assigned value as the “value attributed to a particular property of a proficiency test item”. It is a value attributed to the particular quantity being measured, and it should have a suitably small uncertainty appropriate for the interlaboratory comparison purpose.

Where do we obtain an assigned value?

A.  Assigned value obtained by formulation

A specified known amount or concentration of the target analyte is added accurately to a base material, preferably one containing none of the native analyte. The assigned value is then derived by calculating the analyte concentration from the masses used. In this way, the traceability of the assigned value can usually be established.

However, a suitable base (blank) material or well-characterized base material may not be available. Ensuring homogeneity in the prepared bulk material before distribution to the participants may also be a challenge. Furthermore, formulated samples may not be truly representative of routine test materials, as the analyte may be in a different form or less strongly bound to the matrix.

B.  Assigned value is a certified reference value

In this case, the test material is a certified reference material (CRM) made by a reputable organization, and the assigned value and its uncertainty are therefore the certified value and uncertainty quoted on the CRM certificate. The limitations of using this assigned value are:

  1. Providing every participant with a unit of a CRM can be expensive
  2. It is important to conceal the identity of a commercial CRM from the participants, as the testing outcome may otherwise be compromised
  3. Some certified value uncertainties may be high.

C.  Assigned value is a reference value

The assigned value is determined by a single expert laboratory using a suitable primary method of analysis (e.g., gravimetry, titrimetry, isotope dilution mass spectrometry, etc.) or a fully validated test method which has been calibrated with a closely matched CRM.

D.  Assigned value from consensus of expert laboratories

This assigned value is obtained from the results reported by a number of expert laboratories, with demonstrated proficiency in the measurements of interest, which analyze the material using suitable methods.  However, it must be cautioned that there may be an unknown bias in the results produced by the expert laboratories.

E.  Assigned value from consensus of PT scheme participants

This is the consensus of the results from all the participants in the proficiency testing round. It is normally based on a robust estimate to minimize the effect of extreme values in the data set.
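A minimal sketch of such a robust consensus estimate in Python, with hypothetical participant results. The median serves as the assigned value and a scaled median absolute deviation (MAD) as the robust standard deviation, so that an extreme value has little influence; ISO 13528 describes more refined robust procedures (e.g. Algorithm A), of which this is only a simple illustration:

```python
# Robust consensus from participants' results: median as assigned
# value, scaled MAD as robust standard deviation.
import statistics

results = [12.1, 12.3, 12.0, 12.2, 12.4, 12.1, 15.8]   # one outlier

assigned = statistics.median(results)
mad = statistics.median(abs(x - assigned) for x in results)
robust_sd = 1.4826 * mad   # scaling makes MAD consistent with sigma
                           # for normally distributed data

print(f"assigned value = {assigned}, robust SD = {robust_sd:.3f}")
```

Note that the outlying result (15.8) barely affects either estimate, whereas it would inflate the ordinary mean and standard deviation considerably.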



What is proficiency testing?


What is Proficiency Testing?

The quality of your analytical measurements and your laboratory’s technical competence can be enhanced if you participate in a testing scheme in which the same test materials are analyzed by a number of laboratories simultaneously. Such a scheme is known as a proficiency testing (PT) scheme.

The primary aim of PT is to allow laboratories to monitor and optimize the quality of their routine analytical measurements.

A PT scheme is usually organized by an independent body, which can be a national standards body, a learned professional organization or even a business enterprise. Basically, in chemical analysis, aliquots from homogeneous and stable test materials are distributed to a number of laboratories, with the analysis to be carried out within a stated window of time. Each participant is given a unique identification code.

After the participants have analyzed the samples using either a test method of their choice or a stated standard method, the scheme organizer will carry out statistical analysis of all the data submitted and provide a performance report, detailing each participant’s statistical ‘score’ that allows them to judge their performance in that particular round of testing.

In other words, the participating laboratory can gain information on how their measurements compare with those of others, how their own measurements improve or deteriorate with time, and how their own measurements compare with an external quality standard.

There are a number of different scoring systems used in PT programs; the majority compare the difference between the participant’s result (x) and the assigned value (xa) with a quality target, which is usually the standard deviation for proficiency assessment, denoted by σp. Each scoring system has acceptability criteria that allow participants to evaluate their performance.
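The most common such score is the z-score, z = (x − xa)/σp, with |z| ≤ 2 regarded as satisfactory, 2 < |z| < 3 questionable, and |z| ≥ 3 unsatisfactory. A minimal sketch with hypothetical numbers:

```python
# z-score for a PT result: z = (x - xa) / sigma_p, with the usual
# acceptability criteria applied to |z|.
def z_score(x, x_assigned, sigma_p):
    return (x - x_assigned) / sigma_p

z = z_score(x=52.3, x_assigned=50.0, sigma_p=1.5)

if abs(z) <= 2:
    verdict = "satisfactory"
elif abs(z) < 3:
    verdict = "questionable"
else:
    verdict = "unsatisfactory"

print(f"z = {z:.2f}: {verdict}")
```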

Generally we do expect some divergent results to arise, even between experienced, well-equipped and well-staffed laboratories. The PT scheme helps to highlight such alarming differences, and suggests that these laboratories look into their own analytical processes in order to improve the quality of their test results. Hopefully a better comparison is achieved in the next round of PT testing.

Amusingly, there are reports of participating laboratories caught colluding in reporting their test results to the scheme organizer, particularly when the PT scheme does not involve a large number of laboratories and some of the laboratory operators are known to each other, for example as subsidiary laboratories of the same organization group. Such collusion is undesirable, as it defeats the noble purpose of the PT scheme and renders the outcome of that round of PT testing meaningless.

A simple explanation for this behaviour is that these laboratories are not confident of their own testing and need to compare results with others before submitting them to the organizer. However, a statistical plot of the outcome may show a bunch of results grouped in one corner when these results are questionable, i.e. significantly different from the assigned value of the test material analyzed.

Of late, some scheme organizers try to overcome this malpractice by preparing and sending out at least two labelled test samples with close but significantly different analyte concentrations (not duplicates) to the participants. The true identity of these samples is known only to the organizer.

Actually we must not treat PT samples with extra care and attention, but should run them like any other routine samples. Yet it is quite common to see a participant repeat the analysis as many times as possible until no sample is left for future reference!

A good source of reference on statistical techniques applicable to a PT scheme is ISO 13528:2015, Statistical methods for use in proficiency testing by interlaboratory comparison.

Why do we perform hypothesis tests?

Types I and II

Why do we perform hypothesis tests?

A course participant commented the other day that descriptive statistics was much easier to understand and appreciate than analytical or inferential statistics, which calls for logical reasoning about the inferential implications of the data collected.

I think the core issue lies in the abstract nature of inferential statistics. Hypothesis testing is a good example. Here, we need to determine the probability of finding the data given the truth of a stated hypothesis.

A hypothesis is a statement made that might, or might not, be true.

Usually the hypothesis is set up in such a way that it is possible for us to calculate the probability (P) of the data (or the test statistic calculated from the data) given the hypothesis, and then to make a decision about whether the hypothesis is to be accepted (high P) or rejected (low P).

A particular case of a hypothesis test is one that determines whether or not the difference between two values is significant – a significance test.

For this case, we actually put forward the hypothesis that there is no real difference and that the observed difference arises from random effects. We call this the null hypothesis (H0).

If the probability that the data are consistent with the null hypothesis (H0) falls below a predetermined low value (say, 0.05 or 0.01), then H0 is rejected at that probability.

Therefore, p < 0.05 means that if the null hypothesis were true, we would find the observed data (or, more accurately, a value of the test statistic as large as or larger than that calculated from the data) in fewer than 5% of repeated experiments.

To use this in significance testing, a decision must be made about the value of the probability below which the null hypothesis is rejected and a significant difference concluded.

In laboratory analysis, we tend to reject the null hypothesis “at the 95% level of confidence” if the probability of the test statistic, given the truth of H0, falls below 0.05. In other words, if H0 were indeed correct, fewer than 5% (i.e. 1 in 20) of the averages of repeated experiments would fall outside the limits. In this case, it is concluded that there is a significant difference.

However, it must be stressed that the figure of 95% is a somewhat arbitrary one, arising from the fact that the mean ± 2 standard deviations covers about 95% of a normal population.

With modern computers and spreadsheets, it is possible to calculate the probability of the statistic given a hypothesis, leaving the analyst to decide whether to accept or reject it.
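A minimal sketch of such a calculation in Python with SciPy, using hypothetical replicate results compared with a reference value by a one-sample Student’s t-test:

```python
# Probability of the test statistic under H0: one-sample t-test of
# five replicate results against a reference value of 100.0.
from scipy import stats

replicates = [100.8, 101.2, 100.9, 101.1, 100.7]
t_stat, p_value = stats.ttest_1samp(replicates, popmean=100.0)

# Reject H0 at the 95% level of confidence if p < 0.05
significant = p_value < 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, significant: {significant}")
```

The software reports the exact p-value, and the analyst then compares it against the chosen rejection level (here 0.05) to accept or reject H0.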

In deciding what is a reasonable level at which to accept or reject a hypothesis, i.e. how significant is “significant”, two scenarios in which the wrong conclusion is reached need to be considered. In other words, there is a “risk” of making a wrong decision at a specified probability.

A so-called Type I error occurs when we reject the null hypothesis although it is actually true. It is also known as “a false positive”.

The second scenario is the opposite: the significance test leads the analyst to wrongly accept the null hypothesis although in reality H0 is false (a Type II error, or “a false negative”).

We have discussed these two types of error in earlier short articles.


Does your test result “fit for purpose”?

The concept of “fit for purpose”

The ultimate aim of a laboratory analysis is to produce results that are reliable and accurate enough for their proper use. We do not undertake testing just for fun or for our own sake. Proper handling of the method validation and verification processes therefore becomes important, and the concept of “fit for purpose” sums up what is required.

Indeed, the quality of the analytical chemistry needs to be sufficient to answer the question asked of the sample analysis. The data user wants to know if he can eat the vegetables safely, drink the water without harm, or invest in the gold mine. Erroneous results can lead to loss of customer confidence.

In order to deliver test results that are “fit for purpose”, a proper understanding of basic statistical data analysis is essential. Unfortunately many laboratory analysts are somehow quite weak in this important subject.

To obtain valid results, we can refer to the six principles of valid analytical measurement (VAM), as proposed by the UK Laboratory of the Government Chemist (LGC):

  • Analytical measurements should be made to satisfy an agreed customer requirement
  • Use validated methods and equipment
  • Use qualified and competent staff to undertake the task
  • Participate regularly in independent assessment of technical performance (i.e. proficiency testing)
  • Ensure comparability with measurements made in other laboratories (i.e. traceability, reproducibility and measurement uncertainty)
  • The laboratory should have well-defined quality control and quality assurance practices.


Sampling randomization – Part II


In selecting random samples for analysis, it is necessary to generate random numbers.  Random numbers also are used for simulations and can be used to create sample datasets.   Random numbers can be generated in a number of different ways ……
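A minimal sketch of one such way: drawing a random selection without replacement in Python, here picking 5 of 50 hypothetical numbered containers, with a seed so the selection is reproducible:

```python
# Random selection of sample containers for analysis,
# without replacement, seeded for reproducibility.
import random

rng = random.Random(42)                  # fixed seed: repeatable draw
containers = list(range(1, 51))          # containers numbered 1..50
selected = rng.sample(containers, k=5)   # 5 picked, no repeats
print("containers to analyze:", sorted(selected))
```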

Randomization – Part II

Sampling randomization – Part I


We have been talking about the importance of carrying out random sampling for laboratory analysis. What actually is randomization?

Randomization – Part I

Confidence intervals- How many measurements should you take?


Confidence intervals – how many measurements to take