Training and consultancy for testing laboratories.

Archive for the ‘Basic statistics’ Category

Using Excel’s worksheet functions for 1-factor ANOVA

Data analysis for a one-factor analysis of variance (ANOVA) with an equal number of replicates for each ‘treatment’ is often quicker and easier using Excel’s traditional worksheet functions directly in the spreadsheet.

One advantage is that the manual calculations use live worksheet functions rather than the static values returned by the Data Analysis ToolPak add-in, so we can edit the underlying data and immediately see whether the outcome changes and whether the change makes sense. With the Data Analysis tool, on the other hand, we have to run the analysis all over again even to change a single data value.

The following figure shows how simple it is to analyze a set of test data obtained by sampling 6 different targets of a population, with each laboratory sample analyzed in four replicates. The manual calculations have been verified against the Single Factor tool of the Data Analysis add-in, shown in the lower half of the figure.

With the variance results expressed as MS(between) and MS(within), we can readily estimate the measurement uncertainty contributions from the sampling and analytical components.

One-way ANOVA Calculations
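For readers without the spreadsheet to hand, the same worksheet arithmetic can be sketched in a few lines of Python. The six groups of four replicates below are hypothetical stand-ins for the figure’s data, not the actual values shown there:

```python
# One-way ANOVA with equal replicates, mirroring the worksheet
# calculations described above (6 sampling targets, 4 replicates each).
# The numbers are illustrative only.
data = [
    [35.2, 35.8, 35.4, 35.6],
    [36.1, 35.9, 36.3, 36.0],
    [34.8, 35.1, 34.9, 35.2],
    [35.5, 35.7, 35.3, 35.6],
    [36.0, 35.8, 36.2, 35.9],
    [35.1, 35.3, 34.9, 35.4],
]

k = len(data)          # number of targets (groups)
n = len(data[0])       # replicates per target
grand_mean = sum(sum(g) for g in data) / (k * n)

# Sums of squares between groups and within groups
ss_between = n * sum((sum(g) / n - grand_mean) ** 2 for g in data)
ss_within = sum((x - sum(g) / n) ** 2 for g in data for x in g)

ms_between = ss_between / (k - 1)        # MS(between)
ms_within = ss_within / (k * (n - 1))    # MS(within)
f_stat = ms_between / ms_within

# Uncertainty components: the analytical variance is MS(within);
# the sampling variance is (MS(between) - MS(within)) / n
var_analysis = ms_within
var_sampling = max((ms_between - ms_within) / n, 0.0)
print(ms_between, ms_within, f_stat, var_sampling, var_analysis)
```

The last two lines show the step mentioned above: once MS(between) and MS(within) are available, the sampling and analytical variance components follow directly.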



Descriptive statistics of Excel Data Analysis Tools

It is convenient to use Excel to analyze our data. Indeed, Excel comes equipped with a Descriptive Statistics tool in its Data Analysis add-in package, termed the Analysis ToolPak or ATP. With this tool, we get as many as 16 different descriptive statistical parameters without having to enter a single function on the worksheet….
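As a rough illustration of the kind of summary the ATP reports, here is a minimal Python sketch computing a subset of those parameters; the data values are made up for the example:

```python
import math
import statistics

# Hypothetical replicate results; a subset of the descriptive
# statistics that Excel's Descriptive Statistics tool reports.
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

n = len(data)
s = statistics.stdev(data)          # sample standard deviation
summary = {
    "Mean": statistics.mean(data),
    "Standard Error": s / math.sqrt(n),
    "Median": statistics.median(data),
    "Standard Deviation": s,
    "Sample Variance": statistics.variance(data),
    "Range": max(data) - min(data),
    "Minimum": min(data),
    "Maximum": max(data),
    "Sum": sum(data),
    "Count": n,
}
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```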





Mean or median? You decide

The mean and median are two of the three kinds of “averages”; the third is called the mode. Both the mean and the median are measures of the central tendency of a dataset, but they have different meanings, with different advantages and disadvantages in application.

The sample mean is calculated as the average of all the data, i.e., we add up all the observations and divide by the number of observations. The median, on the other hand, partitions the data into two parts such that there is an equal number of observations on either side of it. So, if we have a set of 5 data points arranged in ascending order, the middle value in 3rd place is the median. If we have a set of 6 data points in ascending order, then the median is the average of the 3rd and 4th values.
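The odd- and even-sized cases described above can be checked in a couple of lines of Python (the values are arbitrary examples):

```python
import statistics

# Median for odd- and even-sized datasets, as described above
odd_data = [3, 7, 8, 12, 15]        # 5 values: median is the 3rd value
even_data = [3, 7, 8, 12, 15, 21]   # 6 values: average of 3rd and 4th

print(statistics.median(odd_data))   # 8
print(statistics.median(even_data))  # (8 + 12) / 2 = 10.0
```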

One important advantage of the median is that it is not influenced by extreme values (outliers, statistically speaking) in the dataset: only the middle observation, or the average of the two middle observations, enters the calculation, whilst the actual values of the remaining data are not considered. The median is therefore commonly used in proficiency testing programs to assess inter-laboratory comparison data. Robust statistics also uses the MAD (median absolute deviation) as a measure of the variability of a univariate quantitative dataset.
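A small sketch makes the robustness point concrete: one gross error shifts the mean noticeably but leaves the median almost unchanged, and the MAD can be computed as the median of absolute deviations from the median. The data below are illustrative only:

```python
import statistics

# Effect of a single extreme value (outlier) on the mean vs the median
data = [10.1, 10.2, 10.0, 10.3, 10.1]
data_with_outlier = data + [25.0]   # one gross error added

median = statistics.median(data_with_outlier)
mean = statistics.mean(data_with_outlier)

# MAD: the median of absolute deviations from the median
mad = statistics.median(abs(x - median) for x in data_with_outlier)
print(mean, median, mad)  # mean is dragged upward; median and MAD are not
```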

The mean, on the other hand, is sensitive to all values in the dataset because every observation affects it, so extreme observations can have a substantial influence on the calculated mean.

Generally speaking, the mean has some very important mathematical properties that make it possible to prove theorems, such as the Central Limit Theorem. Many useful results in statistics and inference methods naturally give rise to the mean as a parameter estimate.

It is much more difficult to prove mathematical results about the median, even though it is more robust to extreme observations. So the mean is generally used for quantitative data, unless extreme values are present, in which case the median is used instead.

Therefore, you have to make a professional judgement on which one to use to suit your purpose.


A simple example of sampling uncertainty evaluation


In analytical chemistry, the test sample is usually only part of the system for which information is required. It is not possible to analyze numerous samples drawn from a population. Hence, one has to ensure that the small number of samples taken is representative, and assume that the results of their analysis can be taken as the answer for the whole…..

Improving uncertainty of linear calibration experiments

Standard error of cal curve



ANOVA and regression calculations

Linear regression graph


Accuracy and Trueness – What is the difference?



In analytical chemistry, we are quite familiar with the statistical term “accuracy”, but of late the term “trueness” has been referred to frequently, particularly in the evaluation of measurement uncertainty by the holistic top-down approach, which looks at the overall performance of the test method.

Is there a real difference between the meanings of these two terms?

ISO 5725-1 “Accuracy (trueness and precision) of measurement methods and results — Part 1: General principles and definitions” states the following definitions:

Clause 3.6  accuracy

The closeness of agreement between a test result and the accepted reference value.

[SOURCE: ISO 3534-1]

Clause 3.7  trueness

The closeness of agreement between the average value obtained from a large series of test results and an accepted reference value.

[SOURCE: ISO 3534-1]

From these definitions, we see that “accuracy” is actually a qualitative term and, as suggested by the title of ISO 5725-1, covers both trueness and precision. “Trueness”, on the other hand, is statistically quantifiable if we know how close the mean value of repeated tests on a test material is to its assigned or certified value. A high level of trueness is equivalent to a lack of bias in the test method.

We can assess “trueness” in one of the following ways:

  1. Apply a significance test (Student’s t-test) to the null hypothesis that the method is free from bias, after measuring the analyte content of a certified reference material several times;
  2. If such reference materials are not available, analyze a test portion of the sample before and after adding a known mass of analyte to it, a process known as “spiking”. If the difference between the two average measurements is not equivalent to the amount added, i.e. the recovery is significantly different from 100%, we conclude that some bias exists in the method;
  3. Compare the results from the test method being validated with those obtained when a standard or reference method is applied to the same test materials. In this approach, we have to test a number of test materials containing different analyte levels, then apply a paired t-test or regression methods to check for any significant difference between them.
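The first approach above can be sketched in a few lines of Python. The certified value, the replicate results and the tabulated critical value are illustrative assumptions, not data from the text:

```python
import math
import statistics

# Hypothetical bias check against a certified reference material (CRM)
certified_value = 50.0    # mg/kg, assumed CRM certified value
results = [49.6, 50.3, 49.8, 50.1, 49.5, 49.9]   # replicate measurements

n = len(results)
mean = statistics.mean(results)
s = statistics.stdev(results)     # sample standard deviation

# One-sample t statistic for H0: the method is free from bias
t_stat = (mean - certified_value) / (s / math.sqrt(n))

# Compare |t| with the two-tailed critical value for n - 1 = 5 degrees
# of freedom at 95% confidence (2.571, from standard t-tables)
t_crit = 2.571
biased = abs(t_stat) > t_crit
print(round(t_stat, 3), biased)
```

If |t| exceeds the critical value, the null hypothesis of no bias is rejected at the chosen confidence level.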

Using Excel’s functions for one-way ANOVA

Celebrating a new milestone …..

200th Blog


Risk and opportunities in test laboratories – Part II

Sample risk assessment grid



Risk and opportunities in test laboratories – Part I

Risk assessment grid
