July 28, 2019

Outlier test statistics in analytical data

Notes on outlier test statistics in analytical data

When an analytical method is repeated several times on a given sample, measured values near the mean (or average) of the data set tend to occur more often than values far from it. This is characteristic of analytical data that follow the normal probability distribution, and the phenomenon is known as central tendency.

Time and again, however, we notice extremely low or high values that are visibly distant from the rest of the data. These suspect values may be outliers, which can be defined as observations in a set of data that appear to be inconsistent with the remainder of that set.

Outlying values have an appreciable influence on the calculated mean and an even greater influence on the calculated standard deviation if they are not examined carefully and, where necessary, removed.

We must remember, however, that random variation in analysis does generate occasional extreme values by chance. Such values are part of the valid data and should generally be included in any statistical calculation. On the other hand, human error or other deviations in the analytical process, such as instrument failure, can produce genuine outliers arising from a faulty procedure. Hence, it is important to minimize the effect of such outliers.

To minimize that effect, we need ways to identify outliers and distinguish them from chance variation. Many outlier tests are available that allow analysts to inspect suspect data and, if necessary, correct or remove erroneous values. These test statistics assume an underlying normal distribution and a relatively homogeneous test sample.

Furthermore, outlier testing needs careful consideration where the population characteristics are not known, or, worse, known to be non-normal. For example, if the data were Poisson distributed, many valid high values might be incorrectly rejected because they appear inconsistent with a normal distribution. It is also crucial to consider whether outlying values might represent genuine features of the population.

Another approach is to use robust statistics which are not greatly affected by the presence of occasional extreme values and will still perform well when no outliers are present.

There are plenty of such tests at our disposal: Dixon’s, Grubbs’ and Thompson’s tests for outlying values, Cochran’s test for outlying variances, and related variance-homogeneity tests such as Levene’s, Bartlett’s, Hartley’s and the Brown-Forsythe test. They are quite simple to apply to a set of analytical data, but for the outcome to be meaningful the number of data examined should be reasonably large rather than just a few.
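
As an illustration of how simply such a test can be applied, here is a minimal Python sketch of a two-sided Grubbs’ test for a single suspect value. The function names and the small data set are invented for demonstration, and the test assumes the data are approximately normally distributed.

```python
# Minimal sketch of a two-sided Grubbs' test for one suspect value.
# Assumes approximately normal data; names and data are illustrative only.

import numpy as np
from scipy import stats

def grubbs_statistic(x):
    """Grubbs' G for the observation farthest from the mean."""
    x = np.asarray(x, dtype=float)
    return np.max(np.abs(x - x.mean())) / x.std(ddof=1)

def grubbs_critical(n, alpha=0.05):
    """Two-sided critical value of G for sample size n at significance alpha."""
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))

data = [10.12, 10.15, 10.14, 10.11, 10.16, 10.13, 10.45]  # 10.45 looks suspect
G = grubbs_statistic(data)
print(f"G = {G:.3f}, G_crit(95%) = {grubbs_critical(len(data), 0.05):.3f}")
# A G value above the critical value is a signal to investigate the suspect
# result, not a licence to discard it automatically.
```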

Outlier tests therefore only provide an objective criterion, or signal, to investigate the cause; outliers should not normally be removed from the data set solely on the strength of a statistical test. Rather, the tests highlight the need to inspect the data more closely in the first instance.

The general guidelines for acting on outlier tests on analytical data, based on the outlier testing and inspection procedure given in ISO 5725-2, Accuracy (trueness and precision) of measurement methods and results — Part 2: Basic method for the determination of repeatability and reproducibility of a standard measurement method, are as follows:

  • Test at both the 95% and the 99% confidence levels
  • All outliers should be investigated and any errors corrected
  • Outliers significant at the 99% level may be rejected unless there is a technical reason to retain them
  • Outliers significant only at the 95% level (normally called ‘stragglers’) should be rejected only if there is an additional technical reason to do so
  • Successive testing and rejection are permissible, but not to the extent of rejecting a large proportion of the data.

The above procedure leads to results that are not seriously biased by the rejection of chance extreme values, yet remain relatively insensitive to outliers at the frequency commonly encountered in measurement work. Alternatively, the application of robust statistics might be a better choice.

July 21, 2019

What do we know about robust estimators?

In statistics, the average (mean) and sample standard deviation are known as “estimators” of the population mean and standard deviation. These estimates improve as the number of data collected increases. As we know, the use of these statistics requires data that are normally distributed, and for confidence intervals based on the standard deviation of the mean this tends to be so, since the distribution of the mean approaches normality as the number of observations increases.

Real experimental data may well be normally distributed, but the data set will often contain values that are seriously flawed, appearing as extremely low or high results. If we can identify such data and remove them from further consideration, all is well and good.

Sometimes this is possible, but not always. That is a problem, as a single rogue value can seriously upset our calculations of the mean and standard deviation.

Estimators that can tolerate a certain amount of ‘bad’ data are called robust estimators, and can be used when it is not possible to ensure that the data being processed has the correct characteristics.

For example, we can use the middle value of a set of data arranged in ascending order (the median) as a robust estimator of the mean, and a suitably scaled version of the range of the middle 50% of the data (the normalized interquartile range, IQR) as a robust estimator of the standard deviation.

By definition, the median is the middle value of a set of data arranged in ascending order. If there is an odd number of data, the median is the unique middle datum; if there is an even number, it is the average of the middle two.

The median is robust because, no matter how outrageous one or a few extreme values are, they remain individual values at the ends of the ordered list; their magnitude is immaterial.

The interquartile range (IQR) is a measure of where the “middle fifty” lies in a data set, i.e. the range of values that spans the middle 50% of the data. Approximately three quarters of the IQR (more precisely, IQR/1.349 for normally distributed data), known as the normalized IQR, is an estimate of the standard deviation. The IQR itself is simply the third quartile Q3 minus the first quartile Q1:

IQR = Q3 – Q1

A problem with the IQR is that it cannot realistically be calculated for small data sets, as we must have sufficient data to define the quartiles (sections of the ordered data that each contain one quarter of the data).
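
For data sets that are large enough to define quartiles, the median and the normalized IQR take only a few lines to compute. The Python sketch below compares them with the ordinary mean and standard deviation on an invented data set containing one rogue value; the 0.7413 scaling factor (i.e. 1/1.349) assumes the underlying ‘good’ data are roughly normal.

```python
# Median and normalized IQR as robust counterparts of the mean and
# standard deviation. Data set is invented; one rogue value is included.

import numpy as np

data = [10.12, 10.15, 10.14, 10.11, 10.16, 10.13, 12.80]  # 12.80 is rogue

mean, sd = np.mean(data), np.std(data, ddof=1)
median = np.median(data)
q1, q3 = np.percentile(data, [25, 75])
norm_iqr = 0.7413 * (q3 - q1)  # normalized IQR as an estimate of sigma

print(f"mean   = {mean:.3f},  sd       = {sd:.3f}")
print(f"median = {median:.3f},  norm IQR = {norm_iqr:.3f}")
# The rogue value drags the mean upwards and inflates the standard deviation,
# while the median and normalized IQR stay close to the 'good' data.
```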

Another robust estimator of the standard deviation is the median absolute deviation (MAD), a fairly simple estimate that can be implemented easily in a spreadsheet. The MAD about the data-set median is calculated as:

MAD = median(| xi – median(x) |),  i = 1, 2, …, n
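
The same calculation is just as easy outside a spreadsheet. In the short sketch below the data set is invented, and the 1.4826 scaling factor, which converts the MAD into an estimate of the standard deviation for normally distributed data, is a common convention rather than part of the definition above.

```python
# Median absolute deviation (MAD), scaled to estimate the standard deviation
# of normally distributed data. Data set is invented for illustration.

import numpy as np

def mad_sd(x, scale=1.4826):
    """MAD about the median, scaled as an estimate of sigma for normal data."""
    x = np.asarray(x, dtype=float)
    return scale * np.median(np.abs(x - np.median(x)))

data = [10.12, 10.15, 10.14, 10.11, 10.16, 10.13, 12.80]
print(f"MAD-based estimate of sigma = {mad_sd(data):.3f}")
```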

Robust methods have their place, particularly when we must keep all the data together in, for example, an interlaboratory comparison study where an outlying result from a laboratory cannot simply be ignored. They are less strongly affected by extreme values.

However, robust estimators are not really the best statistics, and wherever possible the statistics appropriate to the distribution of the data should be used.

So, when can we use these robust estimators?

Robust estimators can be considered to provide good estimates of the parameters for the ‘good’ data in an outlier-contaminated data set. They are appropriate when:

  • The data are expected to be normally distributed. Here, robust statistics give answers very close to those of ordinary statistics
  • The data are expected to be normally distributed but contaminated with occasional spurious values, which are regarded as unrepresentative or erroneous and are approximately symmetrically distributed around the population mean. Robust estimators are less affected by these extreme values and hence are useful here.

Remember that robust estimators are not recommended where the data set shows evidence of multi-modality or heavy skewing, especially when it is expected to follow a non-normal or skewed distribution, such as a binomial or Poisson distribution with low counts, or a chi-squared distribution, which generates extreme values with reasonable likelihood.