Training and consultancy for testing laboratories.

Archive for January, 2020

How to use Excel to construct normal distribution curves

Data collected randomly are usually normally distributed, particularly on a large sample size, being symmetric about the mean value of the data. In a graphic form, it will appear like a bell-shaped curve, showing the data nearer to the mean are more frequent in occurrence than those far away from the mean.  The width of the curve varies, depending on the standard deviation of the data.  It is ‘flatter’ when the standard deviation is larger, indicating that the set of data is less precise.

MS Excel spreadsheet is a good tool to show the above phenomena.

For example, we can construct various normal curves using Excel, describing the flash points in deg C of a certain solvent with a mean equal to 72 deg C with three different standard deviations, 2.6, 1.6 and 1.1 deg C obtained by analysts A, B and C, respectively.  The Excel solution is shown below.

First, we enter numbers from 67 deg C to 77 deg C with an interval of 0.5 deg C into column A, and key in Excel function “=NORM.DIST(A4,$B$1,$B$2,0)” for Analyst A in cell B4. Similarly we enter functions “=NORM.DIST(A4,$C$1,$C$2,0)” and “=NORM.DIST(A4,$D$1,$D$2,0)” for Analyst B and C, respectively.  A click-and-drag is then performed on all the columns.

What does Excel function NORM.DIST do?  The function “=NORM.DIST(x,mean,standard_dev,cummulative)” with cumulative = 0 tells Excel to calculate the height of the curve at the number x.  So, expression “=NORM.DIST(67,72,2.6,0)” keyed in any cell gives 0.024.  We may also key in “FALSE” in the fourth position of the function instead of “0” to get the same outcome.

We can then plot the normal curves with different standard deviations in usual manner by clicking Insert –> Scatter(X,Y) with smooth lines, as shown below:

The above diagram shows that more precise data (i.e. smaller standard deviations) gives rise to  a narrower but taller curve.

On the other hand, suppose we wished to know the probability (or chance) in terms of percentage of flash points which is smaller than 76 deg C, we use “=NORM.DIST(76,72,2.6,1)” keyed in any cell to give 0.938, which is the percent that are smaller than or equal to 76 deg C.  Notice that the “1” or “TRUE” in the fourth position of the NORM.DIST function tells Excel to accumulate the area from 76 deg C to the left. It is the cumulative sum from a low 67 deg C to 76 deg C in this instance.

This can also be interpreted as the probability of finding a value less than or equal to 76 deg C. There would be therefore 1 – 0.938 = 0.062 or 6.2% chance that are larger than 76 deg C.

What’s Internal Quality Control (IQC)?

A professionally run test laboratory must have a set of internal quality control or check (IQC) procedures in place. Regrettably I have noticed that many accredited chemical laboratories do not institute such IQC system in their routine works.

The purpose of IQC is to ensure as far as possible that the magnitude of errors affecting the analytical system is not changing during its routine use since method validation or verification process. By not having any IQC system in place, the analyst would not be able to state with confidence that the test results generated for that particular batch of samples are precise, accurate and fit for purpose.

During method validation, we have estimated the uncertainty of the method and showed that it is fit for purpose. Therefore, when the method is put in routine use, every run of analysis should be checked to show that the errors of measurement are probably no larger than they were at validation time.  Even when a standardized method is used for analysis, we have to demonstrate that our laboratory’s precision is no worse than the stated repeatability of the method.

For this IQC purpose, we can employ the concept of statistical control, which means in general that some critical feature of the system is behaving like a normally distributed variable.  How are we going to do it?

For chemical analysis, we can add one or more “control materials or samples” to the run of test methods.  These control materials are treated throughout in exactly the same manner as the test materials, from the weighing of the test portion to the final measurement.  Of course, the control materials ideally must be of the same type as the materials for which the analytical system was validated, in respect of matrix composition and analyte concentration.

By doing so, we treat the control materials as a surrogate and their behavior is a proper indicator of the performance of the system.  We can plot the results obtained in successive runs on a control chart for visual inspection on its moving trend over time.  The control lines are determined by run-to-run intermediate precision of the data collected.  Intermediate precision, by definition, is the pooled standard deviation of a number of successive runs in the same laboratory with inevitable changing measurement conditions (such as different analysts, instruments, newly prepared reagents, environmental variations, etc.) over time.