Data collected randomly are usually normally distributed, particularly on a large sample size, being symmetric about the mean value of the data. In a graphic form, it will appear like a bell-shaped curve, showing the data nearer to the mean are more frequent in occurrence than those far away from the mean. The width of the curve varies, depending on the standard deviation of the data. It is ‘flatter’ when the standard deviation is larger, indicating that the set of data is less precise.
MS Excel spreadsheet is a good tool to show the above phenomena.
For example, we can construct various normal curves using Excel, describing the flash points in deg C of a certain solvent with a mean equal to 72 deg C with three different standard deviations, 2.6, 1.6 and 1.1 deg C obtained by analysts A, B and C, respectively. The Excel solution is shown below.
First, we enter numbers from 67 deg C to 77 deg C with an interval of 0.5 deg C into column A, and key in Excel function “=NORM.DIST(A4,$B$1,$B$2,0)” for Analyst A in cell B4. Similarly we enter functions “=NORM.DIST(A4,$C$1,$C$2,0)” and “=NORM.DIST(A4,$D$1,$D$2,0)” for Analyst B and C, respectively. A click-and-drag is then performed on all the columns.
What does Excel function NORM.DIST do? The function “=NORM.DIST(x,mean,standard_dev,cummulative)” with cumulative = 0 tells Excel to calculate the height of the curve at the number x. So, expression “=NORM.DIST(67,72,2.6,0)” keyed in any cell gives 0.024. We may also key in “FALSE” in the fourth position of the function instead of “0” to get the same outcome.
We can then plot the normal curves with different standard deviations in usual manner by clicking Insert –> Scatter(X,Y) with smooth lines, as shown below:
The above diagram shows that more precise data (i.e. smaller standard deviations) gives rise to a narrower but taller curve.
On the other hand, suppose we wished to know the probability (or chance) in terms of percentage of flash points which is smaller than 76 deg C, we use “=NORM.DIST(76,72,2.6,1)” keyed in any cell to give 0.938, which is the percent that are smaller than or equal to 76 deg C. Notice that the “1” or “TRUE” in the fourth position of the NORM.DIST function tells Excel to accumulate the area from 76 deg C to the left. It is the cumulative sum from a low 67 deg C to 76 deg C in this instance.
This can also be interpreted as the probability of finding a value less than or equal to 76 deg C. There would be therefore 1 – 0.938 = 0.062 or 6.2% chance that are larger than 76 deg C.