Training and consultancy for testing laboratories.

Archive for June, 2018

Use of statistical tools in Excel

Statistical Tools in EXCEL

I have come across some course participants who are not aware of the fact that we can add    a plug-in of Analysis ToolPak on the popular Microsoft Excel® spreadsheet. This tool package is very useful for statistical analysis.  It uses appropriate statistical macro functions to calculate and display the results of your data analysis in an output table. Some tools even generate charts in addition to output tables.

To do so, you can simply open the Excel spreadsheet, first click the “File” tab, click “Options” , then click “Add-ins” which will show up a list of Inactive Application Add-ins under the “Manage” box. Check the “Analysis ToolPak” check box and click “OK”. The spreadsheet will automatically install this statistical package for you. You can find the “Data Analysis” tab when you click the “Data” tab on the spreadsheet tool bar as shown below:

Excel analysis toolpak

When the “Data Analysis” button is clicked, the menu as shown below shows up. You can just choose the relevant statistical tool you want to work on by select it with a click.

Excel data analysis

The regular statistical tools useful in analyzing data collected from chemical experiments are analysis of variance (ANOVA), linear regression correlation, descriptive statistics, plotting histogram, F-tests for variances and various Student’s t-tests.

 

Gosset and Student’s t-distribution

W.S. Gosset (aka Student) and t-distribution

    Whilst attending an Eurachem Scientific Workshop on June 14-15, 2018 at Dublin of Ireland, the workshop organizer arranged the Workshop Banquet at the renowned Guinness St. James Gat Brewery, whose one of the employees was William Sealy Gosset, a chemist cum statistician.

Gosset was interested in analyzing quality data obtained small sample size in his routine work on quality control of raw materials, as he noticed that it was neither practical nor economical in analyzing hundreds of data.

At that time, making statistical inferences from small sample-sized data to their population was unthinkable.  The general accepted idea was that if you were to have a large sample size, say well over 30 observations, you could use the Gaussian’s normal distribution to describe your data.

In 1906, Gosset was sent to Karl Pearson’s laboratory at the University College London  on sabbatical.  Pearson then was one of the well known scientific figures of his time, who was later credited with establishing the field of statistics.

At the laboratory, Gosset discovered the “Student’s t-distribution”, which is an important pillar of modern statistics to use small sample-sized data to infer what we could expect from the population out there.  It is the origin of the concept of “statistical significance testing”.

Why didn’t Gosset name the distribution as Gosset’s instead of Student’s?

It is interesting to note that it was because his employer, Guinness objected to his proposal to publish the findings as it did not want the competitors to know their gained advantage in using this unique procedure to select the best varieties of barley and hops for their popular beer in a way that no other business could do.

So finally Gosset published his article on Pearson’s journal Biometrika in 1908 under the pseudonym “Student”, leading to the famous “Student’s t-distribution”.

In statistics and probability studies, the t-distribution is a probability distribution in dealing with a normally distributed population whilst the sample size is not large. It uses sample standard deviation (s) to estimate the population standard deviation (s) which is unknown. For small samples, the confidence limits of the population mean are given by:

Student t formula

As the story goes, Gosset’s published paper was then mostly ignored by the statistical researchers until a young mathematician called R.A, Fisher discovered its importance and popularized it, particularly in estimating the random chance for considering a result “significant”.

Today, the t-distribution is routinely used as t- statistic tests for checking results for significance bias from true value, or for comparing measurements two sets of results and their means, and is also important for calculating confidence intervals.

This t-distribution is symmetric and resembles the normal distribution except for rather stronger “tails” due to more spread out because of the extra variability in smaller sample size.