

Revisiting Hypothesis Testing

A few course participants expressed the view that the subject of hypothesis testing is quite abstract and that they found its concept and application hard to grasp.  I thought otherwise.  Perhaps let’s go through its basics again.

We know that the study of statistics can be broadly divided into descriptive statistics and inferential (or analytical) statistics.  Descriptive statistical techniques (like frequency distributions, the mean, the standard deviation, the variance and other measures of central tendency and spread) are useful for summarizing data obtained from samples, but they also provide the tools for more advanced analysis relating to the broader picture of the population from which the samples are drawn, through the application of probability theory in sampling distributions and confidence intervals.  We analyse the variation in the sample data collected to infer what the value of the population parameter is likely to be.
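As a quick, purely illustrative sketch (not from the original article, using made-up replicate results), Python’s standard statistics module computes such descriptive summaries directly:

import statistics

# Hypothetical replicate measurements from one laboratory sample (illustrative values only)
data = [10.2, 10.5, 9.8, 10.1, 10.4, 10.0]

print("mean     :", round(statistics.mean(data), 3))      # measure of central tendency
print("std dev  :", round(statistics.stdev(data), 3))     # sample standard deviation
print("variance :", round(statistics.variance(data), 3))  # sample variance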

A hypothesis is an educated guess about something around us, so long as we can put it to the test either by experiment or simply by observation.  Hypothesis testing, then, is a statistical method used for making statistical decisions from experimental data.  It is basically an assumption that we make about the population parameter.  In a nutshell, we want to:

  • make a statement about something
  • collect sample data relating to the statement
  • if, given that the statement is true, the sample outcome turns out to be unlikely, conclude that the statement is probably not true.

In short, we have to make a decision about the hypothesis: either to accept the null hypothesis or to reject it at a certain level of significance.  Every test in hypothesis testing therefore produces a significance value (p-value) for that particular test.  If the significance value of the test is greater than the predetermined significance level, we accept the null hypothesis.  If the significance value is less than the predetermined level, we reject the null hypothesis.
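Expressed as code, that decision rule is simply a comparison of the p-value with the chosen significance level (a minimal sketch; the function name and the example p-values are illustrative only):

def decide(p_value, alpha=0.05):
    # Compare the test's significance value (p-value) with the
    # predetermined significance level alpha.
    if p_value < alpha:
        return "reject the null hypothesis"
    return "accept the null hypothesis"

print(decide(0.003))  # small p-value -> reject Ho
print(decide(0.35))   # large p-value -> accept Ho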

Let us have a simple illustration.

Assume we want to know whether a particular coin is fair.  We can state the null hypothesis, Ho, that it is a fair coin.  The alternative hypothesis, H1 or Ha, of course, is that it is not a fair coin.

Suppose we toss the coin, say, 30 times and get heads 25 times.  Since this is an unlikely outcome if the coin were fair, we can reject the null hypothesis that it is a fair coin.
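To put a number on how unlikely that outcome is, the following sketch computes the two-sided probability under Ho using only the Python standard library (the arithmetic is my own illustration, not from the article):

from math import comb

n, heads = 30, 25

# P(X >= 25) when X ~ Binomial(30, 0.5), i.e. 30 tosses of a fair coin
upper_tail = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n

# Binomial(30, 0.5) is symmetric, so the two-sided p-value is twice the one-tail probability
p_value = 2 * upper_tail

print(round(p_value, 5))  # about 0.0003, far below 0.05, so we reject Ho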

In the next article, we shall discuss the steps to be taken in carrying out such hypothesis testing with a set of laboratory data.

Excel functions in Hypothesis Testing

Data analysis allows us to answer questions about the data or about the population that the sample data describes.

When we ask questions like “is the alcohol level in the suspect’s blood sample significantly greater than 50 mg/100 ml?” or “does my newly developed test method give the same results as the standard method?”, we need to determine the probability of finding the test data given the truth of a stated hypothesis (e.g. that there is no significant difference); hence the name “hypothesis testing”, also known as “significance testing”.

A hypothesis, therefore, is an assumed statement which might, or might not, be true. We test the truth of this hypothesis, known as the null hypothesis, Ho, using parameter estimates (such as the mean, µ, or the standard deviation, s) and a calculated probability, deciding whether the hypothesis is to be accepted (high p-value) or rejected (low p-value) against a pre-set significance level, such as p = 0.05 for 95% confidence.

Whilst stating a null hypothesis, we must also be prepared with an alternative hypothesis, H1, to fall back on in case Ho is rejected after a statistical test such as the F-test or Student’s t-test. The H1 hypothesis can be one of the following statements:

H1:  sa ≠ sb   (two-sided or two-tailed)

H1:  sa > sb   (one-sided, right-tailed)

H1:  sa < sb   (one-sided, left-tailed)

Generally, a simple hypothesis test is one that determines whether or not the difference between two values is significant.  These values can be means, standard deviations or variances.  In this case, we actually put forward the null hypothesis Ho that there is no real difference between the two s’s, and that the observed difference arises from random effects only.  If the probability that the data are consistent with the null hypothesis falls below a pre-determined low value (e.g. p = 0.05 or 0.01), then the hypothesis is rejected at that probability level.
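For instance, a one-tailed F-test of whether one method’s variance is significantly larger than another’s could be sketched in Python as follows (a rough illustration with invented replicate data, using scipy’s F distribution rather than Excel):

import statistics
from scipy.stats import f

method_a = [10.1, 10.3, 9.9, 10.2, 10.0, 10.4, 10.1, 9.8]  # hypothetical replicates, method A
method_b = [10.2, 10.1, 10.2, 10.1, 10.2, 10.1, 10.2]      # hypothetical replicates, method B

var_a = statistics.variance(method_a)
var_b = statistics.variance(method_b)

# F statistic: larger sample variance over the smaller one
if var_a >= var_b:
    f_obs, df1, df2 = var_a / var_b, len(method_a) - 1, len(method_b) - 1
else:
    f_obs, df1, df2 = var_b / var_a, len(method_b) - 1, len(method_a) - 1

f_crit = f.ppf(0.95, df1, df2)  # one-tailed critical F at p = 0.05

print("F observed:", round(f_obs, 3), " F critical:", round(f_crit, 3))
print("reject Ho" if f_obs > f_crit else "cannot reject Ho")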

As an illustration, suppose we have obtained an observed t-value from a Student’s t-test. If the calculated p-value is small, the observed t-value lies above the critical t-value at the pre-determined significance level, so we do not believe the null hypothesis and reject it.  If, on the other hand, the p-value is large, the observed t-value is quite acceptable, lying below the critical t-value for the given degrees of freedom at the set confidence level, so we cannot reject the null hypothesis.
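Tying this back to the blood-alcohol question above, the comparison might look like this in Python (a sketch only; the replicate results and the use of a one-sample, one-sided t-test are my own assumptions for illustration):

import statistics
from scipy.stats import t

results = [52.1, 51.4, 53.0, 52.5, 51.8]  # hypothetical blood-alcohol replicates, mg/100 ml
limit = 50.0                              # Ho: true mean = 50; H1: true mean > 50

n = len(results)
mean = statistics.mean(results)
s = statistics.stdev(results)

t_obs = (mean - limit) / (s / n**0.5)  # observed t statistic
df = n - 1

t_crit = t.ppf(0.95, df)   # one-tailed critical t at p = 0.05
p_value = t.sf(t_obs, df)  # one-tailed p-value (upper tail)

print("t observed:", round(t_obs, 3), " t critical:", round(t_crit, 3), " p-value:", round(p_value, 5))
print("reject Ho" if t_obs > t_crit else "cannot reject Ho")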

We can use the MS Excel built-in functions to find the critical values of the F- and t-tests at a prescribed probability level, instead of looking them up in their respective tables.

In the F-test for p = 0.05 and degrees of freedom v = 7 and 6, for example, the following formulas all return the same critical one-tailed inverse value (4.207), whether we use the old function or the new functions introduced in Excel 2010:

“=FINV(0.05,7,6)”

“=F.INV(0.95,7,6)”

“=F.INV.RT(0.05,7,6)”
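For anyone who wants to cross-check these figures outside Excel, the same critical value can be reproduced with scipy’s F distribution (a sketch, not part of the original post):

from scipy.stats import f

# One-tailed critical F at p = 0.05 with 7 and 6 degrees of freedom,
# equivalent to =FINV(0.05,7,6), =F.INV(0.95,7,6) and =F.INV.RT(0.05,7,6)
print(round(f.ppf(0.95, 7, 6), 3))  # 4.207
print(round(f.isf(0.05, 7, 6), 3))  # 4.207 (right-tail form of the same value)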

For the t-test, however, the old Excel function “=TINV” is a little awkward for one-tailed significance testing, because its algorithm assumes that the probability supplied is a two-tailed probability.

To get a one-tailed inverse value, we therefore need to double the probability, in the form “=TINV(0.05*2, v)”.  This makes the formula difficult to explain to someone with less knowledge of statistics.

For example, if we want to find a t-value at p=0.05 with v = 5 degrees of freedom, we can have the following options:

=TINV(0.05,5)          2.5706
=TINV(0.05*2,5)        2.0150
=T.INV(0.05,5)        -2.0150
=T.INV(0.95,5)         2.0150
=T.INV.2T(0.05*2,5)    2.0150

So it seems better to use the new function “=T.INV(0.95,5)”, or the absolute value of “=T.INV(0.05,5)”, for a one-tailed test at 95% confidence.

The following thus summarizes the use of T.INV for one- or two-tail hypothesis testing:

  1. To find the t-value for a right-sided or greater than H1 test, use =T.INV(0.95, v)
  2. To find the t-value for a left-sided or less than H1 test, use =T.INV(0.05, v)
  3. To find the t-value for a two-sided H1 test, use =T.INV.2T(0.05, v)
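Again, these Excel values can be double-checked with scipy’s t distribution (a sketch, using v = 5 degrees of freedom as in the example above):

from scipy.stats import t

v = 5  # degrees of freedom

print(f"{t.ppf(0.95, v):.4f}")        # right-sided H1, cf. =T.INV(0.95, v)    ->  2.0150
print(f"{t.ppf(0.05, v):.4f}")        # left-sided H1,  cf. =T.INV(0.05, v)    -> -2.0150
print(f"{t.ppf(1 - 0.05/2, v):.4f}")  # two-sided H1,   cf. =T.INV.2T(0.05, v) ->  2.5706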