## Training and consultancy for testing laboratories.

### Your decision rule for conformity testing

In my training workshops on decision rules for making a statement of conformity after the laboratory analysis of a product, some participants have found the subject of hypothesis testing rather abstract. In my opinion, however, understanding the significance of Type I and Type II errors in hypothesis testing helps us formulate a decision rule based on the risk the laboratory is prepared to take in declaring whether a tested product conforms to specification.

As we know, a hypothesis is a statement that might or might not be true; we assess it by putting it to statistical tests. As an analogy, a graduate student working towards a Ph.D. degree carries out research on a hypothesis given by his or her supervisor. That hypothesis may or may not be proven true at the conclusion of the work. Of course, a breakthrough in the research at hand means that the original hypothesis is supported by the evidence rather than rejected.

In statistics, we set up the hypothesis in such a way that it is possible to calculate the probability (p) of the data, or of a test statistic calculated from the data (such as Student's t), given the hypothesis, and then to decide whether the hypothesis is to be accepted (high p) or rejected (low p).

In conformity testing, we treat the given specification or regulatory limit as the 'true' or certified value, and the measurement result we obtain is the data from which we decide whether the product conforms to the specification. Hence, our null hypothesis Ho is that there is no real difference between the measurement and the specification; any observed difference arises from random effects only.

To build a decision rule on conformance by significance testing, we must choose the probability value below which the null hypothesis is rejected and a significant difference concluded. This is the probability of making an error of judgement in the decision.

If the probability that the data are consistent with the null hypothesis Ho falls below a pre-determined low value (say, alpha = 0.05 or 0.01), then the hypothesis is rejected at that probability. Thus, with alpha = 0.05, we reject Ho at the 95% level of confidence (accepting a 5% risk of error) if the probability of the test statistic, given the truth of Ho, falls below 0.05. In other words, if Ho were indeed correct, fewer than 1 in 20 repeated experiments would give a result outside the limits. Hence, when we reject Ho, we conclude that there is a significant difference between the measurement and the specification limit.
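To make this concrete, here is a minimal Python sketch of a one-sample Student's t-test of replicate results against a specification limit. The replicate values, the limit of 10.0 and the function names are hypothetical; the critical values are the usual two-tailed 5% entries from a Student's t table, and we assume the replicates are roughly normally distributed.

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed critical values of Student's t at alpha = 0.05 (95% confidence),
# keyed by degrees of freedom, as found in standard statistical tables.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def t_statistic(results, limit):
    """One-sample t: (mean - limit) / (s / sqrt(n))."""
    n = len(results)
    return (mean(results) - limit) / (stdev(results) / sqrt(n))


def differs_significantly(results, limit):
    """True if Ho ('no real difference from the limit') is rejected at alpha = 0.05."""
    df = len(results) - 1
    return abs(t_statistic(results, limit)) > T_CRIT_95[df]


# Hypothetical replicate results compared with a hypothetical limit of 10.0
batch_1 = [10.2, 10.4, 10.3, 10.5, 10.1, 10.3]
batch_2 = [9.9, 10.1, 10.0, 10.2, 9.8, 10.0]
print(round(t_statistic(batch_1, 10.0), 2), differs_significantly(batch_1, 10.0))
print(differs_significantly(batch_2, 10.0))
```

For batch_1 the mean of 10.3 gives t of about 5.2 with 5 degrees of freedom, well beyond the tabulated 2.571, so Ho is rejected; batch_2 averages 10.0 and Ho stands.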

Gone are the days when we issued a conformance statement for a measurement result lying exactly on the specification value. Doing so exposes us to a 50% risk of being wrong, because it assumes either that our measurement has zero uncertainty (which cannot be true) or that the specification value already encompasses its own uncertainty (which is also unlikely).

Now, in our routine testing, we would have established the measurement uncertainty (MU) of each test parameter, such as oil, moisture or protein content. Our MU is an expanded uncertainty, obtained by multiplying the estimated combined standard uncertainty by a coverage factor (normally k = 2) to give approximately 95% confidence. Assuming the MU is constant over the range of values tested, we can easily determine, by means of Student's t-test, the critical value that is not significantly different from the specification value or regulatory limit. This is Case B in Fig. 1 below.

So, if the specification has an upper (maximum) limit, any test value smaller than the critical value estimated by Student's t-test can be 'safely' claimed to be within specification (Case A). On the other hand, any test value larger than this critical value reduces our confidence in claiming that it is within specification (Case C). Should we then declare that the test value does not meet the specification, even though it is numerically smaller than the limit? This is the dilemma we face today.
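One possible way to turn Cases A, B and C into a calculation is sketched below. It is an assumption on our part, not a prescribed method: the standard uncertainty is taken as u = U/k, the critical value (Case B) is placed t_crit × u below the upper limit, and t_crit defaults to 1.645, the one-tailed 95% value of Student's t for large degrees of freedom. The function names and example figures are hypothetical.

```python
import math


def critical_value_upper(usl, expanded_u, k=2.0, t_crit=1.645):
    """Critical value (Case B) below an upper specification limit `usl`.

    Assumed rule: u = U / k, and the critical value lies t_crit * u below
    the limit (t_crit = 1.645: one-tailed 95%, large degrees of freedom).
    """
    u = expanded_u / k
    return usl - t_crit * u


def case_upper(result, usl, expanded_u, k=2.0, t_crit=1.645):
    """Classify a single result against an upper limit into Cases A, B or C."""
    crit = critical_value_upper(usl, expanded_u, k, t_crit)
    if math.isclose(result, crit):
        return "B"            # exactly at the critical value
    if result < crit:
        return "A"            # can be claimed within specification
    if result <= usl:
        return "C"            # numerically within the limit, but not with 95% confidence
    return "above limit"      # outside the scope of Cases A to C


# Hypothetical example: upper limit 10.0, expanded uncertainty U = 0.4 (k = 2)
print(critical_value_upper(10.0, 0.4))   # about 9.671
for x in (9.5, 9.671, 9.8, 10.3):
    print(x, case_upper(x, 10.0, 0.4))
```

Setting t_crit equal to k makes the guard band equal to the expanded uncertainty U itself, a more conservative choice; which one applies should follow from your own documented decision rule.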

ILAC Guide G8:2009 suggests stating "not possible to state compliance" in such situations. Understandably, the client will not be pleased, having been used to receiving positive compliance statements even when the measurement result sat exactly on the upper limit.

That is why ISO/IEC 17025:2017 requires the accredited laboratory to discuss its decision rule with the client and obtain the client's agreement on the manner of reporting.

To minimize this awkward situation, one remedy is to reduce your measurement uncertainty as much as possible, pushing the critical value nearer to the specification value. However, there is a limit to how far this can go, because measurement uncertainty can never be zero. In the example above, the critical reporting value will therefore always be numerically smaller than the upper limit.
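The sketch below, under the same hypothetical assumptions (guard band = t_crit × U / k, with t_crit = 1.645 and k = 2), shows how the critical value creeps towards a limit of 10.0 as U is reduced, without ever reaching it while U is greater than zero.

```python
def guard_band(expanded_u, k=2.0, t_crit=1.645):
    """Assumed distance between the upper limit and the critical value: t_crit * U / k."""
    return t_crit * expanded_u / k


usl = 10.0  # hypothetical upper specification limit
for U in (0.8, 0.4, 0.2, 0.1, 0.05):
    crit = usl - guard_band(U)
    print(f"U = {U:<5} critical value = {crit:.3f}")
```

Halving U halves the guard band, but the guard band only vanishes if U is zero, which no real measurement achieves.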

Alternatively, you can discuss the matter with the client and ask the client to provide their own acceptance limits. In this case, your laboratory's risk is greatly reduced, as long as your reported value with its associated measurement uncertainty lies well within the documented acceptance limit, because the client has taken over the risk of errors in the product specification (i.e. the customer's risk).

Thirdly, you may decide to take a calculated commercial risk by allowing the upper uncertainty limit of your result to extend into the fail zone above the upper specification limit, for commercial reasons such as maintaining a good relationship with an important customer. You may even choose to report a measurement value lying exactly on the specification limit as conforming. However, by doing so, you are taking a 50% risk that your statement of conformity is wrong. Is it worth taking such a risk? Always remember the actual meaning of measurement uncertainty (MU): it provides a range of values around the reported test result that covers the true value of the test parameter with 95% confidence.
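To put a number on that risk, the following sketch assumes (as a simplification) that the true value is normally distributed about the reported result with standard deviation u = U/k, and computes the probability that it actually lies above the upper limit. It uses only Python's standard `statistics.NormalDist`; the function name, limit and uncertainty are hypothetical.

```python
from statistics import NormalDist


def risk_above_limit(result, usl, expanded_u, k=2.0):
    """Probability that the true value lies above the upper limit `usl`.

    Simplifying assumption: the true value is normally distributed about
    the reported result with standard deviation u = U / k.
    """
    u = expanded_u / k
    return 1.0 - NormalDist(mu=result, sigma=u).cdf(usl)


usl, U = 10.0, 0.4  # hypothetical limit and expanded uncertainty (k = 2)
for result in (10.0, 9.8, 9.6):
    print(result, round(risk_above_limit(result, usl, U), 4))
```

With these figures the risk is 50% for a result exactly on the limit, about 16% for a result one standard uncertainty below it, and about 2.3% for a result a full U below it.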

### Controversy of LOD (Detection Limit)

The limit of detection (LOD) is an important characteristic of a test method involving trace analysis but its concept has been, and still is, one of the most controversial in analytical chemistry.   Read more …  Controversies on Limit of Detection

### Why do we perform hypothesis tests?

A course participant commented the other day that descriptive statistics was much easier to understand and appreciate than analytical or inferential statistics, which call for logical reasoning about the inferential implications of the data collected.

I think the core issue lies in the abstract nature of inferential statistics. Hypothesis testing is a good example. Here, we need to determine the probability of finding the data given the truth of a stated hypothesis.

A hypothesis is a statement made that might, or might not, be true.

Usually the hypothesis is set up in such a way that it is possible for us to calculate the probability (P) of the data (or the test statistic calculated from the data) given the hypothesis, and then to make a decision about whether the hypothesis is to be accepted (high P) or rejected (low P).

A particular case of a hypothesis test is one that determines whether or not the difference between two values is significant – a significance test.

In this case, we put forward the hypothesis that there is no real difference and that the observed difference arises from random effects. We call this the null hypothesis (Ho).

If the probability that the data are consistent with the null hypothesis (Ho) falls below a predetermined low value (say, 0.05 or 0.01), then Ho is rejected at that probability.

Therefore, p < 0.05 means that if the null hypothesis were true, we would find the observed data (or, more precisely, a value of the test statistic at least as extreme as the one calculated from the data) in less than 5% of repeated experiments.

To use this in significance testing, we must decide on the value of the probability below which the null hypothesis is rejected and a significant difference concluded.

In laboratory analysis, we tend to reject the null hypothesis "at the 95% level of confidence" if the probability of the test statistic, given the truth of Ho, falls below 0.05. In other words, if Ho is indeed correct, fewer than 5% (i.e. 1 in 20) of the averages of repeated experiments would fall outside the limits. In this case, we conclude that there is a significant difference.
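The "fewer than 1 in 20" statement can be checked by simulation. The minimal sketch below, assuming normally distributed results and 10 hypothetical replicates per experiment, generates many experiments in which Ho is true and counts how often the t-test wrongly rejects it. The names and the values of the true mean and standard deviation are illustrative only.

```python
import random
from math import sqrt
from statistics import mean, stdev

N = 10                 # replicates per experiment (hypothetical)
T_CRIT_95_DF9 = 2.262  # two-tailed 5% Student's t value for N - 1 = 9 degrees of freedom


def type_i_error_rate(true_value=10.0, sd=0.2, trials=10000, seed=1):
    """Fraction of simulated experiments in which a TRUE Ho is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        # Ho is true: the data really are centred on true_value
        data = [rng.gauss(true_value, sd) for _ in range(N)]
        t = (mean(data) - true_value) / (stdev(data) / sqrt(N))
        if abs(t) > T_CRIT_95_DF9:
            rejected += 1
    return rejected / trials


print(type_i_error_rate())  # close to 0.05, i.e. about 1 experiment in 20
```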

However, it must be stressed that the figure of 95% is a somewhat arbitrary one, arising from the fact that the mean ± 2 standard deviations covers about 95% of a normally distributed population.
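A quick check of that figure with Python's standard `statistics.NormalDist`: the mean ± 2 standard deviations covers about 95.45% of a normal population, and the coverage factor that gives exactly 95% (two-sided) is about 1.96.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution (mean 0, sd 1)

coverage_2sd = z.cdf(2) - z.cdf(-2)  # fraction within mean ± 2 sd, about 0.9545
k_95 = z.inv_cdf(0.975)              # two-sided 95% coverage factor, about 1.96

print(round(coverage_2sd, 4), round(k_95, 3))
```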

With modern computers and spreadsheets, it is possible to calculate the probability of the statistic given a hypothesis, leaving the analyst to decide whether to accept or reject it.
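As an illustration of what such software does, the sketch below computes a two-tailed p-value for Student's t by integrating the t probability density numerically with Simpson's rule, using only the Python standard library. It is a teaching sketch rather than a production routine (for example, `math.gamma` overflows for very large degrees of freedom); in practice a spreadsheet function such as Excel's `T.DIST.2T`, or `2 * scipy.stats.t.sf(abs(t), df)` in Python, does the same job. The function names are our own.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) given Ho, via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3            # P(0 <= T <= |t|)
    return 1.0 - 2.0 * area     # symmetric distribution: both tails


print(round(two_tailed_p(2.262, 9), 3))  # t at the tabulated 5% point for 9 df
```

Because 2.262 is the tabulated two-tailed 5% value for 9 degrees of freedom, the p-value returned is very close to 0.05, which ties the table and the calculated probability together.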

In deciding on a reasonable level at which to accept or reject a hypothesis, i.e. how significant is "significant", we need to consider two scenarios in which the wrong conclusion is reached. There is thus a "risk" of making a wrong decision at any specified probability.

A so-called Type I error occurs when we reject a hypothesis that is actually true. It is also known as a "false positive".

The second scenario is the opposite: the significance test leads the analyst to accept the null hypothesis wrongly, although in reality Ho is false (a Type II error, or false negative).
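Type II errors can be illustrated by simulation. In the hypothetical sketch below, the true mean differs from the reference value by a chosen bias, so Ho is false and every failure to reject it is a false negative. We assume normally distributed results and 10 replicates per experiment; all names and numbers are illustrative.

```python
import random
from math import sqrt
from statistics import mean, stdev

N = 10                 # replicates per experiment (hypothetical)
T_CRIT_95_DF9 = 2.262  # two-tailed 5% Student's t value, 9 degrees of freedom


def type_ii_error_rate(bias, reference=10.0, sd=0.2, trials=5000, seed=2):
    """Fraction of experiments in which a FALSE Ho is not rejected.

    The true mean is reference + bias, so Ho ('no difference') is false;
    every non-rejection is therefore a Type II error (false negative).
    """
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        data = [rng.gauss(reference + bias, sd) for _ in range(N)]
        t = (mean(data) - reference) / (stdev(data) / sqrt(N))
        if abs(t) <= T_CRIT_95_DF9:
            missed += 1
    return missed / trials


for bias in (0.05, 0.1, 0.2, 0.4):
    print(bias, type_ii_error_rate(bias))
```

The smaller the real difference relative to the scatter of the data, the more likely it is to be missed, which is why the Type II error rate cannot be read off the chosen alpha alone.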

We have discussed these two types of error in the short articles: https://consultglp.com/2017/12/28/type-i-and-type-ii-errors-in-significance-tests/ and https://consultglp.com/2017/03/01/sharing-a-story-of-type-i-error/

### Type I and type II errors in significance tests

### Sharing a story of Type I error
