### A step-by-step ANOVA example on Sampling and Analysis

There is growing interest in sampling and sampling uncertainty among laboratory analysts. This is mainly because the newly revised ISO/IEC 17025 accreditation standard, to be implemented soon, adds new requirements for sampling and for estimating its uncertainty. The standard recognizes that a test result is only as good as the sample on which it is based, so the importance of representative sampling cannot be overemphasized.

As with measurement uncertainty, sampling uncertainty has to be estimated by appropriate statistical methods involving the analysis of variance (frequently abbreviated to ANOVA). Strictly speaking, the uncertainty of a measurement result has two contributing components: sampling uncertainty and analytical uncertainty. For years, we have largely ignored the sampling contribution.
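Assuming the sampling and analytical errors are independent, the two standard uncertainties are usually combined as a root sum of squares. The symbols below are introduced here only for illustration:

```latex
u_{\mathrm{meas}} = \sqrt{u_{\mathrm{samp}}^{2} + u_{\mathrm{anal}}^{2}}
```

For instance, with $u_{\mathrm{samp}} = 0.3$ and $u_{\mathrm{anal}} = 0.4$ (same units), $u_{\mathrm{meas}} = \sqrt{0.09 + 0.16} = 0.5$, so ignoring the sampling term would understate the uncertainty considerably.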

ANOVA is indeed a very powerful statistical technique for separating and estimating the different sources of variation.

It is simple to use Student’s t-test to check whether the mean values obtained from testing two samples differ significantly. In analytical work, however, we are often confronted with more than two means to compare. For example, we may wish to compare the mean protein concentrations of a sample solution stored at different temperatures and for different holding times, or to compare the concentration of an analyte determined by several test methods.
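A minimal R sketch of the two-mean case, using hypothetical protein results for one solution tested under two conditions (the data and names are illustrative only). `var.equal = TRUE` requests the classical pooled-variance Student's t-test; without it, R's `t.test()` performs Welch's test.

```r
# Hypothetical protein results (mg/mL) for one solution under two storage conditions
cond_A <- c(10.2, 10.5, 10.1, 10.4, 10.3)
cond_B <- c(10.8, 10.6, 11.0, 10.7, 10.9)

# Classical Student's two-sample t-test with pooled variance
res <- t.test(cond_A, cond_B, var.equal = TRUE)
res$statistic   # t value (about -5 here)
res$parameter   # degrees of freedom: 5 + 5 - 2 = 8
res$p.value     # a small p-value suggests the two means differ
```

With three or more means, repeating pairwise t-tests inflates the chance of a false positive, which is why ANOVA is used instead.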

In the above examples, there are two possible sources of variation. The first, which is always present, is the inherent random error of measurement. This within-sample variation can be estimated from a series of repeated tests.

The second possible source of variation is due to what are known as controlled (fixed-effect) or random (random-effect) factors. In the examples above, the controlled factors are the storage temperature and holding time in the protein study, and the method of analysis in the method comparison. ANOVA then tests statistically whether the between-sample variation is significantly larger than the within-sample variation.
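The split into within-sample and between-sample variation can be sketched in R with a one-way ANOVA. This is a minimal example under assumed conditions: four sample increments taken from one sampling target, each analysed in duplicate, with hypothetical values. Treating the sample as a random factor, the standard expected mean squares give the analytical variance (within samples) and the sampling variance (between samples).

```r
# Hypothetical design: 4 sample increments from one sampling target,
# each analysed in duplicate (values are illustrative only)
dat <- data.frame(
  sample = factor(rep(c("S1", "S2", "S3", "S4"), each = 2)),
  result = c(5.1, 5.3, 5.6, 5.4, 4.9, 5.1, 5.5, 5.7)
)

fit <- aov(result ~ sample, data = dat)
tab <- summary(fit)[[1]]            # one-way ANOVA table
print(tab)

ms_between <- tab[["Mean Sq"]][1]   # between-sample mean square
ms_within  <- tab[["Mean Sq"]][2]   # within-sample (repeat analysis) mean square
n_rep <- 2                          # analyses per sample

# Random-effects variance components:
#   E[MS_within]  = s_anal^2
#   E[MS_between] = s_anal^2 + n_rep * s_samp^2
s2_anal <- ms_within
s2_samp <- max(0, (ms_between - ms_within) / n_rep)  # negative estimates set to zero
c(s_anal = sqrt(s2_anal), s_samp = sqrt(s2_samp))
```

Here the between-sample mean square (about 0.152) is well above the within-sample mean square (0.02), so most of the variability comes from sampling rather than from the analysis.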

If there is only one factor, either controlled or random, the analysis is known as one-way ANOVA. When two or more factors are involved, they may interact with each other. In this case, we conduct a two-way or multi-way ANOVA.
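For the protein example, a two-way ANOVA with an interaction term might be sketched in R as follows. The two temperatures, three holding times and all concentration values are hypothetical, chosen only to show the model structure (duplicate results in each cell).

```r
# Hypothetical protein results: 2 storage temperatures x 3 holding times, in duplicate
prot <- data.frame(
  temp = factor(rep(c("4C", "25C"), each = 6)),
  time = factor(rep(rep(c("0h", "24h", "48h"), each = 2), times = 2)),
  conc = c(10.1, 10.3, 10.0, 10.2,  9.9, 10.1,   # stored at 4 C
           10.2, 10.0,  9.6,  9.8,  9.1,  9.3)   # stored at 25 C
)

# temp * time expands to temp + time + temp:time (main effects plus interaction)
fit2 <- aov(conc ~ temp * time, data = prot)
summary(fit2)
```

A significant `temp:time` row would indicate that the effect of holding time depends on the storage temperature, which a one-way analysis of either factor alone could not reveal.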

Several short articles on ANOVA have previously been published on this blog. Comments are always welcome.

https://consultglp.com/2017/04/04/anova-variance-testing-an-important-statistical-tool-to-know/

https://consultglp.com/wp-content/uploads/2017/01/analysis-of-variance-anova-revisited.pdf

https://consultglp.com/wp-content/uploads/2016/10/the-arithmetic-of-anova-calculations.pdf

https://consultglp.com/wp-content/uploads/2017/01/how-to-interpret-an-anova-table.pdf

The open-source R programming language is a free software environment for statistical computing and graphics, and it is easy to master. Its official website is https://www.r-project.org/. It runs on a wide variety of UNIX platforms, Windows and macOS.

On September 24, 2016, this blog published an article on how to use R to generate random numbers (https://consultglp.com/2016/09/24/how-to-use-r-to-generate-random-numbers/). Now that the newly revised ISO/IEC 17025 standard embraces sampling as another important criterion for assessing technical competence, the random number function of R is very handy for cargo surveyors and samplers preparing sampling plans for cargo shipments.

We can use the random number function of R to create a random number table for randomly selecting samples for laboratory quality analysis.

For example, suppose a shipment of 1000 bags of coffee beans in a warehouse is to be surveyed before it is dispatched to port. The buyer requires 5% sampling for laboratory quality testing. That means 50 bags have to be randomly selected; a portion from each of these bags is then composited and reduced to a suitably sized test sample by quartering (sub-sampling).

The sampling plan can therefore consist of the following steps:

1. Label each bag with a sequential number from 1 to 1000.

2. Create a random number table of 50 bag numbers with R:

```r
# Draw 50 distinct bag numbers at random from 1 to 1000 (5% of the shipment)
RandSampling <- sample(1000, 50)
# Arrange the numbers as a 10 x 5 random number table
dim(RandSampling) <- c(10, 5)
RandSampling
```

R prints a 10 × 5 matrix of distinct bag numbers between 1 and 1000. A new set of numbers is drawn every time the command is run; calling `set.seed()` first makes the table reproducible.

3. Take a portion (say, 500 g) of coffee beans from each bag whose number was selected and combine the portions in a large sampling bag.

4. Reduce the resulting composite (50 × 500 g = 25 kg) on site by quartering to a test sample of about 2.5 kg before sending it to the laboratory for analysis.
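The arithmetic behind steps 1 to 4 can be sketched in R. This assumes that each round of coning and quartering keeps two opposite quarters and discards the other two, so each round halves the mass; the seed value is arbitrary and only there to make the plan reproducible for audit.

```r
# Quantities from the coffee-bean example
n_bags    <- 1000   # bags in the shipment
fraction  <- 0.05   # 5% sampling required by the buyer
increment <- 0.5    # kg taken from each selected bag (500 g)
target    <- 2.5    # approximate test sample mass for the laboratory, kg

n_select <- round(n_bags * fraction)      # 50 bags
set.seed(2024)                            # arbitrary seed, for a reproducible plan
bags <- sort(sample(n_bags, n_select))    # bag numbers to open, in walking order

composite <- n_select * increment         # 25 kg composite
# Assumption: each quartering round halves the mass. Take the largest number of
# rounds that does not bring the mass below the target.
rounds     <- floor(log2(composite / target))  # 3 rounds
final_mass <- composite / 2^rounds              # 3.125 kg, i.e. "about 2.5 kg"
c(bags_selected = n_select, composite_kg = composite,
  rounds = rounds, test_sample_kg = final_mass)
```

A fourth round would give about 1.6 kg, so whether to stop at three or four rounds depends on how much material the laboratory actually needs.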

By definition, sampling involves selecting a portion of material (i.e. sample or samples) from a substance, material, product or even a consignment of goods to represent or provide information about that larger body of material (i.e. *population*).

Although ISO/IEC 17025, including its latest revision, is still concerned with the technical competence of organizations that conduct laboratory activities, it recognizes that the reliability of a test result depends on how representative the analyzed sample is of the bulk material of interest. As the saying goes: “*The result is not better than the sample that it is based on*”.

A question has thus been asked: can sampling be considered a stand-alone activity, or should sampling always be associated with testing or calibration?

Although the scope of this accreditation standard is laboratory activities, sampling is inevitably connected to the laboratory’s analytical process, which produces the test result that matters to end users. It is usually impossible to analyze the whole bulk (or lot) of material (statistically called the ‘population’ or ‘sampling target’). Proper sampling therefore plays an essential role in ensuring the validity of the final test result.

The FDIS (Final Draft International Standard) requires a laboratory to have a sampling plan and documented sampling procedures for its field of testing. A laboratory that has not been responsible for the sampling stage may state “analyzed as received” in its report. In addition, a laboratory performing sampling or testing activities shall evaluate measurement uncertainty, *i.e*. the uncertainty of the sampling process is to be evaluated as an additional contributor to the measurement uncertainty of the whole testing or calibration process.

Even a laboratory that is not involved in sampling a population outside its premises often carries out sub-sampling before starting the analytical procedure. Such a laboratory therefore cannot ignore the subject of sampling either.

As noted earlier, since the main purpose of laboratory analysis is to estimate the analyte concentration in a sampling target, the sample taken should be as representative of the sampling target as possible. In random sampling this is achieved by ensuring that each sample has the **same probability** of being drawn from the population as any other sample.
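A quick way to see the equal-probability property of simple random sampling is to simulate the coffee-bean plan many times and count how often each bag is chosen. Each bag should be selected with probability 50/1000 = 0.05. This is a minimal sketch; the seed and the number of simulated plans are arbitrary choices.

```r
set.seed(1)                                   # arbitrary seed for reproducibility
n_sim <- 20000                                # number of simulated sampling plans
picks <- replicate(n_sim, sample(1000, 50))   # 50 x 20000 matrix of selected bag numbers

# Proportion of simulated plans in which each bag 1..1000 was selected
p_hat <- tabulate(as.vector(picks), nbins = 1000) / n_sim
summary(p_hat)   # all estimates cluster tightly around 0.05
```

A haphazard "grab the nearest bags" approach would fail this check, because bags near the warehouse door would be chosen far more often than bags at the back.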

So, to optimize the whole measurement process, including taking a good sample for analysis, the sampling planner needs to gather information about the sampling target and decide on appropriate sampling protocols.

Ask the following questions during information gathering:

- What analyte(s) are to be determined?
- Is the measurand homogeneously or heterogeneously distributed in the bulk material?
- What kind of average sample is required: hourly, daily, by shift, by batch or by shipment?
- Are all the necessary personnel and equipment available?
- What level of uncertainty for the analyte does the specification allow, if any? *This information is particularly important for deciding on the number of samples to be taken for analysis.*
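One common textbook way to turn an allowed uncertainty into a number of samples, assuming approximately normal results and a prior estimate `s` of the between-sample standard deviation, is n = (z·s/E)², where E is the largest acceptable error in the mean. The values below are hypothetical, and for small n a t-based iteration is often preferred; this sketch keeps to the normal quantile.

```r
# Hypothetical inputs, in the same units (e.g. % moisture)
s     <- 0.8    # assumed standard deviation between samples
E     <- 0.25   # assumed allowable error in the mean (half-width of the interval)
alpha <- 0.05   # for 95% confidence

z <- qnorm(1 - alpha / 2)             # about 1.96
n_samples <- ceiling((z * s / E)^2)   # round up to a whole number of samples
n_samples                              # 40
```

Halving the allowable error E would roughly quadruple the number of samples needed, which is why the specification limit matters so much at the planning stage.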

To decide on appropriate sampling protocols, the following points must be addressed:

- Manual vs automatic sampling
- Sampling frequency
- Sample sizes (volume, weight)
- Number of samples to be drawn
- Sampling locations (ship’s tanks, silos, warehouses)
- Individual vs composite samples
- Sampling strategy:
  - Random sampling
  - Stratified random sampling
  - Systematic sampling
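To make the three strategies concrete for the coffee-bean shipment, here is a minimal R sketch. The split of the 1000 bags into five warehouse bays of 200 bags each is a hypothetical assumption, as is the seed value.

```r
set.seed(7)                      # arbitrary seed for reproducibility
N <- 1000                        # bags in the shipment
n <- 50                          # bags to select (5%)

# Simple random sampling: any 50 bags, each with an equal chance
srs <- sort(sample(N, n))

# Systematic sampling: random start within the first interval, then every k-th bag
k     <- N / n                   # sampling interval, 20
start <- sample(k, 1)            # random start between 1 and 20
sys   <- seq(start, by = k, length.out = n)

# Stratified random sampling: hypothetical 5 bays of 200 bags, 10 bags drawn from each
bay   <- rep(1:5, each = 200)    # bay number for bags 1..1000
strat <- sort(unlist(lapply(split(seq_len(N), bay),
                            function(b) b[sample.int(length(b), 10)])))
```

Systematic sampling is easy to carry out on site but can be biased if the bags were stacked in a pattern that repeats every k bags; stratification guarantees that every bay is represented.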

We shall discuss these sampling strategies in more detail in the next blog post.
