Assumptions of using ANOVA
Analysis of variance (ANOVA) is useful in laboratory data analysis for significance testing. However, it rests on certain assumptions that must be met for the technique to be used appropriately. These assumptions are similar to those of linear regression, because linear regression and ANOVA are really just two ways of analyzing data with the same general linear model. Departures from these assumptions can seriously affect the inferences drawn from the analysis of variance.
The assumptions are:
- Appropriateness of data
The outcome variable should be continuous, measured at the interval or ratio level, and either unbounded or valid over a wide range. The factors (grouping variables) should be categorical, that is, they should label distinct groups (e.g. Analyst, Laboratory, or Temperature level).
- Randomness and independence
Each value of the outcome variable must be independent of every other value, to avoid bias. No observation should influence how another is collected or what value it takes. In practice, this means the samples for the groups under comparison must be drawn randomly and independently from the population.
- Normality
The continuous outcome variable should be approximately normally distributed within each group. This can be checked by plotting a histogram for each group and by applying a statistical test for normality, such as the Anderson-Darling or the Kolmogorov-Smirnov test. However, the one-way ANOVA F-test is fairly robust against departures from the normal distribution. As long as the distributions are not extremely different from a normal distribution, the level of significance of the ANOVA F-test is usually not greatly affected by lack of normality, particularly for large samples.
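The normality check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a validated routine: the three groups of results are made-up numbers, the function name anderson_darling_normal is hypothetical, and the cut-off of 0.752 is the commonly tabulated 5% critical value for the small-sample-adjusted Anderson-Darling statistic when the mean and standard deviation are estimated from the data.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical replicate results for three analysts (illustrative data only).
groups = {
    "Analyst A": [10.1, 9.8, 10.3, 10.0, 9.9, 10.2],
    "Analyst B": [10.4, 10.7, 10.2, 10.5, 10.1, 10.6],
    "Analyst C": [9.7, 9.9, 9.6, 10.1, 9.8, 9.4],
}

# Assumed 5% critical value for the adjusted statistic
# (normal distribution, mean and SD both estimated from the sample).
CRITICAL_5_PERCENT = 0.752


def anderson_darling_normal(values):
    """Return the adjusted Anderson-Darling statistic for normality."""
    x = sorted(values)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    # Clamp the fitted CDF away from 0 and 1 so the logarithms stay finite.
    eps = 1e-12
    cdf = [min(max(fitted.cdf(v), eps), 1.0 - eps) for v in x]
    s = sum(
        (2 * j + 1) * (math.log(cdf[j]) + math.log(1.0 - cdf[n - 1 - j]))
        for j in range(n)
    )
    a2 = -n - s / n
    # Small-sample adjustment for estimated mean and SD.
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


results = {name: anderson_darling_normal(v) for name, v in groups.items()}
for name, a2 in results.items():
    verdict = "no evidence against normality" if a2 < CRITICAL_5_PERCENT else "normality rejected"
    print(f"{name}: adjusted A2 = {a2:.3f} ({verdict} at the 5% level)")
```

With only a handful of replicates per group, a test like this has little power, so the result is best read alongside the histogram rather than in place of it.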
- Homogeneity of variance
The variances of the groups should be approximately equal. This assumption is needed in order to combine, or pool, the variances within the groups into a single within-group source of variation, SSW. Levene's test can be used to check homogeneity of variance. Its null hypothesis is that the group variances are equal, so if the Levene statistic is not statistically significant (typically p > 0.05 when testing at alpha = 0.05), the variances are assumed to be sufficiently homogeneous to proceed with the data analysis.
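As a rough sketch of the two quantities mentioned above, the following Python computes the pooled within-group sum of squares (SSW) and the Levene statistic W for three groups. The data are made up, and the function names within_group_ss and levene_w are hypothetical. W is computed here as a one-way ANOVA F-ratio on the absolute deviations from each group mean; passing center=median instead gives the median-based (Brown-Forsythe) variant. The sketch does not compute a p-value; it compares W with the tabulated 5% critical value of F for (2, 15) degrees of freedom, which is specific to this example of 3 groups and 18 observations.

```python
from statistics import mean, median

# Hypothetical replicate results for three analysts (illustrative data only).
groups = [
    [10.1, 9.8, 10.3, 10.0, 9.9, 10.2],   # Analyst A
    [10.4, 10.7, 10.2, 10.5, 10.1, 10.6],  # Analyst B
    [9.7, 9.9, 9.6, 10.1, 9.8, 9.4],       # Analyst C
]


def within_group_ss(samples):
    """Pooled within-group sum of squares (SSW)."""
    total = 0.0
    for g in samples:
        g_mean = mean(g)
        total += sum((y - g_mean) ** 2 for y in g)
    return total


def levene_w(samples, center=mean):
    """Levene statistic: F-ratio on absolute deviations from the group centre."""
    z = []
    for g in samples:
        c = center(g)
        z.append([abs(y - c) for y in g])
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    z_bar = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zb - z_grand) ** 2 for zi, zb in zip(z, z_bar))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Tabulated upper 5% point of F with (k - 1, N - k) = (2, 15) degrees of freedom.
F_CRITICAL_2_15 = 3.68

ssw = within_group_ss(groups)
w = levene_w(groups)
print(f"SSW = {ssw:.4f}")
print(f"Levene W = {w:.4f}")
if w < F_CRITICAL_2_15:
    print("Not significant at alpha = 0.05: variances treated as homogeneous")
else:
    print("Significant at alpha = 0.05: variances differ between groups")
```

If SciPy is available, scipy.stats.levene returns both the statistic and a p-value; note that it centres on the median by default, so center='mean' is needed to reproduce the mean-based statistic computed here.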