Training and consultancy for testing laboratories.

Posts tagged ‘Normal distribution’

Assumptions of using ANOVA

Analysis of variance (ANOVA) is useful in laboratory data analysis for significance testing. However, it rests on certain assumptions that must be met for the technique to be used appropriately. These assumptions are similar to those of linear regression, because linear regression and ANOVA are really just two ways of analyzing data with the general linear model. Departures from these assumptions can seriously affect the inferences drawn from an analysis of variance.
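The link with regression can be illustrated with a minimal R sketch that fits the same one-way model twice, once with aov() and once with lm(). The data frame, the analyst labels and the chloride values are hypothetical and were invented only for this example. The point is that both routes report the same F statistic.

```r
# Hypothetical data: chloride results (mg/L) from three analysts,
# invented purely for illustration
results <- data.frame(
  analyst = factor(rep(c("A", "B", "C"), each = 4)),
  value   = c(10.2, 10.4, 10.1, 10.3,
              10.6, 10.8, 10.5, 10.7,
              10.3, 10.2, 10.4, 10.3)
)

# One-way ANOVA fitted with aov()
fit_aov <- aov(value ~ analyst, data = results)
summary(fit_aov)

# The same model fitted as a linear regression with lm();
# anova() on the lm fit gives the same ANOVA table
fit_lm <- lm(value ~ analyst, data = results)
anova(fit_lm)

# Extract the F statistic from each route and compare
f_aov <- summary(fit_aov)[[1]][["F value"]][1]
f_lm  <- anova(fit_lm)[["F value"]][1]
all.equal(f_aov, f_lm)   # TRUE
```

For these made-up numbers both routes give F = 14.25 on 2 and 9 degrees of freedom.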

The assumptions are:

  1. Appropriateness of data

The outcome variable should be continuous, measured on an interval or ratio scale, and unbounded or valid over a wide range. The factor (grouping variable) should be categorical. In other words, its levels are distinct categories such as different analysts, laboratories or temperature settings.

  2. Randomness and independence

Each value of the outcome variable should be independent of every other value, so that no observation influences another and biases the results. This means the samples of the groups under comparison must be drawn randomly and independently from their populations.

  3. Distribution

The continuous variable should be approximately normally distributed within each group. This can be checked by plotting a histogram and by applying a statistical test for normality, such as the Anderson-Darling or Kolmogorov-Smirnov test. However, the one-way ANOVA F-test is fairly robust against departures from normality. As long as the distributions are not extremely different from a normal distribution, the significance level of the F-test is usually not greatly affected, particularly for large samples.

  4. Homogeneity of variance

The variances of the groups should be approximately equal. This assumption is needed to pool the within-group variances into a single within-group source of variation, SSW. Levene's test can be used to check variance homogeneity. Its null hypothesis is that the variances are homogeneous. If the Levene statistic is not statistically significant (usually judged at α = 0.05, i.e. p > 0.05), the variances are assumed to be sufficiently homogeneous to proceed with the data analysis.
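Assumptions 3 and 4 can be checked in base R. The sketch below is illustrative only. The data are simulated, and ad_statistic() and levene_test() are hypothetical helper names. The Anderson-Darling statistic is computed by hand, with mean and standard deviation estimated from the data and with the small-sample adjustment of D'Agostino and Stephens. The value 0.752 in the comments is the commonly tabulated 5% critical value for that adjusted statistic. Levene's test is written in its original form: a one-way ANOVA on the absolute deviations of each result from its group mean.

```r
# Simulated data: three laboratories, 10 results each (illustrative only)
set.seed(1)
grp   <- factor(rep(c("Lab1", "Lab2", "Lab3"), each = 10))
value <- rnorm(30, mean = rep(c(50, 51, 50.5), each = 10), sd = 1)

# Adjusted Anderson-Darling statistic A2* for normality, with the mean
# and sd estimated from the sample
ad_statistic <- function(x) {
  n <- length(x)
  z <- sort((x - mean(x)) / sd(x))   # standardized values in ascending order
  p <- pnorm(z)                      # fitted normal CDF at each ordered value
  i <- seq_len(n)
  a2 <- -n - mean((2 * i - 1) * (log(p) + log(1 - rev(p))))
  a2 * (1 + 0.75 / n + 2.25 / n^2)   # small-sample adjustment
}

# Assumption 3: A2* for each group; values above about 0.752
# suggest non-normality at alpha = 0.05
tapply(value, grp, ad_statistic)

# Assumption 4: Levene's test as a one-way ANOVA on absolute
# deviations from the group means
levene_test <- function(y, g) {
  dev <- abs(y - ave(y, g))          # ave() gives each value its group mean
  tab <- anova(lm(dev ~ g))
  c(F = tab[["F value"]][1], p = tab[["Pr(>F)"]][1])
}
levene_test(value, grp)
```

With only 10 results per group, a normality test has little power, so the histogram check mentioned above remains worthwhile.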

 

Descriptive statistics of Excel Data Analysis Tools

It is convenient to use Excel to analyze our data. Excel comes equipped with a Descriptive Statistics tool in its Data Analysis add-in, the Analysis ToolPak (ATP). With this tool, we get as many as 16 descriptive statistics without having to enter a single function on the worksheet…
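For readers who also use R, the following sketch reproduces the 16 statistics that the Descriptive Statistics tool can report. These are the 13 standard rows plus the optional k-th Largest, k-th Smallest and Confidence Level rows. The function name excel_describe() is hypothetical. The skewness and kurtosis lines use the sample formulas documented for Excel's SKEW() and KURT() functions. The Mode line returns the first most frequent value in order of appearance, or NA where Excel shows #N/A. Treat it as a sketch to be checked against your own Excel output.

```r
# Hypothetical helper reproducing Excel's Descriptive Statistics output
excel_describe <- function(x, k = 1, conf = 0.95) {
  n <- length(x)
  m <- mean(x)
  s <- sd(x)                                  # sample SD (n - 1 divisor)
  z <- (x - m) / s
  ux <- unique(x)                             # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))            # frequency of each distinct value
  mode_val <- if (max(counts) > 1) ux[which.max(counts)] else NA
  c(
    Mean                 = m,
    `Standard Error`     = s / sqrt(n),
    Median               = median(x),
    Mode                 = mode_val,
    `Standard Deviation` = s,
    `Sample Variance`    = var(x),
    Kurtosis = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z^4) -
               3 * (n - 1)^2 / ((n - 2) * (n - 3)),
    Skewness = n / ((n - 1) * (n - 2)) * sum(z^3),
    Range                = max(x) - min(x),
    Minimum              = min(x),
    Maximum              = max(x),
    Sum                  = sum(x),
    Count                = n,
    Largest              = sort(x, decreasing = TRUE)[k],
    Smallest             = sort(x)[k],
    # half-width of the t-based confidence interval for the mean
    `Confidence Level`   = qt(1 - (1 - conf) / 2, df = n - 1) * s / sqrt(n)
  )
}

excel_describe(c(2, 4, 4, 5, 7, 9))
```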

 

R computations with normal distribution

There are various R functions which are useful for computation with normal distributions, such as pnorm( ), qnorm( ), and dnorm( ).

The pnorm( ) function gives the cumulative distribution function (the letter ‘p’ stands for probability). The qnorm( ) function gives quantiles, and dnorm( ) gives the density.

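Let’s use the statistical notation for the normal distribution, X ~ N(µ, σ²), where µ is the mean and σ² the variance. Note that these R functions take the standard deviation σ as their sd argument, not the variance σ². We shall now illustrate how each function is used.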

R function pnorm( )

For example, let X ~ N(8,4), so that µ = 8 and σ = 2. Then:

(a) the probability P(X < 2) can be computed with pnorm( ) in two equivalent ways, either naming the arguments or passing them by position:

> pnorm(2, mean = 8, sd = 2)  # P(X <= 2) in N(8, 4)
[1] 0.001349898
> pnorm(2, 8, 2)  # same, with positional arguments
[1] 0.001349898
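Because N(8,4) denotes a variance of 4, the sd argument must be 2, the square root of the variance. The short sketch below shows that pnorm( ) on the original scale agrees with pnorm( ) applied to the standardized value z = (x − µ)/σ, which is how the calculation would be done by hand with a statistics table. The variable names are arbitrary.

```r
# P(X <= 2) for X ~ N(8, 4): mean 8, variance 4, so sd = sqrt(4) = 2
p_direct <- pnorm(2, mean = 8, sd = sqrt(4))

# By hand: standardize to z = (x - mu) / sigma = (2 - 8) / 2 = -3,
# then read P(Z <= -3) from the standard normal distribution
z     <- (2 - 8) / 2
p_std <- pnorm(z)

all.equal(p_direct, p_std)   # TRUE

# A common slip: passing the variance instead of the SD gives P(Z <= -1.5),
# which is the wrong answer for this problem
p_wrong <- pnorm(2, mean = 8, sd = 4)
```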

(b) the probability P(X < 1.96) for X ~ N(8,4) is computed in R as:

> pnorm(1.96, 8, 2)  # P(X <= 1.96) in N(8, 4)
[1] 0.001263873

Recall from the statistics table that Φ(1.96) = 0.975 and Φ(1.645) = 0.950, where Φ denotes the cumulative distribution function of N(0,1). R gives the same answers:

> pnorm(1.96, 0, 1)  # P(X <= 1.96) in N(0, 1)
[1] 0.9750021
> pnorm(1.645, 0, 1)  # P(X <= 1.645) in N(0, 1)
[1] 0.9500151
> pnorm(1.645)  # mean = 0 and sd = 1 are the defaults
[1] 0.9500151

For P(X < -1.645), R returns the area under the left-hand tail of the standard normal curve:

> pnorm(-1.645)  # P(X <= -1.645) in N(0, 1)
[1] 0.04998491
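Upper-tail areas, as used in one-sided tests, can be obtained either by subtracting from 1 or with the lower.tail argument of pnorm( ). The sketch below also shows that, by symmetry, the upper tail beyond 1.645 equals the lower tail below -1.645, and it computes the two-sided area outside ±1.96. The variable names are arbitrary.

```r
# Upper-tail area P(X > 1.645) in N(0, 1), two equivalent ways
upper1 <- 1 - pnorm(1.645)
upper2 <- pnorm(1.645, lower.tail = FALSE)

# By symmetry this equals the lower-tail area P(X < -1.645)
lower <- pnorm(-1.645)

# Two-sided area outside +/-1.96: roughly 0.05
two_sided <- 2 * pnorm(-1.96)
```

Far out in the tails, lower.tail = FALSE is the safer choice because it avoids the rounding loss of computing 1 minus a number very close to 1.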

 R function qnorm( )

In layman’s terms, quantiles are the points that divide a series of sample data into equal proportions. For a probability distribution, they divide the area under the curve into regions of equal probability. The simplest division is into two equal halves at the 50% point, which is the median.

The R function qnorm( ) computes quantiles of the normal distribution (of the standard normal distribution by default). It is the inverse of pnorm( ): qnorm(p) returns the value x for which P(X ≤ x) = p.

For example,

> qnorm(0.95)  # 95.0% quantile of N(0, 1)
[1] 1.644854
> qnorm(0.975)  # 97.5% quantile of N(0, 1)
[1] 1.959964
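Because qnorm( ) inverts pnorm( ), a round trip returns the original value. The sketch below also computes the median and quartiles of N(0,1), which split the distribution into two and four equal areas. The variable names are arbitrary.

```r
# qnorm() inverts pnorm(): a probability goes in, the matching x comes out
x <- 1.5
x_back <- qnorm(pnorm(x))     # recovers 1.5, apart from rounding error

# Quartiles and median of N(0, 1)
quartiles <- qnorm(c(0.25, 0.50, 0.75))
quartiles                     # median 0; quartiles about -0.674 and 0.674

# Lower critical value of a two-sided 95% interval
qnorm(0.025)                  # about -1.96, the mirror image of qnorm(0.975)
```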

 R function dnorm( )

The Gaussian density formula is f(x) = exp(−(x − µ)²/(2σ²)) / (σ√(2π)). It follows that the standard normal density at x = 0 is 1/√(2π) ≈ 0.3989, which is close to 0.4.

The R function dnorm( ) gives the same result:

> dnorm(0)  # density of N(0, 1) at x = 0
[1] 0.3989423
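To see where 0.3989423 comes from, the density formula can be coded directly. Here gauss_density() is a hypothetical helper written only for this illustration. It should agree with dnorm( ) for any mean and standard deviation.

```r
# Normal density written out from the Gaussian formula
# f(x) = 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
gauss_density <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

gauss_density(0)        # 1 / sqrt(2 * pi), the same value as dnorm(0)
gauss_density(4, 8, 2)  # agrees with dnorm(4, 8, 2)
```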

Further remarks

Like pnorm( ), the functions qnorm( ) and dnorm( ) can be used for any normal distribution, not only N(0,1), simply by supplying the mean and standard deviation (not the variance) as extra arguments.

For example, for the N(8,4) distribution:

> qnorm(0.975, 8, 2)  # 97.5% quantile of N(8, 4)
[1] 11.91993
> dnorm(1, 8, 2)  # density of N(8, 4) at x = 1
[1] 0.0004363413
> dnorm(4, 8, 2)  # density of N(8, 4) at x = 4
[1] 0.02699548
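These results follow from the standard normal ones by a shift and a stretch. The p-quantile of N(µ, σ²) is µ + σ × qnorm(p), and the density at x is dnorm((x − µ)/σ)/σ. The short check below is written in R, and its variable names are arbitrary.

```r
mu    <- 8
sigma <- 2                          # N(8, 4): variance 4, so standard deviation 2

# Quantiles shift and stretch: q = mu + sigma * z
q1 <- qnorm(0.975, mu, sigma)
q2 <- mu + sigma * qnorm(0.975)     # 8 + 2 * 1.959964 = 11.91993

# Densities rescale: f(x) = phi((x - mu) / sigma) / sigma
d1 <- dnorm(4, mu, sigma)
d2 <- dnorm((4 - mu) / sigma) / sigma
```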

 

Review of normal probability distribution – Part II

Review of normal probability distribution – Part I

 

How to use Excel on AD statistic test for data normality?

How to use R to generate random numbers?
