Tutorial Files
Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 30 participants who are divided into three stress reduction treatment groups (mental, physical, and medical). The dataset can be conceptualized as a comparison between three stress treatment programs: one using mental methods, one using physical training, and one using medication. The values, recorded on a scale from 1 to 5, represent how effective each treatment program was at reducing participants' stress levels, with higher numbers indicating greater effectiveness.
Beginning Steps
To begin, we need to read our dataset into R and store its contents in a variable.
- > #read the dataset into an R variable using the read.csv(file) function
- > dataPairwiseComparisons <- read.csv("dataset_ANOVA_OneWayComparisons.csv")
- > #display the data
- > dataPairwiseComparisons
The first ten rows of our dataset
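If the sample file is unavailable, a comparable hypothetical dataset can be constructed directly in R. The sketch below mirrors the structure used throughout this tutorial (a Treatment factor and a StressReduction score), but the values are randomly generated for illustration and will not reproduce the results shown in the figures.
- > #a sketch only: build a hypothetical dataset with the same structure as the tutorial file
- > #the scores are randomly generated and will not match the downloadable data
- > set.seed(123)
- > dataPairwiseComparisons <- data.frame(Treatment = rep(c("medical", "mental", "physical"), each = 10), StressReduction = sample(1:5, size = 30, replace = TRUE))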
Omnibus ANOVA
For the purposes of this tutorial, we will assume that the omnibus ANOVA has already been conducted and that the main effect for treatment was statistically significant. For details on this process, see the One-Way ANOVA with Pairwise Comparisons tutorial, which uses the same dataset.
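For reference, a minimal sketch of that omnibus test is shown below; it uses the same aov() model formula that appears later in the HSD section.
- > #a reference sketch: fit the omnibus one-way ANOVA and inspect the main effect of treatment
- > anovaResults <- aov(StressReduction ~ Treatment, data = dataPairwiseComparisons)
- > summary(anovaResults)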
Means
Let's also look at the means of our treatment groups. Here, we will use the tapply() function, along with the following arguments, to generate a table of means; an alternative using aggregate() is sketched after the example.
- X: the data
- INDEX: a list() of factor variables
- FUN: the function to be applied
- > #use tapply(X, INDEX, FUN) to generate a table displaying each treatment group mean
- > tapply(X = dataPairwiseComparisons$StressReduction, INDEX = list(dataPairwiseComparisons$Treatment), FUN = mean)
The treatment group means
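The same table of group means can also be produced with the base R aggregate() function, using formula notation; this is simply an alternative to tapply() and yields the same values.
- > #an alternative sketch: compute the group means using aggregate() and formula notation
- > aggregate(StressReduction ~ Treatment, data = dataPairwiseComparisons, FUN = mean)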
Pairwise Comparisons
We will cover five major techniques for controlling Type I error when making pairwise comparisons. These methods are no adjustment, Bonferroni's adjustment, Holm's adjustment, Fisher's LSD, and Tukey's HSD. All of these techniques will be demonstrated on our sample dataset, although the decision as to which to use in a given situation is left up to the reader.
pairwise.t.test()
Our first three methods will make use of the pairwise.t.test() function, which has the following major arguments.
- x: the dependent variable
- g: the independent variable
- p.adj: the p-value adjustment method used to control for the family-wise Type I error rate across the comparisons; one of "none", "bonferroni", "holm", "hochberg", "hommel", "BH", or "BY"
No Adjustment
Using p.adj = "none" in the pairwise.t.test() function makes no correction for the Type I error rate across the pairwise tests. This technique can be useful for employing methods that are not already built into R functions, such as the Shaffer/Modified Shaffer methods, which use different alpha level divisors based on the number of levels composing the independent variable. The console results will contain unadjusted p-values, which the researcher can then evaluate for statistical significance at his or her desired alpha level.
- > #use pairwise.t.test(x, g, p.adj) to test the pairwise comparisons between the treatment group means
- > #no adjustment
- > pairwise.t.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, p.adj = "none")
Pairwise comparisons of treatment group means with no adjustment
With no adjustment, the mental-medical and physical-medical comparisons are statistically significant, whereas the mental-physical comparison is not. This suggests that both the mental and physical treatments are superior to the medical treatment, but that there is insufficient statistical support to distinguish between the mental and physical treatments.
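When a custom alpha divisor is to be applied by hand, as described above, the unadjusted p-values can be extracted from the pairwise.t.test() result and compared against the chosen threshold. The sketch below uses .05/3 purely as an illustration; the appropriate divisor depends on the method being applied manually.
- > #a sketch: store the unadjusted results and extract the p-value matrix for manual evaluation
- > noAdjustResults <- pairwise.t.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, p.adj = "none")
- > #compare each unadjusted p-value to a manually chosen alpha level (here .05/3, for illustration)
- > noAdjustResults$p.value < .05/3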
Bonferroni Adjustment
The Bonferroni adjustment simply divides the Type I error rate (.05) by the number of tests (in this case, three). Hence, this method is often considered overly conservative. The Bonferroni adjustment can be made using p.adj = "bonferroni" in the pairwise.t.test() function.
- > #Bonferroni adjustment
- > pairwise.t.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, p.adj = "bonferroni")
Pairwise comparisons of treatment group means using Bonferroni adjustment
Using the Bonferroni adjustment, only the mental-medical comparison is statistically significant. This suggests that the mental treatment is superior to the medical treatment, but that there is insufficient statistical support to distinguish between the mental and physical treatments and the physical and medical treatments. Notice that these results are more conservative than with no adjustment.
Holm Adjustment
The Holm adjustment sequentially compares the lowest p-value with a Type I error rate that is reduced for each consecutive test. In our case, this means that our first p-value is tested at the .05/3 level (.017), second at the .05/2 level (.025), and third at the .05/1 level (.05). This method is generally considered superior to the Bonferroni adjustment and can be employed using p.adj = "holm" in the pairwise.t.test() function.
- > #Holm adjustment
- > pairwise.t.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, p.adj = "holm")
Pairwise comparisons of treatment group means using Holm adjustment
Using the Holm procedure, our results are practically (but not mathematically) identical to using no adjustment.
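The effect of these corrections can also be examined directly with the base R p.adjust() function. The sketch below applies the Bonferroni and Holm adjustments to a set of hypothetical raw p-values; the values are invented for illustration and are not the p-values from our dataset.
- > #illustration only: apply the Bonferroni and Holm adjustments to hypothetical raw p-values
- > rawP <- c(0.004, 0.030, 0.120)
- > p.adjust(rawP, method = "bonferroni")
- > p.adjust(rawP, method = "holm")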
LSD Method
The Fisher Least Significant Difference (LSD) method essentially does not correct for the Type I error rate across multiple comparisons and is generally not recommended relative to other options. However, should the need arise to employ this method, one should seek out the LSD.test() function in the agricolae package, which has the following major arguments (a sketch for deriving DFerror and MSerror from the omnibus model follows the example below).
- y: the dependent variable
- trt: the independent variable
- DFerror: the degrees of freedom error
- MSerror: the mean squared error
- > #load the agricolae package (install first, if necessary)
- > library(agricolae)
- > #LSD method
- > #use LSD.test(y, trt, DFerror, MSerror) to test the pairwise comparisons between the treatment group means
- > LSD.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, 27, 1.13)
Pairwise comparisons of treatment group means using LSD method
Using the LSD method, our results are practically (but not mathematically) identical to using no adjustment or the Holm procedure.
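Rather than typing DFerror and MSerror by hand, these quantities can be pulled from the omnibus ANOVA fit. The sketch below assumes the same aov() model used elsewhere in this tutorial; df.residual() returns the error degrees of freedom and deviance() returns the residual sum of squares, so dividing the latter by the former gives the mean squared error.
- > #a sketch: obtain DFerror and MSerror from the omnibus ANOVA rather than entering them manually
- > fit <- aov(StressReduction ~ Treatment, data = dataPairwiseComparisons)
- > LSD.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, df.residual(fit), deviance(fit) / df.residual(fit))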
HSD Method
The Tukey Honest Significant Difference (HSD) method controls for the Type I error rate across multiple comparisons and is generally considered an acceptable technique. This method can be executed using the TukeyHSD(x) function, where x is a linear model object created using the aov(formula, data) function. Note that in this application, the aov(formula, data) function is used in the same way as the lm(formula, data) function that we are already familiar with from linear regression.
- > #HSD method
- > #use TukeyHSD(x), in tandem with aov(formula, data), to test the pairwise comparisons between the treatment group means
- > TukeyHSD(aov(StressReduction ~ Treatment, dataPairwiseComparisons))
Pairwise comparisons of treatment group means using HSD method
Using the HSD method, our results are practically (but not mathematically) identical to using the Bonferroni, Holm, or LSD methods.
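As a supplementary step, the TukeyHSD() output can also be stored and plotted with the base R plot() function, which displays the confidence interval for each pairwise difference; intervals that exclude zero correspond to statistically significant comparisons. The sketch below uses the same model as above.
- > #a supplementary sketch: store the Tukey HSD results and plot the pairwise confidence intervals
- > tukeyResults <- TukeyHSD(aov(StressReduction ~ Treatment, dataPairwiseComparisons))
- > plot(tukeyResults)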
Comments
I've been wondering about Relevant Range tests recently, and the whole panoply of other post hocs. Ryan's Q comes to mind as one that has been shown to be quite good, but I cannot for the life of me find it implemented in R. See Day & Quinn 1989 Ecological Monographs for a bit on Ryan's Q (that's REGWQ in SAS) and other post hocs, as well as their formulae.
I actually ended up coding up my own version of Ryan's Q for R that works fairly well, but it's not in the same nice general framework as these other tests. You can see my effort at http://homes.msi.ucsb.edu/~byrnes/r_files/ryans_q.r
Great stuff!!
Another idea would be to show how to have the pairwise tests repeat for large datasets instead of working through each one.
That would be really great.
I have a similar problem, but with Poisson distributed data. I made a Poisson GLM with a single predictor. The predictor is of factor type and has four levels. I want to compare the effect of the levels. I cannot use the above methods because of the lack of normality.
Also, a two-way ANOVA example of the multiple comparisons would be useful. How is it different? I have not found any such examples.
These tutorials are great, thank you so much!
There's an error in the LSD test. In the tutorial it says DFerror, but you wrote the Sum Sq in the formula. Are the displayed results correct or not?
What do the following stand for: lwr, upr, and p adj?
They stand for the lower and upper confidence interval bounds and the adjusted p-value.