Tutorial Files
Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 30 participants who are divided into three stress reduction treatment groups (mental, physical, and medical). The dataset can be conceptualized as a comparison between three stress treatment programs: one using mental methods, one using physical training, and one using medication. The values, recorded on a scale from 1 to 5, represent how effective each treatment program was at reducing participants' stress levels, with higher numbers indicating greater effectiveness.
Beginning Steps
To begin, we need to read our dataset into R and store its contents in a variable.
- > #read the dataset into an R variable using the read.csv(file) function
- > dataPairwiseComparisons <- read.csv("dataset_ANOVA_OneWayComparisons.csv")
- > #display the data
- > dataPairwiseComparisons
The first ten rows of our dataset
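If the sample file is unavailable, a comparable hypothetical dataset can be constructed directly in R. The sketch below mirrors the structure used throughout this tutorial (a Treatment factor and a StressReduction score), but the values are randomly generated for illustration and will not reproduce the results shown in the figures.
- > #a sketch only: build a hypothetical dataset with the same structure as the tutorial file
- > #the scores are randomly generated and will not match the downloadable data
- > set.seed(123)
- > dataPairwiseComparisons <- data.frame(Treatment = rep(c("medical", "mental", "physical"), each = 10), StressReduction = sample(1:5, size = 30, replace = TRUE))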
Omnibus ANOVA
For the purposes of this tutorial, we will assume that the omnibus ANOVA has already been conducted and that the main effect for treatment was statistically significant. For details on this process, see the One-Way ANOVA with Pairwise Comparisons tutorial, which uses the same dataset.
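For reference, a minimal sketch of that omnibus test is shown below; it uses the same aov() model formula that appears later in the HSD section.
- > #a reference sketch: fit the omnibus one-way ANOVA and inspect the main effect of treatment
- > anovaResults <- aov(StressReduction ~ Treatment, data = dataPairwiseComparisons)
- > summary(anovaResults)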
Means
Let's also look at the means of our treatment groups. Here, we will use the tapply() function, along with the following arguments, to generate a table of means; an alternative using aggregate() is sketched after the example.
- X: the data
- INDEX: a list() of factor variables
- FUN: the function to be applied
- > #use tapply(X, INDEX, FUN) to generate a table displaying each treatment group mean
- > tapply(X = dataPairwiseComparisons$StressReduction, INDEX = list(dataPairwiseComparisons$Treatment), FUN = mean)
The treatment group means
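The same table of group means can also be produced with the base R aggregate() function, using formula notation; this is simply an alternative to tapply() and yields the same values.
- > #an alternative sketch: compute the group means using aggregate() and formula notation
- > aggregate(StressReduction ~ Treatment, data = dataPairwiseComparisons, FUN = mean)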
Pairwise Comparisons
We will cover five major techniques for controlling Type I error when making pairwise comparisons. These methods are no adjustment, Bonferroni's adjustment, Holm's adjustment, Fisher's LSD, and Tukey's HSD. All of these techniques will be demonstrated on our sample dataset, although the decision as to which to use in a given situation is left up to the reader.
pairwise.t.test()
Our first three methods will make use of the pairwise.t.test() function, which has the following major arguments.
- x: the dependent variable
- g: the independent variable
- p.adj: the p-value adjustment method used to control for the family-wise Type I error rate across the comparisons; one of "none", "bonferroni", "holm", "hochberg", "hommel", "BH", or "BY"
No Adjustment
Using p.adj = "none" in the pairwise.t.test() function makes no correction for the Type I error rate across the pairwise tests. This technique can be useful for employing methods that are not already built into R functions, such as the Shaffer/Modified Shaffer methods, which use different alpha level divisors based on the number of levels composing the independent variable. The console results will contain unadjusted p-values, which the researcher can then evaluate for statistical significance at his or her desired alpha level.
- > #use pairwise.t.test(x, g, p.adj) to test the pairwise comparisons between the treatment group means
- > #no adjustment
- > pairwise.t.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, p.adj = "none")
Pairwise comparisons of treatment group means with no adjustment
With no adjustment, the mental-medical and physical-medical comparisons are statistically significant, whereas the mental-physical comparison is not. This suggests that both the mental and physical treatments are superior to the medical treatment, but that there is insufficient statistical support to distinguish between the mental and physical treatments.
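When a custom alpha divisor is to be applied by hand, as described above, the unadjusted p-values can be extracted from the pairwise.t.test() result and compared against the chosen threshold. The sketch below uses .05/3 purely as an illustration; the appropriate divisor depends on the method being applied manually.
- > #a sketch: store the unadjusted results and extract the p-value matrix for manual evaluation
- > noAdjustResults <- pairwise.t.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, p.adj = "none")
- > #compare each unadjusted p-value to a manually chosen alpha level (here .05/3, for illustration)
- > noAdjustResults$p.value < .05/3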
Bonferroni Adjustment
The Bonferroni adjustment simply divides the Type I error rate (.05) by the number of tests (in this case, three). Hence, this method is often considered overly conservative. The Bonferroni adjustment can be made using p.adj = "bonferroni" in the pairwise.t.test() function.
- > #Bonferroni adjustment
- > pairwise.t.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, p.adj = "bonferroni")
Pairwise comparisons of treatment group means using Bonferroni adjustment
Using the Bonferroni adjustment, only the mental-medical comparison is statistically significant. This suggests that the mental treatment is superior to the medical treatment, but that there is insufficient statistical support to distinguish between the mental and physical treatments and the physical and medical treatments. Notice that these results are more conservative than with no adjustment.
Holm Adjustment
The Holm adjustment sequentially compares the lowest p-value with a Type I error rate that is reduced for each consecutive test. In our case, this means that our first p-value is tested at the .05/3 level (.017), second at the .05/2 level (.025), and third at the .05/1 level (.05). This method is generally considered superior to the Bonferroni adjustment and can be employed using p.adj = "holm" in the pairwise.t.test() function.
- > #Holm adjustment
- > pairwise.t.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, p.adj = "holm")
Pairwise comparisons of treatment group means using Holm adjustment
Using the Holm procedure, our results are practically (but not mathematically) identical to using no adjustment.
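The effect of these corrections can also be examined directly with the base R p.adjust() function. The sketch below applies the Bonferroni and Holm adjustments to a set of hypothetical raw p-values; the values are invented for illustration and are not the p-values from our dataset.
- > #illustration only: apply the Bonferroni and Holm adjustments to hypothetical raw p-values
- > rawP <- c(0.004, 0.030, 0.120)
- > p.adjust(rawP, method = "bonferroni")
- > p.adjust(rawP, method = "holm")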
LSD Method
The Fisher Least Significant Difference (LSD) method essentially does not correct for the Type I error rate across multiple comparisons and is generally not recommended relative to other options. However, should the need arise to employ this method, one should seek out the LSD.test() function in the agricolae package, which has the following major arguments (a sketch for deriving DFerror and MSerror from the omnibus model follows the example below).
- y: the dependent variable
- trt: the independent variable
- DFerror: the degrees of freedom error
- MSerror: the mean squared error
- > #load the agricolae package (install first, if necessary)
- > library(agricolae)
- > #LSD method
- > #use LSD.test(y, trt, DFerror, MSerror) to test the pairwise comparisons between the treatment group means
- > LSD.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, 27, 1.13)
Pairwise comparisons of treatment group means using LSD method
Using the LSD method, our results are practically (but not mathematically) identical to using no adjustment or the Holm procedure.
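Rather than typing DFerror and MSerror by hand, these quantities can be pulled from the omnibus ANOVA fit. The sketch below assumes the same aov() model used elsewhere in this tutorial; df.residual() returns the error degrees of freedom and deviance() returns the residual sum of squares, so dividing the latter by the former gives the mean squared error.
- > #a sketch: obtain DFerror and MSerror from the omnibus ANOVA rather than entering them manually
- > fit <- aov(StressReduction ~ Treatment, data = dataPairwiseComparisons)
- > LSD.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, df.residual(fit), deviance(fit) / df.residual(fit))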
HSD Method
The Tukey Honest Significant Difference (HSD) method controls for the Type I error rate across multiple comparisons and is generally considered an acceptable technique. This method can be executed using the TukeyHSD(x) function, where x is a linear model object created using the aov(formula, data) function. Note that in this application, the aov(formula, data) function is used in the same way as the lm(formula, data) function that we are already familiar with from linear regression.
- > #HSD method
- > #use TukeyHSD(x), in tandem with aov(formula, data), to test the pairwise comparisons between the treatment group means
- > TukeyHSD(aov(StressReduction ~ Treatment, dataPairwiseComparisons))
Pairwise comparisons of treatment group means using HSD method
Using the HSD method, our results are practically (but not mathematically) identical to using the Bonferroni, Holm, or LSD methods.
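As a supplementary step, the TukeyHSD() output can also be stored and plotted with the base R plot() function, which displays the confidence interval for each pairwise difference; intervals that exclude zero correspond to statistically significant comparisons. The sketch below uses the same model as above.
- > #a supplementary sketch: store the Tukey HSD results and plot the pairwise confidence intervals
- > tukeyResults <- TukeyHSD(aov(StressReduction ~ Treatment, dataPairwiseComparisons))
- > plot(tukeyResults)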
Comments
I've been wondering about Relevant Range tests recently, and the whole panoply of other post hocs. Ryan's Q comes to mind as one that has been shown to be quite good, but I cannot for the life of me find it implemented in R. See Day & Quinn 1989 Ecological Monographs for a bit on Ryan's Q (that's REGWQ in SAS) and other post hocs, as well as their formulae.
I actually ended up coding up my own version of Ryan's Q for R that works fairly well, but it's not in the same nice general framework as these other tests. You can see my effort at http://homes.msi.ucsb.edu/~byrnes/r_files/ryans_q.r
Great stuff!!
Another idea would be to show how to have the pairwise tests repeat for large datasets instead of working through each one.
That would be really great.
I have a similar problem, but with Poisson distributed data. I made a Poisson GLM with a single predictor. The predictor is of factor type and has four levels. I want to compare the effect of the levels. I cannot use the above methods because of the lack of normality.
Also, a two-way ANOVA example of the multiple comparisons would be useful. How is it different? I have not found any such examples.
These tutorials are great, thank you so much!
There's an error in the LSD test. In the tutorial it says DFerror, but you wrote the Sum Sq in the formula. Are the displayed results correct or not?
What do the following stand for: lwr, upr, and p adj?
They stand for the lower and upper confidence interval bounds and the adjusted p-value.