R Tutorial Series: Applying the Reshape Package to Organize ANOVA Data

As demonstrated in the preceding ANOVA tutorials, data organization is central to conducting ANOVA in R. In standard ANOVA, we used the tapply() function to generate a table for a single summary function. In repeated measures ANOVA, we used separate datasets for our omnibus ANOVA and follow-up comparisons. This tutorial will demonstrate how the reshape package can be used to simplify the ANOVA data organization process in R.

Tutorial Files

Before we begin, you may want to download the between groups and repeated measures datasets (.csv) used in this tutorial. Be sure to right-click and save the files to your R working directory. The between groups dataset contains a hypothetical sample of 30 cases separated into three groups (a, b, and c). The repeated measures dataset contains a hypothetical sample of 10 cases across three measurements (a, b, and c). In both datasets, the values are on a scale that ranges from 1 to 5.

Beginning Steps

To begin, we need to read our datasets into R and store their contents in variables.

> #read the datasets into R variables using the read.csv(file) function

> dataBetween <- read.csv("dataset_ANOVA_reshape_1.csv")

> dataRepeated <- read.csv("dataset_ANOVA_reshape_2.csv")

Reshape Package

Next, we need to install and load the reshape package. In this tutorial, we will make use of the package's cast() and melt() functions.

> #install the package

> install.packages("reshape")

> #load the package

> library(reshape)

Using cast() to Derive ANOVA Descriptives

The cast() function can be used to easily derive summary statistics for a between groups ANOVA dataset. The cast() function receives the following primary arguments.

data: the dataset
formula: in our case, a one-sided formula indicating the grouping variable
fun.aggregate: a function or vector of functions for deriving summary statistics, such as mean, var, or sd

> #display the raw between groups data

> dataBetween

The raw between groups data

> #cast the between groups data using cast(data, formula, fun.aggregate) to get the group means

> cast(dataBetween, formula = ~group, fun.aggregate = mean)

The cast data with means

Note that the fun.aggregate argument can also receive a vector of summary statistics functions. This yields all of the requested descriptives from a single cast() call.

> #cast the between groups data using cast(data, formula, fun.aggregate) to get the group means, variances, and standard deviations

> cast(dataBetween, formula = ~group, fun.aggregate = c(mean, var, sd))

The cast data with descriptives
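
As a quick sanity check, the descriptives from cast() can be compared against tapply(), the function used in the earlier ANOVA tutorials. The sketch below does not use the tutorial's dataset: standInBetween is a hypothetical stand-in, and it assumes that the scores sit in a column named value next to the group column.

```r
# Hypothetical stand-in for the between groups data (not the tutorial's CSV):
# 3 groups of 3 cases, with the scores in an assumed "value" column
library(reshape)

standInBetween <- data.frame(
  group = rep(c("a", "b", "c"), each = 3),
  value = c(1, 2, 3, 2, 3, 4, 3, 4, 5)
)

# Descriptives from a single cast() call, as in the tutorial
castDescriptives <- cast(standInBetween, formula = ~group, fun.aggregate = c(mean, var, sd))
castDescriptives

# The same statistics via tapply(), one call per summary function
groupMeans <- tapply(standInBetween$value, standInBetween$group, mean)
groupVars  <- tapply(standInBetween$value, standInBetween$group, var)
groupSds   <- tapply(standInBetween$value, standInBetween$group, sd)
groupMeans
groupVars
groupSds
```

With tapply(), each statistic needs its own call and its own table, whereas cast() collects them in one row. Both approaches should report the same numbers for each group.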

Using melt() to Prepare Repeated Measures Data for Pairwise Comparisons

The melt() function can be used to restructure a repeated measures ANOVA dataset from wide format (one column per measurement) to long format (one row per measurement) prior to conducting pairwise comparisons. The melt() function receives the following primary arguments.

data: the dataset
id.vars: the id variable, or a vector of variables, that identifies each case
measure.vars: a vector containing the variables to be melted
variable_name: the name to give the new column that records which melted variable each value came from

> #display the repeated measures data

> dataRepeated

The raw repeated measures data

> #melt the repeated measures data using melt(data, id.vars, measure.vars, variable_name) to organize it for pairwise comparisons

> melt(dataRepeated, id.vars = "case", measure.vars = c("valueA", "valueB", "valueC"), variable_name = "abcValues")

The melted repeated measures data

Note that the data are now prepared to be used in the pairwise.t.test() function, with the value column holding the scores and the abcValues column serving as the grouping factor. See the One-Way ANOVA with Pairwise Comparisons tutorial for details on using the pairwise.t.test() function.
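
To make that last step concrete, here is a minimal sketch that stores the melted data in a variable and passes it to pairwise.t.test(). It is not the tutorial's dataset: standInRepeated is a hypothetical stand-in that reuses the column names from the melt() call above, and the paired = TRUE setting and Bonferroni adjustment are assumptions chosen for illustration.

```r
# Hypothetical stand-in for the repeated measures data (not the tutorial's CSV):
# 5 cases measured three times, using the column names from the melt() call
library(reshape)

standInRepeated <- data.frame(
  case   = 1:5,
  valueA = c(1, 2, 2, 3, 1),
  valueB = c(2, 3, 4, 3, 2),
  valueC = c(4, 5, 5, 4, 3)
)

# Store the melted (long format) data so that it can be reused
meltedRepeated <- melt(standInRepeated, id.vars = "case",
                       measure.vars = c("valueA", "valueB", "valueC"),
                       variable_name = "abcValues")

# Pairwise comparisons: scores come from "value", conditions from "abcValues"
# paired = TRUE and the Bonferroni adjustment are assumptions for this sketch
pairwiseResult <- pairwise.t.test(meltedRepeated$value, meltedRepeated$abcValues,
                                  p.adjust.method = "bonferroni", paired = TRUE)
pairwiseResult
```

Because melt() stacks the measurements in the same case order, the rows for each condition line up by case, which is what a paired comparison relies on.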

Complete ANOVA Reshape Example

To see a complete example of how ANOVA data can be organized using the reshape package in R, please download the ANOVA reshape example (.txt) file.

4 comments:

Anonymous, March 14, 2011 at 4:37 PM
Very helpful, thanks a lot!
Andrea, September 25, 2011 at 7:48 AM
Great! But how can I reshape my data set for use in a two-way repeated measures ANOVA? I have not only two factors (besides time, the repeated measure) but also several repetitions for each one.
John, September 25, 2011 at 8:28 AM
Hi Andrea,

I haven't done this, but it seems that you could melt both variables separately, then recombine them into a single dataset. There may also be a way to melt them both at once, but I have not seen this. See the Two-Way Repeated Measures ANOVA article for the proper data setup.

John
Anonymous, February 20, 2012 at 11:02 PM
Thank you.

There is an article published @ http://www.jstatsoft.org/v21/i12/paper

Santosh
