By John M Quick

The R Tutorial Series provides a collection of user-friendly tutorials to people who want to learn how to use R for statistical analysis.


My Statistical Analysis with R book is available from Packt Publishing and Amazon.


R Tutorial Series: Two-Way ANOVA with Unequal Sample Sizes

When the sample sizes within the levels of our independent variables are not equal, we have to handle our ANOVA differently than in the typical two-way case. This tutorial will demonstrate how to conduct a two-way ANOVA in R when the sample sizes within each level of the independent variables are not the same.

Tutorial Files

Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 30 students who were exposed to one of two learning environments (offline or online) and one of two methods of instruction (classroom or tutor), then tested on a math assessment. Possible math scores range from 0 to 100 and indicate how well each student performed on the math assessment. Each student participated in either an offline or online learning environment and received either classroom instruction (i.e. one to many) or instruction from a personal tutor (i.e. one to one).

Beginning Steps

To begin, we need to read our dataset into R and store its contents in a variable.
  > #read the dataset into an R variable using the read.csv(file) function
  > dataTwoWayUnequalSample <- read.csv("dataset_ANOVA_TwoWayUnequalSample.csv")
  > #display the data
  > dataTwoWayUnequalSample

The first ten rows of our dataset

Unequal Sample Sizes

In our study, 16 students participated in the online environment, whereas only 14 participated in the offline environment. Further, 20 students received classroom instruction, whereas only 10 received personal tutor instruction. As such, we should take action to compensate for the unequal sample sizes in order to retain the validity of our analysis. Generally, this comes down to examining the correlation between the factors and the causes of the unequal sample sizes en route to choosing whether to use weighted or unweighted means - a decision which can drastically impact the results of an ANOVA. This tutorial will demonstrate how to conduct ANOVA using both weighted and unweighted means. Thus, the ultimate decision as to the use of weighted or unweighted means is left up to each individual and his or her specific circumstances.
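Before choosing between the two approaches, it can help to confirm how unbalanced the design actually is. The following is a minimal sketch that uses table() on a small made-up data frame. The cell counts (10, 4, 10, and 6) and the random math scores are assumptions chosen only to reproduce the marginal counts described above; they are not the values in the tutorial's file.

```r
# Hypothetical stand-in for the tutorial dataset: 30 students in unequal cells
set.seed(1)
cells <- data.frame(
  environment = c("offline", "offline", "online", "online"),
  instruction = c("classroom", "tutor", "classroom", "tutor"),
  n = c(10, 4, 10, 6)  # assumed cell sizes
)
d <- cells[rep(seq_len(nrow(cells)), cells$n), c("environment", "instruction")]
d$math <- round(runif(nrow(d), 50, 100))  # random placeholder scores

# Marginal sample sizes for each independent variable
table(d$environment)  # offline 14, online 16
table(d$instruction)  # classroom 20, tutor 10

# Cell sizes: unequal counts here are what make the factors correlated
table(d$environment, d$instruction)
```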

Weighted Means

First, let's suppose that we decided to go with weighted means, which take into account the correlation between our factors that results from having treatment groups with different sample sizes. A weighted mean is calculated by simply adding up all of the values and dividing by the total number of values. Consequently, we can easily derive the weighted means for each treatment group using our subset(data, condition) and mean(data) functions.
  > #use subset(data, condition) to create subsets for each treatment group
  > #offline subset
  > offlineData <- subset(dataTwoWayUnequalSample, dataTwoWayUnequalSample$environment == "offline")
  > #online subset
  > onlineData <- subset(dataTwoWayUnequalSample, dataTwoWayUnequalSample$environment == "online")
  > #classroom subset
  > classroomData <- subset(dataTwoWayUnequalSample, dataTwoWayUnequalSample$instruction == "classroom")
  > #tutor subset
  > tutorData <- subset(dataTwoWayUnequalSample, dataTwoWayUnequalSample$instruction == "tutor")
  > #use mean(data) to calculate the weighted means for each treatment group
  > #offline weighted mean
  > mean(offlineData$math)
  > #online weighted mean
  > mean(onlineData$math)
  > #classroom weighted mean
  > mean(classroomData$math)
  > #tutor weighted mean
  > mean(tutorData$math)

The weighted means for the environment and instruction conditions
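To see why these are called weighted means, the sketch below computes them a second way: as the average of the cell means, with each cell weighted by its sample size. It uses tapply() on hypothetical data (the cell sizes and random scores are assumptions, not the tutorial's values), so only the agreement between the two calculations is meaningful.

```r
# Hypothetical stand-in data: 30 students in unequal cells, random scores
set.seed(1)
cells <- data.frame(
  environment = c("offline", "offline", "online", "online"),
  instruction = c("classroom", "tutor", "classroom", "tutor"),
  n = c(10, 4, 10, 6)  # assumed cell sizes
)
d <- cells[rep(seq_len(nrow(cells)), cells$n), c("environment", "instruction")]
d$math <- round(runif(nrow(d), 50, 100))

# Weighted means: add up every score in a group and divide by the group size
weightedEnvironment <- tapply(d$math, d$environment, mean)

# Same quantity rebuilt from the cell means, weighting each by its cell size
cellMeans <- tapply(d$math, list(d$environment, d$instruction), mean)
cellSizes <- tapply(d$math, list(d$environment, d$instruction), length)
rebuilt <- rowSums(cellMeans * cellSizes) / rowSums(cellSizes)

weightedEnvironment
rebuilt
```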

ANOVA using Type I Sums of Squares

When applying weighted means, it is suggested that we use Type I sums of squares (SS) in our ANOVA. Type I happens to be the default SS used in our standard anova(object) function, which will be used to execute our analysis. Note that in the case of two-way ANOVA, the ordering of our independent variables matters when using weighted means. Therefore, we must run our ANOVA two times, once with each independent variable taking the lead. However, the interaction effect is not affected by the ordering of the independent variables.
  > #use anova(object) to execute the Type I SS ANOVAs
  > #environment ANOVA
  > anova(lm(math ~ environment * instruction, dataTwoWayUnequalSample))
  > #instruction ANOVA
  > anova(lm(math ~ instruction * environment, dataTwoWayUnequalSample))

The Type I SS ANOVA results. Note the differences in main effects based on the ordering of the independent variables.

These results indicate statistically insignificant main effects for both the environment and instruction variables, as well as the interaction between them.
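The order dependence of Type I SS can be checked directly by comparing the sums of squares from the two fits. This is a small sketch on hypothetical data (assumed cell sizes and random scores), so the numbers themselves carry no meaning; the point is which rows change and which do not.

```r
# Hypothetical stand-in data: 30 students in unequal cells, random scores
set.seed(1)
cells <- data.frame(
  environment = c("offline", "offline", "online", "online"),
  instruction = c("classroom", "tutor", "classroom", "tutor"),
  n = c(10, 4, 10, 6)  # assumed cell sizes
)
d <- cells[rep(seq_len(nrow(cells)), cells$n), c("environment", "instruction")]
d$math <- round(runif(nrow(d), 50, 100))

# Fit the same model twice, changing only the order of the main effects
environmentFirst <- anova(lm(math ~ environment * instruction, data = d))
instructionFirst <- anova(lm(math ~ instruction * environment, data = d))

# Main-effect SS depends on which variable enters the model first
environmentFirst["environment", "Sum Sq"]
instructionFirst["environment", "Sum Sq"]

# The interaction SS is the same in both orderings
environmentFirst["environment:instruction", "Sum Sq"]
instructionFirst["instruction:environment", "Sum Sq"]
```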

Unweighted Means

Now let's turn to using unweighted means, which essentially ignore the correlation between the independent variables that arises from unequal sample sizes. An unweighted mean is calculated by taking the average of the individual group means. Thus, we can derive our unweighted means by summing the means of each level of our independent variables and dividing by the total number of levels. For instance, to find the unweighted mean for the offline environment, we will add the means for its classroom and tutor groups, then divide by two.
  > #use mean(data) and subset(data, condition) to calculate the unweighted means for each treatment group
  > #offline unweighted mean = (classroom offline mean + tutor offline mean) / 2
  > (mean(subset(offlineData$math, offlineData$instruction == "classroom")) + mean(subset(offlineData$math, offlineData$instruction == "tutor"))) / 2
  > #online unweighted mean = (classroom online mean + tutor online mean) / 2
  > (mean(subset(onlineData$math, onlineData$instruction == "classroom")) + mean(subset(onlineData$math, onlineData$instruction == "tutor"))) / 2
  > #classroom unweighted mean = (offline classroom mean + online classroom mean) / 2
  > (mean(subset(classroomData$math, classroomData$environment == "offline")) + mean(subset(classroomData$math, classroomData$environment == "online"))) / 2
  > #tutor unweighted mean = (offline tutor mean + online tutor mean) / 2
  > (mean(subset(tutorData$math, tutorData$environment == "offline")) + mean(subset(tutorData$math, tutorData$environment == "online"))) / 2

The unweighted means for the environment and instruction conditions
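The same unweighted means can be obtained more compactly by building the table of cell means once and averaging across its rows and columns. This is a sketch of an alternative to the subset() calls above, run on hypothetical data (assumed cell sizes and random scores).

```r
# Hypothetical stand-in data: 30 students in unequal cells, random scores
set.seed(1)
cells <- data.frame(
  environment = c("offline", "offline", "online", "online"),
  instruction = c("classroom", "tutor", "classroom", "tutor"),
  n = c(10, 4, 10, 6)  # assumed cell sizes
)
d <- cells[rep(seq_len(nrow(cells)), cells$n), c("environment", "instruction")]
d$math <- round(runif(nrow(d), 50, 100))

# 2 x 2 table of cell means (rows = environment, columns = instruction)
cellMeans <- tapply(d$math, list(d$environment, d$instruction), mean)

# Unweighted means: average the cell means, ignoring the cell sizes
unweightedEnvironment <- rowMeans(cellMeans)
unweightedInstruction <- colMeans(cellMeans)

unweightedEnvironment
unweightedInstruction
```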

ANOVA using Type III Sums of Squares

When applying unweighted means, it is suggested that we use Type III sums of squares (SS) in our ANOVA. Type III SS can be set using the type argument in the Anova(mod, type) function, which is a member of the car package.
  > #load the car package (install first, if necessary)
  > library(car)
  > #use the Anova(mod, type) function to conduct the Type III SS ANOVA
  > Anova(lm(math ~ environment * instruction, dataTwoWayUnequalSample), type = "3")

The Type III SS ANOVA results.
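One caveat is worth checking when running Type III tests. The main-effect rows correspond to tests on the unweighted marginal means only when the factors use sum-to-zero contrasts (for example, contr.sum), whereas R uses treatment contrasts by default. The sketch below fits the same model both ways on hypothetical data (assumed cell sizes and random scores) so that the two tables can be compared; it is an illustration, not a reproduction of the results pictured above.

```r
library(car)  # provides Anova()

# Hypothetical stand-in data: 30 students in unequal cells, random scores
set.seed(1)
cells <- data.frame(
  environment = c("offline", "offline", "online", "online"),
  instruction = c("classroom", "tutor", "classroom", "tutor"),
  n = c(10, 4, 10, 6)  # assumed cell sizes
)
d <- cells[rep(seq_len(nrow(cells)), cells$n), c("environment", "instruction")]
d$environment <- factor(d$environment)
d$instruction <- factor(d$instruction)
d$math <- round(runif(nrow(d), 50, 100))

# Same model, two codings of the factors
fitTreatment <- lm(math ~ environment * instruction, data = d)
fitSum <- lm(math ~ environment * instruction, data = d,
             contrasts = list(environment = "contr.sum",
                              instruction = "contr.sum"))

typeIIITreatment <- Anova(fitTreatment, type = 3)
typeIIISum <- Anova(fitSum, type = 3)

# The interaction row agrees across codings; the main-effect rows do not
typeIIITreatment
typeIIISum
```

If the two tables disagree on the main effects for your own data, the sum-to-zero version is the one that matches the unweighted-means interpretation described in this section.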

Once again, our ANOVA results indicate statistically insignificant main effects for both the environment and instruction variables, as well as the interaction between them. However, it is worth noting that both the means and p-values are different when using unweighted means and Type III SS compared to weighted means and Type I SS. In certain cases, this difference can be quite pronounced and lead to entirely different outcomes between the two methods. Hence, choosing the appropriate means and SS for a given analysis is a matter that should be approached with conscious consideration.

Pairwise Comparisons

Note that since our independent variables contain only two levels, there is no need to conduct follow-up comparisons. However, should you reach this point with a statistically significant independent variable of three or more levels, you could conduct pairwise comparisons in the same manner as demonstrated in the Two-Way ANOVA with Comparisons tutorial.

Complete Two-Way ANOVA with Unequal Sample Sizes Example

To see a complete example of how two-way ANOVA with unequal sample sizes can be conducted in R, please download the two-way ANOVA with unequal sample sizes example (.txt) file.

R Tutorial Series: Two-Way Repeated Measures ANOVA

Repeated measures data require a different analysis procedure than our typical two-way ANOVA and subsequently follow a different R process. This tutorial will demonstrate how to conduct two-way repeated measures ANOVA in R using the Anova() function from the car package.
Note that the two-way repeated measures ANOVA process can be very complex to organize and execute in R. Although it has been distilled into just a few small steps in this guide, it is recommended that you fully and precisely complete the example before experimenting with your own data. As you will see, organization of the raw data is critical to successfully conducting a two-way repeated measures ANOVA using the demonstrated technique.

Tutorial Files

Before we begin, you may want to download the sample data (.csv) and sample idata frame (.csv) used in this tutorial. Be sure to right-click and save the files to your R working directory. This dataset contains a hypothetical sample of 30 participants whose interest in school and interest in work was measured at three different ages (10, 15, and 20). The interest values are represented on a scale that ranges from 1 to 5 and indicate how interested each participant was in a given topic at each given age.

Data Setup

Notice that our data are arranged differently for a repeated measures ANOVA. In a typical two-way ANOVA, we would place all of the measured values in a single column and identify the levels of our independent variables in additional columns, as demonstrated in this sample two-way dataset. In a two-way repeated measures ANOVA, we instead combine each independent variable with its time interval, thus yielding columns for each pairing. Hence, rather than having one vertical column for school interest and one for work interest, with a second column for age, we have six separate columns for interest: three for school interest and three for work interest, one per age level. The following graphic is intended to help demonstrate this organization method.

Treat time as if it were an independent variable. Then combine each independent variable with each level of time and arrange the columns horizontally.
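If your own data start out in the typical one-value-per-row layout, they need to be rearranged into this wide layout first. The following sketch shows one way to do that with base R's reshape() function. The long data frame, its column names (id, age, topic, interest), and the five simulated participants are all assumptions made up for illustration.

```r
# Hypothetical long-format data: one row per participant, topic, and age
set.seed(2)
long <- expand.grid(id = 1:5, age = c(10, 15, 20), topic = c("school", "work"))
long$interest <- sample(1:5, nrow(long), replace = TRUE)

# Combine each topic with each age to label the future columns
long$cond <- paste0(long$topic, "Age", long$age)

# Spread the six topic-by-age pairings into six side-by-side columns
wide <- reshape(long[, c("id", "cond", "interest")],
                idvar = "id", timevar = "cond", direction = "wide")

# reshape() prefixes the new columns with "interest."; remove that prefix
names(wide) <- sub("^interest\\.", "", names(wide))
wide
```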

Beginning Steps

To begin, we need to read our dataset into R and store its contents in a variable.
  > #read the dataset into an R variable using the read.csv(file) function
  > dataTwoWayRepeatedMeasures <- read.csv("dataset_ANOVA_TwoWayRepeatedMeasures.csv")
  > #display the data
  > #notice the atypical column arrangement for repeated measures data
  > dataTwoWayRepeatedMeasures

The first ten rows of our dataset

idata Frame

Another item that we need to import for this analysis is our idata frame. This object will be used in our Anova() function to define the structure of our analysis.
  > #read the idata frame into an R variable
  > idataTwoWayRepeatedMeasures <- read.csv("idata_ANOVA_TwoWayRepeatedMeasures.csv")
  > #display the idata frame
  > #notice the text values and correspondence between our idata rows and the columns in our dataset
  > idataTwoWayRepeatedMeasures

The idata frame

Note that it is critical that your idata frame take the demonstrated form for this technique to work. I experimented with several alternative, perhaps more intuitive, layouts without success. It is particularly important to notice that both columns of the idata frame contain text values (not numerical ones - hence the repeated prefixing of Age to the values in every row of the Age column). Additionally, if you read the rows of the idata frame horizontally, you will see that they correspond precisely to the columns of our dataset. The following graphic is intended to help demonstrate this organization method.


Use only text values in your idata frame. Ensure that the rows of your idata frame correspond to the columns in your dataset.
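As an alternative to importing the idata frame from a file, it can be built directly in R. The sketch below uses expand.grid(), which returns its text columns as factors, and then checks that the rows line up with the dataset's column names. The column names are taken from the cbind() step later in this tutorial; the level labels are assumed to match the pictured idata frame. One related point: in R 4.0.0 and later, read.csv() no longer converts text columns to factors by default, so if Anova() reports that contrasts apply only to factors, reading the idata file with stringsAsFactors = TRUE may help.

```r
# Column order of the wide dataset, as used in the cbind() step
datasetColumns <- c("schoolAge10", "schoolAge15", "schoolAge20",
                    "workAge10", "workAge15", "workAge20")

# expand.grid() varies its first argument fastest, so list Age first
idata <- expand.grid(Age = c("Age10", "Age15", "Age20"),
                     Interest = c("school", "work"))
idata <- idata[, c("Interest", "Age")]  # put Interest in the first column
idata

# Reading each row across should reproduce the dataset's column names
paste0(idata$Interest, idata$Age)
```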

Linear Model

Prior to executing our analysis, we must follow two steps to formulate our linear model to be used in the Anova() function.

Step 1: Bind the Columns

  > #use cbind() to bind the columns of the original dataset
  > interestBind <- cbind(dataTwoWayRepeatedMeasures$schoolAge10, dataTwoWayRepeatedMeasures$schoolAge15, dataTwoWayRepeatedMeasures$schoolAge20, dataTwoWayRepeatedMeasures$workAge10, dataTwoWayRepeatedMeasures$workAge15, dataTwoWayRepeatedMeasures$workAge20)

Step 2: Define the Model

  > #use lm() to generate a linear model using the bound columns from step 1
  > interestModel <- lm(interestBind ~ 1)

Anova(mod, idata, idesign) Function

Typically, researchers will choose one of several techniques for analyzing repeated measures data, such as an epsilon-correction method, like Huynh-Feldt or Greenhouse-Geisser, or a multivariate method, like Wilks' Lambda or Hotelling's Trace. Conveniently, having already prepared our data, we can employ a single Anova(mod, idata, idesign) function from the car package to yield all of the relevant repeated measures results. This allows us simplicity in that only a single function is required, regardless of the technique that we wish to employ. Thus, witnessing our outcomes becomes as simple as locating the desired method in the cleanly printed results.
Our Anova(mod, idata, idesign) function will be composed of three arguments. First, mod will contain our linear model. Second, idata will contain our idata frame. Third, idesign will contain the column headings from our idata frame (in other words, our independent variables), joined by the * operator and preceded by a tilde (~). Thus, our final function takes on the following form.
  > #load the car package (install first, if necessary)
  > library(car)
  > #compose the Anova(mod, idata, idesign) function
  > analysis <- Anova(interestModel, idata = idataTwoWayRepeatedMeasures, idesign = ~Interest * Age)

Results Summary

Finally, we can use the summary(object) function to visualize the results of our repeated measures ANOVA.
  > #use summary(object) to visualize the results of the repeated measures ANOVA
  > summary(analysis)

Relevant segment of repeated measures ANOVA results

Supposing that we are interested in the Wilks' Lambda method, we can see that there is a statistically significant interaction between interest type (school or work) and age (p < .001). This suggests that we should further examine our data at the level of simple main effects. For more information on investigating simple main effects, see the Two-Way ANOVA with Interactions and Simple Main Effects tutorial. Of course, in this case of repeated measures ANOVA, another way to break the data down would be to run two one-way repeated measures ANOVAs, one for each of the independent variables. In either instance, pairwise comparisons can be conducted to determine the significance of the differences between the levels of any significant effects.
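When only one multivariate statistic is of interest, the test.statistic argument of Anova() can be used to request it, so that printing the result gives a compact table with that statistic alone. The following is a sketch of the whole two-way procedure on simulated data; the random scores and the idata frame built with expand.grid() are assumptions standing in for the tutorial's files, so the p-values it prints are not the ones discussed above.

```r
library(car)  # provides Anova()

# Simulated stand-in for the wide dataset: 30 participants, six columns
set.seed(3)
n <- 30
columns <- c("schoolAge10", "schoolAge15", "schoolAge20",
             "workAge10", "workAge15", "workAge20")
scores <- matrix(sample(1:5, n * length(columns), replace = TRUE),
                 nrow = n, dimnames = list(NULL, columns))

# idata frame whose rows match the columns of the score matrix
idata <- expand.grid(Age = c("Age10", "Age15", "Age20"),
                     Interest = c("school", "work"))

# Intercept-only multivariate model, then the repeated measures analysis
model <- lm(scores ~ 1)
analysis <- Anova(model, idata = idata, idesign = ~Interest * Age,
                  test.statistic = "Wilks")

# Printing the object shows one row per effect using Wilks' Lambda
analysis
```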

Complete Two-Way Repeated Measures ANOVA Example

To see a complete example of how two-way repeated measures ANOVA can be conducted in R, please download the two-way repeated measures ANOVA example (.txt) file.

References

Moore, Colleen. (n.d.). 610 R9 -- Two-way Repeated-measures Anova. Retrieved January 21, 2011 from http://psych.wisc.edu/moore/Rpdf/610-R9_Within2way.pdf

Book Review: R Graphs Cookbook

Book Information

Mittal, H. (2011). R graphs cookbook. Birmingham, UK: Packt Publishing Ltd.

Audience

The book's stated audience is anyone who is familiar with the basics of R, as well as expert users who are looking for a graphical reference. However, it is my opinion that the book is better suited for advanced users who are already somewhat familiar with R graphics and are very comfortable with programming in R.

Content

To begin, the first chapter of R Graphs Cookbook rapidly introduces all of the major graphic types covered in the book. Next, in Chapter two, readers are acquainted with various arguments and modification functions that are used throughout the book to customize and enhance visuals. Subsequently, individual chapters focus on specific topics in R graphics, such as:

  1. scatterplots,
  2. line and time series charts,
  3. bar, dot, and pie charts,
  4. histograms,
  5. box plots,
  6. heat and contour maps,
  7. geographical maps, and
  8. exporting and annotating graphics.

Analysis

I will start with some general impressions before moving into chapter-by-chapter analyses.

First, I feel that the book needs both more and larger screenshots. Oftentimes, recipes have no visuals at all, and most of the time only one is present, whereas I would expect one per major graphical modification. Furthermore, the screenshots are too small. These are critical items to neglect in a book that explicitly deals with visuals. Fortunately, full-size, full-color images are provided with the downloadable code for the book.

Second, I feel that the topics presented in the book are glossed over with far too little explanation. This is the main reason that I feel it is not suited for those who are not already well versed in R programming. Moreover, R Graphs Cookbook frequently refers the reader to help documentation or to other books on R, which can be frustrating. I personally feel that a book should be largely self-contained, at least when discussing topics within its scope.

Third, I believe that the book could be better organized for use as a fast reference guide and that it generally could be better structured to present information. For example, tables are a clearer way to present head-to-head comparisons between objects, and lists are better for describing several function arguments.

On the other hand, I do like the book's code formatting, which displays one argument per line. While this could confuse novice users into thinking that each argument is a separate line of executable code, most readers should find this a welcome organization style for often lengthy graphics functions. I also enjoyed how the "see also" sections at the end of each recipe let me know whether more recipes would build on a given topic.

Continuing, chapter one felt like a whirlwind of information that charged forward with a lack of purpose, organization, and explanation. Chapter two was much better, offering several nice recipes that were fast and easy to digest, with just enough information provided.

Chapter three takes an in-depth look at scatterplots and provides a number of useful recipes, such as how to group data, label points, generate error bars, and create graphical correlation matrices. Similarly, chapter four provides a solid collection of recipes for time series and line charts.

In contrast, chapters five through seven cover a disappointingly sparse amount of material related to their respective topics. Unfortunately, they do not stretch far beyond what is covered in the two graphics-focused chapters of Statistical Analysis with R, which is a guide for newcomers and early beginners. From an advanced reference like R Graphs Cookbook, I expected broader coverage. For instance, very few external packages are presented in this book, with the author choosing to focus on built-in graphics functions almost exclusively. An introduction to external options, such as ggplot2, would be warmly welcomed.

Chapters eight and nine relate a few of the lesser covered topics in R, including heat, contour, and geographical maps. These chapters will likely be informative and valuable for readers interested in these graphical applications.

Lastly, chapter ten deals with the presentation and exportation of graphics. While I wish a deeper exploration had been made, there are some useful tips in this chapter. Namely, the use of the expression() function to annotate graphics is well covered.

Brief Summary

  • Title: R Graphs Cookbook
  • Author: Hrishi Mittal
  • Where To Find: Packt Publishing
  • Audience: those who are comfortable programming in R, able to mix, match, apply, and extend recipes for their own purposes, and looking to learn more about R's built-in graphical capabilities.
  • Content: a loosely associated collection of recipes for applying R's built-in graphics functions to create the most common types of charts, graphs, plots, and maps.
  • Analysis: although it could have better visuals, structure, and coverage, it is likely that almost any reader will be able to take away valuable techniques from this book
  • Arbitrary Rating: 6/10
  • Recommendation: take a look at the table of contents and count the number of recipes that would both be useful to you and that you do not already know how to accomplish to get an idea of how much you will take away from this book; also read the free sample chapter
  • Disclaimer: I received a review copy of this book

R Tutorial Series: One-Way Repeated Measures ANOVA

Repeated measures data require a different analysis procedure than our typical one-way ANOVA and subsequently follow a different R process. This tutorial will demonstrate how to conduct one-way repeated measures ANOVA in R using the Anova(mod, idata, idesign) function from the car package.

Tutorial Files

Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 30 participants whose interest in voting was measured at three different ages (10, 15, and 20). The interest values are represented on a scale that ranges from 1 to 5 and indicate how interested each participant was in voting at each given age.

Data Setup

Notice that our data are arranged differently for a repeated measures ANOVA. In a typical one-way ANOVA, we would place all of the measured values in a single column and identify the level of our independent variable for each value in a second column, as demonstrated in this sample one-way dataset. In a repeated measures ANOVA, we instead treat each level of our independent variable as if it were a variable, thus placing them side by side as columns. Hence, rather than having one vertical column for voting interest, with a second column for age, we have three separate columns for voting interest, one for each age level.

Beginning Steps

To begin, we need to read our dataset into R and store its contents in a variable.
  > #read the dataset into an R variable using the read.csv(file) function
  > dataOneWayRepeatedMeasures <- read.csv("dataset_ANOVA_OneWayRepeatedMeasures.csv")
  > #display the data
  > #notice the atypical column arrangement for repeated measures data
  > dataOneWayRepeatedMeasures

The first ten rows of our dataset

Preparing the Repeated Measures Factor

Prior to executing our analysis, we must follow a small series of steps in order to prepare our repeated measures factor.

Step 1: Define the Levels

  > #use c() to create a vector containing the number of levels within the repeated measures factor
  > #create a vector numbering the levels for our three voting interest age groups
  > ageLevels <- c(1, 2, 3)

Step 2: Define the Factor

  > #use as.factor() to create a factor using the level vector from step 1
  > #convert the age levels into a factor
  > ageFactor <- as.factor(ageLevels)

Step 3: Define the Frame

  > #use data.frame() to create a data frame using the factor from step 2
  > #convert the age factor into a data frame
  > ageFrame <- data.frame(ageFactor)

Step 4: Bind the Columns

  > #use cbind() to bind the levels of the factor from the original dataset
  > #bind the age columns
  > ageBind <- cbind(dataOneWayRepeatedMeasures$Interest10, dataOneWayRepeatedMeasures$Interest15, dataOneWayRepeatedMeasures$Interest20)

Step 5: Define the Model

  > #use lm() to generate a linear model using the bound factor levels from step 4
  > #generate a linear model using the bound age levels
  > ageModel <- lm(ageBind ~ 1)

Employing the Anova(mod, idata, idesign) Function

Typically, researchers will choose one of several techniques for analyzing repeated measures data, such as an epsilon-correction method, like Huynh-Feldt or Greenhouse-Geisser, or a multivariate method, like Wilks' Lambda or Hotelling's Trace. Conveniently, having already prepared our data, we can employ a single Anova(mod, idata, idesign) function from the car package to yield all of the relevant repeated measures results. This allows us simplicity in that only a single function is required, regardless of the technique that we wish to employ. Thus, witnessing our outcomes becomes as simple as locating the desired method in the cleanly printed results.
Our Anova(mod, idata, idesign) function will be composed of three arguments. First, mod will contain our linear model from Step 5 in the preceding section. Second, idata will contain our data frame from Step 3. Third, idesign will contain our factor from Step 2, preceded by a tilde (~). Thus, our final function takes on the following form.
  > #load the car package (install first, if necessary)
  > library(car)
  > #compose the Anova(mod, idata, idesign) function
  > analysis <- Anova(ageModel, idata = ageFrame, idesign = ~ageFactor)

Visualizing the Results

Finally, we can use the summary(object) function to visualize the results of our repeated measures ANOVA.
  > #use summary(object) to visualize the results of the repeated measures ANOVA
  > summary(analysis)

Relevant segment of repeated measures ANOVA results

Supposing that we are interested in the Wilks' Lambda method, we can see that the differences in the means for voting interest at ages 10, 15, and 20 are statistically significant (p < .001).
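For readers who prefer the univariate tests with the Greenhouse-Geisser and Huynh-Feldt corrections, summary() accepts a multivariate = FALSE argument that leaves the multivariate tables out of the printout. Below is a condensed sketch of the five preparation steps and the analysis, run on simulated data; the random scores are an assumption standing in for the tutorial's file, so the printed values will differ from those pictured above.

```r
library(car)  # provides Anova()

# Simulated stand-in for the wide dataset: 30 participants, three ages
set.seed(4)
n <- 30
votes <- data.frame(
  Interest10 = sample(1:5, n, replace = TRUE),
  Interest15 = sample(1:5, n, replace = TRUE),
  Interest20 = sample(1:5, n, replace = TRUE)
)

# Steps 1 to 3: a one-column data frame holding the three-level age factor
ageFrame <- data.frame(ageFactor = factor(1:3))

# Steps 4 and 5: bind the age columns and fit the intercept-only model
ageModel <- lm(as.matrix(votes) ~ 1)

# Run the repeated measures analysis
analysis <- Anova(ageModel, idata = ageFrame, idesign = ~ageFactor)

# Show only the univariate tests, sphericity test, and epsilon corrections
summary(analysis, multivariate = FALSE)
```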

Pairwise Comparisons

Note that we could conduct follow-up comparisons on our age factor to determine which age level means are significantly different from one another. For this purpose, it is recommended that the data be rearranged into the standard ANOVA format that we have used throughout our other tutorials. Subsequently, we could conduct pairwise comparisons in the same manner as demonstrated in the One-Way ANOVA with Comparisons tutorial.
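One possible way to carry this out is sketched below: stack() rearranges the wide columns into the standard two-column format, and pairwise.t.test() with paired = TRUE compares each pair of ages within participants. This is an illustration on simulated data, not necessarily the procedure used in the One-Way ANOVA with Comparisons tutorial, and the Bonferroni adjustment is an assumed choice.

```r
# Simulated stand-in for the wide dataset: 30 participants, three ages
set.seed(5)
n <- 30
votes <- data.frame(
  Interest10 = sample(1:5, n, replace = TRUE),
  Interest15 = sample(1:5, n, replace = TRUE),
  Interest20 = sample(1:5, n, replace = TRUE)
)

# Rearrange into the standard format: one column of values, one of levels
# stack() keeps participants in the same order within each age level
long <- stack(votes)
names(long) <- c("interest", "age")

# Paired comparisons between every pair of age levels
comparisons <- pairwise.t.test(long$interest, long$age,
                               p.adjust.method = "bonferroni",
                               paired = TRUE)
comparisons
```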

Complete One-Way Repeated Measures ANOVA Example

To see a complete example of how one-way repeated measures ANOVA can be conducted in R, please download the one-way repeated measures ANOVA example (.txt) file.

R Tutorial Series: Two-Way ANOVA with Interactions and Simple Main Effects

When an interaction is present in a two-way ANOVA, we typically choose to ignore the main effects and elect to investigate the simple main effects when making pairwise comparisons. This tutorial will demonstrate how to conduct pairwise comparisons when an interaction is present in a two-way ANOVA.

Tutorial Files

Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 60 participants who are divided into three stress reduction treatment groups (mental, physical, and medical) and two gender groups (male and female). The stress reduction values are represented on a scale that ranges from 1 to 5. This dataset can be conceptualized as a comparison between three stress treatment programs, one using mental methods, one using physical training, and one using medication, across genders. The values represent how effective the treatment programs were at reducing participants' stress levels, with higher numbers indicating higher effectiveness.

Beginning Steps

To begin, we need to read our dataset into R and store its contents in a variable.
  > #read the dataset into an R variable using the read.csv(file) function
  > dataTwoWayInteraction <- read.csv("dataset_ANOVA_TwoWayInteraction.csv")
  > #display the data
  > dataTwoWayInteraction

The first ten rows of our dataset.

Omnibus Test

Let's run a general omnibus test to assess the main effects and interactions present in the dataset.
  > #use anova(object) to test the omnibus hypothesis
  > #Are main effects or interaction effects present in the independent variables?
  > anova(lm(StressReduction ~ Treatment * Gender, dataTwoWayInteraction))

The omnibus ANOVA test

Divide the Data

The significant omnibus interaction suggests that we should ignore the main effects and instead investigate the simple main effects for our independent variables. To do so, we need to divide our dataset along each level of our treatment variable. We can create subsets of our dataset using the subset(data, condition) function, where data is the original dataset and condition contains the parameters defining the subset.
  > #use subset(data, condition) to divide the original dataset
  > #medical subset
  > dataMedical <- subset(dataTwoWayInteraction, Treatment == "medical")
  > #mental subset
  > dataMental <- subset(dataTwoWayInteraction, Treatment == "mental")
  > #physical subset
  > dataPhysical <- subset(dataTwoWayInteraction, Treatment == "physical")

Group ANOVAs

With datasets representing each of our treatment groups, we can now run an ANOVA for each that investigates the impact of gender. You may notice that this is effectively running three one-way ANOVAs with gender being the independent variable. Therefore, we should control for Type I error by dividing our typical .05 alpha level by three (.017).
  > #run ANOVA on the treatment subsets to investigate the impacts of gender within each
  > #medical
  > anova(lm(StressReduction ~ Gender, dataMedical))
  > #mental
  > anova(lm(StressReduction ~ Gender, dataMental))
  > #physical
  > anova(lm(StressReduction ~ Gender, dataPhysical))

The gender within treatment group ANOVA tests

At an alpha level of .017, the gender effect within the mental (p = .014) and physical (p < .001) groups was statistically significant. In the mental condition, the means are 3 for males and 4 for females. In the physical condition, the means are 4 for males and 2 for females. These results suggest that the mental treatment is more effective in reducing stress for females than males, while the physical treatment is more effective for males than females. Further, there is insufficient statistical support for a gender difference in the medical treatment.
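The divide-and-test procedure can also be written as a single loop, which makes the adjusted alpha level explicit. The sketch below uses split() and lapply() on simulated data; the balanced design of 10 participants per cell and the random scores are assumptions, so its p-values are unrelated to the results reported above.

```r
# Simulated stand-in data: 60 participants, 10 per treatment-by-gender cell
set.seed(6)
d <- expand.grid(rep = 1:10,
                 Gender = c("female", "male"),
                 Treatment = c("medical", "mental", "physical"))
d$StressReduction <- sample(1:5, nrow(d), replace = TRUE)

# Divide the typical .05 alpha level by the number of tests (three)
alpha <- 0.05 / nlevels(d$Treatment)
round(alpha, 3)

# Run one gender ANOVA within each treatment group
byTreatment <- lapply(split(d, d$Treatment), function(group) {
  anova(lm(StressReduction ~ Gender, data = group))
})

# Collect the gender p-values and compare each to the adjusted alpha
pValues <- sapply(byTreatment, function(result) result["Gender", "Pr(>F)"])
pValues
pValues < alpha
```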

Pairwise Comparisons

Note that since our gender variable contains only two levels, there is no need to conduct follow-up comparisons. However, should you reach this point with an independent variable of three or more levels, you could conduct pairwise comparisons in the same manner as demonstrated in the Two-Way ANOVA with Comparisons tutorial. In this case, remember to carry through your reduced Type I error rate from the preceding ANOVA tests.

Complete Two-Way ANOVA with Interactions Example

To see a complete example of how two-way ANOVA simple main effects can be explored in R, please download the two-way ANOVA interaction example (.txt) file.

Statistical Analysis with R Book Reviews

Reviews of my Statistical Analysis with R book have started to emerge online and I am writing today to share them with potential readers and recommenders.

Reviews

The following is a list of online reviews for Statistical Analysis with R. If you have written a review of the book and would like it to be featured in this post, please contact me.
In summarizing the reviews, a few points are very clear about Statistical Analysis with R.
  1. It is for beginners: The book was written for people who have little to no experience with R, statistical software, and programming. It makes no assumptions of prior experience along these lines and starts right from the beginning. If you are new to R and want to learn how to apply it to your work, then this book is for you. If you are already an intermediate or experienced user, perhaps you might recommend it to people you know who are just becoming familiar with R.
  2. It is a learning tool, not a reference: The book is structured with the intent that it is experienced as a holistic learning experience. The chapters build on one another and progressively delve deeper into R. It is not a dictionary-style reference that one might pull out, flip to an entry, and get a brief answer on a single item. Again, this has implications for the audience. Beginners are more likely to enjoy this approach, whereas experienced users may be interested in more of a reference-style book.
  3. It has a story: Woven into the book's learning structure is a storyline based on the Three Kingdoms period of ancient Chinese history. For many, this will be a motivating and engaging way to learn. For others, the story may not inspire the same level of interest. If you would like to get a taste of the story, and the book in general, it is recommended that you read the free sample chapter.

New Release: R Graphs Cookbook

Packt Publishing recently released a second book on R, the R Graphs Cookbook by Hrishi Mittal. This reference-style guide covers an array of R graphical applications and is geared towards users who are already familiar with the basics of R. A sample chapter is available. I will be reviewing this book in the near future and posting my thoughts here on the R Tutorial Series blog. [Update: read my review of R Graphs Cookbook]