Tutorial Files
Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains information used to estimate undergraduate enrollment at the University of New Mexico (Office of Institutional Research, 1990). Note that all code samples in this tutorial assume that the data has already been read into an R variable (datavar in the code samples) and attached.
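The setup assumed above can be sketched as a small helper that reads the CSV into a data frame and attaches it, so columns such as ROLL and UNEM can be referenced by bare name. The function name loadTutorialData is hypothetical, and the file name in the usage comment is a placeholder, not the tutorial's actual file name.

```r
# Hypothetical helper illustrating the assumed setup (not part of the original tutorial).
loadTutorialData <- function(csvPath) {
  # Read the CSV; the first row is assumed to contain the column names.
  datavar <- read.csv(csvPath, header = TRUE)
  # Attach the data frame so its columns can be used by bare name.
  attach(datavar)
  datavar
}

# Hypothetical usage (replace the placeholder with the path of the downloaded file):
# datavar <- loadTutorialData("your_downloaded_file.csv")
```

Passing data = datavar to lm(), as the models below do, works whether or not the data is attached; attaching mainly matters for code that refers to columns directly.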
Pre-Analysis Steps
Before comparing regression models, we must have models to compare. In the segment on multiple linear regression, we created three successive models to estimate the fall undergraduate enrollment at the University of New Mexico, each adding one predictor to the previous model. The complete code used to derive these models is provided in that tutorial. Since this article assumes that you are familiar with these models and how they were created, only a shorthand method for generating them is shown below.
```r
# create three linear models using lm(FORMULA, DATAVAR)
# one predictor model
onePredictorModel <- lm(ROLL ~ UNEM, data = datavar)
# two predictor model
twoPredictorModel <- lm(ROLL ~ UNEM + HGRAD, data = datavar)
# three predictor model
threePredictorModel <- lm(ROLL ~ UNEM + HGRAD + INC, data = datavar)
```
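Because each model adds a single predictor to the one before it, the same nested sequence can also be built incrementally with R's update() function. The sketch below is a self-contained illustration rather than the tutorial's own code: it uses R's built-in mtcars data as a stand-in for the enrollment file, so the variables (mpg, wt, hp, qsec) and the example* model names are assumptions chosen only for demonstration.

```r
# Stand-in data: R's built-in mtcars (NOT the enrollment dataset).
exampleData <- mtcars

# Start from a one-predictor model ...
exampleOne <- lm(mpg ~ wt, data = exampleData)
# ... then add one predictor at a time. ". ~ . + hp" keeps the current
# response and predictors and appends hp to the right-hand side.
exampleTwo   <- update(exampleOne, . ~ . + hp)
exampleThree <- update(exampleTwo, . ~ . + qsec)

# Print the formulas to confirm that each model nests inside the next.
lapply(list(exampleOne, exampleTwo, exampleThree), formula)
```

With the tutorial's data, the equivalent calls would be update(onePredictorModel, . ~ . + HGRAD) and then update(twoPredictorModel, . ~ . + INC). Building the sequence this way makes the nesting explicit, which is what the ANOVA comparison later in this article relies on.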
Comparing Individual Models
The summary(OBJECT) function can be used to ascertain the proportion of variance explained (R-squared) and the statistical significance (F-test) of each individual model, as well as the significance of each predictor within that model (t-test). The following code demonstrates how to generate summaries for each model.
```r
# get summary data for each model using summary(OBJECT)
summary(onePredictorModel)
summary(twoPredictorModel)
summary(threePredictorModel)
```
The results of the preceding functions are displayed below.
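As a complement to reading the printed summaries, the headline numbers can be pulled out of each summary object and laid side by side. This sketch uses the documented components of a summary.lm object (r.squared, adj.r.squared, fstatistic, coefficients); the helper name modelStats and the mtcars stand-in data are assumptions for illustration only.

```r
# Stand-in data: R's built-in mtcars (NOT the enrollment dataset).
fitOne   <- lm(mpg ~ wt, data = mtcars)
fitTwo   <- lm(mpg ~ wt + hp, data = mtcars)
fitThree <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Hypothetical helper: collect the overall statistics summary() prints for one model.
modelStats <- function(model) {
  s <- summary(model)
  f <- s$fstatistic  # named vector: value, numdf, dendf
  c(r.squared     = s$r.squared,
    adj.r.squared = s$adj.r.squared,
    F             = unname(f["value"]),
    # summary() prints this p-value but does not store it, so recompute it.
    F.p.value     = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)))
}

# One row per model, for side-by-side comparison.
statsTable <- t(sapply(list(one = fitOne, two = fitTwo, three = fitThree), modelStats))
print(round(statsTable, 4))

# Per-predictor t-test p-values for the largest model.
summary(fitThree)$coefficients[, "Pr(>|t|)"]
```

The same helper can be applied to the tutorial's models, for example modelStats(threePredictorModel). Adjusted R-squared is included because plain R-squared never decreases when a predictor is added, so it cannot by itself show whether the extra predictor is worth keeping.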
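Comparing Successive Models
The anova(MODEL1, MODEL2,… MODELi) function can be used to test whether each successive model explains significantly more variance than the one before it. This comparison is only meaningful when the models are nested (each model contains all the predictors of the previous one) and are fit to the same data, as is the case here. The code sample below demonstrates how to use ANOVA to accomplish this task.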
```r
# compare successive models using anova(MODEL1, MODEL2, MODELi)
anova(onePredictorModel, twoPredictorModel, threePredictorModel)
```
The table resulting from the preceding function is pictured below.
Here, we can see that each successive model explains significantly more variance than the previous one (the F-test in each row of the table is significant). This suggests that each predictor added along the way makes an important contribution to the overall model.
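To see what "significant above and beyond the previous model" means numerically, the F statistic in each row of the anova() table can be reproduced by hand: F = [(RSS_previous - RSS_current) / (Df_previous - Df_current)] / (RSS_largest / Df_largest). When more than two models are passed, anova() uses the residual mean square of the largest model as the denominator, not that of the pair being compared. The sketch below again uses mtcars as a stand-in dataset, an assumption made only so the example runs on its own.

```r
# Stand-in data: R's built-in mtcars (NOT the enrollment dataset).
exampleData <- mtcars

# Three nested models, each adding one predictor to the previous one.
fitOne   <- lm(mpg ~ wt, data = exampleData)
fitTwo   <- lm(mpg ~ wt + hp, data = exampleData)
fitThree <- lm(mpg ~ wt + hp + qsec, data = exampleData)

# Sequential comparison, as in the tutorial.
comparison <- anova(fitOne, fitTwo, fitThree)
print(comparison)

rss <- comparison[["RSS"]]     # residual sum of squares of each model
rdf <- comparison[["Res.Df"]]  # residual degrees of freedom of each model

# Denominator: residual mean square of the largest model (fitThree).
residualMeanSquare <- rss[3] / rdf[3]

# Row 2 (fitOne -> fitTwo): reduction in RSS per added parameter, scaled.
fStep2 <- ((rss[1] - rss[2]) / (rdf[1] - rdf[2])) / residualMeanSquare
pStep2 <- pf(fStep2, rdf[1] - rdf[2], rdf[3], lower.tail = FALSE)

# Compare the hand calculation with the values anova() reported.
c(handF = fStep2, anovaF = comparison[["F"]][2])
c(handP = pStep2, anovaP = comparison[["Pr(>F)"]][2])
```

The first row of the table has no F-test because it is the baseline. For the last step, the pairwise and overall denominators coincide, so that row matches anova(fitTwo, fitThree) exactly.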