Tutorial Files
Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains information used to estimate undergraduate enrollment at the University of New Mexico (Office of Institutional Research, 1990). Note that all code samples in this tutorial assume that this data has already been read into an R variable and has been attached.Pre-Analysis Steps
Before comparing regression models, we must have models to compare. In the segment on multiple linear regression, we created three successive models to estimate the fall undergraduate enrollment at the University of New Mexico. The complete code used to derive these models is provided in that tutorial. This article assumes that you are familiar with these models and how they were created. Therefore, a shorthand method for generating the models is displayed below.
- > #create three linear models using lm(FORMULA, DATAVAR)
- > #one predictor model
- > onePredictorModel <- lm(ROLL ~ UNEM, datavar)
- > #two predictor model
- > twoPredictorModel <- lm(ROLL ~ UNEM + HGRAD, datavar)
- > #three predictor model
- > threePredictorModel <- lm(ROLL ~ UNEM + HGRAD + INC, datavar)
Comparing Individual Models
The summary(OBJECT) function can be used to ascertain the overall variance explained (R-squared) and statistical significance (F-test) of each individual model, as well as the significance of each predictor to each model (t-test). The following code demonstrates how to generate summaries for each model.The results of the previous functions are displayed below.
- > #get summary data for each model using summary(OBJECT)
- > summary(onePredictorModel)
- > summary(twoPredictorModel)
- > summary(threePredictorModel)
Comparing Successive Models
The anova(MODEL1, MODEL2,… MODELi) function can be used to compare the significance of each successive model. The code sample below demonstrates how to use ANOVA to accomplish this task.The table resulting from the preceding function is pictured below.
- > #compare successive models using anova(MODEL1, MODEL2, MODELi)
- > anova(onePredictorModel, twoPredictorModel, threePredictorModel)
Here, we can see that each successive model is significant above and beyond the previous one. This suggests that each predictor added along the way is making an important contribution to the overall model.
How do handle categorical independent variables in HLM?
ReplyDeleteJust a doubt:
ReplyDeleteYour title "Hierarchical linear modeling" is suggestive of mixed modeling/HLM/MLM literature (used for clustered/non-independent data), and not the hierarchical regression (based on analyzing hierarchical Anova models) that you actually seem to be explaining here.
Maybe my mistake (i AM a novice), but if what i say is true, i guess it may be better to correct this and restate the title as "Hierarchical regression"; otherwise new-comers interested in mixed modeling might mistake the message.
Bye,take care.
Indeed, you are discussing what is known as "Hierarchical regression". The term "Hierarchical linear modeling" (or HLM) is used for multilevel models and using that as a title for this part is confusing.
ReplyDeleteApart from that, it is nicely done.
Thanks for the comments. I updated the tutorial to reflect the appropriate title.
ReplyDeleteAgree with the comments above. This seems to be manual approach to step-wise regression which has numerous problems.
ReplyDeleteThe comments above refer to the title of post, which was originally wrong, and not to the content.
DeleteI disagree about your thought that this is like stepwise regression. In HLR, the researcher decides upon the order of a few variables and examines them sequentially in a few models. In stepwise regression, a computer iterates through all possible variable combinations in every model. If you search Google on this topic and you will find similar, but more extensive comparisons.
First, thanks for the tutorial.
ReplyDeleteGood to know that I did go in the right direction. But I have a question: With non-complete data I have the problem that I can not do it this way because each regression model than have different cases excluded because of the different missing patterns. So is there a way, besides multiple imputation, to compare different models in a hierachic regression with non-complete data?
Thanks for any comments.