By John M Quick

The R Tutorial Series provides a collection of user-friendly tutorials for people who want to learn how to use R for statistical analysis.




R Tutorial Series: Regression With Interaction Variables

Interaction variables introduce an additional level of regression analysis by allowing researchers to explore the synergistic effects of combined predictors. This tutorial will explore how interaction models can be created in R.

Tutorial Files

Before we begin, you may want to download the sample data (.csv) used in this tutorial and save the file to your R working directory. The dataset contains the following variables related to ice cream consumption.
  • DATE: Time period (1-30)
  • CONSUME: Ice cream consumption in pints per capita
  • PRICE: Per pint price of ice cream in dollars
  • INC: Weekly family income in dollars
  • TEMP: Mean temperature in degrees F
Note that all code samples in this tutorial assume that the data has already been read into an R variable (called datavar in the model code) and attached with attach(), so that columns can be referenced by name.
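As a minimal sketch of that setup, the lines below read a CSV file into a data frame and attach it. The file written here is a made-up four-row stand-in so that the example runs on its own; its values are invented and are not the tutorial data. With the real download, point read.csv() at that file instead.

```r
# Write a tiny, made-up CSV so this sketch is self-contained.
# These rows are invented for illustration; they are NOT the tutorial's data.
csvPath <- tempfile(fileext = ".csv")
writeLines(c(
  "DATE,CONSUME,PRICE,INC,TEMP",
  "1,0.40,0.25,80,40",
  "2,0.35,0.30,82,55",
  "3,0.45,0.28,85,70",
  "4,0.30,0.27,90,30"
), csvPath)

# Read the file into a data frame
datavar <- read.csv(csvPath)

# Attach it so columns can be used by name (PRICE instead of datavar$PRICE)
attach(datavar)
meanPrice <- mean(PRICE)

# Detach once the analysis is finished to avoid masking other objects
detach(datavar)
```

Attaching is a convenience; passing the data frame to a function's data argument, or writing datavar$PRICE, avoids it altogether.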

Planning The Model

Suppose that our research question is "How much of the variance in ice cream consumption can be predicted by per pint price, weekly family income, mean temperature, and the interaction between per pint price and weekly family income?" The interaction term, the last predictor in that question, is the new addition to our typical multiple regression modeling procedure. This variable is relatively simple to incorporate, but it does require a few preparations.

Creating The Interaction Variable

A two-step process can be followed to create an interaction variable in R. First, the input variables must be centered to mitigate multicollinearity. Second, the centered variables must be multiplied to create the interaction variable.

Step 1: Centering

To center a variable, simply subtract its mean from each data point and save the result into a new R variable, as demonstrated below.
  > #center the input variables
  > PRICEc <- PRICE - mean(PRICE)
  > INCc <- INC - mean(INC)
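
As a quick check on the centering step, a centered variable should have a mean of zero, up to floating-point error. The sketch below uses made-up prices rather than the tutorial data, and also shows base R's scale() function as an alternative way to get the same result.

```r
# Made-up prices for illustration only (not the tutorial data)
PRICE <- c(0.25, 0.30, 0.28, 0.27)

# Center by subtracting the mean, as in the tutorial
PRICEc <- PRICE - mean(PRICE)

# Alternative: scale() with scale = FALSE centers without standardizing;
# it returns a one-column matrix, so convert back to a plain vector
PRICEc2 <- as.numeric(scale(PRICE, center = TRUE, scale = FALSE))

# Both versions agree, and the centered mean is zero within rounding error
centeredMean <- mean(PRICEc)
sameResult <- isTRUE(all.equal(PRICEc, PRICEc2))
```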

Step 2: Multiplication

Once the input variables have been centered, the interaction term can be created. Since an interaction is formed by the product of two or more predictors, we can simply multiply our centered terms from step one and save the result into a new R variable, as demonstrated below.
  > #create the interaction variable
  > PRICEINCi <- PRICEc * INCc
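
To see why centering comes before multiplication, it can help to compare how strongly a predictor correlates with the raw product versus the centered product. The sketch below does this on five made-up values per variable (not the tutorial data); how large the difference is will depend on the data at hand.

```r
# Made-up values for illustration only (not the tutorial data)
PRICE <- c(1, 2, 3, 4, 5)
INC <- c(20, 10, 40, 30, 50)

# Product of the raw predictors
rawProduct <- PRICE * INC

# Product of the centered predictors, as in the tutorial
PRICEc <- PRICE - mean(PRICE)
INCc <- INC - mean(INC)
PRICEINCi <- PRICEc * INCc

# Correlation between a predictor and each version of the product
corRaw <- cor(PRICE, rawProduct)
corCentered <- cor(PRICE, PRICEINCi)
```

On these invented numbers, the correlation between PRICE and the product term falls from roughly 0.93 for the raw product to roughly 0.19 for the centered product.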

Creating The Model

Now we have all of the pieces necessary to assemble our complete interaction model.
  > #create the interaction model using lm(FORMULA, DATAVAR)
  > #predict ice cream consumption by its per pint price, weekly family income, mean temperature, and the interaction between per pint price and weekly family income
  > interactionModel <- lm(CONSUME ~ PRICE + INC + TEMP + PRICEINCi, datavar)
  > #display summary information about the model
  > summary(interactionModel)
The summary() output reports each coefficient's estimate, standard error, t value, and p value, along with the model's R-squared and F statistic.

At this point we have a complete interaction model. Naturally, if this were a full research analysis, we would likely compare this model to others and assess the value of each predictor. For information on comparing models, see the tutorial on hierarchical linear regression.
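
One possible way to make such a comparison, offered here as a sketch rather than as the procedure from the hierarchical regression tutorial, is to fit the model with and without the interaction term and pass both to anova(). The data below are simulated purely so the example runs on its own; the name simData and all of its values are assumptions and have nothing to do with the ice cream dataset.

```r
# Simulated data for illustration only (NOT the tutorial's ice cream data)
set.seed(1)
n <- 30
simData <- data.frame(
  PRICE = runif(n, 0.25, 0.30),
  INC = runif(n, 75, 95),
  TEMP = runif(n, 25, 75)
)
simData$CONSUME <- 0.2 + 0.003 * simData$TEMP + rnorm(n, sd = 0.03)

# Centered interaction term, built the same way as in the tutorial
simData$PRICEINCi <- (simData$PRICE - mean(simData$PRICE)) *
  (simData$INC - mean(simData$INC))

# Model without the interaction, then the model with it
mainModel <- lm(CONSUME ~ PRICE + INC + TEMP, data = simData)
interactionModel <- lm(CONSUME ~ PRICE + INC + TEMP + PRICEINCi, data = simData)

# F-test for the improvement in fit from adding the interaction term
comparison <- anova(mainModel, interactionModel)
print(comparison)
```

The second row of the printed table tests whether adding the single interaction term reduces the residual sum of squares by more than chance would suggest.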

Complete Interaction Model Example

To see a complete example of how an interaction model can be created in R, please download the interaction model example (.txt) file.

References

Kadiyala, K. (1970). Ice Cream [Data File]. Retrieved December 14, 2009 from http://lib.stat.cmu.edu/DASL/Datafiles/IceCream.html

3 comments:

  1. Can you show an example of creating an interaction term with categorical variables, please?

  2. Hi,
    Did you find an example of an interaction variable that involves categorical variables? I wanted to know that as well.

  3. Use the interaction function (base). Let a and b be categorical variables; then it would be:
    interaction(a,b, sep = ":")

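
To illustrate the suggestion in the last comment, here is a small sketch with two made-up factors. The interaction() function from base R combines them into a single factor with one level for each combination; the values of a and b below are hypothetical.

```r
# Two made-up categorical variables (hypothetical example data)
a <- factor(c("x", "y", "x", "y"))
b <- factor(c("p", "p", "q", "q"))

# Combine them into one factor with a level for each a:b combination
ab <- interaction(a, b, sep = ":")

# Inspect the combined values and how many distinct levels exist
combined <- as.character(ab)
levelCount <- nlevels(ab)
print(ab)
```

In a model formula, lm() can also build interactions between factors directly, for example with a * b on the right-hand side, without creating the combined variable first.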