Tutorial Files
Before we start, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains pretest and posttest scores for 66 subjects on a series of reading comprehension tests (Moore & McCabe, 1989). Note that all code samples in this tutorial assume that the data have already been read into an R variable and that the variable has been attached.
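The setup assumed above might look roughly like the following sketch. The file name reading_scores.csv and the variable name datavar are placeholders rather than names taken from the actual download, and the sketch writes a tiny made-up CSV file first so that it can run on its own.

```r
# Create a small stand-in CSV so the example is self-contained.
# The column names PRE1 and POST1 match those used later in this tutorial;
# the values are made up for illustration.
csv_path <- file.path(tempdir(), "reading_scores.csv")  # hypothetical file name
write.csv(
  data.frame(PRE1 = c(4, 6, 9, 12), POST1 = c(3, 5, 8, 13)),
  csv_path,
  row.names = FALSE
)

# Read the data into an R variable (here called datavar, an assumed name).
datavar <- read.csv(csv_path)

# Attach it so columns can be referenced by name,
# e.g. PRE1 instead of datavar$PRE1.
attach(datavar)

mean_pre1 <- mean(PRE1)  # works because datavar is attached

# Detach when finished to avoid name clashes later in the session.
detach(datavar)
```

With your own copy of the data, you would replace csv_path with the name of the file saved in your working directory.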
Correlation Between Two Variables
The most fundamental way to calculate correlations is to operate directly on two variables. In R, this can be done using the cor() function. The cor() function accepts the following arguments ("Correlation, Variance...", n.d.):
- x: the first variable to correlate
- y: the second variable to correlate
- use (optional): determines how missing values are handled; accepts "everything" (the default), "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs"
- method (optional): determines the correlation coefficient computed; accepts "pearson" (the default), "kendall", or "spearman"
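As a rough illustration of the two optional arguments, the sketch below runs cor() on made-up vectors x and y (not the tutorial data), one of which contains a missing value.

```r
# Made-up scores; the NA stands in for a missing observation.
x <- c(10, 12, 15, NA, 20, 23)
y <- c(11, 14, 13, 18, 22, 25)

# With the default use = "everything", a missing value yields NA.
r_default <- cor(x, y)

# use = "complete.obs" drops any pair that contains a missing value.
r_pearson <- cor(x, y, use = "complete.obs", method = "pearson")

# Rank-based alternatives, with the same handling of missing values.
r_spearman <- cor(x, y, use = "complete.obs", method = "spearman")
r_kendall  <- cor(x, y, use = "complete.obs", method = "kendall")
```

When the data contain no missing values, the use argument can be left at its default.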
cor(VAR1, VAR2) Example
Suppose that our research question is: "How does a subject's pretest 1 score relate to his or her posttest 1 score?" The following example demonstrates how to use the cor() function to calculate the correlation between pretest 1 (PRE1) and posttest 1 (POST1).
- > # use cor(VAR1, VAR2) to calculate the correlation between variable 1 and variable 2
- > cor(PRE1, POST1)
- [1] 0.5659026
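To make the default (Pearson) result less of a black box, the sketch below checks cor() against the covariance of the two variables divided by the product of their standard deviations. The vectors pre and post are made-up stand-ins, not the tutorial's PRE1 and POST1 columns.

```r
# Made-up stand-ins for a pretest and a posttest score.
pre  <- c(4, 6, 9, 12, 7)
post <- c(3, 5, 8, 13, 9)

# Built-in Pearson correlation (the default method).
r_builtin <- cor(pre, post)

# The same quantity computed by hand:
# covariance divided by the product of the standard deviations.
r_manual <- cov(pre, post) / (sd(pre) * sd(post))

# The two values agree up to floating-point error.
same_value <- isTRUE(all.equal(r_builtin, r_manual))
```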
Correlations Between Multiple Variables
When beginning to analyze a dataset, researchers often want to get a complete picture of all correlations, rather than just a single one. Conveniently, the cor() function can also be run on an entire set of data. The format for this operation is cor(DATAVAR), where DATAVAR is the name of the R variable containing the data.
Note that the underlying code for the cor(datavar) function has changed in recent versions of R. The function no longer accepts datasets that contain non-numeric values. If yours does, you will receive an error to the effect of "'x' must be numeric", so ensure that all of your data are in numeric form before using the function.
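One possible way to meet this requirement is to keep only the numeric columns before calling cor(). The sketch below uses a made-up data frame named mixed with a hypothetical text column SUBJECT; it is an illustration, not part of the tutorial dataset.

```r
# Hypothetical dataset mixing a text column with numeric scores.
mixed <- data.frame(
  SUBJECT = c("a", "b", "c", "d"),
  PRE1    = c(4, 6, 9, 12),
  POST1   = c(3, 5, 8, 13)
)

# Flag the numeric columns, then keep only those.
is_num <- sapply(mixed, is.numeric)
numeric_only <- mixed[, is_num, drop = FALSE]

# cor() now receives numeric data only.
cor_matrix <- cor(numeric_only)
```

Dropping columns is only appropriate when the non-numeric columns are not needed for the analysis; values stored as text that are really numbers would instead need to be converted.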
Suppose now that our research question is: "How do all of the test scores in the dataset relate to each other?" The following example demonstrates how to use the cor() function to calculate all of the correlations in a dataset.
- > # use cor(DATAVAR) to get the correlations between all variables
- > cor(datavar)
The output of the preceding function is a correlation matrix, with one row and one column for each variable in the dataset.
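As a sketch of the kind of result to expect, the example below builds a small made-up data frame named scores (the values are not the tutorial data) and inspects the matrix that cor() returns for it.

```r
# Made-up scores for three tests.
scores <- data.frame(
  PRE1  = c(4, 6, 9, 12, 7),
  PRE2  = c(3, 5, 4, 8, 6),
  POST1 = c(3, 5, 8, 13, 9)
)

# One row and one column per variable.
cm <- cor(scores)

# Each variable correlates perfectly with itself, so the diagonal is 1,
# and the matrix is symmetric: cm["PRE1", "POST1"] equals cm["POST1", "PRE1"].
r_pre1_post1 <- cm["PRE1", "POST1"]

# Rounding makes the matrix easier to read.
round(cm, 2)
```

An individual correlation can be pulled out of the matrix by row and column name, as done for r_pre1_post1 above.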
Complete Correlational Analysis
To see a complete example of how correlational analysis can be conducted in R, please download the correlational analysis example (.txt) file.
References
Correlation, Variance and Covariance (Matrices). (n.d.). Retrieved October 27, 2009, from http://sekhon.berkeley.edu/stats/html/cor.html
Moore, D., & McCabe, G. (1989). Introduction to the practice of statistics [Data File]. Retrieved October 27, 2009, from http://lib.stat.cmu.edu/DASL/Datafiles/ReadingTestScores.html