R Tutorial Series: Labeling Data Points on a Plot

Labeling a plot's data points can be very useful, for example when a visual needs to convey which observation each point represents or when we are looking for patterns in our data. Fortunately, labeling individual data points is a relatively simple process in R. In this tutorial, we will use the textxy() function from the calibrate package to label the points on a scatterplot.

Tutorial Files

Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains information used to estimate undergraduate enrollment at the University of New Mexico (Office of Institutional Research, 1990). Note that this tutorial assumes that this data has already been read into R and saved into a variable named enrollmentData.
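
As a minimal sketch of that assumed step, the data could be read in with read.csv(). The file name enrollment_example.csv is hypothetical, and the three rows written to it are invented placeholder values, not the real dataset. Only the column names YEAR, ROLL, and UNEM come from the code used later in this tutorial.

```r
# Hypothetical file name; point this at the file you actually downloaded.
csvFile <- file.path(tempdir(), "enrollment_example.csv")

# Stand-in file so the sketch runs on its own.
# These rows are invented placeholders, not the real enrollment data.
writeLines(c("YEAR,ROLL,UNEM",
             "1,5500,8.1",
             "2,5900,7.0",
             "3,6000,7.3"),
           csvFile)

# Read the file into the variable name the tutorial assumes.
enrollmentData <- read.csv(csvFile, header = TRUE)

# Check that the columns used in this tutorial are present.
str(enrollmentData)
```

With the real file saved in your working directory, only the read.csv() call is needed, with the file's actual name in place of csvFile.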

Plot

To begin, we need to create a scatterplot using the plot(x,y) function. With our example data, we will plot the year on the x axis and the unemployment rate on the y axis.

> #generate a plot using the plot(x,y) function
> #plot year on the x axis and unemployment rate on the y axis
> plot(enrollmentData$YEAR, enrollmentData$UNEM)

For a more detailed description of plotting data in R, see the article on scatterplots.

Textxy

Within the calibrate package, the textxy() function can be used to label a plot's data points. The textxy() function accepts the following arguments ("Label points in a plot," n.d.):

x: the x values of the plot's points
y: the y values of the plot's points
labs: the labels to be associated with the plot's points

cx: used to resize the label font
dcol: used to set the label color; defaults to black
m: sets the origin of the plot; defaults to (0,0)

Here, we will use textxy() to add labels for the enrollment at the University of New Mexico to each of our plot's data points.

> #if necessary, install the calibrate package
> #install.packages("calibrate")
> #load the calibrate package
> library(calibrate)
> #use the textxy() function to add labels to the preexisting plot's points
> #add labels for the total enrollment
> textxy(enrollmentData$YEAR, enrollmentData$UNEM, enrollmentData$ROLL)

In this case, each point is labeled with that year's enrollment, so we can see in a single plot how enrollment relates to both the year and the unemployment rate.
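
For comparison, base R's text() function can also label points without an extra package. The sketch below is an alternative to textxy(), not the method used in this tutorial. The data values are invented placeholders, and the plot is written to a temporary PDF only so that the code runs without a display.

```r
# Invented placeholder values; substitute the real enrollmentData.
enrollmentData <- data.frame(
  YEAR = 1:5,
  UNEM = c(8.1, 7.0, 7.3, 7.5, 7.0),
  ROLL = c(5500, 5900, 6000, 6100, 6300)
)

# Draw to a temporary PDF so the sketch also runs without a display.
plotFile <- tempfile(fileext = ".pdf")
pdf(plotFile)

# Create the scatterplot first; labels can only be added to an existing plot.
plot(enrollmentData$YEAR, enrollmentData$UNEM,
     xlab = "Year", ylab = "Unemployment rate")

# pos = 3 places each label above its point; cex scales the font size.
text(enrollmentData$YEAR, enrollmentData$UNEM,
     labels = enrollmentData$ROLL, pos = 3, cex = 0.7)

# Close the device so the file is written out.
invisible(dev.off())
```

Changing pos to 1, 2, or 4 moves the labels below, to the left of, or to the right of the points.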

Complete Data Point Labeling Example

To see a complete example of how a plot's data points can be labeled in R, please download the Data Point Labeling (.txt) file.
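
The steps in this tutorial can be strung together roughly as follows. This is a sketch under stated assumptions rather than the contents of the downloadable file: the data values are invented placeholders, the plot goes to a temporary PDF so the code runs without a display, and labels are only added when the calibrate package is installed. Only the three positional arguments are passed, since the names of the optional arguments may differ between calibrate versions.

```r
# Invented placeholder values; substitute the real enrollmentData.
enrollmentData <- data.frame(
  YEAR = 1:5,
  UNEM = c(8.1, 7.0, 7.3, 7.5, 7.0),
  ROLL = c(5500, 5900, 6000, 6100, 6300)
)

# Draw to a temporary PDF so the sketch also runs without a display.
plotFile <- tempfile(fileext = ".pdf")
pdf(plotFile)

# Step 1: create the scatterplot. textxy() needs an existing plot to draw on.
plot(enrollmentData$YEAR, enrollmentData$UNEM)

# Step 2: label each point with its enrollment, if calibrate is available.
labelsAdded <- FALSE
if (requireNamespace("calibrate", quietly = TRUE)) {
  calibrate::textxy(enrollmentData$YEAR, enrollmentData$UNEM,
                    enrollmentData$ROLL)
  labelsAdded <- TRUE
}

# Close the device so the file is written out.
invisible(dev.off())
```

In an interactive session, the pdf() and dev.off() calls can be dropped so that the plot appears in the graphics window instead.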

References

Label points in a plot. (n.d.). Retrieved September 19, 2010, from http://rss.acs.unt.edu/Rdoc/library/calibrate/html/textxy.html
Office of Institutional Research. (1990). Enrollment Forecast [Data file]. Retrieved November 22, 2009, from http://lib.stat.cmu.edu/DASL/Datafiles/enrolldat.html

25 comments:

Anonymous (September 20, 2010 at 7:30 AM):
What does this textxy command get you over the regular R text() command?
prasoonsharma (September 20, 2010 at 7:39 AM):
textxy didn't work well for me when the points were dense... overlapping point labels made the plot less useful... any way to deal with that?
John (September 25, 2010 at 2:30 AM):
I posted textxy, because it was what I found that worked and thought it would be useful for others, too. Naturally, there are many ways to do anything in any piece of software, so it is not necessarily the definitive choice, nor did I represent every possible choice.

For formatting of dense points, I recommend making the graphic window larger and playing with the font size (cx argument) to start. You can also manipulate the chart itself (axes, scaling, values, spacing) to try for a different look.
Ray (November 17, 2010 at 3:18 PM):
My data is a little different:

gene Average_NP Average_P
Actrt2 0 1.63861E-13
Amy2 8.82536E-17 5.67038E-18
Cyp4f40 5.93776E-16 5.3336E-17
Gp1bb 5.18189E-27 1.21771E-28
Gsta2 0 5.74168E-18
Hsd17b6 1.74952E-22 4.86044E-22
Krt4 1.12772E-30 9.15714E-41
LOC367975 9.84506E-22 4.51166E-23
LOC689412 0 3.6644E-26
Olr1191 0 8.01645E-12
Pnma5 4.30767E-37 9.68314E-39
RGD1559903 5.65043E-14 6.2156E-25
Serpinb9e 2.55785E-33 3.01634E-28
Vom2r46 7.11398E-16 2.64378E-12
Mpeg1 10.26345375 1.340263375
Vegfb 2.56374875 7.02261125
Gdap1 16.6410125 28.6131375
Freq 62.06075 36.6501125
Slc25a23 37.398325 19.861505
Zfand5 34.0225125 48.28675
Exoc2 19.3219 25.627375
Pcdhgc3 13.01275875 7.30436375

How do I plot this data, where each point corresponds to a gene name?
John (November 17, 2010 at 5:56 PM):
Hi Ray,

You should be able to use the same method demonstrated in the article. Your labels would be the gene column and your x and y would be the Average_NP and Average_P columns. For example:

> plot(Average_NP, Average_P)
> textxy(Average_NP, Average_P, gene)
Anonymous (January 6, 2011 at 9:34 PM):
I found textxy() very useful. The regular R text() function overlays the labels on the points such that I cannot make out the points, which have different symbols depending on the category to which they belong.

How do I change the text size of the labels with textxy?
John (January 7, 2011 at 9:34 AM):
Hi. You can use the optional cx argument to scale the font size in your chart. For example: textxy(enrollmentData$YEAR, enrollmentData$UNEM, enrollmentData$ROLL, cx = 2)

John
Anonymous (February 20, 2011 at 3:33 AM):
It helps me just little..tanx..
Anonymous (February 20, 2011 at 3:35 AM):
Gud day every1! anyone who knows how to label plots in 3Dscatterplots..?
Anonymous (April 27, 2011 at 1:58 AM):
I tried to plot data, but I get the following error message even though I followed the tutorial step by step:

Error in text.default(X[posXposY], Y[posXposY], labs[posXposY], adj = c(-0.3, :
plot.new has not been called yet

Thanks for the tutorials. Great job.
John (April 27, 2011 at 7:44 AM):
This error indicates that no plot exists for the labels to be added to. You need to create your plot first, then use textxy() to label it.
Mari (August 8, 2011 at 7:12 PM):
Hi, John!
I found your textxy very useful and although I managed to use it without any problems yesterday, today I'm getting the following error message:

Error in if (sum(posXposY) > 0) text(X[posXposY], Y[posXposY], labs[posXposY], :
missing value where TRUE/FALSE needed

Have you any idea of what may be going wrong?
Thanks!
John (August 9, 2011 at 8:37 AM):
Hi and thanks.

My guess from the error is that you have missing values (NA) in your data, so the condition in the if statement does not evaluate to TRUE or FALSE. You should try removing the NA values.

John
Abhishek (March 22, 2012 at 5:00 AM):
What is the function to reduce the font size for each data point in with()?
JessMdS (March 28, 2012 at 2:27 AM):
Hi, does anyone know how to reposition the point labels?
Anonymous (January 26, 2013 at 4:25 PM):
Hi guys,

How do you keep the labels from overlapping each other or from going off the graph?

Thanks.
Anonymous (July 22, 2013 at 1:04 PM):
Hi Guys,
I have an RA plot on which I want to annotate each point (about 850 points) using textxy(x,y,z). The points are annotated successfully, but all the labels are placed in the upper part of the graph... any idea how to fix this?

Thanks
Anonymous (July 17, 2014 at 11:35 PM):
Hi,
I have a boxplot and I want to label the max, mean, median, etc. values in my boxplot. How can I use textxy?

Thanks,
winspius5
