By John M Quick

The R Tutorial Series provides a collection of user-friendly tutorials to people who want to learn how to use R for statistical analysis.




R Tutorial Series: Labeling Data Points on a Plot

Labeling a plot's data points can be very useful, for example when conveying information in certain visuals or when looking for patterns in our data. Fortunately, labeling the individual data points on a plot is a relatively simple process in R. In this tutorial, we will use the calibrate package's textxy() function to label the points on a scatterplot.

Tutorial Files

Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains information used to estimate undergraduate enrollment at the University of New Mexico (Office of Institutional Research, 1990). Note that this tutorial assumes that this data has already been read into R and saved into a variable named enrollmentData.
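The read step assumed above might look like the following sketch. The file name enrollmentData.csv is a hypothetical stand-in for the downloaded file, and the rows written here are made-up placeholders so that the snippet runs on its own; they are not the real enrollment figures. The column names YEAR, ROLL, and UNEM are the ones used later in this tutorial.

```r
# Hypothetical stand-in for the downloaded file: a temporary CSV with
# made-up placeholder rows (NOT the real enrollment data).
csvPath <- file.path(tempdir(), "enrollmentData.csv")
writeLines(c("YEAR,ROLL,UNEM",
             "1,100,8.0",
             "2,200,7.5",
             "3,300,9.0"), csvPath)

# read the file into a data frame named enrollmentData
enrollmentData <- read.csv(csvPath, header = TRUE)

# confirm that the expected columns are present
str(enrollmentData)
```

With the real file saved in the R working directory, csvPath would simply be replaced by that file's name.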

Plot

To begin, we need to create a scatterplot using the plot(x,y) function. With our example data, we will plot the year on the x axis and the unemployment rate on the y axis.
  > #generate a plot using the plot(x,y) function
  > #plot year on the x axis and unemployment rate on the y axis
  > plot(enrollmentData$YEAR, enrollmentData$UNEM)

For a more detailed description of plotting data in R, see the article on scatterplots.

Textxy

Within the calibrate package, the textxy() function can be used to label a plot's data points. The textxy() function accepts the following arguments ("Label points in a plot," n.d.).
Required
  • x: the x values of the plot's points
  • y: the y values of the plot's points
  • labs: the labels to be associated with the plot's points
Optional
  • cx: used to resize the label font
  • dcol: used to set the label color; defaults to black
  • m: sets the origin of the plot; defaults to (0,0)

Here, we will use textxy() to add labels for the enrollment at the University of New Mexico to each of our plot's data points.
  > #if necessary, install the calibrate package
  > #install.packages("calibrate")
  > #load the calibrate package
  > library(calibrate)
  > #use the textxy() function to add labels to the preexisting plot's points
  > #add labels for the total enrollment
  > textxy(enrollmentData$YEAR, enrollmentData$UNEM, enrollmentData$ROLL)

In this case, adding labels to our data points helps us to better assess the relationships in our dataset.
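
For readers who do not have calibrate installed, a similar result can be sketched with base R's text() function, which is not the method used in this tutorial. The pos and cex arguments are standard text() arguments; the data frame below holds made-up placeholder values standing in for enrollmentData.

```r
# made-up placeholder values standing in for enrollmentData
enrollmentData <- data.frame(YEAR = 1:5,
                             UNEM = c(6.0, 7.0, 6.5, 8.0, 7.5),
                             ROLL = c(100, 200, 300, 400, 500))

# when run as a script, draw to a null device so no file is written
if (!interactive()) pdf(NULL)

# create the scatterplot first, leaving some headroom for the labels
plot(enrollmentData$YEAR, enrollmentData$UNEM, ylim = c(5.5, 9))

# base R alternative to textxy(): pos = 3 places each label above its
# point and cex = 0.7 shrinks the label font
text(enrollmentData$YEAR, enrollmentData$UNEM,
     labels = enrollmentData$ROLL, pos = 3, cex = 0.7)
```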

Complete Data Point Labeling Example

To see a complete example of how a plot's data points can be labeled in R, please download the Data Point Labeling (.txt) file.

References

Label points in a plot. (n.d.). Retrieved September 19, 2010, from http://rss.acs.unt.edu/Rdoc/library/calibrate/html/textxy.html
Office of Institutional Research. (1990). Enrollment Forecast [Data file]. Retrieved November 22, 2009, from http://lib.stat.cmu.edu/DASL/Datafiles/enrolldat.html

25 comments:

    1. What does this textxy command get you over the regular R text() command?

    2. textxy didn't work well for me when the points were dense: the overlapping point labels made the plot less useful. Is there any way to deal with that?

    3. I posted textxy, because it was what I found that worked and thought it would be useful for others, too. Naturally, there are many ways to do anything in any piece of software, so it is not necessarily the definitive choice, nor did I represent every possible choice.

      For formatting of dense points, I recommend making the graphic window larger and playing with the font size (cx argument) to start. You can also manipulate the chart itself (axes, scaling, values, spacing) to try for a different look.

    4. My data is a little different

      gene Average_NP Average_P
      Actrt2 0 1.63861E-13
      Amy2 8.82536E-17 5.67038E-18
      Cyp4f40 5.93776E-16 5.3336E-17
      Gp1bb 5.18189E-27 1.21771E-28
      Gsta2 0 5.74168E-18
      Hsd17b6 1.74952E-22 4.86044E-22
      Krt4 1.12772E-30 9.15714E-41
      LOC367975 9.84506E-22 4.51166E-23
      LOC689412 0 3.6644E-26
      Olr1191 0 8.01645E-12
      Pnma5 4.30767E-37 9.68314E-39
      RGD1559903 5.65043E-14 6.2156E-25
      Serpinb9e 2.55785E-33 3.01634E-28
      Vom2r46 7.11398E-16 2.64378E-12
      Mpeg1 10.26345375 1.340263375
      Vegfb 2.56374875 7.02261125
      Gdap1 16.6410125 28.6131375
      Freq 62.06075 36.6501125
      Slc25a23 37.398325 19.861505
      Zfand5 34.0225125 48.28675
      Exoc2 19.3219 25.627375
      Pcdhgc3 13.01275875 7.30436375

      How do I plot this data, where each point corresponds to a gene name ?

    5. Hi Ray,

      You should be able to use the same method demonstrated in the article. Your labels would be the gene column and your x and y would be the Average_NP and Average_P columns. For example:

      > plot(Average_NP, Average_P)
      > textxy(Average_NP, Average_P, gene)

      Replies
      1. Hello there,

        Very helpful posting, thank you. I have a question related to displaying labels on the type of data set described above (gene names and two columns of numbers). How might one label only a subset of those data points? For example, how could I label only those points that have been found to be significantly different from each other? I have the list of those genes saved as an object in R, but I would like to identify those genes on a scatterplot and label only those genes. Haven't seen anyone show an example of how to do that, perhaps someone could give some advice?
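
One possible way to label only a subset is to build a logical index with %in% and pass only the matching rows to the labeling call. The sketch below rests on several assumptions: the data frame name geneData and the vector sigGenes are hypothetical, the rows are a few copied from the data posted above, the choice of "significant" genes is arbitrary, and base R's text() stands in for textxy() so that the snippet runs without calibrate.

```r
# a few rows copied from the data posted above; geneData is a hypothetical name
geneData <- data.frame(
  gene       = c("Mpeg1", "Vegfb", "Gdap1", "Freq", "Slc25a23", "Zfand5"),
  Average_NP = c(10.26345375, 2.56374875, 16.6410125,
                 62.06075, 37.398325, 34.0225125),
  Average_P  = c(1.340263375, 7.02261125, 28.6131375,
                 36.6501125, 19.861505, 48.28675)
)

# hypothetical list of genes to highlight (chosen arbitrarily for illustration)
sigGenes <- c("Freq", "Zfand5")

# TRUE for rows whose gene name appears in sigGenes
isSig <- geneData$gene %in% sigGenes

# when run as a script, draw to a null device so no file is written
if (!interactive()) pdf(NULL)

# plot every point, then label only the selected ones (pos = 2: left of point)
plot(geneData$Average_NP, geneData$Average_P)
text(geneData$Average_NP[isSig], geneData$Average_P[isSig],
     labels = geneData$gene[isSig], pos = 2, cex = 0.8)
```

The same three subsetted vectors could be passed to textxy() in place of text().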

    6. I found textxy() very useful. The regular R text() function overlays the labels on the points such that I cannot make out the points, which have different symbols depending on the category to which they belong.

      How do I change the text size of the labels with textxy()?

    7. Hi. You can use the optional cx argument to scale the font size in your chart. For example: textxy(enrollmentData$YEAR, enrollmentData$UNEM, enrollmentData$ROLL, cx = 2)

      John

    8. It helps me just a little. Thanks.

    9. Good day, everyone! Does anyone know how to label points in 3D scatterplots?

    10. I tried to plot data but I get the following error message although I followed step by step the tutorial:

      Error in text.default(X[posXposY], Y[posXposY], labs[posXposY], adj = c(-0.3, :
      plot.new has not been called yet

      Thanks for the tutorials. Great job.

    11. This error indicates that no plot exists for the labels to be added to. You need to create your plot first, then use textxy() to label it.
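
The ordering requirement can be illustrated with a short sketch. It uses base R's text(), which is the function named in the error message above, rather than textxy() itself; the x, y, and labs vectors are made-up placeholders. Note that it closes any open graphics devices first so that it starts from a clean slate.

```r
# made-up placeholder points and labels
x <- c(1, 2, 3)
y <- c(2, 4, 3)
labs <- c("a", "b", "c")

# start from a clean slate: close any graphics devices that are open
graphics.off()

# when run as a script, draw to a null device so no file is written
if (!interactive()) pdf(NULL)

# labeling before any plot exists fails; capture the error instead of stopping
result <- tryCatch({
  text(x, y, labs)
  "labels added"
}, error = function(e) "no plot yet")

# creating the plot first lets the labels be added without error
plot(x, y, ylim = c(1.5, 4.5))
text(x, y, labs, pos = 3)
```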

    12. Hi, John!
      I found your textxy very useful and although I managed to use it without any problems yesterday, today I'm getting the following error message:

      Error in if (sum(posXposY) > 0) text(X[posXposY], Y[posXposY], labs[posXposY], :
      missing value where TRUE/FALSE needed

      Have you any idea of what may be going wrong?
      Thanks!

      Replies
      1. Your problem here is that you have missing values in your data. One option is:

        plot(dataset[,"x"], dataset[,"y"])
        library(calibrate)
        #keep only the rows that have no missing values
        complete <- complete.cases(dataset)
        textxy(dataset[complete,"x"], dataset[complete,"y"], dataset[complete,"label"])

    13. Hi and thanks.

      My guess from the error is that you have missing values (NA) in your data and therefore the if statement does not evaluate to TRUE or FALSE. You should try removing the NA values.

      John

      Replies
      1. I had the same problem and solved it by removing NA values. Thanks!

      2. I am getting this message too. I need the N/A values in there so do I just add "na.omit" to the command line? If so, whereabouts in the line? This is what I typed:

        > textxy(tdataDf$AgeYears, tdataDf$LLM1_D, tdataDf$CatNo, na.omit)

        and I'm still getting this error:

        Error in if (sum(posXposY) > 0) text(X[posXposY], Y[posXposY], labs[posXposY], :
        missing value where TRUE/FALSE needed
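
Passing na.omit as an extra argument most likely does not filter anything, since it is simply handed to textxy() as another positional argument. A possible alternative is to keep the NA rows in the data and filter only what is passed to the labeling call. The sketch below uses made-up placeholder values in a data frame named tdataDf, with the column names from the comment above, and base R's text() stands in for textxy() so that the snippet runs without calibrate.

```r
# made-up placeholder values; column names follow the comment above
tdataDf <- data.frame(AgeYears = c(2, 5, NA, 9),
                      LLM1_D   = c(10.1, NA, 12.3, 14.8),
                      CatNo    = c("A1", "A2", "A3", "A4"))

# TRUE only for rows with no NA in the three columns used for labeling
keep <- complete.cases(tdataDf[, c("AgeYears", "LLM1_D", "CatNo")])

# when run as a script, draw to a null device so no file is written
if (!interactive()) pdf(NULL)

# plot() skips points with NA coordinates; xlim leaves room on the right
plot(tdataDf$AgeYears, tdataDf$LLM1_D, xlim = c(0, 12))

# label only the complete rows; tdataDf itself still contains the NA rows
text(tdataDf$AgeYears[keep], tdataDf$LLM1_D[keep],
     labels = tdataDf$CatNo[keep], pos = 4)
```

The same keep index could be applied to the three vectors passed to textxy().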

    14. What is the function to reduce the font size for each data point in with()?

      Replies
      1. You can use the cex argument to scale fonts in a plot. I recommend searching Google on this topic, since there is plenty of information out there on your specific question.

    15. Hi, Does anyone know how to reposition the point labels?

      Replies
      1. I ended up having to resize the axes (x axis) with xlim=c(x min,x max). Might be another way to change the physical location of the labels, but altering the x axis worked for what I needed.
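
The axis-widening approach described above might look like this sketch. The vectors are made-up placeholders, the 10% padding is an arbitrary choice, and base R's text() stands in for textxy() so that the snippet runs without calibrate.

```r
# made-up placeholder points and labels
x <- c(1, 2, 3, 4)
y <- c(5, 7, 6, 8)
labs <- c("first", "second", "third", "fourth")

# pad the x axis by 10% of the data range on each side
pad <- 0.1 * diff(range(x))
xLimits <- c(min(x) - pad, max(x) + pad)

# when run as a script, draw to a null device so no file is written
if (!interactive()) pdf(NULL)

# the wider x axis leaves room for labels drawn to the right of each point
plot(x, y, xlim = xLimits)
text(x, y, labs, pos = 4, cex = 0.8)
```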

      2. Hi!

        I have been searching for something similar and came to the conclusion that identify() worked best for me. You can even suggest the position of the labels by clicking on the empty space.

        Greetings!

    16. Hi guys,

      How do you keep the labels from overlapping each other or from going off the graph?

      Thanks.

    17. Hi Guys,
      I have an RA plot on which I want to annotate each point (about 850 points) using textxy(x,y,z). The points are annotated successfully, but all of the labels are placed in the upper part of the graph. Any idea how to fix this?

      Thanks

    18. Hi,
      I have a boxplot and I want to label the max, mean, median, etc. values in my boxplot. How can I use textxy?

      Thanks,
      winspius5
