Tutorial Files
Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains information used to estimate undergraduate enrollment at the University of New Mexico (Office of Institutional Research, 1990). Note that this tutorial assumes that this data has already been read into R and saved into a variable named enrollmentData.
Plot
To begin, we need to create a scatterplot using the plot(x,y) function. With our example data, we will plot the year on the x axis and the unemployment rate on the y axis.
- > #generate a plot using the plot(x,y) function
- > #plot year on the x axis and unemployment rate on the y axis
- > plot(enrollmentData$YEAR, enrollmentData$UNEM)
Textxy
Within the calibrate package, the textxy() function can be used to label a plot's data points. The textxy() function accepts the following arguments ("Label points in a plot," n.d.).
- Required
  - x: the x values of the plot's points
  - y: the y values of the plot's points
  - labs: the labels to be associated with the plot's points
- Optional
  - cx: used to resize the label font
  - dcol: used to set the label color; defaults to black
  - m: sets the origin of the plot; defaults to (0,0)
- > #if necessary, install the calibrate package
- > #install.packages("calibrate")
- > #load the calibrate package
- > library(calibrate)
- > #use the textxy() function to add labels to the preexisting plot's points
- > #add labels for the total enrollment
- > textxy(enrollmentData$YEAR, enrollmentData$UNEM, enrollmentData$ROLL)
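The two steps above can be combined into one short script. The following is a minimal sketch, not the tutorial's own file: demoData is a hypothetical stand-in for enrollmentData with invented values, and the base R text() call is only a fallback for machines where calibrate is not installed.

```r
# Hypothetical stand-in for enrollmentData; these values are invented
# for illustration and are NOT the real enrollment figures.
demoData <- data.frame(
  YEAR = 1:5,
  UNEM = c(6.0, 7.0, 5.0, 8.0, 6.5),
  ROLL = c(100, 200, 300, 400, 500)
)

# When run as a script, draw on a null device so no Rplots.pdf is written
if (!interactive()) pdf(NULL)

# Step 1: create the scatterplot first, so there is a plot to label
plot(demoData$YEAR, demoData$UNEM,
     xlab = "Year", ylab = "Unemployment rate")

# Step 2: add the labels to the existing plot
if (requireNamespace("calibrate", quietly = TRUE)) {
  calibrate::textxy(demoData$YEAR, demoData$UNEM, demoData$ROLL)
} else {
  # Base R fallback: pos = 3 places each label just above its point
  text(demoData$YEAR, demoData$UNEM, labels = demoData$ROLL,
       pos = 3, cex = 0.7)
}
```

The order matters: the labeling call draws on whatever plot is currently open, so plot() has to run before it.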
Complete Data Point Labeling Example
To see a complete example of how a plot's data points can be labeled in R, please download the Data Point Labeling (.txt) file.
References
Label points in a plot. (n.d.). Retrieved September 19, 2010, from http://rss.acs.unt.edu/Rdoc/library/calibrate/html/textxy.html
Office of Institutional Research. (1990). Enrollment Forecast [Data file]. Retrieved November 22, 2009, from http://lib.stat.cmu.edu/DASL/Datafiles/enrolldat.html
What does this textxy command get you over the regular R text() command?
textxy didn't work well for me when the points were dense... overlapping point labels made the plot less useful... any way to deal with that?
I posted textxy because it was what I found that worked, and I thought it would be useful for others, too. Naturally, there are many ways to do anything in any piece of software, so it is not necessarily the definitive choice, nor did I represent every possible choice.
For formatting of dense points, I recommend making the graphic window larger and playing with the font size (cx argument) to start. You can also manipulate the chart itself (axes, scaling, values, spacing) to try for a different look.
My data is a little different
gene Average_NP Average_P
Actrt2 0 1.63861E-13
Amy2 8.82536E-17 5.67038E-18
Cyp4f40 5.93776E-16 5.3336E-17
Gp1bb 5.18189E-27 1.21771E-28
Gsta2 0 5.74168E-18
Hsd17b6 1.74952E-22 4.86044E-22
Krt4 1.12772E-30 9.15714E-41
LOC367975 9.84506E-22 4.51166E-23
LOC689412 0 3.6644E-26
Olr1191 0 8.01645E-12
Pnma5 4.30767E-37 9.68314E-39
RGD1559903 5.65043E-14 6.2156E-25
Serpinb9e 2.55785E-33 3.01634E-28
Vom2r46 7.11398E-16 2.64378E-12
Mpeg1 10.26345375 1.340263375
Vegfb 2.56374875 7.02261125
Gdap1 16.6410125 28.6131375
Freq 62.06075 36.6501125
Slc25a23 37.398325 19.861505
Zfand5 34.0225125 48.28675
Exoc2 19.3219 25.627375
Pcdhgc3 13.01275875 7.30436375
How do I plot this data, where each point corresponds to a gene name?
Hi Ray,
You should be able to use the same method demonstrated in the article. Your labels would be the gene column and your x and y would be the Average_NP and Average_P columns. For example:
> plot(Average_NP, Average_P)
> textxy(Average_NP, Average_P, gene)
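The two calls above assume the columns are available as standalone vectors. Here is a minimal sketch of the same idea with the table held in a data frame; the name geneData is an assumption, only a few rows from the comment are copied in, and text() is a fallback for when calibrate is not installed.

```r
# A few rows copied from the comment above, parsed into a data frame.
# geneData is a hypothetical name chosen for this sketch.
geneData <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
gene Average_NP Average_P
Mpeg1 10.26345375 1.340263375
Vegfb 2.56374875 7.02261125
Gdap1 16.6410125 28.6131375
Freq 62.06075 36.6501125
Zfand5 34.0225125 48.28675
")

# When run as a script, draw on a null device so no Rplots.pdf is written
if (!interactive()) pdf(NULL)

# x and y come from the two numeric columns
plot(geneData$Average_NP, geneData$Average_P,
     xlab = "Average_NP", ylab = "Average_P")

# The gene column supplies one label per point
if (requireNamespace("calibrate", quietly = TRUE)) {
  calibrate::textxy(geneData$Average_NP, geneData$Average_P, geneData$gene)
} else {
  text(geneData$Average_NP, geneData$Average_P,
       labels = geneData$gene, pos = 3, cex = 0.7)
}
```

With the full table, the values span many orders of magnitude, so most of the small values would likely pile up near the origin and their labels would overlap.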
Hello there,
Very helpful posting, thank you. I have a question related to displaying labels on the type of data set described above (gene names and two columns of numbers). How might one label only a subset of those data points? For example, how could I label only those points that have been found to be significantly different from each other? I have the list of those genes saved as an object in R, but I would like to identify those genes on a scatterplot and label only those genes. Haven't seen anyone show an example of how to do that, perhaps someone could give some advice?
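One possible approach to labeling only a subset is to build a logical index with %in% and pass just the matching rows to the labeling call. The sketch below is an illustration under assumptions: geneData and sigGenes are hypothetical names, and the "significant" genes are picked arbitrarily. It uses base R text() so it runs without extra packages; the same subsetted vectors should work with textxy(), since it takes x, y, and labels in the same order.

```r
# Hypothetical data frame built from a few rows of the table above
geneData <- data.frame(
  gene       = c("Mpeg1", "Vegfb", "Gdap1", "Freq", "Zfand5"),
  Average_NP = c(10.26345375, 2.56374875, 16.6410125, 62.06075, 34.0225125),
  Average_P  = c(1.340263375, 7.02261125, 28.6131375, 36.6501125, 48.28675),
  stringsAsFactors = FALSE
)

# Hypothetical list of genes of interest (chosen arbitrarily here);
# in practice this would be the object already saved in R
sigGenes <- c("Vegfb", "Freq")

# TRUE for rows whose gene name appears in sigGenes
isSig <- geneData$gene %in% sigGenes

# When run as a script, draw on a null device so no Rplots.pdf is written
if (!interactive()) pdf(NULL)

# Plot every point, highlighting the subset in red
plot(geneData$Average_NP, geneData$Average_P,
     col = ifelse(isSig, "red", "black"),
     xlab = "Average_NP", ylab = "Average_P")

# Label only the points in the subset
text(geneData$Average_NP[isSig], geneData$Average_P[isSig],
     labels = geneData$gene[isSig], pos = 3, cex = 0.8)
```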
I found textxy() very useful. The regular R text() function overlays the labels on the points such that I cannot make out the points, which have different symbols depending on the category to which they belong.
How do I change the text size of the labels with textxy?
Hi. You can use the optional cx argument to scale the font size in your chart. For example: textxy(enrollmentData$YEAR, enrollmentData$UNEM, enrollmentData$ROLL, cx = 2)
John
It helps me just little..tanx..
Gud day every1! anyone who knows how to label plots in 3Dscatterplots..?
I tried to plot data but I get the following error message although I followed the tutorial step by step:
Error in text.default(X[posXposY], Y[posXposY], labs[posXposY], adj = c(-0.3, :
plot.new has not been called yet
Thanks for the tutorials. Great job.
This error indicates that no plot exists for the labels to be added to. You need to create your plot first, then use textxy() to label it.
Hi, John!
I found your textxy very useful and although I managed to use it without any problems yesterday, today I'm getting the following error message:
Error in if (sum(posXposY) > 0) text(X[posXposY], Y[posXposY], labs[posXposY], :
missing value where TRUE/FALSE needed
Have you any idea of what may be going wrong?
Thanks!
Your problem here is that you have missing values in your data. One option is:
plot(dataset[,"x"], dataset[,"y"])
library(calibrate)
# keep only the rows with no missing values before labeling
ok <- complete.cases(dataset)
textxy(dataset[ok,"x"], dataset[ok,"y"], dataset[ok,"label"])
Hi and thanks.
My guess from the error is that you have missing values (NA) in your data and therefore your if statement does not come out to true or false. You should try removing the NA values.
John
I had the same problem and solved it by removing NA values. Thanks!
I am getting this message too. I need the N/A values in there so do I just add "na.omit" to the command line? If so, whereabouts in the line? This is what I typed:
> textxy(tdataDf$AgeYears, tdataDf$LLM1_D, tdataDf$CatNo, na.omit)
and I'm still getting this error:
Error in if (sum(posXposY) > 0) text(X[posXposY], Y[posXposY], labs[posXposY], :
missing value where TRUE/FALSE needed
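As far as the argument list in the article shows, na.omit is not an option of textxy(); it is a function that filters data, so passing it as an extra argument does not remove anything. One way to keep the NA values in the data frame while labeling only the complete points is to filter just the vectors handed to the labeling call. The sketch below is an illustration under assumptions: the tdataDf values are invented, and text() is a fallback for when calibrate is not installed.

```r
# Hypothetical stand-in for tdataDf; values are invented for illustration
tdataDf <- data.frame(
  AgeYears = c(2, 5, NA, 9, 12),
  LLM1_D   = c(10.5, NA, 12.0, 14.2, 15.1),
  CatNo    = c("A1", "A2", "A3", "A4", "A5"),
  stringsAsFactors = FALSE
)

# TRUE for rows where both plotted coordinates are present
keep <- complete.cases(tdataDf[, c("AgeYears", "LLM1_D")])

# Filtered copy used only for labeling; tdataDf itself keeps its NA rows
labelled <- tdataDf[keep, ]

# When run as a script, draw on a null device so no Rplots.pdf is written
if (!interactive()) pdf(NULL)

# plot() leaves out points that have an NA coordinate
plot(tdataDf$AgeYears, tdataDf$LLM1_D)

# Label only the rows that have both coordinates
if (requireNamespace("calibrate", quietly = TRUE)) {
  calibrate::textxy(labelled$AgeYears, labelled$LLM1_D, labelled$CatNo)
} else {
  text(labelled$AgeYears, labelled$LLM1_D, labels = labelled$CatNo, pos = 3)
}
```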
What is the function to reduce the font size for each data point in with()?
You can use the cex argument to scale fonts in a plot. I recommend searching Google on this topic, since there is plenty of information out there on your specific question.
Hi, does anyone know how to reposition the point labels?
I ended up having to resize the axes (x axis) with xlim=c(x min,x max). Might be another way to change the physical location of the labels, but altering the x axis worked for what I needed.
Hi!
I have been searching for something similar and I came to the conclusion that identify() worked best for me. You can even suggest the position of the labels by clicking on the empty space.
Greetings!
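Besides widening the axes or using identify(), base R's text() function offers direct control over label placement through its pos and offset arguments. The sketch below is only an illustration with invented points; it uses text() rather than textxy().

```r
# Invented points and labels, for illustration only
x    <- c(1, 2, 3, 4)
y    <- c(2, 4, 3, 5)
labs <- c("a", "b", "c", "d")

# When run as a script, draw on a null device so no Rplots.pdf is written
if (!interactive()) pdf(NULL)

# Widen the axes a little so labels near the edges stay inside the plot
plot(x, y, xlim = c(0, 5), ylim = c(1, 6))

# pos: 1 = below, 2 = left, 3 = above, 4 = right (one value per point)
labelPos <- c(1, 3, 1, 3)

# offset sets the distance from the point, in character widths
text(x, y, labels = labs, pos = labelPos, offset = 0.6, cex = 0.8)
```

Alternating pos values between neighboring points is a simple way to reduce overlap when points sit close together, although it will not fully untangle very dense plots.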
Hi guys,
How do you keep the labels from overlapping each other or from going off the graph?
Thanks.
Hi Guys,
I have an RA plot on which I want to annotate each point (about 850 points) using textxy(x,y,z). I do successfully get the points annotated, but all the labels are placed in the upper part of the graph... any idea how to fix this?
Thanks
Hi,
I have a boxplot and I want to label the max, mean, median, etc. values in my boxplot. How can I use textxy?
Thanks,
winspius5