Data Visualization Practices
Page Editor: @allopole
To request edits to this page, open an issue, and tag @allopole
These resources offer ways to visualize data to support your narrative and tell a story, but don't focus on using a specific type of software or coding language.
- Edward Tufte: Very minimalist approach to data visualization. Some of his books are on the lab bookshelf.
- Alberto Cairo: He has two books, The Functional Art and The Truthful Art, that offer a lot of guidance on how to use data visualizations to tell a story. His visualization wheel can be especially helpful.
- Claus Wilke's Fundamentals of Data Visualization: Wilke contributes to a lot of ggplot geometries and add-ons, and this book (made using bookdown in R!) is a good introduction to some general rules of thumb for data visualization.
- R graph gallery: This is more R-specific, but does give you examples of the types of plots you can make, with code to create them, if you're struggling to think of one.
There are many ways to visualize data in R, but the two largest camps are base R and ggplot. The two approaches use very different syntax, so they are generally not compatible with each other (e.g. you can't use a base R call to set colors within ggplot). Generally, it is best to choose whichever one you like and stick with it, becoming an expert in that method. Familiarity with base R plotting, however, is probably always a good idea.
Within the Drake lab, we have a fairly even split between base R and ggplot.
Paul Murrell summarizes the differences between base R and ggplot in this article: http://onlinelibrary.wiley.com.proxy-remote.galib.uga.edu/doi/10.1002/wics.22/full.
There are two base R packages: the default graphics package, and the lattice package, which use different syntaxes and are not mutually compatible. The lattice package is better suited to multi-panel plots such as trellis plots, but both systems can be used for most basic plot types. The lattice package must be loaded with library(lattice) before it can be used.
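As a minimal sketch of the difference between the two systems, the example below draws the same built-in dataset (mtcars) once with graphics and once with lattice. The output file name and the choice of variables are assumptions made for illustration only.

```r
# Same data drawn with both systems. Output goes to a temporary PDF
# so the example runs without an interactive display.
library(lattice)  # lattice is not attached by default

out_file <- tempfile(fileext = ".pdf")  # hypothetical output location
pdf(out_file, width = 7, height = 5)

# graphics: one call draws the plot, later calls add elements to it
plot(mpg ~ wt, data = mtcars, col = mtcars$cyl,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
legend("topright", legend = sort(unique(mtcars$cyl)),
       col = sort(unique(mtcars$cyl)), pch = 1, title = "Cylinders")

# lattice: the `| factor(cyl)` term in the formula requests one panel per group
trellis_plot <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
                       layout = c(3, 1),
                       xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(trellis_plot)  # inside scripts and functions, lattice objects must be printed to draw

invisible(dev.off())
```

The graphics version shows groups by color in a single panel, while the lattice version splits them into panels from the formula alone, which is where lattice tends to be more convenient.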
Paul Murrell covers both systems in his book R Graphics (2nd edition), and all the R code from the book is available here: https://www.stat.auckland.ac.nz/~paul/RG2e/.
ggplot2 implements Leland Wilkinson's Grammar of Graphics paradigm for specifying plots. Specifically, it implements a layered grammar of graphics, in which a plot is built up from layers that all follow the same grammar. ggplot2 is part of Hadley Wickham's tidyverse collection of data science R packages and conforms to tidyverse conventions, so it works best when your data are in "long format" (one row per observation).
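The sketch below illustrates both ideas: reshaping a small table into long format and building a plot layer by layer. The data frame and its column names (day, site_a, site_b) are made up for the example, and the reshaping is done by hand in base R rather than with a tidyverse helper.

```r
library(ggplot2)

# Wide format: one column per measured group (hypothetical example data)
wide <- data.frame(
  day    = 1:5,
  site_a = c(2.1, 2.4, 3.0, 3.8, 4.1),
  site_b = c(1.8, 2.0, 2.1, 2.9, 3.5)
)

# Long format: one row per observation, with a column naming the group
long <- data.frame(
  day   = rep(wide$day, times = 2),
  site  = rep(c("site_a", "site_b"), each = nrow(wide)),
  value = c(wide$site_a, wide$site_b)
)

# Each `+` adds a layer or component to the same plot specification
p <- ggplot(long, aes(x = day, y = value, colour = site)) +
  geom_line() +
  geom_point() +
  labs(x = "Day", y = "Measured value", colour = "Site")
```

Because the group is a column of its own, a single colour mapping covers both sites; in wide format each site would need its own layer.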
R for Data Science, by Hadley Wickham and Garrett Grolemund, is the go-to online resource on tidyverse, including ggplot2. It is also available in print.
There are numerous official extensions to ggplot2.
Many special purpose R packages exist, too numerous to list comprehensively here. Useful ones we have used include:
See our List-of-packages for more.
There are lots of topical lists, such as https://github.com/uhub/awesome-r, and CRAN Task Views.
If you are doing exploratory analysis, it is sometimes helpful to be able to interact with your visualization to explore it. The two most useful R packages to help with this are:
Shiny is used to build interactive visualizations called "Shiny apps" that render in a browser.
Plotly is a complete Grammar of Graphics implementation, with API libraries in R, Matlab, Python, etc., in addition to a web service. R plotly can use either its own syntax, or interface with ggplot2. Plotly can also be used in Shiny apps, or to make interactive dashboards without using Shiny.
Plotly tends to be the more straightforward of the two, unless you are trying to build a completely interactive site, which is what Shiny is better suited for.
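As a rough illustration of the two ways of using plotly from R, the sketch below builds one interactive scatter plot with plotly's own syntax and one by converting a ggplot2 object. It assumes the plotly and ggplot2 packages are installed; the dataset and variables are chosen only for the example.

```r
library(ggplot2)
library(plotly)

# plotly's own syntax: variables are given as one-sided formulas
fig_native <- plot_ly(data = mtcars, x = ~wt, y = ~mpg,
                      type = "scatter", mode = "markers")

# Converting an existing ggplot2 object into an interactive plot
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
fig_converted <- ggplotly(p)
```

Printing either object in an interactive session opens the plot in the viewer or a browser, with hover, zoom and pan enabled.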
Static and interactive plots made with base R, ggplot2, shiny, or plotly, can all be embedded in R Markdown documents to create fully interactive reports.
The best way to export figures is through a script and not via the 'Export' button on the viewer. In the script, the plotting code is placed between a call that opens a specific "graphics device", such as pdf(), and a call to dev.off() that closes it. A script allows you to control the file type, resolution, size, fonts, etc. much more easily than the viewer does. Learn more about the graphic device call for each file type here.
In general, vector images (e.g. pdf, eps, svg) allow for the highest resolution without too large a file size. Vector images can be printed at any size without loss of quality. PDF is the most reliable format; however, you cannot be 100% sure that the file will display in exactly the same way on all systems. Fonts may differ and graphic elements may have a slightly different appearance. The EPS format is not recommended because R EPS files usually have large masks that extend the image way beyond the defined size.
Many publications require raster images (e.g. tiff, png, jpeg), which will always look exactly as you exported them. Most publications require at least 300 ppi (pixels per inch), often best saved as a tiff file, if the publication will not accept pdf. Tiff files should use LZW compression, as they are very large otherwise.
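To see why uncompressed raster files get large, the arithmetic can be sketched as follows. The helper name required_pixels and the 7.5 x 5 inch figure size are assumptions for illustration, and the size estimate assumes 8-bit RGB with no compression.

```r
# Pixel dimensions of a raster figure = size in inches * resolution in ppi
required_pixels <- function(width_in, height_in, ppi = 300) {
  c(width = width_in * ppi, height = height_in * ppi)
}

px <- required_pixels(width_in = 7.5, height_in = 5)
print(px)  # width = 2250, height = 1500

# Uncompressed size estimate, assuming 8-bit RGB (3 bytes per pixel)
uncompressed_mb <- prod(px) * 3 / 1024^2
```

A single figure of this size is already close to 10 MB before compression, which is why LZW compression is worth enabling for tiff output.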
- Use the PDF device, setting the figure dimensions, font and default font size in the call to pdf(). Ensure all text in your plot (inside the call to pdf()) fits the publisher's allowed size range (for example, by specifying a multiple of the default font size using the cex property).
Note: R has only 3 built-in fonts: Times, Helvetica and Courier. If you want to avoid installing fonts, use one of these three. See http://blog.revolutionanalytics.com/2012/09/how-to-use-your-favorite-fonts-in-r-charts.html for more on installing fonts and embedding them in PDFs.
- Open the saved pdf file in GIMP (Linux) or Photoshop (Mac, Windows). Export as TIFF with the LZW option checked.
Here is an example R script using pdf() to draw a figure conforming to PLOS figure guidelines:
# Typography
font.family <- "Times" # Must be allowed by publisher and must be installed on your system.
font.sizes <- seq(from = 8,        # publisher's minimum point size (points)
                  to = 12,         # publisher's maximum point size (points)
                  length.out = 5)
font.size.normal <- mean(font.sizes)
font.scales <- font.sizes / mean(font.sizes) # multipliers relative to the normal size
names(font.scales) <- names(font.sizes) <- c("XS", "S", "M", "L", "XL")

# Figure dimensions
figure.widths <- c(min = 2.63, page = 7.5, column = 5.2) # in inches, as defined by publisher
figure.heights <- c(min = 1, page = 8.75) # in inches, as defined by publisher

# PDF output
dir.create("./output/plots", recursive = TRUE, showWarnings = FALSE) # pdf() does not create missing folders
pdf(
  file = "./output/plots/fig1.pdf",
  title = "Figure 1",                    # displayed in title bar of PDF Reader
  width = figure.widths["page"],         # full width, in inches
  height = figure.heights["page"] * 0.7, # 70% of full height, in inches
  family = font.family,                  # defined above
  pointsize = font.size.normal           # default (normal) size of text (in points). Defined above.
)

# Put your plots here. Specify a font scale factor of XS, S, M, L, or XL:
plot(0:10, ann = FALSE, cex = font.scales["M"])
# legend(cex = font.scales["XS"], ...) # etc.

# close PDF file
invisible(dev.off())
The tiff() device can be used directly, although this is less reliable than pdf().
tiff(filename = "fig1.tiff", res = 300, compression = "lzw", height=5.2, width=6, units="in")
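The tiff() call above only opens the device; the plotting code and a closing dev.off() still have to follow it. A minimal end-to-end sketch, assuming a temporary output path and a placeholder plot, might look like this:

```r
# Guarded because not every R build is compiled with TIFF support
if (capabilities("tiff")) {
  tiff_file <- file.path(tempdir(), "fig1.tiff")  # hypothetical output path
  tiff(filename = tiff_file, res = 300, compression = "lzw",
       height = 5.2, width = 6, units = "in")
  plot(0:10, ann = FALSE)  # placeholder plot
  invisible(dev.off())     # the file is only complete once the device is closed
}
```

With units = "in" and res = 300, the pixel dimensions are derived from the size in inches, so the same width and height values can be reused from a pdf() call.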
ggplot2 makes it easier to save a single plot in multiple formats.
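One way to do this is with ggsave(), which picks the graphics device from the file extension. The sketch below saves the same plot as PDF and PNG; the output folder, file names and figure size are assumptions for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

out_dir <- tempdir()  # hypothetical output folder; replace with your own
for (ext in c("pdf", "png")) {
  ggsave(
    filename = file.path(out_dir, paste0("fig1.", ext)),
    plot = p,
    width = 6, height = 5.2, units = "in",
    dpi = 300  # applies to raster formats only
  )
}
```

Passing the plot object explicitly, rather than relying on the last plot displayed, keeps the script reproducible when it is run non-interactively.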
If exported figures are included in a git repository, the exported images should be tracked with git-lfs (Git Large File Storage). "Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise." Follow the instructions to install git-lfs, add lfs to your repository, and specify which file types or folders to track with lfs. Thereafter, lfs works invisibly and you simply commit and push changes as usual. Note: git-lfs is useful for tracking large data files as well as graphics.
- journal-club doc
- google-sites lab manual
- index of all Drake-lab google sites
- lab-meeting--minutes doc
Contact John if you are having trouble accessing google docs or websites.
- repository of public domain images