Data Visualization Practices

Page Editor: @allopole
To request edits to this page, open an issue and tag @allopole.

General guidance for data visualization

These resources offer ways to visualize data that support your narrative and tell a story, without focusing on a specific software package or programming language.

Edward Tufte: A very minimalist approach to data visualization. Some of his books are on the lab bookshelf.
Alberto Cairo: He has two books, The Functional Art and The Truthful Art, that offer a lot of guidance on how to use data visualizations to tell a story. His visualization wheel can be especially helpful.
Claus Wilke's Fundamentals of Data Visualization: Wilke has contributed many ggplot2 geometries and add-ons, and this book (made using bookdown in R!) is a good introduction to general rules of thumb for data visualization.
R graph gallery: This is R-specific, but it shows examples of many plot types, each with the code to create it, if you're struggling to choose one.

Data visualization in R

Static (non-interactive) visualizations

There are many ways to visualize data in R, but the two largest camps are base R and ggplot. The two approaches use very different syntax and are generally not compatible with each other (e.g. settings made with base R's par() have no effect on a ggplot). It is generally best to pick whichever one you like and become an expert in it. Some familiarity with base R plotting, however, is always a good idea.

Within the Drake lab, usage is split fairly evenly between base R and ggplot.

Paul Murrell summarizes the differences between base R and ggplot in this article: http://onlinelibrary.wiley.com.proxy-remote.galib.uga.edu/doi/10.1002/wics.22/full.

Base R

R ships with two plotting systems: the default graphics package and the lattice package (a recommended package included with standard R installations). They use different syntaxes and are not mutually compatible. The lattice package is better suited to multi-panel plots such as trellis plots, but both systems can be used for most basic plot types. Unlike graphics, lattice must be loaded with library(lattice) before use.
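A minimal sketch contrasting the two systems, using R's built-in mtcars data set. Base graphics draws immediately. lattice builds a "trellis" object that is drawn when printed. The conditioning formula (`| factor(cyl)`) is what produces one panel per group.

```r
# lattice is bundled with R but must be attached explicitly
library(lattice)

# Base graphics: draws straight onto the current device
plot(mpg ~ wt, data = mtcars, main = "Base graphics")

# lattice: returns a "trellis" object; '| factor(cyl)' conditions on
# cylinder count, giving one panel per level (a trellis plot)
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            main = "lattice: one panel per cylinder count")
print(p)  # lattice plots are only drawn when the object is printed
```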

Paul Murrell covers both systems in his book R Graphics (2nd edition), and all the R code from the book is available here: https://www.stat.auckland.ac.nz/~paul/RG2e/.

ggplot

ggplot2 implements Leland Wilkinson's Grammar of Graphics paradigm for specifying plots. Specifically, it implements a layered grammar of graphics. A plot is built up one layer at a time, and every layer is described with the same grammar: data, aesthetic mappings, a geometric object, a statistical transformation and a position adjustment.
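The layering idea can be sketched with ggplot2's built-in mpg data set. Each `+` adds another component to the same plot specification. The axis labels are illustrative only.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +              # data + aesthetic mappings
  geom_point(aes(colour = class)) +                       # layer 1: raw points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # layer 2: linear fit (a stat)
  labs(x = "Engine displacement (L)", y = "Highway mpg")  # non-layer component

print(p)
```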

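ggplot2 is part of Hadley Wickham's tidyverse collection of data science R packages and follows tidyverse conventions. In particular, it expects data in "long" (tidy) format, with one row per observation and one column per variable, so that each variable can be mapped to an aesthetic.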
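A minimal sketch of reshaping wide data to long format with tidyr's pivot_longer() before plotting. The site/year data frame is hypothetical.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide-format data: one row per site, one column per year
wide <- data.frame(
  site  = c("A", "B"),
  y2019 = c(12, 30),
  y2020 = c(15, 28)
)

# Long format: one row per site-year observation; the "y" prefix is dropped
long <- pivot_longer(wide, cols = c(y2019, y2020),
                     names_to = "year", names_prefix = "y",
                     values_to = "count")

# Every variable is now a column that can be mapped to an aesthetic
p <- ggplot(long, aes(x = year, y = count, colour = site, group = site)) +
  geom_line()
print(p)
```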

R for Data Science, by Hadley Wickham and Garrett Grolemund, is the go-to online resource on the tidyverse, including ggplot2. It is also available in print.

There are numerous extension packages for ggplot2, collected in the official ggplot2 extensions gallery.

Other packages

Many special purpose R packages exist, too numerous to list comprehensively here. Useful ones we have used include:

igraph - network analysis and visualization
leaflet - for interactive maps
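As a small illustration of igraph, here is a sketch that builds an undirected network from an edge list and sizes the vertices by degree (number of connections). The names in the edge list are hypothetical.

```r
library(igraph)

# Hypothetical edge list: who collaborates with whom
edges <- data.frame(
  from = c("Ana", "Ana", "Ben", "Cy"),
  to   = c("Ben", "Cy",  "Cy",  "Dee")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Vertex size proportional to degree
plot(g, vertex.size = 10 * degree(g), vertex.label.cex = 0.8)
```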

See our List-of-packages for more.

Many topical lists exist, such as https://github.com/uhub/awesome-r and the CRAN Task Views.

Interactive visualizations

If you are doing exploratory analysis, it is sometimes helpful to be able to interact with your visualization to explore it. The two most useful R packages to help with this are:

Shiny is used to build interactive visualizations called "Shiny apps" that render in a browser.

Plotly is a complete Grammar of Graphics implementation, with API libraries in R, MATLAB, Python, etc., in addition to a web service. The R plotly package can use either its own syntax or convert existing ggplot2 plots with ggplotly(). Plotly graphs can be used inside Shiny apps, and plotly can also build interactive dashboards without Shiny.
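A minimal sketch of the two ways to build an R plotly graph, using the built-in mtcars data. The first uses plotly's own syntax, where `~` maps a data column to a property. The second converts a ggplot2 object.

```r
library(plotly)
library(ggplot2)

# 1. plotly's own syntax
p1 <- plot_ly(mtcars, x = ~wt, y = ~mpg, color = ~factor(cyl),
              type = "scatter", mode = "markers")

# 2. Convert an existing ggplot2 plot into an interactive plotly graph
g  <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()
p2 <- ggplotly(g)

# In an interactive session, print p1 or p2 to open it in the viewer/browser
```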

Plotly tends to be the more straightforward option, unless you are building a fully interactive site, which Shiny is better suited for.
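For comparison, here is a sketch of a minimal Shiny app in which a slider (the input) controls a base R plot (the output). The input name `n` and output name `scatter` are arbitrary choices for illustration.

```r
library(shiny)

# UI: a slider and a placeholder for the plot
ui <- fluidPage(
  sliderInput("n", "Number of points", min = 10, max = 500, value = 100),
  plotOutput("scatter")
)

# Server: re-draws the plot whenever input$n changes
server <- function(input, output, session) {
  output$scatter <- renderPlot({
    plot(rnorm(input$n), rnorm(input$n), xlab = "x", ylab = "y")
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # launches the app in a browser from an interactive session
```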

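Static and interactive plots made with base R, ggplot2, Shiny or plotly can all be embedded in R Markdown documents to create reports. In HTML output, plotly graphs remain interactive, while embedded Shiny components require the document to run with a Shiny runtime.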

Saving figures in R

The best way to export figures is from a script rather than via the 'Export' button in the viewer. In the script, the plotting code sits between a call that opens a specific "graphics device", such as pdf(), and a call to dev.off() that closes it. A script gives you much finer control than the viewer over file type, resolution, size, fonts, etc. Learn more about the graphics device call for each file type here.
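A minimal sketch of the open-device / draw / close-device pattern, using the png() device and the built-in mtcars data. The output path is a hypothetical temporary file.

```r
# Hypothetical output path (a temporary file for this example)
out_file <- file.path(tempdir(), "fig-example.png")

# Open the device with explicit size and resolution
png(filename = out_file, width = 6, height = 4, units = "in", res = 300)

# Everything drawn now goes to the file, not the screen
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lb)", ylab = "Miles per gallon")

# Close the device; the file is only complete after this
invisible(dev.off())
```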

In general, vector formats (e.g. PDF, EPS, SVG) give the highest quality at a reasonable file size. They store shapes rather than pixels, so they can be printed at any size without loss of quality. PDF is the most reliable of these formats. However, you cannot be 100% sure that the file will display exactly the same way on all systems, because fonts may differ and graphic elements may look slightly different. The EPS format is not recommended, because EPS files produced by R usually have large masks that extend the image well beyond its defined size.

Many publications require raster images (e.g. TIFF, PNG, JPEG), which always look exactly as exported. Most publications require at least 300 ppi (pixels per inch). If the publication will not accept PDF, TIFF is often the best choice. TIFF files should use LZW compression, because they are very large otherwise.
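The pixel dimensions a raster figure needs follow from its physical size multiplied by the required resolution. The sketch below uses a hypothetical 6 × 5.2 inch figure at 300 ppi.

```r
# Pixel dimensions = size in inches x resolution in ppi
ppi       <- 300  # publisher's minimum resolution (pixels per inch)
width_in  <- 6    # hypothetical figure width, inches
height_in <- 5.2  # hypothetical figure height, inches

px <- c(width = width_in * ppi, height = height_in * ppi)
px  # width = 1800, height = 1560 pixels
```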

Recommended workflow for publication-ready figures:

Use the PDF device, setting the figure dimensions, font family and default font size in the call to pdf(). Make sure all text drawn while the PDF device is open falls within the publisher's allowed size range, for example by scaling the default font size with the cex argument (with pointsize = 10, cex = 0.8 gives 8-point text).

Note: R has only 3 built-in fonts: Times, Helvetica and Courier. If you want to avoid installing fonts, use one of these three. See http://blog.revolutionanalytics.com/2012/09/how-to-use-your-favorite-fonts-in-r-charts.html for more on installing fonts and embedding them in PDFs.

Open the saved PDF file in GIMP (Linux) or Photoshop (Mac, Windows), setting the import resolution to at least the publisher's minimum (e.g. 300 ppi). Export as TIFF with the LZW option checked.
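To check which font families the pdf() device knows about in your R session, you can query pdfFonts(). This sketch assumes a standard R installation. The list it prints usually holds more entries than the three fonts named above, such as generic aliases and other PostScript families. Times, Helvetica and Courier remain the safe choices recommended on this page.

```r
# Font families registered with the pdf() device in this session
pdf_families <- names(pdfFonts())
print(pdf_families)

# Check that the three recommended families are available
recommended <- c("Times", "Helvetica", "Courier")
recommended %in% pdf_families
```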

Here is an example R script using pdf() to draw a figure conforming to PLOS figure guidelines:

# Typography
font.family <- "Times" # Must be allowed by the publisher and available to the pdf() device.
font.sizes <- seq(from = 8, # publisher's minimum point size (points)
                 to = 12, # publisher's maximum point size (points) 
                 length.out = 5)
font.size.normal <- mean(font.sizes)
font.scales <- font.sizes/mean(font.sizes)
names(font.scales) <- names(font.sizes) <- c("XS", "S", "M", "L", "XL")

# Figure dimensions
figure.widths <- c(min=2.63, page=7.5, column=5.2) # in inches, as defined by publisher
figure.heights <- c(min=1, page=8.75) # in inches, as defined by publisher

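# PDF output (pdf() cannot create missing folders, so create the folder first)
dir.create("./output/plots", recursive = TRUE, showWarnings = FALSE)
pdf(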
 file = "./output/plots/fig1.pdf",
 title = "Figure 1", # displayed in title bar of PDF Reader
 width = figure.widths['page'], # full width, in inches
 height = figure.heights['page']*.7, # 70% of full height, in inches
 family = font.family, # defined above
 pointsize = font.size.normal # default (normal) size of text (in points). Defined above.
)

# Put your plots here. Specify a font scale factor of XS, S, M, L, or XL:
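plot(0:10, ann = FALSE, cex = font.scales['M'], cex.axis = font.scales['S']) # cex scales symbols, cex.axis tick labels
legend("topleft", legend = "0:10", pch = 1, cex = font.scales['XS']) # etc.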

# close PDF file
invisible(dev.off())

Alternative figure export workflows:

The tiff() device can be used directly, although this is less reliable than the pdf() workflow above.

tiff(filename = "fig1.tiff", res = 300, compression = "lzw", height=5.2, width=6, units="in")

For ggplot2 plots, ggsave() makes it easy to save a single plot in multiple formats, choosing the file type from the file extension.
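Both alternatives can be sketched together. The tiff() call is completed with the drawing and dev.off() steps and guarded by capabilities("tiff"), because TIFF support depends on how R was built. The ggsave() loop then writes the same ggplot to PDF and PNG. The output folder is a hypothetical temporary directory, and the plots use the built-in mtcars data.

```r
library(ggplot2)

out_dir <- tempdir()  # hypothetical output folder; use your project's figure folder

# 1. tiff() used directly: open the device, draw, then close it.
#    compression = "lzw" keeps the file size manageable.
if (capabilities("tiff")) {
  tiff(filename = file.path(out_dir, "fig1.tiff"), res = 300,
       compression = "lzw", height = 5.2, width = 6, units = "in")
  plot(mpg ~ wt, data = mtcars)
  invisible(dev.off())
}

# 2. ggsave(): the graphics device is chosen from the file extension
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
for (ext in c("pdf", "png")) {
  ggsave(file.path(out_dir, paste0("fig1.", ext)), plot = p,
         width = 6, height = 4, units = "in", dpi = 300)
}
```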

Managing figure files in git

If exported figures are included in a git repository, the exported images should be tracked with git-lfs (Git Large File Storage). "Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise." Follow the instructions to install git-lfs, add lfs to your repository, and specify which file types or folders to track with lfs. Thereafter, lfs works invisibly and you simply commit and push changes as usual. Note: git-lfs is useful for tracking large data files as well as graphics.

Lab Links

journal-club doc
google-sites lab manual
index of all Drake-lab google sites
lab-meeting--minutes doc
Contact John if you are having trouble accessing google docs or websites.
repository of public domain images
