Nicole Kramer | A New Paradigm for Multifigure Coordinate-Based Plotting in R

Transcript

This transcript was generated automatically and may contain errors.

Hi, my name is Nicole Kramer and I'm a third-year bioinformatics and computational biology graduate student at the University of North Carolina at Chapel Hill. Today I'm really honored to be talking to you about a new paradigm for multi-figure coordinate-based plotting in R that I've begun to make possible with my package, BentoBox.

The inspiration for this package and this functionality within the R plotting environment came from some of my own stressful experiences as a grad student making figures and plots. There was one time when my advisor asked me to make a multi-paneled figure that looked something like this. There were two heatmap-style plots of a specific genomic region. There were two tracks of genes below that. There were six different tracks of binned signal data below those. And there was a bar graph of some statistical analyses done in R.

And since I work in genomics, all of this data was huge. One heatmap came from a file that was 55 gigabytes and the other came from a file that was 14 gigabytes. One set of the binned data came from three different files that totaled 1.2 gigabytes and the other came from three other files that totaled 1.7 gigabytes. And on top of this data being huge, the process of making this combined figure was extremely tedious and time-consuming, with all of its elements coming from different places. There were different screenshots from a couple of genomic browsers. There was a plot made from data analyzed in R. Everything was cropped and arranged in Adobe Illustrator. And all of the fine-tuning and nice labels were also made in Adobe Illustrator.

So when my advisor asked me to change the genomic coordinates I was looking at in this figure, it wasn't a simple fix. I went to my overcrowded laptop screen with a bunch of genomic browsers open, my Dropbox open with all the files I was working with, my RStudio window, my Adobe Illustrator window, and the paper I was taking inspiration from. And I became completely overwhelmed. I thought that there had to be an easier way to make figures like this, beyond existing browsers and existing programmatic libraries: something that didn't require graphic design software and that would let me make and arrange all my plots in one place. Something that was entirely reproducible by being completely programmatic, yet entirely customizable, and efficient for handling large data specifically.

And so my team and I developed a package called BentoBox that allows for coordinate-based plotting in R, where plots can be programmatically made and arranged on a user-defined page layout with common units of measurement. Here is an example. I'm showing you two tiny page markings with inches and centimeters.
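As a rough illustration of what coordinate-based placement on a page with real units can look like, here is a minimal sketch using base R's `grid` package. It does not use BentoBox's own functions. The helper `place_panel`, the page size, the panel labels, and the choice to measure from the top-left corner of the page are all assumptions made for this example.

```r
library(grid)  # base R graphics system; this sketch does not call BentoBox

# Hypothetical helper (not a BentoBox function): draw a labeled box whose
# top-left corner sits at (x, y), measured from the top-left of the page.
place_panel <- function(x, y, width, height, units = "inches", label = "") {
  vp <- viewport(
    x = unit(x, units),
    y = unit(1, "npc") - unit(y, units),  # flip so y increases downward
    width = unit(width, units),
    height = unit(height, units),
    just = c("left", "top")
  )
  pushViewport(vp)
  grid.rect()        # outline of the panel
  grid.text(label)   # placeholder for the actual plot
  popViewport()
  invisible(vp)
}

# A 6 x 4 inch page written to a temporary PDF file.
out <- file.path(tempdir(), "layout_sketch.pdf")
pdf(out, width = 6, height = 4)
grid.newpage()
place_panel(0.5, 0.5, 3, 2, label = "heatmap")                  # inches
place_panel(10, 1.5, 4, 5, units = "cm", label = "bar graph")   # centimeters
dev.off()
```

Because every position and size is given in absolute units, changing one panel does not shift the others, and the two panels can be specified in different units on the same page.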

However, BentoBox is specialized to handle large data sets like genomic data, and only BentoBox gives users precise control over plot placement and dimensions.

BentoBox currently has functions for genomic data, but I hope its paradigms will extend the R plotting environment for all kinds of data visualizations. If you're interested in trying out BentoBox and exploring how you can use it with your data, you can get BentoBox from my lab's GitHub page, or feel free to tweet at me directly. Thank you so much for your time.
