
Speeding Up Plots in R/Shiny - posit::conf(2023)
Presented by Ryszard Szymański
Slow plots can ruin the user experience of our dashboard. This talk covers techniques for speeding up the rendering process of our visualisations.
Slow dashboards lead to a poor user experience and cause users to lose interest, or even become frustrated. A common culprit is a slowly rendering plot. During the talk, we will dive deeper into how plots are rendered in Shiny, identify common bottlenecks that can occur during the rendering process, and learn various techniques for improving the speed of plots in R/Shiny dashboards. These techniques will range from more efficient data processing to library-specific optimisations at the browser level.
Materials: speaker's LinkedIn profile: https://www.linkedin.com/in/ryszard-szyma%C5%84ski-310a7017a/
Presented at Posit Conference, September 19-20, 2023. Learn more at posit.co/conference.
Talk Track: Lightning talks. Session Code: TALK-1172
Transcript
This transcript was generated automatically and may contain errors.
So, getting to the question. Does this look familiar? Wait for it. It's coming. One more second. There we go. So, as we just saw, slow plots lead to a bad user experience. I'm Ryszard Szymański, I'm an R Shiny developer at Appsilon, and this is my worst nightmare.
So, bottlenecks can occur in different parts of the process. You first start with data preparation: let's say you fetch data from your database and do some aggregations before you start working on your plot. Then you define your plot in R using your favorite library like, I don't know, Plotly or echarts4r. Later on, all of the plot-related data and settings get sent to the browser, where your plots get rendered. And what I've discovered is that there are three main root causes of bottlenecks in the plot rendering process, the first one being an inefficient data preparation process.
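To see where the time goes, each stage can be timed separately. The following is a rough sketch, not something shown in the talk: the `expenses` data frame and its `amount` column are hypothetical, and simulated rows stand in for the database fetch. Rendering in the browser itself cannot be timed from R, so the size of the JSON payload is reported as a proxy for the work handed to the browser.

```r
library(plotly)

# Stage 1: data preparation. Simulated rows stand in for a database fetch
# (hypothetical `amount` column).
set.seed(123)
prep_time <- system.time({
  expenses <- data.frame(amount = rlnorm(2e5, meanlog = 3, sdlog = 1))
})

# Stage 2: plot definition in R. plot_ly() is lazy, so plotly_build()
# is called to force evaluation of the figure.
build_time <- system.time({
  fig <- plotly_build(plot_ly(expenses, y = ~amount, type = "box"))
})

# Stage 3 input: the JSON payload that gets sent to the browser.
json_time <- system.time({
  payload <- plotly_json(fig, jsonedit = FALSE, pretty = FALSE)
})

timings <- rbind(prep = prep_time, build = build_time, json = json_time)
print(timings[, "elapsed"])
cat("Payload size (characters):", nchar(payload), "\n")
```

The absolute numbers depend on the machine; what matters is which stage dominates as the data grows.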
Inefficient data preparation
So, let's say you want to visualize your home budget data, and you want to visualize it in the form of a box plot. This is how you would go about it in Plotly, and it makes it super convenient to do in R: you plug in your data set, you pick your variable, and you pick your plot type. But let's focus on our first line right here. How do we prepare the data? That code might work very well for small data, let's say you just have a couple of rows, but as your data gets bigger, you're essentially fetching your entire data set into the memory of your process just to create a box plot. So if you have millions of rows in the database, that can be pretty inefficient.
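A minimal sketch of the kind of code described here, assuming a hypothetical `expenses` table with an `amount` column. Simulated rows stand in for the database fetch so the sketch runs on its own; with a real DBI connection the first step would be a `SELECT` of every row.

```r
library(plotly)

# Stand-in for fetching every row, e.g.
#   expenses <- DBI::dbGetQuery(con, "SELECT amount FROM expenses")
# (`con`, `expenses` and `amount` are hypothetical names).
set.seed(42)
expenses <- data.frame(amount = rlnorm(1e5, meanlog = 3, sdlog = 1))

# Plug in the data set, pick the variable, pick the plot type.
# Every raw value is held in R memory and then shipped to the browser.
fig <- plot_ly(data = expenses, y = ~amount, type = "box")
```

The plotting call is short, but its cost scales with the number of rows fetched, not with what the box plot actually displays.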
But let's go back to our box plot. To plot a box plot, you just need five values: the minimum value, the lower quartile, the median, the upper quartile, and the maximum value. So what if, instead of fetching millions of rows, you just calculated those five values on the database side and plugged them into your Plotly function like this?
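One way this could look, as a sketch under assumptions rather than the code from the slides. The query uses PostgreSQL's `percentile_cont`; other databases spell quantile functions differently. The table name, column name, and the stand-in result values are hypothetical. The `lowerfence`, `q1`, `median`, `q3` and `upperfence` attributes are what the plotly.js box trace accepts for precomputed statistics.

```r
library(plotly)

# Five-number summary computed by the database (PostgreSQL syntax,
# hypothetical `expenses` table and `amount` column).
five_number_sql <- "
  SELECT
    min(amount)                                          AS lowerfence,
    percentile_cont(0.25) WITHIN GROUP (ORDER BY amount) AS q1,
    percentile_cont(0.50) WITHIN GROUP (ORDER BY amount) AS median,
    percentile_cont(0.75) WITHIN GROUP (ORDER BY amount) AS q3,
    max(amount)                                          AS upperfence
  FROM expenses
"

# With a live DBI connection `con` this would be:
#   stats <- DBI::dbGetQuery(con, five_number_sql)
# Illustrative one-row result so the sketch runs without a database:
stats <- data.frame(lowerfence = 5, q1 = 20, median = 35, q3 = 60, upperfence = 240)

# list() keeps each length-one value as an array, one entry per box.
fig <- plot_ly(
  type = "box",
  name = "Expenses",
  lowerfence = list(stats$lowerfence),
  q1 = list(stats$q1),
  median = list(stats$median),
  q3 = list(stats$q3),
  upperfence = list(stats$upperfence)
)
```

Only one row crosses the wire from the database and only five numbers reach the browser, regardless of how many rows the table holds.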
Showing too much data
Sometimes we also try to show too much, and it can manifest itself in different ways. For example, let's go back to our millions of points, and let's say we're using a scatter plot to visualize them. Rendering millions of points can take a while. Perhaps you can convey the same type of information using a different plot. Again, let's go back to the example of the box plot, where you can just fetch five values and perhaps show similar information to what you want to convey.
Also, when you're creating a dashboard, it might be very tempting to put a lot of plots in your dashboard, but the reality is users are rarely able to focus on all of them at once. And rendering six plots might take a while, so why not instead just show one plot and provide controls for your users to switch between each type of plot to make it faster?
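A minimal Shiny sketch of that idea, assuming hypothetical budget data and three illustrative plot types. A single `plotlyOutput` is rendered, and a `selectInput` decides which plot is built, so only the selected plot is ever computed and sent to the browser.

```r
library(shiny)
library(plotly)

# Hypothetical budget data; column names are illustrative only.
set.seed(1)
budget <- data.frame(
  category = sample(c("Rent", "Food", "Travel"), 500, replace = TRUE),
  amount = round(rlnorm(500, meanlog = 4, sdlog = 0.5), 2)
)

# One builder per view; only the selected one is called.
plot_builders <- list(
  "Box plot" = function(d) plot_ly(d, x = ~category, y = ~amount, type = "box"),
  "Histogram" = function(d) plot_ly(d, x = ~amount, type = "histogram"),
  "Totals" = function(d) {
    totals <- aggregate(amount ~ category, data = d, FUN = sum)
    plot_ly(totals, x = ~category, y = ~amount, type = "bar")
  }
)

ui <- fluidPage(
  selectInput("view", "Plot", choices = names(plot_builders)),
  plotlyOutput("plot")
)

server <- function(input, output, session) {
  output$plot <- renderPlotly({
    plot_builders[[input$view]](budget)
  })
}

app <- shinyApp(ui, server)
# Start the app with: runApp(app)
```

Keeping the builders in a named list means adding another view is a one-line change, and the dropdown choices stay in sync with the available plots.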
Using plotting libraries inefficiently
We sometimes also use our plotting libraries inefficiently. A lot of libraries provide convenience functions, like toWebGL in Plotly or boost mode in highcharter, which leverage a technology called WebGL, which makes use of your GPU to render your plot in the browser faster. In some cases there are other options available, like, for example, the highcharter.rjson option in the highcharter package. By default, highcharter uses a package called jsonlite to prepare data and settings for the browser, and as your data gets larger, it can get a bit slow. rjson is a bit more efficient, and some users have reported a 25x speed increase after switching to rjson.
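A hedged sketch of what these switches can look like in code. The data is simulated and the variable names are hypothetical. `toWebGL()` is plotly's converter and `hc_boost()` is highcharter's boost module helper; the `highcharter.rjson` option name follows the talk's description, and its default may differ between highcharter versions, so it is worth checking the package documentation before relying on it. The highcharter part only runs if that package is installed.

```r
library(plotly)

# Simulated points standing in for a large data set.
set.seed(7)
n <- 2e5
points <- data.frame(x = rnorm(n), y = rnorm(n))

# Default scatter trace, rendered as SVG in the browser.
p_svg <- plot_ly(points, x = ~x, y = ~y, type = "scatter", mode = "markers")

# Same figure, flagged so its traces use their WebGL counterparts.
p_gl <- toWebGL(p_svg)

if (requireNamespace("highcharter", quietly = TRUE)) {
  # Option named in the talk; set after the package namespace is loaded.
  options(highcharter.rjson = TRUE)

  hc <- highcharter::highchart()
  hc <- highcharter::hc_add_series(
    hc,
    data = points[1:5000, ],
    type = "scatter",
    mapping = highcharter::hcaes(x = x, y = y)
  )
  # Enable the Highcharts boost module (WebGL-based rendering).
  hc <- highcharter::hc_boost(hc, enabled = TRUE)
}
```

Whether WebGL helps depends on the trace type and the number of points, so it is worth comparing both versions on realistic data.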
All right, thank you. That will be it for me, and if you have any questions, you can post them on the slide deck that's been mentioned, or if you want to talk to me, you can find me at the Appsilon booth.
