Resources

Joe Cheng | Shiny in production: Principles, practices, and tools | RStudio (2019)

Shiny is a web framework for R, a language not traditionally known for web frameworks, to say the least. As such, Shiny has always faced questions about whether it can or should be used “in production”. In this talk we’ll explore what “production” even means, review some of the historical obstacles and objections to using Shiny for production purposes, and discuss practices and tools that can help your Shiny apps flourish. About the Author: Joe Cheng is the Chief Technology Officer at RStudio. Joe was the original creator of Shiny, and leads the team responsible for Shiny and Shiny Server. GitHub: https://github.com/jcheng5 Materials: https://speakerdeck.com/jcheng5/shiny-in-production

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

All right, well, I'm really excited to talk to you guys today about Shiny in production. But by way of introduction, I'm going to take a little diversion and take a look at an example app that I've been working on, and we're going to use it to illustrate some of the principles and tools we're going to talk about today.

This is not the example app. That'd be a little weird. I want to talk about cloud.r-project.org, or if you look in the CRAN mirror chooser, it's called 0-Cloud. This is a CRAN mirror that you may or may not know is run by RStudio. And it's hosted on Amazon AWS, so it's pretty fast from anywhere in the world. But the thing that makes this CRAN mirror interesting is that the download logs are freely available to download, thanks to Hadley.

This is what the data looks like, just a simple CSV file that you can download, and it tells you every individual package that is downloaded from our servers, and an anonymized IP address. So lots of people have done really interesting things with this data, and what I was interested in is I just had this hunch that every public-facing website that I've ever set up that got a significant amount of traffic always had just a few IP addresses that would act crazy. Just download way too much of whatever it was that we're offering, not in a greedy way, but in a totally WTF way, as you will see in a moment. And I was curious, does this happen on our CRAN mirror as well? Certainly it's a very, very popular service.

So I wrote this application called CRAN Whales, and I'll give you a look here.

So this application is loading right now. It has two parameters that I want to focus on. On the top left, you can pick a date that you're interested in, and then the top N downloaders. So in this case, by default it shows the top six downloaders, the six most active IP addresses. On the right here, you can see there's some summaries about how much total traffic there was. There's three terabytes of data downloaded on the 14th, and however many files were downloaded. And for all those IP addresses that we anonymized, there were 69,000 unique IP addresses.

At the bottom here, there is a plot of the hourly activity for downloads, and you can see that it kind of ebbs and flows during the day. And that highlighted portion at the bottom, that is the fraction of the downloads that the top six downloaders are responsible for. 70,000 downloaders, and six of them are doing that. And depending on the day that you look at, it can vary quite widely.

The second tab will let us look at these whales. I'm calling them whales, the top downloaders. So you can see here, these are the top six, and the height of the bar represents the number of downloads that they made. So PowerfulArgali, these names are randomized, obviously, PowerfulArgali downloaded as much as the next five people put together. But ZealousRabbit is kind of a distant second here.

The next tab will show those same six downloaders, or we can change that number. But now we're looking at their downloads over time. So for each hour, how many downloads did they make? And ZealousRabbit was pretty steady throughout the day, whereas StickyChinchilla... I haven't seen these names. They're different every day. So StickyChinchilla just had kind of a burst in the middle of the day.

And finally, maybe the most interesting view is we have a detailed view where we can actually look at the individual downloads of each of those users. So this is where I go, you know, what the hell is going on here? ZealousRabbit downloaded eight packages 72,000 times. It's probably a Python script gone wrong. 33.4 gigabytes downloaded.

Whereas StickyChinchilla looks like maybe they downloaded all of CRAN, 13,000 packages maybe. They downloaded the current version of CRAN, and then they said, oh, let's go back and get some more. I'm not sure what happened there.

So really weird stuff happening. Strange things are afoot on CRAN, and I'm... If you have any ideas, if you're running one of these scripts that are hammering us, well, number one, maybe stop. But also I would like to know what's going on.

Why this example matters for production

So why am I talking about this? Why am I showing this fun little example app? It's just an example, but there are things about this example that feel a lot like production Shiny apps. So it's a useful stand-in.

Number one, this kind of app is very natural to build in Shiny, right? Kind of down the middle of what Shiny is good for. You're visualizing data that is kind of like in a tidy format. Super easy to build. Number two, a lot of production apps have data that is updated over time, often on a predictable schedule, and that's the case here, because the CRAN logs are dumped once a day.

Number three, there's significant computation happening. When I first brought up that application, it took maybe eight or ten seconds to load. It's loading hundreds of megabytes of CSV data and then doing lots of grouping and filtering to do all the counting of individual downloaders.

And mostly aggregated data is being shown here. We're talking about, you know, millions of rows of data, and we are doing various grouping and counting to show, you know, simpler plots, and that's the case also with most production apps. You're looking at summaries of large amounts of data.

What does "production" mean?

Okay, so before we go any further, let's talk about what production even means. I mean, we throw around this word a lot, right? And I define production here as: software environments that are used and relied on by real users, with real consequences if things go wrong.

So real users, not developers and testers who, you know, understand if something goes down. We're talking about people who really need your stuff to work, and there are real stakes and real consequences if things go down. So the antonyms, some non-production environments, would be proofs of concept, prototypes, sandboxes. These are places where you expect things to go wrong, and in some cases, like testing and staging, you kind of want things to go wrong. That is not the case with production. You're never having a good day if you find bugs or have outages in production.

So I'm using the word software environments. What does that even mean? Well, it really means the network hardware and software and your code, all of that together when you make a deployment of a Shiny app. So this is just one example of what a production environment might look like. You've got RStudio Connect in the middle running your Shiny application. That could be Shiny server instead. You have Nginx, which is just a web server sitting in front if you like. And this particular Shiny app is running on Connect, and it's connecting to an Oracle database and a Spark cluster, say. So all that together, if that's being accessed by real users, that is your production environment.

And what we generally do if you're following best practices is you have an identical setup for staging. You want it to match as closely as possible to what you're running in production. So you duplicate that entire stack so that when you are using staging, you're really doing kind of like a dress rehearsal. And you want to find all the bugs and work them out in dress rehearsal before it's showtime.

Goals for production environments

When we're running code in production, when we're running production environments, we have several goals. So first goal is to keep it up. We want to keep our production applications up and running. Number two, we want to keep them safe. The data may be private, the functionality might be privileged, and the code might be proprietary. We want to keep all that safe from unauthorized access. Now, that may not be the case for you. Maybe you're dealing with all open data and your code is open source, but you still don't want your server to get hacked, for example.

Number three, we want to keep it correct. We want our applications to be relatively free of bugs. We want them to give answers that are correct. So we want to be sure that we can verify that.

And finally, we want to keep it snappy. When you're working individually with R on your own laptop, the performance might be a little less important. But when you put an application into production, you are probably going to have multiple people simultaneously using your code, which means performance is a much bigger concern than it normally is. So we want to make sure that our applications run fast enough for the traffic that we expect to get and have the ability to scale if we need to.

Can Shiny be used in production?

Now, a question that I have heard many times in the last few years is, can Shiny be used for production? And my answer is, why do you keep asking me that? Of course. So for years now, yes, you can use Shiny in production. People have been using Shiny in production, including RStudio and many, many of our customers. Yes, it has always been possible to use Shiny in production.

That being said, we've worked really hard on the Shiny team and elsewhere in RStudio to make the answer in 2019 more than possible. We're trying to make it really easy to run Shiny apps in production. And I feel like that bar has been achieved, personally.

That being said, I feel like it wouldn't be an RStudio keynote if we didn't kind of tell you about the limitations and the challenges of our own software. And in the case of Shiny in production, there are really three categories of challenges: cultural, organizational, and technical challenges, if you want to put Shiny apps into production.

Cultural challenges

So talking about cultural challenges first, Shiny apps are developed by R users. Well, I certainly hope so. R users, by and large, don't come from a software engineering background. And if you don't come from that background, if you don't come from a background of deploying production software, you may not even realize that you are dealing with a production situation. You may not think about the fact that people are going to be relying on this code to be running day after day.

And even if you do know what the stakes are, if you realize what the stakes are, you may not know what the best practices are for writing production applications and deploying them.

And even if you do know what those best practices are, often people don't anticipate or budget for the amount of time and effort it takes to productionize and follow those best practices. If it only takes you three weeks to create a Shiny app, and you're getting ready to deploy it, and then now you realize that there's three more weeks of work to do the appropriate automated testing, to do performance and scale testing, to think about security and do those kinds of reviews, it can be really tempting to just cut the corner and just push out to production, especially if nobody's standing there kind of looking over your shoulder. And we have definitely seen people do that, and sometimes they pay the price.

And finally, our users, by and large, I think as a community, we're not a culture that prioritizes runtime efficiency as highly as other communities. Like the other extreme would be like the C community. We're generally more interested in expressing ourselves quickly, iterating easily, having expressive APIs, and while runtime efficiency is important, it is not the most important. And that can be a detriment when we start talking about production applications.

Organizational challenges

There are also organizational challenges. If you're working with an organization that has IT and upper management, they can be skeptical of deploying Shiny in production. I hear this all the time from people that the data scientists want to build Shiny apps and deploy them widely, but then IT and management stands in the way.

Now, IT departments just sort of inherently, their incentives drive them towards conservatism, and I actually think that's appropriate. I worked in IT as my first job, so I have some sympathy here. They're probably going to be skeptical at first of data scientists creating production applications. I mean, this is not something that they're used to, and there's definitely going to be a little bit of a credibility gap here.

And of course, IT and upper management, they're always going to be skeptical of technologies they haven't heard of, and a couple of years ago, lots of IT departments that we talked to didn't know what R was, much less Shiny. That's changed a lot now. I think way more of the conversations we have now, people come to us knowing about R and Shiny, but it's definitely something that's still to be overcome in some organizations.

And elsewhere in the organization, if there's an engineering department, a software engineering department, they may not be on your side. This is something that they're used to doing, and now data scientists are coming in and saying that they want to do these things too. And they may be particularly biased against R, just having heard that it's a DSL for statistics, not a real programming language, which is one of my pet peeves when people say that about R. I think that's sort of the opposite of the truth.

Technical challenges

And finally, there are technical challenges, and these are my favorite kinds of challenges because they're the ones that I can do something about for you. Now Shiny dramatically lowers the effort involved in creating a web app. I hope that's not a controversial statement. We've made it very easy, but what we haven't done is made it any easier to do the other things that are involved besides creating your app that you still have to do before going to production. We haven't made it particularly easy to test Shiny apps. We haven't made it particularly easy to load test Shiny apps, and we haven't made it particularly easy to profile and deploy Shiny apps until relatively recently.

And finally, R can be slow, and it is single-threaded, meaning that an R process can only do one thing at any given moment. It has to finish whatever it's doing before it can do something else.

So the good news about these challenges is that we can and have done something about the effort involved in doing testing and profiling and deployment, which I'll talk about in a moment, and about the slowness in 90-plus percent of the cases that I've seen, R's not the problem, and we'll talk about that as well.

Tools for Shiny in production

So the rest of this talk I'm going to be focusing on the actual tools and software that we've been working on to help make it easier to do Shiny in production. And by the way, I probably should have said this up front. If you are coming from a setting where you are not building Shiny apps for production, that does not mean that these things will not apply to you. Even if you're not putting apps up that people are going to access from all over or whatever, if you care about your app being correct and staying correct, if you care about performance, these tools are still going to be relevant for you.

A bunch of these tools I'm not going to talk about, and other ones I'm going to focus on. So RStudio Connect, you're going to hear about plenty this week, but it is our way of serving Shiny with push-button deployment. It's an incredible product; we're investing a lot in it. shinytest is an automated UI testing framework for Shiny. Winston Chang did a talk on this at last year's rstudio::conf, and you can watch that video, and there's documentation as well. It is now on CRAN, which it was not last year.

What I am definitely going to talk about today is shinyloadtest, which is load testing for Shiny, and which is more exciting than it sounds. profvis is not new, but it is a profiler for R, and I think it's so important that I want to emphasize it again today. So I'll be doing some demos there too.

Plot caching is a new feature that we released in Shiny 1.2, which came out, I think, in December, and we'll talk about that in a moment as a way of speeding up plots. And one last thing that I will not be talking about today is async, which is sort of a last-resort technique when your Shiny applications have inherently slow portions. I did a talk about that last year at rstudio::conf; again, there's a video available, and there's a very, very detailed website at rstudio.github.io/promises. Unlike every other piece of software I've ever written, I am asking you, if you are interested in async, to read the documentation in order. I have numbered the articles to emphasize that; if you go out of order, nothing is going to make sense. So async is hard. That's why it's the last resort. But if you need it, you really need it, and I encourage you to study carefully.

So we'll be focusing on just these three areas today: shinyloadtest, profvis, and plot caching.

Setting performance targets with CRAN Whales

Now we're going to look at these three tools through the lens of Cranwhales. Cranwhales was pretty slow, and I'm just picking some numbers here to set a target for us, just to give a for-instance. So let's just say we want to support 100 concurrent users for this application, which would probably be equivalent to an average app at a relatively large organization. And then, again, just picking numbers, let's say we want to support this on one dedicated server with a 16-core CPU. In a lot of cases, you'd be running multiple servers just for redundancy, if nothing else. But in this case, just to keep it simple, let's say 100 users, one 16-core CPU. So it'd be great if each of our processes could support between 10 and 20 concurrent users; if each can handle more, great. But let's just start with 20 as a test, and we'll see what happens.

So our overall high-level view of how we're going to test the performance of this app is: we're going to start by using shinyloadtest to see if this application is already fast enough. If it's not fast enough, we'll use profvis to see what's wrong with it, what's making it slow. And then the real work begins of optimizing and figuring out how to make that slow part not slow anymore.

The first and most common way I think that people should do it is to move the work out of Shiny. Don't do the work when a user is sitting there waiting. Do the work ahead of time. And especially don't do the same work for every user. That's extremely wasteful. I see almost nobody doing this on their first try. Almost everybody just says, I have CSV data, I've got gigs of CSV data, I'm going to load it up into my Shiny app whenever someone connects. Why is this app slow? So that's the reason.

Number two, if you can, make the code faster. There are things that you should avoid in R like mutating data frames in for loops. You should avoid text connections. You should avoid using the apply function to iterate over data frames. Sometimes you can find way, way faster ways of doing those things by vectorizing or using dplyr and things like that. Number three, sometimes you can cache. We're going to talk about that in a moment. And then if you really can't do those other three things, then you turn to async, which is a whole other area of discussion.
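As a small illustration of the "make the code faster" point, here is a toy comparison (not from the talk) between growing a data frame inside a for loop, which copies the accumulated result on every iteration, and a single vectorized pass with base R's `table()`:

```r
# Growing a data frame in a loop: each rbind() copies everything so far.
slow_count <- function(df) {
  out <- data.frame()
  for (p in unique(df$package)) {
    n <- sum(df$package == p)
    out <- rbind(out, data.frame(package = p, downloads = n))
  }
  out
}

# Vectorized equivalent: one pass with table(), no repeated copying.
fast_count <- function(df) {
  tab <- table(df$package)
  data.frame(package = names(tab), downloads = as.integer(tab),
             row.names = NULL)
}

df <- data.frame(package = c("dplyr", "ggplot2", "dplyr", "shiny", "dplyr"))
fast_count(df)  # dplyr 3, ggplot2 1, shiny 1
```

The same reshaping can of course be done with dplyr's `count()`; the point is that the vectorized version does one pass over the data instead of re-copying a growing result.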

And once you've done all that, then you repeat. Then now you go back to Shiny load test and see if it's fast enough. And if it's fast enough, then you're done.

Shiny load test

So we'll start by talking about shinyloadtest. What shinyloadtest does is it generates large amounts of realistic but synthetic traffic to your application, and it monitors how long that traffic takes to run, so we can analyze the latency. So step one is you have to have your app running somewhere. You can run it locally, or you can run it on RStudio Connect or on a Shiny Server. Number two, you use shinyloadtest to record, in a browser, what an average user would do, whatever you think is representative. So click on whatever inputs you think a user would click on. Take as much time to think as you think an average user is going to take to think. And once you've done that recording, you can play it back with some level of concurrency. So instead of one user taking those actions, you can have as many users as you want doing those actions simultaneously.

And then finally, we'll take the results of that step-three playback and analyze them in shinyloadtest. We have a built-in reporting feature, which I'll show you. Or it's just a data frame of events, so you can do your own analysis if you like.

So this is what that actually looks like. To run your Shiny app, we just call runApp like normal, in this case on port 6104. And then we launch shinyloadtest::record_session in a separate R process and just point it at wherever your Shiny app is running. So that URL could be on Connect, it could be on Shiny Server Pro, or it could be local.

Now that we have this recording (recording.log, say, is where it goes), we use a companion command-line tool to shinyloadtest called shinycannon. You give it the recording, and you give it a URL to pound on. And then you tell it how many simultaneous workers and how long you want to run the test for; in this case, 20 workers for five minutes. It prints out a lot of log output as it goes to work. When it's finished, you go back to R, load shinyloadtest, and call load_runs to load the data. That gives you a data frame that you can then pass to shinyloadtest_report. And when that runs, it gives you something like this.
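Put together, the workflow described above looks roughly like the following sketch. The port, file names, worker count, and duration are the ones mentioned in the talk; the shinycannon flag names are assumptions based on its command-line help, so check `shinycannon --help` on your install:

```r
# Step 1 (first R process): run the app as normal.
shiny::runApp("cranwhales", port = 6104)

# Step 2 (second R process): record a representative browser session.
# Writes recording.log by default.
shinyloadtest::record_session("http://localhost:6104")

# Step 3 (shell): replay the recording with concurrency using the
# shinycannon CLI tool (a separate Java program, installed on its own):
#   shinycannon recording.log http://localhost:6104 \
#     --workers 20 --loaded-duration-minutes 5 --output-dir run1

# Step 4 (back in R): load and analyze the results.
df <- shinyloadtest::load_runs("run1")
shinyloadtest::shinyloadtest_report(df)
```

Each step blocks its own process, which is why the app, the recorder, and the replay run separately.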

So it gives you a graphical static HTML file that gives you some indication of how long it's taking these sessions to succeed. In this particular case, what we're seeing is really, really bad performance. At n equals 20, we are in really bad shape.

So what you're looking at here is each row of this plot is a single simulated worker, a single simulated person. And the x-axis is time. The red blocks represent the time it's taking the home page to load. And then the blue blocks each represent a reactive operation. So you changed a tab, or you clicked on an input. You caused some kind of plot or something to be generated. And that's how long they had to wait. And the spaces between the blue boxes are the user's think time, where they're just looking at the screen.

So you probably can't see the x-axis, which maybe is a good thing, because that last value is 250 seconds. So we are talking about something that, in an ideal situation, would take 45 seconds, or maybe 40 seconds, when n equals 1. And when we get to n equals 20, it's taking five, six minutes to complete. So completely unacceptable. This application is way too slow to support 20 users.

So what do we do? Well, we could just throw more R processes at it. That's always an option. And if we run out of processes on a server, we could throw more servers at it. You laugh, but at some point, that is the answer. I mean, Google doesn't run on one machine. It runs on thousands of machines. So we do always have the option of horizontal scaling. But what fun would that be? So let's just make it fast.

Profiling with profvis

So the next step, and a step that everybody skips, is to use a profiler. When your app is slow, use a profiler. When your R code is slow, in general, use a profiler. Do not guess. Your intuition stinks. I don't even know you, but it stinks. I promise you.

It's so easy to use profvis. All you have to do is go to RStudio. And at the top here, there's a Profile menu. Start profiling. Run your app. Wait patiently. Something interesting happened? Stop profiling. And that will generate a very intricate visualization of what was happening during that time that we were waiting. So that's all you have to do.
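profvis is a visual front end over R's built-in sampling profiler, Rprof, so the same data can also be captured programmatically in plain R. The workload below is an illustrative stand-in for a slow app step, not code from the talk:

```r
# Capture a profile with base R's sampling profiler (profvis wraps this
# and adds the interactive flame-graph visualization).
out <- tempfile()
Rprof(out, interval = 0.01)  # start sampling the call stack every 10ms

# Deliberately slow-ish workload standing in for app code:
df  <- data.frame(x = rnorm(1e6), g = sample(letters, 1e6, replace = TRUE))
agg <- aggregate(x ~ g, data = df, FUN = mean)

Rprof(NULL)                       # stop profiling
head(summaryRprof(out)$by.total)  # which calls dominated the samples
```

In the IDE, the Profile menu does exactly this around whatever you run, then renders the samples with profvis.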

I have, in advance, prepared a profile. I'm not going to get too far into how to use profvis. Documentation is very good. But let me just say, once you learn how to read these graphs, the answers are right there in front of you. It is so easy. There's no reason to guess what the time is being spent on.

And what this visualization tells us, in this case, is that it's taking 6.3 seconds to read the CSV file alone. Just the parsing, 6.3 seconds, way too long. When we're calculating our first output, which is all_hour, it's taking 620 milliseconds just filtering and aggregating the data. And that filtering and aggregating is going to be the same for every user who accesses that day of data. And then the plotting, just for that one plot, is taking a third of a second. So those are all pretty big numbers when you're trying to put apps into production for a lot of users.

Moving work outside of Shiny (ETL)

So as I said before, the most important optimization that you should be thinking of as a Shiny author is to move your work to happen outside of Shiny. Do work ahead of time. If performance matters, do it ahead of time. It's really tempting to load that raw data in, but you're going to pay the price performance-wise. So ahead of time, do as much filtering and summarizing as you can. And then when you save your data, instead of saving it as CSV, save it as Feather files if it is important for you to be able to read it quickly. Feather is incredibly quick to load data. It's a little bit slower, I think, than CSV if you're writing, but for reading, which is what we care about here, it's the best.

If your data source changes over time, then you're not just going to be doing this once. You're not going to be the one sitting down and running this filtering, summarizing, and saving to Feather on your machine. You need it to happen on a schedule. If that's you, then you can use RStudio Connect with scheduled R Markdown reports. That is a really excellent solution that has lots of benefits that I don't have time to get into now. If you don't have Connect or you're using open source Shiny Server, you can also do something yourself with the Unix utility cron, for example. But any way you execute scheduled tasks, this is the way to do it.
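A minimal sketch of that precompute-and-save step, using a synthetic data frame standing in for a day of CRAN log rows; the hourly summary, column names, and file paths are all illustrative:

```r
# Synthetic stand-in for a day's worth of raw download-log rows:
raw <- data.frame(
  time    = sprintf("%02d:15:00", sample(0:23, 1000, replace = TRUE)),
  package = sample(c("dplyr", "shiny", "ggplot2"), 1000, replace = TRUE)
)

# Aggregate once, ahead of time, to the small summary the app actually shows:
hourly <- aggregate(list(downloads = raw$package),
                    by = list(hour = substr(raw$time, 1, 2)),
                    FUN = length)

# Save the small result for the app to read instantly at startup.
# (The talk recommends Feather, e.g. feather::write_feather(hourly, ...),
# for even faster reads; saveRDS keeps this sketch base-R only.)
path <- file.path(tempdir(), "hourly.rds")
saveRDS(hourly, path)

# A crontab line (hypothetical script path) to rerun this nightly at 2:30am:
# 30 2 * * * Rscript /srv/etl/preprocess.R
```

The Shiny app then does `readRDS()` on the small summary instead of parsing hundreds of megabytes of CSV per session.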

And by the way, this sort of approach of preprocessing your data for downstream applications, you may know it by its acronym ETL, Extract, Transform, Load. And lots of organizations have entire departments devoted to this function. So if that's you, then maybe someone can do this for you.

What I've done is I've modified Cranwhales to use this approach. The repo for Cranwhales is rstudio/cranwhales, and I've made a branch called ETL that does some of the processing ahead of time.

So if this is our original performance with the original version, this is what it looks like after just that one change. Our overall time to completion has gone from a max of 250 and an average somewhere definitely north of 200 down to, I don't know, 60 seconds, something like that. So a really big difference. The size of each individual block has not only gotten smaller, but also gotten more consistent. So we're seeing more predictable, more understandable user wait times. And overall, we're looking a tremendous amount better.

So I'm going to pause for a second. Those were just screenshots, but now I'm going to bring up the actual shinyloadtest web page that was generated. This is the visualization we've been looking at, and we can flip back and forth between the sync and ETL versions of the app.

And this view shows all of the sessions that were run, sort of arranged tip to tip. But this actually all happened over five minutes. So this view shows those sessions running where the x-axis is actual elapsed wall time. So you can see here that these 20 simulated users did not finish very many sessions. Maybe each one finished three sessions or something like that. If we switch to the ETL branch, a lot more sessions are getting completed. So a lot more work was able to be done because it was so much faster.

There's also this event waterfall view. On the left hand side, we're basically looking at the recording script that we generated when we kind of simulated being a user. That first line, you probably can't read it, says get home page. And then there are many lines, maybe a couple dozen lines, of retrieving JavaScript and CSS, which should be really fast and it is.

Everything from that kind of hump where they all go to the right down, those are all reactive operations. So that's the actual work of the application being done. And again, the x-axis here is time. So in an ideal situation, if our app is super fast, these lines would drop straight down. They'd drop straight to the ground. Instead, what we're seeing is as soon as they kind of finish the loading of JavaScript and CSS, these lines kick way out to the right, meaning that the user is waiting a long time to proceed from one step to the next.

If we switch to the ETL branch, these lines are much straighter. So you can see not only are they straighter, but they're sort of smoother and more consistent. So each of the individual sessions are behaving pretty much the same way.

I won't get too far into the other options available here, but there are a bunch of other things you can look at. This one shows the amount of latency for each home page request. And the top facet here is the original version, and then down here is the ETL version. So you can see a huge difference. And then the same thing for WebSocket traffic, which really means reactive computation. So reactive computations took a really long time here. And then for the new improved version, way better.

So let's just summarize how we did. So ETL, clearly, much faster. What we learned from that report was that the median session duration has dropped from 210 seconds to about 50 seconds, so a huge difference. And the maximum wait time was unspeakable, now less than 10 seconds.

So 10 seconds is good, but it's not amazing, right? So can we do better? We can start our second round, and just for time reasons, I'm not going to show you the profvis results this time. But what it basically shows us is that almost all the time you can see is in ggplot2 plotting at this point, several hundred milliseconds per plot.

Plot caching

So in order to make this faster, we now have to turn our attention to optimizing plots. And that is a particularly difficult thing to do for various reasons. What we've done with Shiny 1.2 is introduce this feature called Plot Caching. And if you're not familiar with the concept of caching, what it means is if you have an operation that is going to be performed multiple times with exactly the same results, maybe don't execute it every time. Execute it the first time, save the result, and in future times, you recognize that, oh, somebody's looking for the same result, and you serve up your already cached result. So we're going to do exactly this with plots. It requires Shiny 1.2, so if you want to play with this, you'll need to install from CRAN.
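The caching idea described above, in miniature, as plain R; this is just the concept, since Shiny's plot cache handles keys, scoping, and storage for you:

```r
# Wrap a slow function so each distinct key is computed once, then served
# from a saved result on every later request.
make_cached <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    id <- as.character(key)
    if (!exists(id, envir = cache)) {
      assign(id, f(key), envir = cache)  # first request: compute and store
    }
    get(id, envir = cache)               # later requests: instant lookup
  }
}

calls <- 0
slow_square <- function(x) { calls <<- calls + 1; x^2 }
cached_square <- make_cached(slow_square)

cached_square(4)  # computes; calls is now 1
cached_square(4)  # served from cache; calls is still 1
```

Shiny's plot cache applies this pattern to rendered plot images, keyed by the expression you supply (plus some automatic factors described below).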

Now, plot caching is not for every Shiny application, so there are some criteria you need to meet before you can decide that this is going to work for you. Number one, you have to have slow plots. If your plots are super fast, then caching them is not going to make a huge difference. But we do have slow plots; they're several hundred milliseconds each. And those plots have to be a significant fraction of the overall slowness of your app.

If you're taking seven seconds to load CSV data, then speeding up the several hundred milliseconds of your plot, you can do it, but it's not going to have a dramatic impact. But for us, we are highly optimized everywhere else. We're just looking at plots remaining.

Third, and probably most important, because caching only speeds things up if somebody asks for a plot a second and third and fourth time, you have to have the kind of app where that's going to be likely to happen. If everybody's looking at individualized or random data or data that updates every second, then caching may not help you. But for us, anyone who's looking at a given day is looking at the same data. So check there as well. So we have an excellent candidate for plot caching.

Now, we've made this as simple as we possibly can, and I'm really proud of the work that was done here by Winston Chang. This is what a regular plot looks like, and this is what it looks like using the new caching feature. So number one, we change renderPlot to renderCachedPlot. And number two, it's a little bit trickier to explain, but we have to tell Shiny a little bit about which variables matter when it comes to forming this plot. In this particular case, we're plotting diamonds. We're going to make a scatter plot, and we are going to let the user decide which variable sets the color.

So one user may select Clarity. Another one may select Cut. And we need to make sure that we don't mismatch those two as being equivalent. So if a user asks for a plot by Clarity, we don't want to serve them a previously saved version that was based on Cut. What cacheKeyExpr does is give you, the Shiny author, the opportunity to tell Shiny: when you cache, these are the variables that matter. Now, we are actually doing some things under the hood. Like, if you are on a mobile device, we'll cache a different version. If you're on a retina screen versus not, we'll cache a different version. We'll actually change it based on the width of your browser. But the part that you need to care about is cacheKeyExpr. And this is all you need to do.
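In code, the change looks roughly like this. This is a sketch based on the diamonds example from the talk; `input$color_by` is an assumed input ID, and the snippet would live inside a Shiny server function.

```r
library(shiny)
library(ggplot2)

# Before caching, this would be an ordinary renderPlot() call,
# re-executed on every request. renderCachedPlot() is the Shiny 1.2
# drop-in replacement.
output$scatter <- renderCachedPlot(
  {
    # Color by whichever variable the user selected (e.g. "clarity" or "cut").
    ggplot(diamonds, aes(carat, price, color = .data[[input$color_by]])) +
      geom_point()
  },
  # cacheKeyExpr: the variables that distinguish one cached plot from
  # another. A plot keyed on "clarity" is never served to a user who
  # asked for "cut".
  cacheKeyExpr = { input$color_by }
)
```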

So I did this seven times for my Shiny app. This is the ETL version that we have looked at already. And once we have made those caching changes and run the same test again, holy shit is right, there's nothing left. There's nothing left that you can see.

The little rectangles are still there. And I can see them on my screen. But I guess they're not showing up very well there. But the ending point is somewhere around maybe 20 seconds. So this is running significantly faster than the original one was with no load at all, which is not really that surprising. At this point, 20 concurrent users, what are we even talking about? So we are way, way beyond that.

So again, our original ridiculous version. This is ETL. And with caching, it's hardly anything to see. And I think it's most clear when you look at the latency here. That bottom one, I mean, there essentially is no latency.

Just for fun, I don't have the results here. Oh, actually, maybe I do. I ran it this morning with 100 concurrent users just to see what would happen, whether we could do all 100 concurrent users that we want to support in one process.

And it is now visible. The latency is visible now that we've quintupled the traffic. But it is maybe borderline acceptable. I'd run maybe two processes with this branch.

Wrapping up

So that kind of summarizes these tools: shinyloadtest, profvis, and plot caching. I will make the URL for these slides available, so if you want to see any of these other URLs, you can. When I do Q&A, I'll be showing that.

So with these tools, our goal is to make it much easier to run Shiny apps in production. But some challenges remain: cultural and organizational challenges. That'd probably be a subject for a pretty good talk. But it will not be this talk.

And finally, I want to leave you with this thought that deploying production apps, I hope you've gotten the sense that it's a real skill. And it takes experience to do this well. So be humble when you're deploying production apps. And especially the people in IT that are haranguing you and raising a skeptical eyebrow, they're not the enemy. They might be the enemy. But oftentimes, they're not the enemy. They have your best interests in mind. And they are trying to protect you from pitfalls that many, many companies and organizations and programmers before you have stumbled into. So if you have people in IT and engineering that you can treat as partners, you're a huge step ahead. And definitely lean on those resources if they're available to you.

I do want to give credit to the wonderful members of the Shiny team, past and present, and other members of the RStudio team who have helped with all the software that I demonstrated today. The Shiny team, I couldn't ask for a better team. We're really excited about the work that we've done in the last couple of years. And we're really, really excited about the stuff that we're doing this year. So thank you.

So you can download this slide deck at this URL. Oh, actually, if this URL doesn't work, just take off the "shiny-in-production" part; I don't remember if I set the slug right. But it's speakerdeck.com/jcheng5. And I also wanted to call out that Kelly O'Brien has started working on a Shiny in production book. Sean Lopp and Kelly O'Brien did an awesome workshop the last couple of days on Shiny in production, and they've taken some of their materials and are condensing them into a book. So that's where the work in progress is located.

And with that, I will take seven minutes of questions. So if you've been to an RStudio conf before, you'll know we'll have these throwable microphones. And you will also know that I am very poor at throwing.

Q&A

Talking about the plot caching, is there a way to do that in a distributed environment to pass it between multiple servers?

So the question is, if you have multiple servers running the same Shiny app, can they share the same cache? Yes. I haven't gone into any detail about where the cached plots are being stored, but it is an extensible mechanism that gives you a lot of control. So you have several ways to go. The simplest way, for sort of casual use, is to just cache to the file system, and you could also cache to a shared file system. But what I would recommend instead is, there are examples in the documentation for using Redis, which is what all the big players use for this kind of caching. And you can pretty easily plug that in as the back end for plot caching.
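Configuring the cache back end is how the shared-cache scenario works. As a hedged sketch of the Shiny 1.2-era API (newer Shiny versions delegate to the cachem package, so check the current documentation; the paths and sizes below are illustrative):

```r
library(shiny)

# Default behavior: an in-memory cache, private to a single R process.
shinyOptions(cache = memoryCache(max_size = 20e6))

# A disk-based cache instead. Pointing this at a shared file system
# lets multiple processes, or multiple servers, reuse each other's
# cached plots.
shinyOptions(cache = diskCache("/mnt/shared/myapp-cache", max_size = 500e6))
```

A Redis-backed cache is not a built-in constructor; per the documentation mentioned in the talk, you supply your own object implementing the same get/set interface and pass it to `shinyOptions(cache = ...)`.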

Before I take the next question, I forgot to point out that this slide deck actually has an appendix of best practices. By no means complete. These are just like the random thoughts from my head about some of the different things that you might want to do in R to have your apps be good production apps.

You mentioned preparing data in a scheduled R Markdown document. I'm wondering where you put the files, especially in a clustered RStudio Connect environment?

Yes, great question. With Connect today, what you would do is create a folder on disk that both applications have permission to access, and then you would basically just configure that path into your application.

In the future, what we would really like is to have a more sort of first class representation of shared data in Connect. So the name we've been using for that feature is called data pools. So unless that changes, if you ever see the data pools feature arrive for Connect, that's what that's talking about. And that would give us, once we know more about what your intent is by having this first class feature of a data pool, then we can do things like permission the data pool for you instead of you having to manage those things yourself. But yes, for today, you could go to a database. You could go to a shared file system location. But really, you're sort of on your own. But there are lots of ways to do it.
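Concretely, the shared-folder approach might look like the sketch below. The path and the `prepare_daily_summary` function are hypothetical, purely for illustration.

```r
# In the scheduled R Markdown document (the ETL job): write the
# prepared data somewhere both deployments can read.
shared_dir <- "/mnt/shared/cran-logs"        # hypothetical shared path
daily <- prepare_daily_summary(raw_logs)     # hypothetical prep step
saveRDS(daily, file.path(shared_dir, "daily_summary.rds"))

# In the Shiny app: read the pre-computed file at startup instead of
# doing the heavy lifting in every session.
daily <- readRDS("/mnt/shared/cran-logs/daily_summary.rds")
```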

In academia, a lot of credit is based on publishing in journals. For publishing Shiny apps, do you have a favorite journal? Or would you go a different route for giving credit to the people making the Shiny apps?

You should ask anyone else at RStudio besides me. I barely graduated with a bachelor's, so when I hear the word academia, I freeze up a little bit. I have no idea. I really don't know anything about journals. People ask me, like, I want to cite Shiny, how should I cite Shiny? And I don't even understand what that question means. I say cite it however you want. And they don't seem to like that answer.

Is there anything built or being built to help with integration testing, automated integration testing for Shiny?

You make me regret that I cut that whole section from my talk... shinytest is the answer to that. shinytest hit CRAN sometime late last year. You should be unit testing the R code that you are calling from your Shiny app, but unit testing doesn't help you with testing the app as a whole. We call that integration testing. shinytest is specifically designed to help you do that. It is designed not only to do integration testing, but to make it so easy that you might actually do it.

So what it does is, similar to shinyloadtest, you record a session of yourself using the app. And you have a special UI on the side of your app that we inject that lets you say, take a snapshot here, take a snapshot here, as you do various things. And then in the future, as your code changes, or as your software configuration changes, as you upgrade packages from the tidyverse, you can run that test again. And then it will automatically give you a visual view of the differences.

So it will give you both a textual representation of the data that's different, and also the ability to examine, in several different ways, the visual snapshots before and after your changes. So yeah, shinytest is where you want to go for that.
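The basic shinytest workflow described above is only a couple of calls. This is a sketch; "path/to/app" is a placeholder for your app directory, and older releases also required installing a headless browser (PhantomJS), so check the package documentation for setup details.

```r
library(shinytest)

# Record a test interactively: this launches the app with the recorder
# UI injected, where you click around and take snapshots.
recordTest("path/to/app")

# Replay the recording (e.g. in CI, or after upgrading packages) and
# compare the new snapshots against the saved ones.
testApp("path/to/app")

# When snapshots differ, inspect the before/after diff visually.
viewTestDiff("path/to/app")
```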

It's 10.30. One more question?

I was just wondering if you all are thinking about providing caching for other kinds of rendered outputs, leaflet data tables, things like that.

It is as if I seeded these questions. Yes, caching, it's almost backwards that we started with plot caching. Plot caching is, by far, the hardest thing to cache in Shiny. So we decided to start there, because that's the thing that we don't want users to try to do themselves. No offense, but I don't want you to have to go through that effort to do it correctly. Other things are a little bit easier. You can kind of build the caching yourself if you're sophisticated with Shiny.

But we were so happy with the way renderCachedPlot turned out that what we are investigating is: there's going to be plot caching, and then there's going to be caching for everything else. So there won't be a renderCachedText or a renderCachedPrint; everything else will get a generic. You'll just pipe your renderText or whatever into a cache directive,
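For reference, this generic mechanism did eventually ship: Shiny 1.6 introduced `bindCache()`, which any render function can be piped into. A minimal sketch, inside a server function (`slow_summary` is a hypothetical expensive computation; the native `|>` pipe requires R 4.1 or later):

```r
library(shiny)

# Cache the rendered text, keyed on the selected date, instead of
# recomputing it in every session.
output$summary <- renderText({
  slow_summary(input$date)   # hypothetical expensive computation
}) |> bindCache(input$date)
```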