Joe Cheng | Managing long-running operations in Shiny

Transcript

This transcript was generated automatically and may contain errors.

Hi. In this video, I want to talk to you about a new feature that we've launched in Shiny for both R and Python that has to do with taking really slow, long-running operations and making them a lot nicer for your users. So, to talk about this topic, we have to talk about slow code. This topic is not interesting unless your app has some part of it that is running slowly. Ideally, you wouldn't have any slow code. Ideally, any code you put into your Shiny app should be fast and responsive. And that's where I would start. If you have something, some code in your app, some operation that is slow, the first thing you should try to do is make it fast.

And there's lots of tips and tricks that you can use for making your Shiny apps fast that I have talked about in a talk I gave in 2019, and I'll put the link right there. But despite our best efforts, sometimes things are just going to be slow. We might need to call a slow API, for example. Or maybe you're training a large model directly within your Shiny app. Or you're compiling a huge dynamic report that's driven from your Shiny app that you then want to present to the user to download. All these things, they might just be slow and there's no way to make them faster. And when that happens, that can be a really big problem.

And one of the most difficult things about promises is that they're infectious.

That once you have a function that uses promises, well, that function now has to return a promise in order to really work correctly. And that means that any function that calls a function that returns a promise, those functions also need to return a promise. And you end up in this world where you want to introduce this sort of asynchronicity at this particular point in your code where something slow is happening. And then these promises sort of ripple through the whole rest of your code, anywhere that directly or indirectly relies on that long-running operation's result. They all become this promise-oriented syntax that frankly is a pretty weird syntax to begin with. So that was the biggest problem, I think, is that it was hard to use because of this sort of infectious property.
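The same infectious property shows up with coroutines in Python, so a minimal sketch using asyncio can illustrate it. This is an analogy, not the R promises API, and every function name here is hypothetical. Once the slow function at the bottom becomes asynchronous, each caller above it has to become asynchronous as well.

```python
import asyncio


async def fetch_quote(symbol: str) -> float:
    """Hypothetical slow operation; the sleep stands in for a slow API call."""
    await asyncio.sleep(0.01)
    return 100.0 + len(symbol)


async def quote_with_fee(symbol: str) -> float:
    # This caller must itself be async, because it has to await fetch_quote().
    return await fetch_quote(symbol) + 1.5


async def format_report(symbol: str) -> str:
    # ...and so must its caller, even though formatting a string is not slow.
    price = await quote_with_fee(symbol)
    return f"{symbol}: {price:.2f}"


def forgot_to_await(symbol: str):
    # A plain function that calls an async one gets a coroutine object back,
    # not a number, so it cannot use the result directly.
    return quote_with_fee(symbol)


report = asyncio.run(format_report("MSFT"))
```

Only `fetch_quote` is actually slow, yet all three functions in the chain had to change shape, which is the ripple effect described above.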

But also, even if you went through all of that, if you learned how to use this syntax and learned how to deal with this infectiousness that happens, all of that still didn't solve the problem of intra-session concurrency. The way Shiny async was designed, it really only helped you with inter-session concurrency. So your app, your experience using an app, was going to still be sort of slow and blocked. But other people could connect and have their own experience that was not going to be blocked by you. That was the best that Shiny async could do.

How extended task changes the reactive graph

And to explain a little further how that worked and how Shiny extended tasks solved this problem, I'll need to use some visuals. And for that, I'll use some diagrams from the amazing book Mastering Shiny by Hadley Wickham, which is the best source for understanding reactive programming in Shiny that we have today. Mastering Shiny is only for R. We don't have a Python version right now. But hopefully you still get the idea from reading the chapter on reactive tracking.

So this is a diagram that we use to illustrate how reactive programming works in Shiny. On the left, these shapes here are reactive values or inputs. They are pointed on the right to show that they have a value that can be read by someone coming in from the right. On the right, we have the opposite. So these are outputs or reactive observers, or reactive effects if you're in Python. So these are either outputs or code that is going to execute just for side effects. And in the middle, we have what are called either reactive expressions or reactive calcs, depending on whether you're using R or Python. And these are things that can both read values and be read. They can read reactive inputs or reactive values, or they can read other reactive expressions, and they can be read. So they have this sort of shape on both sides. They can both read and be read.

So this is the view that Shiny has of your app when a user first connects. It knows that there are these reactive expressions or reactive calcs, and it knows that there are these outputs, and it even knows that there are these reactive values or inputs, but it doesn't know how they are related. And what happens is Shiny will look for the first output or observer or effect that it can find, and it just starts executing it. This is denoted in this diagram by turning orange. So something orange is executing. So this thing starts executing, and pretty soon it makes a call to this reactive expression here. That has not executed yet, so it needs to execute. So it turns orange, and that is going to read from this reactive input. It's also going to read from this reactive expression, which reads this input, and pretty soon it's done executing, and it turns green. Everything to the left here has turned green, and next this output is done executing, so it turns green. Repeat that for the next output and the next output, and pretty soon everything is green. Everything has finished executing, and this is what we call being at equilibrium. So Shiny is done doing all the reactive things it knows to do right now, and is just waiting for you to do something at this point. And everything from the first diagram to this one is called a single tick, a reactive tick. Tick meaning like a tick of the clock.

And that's because somewhere in the depths of Shiny there's some source code that looks like this. There's a loop, an endless loop, that's sitting there waiting for input to appear from the user, taking that input and recomputing anything that's reactive that needs to be recomputed, and then taking any changed outputs that result from that and sending it to the browser. So all of that is called a reactive tick. One trip through this while loop is called a reactive tick. And notice that only at the beginning of this tick do we check for input changes, and only at the end do we send outputs to the client. So in between we cannot respond to the client or the browser in any way.
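The loop described above can be sketched roughly as follows. This is not Shiny's actual source code. The function and parameter names are assumptions made for illustration, and a finite queue of input batches stands in for waiting on the browser.

```python
from collections import deque


def run_ticks(pending_inputs, recompute, send_to_client):
    """Run one reactive tick per batch of input changes.

    pending_inputs: deque of dicts, each one a batch of input changes.
    recompute:      function(state) -> dict of outputs; may be slow.
    send_to_client: function(outputs), called at most once per tick.
    """
    state = {}
    previous_outputs = {}
    while pending_inputs:  # a real server would loop forever and wait here
        # 1. Only at the start of a tick do we look at input changes.
        state.update(pending_inputs.popleft())
        # 2. Recompute everything affected. Nothing else happens meanwhile.
        outputs = recompute(state)
        # 3. Only at the end of the tick are changed outputs sent out.
        changed = {
            name: value
            for name, value in outputs.items()
            if previous_outputs.get(name) != value
        }
        if changed:
            send_to_client(changed)
        previous_outputs = outputs


sent = []
inputs = deque([{"n": 2}, {"n": 3}, {"n": 3}])
run_ticks(inputs, lambda state: {"square": state["n"] ** 2}, sent.append)
```

Because steps 1 and 3 happen only at the edges of the loop body, a slow `recompute` in step 2 keeps the loop from reading new input or sending any output until it returns.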

That means that our long-running tasks, like in this case I've drawn one of the reactive expressions really large to indicate like it's going to take a really long time, that code executing and taking a super long time is going to block this part of the reactive tick. It's going to block the recomputation, meaning we can't get to the part of the reactive tick where we wait for more input changes or we send outputs. We are stuck in the center of this reactive tick. So in order to fix this problem we need this while loop to be able to keep turning over and over and over without getting hung up in this recompute all affected things step, and yet still have the operation occur, still be able to use the results reactively. So how do we separate the task from the tick?

So Shiny async solves this by running multiple graphs concurrently. So they each have their own while loop going, essentially. But you can't run multiple tasks within a single graph because the shape of the graph is unchanged. Like, without changing the shape of this graph we will never escape this inability to do intra-session concurrency. So extended task changes the shape of this graph, and it does it by taking this long-running operation and actually splitting it into two parts. There's the part on the left, which is an observer that is going to launch the operation, and a totally separate but related piece that is going to hold the result value that can be read. And I've put a little emoji here to represent a background R process that's going to actually do the work for us.

So when the Shiny app starts the background process is sleeping. It hasn't been told to do any work yet and none of these relationships again have been established. So remember what Shiny does is when it first loads up and it sees all these observers and effects and expressions and things, it picks the first output it finds and starts executing it. And in this case it's going to kick off the extended task and the emoji changes to the sweaty guy because he is now working and super tired. He's working very hard. While that background R process is executing, the rest of the reactive graph is able to proceed as normal and we quickly get to this equilibrium. Everything is done executing as far as the reactive graph is concerned. Although this background task continues to execute in a separate R process. So this is great. This means we've completed the tick. Any outputs that are ready can be sent to the browser and we can start waiting for the next input from the user.

So let's say the user touches some unrelated slider or input or button or something like that. That's fine. That part of the graph can respond, can re-execute, can get back to equilibrium, can send results back to the user and then start waiting for the next input. So this is like when we added a new Microsoft stock quote to the application that was working with extended task while it was still busy. This still works which is exactly what we wanted.

Pretty soon this background task is finished so it changes to this smiling angel emoji and takes its result and brings it back into the reactive graph via this second piece of the extended task, this reactive value. And that causes reactivity to trigger in all the right ways and the outputs that depended on that result are now updated and everything is now at true equilibrium. Not only is the reactive graph at equilibrium but we have no more background task running. So now we truly are just waiting for the user to do something else.
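The split into a launcher and a separate readable result can be sketched in plain Python. This is a hypothetical stand-in, not Shiny's ExtendedTask implementation. The class and method names are assumptions, and a worker thread stands in for the background R process.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ExtendedTaskSketch:
    """Hypothetical stand-in: a launcher plus a separate readable result."""

    def __init__(self, func, executor):
        self._func = func
        self._executor = executor
        self._future = None
        self.status = "initial"
        self.result = None

    def invoke(self, *args):
        # Launcher half: hand the work off and return immediately.
        self._future = self._executor.submit(self._func, *args)
        self.status = "running"

    def poll(self):
        # Result half: checked once per tick, never blocks.
        if self.status == "running" and self._future.done():
            self.result = self._future.result()
            self.status = "success"
            return True  # the result changed, so dependents need to rerun
        return False


def slow_square(x):
    time.sleep(0.2)  # stands in for a long-running computation
    return x * x


log = []
with ThreadPoolExecutor(max_workers=1) as pool:
    task = ExtendedTaskSketch(slow_square, pool)
    task.invoke(7)  # first tick: kick off the work, then finish the tick
    ticks = 0
    while task.status == "running":
        ticks += 1
        log.append(f"tick {ticks}: other inputs still handled")
        time.sleep(0.05)  # stands in for waiting on the next user input
        if task.poll():
            log.append(f"result arrived: {task.result}")
```

The loop keeps turning while the work runs elsewhere, and the result re-enters the graph as an ordinary value change on a later tick. Error handling and cancellation are left out to keep the sketch short.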

Summary and alternatives

So to recap, with this approach we're able to achieve both inter- and intra-session concurrency, unlike the previous approach of Shiny async. Extended task doesn't require you to learn a strange syntax like Shiny async, so we sincerely hope that more people will be able to adopt this strategy for their long-running tasks than ever did with Shiny async, and it will not be as invasive a refactor as Shiny async would sometimes force you to do.

Now keep in mind that for R, extended task still relies on the future package to put the task in the background. If the task is not happening off of the main R thread for the Shiny R process, then nothing we do is going to achieve inter- or intra-session concurrency. We're just fundamentally limited to doing one thing at a time, so we really need to use that future. There are also a couple of limitations you should be aware of if you're going to use this extended task feature.

In R there is no support currently for canceling a long-running task. Once you have invoked an extended task you can't or there's no built-in way to tell it to stop executing. There's also no built-in progress reporting. There is the ability to use a button that you click to start the operation and that button will say like processing but having like a progress bar that shows you how far along you are that is not currently supported either for R or for Python. We would really like to add both these features. On the R side it's going to involve working with future to make that possible.

Now, one more thing. I said you have to use future to launch your long-running task, and that's sort of true. You do need to use future or something like future. There are a couple of other alternative or complementary ways to run R code in the background. Now, future is great. It's very convenient and it has a very magical API, I would say. You just say future and then put some code in it, and even if that code refers to variables or packages that are outside of your future code block, it'll just sort of figure out how to bring in everything it needs, or try to, based on some crawling around in your environment, and it will work pretty hard to make sure all those things are available automatically in the background R process that it launches. That's pretty cool. And the other really great thing about future is that it's popular and has been used for years by many people, so there are pitfalls, but those pitfalls are somewhat known.

Now, the downsides of future are that it has pretty high runtime overhead, so you do lose some performance whenever you invoke a future. And the other thing about future is it's quite ambitious as a project. There are tons and tons of options. There are tons of extensions. There are lots of different policies you can use with future for how it schedules. So it's a lot, and it can be its own thing to learn, which can make it a little harder to get started with. And finally, it's pretty complex because of that automagicness, because it automatically teleports the data and packages you need to your background R process. That's a little scary, and it can be doing things that you weren't aware of, maybe copying more data than you thought it would, or accidentally depending on an object that really shouldn't be transported across. So when it comes to that kind of automagic stuff, sometimes it's hard to form a mental model for what's actually happening.

So one promising alternative slash complement to future is called Mirai, M-I-R-A-I, which I believe is just Japanese for future. It's a new package by Charlie Gao that is similar to future in that it can run code in a background R process. The good thing about this Mirai package is that it's super low overhead compared to future. It's very, very fast, and it's designed to be very simple and easy to understand, so there's very little magic that it does for you. It doesn't magically slurp in whatever variables you happen to be using inside of your code chunk. It won't automatically load packages for you that it can see that you need. Instead, you give it basically an expression that's going to be evaluated in some R process, and if that expression uses variables or even functions that are not just going to be provided by R, then you need to provide them yourself. You need to tell Mirai, here are the variables and functions that you're going to need. Please make them available over on the other side.

The downsides of Mirai are that it's still relatively new, so it is not as battle-tested as future. There might be failure modes that we don't know about, and it might have bugs. And the fact that it has the simpler model and doesn't do all this magic for you is kind of a double-edged sword. It's nice in that it's fast and simple to understand, but it can be less convenient if you do have a lot of functions that you're using or a lot of values that you need to transport to the background R process. You then have to do that all explicitly.
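The explicit model can be illustrated with a Python analogy. This is not Mirai's API and involves no R. The helper name is hypothetical, and the call below blocks until the child process finishes, so it shows only the data-passing side and not the background execution. An expression is evaluated in a fresh process that sees only the variables handed to it by name.

```python
import json
import subprocess
import sys

# Code run by the child process: read variables from stdin, evaluate the
# expression given on the command line, and print the result as JSON.
_CHILD = (
    "import json, sys\n"
    "env = json.load(sys.stdin)\n"
    "print(json.dumps(eval(sys.argv[1], {}, env)))\n"
)


def eval_in_fresh_process(expr, **variables):
    """Evaluate a trusted expression in a new process that sees only `variables`."""
    proc = subprocess.run(
        [sys.executable, "-c", _CHILD, expr],
        input=json.dumps(variables),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


scale = 10
values = [1, 2, 3]

# Works: everything the expression needs is passed explicitly.
total = eval_in_fresh_process("scale * sum(values)", scale=scale, values=values)

# Fails: `scale` exists here but was never sent to the other process.
try:
    eval_in_fresh_process("scale * sum(values)", values=values)
    missing_ok = True
except subprocess.CalledProcessError:
    missing_ok = False
```

Nothing is copied across unless it is named in the call, which makes the data transfer easy to reason about, at the cost of listing every dependency by hand.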

And finally, there's a package called crew by Will Landau, whom you might know from the targets package. And crew builds on Mirai, so it uses Mirai underneath, and it adds basically a convenient way to launch multiple or many tasks, both locally and on standard HPC clusters that you might have, especially in pharma environments. I haven't used crew that much, but it is designed now to integrate with extended tasks somewhat, and it has some examples in its documentation about how you can do that. So if you have access to a big HPC cluster, or you have many, many tasks that you want to launch and manage together, crew might be the way to do it, and you can do it using extended task as well.

So to wrap up, avoid long-running tasks in your Shiny app if you can. If you have slow code, first try to make it fast or eliminate it altogether. And if you can't, then look to these tools like the new extended task feature in Shiny for R and Shiny for Python, which will let you run these long-running tasks in the background and provide you with inter-session concurrency and intra-session concurrency, which will be a much nicer user experience for your Shiny app users.

If you have any questions about this or anything else with Shiny please drop by our Discord or our forum and we are always happy to meet users and hear about how Shiny is or isn't working for you. All right till next time!
