
Barret Schloerke | plumber + future: Async Web APIs | RStudio
plumber is an R package that allows users to create web APIs by decorating R functions using roxygen2-like comments. In the latest release, asynchronous code (using future or promises) may be inserted at any stage of a plumber route execution, enabling parallel processing using multiple workers. In this talk, I will go through how you can set up your own asynchronous plumber API to leverage your full computing potential. About Barret: I specialize in Large Data Visualization, where I utilize the interactivity of a web browser, the fast iterations of the R programming language, and the large data storage capacity of Hadoop.
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Hi, my name is Barret Schloerke and I'm a software engineer on the Shiny team at RStudio. Today we're going to talk about how plumber and future can be combined to create an asynchronous web API.
So let's consider an example where we have two R functions. Each function takes an ID, but the first one will do a fast calculation. We can think it'll take roughly zero seconds, and the second one will do a slow calculation. This one will take about 10 seconds.
Plumber can take these two functions and decorate them using roxygen2-like comments. In this case, both of these functions can be accessed using a GET route, but the first one will be accessed under /fast/<id>, and the second one under /slow/<id>.
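A minimal sketch of what those two decorated functions could look like (the route paths follow the talk; the function bodies are illustrative, with `Sys.sleep()` standing in for the slow calculation):

```r
# plumber.R -- a sketch of the two endpoints described above

fast_calc <- function(id) {
  list(id = id, result = "fast")   # effectively instant
}

slow_calc <- function(id) {
  Sys.sleep(10)                    # stands in for a 10-second computation
  list(id = id, result = "slow")
}

#* @get /fast/<id>
function(id) {
  fast_calc(id)
}

#* @get /slow/<id>
function(id) {
  slow_calc(id)
}
```

Running `plumber::plumb("plumber.R")$run(port = 8000)` serves both routes from a single R session, which is what causes the queueing described next.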
How plumber handles requests
The ID will represent the order in which the request was sent, so ID 1 will be sent before ID 2. Let's see how plumber will handle these requests.
In the receive stage, the request will be unpacked and be prepped for processing. In the processing stage, we will execute the user function, your function, such as the fast calculation or the slow calculation. And then finally, once they are done, we will bundle them up in the respond stage. The receive and respond stage must be handled by plumber, because that is very specific plumber logic.
So let's look at an example where we have four requests that arrive at roughly the same time. In order, it'll go a fast, then a slow, then a slow, and then a fast. Let's see how these execute.
Fast comes in, is processed really quickly, responded, and then done. Slow comes in, is received, processed, but processing takes 10 seconds. It is a long time, a very long time, but that's okay. That's how long that route takes. And then we'll finally respond.
The second slow route has been waiting patiently, but will also take another 10 seconds to calculate. Little long, but it's okay. That's how long it takes. And then finally, it will respond. And the fast route has been waiting patiently, processes really quickly, and then responds to its original request.
So in total, this will take about 20 seconds. The first fast route will be processed roughly instantly. The second request, which was a slow route, will take 10 seconds to execute, and then it will respond. But since R cannot process multiple things at once, Slow 3 will have to wait 10 seconds before it can be processed. And finally, Fast 4 had to wait a total of 20 seconds before it could even be ingested.
In our current situation, all four requests completed in 20 seconds. However, we accumulated 30 seconds of wait time. Ideally, we would like this to be zero.
Adding future for async processing
Let's explore how Future can solve this for us. So here we have our existing plumber API code. This is what it would look like if we incorporated it with Future.
Future is an R package made by Henrik Bengtsson, which allows for parallel and distributed computing in R. Let's talk about the three lines of code that have been added to our plumber API. The first line of code tells future which plan should be used. In this case, we're going to use a multisession plan. I'll leave it to you to figure out which plan is best for your situation.
The remaining code is used to tell plumber that, hey, this expression needs to be calculated using the future R package. This allows the main R session to not be blocked. A point of note: you must return the result of your future call from your plumber route for it to take effect. Otherwise, plumber will not know what's happening.
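Put together, the async version of the API might look like the sketch below. The exact wrapper used in the talk may differ, but wrapping the slow body in `promises::future_promise()` (or `future::future()`) and returning it from the route is the key pattern:

```r
# plumber.R
library(future)
library(promises)
plan(multisession)                 # the plan chosen in the talk; pick what fits

#* @get /fast/<id>
function(id) {
  list(id = id, result = "fast")   # stays on the main R session
}

#* @get /slow/<id>
function(id) {
  # Return the promise so plumber responds once the worker finishes;
  # the main session is free to receive other requests meanwhile.
  future_promise({
    Sys.sleep(10)
    list(id = id, result = "slow")
  })
}
```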
Let's take a look at the execution time with plumber and Future. With just plumber alone, it took 20 seconds total, and we had 30 seconds of accumulated wait time. However, when we add in Future, everything is run immediately. This allows for the total execution time to only take 10 seconds, and we have zero accumulated wait time. This is the ideal situation, and everything is responding as fast as possible.
Fast 4 responded 20 seconds earlier, and Slow 3 also responded 10 seconds earlier. This is a pretty good deal.
How the execution works under the hood
Let's dive deep to see how this execution is done. Plumber has the three steps that we talked about earlier, Receive, Process, and Respond. This is all done on the main R session. Only one route can be there at a time.
Future, however, can launch multiple R sessions, and we'll call them Worker 1 and Worker 2. In our use case, we're only going to be using them in the Processing step. So we must receive and respond from the main plumber R session, but we can offload the Processing to the Future Workers.
Let's watch to see how this goes. Fast comes in, it is processed, and we respond. Nothing new. No surprise. Slow 2 is offloaded immediately to Worker 1. Since the main session is free, then Slow 3 can be offloaded to Worker 2. This allows Fast 4 to be processed and respond immediately. Once Slow 2 is done, it can respond. Same with Slow 3.
That's a lot faster and a lot more going on, but in the end, we have no wait time. Let's look at that again. This allows Fast 4, since it's not being blocked by Slow 2 or Slow 3, Fast 4 can respond immediately at the 0 second mark. And since we can do things in parallel, Slow 3 is also being processed at the same time as Slow 2. Both of these do not block the main R session. This is ideal.
Limitations
However, there are some limitations to what we can do with plumber and future. One, you have to manually add your future call wherever you deem appropriate. This is because adding the future call adds a little bit of execution time to each route. Now, if your execution time is normally in the seconds, and adding a quarter of a second does not mean much to you, then great, I think future is a perfect candidate for that route. However, if adding a quarter of a second to your execution time dramatically increases it, adding future is maybe not the best idea, and instead we should just keep it in the main R session.
The other one we have to remember is that processing power is finite and memory is finite. We cannot necessarily launch 100,000 workers on one laptop. That would be a little bit of overload, either with processing power or the memory. One of the two will limit what you can do on your machine.
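Since processing power and memory are finite, it's worth sizing the worker pool explicitly rather than launching as many workers as possible. One common pattern (a sketch, not from the talk) is to leave a core free for the main plumber session:

```r
library(future)

# availableCores() respects scheduler/cgroup limits; keep one core
# free so the main plumber R session can still receive requests.
plan(multisession, workers = max(1, availableCores() - 1))
```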
I'd like to thank you for listening in on how plumber and future can be integrated to help you create an asynchronous web API. Remember, offload your slow routes using future expressions. This will help keep your main plumber R session available to process incoming requests. Thank you.

