Resources

Barret Schloerke || Maximize computing resources using future_promise() || RStudio

00:00 Introduction
01:45 Setting up a multisession using the future package
02:05 Simulation using two workers
04:14 Simulation using 10 workers
05:20 What happens when we run out of workers?
05:35 How Shiny handles future processes like promises
07:16 Introduction to future_promise()
07:45 Demo of the promises package
09:21 Setting the number of workers
10:40 Demo of processing without future_promise()
14:11 Wrapping a slow calculation in a future()
14:53 Demo of processing using Plumber
16:25 Considerations on the number of cores to use
17:21 What happens if we run out of workers?
19:44 Decrease in execution times using future_promise()

In an ideal situation, the number of available future workers (future::nbrOfFreeWorkers()) is always greater than the number of future::future() jobs. However, if a future job is attempted when the number of free workers is 0, future will block the current R session until a worker becomes available. The advantage of future_promise() over future::future() is that even if no future workers are available, the work is scheduled to be done when workers become available, via promises. In other words, future_promise() ensures the main R thread isn't blocked when a future job is requested but can't immediately be performed (i.e., the number of jobs exceeds the number of workers).

You can read more about the promises package here: https://rstudio.github.io/promises/articles/shiny.html
And you can learn more about Shiny here: https://shiny.rstudio.com/
Got questions? The RStudio Community site is a great place to get assistance: https://community.rstudio.com/

Content: Barret Schloerke (@schloerke)
Design and editing: Jesse Mostipak (@kierisi)

image: thumbnail.jpg

Transcript#

This transcript was generated automatically and may contain errors.

Okay, so future is a wonderful R package that does a lot of voodoo behind the scenes so that you can easily run code in a separate R process. And I think that in itself deserves a ton of praise, because the API is so simple and so useful. I actually don't even use the shorthand API, because I like the very direct, explicit future(), where you say, hey, run this code in the future, and then go get the value. I think that in itself is a wonderful handshake, and I really enjoy embracing it.

This is very useful if you have, say, 100 simulations to run and 10 cores. Let's make a plan to use all 10 cores and churn through those, running R in 10 different R processes. Hopefully your machine is smart enough to use all the cores. Typically R is single-worker, is the way I like to think of it, and with a single worker you churn through the work serially. But with future, we can churn through it in parallel, using as many workers as we have available. And I think that's awesome.

Setting up a multisession using the future package

It works out really well, and we can take a quick peek at this. So we library(future), and then we set up plan(multisession) — it's just a good default — and then we can say workers is 2 for demo purposes, or for the other example I can set it to 10. That works great as well, or 100; it doesn't matter, but 2 works. Then here I have my simulations from 1 to 10 that I want to run. For each one I'm going to create the future, print that it was created, and return that future object. And finally, at the end, we'll look at the values of all of the futures.

So let's run the library call, the plan, and the values. The first two futures are created right away, and then we go through the rest a little more slowly, in pairs of two. And that's really neat. If we look at it, with a Sys.sleep(2) in each job, that should have taken what, 20 seconds serially. But if we do start <- Sys.time() and end <- Sys.time() — what is it? end - start. Maybe this will work. Try it again.

So it took 10 seconds, and this is awesome. We have two workers — well, it took nine seconds, not really 10 — but we have two workers and it all finished up very quickly in parallel, not 20 seconds. That's the important part. We can get into where that extra second went later, but I'm not too worried; it's roughly 10. So that's the good part. And if we get the values, we have 1 to 10. Perfect.
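The demo described above can be sketched roughly like this (assuming the future package is installed; the Sys.sleep(2) stands in for a real simulation):

```r
library(future)

plan(multisession, workers = 2)

start <- Sys.time()
futures <- lapply(1:10, function(i) {
  f <- future({
    Sys.sleep(2)  # stand-in for a slow simulation
    i
  })
  message("created ", i)
  f
})
vals <- lapply(futures, value)  # value() blocks until each result is ready
end <- Sys.time()

end - start   # roughly 10 seconds with 2 workers, vs ~20 serially
unlist(vals)  # 1 2 3 ... 10
```

With only two workers, the "created" messages for jobs 3 through 10 appear in pairs as workers free up, which is exactly the blocking behavior discussed next.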

This is really neat. But one interesting thing, if you watched how it was working: I could not actually execute this end, or this end - start, until the lapply had finished processing. That's a little weird, because I'm saying, hey, dear future, please run this in the future. And this is because it had run out of workers to submit jobs to. So if I switch this and say workers is 100, run the plan, and start calculating the values, it should go much, much faster. Oh, let's actually cancel that and say workers is 10, because it's actually making 100 R sessions — we don't want to do that. So let's set it up to be 10. Now they're all created very quickly. And let's redo the whole thing, because I didn't even hit the end time point. So it took two seconds — submitting the jobs is not trivial, it takes time — but it's a lot faster than if we were to do this serially. So that's good. And we can look at the values. Everything is great.

What happens when we run out of workers?

So that's a lot of fun. It's really neat how all of that is able to be done. But there's that funny situation of when you run out of workers. What happens when we run out of workers? My thought was that it would just return immediately. It turns out it actually blocks. And blocking has very big implications for Shiny and for Plumber.
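The blocking is easy to see directly. In this sketch (assuming a multisession plan with two workers), the third future() call does not return until one of the first two finishes:

```r
library(future)

plan(multisession, workers = 2)

f1 <- future({ Sys.sleep(5); 1 })
f2 <- future({ Sys.sleep(5); 2 })
nbrOfFreeWorkers()  # 0: both workers are busy

# This does NOT return immediately -- it blocks the main R session
# until one of the two workers frees up, then submits the job:
f3 <- future({ Sys.sleep(5); 3 })
```

In a Shiny app or Plumber API, those seconds of blocking mean no other request can even be acknowledged.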

Shiny will actually treat futures very similarly to promises. A promise is a thing in R where you can chain promises, but you can also interleave the processing of promises — we'll jump into promises later. The idea is that you can interleave their processing. So if you have a job that takes a very long time, you could break it up into a chain of multiple promises, and that would allow other things to run in between, rather than everything waiting for your job to finish completely.

This idea can also be scaled up for Shiny: what if we have multiple users coming in? If one user comes in and says, please process my model, and it's one big block of code, no one else gets to do anything — your cores are underutilized. Instead, if you break the processing of that model up into multiple promises, then other users can come in and say, oh, give me hello world, and no problem — they don't have to wait until the end.

So I think it's really cool: a future is treated just like a promise, it gets upgraded, and the result is handled accordingly. The bad part is that future blocks when submitting a job if there are no workers available. So there is a brand-new function in the promises package called future_promise(). And a future_promise(), for all practical purposes, is just: I promise to execute this in the future.

Introduction to future_promise()

So for the majority of what you'll do, wherever you would normally say future(), you can replace it with future_promise(). So in our code, let's copy this into a script, add library(promises), and clear the output. So we'd say library(future), library(promises), set up our multisession, and adjust our workers back down to two.

Okay, and then we will adjust our future() to be a future_promise(). Let's try running this. The timing finished right away, and I actually got control back at the console right away. Then these messages started showing up as the jobs were submitted and started completing. That's a very subtle difference, but a very important one. Because if I run this again, calculating the values, I can actually come in here and say 1 + 1, 2 + 2, 2 + 3, and I'm able to submit this work while waiting for the futures to finish. My main R session is not blocked. This is super cool — wow — because now I can process other people's work while not waiting for that big future job to finish or even be submitted. And that is very, very beneficial.
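The modified script might look like this (a sketch, assuming the promises package is installed; the only change from the earlier demo is future() becoming future_promise()):

```r
library(future)
library(promises)

plan(multisession, workers = 2)

ps <- lapply(1:10, function(i) {
  p <- future_promise({
    Sys.sleep(2)
    i
  })
  message("created ", i)
  p
})

# Control returns to the console immediately; jobs beyond the 2 free
# workers sit in a promise queue instead of blocking the session.
1 + 1  # you can keep working while results trickle in
```

The "created" messages now print for all 10 jobs at once, because creating a future_promise() never waits for a free worker.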

Setting the number of workers

So workers is an arbitrary thing. You can set it to what you want; it depends on what you want to use them for. If you have things like submitting data to MySQL — not labor-intensive tasks, but you need to do a bunch of them — then maybe you use two to three times the number of cores you have, because they're not going to be doing that much work, and that's okay. You could even go higher than that if the task is not intense. But if you're computing Fibonacci numbers — very computationally expensive — then maybe you do cores minus one. So if I have eight cores on my machine, I might say workers equals seven, so that I can still click around on my computer. Otherwise all of your cores are maxed out and your computer is rendered useless. But it's up to you how you want to use it. Workers are not cores directly, but your computer will try to spread them across cores, because it's not in the computer's best interest to put them all on the same core.
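Those rules of thumb can be sketched like this (availableCores() is re-exported by the future package):

```r
library(future)

# CPU-bound work: leave one core free so the machine stays responsive.
plan(multisession, workers = max(1, availableCores() - 1))

# I/O-bound work (e.g. many small database writes): workers can safely
# exceed the number of physical cores, e.g.
# plan(multisession, workers = 2 * availableCores())
```

The right multiplier depends entirely on your workload; the point is that workers and cores are related but not the same thing.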

Demo of processing without future_promise()

So, Plumber. Plumber runs on httpuv, just like Shiny, so a lot of these concepts work the same whether you're building a Plumber API or a Shiny application — the concepts of futures and promises extend from one to the other. So in Plumber, let's imagine that I have two functions that I want to expose in an API. Plumber is an R package that lets you create a web API from R code using a couple of little decorators. In this case, I have two functions, fast_calc and slow_calc, and I can expose them as routes with @get /fast and @get /slow. For this demo, the id is just representing the submission order, but the calculation can do whatever it wants with its parameters. That way, when you see something like slow 1 and slow 2, we know that slow 2 was submitted after slow 1, even if they are very close together in submission time.
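A minimal version of that API might look like this. This is a sketch: the route names come from the talk, while the bodies of fast_calc and slow_calc are stand-ins for the real calculations.

```r
# plumber.R -- a fast route and a slow route, both in the main R process
library(plumber)

fast_calc <- function(id) paste0("fast ", id)
slow_calc <- function(id) { Sys.sleep(10); paste0("slow ", id) }

#* @get /fast
function(id = "") {
  fast_calc(id)
}

#* @get /slow
function(id = "") {
  slow_calc(id)  # blocks the whole API for ~10 seconds
}

# Run with: plumber::pr("plumber.R") |> plumber::pr_run(port = 8000)
```

Because both routes run in the single main R process, any request to /slow holds up every other request behind it — which is the scenario walked through next.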

Cool. So Plumber handles this, and in this case we're going to have four requests: a fast route, a slow, a slow, and then a fast route. If all four submissions come in at roughly the same time, it looks like this, because R has only one worker. We do fast 1, it responds, okay, it's done. Then we do slow 2, it processes, it takes a very long time, no one else can do anything, and slow 3 is just sitting there waiting. Then it's done, and we respond. Great. Slow 3 comes in and starts to process, and it is very slow — it's taking its time — and fast 4 has to wait for all of that processing time. Then okay, we're done, we respond, and finally fast 4 can come in, process really quickly, respond, and we're done.

So great. Awesome. Those four requests came in, and they took about 20 seconds, if we say slow takes 10 seconds to calculate and fast is roughly instantaneous. Fast 1 comes in and processes right away. Slow 2 comes in and takes 10 seconds to process. Slow 3 had to wait that whole time, and then fast 4 had to wait the full 20 seconds to be processed, even though they all arrived at roughly the same time.

This red area is bad. This is something we can optimize. I can't necessarily make the slow calculation faster, but I can reduce the waiting time, and that's the goal.

Wrapping a slow calculation in a future()

So with Plumber, we can do this using future — and this applies to both Plumber and Shiny. A lot of people don't know this: in Shiny, say you're compiling a PDF — if you return a future from your reactive, your output, or an observer, Shiny will wrap it in a promise and listen to it. It's awesome. Read the docs; you can offload things quite well. The only change in this situation is that we wrap our slow calculation in a future() and resubmit. Where before it took 20 seconds, now it takes 10: fast 4's total time got reduced by about 20 seconds, and slow 3 did not have to wait, so it was 10 seconds faster. All the wait time in this example was removed. We can't reduce the processing time, but we can at least get rid of the wait time. That's awesome, and this is how future can help you do that.
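The change is just on the slow route. A sketch, with Sys.sleep() again standing in for the real slow calculation:

```r
# plumber.R -- the slow route now returns a future(), which Plumber
# treats as a promise, freeing the main R process while it computes.
library(plumber)
library(future)

plan(multisession, workers = 2)

slow_calc <- function(id) { Sys.sleep(10); paste0("slow ", id) }

#* @get /slow
function(id = "") {
  future({
    slow_calc(id)  # runs on a background worker, not the main session
  })
}
```

The future package automatically exports slow_calc and id to the background worker, so the route body reads almost exactly like the blocking version.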

Demo of processing using Plumber

Cool. So what would this look like with Plumber? Plumber always runs in the main R session — the main worker — and we can add extra workers: future worker 1, future worker 2. So with those same four requests coming in, where slow is wrapped in a future, it looks like so. The future workers are only used in the processing step. Fast 1 comes in, it's processed really quickly, great, we respond. Slow 2 comes in — hey, it's a future — we offload it. That frees the main R process to do the same thing with slow 3. Since we have enough workers for the jobs, fast 4 is processed immediately. Slow 2 happens to finish, so we send off its response, and same with slow 3. There's a lot going on, a lot of moving parts, but it's really nice, because it offloads that slow processing somewhere else to free up the main R worker. So you can respond to health checks, other users in Shiny can do simple reactivity, and it's not a big deal — you're not waiting for someone else to finish running their model. Why should someone else's Shiny session alter my Shiny session? Futures and promises will help you avoid that.

Considerations on the number of cores to use

So that's where we get that 20-second saving, by running in parallel. Now, some limitations: how many workers do I use? Well, we saw I tried to use 100 workers at one point, and my computer blew up a little because it was spawning 100 R sessions. Maybe not the best idea. Two to three times the number of cores is pretty good, or one-to-one if your processing is computationally intense. Spawning a future is not instant — on the order of a quarter of a second — but if a quarter of a second doesn't mean much to you, then future is perfect. If a quarter of a second does mean a lot to you, maybe don't offload the work to an external R session, because the communication overhead is too much and you're already losing time. That leads into the fact that processing power is finite, and memory is finite. If your models take up 10 gigs of memory, maybe you only use two workers. It's still better than zero.

What happens if we run out of workers?

What happens when we run out of workers? Let's pretend we have six slow routes, all handled by future. What would happen? By slow 3, we run out of workers. So slow 1 comes in, offloaded to worker 1. Great. Slow 2, offloaded to worker 2. Slow 3 blocks the main R session. And the main R session is still blocked as slow 4, slow 5, and slow 6 arrive, because they can't be offloaded anywhere. Eventually workers free up and slow 3 and slow 4 get offloaded. Fast 7 is still stuck behind the main R session, even though it's a quick job. Then we can collect the results of the first future values, and the main R session is unblocked. Phew. Now we can process slow 5, and now slow 6. But while we were waiting to offload slow 3, 4, 5, and 6 to free workers, the main R process was not usable. So in the case of Plumber, you cannot do a health check. In the case of Shiny, I cannot load up a new session or a new tab. It's blocked. Nothing can be done.

And if we were to visualize this, the processing takes 30 seconds in total, offloaded in pairs. And there's actually something new here: the results of slow 1 and slow 2 do not get returned until 20 seconds in, because that's when the main R session became free. Slow 3 and 4 had to wait, slow 5 and 6 had to wait, fast 7 had to wait, and that's not good. We want to reduce that wait time as much as possible. This is also the region where there were more future requests than free future workers, and it's the region — plus the final stretch at the end — where the main R session is blocked. That is where things are unusable, even if you're only handing jobs off or waiting. This gray area is bad, and we'd like to minimize it as much as possible.

Decreasing wait time with future_promise()

Cool. So this is where future_promise() comes in: a promise that executes using a future. If we were to do this with future_promise(), the execution would look like this. Slow 1 and slow 2 respond immediately, slow 3 and 4 behave just like before — but the big one is that fast 7 now executes immediately. It is not waiting for anyone.

Fun animations, and also more fun animations. The big difference here is where the gray bars are. In this case, the gray bars appear only when we're resolving the values for 1 and 2, for 3 and 4, and for 5 and 6. We can't get rid of those. But in between, the session is completely free. You can run health-check routes in Plumber. You can start new sessions in Shiny. You're not blocked. Everything is awesome. Anytime you would say future(), just say future_promise(). It solves a lot of issues, and for the most part, as long as everything is scoped the way it should be, everything will behave as expected. For the most part, it's a one-to-one drop-in.

For comparison, let's see how the animation works for this one. We have our six slow routes, one through six, and then a fast route. The addition here is the promise queue: a promise to compute with future. It's a pre-queue, not a post-queue. We have this pre-queue because there are no workers available, so we just say, I promise to do this later. So we can execute fast 7 immediately, and then slow 1 and slow 2 finish processing after 10 seconds. The ordering may be a little fuzzy here, but for the most part they go in and out in pairs. You can only animate so well in Keynote.

But throughout the whole process, the main R session is free something like 98% of the time, rather than being blocked most of the time as before.