Resources

{mirai} and {crew}: next-generation async to supercharge {promises}, Plumber, Shiny, and {targets}

{mirai} is a minimalist, futuristic, and reliable way to parallelise computations – either on the local machine, or across the network. It combines the latest scheduling technologies with fast, secure connection types. With built-in integration to {promises}, {mirai} provides a simple and efficient asynchronous back-end for Shiny and Plumber apps. The {crew} package extends {mirai} to batch computing environments for massively parallel statistical pipelines, e.g. Bayesian modeling, simulations, and machine learning. It consolidates tasks in a central {R6} controller, auto-scales workers, and helps users create plug-ins for platforms like SLURM and AWS Batch. It is the new workhorse powering high-performance computing in {targets}.

Talk by Charlie Gao and Will Landau
Slides: https://wlandau.github.io/posit2024
GitHub Repo: https://github.com/wlandau/posit2024
`mirai`: https://shikokuchuo.net/mirai/
`crew`: https://wlandau.github.io/crew/

Oct 31, 2024
20 min

image: thumbnail.jpg

Transcript#

This transcript was generated automatically and may contain errors.

Good morning, everyone. I'm Charlie Gao. I'm the author of mirai, an asynchronous evaluation framework for R. I'm really glad to be joined here by Will Landau. He's not quite on stage, but will be.

Will Landau is the author of targets and also the author of crew, which extends mirai to high-performance computing. We'd also like to think that we're joined in spirit, at least, by this guy, Joe Cheng, creator of Shiny. You'll notice he's not on stage because he's at this moment on another stage talking about some other little project that he's been working on, something about bringing Shiny to Python. So it will be down to me to do the big reveal on another little project that all three of us have been working on collaboratively, just a little something to advance the state of async for R and Shiny. And after I'm done with my part of the presentation, Will is going to highlight some of the important scientific work that all of this is supporting.

What is async?

So going back to the title of the presentation, what do we actually mean when we say next-generation async? Or what do we mean by async in the first place? Well, let me give you a very simple definition, which is if we say that parallelism is the ability to do multiple things at once, then async is just not waiting around while that's happening.

Now, this definition, this concept isn't very controversial in a lot of other programming languages. So even if you've never programmed in Rust or Go or JavaScript, you might have heard that async generally works very well in those contexts. Most modern websites have some JavaScript in them. So if you open a web page, perhaps on your mobile, and you click a button, you generally expect it to just work. When you're hitting that button, you're not likely to be wondering whether that thing is going to hang on you.

And that conveniently brings us to where we are now in R. So if I bring back this definition of async, so we have this in mind, then I suggest that perhaps most of us are missing what I term a first class async experience. And why do I say that? Well, I think for most of us, our typical experience with parallelism is with the parallel package, which has been part of base R for over 20 years now. And that is simply parallel and not async. So you send tasks to a lot of parallel workers, but you're waiting until they all happen, and then you collect the results at the end.

Of course, there are many other excellent packages, such as Polar, which is limited to local parallelism because it relies on writing and reading files from disk. But again, that's not truly async: if you create what are called persistent sessions, then while those sessions are all busy doing jobs, you cannot send another job to them. And that's actually the same with the future package, which also promises async. But again, if you simply have more tasks than the number of workers, this will actually block your session.

Introducing mirai

So what would it actually take to bring first-class async to R? Well, Nanomsg Next Generation, or NNG, implements async in C. And this is async on a par with what I've mentioned before in Go and Rust, etc. It's a very lightweight C library, and it implements a brokerless model. By that, I just mean there's no need for a central server anywhere. And nanonext, the package, brings NNG to R.

So coming now to mirai, the star of the show, well, at least my part of the show. And in case you're still wondering, mirai is simply Japanese for future. And it's an async evaluation framework for R. Again, if I just bring back the definition of async so we have this in mind, then mirai uses nanonext to deliver true first-class async. And what I mean by that is you can connect thousands of parallel processes and launch millions of tasks all at once. So many more tasks than processes. And because mirai uses nanonext, which is so lightweight, the response times of these processes come right down from the millisecond to the microsecond range.
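To make the workflow concrete, here is a minimal sketch of launching daemons and evaluating a task asynchronously with mirai. This is an illustrative example, not from the talk; the `slow_sum` function is a stand-in for any long-running computation.

```r
# Minimal async evaluation with mirai (assumes the mirai package is installed)
library(mirai)

daemons(4)  # launch 4 background daemon processes

# mirai() sends the expression for async evaluation and returns immediately
m <- mirai(
  slow_sum(x),
  slow_sum = function(x) { Sys.sleep(1); sum(x) },
  x = 1:10
)

unresolved(m)  # TRUE while the task is still running; never blocks

result <- m[]  # collecting with `[]` waits for and returns the value (55)

daemons(0)     # shut the daemons down when finished
```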


mirai promises for Shiny and Plumber

And for all you Shiny and Plumber developers out there, you'll have noticed my use of the word responsiveness, and that's obviously something that you care a lot about. To use async in either of those contexts, you use a promise, or what's called a promising object, which is easily converted into a promise. And a promise in this context is just the availability of a value sometime in the future.

So to date, to create these promises, you could use an actual future from the future package. But as we've seen, this will actually block your session if you have more tasks than workers. Or you can use what's called a future promise, which is implemented in the promises package, but this has always been experimental. Even more importantly, both of these solutions require constant polling to check whether the future value is available. Every 0.1 seconds, it's checking: has this future resolved? Has it resolved? Has it resolved?

And you can imagine if your Shiny app has quite a few of these promises around, then it might look something like this. And you can sort of see that there's a lot going on. When in fact, none of this is really necessary. And this is the next generation of promises. Because now mirai is a promising object. It has native support for Shiny extended tasks and Plumber. And it's completely event driven. That means there's no need for any of this polling. And because you're not waiting for the next time it polls, there's zero latency.
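The event-driven pattern the speaker describes can be sketched with Shiny's ExtendedTask backed by a mirai. This is an illustrative reconstruction, not the app shown on stage; it assumes shiny 1.8.1 or later and the mirai package, and the coin-flip computation is a simplified stand-in.

```r
# Sketch: an event-driven Shiny ExtendedTask backed by a mirai promise
library(shiny)
library(mirai)

daemons(2)  # background processes to evaluate tasks

ui <- fluidPage(
  actionButton("go", "Flip 10,000 coins"),
  textOutput("result")
)

server <- function(input, output, session) {
  task <- ExtendedTask$new(function(n) {
    # a mirai is a promising object: resolution is event-driven, no polling
    mirai(mean(sample(c("H", "T"), n, replace = TRUE) == "H"), n = n)
  })
  observeEvent(input$go, task$invoke(10000))
  output$result <- renderText(task$result())
}

shinyApp(ui, server)
```

Because the mirai resolves via an event rather than a polling loop, the app session stays unblocked while the flips run in a daemon.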

So now what we have is something that looks like this. And this is simply launching one million promises all at once. And this is actually something you can try at home. But we're not going to show you this here. Instead, we're going to show a somewhat simpler Shiny app, which does 10,000 coin flips asynchronously in parallel. And you can see on the left is the old polling approach. And here, no matter how fast you poll, the update will always be discrete and in chunks. Whereas on the right, you have the new mirai promises and you can see it updating almost continuously. So on this slide, I just have one last thing to say, which is what you're seeing, rather, is not even mirai. It's actually crew, which does use the same mirai promises underneath. But who better to talk to you about crew than Will Landau himself.

crew: extending mirai to high-performance computing

Thanks. I'm Will. I am a statistician and package developer. And I am mirai's all-time biggest fan. It solves a problem that I and a lot of other people have needed solved for years. And I'm really glad to join my friend and collaborator Charlie to talk about an extension to mirai that I wrote called crew. To begin, I work in the life sciences, and my group and I design and analyze clinical trials to make the best-informed decisions we can about whether a medical treatment works and is safe, or whether it doesn't.

And we throw all the advanced modeling we can at this, because these are some of the most important decisions that can be made with statistics. And those models take a long time to run, potentially hours for a single model. And when we design the clinical trial, it gets even more challenging, because to design a trial, we often have to simulate it. And to simulate a trial, we often need to run these models thousands of times. If you were to try this on your local laptop, you would never get it done. We need entire armies of computers to do this. Armies of computers working together on a cluster that's managed with a resource manager like Slurm or Grid Engine, or a cloud system, or a service like AWS Batch.

And these systems are really tricky to use. For the uninitiated, they're difficult even to access. And we get into challenges of overhead and cost. To give you an idea, if you request a virtual machine or a job on one of these systems, you may be waiting several minutes, maybe even hours, to actually have that resource available and running for you. And so it's tempting to keep it running, because what if you need to submit work? But if it's idling and sitting there, then on your company's cluster it might be occupying resources that other people could be using, or in the cloud, you risk racking up a huge bill. And this is where crew comes in.

crew is designed to navigate these kinds of tricky waters. So the solution to the cost overhead trade-off that crew provides is autoscaling. crew plugs into high-performance computing environments through the crew cluster package or the crew AWS Batch package or other plugins to make it easier to access these systems. And thanks to mirai, which does by far the most difficult part of all of this, when we manage those tasks, the overhead is really low.

How crew works

So how does all this work? So in crew, there's this thing called a controller, which is an interface that sits locally. And then there are workers, and each worker is an R process. And these R processes could live on different computers on the local network, and they do one or more tasks. Now, these workers need to get the instructions somehow. They need to know what to do, and they need to be able to send results back. So there's got to be communication. And this is exactly the challenge that mirai automatically solves.

So the controller calls a function in mirai that says: here are a bunch of WebSockets on the local network; all you workers out there, come find me. And then each worker dials into one of those WebSockets with another function in mirai, and each worker accepts instructions, does tasks, and sends the results back. This way, mirai doesn't care where the workers are, but it manages the tasks, and crew gets the workers up and running.
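The listen-and-dial pattern described here can be sketched with mirai's own functions. The host address and port below are placeholders, and in practice crew issues these calls for you.

```r
# Sketch of the mirai networking that crew builds on.
# Run the first snippet on the controlling machine, the second on each worker.

# Controller side: listen on a WebSocket URL and wait for workers to connect
library(mirai)
daemons(n = 2, url = "ws://10.0.0.1:5555")  # placeholder host and port

# Worker side (a separate R process, possibly on another machine):
# dial in to the controller's WebSocket and start serving tasks
library(mirai)
daemon("ws://10.0.0.1:5555")
```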

Autoscaling

And the way mirai lets crew operate gives it the flexibility to autoscale. To demonstrate, let's take this pipeline for simulation. This is a directed acyclic graph, like the one Shatish mentioned in the previous presentation. And each of these tasks, from simulated datasets to models, is kind of like a unit of demand for a worker. So if you have tasks to do, you require workers to do them. When crew sees a pipeline like this, one of the first things it does is spin up the number of workers required for the tasks that can currently be done. And so a worker starts. And if the task load increases in the pipeline, if you can fan out and accomplish parallel tasks simultaneously, then crew will increase the number of workers to meet that demand.

When you proceed along the pipeline, crew reuses the workers that may have been running previously, but are just finishing up tasks. And that way you avoid the overhead of requesting brand new resources. And when the workload subsides, demand decreases, some workers may be sleeping on the job, doing nothing. And so to avoid the extra cost, we evict them from the team.

So this fluctuation in demand can happen any number of times over the course of a pipeline, and crew will scale up and scale down automatically. And you can control that with settings when you create the controller: how many idle seconds do you wait before you terminate a worker? How many tasks can a worker run before it retires? And this gives us a continuum between perfectly transient workers, which run only one task and exit, and fully persistent workers, on the other hand, which keep running until there's no more work to do. This continuum between transient and persistent workers is something we've never had in R before. And with crew, you can hit any point in the middle.
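The settings mentioned here map onto arguments of the controller constructor. The values below are illustrative, not recommendations.

```r
# Sketch: controller settings placing workers on the transient-persistent
# continuum (assumes the crew package is installed)
library(crew)

controller <- crew_controller_local(
  workers = 4,        # upper bound on simultaneous workers (autoscaling ceiling)
  seconds_idle = 10,  # terminate a worker after 10 idle seconds
  tasks_max = Inf     # tasks_max = 1 would give perfectly transient workers
)
```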


Managing tasks and plugins

Managing tasks is, in some sense, the easier part, because all the configuration and optimization details are part of the controller. crew has standard verbs for this. To submit a task, there's push: you give it an R expression and the data it needs, and it submits the task to a worker. The pop verb gets the result of a task, if one is available. And there are functional-programming verbs like map, walk, and collect to work with multiple tasks at once.
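A short sketch of these verbs, assuming the crew package and a local controller; the expressions are illustrative placeholders.

```r
# Sketch of the standard crew task verbs (assumes the crew package)
library(crew)

controller <- crew_controller_local(workers = 2)
controller$start()

# push: submit a task as an R expression plus the data it needs
controller$push(name = "example", command = sum(x), data = list(x = 1:10))

# pop: retrieve the result of a task if one is available (NULL otherwise)
result <- controller$pop()

# map: submit and collect many tasks at once, functional-programming style
out <- controller$map(command = x^2, iterate = list(x = 1:4))

controller$terminate()  # shut down the workers when done
```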

Like I mentioned before, crew can plug into a lot of different high-performance computing systems. Slurm and Grid Engine are a couple examples, but also AWS Batch. And in addition to the settings I described earlier, it's just a matter of plugging in the configuration details for how you want to access Batch, your job definition, job queue. Networking may be a little tricky, but that's at the platform level. And happy to talk about that afterwards.

crew is also designed for you to be able to write your own plugins. If you know how to launch a worker or terminate one in your system, then you can write one of your own. And crew is designed for users to be able to do this. And there's a whole package vignette that describes it.

crew with Shiny and targets

Now, as Charlie mentioned, a lot of this enables really cool stuff to be done with Shiny. So far, I've only been talking about parallel computing, but async computing is equally possible. The controller's push method returns a mirai task object that automatically becomes a promise when it needs to. And these promises automatically invalidate Shiny reactive expressions as soon as the promise resolves, so you get really responsive apps. Every time this text changes on the screen, a promise is resolving and triggering an update. This can happen really fast, and it unblocks the current app session when it does. So whether you're running a single app session or multiple sessions of a single app, you see really high responsiveness like this.

Up to this point, I haven't even mentioned targets. And targets is really the tool I designed crew to support; it's the primary use case. The previous talk mentioned pipelines arranged in a directed acyclic graph, and that's exactly the contribution of targets. All you need to do to use crew, and all the distributed and parallel computing that comes with it, is to supply a controller. The rest, targets takes care of. And there is a whole vignette on that as well.
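The "supply a controller" step can be sketched as a `_targets.R` file. This is an illustrative reconstruction assuming the targets and crew packages; `simulate_rep` is a hypothetical user-defined simulation function.

```r
# Sketch of a _targets.R file that hands crew a controller for the pipeline
library(targets)

tar_option_set(
  # targets dispatches all parallel work through this crew controller
  controller = crew::crew_controller_local(workers = 4, seconds_idle = 10)
)

list(
  tar_target(reps, seq_len(1000)),
  # fan out: one branch per replication, run across crew workers
  tar_target(sims, simulate_rep(reps), pattern = map(reps))
)
```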

To recap, mirai is this fantastic, blindingly fast parallel package with first-class async that we have needed for a long time. And crew tries to take this to the next level by plugging it into high-performance computing systems, where people doing scientific work need mirai the most. Thanks very much.

Q&A

Well, thank you both. That's incredible. You have my mind racing on a bunch of different ways I want to apply this in my normal day-to-day. So we have time for maybe just one or two questions from the Slido. Are there any hardware requirements for mirai and crew?

Not that I can think of. It supports R 3.6 onwards. And for crew, it really depends on the plugin. So if you want to use it locally, there's a local controller. And if you use a plugin for, let's say, Slurm, then you'll need access to Slurm. But that's very situation-specific.

What thoughts do you have about coro and async/await?

So mirai supports coro. You can use it instead of a future or any other asynchronous function; it plugs well into coro. I've tested it with Lionel, the author of coro. And yeah, you can use coro to write more succinct code than if you were to chain promises. A good way to use mirai.
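The async/await style mentioned here can be sketched as follows, assuming the coro and mirai packages. Because a mirai converts to a promise, coro's await() can suspend on it directly instead of chaining then() calls.

```r
# Sketch: awaiting a mirai inside a coro async function
# instead of chaining promises (assumes coro and mirai are installed)
library(coro)
library(mirai)

daemons(1)  # one background daemon to evaluate the task

task <- async(function() {
  # the mirai is promise-convertible, so await() suspends until it resolves
  value <- await(mirai({
    Sys.sleep(1)  # stand-in for a long-running computation
    42
  }))
  print(value)
})

task()  # returns immediately; the body resumes when the mirai resolves
```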