
The Future of Asynchronous Programming in R - Charlie Gao
Asynchronous programming can be a powerful paradigm, whereby computations are allowed to run concurrently without blocking the main session. It is an opportune time to survey the current landscape, as R infrastructure in this respect has matured significantly over recent years. Instead of running a script sequentially from top to bottom, logic that takes a long or unpredictable amount of time to complete may be offloaded to different R processes, possibly on other computers or in the cloud. In the meantime, the main session may be running constantly and non-interactively, performing operations in real time, synchronizing with these tasks only when necessary. This style of programming requires a very specific set of tooling. At the very base, there is an infrastructure layer involving key enabling packages such as later and mirai. It will be explained at a high level why these two packages together currently offer the most complete and efficient implementation of async for the R language. There are further tools which expand async functionality to cover specific needs, such as the watcher package for filesystem monitoring. There is then a range of tools built on top of these, bringing async capabilities to the end-user, such as the httr2 package for querying APIs and the ellmer package for interacting with LLMs. In addition to these existing tools, exciting developments in asynchronous programming are just around the corner. These will be previewed, together with speculation on what might be possible at some point in the future.
Transcript
This transcript was generated automatically and may contain errors.
Thank you for joining this session on The Future of Asynchronous Programming in R. I'm Charlie Gao, an open source engineer at Posit. I work on all things asynchronous, both in the tidyverse and in Shiny.
I'm going to start the talk today by sharing a slide on email. Now, email is something that every one of you is familiar with. And in terms of receiving email, there are two principal ways of doing so. Either you poll for the results at intervals or you can receive push notifications.
Polling is where your client checks the server at intervals, every 15 minutes, for example. This used to be the norm 10 or 20 years ago, and even now it's sometimes the only option for servers that offer no interface for push notifications. However, if you're using a large provider, you will often get event-driven push notifications: as soon as email arrives on the email server, you get notified. This is obviously the more modern approach.
Now, if you use something like the Gmail app, then you'll often get push notifications for your Gmail account. But for all your other accounts, it will use the polling method and you'll get those emails with potentially a 15-minute delay.
Now, it turns out that polling and push are the two main ways to think about async in general. If you notice with the email example, we don't always have the email app open on our phones. This all happens in the background. So similarly, we can use this analogy when we're thinking about async in our programming.
Introducing mirai
So I'm going to use mirai to demonstrate the two ways that we can deal with async. mirai is a package that I wrote that runs R code in the background very efficiently without blocking the current R session. It's rather straightforward to use. You simply wrap the expression you want to run with mirai and that returns a mirai object immediately without blocking your session.
So once you have this object, how do you get the result of the background computation or how do you even know whether that computation has finished or not? So if we take the polling approach, the way to do this would be to call unresolved on the mirai object. And if that's still ongoing, then it will return true, as we have here.
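The slide itself is not reproduced in this transcript, so as a minimal sketch of what creating a mirai and polling it might look like (the expression and the two-second sleep are illustrative assumptions, not the speaker's example):

```r
library(mirai)

# Wrap the expression to run in the background; this returns immediately.
# The 2-second sleep is an illustrative stand-in for a slow computation.
m <- mirai({
  Sys.sleep(2)
  1 + 1
})

# Polling approach: ask whether the result has arrived yet.
# Expected to be TRUE while the background task is still running.
unresolved(m)
```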
Now, how do we build on that to use mirai in an asynchronous fashion? Well, we can use a loop like this. So while the mirai is unresolved, sleep for a small amount of time and then check again. And when you exit from that loop, you can actually access the data. And instead of just printing the data, you would likely pass that to another function which uses the result.
And similarly as well, instead of just sleeping for a while, you can actually do work concurrently while this mirai is being computed in a background process. So if you look at this while loop, it's very much like checking email every 15 minutes. Except here, we're checking much more often than that because the computations we typically deal with are in the seconds or minutes. So we want to have lower latency.
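A polling loop of the kind described might be sketched as follows. The task, the 0.1-second interval and the `checks` counter are assumptions made for illustration; the slide's actual code is not in the transcript.

```r
library(mirai)

m <- mirai({
  Sys.sleep(1)   # illustrative stand-in for a long-running task
  sum(1:10)
})

checks <- 0
while (unresolved(m)) {
  # Other work could be done here instead of just sleeping.
  Sys.sleep(0.1)
  checks <- checks + 1
}

# On exiting the loop the result is available; in real code it would
# typically be passed on to another function rather than printed.
print(m$data)
```

The `checks` counter is only there to make visible how many times the loop had to ask before the result arrived.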
How mirai handles task completion
Now, before we go too far with this email analogy, I want to delve a little bit deeper into what actually happens with mirai when tasks complete. Because actually a mirai is much more asynchronous. And what I mean is with email, when you check if there's new email, you're actually sending a request to the server and then that replies. If it has email, then it actually sends it back to you. That is not what a mirai does.
When you call unresolved, it is not sending anything to the background process. Instead, what happens is when the background task completes, it will send the result straightaway back to the main process. And this is handled at the C level by a background thread. A completion callback checks the error code. If that's successful, it stores a pointer to the binary data that's already been received and it stores the completion status in the mirai object itself. This is all done on the background thread in parallel to whatever's happening on the main R thread.
This means that later when you call the R function unresolved or try to access the data element of a mirai, all that does is check the completion status of the object. If that's success, then it checks if there's binary data. If there is, then it simply un-serializes this into an R object.
We can see that this is much more efficient because we're doing as much as possible ahead of time. The only reason we actually have two stages is because we can't do the un-serialization and creating of an R object on the background thread. Because R is single-threaded, that needs to be at a time when we're actually on the main thread.
Event-driven async with promises and later
Does this help us also implement event-driven async? What can we do there? Well, we can, for example, use the brackets method for a mirai object, m[], to actually wait for and collect the result. This is efficient in that there's no constant checking in a loop, but it also blocks the session. So this in itself is not a solution for event-driven async.
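As a short sketch of the blocking collection just described (the task itself is an illustrative assumption):

```r
library(mirai)

m <- mirai({
  Sys.sleep(1)   # illustrative stand-in for a slow task
  "finished"
})

# Waits for the mirai to resolve and returns its value.
# No polling loop is needed, but the session is blocked while waiting.
result <- m[]
print(result)
```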
For that, I will need to add to my async toolkit. So the packages that I will need here are promises and later. Now, you can see that there are no hex logos for either of them. So that should tell you immediately that these are serious low-level packages that do the real heavy lifting here.
So in terms of promises, that provides a high-level interface for creating these async functions. A typical use of the key function there looks like this: you wrap an object like a mirai in as.promise to turn it into a promise, and you can then access the then method and pass a function to that. What this means is: as soon as the mirai completes, call this function on its result.
later is a lower-level package, and it actually provides some of the key implementation for the async. The signature for the main function there is later, with a function and a delay in seconds. What this does is schedule the function to run after that delay.
Now, how do mirai, promises and later work together to give you event-driven async? Well, let me give you an example. First, not of event-driven async, but of polling async. So what you see here is the actual as.promise method for a mirai as it was about two years ago. Pretty much the same. Simplified a little bit for clarity, but the core logic is the same.
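A minimal sketch of that pattern, using the `then()` function exported by promises rather than the method form. The task, the `result` variable and the explicit event-loop call at the end are assumptions added so that the example runs as a plain script:

```r
library(mirai)
library(promises)

m <- mirai({
  Sys.sleep(1)   # illustrative stand-in for a slow task
  "task complete"
})

result <- NULL

# Turn the mirai into a promise and register a function to call
# on its value once it completes.
p <- then(as.promise(m), function(value) {
  result <<- value
  message("Received: ", value)
})

# In a Shiny app or an interactive session the event loop runs on its own.
# In a plain script it has to be run explicitly, as here.
while (is.null(result)) later::run_now(timeoutSecs = 0.1)
```

Nothing in the main session waits on the mirai itself; the registered function simply runs when the event loop gets the chance after the task has completed.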
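As a small illustration of scheduling with later (the half-second delay and the `ran` flag are arbitrary choices for this sketch):

```r
library(later)

ran <- FALSE

# Schedule a function to run after a delay of 0.5 seconds.
# The call returns immediately; the function has not run yet.
later(function() {
  ran <<- TRUE
  message("Ran after the delay")
}, delay = 0.5)

# In a script, run the event loop until nothing is left to execute.
while (!loop_empty()) run_now(timeoutSecs = 1)
```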
Let me go through this line by line. So this is the as.promise.mirai method. We create a promise, and we pass it a function with resolve and reject. I won't go into the details of why. But just note, inside that function, we define another function called check. And what this does is it calls unresolved. And if the mirai is unresolved, then it will use later to schedule check again in 0.1 seconds. And then if it has resolved, it just extracts the value and resolves or rejects the promise with it. And that's the definition of the check function. And we actually call check to kick things off. But essentially, you can see that this schedules itself to check every 0.1 seconds via later, whether the mirai has resolved or not.
So apart from being a polling approach, you can see that it's also not very efficient, because if we have something that takes quite a long time, then we're checking potentially hundreds, thousands of times. And this is just one promise. If we have a lot of promises in whatever we're building, then we could be spending a lot of time doing these checks.
So how do we turn this into event-driven promises? Well, we can see that this is, again, a simplified version, so it's not the entire story, but the core logic is accurate. You can see the structure is maintained pretty much the same. But just scanning this, you can see there's no check function that repeats itself. There's no 0.1 parameter anywhere in here.
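The slide is not reproduced in this transcript. The walkthrough above can be sketched roughly as follows; this is a reconstruction from the spoken description, not the historical mirai source, and the function name `as_promise_polling` is hypothetical (chosen so as not to override the real method).

```r
library(mirai)

# Hypothetical reconstruction of a polling-based conversion to a promise.
as_promise_polling <- function(m) {
  promises::promise(function(resolve, reject) {
    check <- function() {
      if (unresolved(m)) {
        # Not done yet: schedule another check in 0.1 seconds.
        later::later(check, delay = 0.1)
      } else {
        # Done: extract the value, then resolve or reject with it.
        value <- m$data
        if (is_error_value(value)) reject(value) else resolve(value)
      }
    }
    check()  # kick things off
  })
}

m <- mirai({
  Sys.sleep(0.5)   # illustrative stand-in for a slow task
  "done"
})

result <- NULL
promises::then(as_promise_polling(m), function(value) result <<- value)

# Run the event loop in a script until all scheduled checks have finished.
while (!later::loop_empty()) later::run_now(timeoutSecs = 1)
print(result)
```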
So what's changed is the core logic of your promise function: if the mirai is unresolved, then you call .keep. And .keep is a special internal function which essentially talks to the mirai completion callback. So if you remember that from a few slides ago, this is something that's happening independently of the R thread, which processes as much of the completion as possible. It basically tells that callback to additionally call into later via the C interface to resolve the promise. So later has a C interface, which means that we're continually dealing with C code. We're not calling any R code, and that lets us do this completely independently of whatever's happening in R. And what this means is that the action is scheduled as soon as the mirai completes. We don't need to wait until unresolved is called or anything like that. And you can also see that .keep is called once only. So not only do we have event driven, it is much more efficient as well because we're not potentially calling unresolved thousands of times.
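The real mechanism lives in C and relies on an internal function, so it cannot be reproduced faithfully here. As a pure-R stand-in for the idea only, the sketch below registers once with a completion callback, which then schedules the promise resolution through later. Everything in it (`new_task`, `complete`, `on_complete`, `as_promise_event`) is hypothetical and is not part of mirai.

```r
library(promises)
library(later)

# Hypothetical stand-in for a background task. `complete()` plays the role of
# the completion callback, which in mirai runs in C on a background thread.
new_task <- function() {
  task <- new.env()
  task$on_complete <- NULL
  task$complete <- function(value) {
    if (is.function(task$on_complete)) task$on_complete(value)
  }
  task
}

# Register once with the completion callback; there is no repeating check.
as_promise_event <- function(task) {
  promise(function(resolve, reject) {
    task$on_complete <- function(value) {
      later(function() resolve(value))  # scheduled the moment the task completes
    }
  })
}

task <- new_task()
result <- NULL
then(as_promise_event(task), function(value) result <<- value)

task$complete(42)                 # the "task" finishes
while (!loop_empty()) run_now()   # the event loop resolves the promise
print(result)
```

Compared with the polling version, the registration happens exactly once, and nothing runs on the event loop until the task has actually completed.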
Ecosystem upgrades
And with event driven async, we've upgraded the entire ecosystem. So these are packages that we maintain at Posit, and we're really pleased that we've upgraded everything that we work on.
Shiny at its core has always been event-driven async, so I don't mean that. But we've updated our documentation for promises and for extended tasks so that users who want to run their own async code can do so via mirai in an event-driven fashion. We've also added an async file watcher to Shiny so that when you're developing an app and you change files, that will cause the Shiny app to auto-reload.
With plumber, you've always been able to use mirai, but with plumber2, which is under development, we've implemented async at its core. So there's a nice @async decorator tag that you can apply to your functions to automatically turn those to use event-driven async with mirai.
Now, if we look at the bottom row, we have httr2, our HTTP client. We have ellmer, our LLM client, and we have the shinychat and mcptools packages for interacting with LLMs. They have all been upgraded to use event-driven async.
Going back to the top row, purrr has integrated with mirai to provide parallel map. We're working on a future addition for long-running operations, which we're looking to make async. tidymodels as well is using mirai to parallelize model runs. Again, in the future, there is potential to make some of those operations async.
And that's all I have time to share with you today. We're really glad at Posit to be working on making async more advanced and easier to use for everyone. And I'm just really happy to share with you that we now have push notifications in R. Thank you.


