Winston Chang | Asynchronous programming in R | RStudio (2020)

Transcript

This transcript was generated automatically and may contain errors.

Thanks, everybody. Okay, so my talk is titled "Asynchronous Programming in R," but I have to give a little disclaimer: it's sort of, but not really, about asynchronous programming in R. When I was developing the materials for this talk, I realized that asynchronous programming is such a huge topic that I'd have a hard time explaining it in a way I found satisfying in 20 minutes. And then I thought about all the other topics that, in my experience and in the stuff I work on, come along with asynchronous programming.

Those are parallelism, concurrency, and event-driven programming. Let me define these in case you're not familiar with them. Asynchronous programming is when you call a function and it doesn't block. Normally, when you run your R code, it steps through and does each thing in order. If you do something that takes a long time, like telling R to download a file, it stops there, and once the file is downloaded, your script continues. With an asynchronous download, you'd essentially say, "Hey, go download this file," and some other part of the computer would download it while your code keeps running. Later, you'd either check whether the file is downloaded yet, or a callback would run and tell you, "Hey, the file is done now." That's asynchronous programming.
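To make the blocking versus non-blocking distinction concrete, here is a minimal sketch. It's written in Python rather than R and does not use later: `slow_download()` is a hypothetical stand-in that only sleeps, and a thread pool plays the part of "some other thing in the computer." The blocking call makes the script wait; the asynchronous call returns immediately, and a callback reports when the work is done.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

events = []  # records the order in which things happen

def slow_download(url):
    """Hypothetical stand-in for a download: sleeps, then returns fake contents."""
    time.sleep(0.2)
    return f"contents of {url}"

# Blocking: the script stops here until the "download" finishes.
result = slow_download("https://example.com/a.csv")
events.append("blocking download finished")

# Asynchronous: hand the work off and keep running.
finished = threading.Event()

def on_done(future):
    # Callback: runs when the download completes (on the worker thread).
    events.append("callback: " + future.result())
    finished.set()

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_download, "https://example.com/b.csv")
    future.add_done_callback(on_done)
    events.append("script keeps running")  # happens before the download completes
    finished.wait()  # only so this demo doesn't exit before the callback fires

print(events)
```

Instead of a callback, the script could also poll with `future.done()` to check whether the file is ready yet.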

Parallelism is when you do multiple things at the same time, which is very common on modern computers with multiple cores. Concurrency is when it seems like you're doing multiple things at the same time, but you might not actually be. If you're familiar with JavaScript in a web browser: it's single-threaded, yet a web page can seem to be doing a lot of things at once. Really, it's splitting its time between different tasks and switching between them very quickly, so it only seems like it's doing things in parallel. And finally, there's event-driven programming, where events occur, perhaps from some outside signal, and they cause some code to run.
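To illustrate concurrency without parallelism, here is a small Python sketch (not R, and not how later or a browser works internally): two tasks share a single thread, and a toy round-robin scheduler switches between them after each step, so their work interleaves even though only one thing runs at any instant.

```python
from collections import deque

log = []

def task(name, steps):
    """A task that does one small step, then yields control to the scheduler."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # give other tasks a turn

def run_round_robin(tasks):
    """Single-threaded scheduler: switch to the next task after every step."""
    queue = deque(tasks)
    while queue:
        t = queue.popleft()
        try:
            next(t)          # run one step of this task
            queue.append(t)  # not finished yet: back of the line
        except StopIteration:
            pass             # task finished; drop it

run_round_robin([task("A", 3), task("B", 2)])
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2']
```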

I was thinking about this, and asynchronous programming alone was too big, so I thought I'd try to talk a little about all of these, or at least about a common thread that runs through all of them: a package called later. later was originally created by Joe Cheng, and I don't know if it's a coincidence, but it was a conversation with him that helped me settle on this topic. So thanks, Joe.

Introducing the later package

All right, so I'm going to show you a demo. Let's say I create a data frame called data and populate it with some x and y values, and this plot magically appears in RStudio. If I modify it by squaring all the x values, the plot redraws, and now I have a parabola. I can do the same with y, and if I restore data to what it was before, it re-plots again. So something is going on here. It's not any magic from the RStudio IDE; it involves later.

What you're seeing here is, at least from the user's perspective, event-driven programming. I'm not telling it to re-plot every time I change data, but every change to data causes the plotting code to re-execute. I actually ran that code ahead of time, before I started this talk, and I'll show it to you in a little bit.

But let's talk about what the later package does. later provides something called an event loop. An event loop is a queue of functions that will run in the future, and it's very similar to setTimeout in JavaScript, if you're familiar with that. If this is confusing, here's a very simple example of how later is used. First, you load the later package. Then I set a flag to signal when something is done. Next, I call later() with a function that prints a message and updates the flag, and tell it to run that function after five seconds. At the end, I say: while I'm not done, call run_now(). That means keep running the functions that are in this event loop, in this queue.

So let's run this. You won't be surprised to see that after five seconds, it prints out the message "Hello, world." I know some of you have already figured out how to implement this yourself in about six lines of code, so this part isn't really that difficult. What you might be a little more surprised by is that if I run the same code without the run_now() loop and wait five seconds, it still prints "Hello, world." That's because later has some C code that runs when your R console is idle: when the call stack is empty and you're not running anything else, it keeps running this event loop.


Okay. You saw the plot watcher before, and this is its code. It's pretty simple. First, I set data to NULL and the last value to NULL. Then I have a function called plot watch: if data isn't NULL and it's different from the last value, it plots data and then updates the last value. That's all really standard R code. The thing that's different is that, right here, I call later() with plot watch, so the function reschedules itself to run again after a quarter of a second. After we define the function, we have to kick it off by invoking it once. From then on, it's doing what you might call a polling loop: it runs every quarter of a second, and whenever data has changed, it executes the plotting code.

All of these use later; that's what makes it possible for them to run concurrently. And this is all in one R process.

Okay, one more thing. I took a look at the later CRAN page and at the packages that use later. Even though later has been out for a couple of years, not that many packages use it, and all of them are maintained by people who work at RStudio. We have a lot of this knowledge internally; we've got battle scars from working on this stuff. Hopefully, if you're working on async programming, parallelism, or concurrency, this will be useful for you. I have the URL here. The materials aren't actually up there right now, but they will be. Thank you.

Q&A

Thanks, Winston. That was fascinating. I can assure you that I'm using your work: I have chromote running on a schedule to take pictures of the work my colleagues are doing. For those of you who are leaving, we're carrying on with questions, so please leave quietly if you can.

Question number one: does later deal with the tail-call and stack-depth issues that could arise from the recursive event-loop pattern? It does not. That's actually a great question; I wanted to mention it, but I didn't have time. If a function calls itself through later, you have to make sure not to create any closures that will keep increasing the depth of the call stack. So create the function outside of the function that you're calling.
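For the general stack-depth concern, here is a Python toy (not later itself) that contrasts the two patterns: when a function reschedules itself through an event loop, the scheduling call only queues the next run and returns, so each run starts from the loop at the same depth; calling the function directly adds a frame every time. The names `later()`, `run_loop()`, `tick()`, and `recurse()` are hypothetical, and `stack_depth()` relies on CPython's `sys._getframe`.

```python
import sys
from collections import deque
from functools import partial

_queue = deque()      # toy event loop: callbacks run first-in, first-out

def later(func):
    """Queue func for a later turn of the loop and return immediately."""
    _queue.append(func)

def run_loop():
    while _queue:
        func = _queue.popleft()
        func()        # each callback is started by the loop, not by its caller

def stack_depth():
    """Count the frames on the caller's call stack (CPython-specific)."""
    depth, frame = 0, sys._getframe(1)
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth

looped_depths = []
recursive_depths = []

# Defined once, at top level, rather than as a new closure on every call.
def tick(remaining):
    looped_depths.append(stack_depth())
    if remaining > 0:
        later(partial(tick, remaining - 1))  # reschedule instead of recursing

def recurse(remaining):
    recursive_depths.append(stack_depth())
    if remaining > 0:
        recurse(remaining - 1)               # direct recursion: one more frame each time

later(partial(tick, 4))
run_loop()
recurse(4)
print(looped_depths)     # the same depth five times
print(recursive_depths)  # grows by one frame per call
```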

The second question: if later is running and you want to run another line of code, does the new line jump ahead of later, or does it wait those five seconds to run? It jumps ahead of later. And the callback doesn't wait an extra five seconds after whatever you've run before; it waits five seconds total. That's when the callback runs, unless R is busy at that moment, in which case it runs once R is idle again.

Thank you. And the last question, which is getting quite a few votes: is there a way to prioritize different asynchronous streams? I'm not sure I fully understand the question, so I'm sorry I can't answer that. Winston, thank you very much. This was fascinating. Thank you.
