
Lift Off! Building REST APIs that Fly (Joe Kirincic, RESTORE-Skills) | posit::conf(2025)
Speaker(s): Joe Kirincic

Abstract: Picture the scene: you've successfully deployed your ML model as a plumber API into production. Your company loves it! One team uses the API's predictions as an input to their own ML model. Another team displays the predictions in an internal Shiny app. But once adoption reaches a certain point, your API's performance starts to degrade. What can you do to help your service maintain high performance in the face of high demand? In this talk, we'll show some strategies for taking your API performance to the next level. Using two R packages, {yyjsonr} and {mirai}, we can augment our API with faster JSON processing and better responsiveness through asynchronous computing, allowing our services to do great things at scale at no additional cost.
Transcript
This transcript was generated automatically and may contain errors.
One caveat up front: this talk will be focused on R, but know that the strategies we're going to be talking about are abstract and could apply to any programming language, Python included. So what I'd like for you all to do is picture the scene. You've just launched your first machine learning model into production as a plumber API, and your team loves it. Word gets out over a few weeks, and suddenly other teams want to start consuming your API as well. Maybe they use it as part of a feature in their Shiny application. Maybe they use its predictions as a feature in another downstream predictive model. The point is that your new service, this wonderful service called Appy, has now created value across various segments of your organization. You've done the equivalent of a data science grand slam, and all is good.
That is, until it isn't. You show up to work one day, and the Slack messages start pouring in. It's like, hey man, I was trying to prototype a new feature in Shiny, and I'm not getting any responses back from your API. What's with that? Or another team may message you and say, hey, we're getting cascading failures in our ETL pipeline because we can't get your predictions for our machine learning model or something like that. So we're starting to struggle at this point. And you and your manager, you go through the logs and you see that your app is still up, it's still running. The issue is that it's struggling to handle all of this new traffic that it's receiving.
So this brings us to an important question, which is: how can we make our plumber APIs more performant? Some of you may be in this talk having never made a REST API before, not even sure what REST is. And that's awesome. We're happy to have you here. A quick recap of what REST is: it's just a URL, but instead of returning a web page, it returns some data that looks like this, which is called JSON. And I want you to just take my word for it that JSON powers a lot of the modern web today.
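The slide itself isn't reproduced in the transcript, but a JSON response from a REST API looks something like this (the field names and values here are made up for illustration, not from the talk):

```json
{
  "model": "appy",
  "prediction": 0.87,
  "inputs": { "feature_1": 1.5, "feature_2": "blue" }
}
```

Just key/value pairs and nested structures as plain text, which is what makes it easy to send between services.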
Two strategies for API performance
API performance is a huge topic that spans all sorts of ideas, and we won't be able to cover all of them in today's talk. So instead, we're going to focus on two strategies: minimizing serialization costs and maximizing responsiveness with async programming. We're covering these in particular because they will improve your REST API performance, they require relatively minimal code changes, and there are great R packages for implementing them.
Minimizing serialization costs
So when I say that, what does that mean? The idea here is that serialization cost is the time it takes to take an R object in memory and turn it into some format that you can send over the wire, like JSON. You can optimize the business logic of your API endpoints, but at the end of the day, there's going to be this serialization cost that you pay to turn the R object that will be the API response into JSON.
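As a concrete illustration (my own example, not the speaker's), here's that round trip in miniature, using jsonlite since that's plumber's default:

```r
# An R object in memory...
obj <- list(user = "ada", scores = c(1, 2, 3))

# ...serialized into a JSON string that can be sent over the wire.
# auto_unbox = TRUE turns length-one vectors into scalars.
json <- jsonlite::toJSON(obj, auto_unbox = TRUE)
as.character(json)
#> [1] "{\"user\":\"ada\",\"scores\":[1,2,3]}"
```

Every response your API returns pays this conversion cost, which is why the choice of serializer matters at scale.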
So there's an opportunity here to start to zero in on how your objects are being serialized and see if there's room to improve performance. To think about why this matters, I want you all to consider a scenario. I'm throwing a party, and I'm going to invite 100 people. 100, 200, it doesn't matter. And I have two options: I can send the invites as handwritten postcards, or I can send them through e-mail. To send all of these invites by postcard, I have to go get the postcards, sit there and handwrite 100 of them, put them in the mailbox, and then wait for them to reach their intended recipients. Versus with e-mail, I can write my message essentially once, send it to everybody on a distribution list, and, because of the Internet, delivery is essentially instantaneous.
We want our serialization cost to be closer to e-mail than to postcards. Because if we do that, we're going to be able to return our responses much faster. If we can return results faster, we're going to be able to handle more requests, things of that sort.
So that sounds like a decent deal, a good idea. How can we do that? I have here a simple example of a plumber API that uses what's called a serializer function to swap out the defaults and get a performance boost. The first step is to find a package that serializes JSON faster. Out of the box, plumber uses jsonlite, which is a robust, battle-tested package. It's great. But in this example, I'm using {yyjsonr}, which uses a very fast C library under the hood to read and write JSON in record time.
Once you've found your package of choice, you're going to write a little serializer function, and it's going to be used as a special function that you give to plumber so it knows how to turn your R objects into whatever format you're serializing to.
With that serializer function in hand, it's very simple from there. You use the @serializer tag to say that you want your new serializer on an endpoint, and the endpoint gets an immediate performance boost. There's no need to change the underlying business logic of your API endpoint; you get the lift essentially for free.
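A minimal sketch of what those two steps might look like (this is my reconstruction, not the speaker's exact slide code; the serializer name and endpoint are assumptions):

```r
library(plumber)
library(yyjsonr)

# Step 1: wrap {yyjsonr}'s fast JSON writer in a plumber serializer.
# serializer_content_type() handles setting the Content-Type header;
# we only supply the function that turns an R object into JSON text.
serializer_yyjsonr <- function() {
  plumber::serializer_content_type("application/json", function(val) {
    yyjsonr::write_json_str(val)
  })
}

# Step 2: register it under a name plumber's @serializer tag can use.
plumber::register_serializer("yyjsonr", serializer_yyjsonr)

#* A toy endpoint: same business logic as before, faster serialization.
#* @serializer yyjsonr
#* @get /hello
function() {
  list(message = "hello", n = 1:3)
}
```

The only change at the endpoint level is the `@serializer yyjsonr` annotation; the function body is untouched.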
Now, in this tiny app, you're thinking this isn't a huge deal. But imagine an API with 100 endpoints: this starts to become great from a maintenance standpoint as well, because I can just change the serializer on all 100 endpoints and get an improvement across every one of them. So we go back to our little web service, Appy, and we swap out his serializer, and he gets a nice speed boost. He gets a sick set of roller skates. Whereas he was trudging along before, now he's gliding along a little quicker.
Maximizing responsiveness with async programming
But we're not done yet. We can improve Appy even more by using the second strategy, which is maximizing responsiveness with async programming. Now, to understand this strategy, we need to take a quick sidebar to understand how R and plumber work out of the box. When you have our API here on the left and incoming requests on the right, how does plumber process them? plumber is an R package, and R is single-threaded, meaning it can process one instruction at a time. So when those three API requests come in, your plumber API is going to munch through each of them sequentially.
Now, for a lot of REST services, this synchronous execution model is perfectly fine, especially if your API has low to moderate traffic and the endpoints are relatively snappy. There are plenty of synchronous web services out there today.
But when your traffic gets higher, and your endpoints vary in execution time, this model can start to get hairy pretty quickly. To understand why, imagine that of these three requests, one is slower than two and three. Say request one takes ten seconds, and two and three take two seconds apiece. The first request takes ten seconds, but now request two doesn't just take two seconds; it takes twelve, because it has to wait for request one to complete. And what's worse, request three waits a grand total of fourteen seconds, because it has to wait for the other two requests to finish. You can imagine that beyond three requests, if you're getting hit with a thousand, you're going to start to dog-pile your R process, and it's not going to be a good time.
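That queueing arithmetic can be sketched in a couple of lines of base R (the numbers are the ones from the example above):

```r
# One single-threaded process: each request's completion time is the
# sum of every service time ahead of it in the queue, plus its own.
service_times <- c(req1 = 10, req2 = 2, req3 = 2)   # seconds
completion    <- cumsum(service_times)
completion
#> req1 req2 req3
#>   10   12   14
```

Request three's two seconds of actual work turns into fourteen seconds of perceived latency purely because of the line in front of it.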
So we need another execution model, and that's where async programming comes in. I like to think of async programming as starting computations without waiting for them to complete. But, like, intelligently. It's computational multitasking.
To make it a little more concrete, imagine two ways of going about your morning. Way one: you brew a pot of coffee, pour yourself a cup, drink it, then get your bread, put it in the toaster, and when it's done, you enjoy your toast. Versus the way I think a lot of us naturally do it: you wake up, start a pot of coffee, throw your bread in the toaster, and while each of those finishes, maybe you sip on your coffee while the toast is going. You're ultimately doing those two tasks simultaneously instead of waiting for each one to finish before starting the next.
And so what we want is an execution model that's closer to way two, and that's what async programming gets us.
So how does this work for something like Appy? Instead of having just one R process running our API, we're going to spin up a certain number of child processes. Our main R process then functions essentially as a relay: as requests come in, our API directs them to one of these workers, where the requests actually get processed.
And in doing this, we end up with this nice effect where request one still takes ten seconds, but requests two and three now both take two seconds to complete, because they're sent to dedicated workers and there's no longer a line, essentially.
Implementing async with mirai
So that sounds like a good deal. How do we go about implementing this in code? You may have heard of something called mirai by now. It's a great package for doing asynchronous programming, and it's the one we're going to use in this example. But you may be wondering, why not use the future package? future has been around for a while. It's another durable, battle-tested package that works in a variety of contexts, and it has been used for asynchronous plumber work up until today. I'm choosing mirai in this example for reasons that are beyond the scope of this talk, but I'm happy to talk to folks about it later.
So how do we go about doing this? We have this basic example, another hello-world example. First, we're going to use a function from mirai called daemons(), which in this example spins up four workers, four R processes that are going to be lurking in the background, waiting to do stuff. Then the business logic of our API endpoint gets bundled up and passed into another function called mirai(). What mirai() does is take the R expression you've passed it, the business logic, and run that code in a child R process.
And the important thing is that when it runs, mirai() immediately returns something called a promise. The way to think about a promise is that it's a placeholder for the eventual result of a computation. I'll hand-wave a bit here, but this notion of a promise is what allows our API process to remain responsive to other incoming requests.
Because once we get our promise back, the main R thread is free to do other things, like intercept another inbound request. And the nice thing about mirai is that once the promise is ready, when there's a result able to be fetched, it informs the main R process directly, and R just sends that result back to the client that requested it.
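Putting those pieces together, a minimal sketch might look like this (the endpoint name and timing are my assumptions, not the speaker's actual slide code; it relies on plumber understanding promise-like return values from endpoints):

```r
library(plumber)
library(mirai)

# Spin up four background R processes ("daemons") that wait for work.
# This runs once, when the API starts.
mirai::daemons(4)

#* A deliberately slow endpoint. The expensive work is bundled into
#* mirai(), which runs it in a child process and immediately returns
#* a promise, so the main R process stays free to take new requests.
#* @get /predict
function() {
  mirai::mirai({
    Sys.sleep(10)                 # stand-in for a slow model call
    list(prediction = 0.87)       # illustrative value, not real output
  })
}
```

While that ten-second request is off in a worker, the other three daemons and the main process are still available, which is exactly the "no longer a line" effect described above.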
So we go back to Appy, and we instrument some of his code with async programming, and he gets another speed boost. He gets this sick jet pack. So now he's not just rolling along the street; he's flying with, like, a Mach 8 engine or something real fast. And the end of Appy's story is that these two strategies were sufficient for the web service to scale to meet the demands that the other teams were bringing to it, and all is right with the world again.
Recap and takeaways
Now, with that, what have we learned from Appy's story? Just to recap, we have the two strategies we talked about: one, minimize serialization costs, really trying to drive down the time it takes to turn our R objects into something that can be sent over the wire, and two, keep our main application responsive by using asynchronous endpoints, or async programming in general.
And then there's a third point that I think underlies both of these, which is that R is much more powerful than you think it is. We have a lot of sophisticated tooling today that allows us to make very powerful Shiny apps or, in this case, plumber APIs. A lot of the time, folks on Hacker News will tell you that R is great for prototyping but not production. They'll tell you to take your API and rewrite it in Rust or something like that. Myself and other people at this conference are here to tell you that you don't have to do that. A lot of solid API performance is about solid design, not programming languages. So I encourage you all to take some of the strategies I've shown you today, use them in your own projects, and see how they can take your own web services to new heights.
Q&A
So what was the learning curve to mirai for someone who hasn't used async programming before? And is future simpler for a newbie to learn the concept of async? I do think that there is a learning curve with async. You do have to kind of wrestle with it for a little bit. I think that you can go either way between mirai and future. Both of them have just great APIs for, like, working with them. They're very user-friendly. So you can't go wrong with either one.
Okay. And is there a lot of overhead cost in using mirai, startup, et cetera? Yeah, that's a great question. The overhead can come from the fact that, now that we have these other processes, you have to send the data, your requests, over to those processes, and there's some cost to funneling data between processes on the server. In my experience, running some local testing before the conference, I actually noticed that with the more recent versions of mirai, the overhead cost is next to nothing, which is really cool. So I wouldn't worry too much about it.
Amazing. We have one more question here. Does mirai run on a single thread? If so, does this affect the user experience and UI responsiveness? So the important thing is that with R, everything is always single-threaded. The idea is that instead of doing multi-threading, we're doing what's called multi-process. If you spin up this API with several workers and you open Activity Monitor or Windows Task Manager or whatever your process monitor is, you'll see that, along with your API process, there are other R sessions running in the background. So it really shouldn't impact the user experience in the way that multi-threaded code could, but...
Great. And I think we have one more question here. Why are faster serializers not the default for plumber? I feel like I might as well just kick that over to Thomas at this point. He can answer that better than me. The only thing I'll say is that choosing default dependencies for packages is a really complex thing, because certain packages may be really fast, but they may have only one maintainer who doesn't really work on the project anymore. So you've got to choose your dependencies wisely there. Great. Thank you so much.

