
Charlie Gao: Advances in the Shiny Ecosystem
Charlie Gao, Senior Software Engineer on Posit’s open source team, reviews some of the latest high-performance async tooling developed by Posit to support R Shiny in terms of performance, scalability, and user experience.
Transcript
This transcript was generated automatically and may contain errors.
Thank you very much. So I'm Charlie Gao. I'm a senior software engineer on the Open Source Team at Posit.
Before I start my talk, I'm going to start by actually answering the question from the last talk, in that, you know, do we need to actually know Colin personally to be able to get a talk at this conference? And the answer is, well, yes, you do, to get away with a title like that, Advances in the Shiny Ecosystem. And nobody knows exactly what I'm going to be talking about, not even Colin.
And so if you don't know me yet, that's perfectly fine, because I am the newest member of the Tidyverse. I joined Posit officially earlier this year. So if this is the first time that you're listening to one of my talks, that's perfectly natural.
So my place is in the Tidyverse, but I also work extensively with the Shiny team itself, so much so, in fact, that I am actually an honorary member of the Shiny team.
So you're here at Shiny in Production, so I assume that you probably actually use Shiny in some form in your day-to-day, and so I just want to spend a moment on this photograph, which was taken just a few weeks ago at posit::conf, and there's me with the whole of the Shiny team. I just want you to see, you know, the people who actually write the software.
So on the other side from where I'm standing is Carson Sievert. He is actually taking over as the official CRAN maintainer for the Shiny R package. Next to me is Barrett Schloerke. He's one of the core Shiny devs. He does a lot of the hard stuff, so if you've ever worked with shinytest2, that's made by him. And next to Barrett, of course, is Joe Cheng himself, the creator of Shiny, and next to him, Winston Chang. So the two of them have really driven development of Shiny over the years.
Now there's one person that's missing from this picture, and that's Garrick Aden-Buie. In case you've been wondering, he is still a member of the Shiny team. He just didn't make it for this photo.
Overview of the talk
So this is what I'm actually going to be talking about today, and it's going to be a talk of two halves. First, I'm going to talk about async, because for those who do know me, this is probably what you're expecting me to talk about, in that this is sort of my area of expertise. But in the second half, I'm going to talk about OpenTelemetry, and this is all about observability at scale, and I'll talk about what that means when I get there.
And this, in fact, is what the Shiny team actually want me to talk about. But no, jokes aside, I'm excited to be talking about both, and the reason I'm talking about both is because these are initiatives where we've brought this concept, or we've brought advances in these concepts, and rolled them out across the ecosystem, so to all the packages that we at Posit maintain, and we maintain quite a lot of packages. So not just Shiny, but across the ecosystem.
Async and Mirai
So first, in terms of async, I'm mainly going to talk about async in the context of Shiny itself. On the left here is the documentation for using promises in Shiny. Some of you will have come across that, because it's been up for seven or eight years. It's written by Joe Cheng himself. What we've done earlier this year is update it to assume that when you're launching these async tasks, you're using Mirai to do so, which is right at the bottom. And then on the right, what is Mirai?
Mirai is an r-lib package. It's the package that I created to essentially bring modern async to R. And "mirai" means future in Japanese, by the way, in case you're wondering. So it's a Japanese flavor of future; it's a Japanese future.
So what do I mean by modern async? Well, I like to use this analogy, email, because everyone should get this, everyone uses email, and basically before Mirai, what we had was what I call a scheduled fetch, but I mean it doesn't really have a name, but it's when you have a desktop client and it just checks for email every 15 minutes in the background, something like that. And we've all had this, you know, we've requested an OTP, we're waiting for the code, so we're just constantly clicking refresh, it's like, you know, why haven't I got my code yet?
And that's sort of what we were doing before we had Mirai, and you can imagine when we actually get an email, we get, the client says there is new email, then it actually goes to the server and fetches that back. So if you have an email and it actually contains large attachments, you could actually be waiting some time for that attachment to be downloaded before you can open up your email. So that's essentially the type of async, the experience you had before you had Mirai.
What Mirai brings you is basically push notification. So again, this is the newer mobile experience. We all know what happens, we get a notification on our phone, as soon as there's email on the server, notification pops up, and the difference here as well is when we get that notification, the email is already sitting on your phone. So when you click into notification, the email will open up immediately.
So this is just an analogy, but this actually corresponds very well to what's actually happening under the hood when we talk about async in R and in Shiny. So this is pretty much what actually happens with Mirai versus not. So this is what I mean by modern async.
So if you use Mirai within an ExtendedTask in Shiny, you get all of that, but you also get all the other advantages of Mirai for free. And I'm just going to spend a couple of minutes talking about some of the other advantages. If this goes over your head, just think, oh, wow, this sounds really advanced.
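That combination — a Shiny ExtendedTask backed by a mirai — can be sketched like this (a minimal sketch; the input names and the two-second sleep are illustrative, not from the talk):

```r
library(shiny)
library(bslib)
library(mirai)

daemons(2)  # start two background processes to evaluate the async tasks

ui <- page_fluid(
  numericInput("n", "Number", 1),
  input_task_button("go", "Compute"),  # disabled automatically while running
  verbatimTextOutput("result")
)

server <- function(input, output, session) {
  # The slow work runs in a mirai on a daemon, so this Shiny
  # process is never blocked and stays responsive to other users.
  task <- ExtendedTask$new(function(n) {
    mirai({ Sys.sleep(2); n * 2 }, n = n)
  }) |> bind_task_button("go")

  observeEvent(input$go, task$invoke(input$n))
  output$result <- renderPrint(task$result())
}

shinyApp(ui, server)
```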
But Mirai was designed thoughtfully, I like to think, on these four pillars, in that it has a modern foundation, which gives it the performance that it does, and it was designed for production. This is important; we'll get back to that. So you have the confidence to deploy it everywhere.
So in terms of modern foundation, it's built on NNG, which stands for Nanomsg Next Generation. This is a high-performance messaging library. What this means is that we get the most optimal types of connections out of the box. So this is inter-process communication, TCP, or even secure TLS, where we need that. We've also extended base R's serialization mechanism to better support custom serialization of newer cross-language data formats, such as Apache Arrow, or if you're working with torch tensors.
Because we have this foundation, Mirai can scale to millions of tasks over thousands of connected processes, and it can do this all at 1,000 times the efficiency of anything that was available prior to Mirai.
The zero latency promises are what I've just talked about, and this is probably the most important point. Mirai was designed for production, and because it was designed for production, it's designed to be 100% reliable. So it has this clear evaluation model, which matches, again, it matches what's actually happening under the hood. So that means the code that you write with Mirai, you can expect it to be executed consistently and transparently and reliably. And we've minimized the complexity in the package itself, and we don't have any hidden state.
And finally, you can really deploy this everywhere, so you can use Mirai to parallelize on your local machine, on remote machines where you have SSH, so this is any compute on your local network or any cloud instance that you spin up. And of course, if you have access to a high-performance compute cluster, you can use your scheduler of choice. And Mirai has this concept of compute profiles, so these work in a modular way, so you can be connected to all three types of resources at the same time, and you can do things like send different portions of compute to different destinations.
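As a sketch of those deployment options (the SSH addresses, the profile name, and `slow_simulation()` are all hypothetical; the functions are mirai's documented API):

```r
library(mirai)

# Local: six background processes on this machine (the default profile)
daemons(6)

# Remote over SSH, kept separate under its own named compute profile
# (the addresses here are hypothetical)
daemons(
  n = 2,
  url = host_url(),
  remote = ssh_config(c("ssh://10.75.32.90", "ssh://10.75.32.91")),
  .compute = "remote"
)

# Route a task to a chosen profile; omit .compute to use the default
m <- mirai(slow_simulation(), .compute = "remote")
m[]  # collect the result when it resolves
```

Because the profiles are modular, both sets of daemons can stay connected at once, and each `mirai()` call chooses its destination.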
So that's all I'm going to be talking about on Mirai itself. Mirai is an r-lib package. It is now the primary async backend for Shiny. It is the built-in async evaluator in plumber2. It powers parallel purrr, and it's used in other parts of the tidyverse as well, such as in an upcoming release of ragnar, and it's also used in tidymodels for things like hyperparameter tuning.
OpenTelemetry and observability
So moving on to my second topic, this is OpenTelemetry. This is something that's probably going to be new to most people in the audience. And OpenTelemetry is all about observability at scale. This is especially important for something like a Shiny app, because a Shiny app can be quite complicated. I mean, you can be doing a lot in a Shiny app. So you could be ingesting data from a database. You could be making a call to an API using httr2, and you could be doing computation using Mirai on another machine, for example. And to be able to see what's happening through all the layers of packages can be a challenge. And this is the problem that OpenTelemetry is designed to solve.
So OpenTelemetry has this concept of traces. A trace is just what happens in response to an action. So in the context of a Shiny app, this could be someone clicking a button, which sets off a reactive update. When that happens, everything that's the result of that action you can see in spans, which record what happens in the various packages as a result of the action. And I won't spend too long explaining this, because I have a live demo where you can see this in action.
But first of all, why might you want to use data like this? Firstly, if you want to improve the performance of your Shiny app, you can easily see how long these spans are, and you can look at minimizing the span length. And also, if they're very heavily nested, you might look at reducing the amount of nesting you have.
Secondly, you can see errors immediately. And these are errors that actually happen in actual use, not just theoretical errors where you're just testing. If there's an error, it will show up in your spans. Third point, all this data will be centralized. So even if you're doing things in other processes or even on other machines, you can receive this data in one place, in one dashboard that's easy for you to look at.
And this final point is probably the best point. You can leave this on in production. So this isn't just when you're profiling your apps in development. For an actual deployed Shiny app, you can leave this on. And what this gives you is real-time monitoring. So for example, if your Shiny app makes an API call and that API goes down, you'd be able to see that in real-time, and you can actually choose to get alerted to that. So if you wanted to mitigate that, that's possibly something that you can do.
Right, so you might be wondering, how then do I enable this? How can I make use of this? Well, the good news is you don't need to do anything. You don't need to change any of your code. We at Posit have done all the hard work. We've instrumented all the key packages where we think this will be useful to you.
So OpenTelemetry is already integrated in Mirai. Versions of Shiny, httr2, and ellmer are going to be released imminently, not yet, but imminently, with this enabled. After that, we're going to be instrumenting plumber2 and other packages where we think this will be useful. For you as a user or developer of a package or a Shiny app, all you need to do is install the two packages, otel and otelsdk, and then you set some environment variables. And these just tell the packages where to send the data. So these are the actual environment variables I'm using for the demo I'm going to show you shortly. This sends the data to an online service called Logfire. But for you, you don't have to send any data over the internet. You can have a collector that's on your local network or even on the same machine.
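As an illustration of that setup, using the standard OpenTelemetry environment variable names (the endpoint and token values below are placeholders, not the ones from the talk, and the exact variables your collector expects may differ — check the otel and otelsdk documentation):

```r
install.packages(c("otel", "otelsdk"))

# Typically these go in .Renviron rather than in code.
# All values below are placeholders for your own collector.
Sys.setenv(
  OTEL_TRACES_EXPORTER        = "http",
  OTEL_EXPORTER_OTLP_ENDPOINT = "https://collector.example.com:4318",
  OTEL_EXPORTER_OTLP_HEADERS  = "Authorization=Bearer <your-token>",
  OTEL_SERVICE_NAME           = "my-shiny-app"
)
```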
Live demo
So time for the demo. I'm going to do a demo that uses shinychat, ellmer, Mirai, and httr2, basically all of the packages that we've instrumented for OpenTelemetry.
So I'm going to exit out of here. This, by the way, is the interface for Logfire. You can see it's very empty at the moment. I am just going to, in RStudio, run this app. And this is just a chat app. We can see that, oh, some things have popped up already. But I'm just going to click this button, which asks a chatbot, what is the weather in Atlanta? It's Atlanta because this was used for posit::conf a couple of weeks ago. But we can see, okay, we get the answer we expected.
If we move back to this, and I'm going to zoom in so you can see what's happening here. So we can see what happened in response to that. First of all, we have these two actions from Mirai. These are just daemons, so these are the background processes starting up. These handle the async. And we can see that here, the Shiny session has actually started. There's some reactive updates that happen straight away. But this reactive update is when I actually click the button. So let's just see what happens here.
So there's some reactivity going on. And then here, you can see this belongs to ellmer. So we see that an agent was invoked. The agent uses a Claude Sonnet model. That made an httr2 request. And there's another httr2 request being made here.
So this is the exciting part. So the Claude model came back and requested a tool call. And it asked to use this GetWeather tool, which actually uses Mirai to execute it asynchronously. So we can see this goes to Mirai. This is actually now being evaluated on a daemon. So this is another process on the same machine. But in reality, this can easily be on another machine altogether. And you can see that... This then makes calls to the API to actually get the weather. And then we get some reactive updates, actually updating the UI.
So if I go back to this quickly and just say... what is the weather in Newcastle, U.K.? Oh, it failed. And I expected it to fail. So if we look at these traces, we can see immediately there's red popping up. And if we click through, we can actually see that this call actually failed. There's a 404 error. Basically the API only works for locations in the continental U.S., so we expect it to fail. But you can easily see exactly the type of information that you can get.
Performance workflow
So you can use this in a lot of ways, as I sort of briefly went through before. But one possible way you can use this is to improve the performance of your Shiny app. So this performance workflow was given by Joe Cheng back in 2019, and it was updated earlier this year by my colleague Barrett.
And essentially what we're seeing now with all the new tools that we have since 2019 is, firstly, you can enable OpenTelemetry, so you can see how long these spans are. And for the long spans, you can then profile using profvis or another profiler to see where your code is slow. And to optimize slow code, I mean, this has always been true, you can try and move as much work out of Shiny as possible. So pre-compute a lot of stuff before you even start a Shiny server, and this will quite often work.
You can also try to make code faster. And again, very often this actually works. And we have a lot more tools now versus in 2019. So instead of reading a CSV file, you can now use something like duckplyr to read a Parquet file. So you can easily make your code faster. You can use caching, and this works sometimes. This is a feature that's built into Shiny itself. And you can use non-blocking reactivity. So this is async with something like Mirai. And again, this will work sometimes, but this is going to work in many more cases now, because Mirai, if you remember, has 1,000 times the efficiency that you had access to before. So you can use this in a lot more contexts.
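The built-in caching mentioned above is Shiny's `bindCache()`; a minimal sketch, where `expensive_summary()` stands in for some hypothetical slow computation:

```r
library(shiny)

server <- function(input, output, session) {
  # The render expression re-runs only for region values not already
  # in the cache; repeated inputs are served straight from the cache.
  output$summary <- renderTable({
    expensive_summary(input$region)  # hypothetical slow computation
  }) |> bindCache(input$region)
}
```

The cache key is whatever you pass to `bindCache()`, so it should include every reactive input the expression depends on.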
So there, I've nicely tied the second part of the talk back into the first part of my talk. So I think this is also a good place to leave this presentation. And I just want to say, it's been great to work with my colleagues on the Shiny team to roll these advances out, not just to Shiny, but across the ecosystem. And I hope to either be here next year, or I'm sure another member of the Shiny team will be here, and they'll be talking about all the exciting new things that they've implemented over the year. So thank you very much. I'm happy to take questions.

