
Charlie Gao: Advances in the Shiny Ecosystem
Charlie Gao, Senior Software Engineer on Posit’s open source team, reviews some of the latest high-performance async tooling developed by Posit to support R Shiny in terms of performance, scalability, and user experience.
Transcript
This transcript was generated automatically and may contain errors.
Thank you very much. So I'm Charlie Gao. I'm a senior software engineer on the Open Source Team at Posit.
Before I start my talk, I'm going to start by actually answering the question from the last talk, in that, you know, do we need to actually know Colin personally to be able to get a talk at this conference? And the answer is, well, yes, you do, to get away with a title like that, Advances in the Shiny Ecosystem. And nobody knows exactly what I'm going to be talking about, not even Colin.
And so if you don't know me yet, that's perfectly fine, because I am the newest member of the Tidyverse. I joined Posit officially earlier this year. So if this is the first time that you're listening to one of my talks, that's perfectly natural.
So my place is in the Tidyverse, but I also work extensively with the Shiny team itself, so much so, in fact, that I am actually an honorary member of the Shiny team.
So you're here at Shiny in Production, so I assume that you probably actually use Shiny in some form in your day-to-day, and so I just want to spend a moment on this photograph, which was taken just a few weeks ago at posit::conf, and there's me with the whole of the Shiny team. I just want you to see, you know, the people who actually write the software.
So on the other side from where I'm standing is Carson Sievert. He is actually taking over as the official CRAN maintainer for the Shiny R package. Next to me is Barrett Schloerke. He's one of the core Shiny devs. He does a lot of the hard stuff, so if you've ever worked with shinytest2, that's made by him. And next to Barrett, of course, is Joe Cheng himself, the creator of Shiny, and next to him, Winston Chang. So the two of them have really driven development of Shiny over the years.
Now there's one person that's missing from this picture, and that's Garrick Aden-Buie. In case you've been wondering, he is still a member of the Shiny team. He just didn't make it for this photo.
Overview of the talk
So this is what I'm actually going to be talking about today, and it's going to be a talk of two halves. First, I'm going to talk about async, because for those who do know me, this is probably what you're expecting me to talk about, in that this is sort of my area of expertise. But in the second half, I'm going to talk about OpenTelemetry, and this is all about observability at scale, and I'll talk about what that means when I get there.
And this, in fact, is what the Shiny team actually want me to talk about. But no, jokes aside, I'm excited to be talking about both, and the reason I'm talking about both is because these are initiatives where we've brought this concept, or we've brought advances in these concepts, and rolled them out across the ecosystem, so to all the packages that we at Posit maintain, and we maintain quite a lot of packages. So not just Shiny, but across the ecosystem.
Async and Mirai
So first, in terms of async, I'm mainly going to talk about async in the context of Shiny itself. On the left here is the documentation for using promises in Shiny. Some of you will have come across that, because it's been up for seven or eight years. It's written by Joe Cheng himself. What we've done earlier this year is update it to assume that when you're launching these async tasks, you're using Mirai to do so, which is right at the bottom. And then on the right, what is Mirai?
Mirai is an r-lib package. It's the package that I created to essentially bring modern async to R. And "mirai" means future in Japanese, by the way, in case you're wondering. So it's a Japanese flavor of future; it's a Japanese future.
So what do I mean by modern async? Well, I like to use this analogy, email, because everyone should get this, everyone uses email, and basically before Mirai, what we had was what I call a scheduled fetch, but I mean it doesn't really have a name, but it's when you have a desktop client and it just checks for email every 15 minutes in the background, something like that. And we've all had this, you know, we've requested an OTP, we're waiting for the code, so we're just constantly clicking refresh, it's like, you know, why haven't I got my code yet?
And that's sort of what we were doing before we had Mirai, and you can imagine when we actually get an email, we get, the client says there is new email, then it actually goes to the server and fetches that back. So if you have an email and it actually contains large attachments, you could actually be waiting some time for that attachment to be downloaded before you can open up your email. So that's essentially the type of async, the experience you had before you had Mirai.
What Mirai brings you is basically push notification. So again, this is the newer mobile experience. We all know what happens, we get a notification on our phone, as soon as there's email on the server, notification pops up, and the difference here as well is when we get that notification, the email is already sitting on your phone. So when you click into notification, the email will open up immediately.
So this is just an analogy, but this actually corresponds very well to what's actually happening under the hood when we talk about async in R and in Shiny. So this is pretty much what actually happens with Mirai versus not. So this is what I mean by modern async.
So if you use Mirai within an ExtendedTask in Shiny, you get all of that, but you also get all the other advantages of Mirai for free. And I'm just going to spend a couple of minutes talking about some of the other advantages. If this goes over your head, just think, oh, wow, this sounds really advanced.
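That combination — a Shiny ExtendedTask backed by a mirai — can be sketched like this (a minimal sketch; the input names and the two-second sleep are illustrative, not from the talk):

```r
library(shiny)
library(bslib)
library(mirai)

daemons(2)  # start two background processes to evaluate the async tasks

ui <- page_fluid(
  numericInput("n", "Number", 1),
  input_task_button("go", "Compute"),  # disabled automatically while running
  verbatimTextOutput("result")
)

server <- function(input, output, session) {
  # The slow work runs in a mirai on a daemon, so this Shiny
  # process is never blocked and stays responsive to other users.
  task <- ExtendedTask$new(function(n) {
    mirai({ Sys.sleep(2); n * 2 }, n = n)
  }) |> bind_task_button("go")

  observeEvent(input$go, task$invoke(input$n))
  output$result <- renderPrint(task$result())
}

shinyApp(ui, server)
```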
But Mirai was designed thoughtfully, I like to think, on these four pillars, in that it has a modern foundation, which gives it the performance that it does, and it was designed for production. This is important; we'll get back to that. So you have the confidence to deploy it everywhere.
So in terms of modern foundation, it's built on NNG, which stands for Nanomsg Next Generation. This is a high-performance messaging library. What this means is that we get the most optimal types of connections out of the box. So this is inter-process communication, TCP, or even secure TLS, where we need that. We've also extended base R's serialization mechanism to better support custom serialization of newer cross-language data formats, such as Apache Arrow, or if you're working with torch tensors.
Because we have this foundation, Mirai can scale to millions of tasks over thousands of connected processes, and it can do this all at 1,000 times the efficiency of anything that was available prior to Mirai.
The zero latency promises are what I've just talked about, and this is probably the most important point. Mirai was designed for production, and because it was designed for production, it's designed to be 100% reliable. So it has this clear evaluation model, which matches, again, it matches what's actually happening under the hood. So that means the code that you write with Mirai, you can expect it to be executed consistently and transparently and reliably. And we've minimized the complexity in the package itself, and we don't have any hidden state.
And finally, you can really deploy this everywhere, so you can use Mirai to parallelize on your local machine, on remote machines where you have SSH, so this is any compute on your local network or any cloud instance that you spin up. And of course, if you have access to a high-performance compute cluster, you can use your scheduler of choice. And Mirai has this concept of compute profiles, so these work in a modular way, so you can be connected to all three types of resources at the same time, and you can do things like send different portions of compute to different destinations.
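As a sketch of those deployment options (the SSH addresses, the profile name, and `slow_simulation()` are all hypothetical; the functions are mirai's documented API):

```r
library(mirai)

# Local: six background processes on this machine (the default profile)
daemons(6)

# Remote over SSH, kept separate under its own named compute profile
# (the addresses here are hypothetical)
daemons(
  n = 2,
  url = host_url(),
  remote = ssh_config(c("ssh://10.75.32.90", "ssh://10.75.32.91")),
  .compute = "remote"
)

# Route a task to a chosen profile; omit .compute to use the default
m <- mirai(slow_simulation(), .compute = "remote")
m[]  # collect the result when it resolves
```

Because the profiles are modular, both sets of daemons can stay connected at once, and each `mirai()` call chooses its destination.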
So that's all I'm going to be talking about on Mirai itself. Mirai is an r-lib package. It is now the primary async backend for Shiny. It is the built-in async evaluator in plumber2. It powers parallel purrr, and it's used in other parts of the tidyverse as well, such as in an upcoming release of ragnar, and it's also used in tidymodels for things like hyperparameter tuning.
OpenTelemetry and observability
So moving on to my second topic, this is OpenTelemetry. This is something that's probably going to be new to most people in the audience. And OpenTelemetry is all about observability at scale. This is especially important for something like a Shiny app, because a Shiny app can be quite complicated. I mean, you can be doing a lot in a Shiny app. So you could be ingesting data from a database. You could be making a call to an API using httr2, and you could be doing computation using Mirai on another machine, for example. And to be able to see what's happening through all the layers of packages can be a challenge. And this is the problem that OpenTelemetry is designed to solve.
So OpenTelemetry has this concept of traces. A trace is just what happens in response to an action. So in the context of a Shiny app, this could be someone clicking a button, which sets off a reactive update. When that happens, everything that's the result of that action you can see in spans, which record what happens in the various packages as a result of the action. And I won't spend too long explaining this, because I have a live demo where you can see this in action.
But first of all, why might you want to use data like this? Firstly, if you want to improve the performance of your Shiny app, you can easily see how long these spans are, and you can look at minimizing the span length. And also, if they're very heavily nested, you might look at reducing the amount of nesting you have.
Secondly, you can see errors immediately. And these are errors that actually happen in actual use, not just theoretical errors where you're just testing. If there's an error, it will show up in your spans. Third point, all this data will be centralized. So even if you're doing things in other processes or even on other machines, you can receive this data in one place, in one dashboard that's easy for you to look at.
And this final point is probably the best point. You can leave this on in production. So this isn't just when you're profiling your apps in development. For an actual deployed Shiny app, you can leave this on. And what this gives you is real-time monitoring. So for example, if your Shiny app makes an API call and that API goes down, you'd be able to see that in real-time, and you can actually choose to get alerted to that. So if you wanted to mitigate that, that's possibly something that you can do.
Right, so you might be wondering, how then do I enable this? How can I make use of this? Well, the good news is you don't need to do anything. You don't need to change any of your code. We at Posit have done all the hard work. We've instrumented all the key packages where we think this will be useful to you.
So OpenTelemetry is already integrated in Mirai. Versions of Shiny, httr2, and ellmer are going to be released imminently, not yet, but imminently, with this enabled. After that, we're going to be instrumenting plumber2 and other packages where we think this will be useful. For you as a user or developer of a package or a Shiny app, all you need to do is install the two packages, otel and otelsdk, and then you set some environment variables. And these just tell the packages where to send the data. So these are the actual environment variables I'm using for the demo I'm going to show you shortly. This sends the data to an online service called Logfire. But for you, you don't have to send any data over the internet. You can have a collector that's on your local network or even on the same machine.
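As an illustration of that setup, using the standard OpenTelemetry environment variable names (the endpoint and token values below are placeholders, not the ones from the talk, and the exact variables your collector expects may differ — check the otel and otelsdk documentation):

```r
install.packages(c("otel", "otelsdk"))

# Typically these go in .Renviron rather than in code.
# All values below are placeholders for your own collector.
Sys.setenv(
  OTEL_TRACES_EXPORTER        = "http",
  OTEL_EXPORTER_OTLP_ENDPOINT = "https://collector.example.com:4318",
  OTEL_EXPORTER_OTLP_HEADERS  = "Authorization=Bearer <your-token>",
  OTEL_SERVICE_NAME           = "my-shiny-app"
)
```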
Live demo
So time for the demo. I'm going to do a demo that uses shinychat, ellmer, Mirai, and httr2, basically all of the packages that we've instrumented for OpenTelemetry.
So I'm going to exit out of here. This, by the way, is the interface for Logfire. You can see it's very empty at the moment. I am just going to, in RStudio, run this app. And this is just a chat app. We can see that, oh, some things have popped up already. But I'm just going to click this button, which asks a chatbot, what is the weather in Atlanta? It's Atlanta because this was used for posit::conf a couple of weeks ago. But we can see, okay, we get the answer we expected.
If we move back to this, and I'm going to zoom in so you can see what's happening here. So we can see what happened in response to that. First of all, we have these two actions from Mirai. These are just daemons, so these are the background processes starting up. These handle the async. And we can see that here, the Shiny session has actually started. There's some reactive updates that happen straight away. But this reactive update is when I actually click the button. So let's just see what happens here.
So there's some reactivity going on. And then here, you can see this belongs to ellmer. So we see that an agent was invoked. The agent uses a Claude Sonnet model. That made an httr2 request. And there's another httr2 request being made here.
So this is the exciting part. So the Claude model came back and requested a tool call. And it asked to use this GetWeather tool, which actually uses Mirai to execute it asynchronously. So we can see this goes to Mirai. This is actually now being evaluated on a daemon. So this is another process on the same machine. But in reality, this can easily be on another machine altogether. And you can see that... This then makes calls to the API to actually get the weather. And then we get some reactive updates, actually updating the UI.
So if I go back to this quickly and just say... what is the weather in Newcastle, U.K.? Oh, it failed. And I expected it to fail. So if we look at these traces, we can see immediately there's red popping up. And if we click through, we can actually see that this call actually failed. There's a 404 error. Basically the API only works for locations in the continental U.S., so we expect it to fail. But you can easily see exactly the type of information that you can get.
Performance workflow
So you can use this in a lot of ways, as I sort of briefly went through before. But one possible way you can use this is to improve the performance of your Shiny app. So this performance workflow was given by Joe Cheng back in 2019, and it was updated earlier this year by my colleague Barrett.
And essentially what we're seeing now with all the new tools that we have since 2019 is, firstly, you can enable OpenTelemetry, so you can see how long these spans are. And for the long spans, you can then profile using profvis or another profiler to see where your code is slow. And to optimize slow code, I mean, this has always been true, you can try and move as much work out of Shiny as possible. So pre-compute a lot of stuff before you even start a Shiny server, and this will quite often work.
You can also try to make code faster. And again, very often this actually works. And we have a lot more tools now versus in 2019. So instead of reading a CSV file, you can now use something like duckplyr to read a Parquet file. So you can easily make your code faster. You can use caching, and this works sometimes. This is a feature that's built into Shiny itself. And you can use non-blocking reactivity. So this is async with something like Mirai. And again, this will work sometimes, but this is going to work in many more cases now, because Mirai, if you remember, has 1,000 times the efficiency that you had access to before. So you can use this in a lot more contexts.
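The built-in caching mentioned above is Shiny's `bindCache()`; a minimal sketch, where `expensive_summary()` stands in for some hypothetical slow computation:

```r
library(shiny)

server <- function(input, output, session) {
  # The render expression re-runs only for region values not already
  # in the cache; repeated inputs are served straight from the cache.
  output$summary <- renderTable({
    expensive_summary(input$region)  # hypothetical slow computation
  }) |> bindCache(input$region)
}
```

The cache key is whatever you pass to `bindCache()`, so it should include every reactive input the expression depends on.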
So there, I've nicely tied the second part of the talk back into the first part of my talk. So I think this is also a good place to leave this presentation. And I just want to say, it's been great to work with my colleagues on the Shiny team to roll these advances out, not just to Shiny, but across the ecosystem. And I hope to either be here next year, or I'm sure another member of the Shiny team will be here, and they'll be talking about all the exciting new things that they've implemented over the year. So thank you very much. I'm happy to take questions.

