Max Kuhn - SHINYLIVE IS SO EASY

SHINYLIVE IS SO EASY by Max Kuhn

Visit https://rstats.ai for information on upcoming conferences.

Abstract: shinylive is an extension to the Quarto open-source scientific and technical publishing system. It enables Shiny applications to run locally, without a Shiny server, using WebAssembly. I'll show examples and discuss the limitations of using shinylive.

Bio: Max Kuhn is a software engineer at Posit PBC (née RStudio). He is working on improving R's modeling capabilities and maintaining about 30 packages, including caret. He was a Senior Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut. He has been applying models in the pharmaceutical and diagnostic industries for over 18 years. Max has a Ph.D. in Biostatistics. He and Kjell Johnson wrote the book Applied Predictive Modeling, which won the Ziegel award from the American Statistical Association, which recognizes the best book reviewed in Technometrics in 2015. He has co-written several other books: Feature Engineering and Selection, Tidy Modeling with R, and Applied Machine Learning for Tabular Data (in process).

Twitter: https://twitter.com/topepos

Presented at the 2024 New York R Conference (May 17, 2024). Hosted by Lander Analytics (https://landeranalytics.com).

Jun 11, 2024
19 min

Transcript

This transcript was generated automatically and may contain errors.

All right, our next speaker is representing the year 2019, and he also spoke in '15, '16, '18, '20, '21, '22, and '23. And I always put him toward the very end to make sure people stick around, because he's awesome. All right, you thought I was gonna insult him a little bit. No, that was awesome, because he's an awesome guy. And he's been held at gunpoint twice, but he's never been as scared as when he was teaching his son to drive. Please, everyone, a really big welcome for Max. Hello. You okay?

Great. So, maybe for the first time here, I'm not talking about modeling. It's a new experience for me, hopefully for you too. I'm gonna talk about shinylive. Has anybody ever heard of that yet? Okay, good. All right.

Introducing Quarto

Excellent. So before that I want to talk about Quarto. I didn't think I'd have to introduce this, but I did actually meet a few people this week who had never heard of it. Quarto is sort of the next generation of the R Markdown family of stuff, like blogdown and all the other "downs". It's independent of R, so it's its own application that you install. You can use it with Python or Julia or bash or whatever you want to use.

And what I'm going to do is talk about using Shiny inside of Quarto. My main application is for books, but you can use it for slides, like these slides in particular, or for blogs or whatnot. We were kind of looking into, if Hadley's out there, I hope we are looking into using it for our pkgdown sites, which would be a nice change.

And then Shiny. I'm pretty sure everybody here has heard of Shiny, and I'm probably telling you things you already know. But historically, if you wanted to deploy a Shiny app, like have somebody interact with it who's not on your computer or not you, you basically had to use a server. So you could get your own Shiny license, or download Shiny and build your own server and manage all that yourself, or you could use one of the hosted services from Posit, shinyapps.io and Posit Connect.

But just to tell you what you probably already know: if you have a Shiny app, all the software and everything sits on the server. Maybe data is already there, maybe you upload data or whatnot. And then your TV, your phone, or your computer is just a terminal that you interact with. It sends messages to Shiny on the server, the server does the calculations and sends the results back, and that's pretty traditional.

What is WebAssembly and webR?

Just as an example app: has anybody ever heard of UMAP? It's supposed to be like a souped-up PCA. It makes really cool-looking pictures. It's not very great for modeling, but in talking about and describing it, I wanted to have something that could show, hey, it's not really stable. If you initialize it in a bunch of different ways or change some of the parameters a little bit, you can get very different results. And so for me, in my application, I want to build books in HTML that have a lot of things like this that you can interact with, instead of it being "let me tell you about what happened in the static figure." And so that's sort of my application of abusing Shiny with Quarto, at least.

So along came webR. webR is something that just seems like magic to me. It probably is. Something called WebAssembly was built a while ago, and Python was eventually built inside of WebAssembly, and now we have it with R. What WebAssembly does is basically build R into a binary format that you can then access in JavaScript and so on. So it's not like you're building R for the operating system that's on your computer; you're building it to be used on the web and in the browser. That's the point. So basically JavaScript is your interface there.

About a year or so ago, Posit hired a guy named George Stagg, who was doing a lot of work on this on his own. We hired him and brought him in to develop webR more extensively and basically fund that. If you want more information, the video linked here is really, really good for learning more about WebAssembly.

George manages a repository. It's not CRAN, but a repository where he's taking all the CRAN packages and building a binary version of them with WebAssembly so that you can use them with webR. And if you look at the statistics there, there are two interesting numbers. As of today, there are about 19,000 packages built with WebAssembly, which is about 94% of CRAN. But the 6% that's missing includes things that other packages depend on. So that's the number of packages built, but the number of packages you can actually use is somewhat lower than that, because if a package depends on any of those 6% that are left out, you can't really load it.
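The gap between "built" and "usable" can be sketched with a toy dependency graph. The package names below are made up for illustration and are not real CRAN data; the point is only that a package with a binary still fails if anything in its dependency chain has no binary.

```r
# Toy example with made-up package names; not real CRAN data.
deps <- list(
  pkgA = character(0),
  pkgB = "pkgA",
  pkgC = "pkgX",            # pkgX has no WebAssembly build
  pkgD = c("pkgB", "pkgC")
)
built <- c("pkgA", "pkgB", "pkgC", "pkgD")  # binaries exist for these

# A package is usable only if it is built and every dependency is usable.
# Assumes the dependency graph has no cycles.
is_usable <- function(pkg, deps, built) {
  if (!(pkg %in% built)) return(FALSE)
  all(vapply(deps[[pkg]], is_usable, logical(1), deps = deps, built = built))
}

usable <- built[vapply(built, is_usable, logical(1), deps = deps, built = built)]
usable
```

Here four packages are built, but only pkgA and pkgB can actually be loaded, because pkgC and pkgD both reach the missing pkgX.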

So he's just slowly whittling away at all the sort of crazy things that, as you can imagine, happen inside of R packages. But it's fairly complete at this point. I feel like I've yet to try to use something that's not available, so what's missing might be in the tail of things that are used.

So just to give you a sense of how webR works, this is the most simple example I had. You get a terminal, you execute code in it, and this is all running inside my browser. So just to be clear, there's a version of R that's been packaged up with any packages that you need, and that is embedded into the website that my browser is serving. Okay.

So it's all local, and I'll talk about that more in a minute. If you want another package, let's say dplyr, you can do that. Of course the font's too big here, but you can see it got it and loaded it, oops, pretty quickly.
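The transcript does not capture what was typed in the demo. As a minimal sketch, assuming the webR console and its `webr::install()` helper, and assuming that `R.version$os` reports "emscripten" under webR, installing and attaching dplyr could look like this:

```r
# Assumption: under webR, R.version$os is "emscripten".
in_webr <- identical(R.version$os, "emscripten")

if (in_webr) {
  # Fetch the WebAssembly binary of dplyr (and its dependencies), then attach it.
  webr::install("dplyr")
  library(dplyr)
} else {
  message("Not running under webR; a regular R session would use install.packages().")
}
```

The guard is only there so the snippet is harmless in a regular R session; in the browser console the two lines inside the `if` are all that is needed.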

Introducing shinylive

So it's quite amazing. So that's webR, and then along came shinylive. shinylive is sort of a version of webR that's built to serve Shiny locally. So no server is required; the server is kind of built into your browser at this point. And so all the computations run locally. These slides are built with shinylive. So when I started the slides with Quarto, it loaded R, it loaded the packages that I declared that I need, and then it also built a Shiny serving side that I can just use without sending any messages back and forth to an external server.

It does almost all the heavy lifting for you, and we'll see the "almost" in just a second.

So just to clarify, and it might seem obvious at this point: with Shiny Server, all the things happen on the server. You're just sending messages back and forth, and it's serving you images and whatever it is that you're looking for in your output. shinylive downloads R and packages and runs everything locally.

Setting up shinylive with Quarto

Now, if you want to get set up with Quarto, the first thing you have to do is go to a terminal inside of your Quarto project and run this command. Essentially, Quarto has a bunch of extensions you can use for various things. If you want to use Font Awesome, there's an extension for that, and so on. What the shinylive extension does is give you all the infrastructure to run Shiny in the way that we're going to do.

And then one more thing. The way Quarto runs is that everything is done in Markdown. When you process your files, you get a Markdown file at the end of that, and then we use Pandoc to convert it to HTML or TeX or whatever your target output format is. And so what you have to do is sort of shim that process. A Quarto filter is something that will, not interrupt, but sort of get inside of that process of making the Markdown, passing it to Pandoc, and compiling it into the final format. So somewhere in your Quarto YAML file you have to have a filters argument that tells it to treat shinylive a little bit differently.
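The command and the YAML were on the slide and are not in the transcript. A sketch of the usual setup, assuming the extension published as quarto-ext/shinylive:

```yaml
# One-time setup, run in a terminal from the Quarto project directory:
#   quarto add quarto-ext/shinylive

# Then, in _quarto.yml or in the document's YAML header:
filters:
  - shinylive
```

The `quarto add` step copies the extension into the project; the `filters` entry is what lets it hook into the Markdown-to-Pandoc step described above.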

And after that you're done; you can start writing code chunks. In code chunks for R, at least if you're used to knitr and R Markdown, you would just use {r}, and that's telling it that you have an R chunk. Instead of doing that, you use shinylive-r. And I should also say that everything I'm telling you about R is also true about Python. We have Shiny for Python, and shinylive for Python was the original, sort of first version of this. Of course Quarto works with Python, so if you're more of a Python person, all this will also be true for you.

One other thing you have to do is declare this one option called standalone and set it to true, which makes it work inside of Quarto, and then just put your Shiny app in. So it's pretty simple. If you're used to writing Shiny and you just do the setup, then now you have Shiny locally, which is kind of amazing.
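As an illustration, the body of such a chunk might look like the sketch below. The chunk itself would be opened with {shinylive-r} instead of {r}; the app is a made-up minimal example, not the one from the talk.

```r
#| standalone: true
# Body of a Quarto chunk opened with {shinylive-r}; the app below is illustrative.

library(shiny)

ui <- fluidPage(
  sliderInput("n", "Number of points", min = 10, max = 500, value = 100),
  plotOutput("scatter")
)

server <- function(input, output, session) {
  output$scatter <- renderPlot({
    # Redrawn in the browser whenever the slider changes; no server round trip.
    plot(rnorm(input$n), rnorm(input$n), xlab = "x", ylab = "y")
  })
}

# The last expression of the chunk is the app object.
shinyApp(ui, server)
```

The `#| standalone: true` line has to come first in the chunk, since Quarto reads chunk options from the leading `#|` comment lines.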

Declaring packages and loading data

How do you declare packages? You're going to use various packages. In that UMAP example I have ggplot and one of the color palette packages, as well as dplyr, loading in there. shinylive and webR need to know what you need so they can go out and get them. The way it does that is it uses renv, if you've ever used that. So you just say library(dplyr) and it knows to go out and get it. Now, and we do this in tidymodels a lot, you might have packages you want there but you're not going to directly load or attach them. You want to call them by namespace, or it's an imported package. What you can do in that case is declare it with a library() call like you normally would, just comment that out, and then renv will pick that up.
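A sketch of what those declarations could look like at the top of a chunk. The package choices are illustrative (viridis stands in for the unnamed color palette package), and the commented-out form follows the talk's description rather than documented behavior:

```r
#| standalone: true
# Packages attached with library() are the ones the dependency scan is described as finding.
library(shiny)
library(ggplot2)
library(dplyr)

# Per the talk: a package that is only called by namespace (e.g. viridis::viridis())
# can be declared with a commented-out library() call.
# library(viridis)
```

If a given shinylive version does not pick up the commented-out call, a plain `library()` call for that package is the straightforward fallback.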

So right now, when you render your Shiny app, it goes out, downloads R, and then goes to the wasm package repository and gets the current version of each package, and that's what you use. But we think a lot about reproducibility, so George and the team are continually working on making that better. I don't think it's been committed yet, but he was telling me that what they're going to do is, at the time that you build the app, not the render time but the build time, go out to the wasm repository and get the versions that were the most recent when you built it, and then either cache them or retain all the old binary versions of these packages. So you can probably get the package version that you started with originally, or something that's pretty close to it. And I imagine this is going to get more sophisticated as time goes on, because there's a lot we could probably do with renv to get around this, like a lock file or snapshot. But we're not quite there yet.

All right, so how do you get your code in? If you have a long Shiny app, you probably don't want to just paste that into some Quarto doc. And how do you get data in? You might want to upload a CSV file, or let the users do that, and analyze it. This is where the "almost" comes in; this is where it gets interesting. There's really one simple principle that you have to remember, which is this.

You're in JavaScript, actually you're in WebAssembly world now, so you're sort of constrained a little bit. Getting data in is a little bit more difficult. When I started using this with Quarto, I thought, well, if I have a code chunk that makes some data frame or some model object and I go to the next code chunk, I have that just lying around and can go ahead and use it, right? It doesn't work like that. It's basically spawning another R process inside the browser that is basically clean. That's why we list the packages; we don't inherit the ones we've already loaded or attached. So you have to do a little bit extra, a little bit differently.

Then there are techniques to load data or source files and things like that. WebAssembly doesn't really let you just open a network connection. If you want to learn more, because I don't understand this part, you can go to the GitHub issue around this. But curl is definitely a no-no, so if you've been using curl, that's not going to be allowed. What George has done is patch download.file() in base R, so it will use basically a separate protocol to get the files, and that works pretty well. For me, it's a little more simple than maybe what you're going to deal with, because I keep all my stuff on GitHub in a public repo, and you'll see how I do it in a minute. But this is the only thing that might trip you up a little bit: how do I get my data in, in a way that shinylive and WebAssembly will permit? And again, if you want to read more, there's an issue here that you could upvote or give feedback on.
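Since download.file() is the route described as working, a small wrapper around it is one way to pull a CSV into an app. This is a sketch: `fetch_csv` is a hypothetical helper name, the GitHub URL in the final comment is a placeholder, and the demonstration uses a local file:// URL (Unix-like paths assumed) so it runs without a network.

```r
# Hypothetical helper: download a CSV to a temporary file, then read it.
fetch_csv <- function(url) {
  dest <- tempfile(fileext = ".csv")
  status <- download.file(url, destfile = dest, quiet = TRUE)
  if (status != 0) stop("download failed for ", url)
  read.csv(dest)
}

# Local demonstration with a file:// URL so the sketch runs offline.
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c(2.5, 3.5, 4.5)), demo_path, row.names = FALSE)
demo <- fetch_csv(paste0("file://", demo_path))

# In an app, the URL would point at a publicly reachable file, e.g. (placeholder):
# fetch_csv("https://raw.githubusercontent.com/some-user/some-repo/main/data/results.csv")
```

Anything that goes through curl-based clients would need to be swapped for a call like this.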

Security considerations

Now, Gordon Shotwell, who's in the Shiny group, put it succinctly. He's like, "I'm terrified of this in a way," because we're going to have people who don't realize that any data or code that's in their Shiny app is available locally to whoever has the URL, right? Any data you have goes to the client, any code you have goes to the client, and if you accidentally have an object that has some API key that you need for the app, it's available to the client. So you have to be super careful about what you put in the app. Even if you're giving somebody access to upload CSV files to analyze, or letting them connect to a database, you just have to be cognizant that anything that gets in there could theoretically be used or accessed by somebody else.
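One cheap safeguard is to scan the app source for lines that look like hard-coded secrets before publishing. The helper below is a hypothetical, deliberately naive sketch (the function name and the regular expression are assumptions, and it will miss plenty of cases); it is not part of shinylive.

```r
# Hypothetical pre-publish check: flag lines that assign a string literal to
# something named like a key, secret, token, or password.
find_suspect_lines <- function(code_lines) {
  pattern <- "(api[_-]?key|secret|token|password)[[:space:]]*(<-|=)[[:space:]]*[\"']"
  hits <- grepl(pattern, code_lines, ignore.case = TRUE)
  data.frame(line = which(hits), text = code_lines[hits], stringsAsFactors = FALSE)
}

app_code <- c(
  "library(shiny)",
  "api_key <- \"abc123\"",   # made-up value for illustration
  "ui <- fluidPage()"
)
suspects <- find_suspect_lines(app_code)
suspects
```

A clean result does not make an app safe; data files and sourced scripts shipped with the app are just as visible to the client as the code in the chunk.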

Code and data loading patterns

So here's the example pattern that I have. I have a figure chunk with standalone set to true, and then I'll start with some library calls. So I could say it's a tidymodels talk now. Not really. And then, for that UMAP example, I've just precomputed all the configurations I'm going to use in Shiny. So I just have an RData file out there on GitHub, and I can use the GitHub raw URL structure to load the data. It's the same structure as the regular URL; it just has a different root. And then I like my Shiny stuff to be a little more modular, because the book has a certain styling that you set up beforehand and I don't want to replicate that. So you can just download or source specific R files that are somewhere accessible, and then you just return the app.
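The "same structure, different root" remark can be made concrete with a small helper that turns a GitHub file-page URL into its raw-content counterpart. `github_raw_url` is a hypothetical name and the repository in the example is a placeholder.

```r
# Hypothetical helper: convert a GitHub "blob" page URL to the raw-content URL.
github_raw_url <- function(blob_url) {
  out <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", blob_url)
  # Drop the "/blob/" path segment that only the web page URL carries.
  sub("/blob/", "/", out, fixed = TRUE)
}

raw_url <- github_raw_url(
  "https://github.com/some-user/some-repo/blob/main/RData/results.RData"
)
raw_url
```

The raw URL returns the file bytes themselves rather than an HTML page, which is what a download or source call needs.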

I'm telling you this in particular because it may seem obvious, but there are two nuances that are true right now, which I'm guessing might get better over time. One is that when renv scans your code to see what packages you need, if those package declarations are in the sourced file, it won't see them. The other constraint is that, for some reason, if you return the app in the sourced R file, it won't load the app, so you have to explicitly return it here in the chunk. Okay, so those are little nuances. I don't know if they're oversights, but they're something where we can just make the experience better.
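Putting the pattern and both nuances together, a chunk body might look like the sketch below. Every URL and file name is a placeholder, `fetch` is a hypothetical helper, and the sourced file is assumed to create an object called `app`; the code on the actual slide is not in the transcript.

```r
#| standalone: true
# Sketch of a {shinylive-r} chunk body; URLs, file names, and `app` are placeholders.

# Package declarations stay in the chunk so the dependency scan can see them.
library(shiny)
library(ggplot2)
library(dplyr)

# Hypothetical helper: download a file to a temporary path and return that path.
fetch <- function(url) {
  dest <- tempfile()
  download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  dest
}

repo <- "https://raw.githubusercontent.com/some-user/some-repo/main"

# Precomputed results stored as an RData file in a public repository.
load(fetch(paste0(repo, "/RData/results.RData")))

# Shared styling and the app definition live in separate files.
source(fetch(paste0(repo, "/R/shiny-setup.R")))
source(fetch(paste0(repo, "/R/shiny-app.R")))

# The app object is returned here in the chunk, not inside the sourced file.
app
```

Keeping the `library()` calls and the final `app` in the chunk addresses the two constraints; everything else can live in the sourced files.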

Just to give you a sense of the UMAP one, here's what it looks like. I have a setup file that I source, and then the actual UMAP app. One nice thing about books is you can do a lot of cross-referencing, so we have a way to make the Shiny app an actual figure in your book, not just something that's sitting there. There's a little bit of Quarto work you do here to make it work, but it's pretty simple.

Why Posit supports shinylive

Some of you might be wondering, why would Posit do this? Why would George do this, right? We sell Shiny, we sell Shiny servers at Posit and things like that. I remember when I first met him, not long after he was hired, we were talking about this, and, I don't know if I'm saying something I shouldn't, but I was like, "So why are we doing this? I mean, I want to do this, but why are we doing it?" And at the time I think George said that he'd talked to some people in our company, and they were like, "Look, we know somebody's going to do this, right? We wouldn't try to stop it. What we should do is bring it in and make sure that we fund it and support it and nurture it in a way that we can, so that when we go to use webR or WebAssembly, we get the best possible experience."

Limitations of shinylive

So George is at Posit. If you ever have a chance to meet him, the guy is crazy amazing. He's just incredible technically. But I don't know that we're terribly worried about shinylive, because it's really not the answer to a lot of things that you might want to do. You could be running this on your TV, right? Your computing power is not that great. You have to basically assume a lowest common denominator in terms of what resources people have to run the Shiny app. I've already mentioned code and data security, and that might be a no-no for what you're doing. And also just moving data around might be expensive. For the book I'm loading, I don't know, 70 or 80 megabyte RData files at most, and that takes maybe a second or two, depending on your web connection. And I should also say, and I'll show that at the end here, it's really faster than I thought it would be, given you're downloading R and packages and RData. But if you're like, "Hey, let's do deep learning in shinylive," I don't know what kind of computer you're using, but that may not be something that's really realistic.

And then, we have Connect for good reason. It'll handle authentication, parameterized reports, cron jobs, and all that stuff. And it's not like you can't do that yourself. In fact, as a company, Tareef has always said that we don't ever want to have anything that we lock you into for money. So we're never going to have a product that you can't replicate yourself; we'll have a version of the product that makes it easier and nicer to use, but you could do it yourself if you wanted to. And that's what makes Connect so wonderful. You can write cron jobs yourselves; we know how to do this. But having technical support and all of that done for you is one of the reasons I think that, with shinylive, most commercial people are not going to be canceling their accounts.

I should also say, I meant to do this earlier: going back to the actual app, let me just reload it. It may not be accurate because there is some caching that it does. But I think the R binary is maybe six or seven megabytes, and the packages are pretty small, even dplyr. And just loading it, I timed it at about four or five seconds.

So yeah, it's pretty fast. It's remarkably fast, especially compared to what it was in the first version. And I joke about dplyr, but there are all these people saying, "Oh, the tidyverse and all their packages, you have to get like 20 packages." I'm using ggplot and dplyr, and that probably has, I'm going to guess, a dozen or so dependencies, and it's loading in seconds along with the data, which is about 20 megabytes. So that's pretty good in terms of speed.

Wrapping up

So, yeah, that's pretty much the end of it. If you want to learn more, Joe Cheng gave an excellent talk at last year's Posit conference about Shiny and wasm. It's definitely worth looking at. You know me, I'm always for something that's new, and especially I want to see examples of what other people are doing. So this link here will go out and look for all the GitHub projects that have shinylive-r or shinylive-python. And then George has some really good applications, too. There's a great talk that he has. But also, if you want to think of it in terms of, "Well, can we use Arrow?" or "What would we do?", he has a Shiny app in his GitHub repository that uses DuckDB and Parquet files. So if you want to test that out locally and see how well that would work, you can go out and test it now.

So thanks for having me speak, and especially thanks to Nicole. We've called her out a couple of times, and she's, I don't know, the Cub Scout den mother of the place. She literally keeps everything running. No offense, Jared, but it's true. So big thanks to her and Jared and all the people, especially those who put webR and shinylive together.