
Kelly O'Briant | Interactivity in Production | RStudio (2019)
https://rstudio.com/resources/webinars/interactivity-in-production/ In part 3 of this 3-part series, Kelly explains that interactive products take your data science to a new level, but they require new coding decisions. This webinar gives you clear guidelines on when and how to add interactivity to your work. You'll learn: when to use off-the-shelf interactive products like parameterized R Markdown and htmlwidgets, when to create bespoke interactivity with Shiny, how to make your Shiny apps as fast as possible, how to support interactivity in production, and much more. About Kelly: Kelly is a Solutions Engineer for RStudio and an organizer of the Washington DC chapter of R-Ladies Global, an R users group for lady-folk and friends.
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Today I'm giving a webinar on Interactivity in Production. This webinar is the third in a three-part series that we've been doing. Last month, Garrett Grolemund delivered Reproducibility in Production, and Thomas Mock delivered RStudio Connect in Production. Those are freely available on demand at resources.rstudio.com, the same place this one will be made available. But before jumping into Interactivity in Production, I wanted to acknowledge some of the themes from those first two webinars that I'll be touching on as well.
The first theme is that production means different things to different people. Garrett talked about this, and Tom talked about it really well with RStudio Connect: how the Connect product can meet you where you are and make production available in different forms, with different meanings, wherever you are in your analytics lifecycle. Along those same lines, RStudio Connect is a publishing platform for communicating impactful data science work, and R Markdown is a tool for creating computational documents and communicating impactful data science through either static reports or documents that have interactive elements inside them.
And the one thing that I want to use as a jumping-off point for starting our discussion today is what Garrett introduced in the reproducibility webinar, which was the interactivity spectrum. So in this interactivity spectrum, Garrett made this case for thinking strategically about what level of interactivity your data product actually requires and meeting that level of interactivity with the type of experience that you want to deliver in order to communicate effectively.
So I really like this a lot, but it does kind of presuppose that you know what you want to build in the first place or you're at a point in your iterative development work cycle where you're willing to do some introspection on what you should or could be building. This is important because as you progress down the interactivity hierarchy, things become more complicated to build, and we tend to think of Shiny applications, the highest level of interactivity, as the most complex to develop.
Now on the other side of things, Garrett also spoke to this in his webinar. We tend to see people learn Shiny and then use it for just a lot of things, borderline on everything. And I know I've done this, and it's very hard once you commit to this Shiny application workflow or development cycle to walk back from Shiny. In other words, once you've committed to using Shiny, it's difficult to take a step back and evaluate whether you would be better served delivering your desired user experience in another format.
The case for starting with Shiny
Now all that being said, I love Shiny, so I want to talk about the case for starting with Shiny because I know that folks out there do this too. It's a great tool. It makes things really easy, and it's a great tool for doing a lot of things like exploratory data analysis, prototyping, fast iteration through building tools that other folks might want to use and that you can use to drive impact of your data science work inside of your own organization.
Now I'm no Shiny wizard. You can go and look at the Shiny contest results we ran last spring to see evidence of actual Shiny wizardry, but I'm comfortable enough with Shiny that I can build really powerful prototypes in a short amount of time, and I know a lot of folks are in that same boat. So when I need to build a tool that other people might want to use too, my natural inclination is to start developing that tool in Shiny. Now these things can quickly escalate when you want to give access to someone else to use that tool, and that's what I want to talk about today.
I sort of think of starting with Shiny as the equivalent of writing an hour-long talk. I call starting with Shiny and local development as the hour-long talk of data products, and hour-long talks, if you've ever developed one, can be rambling and cluttered. You have a lot of time to work with, so there can be parts that work really well and parts that don't work so well, and I know I'm giving an hour-long talk currently, so I hope that in the questions you'll let me know which parts work well and which do not so that I can clarify those.
But the goal, when you're taking this local development asset into production, is to turn that hour-long talk of data products into the lightning talk of data products. The difference is that lightning talks are these targeted, really elegant, streamlined little pockets of information that you're trying to deliver in a very short amount of time. And the irony is that it can take a lot more of your own time to develop a lightning talk than an hour-long talk, where you have a lot of space to work with, because lightning talks need that streamlining, that concision. You need to get to your message effectively and conclude everything in the right amount of time.
So maybe you start with Shiny in local development, and by the end, maybe in production, that original Shiny application you had on your local machine looks very different. Maybe once you get to production, it's a group of assets that complement each other. Maybe it's a very different kind of Shiny application. As long as you're evaluating along the way, starting with Shiny in the hierarchy of interactivity is really okay by me.
Most important things for Shiny in production
So this is my agenda for the webinar discussion today. I started with making a case for starting with Shiny, and now I want to talk about what I consider the most important thing when you're taking a prototype of a Shiny application into production. And this is my perspective as a solutions engineer at RStudio. Then I want to talk about Joe Cheng's most important thing for Shiny in production. Joe is, if you aren't familiar, the CTO of RStudio, and he was the creator of the Shiny package.
After that, I want to talk about an example, a prototype app that I built, and then went through steps that Joe outlines in his performance workflow to take that application into production. And along the way, I'm going to show you what steps I took to refactor, and then evaluate the impact of that application, and finally reconsider the hierarchy of interactivity for what I've produced.
So let's jump right into it. This is what I consider the most important thing for running Shiny in production, and that is you need to have a sandbox publishing environment as part of your development infrastructure. This thing right here, you need it. It should be a place where you can publish freely into a server that is ideally configured to be an identical twin of your production server, and this is your testing ground. You can call it a sandbox, staging, test server, whatever. You need one of these.
If you only have one server at your disposal, in a pinch, even something like this can work. You can have the same piece of content be published to two different destinations on a single RStudio Connect box, and have one destination be reserved for production, while the other destination is reserved for testing, staging.
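As a sketch of that one-server pattern, the rsconnect package can publish the same app directory under two different names, so one deployment serves as staging and the other as production. The app path and names below are hypothetical:

```r
# Sketch: publishing one app bundle to two destinations on a single
# RStudio Connect server. The appDir path and appName values are
# made up for illustration; any pair of distinct names works.
library(rsconnect)

# Staging copy: publish here freely while testing
deployApp(appDir = "trending-app", appName = "trending-app-staging")

# Production copy: promote only after staging looks good
deployApp(appDir = "trending-app", appName = "trending-app-prod")
```

Because the two deployments have distinct names, Connect treats them as separate pieces of content, and you can restrict who sees the staging copy.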
So that was my most important thing. Joe has a very different perspective, and it's equally valid, so these are two awesome opinions. Joe's most important thing for Shiny in production he highlights in this excellent talk that he gave at RStudio Conf 2019, which is also available at resources.rstudio.com/webinars for you to watch. This talk is excellent, and your homework from this webinar is to go watch Joe's talk if you haven't seen it already, because it is really that impactful if you are doing anything that involves Shiny in production.
So this talk was the keynote where he introduced the tools that our Shiny engineering team here at RStudio has been working on to help support bringing Shiny into production. It goes through shinytest, shinyloadtest, profvis, plot caching, and async, and he highlighted a couple of these tools throughout his talk, but the thing that's most important to Joe for Shiny in production is profvis. It's a general profiler for R code, but it's super important when you're bringing Shiny applications into production.
A lot of prototype applications fall over in production because they contain performance anti-patterns, so when you're trying to turn a prototype into a production application, it's important to ask yourself questions and suss out whether your application is suitable for production in its current state. Two great questions to ask are: is this app fast enough for production, and if it's not, what is the cause of that performance bottleneck?
So he lays out this excellent performance workflow, which is the real gem of this talk. One, you ask if the app is fast enough; two, you test to see what's making it slow; and three, you optimize and solve those problems. And on the importance of using profvis, you can tell how passionate Joe is when he talks about it in the keynote. He says, you know, you cannot guess at what is making your R code slow. You have to use the profiler, because your intuition about what is making that code slow stinks. It's garbage. It's not good.
Like, even folks who are very competent at building Shiny applications can get tripped up by, you know, simply assuming you know what the problem is. So profvis is an amazing tool to help you get to the optimization stage and then make real impactful changes to how your code works.
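A minimal sketch of running profvis against a Shiny app, assuming a local app directory: wrapping runApp() records everything the app does while you interact with it, and closing the app brings up the flame graph.

```r
# Sketch: profiling a Shiny app with profvis. Interact with the app,
# then close it to see where the time was actually spent.
library(profvis)
library(shiny)

profvis({
  runApp("trending-app")  # hypothetical path to the app directory
})
```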
The trending R packages app
So, in the spirit of Joe's talk and his Cranwhales application success story, I wanted to build a similar application for this webinar, and I won't repeat the performance workflow demo that Joe has already done in his talk. Definitely go watch it. It's really great, seriously. But I wanted to create a prototype to start with that would realistically highlight some of those same or similar performance anti-patterns.
So, I have a cranlogs app that's called the trending R packages app. And here's what it does. As you can see here in the screenshot, it has this table. It's a DT table. And you can see which R packages have been trending over the last period of time. Top ten. And then you can click on each of these packages to view a little bit of metadata about what that package is and what it does.
So, all of this data is coming from the cranlogs API project, plus a little bit of rvest magic scraping metadata about each package. The workflow it goes through is that I call out to the cranlogs API /trending endpoint to get those top ten trending packages. Then I make ten more calls to that cranlogs API to retrieve the download counts for each package. I put that data into DT here in the application. And then I use rvest to scrape out that metadata about each package at the point the user clicks on it.
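A rough sketch of that data-gathering workflow, using httr, jsonlite, and rvest. The endpoint paths and the CSS selector are assumptions based on the description above; check the cranlogs API and the CRAN page structure before relying on them.

```r
# Sketch of the prototype's data gathering: one call for the trending
# list, one call per package for downloads, and scraping on demand.
library(httr)
library(jsonlite)
library(rvest)

# 1) One call to the /trending endpoint for the top trending packages
trending <- fromJSON(content(
  GET("https://cranlogs.r-pkg.org/trending"),
  as = "text", encoding = "UTF-8"))

# 2) One more call per package to retrieve its download counts
downloads <- lapply(trending$package, function(pkg) {
  fromJSON(content(
    GET(paste0("https://cranlogs.r-pkg.org/downloads/total/last-week/", pkg)),
    as = "text", encoding = "UTF-8"))
})

# 3) Scrape metadata from the package's CRAN page when a row is clicked
scrape_meta <- function(pkg) {
  page <- read_html(paste0("https://cran.r-project.org/package=", pkg))
  html_text(html_element(page, "h2"))  # e.g. the package title heading
}
```

Done inside the Shiny app, steps 1 and 2 repeat for every user session and step 3 repeats on every click, which is exactly the anti-pattern discussed below.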
So, this app, you know, it makes sense to me. I'm constructing this data as I go. I'm not thinking about scaling at this point. Works pretty great in my local development environment. When I push it to my staging server, it works okay. But I can already tell that there is some danger here. As more users come to this application, I'm worried that it won't be fast enough to support the number of users that I expect.
Optimizing: moving work out of Shiny
So, I'll spare you the first two points of the workflow. Again, go watch Joe's talk for those. And skip to number three where I'm optimizing my code. So, at this point, the number one thing that you do very often to optimize some of these performance issues is to move work out of Shiny.
And Joe says there are two important things to consider when you're moving work out of Shiny. Things that you don't want to do. So, you don't want to do work in Shiny while the user is waiting. And the second thing is you don't want to do the same work for every user. That's very wasteful.
So, he had this example of loading CSV data into a Shiny application every time a user connects. This application I actually designed for the purpose of highlighting these two anti-patterns, and it does a very similar thing. I am making 11 API requests to build the trending R packages table for the user. And then once a user is on the application, I'm calling rvest two times to perform that metadata scraping for each row selection that occurs. So, I would be repeating that same work over and over again for every user. Which, again, is very wasteful. So, how do we solve this problem?
There are a number of ways I can think of to solve this. I could generate the data ahead of time and publish it in the bundle along with my application code. And this is a totally valid workflow and makes sense to do for a lot of different types of Shiny applications. Sometimes you can just pull the data together and throw it in that bundle and send it along as you're deploying to staging, to production, and it's fine. The problem here is that I want users to see the latest and greatest trending R packages. It's literally an application to explore trends.
So, I don't want to get stuck manually deploying that new data or cobbling together some sort of cron job coupled with a programmatic deployment workflow that I then need to monitor and make sure continues to work. I just want new data delivered to my Shiny application that users can see fresh information about any time they visit the app.
Division of labor: R Markdown as ETL
So, this is a great case for adding R Markdown to the equation. I call this the division of labor: using R Markdown to create an ETL process that puts the data together, then hosting an output file along with that report that my Shiny application can consume. If you don't know, an R Markdown report deployed to RStudio Connect can declare output files in its metadata. In the case of this image, I have this report called report title. Very descriptive. In the course of this report running, it creates a data.csv file, writes it out, and that file gets hosted on RStudio Connect alongside my R Markdown report.
If I'm running that report on a schedule, that schedule is producing this new data.csv file that lives at this web address, and then I can point Shiny to read in that data from the URL hosted on my Connect server. So, very cool. And that's exactly what I want to do for this application in order to move that work out of Shiny and do it somewhere else so that it doesn't have to be in my Shiny code itself.
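On the Shiny side, the app then just reads the hosted file instead of rebuilding it; a sketch, with a hypothetical Connect URL:

```r
# Sketch: the Shiny app consumes the pre-built CSV from the URL where
# Connect hosts the report's output file. Place this at the top of
# app.R, outside the server function, so it runs once per process.
trending_data <- read.csv(
  "https://connect.example.com/trending-etl/data.csv")
```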
So, this is what I built: my R Markdown ETL, which generates that same data frame from the calls to the cranlogs trending endpoint and the rvest web scraping functions. I do this all in an R Markdown document, publish that document to Connect, and then I schedule it to run weekly every Friday at a certain time. And I could schedule it to run daily or monthly or hourly. Whatever schedule I want to produce new data on, I can set directly for the R Markdown hosted on RStudio Connect in the dashboard panel that I'm showing here. So, a really nifty tool to create output data files.
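The hosted output file is declared in the report's YAML header; a sketch of what that Connect output metadata can look like (the title and filename here are examples):

```yaml
---
title: "Trending R Packages ETL"
rmd_output_metadata:
  rmd_output_files:
    - data.csv
---
```

With that in place, each scheduled render writes data.csv and Connect serves it alongside the report.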
So, now I have these three pieces of content here living as my production development package. And I'm happy with the performance of my Shiny application because I've moved that work out of Shiny and into this ETL process. So, I have my production assets being this ETL in R Markdown, my production application in Shiny, which is now fast and doesn't repeat work for each user. And then a third asset, which is not really a part of it, but it is fun to publish alongside as reference information, which is the Profvis profile for that production application. So, all of these three things can live together and be a package that I can feel confident about running on my production server and serving my users effectively.
Evaluating impact with instrumentation data
So, at this point, you know, am I done? Am I making an impact at my organization through what I built? How do I answer that question? Is what I built correct? You know, I have these two data products, Shiny and R Markdown, working harmoniously together. Is there anything else to do?
So, I'd like to take a stab at trying to quantify or track impact on RStudio Connect with my assets. So, how do I know that what I've built is effective at communicating my intended message? The whole goal here was to create something useful and to drive users to find, you know, insights through this tool or make changes to how we do work at our organization through communicating that in Shiny and R Markdown. So, am I doing that?
Well, there are also tools for getting that data off of your RStudio Connect server, through another API called the RStudio Connect Instrumentation Data API. This is a really cool feature that Connect has that I think is underutilized. So, I want to use this talk to encourage you to look at some of the resources we currently have for visualizing this type of data on your Connect server, if you have access to one, and to explore different variations of building tools that can give you insights into how much impact you're actually making through these data products.
So, RStudio Connect records different types of user activity for different types of content. You've got records for Shiny applications about each visit to that application and the duration of that visit. And then you have records of each visit for static and rendered content.
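One way to pull those records is the connectapi package, which wraps the Instrumentation Data API; a sketch, assuming the CONNECT_SERVER and CONNECT_API_KEY environment variables are set (function names are from connectapi and may vary by version):

```r
# Sketch: fetching usage records from RStudio Connect via connectapi.
library(connectapi)

client <- connect()  # reads CONNECT_SERVER / CONNECT_API_KEY from the env

# Shiny apps: one record per visit, with start/end times (duration)
shiny_usage <- get_usage_shiny(client, limit = Inf)

# Static and rendered content: one record per visit
static_usage <- get_usage_static(client, limit = Inf)
```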
This dashboard you see here on the right is one that my colleagues on the Solutions Engineering team put together, called the Connect Usage Dashboard, and it is available at that GitHub URL. The code is all there, and we would really encourage you to take it. You can drop it into your own server as is, if you'd like to see this same information about your data products, or you can tweak it and make it custom to gather information that will answer your own questions.
I've really been having a fun time lately tweaking this same instrumentation dashboard that my colleagues put together and turning it into something that can answer questions about individual apps. So, this is the same dashboard driven by the same helper functions that they created, but I've tweaked it to look just at my production application. And I want to know whether this app is popular, or whether it's just not driving traffic and not being visited. And I want to take actions based on the answers to those questions.
So, if this application is popular, that's awesome. I want to go to the runtime for that application. I want to evaluate my load test and make sure that I have the right settings for our processes and connections for process so that my user base is being served there. And if the application is not driving traffic, it's not getting visits, I want this dashboard, which is an R Markdown report, to send me an alert about that information as well. This app, it's not driving the desired impact at the organization. So, maybe I need to rethink what's happening there.
And I built this application so that you can come in and set a daily visit goal and then a session duration goal. And you'll see here that based on the inputs I have provided, my production app, this cranlogs trending R packages app that I'm tracking here, isn't hitting the total session goal for daily count, but it is hitting the average session duration goal. So, one not so good, one pretty good. And these value boxes will change color based on whether or not I'm hitting my goals.
Behind the scenes, what you don't see here is that this report also contains code to create a custom email alert based on these value boxes as well. So, if over time I see that I'm just not driving the traffic to my cranlogs application that I wanted, this report will capture that data and, based on what it finds, make a conditional decision to send me an email alert about my underperforming app or suppress that email. In this case, it might send me an email because my app is underperforming. And that email is based on a custom template that I packaged along with my R Markdown report.
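A sketch of how a conditional alert like that can be wired up inside a report running on RStudio Connect, using the rsc_email_* output-metadata fields; the goal value and the daily_visits vector are stand-ins for the report's real logic:

```r
# Sketch: suppress or send the scheduled email based on the usage data
# computed earlier in the report. daily_visits is assumed to be a
# numeric vector of visits per day built from the instrumentation data.
daily_visit_goal <- 20
hitting_goal <- mean(daily_visits) >= daily_visit_goal

if (hitting_goal) {
  # App is performing: skip this run's email entirely
  rmarkdown::output_metadata$set(rsc_email_suppress_scheduled = TRUE)
} else {
  # App is underperforming: customize the alert's subject line
  rmarkdown::output_metadata$set(
    rsc_email_subject = "Underperforming app: trending R packages")
}
```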
And it looks something like this. So, I can have this email come in and say, hey, for this application title, you know, we've seen that traffic has been chronically low. You might want to consider communicating the same information, if it's still relevant, in another format. So, think about, you know, whether or not somebody is still interested in interacting with your Shiny application. You know, is this the right method of communicating? Or why is this tool not having the impact that I want it to?
Sometimes web interactivity gets old. Like, somebody will come to the dashboard, play around with it once or twice. But, you know, it's not that much fun or interesting to click around. And they would rather have that same information delivered to their inbox on, you know, some sort of schedule so that they don't have to go through the work of interacting with a Shiny application, as funny as that sounds. Taking the steps to go to the Connect server and remember to do that, you know, are steps that some folks in your organization, maybe upper management, are just not going to take the time to do.
And so, thinking about tailoring the experience to your audience is a really important part of the whole process of evaluating whether what you put into production should continue to exist there. So, some things you might want to consider in place of Shiny runtime are: parameterized R Markdown, which we've seen in previous webinars, that gets scheduled and sends custom emails like this one to your management; scheduled email report delivery, as just described; and then a third asset that we didn't really talk too much about in this series, creating programmatic access to communicate insights through the plumber package by developing REST APIs. So, all of these things are perfectly excellent alternatives to creating more Shiny interactivity.
But they do take effort in order to, you know, context switch and refactor the code that you've invested time and energy to create in producing an alternative asset. But I think, overall, it's definitely a worthwhile process to go through.
Wrapping up: the hierarchy of interactivity
And that brings us back to the hierarchy of interactivity. So, I think my key takeaway here is definitely think about how you want to tailor your experience to your audience. And in the process of doing that, remember that, you know, Shiny applications do come at a cost of complexity. You do need to iterate through Joe's performance workflow in order to move prototype applications into production ready applications.
And in order to get through all of that, I would point to these production building blocks that we have made available in different resource formats here at RStudio. So, again, one more shout-out to Joe Cheng and the RStudio Conf keynote from 2019. It's an excellent talk about Shiny in production. And then, individually, we have resources on code profiling through profvis; general information on version control, which is also an important aspect of production; testing through the shinytest package; deployment and release patterns, with resources from articles we've put on solutions.rstudio.com; access and security for those applications, also on solutions.rstudio.com; and performance tuning, looking at the results of shinyloadtest and making tweaks to your application code and runtime settings based on what your goals are. These resources exist on either subdomain sites that we maintain here at RStudio or open source package reference sites that are also available on GitHub, external to the RStudio website.
So, I hope you've enjoyed this brief discussion of, you know, thinking about the interactivity hierarchy when it comes to building assets that include interactivity like Shiny and putting them into production. I hope that you go and check out some of these resources and then continue the discussion with us at community.rstudio.com as well.
Q&A
So, here's one. It says, what if we don't have a subscription to Shiny Connect? So, if you don't have RStudio Connect, there are good alternatives. You can always stand up an open source Shiny server as long as you have the requirements to do so. And we have plenty of resources on shiny.rstudio.com about how to get started doing that. And a lot of these tools for putting Shiny into production with the open source server will still apply. And some will be more difficult to implement. But, you know, either way, there is a path to doing Shiny, sharing Shiny assets without our pro products.
Let's see. What is required to be able to share Shiny app with coworkers who do not have RStudio? So, today I talked a lot about a product called RStudio Connect. And sorry if I didn't put a lot of context around that. You should definitely go and watch Tom Mock's webinar from two weeks ago called RStudio Connect in production. But like I said, even if you don't go the route of RStudio Connect, we have other tools from RStudio like Shiny server open source or even Shiny server pro if you're interested in alternatives there.
Here's one that says, could the workflow just described be done with a separate Shiny app run on a schedule rather than the R Markdown doc? Now, Shiny applications don't natively work running on a schedule. You have to interact with them or have some sort of programmatic trigger to be able to do that. And they definitely don't run on a schedule on RStudio Connect. So, if you are trying to run an ETL process on RStudio Connect, the choice for that would be R Markdown.
Do the profiling tools work for shinyapps.io as well? What are the considerations there? Yeah. So, profvis is just a general R code profiling tool, so you can definitely use it for R code and for Shiny code. And you interact with it on your local side. So, you're profiling that code in development in the RStudio IDE or on RStudio Server, and then you're using that information to tweak your code, which then will end up living on shinyapps.io or one of the other service products. So, that's a great question, and it is kind of the problem that Joe addresses with the Cranwhales app in his talk.
So, does the data get reloaded whenever a new user opens the web address? That's going to depend on where you've put that action in your code. You definitely don't want to reload the data for every user. And you probably want to try to utilize a drill-down approach for exposing access to more data as it's required for that user, if possible. We have a great example of building an enterprise dashboard with that drill-down approach that my colleague Edgar put together on db.rstudio.com. That's db for databases, dot rstudio.com. So, that could be a really helpful resource for you when you're trying to build an application that loads a large amount of data. Thanks so much.
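As a sketch of the "don't reload per user" point: in a single-file Shiny app, code outside the server function runs once per R process, while code inside the server function runs once per user session, so shared data belongs outside.

```r
# Sketch: load shared data once per process, not once per session.
library(shiny)

trending_data <- read.csv("data.csv")  # runs once when the process starts

ui <- fluidPage(tableOutput("tbl"))

server <- function(input, output, session) {
  # Per-session code: reuses trending_data without reloading it
  output$tbl <- renderTable(trending_data)
}

shinyApp(ui, server)
```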
