Resources

Jeff Allen | RStudio Connect Past, present, and future | RStudio (2019)

RStudio Connect is a publishing platform that helps to operationalize the data science work you're doing. We'll review the current state of RStudio Connect, including its ability to host Shiny applications and Plumber APIs, schedule and render R Markdown documents, and manage access. Then we'll unveil some exciting new features that we've been working on, and give you a sneak peek at what's coming up next. Materials: http://rstd.io/rsc170

Transcript

This transcript was generated automatically and may contain errors.

All right. Well, thank you for being here. I'm excited to talk to you today about RStudio Connect. I want to talk to you a little bit about where the product's been. I'm going to unveil a few things that we've been working on in recent months today. And then I want to talk to you about where we're going in the future.

But I'm going to do this through a vehicle that's a little bit unorthodox. So just between us, and this doesn't leave this room, I've been working on a little side hustle called Hats4Cats.me. This is outside of my role at RStudio. And basically, Hats4Cats is all about addressing the booming market of cat owners who need headwear for their pets.

And so one of the things that we've been trying to work on here, we've raised a round of funding, hired a data science team, and I kind of got a front row seat to our experience using R there and what that looked like. And so I thought I'd share that with you today and see if any of that resonates with your own journey.

So things started off really great. Everyone was in R. We had two data scientists originally. They were both using R. And they both installed R just recently when they got started. They were using the latest versions of R packages. And everything was just fine.

There were only a handful of artifacts that they produced. So they had a couple R Markdown reports. We had one Shiny app. Everybody knew where everything was. And so there wasn't really any ambiguity. And we were all using the latest versions of everything. So everything just worked out fine. But pretty quickly, things went downhill.

So after about four or five months, we had enough content now that people started getting confused about where does this report live? Where does this Shiny app live? And our data scientists were spending all their days not actually doing data science, but answering these requests about can you send me this report? Can you rerun this report to get this week's data? Or can you run this report just for my product?

And so they spend all their time doing these menial tasks instead of doing the stuff that we really want them to do, which is unlocking value by tapping into and exploring the data that we have. Secondly, it was all irreproducible. So we started having problems where one analyst would send the other analyst some code, and they weren't able to run it successfully. Or they would go to rerun something that they'd run six months ago. And all of a sudden, nothing works because some package changed. And so they'd spend all day fixing this thing that should have taken five minutes.

And the content is spread all over. So we have dozens of reports now. They're on our network share. They're on Slack. They're trickling around in email. And nobody knows where anything is, which creates a huge problem. And then lastly, we had some issues with security. On a couple of occasions, we had some sensitive financial reports that ended up on a network share that was accessible to the whole company, which nobody wants.

Introducing RStudio Connect

But we solved these problems with RStudio Connect. If you're unfamiliar, RStudio Connect is a content hub and execution engine for your data products. And so what that specifically means as an analyst is that your first experience is going to be clicking a little Publish button in the RStudio IDE, which I'll show you. And that will do all the hard work of getting your content from your computer onto the remote machine.

You can schedule and distribute R Markdown content. And so now you get an easier way to be able to operationalize some of the content that you've produced. You can handle the kind of scaling problems that we've been discussing all morning around Shiny applications, to be able to send more and more traffic towards Shiny or Plumber APIs that you've deployed. And one of the more important features, you can manage access controls. And so you can actually go in and say, these users or these groups should be able to access this report, but nobody else is going to be able to see it.

And an important caveat, this is an on-premises commercial application. So this is not one of our open source contributions, but this is something that you can purchase and run inside your firewall on-premises. And so this will integrate with your directory of users and groups and all that kind of stuff.

So a couple things that I'm excited to announce today that are available in the 1.7.0 release. First of all, it's programmatic deployment. And so most people tend to reach for the push button deployment from the RStudio IDE. But you also have the option now of saying, if you want to point your Jenkins or your Bamboo server at a Git repo and have it monitor that and automatically pull down changes and publish those, you now have that option with the programmatic deployment.
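
As a rough sketch of the packaging step a CI job (Jenkins, Bamboo, or similar) might run before publishing: the code below packs a project directory into a gzip-compressed tar archive. The archive layout and the `build_bundle` helper are assumptions made for illustration. The talk does not describe the bundle format or the upload endpoints, so the upload itself is left out.

```python
import tarfile
from pathlib import Path


def build_bundle(src_dir: str, out_path: str) -> list[str]:
    """Pack every file under src_dir into a .tar.gz and return the archived names.

    Hypothetical helper: write out_path outside src_dir so the archive
    does not try to include itself.
    """
    src = Path(src_dir)
    names = []
    with tarfile.open(out_path, "w:gz") as tar:
        # Sort so the archive contents are deterministic across CI runs.
        for path in sorted(src.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(src).as_posix()
                tar.add(path, arcname=arcname)
                names.append(arcname)
    return names
```

A CI job would call something like `build_bundle("checkout/", "/tmp/bundle.tar.gz")` after pulling the latest commit, then hand the archive to whatever deployment call the server documents.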

We have customizable email, if you haven't seen this yet. So you can go in and you can tailor the body of an email or the message, as Tareef showed this morning. And lastly, we have content instrumentation, or metrics, around your content. So you can see who's viewing your content when. And then you can go in and use that data to develop reports or answer questions like, what is the most important content? What should I be maintaining? Where should I be putting my energy?
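
To make the "what is the most important content?" question concrete, here is a minimal sketch that ranks content by view count. The record shape, the names in `VIEWS`, and the `most_viewed` helper are all invented for illustration; the real instrumentation data will look different.

```python
from collections import Counter

# Hypothetical usage records: (content name, viewer). These rows are
# invented for illustration and are not real instrumentation output.
VIEWS = [
    ("weekly-sales-report", "ana"),
    ("weekly-sales-report", "raj"),
    ("hat-dashboard", "ana"),
    ("weekly-sales-report", "li"),
    ("hat-dashboard", "raj"),
    ("cat-census", "li"),
]


def most_viewed(records, top_n=3):
    """Return (content, view_count) pairs, most viewed first."""
    counts = Counter(content for content, _viewer in records)
    return counts.most_common(top_n)
```

Ranking by raw views is only one possible measure; counting distinct viewers instead would favor content with a broad audience over content one person reloads often.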

Demo: publishing R Markdown reports

So let me show you a quick demo of what that looks like. So let's actually start in the RStudio IDE. And what I have here is just a kind of run-of-the-mill R Markdown report. So I can knit this locally. And the original workflow that our data scientists were using is that they would do some analysis. And then they would render it locally on their desktop. And then they'd go in and say, for this week's sales, we're going to copy this image out into an email or copy it into a PowerPoint and send that over to our executives.

So it's a lot of kind of manually operating on this data. And they spend a lot of their time just kind of hand-crafting these reports, rerunning these things every day.

So the better alternative that I'll suggest today is you click this Publish button here. And that allows you to send up the relevant files that you're working with, any data that you're using locally. And you can publish that over to RStudio Connect. And what's going to happen behind the scenes is it's going to capture all of the code that you're using in this report. And then additionally, it's going to build a manifest of all the packages that you're using and even the versions of those R packages.

And so now you get all of that that's happening here. And then that's going to get recreated identically on the server. And so now you have no kind of lack of parity between your local experience and what's going to run on the server. And so lo and behold, I now have this RStudio Connect report that looks just like what it looked like on my desktop and is using the same packages, the same version of R, et cetera.

Now what's exciting about this is that now I have a URL that I can use to just send other collaborators. And they can now see the same report without any work on their part or any installing of software or anything like that. Secondly, you've got these access controls over here. So I can now lock down who should be able to view this report. Maybe it's just me. Maybe I want to make it public. Or maybe everybody in the organization should be able to see it. And then because it's an R Markdown report, I can also schedule this to repeat. And so maybe I want this report to run every week. Let's say I want it to run every Monday morning and then perhaps even send an email out once it's done to the executives who are interested in this report. And so all that now is going to happen for free. And what's great is that if there's ever a problem with this report, if there's an issue running it, I'm going to get an email saying, hey, your report had a problem. You should go check it out. And so now I don't have to think about this anymore. And I can just schedule this report once and then leave it running.
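
The "every Monday morning" schedule comes down to a small date calculation. The sketch below shows one way to compute the next run time. The `next_weekly_run` function and the 9:00 default are assumptions for illustration, not how Connect's scheduler is implemented.

```python
from datetime import datetime, timedelta


def next_weekly_run(now, weekday=0, hour=9):
    """Return the next datetime strictly after `now` on `weekday` at `hour`:00.

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # Move forward to the requested weekday (0 days if it is today).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If today's slot has already passed, wait a full week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

For example, calling it on a Thursday afternoon returns the following Monday at 9:00, and calling it on a Monday at 10:00 returns the Monday after.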

Parameterized R Markdown

So this is pretty exciting. But we can actually go one step further. And so Tareef alluded to a concept this morning called parameterized R Markdown. And I'm curious, how many of you have used parameterized R Markdown yet? OK. So a decent number of you. So parameterized R Markdown is actually a concept that you can use locally. And so I'd encourage you all to try this. It's a pretty handy way to just kind of define a set of parameters that you want to put into an R Markdown report. And then you can actually just go through and leverage those parameters to kind of tailor the report to the experience that you want.

And so within the IDE, you just click this knit with parameters button. And now I can go in and I can say, rather than just running a weekly report, maybe I want to run a report for a particular product in our line or a different number of days for the historical data. And now I can go through and knit that. And then I just get the local report here that has the data that I want.

So this would be at least a step up for you if you have people coming back and repeatedly saying, I want this report but slightly modified. This gives you a way to handle one source code, one bit of one report that you need to maintain, but still be able to create things for your users that they're interested in. But even better, if you deploy this on RStudio Connect, you actually get this little input tab on the left. And that allows other users who haven't installed R, who haven't installed RStudio, who are not scientists, to go in and be able to specify those parameters themselves. And so now I can say that same report for 30 days, cowboy hat. I can run that report, and I can just get the data that I'm interested in.

Even better, this is a full-fledged report on Connect. And so I can kind of hop around and view different reports here that I've previously configured and saved. And then if I want to schedule these or even distribute these automatically, I can do that.

And so now I have one R Markdown document that I'm maintaining, one bit of source code that I've worked on. And yet, all these people are able to kind of get the customized, tailored views that they're interested in, that they can even define themselves without me having to be involved at all. So this is a huge step up.
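
The idea of one source document serving many tailored views can be sketched outside R as well. In R Markdown the parameters are declared under `params` in the document's YAML header; the Python below is only an analogy, and `DEFAULT_PARAMS`, `REPORT`, and `render_report` are hypothetical names.

```python
from string import Template

# Declared parameters and their defaults, analogous to a `params` block.
DEFAULT_PARAMS = {"product": "all hats", "days": 7}

# The single "source document" that every variant is rendered from.
REPORT = Template("Sales report for $product over the last $days days")


def render_report(**overrides):
    """Render the report, replacing defaults with any supplied parameters."""
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        # Reject parameters the report never declared.
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULT_PARAMS, **overrides}
    return REPORT.substitute(params)
```

Calling `render_report()` gives the default weekly view, while `render_report(product="cowboy hat", days=30)` gives the 30-day cowboy hat variant from the same template.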

Shiny and Plumber on Connect

And then Connect also can support your Shiny applications and your Plumber APIs. So this, I usually wouldn't show sensitive financial data at my company here, but I think we're all friends, so I'll show this to you. So you can go back, and this is a little dashboard that we would share internally. We've got a big monitor in our office that kind of shows this. And I think there might be a bug or something in this report.

But basically, you can kind of see how the company's doing and how things look. So you've got all the normal toggles that you want with Shiny to be able to host that and let people interact with it. Again, the same access controls and everything that you'd want there. But even better with Shiny now, I can go in and I can scale out processes to handle the capacity that I'm expecting. Or even, as Joe was alluding to earlier, we can also handle clustered environments for RStudio Connect. So if you were here last year, you might have seen Sean Lopp give the demonstration of having 10,000 concurrent users on a Shiny application using RStudio Connect. So really, you can handle a pretty broad scale of incoming content here.

And then lastly, we have the same thing around Plumber APIs. We can easily host those, and you get all the same scaling and tuning parameters that you'd expect with Shiny to be able to define how responsive you need this to be and what sort of load you'd expect.

Python support in RStudio Connect

All right. So let's hop back to the slides. So that's really everything that I want to show you around R. But there's one more topic that I want to address in our time together, and that is Python. So this has come up already a couple times on this track. But one of the things that we did at Hats4Cats is we hired a data analyst that was using Python. And so historically, it was pretty difficult to integrate together, right? You've kind of got these two different languages that don't interoperate all that well.

This has improved a lot recently with the reticulate package. And so if you haven't seen this, reticulate is a way for you to invoke and interact with your Python code using R. And so this got a lot better, and we were kind of able to find some ways to work together. But there was still this open question of what are we going to do when it comes time to publish? So we had this great publishing and operationalization system for our R content in RStudio Connect that we lacked with Python. And so I'm excited to announce today we now have Python support in RStudio Connect.

So specifically, this looks like two different features. So first of all, you have what Tareef showed this morning, which is that the rsconnect package and the RStudio IDE now capture the Python dependencies and the Python packages that you have locally before you deploy. So if you are using reticulate in your R content and you go to publish, you'll get the same experience with your Python packages that you're used to with your R packages. But even more exciting, in my opinion, is we now have a Jupyter Notebook plugin, which allows you to publish directly from your Jupyter Notebooks. The exact same push-button publishing experience that you have in the RStudio IDE with your R content, you now have in Jupyter Notebooks with Python.

So I'll show you this real quick. So the motivating feature here is in Hats4Cats, a lot of our customers would say, look, I know what your hat looks like. I can see that picture online and I know what my cat looks like, but I can't always envision what they would look like together. And so we wanted to do some engineering to solve this problem. So first of all, we hopped into a Jupyter Notebook and we did some exploration to say how effective can we be at detecting faces in a given image? And so this is a Jupyter Notebook that I have here, which our Python analyst produced using all Python code. And I click this publish button in the Jupyter Notebook and I get that same experience of being able to send this code over into RStudio Connect. And you'll see that it's doing all the same kind of intelligent features that we had in R around looking at the version of Python that we're using, the versions of the different packages, and then it sends it up to Connect. And now I click this button and, as you can see, I didn't have it rendered locally, but this actually rendered on the server. And so you get kind of, you know, the full abilities of your server here and that entire environment was recreated so that I get the exact same experience locally as I get remotely. And then you get all the same bells and whistles that you have with R Markdown content in terms of being able to schedule this, rerun it, email it, et cetera.
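
Capturing "the version of Python that we're using, the versions of the different packages" can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions: `capture_environment` and the dictionary layout are hypothetical, and this is not the format the Jupyter plugin actually produces.

```python
import sys
from importlib import metadata


def capture_environment(package_names):
    """Record the running Python version and the installed version of each package.

    Packages that are not installed are recorded as None rather than guessed.
    """
    packages = {}
    for name in package_names:
        try:
            packages[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            packages[name] = None
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "packages": packages,
    }
```

A server that receives a record like this could install the same versions before rendering, which is what lets the remote result match the local one.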

One other feature then around Python is if you wanted to use this in a reticulated way, then you can do that as well. And so we have the Jupyter Notebook. We've proven that we have kind of a validation of our technology. And so now we know, okay, we can detect faces effectively. But one of the things that we now want to do is we want to operationalize this. And we have, you know, a great ability to do that in R, as you've seen through Plumber or through Shiny. There are a lot of great ways for you to take your R code and get it into the hands of your users. And so one of the things we wanted to do was actually build a Shiny app that uses the Python code on the back end to do the image detection.

So Python has some really strong image processing libraries, specifically OpenCV. And so we wanted to continue using Python for what Python is good for, but use R for what R is good for. And so develop a Shiny application that allows us to kind of blend those two. And so here we put together just kind of a preview of what that's going to look like that we can host on our page. So this is a Shiny application with a little webcam widget input. And now I can envision what it's going to look like for me to purchase any of these different hats for my cat. I unfortunately don't have a cat up here with me, so you'll have to use your imagination. But right, so now you get the kind of same capabilities that you would expect around being able to operationalize your content, whether or not it's using Python under the hood.

What's next for RStudio Connect

All right. So a little bit about what's next. One of the things that we're working on is called SAML. If you don't know what that means, just skip it. But if you do, that might excite you. This is a new way to kind of authenticate users into an organization that your IT people may be excited about. We're also working on the job launcher integration. And so you've probably heard a couple of references to this. But the basic premise is that we want to be able to run R in other environments rather than just the local server where Connect is installed. So if you're using Kubernetes, or if you're using Slurm, or some other kind of compute bucket, we'll be able to distribute our R processes there so that you can see them and run them there.

We're also working on formal Git integration. So being able to use Git kind of as a first-class citizen within Connect so that you point at a GitHub repository, we'll monitor that repository for changes, pull down any changes that come in, and automatically deploy it. You also get an improved view of scheduled content. And so you can see what processes are scheduled to run when and who's scheduled what kind of work to run. You can get customizable views into content. So we've heard from a lot of people, especially consultants, who say, I want a kind of branded portal for my customers. I don't want to show your logos. I want to show my logo. And so being able to kind of do that in a way that allows people to sort of tailor the different views that they get and the different experiences is more of kind of a content management system. And then lastly, more Connect server API endpoints. And so if you've been paying attention lately, you've seen more and more of these API endpoints that we've put out to allow people to control Connect as kind of a black box rather than just interactive usage.

Customizable emails demo

And since I have one minute, I will show you one other feature that I forgot to show you earlier. One of the features that Tareef showcased a little bit this morning was the ability to go in and customize your emails programmatically. I wanted to drill into that just a little bit. So right now we've got this single rendered view of my content. In general, when you put an R Markdown document together, it's going to render to an HTML or to a PDF. So you get that same experience here in Connect. But we also have some conventions defined by which you can say, you know what, I actually want to tailor the email message that's going to go out associated with this report.

And so now when I go to distribute this report, I'm going to email this just to myself. And when I go look in my email inbox, as soon as that report is done generating, then I'm going to get an email. And you can see already, first of all, that we've got a customized subject line. This is tailored to the report that just ran. I've got 142 sales this past week. And then you can see that I've got this prose. It doesn't look exactly like what I had in my markdown report that I rendered. This is a different view. And so I've got this code. And, you know, with graphs embedded, I can kind of define these tables that are customized. And this is all leveraging the blastula package under the hood.

But now what I can do is rather than relying on Connect just to kind of attach an HTML file to an email that my executives are going to open, instead I can actually tailor it so that exactly what I want them to receive is going to land in their inbox. They can look at this on their phone on the way into the office. And you can even customize the attachments. So here I've generated an Excel file and I've attached that so that if somebody wants to actually download and peruse the data, they can do so themselves.
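
The shape of such a tailored message (a subject line built from the report's own numbers, a short body, and a data attachment) can be sketched with Python's standard `email` package. This is an analogy only and does not use blastula or Connect; `build_sales_email`, the addresses, and the CSV attachment (standing in for the Excel file in the demo) are assumptions.

```python
from email.message import EmailMessage


def build_sales_email(sender, recipient, weekly_sales, csv_text):
    """Build a message whose subject and body reflect this run's results."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Subject is computed from the data rather than fixed in advance.
    msg["Subject"] = f"{weekly_sales} sales this past week"
    msg.set_content(
        f"We recorded {weekly_sales} sales this past week. "
        "The underlying data is attached."
    )
    # Attach the raw data so readers can explore it themselves.
    msg.add_attachment(csv_text, subtype="csv", filename="sales.csv")
    return msg
```

The returned object could be handed to any mail transport; sending is omitted here because it depends on the local mail setup.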

All right. And so that is everything that I've got for you. If you're interested in Connect, I'd encourage you to visit this link up top, and that will get you a 45-day free trial for Connect. So you can download that and start playing with it. We'd love to get your feedback. We're going to be hanging out at the Publish booth out in the Pro Lounge. And so if there are features either that you're excited about or features that we lack that are kind of deal breakers for you, I'd especially love to hear about those. So please come talk to us. And the slides and code are all available at that link there in the middle. With that, I've got time for a few questions, I think.

Q&A

Thanks, Jeff. Yeah, we've got time for a couple of questions. We'll throw those mics around.

Quick question on the parameterized reports. Is there any way to, in the future, programmatically schedule those? Because I can imagine having a bunch of parameters, wanting to generate a bunch of reports. How do we...

Yeah. Yeah, we'd definitely like to do that. So I'd file that under the category of more Connect server API endpoints. But that's one of the things that we want to do, is allow people to more efficiently invoke executions of reports, or even tailor the parameters that they want to invoke them with. So yeah, that'll definitely come soon.

Connect makes it easy to change read access to the reports, like the R Markdown. Can it be used for collaboration purposes to manage the write permissions on that report?

Yeah. Yeah, yeah, yeah. So I kind of glossed over that, but let me show you that real quick. So inside of Connect, what you can do is you can go into any report here, and you can manage viewership permissions, and also collaborator permissions. And so I can go in and I can add some user that's inside my organization, and I can add them as a collaborator under the document, so that they would then be able to publish to this endpoint as well.
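
The split between viewers and collaborators described here can be modeled in a few lines. This is a hypothetical model for illustration, not Connect's implementation; `ContentItem` and its methods are invented names, and it assumes that collaborators can also view.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """A published item with separate view and publish permissions."""

    owner: str
    viewers: set = field(default_factory=set)
    collaborators: set = field(default_factory=set)

    def can_view(self, user):
        # Assumption: anyone allowed to publish is also allowed to view.
        return user == self.owner or user in self.viewers or user in self.collaborators

    def can_publish(self, user):
        # Only the owner and collaborators may publish to this endpoint.
        return user == self.owner or user in self.collaborators
```

With this model, adding a user to `viewers` grants read access only, while adding them to `collaborators` also lets them publish new versions.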

Everything would be hosted on the server, right?

Yeah, yeah, yeah. So typically, you would use Git to organize your collaboration locally, and then somebody's going to publish at some point. You could use the programmatic deployment, or one of you could push button publish, if you wanted.

And you would also have diffs and version control?

Yeah. So we don't have that built into the product, but we would encourage you to use something like Git to be able to track some of that. You do have access to those different source versions that have been deployed. So you could go back and grab the bundle that was deployed previously, but it's not going to show you a diff or a commit message necessarily with that.

So three quick questions. One, for the usage monitoring, or people accessing the resources, is there support for interacting with the underlying data and maybe plugging it into other organizational data that might be available to people with the apps? The other question is, for apps that are published, Shiny apps that are published, is there any support for updating users who might be looking at an app that a new version is available? Or that's for an individual app, or just across the board, if you want to mention that you're updating the server and it's going to be down or something like that, you just want to tell everyone?

Yeah. So we don't have a way to proactively reach out to users of a particular application, although you could use the instrumentation to pull down all the users that have visited it in the past 30 days or something and notify them that way if you wanted to. We do have options within the server to be able to define, right up here, you could put a little warning box just to alert your users that, hey, we have scheduled maintenance for during this window or something. So you'd be able to do that.
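
The workaround mentioned here, pulling everyone who visited in the past 30 days so they can be notified, is a simple filter over usage records. The record shape and the `recent_visitors` helper below are hypothetical; how the records are fetched is left out.

```python
from datetime import datetime, timedelta


def recent_visitors(records, now, days=30):
    """Return the sorted, de-duplicated users seen within the last `days` days.

    records is an iterable of (user, visit_time) pairs.
    """
    cutoff = now - timedelta(days=days)
    return sorted({user for user, visited_at in records if visited_at >= cutoff})
```

The resulting list could feed a mailing step, for instance to announce a new version of an app or a maintenance window.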

And for the first part of the question, I'm not sure if I totally understand, were you referring to the server metrics? So here we have CPU and RAM and things like that, or are you referring to data that's inside your report?

Oh, so it's not on a user-specific user name basis?

Yeah, yeah, yeah. So we don't have resource monitoring at the user level just yet. But we do have a little visibility here that you can use to kind of envision that on your admin page. And then we also do support exporting this to a Graphite system, which may be something that your IT team wants, if they do a lot of this.

But you're saying that there are future plans for maybe having user-specific usage?

Yeah, yeah. So I think what we want to do is give you at least better insight into what's going to be running and when, and then kind of sort of a static view into, all right, at this moment, who's using the most memory, who's using the most CPU, and be able to kind of tune that. And so, yeah, I don't think it's out of reach that we would have historical views on that, too.

Jeff, I think he's asking about instrumentation, the real, well, the use of the instrumentation.

Okay, yeah. And then the instrumentation is also available there to see who's consuming the content, too.