
Connect Production | RStudio Webinar - 2017
This is a recording of an RStudio webinar. You can subscribe to receive invitations to future webinars at https://www.rstudio.com/resources/webinars/. We try to host a couple each month with the goal of furthering the R community's understanding of R and RStudio's capabilities. We are always interested in receiving feedback, so please don't hesitate to comment or reach out with a personal message.
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
So if you're new to Connect, Connect is an enterprise publishing platform for static and dynamic content that you're creating with R. So that can include documents, presentations, dashboards, and Shiny applications, and basically anything that you're producing in R. It is an on-premises application, so you run this on your own server behind a firewall if you want, and you're going to host it on your own equipment there. And the goal is really around sharing data science artifacts that you produced inside your organization.
So if this is new to you, I would recommend actually that you check out the introductory webinar that Bill referenced a moment ago. So we recorded this two weeks ago, the recording is available online, as are the slides. And I'd recommend that you start there because we're going to be building on some of the foundation that we laid a couple weeks ago. So if you are new to Connect, you might want to start there before you jump into some of these more advanced topics. But without any further ado, let's go ahead and dive into some of the content that we want to cover today.
So my goal for today is really just to give you pointers to the different tidbits that I think are important for you to know if you're going to be managing RStudio Connect in a production environment. And so I won't go into too much depth about a whole lot of these, but really I just kind of want to give you links and pointers to the different things that you might be interested in if you are responsible for managing a production environment with RStudio Connect. So pardon me if I'm kind of flying through some of these topics, but feel free to ask questions and we can come back and do a little more detail on some of these things later. Or just check out the links once we post them online and hopefully you can kind of find the information that you need there with the links that I provide.
So today I want to cover first of all user management, then secondly the management of resources on the server, and then lastly we'll talk a little bit about kind of the system management and system security implications for the server.
User management
So let's dive into user management to get started. So we covered this a bit last time so I won't belabor this, but we have three different user roles on the system. The first is an admin user who has all privileges on the server and they can access and kind of manage anything that they need to, but irregular actions are audited and I'll dive into what that means here in a moment. Below that there would be a publisher who's someone that can upload or publish content onto the server, and then lastly there's a viewer who's just a consumer of content and can't author content that gets executed on the server.
So let's talk a little bit about what the experience of a Connect administrator looks like. First of all, they have access to special admin-only actions. For instance, this includes the admin tab: if you're an admin on the server, you're able to access certain pages within the dashboard that other users can't, pages that show you things like metrics and let you manage users. You're also able to customize some application settings that other users aren't able to. So you can define vanity URLs for an application, you can customize the run-as user for an application, and I'll show you what all this means here in just a moment.
But the trick is that you don't get everything for free as an admin. Some of these things you're going to be able to just hop on and start changing. For other things, you're going to have to go out of your way to explicitly grant yourself permissions that you otherwise wouldn't have. I think that's probably best covered by an example, so let's look at one.
So first of all this is me logged in as an admin user on Connect. And so you can see here that first of all I have this admin tab, and so that takes me first and foremost to this metrics page where I'm able to view information about the CPU and RAM usage across the server. I also have an audit logs page here where I can view kind of different changes that have been taking place on the server recently. And then lastly when I dive into particular content, even though this is not authored by me, this is authored by someone else and I don't have particular permissions on it, I am still able to go in and define custom settings for this content. So for instance here I can define a vanity URL.
Let's take a look at a richer document here. In this case we have a schedule. As an administrator, even though I don't have special privileges on this document, I can go in and customize the schedule for this content. I can define a vanity URL, change the run-as user, etc. So I as an admin have free privileges to do some of these things. However, look at something like a document that's private. Here I'm logged in as a different user, the publisher user, and you can see that I've defined this content to be visible only to myself. So this means that the admin user should not freely have access to this content. This is sensitive content that the admin shouldn't be able to see. And indeed, if I go as the admin and look at that content, this is the view that I get. You can see that I'm still able to manage the settings for that content here, but I do not have free access to view that content.
And so this is what we're talking about when we say the admin has the privileges to do whatever they need to do on the server, but they don't get everything for free. I am not able to view this content, although because I'm able to manage the settings on it, I can go in and add myself as a publisher or as a viewer, and at that point I would be able to view the content. The trick is that when you take these explicit actions to add or remove yourself on a particular bit of content, all of those actions are captured in the audit log that I referenced earlier. So when I go look at the audit log, you can see here that the admin user assigned themselves as a collaborator on this app and then removed themselves as a collaborator on this app. So while I am able to do everything I need to do to manage the server, anytime I take those explicit actions of going out of my way to take special privileges on an application, that's going to be captured in the audit log, and that's the balance that we try to strike here with an admin.
Lastly, if you missed this in our latest release, the 1.4.4.1 release, we also have the ability to download the source code for content. And again, that's only available to users that are explicitly granted collaborator privileges. So I as an admin do not get free access to source code published on the server; however, if I add myself as a collaborator, then I am able to download the source code for an application. So this is the balance that we tried to strike with an admin, and I think it's important if you're going to be managing a Connect server in production that you understand what the privileges of an admin actually encapsulate, what you get for free and what you don't.
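To make the "admins don't get everything for free" rule concrete, here is a minimal Python sketch of the behavior described above. It is a toy model, not Connect's implementation: the class names, method names, and audit record layout are all hypothetical, and treating the content owner as an implicit collaborator is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    """A published item with an explicit access list (hypothetical model)."""
    title: str
    owner: str
    collaborators: set = field(default_factory=set)
    viewers: set = field(default_factory=set)


@dataclass
class Server:
    """Toy server that tracks admins and records explicit access changes."""
    admins: set
    audit_log: list = field(default_factory=list)

    def can_manage_settings(self, user, content):
        # Admins may manage settings on any content, even content they cannot view.
        return user in self.admins or self._is_collaborator(user, content)

    def can_view(self, user, content):
        # Viewing needs an explicit grant; the admin role alone is not enough.
        return self._is_collaborator(user, content) or user in content.viewers

    def can_download_source(self, user, content):
        # Source download is limited to collaborators (owner included by assumption).
        return self._is_collaborator(user, content)

    def add_collaborator(self, actor, content, user):
        self._require_settings_access(actor, content)
        content.collaborators.add(user)
        # Every explicit grant is written to the audit log.
        self.audit_log.append((actor, "add_collaborator", user, content.title))

    def remove_collaborator(self, actor, content, user):
        self._require_settings_access(actor, content)
        content.collaborators.discard(user)
        self.audit_log.append((actor, "remove_collaborator", user, content.title))

    def _is_collaborator(self, user, content):
        return user == content.owner or user in content.collaborators

    def _require_settings_access(self, actor, content):
        if not self.can_manage_settings(actor, content):
            raise PermissionError(f"{actor} cannot manage {content.title}")


server = Server(admins={"admin"})
report = Content(title="private-report", owner="publisher")

print(server.can_view("admin", report))            # admin has no grant yet
server.add_collaborator("admin", report, "admin")  # explicit, audited action
print(server.can_view("admin", report))
server.remove_collaborator("admin", report, "admin")
print(server.audit_log)
```

The point of the design is that the grant itself is the audited event: the admin can always reach the content, but never silently.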
So next we'll move on, and a lot of these things I'm just going to cover in rapid fire, but the next one that I wanted to cover is the default user role. This is managed in the Authorization section of the configuration, under the DefaultUserRole setting, and it is basically the role that fresh users take when they sign on to the server. The default right now is publisher, which means that when a user signs up on your server, or logs in using whatever authentication protocol you're using, that user is going to become a publisher on the server and is going to be able to publish new source code. If you want to limit that so that new users coming into the server are just viewers, you can change this configuration setting and make it viewer. And this is actually subject to change: we've considered making the default viewer, in which case, if you wanted the default to be publisher, you could of course override that here.
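As a rough illustration of that setting, the Python sketch below embeds a hypothetical configuration fragment and reads the default role from it. The section and key spelling follow the description in the talk and should be checked against the admin guide. Python's `configparser` is only a stand-in parser here, since Connect's real configuration format may differ in detail, and the `default_user_role` helper is invented for this example.

```python
import configparser

# Hypothetical fragment: section and key names are assumed from the talk.
FRAGMENT = """
[Authorization]
DefaultUserRole = viewer
"""


def default_user_role(config_text, fallback="publisher"):
    """Return the role new users receive, or the fallback when it is unset."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the key's capitalization as written
    parser.read_string(config_text)
    role = parser.get("Authorization", "DefaultUserRole", fallback=fallback)
    return role.strip().lower()


print(default_user_role(FRAGMENT))  # role taken from the fragment
print(default_user_role(""))        # nothing configured, so the fallback applies
```

The fallback of publisher mirrors the default described in the talk at the time of recording, which the speaker notes may change.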
Another tool that you should be aware of if you're managing Connect is the user manager command line interface. This is a root-only command line interface that allows you to interact with Connect in a batch way. Right now there's a limited subset of what you can do with this command, but we envision this growing over time to capture more of the interactions that you might want to take within Connect. One restriction right now is that the server actually does need to be stopped in order for you to use the command line interface, and that's a restriction that may be lifted in the coming months. You can access the tool at this location, /opt/rstudio-connect/bin/usermanager, and then you can run commands such as list to list all the users on the server, you can run alter to change a user, for instance promoting a viewer to a publisher or promoting a publisher to an admin user, and you can also dump the audit logs, even in CSV format. So if you wanted to browse the audit logs on your own time or using your own tooling, you could do that using this user manager tool. It's something you should be aware of, and you can view the documentation in the admin guide for which commands are available in this tool.
Next, there's an idea of user locking in the server that you should be familiar with. As of the time of this recording in early 2017, we don't have a notion of user deletion, and the reason for that is that there are a lot of open questions around what you should do with a user who's deleted. What should you do with their content? Should you migrate it to another user, should you keep it alive on the server, or should you get rid of it? Until we settle some of those questions, we've settled on this compromise of user locking. So if, for instance, you have an employee leave the company and you don't want them to have access to the server anymore, you can lock their account, which forbids any further login or interaction on the system. They can't publish updates or new content to the server, and they also don't count against your license, so they're not going to count as a named user or take up a seat on your license when they're locked. You should also be aware that you can rename users, so if your goal is just to get rid of a user who's taking up a username that you want, you can change the username for that user, lock the account, and then create a new user account with the username that you desire.
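The locking and renaming workflow can be sketched as a small Python model. This is an illustration under stated assumptions, not how Connect stores users: the `UserDirectory` class, its methods, and the seat check are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    username: str
    role: str = "viewer"
    locked: bool = False


class UserDirectory:
    """Toy user store where locked accounts stay on record but free their seat."""

    def __init__(self, seats):
        self.seats = seats
        self.users = {}

    def named_users(self):
        # Only unlocked accounts count against the license.
        return sum(1 for user in self.users.values() if not user.locked)

    def add(self, username, role="viewer"):
        if username in self.users:
            raise ValueError(f"username {username!r} is already taken")
        if self.named_users() >= self.seats:
            raise RuntimeError("no license seats available")
        self.users[username] = User(username, role)

    def lock(self, username):
        # Locking keeps the account and its content but blocks further use.
        self.users[username].locked = True

    def rename(self, old, new):
        if new in self.users:
            raise ValueError(f"username {new!r} is already taken")
        user = self.users.pop(old)
        user.username = new
        self.users[new] = user

    def can_log_in(self, username):
        user = self.users.get(username)
        return user is not None and not user.locked


directory = UserDirectory(seats=2)
directory.add("alice", role="publisher")
directory.add("bob")

# An employee leaves: rename the account, lock it, then reuse the username.
directory.rename("alice", "alice-former")
directory.lock("alice-former")
directory.add("alice")

print(directory.named_users())
print(directory.can_log_in("alice-former"))
```

Without the lock step, the final `add` would fail in this model because both seats would still be occupied.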
Resource management
So that's a bit about user management, and we can transition into resource management now, which is a bit more of a complex topic. One of the most common questions that we get around Connect is the idea of resource budgeting, or how large should my server be. The difficulty in answering this question is that the requirements depend almost entirely on what your users are deploying. So if you just have a couple of simple documents or a couple of simple dashboards that are updated once a day that people are going to access on your Connect server, you could probably get away with running this on something very, very small; even a Raspberry Pi could probably handle that kind of workload. However, if you're running very intensive Shiny applications that are doing genomic analysis on multiple gigabytes of data, then your server requirements are going to be much, much larger. Ideally, if you have the luxury of being in a virtualized environment where you can scale a server up or down, that would probably be your best bet. Otherwise, you can run a proof of concept and see what the hardware requirements are given the applications and the types of work that your users are publishing to the server, and that's really the best way to get a feel for what your hardware requirements should be.
So in terms of the philosophy of Connect, on-demand requests are largely what we're servicing as requests come in, and those are just going to be serviced on a best-effort basis. As a request comes in for a Shiny application, we're going to do our best to spin up that Shiny application and hope that there's enough memory available on the server. But there are some knobs and some tuning that you can do to cap the resources available to particular applications or different use cases.
So first of all, around Shiny there's a notion of Shiny scaling. You can scale Shiny applications in Connect to multiple processes. If you're not aware, R is single-threaded, and Connect can actually load balance a particular application across multiple independent R processes running that same application. This is what's managed here in the performance tab on a Shiny app. As an admin you can go in and override particular performance settings for an application and set, for instance, the max processes, which is how many processes you would be willing to run for this application under the heaviest load. Then down at the bottom here you can see some of the scaling parameters around how many connections should be supported per process. The load factor is basically how quickly you want to ramp up from the minimum number of processes to the maximum number of processes as load increases. The minimum number of processes, as the name implies, is going to guarantee that at least that many processes are running for this application at all times. The right answer for this for almost all applications is zero. However if you have an application that takes a really long time to start up and you don't want a user hitting it having to wait multiple seconds or even minutes for the Shiny
