Resources

Using RStudio Connect in Production

Jun 16, 2017
38 min

image: thumbnail.jpg

Transcript#

This transcript was generated automatically and may contain errors.

So if you're new to Connect, Connect is an enterprise publishing platform for static and dynamic content that you're creating with R. So that can include documents, presentations, dashboards, and Shiny applications, and basically anything that you're producing in R. It is an on-premises application, so you run this on your own server behind a firewall if you want, and you're going to host it on your own equipment there. And the goal is really around sharing data science artifacts that you produced inside your organization. So if this is new to you, I would recommend actually that you check out the introductory webinar that Bill referenced a moment ago. So we recorded this two weeks ago, the recording is available online, as are the slides. And I'd recommend that you start there because we're going to be building on some of the foundation that we laid a couple weeks ago.

So if you are new to Connect, you might want to start there before you jump into some of these more advanced topics. But without any further ado, let's go ahead and dive into some of the content that we want to cover today.

So my goal for today is really just to give you pointers to the different tidbits that I think are important for you to know if you're going to be managing RStudio Connect in a production environment. And so I won't go into too much depth about a whole lot of these, but really I just kind of want to give you links and pointers to the different things that you might be interested in if you are responsible for managing a production environment with RStudio Connect. So pardon me if I'm kind of flying through some of these topics, but feel free to ask questions and we can come back and do a little more detail on some of these things later. Or just check out the links once we post them online and hopefully you can kind of find the information that you need there with the links that I provide.

So today I want to cover first of all user management, then secondly the management of resources on the server, and then lastly we'll talk a little bit about kind of the system management and system security implications for the server.

User management

So we covered this a bit last time, so I won't belabor it, but we have three different user roles on the system. The first is an admin user, who has all privileges on the server and can access and manage anything they need to, though irregular actions are audited, and I'll dive into what that means in a moment. Below that is a publisher, someone who can upload or publish content onto the server. And lastly there's a viewer, who is just a consumer of content and can't author content that gets executed on the server.

So let's talk a little bit about what the experience of a Connect administrator looks like. First of all, they have access to special admin-only actions. For instance, this includes the admin tab: if you're an admin on the server, you're able to access certain pages within the dashboard that other users can't, pages that show you things like metrics and let you manage users. And then lastly, you're able to customize some application settings that other users aren't able to: you can define vanity URLs for an application, you can customize the RunAs argument for an application, and I'll show you what all this means in just a moment.

But the trick is that you don't get everything for free as an admin. Some of these things you can just hop in and start changing; other things you actually have to go out of your way to explicitly grant yourself permission to, things you otherwise wouldn't have had permission to do. I think that's probably best covered by an example, so let's look at one.

So first of all this is me logged in as an admin user on Connect. And so you can see here that first of all I have this admin tab and so that takes me first and foremost to this metrics page where I'm able to view information about the CPU and RAM usage across the server. I also have an audit logs page here where I can view kind of different changes that have been taking place on the server recently. And then lastly when I dive into particular content, even though this is not authored by me, this is authored by someone else and I don't have particular permissions on it, I am still able to go in and define custom settings for this content. So for instance here I can define a vanity URL.

Let's take a look at a richer document here. In this case we have a schedule. As an administrator, even though I don't have special privileges on this document, I can go in and customize the schedule for this content. I can define a vanity URL, change the RunAs user, etc. So as an admin I have free privileges to do some of these things.

However, look at something like a document here that's private. Here I'm logged in as a different user, the publisher user, and you can see that I've defined this content to be visible only to myself. This means that the admin user should not freely have access to this content; this is sensitive content that the admin shouldn't be able to see. And indeed, if I go as the admin and look at that content, this is the view that I get. You can see that I'm still able to manage the settings for that content, but I do not have free access to view the content itself.

And so this is kind of what we're talking about when we say the admin has the privileges to do whatever they need to do on the server but they don't get everything for free. So I am not able to view this content although I can go in and I can add myself as a publisher or as a viewer because I'm able to manage the settings on this content and at that point I would be able to view the content.

And the trick here is that when you take these explicit actions to add or remove yourself on a particular bit of content, all of those actions are captured in the audit log that I referenced earlier. So when I go look at the audit log, you can see that the admin user added themselves as a collaborator on this app, and then removed themselves as a collaborator on this app. So while I'm able to do whatever I need to do to manage the server, any time I go out of my way to take special privileges on an application, that's captured in the audit log, and that's the balance that we try to strike with an admin.


Lastly, if you missed this in our latest release, the 1.4.4.1 release, we also have the ability to download the source code for content. And again, that's only available to users that are explicitly granted collaborator privileges. So I as an admin do not get free access to source code published on the server; however, if I add myself as a collaborator, then I'm able to download the source code for an application. So this is the balance that we tried to strike with an admin, but I think it's important, if you're going to be managing a Connect server in production, that you understand what the privileges of an admin actually encapsulate: what you get for free and what you don't.

So next we'll move on, and a lot of these things I'm just going to cover in rapid fire. The next one I wanted to cover is the default user role. This is a setting managed in the Authorization section, under the DefaultUserRole setting, and it's basically the role that fresh users take when they first sign on to the server. The default right now is publisher, which means that when a user signs up on your server, or logs in using whatever authentication protocol you're using, that user becomes a publisher on the server: they have access to publish new source code. If you want to limit that so that new users coming into the server are just viewers, you can change this configuration setting to viewer. And this is actually subject to change: we've considered making the default viewer, in which case, if you wanted the default to be publisher, you could of course override that here.
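As a minimal sketch, changing that default in the Connect configuration file might look like this (the section and setting names follow the transcript's description; check the Admin Guide for your version, and restart the server after editing):

```ini
; /etc/rstudio-connect/rstudio-connect.gcfg
[Authorization]
; New accounts default to "viewer" instead of the current default, "publisher"
DefaultUserRole = viewer
```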

Another tool that you should be aware of if you're managing Connect is the user manager command-line interface. This is a root-only command-line interface that allows you to interact with Connect in a batch way. Right now there's a limited subset of what you can do with this command, but we envision it growing over time to capture more of the interactions you might want to take within Connect. One restriction right now is that the server actually needs to be stopped in order for you to use the command-line interface; that's a restriction that may be lifted in the coming months. You can access the tool at /opt/rstudio-connect/bin/usermanager, and then you can run commands such as list, to list all the users on the server, or alter, to change a user, for instance promoting a viewer to a publisher or a publisher to an admin. You can also dump the audit logs, even in CSV format, so if you wanted to browse the audit logs on your own time, or using your own tooling, you could do that using this user manager tool.
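A hedged sketch of a session with the tool, based on the subcommands named above; the exact flags are assumptions, so consult the tool's own help output for your version:

```shell
# Sketch only: subcommand names (list, alter, audit) follow the transcript;
# flag spellings are illustrative. Run `usermanager --help` to confirm.
sudo systemctl stop rstudio-connect          # the server must be stopped first

sudo /opt/rstudio-connect/bin/usermanager list      # list all users

# Promote a user (flags assumed for illustration)
sudo /opt/rstudio-connect/bin/usermanager alter --username alice --role publisher

# Dump the audit logs in CSV format for offline analysis (flags assumed)
sudo /opt/rstudio-connect/bin/usermanager audit --csv > audit-log.csv

sudo systemctl start rstudio-connect
```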

Next, there's an idea of user locking on the server that you should be familiar with. As of the time of this recording, in early 2017, we don't have a notion of user deletion, and the reason is that there are a lot of open questions around what you should do with a deleted user's content: should you migrate it to another user, keep it alive on the server, or get rid of it? Until we settle some of those questions, we've settled on this compromise of user locking. If, for instance, an employee leaves the company and you don't want them to have access to the server anymore, you can lock their account, which forbids any further login or interaction on the system: they can't publish updates or new content to the server. They also don't count against your license, so they're not going to count as a named user or take up a seat on your license while they're locked. And also, you should be aware that you can rename users. So if your goal is just to free up a username, you can change that user's username and lock the account, and then create a new user account with the username that you desire.

Resource management

So one of the most common questions that we get around Connect is the idea of resource budgeting, or how large should my server be? The difficulty in answering this question is that the requirements depend almost entirely on what your users are deploying. If you just have a couple of simple documents or dashboards that are updated once a day and accessed on your Connect server, you could probably get away with running this on something very small; even a Raspberry Pi could probably handle that kind of workload. However, if you're running very intensive Shiny applications that are doing, say, genomic analysis on multiple gigabytes of data, then your server requirements are going to be much, much larger. Ideally, if you have the luxury of being in a virtualized environment where you can scale a server up or down, that would probably be your best bet. Otherwise, you can run a proof of concept and see what the hardware requirements are given the applications and the types of work that your users are publishing to the server; that's really the best way to get a feel for what your hardware requirements should be.

So in terms of the philosophy of Connect, on-demand requests are largely what we're servicing, and those are serviced best-effort. As a request comes in for a Shiny application, we're going to do our best to spin up that Shiny application and hope that there's enough memory available on the server. But there are some knobs and some tuning that you can do to cap the resources available to particular applications or different use cases.

First of all, around Shiny, there's a notion of Shiny scaling. You can scale Shiny applications in Connect to multiple processes. If you're not aware, R is single-threaded, and Connect can actually load balance a particular application across multiple independent R processes running that same application. This is what's managed in the Performance tab on a Shiny app. As an admin, you can go in and override particular performance settings for an application and set, for instance, the maximum number of processes: how many processes, under the heaviest load, you might be willing to run for this application. Down at the bottom you can see some of the scaling parameters around how many connections should be supported per process. And the load factor is basically how quickly you want to ramp up from the minimum number of processes to the maximum number of processes as load increases.

The minimum number of processes, as the name implies, guarantees that N processes are running for this application at all times. The right answer for almost all applications is zero. However, if you have an application that takes a really long time to start up, and you don't want a user hitting it to have to wait multiple seconds or even minutes for the Shiny application to come online, you might want to set this to one or even two to ensure that there's always a process available for incoming users. If you find that users are setting absurd minimum process values, you can cap this using the Scheduler.MinProcessesLimit setting, which lets you say that no application should ever define a minimum process count greater than X.
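As a sketch, the server-wide cap just described might look like this in the config file (the setting name follows the transcript; confirm the spelling against the Admin Guide for your release):

```ini
[Scheduler]
; No application may request more than 2 always-on ("min") processes,
; regardless of what a publisher sets in the dashboard
MinProcessesLimit = 2
```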

Next we've got a notion of Shiny timeouts. These are currently global settings; eventually we envision that they'll be customizable per application. They define two different parameters. First, the maximum amount of time that you're willing to wait for an application to start. The default is 60 seconds, which means that if a Shiny application takes longer than 60 seconds to start and come online, we're going to assume the process is having problems and give up waiting on it. If you find that you're building Shiny applications that genuinely take longer than 60 seconds to start, you can increase this limit to wait on that process a little longer. The second value is the minimum time that we should keep a worker process alive after it goes idle. What this means is that after the last user disconnects and the process is empty, not serving any Shiny sessions, we're going to wait five seconds, and if no new traffic comes in, we're going to reap that process. That's a pretty reasonable default for average workloads, but again, if you're running very large processes that take a lot of time or a lot of resources to start up, you might not want to reap the process that quickly. You may want to leave it open for multiple minutes, or hours, or even overnight, so that you don't have to shut down and spin up these processes, which might be an expensive operation.
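A hedged sketch of tuning those two windows: the setting names below (InitTimeout for startup, IdleTimeout for the idle window) are my reconstruction of the settings described above, and the values and units may differ in your version, so verify against the Admin Guide:

```ini
[Scheduler]
; Wait up to 5 minutes for a slow-starting Shiny app (default: 60 seconds)
InitTimeout = 300
; Keep an idle worker process alive for 10 minutes before reaping it
; (default: 5 seconds)
IdleTimeout = 600
```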

Next, stepping away from Shiny and into the notion of scheduled reports: we do have a throttle around report concurrency in the latest version of RStudio Connect. This is managed in Applications.ScheduleConcurrency, and it allows you to determine the number of concurrent scheduled reports that you'd ever want to run in parallel. By default this is set to two. So if you have 20 users who all set up their documents to run every night at midnight, rather than trying to run all 20 of those reports at one moment, which might cause resource contention or even stability problems on the server, we're only going to run two at a time, and we'll iterate through as quickly as we can to get all 20 reports executed as close to midnight as possible. If you find that you have capacity for more than two, you can certainly raise this limit to have more processes running at the same moment, or likewise, if you need to tune it down, you can.
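In the config file, raising that throttle might be sketched like this (the setting name follows the transcript; the value of 4 is illustrative):

```ini
[Applications]
; Run up to 4 scheduled reports in parallel (default: 2)
ScheduleConcurrency = 4
```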

Disk usage and storage

The next notion of resources that you might want to manage is around the disk, and there are a few different things that we store on disk. Most obviously, we're hosting the applications that your users are sending us. So first and foremost, we store on disk the content that your users upload to us. We call these bundles, but these are basically the application and any associated data or metadata that the user gives us. We store those in a compressed format, and basically we're storing the exact blob that the user originally uploaded to us. We retain that so that you always have a copy of what the user originally gave us.

Next, in order to actually run the application, we obviously need to unzip those bundles and have them in a directory where we can start executing that code. So you will have one copy of each application laid out in an unzipped folder, where R can run that code and access the data associated with it, if there is any. Next, there's the notion of a Packrat cache. We talked about this in the last webinar, but we use a system called Packrat to recreate the environment, in terms of the R packages, that is available in the publishing environment, that is, the IDE your user is using. We try to recreate that environment on the server using this Packrat system. What that means is that every time a user publishes some code, if they have a particular version of a particular package, we're going to compile that and store it on the server as well. So you're going to have one copy of each version of each package, and that is specific to the R version. If you support multiple R versions, which we'll discuss in a moment, then you might have multiple copies, one per R version. That being said, all of these are shared in the same library, or same cache, which means that if you have multiple users who have published the same version of ggplot2 using the same version of R, you're not storing those copies redundantly; they're all pointing to the same singular copy of that package.

Next we store some metrics. These are usually a pretty small usage of the disk space but you know for instance you saw the RAM and CPU usage that gets stored on disk. And then lastly we store the information around R processes that have executed in the past, the logs associated with them. I wouldn't expect that you'd ever see any disk concerns here unless you have a user who's you know dumping multiple kilobytes of data to the log each time somebody you know touches the slider on the Shiny application or something absurd like that.

So this is just one snapshot of what our beta RStudio Connect server looks like right now. You can see here that we have over 2,000 applications, or 2,000 bits of content, that have been published to the server, and we really do almost no policing on the server in terms of the bundles that are uploaded. If you look into this in a little more detail, it turns out that there are a couple of applications in particular that have multiple gigabytes of data associated with them, which really bloats up the bundles and the unzipped application size. If you're concerned about disk usage and you want to police this a little more tightly, I suspect you'd be able to drive this down, even for a server that has lots of applications. But you can see here that for 2,000 bits of content we have 85 gigabytes of disk usage, and you can see the ratios of where that disk usage is coming from.

So in terms of managing the disk usage and putting a cap around what you want to store, there are a couple of knobs you can tune. The first is around bundle retention, which allows you to throttle the number of bundles retained for each application. As I previewed a moment ago, in RStudio Connect 1.4.4.1 we now have the notion of being able to roll an application back to a previous version, or forward to a more recent version, and that is all predicated on the idea that your old bundles are still alive on disk and that you haven't reaped them. By default we actually retain all bundles forever, and that might not be what you want. You may say that having five copies of an application is plenty, and any version older than five versions ago can be deleted. If that's the case, then you can customize this setting to a value other than zero.
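A sketch of that retention setting, with an assumed setting name; the zero default meaning "keep everything" comes from the transcript, so double-check the exact name in the Admin Guide:

```ini
[Applications]
; Keep only the 5 most recent bundles per application
; (0, the default, retains all bundles forever)
BundleRetentionLimit = 5
```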

Really, though, you may find it more useful than configuring this setting to just introduce a policy, which basically says: ask your users not to publish very large data sets alongside their applications. This is really where large bundles come from. If you have a four-gigabyte CSV file that you want to analyze in some R Markdown document, the naive thing to do would be to bundle that four-gigabyte CSV file alongside your R Markdown code and publish it all as a single bundle. That means that every time you update your R Markdown code, you're republishing those four gigabytes of data, which consumes a lot more disk space than you really need, because your data presumably is not being versioned alongside the code that's analyzing it. As a best practice, you should probably provision your data separately on the server and then publish only the source code that's going to analyze that data.

Next, and I won't belabor this one because it's usually a pretty small consumer of disk space, we do have throttles around how many jobs, or how much process information, we're going to retain. By default we keep up to a hundred processes associated with each application; as soon as we have more than a hundred, the oldest ones start getting deleted, and likewise any process older than 30 days gets deleted off the disk. As soon as either of these constraints is violated, the job is deleted from disk and we no longer have the process logs associated with it. If you're in an environment that's highly audited and you need to make sure that you keep those logs, you can customize these settings to keep older copies of jobs, or more copies of jobs, for each application.

So, the larger picture: a few things that you should consider from an IT perspective. First of all, you can manage the Server.DataDir setting, which by default is /var/lib/rstudio-connect. That's where we store the variable-size data associated with Connect. If you have an NFS share or something larger where you want the bulky data stored for Connect, you can customize that setting and we'll put all the data there. In reality, if you can keep the bundle sizes small, the disk usage should probably be under 100 gigabytes for a medium-traffic server. If you're expecting that you'll deal with large data sets and people will publish large bundles, you might want more than that. But obviously, if you have the luxury of being in an environment where you can have a scalable disk volume, that would be ideal.
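Relocating that data directory might be sketched like this; the mount point is an illustrative name, not a Connect default:

```ini
[Server]
; Move Connect's variable data (bundles, unpacked apps, caches, logs)
; onto a larger volume. /mnt/connect-data is a made-up example path.
DataDir = /mnt/connect-data
```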

But you're going to get the best performance out of a fast disk, so something like an SSD is usually appropriate for the kinds of workloads we're dealing with here. NFS should be fine, although you should consider performance and make sure that it's a performant network share. And do be aware that the SQLite database, which can be pulled out separately from this data directory, must be on local disk; it cannot be on NFS. So everything other than the SQLite database can be shared on NFS.

When you want to run backups for RStudio Connect, you can certainly use a snapshotting system if you have that luxury on your disk volumes. If you don't have snapshotting and you actually want to run a manual backup, here are a couple of directories that you should consider including. First of all, obviously, the /var/lib/rstudio-connect directory, which has all the data we referenced earlier: bundles, applications, process logs, etc. You should also consider including /etc/rstudio-connect, which contains your config file and some other peripheral files that might be used to manage the server. And of course, if you pulled your database out, because of NFS concerns or anything like that, then be sure that you back up your database as well. In order to get a consistent backup, you do need to bring the server offline, run the backup, and then restart the server.
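The manual-backup procedure above might be sketched as follows, assuming a systemd-managed install and the default paths; if you've relocated the SQLite database, add its directory to the archive:

```shell
# Sketch of a manual RStudio Connect backup. Paths follow the defaults
# described in the transcript; adjust for your own layout.
set -e

# Stop the server so the backup is consistent
sudo systemctl stop rstudio-connect

# Archive the data directory (bundles, apps, process logs) and the
# config directory in one timestamped tarball
STAMP=$(date +%Y%m%d)
sudo tar czf "connect-backup-$STAMP.tar.gz" \
  /var/lib/rstudio-connect \
  /etc/rstudio-connect

sudo systemctl start rstudio-connect
```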

Multiple R versions and process configuration

If you were unaware, we do support multiple versions of R. By default, at startup, we're going to scan the most common locations for R, and also look at the PATH, to try to find the versions of R that you have provisioned on the server. If they're not in the most common locations, or if you have R in a custom directory, you can register those explicit R locations, wherever they are on your server. All of this is documented in the Admin Guide at the link that I reference below. What happens then is that every time a user publishes, we're going to do our best to align the version of R that they're using to the versions of R that you have available on the server. There are a few different alignment algorithms available, all of which are again described at the link I mentioned.

Another feature that you should be aware of is the notion of a process supervisor. This is basically a prefix that will precede every R process invocation, defined in Applications.Supervisor. For instance, we provide the example of running nice, which, if you're unfamiliar, is a Linux command that allows you to set the process priority for a given application. So this is a prefix that we're going to run before spawning R: in this case, we're going to say give the process nice priority level 2 and then invoke R. If, for instance, you have a shell script that needs to run before R in order to provision certain resources on the server, you could use this to define a custom shell script that runs before R. There are some caveats around how you invoke this and how it needs to behave, but all of this is documented at the link I provide here.
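A minimal sketch of a supervisor script, assuming the behavior described above: it runs before R and must hand control to the command it was given. The path and priority value are illustrative, and the Admin Guide documents the exact contract:

```shell
#!/bin/sh
# /opt/scripts/connect-supervisor.sh (illustrative path)
# Runs as a prefix before each R process Connect spawns.

# Example provisioning step before R starts: ensure a scratch dir exists
mkdir -p /tmp/connect-scratch

# Lower the R process priority, then exec the original command ("$@")
exec nice -n 2 "$@"
```

It would then be registered in the config file, roughly like:

```ini
[Applications]
Supervisor = /opt/scripts/connect-supervisor.sh
```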

Secondly, there's the notion of customizing the RunAs user. RStudio Connect has a certain user that we're going to invoke R as. The Connect server itself runs as root, but we don't spawn R as root; we spawn R as the rstudio-connect user by default, which is a user that gets created when you run the installer. The primary group for that user is the rstudio-connect group, and all of this is described in more detail at this link. But you can actually go in and customize that. If you define a new Applications.RunAs user globally, you can say: I don't want to run R as the rstudio-connect user by default, I want to run it as someone else. That's totally fine, and you can change it globally. You can also change it per application; that's configured in the dashboard, and you do have to have admin privileges in order to change the setting.

So only an admin can modify the setting, but basically this allows you to go into a particular application inside the dashboard and say: rather than running as the default user, which again is rstudio-connect by default, I want this process to execute as a different user on the server. That allows you to make special system resources, available only to particular users, available to certain applications. Perhaps globally you want all your processes to run as the rstudio-connect user, but there may be certain applications that require special access to disk resources. If you're following that pattern of provisioning data on the server separately from the bundle, perhaps you want to segment off certain data sets and say that only certain applications should have access to them. You can do this using standard Linux file permissions, and then have the R applications that should have access to those data sets run as a user who has access to them on the server.

And do be aware that any user you specify does have to be a member of the primary group of the default RunAs user, which again is the rstudio-connect group. So you can't just go in and specify any Unix user on the system and have them start invoking R. You would need to make sure that those users are members of the rstudio-connect group before they would be candidates to execute R through RStudio Connect.
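The data-segmentation pattern just described might be sketched like this; the account name and paths are made-up examples, not anything Connect creates for you:

```shell
# Illustrative: restrict a provisioned data set to one service account.
# "finance-svc" and /data/finance are hypothetical names for this sketch.

# A RunAs candidate must belong to the rstudio-connect group
sudo useradd -m finance-svc
sudo usermod -aG rstudio-connect finance-svc

# Lock the data down so only finance-svc (and its group) can read it
sudo mkdir -p /data/finance
sudo chown -R finance-svc:finance-svc /data/finance
sudo chmod -R 750 /data/finance

# Finally, in the Connect dashboard (as an admin), set the application's
# RunAs user to finance-svc so its R process can read /data/finance.
```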

You can actually take this one step further by enabling another setting, Applications.RunAsCurrentUser. This does require PAM authentication, and it defaults to false, meaning disabled. But if you go in and enable the setting, and you're using PAM, what that means is that a Shiny application, rather than running as a hard-coded user, or even a custom hard-coded user, will actually run as the user who is viewing the application. Again, this requires PAM, because that means the user has already logged into the system and we know that their username is a valid Linux account on the server, so we can actually start running the R process as them. If, for instance, you're in a Kerberos environment or something like that, where particular resources are available to each user on the system, and you want to know that the Shiny interactions they're having are backed by R processes running specifically as that user, then this is a way to accomplish that. All of this, again, is documented in the Admin Guide at the link I provide here.
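A sketch of the two pieces that combination requires, with names as described above; verify both against the Admin Guide for your release:

```ini
; Per-viewer process identity requires PAM authentication
[Authentication]
Provider = pam

[Applications]
; Shiny apps run as the logged-in viewer's Linux account (default: false)
RunAsCurrentUser = true
```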

We also support PAM sessions, so if you want to use PAM to provision additional resources before spawning R, you can specify a custom PAM service; more documentation is available in the admin guide there.
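A custom PAM service is configured in the same file. A minimal sketch, assuming PAM profiles named `rstudio-connect` and `rstudio-connect-session` exist under /etc/pam.d (setting names per the admin guide; verify for your version):

```ini
[PAM]
; PAM profile used for authenticating users
Service = rstudio-connect
; Optionally run a separate profile, with a PAM session, when launching
; R processes, e.g. to mount or provision per-user resources
UseSession = true
SessionService = rstudio-connect-session
```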

System security

Alright, so then moving on to the last section here, around system security and some of the considerations you should be aware of from a security standpoint. First of all, as I suspect most of us know, HTTPS is very important, and if you're accepting usernames and passwords through your system, then you should almost certainly be using TLS encryption, a.k.a. HTTPS. This is pretty simple to define: in the HTTPS section, you set up the Listen, Key, and Certificate settings to define what port, what certificate, and what key you want to use for HTTPS. You can also set up the HTTP redirect, which will, for instance, listen on port 80 for HTTP requests and forward them on to HTTPS, all self-contained within the server. This is something you should pretty strongly consider, especially if you're accepting usernames and passwords.
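Put together, the HTTPS settings described above might look like this sketch; the certificate and key paths are placeholders:

```ini
[HTTPS]
; Serve TLS on the standard port with your certificate and key
Listen = :443
Certificate = /etc/ssl/certs/connect.example.com.crt
Key = /etc/ssl/private/connect.example.com.key

[HTTPRedirect]
; Answer plain-HTTP requests on port 80 and redirect them to HTTPS
Listen = :80
```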

Another thing you can consider here, around security implications, would be browser security. There are a few different options you can configure, all documented in more detail at the link I provide here: we support HSTS, we have content-type-sniffing protection, you can change the X-Frame-Options header, and you can even define custom HTTP headers that you want included on all responses going out of the server. The best way to verify that you have these settings in place is a great tool that SSL Labs provides: if you go to this link, you can point it at your server, assuming it's public, and it will run a suite of tests, first of all against your SSL certificate, but it will also confirm that you have all these settings set appropriately and that your server is appropriately locked down.

I alluded to this earlier, but we do support audit logs. If you're interested in staying up to date with system changes and the actions users are taking on the server, this is something you should definitely keep an eye on. And if you want to do this in a more automated or customized way, you can certainly consider dumping the logs to CSV using the usermanager tool and then inspecting them with your own tooling.

One other question we get often is about separating staging from production, and this is now supported in Connect, now that we support exporting a bundle. The best way to accomplish this today, and this is something we do want to improve in the future, is to run two different Connect servers. One can be your staging environment, where people publish content they're still working on; then, once it gets QA'd and you're actually happy and comfortable with the content, you can export that bundle and republish it to the production environment. That allows you to separate staging from production, and if you're concerned about resource contention between the staging environment and the production environment, this is the best way to wall off those two things.

One other feature you should be aware of, which is nice if you're going to be managing this: we support two settings that let you display messages to users. Server.PublicWarning provides a custom HTML warning on the unauthenticated landing page, so when people visit Connect and are not logged in, they'll see this public warning. For logged-in users, you can define the Server.LoggedInWarning setting, which shows a warning message on the landing page once the user is logged in. This is a great way to communicate scheduled maintenance windows, or any information you want to get in front of all of your users.
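Those two settings live in the Server section of the configuration file; a small sketch (the messages themselves are just examples):

```ini
[Server]
; Shown on the landing page to anyone who is not logged in (HTML allowed)
PublicWarning = <p>Connect will be down for maintenance Saturday 02:00-04:00 UTC.</p>
; Shown to authenticated users once they log in
LoggedInWarning = <p>Reminder: content on this server is refreshed nightly.</p>
```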

Lastly, the last topic I want to cover is the notion of private packages. Packrat will gracefully handle all the public packages you have, whether they're on CRAN or Bioconductor, or even hosted on GitHub or GitLab or anything like that; these all work well within Packrat without any customization. But private packages are a bit more complicated. When you have a private package, the best way to facilitate this is to use a private CRAN. If you're able to set up a private CRAN instance, and we have documentation for this, it's actually not nearly as complicated as it sounds, then it can host your private packages, your users can install those packages from it, and, within your firewall, Connect will be able to pull packages down from that private CRAN instance and install them just like it does from the public CRAN. All of this, again, is documented at the link I provide in the admin guide. And beyond enabling access from the Connect instance, this is actually a really nice way to version and manage your own R packages internally.
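At its core, a CRAN-style repository is just a directory tree served over HTTP with a PACKAGES index. A minimal sketch of the layout; the paths and the internal hostname are illustrative, and the indexing step assumes R is installed:

```shell
# A CRAN-style repository is a src/contrib directory of source package
# tarballs plus a PACKAGES index file.
mkdir -p /tmp/miniCRAN/src/contrib

# Copy your built packages (*.tar.gz) into src/contrib, then index them:
#   Rscript -e 'tools::write_PACKAGES("/tmp/miniCRAN/src/contrib")'
#
# Serve /tmp/miniCRAN over HTTP; users (and Connect) can then install
# from it like any other repository, e.g. in R:
#   install.packages("mypkg", repos = "http://cran.internal.example.com")

ls -d /tmp/miniCRAN/src/contrib
```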

Additional resources and Q&A

Alright, so that is the tour of the things I wanted to introduce around RStudio Connect. A few additional resources I'll point you to here. First of all, if you haven't tried out Connect yet, you can download and install a 45-day free trial using this link. The admin guide is available; all of the information I've just covered is described and documented there, and the goal of this webinar is really just to highlight the specific points you might want to consider out of the admin guide. If you're going to be managing a production Connect server, it would be well worth your while to spend some time familiarizing yourself with it. If you have an IT bent, we have an IT Q&A page for Connect that answers some common questions we get from IT folks. If you're looking to set up authentication, the authentication details are available here. And lastly, our release notes are available online as well, so if you're interested in checking out the differences between particular versions, or seeing the more detailed changes that take place every time we release a version, all of that is available at this link.

And so that is everything I have, so let me go ahead and take a moment to pull up some of our questions and see if we have any that might be worth covering. Okay, so the question is: does RStudio Connect have the equivalent of an obscured URL with anonymous, view-only access, like a Google Docs share URL? The use case in mind is allowing external clients to view a Shiny app hosted on our RStudio Connect server. So this is not something we support today. Basically everything we do is around explicit authentication, so if you want users to have access to a particular application, you have to explicitly add them to your content; then, when they log in, they'll be able to view it in their content listing, fully enumerated for them. But that is an interesting feature request, and something we should definitely keep in mind.

Yeah, so one question here you should be aware of around self-signed SSL certificates, and I'll point this out here: back in the HTTPS section, it is not a problem for you to self-sign the certificate you use when you set up HTTPS on the server; basically any valid certificate and key pair we'd be happy to host for you. One thing you should be aware of around custom certificate authorities, though: while your browsers may be instructed within your organization to trust a custom certificate authority, when you're publishing from the RStudio IDE, that uses a different set of network connectors than the browser. We're usually using curl, or whatever networking systems are available on your server, or on the desktop if you're using RStudio Desktop. So not only do your browsers need to trust your SSL certificate, but whatever systems you're using from the rsconnect package and the IDE to publish content to the server, which is often curl, also need to be instructed to trust your custom certificates. If you have a custom SSL certificate or an internal CA, be aware that all of your RStudio clients will also need to be instructed to trust that custom CA.
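For curl-based clients, one common approach (among several; see your platform's documentation) is to point curl's CURL_CA_BUNDLE environment variable at a PEM file containing your internal CA chain. The path below is a hypothetical placeholder:

```shell
# Make curl (and tools that use it for HTTP) trust an internal CA by
# pointing CURL_CA_BUNDLE at a PEM bundle. The path is a placeholder.
export CURL_CA_BUNDLE=/etc/pki/tls/certs/internal-ca.pem

echo "curl will read CA certs from: $CURL_CA_BUNDLE"
```

Set this in the environment of whatever process runs the publishing client, so that HTTPS connections to the Connect server validate against your internal CA.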

Can you provide an example of a functional RunAs setup? Yeah, so if you look into our documentation, we definitely have some more details on that. Certainly the most trivial application would be to customize a hard-coded user: you can define whatever user, in whatever group, you want in that custom setting, and that should be pretty straightforward. The more advanced applications, like RunAsCurrentUser, really depend on your environment, and I guess the reason we don't have a more detailed example setup for those is that it depends on how you have Kerberos and PAM configured. But definitely, even if you're doing a trial, even if you haven't purchased the product yet, I'd encourage you to contact support@rstudio.com if you're encountering any trouble, or if you want help reviewing the architecture or working with your IT folks to provision an environment that's going to be successful for your Connect instance; we'd be happy to work with you on that.

But other than that, I think that is about everything I was hoping to cover today, so I think we are about good to wind up.