Resources

Darby Hadley | RStudio Job Launcher: Changing where we run R stuff | RStudio (2019)

RStudio Job Launcher provides the ability to start processes within batch processing systems and container orchestration platforms. In this talk, we will explore what is possible when you have the ability to launch containerized R sessions, including scaling, isolating, and customizing environments. We will review examples of launching ad-hoc jobs as well as dockerized R sessions in Kubernetes using the Job Launcher.

About the Author: Darby Hadley is a QA engineer for multiple teams at RStudio. He has a passion for improving products, creating efficient processes, and helping people. Before joining RStudio he worked primarily in the video game industry.

Transcript#

This transcript was generated automatically and may contain errors.

I'm going to be talking about the RStudio Job Launcher and how it can change where we run R stuff.

What is the launcher?

So what is the Launcher? The Launcher will allow applications, in this case we're talking about RStudio Server Pro, to easily launch jobs, and in this case we're talking about R jobs, somewhere. Now this sounds pretty simple, but it opens up some interesting use cases. I'd like to show you some of the cool things it allows you to do.

So I thought I'd start off with an example of how I've used the Launcher, then we can dive more into the details of how it actually works.

A motivating example: Twitter portraits

So at the last RStudio conference, I saw Gyor's talk on five packages in five weeks, you might have been there. In it he mentions a post about creating portraits with text using R, so here's a portrait of Hadley made entirely of dplyr code.

I read through his post and I thought it was interesting. So before this conference I thought it would be cool to create the same kind of portraits with all the RStudio employees and conference speakers using what Twitter profiles I had access to.

So I get working on my script. I'm doing this work on RStudio Server Pro, connecting using a browser. It doesn't take that long because I have all the pieces, I just need to put it together. It does take some time to run, though. It downloads all the text and images, it does some transformation to it, and then it plots the text to create the image, and it does this one at a time.

So I wait for it to finish, and then I have my beautiful Twitter artwork. Each one of these images is created using the text from the author's tweets. That's pretty cool, I guess. Then I thought it would be cool to do the same thing with the top 100 Twitter users by number of followers, but I didn't want to wait as long, so I figured out how to parallelize my code. I only have access to a couple of CPU cores on this server, but it helps cut the time it would take in half.
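The parallelization step can be sketched roughly like this, using base R's parallel package. The make_portrait() helper here is hypothetical, standing in for the download/transform/plot work described above:

```r
library(parallel)

# Hypothetical helper: downloads a user's tweets and profile image,
# transforms them, then plots the text to create the portrait.
make_portrait <- function(handle) {
  # ... download, transform, plot ...
}

handles <- c("hadleywickham", "rstudio")  # in the talk: top 100 users

# mclapply() forks one worker per core; with two cores the run time
# is roughly halved compared to a sequential lapply() over handles.
results <- mclapply(handles, make_portrait, mc.cores = 2)
```

This is the "couple of CPU cores" version; the rest of the talk is about scaling the same idea past what one box can offer.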

So it takes about seven minutes to complete 100 images on those two cores, and basically I'm ignoring all of Joe's talk this morning on making things more efficient and just throwing money at it.

Scaling up with Kubernetes

So at this point, my imagination starts to go a little crazy to see what I could do next. So I'm a gamer, and I thought it'd be cool to take a video of the first level of Doom, separate all the individual frames, and then recreate each frame with this same portrait style using text from the source code of the game.

So it seems like a good use of my time, right?

Now that was a lot of individual frames to deal with. It was about 1,000 images, and it could take six or seven hours to complete even if I parallelized my code. I'm restricted by the amount of CPU and memory on the server, and it would be cost-inefficient for me to have some really beefy system doing something like this.

But I do have access to a Kubernetes cluster up on Google Cloud, and this cluster is actually set up for me to have a lot more CPU and memory. If you don't know what Kubernetes is, don't worry about it. We'll talk a little more about it later. And each one of these nodes has 32 CPUs, 120 gigabytes of memory, and I have access to use it on demand for some reason. Because of its auto-scaling ability, I could send the cluster this type of job, it'll scale up, and then when it's not in use, it'll scale back down, thus making it way cheaper to operate.

If only I could use 25 cores or so, I could get this done really fast. There are ways I can do that, but they can be somewhat painful, and nothing was fully integrated within the IDE until now.

So this is introducing the RStudio Job Launcher. I'm on a server that has a launcher already configured and set up, and I have the ability to launch these jobs on the Kubernetes cluster right from RStudio. All I have to do is go to the new Jobs pane in RStudio, which is new for the RStudio 1.2 release, start a launcher job, and select my script and whatever options I want. Don't worry too much about the details here; in this case I'll choose 25 cores and start it.

The script will run in the cluster and I can see the status and output in the jobs pane. This will be running completely independent of my currently running R session.

Now I'm able to do a much more intensive task in a much shorter amount of time. What would have taken forever now takes about 15 minutes with the 25 cores. I don't know if you can see it very well, but it's pretty cool.

This might seem like a silly example, but the ability to launch R jobs on a Kubernetes cluster opens up some neat possibilities. You can see how if you have an intensive job, you can utilize another processing environment to speed up that analysis.

How the launcher works

Now that you've seen one of the things the launcher can do, let's talk about what is actually going on. We've added two additional layers to the system. The RStudio Job Launcher is a separate service that is independent of RStudio Server, and it communicates with plug-ins to connect to these different environments.

So what am I talking about with these plug-ins? The core of the Job Launcher is simply a framework for loading external plug-ins, which contain the actual logic for communicating with the job's destination. So how do these things work together? Once your plug-in is configured and the RStudio Launcher process starts, it executes your plug-in process as a child of itself and communicates with it via standard in and standard out.
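As a rough illustration of how those pieces are wired together, the launcher reads a configuration file that declares a server section plus one cluster section per plug-in. This is an illustrative sketch, not a copy of a real config; the exact option names are in the Job Launcher admin documentation:

```ini
# /etc/rstudio/launcher.conf (illustrative sketch; see the Job
# Launcher admin docs for the authoritative option names)
[server]
address=localhost
port=5559

# One [cluster] section per plug-in the launcher should load.
[cluster]
name=Local
type=Local

[cluster]
name=Kubernetes
type=Kubernetes
```

Each cluster entry tells the launcher which plug-in binary to start as a child process; the plug-in then handles all communication with its destination (the local machine, a Kubernetes API server, and so on).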

So what kind of plug-ins are available? For the 1.2 release of RStudio, we are releasing three plug-ins: Local, Kubernetes, and Slurm. A side note on the Slurm plug-in: it's still in development and not really ready to be played with yet, but it's coming soon.

So what about other plug-ins? We're still trying to figure out what comes next, but we're planning on adding more plug-ins in later RStudio releases. But what if you have a super awesome processing platform that you use in your own company? Well, the good news is that you can create your own plug-ins, and here's the documentation to do so. That should have everything you need to get up and running with whatever environment you want to connect to.

Types of jobs: ad hoc and containerized sessions

So there are two types of jobs you can launch from RStudio. The first is ad hoc jobs, and that's what we were talking about in the first example. You can use the Jobs pane in the IDE to launch specific scripts to run wherever the plug-in is configured to send them. You might see that I snuck a Python job up there; the Job Launcher can also run Python jobs directly from RStudio.

The second is containerized sessions. So in this example I developed my code with an R session on the RStudio server that lived on that box. This is where I wrote and tested my script and it's a session that's tied to the console in the IDE. Then I launched the script as an ad hoc job to the Kubernetes cluster.

But with the launcher I can also develop and test directly on the cluster with a containerized session. I can write and debug the code interactively with all the capabilities your code will have when it runs. That means you can completely separate these R sessions from the RStudio server if you would like. And this is integrated within the RStudio UI.

So typically you have a single server and everyone is forced to use the same version or versions of R and Linux dependencies and an admin has to maintain that box for those users if they need updates or new software. But with the launcher and containerized session those dependencies don't even have to be installed on the server. They live in the Docker image itself. So various users can be using different images with different versions of R, specific R packages installed by default. And this allows you to create those truly reproducible environments. Also these images can be shared and distributed on Docker Hub or any other container registry.
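For example, a session image with a pinned R version and a few default packages baked in might look something like this. This is a minimal sketch built on the public rocker images; a real RStudio session image would also need the session components installed (see the RStudio documentation for supported base images):

```dockerfile
# Sketch of a reproducible session image: pin the R version and
# pre-install the packages every session should start with.
FROM rocker/r-ver:3.5.2

RUN R -e "install.packages(c('dplyr', 'ggplot2'), \
          repos = 'https://cran.r-project.org')"
```

Because everything the session needs lives in the image, pushing it to Docker Hub or another registry is all it takes to share the exact same environment with another user or cluster.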

So on the same RStudio server, user 1 can be running a session with a Docker image that contains a specific version of R and specific Linux packages, and user 2 could be running a completely different version of R with different R packages installed in shared libraries by default. Or one user could be running a bunch of different types of images in different sessions for various types of work. And these dependencies are now self-contained and isolated from other containers being run, so a user can completely screw up everything in the container and it won't affect the others.

Because of these containerized sessions, one huge benefit is scaling. Currently with RStudio Server Pro, if you'd like to scale, you have to set up multiple RSP nodes with all the required dependencies and load balance between them. If you get more users and sessions, you have to provision more nodes and reconfigure the group. Things like autoscaling become a much harder problem to solve if that's what you want to do.

But using the launcher, you can let a container orchestration platform handle that scaling for you. As you can see here, we have RSP running on one server with the launcher and the launcher communicates with the Kubernetes cluster to create these containerized sessions there. Kubernetes has all that functionality to do the scaling for you. So this allows you to just have that one RSP server. You could have multiple for resiliency, but all the memory and processing load is kicked to the cluster and not on the RSP server itself.

Live demo

Now I'd like to demonstrate launching the two types of jobs from within RStudio Server Pro: ad hoc jobs and containerized sessions.

So here's RSP running the latest 1.2 release. New to 1.2 is this Jobs pane, as some people have already talked about. And here I have a test script that prints a message, sleeps for a little bit, and then prints the R version string.
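The test script is just a few lines; something along these lines:

```r
# test.R: print a message, wait a bit, then report which R
# interpreter the job actually ran on.
message("Starting test job...")
Sys.sleep(30)
print(R.version.string)
```

Printing R.version.string is what makes the later part of the demo work: it shows at a glance which image (and therefore which R version) a given job ran in.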

So in the Jobs pane, I have the ability to start local jobs, and on RSP, if I have the launcher set up, I can also start launcher jobs. So first I'm going to start a local job with this test script, just to show you how the Jobs pane works; Jonathan already talked about it, if you didn't see.

And so as you can see, I can watch the output as it comes in. It shows a progress bar and I can see a list of all my other running jobs.

And then while that's running, I have another script, my top-Twitter R script, which is the one I used to get the top 100 Twitter users, and I'm going to start this as a launcher job.

So as you can see, I've got slightly different options when starting a launcher job. I have the environment here where I can choose my R script and working directory, just like I can for local jobs, but I can also name it, so I'll name this RStudio. I can choose the cluster where I want this job to go. Right now this RStudio Server Pro has the Local plug-in and the Kubernetes plug-in, and I'm going to choose the Kubernetes plug-in. I can choose the amount of CPU and memory; you can see that these default to certain numbers, and there's also a maximum, and this can all be configured by an RStudio admin in the profiles.

So for this script, I'm going to run with 10 CPUs, I'm going to leave it at 10 gigs, and then this is where I can choose my image. I have a couple of images already set up on Google Container Registry, and I also have the ability to choose "other" and enter whatever URL I want for whatever image I have, so you can choose a Docker Hub image or whatnot. I'm going to go with this default that I have set up, which has the dependencies for the script, and hit start.
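Those defaults and maximums come from the launcher profiles, which an admin edits on the server. An illustrative fragment might look like the following; the registry path is a made-up placeholder, and the exact file and key names should be checked against the RStudio Server Pro admin guide:

```ini
# Kubernetes launcher profiles (illustrative sketch; key and file
# names per the RStudio Server Pro admin guide)
[*]
default-cpus=2
max-cpus=25
default-mem-mb=10240
max-mem-mb=20480
container-images=gcr.io/my-project/r-session:3.5.1,r-base
allow-unknown-images=1
```

The `[*]` section applies to every user; per-user or per-group sections can tighten or loosen these limits, which is how the Q&A question about restricting clusters and resources per user is answered.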

So as you can see, it's now running, it has this little icon showing that it's a launcher job instead of a local job. I can click into it to see the output, and then it will just continue to run.

So I'm going to go back to this test script, change it so it doesn't take so long, and then run it as a launcher job. Just to show off that you're running in a different environment: this current environment that I'm in is running R version 3.5.1, and I'm going to start this launcher job in a different image.

Some of you might be familiar with the Docker Hub image r-base, and that will be running the latest version of R, which is 3.5.2. So I'll see what the output is here.

Start the job. Look at the output. And as you can see, in my current running environment, I'm running 3.5.1, and this image is 3.5.2. So you can see how you can run different images with whatever dependencies you have for whatever scripts you want to run.

So now I want to show off containerized sessions. Tareef showed this off a little bit in the keynote. If you're not familiar with the RStudio Server Pro homepage, it shows your currently active sessions, your currently running jobs, and your completed jobs. I'm going to go here to New Session to create a new session, and from here I have the same type of options as I had before. So I'm going to name this, choose the same cluster, use the default two CPUs and 10 gigs of memory, and the same default image.

So now this currently running RSP session is running in Kubernetes and not on the RSP box. It looks exactly the same and runs exactly the same, but now any CPU and memory you use is actually on that cluster.

Frequently asked questions

So I have a couple of frequently asked questions here. How do I get the launcher and plug-ins? For the 1.2 release, the launcher will come bundled with RStudio Server Pro and include those three plug-ins that we talked about. The release is currently available as a preview if you want to download it and try it out.

Will this come to RStudio Desktop? The answer is yes, it will, but in the 1.3 release; we're going to implement it there after this release, when we start working on 1.3. If you're unfamiliar with RStudio Desktop Pro, which is the only edition that will get it, it's an enhanced version of the IDE with enterprise features that we're shipping with the 1.2 release.

What about other RStudio products? So the launcher is something we eventually want our products to standardize on and so RStudio Connect and Package Manager will be integrating with the launcher in future releases.

So to sum everything up: RStudio Job Launcher is a separate service bundled with RStudio Server Pro, available now as a preview release. The three plug-ins are Kubernetes, Slurm, and Local. And you can do some cool stuff with it, like launching ad hoc jobs, so you can run those computationally expensive R or Python scripts somewhere else, and running containerized sessions, so you get an isolated environment with all the desired dependencies, can create reproducible environments, and can scale. So it should make admins happy.

So I have some further reading here: the admin guide for RStudio Server Pro and the Job Launcher documentation. If you have any additional questions or want to talk to us, please meet us in the professional lounge. Thank you very much.

Q&A

So we actually have time for a couple of questions, if anyone has some.

I have a question about admin access. Can you set the Kubernetes clusters that users have access to individually? Because some users may need a cluster in one region, some users in another, things like that. Yes, you can use the RStudio profiles to set whatever clusters are available to them, as well as how much CPU and memory they can utilize, so you can set those restrictions.

What happens to your jobs if the RStudio server crashes? They'll actually continue to run, and if RStudio is able to come back, you should be able to reconnect to them.

Is there a command line for the launcher, or can we call it inside R? So, we're actually going to be working on an implementation in our rstudioapi package, so that you can launch jobs through that R package.