Resources

Karl Feinauer | Using Jupyter with RStudio Server Pro | RStudio (2020)

This talk is for R admins who want to learn how to set up Jupyter notebooks on RStudio Server Pro. We'll cover prerequisites, basic configuration, best practices for management, JupyterLab, and more.

Transcript

This transcript was generated automatically and may contain errors.

Hello, I'm Karl. I work on the IDE team here at RStudio, and today I'm going to be talking about how to use Jupyter with RStudio Server Pro. I'm going to talk about why you would want to do this, how you set it up, and how you administer it, and I'll show you a brief demo of it.

Why combine R and Python

So to start, some of you may be wondering, why are we doing this? R isn't going anywhere. We very strongly believe in R, and as a company we're very much invested in making the R ecosystem as great as it can be. At the same time, we're committed to supporting the open source data science ecosystem as a whole, regardless of language choice. That said, R and Python are not rivals. They're both great tools for data science and we believe strongly in both of them. R's better for some things, and Python's better for others.

So combining them will provide us the best of both worlds and give everyone what they need to do their data science effectively. I've barely scratched the surface of our aim as a company to combine Python and R and provide you the best tools that you need to do your job, but on our website we have an R and Python love story that I highly recommend you read. It's very good and goes into more depth about where we're going as a company. You can read that on rstudio.com.

Why use Jupyter with RStudio Server Pro

So why use Jupyter specifically? Jupyter is well known in the Python community, and it provides a lot of tooling for working with various versions of Python seamlessly. It also provides Jupyter notebooks, which many of you may know are great ways of organizing and visualizing data, similar to R Markdown documents, but they currently provide better Python integration.

So why would you want to use Jupyter from RStudio Server Pro? Doing so allows you to provide a one-stop portal for all of your data science needs. There's no need to manage multiple portals and frameworks for your users to log in to just to do their jobs. You also get all the authentication and authorization benefits that you get from RStudio sessions with Jupyter now, and that's a huge boon for administrators.

It's also a better user experience. Right now, if you want your data scientists to use Jupyter, you have to have them connect to a terminal and start it from there, and that's just not a great user experience. Integrating it with RStudio Server Pro allows them to launch it just like an R session and have Jupyter at their fingertips. And finally, if you integrate Jupyter with RStudio Server Pro, it becomes very easy to publish Jupyter notebooks to Connect, which gives you the great sharing ecosystem that we've built for R, but now also for Python.

Architecture overview

So I just want to quickly talk about the architecture at a very basic level, just so you can understand how this all works. Today we have RStudio Server Pro, and now it talks to the launcher, which Jonathan talked about in the last talk. I won't talk about the launcher in depth here, but basically it allows you to launch your R sessions and your Python Jupyter sessions potentially in the cloud via Kubernetes or Slurm, or locally on the same server. The key takeaway here is that Jupyter sessions require the use of the launcher, which is a new feature in 1.2, and that R and Jupyter sessions are peers of each other, so they're treated very much the same way.

Configuration and setup

So to talk about how to use it: again, you need the job launcher, and I won't go into detail there. There's plenty of documentation online that you can find about that. But once you have that enabled, all you really have to do to turn on Jupyter sessions is to flip some switches and set a binary path. It's very simple. So a sample configuration file just to enable Jupyter is this. You just set the binary path for Jupyter and then turn on labs for the JupyterLab interface or notebooks for the Jupyter Notebook interface, and you can have both of those at the same time.
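The slide itself is not captured in the transcript. As a rough sketch of what such a file might contain, the Python snippet below renders three key=value lines. The key names (jupyter-exe, labs-enabled, notebooks-enabled) and the binary path /usr/local/bin/jupyter are assumptions recalled from the RStudio Server Pro admin guide, not taken from the talk, so check them against the guide for your version before using them.

```python
def render_jupyter_conf(jupyter_exe, labs=True, notebooks=True):
    """Render a minimal Jupyter integration config as key=value text.

    The key names are assumptions (see the note above), not confirmed
    by the talk.
    """
    settings = {
        "jupyter-exe": jupyter_exe,           # path to the Jupyter binary
        "labs-enabled": int(labs),            # 1 turns on the JupyterLab interface
        "notebooks-enabled": int(notebooks),  # 1 turns on the Jupyter Notebook interface
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Hypothetical binary path; use wherever Jupyter is installed on your server.
conf_text = render_jupyter_conf("/usr/local/bin/jupyter")
print(conf_text)
```

Both interfaces are switched on here because the talk notes that they can be enabled at the same time; pass labs=False or notebooks=False to offer only one.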

Demo

I'm going to attempt to show you a demo here, if I can figure out where my desktop is.

Okay, so this is the RStudio Server Pro homepage, and some of you may be familiar with this, but what's new is that when we launch a session, we can choose our editor. Before, every time you started a new session, you only had one option, which was to launch RStudio. But now, you have the option to start JupyterLab or Jupyter Notebook.

So just real quick to show you what that might look like, if I start a JupyterLab session, it shows up on the homepage, and then I'm loaded into JupyterLab very seamlessly. And then you can interact with Jupyter just like you would if you had connected to it directly. You have access to multiple documents at the same time, and that sort of thing. So nothing revolutionary, but it is great that you can access that through the homepage of RStudio Server Pro.

Now I'll just quickly show you Jupyter Notebook. I want to show you how easy it is to publish a Jupyter notebook to Connect. So I start a new Jupyter Notebook session, and then let's create a new Python notebook. I'll just print some Python code here. And then you'll see that in the notebook, it's a little hard to see, let's make it bigger, there's this button here that allows us to publish to Connect. If we click that, it becomes very easy to publish it. All I have to do is click Publish, and then it shows up on RStudio Connect. So this is the new notebook that I just created, and then there it is. Very simple.

Best practices for administration

So I just want to quickly talk about best practices for administering Jupyter with RStudio Server Pro. I recommend that you use the default configuration wherever possible, because we've put a lot of work into making the configuration defaults just work out of the box with very sane defaults that you shouldn't need to be tweaking. But two settings that you may actually want to tweak are related to session suspension and automatically cleaning up sessions that were forgotten about.

And then you need to mount NFS directories so that users have access to their notebooks. If you're doing that with NFSv3, we recommend using local_lock=all as a mount option, or just using NFSv4. Otherwise you could run into some problems there. And then if you're going to be using this with Kubernetes, we recommend that you use the r-session-complete RStudio Docker image. This is basically a Docker image that we've published to Docker Hub that has all the RStudio binaries that you need to launch Jupyter, along with Jupyter itself and an installation of Python, and you can create derivative images from that to provide your users with whatever they need in terms of Python versions.
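The NFS recommendation above can be turned into a small check. The following is a minimal sketch, assuming the filesystem type and comma-separated options are given in the form they take in /etc/fstab; the function name nfs_mount_ok is hypothetical. It is deliberately conservative: a mount with no explicit version is treated like NFSv3, even though many clients would negotiate NFSv4.

```python
def nfs_mount_ok(fstype, options):
    """Return True if a mount follows the talk's advice:
    NFSv4, or NFSv3 with the local_lock=all mount option."""
    opts = {}
    for item in options.split(","):
        key, _, value = item.strip().partition("=")
        if key:
            opts[key] = value

    # "vers" and "nfsvers" are alternative spellings of the version option.
    version = opts.get("vers") or opts.get("nfsvers") or ""
    if fstype == "nfs4" or version.startswith("4"):
        return True

    # Otherwise assume NFSv3 (conservative) and require local_lock=all.
    return opts.get("local_lock") == "all"


print(nfs_mount_ok("nfs", "rw,vers=3,local_lock=all"))
print(nfs_mount_ok("nfs", "rw,vers=3"))
```

A check like this could be run over the home-directory mounts on each session host before enabling Jupyter sessions.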

So there are two plugins that we provide for Jupyter to help with the user experience. These are totally optional, but I recommend that you have both of these installed. The first plugin is the rsconnect-jupyter plugin, which is what actually allowed us to publish the notebook to Connect. And then the other one is the rsp-jupyter plugin, which shows notebooks on the homepage. So just to remind you what that looks like, at the top you see the Connect publishing button, and then at the bottom you'll see your recently used notebooks on the homepage.

Troubleshooting and next steps

So in some rare cases, Jupyter won't start right up out of the box. We've tried to make it as easy as possible, but there may come a time when you see that your Jupyter sessions won't start for whatever reason. If you need help troubleshooting those issues, the first thing I would recommend doing is running the verify-installation command, which is documented in the RStudio Server Pro admin guide. It's very easy to use. You just run it, and it gives you information about why your job couldn't start. So hopefully you can use that information to fix whatever's going wrong. And if you need further help, just reach out to support.
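For admins who script their checks, the command can be assembled in Python. This is only a sketch: the talk does not show the exact invocation, so the rstudio-server verify-installation form and the --verify-user flag are assumptions recalled from the admin guide, the helper name is hypothetical, and "alice" is a placeholder account. Confirm the syntax in the guide for your version, including whether the server must be stopped first.

```python
import shlex


def build_verify_command(verify_user=None, use_sudo=True):
    """Build the argv list for the verification command.

    The subcommand and flag are assumptions (see the note above).
    """
    argv = ["sudo"] if use_sudo else []
    argv += ["rstudio-server", "verify-installation"]
    if verify_user:
        # Run the launcher checks as this (non-root) user.
        argv.append(f"--verify-user={verify_user}")
    return argv


cmd = build_verify_command(verify_user="alice")  # "alice" is a placeholder
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd) on the server itself; the sketch only prints the command so that it is safe to run anywhere.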

As far as next steps go, if you're interested in integrating Jupyter with RStudio Server Pro, please go to the website on screen, solutions.rstudio.com, where we have a great primer to help you set this up. And that's it. Thank you.

Q&A

Thanks, Karl. We have a number of different questions that we can go ahead and work through. One of the questions was, are all Jupyter kernels supported, Python, R, Julia, et cetera? Are those available? We've just focused on the Python kernel for now. The one thing that I will mention is it's just Jupyter, so anything you can do in Jupyter, you can do here.

How does Jupyter handle ODBC connections to databases? Is there anything specific to RStudio Server Pro done there? There's nothing specific done for ODBC. It should just be a complete pass-through for whatever you have set up with Jupyter.

Why not enable Jupyter by default on RStudio 1.3? Because to enable it, you have to actually have it set up and configured and installed, so basically we don't want to forcefully turn that on if it hasn't been set up.

Are there any tools for admins to be able to kill and manage Jupyter sessions? It's really easy to kill Jupyter sessions from the home page, like regular R sessions, so users can self-administer that sort of thing, and admins can see Jupyter sessions from the RStudio Server Pro admin page, so if they need to kill sessions, they can do that very easily.

Will the Jupyter integration be available in the RStudio desktop IDE in the future? I don't know. Maybe, perhaps.