Resources

Daniel Petzold || RStudio Team: Building and Sharing Jupyter Notebooks || RStudio

Learn more about RStudio Team here. https://www.rstudio.com/products/team/
Find the code for this example here. https://github.com/danielpetzold/space-tracker
Read our blog post here. https://www.rstudio.com/blog/build-and-share-jupyter-notebooks-on-rstudio-team/

Timecodes
0:00 - Intro
0:07 - Build Jupyter Notebooks to analyze and visualize data
2:47 - Publish directly from RStudio Workbench to your content hub
5:13 - Share With Your Stakeholders on RStudio Connect

Jupyter Notebooks are interactive documents for code, outputs, and text. However, they’re often stuck in data scientists’ local computing environments. Collaborating can be difficult and sharing can be tedious. To live up to their fullest potential, data science teams need a way to scale their development securely and efficiently, while providing stakeholders easy access to their output and visualizations.

RStudio Team, made up of RStudio Workbench, RStudio Connect, and RStudio Package Manager, brings everything together to help data scientists create, reproduce, and share insights from their Jupyter Notebooks. Let’s dive into a real-life example by exploring data from NASA’s Center for Near-Earth Objects (NEOs). Daniel Petzold walks us through his data analysis and reporting. Want to explore the report yourself? Check out the published report on RStudio Connect here. https://colorado.rstudio.com/rsc/space-tracker/space_tracker.html

On RStudio Workbench, you have a choice of editors: the RStudio IDE, JupyterLab, Jupyter Notebook, or VS Code. Choose your preference. From here, you can explore your dataset, embed HTML directly in your document, create visualizations, and more.

Once you've run your analyses and created insightful visualizations, you want to be able to share them with your team. RStudio Workbench allows you to publish to RStudio Connect, the content platform from RStudio. You have multiple options: push-button deployment from Jupyter Notebook, or terminal commands from JupyterLab.
It’s not enough to publish your work. Once on RStudio Connect, you can share with end-users. Make your analysis accessible to specific users or more generally with different authentication measures. In addition, you can schedule the document to run at a certain time and send out an email with refreshed data.

Click the links below to learn more about these offerings.
RStudio Workbench: https://www.rstudio.com/products/workbench/
RStudio Connect: https://www.rstudio.com/products/connect/


Transcript

This transcript was generated automatically and may contain errors.

In this video, let's look at RStudio's pro toolset and how you can use RStudio Workbench to create new JupyterLab sessions. You could also use Jupyter Notebook or VS Code with Python, but in this case let's start with JupyterLab. We'll then publish to RStudio Connect so stakeholders can view and easily understand the data.

So let's open a new JupyterLab session. The beautiful thing about RStudio Workbench is that you can authenticate users, provide the computing resources your work needs on one platform, and share those resources with your collaborators.

Building the Jupyter notebook

So I have an example that I've already put together. In this case, we have data on asteroids and comets from NASA's Center for Near-Earth Objects. There are a lot of columns here, about 40, which is way more than I want to work with. What I want to do is narrow it down to about 15 of those columns and understand the data a bit better.

So I've put together this JupyterLab notebook, and you can see that we've imported the Python data and visualization libraries we're going to use, like pandas, Seaborn, and Plotly Express, to take a closer look at the data. Here we have a column list naming the specific columns we want from the CSV file. We read that CSV with pandas, passing in the column list defined above.
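A minimal sketch of this loading step, assuming the NASA export is a CSV whose raw column names resemble the ones below. The column names and the three sample rows are hypothetical placeholders, and an in-memory string stands in for the real file so the snippet runs on its own. The `usecols` argument tells pandas to load only the listed columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the ~40-column CSV export (only a few columns shown).
RAW_CSV = """full_name,class,first_obs,diameter,albedo,e
Object A,APO,1998-03-01,1.2,0.15,0.41
Object B,JFc,2015-07-12,,0.04,0.55
Object C,AMO,2007-11-30,0.4,0.22,0.37
"""

# The subset of columns we want to analyze (hypothetical names).
COLUMN_LIST = ["full_name", "class", "first_obs", "diameter"]

# usecols keeps only the listed columns; in the notebook this would be a file path.
df = pd.read_csv(io.StringIO(RAW_CSV), usecols=COLUMN_LIST)
print(df.shape)  # (3, 4)
```

Note that `usecols` keeps the columns in the order they appear in the file, not the order of the list; reindex with `df[COLUMN_LIST]` if the order matters.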

To make the data a little easier to understand, I've renamed my columns. You can, of course, just use the existing column names. Next, we display the first five rows to get a quick view of the data. I've also split the data into asteroids and comets based on the orbit class, so with pandas we've broken the data up a bit.
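The rename, preview, and split steps above could look roughly like this. It is a sketch on hypothetical sample rows: the rename mapping is illustrative, and the set of comet orbit-class codes is an assumption modeled on JPL-style class codes, not taken from the video's notebook. Anything not in that set is treated as an asteroid.

```python
import pandas as pd

# Hypothetical sample rows using raw, SBDB-style column names.
df = pd.DataFrame(
    {
        "full_name": ["Object A", "Object B", "Object C"],
        "class": ["APO", "JFc", "AMO"],
        "first_obs": ["1998-03-01", "2015-07-12", "2007-11-30"],
    }
)

# Friendlier display names; this mapping is an illustrative assumption.
RENAMES = {"full_name": "Name", "class": "Orbit Class", "first_obs": "First Observed"}
df = df.rename(columns=RENAMES)

# Quick look at the first five rows (only three exist in this sample).
print(df.head())

# Assumed comet orbit-class codes; every other class is treated as an asteroid.
COMET_CLASSES = {"COM", "JFc", "JFC", "HTC", "ETc", "CTc", "PAR", "HYP"}
is_comet = df["Orbit Class"].isin(COMET_CLASSES)
comets = df[is_comet]
asteroids = df[~is_comet]
```

Using a boolean mask and its negation guarantees every row lands in exactly one of the two frames.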

Now, using Seaborn, you can start to visualize that data, and there are a few things we're doing here to observe it. We're also using Plotly for a more interactive visualization. We can see that the majority of the comets have been discovered in the last two decades.
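A claim like "most comets were discovered in the last two decades" can be checked numerically before plotting. This is a sketch on hypothetical first-observation dates (not the NASA data). It buckets discovery years by decade, computes the share discovered since 2000, and wraps a Plotly Express histogram in a function so Plotly is only imported when you actually plot.

```python
import pandas as pd

# Hypothetical comet sample with first-observation dates (not real NASA data).
comets = pd.DataFrame(
    {"First Observed": ["1986-02-09", "2004-05-20", "2013-09-01", "2019-12-30"]}
)

# Derive the discovery year, then bucket it into decades.
comets["Year"] = pd.to_datetime(comets["First Observed"]).dt.year
comets["Decade"] = (comets["Year"] // 10) * 10
per_decade = comets["Decade"].value_counts().sort_index()
print(per_decade)

# Share of objects first observed in 2000 or later.
recent_share = (comets["Year"] >= 2000).mean()
print(f"{recent_share:.0%} discovered since 2000")  # 75% discovered since 2000


def plot_discoveries(frame: pd.DataFrame):
    """Interactive histogram of discovery years; requires plotly to be installed."""
    import plotly.express as px

    return px.histogram(frame, x="Year", title="Comet discoveries by year")
```

In a notebook, calling `plot_discoveries(comets)` as the last expression of a cell renders the interactive chart inline.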

Going further down, there are other ways we can look at the data. But the point is that we've plotted our data a few different ways, we can describe and summarize it, and we can even embed other content in this document. In this case, we embed HTML to show some visuals from NASA, and so on. You get the basic idea. There's also a data glossary here.
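Here is a minimal sketch of the "describe" and "embed HTML" steps. The numeric column and the image URL are hypothetical placeholders. `DataFrame.describe()` produces count, mean, standard deviation, min, quartiles, and max, and `IPython.display.HTML` renders a raw HTML string inline in Jupyter; outside Jupyter the snippet just prints the HTML.

```python
import html

import pandas as pd

# Hypothetical numeric column to summarize.
df = pd.DataFrame({"Diameter (km)": [0.4, 1.2, 3.5, 0.9]})

# count, mean, std, min, 25%, 50%, 75%, max for each numeric column.
summary = df.describe()
print(summary)


def image_embed(src: str, caption: str) -> str:
    """Build an HTML figure for an image, escaping the URL and caption."""
    return (
        f'<figure><img src="{html.escape(src, quote=True)}" width="600">'
        f"<figcaption>{html.escape(caption)}</figcaption></figure>"
    )


# Placeholder URL; substitute the real image you want to show.
snippet = image_embed("https://example.com/neo.jpg", "Near-Earth object (placeholder)")

# Inside Jupyter, render the snippet inline; otherwise fall back to printing it.
try:
    from IPython.display import HTML, display

    display(HTML(snippet))
except ImportError:
    print(snippet)
```

Escaping the caption keeps characters like `&` or `<` in glossary text from breaking the rendered HTML.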

Publishing to RStudio Connect

Now let's take this Jupyter notebook and publish it directly to RStudio Connect. To do that, you'll need to open a new session in RStudio Workbench. We can build our notebook in either a JupyterLab session or a Jupyter Notebook session, but to publish it, we'll use a Jupyter Notebook session inside RStudio Workbench. Once this session is open, we'll see an additional option for publishing to RStudio Connect.

So we'll open that same notebook. Once it has loaded (give it a moment), we have the option to publish directly to RStudio Connect. The first time we do this, a dialog pops up asking for the server address and a name for the server.

We'll also need an API key from RStudio Connect. Once we've logged in to RStudio Connect, we see the user icon in the top right corner. Let's select it, and then select API Keys to set up this initial key. We'll create a new key here and copy it.
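Before pasting a new key into the publishing dialog, you may want to confirm it works. This sketch assumes the Connect Server API's current-user endpoint at `/__api__/v1/user` and the `Authorization: Key <api-key>` header format; check the API reference for your Connect version. The environment variable names `CONNECT_SERVER` and `CONNECT_API_KEY` are an assumed convention, used so the key is not pasted into the notebook itself.

```python
import os
import urllib.request


def connect_user_request(server: str, api_key: str) -> urllib.request.Request:
    """Build a request for the (assumed) current-user endpoint, authenticated by API key."""
    url = server.rstrip("/") + "/__api__/v1/user"
    # Connect API keys are sent as "Key <api-key>" in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Key {api_key}"})


# Read credentials from the environment; only contact the server if both are set.
server = os.environ.get("CONNECT_SERVER")
key = os.environ.get("CONNECT_API_KEY")
if server and key:
    with urllib.request.urlopen(connect_user_request(server, key)) as resp:
        # A 200 status means the key was accepted; print the start of the JSON body.
        print(resp.status, resp.read(200))
```

A rejected key makes `urlopen` raise `urllib.error.HTTPError`, which is a quick signal to generate a new one.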

Then we'll close this and go back to RStudio Workbench. We'll paste in the API key from RStudio Connect, name the server "test server", and keep the default secure connection settings. We'll add this server. When we publish, we have the option to either publish the document with its source code or publish only the finished document as a static notebook. In this case, we want to publish the document with the source code. We'll also select the files associated with the notebook: the CSV file and an image file that the notebook uses.
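For the terminal route from JupyterLab mentioned in the description, the same choices map onto the `rsconnect-python` command-line tool. This sketch only builds the argument lists, and the flags shown (`add --server/--name/--api-key`, `deploy notebook --name/--static`) are assumptions to verify with `rsconnect deploy notebook --help`. The server URL, notebook name, and extra file names are hypothetical.

```python
import os
import shlex


def rsconnect_commands(server_url, nickname, api_key, notebook, extra_files=(), static=False):
    """Build argv lists for registering a server and deploying a notebook (sketch)."""
    add_server = [
        "rsconnect", "add",
        "--server", server_url,
        "--name", nickname,
        "--api-key", api_key,
    ]
    deploy = ["rsconnect", "deploy", "notebook", "--name", nickname]
    if static:
        # Publish only the rendered document, without source code.
        deploy.append("--static")
    # The notebook comes first, followed by supporting files (CSV, images).
    deploy += [notebook, *extra_files]
    return add_server, deploy


add_cmd, deploy_cmd = rsconnect_commands(
    "https://connect.example.com",  # hypothetical server URL
    "test-server",
    os.environ.get("CONNECT_API_KEY", "<api-key>"),
    "space_tracker.ipynb",  # assumed notebook file name
    extra_files=["neo_data.csv", "header.png"],  # hypothetical file names
)
# Print only the deploy command; add_cmd contains the API key.
print(shlex.join(deploy_cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute it; keeping the argv as a list avoids shell-quoting problems with file names.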

Okay, now we're going to publish using the existing requirements.txt file. We could also generate one if we needed to, but in this case we already have one. We'll publish it to a new location, and then click Publish.
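If you maintain `requirements.txt` yourself rather than letting the dialog generate one, a small helper can pin the packages the notebook imports to the versions installed in your session. This is a sketch that pins only an explicit package list (assumed here to be pandas, Seaborn, and Plotly), which is a different approach from freezing the whole environment. It uses `importlib.metadata` from the standard library and skips packages that are not installed.

```python
from importlib import metadata


def pinned_requirements(packages):
    """Return 'name==version' lines for the listed packages installed here."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Skip anything not installed rather than writing an unpinned guess.
            continue
    return lines


def write_requirements(path, packages):
    """Write the pinned lines to a requirements file and return them."""
    lines = pinned_requirements(packages)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return lines


# Packages the notebook imports (assumed list); preview before writing.
print("\n".join(pinned_requirements(["pandas", "seaborn", "plotly"])))
```

Calling `write_requirements("requirements.txt", [...])` next to the notebook produces the file the publishing step picks up.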

Sharing with stakeholders on RStudio Connect

Let's take a look at this space tracker notebook inside RStudio Connect. It loads with our table and our plots, and all of the HTML content works within RStudio Connect. The great thing about this is that we can now share this data and information with end users in a way that we control.


We could set it up so that anyone who can log in can view this document, or we may want to give access only to specific users. In this case, we'll give a specific user profile, a publisher, access. Then we can copy this URL and share it with them to use anywhere.
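The access settings shown in the UI can also be scripted. This sketch assumes the Connect Server API accepts `PATCH /__api__/v1/content/{guid}` with an `access_type` of `all`, `logged_in`, or `acl`, and `POST /__api__/v1/content/{guid}/permissions` to add a viewer; these endpoints and field names are assumptions to check against your server's API reference. The functions only build requests; sending them is left to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Assumed values for a content item's access_type in the Connect Server API.
ACCESS_TYPES = ("all", "logged_in", "acl")


def _json_request(method, url, api_key, body):
    """Build an authenticated JSON request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Key {api_key}", "Content-Type": "application/json"},
        method=method,
    )


def set_access_type(server, api_key, content_guid, access_type):
    """PATCH the content item so only the chosen audience can view it."""
    if access_type not in ACCESS_TYPES:
        raise ValueError(f"access_type must be one of {ACCESS_TYPES}")
    url = f"{server.rstrip('/')}/__api__/v1/content/{content_guid}"
    return _json_request("PATCH", url, api_key, {"access_type": access_type})


def grant_viewer(server, api_key, content_guid, user_guid):
    """POST a viewer permission for one user on the content item."""
    url = f"{server.rstrip('/')}/__api__/v1/content/{content_guid}/permissions"
    body = {"principal_guid": user_guid, "principal_type": "user", "role": "viewer"}
    return _json_request("POST", url, api_key, body)
```

Restricting to `acl` first and then adding viewers mirrors choosing "specific users" in the Connect access panel.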

Another thing you can see here is that we can schedule this document to run at a particular time. In this case, I'm using a CSV file, but you could connect to live data instead, and the schedule would rerun the document at that time and even email the refreshed results to your users. This is really helpful for keeping the information current.
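For a scheduled rerun to pick up fresh data, the notebook's loading cell should not hard-code a stale snapshot. One minimal pattern, sketched under assumptions: read the data source from an environment variable (the name `SPACE_TRACKER_DATA` and the fallback file `neo_data.csv` are hypothetical) and record when the data was pulled, so each scheduled run is visibly stamped. This assumes your Connect server lets you set environment variables for the content item.

```python
import os
from datetime import datetime, timezone

import pandas as pd

# Hypothetical fallback; point SPACE_TRACKER_DATA at a live export or URL when available.
DEFAULT_SOURCE = "neo_data.csv"


def load_latest(source=None):
    """Load the dataset and return it with a UTC timestamp of when it was read."""
    source = source or os.environ.get("SPACE_TRACKER_DATA", DEFAULT_SOURCE)
    frame = pd.read_csv(source)  # read_csv accepts local paths and URLs
    refreshed_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return frame, refreshed_at


# In the notebook:
#   df, refreshed_at = load_latest()
#   print(f"Data as of {refreshed_at}")
```

Printing the timestamp near the top of the report lets readers of the emailed copy see exactly how current the numbers are.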

Now the end user has full access and can easily view this notebook from anywhere, with the data kept up to date.

Wow. That's really helpful. Thanks for watching. And we'll see you next time.