Introducing Notebooks | RStudio Webinar - 2016

Transcript

This transcript was generated automatically and may contain errors.

Like Bill said, my name is Jonathan. I've been at RStudio for about three years, and I'm really excited to show you this because we've been working on it for quite some time and are only just now starting to show it to people. One thing you should know before we spend a lot of time discussing it: if you want to use R Notebooks and you open whatever version of RStudio you have installed, chances are you'll be disappointed, because the current stable version of RStudio does not have Notebooks.

We consider this feature complete, but it is still in development. So if you want to play with Notebooks, you can do it today, but you'll need the preview version of RStudio. Everything I'm about to show you is available today, but only in the RStudio preview.

So let me give you an idea of what we'll be talking about today. We'll be talking about some background concerning Notebooks and data analysis workflows. I'll talk just briefly about Notebooks themselves. We will spend almost all of our time doing a hands-on demo. This is a very interactive feature and we will devote most of our time to showing you exactly how it works. And finally, we will have some time at the end like Bill mentioned for questions and answers.

So, the idea here is that it should be very easy for you to take one little bit of code, execute it, and see exactly what that one line of code did.

So, we've talked about what it's like to run... oh, one more thing. This is a small thing, but something people have asked us for for a while, and we finally took the time to do it. You'll notice that I didn't have to select this whole statement to run it. That's because we now detect where your statement begins and ends. When you press Command-Enter on the Mac, or Control-Enter on other operating systems, we run the entire statement instead of just the current line. This keeps you out of an awkward situation where R is waiting for you to finish the statement but you don't know it's waiting, because in this mode you typically aren't looking at the console.
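To make the statement-detection idea concrete, here is a simplified sketch in Python. It is not RStudio's implementation. It assumes that an R statement continues while brackets are unbalanced or while a line ends with a dangling operator such as `+`, `,` or `%>%`. The function names are hypothetical, and the sketch ignores brackets that appear inside strings or comments.

```python
# Characters that open or close a bracketed expression.
OPENERS = "([{"
CLOSERS = ")]}"
# Trailing tokens that, in R, mean the expression continues on the next line.
CONTINUATIONS = ("+", "-", "*", "/", ",", "|", "&", "%>%", "<-", "=")


def statement_spans(lines):
    """Group source lines into (start, end) index pairs, one per statement."""
    spans = []
    start = None
    depth = 0
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines neither start nor end a statement
        if start is None:
            start = i
        for ch in stripped:
            if ch in OPENERS:
                depth += 1
            elif ch in CLOSERS:
                depth -= 1
        # The statement ends once brackets balance and no operator dangles.
        if depth <= 0 and not stripped.endswith(CONTINUATIONS):
            spans.append((start, i))
            start = None
            depth = 0
    if start is not None:
        spans.append((start, len(lines) - 1))  # incomplete statement at end of file
    return spans


def statement_at(lines, cursor):
    """Return the lines of the statement containing the cursor line, if any."""
    for start, end in statement_spans(lines):
        if start <= cursor <= end:
            return lines[start:end + 1]
    return []


source = [
    "x <- c(1, 2,",
    "       3, 4)",
    "",
    "mean(x)",
]
print(statement_at(source, 1))  # cursor on the second line of the first statement
```

With the cursor on either of the first two lines, both lines are sent together, which matches the behavior described above.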

Multi-language support

Alright, so we've talked about the different kinds of output you could have inside of a notebook, we've talked about what it's like to run code inside the notebook and interact with R, and again, you should think of this as really a new way to interact with and have a conversation with the R interpreter. But one thing that you might not be aware of is that R Markdown, and really by extension notebooks, has support not only for R code, but also for code written in other languages. So you're not actually just limited to R chunks, you can have code in Bash, or Python, or C++, or a variety of other languages.

Here, for instance, is a code chunk written in Python. It prints out the Fibonacci sequence, and here I'm asking it for the first 11 terms. There's no R here at all; it's all Python. Even so, I can run it just like any other chunk. You can see that it fired up the Python interpreter and fed this code into it, and the result appears right beneath the chunk.
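The transcript doesn't show the demo's exact code, so the Python below is a reconstruction. It assumes the sequence starts at 0, 1. In the notebook, this code would sit inside a chunk whose header is `{python}` instead of `{r}`.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting from 0 and 1."""
    terms = []
    a, b = 0, 1
    for _ in range(n):
        terms.append(a)
        a, b = b, a + b  # advance the pair one step along the sequence
    return terms


print(fibonacci(11))
```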

This also works with C++. Here's another example that finds a particular term in the Fibonacci sequence using the Rcpp engine. I'm making an Rcpp function, which takes a couple of minutes because it needs to compile, and then I can use that function here to compute a particular term in the sequence. So the idea is that a notebook is not just a way to mingle R with your documentation and produce something you can share. You can also use it to compose a workflow that draws on tool chains and systems from a variety of different languages. A notebook lets you build your data analysis workflow in a very language-agnostic way, even though R is the host for your system.
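Roughly speaking, "firing up an interpreter and feeding it the code" amounts to dispatching each chunk by its language. Below is a minimal Python sketch of that dispatch. The engine table and `run_chunk` are hypothetical and are not knitr's engine API. Each chunk runs in a fresh process, so no variables carry over from one chunk to the next.

```python
import subprocess
import sys

# Hypothetical engine table: chunk language -> command that takes code as its last argument.
ENGINES = {
    "python": [sys.executable, "-c"],
    "bash": ["bash", "-c"],
}


def run_chunk(engine, code):
    """Feed one chunk's code to an external interpreter and return its stdout."""
    if engine not in ENGINES:
        raise ValueError(f"no engine registered for {engine!r}")
    result = subprocess.run(
        ENGINES[engine] + [code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the interpreter exits non-zero
    )
    return result.stdout


print(run_chunk("python", "print(sum(range(11)))"))
```

Running a separate process per chunk keeps the sketch simple. A real host could instead keep one interpreter alive per engine so that state carries over between chunks.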

Error management

So we've talked a bit about what you can produce, how to run code, and how to run code in different engines. Now let's talk a little bit about error management. Sometimes your code is going to generate errors; it's just going to happen, and we have built a number of error management features into the notebook. Here is some code that generates an error. I specified error = TRUE so this code can run without stopping the notebook. If I run it, you'll see that I get this error, which basically tells me what happened.

You'll notice a couple of things happen when we get an error. First, we color the gutter to tell you which line generated the error. In this case there's really only one line. But if you have a long multi-line statement, it's very helpful to know which line actually generated the error, so we highlight that for you. We'll also typically show you a traceback, which tells you where the error came from. If I click Show Traceback here, you can see it tells me that you called source, source called file, and when file tried to open the file, it couldn't find it.
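Here is a rough Python analogue of what `error = TRUE` plus the gutter highlight and traceback give you. A failing chunk is recorded instead of stopping everything, and the record includes the line to highlight and the call trace. This is a sketch only. `run_chunk_capturing_errors` and the `"<chunk>"` filename label are hypothetical, and R's actual error handling works differently.

```python
import traceback


def run_chunk_capturing_errors(code, error=True):
    """Run one chunk of Python code.

    With error=True, a failure is recorded (message, failing line, call trace)
    instead of stopping; with error=False the exception propagates.
    """
    namespace = {}
    try:
        exec(compile(code, "<chunk>", "exec"), namespace)
    except Exception as exc:
        if not error:
            raise  # stop on the first error
        frames = traceback.extract_tb(exc.__traceback__)
        # The last frame inside the chunk is the line to highlight in the gutter.
        chunk_lines = [f.lineno for f in frames if f.filename == "<chunk>"]
        return {
            "ok": False,
            "message": f"{type(exc).__name__}: {exc}",
            "line": chunk_lines[-1] if chunk_lines else None,
            "trace": [f"{f.name} ({f.filename}:{f.lineno})" for f in frames],
        }
    return {"ok": True}


result = run_chunk_capturing_errors("x = 1\ny = open('definitely-missing-file-4821.txt')\n")
print(result["line"], result["message"])
```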

So we hope you'll find that when you encounter errors, again you're not going to need to dig into the console to figure out what line of code generated the error. That information as well as all the information you need to know where the error came from is going to be available to you right inside the notebook.

Running all chunks and saving notebooks

So we've talked quite a lot about how to run code in the notebook. We've covered line-by-line execution and running a whole chunk at once. But you won't always want to run individual chunks or lines. Often you'll want to bring your whole notebook into a consistent state, and we have built a number of tools for this. I'm not going to spend a lot of time on them.

It is very easy to run just the current chunk or the next one. There are also commands that run all the chunks above or below your current chunk, and these are very helpful for bringing your notebook into a consistent state. Let me show you how this works. I'll choose Run All Chunks Above, and you can see a little progress bar at the bottom of the notebook showing what's being run. When it's finished, you'll notice that I'm brought right to where the error is.
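The "run all above / run all below" idea can be sketched in Python as running a slice of the chunk list in one shared namespace. The sketch assumes that execution stops at the first failing chunk and reports that chunk's index so the editor can jump there. All three function names are hypothetical, and the `print` call stands in for the progress bar.

```python
def run_range(chunks, start, stop, namespace):
    """Run chunks[start:stop] in order in a shared namespace.

    Returns the index of the first chunk that fails, or None if all succeed.
    """
    total = stop - start
    for step, i in enumerate(range(start, stop), 1):
        print(f"[{step}/{total}] running chunk {i}")  # stand-in for the progress bar
        try:
            exec(chunks[i], namespace)
        except Exception:
            return i  # stop here so the editor can jump to this chunk
    return None


def run_all_above(chunks, current, namespace):
    """Run every chunk before the current one."""
    return run_range(chunks, 0, current, namespace)


def run_all_below(chunks, current, namespace):
    """Run every chunk after the current one."""
    return run_range(chunks, current + 1, len(chunks), namespace)


chunks = ["a = 1", "b = a + 1", "c = undefined_name", "d = 4"]
print(run_all_above(chunks, 4, {}))  # index of the chunk that raised, if any
```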

So there are lots of tools for running your notebook, much as you would knit an R Markdown document, and I'll talk about the difference between running code in a notebook and knitting in just a minute. I'm going to switch now to RStudio and talk a little bit about how to save and share these things.

So here is RStudio. I'm going to bring my notebook back in here. You'll notice that when you open a notebook in RStudio, it typically maximizes the editor pane so that the console gets out of your way, because most of the time you don't need it. It's still there if you do. You can open it up here, and you'll see that when you execute code inside the notebook, it is sent directly to the console and the result shows up there too. Most of the time that's redundant, so you won't need to look at it.

I promised we would look a little bit at saving and sharing, so let's do that. If you're familiar with R Markdown, you probably know that when you change your code, you have to completely re-render the document to update it. There's a command at the top called Knit. For example, here I have this other document, which contains my presentation. To update that document, I need to knit it, which means re-rendering the whole thing, re-running all the code chunks, and so forth.

In a notebook, however, I don't need to re-run all the code chunks to see the document. If you think about it, I have already run these chunks, and their output is already available. So in a notebook we can generate the HTML result without re-running any of your code. All we need to do is render the document part and then insert the output you've already generated.
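The idea of building the HTML from already-captured output, with no code re-run, can be sketched in Python like this. The tuple-based document model and `render_notebook` are hypothetical simplifications. The sketch also assumes output is stored per chunk id from the last time each chunk ran.

```python
import html


def render_notebook(parts, outputs):
    """Build HTML from document parts plus chunk output captured earlier.

    parts: ("text", prose) or ("chunk", chunk_id, code) tuples, in document order.
    outputs: chunk_id -> output text saved when that chunk was last run.
    Nothing is executed here; a chunk that has never been run shows no output.
    """
    pieces = []
    for part in parts:
        if part[0] == "text":
            pieces.append(f"<p>{html.escape(part[1])}</p>")
        else:
            _, chunk_id, code = part
            pieces.append(f'<pre class="code">{html.escape(code)}</pre>')
            if chunk_id in outputs:
                pieces.append(f'<pre class="output">{html.escape(outputs[chunk_id])}</pre>')
    return "\n".join(pieces)


doc = [("text", "Some random numbers:"), ("chunk", "c1", "print(3 > 2)")]
print(render_notebook(doc, {"c1": "True"}))
```

Because rendering is only string assembly over cached results, it is cheap enough to repeat on every save. Knitting, by contrast, has to execute every chunk again.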

So let me show you what that looks like. I'll hit Preview here and maximize this to make it a little easier to see. You can see we have exactly the notebook I've been working on, with all of my code: there's my sequence, there's my plot, there's my interactive widget, and there are all of my random numbers that I really enjoy generating. All of the input and output from my session is right here inside this file. And again, generating it is not like generating an R Markdown document, where I have to re-render the whole thing. Whenever I save the notebook, this file is updated.

So for instance, let's say now I want to generate, I don't know, maybe 200 numbers. Now I'll save this, and you can see that it's going to update here with a sequence of 200 numbers right on the side. So every time I update this thing, it's going to update the preview here on the right. So you can see these things are much easier to iterate on than a traditional R Markdown document because the whole step of re-rendering things is gone. It's a much more immediate system where all you have to do is save it, and it will instantly update with the file.

Now it's important to note that this file that I'm viewing right now, this HTML file, it's not only generated when I have it open here. This is actually generated every time you save any notebook. Let me say that one time again because it's important to understand when we later talk about how these things work. Whenever you save any notebook file, we will generate this HTML file for you. This HTML file is basically the way that you can save and share the notebook content.

In some other notebook formats, there's really no separation between your code and your output, like it's all in one file. In our notebooks, this is not the case. You have a very clean R Markdown document that contains just your code, but you also have this file. We call it the nb.html file because we're not very creative and couldn't come up with a cooler name. Anyway, we have this file which contains not only your code, but also a beautiful rendered copy of your document as well as the output. This thing contains your code and your output, and it even contains the source code for the notebook.
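The save behavior described above can be sketched in Python. A clean `.Rmd` is written, then its `.nb.html` companion is regenerated next to it on every save. The `demo.Rmd` to `demo.nb.html` naming follows the transcript. `save_notebook` and the `render` callable are hypothetical placeholders for whatever actually builds the HTML.

```python
from pathlib import Path


def nb_html_path(rmd_path):
    """Map demo.Rmd to demo.nb.html in the same directory."""
    p = Path(rmd_path)
    return p.with_name(p.stem + ".nb.html")


def save_notebook(rmd_path, source, render):
    """Write the clean .Rmd source, then regenerate its .nb.html companion.

    render: hypothetical callable that turns the source (plus cached output)
    into the self-contained HTML file.
    """
    rmd_path = Path(rmd_path)
    rmd_path.write_text(source, encoding="utf-8")
    html_path = nb_html_path(rmd_path)
    html_path.write_text(render(source), encoding="utf-8")
    return html_path


print(nb_html_path("analysis/demo.Rmd").name)
```

Keeping the two files separate leaves the `.Rmd` free of output, which is the separation the transcript contrasts with single-file notebook formats.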

If you were to send this file to somebody, here's what they would see. You can see that this is just a nice rendered copy of my notebook with the output that I've created, and if somebody wants to work on this notebook, they can actually download the original RMD file right here. The RMD file is embedded inside the HTML. One wonderful thing about this is that this file that you share with somebody is not in some sort of proprietary format that they're going to need a viewer to open. You can send somebody one of these notebooks, or one of your rendered notebooks, and it's just an HTML file. You can open it in a web browser as I have done right here.


It is very, very easy to use, and if they want to sort of continue your analysis, it's very easy for them to download the RMD and do that. There's one other really interesting trick that you can do here, for which I will switch back to RStudio.

I just showed you that you can take that HTML file, open it in a web browser, and use it to publish your notebooks. It's just plain HTML, so really any web host will do. You can also use this HTML file to share with somebody not only the code, but also the output. To demonstrate that, I'm going to do something a little risky in this demo: I'm going to close this RMD file and then delete it. Here's the demonstration RMD file I've been working with. I'm just going to delete that file. Alright, it's gone. So now I no longer have my RMD file; all I have is the notebook HTML file.

So imagine, for instance, that somebody sent me this HTML file. If you open it in a web browser, you'll see what I just showed you, which is a rendered copy of the notebook. However, if you open the HTML file in RStudio, it automatically extracts the RMD file, so you can continue working on the notebook as though it were your own. Let me show you how that works. I'm going to click Open in Editor here, and you can see that it opened the RMD file, which just got created here. RStudio took the HTML file, extracted the embedded copy of the notebook's code, and gave me back the same editing experience I had before. So this HTML file is a sort of magical jack of all trades. You can view it in a web browser, or you can open it in RStudio and keep editing the notebook. It is a great way to bundle up your notebooks and share them with anybody.
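Here is a minimal Python sketch of the embed-and-extract round trip. It assumes the source is stored base64-encoded inside a hidden element of the HTML. The element id and both function names are hypothetical stand-ins, and the actual nb.html layout RStudio uses may differ.

```python
import base64
import re

# Hypothetical marker for the hidden element that carries the source.
SOURCE_ID = "rmd-source-code"


def embed_source(rendered_html, rmd_source):
    """Append the .Rmd source to rendered HTML as a hidden base64 payload."""
    payload = base64.b64encode(rmd_source.encode("utf-8")).decode("ascii")
    hidden = f'<div id="{SOURCE_ID}" style="display:none">{payload}</div>'
    return rendered_html + "\n" + hidden


def extract_source(notebook_html):
    """Recover the embedded .Rmd source, or None if the file carries none."""
    match = re.search(rf'<div id="{SOURCE_ID}"[^>]*>([A-Za-z0-9+/=]*)</div>', notebook_html)
    if match is None:
        return None
    return base64.b64decode(match.group(1)).decode("utf-8")


page = embed_source("<h1>Demo</h1>", "---\ntitle: Demo\n---\n\nSome text.\n")
print(extract_source(page))
```

A browser ignores the hidden payload and shows only the rendered page. An editor that knows where to look can rebuild the original source file, which is the dual behavior shown in the demo.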