Resources

Garrett Grolemund | Reproducibility in Production | RStudio (2019)

https://rstudio.com/resources/webinars/reproducibility-in-production/

In part 1 of this three-part series, Garrett covers the following: Computational documents offer limitless opportunities for your business. With them, your consumers can rerun your report with new parameters, apply your analysis to new data, or schedule future, automatic updates to your work, all with the click of a button. This is the first in a three-part webinar series that will describe this new form of reproducibility. Here, we begin by showing you how to write executable R Markdown documents for a production environment. About Garrett: Garrett is the author of Hands-On Programming with R and co-author of R for Data Science and R Markdown: The Definitive Guide. He is a Data Scientist at RStudio and holds a Ph.D. in Statistics, but specializes in teaching. He's taught people how to use R at over 50 government agencies, small businesses, and multi-billion dollar global companies; and he's designed RStudio's training materials for R, Shiny, R Markdown and more. Garrett wrote the popular lubridate package for dates and times in R and creates the RStudio cheat sheets.

Transcript

This transcript was generated automatically and may contain errors.

Thank you, everyone, for attending. This will be the first in a series of three webinars that focuses on how to use reproducibility in a business setting. And the real gist of what we're doing here is, over the past few years, as you may know, academics have been very concerned with the problem of reproducibility in data science research. Here's just one of many headlines that the public has read about reproducibility. This one comes from Forbes. But there's been a quote-unquote reproducibility crisis in data science research. And there's been a lot of progress made to address that crisis. And from a technological standpoint, the problem's been solved, in my opinion. But there are many cultural changes that still need to be made. The cultural changes don't concern us so much as the technological solution.

The new technology that data scientists are using to make their work reproducible has created unintentional benefits for people who use data science in a business setting. And that's what these webinars will look at. The first webinar, which you're at today, is reproducibility in production. And I'm going to talk about the technological solutions to the reproducibility crisis, specifically one type of technology that I'm going to call computational documents, which are just documents with executable code inside of them. I'll show you how you can use those to create opportunities both for yourself and the people who consume your data science intelligence, whether that's customers, clients, bosses, colleagues, so on.

In the second webinar on September 18th, Thomas Mock will talk about RStudio Connect in production. RStudio Connect is a production platform that allows you to share the types of documents that we'll be talking about today. And it really completes the circle in terms of providing opportunities and advantages in an everyday business situation. In the final webinar, Kelly O'Briant will talk about interactivity, which is probably the largest opportunity created by the technology of computational documents. She'll go over the best practices for making interactive material and tell you the things that you should be considering as you develop interactive material for others to consume. That webinar will take place on October 2nd.

What are computational documents?

So for today, let's start by looking at what computational documents are. This is what I mean by computational document. Let me exit the slides for a moment. Let's go to RStudio. This is a computational document. It's just a regular document, a file. You can see text in here. But it has executable code embedded throughout the file. And a computational document is just a document that contains executable code. And when you can put executable code into a document, then you can allow the reader of the document to run the code. And the code can do things for them. As someone who knows how to write code, you could write whatever code you like to do whatever you want the reader to be able to do with your document.

This document contains a data analysis, a very simple one, as a matter of fact. And the code here just recreates the analysis. And this is typical of what computational documents do. They're a way to solve the reproducibility crisis by putting the actual reproducible data analysis in your report. So whoever you pass the report on to has everything they need to recreate the data analysis. For example, I can run the code that's in this document with these buttons and reproduce the document as I go. So here this analysis is making some graphs. We'll look at it in detail later. And then the other thing that this particular type of document allows you to do is you can knit this report. Both the text that the author wrote in the document and the results of the code that runs in the document are combined. And so we see the text here with the code results. And it creates a finished presentation to pass on to someone who wants to know about our results, but maybe not about the code.

So that's the gist of a computational document. It is something that can contain embedded code. And because it can do that, it can automatically reproduce the data analysis done with code. And you can see here how computational documents are a definitive solution to the reproducibility crisis in data science. Everything we need to reproduce our work is here. But because we're reproducing it with code, not only can we just simply reproduce analysis, we can reproduce the analysis automatically.

That automation is what creates opportunities for businesses who want to use computational documents. And not only does it create opportunities, it's going to be a disruptive technology that changes how you think about delivering and performing data science if you deliver or perform or conduct data science as part of your job.

The printing press analogy

To put that in perspective, let me draw a historical analogy. Go back to the slides. This is a picture of the printing press. The printing press is a famous example of disruptive technology. When Johannes Gutenberg introduced the printing press into Europe in 1439, no one could really foresee the sort of changes that it would catalyze in society. Now, the video here is stressing the reproducible aspect of the printing press, but that's actually a red herring. You know, monks could reproduce books by hand before anyone came along with the printing press. What the printing press did to change society is it made the reproduction of books automatable. It took a little bit of initial setup, but after you did that, you could automatically reproduce books.

Now, compare that to a computational document. It takes a little bit of initial setup to write the document, to put the code you want inside the document, but after you do that setup, reproducing the analysis or reproducing the tasks that the code does is automatable. You could pass that document on to someone who doesn't know how to use code. They could automatically do what you designed that document to do. What sort of changes should you expect this to trigger downstream?

Well, again, consider the printing press. When it became automatable to reproduce books, many things changed. For the first time in history, it was now profitable to create a book to sell, and that gave rise to a huge new industry, the publishing industry, which is still with us today. For the first time in history, authors had considerable economic and political power, and that triggered new laws, copyright laws, which protected authors from, you know, publishers and people trying to steal their work, and also censorship laws, which protected the public, supposedly, from the authors. Another thing that's very prominent when people talk about printing press is, for the first time in history, the Bible was no longer a black box. People could look at it without an intermediary in the form of a priest or the church, and they could develop new interpretations on what they were working with, and that led to new conflicts. And then just more generally, after it became automatable to reproduce this sort of work, it became beneficial, perhaps for the first time, for the common person to learn how to read. I think some of these changes are things that we'll see on a much smaller scale as computational documents become more and more used in our fields.

Benefits for authors and users

There's two main benefits of computational documents that we're going to focus on today, and the first benefit is for authors. When you use computational documents to report your results and to deliver the insights you develop with your data science, you can start reporting those results much more efficiently. You could reuse the same initial work, the same setup that you put into making one document, to create many, many documents. The second change that computational documents facilitate is for users. The people who consume your reports that you generate with computational documents can interact with the content in new ways, ways that might be very valuable to them, ways that they might be very excited to have and even willing to pay for.

So we're going to look at how you could create computational documents that you could use in either of these ways to make your life better.

R Markdown overview

For the remainder of the webinar, we'll look at R Markdown, which is a technology that makes it easy to create computational documents. I'll give you some tips about the features of R Markdown that I think have the largest payoff for using computational documents in a sort of business or production setting. And then finally, we'll think about how to include interactivity or not include interactivity in the documents that you make. Now throughout this webinar and the two webinars that follow in the series, we're going to use the word production. Now I want to be clear up front that production means different things to different people, and we're aware of that. So please don't be distracted by the word production if it shows up in these webinars. It just means that the materials we're showing you are ready to use however you want to use them and with whoever you want to use them with.

All right, so we've looked at the promise of computational documents. Now let's look at a way for you to make computational documents with R. There are many different ways to make computational documents, but we're going to focus on just one known as R Markdown. And again, I'll leave the slides and go to a demo here.

The example document I showed you earlier right here is an R Markdown document, and R Markdown documents are just plain text files, but they contain three different types of content. So if we look at this file here, you'll see up here the file extension .Rmd. Don't let that fool you. Everything in this file is just a character and a plain character at that. So this is a plain text file, and that makes it easy to edit and to save. If you wanted to use version control on this file, say I have this file saved, so if I delete this and save it again, I could come over here to my Git tracker in RStudio, go to the webinar files, find this 02.Rmd, and I could diff it. And since it's just plain text, it's very easy to read what's changed here. This is a git diff. And it's also easy to save on GitHub and that sort of thing.

So we're looking at an R Markdown file, and we see that it does have text here. It also has code, as we saw before. And it has a header here, which is just some metadata about this file. The three types of content are demarcated by combinations of characters. That's what allows this to remain a plain text file. So if you want to have code in your file, you can surround it by three backticks. And any program that's aware of this file format will be able to recognize that there's executable code between those backticks. If you want to make a header, you can surround it by three dashes. And if you just want to write plain text, well, you can just write plain text.
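As a rough illustration of those three delimiters, here is a minimal sketch in R that stores a tiny R Markdown source as a character vector and locates the header and the code chunk by pattern matching. The title, the text, and the chunk contents are made up for illustration, and the two regular expressions are a simplification, not the rules that knitr actually applies.

```r
# A made-up R Markdown source with all three content types:
# a header, a line of text, and one code chunk.
rmd_lines <- c(
  "---",
  "title: \"Adverse events\"",
  "output: html_document",
  "---",
  "",
  "The graph below displays the *adverse events* reported for a drug.",
  "",
  "```{r}",
  "summary(cars)",
  "```"
)

# Three dashes on their own line open and close the header.
header_marks <- grep("^---$", rmd_lines)
# Three backticks open and close a code chunk.
chunk_marks <- grep("^```", rmd_lines)

# Everything between each pair of markers.
header <- rmd_lines[(header_marks[1] + 1):(header_marks[2] - 1)]
code <- rmd_lines[(chunk_marks[1] + 1):(chunk_marks[2] - 1)]

print(header)
print(code)
```

Because the markers are ordinary characters, the file stays plain text and any tool can find the pieces with simple text matching.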

Since this is a plain text file, you can open it in any text editor. But I would recommend opening it in the RStudio editor because RStudio is designed in some ways to be a text editor for R Markdown files. The RStudio IDE will automatically locate your code chunks in here, surrounded by the backticks, and will allow you to play them in that fashion I showed earlier. You can use this as you write your document to make the document effectively a notebook, a notebook interface. You can write your code, test it out, see if it does what you want. You can make the results go away if you don't like looking at them. But if you do like looking at them, they appear right here next to your code. And then finally, the RStudio IDE provides a knit button at the top of your document, which allows you to use this plain text document to generate a new document that contains the same content.

At the moment, this document is set up to generate an HTML file. The HTML file has everything that's in the original document, including the code results. And in this case, I asked R Markdown not to include the code itself because I thought that'd be distracting. And you can see this is an HTML file. I could share it on a website or not. RStudio opens up a preview so I could see the results and read them. And then if I look down here, I will see that the actual file being previewed exists in my directory, and that's how I could take it and share it and do something with it. Like I said, this document's set up to return HTML output, but I could use the dropdown menu here, and I could create the same content as a PDF file.

None of this really affects the plain text file. It creates a new separate file. This file is a PDF file, so if I was submitting to a journal perhaps or going to print this out, maybe I would use the PDF version instead of the HTML version. And then also I can make a Microsoft Word version. These are just some defaults that come in RStudio for making these files, but the idea behind R Markdown is you can use your plain text file to generate a report in a limitless number of formats. Each report will contain the same content, but it'll be adapted to a new format. This website, rmarkdown.rstudio.com/formats, has an enumerated list of the options you have available for changing your plain text R Markdown files into new types of files. Each of these is a function or an R package that can handle an R Markdown file and turn it into something new. You'll see you can make books, websites, blogs, packages, no shortage of things with R Markdown files.

I'll click through to the flexdashboard one. This is a way to make interactive, well, to make BI dashboards with R Markdown files. And, you know, if we look at some of the examples, maybe click through on one of them. Here's the dashboard. It's made with R Markdown. If we look at the source code here, you can see, you know, here's the familiar R Markdown header separated by dashes. Here are some code chunks separated by backticks. And, you know, there's some formatting here that tells us what should be in a column and whatnot.

So, to recap the strengths of R Markdown, they're just plain text files, which makes them very portable, very usable. But you can embed code in them, which makes them a true computational document. When you author an R Markdown file, you can use the RStudio IDE to have a notebook interface if you like that experience for testing out your code. And then, finally, the whole idea behind R Markdown is that even though you're working in a plain text file, you could generate a report that's not plain at all. And, in fact, the report can be in a variety of different formats. So, in my example, I just used the knit button in RStudio. But it is worth noting that you could also generate these things programmatically. There is a function in the rmarkdown package called render. And if you send it the file path to your file, it will render the file for you programmatically. So, you don't even need the RStudio IDE, you just need an R terminal to render these files. If you do it programmatically, the RStudio IDE isn't going to open a preview or anything for you. But the file will be regenerated and it will be saved alongside your other file.
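To make the programmatic route concrete, here is a minimal sketch that writes a small, made-up report to a temporary file and passes its path to rmarkdown::render(). It assumes the rmarkdown package and pandoc are available and skips the render step if they are not. The file contents are illustrative only and are not the report from the demo.

```r
# Write a tiny, made-up report to a temporary file so the example stands alone.
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Programmatic render\"",
  "output: html_document",
  "---",
  "",
  "This file was rendered without the Knit button."
), rmd_file)

# render() needs the rmarkdown package and pandoc; skip if either is missing.
out_file <- NULL
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # The output format comes from the header; the result is saved
  # in the same directory as the source file.
  out_file <- rmarkdown::render(rmd_file, quiet = TRUE)
}

print(out_file)
```

No preview opens in this case. The returned value is the path of the generated file, which a script could then copy, email, or publish.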

Key R Markdown features for production

So, now that you know the value proposition of R Markdown, let's look at the features of R Markdown that will be the most successful for making computational documents to share with consumers of a business nature. I'm not going to show you every single thing you need to learn to create R Markdown files, because I've already documented all of this in a self-paced tutorial at this website. This tutorial takes about 45 minutes to maybe 90 minutes if you take your time to finish. And when you finish it, you will be an expert at R Markdown. It's very simple technology. You just have to learn the rules and they're easy to reapply. It will take you about the length of this webinar to learn this. But I'm not going to use the length of the webinar to teach it. Instead, I want to focus on the more high value features here. I'm also not going to go very far down into the weeds with R Markdown, because all of the details are documented in a book at this website, bookdown.org. The book is available in its entirety for free online at this website, and it answers any questions that you might have about R Markdown. Instead, what I'm going to focus on are the features that I think will pay off the most for you watching today.

All right. So first of all, first feature, how do you create an R Markdown file? If you want to create an R Markdown file, go to the RStudio IDE, go to File, New File, R Markdown, and when the wizard pops up, just click OK. This will open a file that has a template inside it. You can read the template if you want. It tells you a little about R Markdown format, but typically, you would just delete everything and start writing your file. Or you could just open a plain text file and start writing a file. Same thing.

So let's start writing a file. I'm going to try to recreate the report that I showed you earlier, and to make it easier, I'm just going to copy and paste some of the text over here. All right. So I'm starting my report. The graph below displays the adverse events reported for a drug. This report is going to look at prednisone, which I'm pretty sure I did not spell right there. And the most common side effect of prednisone I happen to know is that the drug is not always effective, which, as side effects go, is not the worst by far. Okay. So this is a report. This is a very simple report. I could write a report that goes on for pages, but right now, this will be enough for us.

When you write text in R Markdown, you can mark up the text. For example, if I put asterisks around it, then RStudio will know that I don't want those asterisks to be in the final report, but I want the words between those asterisks to be italicized in the final report. So here I have adverse events are in italics. There's about two dozen signals you could use to mark up the text in your final report. I'm not going to cover it today, but you could learn it in the tutorial.

Instead, I want to focus you on a more interesting way to change your text in reports. You could type out your text by hand, which I've done, or you could use R code to generate text for you. So first, let's put some R code in here. I'm going to just copy over the setup chunk from the report I'm trying to create, put it at the top of the document. What this chunk does is it loads some packages and a helper script. The script makes some functions. It takes a drug. It takes an age range. It queries the FDA's openFDA database to find out what adverse events or side effects are associated with that drug, in this case, prednisone, for people in this age range, and then it makes a data set out of those things. Right here, the data set is called events.

If I want to find out what the most common side effect is in the events data set, that's where drug ineffective comes from, I could look it up myself and type it out, or I could ask R to find it for me. So I have a script, it's in the webinar files, that contains all the code I'm using today. It's an R script. This might be how you typically write R code. There's no reason not to write your R code inside an R Markdown document. I just have this over here so I can copy and paste throughout the webinar. I've written this code earlier, events, dollar sign term, so it looks at the term column of the events data set, and it finds out which type of event has the maximum count. So this returns which side effect is the most common for this drug. If I want the results of this to finish my sentence, I could surround the code by backticks, and then after the first backtick, put the letter r and a space. Now when I knit this report, I can see that I have the most common side effect for this drug. The report runs that code and inserts the results right into my line of text here. So it says drug ineffective because that's what you get if you run this code, or run this code in the console, perhaps we'll see that. It adds it right into my report.

As long as my code returns something that's like a number or a piece of a character string that could be inserted into text, I can use that code to finish the report for me, and the reader won't know which parts of the report are hard written in with real text, and which parts are moving parts that can change as the results of the code that generates them change.
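Here is a small sketch of the kind of expression that inline code can carry. The events data frame below is a made-up stand-in with assumed column names term and count. The real data set in the demo comes from openFDA, and the exact code used in the webinar is not shown in the transcript.

```r
# Made-up stand-in for the events data set; values are illustrative only.
events <- data.frame(
  term = c("Nausea", "Drug ineffective", "Fatigue"),
  count = c(120, 310, 95),
  stringsAsFactors = FALSE
)

# The term with the largest count, returned as a single character string,
# which is the kind of value that can be dropped into a sentence.
most_common <- events$term[which.max(events$count)]
print(most_common)

# In the text of the .Rmd file, the same expression would sit between
# backticks, after the letter r and a space:
#   The most common side effect is `r events$term[which.max(events$count)]`.
```

If the data changes, the sentence changes with it the next time the report is knit.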

All right, so that's the first thing. Let's add some markdown to make adverse events a header here. If I wanted to put a code chunk in here, I could just type out the backticks. It's all, you know, just character conventions after all, but I prefer to use a keyboard shortcut, which in this case is option shift I, option control I, which inserts it for me, or you could go to the insert button up here, and it will insert a code chunk for you wherever you want it. An interesting thing revealed by the insert button is you don't have to insert only R code chunks. You can run any type of code you like in an R Markdown document. For example, I could run Python code in my document with the Python code chunk. You do need to tell R what type of code appears in a code chunk, so you type the name of the language inside curly braces after the three backticks. Now, I'm not a Python user, so I'm going to stick to R code today, but if you're interested in using other types of code in an R Markdown document, specifically Python code, I recommend that you check out Sean Lopp's recent webinar, R Studio in Python, A Love Story.

All right. Let's use this code to do something. Again, I'll borrow some code over here, and I will use this code to make a plot. Here's some ggplot2 code that makes a plot. I'll put it in this code chunk, and if I knit the document, now I've added a plot to my report, but the code that makes the plot also appears in the report. So, this is my second tip. If your consumer is not an R user, I wouldn't show them the code in your report. It makes an experience that they will process psychologically very differently than if you just make a plain text report like they're used to reading, and if you want to hide the code in your report, you could do that by adding a knitr chunk argument up here after the r in your code chunk, and the argument that I suggest you consider using is echo = FALSE. This will cause the results of the code to appear in your report, but it will hide the code itself from appearing. So, now when I knit this document, I have my chart, but I don't have the code that creates the chart, and to be honest, normally if you're making a graph, people don't care what code you're using to make the graph. They just want to inspect the graph, and that's what echo = FALSE does for you.
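As a hedged aside, the same option can also be set once for the whole document instead of chunk by chunk. The sketch below uses knitr::opts_chunk$set(), which is part of knitr. Setting the option globally is a variation on what the demo does, not what was shown on screen, and the code assumes the knitr package is installed.

```r
# Per-chunk form, typed in the chunk header of the .Rmd file:
#   {r echo=FALSE}
#
# Document-wide form, typically placed in a setup chunk near the top.
# Assumes the knitr package is installed; skipped otherwise.
if (requireNamespace("knitr", quietly = TRUE)) {
  # Hide the source code of every later chunk but keep its results.
  knitr::opts_chunk$set(echo = FALSE)

  # Read the option back to confirm the new default.
  echo_default <- knitr::opts_chunk$get("echo")
  print(echo_default)
}
```

An individual chunk can still override the default by putting echo=TRUE in its own header.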

All right. So, we're slowly building up a report. It's probably a bit of a whirlwind, but hopefully you're getting a sense that it doesn't take too much to write these reports. They're actually fairly simple. All right. echo = FALSE isn't the only knitr option to consider, but if you want to write more options, you can just start typing and then use RStudio's tab completion for a list of possible options, but learning about these options is a bit tricky, I confess, because echo and the other chunk options are not R functions, so you can't open an R help page to learn about them. So, what I recommend is that if you want to learn about chunk options in R Markdown, go to this website, yihui.name/knitr, and it contains basically the help pages for every chunk option in R Markdown. This would be the place to go to. It's the first result when you Google R Markdown code chunks or knitr code chunks.

Parameterizing reports

Now that we have some code, you could consider this a complete document. If we look at what we have over here, run again, the document says, okay, the graph below displays adverse events reported for prednisone, the most common is drug ineffective, and here are the other ones on the graph. This might be all your consumer is interested in, and you could pass this off to them, but maybe the consumer gets sick and they go to the doctor and he prescribes a second drug and they want to know what the adverse events are for that drug. You could recreate this analysis, create a new report for them, and send it off to them, but when you're working with code, usually you don't want to repeat yourself, one, because it's more work, two, it invites the chance to make more mistakes, but three, and most importantly, when you're working with code, it's normally not necessary to repeat yourself, and now that we've written this code, it's not necessary for us to repeat ourselves.

We can reuse this report in multiple places by adding report parameters, and to do that, first, we'll have to add a header. A header is a section of the report that doesn't appear in the finished document, but allows you to enter information like metadata that customizes how the report is generated. To make a header, surround it on each side by three dashes. It should come at the start of your document. It is a header. Here, you could do things like set the default output for the document. This would be the output that gets used if you call the render function instead of choosing one up here. One of the things you could set here is a params field. You could learn how to write YAML headers in the R Markdown tutorial, so I'm not going over it, but you could see it's key value pairs separated by colons. I'm going to create a parameter. I'm going to name it drug, and I'm going to give it an initial value of prednisone, or whatever you want it to be, and now that I've created a parameter in my header here, the code in my document can access the values of the parameters as elements of a list named params.

Instead of having a hard-coded prednisone here, I could just access params dollar sign drug, and this will return the value prednisone because I set drug to that value here. If I wanted more elements to be in that list, I could name whatever parameters I want and set them to whatever I want them to be. I do have to spell correctly, so let me fix this. Now that I've created some parameters, if I go to knit, it's going to knit my report. It's going to use the default value of the parameters, and I won't see any changes because I'm still talking about prednisone, but if I come down to knit with parameters, I will have the option before I knit the report to change either of these parameters, so we've been looking at prednisone, but let's look at a different drug like Tylenol. Now when I knit the report, wherever params dollar sign drug appears in my code, it will return Tylenol, and if I wrote my report in the correct way, this will change the entire report to speak about Tylenol instead of prednisone. Now you can see here I missed some places, but the graph is talking about Tylenol, and the most common effect of Tylenol is nausea.

It doesn't take much to parameterize the whole report to use this new params dollar sign drug. For example, down here where I said prednisone, well, I could insert some inline R code, and that inline R code could just return the results of params dollar sign drug. This inline R code is actually pretty useful because it's going to return the drug name, and maybe I would use it up here to create a title, and now when I knit this document, I have prednisone here. If I knit it with parameters, say with Tylenol, I get my Tylenol document. I can now use this document in a similar context to describe any drug in the database. I've automated that work by putting parameters into my R Markdown document.
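The demo changes parameters through the Knit with Parameters menu. As a sketch of the same idea done from code, rmarkdown::render() also accepts a params argument that overrides the defaults in the header. The report below is a made-up one-line stand-in for the demo report, the output file names are hypothetical, and the render step is skipped if rmarkdown or pandoc is missing.

```r
# A made-up parameterized report: one parameter, used in inline code.
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Adverse events\"",
  "output: html_document",
  "params:",
  "  drug: prednisone",
  "---",
  "",
  "This report describes `r params$drug`."
), rmd_file)

# Render the same source once per drug, overriding the default each time.
drugs <- c("prednisone", "Tylenol")
outputs <- character(0)
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  for (drug in drugs) {
    outputs[drug] <- rmarkdown::render(
      rmd_file,
      params = list(drug = drug),                      # replaces params$drug
      output_file = paste0("report-", drug, ".html"),  # hypothetical names
      envir = new.env(),                               # keep each run isolated
      quiet = TRUE
    )
  }
}

print(outputs)
```

Used this way, the report behaves like a function: one source file, one call per input, one finished document per call.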

Adding interactivity with HTML widgets and Shiny

Next, let's look at another opportunity created by the fact that there's executable code in this document. There are two types of code in R that return interactive components. One type is called HTML widgets. HTML widgets are a collection of R packages that contain functions, and the functions build a JavaScript-powered self-contained widget, normally to make a graph that has interactive properties. And then there's the Shiny package, which provides ways to toggle a live R session, and to interact with that R session that runs in the background of your document. Let's look at each of these separately because when you combine them with R Markdown documents, you very quickly have a way to create interactive reports for your consumers.

Let's start with HTML widgets. The best place to learn about them is the website, htmlwidgets.org, so let's go there. Let's jump right over to the gallery. Each of these entries here is a different type of HTML widget that does a different thing. These widgets are sort of pre-programmed to use out of the box. They all do their thing, and if that thing is what you want, then it's your lucky day. You can just start using the widget. If you want to do something that an HTML widget doesn't do, well, you won't be able to use HTML widgets. You'll have to use Shiny, and we'll look at that in a minute. You can see that most of these HTML widgets are designed to make graphs. For example, we can make a map that we could toggle here. You can make heat maps and read off values, but for our report, since our graph is just fine the way it is, I'm going to use the underrated DT tables widget. What this widget does is allow you to actually explore the raw data in a table.

To add it to my report, all I have to do is maybe put some text to set it up, so this will be our appendix. The raw data is available below, and then I'll borrow some code from over here. I'll put it in a code chunk. Now, as long as I pass this document on in an HTML-based format, because this is a JavaScript table, users can explore the actual data. They can sort by the different variables in the data set, or they can decide how much they want to look at at once, and so on. It adds real value to this report. All I did was call a function that already exists in R to insert this interactive HTML widget into my document at a certain place. Now, like I said, HTML widgets are really good for doing what they're designed to do, but they're not very flexible for creating new functionality.
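The call itself is not shown in the transcript, so the following is a guess at its general shape: DT::datatable() is the function in the DT package that turns a data frame into an interactive table. The events data here is made up, and the code assumes the DT package is installed.

```r
# Made-up stand-in for the events data set.
events <- data.frame(
  term = c("Nausea", "Drug ineffective", "Fatigue"),
  count = c(120, 310, 95),
  stringsAsFactors = FALSE
)

# Assumes the DT package is installed; skipped otherwise.
if (requireNamespace("DT", quietly = TRUE)) {
  # Build a sortable, searchable, paginated table from the data frame.
  events_table <- DT::datatable(events)
}

# Inside a code chunk, leaving the widget as the last expression
# (for example a bare DT::datatable(events)) displays it in HTML output.
```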

On the opposite end of that spectrum is the Shiny package. Shiny is designed to let you do whatever you want to do in an interactive fashion. Go to the Shiny website. Here's a very simple example of Shiny down here. You see I have a drop-down menu that I could use to select different things. The graph output created with R changes when I make different selections. I could select different dates, overlay a smooth line, and do this and that to the graph. The idea is the user gets to control some inputs. The inputs are used behind the scenes in an R analysis, and the output of that analysis, be it a graph, a table, text, or what have you, is displayed to the user, and that changes as the user changes various components here. That's a very simple Shiny widget or Shiny app, but you can see that you can make some more interesting and complicated Shiny apps here.

You normally design an HTML page that sort of serves as a UI to the app, and then you set up the R session and the code that runs in the background. Well, you do that with R code, and since you can insert R code into an R Markdown document, you could do that in an R Markdown document as well. Now, I have a document that does the same thing we've been looking at, but it uses Shiny functions like renderPlot and whatnot to make the document we were just looking at interactive. This document is now a Shiny app. All right, so here's a Shiny app. It's showing us Tylenol's results. I'm also breaking it down by gender here. Here's our table at the end. We could change the age ranges for Tylenol. Maybe something like this. See, a big side effect for Tylenol is completed suicide. Didn't expect that. For older people, it's fatigue, and if we wanted to look at another drug, let's say Advil, we could type it in right here, and the app almost immediately changes. This is actually and truly a Shiny app made from an R Markdown document.
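A sketch of the kind of chunk that makes an R Markdown document interactive in this way. It assumes the shiny package is installed and that the document's YAML header sets runtime: shiny. The input name species and the built-in iris data are placeholders, not the drug data from the demo.

```r
# Body of a code chunk in an R Markdown document with `runtime: shiny`.
library(shiny)

# An input control the reader can change; it is rendered inline in the document.
selectInput("species", "Species", choices = levels(iris$Species))

# renderPlot() re-runs its expression whenever input$species changes.
renderPlot({
  chosen <- iris[iris$Species == input$species, ]
  hist(chosen$Sepal.Length,
       main = paste("Sepal length:", input$species),
       xlab = "Sepal length (cm)")
})
```

The same two pieces, an input and a render function, are what a standalone Shiny app is built from; here R Markdown supplies the page around them.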

You can see the document here. If we were to add the flexdashboard format to this app, we'd still have an R Markdown document, but it really looks like a Shiny app now, so this is a dashboard version of that app made through R Markdown.

Recap and thinking about interactivity

All right, so let's pause. I mean, there's quite a bit there. I'd like to recap what we looked at. When you're making computational documents with R Markdown, the opportunities you should pay attention to are actually generating the text of the document with code, using inline code, because if you do that, it will pay off when you parameterize the document. Next, don't think of it in terms of R. Think of Python, SAS, JavaScript, whatever you like to use. You can put that into an R Markdown document. Some more exotic languages will take some work on your part on the back end, but things used for data science like Python are ready to go. Hide the code from your users if they're not looking for a code experience, and then add parameters to reports to make them flexible. Your entire report becomes like a function in R that takes some input and returns some output, and all you have to do is change the input to change the output. If you're generating the text with inline code, you're able to change the results that are reported to the user as the parameters themselves change.
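To make the recap concrete, here is a small sketch that writes out a parameterized R Markdown file combining three of these ideas: a params entry in the YAML header, hidden code, and inline code that generates the text. The title, the file name, and the sentence are illustrative assumptions; only the drug parameter comes from the webinar.

```r
# Three backticks, built here so the chunk delimiters below stay inside R strings.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"Adverse events\"",
  "output: html_document",
  "params:",
  "  drug: \"Tylenol\"",
  "---",
  "",
  paste0(fence, "{r setup, include=FALSE}"),
  "knitr::opts_chunk$set(echo = FALSE)  # hide code from readers",
  fence,
  "",
  "This report summarizes adverse events for `r params$drug`."
)

# Write the skeleton to a temporary file (hypothetical file name).
rmd_file <- file.path(tempdir(), "parameterized-sketch.Rmd")
writeLines(rmd_lines, rmd_file)
```

Rendering this file with a different value for drug changes the sentence in the output without anyone editing the document.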

Consider adding HTML widgets for self-contained interactive pieces, and then consider adding Shiny components if you need to do some sort of custom interactions based on your unique R algorithms. If you look at these things that we went over, you'll notice here at the end these last three options are sort of like a hierarchy of ways to make your report more and more interactive, and you can think of this hierarchy of interactivity from static R Markdown documents on one end to Shiny apps on the other end, which are completely interactive tools.

All of these are available for you to use, and so now as someone sophisticated who can make computational documents, you need to start thinking about your strategy for supplying needs using these different levels of interactivity, so let's look at that, thinking about interactivity. Not only does each rung on this hierarchy create a more interactive document, it also requires more work on your part. Interactivity is complicated to make, and when it comes to making something like a Shiny app, those of you who have made Shiny apps, you know that there is a lot going on there, and that is a very specialized undertaking to make a Shiny app, but you don't have to make a Shiny app every time you want interactivity.

For example, if you just want to generate the same report but on a new drug or on a new market segment or on a newly updated database, you might be able to do that with a simple parameterized R Markdown document. Alternatively, if you want your user to be able to poke and prod the document, you might want to create a Shiny app, or you might be fine with an R Markdown document that uses HTML widgets, or even an R Markdown document with Shiny components. What I recommend is that you aim for the level of interactivity that's as simple as possible to do the task that you want, because that leaves less room for things to go wrong and need to be debugged on your part, less room for things to go wrong and for the user to have a poor experience, and less time needed to create those documents.

Here's some simple bright lines that you could use to think about this hierarchy. If you ask yourself, how many times do I need to change the data in my report during my presentation or during the viewer's presentation? How many times do they need to change the data for themselves? If it's less than once, then you don't need anything that involves Shiny. This is a very important question to ask yourself, because we tend to see people learn Shiny and then use it for everything. It's very hard to walk back from Shiny once you've adopted a Shiny workflow, and that's okay. Shiny is excellent, but it's very easy to overestimate what you need to accomplish the results you want, and if you're not updating your data more than once per session, then you can just use a simple non-Shiny R Markdown document, and you can create a new document before each session, and that document will suffice for that session.

The next thing to ask is, do you need to create a custom UI for your app? If what you want to report to your user would do fine in a document format or a dashboard format, you can do that with an R Markdown document, and you don't need to invest in creating a Shiny or an HTML UI to contain your interactive components.

RStudio Connect preview

All right, that is computational documents in a nutshell, and hopefully from there, you can see how you can achieve some of the benefits I mentioned. I think the case for generating reports efficiently with parameterized R Markdown reports sort of speaks for itself, and then adding interactive components either through parameters or HTML widgets or Shiny components would be the way to empower your users to do different things, but we've really only covered half of the user's story. Computational documents create the opportunity to give your users a document that can run code and interact for them, but whether or not that opportunity will exist at the user's fingertips will depend on how you share that document with your user. The platform you use to share the document will need to be able to run the code in your computational document if you want the user to be able to execute that code by interacting with the document, and that's what the next webinar will focus on.

On September 18th, we'll look at RStudio Connect, which is a production platform for sharing data science products. Among the things that you could share on RStudio Connect are R Markdown documents and Shiny apps. In short, you can share your computational documents. Computational documents create a lot of new opportunities for letting people interact with the things that you make, and RStudio Connect is built to take advantage of each of those opportunities.

To give you a taste of what's happening, last, I've taken one of the documents that I've made in this webinar material, and I've put it on a demonstration RStudio Connect server. So I'm going to a website now. The website is an RStudio Connect server. When I get to the website, I can see all the content that I've posted on RStudio Connect, and the thing that I posted on there for today is called 03RMDWithParameters. It's a file that's in the GitHub materials for today, and if I click on it, I can see the file. It's the same file we've been looking at, and I could send this to my boss. I could send this to the president of RStudio, Tareef. He could click on the file. He'll see the same thing that I can, and if he wants to change this to look at a different drug, he doesn't need to interact with the R code at all. He can use the tab in RStudio Connect, and he could choose his drug here, say aspirin, and he could run the report. The code that I wrote for this report is now being executed by RStudio Connect on a server somewhere. It doesn't matter that I have R on my computer. That's just because I'm me, but you could go here, and if you had access to this document, you could run this without downloading R.

Here we could see some things about aspirin. The other thing is, since all the code's here to regenerate this report whenever I want, it'd be very easy for me to do things like schedule it to be updated whenever I know that the data itself is getting updated. It'd be very easy for me to update this report and automatically email it to my collaborators every Monday or Tuesday. It'd be very easy to see how people have been interacting with this report or how this report has changed over time, possibly looking at the version history or something else here. We'll learn all about that in two weeks in the RStudio Connect in production webinar.

All right, so that wraps up this webinar. Here is the link, rstd.io slash repro in production. It has versions of the file that I was developing today at different waypoints, so you could see all of the different technology I put in there, and thank you. Thank you for sticking it out through this webinar. If there are any questions, I'll do my best to try to answer them now.

Q&A

Have any surveys been done on what percent of data scientists use computational documents versus those that don't? I'm not aware of that research, Steve. I think computational documents are sort of in the early adoption phase. It does also depend on how liberally you want to define computational documents, but what I have seen anecdotally is that adoption is spreading, so the percentage of data scientists using these things will be increasing.

Will Pagedown be covered in this webinar or any others in the series? No, Ryan, Pagedown isn't. It's very interesting but not really central to the topic we're covering today, so we won't go over Pagedown. However, if you go to rmarkdown.rstudio.com slash formats, that link in the slides, you will find a link where you can learn much more about Pagedown. It is one of the formats.

How do we handle reproducibility when package version changes may cause results to change? Can we force a certain version of each package? Yes, you can do that. Depending on your use case, if you have a budget, if you work for a company that's supporting your work with R professionally, then you might look into the RStudio Package Manager, which is a tool designed specifically to do this, to help your group keep track of which versions of packages are being used for what. It makes it very simple to reproducibly track which packages should be used when and then to create libraries of those packages to use with certain projects. You can learn more about RStudio Package Manager at rstudio.com. It's part of the new RStudio team bundle.

If you're working as an individual, depending on your work and how you use it, you might find RStudio Package Manager useful to you as well, again, if you're an academic lab or whatnot. But if you don't want to use RStudio Package Manager, you can check out some of the open source solutions that sort of get at this problem, like the renv package, r-e-n-v, and the pre-existing packrat package, p-a-c-k-r-a-t. Handling changes in versions is probably the trickiest problem related to reproducibility when you use open source software. There are many people working on this problem. I recommend RStudio Package Manager because I know it's a very professional solution that will work. But obviously, when you talk about open source software, we're not always looking for professional solutions.

Can you please repeat the difference between r setup versus r for the code chunks? Yes. And let's go to the code and take a look at that. So, I'll go back to the one I was developing here. The first chunk has the word setup in it, and the other chunks don't. You can give any chunk you want a label. So, I might call this one graph. That'll help me remember that, you know, this is the chunk that contains the graph. And if you're inside RStudio, down here, you can navigate the chunks. So, I can jump to chunk 1, 2, 3. If I use a label, it's much easier to figure out which chunk I'm looking for because the labels show up here. And they're kind of used in other places, too. But R Markdown itself will use one of the labels in a special way. If you name a chunk setup, then whenever you try to run any code in this document, say this was a brand new document. I just opened it up. You can see here, in the setup chunk, I'm sourcing some needed functions. I'm creating the data set that's needed down here for events. Here's where events is created.

You can imagine sharing this file with a colleague. And they just want to make the graph. They're excited to make the graph. So, they come down here. They open the file. They don't create the data. They don't run any of that code up there. They just try to create the graph. Well, if events doesn't exist, they would get an error message. But if you put events or anything else in the setup chunk, then no matter what chunk gets run first, R Markdown will ensure that the setup chunk is run before that chunk. So, the setup chunk is an ideal place to load your packages, to download data if the report relies on it, to set up intermediate objects that the chunks below rely on. The setup chunk will always be run at least one time before anything else in the document gets run. And that's why I use the setup chunk label here.

How does R Markdown handle HTML widgets when rendering into a PDF or HTML? This one is from Edgar. That's a great question, Edgar. HTML widgets actually return JavaScript. They return a JavaScript widget. And if you put that into an HTML document, which is then presented by a web browser, the browser will know what to do with that JavaScript widget. And it's going to work. And there are multiple R Markdown formats that rely on HTML. There's ioslides presentations, other types of slideshows, blogdown, bookdown. Anything that's HTML-based, HTML widgets will work in. But HTML widgets are not PDF widgets. PDF files are largely static, and they don't recognize HTML widgets. So, if you were to output that file to a PDF, the HTML widget would not appear. You would not be able to have an interactive PDF. You could think of PDFs really for printing. And anything you print out is not going to be very interactive, unless origami is involved, I guess.
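One way to keep a single source document usable for both kinds of output is to branch on the output format. This is an illustrative pattern, not something shown in the webinar: it assumes the knitr and DT packages are installed, and show_table() is a hypothetical helper name.

```r
# Hypothetical helper: interactive table for HTML output, static table otherwise.
show_table <- function(data, n = 10) {
  if (knitr::is_html_output()) {
    # HTML-based formats can host the JavaScript widget.
    DT::datatable(data)
  } else {
    # PDF and other static formats get a plain table of the first n rows.
    knitr::kable(head(data, n))
  }
}

# `mtcars` stands in for the report's own data.
show_table(mtcars)
```

The static branch deliberately shows only the first few rows, since a printed page cannot offer paging or sorting.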

How would you generate both reports as separate documents, but not edit and run each time? All right. So, I'm guessing we have a parameterized report. We're talking about prednisone. We're talking about Tylenol. If I go back to RStudio, this is the parameterized report. It's called 07tmp. I'm going to open the console here. Because if I wanted to generate both those reports and not have to use the knit button or whatever, I would use the rmarkdown render function. I would pass it this 07tmp.rmd document. This will render it. If I just render it like this, it's going to use the default values for these params, the parameters up here. I could set the parameter values I want to use in the rmarkdown render call, right here with the params argument. Okay. So, it is params itself. If I go in here, put my double colon in there. Go to window out of the way. Set params equal to a list. Here I could say, you know, I want drug equal to Tylenol. If I cared about Dr., I don't actually use it. I would set that here, too. Dr. A. Then when I run this document, it will use those parameters there. I could also set the output file name.
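The call described above can be sketched as follows. It assumes the rmarkdown package is installed and that 07tmp.rmd declares a drug parameter in its YAML header and produces HTML output. The helper names and the output file naming scheme are hypothetical.

```r
# Hypothetical naming scheme for the rendered files.
report_file <- function(drug) {
  paste0("report-", tolower(drug), ".html")
}

# Render the parameterized document for one drug, overriding the default
# value of `drug` and choosing the output file name.
render_drug_report <- function(drug, input = "07tmp.rmd") {
  rmarkdown::render(
    input,
    params = list(drug = drug),
    output_file = report_file(drug)
  )
}

# Only attempt the render when the source document is actually present.
if (file.exists("07tmp.rmd")) {
  for (drug in c("Prednisone", "Tylenol")) {
    render_drug_report(drug)
  }
}
```

Any parameter left out of the params list keeps the default value declared in the document's header.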

Okay. So, how does that answer your question? No. But it set up the solution. If you can generate this document programmatically like