
Jenny Bryan | Help me help you: creating reproducible examples | RStudio (2018)
What is a reprex? It’s a reproducible example. Making a great reprex is both an art and a science, and this webinar will cover both aspects. A reprex makes a conversation about code more efficient and pleasant for all. This comes up whenever you ask someone for help, report a bug in software, or propose a new feature. The reprex package (https://reprex.tidyverse.org) makes it especially easy to prepare R code as a reprex, in order to share on sites such as https://community.rstudio.com, https://github.com, or https://stackoverflow.com. The habit of making little, rigorous, self-contained examples also has the great side effect of making you think more clearly about your programming problems.
Webinar materials: https://rstudio.com/resources/webinars/help-me-help-you-creating-reproducible-examples/
About Jenny: Jenny is a software engineer on the tidyverse team. She is a recovering biostatistician who takes special delight in eliminating the small agonies of data analysis. Jenny is known for smoothing the interfaces between R and spreadsheets, web APIs, and Git/GitHub. She’s been working in R/S for over 20 years and is a member of the R Foundation. She also serves in the leadership of rOpenSci and Forwards and is an adjunct professor at the University of British Columbia.
Transcript
This transcript was generated automatically and may contain errors.
Welcome everyone to today's webinar. We're going to talk about reproducible examples from a conceptual point of view and why they're surprisingly important and then also a great deal from a mechanical point of view, how to make your reproducible examples in a way that they're easy to share with other people.
Okay, so this short link, rstud.io/reprex, currently points to a copy of these slides on Speaker Deck. Another very relevant URL is reprex.tidyverse.org, which is the website for the package I'm going to talk about. And we may change what the upper short link links to as a result of this webinar because we're pretty excited to get some of these materials captured, but I promise it will always point to something very relevant to this package that will link to absolutely everything else. And you're also going to see that short link many, many times during this presentation. And as said in the intro, this video will also be posted within 48 hours and all of these things will be publicized.
Basic usage demo
And this is my license. Okay, so the first thing I want to do is show basic usage and we're just going to get right into it and then we'll unpack what you just saw. So I'm actually going to leave my slides and go over to RStudio. So I'm sitting here in an RStudio session. It's fresh and I have a little bit of code up here in my source editor. I'm going to make a factor X, a factor Y. I'm going to combine them and get what to most of us is kind of a puzzling result. So this is just going to be an example of a small piece of code that maybe you want to talk about on the community site or share with your local R expert and ask like what's going on.
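The snippet described here might look roughly like the following. The transcript does not state the factor levels used in the demo, so "a" and "b" are assumptions. No expected output is shown because, as far as I know, the result of combining factors with c() changed in R 4.1.0, so what you see depends on your R version.

```r
# Two one-level factors; the levels "a" and "b" are illustrative assumptions
x <- factor("a")
y <- factor("b")

# Combine them. Older versions of R return the underlying integer codes,
# which is the puzzling result; newer versions return a factor.
combined <- c(x, y)
combined
```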
So this is how you would use reprex to turn this little snippet of code into a reproducible example. This is the path of least resistance. We'll talk about other methods later. So I would select a little piece of code and copy it to my clipboard. Then over in the R console I'm going to type reprex() and you will see that that little piece of code is run and then basically a beautiful attractive version of that is stored on my clipboard and I can preview it here. So if I were to paste the contents of my clipboard right now, you actually see what's called Markdown and this is what's necessary to create the attractive version of this code. And why is this helpful? Because you can go to places like GitHub, the RStudio community site, or Stack Overflow and paste this Markdown in.
So I'm going to show you what this would look like in a GitHub issue. So that's that same Markdown that you just saw. GitHub lets you preview things and you'll see that it looks just the way it did locally for me. It's been rendered, it's syntax highlighted, we have a tiny little ad down here that tells people how you did this and I could submit that as a GitHub issue. So that is the basic process, and the reason I can just type reprex() is that I always have this package attached. You might need to call library(reprex) first, and we're going to talk a great deal about that next.
So that is what basic reprex usage looks like. It creates a small little piece of code, renders it nicely, and it's ready to paste into other formats.
What is a reprex?
So this is just a static version of what we just did and this is the GIF that I use on the reprex website. I re-watched this clip, it's from a movie called Jerry Maguire, it's still highly recommended. And basically the reason for bothering to do all of this is if you're going somewhere to have a conversation about R, to have questions answered, or to describe a bug in software, being careful about how you make your reproducible example makes it much, much easier for other people to help you.
And I want to explain where this word came from, reprex. So Romain François first tweeted this and I thought it was just a great made-up word. So it is short for reproducible example, so it is a completely made-up word, but it's just very handy. And I'm going to use the word reprex over and over and over again in this webinar, so I want to be very clear that I'm using it in, I'd say, three distinct but related ways. So I think people, at least in the small R community, are starting to say reprex just as a noun, like it is a reproducible example. And that has nothing to do with whether you use this package or not. But then today's webinar is going to show you use of a package with that same name, reprex, that you can install from CRAN. I'll show you how to do that in just a moment. And then this is a pretty small package. It has a couple of functions, but really the main function it has is also called reprex. So this webinar is going to talk about how to use the reprex() function inside the reprex package to produce a good-looking reproducible example.
And when does this come up in your life? It's very handy for conversations that you have on community.rstudio.com. It's very handy for preparing questions or answers for Stack Overflow. It's very handy for reporting bugs or making feature requests for an R package that is developed on GitHub. And also very useful for having detailed conversations about R in Slack or in email. So a reproducible example, conceptually, is useful in all of those places. And then this package, reprex, smooths over some of the mechanics.
Installing and setting up reprex
So here I'm going to talk to you about what you're going to need to do on your computer to make this package available to yourself. So reprex does not come with R. It does not come with RStudio. You have to make an explicit effort to install it. So you should pick one of these methods and it's the type of thing that you do once per computer. So you could use install.packages("reprex") to install just the reprex package. It is also part of the meta-package that we call tidyverse. So if you did install.packages("tidyverse"), reprex would be one of the many packages that get installed on your machine.
In general, there is very little harm that you can do to yourself by reinstalling packages. So you also should not stress out too much about, you know, you could install just reprex and then install the tidyverse and nothing bad will happen. So do it once per machine just because that's the minimum you need to do, but it's no tragedy if you reinstall things. Once you've installed, you still need to use R's library() call to make the reprex functions available in your R session. So you would need to do this in every R session that you plan to use reprex in. So that might be something that you do multiple times per day, certainly way, way, way, way more than once per computer. So every time you want to use the package, you'll need to execute the library(reprex) command.
Now I use reprex several times a day, for the most part, and so that would be very annoying. So an alternative, if you also become a semi-heavy user of this package, is to make it available to yourself all the time. So you can control the startup behavior of R through a file called .Rprofile and conventionally it's found in your home directory. And so this snippet of code, suppressMessages(require(reprex)), looks a little different from what you just saw, but it's sort of a better way to attach this package in your startup file. So you would put this snippet of code there in your .Rprofile and then forevermore when you start R, the reprex package would be available. And if you have never thought about your .Rprofile file before, there is a function in the usethis package, which you would also have to install, that will create it for you if you don't have it, or if you do have it, it would open it for you for editing in case you wanted to put something like this snippet in there.
So once you've done those two things, you've installed it and you've attached it through either the library() command or by putting something in your .Rprofile, you are ready to use the reprex() function. And what you saw me do in that first demo was I actually called the reprex() function explicitly in the R console, and something you'll see me do before we're done is we also have put some what are called RStudio add-ins into this package that give you even more ways to launch this function.
And this is a bit of a sidebar, but since I have shown you how to put things in your .Rprofile file, I also want to tell you how to do that responsibly. So reprex is a workflow package. It's something you would use in your daily work to make your life a little bit easier. You use it interactively. I would be pretty shocked to see it show up in a typical person's R scripts, R Markdown files, packages, or Shiny apps. And so the fact that it doesn't show up in those things, it's an interactive package, makes it safe to attach in your .Rprofile. But I don't want you to get the wrong idea and think, oh my god, I should do this with all my packages. So I would not want to see this kind of code using like dplyr or ggplot2 or things that do show up in your scripts. And it's because your scripts would then become highly not self-contained. And they would work for you because of stuff in your .Rprofile, but they won't work for other people. So this is an interesting technique to know about, but you need to be really, really careful about what you do here. So I think it's safe to put reprex in there. It is not safe to put dplyr in there.
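As a setup fragment, the install-once, attach-every-session routine described above looks roughly like this. It needs a network connection and a CRAN mirror, so treat it as something to run by hand, not as part of a script.

```r
# Once per computer: install just reprex ...
install.packages("reprex")

# ... or install the tidyverse meta-package, which includes reprex
# install.packages("tidyverse")

# Once per R session: attach the package so reprex() can be called directly
library(reprex)
```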
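For reference, the startup snippet mentioned here would sit in the .Rprofile file like this. The talk does not name the usethis function it has in mind; usethis::edit_r_profile() is my assumption about which one is meant.

```r
# Contents of ~/.Rprofile
# Attach reprex quietly at startup. require() returns FALSE with a warning
# if the package is missing, instead of stopping R's startup with an error.
suppressMessages(require(reprex))

# To open (or create) the file for editing, run this once in the console.
# Assumption: this is the usethis function the talk refers to.
# usethis::edit_r_profile()
```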
The reprex philosophy
Okay, so we're going to get back to the package and to reproducible examples now. So I wanted to give a brief intro, like what on earth in my life drove me to make this one of my missions. Before I joined RStudio I was a professor at the University of British Columbia and I had a course called STAT 545 that has a lot of content online to this day. The course continues, you could go there. And I ran this course entirely on GitHub and so it meant that all of my dialogue with students, both sort of me to the whole class and me talking to individual students, took place in GitHub issues. And I actually analyzed my GitHub usage in the course and I found that every fall I was participating in at least 300, maybe 500, GitHub issue threads, and that's just in my teaching life. So I spend a great deal of time talking about R in those places and solving people's R problems. And to do that well I actually wanted to use executable R code, and the friction involved in making that look good started to drive me crazy.
And now that I'm no longer a full-time faculty member and I'm working full-time on R packages, this just gives you a sense of the intensity of my GitHub activity over the last year. So now I still work with tons and tons of GitHub issues, you know, in a different capacity, and then I also talk about R a lot in Slack. So I still have this sort of hourly need to run little pieces of R code and share what I'm seeing with other people.
So trying to remove friction for myself and other people led me to create a few principles that I knew had to be true of a tool to make this easier. So this is the reprex philosophy. I think that conversations about code are much more productive if they contain three things. Well it's one thing but three properties. Code that actually runs. Code that I do not have to run as the reader but code that I can easily run. And so there is a little bit of self-contradiction here but the point is you want to make it easy for people to interact with your reproducible example in a whole bunch of different ways. They can just be a consumer. They can just read it or they can easily grab it and run it themselves, modify it and share that back with you.
Self-contained code demo
So I want to be really detailed about what I mean when I say code that actually runs. So you're gonna isolate a little piece of R code and you hand it off to reprex. You've seen one demo; we're about to do a whole bunch more. That code is taken and it is run in a completely new R session, and that means it has to be completely self-contained. So it must include the command to load all necessary packages and it must create all necessary objects. And this can be very frustrating for people, but it's extremely important. So I'm going to go do this live to show exactly what I mean.
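To make the "completely new R session" point concrete, here is a minimal base-R sketch. The helper run_fresh() is hypothetical and is not how reprex is implemented; it only imitates the idea by running a snippet in a separate Rscript process, where nothing from your current session exists.

```r
# Hypothetical helper: run lines of R code in a brand-new R process and
# report whether they succeeded. This imitates the "fresh session" idea only.
run_fresh <- function(code) {
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script))            # remove the temporary script afterwards
  writeLines(code, script)
  rscript <- file.path(R.home("bin"), "Rscript")
  out <- suppressWarnings(
    system2(rscript, c("--vanilla", shQuote(script)),
            stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")      # only set when the process fails
  list(ok = is.null(status) || status == 0, output = as.character(out))
}

# This object exists only in the current session
greeting <- "hello"

# Not self-contained: the new process has never heard of `greeting`
incomplete <- run_fresh("toupper(greeting)")

# Self-contained: the snippet creates everything it uses
complete <- run_fresh(c('greeting <- "hello"', "toupper(greeting)"))

incomplete$ok
complete$ok
```

The first call should fail even though greeting is sitting in your workspace, which is the same surprise people hit when reprex runs their snippet.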
Okay, so I'm looking at an R script that contains the code you just saw on that slide. And I'm going to restart R. So let's imagine like a typical interactive R session. So I'm going to be down in the console here and I'm going to say, oh, I'd like to play a little bit with this praise package I've heard about. So there I go. I say library(praise) down in the console. Now up in my source editor I make a new object called template, and it's a template string: an exclamation, then "your reprex is", then an adjective. And so if I then call the praise() function from the praise package, I don't expect you to know this, I'm just using it as an example. It's going to create like random little sentences for us praising someone for their awesome reprex. So let's say I want to share my joy about this with people using the reprex package. I would select this little snippet of code, again this is the long way, I'll show you a short way later. Copy it, go down to the console, type reprex() and hit return. And now let's look at the preview here. It shows defining template and then my praise call fails. Error in praise: could not find function "praise". And that's because you don't have the library(praise) command here. So over in that fresh R session the praise package is not available to use.
So here's something else you might do. You're like, okay, I'm going to add that command, then I'm going to make my call to the package. Let's see if that works. Copy, call reprex() again. I have a new error. Error in grep, whatever, whatever: object template not found. So this snippet is incomplete in a different way. It actually doesn't contain the code that defines the template object. So here's the full snippet. It loads the praise package. It defines the template object and it makes this function call. So I'm going to copy all of that to the clipboard. Re-execute reprex() and we have made an exquisite reprex.
So that's a little belabored, but when I try to answer R questions for people and I try to run their code, the two most common ways that I fail are that they haven't explicitly listed all the packages they're using, and I have to either sleuth it out of them or figure it out for myself and add those commands, or the objects they're referring to are not available to me. And so those are the two reasons why I can't run their code.
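The full, self-contained snippet from this demo probably looked something like the following. The exact template string is reconstructed from the spoken description, so its wording and punctuation are assumptions, and the praise package must be installed for it to run.

```r
# Attach the package the snippet depends on
library(praise)

# Template string; reconstructed from the spoken description, so the exact
# wording and punctuation are assumptions
template <- "${EXCLAMATION} - your reprex is ${adjective}!"

# Fill in the template with randomly chosen words
praise(template)
```

Dropping the library(praise) line or the template definition reproduces the two failures described in the demo.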
Do's and don'ts for reprex
So on the reprex website I have a list of do's and don'ts that are distilled from a lot of other really fantastic sources about creating reproducible examples, which are referenced there. But the three big, big high points are: you need to write this reproducible example using the smallest, the simplest, and the most built-in data set you can get away with, and that is very uncomfortable for people. Include commands on a ruthlessly strict need-to-run basis, so you really need to strip your example down. And then I say pack it in, pack it out, and don't take liberties with other people's computers. And this is referring to making sure that if you create files, you remove them, or if you change the working directory, you reset it, if you change options, you reset them, but basically leaving things as you found them.
Worse than copy-paste is the screenshot. So this, of course, does again hit some of our checklist: it clearly shows the code and the output. But again, if somebody else wanted to check this and reproduce it, they actually have to retype everything, which frankly is never going to happen. And so this is what I want to see in a reprex, because it can be copy-pasted and run. So I'm going to prove that to you right now. So if I go to this issue on GitHub and I copy, I could copy all of this, or as long as I get all the commands I'm okay. So I'm going to put that on my clipboard. I'm going to go back to R. Maybe to make this really explicit, I'll show you what I copied. Right, that's what I did. So I can copy this again and call reprex(), and I get exactly what this person was reporting on GitHub. So I've been able to reproduce it very quickly from a copy-paste.
But as you saw, reprex is like, are you sure you want to do this? Because I've got this output here. And so if you really want to get really clean code from a reprex that someone else has made, you capture it and use one of the undo functions in the reprex package. I could use reprex_clean(), and I'll show you that right now. So here's what I copied from GitHub. So I could copy that and call reprex_clean(), and now if I paste, you'll see all the output has been eliminated. And so I think that's a slightly obscure thing you might want to do, but there is a full set of backwards functions in reprex, so it helps you take code that people have copied from the console or that they have already made a reprex from.
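As a small illustration of these points, here is a base-R sketch. The data frame, option, and file used are arbitrary choices of mine, not taken from the talk.

```r
# Smallest, simplest data: a tiny inline data frame, not a large private file
df <- data.frame(x = 1:3, g = c("a", "b", "a"))
aggregate(x ~ g, data = df, FUN = sum)

# If you change options, restore them afterwards
old_opts <- options(digits = 3)
print(pi)
options(old_opts)

# If you create a file, put it in a temp location and remove it
tmp <- tempfile(fileext = ".txt")
writeLines(letters[1:6], tmp)
readLines(tmp)
unlink(tmp)

# If you change the working directory, change it back
old_wd <- setwd(tempdir())
setwd(old_wd)
```

Both options() and setwd() return the previous state, which is what makes the restore step a one-liner.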
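To show what "eliminating the output" amounts to, here is a rough base-R imitation. The helper strip_output() is hypothetical and is not the package's reprex_clean(); it only assumes that output lines in a rendered reprex start with the `#>` prefix.

```r
# Lines as they might be copied from a rendered reprex (illustrative input)
copied <- c(
  'x <- factor("a")',
  "levels(x)",
  '#> [1] "a"'
)

# Hypothetical helper: keep only lines that do not start with the "#>" prefix
strip_output <- function(lines, prefix = "#>") {
  lines[!startsWith(lines, prefix)]
}

code_only <- strip_output(copied)
code_only
```

In practice you would let reprex_clean() do this, since it also deals with the clipboard for you.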
Shock and awe: advanced features
reprex by default goes and does its work in the session temp directory. That's all part of it sandboxing all of your work. But if your reprex does, for example, file input and output, it could be much easier to force reprex to work in your current working directory. So outfile = NA is shorthand for that. So if I ask R to write the first six letters of the alphabet to a file with outfile = NA, all of a sudden these four files that reprex needs to create are being left behind in my working directory instead of in a temp directory. And it's the R script that reprex makes, it's the HTML that it uses for the preview, and the Markdown that it puts on the clipboard for you. So all those usual files are left behind in a much more accessible place. But you'll notice it has a god-awful file name, because we just created it out of thin air. So if you want to work somewhere specifically and have nice file names, you could also provide the base for that in outfile, and now you see that it leaves the same four files behind, but they have a much better file name.
The human side of asking for help
It turns out when you sit down to make a good reprex out of your problem, and you keep it self-contained, you strip down your giant hairy data set to the smallest data set that reproduces the problem, it is amazing how often you end up answering your own question in the privacy of your own home, and you didn't have to make yourself vulnerable to other people.

