Resources

Reproducible Examples with the reprex package

Reproducible Examples and the `reprex` package. https://speakerdeck.com/jennybc/reprex-reproducible-examples-with-r

Jump to:

0:08 Intro
0:40 Basic usage of reprex
3:35 Motivation, why use reprex? "Help me help you"
4:08 Define `reprex`? Three common ways to use the term.
1. noun, a reproducible example
2. the reprex package, a tool to build R `reprex`s
3. reprex::reprex(), a function in `reprex` to make a reprex.
5:26 When should you use a reprex?
6:14 reprex installation and setup. How do you actually get reprex on your machine?
7:59 Advanced setup and discussion.
9:45 Please use advanced features responsibly.
11:02 Why does the reprex package exist? Anyone who has helped teach R or dealt with GitHub issues, Twitter, Stack Overflow & RStudio Community questions knows that helping people diagnose their coding problems can be hard. This tool comes from hard-won experience. Its aim is to help people ask well-formed questions and increase the chances of getting well-formed answers quickly.
12:52 Philosophy behind reprex
1. code that I can run
2. code that I don't need to run
3. code that I can easily run
13:52 Code that I can run.
17:25 Tips on writing good `reprex`s. Dos and don'ts.
18:52 How do I get my data into my reprex? Getting small data and CSV-type data into your reprex is easy. People often feel that “I have a big hairy data object and I can only show my problem by using it”, but that's not always the case.
21:02 Code that I don't need to run. reprex gives your reader the code and reveals the output being produced by that code. For experienced coders, that might be enough to help you.
22:44 Code that I can easily run. Don't copy and paste from the R console. This is usually annoying for your reader. Worse than console copy-pasta is the screenshot. (Many people think screenshots of code are downright offensive.)
25:03 reprex_clean. If you copy someone else's reprex into your console, it may include their output, making your new reprex untidy. Here are tips for taking someone else's reprex code and output and creating a clean reprex reply.
25:54 Shock and awe. More interesting features of the reprex package.
- 26:29 What about figures and plots in your reprex? So happy you asked about that. reprex will automatically upload your images to imgur.com.
- 28:23 Create a reprex by explicitly providing your code in the reprex call.
- 29:00 When you need your reprex to work in the current working directory.
- 30:45 Differently flavored markdown. Optimize your reprex markdown output for GitHub, Stack Overflow, or the RStudio Community.
- 30:31 Make your reprex create an R script, with your reprex outputs as comments. This is handy for pasting into an email or a Slack-type app.
- 32:25 Rich text format, RTF output. (Experimental feature as of this video.)
- 33:06 Suppress the reprex ad at the bottom of your reprex.
- 33:19 Include session info.
- 33:54 Auto styling of your code. Good if you're dealing with poorly formatted code.
- 34:25 Change your comment string.
- 34:32 Silence tidyverse startup messages.
- 35:00 Capture a reprex that sends messages to standard output and standard error (e.g. package installation compilation messages).
36:13 Set up personal defaults for your reprex usage.
36:54 reprex RStudio addins: render reprex and reprex selection. These accelerate your use of reprex.
39:01 The human side of reproducible examples. How to ask questions in ways that are most likely to get answered. Sorry for the tough love, but this is important. Why are you always asked to give a reprex?
- Experts try to use reproducible examples to ensure their advice works.
- Making a good reprex is hard. But you are asking them to solve a problem for you, so meet them halfway.
- Creating reprexes is good coding practice.
- Making a good reprex is often a good way to debug your issue in the embarrassment-free privacy of your own home.
- reprexes lead to discussions more likely to help people in the future.
44:34 Behind the scenes of reprex
44:44 Thanks to those who helped make reprex possible.

Questions and Answers
- 46:05 Can reprex capture variables and objects in the current environment? (not yet, maybe in development)
- 47:25 Does reprex actually check that the code is self contained? (self contained)
- 48:08 Does readr::read_csv support the text argument? (yep, just read the help manual for readr)

Transcript

This transcript was generated automatically and may contain errors.

Welcome everyone to today's webinar. We're going to talk about reproducible examples from a conceptual point of view and why they're surprisingly important, and then also a great deal from a mechanical point of view: how to make your reproducible examples in a way that makes them easy to share with other people. This short link, rstd.io/reprex, I promise will always point to something very relevant to this package that links to absolutely everything else.

Basic usage of reprex

So the first thing I want to do is show basic usage, and we're just going to get right into it and then we'll unpack what you just saw. So I'm sitting here in an RStudio session, it's fresh, and I have a little bit of code up here in my source editor. I'm going to make a factor X, a factor Y, I'm going to combine them and get what to most of us is kind of a puzzling result. So this is just going to be an example of a small piece of code that maybe you want to talk about on the community site or share with your local R expert and ask what's going on.

So this is how you would use reprex to turn this little snippet of code into a reproducible example. This is the path of least resistance; we'll talk about other methods later. So I would select a little piece of code and copy it to my clipboard. And over in the R console, I'm going to type reprex(). And you will see that that little piece of code is run. And then basically a beautiful, attractive version of that is stored on my clipboard and I can preview it here.

So if I were to paste the contents of my clipboard right now, you actually see what's called markdown and this is what's necessary to create the attractive version of this code. And why is this helpful? Because you can go to places like GitHub, the RStudio community site or Stack Overflow and paste this markdown in. So I'm going to show you what this would look like in a GitHub issue.

So that's that same markdown that you just saw. GitHub lets you preview things. And you'll see that it looks just the way it did locally for me. It's been rendered, it's syntax highlighted. We have a tiny little ad down here that tells people how you did this. And I could submit that as a GitHub issue. So that is the basic process.

And the reason I can just type reprex is that I always have this package attached. And so you might need to call this and we're going to talk a great deal about that next. So that is what basic reprex usage looks like. It creates a small little piece of code, renders it nicely, and it's ready to paste into other formats.
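The demo above can be sketched in code. The exact values shown on screen are not in the transcript, so the two one-level factors below are an assumption, and the `reprex::reprex()` call assumes the package is installed and the snippet has been copied to the clipboard. What `c()` returns for factors depends on the R version (before R 4.1.0 it returned the underlying integer codes, which is the puzzling result), so no expected output is shown.

```r
# Two small factors (values assumed for illustration).
x <- factor("a")
y <- factor("b")

# Combining factors with c() is the "puzzling" step.
combined <- c(x, y)
print(combined)

# To make a reprex: copy the lines above to the clipboard, then call reprex()
# in the console. Guarded so it only runs interactively with reprex installed.
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex()
}
```

The rendered Markdown lands on the clipboard, ready to paste into a GitHub issue or a forum post.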

Motivation: help me help you

So this is just a static version of what we just did. And this is the GIF that I use on the reprex website. I rewatched this clip. It's from a movie called Jerry Maguire, it's still highly recommended. And basically the reason for bothering to do all of this is if you're going somewhere to have a conversation about R, to have questions answered, or to describe a bug in software. Being careful about how you make your reproducible example makes it much, much, much easier for other people to help you.

And I want to explain where this word came from, reprex. So Romain first tweeted this and I thought it was just a great made-up word. It is short for reproducible example. So it is a completely made-up word, but it's just very handy. And I'm going to use the word reprex over and over and over again in this webinar. So I want to be very clear that I'm using it in, I'd say, three distinct but related ways.

So I think people, at least in the small R community, are starting to say reprex just as a noun, like it is a reproducible example. And that has nothing to do with whether you use this package or not. But then today's webinar is going to show you use of a package with that same name, reprex, that you can install from CRAN. I'll show you how to do that in just a moment. And then this is a pretty small package. It has a couple of functions, but really the main function it has is also called reprex. So this webinar is going to talk about how to use the reprex function inside the reprex package to produce a good looking reproducible example.

When to use reprex

And when does this come up in your life? It's very handy for conversations that you have on community.rstudio.com. It's very handy for preparing questions or answers for Stack Overflow. It's very handy for reporting bugs or making feature requests for an R package that is developed on GitHub. And also very useful for having detailed conversations about R in Slack or in email. So a reproducible example, conceptually, is useful in all of those places. And then this package, reprex, smooths over some of the mechanics.

Installation and setup

So here I want to talk to you about what you're going to need to do on your computer to make this package available to yourself. So reprex does not come with R. It does not come with RStudio. You have to make an explicit effort to install it. So you should pick one of these methods, and it's the type of thing that you do once per computer.

So you could use install.packages("reprex") to install just the reprex package. It is also part of the meta-package that we call tidyverse. So if you did install.packages("tidyverse"), reprex would be one of the many packages that get installed on your machine. In general, there is very little harm that you can do to yourself by reinstalling packages. So you should not stress out too much: you could install just reprex and then install the tidyverse, and nothing bad will happen.

So do it once per machine, because that's the minimum you need to do, but it's no tragedy if you reinstall things. Once you've installed, you still need a library() call to make the reprex functions available in your R session. So you would need to do this in every R session that you plan to use reprex in. That might be something you do multiple times per day, certainly way, way more than once per computer. So every time you want to use the package, you'll need to execute the library(reprex) command.

Now, I use reprex several times a day for the most part, and so that would be very annoying. So an alternative, if you also become a semi-heavy user of this package, is to make it available to yourself all the time. You can control the startup behavior of R through a file called .Rprofile, and conventionally, it's found in your home directory. And so this snippet of code, suppressMessages(require(reprex)), looks a little different from what you just saw, but it's sort of a better way to attach this package in your startup file. So you would put this snippet of code in your .Rprofile, and then forevermore, when you start R, the reprex package would be available.

And if you have never thought about your .Rprofile file before, there is a function in the usethis package, which you would also have to install, that will create it for you if you don't have it. Or if you do have it, it would open it for you for editing in case you wanted to put something like this snippet in there. So once you've done those two things, you've installed it and you've attached it through either the library command or by putting something in your .Rprofile, you are ready to use the reprex function.
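The setup steps described above, gathered in one place as a sketch. The `if (interactive())` guard and the `usethis::edit_r_profile()` call are not spelled out in the transcript; they are shown here on the assumption that they match what was on the slide.

```r
# Once per machine (run in the console, not in .Rprofile):
# install.packages("reprex")     # just reprex
# install.packages("tidyverse")  # the meta-package, which includes reprex

# Once per R session:
# library(reprex)

# Or, in ~/.Rprofile, attach reprex automatically in interactive sessions.
# The interactive() guard is an assumption, not something said in the talk.
if (interactive()) {
  suppressMessages(require(reprex))
}

# To open (or create) .Rprofile for editing, assuming usethis is installed:
# usethis::edit_r_profile()
```

Using `require()` wrapped in `suppressMessages()` keeps startup quiet and does not halt R if the package happens to be missing.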

And what you saw me do in that first demo was I actually called the reprex function explicitly in the R console. And something you'll see me do before we're done is we also have put some what are called RStudio add-ins into this package that give you even more ways to launch this function.

Using .rprofile responsibly

This is a bit of a sidebar, but since I have shown you how to put things in your .Rprofile file, I also want to tell you how to do that responsibly. So reprex is a workflow package. It's something you would use in your daily work to make your life a little bit easier. You use it interactively. I would be pretty shocked to see it show up in a typical person's R scripts, R Markdown files, packages, or Shiny apps. And so the fact that it doesn't show up in those things, that it's an interactive package, makes it safe to attach in your .Rprofile.

But I don't want you to get the wrong idea and think, oh, my God, I should do this with all my packages. So I would not want to see this kind of code used with dplyr or ggplot2 or things that do show up in your scripts. And it's because your scripts would then no longer be self-contained. They would work for you because of stuff in your .Rprofile, but they won't work for other people. So this is an interesting technique to know about, but you need to be really, really careful about what you do here. So I think it's safe to put reprex in there. It is not safe to put dplyr in there.

Background and philosophy

Okay, so we're going to get back to the package and to reproducible examples now. So I wanted to give a brief intro, like what on earth in my life drove me to make this one of my missions. Before I joined RStudio, I was a professor at the University of British Columbia, and I had a course called Stat 545 that has a lot of content online to this day. The course continues, so you could go there. And I ran this course entirely on GitHub. And so it meant that all of my dialogue with students, both sort of me to the whole class and me talking to individual students, took place in GitHub issues.

And I actually analyzed my GitHub usage in the course, and I found that every fall I was participating in at least 300, maybe 500 GitHub issue threads. And that's just in my teaching life. So I spend a great deal of time talking about R in those places and solving people's R problems. And to do that well, I actually wanted to use executable R code, and the friction involved in making that look good started to drive me crazy.

And now that I'm no longer a full-time faculty member and I'm working full-time on R packages, this just gives you a sense of the intensity of my GitHub activity over the last year. So now I still work with tons and tons of GitHub issues, you know, in a different capacity. And then I also talk about R a lot in Slack. So I still have this sort of hourly need to run little pieces of R code and share what I'm seeing with other people.

So trying to remove friction for myself and other people led me to create a few principles that I knew had to be true of a tool to make this easier. So this is the reprex philosophy. I think that conversations about code are much more productive if they contain three things. Well, it's one thing, but three properties. Code that actually runs, okay? Code that I do not have to run as the reader, but code that I can easily run. And so there is a little bit of self-contradiction here, but the point is you want to make it easy for people to interact with your reproducible example in a whole bunch of different ways. They can just be a consumer, they can just read it, or they can easily grab it and run it themselves, modify it, and share that back with you.

Code that actually runs: self-contained examples

So I want to be really detailed about what I mean when I say code that actually runs. So you're going to isolate a little piece of R code and you hand it off to reprex. You've seen one demo. We're about to do a whole bunch more. That code is taken and it is run in a completely new R session, and that means it has to be completely self-contained. So it must include the command to load all necessary packages, and it must create all necessary objects. And this can be very frustrating for people, but it's extremely important.

So I'm going to go do this live to show exactly what I mean. Okay, so I'm looking at an R script that contains the code you just saw on that slide, and I'm going to restart R. So let's imagine a typical interactive R session. I'm going to be down in the console here, and I'm going to say, oh, I'd like to play a little bit with this praise package I've heard about. So there I go. I say library(praise) down in the console. Now up in my source editor, I make a new object called template, and it's a template string: an exclamation placeholder, then "your reprex is", then an adjective placeholder. And so if I then call the praise() function from the praise package (I don't expect you to know this, I'm just using it as an example), it's going to create random little sentences for us, praising someone for their awesome reprex.

So let's say I want to share my joy about this with people using the reprex package. I would select this little snippet of code. Again, this is the long way. I'll show you a short way later. Copy it, go down to the console, type reprex(), and hit return. And now let's look at the preview here. It shows defining template, and then my praise call fails. Error in praise: could not find function praise. And that's because you don't have the library(praise) command here. So over in that fresh R session, the praise package is not available to use.

So here's something else you might do. You're like, okay, I'm going to add that command. Then I'm going to make my call to the package. Let's see if that works. Copy, call reprex again. I have a new error. Error in grep, whatever, whatever. Object template not found. So this snippet is incomplete in a different way. It actually doesn't contain the code that defines the template object. So here's the full snippet. It loads the praise package. It defines the template object, and it makes this function call. So I'm going to copy all of that to the clipboard. Re-execute reprex, and we have made an exquisite reprex.
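The complete snippet from this demo might look like the following. The transcript only gives the template in spoken form, so the exact punctuation of the string is an assumption; the `${EXCLAMATION}` and `${adjective}` placeholders are the praise package's template syntax. The output is random, so none is shown.

```r
# Attach every package the snippet uses.
library(praise)

# Create every object the snippet refers to.
template <- "${EXCLAMATION} - your reprex is ${adjective}!"

# Now the call can run in a fresh R session.
praise(template)
```

Dropping the first line reproduces the "could not find function" failure, and dropping the second reproduces the "object not found" failure.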

So that's a little belabored, but when I try to answer R questions for people, and I try to run their code, the two most common ways that I fail are that they haven't explicitly listed all the packages they're using, and I have to either sleuth it out of them or figure it out for myself and add those commands, or the objects they're referring to are not available to me. And so those are the two reasons why I can't run their code.

Do's and don'ts for good reprexes

So on the reprex website, I have a list of do's and don'ts that are distilled from a lot of other really fantastic sources about creating reproducible examples, which are referenced there. But the three big, big high points are you need to write this reproducible example using the smallest, the simplest, and the most built-in dataset you can get away with, and that is very uncomfortable for people. I'm going to talk about that in a second. Include commands on a ruthlessly strict need-to-run basis. So you really need to strip your example down. And then I say pack it in, pack it out, and don't take liberties with other people's computers.

And this is referring to making sure that if you create files, you remove them, or if you change the working directory, you reset it. If you change options, you reset them, but basically leaving things as you found them. But I want to talk about, so let's see, here's what that web page would look like if you want to read it more. But let me just give a short example of something that a lot of people struggle with, which is that they feel like they have some big, hairy data object, and they can only show their example using it.

So, tricks to know. You probably think of the read.csv function as something you use to bring delimited data in from a file, but it also has a text argument that allows you to inline really tiny R objects. And then there is also just the data.frame function itself. So I'm going to reprex those two snippets of code using a keyboard shortcut. And see, that's a very easy way to make a very tiny data frame, either inline using read.csv or sort of from first principles using data.frame.
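A minimal sketch of those two base R tricks. The column names and values are made up for illustration; the transcript does not say what was on the slide.

```r
# Inline CSV-style text instead of reading from a file.
df_from_text <- read.csv(text = "a,b\n1,2\n3,4")
print(df_from_text)

# Or build the same tiny data frame directly.
df_direct <- data.frame(a = c(1, 3), b = c(2, 4))
print(df_direct)
```

Either form keeps the reprex self-contained, because nobody needs a copy of your file to run it.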

And then if you are a tidyverse adherent, the tibble package is what takes care of the care and feeding of tibbles, which are a flavor of data frame. And the tribble function is extremely useful for creating tiny little data frames because it allows you to write it in this really humane row wise way, like the same way it would look in, for example, in a CSV file. So if I reprex this little snippet, you'll see very, very similar output as what we just saw with the base function, but it allows you to inline the creation in this case of a two row, two column data frame, or you can, again, use just the tibble function directly. And so if you make a lot of reprexes, you get really good at figuring out how to inline the creation of very small objects.
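The tidyverse equivalents could look like this, assuming the tibble package is installed. The values are again illustrative.

```r
library(tibble)

# Row-wise layout: reads like the rows of a CSV file.
df_rowwise <- tribble(
  ~a, ~b,
   1,  2,
   3,  4
)
print(df_rowwise)

# Column-wise equivalent with tibble().
df_colwise <- tibble(a = c(1, 3), b = c(2, 4))
print(df_colwise)
```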

Code that readers don't have to run

Okay, another principle is that the reprex should contain code I do not have to run. Because a lot of your readers have a great deal of R experience. And sometimes, not always, but sometimes they can quickly see the point without actually running the code. But that is greatly enhanced if they can see the output instead of having to run it in their head and in their imagination and try to figure out what's happening. It's just much easier if you can actually see the output. And so that's why I think it's important that your typical reprex contains the code and it also reveals the output being produced by that code.

So here's an example I took from the GitHub repository where the readr package is developed because it's a perfect little example, and it probably was produced with reprex. You can't tell. And this person is just reporting a bug, but it's like a great minimal example. It says, you know, if the header in your CSV contains quoted new lines, you get kind of a weird column name and you get weird data. And the fact that this person provided a small example and it completely shows the problem, I imagine, is what allowed the maintainer, Jim Hester, who's listening to this call, to quickly label this as a bug. And we've already got at least one other user giving it a thumbs up, meaning they've experienced it as well.

And so if you would only have the code here, I think you'd have a lot less sort of quick engagement with this issue. Okay, so code that I can easily run is very important, and we're going to keep working with that issue. So if that person had instead copied and pasted the output from their R console, this is what we would be faced with. So if I were Jim Hester and I needed to reproduce this issue and make sure that it's still a problem, I have a lot of really annoying editing to do. So I have to get rid of all the prompts at the beginning of the lines. I have to get rid of all this output to isolate the three lines of code that actually do anything. So copy-paste from the R console hits some of our checklist, but it's not great because it's very hard for the next person to run this code.

Worse than copy-paste is the screenshot. So this, of course, does, again, hit some of our checklist. It clearly shows the code and the output. But again, if somebody else wanted to check this and reproduce it, they actually have to retype everything, which, frankly, is never going to happen. And so this is what I want to see in a reprex because it can be copy-pasted and run. So I'm going to prove that to you right now.

So if I go to this issue on GitHub and I copy, I could copy all of this or I could, as long as I get all the commands, I'm okay. So I'm going to put that on my clipboard. I'm going to go back to R, maybe to make this really explicit. I'll show you what I copied. All right, that's what I did. So I can copy this again and call reprex. And I get exactly what this person was reporting on GitHub. So I've been able to reproduce it very quickly from a copy-paste.

But as you saw, reprex is like, are you sure you want to do this? Because I've got this output here. And so if you want to get really clean code from a reprex that someone else has made, you capture it and use one of the undo functions in the reprex package. I could use reprex_clean(), and I'll show you that right now. So here's what I copied from GitHub. I could copy that and call reprex_clean(). And now if I paste, you'll see all the output has been eliminated. I think that's a slightly obscure thing you might want to do. But there is a full set of backwards functions in reprex. They help you take code that people have copied from the console or that they have already made a reprex from.
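To make the cleaning step concrete, here is a rough sketch. The copied text is a made-up example, and the base R filter only illustrates the idea of dropping commented output; it is not how reprex implements it. The `reprex::reprex_clean()` call at the end assumes the package is installed.

```r
# Text copied from a rendered reprex: code plus commented output.
copied <- c(
  "x <- 1:4",
  "mean(x)",
  "#> [1] 2.5"
)

# Illustration only: keep the lines that do not start with the output prefix.
code_only <- copied[!startsWith(copied, "#>")]
print(code_only)

# The package function does this for you (interactive use, reprex installed).
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex_clean(input = copied)
}
```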

Shock and awe: advanced features

So we've gotten to essentially the meat of the webinar now. So if you were really interested in basic usage, you've seen it now. And now I'm going to go into the shock and awe section, where I run through a lot of more interesting features of the reprex package that I still think are pretty cool. So the slides show you what we're about to do live. I'm going back to RStudio, and I'm in a script called shock and awe. I'm going to restart R just for good measure.

So the first thing I want to show you is how frictionless reprex can make it to talk to people about figures. So I'm going to load the Gapminder data and ggplot2, and I'm going to make a plot with ggplot2. So you see it down here in my plots pane. So let's say that there's something about this I don't like or that I want to discuss with a colleague. I can use reprex for this. So as usual, I can select the snippet, copy to my clipboard, and run reprex. You're going to see all the same stuff. So we've got a nicely rendered reprex that includes the figure.

But watch this. I'm going to go to a GitHub repo that I created just to play around with. I'm going to create a new issue. I have a question about this plot, and I'm going to paste. Let's look at what we've got. We have the usual sort of nicely formatted markdown, and look at this. So when reprex rendered this code, it made your figure and pushed it up to Imgur and dropped this link into your markdown. So if I submit this issue, people see my code, and they see the actual figure that you just made. So this is an example of one of the cool things you can do that removes a tremendous amount of friction if you're trying to have a quick conversation with somebody about code that produces figures.

OK, so we're going to go back to this shock and awe script, and we're going to execute reprex many times, showing some of the options and different arguments you have. So so far, I've only shown you reprexing when the source code is on the clipboard, but there are a lot of other ways to provide the input. So you can provide it directly in the reprex call as an expression. So here you see that the assignment of x and y gets done, and we compute the correlation between them. There's also an input argument that I'm actually not going to demonstrate where you can provide the source as a file or as a character vector.
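A sketch of these alternative inputs, assuming reprex is installed. The sample sizes are made up, since the transcript only says that x and y are assigned and their correlation computed.

```r
# The snippet as a character vector, one line of code per element.
code_lines <- c(
  "x <- rnorm(100)",
  "y <- rnorm(100)",
  "cor(x, y)"
)

if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  # Source given as an expression inside the call.
  reprex::reprex({
    x <- rnorm(100)
    y <- rnorm(100)
    cor(x, y)
  })

  # Source given through `input` as a character vector.
  reprex::reprex(input = code_lines)
}
```

`input` can also be a path to a file of code. A length-one string without a trailing newline is treated as a path, which is why the vector above has one element per line.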

Reprex, by default, goes and does its work in the session temp directory. That's all part of it sandboxing all of your work. But if your reprex does, for example, file input and output, it could be much easier to force reprex to work in your current working directory. So outfile = NA is shorthand for that. So if I ask R to write the first six letters of the alphabet to a file with outfile = NA, all of a sudden, these four files that reprex needs to create are being left behind in my working directory instead of in a temp directory. It's the R script that reprex makes, the HTML that it uses for the preview, and the markdown that it puts on the clipboard for you. So all those usual files are left behind in a much more accessible place, but you'll notice it has a godawful file name because we just created it out of thin air. So if you want to work somewhere specifically and have nice file names, you could also provide the base for that in outfile. And now you see that it leaves the same four files behind, but they have a much better file name.
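A sketch of the two calls described here, using the argument as it existed at the time of the talk. The file name "abc.txt" and the base name "my-reprex" are hypothetical. As far as I know, newer reprex releases deprecate `outfile` in favor of a `wd` argument (for example `wd = "."`), so check `?reprex` for your installed version.

```r
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  # Work in the current working directory; reprex invents the file names.
  reprex::reprex(writeLines(letters[1:6], "abc.txt"), outfile = NA)

  # Same, but with a chosen base name for the files reprex leaves behind.
  reprex::reprex(writeLines(letters[1:6], "abc.txt"), outfile = "my-reprex")
}
```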

OK, so far I've been producing reprex output that's optimized for GitHub. So it's producing what's called GitHub-flavored markdown. But Stack Overflow is another common target, and that produces slightly different looking markdown. Stack Overflow doesn't use fenced code blocks, it uses indented code blocks. And let me show you what this would look like. I'm going to pretend like I'm going to answer my own question on Stack Overflow, but I won't actually submit this. But now if you paste that into Stack Overflow, it also has a preview feature and it will be formatted correctly for Stack Overflow.

You can also make reprex produce, it creates an R script, which seems sort of weird, but an R script that includes the output as comments, and that is very handy for pasting into an email or into Slack. So I'm going to show you the Slack version of that. So this is me talking to myself on Slack. I'm going to create a code snippet, paste it, maybe I always have it set to R, create a snippet, and that would create a little R file in Slack, properly syntax highlighted. And again, people could copy paste it into R and run it, or sometimes you just want to inline it. You don't get the syntax highlighting, but that also looks quite nice.

The final venue I'll talk about is RTF for rich text format. And this is a very experimental venue. It only works probably on the Mac at this point, because I actually have to call an external utility to do this. But I'd like to show you this, and in fact, it's how I made the slides for this talk. So we run a little bit of code, but now I can go over to Keynote or PowerPoint or something. And I could paste that in, and I'm getting rendered R code that is properly syntax highlighted. And that is, in fact, how all the snippets in my webinar were produced.
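The venues covered in this part of the demo are selected with the `venue` argument. A minimal sketch, assuming reprex is installed and using a made-up two-line snippet as input:

```r
# A tiny self-contained snippet to render.
code <- c("x <- 1:4", "mean(x)")

if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex(input = code, venue = "gh")   # GitHub-flavored Markdown (default)
  reprex::reprex(input = code, venue = "so")   # Stack Overflow
  reprex::reprex(input = code, venue = "r")    # R script with output as comments
  reprex::reprex(input = code, venue = "rtf")  # rich text; experimental in the talk
}
```

The "r" venue is the one that suits email and Slack, since the result is plain runnable R with the output in comments.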

You can suppress the inclusion of that little ad at the bottom, or you can include it. You can ask for your reprex to include session info. And for the GitHub venue, it can be placed in this cute little collapsible thing. So this is a great thing to include if you think, for example, that the bug you're reporting could possibly be related to the version of software on your computer. And I love that it gets folded here. So then sometimes people include this when they don't need to, and it's kind of overwhelming. So the fact that we can put it in this folding tag is really nice.

Reprex can also use the styler package to restyle your code. So here's a really, I would say, poorly formatted piece of code. By default, reprex trusts that you know what you're doing and that you like your formatting. But if you don't trust yourself, you can explicitly ask for reprex to restyle your code and give it a much more conventional layout. You can be silly and change your comment string and make it some sort of emoticon if you want.

Reprex is part of the tidyverse, right? So the tidyverse meta package can be quite chatty at startup and tell you all the packages that you've just attached and if there are conflicts between them. So we actually have a special argument where you can control whether you want that or not. And usually you don't. So we default to silencing it.

And then the last thing I'll show you is that reprex can capture output that, in an interactive session, shows up in your console but is actually being sent to standard output or standard error. So I'm going to install a package from GitHub that requires compilation. This takes a moment, so I'm going to chat over it. But what you're going to see when this reprex actually renders is that we have captured everything that would normally show up in the R console when doing this. So the stuff that is coming from R itself, as well as the things that are being sent to standard output and standard error.

So this captures the output of installing the bench package from GitHub, which does require compilation. And so this is the part that's coming through sort of normal R channels. And this is capturing what's being sent to standard output and standard error.
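A sketch of what that demo might look like. The talk only says the bench package is installed from GitHub; the `devtools::install_github()` call and the `r-lib/bench` repository path are assumptions here, and running this really does install the package.

```r
library(reprex)

# std_out_err = TRUE appends a section with whatever the rendering process
# sent to standard output and standard error (for example, compiler output).
reprex({
  devtools::install_github("r-lib/bench")  # assumed repo path; needs devtools
}, std_out_err = TRUE)
```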

Personal defaults and RStudio add-ins

So that was a very quick live demo of some of the more, I don't know if they're really advanced, but features you don't need in every reprex, but that you might need before long. For a lot of the things that I showed you toggling on and off, you can actually set up your own personal defaults by, again, putting some code in your .Rprofile, which we've already talked about. So this is just an example of someone who hates the ad. They always want to include session info. They always want to restyle their code. They have a whimsical sense of what the comment string should be for output, and they always want to see the tidyverse startup message. So these are not my defaults, but it's an example of what you can do.
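The defaults described here could look roughly like this in a `.Rprofile`. The option names follow the `reprex.<argument>` pattern; `reprex.session_info` is the current name, and older releases used `reprex.si`, so treat the exact spelling as version dependent.

```r
# In ~/.Rprofile: personal defaults picked up by later reprex() calls.
options(
  reprex.advertise       = FALSE,   # no ad at the bottom
  reprex.session_info    = TRUE,    # always append session info
  reprex.style           = TRUE,    # always restyle the code
  reprex.comment         = "#;-)",  # whimsical output prefix
  reprex.tidyverse_quiet = FALSE    # show the tidyverse startup message
)
```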

And the last thing I'll show you mechanically is that most of the time I do not do what I've shown you, which is copying code to the clipboard and then going to the console and typing reprex. There are two RStudio add-ins that really accelerate your reprex life. One of them is called render reprex, which launches a GUI. I'll show you that in a second. The other is reprex selection, which literally reprexes the code that you have selected, and it's absolutely conceived for use with a keyboard shortcut. And RStudio lets you modify your keyboard shortcuts. And so I have bound that add-in to shift command R. This is how I usually use reprex. And for example, Hadley also uses it a lot. He has bound it to something else, and let me go show you the add-in.

So again, I could select the snippet of code that made that figure and launch the add-in. And so this allows you to specify a lot of the things that you can specify in the call by clicking. So I'm going to take the source from the current selection. Let's target Stack Overflow, and yes, let's append session info. Click render, and the usual things happen, and the usual output appears down here. And Stack Overflow doesn't have the capability to support this little folding toggle, so the session info actually gets dumped in there in its full glory. So these are two other ways to get your input into the reprex function that actually are probably more humane than typing it all the time.
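For comparison, a rough console equivalent of those add-in choices, assuming the code of interest has been copied to the clipboard rather than taken from the editor selection:

```r
library(reprex)

# With no code supplied, reprex() reads it from the clipboard.
# Target Stack Overflow and append session info, as chosen in the GUI.
reprex(venue = "so", session_info = TRUE)
```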

The human side of reproducible examples

All right, the last thing I'll say we have, I'll try to go quite quickly because we're already at 45 minutes, is talking a little bit about the human side of making reproducible examples. And now this has nothing to do with the reprex package. It's just about asking questions so that they actually get answered. And I like this image because it conveys somehow that, you know, we're talking about programming, maybe we're all supposed to be acting like robots, and people often seem to assume that they're talking to robots, but that there's a lot of humans involved in this process.

And I want to warn you, I'm giving a little bit of tough love here. There's been a lot of, although probably still not enough, talk of experts being empathetic to newcomers and question askers, but since this is a talk targeted at people asking questions and preparing examples, I also want to say it has to go the other direction as well. So bear with me for a moment here. But I need to say, you know, with all the love in the world, sometimes people come with a question and they have like a very rigid theory about what's wrong or how they should be solving a problem. But if your theory about what was going wrong was so great, like you wouldn't be here asking this question right now.

And this is the origin of why people really want to see code instead of having sort of a prose discussion. Because it's very hard sometimes to tell what people are really talking about. The other life phenomenon I want to link this to is, I don't know if you've had this experience, but if you've ever tried to help, for example, one of your relatives sort out a computer problem over the phone, it can be extremely difficult. A lot of what they're saying doesn't really make sense. They don't use the words you're used to to refer to things. You just feel like you can't really get a grip on things. And this is basically what it feels like when you're trying to answer someone's programming question just based on English prose. And again, like this is why people constantly push you to actually just show a small piece of code. It removes all sorts of ambiguity.

So let's assume that everybody, the question asker and the question answerer, is acting in good faith. And if they're not, then they're irrelevant to me. Okay, so everyone's in good faith. It turns out that experts posting on public sites actually are afraid to post code that doesn't work. And so another reason why these people want to see your code is, you know, they're not just reading it and guessing. Most of these people are actually running your code, proving that their proposed solution works. And then they post it when they know that it's safe to do so. And this was a big revelation to me. I really used to think that the people I looked up to as experts just knew all this stuff by heart, and they were answering all these questions just off the cuff. And then it gradually dawned on me that part of why they're experts or expert behavior is that they are constantly running lots of small examples and experiments.

So sharing your problem in code is extremely fruitful. Making a good reprex is a lot of work. Like sometimes you think I can only show my problem in my R session, and I haven't restarted R for seven months. And it requires the full data set from my thesis. And that is in fact true, it is a lot of work. But you're asking other people to solve a problem. And so this is part of meeting them halfway. But it turns out you get a lot out of this as well.

So let's be very selfish. If you make a good reprex out of your hairy, messy problem, and if you reproduce other people's problems, even reproducing other people's problems is a real service. And then sometimes you're going to be able to solve them. It turns out this discipline, it's like playing scales or serving over and over again. You actually get better at programming by doing this. The last selfish point that I'll make is it turns out when you sit down to make a good reprex out of your problem, and you keep it self-contained, you strip down your giant hairy data set to the smallest data set that reproduces the problem, it is amazing how often you end up answering your own question in the privacy of your own home, and you didn't have to make yourself vulnerable to other people.

So this is a great revelation. And I think the reason this works is that when you have a problem, it's very easy to just keep going in circles and banging your head against the desk. But there's something about preparing it for other people, and the reprex package is also being a real hard-ass about making sure that your problem is self-contained. It kind of knocks you out of that very unproductive place and gets you back on the path of actually working the problem. So most people report this when they first start making reproducible examples, is that it's kind of amazing how often this exercise means you actually answer your own question.

Conclusion and Q&A

I want to give a huge thank you to Yihui Xie and all the people who have brought us the R Markdown package and Pandoc. The reprex package is just a wrapper around those things, and I mean that in both senses. Like, that is literally all that it is. So in some sense, there's not much there there. But on the other hand, when I first made it, and then especially now that I've worked on it more, there's actually a lot more going on. Like, the friction that it removes is friction that is really important to remove, I think. So in any case, it would not be possible without R Markdown and Pandoc, and in fact, I should thank all the co-authors of reprex, and there's a lot of people who have contributed to it and a lot of users who've been extremely generous with their bug reports and feature requests and reporting how things work on different platforms.

So to conclude, go forth and engage in very precise code-heavy conversations about R.

Okay, okay. So I can see these, you cannot. Can reprex also capture variables in the environment and include them in the reproducible example? Currently it cannot, and that is very intentional because we really prioritize this sandboxing. I am actually contemplating basically creating a backdoor for this, mostly because of creating RTF snippets, but at the moment it cannot, and that is intentional, but you know, I'm open to people sharing their use case about why that should be.

But basically if you do this, yeah, there are ways you could do it that actually make those objects available to other people as well, but it becomes a much heavier weight package. So, so far we have not done that. Oftentimes, I think this is a continuation of that question, the issue is buried in some part of larger code, and rather than going back to the places where that data is gathered or calculated, wouldn't it be nice to grab the variable in the current state? Again, like yes and no, I would say extracting that out is perhaps part of your job.

Does reprex, the function, actually check that the code is self-contained, or does it just generate a template? It just generates a template. So, the way you would find out that your code is not self-contained is that you get an error. But it explicitly allows errors, because sometimes you're using reprex to show errors, so no, it is just, it's on you, the human, to decide if you're getting the result you expect.
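To illustrate, here is a sketch of code that is not self-contained. `thesis_data` is a hypothetical object that exists only in the asker's interactive workspace.

```r
library(reprex)

# reprex() runs the code in a fresh session, so the rendered output shows an
# "object not found" error in place of the mean. reprex does not flag this as
# a problem; deciding whether that error is the point is left to the human.
reprex({
  mean(thesis_data$height)
})
```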

Does readr's read_csv support the text argument as an alternative to file, or is that only in read.csv? Readr also does this. It also supports the inline creation of a data frame. It does it through its primary argument, so you would want to read the help for read_csv, but in fact, it's actually sort of even a smaller departure from your usual use.
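A short sketch of both approaches, with made-up data. The base R part uses the `text` argument of `read.csv()`. The readr part runs only if readr is installed; wrapping the literal data in `I()` is what recent readr versions expect, which is an assumption about the installed version.

```r
# Base R: literal CSV goes in through the `text` argument.
scores <- read.csv(text = "name,score
ada,90
grace,95")
print(scores)

# readr: literal data goes in the first (primary) argument.
if (requireNamespace("readr", quietly = TRUE)) {
  print(readr::read_csv(I("name,score\nada,90\ngrace,95")))
}
```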

I think I have another instance of the same question, which is, could reprex help you package up an existing data set and stick it into your reprex? The reprex dos and don'ts document outlines various ways of doing this. For example, dput is a great function. If you have a slightly awkward object and you simply cannot make your point without using it, dput at least creates a representation of that that you can put into a reprex. But so far, you're right. I have not made this terribly easy. At least the story I tell myself is that finding a really simple object and making it is intrinsically part of the reprex task, but this does come up over and over again.
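As a small illustration, here is how `dput()` turns an object into code that rebuilds it. The data frame is a made-up stand-in for the awkward object.

```r
# A small stand-in for the object you cannot do without.
awkward <- data.frame(id = 1:3, group = factor(c("a", "b", "a")))

# dput() prints R code that recreates the object;
# paste that printed code into the reprex in place of the original data.
dput(awkward)

# Round trip: evaluating the deparsed code gives back an equivalent object.
rebuilt <- eval(parse(text = deparse(awkward)))
stopifnot(isTRUE(all.equal(awkward, rebuilt)))
```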

Can you do a reprex for a Shiny app? It is not easy. I think the Shiny team is or has developed their own web page talking about how you create Shiny reprexes and they don't use this package. So now I'm talking about reprexes as a concept.

Can reprex produce interactive plots? All I will say is I have not done so. So I'll end on this question. I don't know. It would have to be tried. But what I will say, from the slide that I skipped over, is what actually happens: currently reprex works by dumping your R code into a templated R script. And I have contemplated revisiting this and dumping it into an R Markdown file. And then it would be much more capable in terms of what you can do. In some ways, there's a lot of equivalence between rendering R scripts and R Markdown files, but there are some key differences. So I could imagine that the ability to include interactivity is probably present if you're using R Markdown and could easily be absent if you're using an R script. All right. I think I will leave it here and conclude the webinar.