
It's a Great Time to be an R Package Developer! - posit::conf(2023)
Presented by Jenny Bryan and Hadley Wickham. (Due to unforeseen circumstances, Hadley Wickham presented this talk "slide karaoke" style, from materials prepared by Jenny Bryan.)

In R, the fundamental unit of shareable code is the package. As of March 2023, there were over 19,000 packages available on CRAN. Hadley Wickham and I recently updated the R Packages book for a second edition, which brought home just how much the package development landscape has changed in recent years (for the better!). In this talk, I highlight recent-ish developments that I think have a great payoff for package maintainers. I'll talk about the impact of new services like GitHub Actions, new tools like pkgdown, and emerging shared practices, such as principles that are helpful when testing a package.

Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.

Talk Track: Package development. Session Code: TALK-1132
Transcript
This transcript was generated automatically and may contain errors.
Hi everyone, so unless there's something seriously wrong with you, you'll probably notice I'm not Jenny Bryan. Unfortunately, Jenny tested positive for COVID, and I volunteered to do slide karaoke for her. So I have never seen these slides before. I am, however, familiar with the topic, so we should be good. And I will say that Jenny developed these slides knowing that I will be delivering them live with no preparation, so there may be some surprises along the way. And hopefully this will take me about 20 minutes, but we'll see.
OK, so the point of this talk really is to announce that there's a new edition of the R Packages book. I wrote the first edition, Jenny contributed most of the second edition, really, I think, updating it for the ways that our package development workflow has changed, and also to introduce this new whole game chapter that Jenny wrote, which really introduces you to the whole flow of developing a package, which we really strongly believe, with devtools and usethis and roxygen2 and testthat, is really, really easy these days. If it takes you longer than 60 seconds to create a package, something is wrong with your workflow. Now, obviously, that's not going to be much of a package in 60 seconds, but it's going to get you started and on to more things.
The R Packages second edition
So in this slide, you will see, I think, that the first edition of R Packages really kicked off in, like, 2015, and nothing really happened with this book for, like, five years, basically. It just sat. It was, you know, live, people were using it, but it wasn't being updated, even as the practices that we in the tidyverse team were using changed. So a couple of years ago, I was like, we probably should really update this, and, hey, Jenny, you know, you do lots of package development. Wouldn't you love to write a book? And Jenny, because she had never written a book before, said, yes, I would love to do that. I think she has some level of regret about that, but overall, I think we're very, very happy with the second edition, and I believe that Jenny will not be writing any other books in the future.
So the second edition really is, you know, it's a lot bigger, I think. So we've doubled the number of pages, we've doubled the number of authors, and we've done something to the number of chapters, which is very hard to see due to this extremely deceptive 3D bar chart.
I will tell you one really cool thing about 3D bar charts, and that is, if you want to know the first time, or the first time I know of, that someone railed about how bad 3D bar charts were, there's this really cool paper from, like, the 1940s saying, like, don't use 3D bar charts, and that just blows my mind, because it wasn't just, like, pick a setting in Keynote, it's, like, go away and draw 3D. Like, who wants to draw 3D bar charts by hand?
What's changed in eight years
So really, in 2015, there were three main packages. Since then, the number of packages has really exploded with, I think, I forget what we... the conscious uncoupling of devtools into smaller packages. Again, you only need to use devtools, which gives you all these tools, but behind the scenes, it's a much, much richer thing. So if there's only one thing you take away from this talk, it's that I think the easiest, highest impact thing you can do for a package is to give it a website. And you can do that in one minute. Like, you can get a decent website for your package in under a minute. You can certainly spend much more time on it to make it awesome, but it's a massive payout just for that one line of code.
pkgdown demo
So Davis was joking earlier that Jenny was just gonna have a slide that said live demo, but she did make a video, and so here is a video showing a little bit of the package development workflow. We've got vignettes. We've got R files. We don't seem to have any tests in here. But this is a pretty simple package, right? With just one function and one vignette, and it's gonna be pretty easy to turn this into a website. We've already passed R CMD check. That's, you know, a really good practice to be doing. It's well-formed, and I think we're really ready to do something fun on the next line. Fun and amazing. And that is we're gonna call use_pkgdown(). Now, that's gonna create some basic metadata. Not much excitement going on here with this metadata, but we can also do that with an add-in if we want, and that add-in is gonna build the package. It's gonna take the contents of the package, combine it with that metadata, and it's gonna make a bunch of HTML files for us. And that gives us this basic website. We've got a reference page. We've got a reference index. Every function gets its own web page. All the examples get run. They get hyperlinked. They get nicely syntax highlighted. Similarly for the vignettes. They get built in a way that you can really, really easily share with others. Everything is cross-linked, and footnotes get turned into nice pop-ups, and you can click on things, and things happen.
Now, if you use GitHub, if your package is in the open, it gets even better. You can call use_pkgdown_github_pages(). What this is gonna do is automate the whole process of building that website. It's gonna call pkgdown::build_site() for you every single time your site changes, and it's gonna publish it to GitHub Pages. So this is seriously like a one-minute workflow to get your website up on the internet where you can share it with others.
This uses GitHub Actions, which is a really great tool that ensures your code doesn't just work in the one place where you wrote it, but also works on some other random computer on the internet, and the chances are, if it works in two places, it's gonna work just about anywhere. A bunch of actions are built in. check-standard is gonna run R CMD check, and combine all of these cool packages that you have locally with the power of GitHub.
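The website and GitHub Actions steps mentioned in the demo can be sketched roughly as follows. This is a minimal sketch, assuming usethis and pkgdown are installed, the working directory is the root of an existing package, and the package already lives in a GitHub repository. In practice you would run each call on its own in the console; the `if (interactive())` guard only keeps the block from doing anything if it is sourced non-interactively.

```r
# Assumes: usethis and pkgdown are installed, the working directory is the
# root of a package, and the package is connected to a GitHub repository.
if (interactive()) {
  # One-time setup: adds the basic pkgdown metadata (_pkgdown.yml).
  usethis::use_pkgdown()

  # Build the website locally from the package contents plus that metadata.
  pkgdown::build_site()

  # One-time setup: have GitHub Actions rebuild the site and publish it
  # to GitHub Pages.
  usethis::use_pkgdown_github_pages()

  # Add the "check-standard" workflow, which runs R CMD check on GitHub.
  usethis::use_github_action("check-standard")
}
```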
What else is new in the second edition
What else is new in the R Packages second edition? Lots of expanded coverage of what it is that we actually do. Lots more diagrams and updated diagrams. I think one of the things that's really changed in the last five years for us are practices around testing, things like code coverage, ideas about how to organize our tests. These are now written up in R Packages in a way that is hopefully easy for you to read and understand and learn about.
We've also learned a lot more about how you look after the package in the long term, how you manage the process of releases, how you deal with CRAN in a way that is helpful for you and as helpful for them as possible.
So I think this is my favorite new chapter. This is all 100% Jenny's work. This is the whole game. This is really showing you, like, if you've never made a package before, if you've got no idea why you'd want to make a package, here's how you can go from that function you have in a random R file and get it into a form that's easy for you to share with others. That's easy, that's documented, that's tested, that's in Git, and then is hopefully being checked automatically by GitHub Actions.
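The path from a loose function to a shareable package might look something like the sketch below. It is not the chapter's own walkthrough: `"mypkg"` and `"my_function"` are hypothetical names, and it assumes usethis and devtools are installed and Git and GitHub credentials are already configured. These calls are meant to be run one at a time in the console (in RStudio, `create_package()` opens the new project in a fresh session, and the remaining steps are run there).

```r
# Hypothetical names throughout: "mypkg" and "my_function".
# Assumes usethis and devtools are installed.
if (interactive()) {
  usethis::create_package("mypkg")   # scaffold a new, empty package
  usethis::use_git()                 # put it under version control
  usethis::use_r("my_function")      # creates R/my_function.R; move your function here
  usethis::use_test("my_function")   # creates tests/testthat/test-my_function.R
  devtools::document()               # turn roxygen comments into help files
  devtools::check()                  # run R CMD check locally
  usethis::use_github()              # create a GitHub repository and push to it
}
```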
So building on top of that, like a lot of the time, you're not going to start from a completely clear slate. You are going to start by trying to get the package that exists inside of your code currently out into the world.
And so really focusing on, like, what's the difference between writing code in a script, which is what you do all the time as a data scientist, and writing a package? How is it going to change? How do you need to think about code differently as you move to create a package instead of a script?
Perhaps more information about how a package is structured, more information about the various ways your package transitions between various different states. You know, you're very used to using install.packages() to get a package from CRAN onto your computer. You're familiar with using library() to get a package that's installed on your computer into memory in your current R session.
All of those things change a little bit when you start developing your own package because now you're going to have the source version of a package on your computer. You're not installing it from CRAN. You are building it from scratch. And there are certainly ways to do this without devtools. They are painful and horrible, in my biased opinion. And devtools was really designed to make those problems go away.
I still remember, like, the first time I taught package development, it was like a two-day workshop. I turned up and I was like, okay, now go to the terminal and type R CMD build. And people were like, what's a terminal? And I was like, huh, people aren't actually born knowing what a terminal is and how to use it. And so rather than trying to teach the whole world how to use a terminal, we've invested a lot into devtools. The goal of devtools is that you can stay in R. You don't need to switch out to use a terminal if you don't want to.
Another big workflow difference between packages from CRAN and your own package: you're going to use devtools::load_all() a lot. This is kind of a simulation of the package building and loading process. It's a simulation because it's designed to be as fast as possible. It's not 100% accurate, but it allows you to build this iterative loop where you try something out in the code, load it, and then you can experience it in the console.
This is one of the diagrams that I like the best, although as I look at it now, it looks kind of vaguely sexual. But really, these are the four functions, the four keyboard shortcuts: Cmd+Shift+L, Cmd+Shift+T, Cmd+Shift+D, and Cmd+Shift+E. This is your life as a package developer. You write some code, you edit some code, you load it, you run it. Once you've done some interactive experimentation with it, you write some tests, you check those tests work, you write some documentation, you look at that documentation. Everything in devtools and the surrounding packages is designed around this idea of tight feedback loops. You might not know exactly what you're trying to do, but you can try it out, see what happens, and iterate.
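As a rough sketch, the four shortcuts correspond to four devtools calls. The pairing below assumes the default RStudio key bindings (Ctrl instead of Cmd on Windows and Linux) and that the working directory is the root of a package; each call is normally run on its own as you iterate.

```r
# Assumes devtools is installed and the working directory is a package root.
# The shortcuts are the RStudio defaults (Ctrl instead of Cmd off macOS).
if (interactive()) {
  devtools::load_all()   # Cmd+Shift+L: simulate building and loading the package
  devtools::test()       # Cmd+Shift+T: run the package's tests
  devtools::document()   # Cmd+Shift+D: regenerate documentation
  devtools::check()      # Cmd+Shift+E: run R CMD check
}
```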
If you want to learn more, I assume we're nearing the end of the talk. If you want to learn more, the best place to start is devtools.r-lib.org. We've got some great cheat sheets there that were newly updated. I think they are hopefully included in the cheat sheets you got at the conference. Newly updated, all the latest advice on our workflow. And we're not finished, so now we're going to talk more about testing. How much? We've got like seven minutes.
Testing practices
We've got quite a lot of time. So, again, one of the things that I said earlier, one of the things that we have learned as we, the tidyverse team, have become more experienced package developers is just the importance of testing. And one of the reasons that we've exposed ourselves more to tests in the wild is that one of our policies now is that if we make a breaking change in a tidyverse package, a change that causes packages on CRAN to no longer pass R CMD check, we are going to give you a pull request to fix that problem. And what that means is we are now parachuting in to some random package on CRAN that has some random failing test and trying to fix it. And this is challenging for me because I like to make things clean and perfect from the ground up and I want to rewrite everything, but I've just got to put my blinkers on and be like, here's one tiny problem, let's fix it. But this has forced us to see what problems people are actually having in real life. And one of those problems we see is how you manage state in tests. Sometimes your functions depend on things that are happening in the world outside of your function arguments. If you're modifying options and you do this the wrong way, it's going to affect every other test that runs after it.
So we've been continuing to build out the withr package, which is going to let you just temporarily make those changes. The changes are going to be scoped to the current function. As soon as that function is over, the changes will revert. The other big challenge is code that lives outside of tests. So if your foofy2 test fails and now you want to create a minimal reprex, you're wading through, based on experience of CRAN packages, tens, hundreds, or possibly thousands of lines of code in this file trying to figure out exactly what this test depends on.
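A minimal sketch of that scoping idea, assuming the testthat and withr packages are installed. `format_pi()` is a hypothetical function, invented here only because its result depends on a global option.

```r
library(testthat)
library(withr)

# Hypothetical function whose result depends on state outside its arguments.
format_pi <- function() {
  format(pi, digits = getOption("digits"))
}

test_that("format_pi() respects the digits option", {
  # The option is changed only for the duration of this test; withr
  # restores the previous value when the test finishes.
  local_options(list(digits = 3))
  expect_equal(format_pi(), "3.14")
})
```

Because the change is undone automatically, tests that run afterwards see the original value of the option.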
So with tests, unlike functions, it's okay to have a little duplication. It's not so bad to copy and paste in your tests because they're generally less coupled. Like if you change one test, you don't have to worry about updating every single other test. So it's okay to do more copying and pasting so your tests are easier to understand in isolation and if there are problems, they're easier to fix.
And it's always better to do this kind of up front rather than when you're trying to debug this failing test, which is causing your package to fail R CMD check on CRAN. You've got an angry email and you need to fix this and you're cranky. It's so much more pleasant if you're prepared for, like, angry you in the future with these very simple tests rather than trying to have to do them on the fly.
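To illustrate what a self-contained test can look like, here is a small sketch assuming testthat is installed. `count_words()` is a hypothetical function, and the test data is deliberately created inside each test rather than shared at the top of the file.

```r
library(testthat)

# Hypothetical function under test.
count_words <- function(x) {
  lengths(strsplit(x, " ", fixed = TRUE))
}

test_that("count_words() counts the words in one string", {
  x <- "a great time"                 # data lives inside the test
  expect_equal(count_words(x), 3L)
})

test_that("count_words() works on several strings", {
  x <- c("a great time", "to be")     # some duplication, on purpose
  expect_equal(count_words(x), c(3L, 2L))
})
```

Either test can be copied out and run on its own, which is what makes a minimal reprex easy to produce when one of them fails.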
We've also continued to revise our advice on where you put stuff that can't live just in the test. If you're duplicating code in your tests and you want to remove that, where do you put it? Helpers are run by both load_all() and test(), so they're available when you're working with your tests interactively, and not just when the tests are running.
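As a sketch of what such a helper could contain: testthat sources files in `tests/testthat/` whose names start with `helper`, and `devtools::load_all()` sources them too by default. The file name and the `example_scores()` function below are hypothetical.

```r
# Contents of a hypothetical tests/testthat/helper.R

# A small fixture that several tests can call, instead of each test
# rebuilding the same data frame by hand.
example_scores <- function() {
  data.frame(id = 1:3, score = c(90, 75, 60))
}
```

A test would then call `example_scores()` directly, and the same call works at the console after `load_all()`.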
Package lifecycle and releases
So we also have just continued to learn more about how do packages survive in the wild for long amounts of time. I've been kind of joking that like ggplot2 is about to turn 18 so it's time to emancipate it and it's just going to be responsible for its own maintenance in the future. But as many of you probably experienced in tidyverse packages already, we're moving to this explicit acknowledgment that there's some life cycle behind functions. Functions will stay in the stable phase of the life cycle for a long time, but sometimes we need to get rid of them and we don't want to get rid of them quickly so everyone has a chance to accommodate so now we're moving through this process of deprecation.
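A deprecation step of this kind can be sketched with the lifecycle package, assuming it is installed. `old_name()`, `new_name()`, and the version number `"1.1.0"` are hypothetical, and this shows only one stage of the process rather than the tidyverse team's full procedure.

```r
library(lifecycle)

# Hypothetical replacement function.
new_name <- function(x) {
  x * 2
}

# The old function keeps working, but warns and points to the replacement.
old_name <- function(x) {
  deprecate_warn("1.1.0", "old_name()", "new_name()")
  new_name(x)
}
```

Callers of `old_name()` still get a result, so they have time to switch before the function is eventually removed.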
We're also huge believers in checklists in the tidyverse, so whenever we release packages, we call this function called use_release_issue() that creates this beautiful checklist where all you have to do is work down this list.
And the entire package development ecosystem is no longer just a product of the tidyverse team, so I really want to appreciate everyone who has contributed to the books, to the packages, whether that's with code or issues or ideas or feedback, like this is a community effort. We love the help we get from others. And here is a photo of me and Jenny in Iceland.
And it is very cold in Iceland compared to where I'm used to, but not where Jenny is used to. We went to a really cool place in Iceland called, and I'm blanking on the name, the Sky Lagoon. In the Sky Lagoon, there's a hot springs thing. There's a Vancouver room, which is where Jenny's from. It's cold and raining. Jenny loved that room, and I hated it. And there's also the Houston room, which is like the sauna, where it's extremely hot and humid, and I loved it, and Jenny hated it.
So, again, if you want to learn more, R Packages is the place to be, and that is it. Thank you.


