
Practical {renv} (Shannon Pileggi, The PCCTC) | posit::conf(2025)
Talk title: Practical {renv}
Speaker(s): Shannon Pileggi
Abstract: The {renv} package aims to help users create reproducible environments for R projects. In theory, this is great! In practice, restoring a package environment can be a frustrating process due to overlooked R configuration requirements. Join me to better understand the source of environment restoration issues and learn strategies for successful maintenance of {renv}-backed projects.
Materials: https://github.com/shannonpileggi/practical-renv
Example: https://github.com/shannonpileggi/jsonlite-example
Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
I'm not the first person at this conference to talk about renv. In 2020, Kevin Ushey introduced the renv package. And in 2022, David Aja gave the talk, You Should Use renv. So where are we at today in 2025?
Here's some posts I've pulled from social media. Andrew says he's about to resubmit an R&R, which means it's time for the obligatory fight with updating renv. TJ says, the restore on renv never works on the first few goes either. And Libby says, I have a tolerate-hate relationship with renv, which is unfortunate because renv is a wonderful package.
And Kevin and David are both fantastic software engineers, but sometimes there's a divide between software engineers and data scientists such that they cannot anticipate all of the truly weird and wonderful ways we try to use their package.
So this talk is for frustrated renv users and potential renv users. If you're in the potential bucket, I'm gonna do my best to get you up to speed, but I might go a little fast for you. And for the frustrated renv user, it's because you've had an experience like this, where you attempt to renv restore, you download 19 packages in 25 seconds, and after all of that time waiting, you have a failure. And at this point, you're trying things again in a different order, and you feel like you're banging against a copy machine hoping for a different outcome.
And I think the problem is, when users open a project backed by renv, they feel like they have one button they can push, and that's renv restore. So I hope to convince you you have a second button you can push, which is evaluate your project state against your current R configuration.
Even if you do that, I'm not promising it's going to be smooth sailing. This is me preparing for this talk. I submitted the talk title, How to Make Renv Actually Work, and I downgraded it to Practical Renv. So my goal is to empower renv users to successfully restore a project environment by either diagnosing or avoiding restore errors.
To do that, we need a shared baseline understanding with an intro to renv. We're also going to need to get into the weeds of package installation, because that is what restore does for you. And then we're going to get into a concrete example with projects over time.
Intro to renv and reproducibility
This is from the first sentence of the website. The renv package helps you create reproducible environments for your R projects. So let's break down this important word, reproducible.
There's a spectrum of reproducibility. On the lowest end of the spectrum, we have code, and the middle ground, we have code and associated package versions at the time of execution. And on the highest order of reproducibility, we have code, package versions, R version, your system dependencies, and your operating system. And where renv fits is in the middle, with code and package versions, which is both the beauty and the challenge of renv. The beauty being that it doesn't have all of the overhead of the upper end of the spectrum, and the challenge being that it doesn't address the overhead of the upper end of the spectrum.
We also need to talk about this really important word helps. The renv package helps you create reproducible environments. It does not do it for you. When you use renv, there is an immense amount of user responsibility.
So here's an example project. I'm going to call it jsonlite-example, and I initialized it in February of this year. At this point, renv is not activated on this project. I have an .Rproj file, and I have a single file called script.R, which has one line in it: library(jsonlite). I can initialize renv with renv::init(), in which case I get the message that I'm recording versions of R and packages in my lockfile, and in my project directory, I additionally get a .Rprofile, an renv.lock, and an renv directory.
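As a sketch, that initialization step looks roughly as follows (the exact console output varies by renv version):

```r
# Initialize renv in the active project (one-time setup).
# install.packages("renv")  # if renv is not already installed
renv::init()

# After init, the project directory additionally contains:
#   .Rprofile  - runs renv/activate.R so the project library is used on startup
#   renv.lock  - the lockfile recording R and package versions
#   renv/      - the project-specific library and renv settings
```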
And then when I'm iterating on this project, I'll do some combination of installing new packages and snapshotting to record those to my lock file, and this is pretty smooth sailing.
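That iteration loop, sketched in code (the package name here is just illustrative):

```r
# Install a new dependency into the project's isolated library...
renv::install("dplyr")

# ...and record the new state of the project library in renv.lock.
renv::snapshot()
```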
And so what I have is I have this initial project, and my R versions and my package versions, they are all released at different points in time. They are all available for different lengths of time, and I'm taking a single slice of that point in time that represents latest available versions at that date. So I have my initial project.
Package installation and libraries
Now let's talk about package installation. A library is a directory containing installed packages. It's just a folder on your machine where those installed packages go. When I don't have renv activated on my project, I have two library paths available to me. One's a system path, one's a user path, and I want you to notice that they're both associated with R version of 4.4, which is why when R 4.5 is released, Hadley posts on social media, happy reinstalling all your R packages day to all those who celebrate, because you will physically get a new directory on your machine in which you need to install those packages.
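You can inspect these library paths yourself; the output below is illustrative for one Windows setup, not a guaranteed layout:

```r
# List the libraries R will search, in order.
.libPaths()
# Without renv active, something like:
#   "C:/Users/me/AppData/Local/R/win-library/4.4"   (user library)
#   "C:/Program Files/R/R-4.4.2/library"            (system library)
# Note the minor version (4.4) baked into the path: R 4.5 gets a new,
# empty user library, hence "happy reinstalling all your R packages day."
```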
And so when I don't have renv activated on my project, I have a shared package environment. I have one library that all of my projects point to. That one library contains a single version of jsonlite, 1.8.9, and all of my projects are forced to use that single version. And again, this is all under the umbrella of R 4.4.
When I do have renv activated on my project, I again have two library paths available, and again, these are associated with R version 4.4. Additionally, one of these is project-specific for my jsonlite-example, and the other one points to my global cache.
So when I do have renv activated, I have an isolated package environment. I have a global cache that contains every version of jsonlite I could possibly want to use across all of my projects, and now each of my projects can be associated with a different version of jsonlite, and this is under the umbrella of R 4.4.
And when we think about any given project, it has project layers. We'll call this the project onion. In our innermost layer, we have our .Rprofile and renv.lock. Then we have the packages that are associated with our analysis, the R version, the system dependencies, and the operating system. So keep in mind where renv operates: in that innermost layer, touching the packages.
Now the renv.lock is a JSON file that contains information from the installed packages' DESCRIPTION files. Let's walk through this a bit. It contains information about the versions of R and jsonlite and renv. It contains information about the repositories, which we just heard a lot about. That's where you actually installed your package from. If you haven't done any configuration on your machine, your default repository probably points to CRAN at the URL cloud.r-project.org. There are other repositories out there that you can install packages from, and we'll be talking more about the Posit Public Package Manager, or P3M.
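An abridged renv.lock for this project might look like the excerpt below; real lockfiles also record package hashes and dependency requirements:

```json
{
  "R": {
    "Version": "4.4.2",
    "Repositories": [
      { "Name": "CRAN", "URL": "https://cloud.r-project.org" }
    ]
  },
  "Packages": {
    "jsonlite": {
      "Package": "jsonlite",
      "Version": "1.8.9",
      "Source": "Repository",
      "Repository": "CRAN",
      "NeedsCompilation": "yes"
    }
  }
}
```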
We also should talk about this field, NeedsCompilation. jsonlite says NeedsCompilation is yes, and renv says NeedsCompilation is no. NeedsCompilation means that the package requires compilation tools outside of R, which touches on your system dependencies and your operating system, layers that renv does not touch.
Now we have to talk about binaries. So binaries are compiled R packages. They are very specific to your operating system, your R version, your package version, and architecture. Typically, when we install packages, we either install them from source or binary. If we start from binary, it's a straight shot to install. If we start from source, then our machine has to compile it into a binary, and then we get to install it. So that needs compilation step happens in between going from source to binary, which means when we install from source, this is a harder and slower process, whereas if we install from binary, this is an easier and faster process. And typically, the reason why we'll do one over the other is just simply the availability of the binaries. They're not always available.
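When you need to steer this yourself, install.packages() lets you request one route or the other; the behavior sketched here is for Windows and macOS, where CRAN ships binaries:

```r
# Prefer the precompiled binary: fast, no build tools required.
install.packages("jsonlite", type = "binary")

# Force a source build: requires compilation tools (e.g. Rtools on Windows).
install.packages("jsonlite", type = "source")

# The default, getOption("pkgType"), is usually "both" on Windows and macOS:
# prefer the binary, fall back to source when no binary is available.
```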
Projects over time: freeze, manage, or update
So now, let's talk about projects over time. I'm gonna pick up this project six months later, in August of this year, and I'm gonna actively work on it. And when I pick it up, the entire ecosystem has changed. There's a new version of R, a new version of renv, and a new version of jsonlite. And so now, I have to make some project decisions: freeze, manage, or update.
And we're gonna think about this in this versions-over-time map, where on one axis, we have our R version increasing, and on the other axis, we have our package version increasing. And where we're starting with our initial project is right here in the top left-hand corner, with R 4.4.2 and jsonlite 1.8.9.
So let's talk about the freeze option. What does this mean in the context of renv? I'm gonna pose the question that I am really excited about the new release of base R, and I want to advance to R 4.5.1. And so can I do this? Can I advance to R 4.5.1, but still use older versions of renv and jsonlite when I pick up my active project?
And so if we're thinking about what's happening here, we have moved forward in time, R 4.5.1 is out, and I want to use it, but I wanna move backwards in time and use an older version of jsonlite. Can I do this?
And let's be explicit about what we're doing here. If we're using a new R version, we will get a new cache. I had everything working just the way I wanted to under R 4.4, and now when I'm using R 4.5, my cache is empty, so I will be forced to install new packages.
And so we're going to open up this project under R 4.5.1, and we're going to push the only button we know how to push, which is renv restore. And unfortunately, I get my failure message, right? It says I am attempting to install jsonlite from source, and it failed.
Instead of crying, let's push our second button: evaluate our project state against our current R configuration. So my first question is, do I have the configuration required for source compilation? I'm on Windows 11, I have Rtools 4.5, and the answer is yes, I do. All the machinery is there. And I can programmatically check that with the function pkgbuild::check_build_tools(), which yields the result, your system is ready to build packages. So that's kind of weird.
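That check, sketched in code (it requires the pkgbuild package; the output shown is what a correctly configured machine reports):

```r
# Verify that compilers and build tools are installed and on the PATH.
# install.packages("pkgbuild")  # if needed
pkgbuild::check_build_tools(debug = TRUE)
#> Your system is ready to build packages!
```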
Well, if installing from source is a problem, can I just install from binary instead, right? I can just bypass this whole compilation issue. And that gets a little muddy. So let's think about when binaries are generally available. When I started this project in February of this year, jsonlite 1.8.9 was the latest and greatest available on CRAN, and the binary for this package was available on CRAN at that time, and also on the Posit Public Package Manager (P3M). And the gray squares were future versions. Now, six months later, CRAN hosts binaries for the latest available package versions. So jsonlite 2.0 is gonna be on CRAN, not 1.8.9.
If I do wanna go back in time, P3M does actually host binaries for older releases. But where I'm at in the upper right-hand corner of this matrix, there is no binary available for this combination of things.
So can I install from binary instead? The answer is no. Which leads me to think, am I doing something weird?
So you get a little hint of this at the top of your console. renv tells you, hey, you're using R 4.5.1, but your lockfile was generated with R 4.4.2. And that picture we just showed about binary availability is a general reflection of what's going on in the R ecosystem.
So in the R ecosystem, things work really well at any given point in time when we're looking at latest releases available at that point in time. The diagonal is a good place to be. The lower triangle is also an okay place to be. We can do things like stay on R 4.4 and use a newer version of a package. The upper triangle, no. We don't want to be here. This is undefined, uncharted territory.
And if you find yourself lost and wandering into this undefined territory, it's not your fault. It's nothing to be ashamed of. With renv, unfortunately, it is quite easy to accidentally do something that is quite unnatural to the R ecosystem. But if you do find yourself there, I want you to look at yourself in the mirror, say no, and slap yourself out of it. I'm not saying things won't ever work there. They will work sometimes. But eventually, you will hit unnecessary pain and suffering.
So am I doing something weird? Yes.
So the correct way to freeze a project is to return it to its initial state. I want to revert back to that original R version. So let's go back in time. Now, if you're not aware, you can have multiple versions of R installed on your machine, and a wonderful tool for switching between those R versions is called rig, the R Installation Manager. So in your terminal, you can submit rig default 4.4.2. So now, I've reverted back to my original version of R, and at this point, renv restore becomes a trivial and easy process.
So I can successfully install package versions from the repository specified in the lockfile. And remember, that was originally set to CRAN when I started this six months ago. Now, I've advanced into the future, and when I'm looking at my binary availability chart, if I'm really trying to go back in time to version 1.8.9, that's no longer available on CRAN as a binary. So if I want to take advantage of that binary, I can go to P3M to get it. And you can do that by supplying the repos argument to the renv restore function. And there's a third way you can do this, which is with the renv checkout function. With renv checkout, you supply a date, and then renv will install date-based package versions, fetching the latest versions available up to that date.
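Those last two options might be sketched like this; the P3M snapshot date below is illustrative, not the project's actual date:

```r
# Restore lockfile versions, but fetch binaries from a dated P3M
# snapshot instead of the CRAN repository recorded in renv.lock.
renv::restore(
  repos = c(P3M = "https://packagemanager.posit.co/cran/2025-02-28")
)

# Or: install the package versions that were latest as of a given date.
renv::checkout(date = "2025-02-28")
```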
Let's talk about the manage decision. So I'm calling it manage when I want to maintain my R version of 4.4.2, but I want to advance to jsonlite 2.0 to use the latest and greatest. And remember, this is an okay place to be in the R ecosystem. This is pretty simple. In this case, I would install jsonlite, and then snapshot to record that to my lockfile.
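The manage path is just those two calls:

```r
# Stay on R 4.4.2, but move jsonlite forward to the latest release...
renv::install("jsonlite")

# ...and record the upgraded version in renv.lock.
renv::snapshot()
```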
And lastly, what about update? Update means I want to bring everything forward into the future, including my R and my package versions. And in our R ecosystem map, this is a great place to be. There are three steps to updating. First, you'll execute renv upgrade to upgrade renv to the latest available version. Then you'll execute renv install to install the latest versions of all packages. And lastly, snapshot to record those versions to the lockfile. And I want to be clear here that if you're actively working on a project, regularly updating is the happy path. It allows you to reckon with upgrading your package versions in a controlled manner.
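Those three steps, as a sketch:

```r
# 1. Upgrade renv itself to the latest available version.
renv::upgrade()

# 2. Re-resolve project dependencies, installing the latest versions.
renv::install()

# 3. Record the updated versions in renv.lock.
renv::snapshot()
```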
Summary and recommendations
So to summarize, renv places an immense amount of responsibility on the user. And it is a package whose documentation you should truly read front to back. Because what we've talked about here, in terms of what renv does and does not do for you, is listed in the caveats. So please, read the documentation.
And another framework that's in the renv documentation is how you handle active projects. It gives you these kinds of decision-making frameworks. If you're using renv on an active project and you're thinking about the freeze option, I just want to keep those package versions stable, that's really only short-term sustainable. It's gonna come back to bite you, and it's gonna be harder the more time that elapses. Managing by doing one-off upgrades of a package is also short-term sustainable; it's gonna lead to a fragile package environment. And then updating is truly the only long-term sustainable way to go.
And if you think about the reproducibility spectrum, just remember where renv sits: right in the middle. There might be times and places where you need to go to the upper end of that reproducibility spectrum and use something like a container, like Docker. And I love this example from Andrew Heiss. He's got a wonderful repository that shows you how you can execute code from a published paper using all methods of the spectrum: using Docker, using renv, or just using packages installed locally, system-wide.
So to go back to what David said, you should use renv. I agree, but you should use it with intention. There's a lot of places you can get into the weeds here, and I didn't have time for everything. So if you want more tips and details, you can go to my GitHub repository, shannonpileggi/practical-renv. And I wanna give a huge thank you to David for helping me reason through all of this, and Kevin for also listening to this, and everyone else who's supported me along the way. Thank you.
