
Practical {renv} (Shannon Pileggi, The PCCTC) | posit::conf(2025)
Talk title: Practical {renv}
Speaker(s): Shannon Pileggi
Abstract: The {renv} package aims to help users create reproducible environments for R projects. In theory, this is great! In practice, restoring a package environment can be a frustrating process due to overlooked R configuration requirements. Join me to better understand the source of environment restoration issues and learn strategies for successful maintenance of {renv}-backed projects.
Materials: https://github.com/shannonpileggi/practical-renv
Example: https://github.com/shannonpileggi/jsonlite-example
Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
I'm not the first person at this conference to talk about renv. In 2020, Kevin Ushey introduced the renv package. And in 2022, David Aja gave the talk, You Should Use renv. So where are we at today in 2025?
Here's some posts I've pulled from social media. Andrew says he's about to resubmit an R&R, which means it's time for the obligatory fight with updating renv. TJ says, the restore on renv never works on the first few goes either. And Libby says, I have a tolerate-hate relationship with renv, which is unfortunate because renv is a wonderful package.
And Kevin and David are both fantastic software engineers, but sometimes there's a divide between software engineers and data scientists such that they cannot anticipate all of the truly weird and wonderful ways we try to use their package.
So this talk is for frustrated renv users and potential renv users. If you're in the potential bucket, I'm gonna do my best to get you up to speed, but I might go a little fast for you. And for the frustrated renv user, it's because you've had an experience like this, where you attempt to renv restore, you download 19 packages in 25 seconds, and after all of that time waiting, you have a failure. And at this point, you're trying things again in a different order, and you feel like you're banging against a copy machine hoping for a different outcome.
And I think the problem is, when users open a project backed by renv, they feel like they have one button they can push, and that's renv restore. So I hope to convince you you have a second button you can push, which is evaluate your project state against your current R configuration.
Even if you do that, I'm not promising it's going to be smooth sailing. This is me preparing for this talk. I submitted the talk title, How to Make Renv Actually Work, and I downgraded it to Practical Renv. So my goal is to empower renv users to successfully restore a project environment by either diagnosing or avoiding restore errors.
To do that, we need a shared baseline understanding with an intro to renv. We're also going to need to get into the weeds of package installation, because that is what restore does for you. And then we're going to get into a concrete example with projects over time.
Intro to renv and reproducibility
This is from the first sentence of the website. The renv package helps you create reproducible environments for your R projects. So let's break down this important word, reproducible.
There's a spectrum of reproducibility. On the lowest end of the spectrum, we have code, and the middle ground, we have code and associated package versions at the time of execution. And on the highest order of reproducibility, we have code, package versions, R version, your system dependencies, and your operating system. And where renv fits is in the middle, with code and package versions, which is both the beauty and the challenge of renv. The beauty being that it doesn't have all of the overhead of the upper end of the spectrum, and the challenge being that it doesn't address the overhead of the upper end of the spectrum.
We also need to talk about this really important word helps. The renv package helps you create reproducible environments. It does not do it for you. When you use renv, there is an immense amount of user responsibility.
So here's an example project. I'm going to call it jsonlite-example, and I initialized it in February of this year. At this point, renv is not activated on this project. I have an .Rproj file, and I have a single file called script.R, which has one line in it: library(jsonlite). I can initialize renv with renv::init(), in which case I get the message that I'm recording versions of R and packages in my lockfile, and in my project directory, I additionally get a .Rprofile, an renv.lock, and an renv directory.
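As a sketch, that initialization step looks roughly as follows (the exact console output varies by renv version):

```r
# Initialize renv in the active project (one-time setup).
# install.packages("renv")  # if renv is not already installed
renv::init()

# After init, the project directory additionally contains:
#   .Rprofile  - runs renv/activate.R so the project library is used on startup
#   renv.lock  - the lockfile recording R and package versions
#   renv/      - the project-specific library and renv settings
```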
And then when I'm iterating on this project, I'll do some combination of installing new packages and snapshotting to record those to my lock file, and this is pretty smooth sailing.
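That iteration loop, sketched in code (the package name here is just illustrative):

```r
# Install a new dependency into the project's isolated library...
renv::install("dplyr")

# ...and record the new state of the project library in renv.lock.
renv::snapshot()
```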
And so what I have is I have this initial project, and my R versions and my package versions, they are all released at different points in time. They are all available for different lengths of time, and I'm taking a single slice of that point in time that represents latest available versions at that date. So I have my initial project.
Package installation and libraries
Now let's talk about package installation. A library is a directory containing installed packages. It's just a folder on your machine where those installed packages go. When I don't have renv activated on my project, I have two library paths available to me. One's a system path, one's a user path, and I want you to notice that they're both associated with R version of 4.4, which is why when R 4.5 is released, Hadley posts on social media, happy reinstalling all your R packages day to all those who celebrate, because you will physically get a new directory on your machine in which you need to install those packages.
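You can inspect these library paths yourself; the output below is illustrative for one Windows setup, not a guaranteed layout:

```r
# List the libraries R will search, in order.
.libPaths()
# Without renv active, something like:
#   "C:/Users/me/AppData/Local/R/win-library/4.4"   (user library)
#   "C:/Program Files/R/R-4.4.2/library"            (system library)
# Note the minor version (4.4) baked into the path: R 4.5 gets a new,
# empty user library, hence "happy reinstalling all your R packages day."
```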
And so when I don't have renv activated on my project, I have a shared package environment. I have one library that all of my projects point to. That one library contains a single version of jsonlite, 1.8.9, and all of my projects are forced to use that single version. And again, this is all under the umbrella of R 4.4.
When I do have renv activated on my project, I again have two library paths available, and again, these are associated with R version 4.4. Additionally, one of these is project-specific for my jsonlite-example, and the other one points to my global cache.
So when I do have renv activated, I have an isolated package environment. I have a global cache that contains every version of jsonlite I could possibly want to use across all of my projects, and now each of my projects can be associated with a different version of jsonlite, and this is under the umbrella of R 4.4.
And when we think about any given project, it has project layers. We'll call this the project onion. In our innermost layer, we have our .Rprofile and renv.lock. Then we have the packages that are associated with our analysis, the R version, the system dependencies, and the operating system. So keep in mind where renv operates: in that innermost layer, touching the packages.
Now the renv.lock is a JSON file that contains information from the installed packages' DESCRIPTION files. Let's walk through this a bit. It contains information about the versions of R and jsonlite and renv. It contains information about the repositories, which we just heard a lot about. That's where you actually installed your package from. If you haven't done any configuration on your machine, your default repository probably points to CRAN at the URL cloud.r-project.org. There are other repositories out there that you can install packages from, and we'll be talking more about the Posit Public Package Manager, or P3M.
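An abridged renv.lock for this project might look like the excerpt below; real lockfiles also record package hashes and dependency requirements:

```json
{
  "R": {
    "Version": "4.4.2",
    "Repositories": [
      { "Name": "CRAN", "URL": "https://cloud.r-project.org" }
    ]
  },
  "Packages": {
    "jsonlite": {
      "Package": "jsonlite",
      "Version": "1.8.9",
      "Source": "Repository",
      "Repository": "CRAN",
      "NeedsCompilation": "yes"
    }
  }
}
```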
We also should talk about this field, NeedsCompilation. jsonlite says NeedsCompilation is yes, and renv says NeedsCompilation is no. NeedsCompilation means that the package requires compilation tools outside of R, which touches on your system dependencies and your operating system, layers that renv does not touch.
Now we have to talk about binaries. So binaries are compiled R packages. They are very specific to your operating system, your R version, your package version, and architecture. Typically, when we install packages, we either install them from source or binary. If we start from binary, it's a straight shot to install. If we start from source, then our machine has to compile it into a binary, and then we get to install it. So that needs compilation step happens in between going from source to binary, which means when we install from source, this is a harder and slower process, whereas if we install from binary, this is an easier and faster process. And typically, the reason why we'll do one over the other is just simply the availability of the binaries. They're not always available.
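When you need to steer this yourself, install.packages() lets you request one route or the other; the behavior sketched here is for Windows and macOS, where CRAN ships binaries:

```r
# Prefer the precompiled binary: fast, no build tools required.
install.packages("jsonlite", type = "binary")

# Force a source build: requires compilation tools (e.g. Rtools on Windows).
install.packages("jsonlite", type = "source")

# The default, getOption("pkgType"), is usually "both" on Windows and macOS:
# prefer the binary, fall back to source when no binary is available.
```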
Projects over time: freeze, manage, or update
So now, let's talk about projects over time. I'm gonna pick up this project six months later, in August of this year, and I'm gonna actively work on it. And when I pick it up, the entire ecosystem has changed. There's a new version of R, a new version of renv, and a new version of jsonlite. And so now, I have to make some project decisions: freeze, manage, or update.
And we're gonna think about this in this versions-over-time map, where on one axis, we have our R version increasing, and on the other axis, we have our package version increasing. And where we're starting with our initial project is right here in the top left-hand corner, with R 4.4.2 and jsonlite 1.8.9.
So let's talk about the freeze option. What does this mean in the context of renv? I'm gonna pose the question that I am really excited about the new release of base R, and I want to advance to R 4.5.1. And so can I do this? Can I advance to R 4.5.1, but still use older versions of renv and jsonlite when I pick up my active project?
And so if we're thinking about what's happening here, we have moved forward in time, R 4.5.1 is out, and I want to use it, but I wanna move backwards in time and use an older version of jsonlite. Can I do this?
And let's be explicit about what we're doing here. If we're using a new R version, we will get a new cache. I had everything working just the way I wanted to under R 4.4, and now when I'm using R 4.5, my cache is empty, so I will be forced to install new packages.
And so we're going to open up this project under R 4.5.1, and we're going to push the only button we know how to push, which is renv restore. And unfortunately, I get my failure message, right? It says I am attempting to install jsonlite from source, and it failed.
Instead of crying, let's push our second button: evaluate our project state against our current R configuration. So my first question is, do I have the configuration required for source compilation? I'm on Windows 11, I have Rtools 4.5, and the answer is yes, I do. All the machinery is there. And I can programmatically check that with the function pkgbuild::check_build_tools(), which yields the result, your system is ready to build packages. So that's kind of weird.
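That check, sketched in code (it requires the pkgbuild package; the output shown is what a correctly configured machine reports):

```r
# Verify that compilers and build tools are installed and on the PATH.
# install.packages("pkgbuild")  # if needed
pkgbuild::check_build_tools(debug = TRUE)
#> Your system is ready to build packages!
```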
Well, if installing from source is a problem, can I just install from binary instead, right? I can just bypass this whole compilation issue. And that gets a little muddy. So let's think about when binaries are generally available. When I started this project in February of this year, jsonlite 1.8.9 was the latest and greatest available on CRAN, and the binary for this package was available on CRAN at that time, and also on the Posit Public Package Manager (P3M). And the gray squares were future versions. Now, six months later, CRAN hosts binaries for the latest available package versions. So jsonlite 2.0 is gonna be on CRAN, not 1.8.9.
If I do wanna go back in time, P3M does actually host binaries for older releases. But where I'm at in the upper right-hand corner of this matrix, there is no binary available for this combination of things.
So can I install from binary instead? The answer is no. Which leads me to think, am I doing something weird?
So you get a little hint of this at the top of your console. renv tells you, hey, you're using R 4.5.1, but your lockfile was generated with R 4.4.2. And that picture we just showed about binary availability is a general reflection of what's going on in the R ecosystem.
So in the R ecosystem, things work really well at any given point in time when we're looking at latest releases available at that point in time. The diagonal is a good place to be. The lower triangle is also an okay place to be. We can do things like stay on R 4.4 and use a newer version of a package. The upper triangle, no. We don't want to be here. This is undefined, uncharted territory.
And if you find yourself lost and wandering into this undefined territory, it's not your fault. It's nothing to be ashamed of. With renv, unfortunately, it is quite easy to accidentally do something that is quite unnatural to the R ecosystem. But if you do find yourself there, I want you to look at yourself in the mirror, say no, and slap yourself out of it. I'm not saying things won't ever work there. They will work sometimes. But eventually, you will hit unnecessary pain and suffering.
So am I doing something weird? Yes.
So the correct way to freeze a project is to return it to its initial state. I want to revert back to that original R version. So let's go back in time. Now, if you're not aware, you can have multiple versions of R installed on your machine, and a wonderful tool for switching between those R versions is called rig, the R Installation Manager. So in your terminal, you can submit rig default 4.4.2. So now, I've reverted back to my original version of R, and at this point, renv restore becomes a trivial and easy process.
So I can successfully install package versions from the repository specified in the lockfile. And remember, that was originally set to CRAN when I started this six months ago. Now, I've advanced into the future, and when I'm looking at my binary availability chart, if I'm really trying to go back in time to version 1.8.9, that's no longer available on CRAN as a binary. So if I want to take advantage of that binary, I can go to P3M to get it. And you can do that by supplying the repos argument to the renv restore function. And there's a third way you can do this, which is with the renv checkout function. With renv checkout, you supply a date, and then renv will install date-based package versions, fetching the latest versions available up to that date.
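Those last two options might be sketched like this; the P3M snapshot date below is illustrative, not the project's actual date:

```r
# Restore lockfile versions, but fetch binaries from a dated P3M
# snapshot instead of the CRAN repository recorded in renv.lock.
renv::restore(
  repos = c(P3M = "https://packagemanager.posit.co/cran/2025-02-28")
)

# Or: install the package versions that were latest as of a given date.
renv::checkout(date = "2025-02-28")
```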
Let's talk about the manage decision. So I'm calling it manage when I want to maintain my R version of 4.4.2, but I want to advance to jsonlite 2.0 to use the latest and greatest. And remember, this is an okay place to be in the R ecosystem. This is pretty simple. In this case, I would install jsonlite, and then snapshot to record that to my lockfile.
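The manage path is just those two calls:

```r
# Stay on R 4.4.2, but move jsonlite forward to the latest release...
renv::install("jsonlite")

# ...and record the upgraded version in renv.lock.
renv::snapshot()
```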
And lastly, what about update? Update means I want to bring everything forward into the future, including my R and my package versions. And in our R ecosystem map, this is a great place to be. There are three steps to updating. First, you'll execute renv upgrade to upgrade renv to the latest available version. Then you'll execute renv install to install the latest versions of all packages. And lastly, snapshot to record those versions to the lockfile. And I want to be clear here that if you're actively working on a project, regularly updating is the happy path. It allows you to reckon with upgrading your package versions in a controlled manner.
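Those three steps, as a sketch:

```r
# 1. Upgrade renv itself to the latest available version.
renv::upgrade()

# 2. Re-resolve project dependencies, installing the latest versions.
renv::install()

# 3. Record the updated versions in renv.lock.
renv::snapshot()
```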
Summary and recommendations
So to summarize, renv places an immense amount of responsibility on the user. And it is a package whose documentation you should truly read front to back. Because what we've talked about here, in terms of what renv does and does not do for you, is listed in the caveats. So please, read the documentation.
And another framework that's in the renv documentation is how you handle active projects. It gives you these kinds of decision-making frameworks. If you're using renv on an active project and you're thinking about the freeze option, I just want to keep those package versions stable, that's really only short-term sustainable. It's gonna come back to bite you, and it's gonna be harder the more time that elapses. Managing by doing one-off upgrades of a package is also short-term sustainable; it's gonna lead to a fragile package environment. And then updating is truly the only long-term sustainable way to go.
And if you think about the reproducibility spectrum, just remember where renv sits: right in the middle. There might be times and places where you need to go to the upper end of that reproducibility spectrum and use something like a container, like Docker. And I love this example from Andrew Heiss. He's got a wonderful repository that shows you how you can execute code from a published paper using all methods of the spectrum: using Docker, using renv, or just using packages installed locally, system-wide.
So to go back to what David said, you should use renv. I agree, but you should use it with intention. There's a lot of places you can get into the weeds here, and I didn't have time for everything. So if you want more tips and details, you can go to my GitHub repository, shannonpileggi/practical-renv. And I wanna give a huge thank you to David for helping me reason through all of this, and Kevin for also listening to this, and everyone else who's supported me along the way. Thank you.
