
Personal R Administration
From R/Medicine 2025

Does the release of a new R version fill you with dread? Are there passwords in your R code? Do you look at the output of a failed package installation and think to yourself, "WTF?!" If you said yes to any of those questions, then you need Personal R Administration. You'll come away with tips, tricks, tweaks, and some hacks for building data science dev environments that you won't be afraid to come back to in a year.

Speakers: David Aja and Shannon Pileggi

E. David Aja is a Software Engineer at Posit. Before joining Posit, he worked as a data scientist in the public sector.

Shannon Pileggi (she/her) is a Lead Data Scientist at The Prostate Cancer Clinical Trials Consortium, a frequent blogger, and a member of the R-Ladies Global leadership team. She enjoys automating data wrangling and data outputs, and making both data insights and learning new material digestible.

Resources:
- R/Medicine: https://rconsortium.github.io/RMedicine_website/
- R Consortium: https://www.r-consortium.org/
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Hi, everyone. I hope everyone is having a great day three here at R Medicine. My name is Joy Payton. I use she, her pronouns, and I am a member of the organizing committee of the R Medicine Conference. And on behalf of everyone behind the scenes, we are so glad that you are here. And we hope that you're enjoying the diverse topics that our speakers are covering this week.
One of the topics that I love to talk about as a data science educator is shame. Because I know that shame can keep people from asking for help. And some of what I've dealt with in the area of shame is realizing that my personal R workflows were not the best. So I have not always used projects. And I have been known to store secrets insecurely once or twice. And I have also been known to have package dependencies that were brittle, which has led me to be a little afraid to update R.
So that is why one of the workshops I have most been looking forward to this year is this one. Offered by David Aja and Shannon Pileggi. Addressing personal R administration. So these two speakers are masterful presenters on any number of topics related to scientific computing. But I suspect that today they're going to treat all of us, not just to their technical genius, but also to their personal warmth and wisdom. So, Shannon, David, take it away.
Thank you, Joy. Yeah, it's a pleasure to be here with you all today. I'm David. Shannon will be hanging out with me today. It's nice to meet all of you. And I just want to say, you know, all have sinned. I am here speaking because I have done a number of catastrophic and embarrassing things with my development environments, and I'm just here to help you avoid repeating some of those mistakes.
Shannon, do you want to introduce yourself before we jump in? Sure. I'm Shannon Pileggi. I work at the Prostate Cancer Clinical Trials Consortium. David and I have co-taught a number of workshops under the umbrella of What They Forgot to Teach You About R. And I'm here to be his sidekick today; I'll help manage chat and questions and communication as we go along.
Course overview and project-oriented workflow
Cool. Thanks, Shannon. So let's jump in. This is Personal R Administration, and I think the goal here is right in the title. Often you'll say, you know, "I can get this to work on my machine." And what we're hoping to do is get you to a place where you can just get to "it works," right? That you're confident doing things across different computers.
So just to provide some context on this course: I'm going to drop the link for the slides in the chat. There are some portions that will be interactive, so you can follow along at home. And if you go to rstats.wtf, that's a book-shaped website that has some of the information we'll be talking about today, and the repository where most of the other materials for WTF live is at github.com/rstats-wtf.
Just to give you a little bit of context about me: when I started using R professionally, roughly 10 years ago, I had to use it on a bunch of different computational environments. I had some laptops, I had some other laptops, I had some instances of Workbench, I had some instances of Shiny Server. And so I got into the habit of having to take the work I was doing and spread it across a bunch of different machines. A little bit later, I was formerly a data scientist at an advertising agency, and again, just kind of jumping all over different laptop environments, some on Windows, a couple on Linux. And now that I work at Posit: I was a solutions engineer for several years, doing a lot of demos, showing people how to do different things. I recently moved to software engineering, but a lot of that still entails the same thing: setting up demos, reproducing problems, and doing that on a collection of really different environments.
And the reason I tell you all this is so that you understand that a lot of what I will be recommending is shaped by the experience of like how to get these things safely and reproducibly from one set of machines to another. So that's just some context about me.
And then our objectives for today, we're going to hope to try to answer some of these questions. How do I upgrade the version of R I'm using for this project? How do I track the package versions I'm using for this project? How do I move this project from one machine to another? How do I use credentials without exposing them? You'll notice that there's a refrain of for this project across all these things. And so a lot of what we're assuming is that you have decided to onboard a project based workflow.
Normally, we spend a lot more time talking about that. But today I'll give you sort of the briefest possible version of some of the things that it will be helpful to have be true when you decide to start working this way, because it's just going to make things a lot easier for you. You'll see this is actually a link to the full set of project oriented workflow slides. If we're doing this as a two day course, we usually spend much longer time on this section. But I'm just going to give you some highlights right now.
So the first thing we're going to talk about is embracing the blank slate. If you have RStudio open, you can copy and run this snippet right now. And the blank slate is going to mean that you set RStudio up so that every time you start working, you're working in a fresh session where you're not carrying around stuff that you computed two weeks ago. So this means disabling the option that will cause you to save your RStudio workspace. RStudio won't ask you to save it when you quit. It won't load it when you start. This last line is just like a sort of file hygiene thing. But embracing the blank slate, this is just going to make your life a lot easier as you prepare to start moving between different projects.
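The snippet referred to here isn't captured in the transcript. A minimal sketch of an equivalent setup, assuming you use the usethis package (use_blank_slate() is its helper for exactly these RStudio preferences):

```r
# Tell RStudio to start every session with a blank slate:
# never save the workspace to .RData on exit, never restore it on startup.
# scope = "user" applies this to all of your RStudio sessions;
# scope = "project" applies it to the current project only.
usethis::use_blank_slate(scope = "user")
```

You can also set the same preferences by hand in Tools > Global Options > General, which is what this function changes for you.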
In combination with that, one of the things you want to be prepared to do is to restart your R session often. If you have objects that you think are valuable, there are ways to save them. You can save them as RDS files. You can save them as a QS file. There's a bunch of formats you can use if you need to serialize something in particular about an R object. But most of the time, what you really want to make sure you've saved is the source for how you created that R object from your raw data, which you did not modify. And so being in the habit of restarting your R session is going to help you make sure that you're not carrying around stuff that you calculated weeks ago. Because that can often be the source of really confusing and difficult to debug problems.
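As a concrete example of saving a valuable object explicitly rather than relying on a saved workspace (the file path here is illustrative; tempdir() keeps the sketch self-contained):

```r
# Fit a model, persist just that one object, and read it back.
fit <- lm(mpg ~ wt, data = mtcars)

path <- file.path(tempdir(), "fit.rds")
saveRDS(fit, path)        # serialize a single R object to disk
fit2 <- readRDS(path)     # restore it later, under any name you like

identical(coef(fit), coef(fit2))   # TRUE: the object round-trips exactly
```

The point is that the .rds file plus the source code that produced the object together make the workspace itself disposable.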
And so these are the shortcuts for restarting an RStudio session, or a Positron session if you're in one. And if you have it open, you should run it right now.
And then the last thing here: there are a couple of these. These two pieces of code, rm(list = ls()) and setwd(), are often signifiers that something about your project workflow might not be quite complete. This is a link to the tweet that people often cite, where Jenny Bryan, the original author of this course, has threatened to set your computer on fire if you do one of these two things. And the reason we don't want them is that if you're trying to work in a reproducible way, where you're not carrying things with you between sessions, then if you're running rm(list = ls()) at the beginning of a script, it's because you're hoping that it resets you to a clean state, and it doesn't. Right? There are things you can modify about a session, like your global options or environment variables that you set as you're computing, and this isn't going to reset those. And similarly, if you are changing your working directory with setwd() because you're trying to access files that are in different places, that computation is much less likely to be successful if you're trying to take it with you across machines.
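A quick demonstration of why rm(list = ls()) is not a reset (the option value and environment variable name below are made up for illustration):

```r
# Session state that rm(list = ls()) does NOT touch:
options(digits = 3)              # a global option
Sys.setenv(DEMO_FLAG = "on")     # an environment variable (illustrative name)
x <- 42                          # an object in the global environment

rm(list = ls())                  # clears the global environment...

exists("x")                      # FALSE: the object is gone
getOption("digits")              # still 3, not the default 7
Sys.getenv("DEMO_FLAG")          # still "on"
```

Only restarting R gives you a genuinely fresh session.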
So if you have either of these things in your code, again, it's an opportunity to think about adopting that blank slate and thinking about changing that workflow. One of the things that we'll cite that's particularly helpful here is the here package, which gives you a way of referencing paths relative to a project's root directory, whatever that is. That might be a project where you have the RStudio project file. That might be something where there's a git directory. But using this to construct paths in your project means that you can get RStudio or R to reference files in the correct place without needing to change your working directory.
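A sketch of the here package in use, assuming a project that contains a data/ directory (the file name is illustrative):

```r
library(here)  # finds the project root (.Rproj file, .git directory, etc.)

# Build a path relative to the project root, wherever the project lives:
here("data", "raw.csv")

# The same call works from a script in the root or in a subdirectory,
# on any machine the project is copied to, with no setwd() required:
raw <- read.csv(here("data", "raw.csv"))
```

Because the path is computed at run time from the project root, the code moves between machines without edits.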
The project onion
All right, so I want to talk about The Onion. Not the satirical publication, though I'm a fan. But this thing that I've been calling the Project Onion. And this is going to be kind of our way of orienting the different conceptual things that we're trying to manage as we work on an R project. A dirty secret is that at some point I will try to convince you that this is actually true of all software projects. And if you want to ask me later about why I hate Conda, my explanation will reference this.
So what we're going to do is we're going to start by talking about first things you can change to exercise control over individual R sessions. Then we're going to expand to think a little bit about managing your package environment. And those are the two things that I think are the most common sources of frustration as you start thinking about trying to administer your own or other people's R projects. And then as we progress, we'll also kind of expand to think about what it might look like to manage the version you're using, the version of R you're using, as part of a project. That might mean adopting some software that helps you manage those versions explicitly. And then the sort of last piece, which we may or may not get to today, is talking about how you think about managing all the software on your computer, which makes it easy for you to take this workflow and reproduce it across machines, new computers, other things.
So we'll start. This is kind of the roadmap for what we'll be talking about over the course of the day. We're going to start with a little warm up exercise, just to get people, you know, computing, actually sort of doing things and set some projects up so that we have something to work with a little bit later. And to do that, we're going to start by working on our project libraries, right?
Exploring package libraries
So, the library, the package library. And I'm being a little sloppy about the difference between packages and libraries; for this purpose, it's not that important. So you have an R package, right? We think about those a lot. That's how you distribute R code: you have some collection of functions that you want other people to run. And a collection of packages is stored in a library. If you install a version of base R in any of the typical fashions, you're going to get this set of 29 packages: 14 that are base, 15 recommended. Unless you compile R yourself, that is, in which case you might not get the recommended ones. And if you're doing that and you're in this class, I'm confused, but that's fine.
If you were, for example, going to try to draw graphics, but you wanted to do so in a base-only fashion, you could use the lattice package, right? It comes with your R installation: you could install vanilla R from CRAN, type library(lattice), and you should expect that to work. So the packages get stored somewhere on your computer in the default library. If you enter the .Library object in the console, it will tell you where that default library is stored. If you run the .libPaths() function, it will show you all the libraries that are available to your session. And installed.packages() will print all the packages you have installed that are accessible to your current version of R.
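In concrete terms, the three commands just described:

```r
# The default library that shipped with this R installation:
.Library

# All libraries R will search for this session, in order:
.libPaths()

# Every package visible to the current session, as a character matrix:
pkgs <- installed.packages()
nrow(pkgs)            # how many packages you have
head(rownames(pkgs))  # a few of their names
```

The first path in .libPaths() is where install.packages() will put new packages by default.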
And so what we're going to do is take a little time for you to explore your package library. If you run usethis::use_course() (I eliminated the package prefix on the slide there; that's a fail), this is going to download this rstats-wtf explore-libraries project to wherever it is you typically keep projects on your computer. You don't have to supply the destination directory argument, but when I run this, I'm going to put it in a particular place on my computer, because I have feelings about that. And if the project doesn't activate automatically when you download it, you can get the path that the project was installed to and then open that up.
So let me show what that's going to look like, and then I'll give you a little bit of time to do it. So I'm going to copy this, and I'm going to jump over to RStudio.
All right. You know, I'm seeing some counts. 528, Kaylee? I think you win. Yeah, I have 186, and I was like, oh, that's too much. But I've been doing some package development. Anyway.
So let's step through it. And, you know, just to revisit some of the previous advice: one thing I'll do right away is restart the R session. You'll notice that the object I had in my global environment is gone. But because I have the code for computing it, I don't need to save that particular object. I can just create it again.
Okay. So I'm going to load the tidyverse and fs packages. And the first thing we're going to look at is which paths my libraries are in. You can see I'm working on a Mac today, and so I've got these two library paths; we're going to dig into the meaning of this later. And the default library here is under /Library/Frameworks, et cetera.
If we run installed.packages() on its own, we see it returns this kind of matrix situation, which we're going to jam into a tibble. And then if you compute the number of packages there, you can see I have 186. You can see which of those are the base and recommended packages: we have those 29 packages I was talking about earlier, and you'll notice that these are in this /Library path, so these are things that came with the version of R I installed. And then you'll see that these other 157 packages are installed under my user directory. We'll call the first of these the system library, and the second the user library. And again, we're going to get into more detail about that a little bit later.
What proportion of these packages need compilation? It's about half and half: about half of them don't, and about half of them do. And again, we'll talk about exactly what that means in a couple of sections. And then, what version of R were they built on? In this case, R 4.5 is the most recent version of R, so all of my packages were built for this version. If I were to switch to an older R version that I have on the machine, that might look a little bit different, depending on when exactly I installed the packages.
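The exploration above can be reproduced without the tidyverse; a base-R sketch:

```r
# installed.packages() returns a matrix; a data frame is easier to query.
pkgs <- as.data.frame(installed.packages(), stringsAsFactors = FALSE)

# Base vs recommended vs user-installed (user-installed show up as NA):
table(pkgs$Priority, useNA = "ifany")

# Roughly what share needed compilation from C/C++/Fortran sources:
table(pkgs$NeedsCompilation)

# Which R version each package was built under:
table(pkgs$Built)

# Which library each package lives in (system vs user):
table(pkgs$LibPath)
```

The Priority, NeedsCompilation, Built, and LibPath columns are exactly the fields the walkthrough is reading off the screen.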
Cool. So we've splashed around the library a little bit. We're going to keep this project open; there are some things we're going to do in this context. But let's jump over to the first section of the onion. I guess it's the innermost section. You have to fully dissect the onion to access the innermost section. Anyway. So we'll talk about some things related to starting R. The first thing we're going to talk about is taking some control over what startup looks like, to change how the session behaves. And the reason we're going to investigate this is that it gives us the opportunity to change things about what our code does depending on the context it runs in. And you can do some of those things without having to actually change the code.
So when you start R, a bunch of stuff happens, and we are not going to talk about most of it. But there are a lot of opportunities for you to change the way R behaves as it starts. Sometimes, if you're working on a system that's administered by someone else, some of these things may be changed before you can even start, so that you end up with access to the right data sets; or, if you're working in an environment that's restricted in some way, your administrator may be using some of these mechanisms.
The things we're going to focus on are in these two highlighted sections here. So we're just thinking about setting some environment variables and running some startup scripts; we're really just looking at a small subset of what you can do to influence the behavior of R as it starts. And as you expand your focus, you'll see other contexts in which these matter. For example, if you end up working with something like GitHub Actions, you might notice that some of these flags are getting set as you're trying to run things in a reproducible way in some other context.
The .Renviron file
So we'll start by talking about the .Renviron file. It enables you to set what are called environment variables. Environment variables are key-value pairs that let you change the way processes behave on your computer. They just store some information about the state of a machine, and then you can get the value for a particular key and use it to change how a program executes.
So, if you're going to create a .Renviron file, there are some things you should put in there, right? R-specific environment variables, say, if you want to change the number of lines you retain in history. Or API keys and other secrets: things that you want to be available to your code, but that you don't necessarily want to put in your code. The .Renviron file is where those should go. Importantly, what you should not put in the .Renviron file is R code. R code doesn't get evaluated in those files; it's really just those key-value pairs.
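For example, a .Renviron file might look like the commented lines below (the variable name MY_API_KEY is illustrative, not any real service's), and code reads it back with Sys.getenv():

```r
# Contents of ~/.Renviron -- KEY=VALUE lines only, no R code:
#
#   R_HISTSIZE=10000
#   MY_API_KEY=not-a-real-key
#
# After restarting R, retrieve values with Sys.getenv(), so the
# secret itself never appears in a script:
key <- Sys.getenv("MY_API_KEY", unset = NA)
if (is.na(key)) message("MY_API_KEY is not set in this session")
```

Keep the user-level .Renviron out of version control; a project-level .Renviron containing secrets should go in .gitignore.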
So, to edit a .Renviron file: if you know where they are on disk, you can open them in whatever way you choose. There's also a usethis function, usethis::edit_r_environ(), which will enable you to edit them. It has a scope argument, and you can provide either the user scope or the project scope; depending on which of those you supply, you'll either get the .Renviron file that is in your user directory or the one in your project. And just a note on this, if it's unfamiliar: this tilde, or twiddle, is a shorthand for your personal home directory, which has a slightly different meaning across different operating systems.

Not every repository serves compiled packages, which might mean that you have to compile the packages from source yourself. Or you can use Posit Public Package Manager. That is a lot to say, so I will probably refer to it as P3M going forward. And P3M has binaries available for Windows, macOS, and Linux; for macOS, that's both ARM and Intel. So it's just easier: you can get whatever kind of binary packages you need by always using Public Package Manager. If you want to know whether a binary is available for the specific package you're trying to install, there are a couple of places to get that information. If you're looking at a standard mirror of CRAN (I pulled this screenshot from cran.r-project.org), what you'll see is that we have the package source, which is just all the files that comprise the package, tarred up and then compressed; that's what the gz means. And then we also have Windows and macOS binaries. So you can see there are Windows binaries, as of this screenshot, for r-devel, which is the version of R that is the next one to be released. But then you can see, for example, that there are no binaries available for the currently released version of R, or for the next oldest version of R. And this is as of this screenshot.
This information changes over time, so if you're trying to understand whether a binary is available from CRAN, this is where you need to look. And then you can see the same thing on the macOS binary side: there are binaries available for the current release and for the previous release, and those are available for ARM, but they're not available for x86. So that's all the information being presented to you in this box here. Something that is sometimes true is that if CRAN does have a binary package, it may not be the latest version of the package that's been released on CRAN. So in this screenshot, you can see the parallelly package. The source is at version 1.32.1, but some of the binaries are a bit older than that; the Windows binary for the released version of R, in this case, is slightly behind. And so if you see this note as you're installing a package from a standard CRAN repository, that's what it's asking you about: you'll often have the option of either getting a slightly older version of the package, which is compiled already, or getting the newest version and then compiling it yourself. And again, that's what the message here is explaining: there's a binary version available, but the source version is later. CRAN has already compiled a binary of an older version, but if you want the newest, you might need to build it yourself. If you're looking at Posit Public Package Manager (and here I'll just pop open this link so we can take a look): if I go to a package like dplyr and scroll down a little, these things are links, and so if I'm looking for a Windows binary for R 4.4, this is going to tell me that a binary package is available. And I can see that for any combination of operating system and R version.
The other thing you'll see is that the information you need here, the package version, the R version, and the architecture, are all exposed as things you can ask the server about. Those things are links, so you can click on them to figure out whether there's a binary version of the package available. And then once you're at the point of installing the package, there will be some information in the log that tells you whether you obtained a binary package. You can see, for example, that if you're downloading something from a standard CRAN mirror on Windows, you'll get this message about it. Binary packages for Windows are also served as zip files, so looking at the content-type header is something else that will tell you whether you got a binary package. On macOS, again, the message will say "downloaded binary packages," but you can also look at the extension: typically, binary packages for macOS have this .tgz extension. If you're downloading things from Package Manager, then the header is set a little differently, but your R installation will still understand that you're getting a binary package here. And if you install packages using renv, which we'll talk about in more detail later, renv will just explicitly tell you whether you installed a binary package or a source package.

Is going to the bleeding edge going to be wonky?

So when you say bleeding edge, do you mean released versions on CRAN, or do you mean development versions from GitHub or something?

Yeah, I mean on CRAN, not necessarily getting from GitHub or some other source. But when I see "do you want to install from source," sometimes I think, oh yeah, give me the latest. And sometimes I think, oh, do I want the latest, or do I want to see how the latest is received by the community? So I wonder, is there a cognate to the long-term-support versus generally-available model, or is it just package by package?

Yeah, so for R packages, as far as I know, no.
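One way to put this into practice, as a sketch (the repository URL is P3M's documented CRAN mirror at the time of writing; check packagemanager.posit.co for the current one):

```r
# Point install.packages() at Posit Public Package Manager (P3M),
# which serves pre-built binaries for Windows, macOS, and many Linux distros.
options(repos = c(CRAN = "https://packagemanager.posit.co/cran/latest"))

# type = "binary" insists on a binary and fails rather than compiling;
# the default, type = "both", prefers a binary but can fall back to source
# when the binary lags behind the released version.
install.packages("dplyr", type = "binary")
```

Putting the options() line in your .Rprofile makes P3M the default for every session.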
So, I'm going to attempt to give a description of CRAN's behavior here, which is that the packages on CRAN all have to work with each other. I think how CRAN builds binaries is not totally transparent; some of the stuff about needing to build it from source yourself, I don't know. In general, I'd say for R packages it's generally safe to take the latest version. It's guaranteed to work with the latest version of everything else on CRAN. That might not be true for what's in your environment currently, but it will work with everything else on CRAN. And so I usually don't worry about trying to do a lot of compatibility solving; I just take the latest of everything. The other thing to note in this context is that Package Manager, to the best of its ability, serves binary packages for as many things as are on CRAN, relatively soon after they're released. So if you want a higher chance of obtaining a binary package, you can just try getting it from Package Manager, because that will often be a little bit more straightforward.

So, using the Package Manager interface, let's take a couple of seconds and try to figure out whether Package Manager serves a binary of the RPostgreSQL package. I'm going to drop a link to Package Manager in the chat, and let's try to find out whether it serves a binary for this package. We'll just give that two minutes.

All right. So let's jump over to Package Manager, Posit Package Manager, I should say. If we look up the RPostgreSQL package, you'll notice there's also an RPostgres package; this will be an important detail in a moment. So if we look at the RPostgreSQL package, and then we try to find a binary package for R 4.3... I didn't specify a distribution, but that's fine, because it doesn't serve a binary package for any of them.
So if we change the version here... oh no, so it does serve it for Windows, but perhaps not for Red Hat 9. Sometimes a binary is available for your distribution and version, and sometimes it isn't. Do I know why? I do not. It is for 4.3, but not for 4.4; that's surprising. That said, and this will be a general piece of advice: the RPostgres package does a lot of the same things, and is a little bit less painful to work with. And that one does serve binaries that are easier to get. So sometimes you may want to look to see whether a similar package that provides the functionality you're looking for, and for which a binary is available, might be an easier thing to switch to. Particularly if you are, for example, using Excel packages that depend on Java: don't do it. But no, Konstantin, I actually don't know. I will ask the team about that, because right now we don't really surface information about why a particular binary build might not succeed, so it's a little bit difficult to discover. But yeah, I will ask. And if you find yourself in the situation where you have a dependency you need and Package Manager isn't serving a binary, if you post an issue on forum.posit.co under the Package Manager category, someone from the team will try to respond to you. Like I said, I'll get back to you if we can figure out a reason why that build might not have succeeded, but that's where to go to get the information if for some reason you can't find it. I will say, a lot of the time it's just that the different operating systems are painful and complicated. And the one thing I'll talk about at the end of this section is an approach we're taking that we hope solves some of these problems. Right, so we've talked a bunch about binary packages. If you can obtain binary packages, that's because somebody else compiled them.
If you obtain the sources of a package, then you have to compile them in your environment. For packages that are just R code, R is all you need. If your package has dependencies on other compiled languages (these are some of the examples; there are a couple of others on CRAN), then you're going to need some extra tools at your disposal to be able to compile those packages in your environment. If you don't have those tools available (and I've set up a couple of environments that deliberately don't), this is the kind of thing you'll see. What exactly you'll see will vary with the way in which those tools are missing. For example, if this is a version of R running on Windows where I don't have the make command, and I try to install a version of dplyr, it's going to fail because it needs to compile some C code and make is not available. And the same thing is going to be true if I'm missing this compiler, which R is looking for in a specific place. To get the tools you need to compile packages from source: if you're on Windows, you want Rtools. Rtools installs a collection of things that R can use to compile packages, in a known location, and that makes the process go a lot more simply. If you're on macOS, what you'll need is Xcode; installing the Xcode command line tools is the easiest way to get them. And if you're on Linux, you'll need to install the tools the way you install typical system packages. If you run devtools::has_devel(), you'll get a message about whether your system has all the things it needs to compile packages; if it does, this is what you'll see, and if it doesn't, you'll get instructions on how to install the tools we were just talking about. The other important piece of installing R packages, particularly on Linux, though you can also encounter this problem on other operating systems, is what we call system dependencies.
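The toolchain check just mentioned, plus a lighter-weight alternative from the pkgbuild package (both assume the respective package is installed):

```r
# Reports whether the build toolchain (Rtools on Windows, Xcode command
# line tools on macOS, compilers on Linux) is available, and errors with
# installation instructions if it is not:
devtools::has_devel()

# A similar check without the full devtools dependency;
# debug = TRUE prints details of what was found or not found:
pkgbuild::has_build_tools(debug = TRUE)
```

Running one of these before a long source installation can save a confusing mid-compile failure.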
So sometimes your R code depends on, say, C code that's written by the developer of the R package. That's true of dplyr, for example: dplyr has a collection of C code that comes with the source of the dplyr package. In other cases, there are Linux system dependencies that expose the same functionality, at a lower level, to a number of different programs. If you've spent any time trying to install packages at all, you might be most familiar with this from geospatial packages. System dependencies like GDAL or udunits are things that R will look for; they're low-level Linux libraries that make some of that geospatial functionality available across programs. And if you don't have those system dependencies available, the package is going to fail with messages that look like this, where it's checking for some other thing, and you need to take some action to install it, right? So the help you need is often going to be in the error message for how the package fails to compile, but this is the kind of thing that will let you know you're missing a system dependency. One other important piece of the puzzle: this is infrequently true, but when it's true, it's quite painful. Sometimes, and I'm picking on the geospatial libraries in particular here, because this is where the problem most often presents, in addition to being required to compile the package, you need to have these system dependencies available when you run the package as well. So if I call library(sf) in an environment that has those relevant system dependencies at runtime, you'll see a status message (not an error, just a message it emits when you start) saying that you're linking to these things, which are on your system. If you don't have those available at runtime, you can still install the package successfully.
But then when you try to load the package, you'll see an error message like this. And again, that's an indication that you're missing some important dependency that you need to run the package, not just get it installed. If you want to figure out which system dependencies you need to ask your administrator to install, go to Package Manager and type in the name of the package: there's a system requirements section that spells out the commands you need to ask your administrator to run so you can get those installed. One of the reasons these are called system dependencies is that they have to be installed system-wide; they're not something you can typically install yourself. However, in the last couple of days (this is very new, and still kind of in preview), there's something that might solve a lot of problems for people who work in environments where getting an administrator in the loop is very slow. We're calling these manylinux binaries. There's a parallel Python project that solves this problem, and the things that project produces are called wheels. The product manager and I are in a dispute: I think we should just call these reels. He is not with me, but I'm doing it anyway. This is a link to the blog post where we talk about the strategy we're taking to make it much easier to install packages, even on Linux, without administrative permission, and to get all the system dependencies you need at the time you need them. So, without getting into too much detail here: this is a container that has no system dependencies installed in it, but if I use this manylinux repository on Package Manager, then I can actually get those geospatial packages installed and running without any issues. Or, not without any issues, but I can get them installed and running. So, this is currently in preview.
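If you'd rather query system requirements programmatically than browse the Package Manager website, one option (an assumption on my part; the speaker only mentions the website) is the pak package:

```r
# Ask which system libraries a package needs on the current platform.
# Assumes pak is installed; if not: install.packages("pak")
pak::pkg_sysreqs("sf")
# On a Debian/Ubuntu system this lists the apt commands for libraries
# such as GDAL, GEOS, and PROJ that an administrator would need to run.
```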
If you have this kind of problem, where you're working in an environment where you often find yourself going back and forth with an administrator trying to get things installed, check this out and give us feedback. It's available on public Package Manager, so you can use it. I'm very excited about this as a way of solving the system dependency problem. Okay. So, in order to practice installing packages for ourselves, what we're going to do is install a package from R-universe. R-universe is another place you can get packages; those packages are often things that don't necessarily make sense to put on CRAN, but it's just another way of distributing R packages. So what we're going to do is update our .Rprofile and install a package that's not on CRAN. In the project we have open, first try to install this gitcellar package, then add this to your project .Rprofile so that you end up with the rOpenSci R-universe in your repositories, then restart R and try to install the package again. And we'll give people five minutes to do that. Can I make a suggestion, David? How about we extend this to ten minutes and let people take a stretch break and do the exercise, since we're about halfway through our time. Thank you, Shannon. Excellent idea. We'll give it ten minutes. And on behalf of Big Water, also drink water. Have some water, y'all. We'll see you in ten minutes. Okay, so we're going to walk through installing this package from R-universe. The first thing I'll try to do is install the package. I stretched, I drank some water. So I'm going to attempt to install this package, and you see this failure, right? gitcellar is not available for this version of R. So what we need to do is find a way to let R know about this additional repository we want to install the package from.
So in this case, I'm just going to grab this setting where I set my repositories to include the rOpenSci R-universe repository in addition to the public Package Manager mirror. I accidentally pasted this into the console. If I run it there, it will also work, but then it won't persist if I restart my session, right? So if I run that, restart, and try again, I'm still going to see the same message. If I want something like that to persist, a good thing to do is to put it in a place where it will be available to each R session I start. And now if I try to install the package, you'll see I get the package from R-universe. And you'll notice (this is a question here) that the package I downloaded is a binary package as well. If you publish things through R-universe, those do get compiled, so you have an alternative way of distributing binary packages. If you're in an organizational context where you have something that's very complicated to build, and you want to set that up once and distribute it to people publicly, then R-universe can be a good option for that, in addition to the standard distribution channels. All right. So, yes, we installed gitcellar from binary, and we know that in this case because the message says that the downloaded package is a binary. When you're installing things from Package Manager, they're all going to say this, but sometimes there are also hints about whether you've got a binary in the headers that come back from the CRAN server. So if you can get binaries, you should, because it makes your life easier: things go a little faster, and it's a little less work. Before you move on, David, do you want to make any recommendations? In that exercise, you set the options at the project level. Do you want to make any recommendations for what people can do at their user level just to streamline installation?
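The setting from that exercise looks roughly like this in the project's .Rprofile (the repository names in the vector are my own illustrative choices):

```r
# Project .Rprofile: make the rOpenSci R-universe available alongside
# a CRAN mirror (here, Posit Public Package Manager).
options(repos = c(
  ropensci = "https://ropensci.r-universe.dev",
  P3M      = "https://packagemanager.posit.co/cran/latest"
))

# After restarting R, install.packages() can now resolve packages
# published only on the rOpenSci R-universe.
```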
I would say it depends a little bit on how you decide to make your environments reproducible. A thing I would probably do, and actually I can't explain why I haven't done it on this machine, so we're learning in real time, is to use the .Rprofile. I'll edit it with usethis::edit_r_profile() in the user scope, and I would set my default repository to Package Manager all the time. That's because, most of the time, it will have binaries for whatever distribution of R you're working on. If you're doing this on Linux, then it's a bit more important that you find the URL that matches the Linux distribution you're using. Once you start isolating project environments, the way in which you record that information starts to look a little different, and we'll talk about that in the next step. Okay, so reproducible environments: we're going to be thinking about recording the package environment, and we're also going to start capturing some other information about the version of the language we're using; we'll talk a little bit about what that means. The thing I want to help you think about here is this map of reproducibility strategies. On the x-axis we have who's responsible, and on the y-axis we have how open the environment is. Most of the strategies we're discussing for this class focus on this top-right area, where you're in control of where you get your packages and the environment in which you're obtaining them is relatively permissive.
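At the user level, that recommendation amounts to something like this (the Linux URL is an illustrative Ubuntu 22.04 example; substitute your own distribution):

```r
# ~/.Rprofile, user scope. usethis::edit_r_profile() opens this file.
# Default to Posit Public Package Manager so installs prefer binaries.
options(repos = c(P3M = "https://packagemanager.posit.co/cran/latest"))

# On Linux, point at the URL built for your distribution instead, e.g.:
# options(repos = c(
#   P3M = "https://packagemanager.posit.co/cran/__linux__/jammy/latest"
# ))
```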
We try to discourage people from, say, moving into a quadrant where they're in control but the environment is locked down. If your environment has restricted connectivity, for example, you don't want to end up in a situation where you try to install something and it just fails with a networking error and there's nothing you can do, because that's going to put you in a miserable situation. Down here on the validated side, when a lot of administrative control is exercised over the environment, you want to have a collaboration with the people who are managing that environment to make sure they understand your needs. The shared baseline is a good jumping-off point for a lot of organizations: you can have a base set of packages installed, and then people who know what they're doing can push themselves up and to the right. So, like I said, we'll mostly be talking about the snapshot strategy, but know that there are other ways of thinking about managing this problem, and understanding what you're doing up here also helps you talk about what happens down here, if that's what needs to happen. The two tools we're going to focus on for constructing reproducible environments are public Package Manager and the renv package, both of which give you a little more control over what your package environment looks like, and where you're getting things from, than you might have in the standard workflow. We'll start with public Package Manager. Something you'll have noticed when we set the Package Manager address is that the URL contains this "latest", and latest just tracks the current state of CRAN. It lags by about a day: typically, within a day, public Package Manager will reflect the set of packages available on CRAN.
So if you just need something that behaves like a standard CRAN mirror, grabbing things from latest is ideal. If instead you want something from a little further back in time, you can use what Package Manager calls a date-based snapshot. I'm just going to jump over to Package Manager and into the setup tab. I'm working on macOS today, and right now you see that this URL points to latest: it's cran, latest. If I want a repository that behaves the way CRAN behaved in July of 2022, I can use a URL that has a date string in it instead. That means that if I supply this as the repository to my install.packages() command, or to other things, then when I install packages I will get them the way they looked on CRAN as of July 1st, 2022. If you're trying to bring an old project that you haven't worked on back to life, and it's too difficult to figure out exactly which set of packages you need, using the date-based snapshot workflow can be pretty nice. It's also really helpful if you're trying to reproduce specific package installation problems. So what we're going to do, in the project, is check what our current version of dplyr is. If you don't have dplyr installed, you can do this with some other package; jsonlite is another one that's relatively easy to install. Check what the current version is, then put a date-based snapshot into that .Rprofile, restart R, install that package again, and see what version you get. All right. Cool. So let's see what we got. I'm going to grab this, open RStudio, and check the package version I have, which is 1.1.4. Okay, and then I'm going to set my package repository to December of 2022.
The repos option here is a named vector, and it's probably helpful to keep the name consistent, which I'm not doing right now: I'm not going to change this to RSPM, I'm just going to call it P3M here. So I'm going to restart. I'm going to install dplyr. Did I? Then I'm going to run library(dplyr), which loads, and that's fine. Now I'm going to try something. There we go. Yes. Hey, that's new. I'm going to try that one more time. First I'm going to confirm that my repos are set correctly. They are not, because I'm not in a project. We're doing it live, folks. Okay. And if I check my repos again, I'm still not in a project. Let's edit the .Rprofile. Oh, you know what it is? I think the project .Rprofile exists and I didn't set it in there. Hey, you know when you describe the short-circuiting behavior of .Rprofile files and then fail to understand that it has implications for the demo you're doing? That's me. I know. Okay, so we'll restart again, and now that my repository is correctly set to the previous state, I should get an older version of dplyr, which I might have to compile from source. So you can see, like many other people here, I got version 1.0.10, and since that version is much older than the currently released version, and I'm running this on R 4.5, there was no binary and I had to compile it in my environment. But that results in me having a version of dplyr that is now older than the one I had installed previously. Any questions about what went wrong there? Does everyone understand what I missed? Okay, great. So, in order to back ourselves out of this situation: since I put that configuration into my project .Rprofile, I'm going to take it out now. And if I restart, you'll see that, oh, it's options(repos = ...), but it's been reset to the standard. I was using 1.1.4 before, so I can grab this, update it to 1.1.4, and install. Wow, typing is amazing.
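Put together, the round trip from the demo looks roughly like this (the snapshot date and version numbers are the ones from the demo; yours may differ):

```r
# Check the currently installed version.
packageVersion("dplyr")          # e.g. 1.1.4

# In the project .Rprofile: pin the repository to a date-based snapshot.
options(repos = c(
  P3M = "https://packagemanager.posit.co/cran/2022-12-01"
))

# After restarting R, installing fetches the package as CRAN looked on
# that date (possibly compiling from source on a newer R):
install.packages("dplyr")
packageVersion("dplyr")          # e.g. 1.0.10

# To undo: remove the options() line from the .Rprofile, restart,
# and reinstall to get back to the current release.
```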
And so you can see I'm going to reinstall my packages, and now I'll end up back at the version of dplyr I was on before. Martine, thank you. Yes, typos on the slide. You can tell these slides were written by a human being. So this process should feel a little janky to you: installing a particular version, not liking the results, and going back to a different version by calling this. If you imagine that there's a better way to do that, there is. It's renv, and that's what we'll be talking about next: taking some of the things we've got and managing our project environments. Shannon? Yeah, just before we move on, I want to be the R mom here and say: if you installed the older version of dplyr, make sure you go back and install the newer version again, otherwise you're going to try to execute some code, pull your hair out, and wonder why it doesn't work. So just make sure you reinstall that newer version of dplyr, because the old one is in your library now. Hey Shannon, what's a library? We're going to get into it. Okay, so the way things work if you don't do anything, if you just open up R and start working, is that you might have project one and project two and project three (with a little bit of Mermaid clipping the numbers on this diagram, which I'll ignore), but all those projects depend on a shared user library. So when you update something because you're working on a particular project, you're actually updating this shared user library. This is where we're going to talk about .libPaths(). So if I run this .libPaths() function, which I ran before, you can see I have these two paths. One of them is under /Users/me: this is my user library. And then there's /Library/Frameworks/R.framework: this is the system library.
Right, the base packages and the recommended packages are the only things installed in the system library for me. All the other packages I install get installed into my user library, and this means that for any project I'm working on where I'm using R 4.5, if I don't do anything else, I'm getting all those packages from here, in this user path. The path looks slightly different depending on what operating system you're running: when I wrote this example I wrote it on Windows, so it's under the C drive, in the user path, my name, and then, you know, Windows paths, but this is my user library on Windows. And similarly there's a system installation; it looks a little different on Linux, but the same idea shows up in basically every operating system. What renv enables you to do is take each of your projects and operate them with a library that is isolated from all the other libraries: project one has its own library, project two has its own library, and so on for project three. Now, that doesn't necessarily mean they're taking up three times the space on your computer, because what you're going to do is maintain a global cache, and as you need things that are in the cache, you link them into each of these project libraries. But from the perspective of your project, what it will look like when you run .libPaths() (and I'll show you what it looks like in the session) is that the project has its own library within the project directory, and it also makes reference to a cache directory that has all the cached packages for the project.
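For concreteness, here's roughly what that looks like in a plain session; the exact paths are illustrative and vary by operating system and R version:

```r
# Where R looks for installed packages, in search order.
.libPaths()
# e.g. on macOS:
#   [1] "/Users/me/Library/R/arm64/4.5/library"              (user library)
#   [2] "/Library/Frameworks/R.framework/Resources/library"  (system library)

# Which library a given installed package actually lives in:
find.package("dplyr")
```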
So if I install the renv package here and then initialize it... actually, before I initialize it, I'm going to call .libPaths(), so we see these are the packages I'm working with. Then when I initialize (that's new), you'll see that my R session got restarted and a bunch of things got created, and we'll talk about those in a second. But if I run .libPaths() now, we're looking at the project directory: me, Documents, projects, the project we created, renv/library. So in this directory there's an renv directory, and you can see there's also a library cache directory where the other packages are getting cached. I've gone from using that user library to instead having a project-specific library. Some of the advantages of doing this: it makes it much easier for you to do things like what we just did, installing a package, without worrying about doing things to your other projects. renv also has some machinery for making it easy to write down the set of packages you're using and share that with someone else, and they can use that to build an environment that looks exactly like yours. And the caching is nice because if you've already installed a package, renv can just link it in from the cache, which speeds up some of the workflow a little bit.
So what we're going to do is go through that same process I just went through. You're going to create a new project, call it wtfrnv, and put it not inside the current project. You're going to install the renv package, then call renv::init() to initialize the project. You may see a message that's different from the one I see if you've never used renv before. Then you're going to call renv::status(), and we'll give people a couple of minutes to do that. If you're in a project right now, like the explore-libraries one, I would recommend exiting out of that project before you execute the create-project step. Okay, so, what Shannon mentioned: a thing you don't want to do is create a project within a project, and there are two ways to avoid that. One is, as in the slide, where I say "wherever you typically put projects": for me, I have a place I typically put projects, so I can say I want to create a project called wtfrnv in my projects directory, which is not below the project I'm currently in. So if I use this create_project() call with a path that's not under my current directory (and I've created this project before, it seems, so I'm going to overwrite it), that will launch RStudio in that session. That's one way to do it. The other option is to go here and close the project, which kicks you out into an RStudio session that's typically in your home directory, and from there you can go through the new-project flow and create a project in a new directory. So I'll create a new project, as a subdirectory of, my goodness, projects, and I'll call it wtfrnv2.
I can choose to create a git repository, and I can also have RStudio initialize renv for me when I create the project; in this case I won't do that. I'll just switch into that project and install the renv package there. Oh, ha ha. Guys, I am committing lots of own goals today: the version of the package I installed in this case was 0.16, which is quite a bit older than the latest renv, because that date-based snapshot is still in my .Rprofile. So let's go edit that .Rprofile and come back to the future. All right, I'm going to restart R and we're going to do it again. Okay, so now that I have the latest version of renv from CRAN, I'm going to initialize the project. One of the things you'll notice is we get some messages here about things that will be updated in the lockfile; the other thing you see we're capturing is the version of R. So now if I open my renv.lock file... we'll talk about the contents of this file in a second, but did everyone get there? Did I drive anyone else into the ditch of installing an old version of renv? Sorry, Martine. Okay, so we're going to talk a little bit about what it looks like to manage dependencies with renv. If you already did the init, it's safe to do again. If you upgrade the version and do it again, renv will ask you whether you want to reinitialize the project and throw away the existing information, and you can say yes. So if you find yourself in a sticky place with renv, reinitializing is a totally valid way to just kind of move forward, if that's safe for you to do.
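The initialization step, and the kind of lockfile it produces, look roughly like this (the version numbers in the excerpt are illustrative):

```r
# Inside the new project:
install.packages("renv")
renv::init()    # creates renv/, a project .Rprofile, and renv.lock,
                # then restarts the session into the project library

renv::status()  # reports whether code, library, and lockfile agree

# renv.lock (excerpt) records the R version and repositories too:
# {
#   "R": {
#     "Version": "4.5.0",
#     "Repositories": [{ "Name": "CRAN", "URL": "..." }]
#   },
#   "Packages": { "renv": { "Package": "renv", "Version": "1.0.7" } }
# }
```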
So what I'm going to do is create a new R script, which I'll just call main.R, and, as described in the activity here, I'm going to write some code that assumes a new dependency for my project, and then run status and snapshot so that I can see what renv is changing. I'll do this and then give you an opportunity to do it. So I write library(parallelly). Did I spell that correctly? That's double l, double l. Okay, so first I'm going to erase this and check on the status: no issues found, the project is in a consistent state. If I add the library call and run status again... I'm going to follow the prompts to install the package. Okay, I don't think this is a renv-backed project yet; this is really confusing. I think the first project I created was not actually in this directory. Okay, so: I say that I want library(parallelly) in this directory, and then I call renv::status(). Then I get some information from renv about the discrepancy between what my code says, what's in my package library, and what's in my lockfile. So there are three places that the information is recorded. I've installed the parallelly package: it's in the library. I'm using it in my code: it appears in my code file. What I haven't done, right now, is record that I'm using the package in my renv.lock file. The lockfile is where we store the information about which packages we're using, and right now, if you look in the lockfile, there's a lot of information there, but the main thing we see is that only one package is actually listed, and it's renv. So what we need to do is get our lockfile to a place where the package is installed, recorded, and used. Right.
And so those are the three states we're trying to harmonize. In this case, what I can do is call snapshot. You can see I'm getting some information here about what change this is going to entail: it's taking the parallelly package and recording the version of that package I'm using in the lockfile. So, do I want to proceed? Yes. There's a message that the lockfile has been updated, and you can see the parallelly package is now also recorded in the lockfile as something I'm using. If I do the same thing with another library, for example jsonlite, you can see the IDE knows that jsonlite is required, because I've asked for it in my code, but it's not installed. If I call status, I'm going to see the same thing: the package is used but not installed. So I'm going to install jsonlite, and in this case I already had jsonlite installed, so you can see that rather than fetching a fresh version from CRAN or Package Manager, renv just links it in from my package cache. Now if I call status, we're in the same situation again: jsonlite is installed and being used, but it's not recorded. To update my lockfile to record that I'm using jsonlite, I call snapshot, and if I look at the lockfile: jsonlite, parallelly, and the renv package. Then the last piece: if I decide to remove a dependency, I now have the situation where parallelly is installed and recorded in my lockfile, but I'm not actually using it in my code. Again, calling snapshot, you can see the operation that's going to be taken: we're going to remove the record that parallelly is a package we're using. And if I go back to the lockfile, you can see it's now just the jsonlite and renv packages. So I hope everyone stepped through that same kind of process, right?
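The cycle from the demo, condensed (the package names are the ones from the exercise):

```r
# 1. Use a package in code: main.R gains the line `library(parallelly)`.

# 2. Install it into the project library:
renv::install("parallelly")

# 3. Compare code, library, and lockfile:
renv::status()    # reports parallelly as installed and used,
                  # but not yet recorded in renv.lock

# 4. Record it in the lockfile:
renv::snapshot()

# Deleting the library() call and snapshotting again removes the
# package's entry from renv.lock.
```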
Add a library() call to a file in your new project, make sure the package is installed, call status to make sure you understand how it's getting recorded, then remove it, call status again, and snapshot as appropriate. I'll give people five minutes to do that. So, yeah, one nice thing about renv is that there are a lot of package installation workflows that it makes a little bit simpler. If I show my instance of RStudio, you can see that when I type install.packages here, it says "renv shims". What this means is that renv has taken over the behavior of install.packages(): if you type install.packages(), you're actually calling renv::install(), which makes it possible to do things like install jsonlite from GitHub by just typing that into install.packages(). That won't work if you're not using a session where renv is active. But it can be a nice convenience if you're trying to, for example, work with the development version of a package, or install a specific version, or install a package from a specific commit hash, as your workflows become more complicated. It handles all of these different workflows for you. These are all also things you could do with the devtools package, but if you're already used to typing install.packages(), it's convenient to do it that way. The other thing about renv::install() is that it works even if you're not in an active renv session: if renv is not managing your session, you can still use it to install packages. So again, if you're trying to grab something off GitHub or solve some problem like that, it's a useful piece of shorthand to have at your disposal. One question I sometimes get is about how to work on things where you don't care about reproducibility.
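The remote specs renv understands look like this (the repository, version, and commit strings here are illustrative; jeroen/jsonlite is jsonlite's actual GitHub home):

```r
# renv::install() accepts several kinds of package specs:
renv::install("jsonlite")                 # latest from the repositories
renv::install("jsonlite@1.8.8")           # a specific version
renv::install("jeroen/jsonlite")          # development version from GitHub
renv::install("jeroen/jsonlite@abc1234")  # a specific ref (branch/tag/commit)

# Inside an renv project, plain install.packages() is shimmed to the
# same function, so these specs work there as well.
```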
There are such things, you know: you're trying to reproduce a problem for someone else, or you just feel like tweaking some code, or whatever. One pattern I like for this is what I call a scratch directory. I actually have a bunch of them, because my directory structure is a little bit insane. But I have a scratch project that renv is active in, and I just YOLO-install things into it all the time. That works for me as a way of keeping the projects I care about separate from things where I care a lot less, and where I'm willing to install incompatible versions of things and generally make a mess. If you're so inclined, another thing that can be nice is to set a gitignore rule where you globally ignore your scratch directory. Question: when collaborating with coworkers on the same project, each on their own machine with their own RStudio, is renv then a suitable solution to avoid clashes with packages and things? Or should we each work on a copy of the project with the renv lockfile? Yeah, so I would say it's a good solution for doing that. Hopefully that means you're collaborating by sharing files with git. Like, are you using git, or is everybody on a shared drive linking to the same thing? What's the situation there? If we go back to the reproducibility strategy map we were talking about a little bit ago: one of the assumptions implicit there is not just that everyone is working on their own machine, but also that they're working on their own file system. You definitely can all work on a shared drive. I have found that experience to be, in the long run, ultimately pretty painful, because you're all just mutating the same thing.
Unless you're working in an environment that's really specifically configured to support that workflow, and most environments honestly are not, and even when they say they are, they're kind of lying. Someone's going to update something, right? One of the things you've seen, for example, is that the user and project libraries vary based on the person who's running them. I've seen enough weird conflicts there that I think having an renv lockfile that keeps track of the state of the project is better than nothing, but I would start moving in the direction of getting people to work on their own copy, because ultimately that's going to be an easier, more auditable way to figure out who is making what change. That way you don't end up in a situation where somebody decides they want to pull in the latest version of something in that project without telling you, and then your expectations about what the project does change silently. So yeah, having renv on a shared drive is an improvement, because then at least you know which packages you're using and you have some explicit way of tracking that. But moving to a situation where each person works on their own copy of the project, and you have a process for negotiating how you make changes to the project, is the best place to go. Yeah, and I think it's also important to talk about the onion in the context of renv. We said that the slide said renv helps you create an isolated project environment, so, in the context of the onion, what does that do and not do for you? For example, renv does not control the outer layers; that's what it doesn't do for you, which is noted in the documentation, and the documentation is very clear about this: renv is not a panacea.
Right, and I know we're kind of talking about it lightly in this context, but you should really read all of the vignettes in the package documentation to really understand what is happening. Yeah. So one thing that renv does not do, where I can imagine things getting weird, is that it's possible that you and your collaborator do not agree on what version of R you're using. Right. If you're both working on a shared file system but from different computers, you can have different versions of R, and it's not that renv can't handle that situation, but it's one symptom of a thing that you're not actually controlling by working on the project in the same place: you haven't quite negotiated all the things you need to say that you're all collaborating at the same layer, because you're not managing the R version collectively. Okay, so we talked about writing down the language version; that gives us some information about what our expectation is when we open the project. And ultimately, what we might want is a system for bringing the version of R we want to use to the project we're working on, so that we can actually exercise control over each of those facets of the project. So, a thing I believe (though I did not link this image correctly) is that, if we come back to the onion one more time: when you install R and when you upgrade R, you're actually doing the same operation. And so, if you think about taking an existing project and moving it to a different version of R, the process you go through to install R and then get packages for that project can be the same as starting a new project and bringing things in.
The last part of this will focus on some things you can do to make it easier to set up your environment so that you can work on different versions of R without worrying about what it means for the rest of your workflow. This is also just how I live now, and it's the part I'm most excited to talk about. There was a question earlier focused in particular on presenting things to other people. I am doing something slightly precarious by running this workshop on the laptop I use for work, which has real production credentials that I have managed not to show you. If you want a way to experiment with things without actually breaking your machine, or if you just want an R session where you can try to reproduce somebody's behavior from scratch, there are some tools I really recommend checking out, depending on your operating system. If you're on Windows, the Windows Sandbox, on any version from Windows 10 onward, makes it trivial to spin up a fresh session of Windows that doesn't have anything in it. It's meant for testing software of uncertain provenance, but I find it really useful just for getting a clean session where I can build something start to finish. If you use macOS, there are a couple of virtual machine options: one is called UTM, one is called Tart, and I have experimented with both. If you're on Linux: I don't use a Linux desktop often, I'm usually using Linux servers, and sometimes the easiest thing there is Multipass from Canonical, which makes it very easy to launch a Linux command line.
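To make the Multipass option concrete, the whole throwaway-environment loop is a few commands (the VM name here is arbitrary):

```shell
# Launch a fresh, disposable Ubuntu VM
multipass launch --name r-scratch

# Open a shell inside it; install R, break things, experiment freely
multipass shell r-scratch

# When you're done, throw the whole thing away
multipass delete --purge r-scratch
```

Nothing you do inside the VM touches your real machine, which is exactly the property you want for getting reps on installation workflows.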
You also have the option of using a VPS provider like DigitalOcean or Linode to start a Linux instance and just play around with things without breaking anything. Some of the stuff I'm going to show in the next couple of minutes may be stuff you don't feel ready to do on the machine you need for your work, so installing one of these tools, which makes it possible to get some reps experimenting, is a good first step. The other thing we're going to start thinking about is ways of being able to tell people "go run this" instead of "go click through the world's least navigable website to download the version of R you need." There are ways of doing things where it's much easier to copy and paste a command, execute it, and move on to the next step. If you use macOS, this may be familiar to you already; on Windows, it's a less familiar workflow, but it's becoming more common. There is a thing called a system package manager, which unfortunately has no relation to Posit Package Manager; it's a different thing, the outermost layer of the onion. It's a way of installing anything you need in order to run software: I install RStudio using Homebrew, for example. I'm not going to go into a ton of detail, but I will say that, even on laptops I don't manage, I've had a lot of success using tools like these to get software where I want it to be. And then the thing I want to show you in the last couple of minutes is a tool called rig. If you go to the rig website, or the rig GitHub page, you can grab it off the internet, but you can also use one of these package managers to get it installed.
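On macOS, for example, the whole chain can come from Homebrew, assuming Homebrew itself is already installed (the cask name below is the current one and could change):

```shell
brew install rig              # the R version manager shown next
brew install --cask rstudio   # RStudio Desktop
rig add release               # then use rig to install the latest released R
```

The point is that each of these is a single copy-pasteable command, which is a much easier instruction to give a colleague than a sequence of download-page screenshots.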
What rig does for you is make it really easy to install and switch between different versions of R without driving yourself crazy. rig is what we call a version manager; most languages have some kind of version manager, and rig is the one written for R. It's written in Rust, which is useful to know because it means it doesn't depend on R to work. If someone tells you they have written something in R that's going to help you manage R, you should be a little skeptical. And that's not just true of R: if someone says they have a thing that manages Python that's written in Python, you should be skeptical there too. So, I have R 4.1 through 4.5 installed here, in this case for my ARM CPU. If I type rig, you'll see that rig manages R installations. With rig list, you can see a bunch of versions of R installed. Now I'm going to say I want to change my default version of R to 4.4. If I run rig list again, you can see the default is now 4.4, and if I start RStudio again, you can see 4.4.2 is the version of R it starts with. There are some other ways of switching the version of R that runs when you start RStudio; some of them only work on Windows, and some are in RStudio Pro. If you use Positron, this is much easier to deal with, because you can switch R sessions on a per-console basis. But if you depend on running R scripts or anything like that, being able to make sure the default is set the right way across your whole system is something rig makes a lot easier. You can also use it to do things like install R 4.6, which I'm not going to do now because I don't need to trash my internet connection.
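The demo above boils down to a handful of rig commands (the version numbers are whatever you happen to have installed):

```shell
rig list           # show installed R versions and which one is the default
rig default 4.4    # make R 4.4 the default for R, Rscript, and RStudio
rig list           # confirm: 4.4 is now marked as the default
rig add devel      # optionally install the development build of R as well
```

Because rig sets the system-wide default, scripts and scheduled jobs pick up the same version you see interactively, which is the consistency the speaker is after.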
But if I wanted to install R-devel, to do some preview work and investigate things on a new build, I could also do that; having access to something like rig makes that a little easier. You can see here I've opened R 4.4.2. I'm not in a project, so I'm going to reopen the renv project from earlier. What you can see now is that I get some messaging from renv about how the version of R I'm using is different from the version that wrote my lock file. Now I can call, as the instructions say, renv::restore(), and this results in me having the right packages installed for the version of R I'm using. So I've changed the version of R, and then I've reinstalled the packages I need for this project. When a new version of R comes out, if you adopt this project-based workflow, then you don't need to worry about whether updating R is going to wreck all the projects on your machine; you can upgrade them as you have time. That's basically all I have to say today. Thank you all for coming and for your attention. I'm happy to answer questions in the remaining couple of minutes we have here, or we can just let everyone go. And thank you again, Shannon. Well, while people type any last questions they have, I just want to take this opportunity to thank you both. Thank you so much for a fantastic presentation. It's clear that you are pros at this, and you did such a great job of curating your content, really helping disambiguate and demystify a lot of terminology and concepts that underlie what we do every day. I at least sort of knew about these things, and I've poked at my R profile, but I've been afraid to poke it too hard. So thank you for making this a very friendly, approachable topic for those of us who really want to take the next steps in administering our own R experiences, and for giving us the tools to take those steps.
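Putting the pieces together, the per-project upgrade loop from the demo is roughly this (the version number and project path are illustrative):

```shell
rig add 4.5                   # install the new R version alongside the old ones
rig default 4.5               # switch the system default to it
cd ~/projects/my-analysis     # open one project at a time
R -q -e 'renv::restore()'     # reinstall that project's packages for the new R
```

Projects you haven't visited yet keep working on the old R version, so nothing forces you to migrate everything on release day.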
So, I know I'm speaking for everyone who is present here today when I say: I learned a lot, I had a really great time, you're really engaging in the way you present your content, and we're so, so grateful that you brought your talents here to R/Medicine 2025. Thank you very much. With that, I will let you answer anything that popped into chat, but I just wanted to make sure I thanked you on camera and on video. Thanks for having us.

Chapters#

Checking binary availability
Q&A: bleeding edge packages and CRAN compatibility
Live demo: checking RPostgreSQL binaries
Compiling packages from source
System dependencies
If you don't have those available, you can still install the package successfully, but when you try to load the package, you'll see an error message like this.
Installing packages from R-universe
Walking through the exercise
So if you're in an organizational context where you have something that's very complicated to build, and you want to set that up once and distribute it to people publicly, then R-universe can be a good option for that.
Reproducible environments
Public package manager snapshots
Understanding libraries and renv
What renv enables you to do is to take each of your projects and operate them with a library that is isolated from all the other libraries.
Live demo: setting up renv in a new project
Managing the renv lock file
So what we want is to get our lock file into a place where each package is installed, recorded, and being used. Those are the three states we're trying to harmonize.
renv and package installation workflows
Using a scratch directory
Collaborating with renv on shared projects
Managing R versions with rig
