
R-Ladies Gaborone & R-Ladies RTP (English) - Personal R Administration
R-Ladies Gaborone and R-Ladies RTP co-host E. David Aja as he demonstrates tips, tricks, tweaks, and some hacks for building data science dev environments that you won't be afraid to come back to in a year.
Slides: https://rstats-wtf.github.io/wtf-personal-radmin-slides/#/title-slide
What They Forgot to Teach You About R: https://rstats.wtf/
Speaker E. David Aja: https://www.linkedin.com/in/edavidaja/
R-Ladies Gaborone: https://www.meetup.com/rladies-gaborone/
R-Ladies RTP: https://www.meetup.com/r-ladies-rtp/
Extras
Customising your .rprofile: https://kanto.rbind.io/blog/customising-your-r-profile/
Locating R and R Adjacent Software and Configuration Files: https://www.pipinghotdata.com/posts/2022-06-02-locating-r-and-r-adjacent-software-and-configuration-files/
CRAN-ial Expansion: Taking Your R Package Development to New Frontiers with R-Universe - posit::conf: https://www.youtube.com/watch?v=XDiyAvpo2uk
You should be using renv | RStudio (2022): https://www.youtube.com/watch?v=GwVx_pf2uz4
Featured music: https://open.spotify.com/artist/0cmWgDlu9CwTgxPhf403hb
Transcript
This transcript was generated automatically and may contain errors.
David, do you want to take it away?
Sure. Thank you, Shayla. So yeah. Hi, everyone. Like I said, my name is David Aja. I'm a solutions engineer at Posit, so I work on helping people get their data science work running in the environments they use at their jobs. Today we'll be working through part of a course that we'll be teaching at posit::conf, What They Forgot to Teach You About R, and our focus for today will be personal R administration.
The material, the slides I'm working from now, you can access from GitHub. This is the link to the course repository, which has links to these slides as well as some things we won't be covering here today. If you go to rstats.wtf, you'll also see links to a lot of the material we'll be talking about today. I'll also drop a link to the slides in the chat.
Background and motivations
And then just to tell you a little bit more about my background and some of the things that influence the way I think about what it means to build an effective development environment. I've used R in a bunch of different capacities. When I first really started using R, I was an auditor. I was doing some learning on my personal laptop, and then I had a bunch of different environments on the network where I would use R. I had a bunch of different servers, I was deploying Shiny apps to Shiny Server, and I had a bunch of different laptops.
And so it was just like a really complicated situation where I'm working on the same project across lots of different environments. A little bit later, I moved to an advertising agency. And I was working there, again, on a range of different environments, some Windows based, some Linux based, some local, some remote. And now that I'm a solutions engineer at Posit, again, I have a personal laptop. I have my work MacBook. I have a couple of Linux environments that are persistent. I work in a lot of environments that disappear rapidly.
Because I used to be an auditor, I am very concerned with being able to actually reproduce things. And there are lots of other contexts where that ends up being important. I'm trying to avoid needing an administrator to do something for me. Because a lot of the time, I just want to solve a problem I have without having to wait for someone to get back to me about the solution.
And the last thing, in terms of how we'll be thinking about what we want to do here: when we're communicating with people, we really want to be able to tell them, hey, go run this code. We're going to prefer that approach to sending someone a page full of screenshots, because if you can run the code, you're much more likely to be able to reproduce the thing you're seeing.
Agenda overview
So the questions we're focusing on today are things like: how do I change the version of R I'm using for this project? How do I track which versions I'm using for this project? How do I move this project? One thing I want you to know: normally in the What They Forgot class, we spend a couple of hours explicitly focusing on the project-based workflow, and we're going to take a lot of that as given today. We'll be creating RStudio projects. I'm not going to try to convince you today that you should work in those projects.
We're just going to kind of start doing it. Because once you start working in that project-based workflow, you're going to find it a lot easier to take a lot of the lessons that we pick up today and think about what it means to take your project to work and then move it from place to place or deploy things or change the version of R or upgrade package versions. So we're taking projects as given and a lot of the context that we're setting here is going to be about what it looks like to work with a project from RStudio.
So this is our agenda for today. We're going to start with a warm-up exercise that's actually pulled from the project-oriented workflow section. And we're just going to get ourselves used to thinking about where the stuff that makes up our installation is on our machines. And then there are going to be a few different sections we make our way through.
We're going to talk about some ways you can change how R behaves as it starts. We'll talk about reasons why you might want to do that. We'll talk about installing R packages. There are some things that it's helpful to know when you're trying to install a package or trying to diagnose why you can't install a package. And keeping track of those and understanding a little bit more about what's going on there is going to be really helpful.
We'll talk about some ways to construct environments that you can reproduce in some other context, whether that's deploying it to an application host somewhere or just moving it to another computer. Those concepts are very similar. And so we'll think about some ways to take information we have about our projects and use it to make them reproducible. And then the last part, which we may not get to, will be about kind of completing the cycle of going from one version of R to another and thinking about what that looks like in a project-oriented workflow.
Let me pause there, see if there are any questions I should address before we jump in.
Creating SBOMs. Yeah, Greg, if you want to come off mute and ask your question, go ahead.
So I think this kind of coincides with your creating reproducible environments, and usually this is something that comes up whenever I talk to security. They want to be able to make sure that the modules that we install are tested.
Yeah, sorry. To clarify, an SBOM is a software bill of materials. And usually in the Python space it comes with not only the versions that you have, but also the versions that you have for your particular software or architecture build environment. So like, say if you're working on Mac, Linux, Windows, all these different building environments incur different dependencies.
So, we will not be covering the SBOM in particular today. The software bill of materials has a specification that captures a lot of the information you described. Here's one place I've seen it come up: I work at Posit, and we make software that customers install in their environments. They often want things like a software bill of materials from us. The way you generate one of those is by keeping track of the dependencies that you have, right?
So there are often tools that will go from your requirements.txt file to an SBOM, or from your Go dependencies to an SBOM. Part of what we need to do if we're working in R is to be able to, for example, hand someone an renv lockfile or some other file that describes the collection of dependencies we're using. That gives us the opportunity to then generate something like an SBOM. Once you have some way of recording what packages you're using, it's much easier to hand that off to tools that need to convert it into a different format.
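As a rough illustration of "some way of recording what packages you're using", here is a minimal base-R sketch. It is an assumption for illustration, not renv and not an SBOM format: it lists the namespaces loaded in the current session with their versions and writes them to a CSV at a temporary path. In practice, `renv::snapshot()` records this kind of information, with much more detail, in an `renv.lock` file.

```r
# Minimal inventory of the packages this session is actually using.
# Not renv and not an SBOM: just the raw name/version pairs such tools start from.
loaded <- sort(loadedNamespaces())
inventory <- data.frame(
  package = loaded,
  version = vapply(loaded, function(p) as.character(packageVersion(p)), character(1)),
  row.names = NULL
)
head(inventory)

# Save it so it can be handed to other tools.
# The temp path is a placeholder; a real project would use its own directory.
out_file <- tempfile(fileext = ".csv")
write.csv(inventory, out_file, row.names = FALSE)
```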
Warm-up: exploring your R package library
So, like I said, we're going to start with a little warm-up exercise where we're going to investigate our R package library. And so, you know, R packages, right, I think everybody is kind of implicitly familiar with them. You use lots of them as you're getting started. They're how you take bits of R code and send them to other people, right?
When you install R, say from cran.r-project.org, you get R with a collection of packages: the 14 base packages and the 15 recommended packages. It says here that the recommended packages ship with all binary distributions of R, right? If you compiled R from source yourself, you might not get those. And if that doesn't mean anything to you, hang on, we'll talk about it in a few minutes.
But so, for example, if you wanted to draw graphics and you only wanted to do them using base R, the lattice package actually does an excellent job of helping you visualize things. And so, it's one of the reasons that, you know, R has its reputation for being able to do effective graphics without necessarily needing a ton of things. And then, you know, subsequent packages have layered a lot on top of that.
So we have R packages, and base R comes with a lot of them. When you type install.packages(), what typically happens is that packages are copied from some repository like CRAN into a library on your machine. R also comes with a default library, and you can find where that is on your system by typing .Library.
You may have access to multiple libraries that you're loading packages from in your session, and the paths to those libraries are available through the function .libPaths(). If you want to see all the packages you have on disk and where they're installed, you use the function installed.packages(). It is deeply unhelpful to my autocomplete life, since it's so close to install.packages(), but this function is called installed.packages(). What are you going to do?
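All three of these objects can be inspected directly from the console. Below is a short base-R sketch. The paths it prints will depend on your OS and on how R was installed.

```r
# The library that ships with R; on a standard install it holds the base
# and recommended packages
.Library

# Every library this session can load packages from, in search order.
# install.packages() installs into the first entry unless you pass `lib`.
.libPaths()

# One row per installed package (per library). Columns include Package,
# LibPath, Version, Priority, NeedsCompilation and Built.
pkgs <- installed.packages()
dim(pkgs)
head(pkgs[, c("Package", "LibPath", "Version")])
```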
So, what we're going to do, and we're going to take, you know, seven or eight minutes to do this, is you're going to pull down this collection of R files. And you're going to try to answer some of the questions in it to explore the library on your system. So, there are some questions there. There are three files in the bundle. One of them is Comfy. One of them is called Spartan. And then one of them has the solutions to some of the questions that we're asking there.
So, I'm going to take this snippet, open up RStudio, and run this code. By default, it stores the files on my desktop, and I'm fine with that today. When it prompts me, I'm going to choose option three and delete the zip file. And now you can see I get these three files on my system: the Comfy file, the Spartan file, and the solutions. So open up these files. Don't look at the solutions that I just put on screen, but try to answer some of these questions, and we'll give that seven and a half minutes.
I actually think if you're not super oriented with it, the solution script is a great script to start with. Because it kind of just walks you through the process a lot more. So, if you're feeling like a little overwhelmed, go ahead and jump into the solution script.
Exercise debrief
So, what I'll do now is not restart the clock. I'll open up the files and run through the exercise myself. We'll open up the solution document. And do I have the fs package? I do.
So we'll see where the paths are in this case. I'm working on a Mac today, and I have a couple of libraries. We'll explore what they mean later. You can see there's one under my user and one under /Library. That's what this looks like on macOS; it might look different on Windows or on Linux.
And if I look at what the default library is, in this case it's the library under /Library here, which is a baffling choice. But again, we'll talk more about that later. If we run this first comparison, where we check whether the default library and the library paths are identical, in my case they're not, because I have multiple library paths. We can see that my default library is identical to the second element of .libPaths().
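Here is a sketch of that comparison. `identical()` only returns `TRUE` when there is a single library, so `%in%` and `match()` tell you more when you have several. Normalizing the paths first is a precaution that we added, not part of the exercise; it guards against the same directory being spelled differently, for example through a symlink.

```r
lib_paths   <- normalizePath(.libPaths())
default_lib <- normalizePath(.Library)

identical(default_lib, lib_paths)  # FALSE whenever more than one library is on the path
default_lib %in% lib_paths         # is R's own library among the search paths?
match(default_lib, lib_paths)      # ...and at which position (2 in the speaker's setup)
```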
I'm going to skip this part and move to the data frame of packages. Let's see what packages I have installed and how many. I have 191 packages in my library right now, and we can tabulate them. Fourteen of them are base packages and 15 are recommended packages, so that's doing what we expect. Then I have 102 packages installed into my user library.
What version of R were my packages built on? They were all built on 4.4 because I installed them this morning. And let's see what proportion of them need compilation. It's about 50-50 here.
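The tabulations just described map onto columns of the `installed.packages()` matrix. This is a sketch of one way to compute them in base R; the solution script may do it differently, for example with a data frame.

```r
pkgs <- installed.packages()

table(pkgs[, "Priority"], useNA = "ifany")  # "base", "recommended", or NA (everything else)
table(pkgs[, "LibPath"])                    # how many packages live in each library
table(pkgs[, "Built"])                      # R version each package was built under

# Share of packages that contain compiled code
compiled_share <- mean(pkgs[, "NeedsCompilation"] == "yes", na.rm = TRUE)
compiled_share
```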
And again, some of the meanings of these things we'll explore a little bit more as we start stepping through the different aspects of understanding where packages come from and how they get on your system. But yeah, I see some of you have thousands of packages in your environment. And that doesn't get me heartburn at all.
R startup
Okay, any questions about that exercise before we jump in? We're going to pivot a little. I'm on macOS today, but I work on a bunch of different operating systems. Usually I give this workshop from my Windows machine, and I often work in Linux environments too, so really all three are in play.
And I can bring up different environments if people have questions about the way things look on different operating systems. So we're going to pivot to talking about R startup, but some of the things we saw in that first exercise will recur as we march through the next pieces of this workshop.
So, why are we talking about R startup? Because we often want to change something about the way our code behaves without changing the code itself. For example, I might want the same code to do something different depending on what computer it's running on: when it's running on my laptop, I want it to do one thing, but when it's running on a server, I want it to do something else.
And there are ways you can imagine saying, well, write an if statement. And if I'm on this environment, then do this. And if I'm on this different environment, then do that. But there are better ways to do that that make your code a little bit more robust. And so, we'll talk about understanding how you can influence what your code does as R starts up.
So, R startup is pretty complicated. There's a lot of stuff happening. This is a flowchart that I reference surprisingly frequently because it's helpful to understand where you have the opportunity to change the behavior of R as it starts up. But we're going to focus on just a couple of parts of what that setup looks like.
So, we'll be focusing in particular on R environment files (.Renviron), which let you set environment variables, and on R profile scripts (.Rprofile), which let you execute as little or as much R code as you want as your R session is starting up. We'll talk about what you need to do to create and set each of those files.
So, one thing you're going to want to make sure of, and hopefully this is fixed by now but in case it's not: when you create one of these files, make sure it ends with a newline. You can configure RStudio to make sure that happens. I'll open up my RStudio options and go to Code. And where is that? Under Saving. There I can make sure that source files end with a newline, so that as I make changes to these files, they behave the way we expect.
The other thing I'll do, since this isn't on, is I'll just turn this option on so we strip additional space at the end of lines. And I'll hit OK. So, again, that's in the settings menu. I'm using the shortcuts command comma or control comma if you're on Windows. And I'm going over to code and saving and just making sure that this file ends in a new line. And then hitting OK.
And so once you have that option set, we can start thinking about modifying the R environment file, or .Renviron, which is fun to say five times fast. We're going to use the .Renviron file to set environment variables. Environment variables are essentially a collection of key-value pairs, some value that has a name, and their purpose is to change the way processes on a computer behave when they start.
So what would you put in an R environment file? You might put R-specific environment variables there. For example, if you wanted to set how much history R keeps as you're executing code, you can set R_HISTSIZE as an environment variable. Or you might be reaching out to some service that provides you with data, and you need an API key to connect to it. For example, when I'm working with census data, I get an API key from the U.S. Census Bureau and use it to request data from them, and I put that API key in my R environment file so my code can use it. What you can't put in an R environment file is R code itself; we'll talk about how you do that in a second. With an R environment file, you're limited to these key-value pairs.
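Here is a small base-R sketch that makes the key-value format concrete. It writes a throwaway file in .Renviron format to a temporary path; the variable `MY_SERVICE_API_KEY` and both values are made up. It then loads the file with `readRenviron()`, which applies such a file to the running session. At real startup R reads your .Renviron for you, which is why the normal workflow is to edit the file and restart.

```r
# A throwaway file in .Renviron format: plain NAME=value lines, no R code.
# Names and values here are made up for illustration.
renviron <- tempfile(fileext = ".Renviron")
writeLines(c(
  "# lines starting with # are comments",
  "R_HISTSIZE=1000",
  "MY_SERVICE_API_KEY=not-a-real-key"
), renviron)

# Apply the file to the running session (demo only; normally you restart R)
readRenviron(renviron)
Sys.getenv("MY_SERVICE_API_KEY")  # "not-a-real-key"
```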
One helpful way to get a handle on editing these files is the usethis package, specifically its function edit_r_environ(), which opens the R environment file for editing. There are two different scopes to think about.
There's the user R environment file, which lives at ~/.Renviron. The tilde means your home directory on your operating system. That might mean something slightly different on Linux, Mac, or Windows, but it always resolves to whatever counts as your home directory, and underneath that directory you have a .Renviron file.
The other scope is the project R environment file: a .Renviron file that lives in the top-level directory of a specific project.
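Here are the two scopes expressed as file paths, in a short base-R sketch. The usethis calls are shown commented out because they open files in your editor. The project path assumes that your working directory is the project root, which is what RStudio sets when you open a project.

```r
# usethis creates and opens these files for you:
#   usethis::edit_r_environ()                   # user scope
#   usethis::edit_r_environ(scope = "project")  # project scope

user_renviron    <- file.path(path.expand("~"), ".Renviron")
project_renviron <- file.path(getwd(), ".Renviron")  # assumes working dir = project root

user_renviron
project_renviron
file.exists(user_renviron)     # does a user-level file exist yet?
file.exists(project_renviron)  # does a project-level file exist yet?
```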
So, again, these are some examples of things you might put into a user R environment file. These are real keys, but I burned all of them, so you can't do anything with them. If I wanted to put my GitHub personal access token in my R environment file, which you don't need to do anymore (and I should take that example out, which I will), I could do that. If you were using something like Posit Connect and needed access to an API key, you could put it here. If you're on Windows and need to make Rtools available, you could set that here too; we'll talk a little more about why you'd want to do that. So these are the kinds of things you might put in a user R environment file.
And then for projects, the things you want to set are associated with the project. That might be the path to a Python virtual environment if you're working with the reticulate package, or database credentials that you want to make available to your code.
Getting and setting environment variables
So the R environment file is how you set environment variables, and then you need a way to get them: the Sys.getenv() function. There is also a Sys.setenv() function that you can use to set environment variables, but those are only set for the session you're running in. If we go back to the flowchart, setting them in the R environment file sets them earlier, and when you're working with environment variables, setting them as early as you can is often good. So for setting values we're going to prefer the .Renviron file, and for getting them we're going to use Sys.getenv().
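Here is a short sketch of getting and setting environment variables from R. `WTF_DEMO_KEY` is a made-up name, and the first result assumes it isn't already set in your environment.

```r
# "WTF_DEMO_KEY" is a made-up name for illustration
Sys.getenv("WTF_DEMO_KEY")              # "" when the variable is unset
Sys.getenv("WTF_DEMO_KEY", unset = NA)  # NA makes "not set" explicit

Sys.setenv(WTF_DEMO_KEY = "only-for-this-session")
Sys.getenv("WTF_DEMO_KEY")              # "only-for-this-session"

# Nothing was written to disk, so after a restart the variable is gone.
# Unset it now so the demo leaves no trace in this session either.
Sys.unsetenv("WTF_DEMO_KEY")
```

Passing `unset = NA` is a matter of taste. It makes it easier to tell "never set" apart from "set to an empty string" when your code checks for a required key.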
There are a couple of questions in the chat. One is from Sheila: how do you know if the R environment file you're looking at is for a user or a project? It depends on where it is. If you don't specify a scope, I'm pretty sure you get the user one by default. Right. Yes.
And when R is searching for one, does it look in the user directory first? Yeah. Stephanie, you're destroying my exercise. Sorry. No, it's all good. We're going to go through the exercise to answer exactly that question.
Just to illustrate: if I tell my terminal to go to ~ and then print the working directory, I can see what my home directory is in this computational context. On my Mac, /Users/edavidaja is my home directory. The home directory might look different on a different server, but all operating systems have this notion of a home directory. So no matter what context you're running in, the user R environment file lives here. If I list the files in this directory, there's no R environment file, for reasons that will become apparent in a second. But you can see a bunch of other files at this level, because lots of programs know to look here for user-specific information.
We have one more question in the chat: so I have the option to either set environment variables globally or scope them to a particular project and user. Is the user the user of the machine? Yes, the user is the user of the machine. If I ask who I am on this computer, I'm edavidaja. To illustrate the difference, I have some ephemeral RStudio Server environments, and I'm going to open one of them so you can see what the same collection of things looks like in a different context. So I'm going to open up this instance of Workbench.
And I'm logging in to this instance of Workbench as a dummy user called Publisher1, and I'm going to open up an RStudio session. And yes, Jeremy, that is a problem lots of people working in locked-down environments have, so that's a good call-out.
So I'm in RStudio. If I go to my home directory and print the working directory, in this case it's /home/Publisher1. This is a Linux environment, so the home directory looks slightly different here, but you can see I again have a bunch of files in my home directory. And the user I am in this case is a different user. So the home directory is associated with whoever is running the R session. Does that help?
Exercise: editing the R environment file
So we're going to edit our user R environment file using this helper function. If you call it without a scope argument, it opens the user environment file. Add WTF_USER as the key. This is what I mean by key-value pairs: this is the key, this is the value, and there's an equals sign between them. You can use whatever value you want, for example your name followed by _user. Then restart the session. You have to restart the session every time you want to pick up new environment variables, because they're scoped to the session. Then, in the console, run Sys.getenv("WTF_USER") to get that value. We'll give people a couple of minutes to do that.
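You can see why the restart matters by comparing the current session with a brand-new R process. This sketch assumes a Unix-alike system where `Rscript` lives in `R.home("bin")`. Rather than touching your real ~/.Renviron, it points the child process at a throwaway file through `R_ENVIRON_USER`, the variable R consults at startup to find the user environment file. The value `wtf_demo_user` is made up, and the first `Sys.getenv()` result assumes `WTF_USER` isn't already set.

```r
# Throwaway user environment file for the WTF_USER exercise
env_file <- tempfile(fileext = ".Renviron")
writeLines("WTF_USER=wtf_demo_user", env_file)

# The running session has not re-read any environment file, so this is ""
Sys.getenv("WTF_USER")

# A fresh R process (what "restart R" gives you) reads the file at startup.
# The child inherits R_ENVIRON_USER from this session.
old <- Sys.getenv("R_ENVIRON_USER", unset = NA)
Sys.setenv(R_ENVIRON_USER = env_file)
out <- system2(
  file.path(R.home("bin"), "Rscript"),
  c("-e", shQuote('writeLines(Sys.getenv("WTF_USER"))')),
  stdout = TRUE
)

# Restore whatever R_ENVIRON_USER was before the demo
if (is.na(old)) Sys.unsetenv("R_ENVIRON_USER") else Sys.setenv(R_ENVIRON_USER = old)
out
```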
Let's walk through that exercise. I'm going to call usethis::edit_r_environ() to edit my user R environment file. You'll see it prints the path to the R environment file: in this case it's .Renviron in /Users/edavidaja, my home directory. I'm going to set WTF_USER. (There's a Spanish cartoonist who also has the name David Aja, which is why I insist on putting the E in front.) I want to show you that if I call Sys.getenv("WTF_USER") right now, there's no value, because I haven't restarted R yet. It's not going to pick up this value until I restart R.
And now if I run that again, I get the value I set. So you have to restart R when you set environment variables. And that's not just R; that's really any program. You have to restart it to pick up a change in the environment.
Okay, now we're going to do the same thing, but with the project R environment. If you're not already in a project, you may want to create one first (you can use usethis to create one), so that you don't put this file somewhere in your home directory where it will confuse you after this class. And don't do what I'm doing, which is creating it inside a project. Then we'll try the same collection of steps: edit the R environment file, but in the project scope; add WTF_PROJECT with the project name as its value; restart; and get the value of WTF_PROJECT. Once you've done that, answer this question: what's the value of WTF_USER after you set WTF_PROJECT? We'll give people a couple of minutes to work through that as well, and we can add more time if we need to.
Okay, so I'll jump back to RStudio, close this, and restart R. I'm going to run usethis::edit_r_environ(). I'm in my WTF R-Ladies Gaborone project, so I'm going to select the project scope. You can see usethis identifies the active project, tells me it's modifying the .Renviron file there, and reminds me to restart R once I set the value. In this case I set WTF_PROJECT. Again, if I try to get this value, there's nothing there, because I haven't restarted R yet. I'm going to restart R and run that again. Now I have the project value. So what happened? What did people get when they ran Sys.getenv("WTF_USER")?
Nothing, right? When you set a project-level R environment file, your user-level one does not get read. There's a name for that behavior: short-circuiting. R reads the .Renviron in the project (working) directory if there is one, and only falls back to the one in your home directory if there isn't. So once you have a project-level configuration, you're not accessing anything in your user-level R environment file. You may have noticed earlier that I didn't have a user-level R environment file in my home directory; that's because when I create these files at all, I usually create them at the project level. So just be aware: if you start modifying the behavior of R this way, you have to do it on a project-by-project basis.
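Here is a sketch of that lookup order written as a small function. `which_renviron()` is a hypothetical helper, not part of base R or usethis. It follows the startup rules documented in `?Startup`: use `R_ENVIRON_USER` if it's set, otherwise use .Renviron in the current directory, otherwise use .Renviron in the home directory. Only one of these user-level files is read; the site-wide file is handled separately.

```r
# Hypothetical helper mirroring R's choice of user-level environment file
which_renviron <- function(project_dir = getwd(),
                           home = path.expand("~"),
                           override = Sys.getenv("R_ENVIRON_USER")) {
  if (nzchar(override)) return(override)
  candidates <- file.path(c(project_dir, home), ".Renviron")
  found <- candidates[file.exists(candidates)]
  if (length(found) > 0) found[[1]] else NA_character_
}

# Demo with throwaway directories standing in for a project and a home dir
demo_home <- file.path(tempdir(), "home")
demo_proj <- file.path(tempdir(), "project")
dir.create(demo_home, showWarnings = FALSE)
dir.create(demo_proj, showWarnings = FALSE)
writeLines("WTF_USER=from_home", file.path(demo_home, ".Renviron"))
writeLines("WTF_PROJECT=from_project", file.path(demo_proj, ".Renviron"))

which_renviron(demo_proj, demo_home, override = "")  # the project file wins
```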
Version control and secrets
Questions about that before we move on to another way of customizing the way R starts up? Well, I have a question about collaboration. If you put the file in the actual project, and the project is under some sort of version control, is it generally an ignored file? I'm wondering.
Yeah, so one of the reasons I don't usually give this workshop on my MacBook is that this is where all of my actual secrets are, so I have to do some things off screen to make sure I can show this to you safely. Here's my global gitignore. As part of my Git configuration on this machine, there's a collection of files that Git is configured never to recognize in any project. If you work on macOS, you should ignore this file; everyone should ignore this file. There are also a bunch of R-related ones: the .Rproj.user folder that RStudio creates when you open a project, and .Rhistory, because I don't want to share everything I've ever typed into the console with my collaborators. I don't want to keep .RData files, and I don't want to share my .httr-oauth tokens. The .Renviron file is another one of those things I ignore for all projects.
So typically, if you have secrets that you're putting into that kind of context, you need some way of distributing them out of band. The only recommendation I can safely make is a password manager that your whole team uses; there are some good free ones. You need some solution for communicating secrets out of band and for setting them on remote systems without passing them around in your code, and .Renviron files are how you do that. So yes, I ignore the .Renviron file globally.
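For a single project, here is a base-R sketch of adding entries like these to a project's .gitignore, only when they're missing. `add_to_gitignore()` is a hypothetical helper written for illustration. `usethis::use_git_ignore()` does the same job, and the machine-wide ignore list described above is configured in Git itself through `core.excludesFile`. The demo writes to a temporary file rather than to a real project.

```r
# Hypothetical helper: append a pattern to a .gitignore file if it isn't there yet
add_to_gitignore <- function(pattern, gitignore = ".gitignore") {
  existing <- if (file.exists(gitignore)) readLines(gitignore, warn = FALSE) else character()
  if (!pattern %in% existing) {
    writeLines(c(existing, pattern), gitignore)  # rewrite, so the file always ends in a newline
  }
  invisible(gitignore)
}

gi <- tempfile(fileext = ".gitignore")  # stand-in for a project's .gitignore
for (p in c(".Rproj.user", ".Rhistory", ".RData", ".httr-oauth", ".Renviron")) {
  add_to_gitignore(p, gi)
}
add_to_gitignore(".Renviron", gi)  # second call is a no-op
readLines(gi)
```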
And the neat thing, this is kind of a neat capability of R, which is really like the only language I've used that has a built in way of doing this. Lots of other languages have a convention of using something called a .env file, and then having some package responsible for reading that .env file to get the same behavior. But this is kind of a neat thing that we get for free without needing to install stuff. So yes, don't commit your R environment files.
The R profile
Right, so we talked a little bit about those secret key-value pairs. You might also want some code that runs at the beginning of each session, and one place you can do that is the .Rprofile file: R code that runs at the beginning of each session. A lot of the time, if you're running a script at the beginning of an R session, one question you have is whether you're in an interactive context or not. An interactive context is one like the one we're in now, where I'm sitting here typing, hitting Enter or Ctrl+Enter, and lines are getting sent to the console. If you're knitting an R Markdown document, running an R script from the command line, or launching a Shiny app, nobody is typing at a console; those are not interactive contexts. Most of the customization you do in .Rprofile, though not all of it, is focused on the interactive context.
So what might you want to put in your .Rprofile? You might want to set the default place your session gets CRAN packages from. There are other ways to do this: if you're using the RStudio IDE, it has its own settings for the default CRAN mirror you fetch packages from. But if you were working on a system where you didn't have access to that, and you wanted to make sure you were getting packages from the right place anyway, setting a default CRAN mirror is a great way to do that. There's also a link to the prompt package: if you are the kind of person who works with R in the terminal a lot, you can use it to customize the way R looks in the terminal. It doesn't work exactly the way you'd want it to in RStudio, but you can see here that the usage instructions for the package describe running it from your .Rprofile in interactive contexts. So it's something fun to check out if you do any sort of R work in the terminal.
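Putting the two ideas together, a minimal user `.Rprofile` might look like the sketch below. The mirror URL (`https://cloud.r-project.org`) and the `"R> "` prompt string are assumptions chosen for illustration; the prompt package mentioned above offers richer prompts, but its API isn't shown here.

```r
# Sketch of a user-level .Rprofile (~/.Rprofile).

# Applies to every session, interactive or not: where to get CRAN packages
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Console-only niceties go inside an interactive() check so that
# Rscript, knitting, and Shiny apps are unaffected
if (interactive()) {
  options(prompt = "R> ")
}
```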
But it's important to note that there are some things you don't want to put in your .Rprofile. In particular, if it matters to code you're sharing, you don't really want it to be in your .Rprofile. So let's see if we can figure out why we might not want to put these things in the .Rprofile. Why might you not want to set stringsAsFactors = FALSE in your .Rprofile? This example is showing its age. I'm old. When I started using R, this was more of a problem; it's less of a problem now. But if you're reading something into R and you want to set the default type it comes in as, that's something you want to do explicitly in the code, so that when you share your code with someone, they get the same results. You don't want to, for example, load something like the tidyverse in your .Rprofile, because if you share code with a collaborator and the script itself doesn't load the tidyverse, they might run the code with a different set of packages in their environment. So that's something you want to do explicitly in the script you're sharing. Don't alias functions in your .Rprofile either, because that causes some of the same problems: someone might not have your alias available in their environment, and when you share your code with them, it won't work. And again, if you set a plot theme in your .Rprofile, then when someone else renders your plots, they're not going to get the same thing. These things matter to code that you're sharing, so rather than putting them in the .Rprofile, set them explicitly in the script.
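As a small illustration of "set it explicitly in the script", the sketch below reads a tiny inline CSV and states the column-type behavior in the call itself instead of relying on a global option. The data is made up for the example.

```r
# Options that change results belong in the shared script, not in .Rprofile.
csv_text <- "species,count\nrobin,3\nwren,5"

# State the type behavior you expect explicitly in the call
birds <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(birds)

# The same goes for packages and plot themes: attach them with library()
# and set the theme at the top of the script that makes the plots.
```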
We don't have neighbors today, so I'll just, I'll pick on, will I pick on someone? No, that seems like a lot. But Shannon, I guess I'll pick on Shannon. Shannon, why might these be safe to put in your R profile?
So usethis and devtools are things that you tend to use interactively, and the functions in these packages are not imperative to reproducible code, data reports, and data artifacts. They're just things that you use on the fly. Right. So yeah, thank you, Shannon. I like to think of these as development dependencies, right? usethis is something I'm calling a lot if I'm doing interactive work. I'm just repeating what she said, which isn't helpful, but they're development dependencies. So if I send a package to my collaborator, they don't necessarily need to have devtools installed to work with it. So those might be safe to put in your .Rprofile. I still wouldn't, but this helps you understand the distinction there.
Go ahead, Greg. You mentioned developer dependencies. Can I handpick which R profile I load on execution? Can you handpick which R profile is loaded when R starts? I mean, well, whenever you load a project; sorry, I'm trying to encapsulate stuff into projects as opposed to just code. Right. I'm going to say, stay tuned, Greg, we're getting there. There are definitely ways to customize this behavior, like in the flowchart I showed at the beginning: there are a number of steps before the files we're talking about putting on disk, and these are just the most common ways. If you need to go deeper into the system, you can totally change those things. Some of the tools we'll talk about in a little bit take advantage of these facts to give you a startup experience that I think will reflect some of what you're looking for. But we'll get into those details in a little bit.
Dot files and the R profile
So one way that you can figure out what people put in their .Rprofile is by searching through them on GitHub. Files that start with this dot prefix are called dotfiles. It's a very creative name. And they are often configuration files for programs that people use on their computers.
Some people share their dotfiles publicly as a way of making them available to other people who want to use them, or because they want to use them to set up new computers. My dotfiles are public for that reason. What is happening to my ability to copy and paste? There we go. We've done it. Right. So if you search for .Rprofile files, you'll find Colin Gillespie's .Rprofile, for example. He's got a lot of stuff happening in there. So you can check some of these out on GitHub for inspiration.
Right. A lot of people also keep these in a repository just called dotfiles, but I use a dotfile manager, so the translation to R is not immediately apparent there. Anyway.
Okay. So we're going to do the same thing we did last time. We're going to edit our user .Rprofile, then edit our project .Rprofile, and we're going to see what happens after you restart each R session. So I'll give you five minutes to do that.
Live demo: editing R profiles
All right. So let's try out this activity and then we'll take a break. What I'll do quickly is confirm I don't have anything in my profile already. Okay. So I'm going to call usethis::edit_r_profile(). Again, you can see it takes the same scope argument as edit_r_environ(). If I don't provide a scope, it defaults to the one in my home directory. So this is my user .Rprofile. And since this is .Rprofile, we can just put R code in it. So I'll print "hello from the user R profile". I save and restart. Right. Now we can see this is R code that gets executed on startup: I started the session, this was executed, and I didn't do anything. That's just what happens when I put R code in the user .Rprofile.
Now I'm going to edit the same file, but in the project scope. Again, you can see the project scope here: my home directory, beneath it my projects folder, and then in this project, my WTF R-Ladies Gaborone project, I'm modifying this .Rprofile. And again, this is R code, so I use the print function: hello from the project R profile.
Save. Restart. Right. I get "hello from the project R profile". Because there's a .Rprofile in my project directory, the one in my home directory is not evaluated. So it's the same short-circuiting behavior we saw before, but in this case we're executing R code instead of setting values that we have to retrieve.
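The short-circuiting can be sketched in a few lines of base R. The helper `user_profile_path()` is hypothetical; it mimics the order described in R's `?Startup` help page: the `R_PROFILE_USER` environment variable wins if set (which is one answer to Greg's question about handpicking a profile), otherwise `.Rprofile` in the directory R starts in, otherwise `.Rprofile` in the home directory. Only the first match is sourced. Temporary folders stand in for a real home directory and project.

```r
# Hypothetical helper mimicking the lookup order described in ?Startup
user_profile_path <- function(wd = getwd(),
                              home = path.expand("~"),
                              env = Sys.getenv("R_PROFILE_USER")) {
  if (nzchar(env)) return(path.expand(env))          # explicit override
  candidates <- file.path(c(wd, home), ".Rprofile")  # working dir first
  hits <- candidates[file.exists(candidates)]
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Temporary folders standing in for a home directory and a project
home_dir <- tempfile("home"); dir.create(home_dir)
proj_dir <- tempfile("proj"); dir.create(proj_dir)
writeLines('print("hello from the user R profile")',
           file.path(home_dir, ".Rprofile"))

# Only the user profile exists, so it is the one that would run
before <- user_profile_path(wd = proj_dir, home = home_dir, env = "")

# Adding a project profile shadows the user profile entirely
writeLines('print("hello from the project R profile")',
           file.path(proj_dir, ".Rprofile"))
after <- user_profile_path(wd = proj_dir, home = home_dir, env = "")
```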
Questions about any of that before we break briefly? Like I said, this behavior is going to come back in a bit when we talk about reproducible environments with renv. But if there are no questions, should we give it, Shannon, five minutes? Yeah. So take five minutes. Take a break. Go get some coffee or whatever beverage is appropriate for whatever time zone you're in. And we'll see you in five minutes.
So we're going to try this out in my environment. I'm going to use usethis to edit my project .Rprofile. And that's this. I'm going to drop this in here. In this case, I'll use Package Manager, P3M, with the CRAN latest snapshot. So now I have R-universe and Package Manager set as repositories.
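A sketch of what a project `.Rprofile` with two repositories could contain. The rOpenSci universe is an assumption for illustration (any `https://<name>.r-universe.dev` URL works the same way), and the repository labels are arbitrary names. On Linux, the Package Manager setup page gives a distribution-specific URL if you want binaries.

```r
# Project .Rprofile sketch: an R-universe first, then Posit Public
# Package Manager's "latest" CRAN snapshot
options(repos = c(
  ropensci = "https://ropensci.r-universe.dev",
  P3M      = "https://p3m.dev/cran/latest"
))

getOption("repos")
# install.packages() now searches both repositories
```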
I'm going to restart R. By the way, the interrupt you see when I restart R is because I have a weird thing happening in my terminal. You shouldn't see that; I just want to explain what's going on. If I kill this terminal, it will stop doing that.
So if I were to run options("repos") now, I would confirm that I have those two repositories set. And so now I can install packages, say gitcellar. And you'll see it got fetched from R-universe, and we downloaded a binary package. A neat thing to know about R-universe is that it's an rOpenSci project. Some people need to distribute more complicated packages, often associated with specific scientific domains, and want to distribute binary packages that are too difficult to get onto CRAN for whatever reason. R-universe is a great place to look for some of those packages. And if you have packages you want to distribute and CRAN isn't the right place, setting up an R-universe is pretty easy.
So that's the idea, right? We can modify the repos option, which is a thing we would do in our .Rprofile, and that gives us the ability to install binary packages from somewhere else. When R says a version of the package might be available elsewhere, that's a kind of polite error message: the package isn't available in the repositories you've listed, but it might be available somewhere else. There isn't necessarily one good place to look, because there's a practically infinite number of places the package could be. But for the most likely places: Package Manager has a search function. If, for example, a package has been archived on CRAN, you might have an easier time finding it on Package Manager, because it displays both current and archived packages, even if the process you have to go through to install an archived one is slightly different. You can also look on the CRAN website itself. And sometimes packages are only on GitHub and not distributed through an R-universe. Those are the most common places, and those are where I would look.
So, right, we got gitcellar, and we got a binary version of the package. How do we know? Because it told us that it downloaded a binary. Also, if you look at the extension here, you can see it's a .tgz, which is what the binary package format looks like on macOS. It's slightly different on Windows and Linux.
So binaries are the easiest thing to install. But sometimes you're installing packages from somewhere that compiled versions aren't available. For example, if you install a package from GitHub, you're just copying the source files down to get the latest version, and then you have to compile them yourself. So you may need to install those packages from source, and hopefully this gives you some clues for how to do that.
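If you want to check what your R build asks repositories for, base R exposes it directly. This is a small sketch; the exact strings differ by platform and R version, so the comments describe typical values rather than guaranteed ones, and `jsonlite` in the commented-out call is only an example package.

```r
# The binary flavour this R build understands: e.g. "win.binary" on Windows,
# a "mac.binary..." string on macOS, and "source" on Linux builds of R
native_type <- .Platform$pkgType

# The default type install.packages() uses; on macOS and Windows this is
# typically "both", which lets install.packages() choose binary or source
default_type <- getOption("pkgType")

# Forcing a source build (needs compilers for packages with compiled code):
# install.packages("jsonlite", type = "source")
```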
Anything else on installing R packages before we talk about reproducible environments?
So if you can go back to your R session, you had a URL to point to the package manager. Is that correct? When I go there, it doesn't seem to work.
Yeah. Maybe I'm typing it wrong.
Let's see what happens. Right, so let's talk about what's happening. This is Package Manager's web interface. If you saw, I went to p3m.dev and automatically got redirected to this client page. That's the interface that gets served if you visit Package Manager in a browser. If I go over to the setup page and say I want the directions for macOS, for example, in the RStudio IDE, then this is the repository URL that I need to configure in my R session. The HTTP request that R sends to get packages doesn't look the same as the one my browser makes. So that's the difference there. I think we'll talk a little bit about some different settings you want to apply depending on how you're trying to get that information. But that's the distinction you're seeing: the browser view and the view from an R package request are not the same.
Reproducible environments
Any other questions about installing packages before we talk about reproducible environments? Okay. Reproducible environments is my favorite topic, which is why I have this job.
And when I was working on this presentation a couple of years ago, the takeaway I wanted was: you are going to need to reproduce your environment. If you believe the work you do is valuable, then you should believe it's worth being able to reproduce. That is an argument, and there's a sermon that comes with it that I'll save for later in the interest of time. I'm just going to say that having a way to reproduce your environment is going to make your life a lot easier if you have to pick up a project you were working on in the future, which happens often, or if you just want someone else to be able to work on your stuff.
This is a diagram that someone on my team made a while ago to talk about different ways of thinking about reproducing environments. On the x-axis is who's responsible, and on the y-axis is how permissive the environment is. We are going to focus on the snapshot use case, where you are responsible for reproducing your environment and we're assuming the environment is relatively permissive. There are different things you will have to do if you work in a context where the environment is not as permissive. But understanding how to do things well up here makes all of the rest of these easier. And if you're in the red zone, it kind of sucks to be there, so you want to try to stay on the happy path. You don't necessarily have control over how permissive the environment is, but the more responsibility you take for reproducing the environment, the easier time you'll have if you have to operate under some of these more restrictive conditions.
So we're going to talk about two different tools that you can use to construct reproducible environments. One is Posit Package Manager; we'll focus on Public Package Manager. The other is the renv package. In a world where you can access Public Package Manager, there are a lot of things it makes easy. renv is something you can use as long as you can install the package. So we'll talk about both of those as strategies for reproducing your environment.
So one of the ways Package Manager makes it possible to reproduce your environment: you'll notice that in our previous example, when I configured the repository I wanted to get packages from, there's this /latest. /latest tracks the current state of CRAN, usually to within about a day. So if I set this as my repository URL and call install.packages(), I'll get the packages the way they look on CRAN right now, whenever right now is.
One thing you can do that might make it easier to reproduce projects, say you were working on something a long time ago and you don't want to figure out manually what collection of dependencies will bring it back to life, is use the date-based snapshot capability. Under the snapshots section, it asks whether you want to freeze package versions, or install packages from a particular date. So let's say I went back to a year ago today. Now, if I look at this URL, rather than latest, it says /cran/2023-06-07. If I use this URL as my repository URL, then when I request packages from Package Manager, I get back a package set reflecting the way CRAN looked a year ago. And so on, back to, I think, October 2017.
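A small sketch of the date-based repository URL described above. The helper name `p3m_snapshot_url()` is hypothetical; the `https://p3m.dev/cran/YYYY-MM-DD` pattern follows the snapshot URL shown on the Package Manager setup page. Setting the option only affects the current session unless you put it in a project `.Rprofile`.

```r
# Hypothetical helper: build a Public Package Manager date-based
# CRAN snapshot URL of the form https://p3m.dev/cran/YYYY-MM-DD
p3m_snapshot_url <- function(date, base = "https://p3m.dev/cran") {
  date <- as.Date(date)  # accepts "2023-06-07" or a Date object
  paste0(base, "/", format(date, "%Y-%m-%d"))
}

snapshot <- p3m_snapshot_url("2023-06-07")

# Point this session at CRAN as it looked on that day
options(repos = c(CRAN = snapshot))
# install.packages("dplyr") would now resolve against that snapshot
```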
There used to be another way to do this. If you ever worked with Microsoft R, MRAN and the checkpoint package enabled a similar kind of workflow. Microsoft stopped supporting MRAN last year. We've made some changes on the Package Manager side so that if you use the checkpoint package to request a date-based snapshot, Package Manager will respect that too. So if date-based snapshots are a way you like to work, you can use this date-based repository URL to recover previous states of the CRAN repository. This also works for PyPI.
So date-based snapshots. I selected dates. You can see I get the dates in the URL. So what you're going to do is, yeah, Greg, go ahead. Sorry, just out of curiosity. So is the date stamp formatting of snapshots, is that only available in PPM or is that something that's also in renv?
There is an renv function that will make it a little easier for you to work with date-based snapshots. It's relatively new, and it's called checkout(). But what I'm illustrating here, using a date-based snapshot as the repository URL, will work no matter what package installation client you're using, because from the perspective of a package installation client, you're just supplying a CRAN repository that happens to look like this.
So what we're going to do is take a couple of minutes. Set a date-based snapshot URL as the repository in your project, then install dplyr and post in the chat when you've installed it: what version did you get?
We'll give people a couple minutes to do that.
Okay. So some stuff is happening. One thing that Shannon has helpfully pointed out to me: I am very comfortable YOLO-installing things, because when I'm actually doing work, I always work in an isolated project environment. If that is not your lifestyle, make sure that when this class is over you reinstall whatever version of dplyr you were using before, so that you don't break all of your projects.
So what we're going to do is I was running R 4.4. So I'm going to reset my version of RStudio so that it does that.
So I've set my repository here to point to a date-based snapshot on Package Manager. If I restart and look at my repos option, you'll see that it's pointing to that snapshot. If I call install.packages() and request dplyr, I get dplyr version 1.0.5. Now, some people who are using R 4.3 did not get that result, and something else slightly unusual happened for me too, which I will debug after this call.
But in general, what ought to happen, right, is that you should be able to point yourself back in time, fetch an older version of dplyr than the one that is the latest on CRAN. So if I reset this to latest, I reload, and I install packages. I ask for dplyr again. Now I should get 1.1.4, right? So that's the current version of dplyr on CRAN. And so that's what I'm expecting to get.
Okay. Questions about that workflow, what's happening on the package manager side?
If not, we'll switch to... oh, you can also use packageVersion("dplyr") to check. That was a good call. I just read log messages all day, so why run functions? But yes, if I call packageVersion("dplyr"), it will tell me the version I have. Okay.
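`packageVersion()` returns a `package_version` object rather than a plain string, which matters when you compare versions in code. The helper `installed_version()` below is hypothetical; it returns `NA` instead of an error when a package isn't installed. The base package `stats` is used so the sketch runs without installing anything.

```r
# Hypothetical helper: NA instead of an error when pkg is not installed
installed_version <- function(pkg) {
  if (!nzchar(system.file(package = pkg))) return(NA)
  packageVersion(pkg)
}

v <- installed_version("stats")   # base packages share R's version number
v == getRversion()

# Version objects compare component by component...
package_version("1.0.10") > "1.0.9"   # TRUE
# ...while plain strings compare character by character
"1.0.10" > "1.0.9"                    # FALSE
```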
Managing dependencies with renv
So managing your dependencies by choosing a date for a repository is one way to do things. In the context of a project-based workflow, one thing you might consider doing instead is working with a library that's isolated from your other projects. And that's what renv is going to help you do: it gives us a way of constructing per-project R package libraries, so that when we make changes in one project, we're not worried about the influence they'll have on our other projects.
So normally, and this is the point of the warning I'll reiterate, in the standard setup with a user library, I have project one, project two, project three, and they all depend on the one shared library that's available at .libPaths().
And so what renv is going to do is give me a way to have individual libraries associated with each project. And there are a lot of safety advantages to doing this: I can experiment with new packages and install things experimentally without worrying about breaking the other projects on my machine.
You can communicate what versions of everything you're using to other people on your team, or just to yourself if you're working on the same project a year from now. And renv also has a caching mechanism: if you've already installed a package, you just get that copy linked into the project library. So you're reusing things instead of downloading each of them fresh each time. You get the best of both worlds: project isolation, plus intelligent reuse of the things you have on disk.
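The core idea of a per-project library can be shown with base R alone. This is a toy illustration, not how renv is implemented (renv also manages the cache, the lock file, and activation): it puts a project-specific folder at the front of `.libPaths()`, so installs land there and packages there are found first. The folder layout is an assumption for the demo.

```r
# Toy illustration of a per-project library (not renv's implementation)
project_dir <- tempfile("myproject")
project_lib <- file.path(project_dir, "library")
dir.create(project_lib, recursive = TRUE)

old_paths <- .libPaths()
.libPaths(c(project_lib, old_paths))
with_project <- .libPaths()  # project library first, then shared libraries
# install.packages("jsonlite") would now install into project_lib,
# because install.packages() defaults to the first entry of .libPaths()

.libPaths(old_paths)  # restore the original search path
```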
Demonstrating renv in RStudio
Let me make this a little smaller so that we can see what it looks like to work with renv. I'm going to show this one and then illustrate some of the things we talked about along the way. So if I come back to the RStudio IDE and look at my .libPaths() right now, you'll see that I have these two libraries. The first one is the user library; we call it that because it's under my home directory. The second one is what we call the system library, because it's under /Library. The typical way to install packages into the system library is by running commands as an administrator. If you're in an environment that is managed less permissively, then oftentimes someone will install packages into the system library for you, so that they're available even if they're not in your user library.
What I'm going to do instead to create an isolated project library is first install the renv package, and then run renv::init(). So I'm going to initialize renv, and a couple of things are going to change about my project. The first thing you'll see: if I close all these and open up the Files pane, I'll come back to my WTF R-Ladies project. If I open up this .Rprofile, you'll see this line has been written into it by renv. If the file hadn't been here, renv would have created a project .Rprofile. And what that means, as we learned at the beginning, is that now the project .Rprofile controls the startup behavior when I open R.
So if I clear my console and restart R one more time, you'll see renv has printed this message: I get the renv version and the project I'm working in, and those messages are coming from this activate script. Now if I look at .libPaths() again (I'll make this a little bit bigger), I have a different set of library paths. The first one refers to the directory my project is in, and you can see renv/library in it, so this is a project-specific library path. The other thing you can see is that the second library is in a cache; it refers to the caches directory. So, coming back to the diagram I have here, I now have a cache and a project-specific library, and as I install packages, they'll go into the global cache and become available in my project.
So if I run renv::status(), this will tell me whether there are things I need to do to bring my project into a consistent state. Consistent in this case means that the packages my code uses are installed and accurately recorded in the lock file. The lock file is what keeps track of where I got my packages from, what version of R I'm using, and which packages I'm actually using in my project.
So if I create a new R script, I'll just call this main.R, and add, for example, a library() statement, you'll see "jsonlite is required but not installed." One thing I want to show you: if I type install.packages, you can see in the help text here that this is not coming from utils anymore. It's coming from renv's shims. What happens is that renv takes over the functionality of install.packages() so that when you install packages, you're installing them the way renv expects to manage them. So rather than clicking on this dialog, I'll use the console and call install.packages("jsonlite"), and now you see it get installed. And again, you can see it's going into my project-specific library.
Okay, so I'm installing jsonlite. Because I've used renv on this computer before, it's pulling it in from the cache, so in this case I didn't actually go to the internet to get jsonlite. Before I add another library statement, I'll call renv::status(). What you see now is that I'm in what we were calling an inconsistent state. I've installed the package and I'm using the package in my code (renv looks through all the R code files in your project to figure out what packages you use), but the package isn't recorded in my lock file.
So what I'm going to do to correct that is call renv::snapshot(). You can see it's going to update jsonlite in the lock file, going from nothing to the current version. I'll say yes. And now my lock file has been updated with the jsonlite package.
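To make "consistent state" concrete, here is a toy version of the comparison in base R. It is only an illustration under simplifying assumptions: it finds unquoted `library()` and `require()` calls with a regular expression and compares them with a plain vector standing in for the packages recorded in a lock file. renv's real scanner, `renv::dependencies()`, handles far more cases, and the helper names here are hypothetical.

```r
# Toy scanner: package names from unquoted library()/require() calls
used_packages <- function(code) {
  pattern <- "(library|require)\\(([A-Za-z0-9.]+)\\)"
  calls <- unlist(regmatches(code, gregexpr(pattern, code)))
  unique(sub(pattern, "\\2", calls))
}

# Toy status: what a snapshot would add to or drop from the lock file
toy_status <- function(used, recorded) {
  list(
    used_not_recorded = setdiff(used, recorded),  # snapshot adds these
    recorded_not_used = setdiff(recorded, used)   # snapshot drops these
  )
}

script <- c("library(jsonlite)", "x <- fromJSON('[1, 2]')")
lock_packages <- character()  # nothing recorded yet, like a fresh lock file
toy_status(used_packages(script), lock_packages)
```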
Now, the observant among you will have noticed that this did not actually come from Package Manager. That's because I installed jsonlite before changing repositories. So one of the things renv keeps track of is which repository each package came from.
And if I were to do that again, say by adding another library statement for parallelly, then if I look at the status, you'll see the package is used but it's not installed. So I'm going to install parallelly. I'm getting it from Package Manager, so you can see it was downloaded. Now I've installed the binary package from Package Manager, and it's been stuck in the cache as well. And now if I look at the status (and this is the workflow you exercise when you're using renv: you check the status a lot and see if there are problems), in this case, I've installed the package and it's used, but it's not recorded. I'll snapshot.
And you'll see, again, renv grabs the version of parallelly and the repository I installed it from, and adds that to the lock file as well. It's the same thing if I stop using a package. If I look at the status now, you can see jsonlite is installed and recorded, but it's not being used. I snapshot, and we go from having jsonlite in the lock file to eliminating it. Now the lock file just records the packages I'm actually using.
If you're working with Git projects, then yes, you would want to commit the lock file. That's what makes sure that when you share the project with a collaborator, they have the ability to restore the packages.
Greg asks: do they have to re-solve the dependencies if they're not on ARM? No. You'll notice the lock file doesn't contain any information about the operating system. So if I were to pick up this lock file and restore it on a different system, as long as I have access to a CRAN or CRAN-like repository that has all the packages, I'll get the same installation back.
renv... I don't want to say something incorrect. I think "solving" might not be the right language for renv; that might be more like what pak does. And no, it's not an SBOM. You could probably use it to generate one, but the renv lock file is not an SBOM. I believe there's a particular specification for what an SBOM is, and you'd have to go generate one based on the information you have here.
Okay, yeah, Greg, other question? Yeah, I'm gonna get a little bit in the weeds. So those hashes, are those just hashes supplied by the repo of choice? Like one thing that comes up for builds, it's like, are these good hashes or no? I don't remember. All right, that's fine. So I, yeah, I would check the documentation. They capture some things about the install dependency, but I don't remember exactly what right now. So I would say, check the documentation. If it's not clear from the documentation, please raise an issue and we can get that question answered.
renv conveniences and the junk drawer approach
Okay. I want to show a couple of other things you can do with renv in terms of how you install packages, just some conveniences that make your life a little bit easier. We already went through the fact that renv::install(), or install.packages() with the shims enabled, are ways for you to install packages. And there are a bunch of shortcuts for different ways of getting things onto your system that you can use if renv is active.
So for example, if I say I want to install jsonlite@1.2, you can see that version specifier is not standard R syntax, but it's supported by renv. The same thing is true if, for example, you want to install a version of the package from GitHub rather than from CRAN. If I ask for jsonlite from its GitHub repository here, you can see I've fetched the package from GitHub and I'm installing it. One other thing you'll notice: this is taking a little bit longer, because since I pulled the package source from GitHub, I had to compile it myself.
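The two spec shapes above are passed as plain strings, e.g. `renv::install("jsonlite@1.2")` for a repository version and `renv::install("jeroen/jsonlite")` for a GitHub repository. The parser below is a hypothetical sketch that only shows how such strings can be read; renv's own parser supports more forms, and it assumes the repository name matches the package name, which isn't always true.

```r
# Hypothetical parser for "pkg", "pkg@version", "user/repo", "user/repo@ref"
parse_spec <- function(spec) {
  at <- regexpr("@", spec, fixed = TRUE)   # -1 when there is no "@"
  ref <- if (at > 0) substring(spec, at + 1) else NA_character_
  name <- if (at > 0) substring(spec, 1, at - 1) else spec
  is_github <- grepl("/", name, fixed = TRUE)
  list(
    source  = if (is_github) "github" else "repository",
    package = basename(name),  # "user/repo" -> "repo" (assumed = package)
    ref     = ref              # a version for repositories, a git ref for GitHub
  )
}

str(parse_spec("jsonlite@1.2"))
str(parse_spec("jeroen/jsonlite"))
```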
And then if I snapshot that version of the package... oh, well, jsonlite isn't actually used in this code, so we'll re-add that library statement. And now you'll see that renv also tracks that the package was installed from GitHub rather than from a repository, and it records some information about the versions of the packages and where you got them.
And then one last thing, and then I'll give us a five-minute break before we finish out the last section in the last 20 minutes here. The approach I take for keeping the projects I care about separate from the ones I don't is what I call the junk drawer. I have a directory that's just called scratch, and in that project, anything goes. I install random GitHub projects, things that look interesting; all of that goes into the scratch drawer, and I don't manage it at all. And for everything else I use renv.
I have seen different versions of this approach. One neat approach someone told me about last time is that you can globally git-ignore the Scratch directory name, and then you can have a Scratch directory in as many folders as you want. So figure out what approach makes the most sense to you. But I find that when I'm doing experimental stuff and I don't want to worry about it polluting the rest of my environment, I'll do it in this Scratch folder, and I'll keep most of my other stuff organized as projects.
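The global-ignore trick can be set up with a couple of git commands. This is a sketch under assumptions: the directory is named exactly Scratch, and the personal ignore file lives at ~/.gitignore_global (any path works, as long as core.excludesFile points at it).

```shell
# Choose a location for a personal, machine-wide ignore file.
ignore_file="$HOME/.gitignore_global"

# Ignore any directory named Scratch in every repository on this machine.
# Append the pattern only if it isn't already in the file.
grep -qxF 'Scratch/' "$ignore_file" 2>/dev/null || printf 'Scratch/\n' >> "$ignore_file"

# Tell git to consult that file for every repository.
git config --global core.excludesFile "$ignore_file"
```

If you'd rather not set core.excludesFile at all, git also reads ~/.config/git/ignore by default, so putting the Scratch/ pattern there has the same effect.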
So let's take five and then I will do the last section on kind of installing and upgrading R.
All right, we're in the home stretch. So, do I clear out my scratch folder sometimes? No. I mean, Shannon, yeah. The nice thing about it is that I don't care about anything that's in it, so.
But, you know, occasionally I create reprexes and deploy things to, like, Quarto Pub or whatever. I've never needed to delete it, because the way I have it constructed, it can't break anything. If you like KonMari-style file management and nothing in there sparks joy, then by all means delete it.
Installing and upgrading R
Okay. So the last thing to talk about is installing and upgrading R. And I'll say there's, like, two conceptual things that I like to sort of talk about here. One is that these sections are kind of a cycle, right? So installing R, upgrading R, and then working your way through reproducible environments. If you adopt the project-based workflow, then these things are all, like, points on a circle instead of things that you do occasionally and randomly and dramatically. So I want to encourage you to think about them that way, right? That, like, an install of R and an upgrade of R should basically be the same operation. The other thing to talk about, right, is this thing I like to call the project onion.
And so the project onion is just a way of thinking about the different layers of software that are responsible for how you manage your project. We've spent a bit of time talking about these three things, and in a subsequent slide I'll spell out explicitly what each of them is. But our goal is to come up with a holistic framework for managing everything we need to get to a place where we can reproducibly manage our R projects. And that doesn't have to be just about R projects; it can also apply to other kinds of projects we work on. There's been a lot of Python in the chat today, and I think this concept is actually even more important to apply there. There are some things that it's kind of safe to ignore sometimes with R that are a little bit harder to manage in Python.
So we've got these five concepts. I'm not going to spend as much time talking about Scoop and Homebrew today; those are tools you would use to manage how you get different pieces of software onto your computer. But what I will spend a little bit of time showing you in the last few minutes here is a tool called Rig. Rig is an R installation manager: it makes it easy for you to install different versions of R. And once you have those different versions, you can use Rig and tools like renv to manage your projects.
Okay. The other thing I'll say is that sometimes you might want to experiment with these things without destroying your ability to deliver on the projects you have in flight. So there are some tools I like to use for that. And again, my job is to help other people install software, so I need lots of practice doing this. But I find these tools useful if you just want to try something out without messing up your machine. If you're using Windows, there's a tool called Windows Sandbox, which lets you start a very lightweight Windows VM where you can install things. When you close the window, it disappears completely. So I use that to test out different ways to install R without worrying about breaking my machine.
So if you're on Windows and you have the ability to turn on Windows Sandbox, it's a fun way to experiment with things. I think it requires Windows Pro, and you have to have access to some virtualization settings, so it might be trickier to do on your work laptop if you don't have that kind of control. But it's something to investigate. If you're on macOS (and let me say that I think both of these tools require an M-series Mac), something like Tart or this thing called UTM will let you boot up virtual macOS machines. Then you can use those to, again, experiment with ways of installing R and other things. And if you're on Linux, I don't know as much about the Linux desktop, because I don't use it that frequently. But there are lots of easy ways to get yourself a remote Linux server that isn't your daily driver. So you can spin up a VM, do terrible things to it, and then throw it away when you're done.
Having a framework for experimenting just makes life a little bit easier when you're figuring out how to take this project-based workflow and spread it to everything you do on your computer. As I said at the beginning, I strongly prefer being able to tell people to run a command instead of giving them a screenshot of how to download R, so that's the approach we'll take to installing R. And like I said, I'm going to skip over the package manager piece of this. But one thing to know is that, unfortunately, lots of types of software are called package managers. If you're using Windows, check out the Scoop project or WinGet. If you're on macOS, check out Homebrew. These are tools that make it possible for you to install Rig. And that's your homework, because we won't really get to it today.
Demonstrating Rig
Your homework is to install Rig, which is the R installation manager. I'm doing this in a terminal, because Rig is mostly a command-line tool. You could also access it from the terminal inside RStudio, but I won't do that today, because I think it would make things slightly more confusing. What Rig lets you do is manage the different versions of R installed on your computer: get new versions of R, get older versions of R, and set different versions as the default. If you saw earlier in class, Shannon said that she was running R 4.3, and I wanted to test something that she said didn't work for her on R 4.3. I think that'll be the last thing we do today: I'll show you what that looks like.
So if I run, for example, rig list, it lists the versions of R that I have installed on my machine. I have 4.4, and because I'm running on an ARM Mac, it has this little arm64 suffix. This will look different depending on what kind of machine you're on, but Rig runs on Mac, Windows, and Linux, so no matter where you're using it, you should be able to do this. Installing another version of R would take slightly too long and jeopardize my bandwidth, so I won't do it now. But if I wanted to get an additional version of R, I could rig install the latest release, a specific patch version, or, you know, R 4.0 if I wanted to. When you install an R version, Rig also applies a couple of defaults. For example, Rig installs pak by default; you can instruct it not to if you don't want that. I believe it also sets Posit Package Manager as your default repository. Again, if you'd rather have R exactly as it comes out of the box, you don't have to do that. But the nice thing is that if you install R with Rig, you get good R installation practices baked in. That's why we want to use something like an installation manager: it sets a lot of things for you that put you on the path to success.
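The commands from this part of the demo look roughly like this. This is a sketch, assuming Rig is installed and on the PATH. Rig's documentation calls the install command rig add (rig install, as used in the talk, appears to be accepted as an alias), and version names such as release or 4.3 follow Rig's own conventions.

```shell
# List the R versions Rig manages; the default is marked with "*".
# On Apple silicon Macs the names carry an -arm64 suffix (e.g. 4.4-arm64).
rig list

# Install the current R release. This downloads a full R installer, so it
# takes a while and may ask for an administrator password.
rig add release

# Other forms (uncomment one): a minor version, which should resolve to
# its latest patch release, or an exact patch version.
# rig add 4.3
# rig add 4.0.5
```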
So, if I run rig list, we can see my current version of R is set to 4.4.0, and that's the version I'm running in RStudio. I'm going to quit RStudio here and set my default to 4.3-arm64. Now if I run rig list again, you'll see the star is next to 4.3; that's now my default version of R. And if I open RStudio again, you'll see my default version of R is R 4.3.2.
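Switching the default is a one-line Rig command. A sketch, assuming an Apple silicon Mac with R 4.3 already installed through Rig; on Intel Macs, Windows, and Linux the version name has no -arm64 suffix.

```shell
# Quit RStudio first, so the next session starts with the new default.

# Make the arm64 build of R 4.3 the default version.
rig default 4.3-arm64

# The "*" in the listing should now be next to 4.3-arm64.
rig list

# R started from the command line should now report the new default.
R --version | head -n 1
```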
And to bring us full circle, you can see that my .Rprofile hasn't changed, so it's still activating renv. But now I'm getting some additional information, because several things happened as R started here. Because R packages have to be installed separately for each R minor version, when I switch from R 4.4 to R 4.3, there's some code in renv that figures out how to get a copy of renv itself for the version of R you're running. Then I get an informational message saying that I'm using R 4.3 now, but my lock file was generated with R 4.4. The other message I get says that none of the packages recorded in the lock file are currently installed. So if I look at the status (something wacky is happening there), I get asked whether I want to restore the project library. And now you can see I'm getting that version of jsonlite I installed from GitHub and the version of parallelly that I installed from Package Manager, but I'm getting builds specific to this version of R instead of the R 4.4 I started with. And then if I run snapshot, my package versions don't change, but the R version recorded in my lock file does.
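Outside RStudio, the same recovery after a version switch can be sketched from a terminal. This assumes the current directory is the renv project whose renv.lock was written under R 4.4; prompt = FALSE skips the interactive confirmation that the demo answers by hand.

```shell
# Run from the project root, so the project's .Rprofile activates renv
# (renv fetches a copy of itself for the new R version if needed).

# Report how the project library differs from renv.lock; right after
# switching R minor versions, none of the locked packages are installed.
Rscript -e 'renv::status()'

# Reinstall the locked package versions (including the GitHub jsonlite)
# into a library for the R version now running.
Rscript -e 'renv::restore(prompt = FALSE)'

# Re-snapshot: package versions stay the same, but the R version
# recorded at the top of renv.lock is updated.
Rscript -e 'renv::snapshot(prompt = FALSE)'
grep -A 2 '"R":' renv.lock
```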
So when you think about taking a project and upgrading it to a new version of R, you have a way of controlling which version of R you work with for a particular project, and you have that recorded in your lock file. There are also some tools in Rig for this. For example, the rig rstudio command just launches RStudio, but if you give it an RStudio project file or an renv lock file, you can use Rig to open RStudio with a specific version of R. So it gives you a lot more control over which version of R and which versions of packages you're working with; it makes your R version part of your R project.
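A sketch of the rig rstudio usage described here, assuming Rig and RStudio are both installed. The project paths are hypothetical placeholders, and the exact argument forms your Rig version accepts are worth confirming with rig rstudio --help.

```shell
# Hypothetical project path: open RStudio with the R version recorded
# in this project's renv.lock.
rig rstudio "$HOME/projects/my-analysis/renv.lock"

# Alternatives (uncomment one):
# rig rstudio                                              # default R version
# rig rstudio "$HOME/projects/my-analysis/my-analysis.Rproj"  # project file
```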
Hey, David. Yeah. I just wanted to ask about the snapshot: is it the same as a Windows system restore? Does it use the date timestamp? Yeah, so I have not used Windows system restore in a long time, but I think it is probably much more generic. Something like Windows system restore is trying to back up all the files on your operating system. In this case, if I come back to the project onion, we're talking about everything from your R projects to how you manage the software you get installed, but around all of that is another layer, the operating system. Windows backup, or whatever Apple calls its thing, Time Machine: all of that is outside the context of managing the software you install on your computer.
Closing and Q&A
So, we are basically at time, I want to thank everyone for their time and attention. Happy to answer any final questions in the remaining couple of minutes before we hang it up.
There was a part where you were talking about how you could customize profiles, and Shannon said it wasn't covered in this presentation. Would you cover it at the conference? No, it's just a suggestion, and I'm happy to look up some really fun R profiles and share them with you. I would say, on customizing your .Rprofile, there is some stuff I talk about, not quite in this series, about other ways of customizing your terminal. You'll see, for example, that if I start typing things in my terminal, I get lots of help. These are all things I do to make the terminal easier to use when I'm working on lots of different things. Again, the way I find it effective to work might not be as useful to you, because I don't actually do that much data science. But yeah, this is basically what we'll attempt to cover during the workshop in person.
As for the resource on setting up Python virtual environments: solutions.posit.co is a website my team maintains where we talk about lots of problems you might confront, particularly if you're using Posit's professional products, though some of it is just good to know. This is the article, which I'll drop in the chat. Like I said, the real meat of it is something I call the iron law of Python management: create a virtual environment for every single project. I cannot stress enough how serious I am that you really have to do it for every single Python project. There is no other safe way to use Python.
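The iron law in its most basic form, using Python's built-in venv module, might look like the sketch below. The .venv directory name and the requirements.txt file are common conventions assumed here, not something the talk specifies; on Windows the activation script is .venv\Scripts\activate instead.

```shell
# Create an isolated environment inside the project directory.
python3 -m venv .venv

# Activate it for this shell session (macOS/Linux).
. .venv/bin/activate

# python now resolves to the project's own interpreter, so this prints
# a path ending in .venv.
python -c 'import sys; print(sys.prefix)'

# Packages now install into .venv rather than the system Python, e.g.:
# pip install -r requirements.txt
```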
Yeah, well, I am david at posit.co if you have questions or want to follow up by email.
Okay, I think we're all good. Yeah, I think we're done. No one has any other questions. Sheila, do you have anything you want to add? No, I'm good. I just wanted to say thanks to everyone for joining. Yeah, have a great rest of your day or night. Yeah, it's nighttime right now in Africa.
Okay, and I just want to encourage Shannon and David to get started on that book, because I think we really need this. Yeah, some of the material here will make its way back into the book. The book in question is at rstats.wtf. It's continually a work in progress, but some of the things I mentioned about the workflow cycle are things I'll be filling out through the conference.
All right then. Okay, thank you everyone for coming. The recording will be available soon on R-Ladies Global and you should be getting a notification beginning of next week. Yeah, so thank you again David and Shannon for your time. And Sheila, thank you so much for co-hosting with me. Thank you. Thank you. All right, bye everyone.
