October 2023 Webinar: Package Management in R Studio
Package Management in R Studio Thursday 26th October 2023 With Ryan Johnson, Data Science Advisor at Posit About the webinar: Managing packages in your R environment can be challenging. We’ll introduce R Packages, Libraries, and Repositories for this training and discuss strategies for managing your environment
Transcript
This transcript was generated automatically and may contain errors.
There we go. Right, so we'll kick off. I just want to do a couple of announcements. So, the first thing to say is that, obviously, we just had the NHSR NHS PyCon conference, and it was a very resounding success. For those of you who were there, or for those of you who attended online, if you didn't attend in person or online, or even if you did, you should know that we are going to be publishing as many of the talks as we can, which is nearly all of them.
I think there were one or two that didn't want to be recorded for various reasons, which is fair enough, but we'll be publishing most of them as soon as we can, and they'll live on our YouTube. Our YouTube is already great, so go and have a look at it sometime. There's loads of useful stuff from all of the conferences that we've done on there already.
There's some workshops upcoming as well, so we have an intro to R and RStudio, which is very popular and well attended always. That's forthcoming, and someone has also offered to run a session all about GitHub Actions. So, one of the talks at the conference was about GitHub Actions, and I believe in that talk, they said, if you're interested in GitHub Actions, I'll be doing a workshop, and we'll be organising that workshop in November, so look out for that.
And the last thing to say about the conference is, I mean, I think it was absolutely fantastically resounding success, as I said at the time, and I'll say again to anybody who wants to listen, but we are doing a survey to find out what was good and bad about it. As I mentioned at the conference, we kind of re-ran last year's conference this year deliberately, because we've got a new team in looking after it, so we didn't want to move too much stuff around. But for next year, we've got a very kind of sincere wish to move the chairs, move the tables, move the venue, do it totally differently, one day, three day, eight day, you know, online, you name it, any combination of things.
So, if you've got views about how it could be done better and how it was done this time, then please share them in the survey, which will be going out very soon. Right, with all that said, thank you very much to Ryan. Ryan is a great friend to the NHSR community, as well as Posit in general, and Ryan has come to talk to us today about package management with RStudio.
Introduction to package management in R
All right, thanks, Chris. All right, so like Chris mentioned, today's session is going to be all about package management in R. And we've been doing quite a few presentations and workshops and webinars with your group, so hopefully for a lot of folks on the line, you recognize my bald head, but if this is your first time meeting me, and it's the first time meeting you, my name is Ryan Johnson. I'm a data science advisor here at Posit. Part of my role is just to make sure that everyone on your team is familiar with our open source tools, also our professional tools, and also just passing on some R and Python best practices.
And so that's kind of what we're going to be talking about mostly today. As we go through today's session, I highly encourage you to use the Zoom chat for any questions, and I'll be sure to keep that window open on my screen so I can see those questions coming in. But if you have any pressing questions as we go through, feel free to unmute yourself, scream into the microphone, and ask them live. I'm totally okay with that as well. I like to keep these very fun, casual, and a safe learning environment.
And then I will be sharing the slides with you all directly afterwards, so no need to kind of have your notepad and pencil ready. You can just sit back, relax, and take it all in. I also will say that there will be a few moments during today's session that if you want to follow along and actually write some R code, it'll be very minimal R code. But if you do have RStudio open, either your local desktop application or maybe using Posit Cloud or Posit Workbench, wherever you access RStudio, just feel free to have that open, and then I'll kind of direct you to that RStudio session a little bit as we go through today's session.
So what are we going to talk about today? Well, this is going to be all about package management in R, so obviously we're going to be talking about packages. But we're also going to be focusing on libraries and repositories. These are going to be the big three topics for today's session. We're going to talk about what happens when things go wrong in your package management strategy, and also some strategies and tools to either prevent things from going wrong or to kind of fix them if they happen to.
Here's some other resources that I collected when creating this presentation. So, again, I'll forward these slides over, some great presentations, some documentation to read through. And then one more thing to keep in mind before we dive into today's topic, if you were hoping I was going to stand up here and say, this is how you manage your packages in R, that would be great if I could do that. But unfortunately, there is no single solution for managing packages in R.
And why is that? That's because every team is a little bit different. So here I have listed just a few variables that can differ from one team to the next, such as are you running R and RStudio locally on your desktop, or are you running on a server? Are you in an air-gapped or offline environment? Do you work in what's known as a validated environment, very popular with our pharmaceutical customers? How secure is your IT or security team? Who manages the environment? Is it the users or the admin? And can a data scientist update a package to the latest and greatest, or are they kind of locked down to a specific version?
So because all of these things differ from one team to the next, again, we're not going to have a single solution for you all. But my goal is to provide you with a lot of topics and things to consider so that you can develop your own internal package management strategy specific for your group.
Packages, libraries, and repositories
So to kick things off, I'm going to show on the screen here an error message that no matter who you are, if you've written any bit of R code, I guarantee you at some point you've hit this error message. So what are we doing here? So we're trying to load the ggplot2 package, which is popular for data visualization, using the library function in R. And we get this scary red text saying there is no package called ggplot2.
Now, if you've been using R for a couple of years now, you may know exactly what's going on here. But put yourself in the shoes of someone who has never used R or is just starting to use R. An error message like this generates a lot of questions. Like, where can I find ggplot2? Do I already have ggplot2 installed? If not, how do I install it? What version am I installing? And then where do I put it after I install it? That's a lot of questions and can certainly be overwhelming, again, for those that are new to R.
So to tackle this question and some other related questions, we're going to focus in on these three topics. Obviously, we have packages, libraries, and repositories.
Now, I personally, when I was first learning about package management in R, and I was trying to think of some ways to kind of help explain these three concepts, I came up with this analogy, which I've used potentially in some other presentations for your group, but I always think it's good to go back to. So let's say you are in the market for a brand new car. So this is you, and you want a new car. So what do you do? Well, typically, you're going to find yourself at a car dealership. So you shop around the car dealership, and eventually you find the car of your dreams, your brand new car. So you purchase that car. You sit inside of it. You turn it on. You drive it home, and then you park it in your garage at home.
Let's say a year goes by, and for some reason, you need another new car. So what do you do? Same thing. You go back to that car dealership. You buy another new car. You jump inside of it. You turn it on. You drive it home, and you park it in your garage. So now you have two cars in your garage. And then you can repeat this process as many times as you want for every new car that you purchase.
Now, again, a pretty silly analogy, but it does, I think, play nicely with the R ecosystem when it comes to package management. So focusing just on the cars. Cars are what you interact with. You turn them on, and you drive them home. These are your packages, because when you're writing code, you interact with packages. Now, for your cars, you park them in a garage. For your packages, you park them inside of what's known as a library. So your garage, in this analogy, is your library. And then where do you go shopping for new cars? Well, that's your car dealership. But where do you go shopping for new packages? That's your repository.
Now, again, let's just throw some formal definitions at you, just so you have them. In focusing on the R package, a package is a standardized collection of material that extends R. So you're going to find code, data, documentation all inside these packages. But the take-home message, you interact with packages. When you're writing R code, you are typically using functions within packages.
Next, we have your R library. This is a place, and it's going to be a directory, like a physical directory on your computer or server, where R knows to find packages it can use. In other words, this is where you store packages. And then you have your R repository. And this is the primary vehicle for organizing and distributing R packages. In other words, this is where you install packages from.
So we're going to talk about each one of those three, and we're going to start with the package. So what is the package? I think pretty much probably everyone on the call here has leveraged a package at some point in their R journey here. But I think this is a really important statement here. It's kind of like outside of today's topic. But in R, the fundamental unit of shareable code is the package.
If you have a piece of R code that you use a lot, and you copy and paste it multiple times, maybe you want to share it with your future self in another project, or you want to share it with a colleague, that's where you should take that code, put it inside of a function, place it inside of a package, and then you can easily share that package. So inside of most packages, you're going to find these four things. Obviously, you'll find your code. Your package may have some data that's needed for that package, or maybe just some data to play around with. All great packages have good documentation. And maybe you'll find some unit tests in there as well, just to make sure the package is behaving correctly.
And all of these get wrapped up into a nice, pretty package, which I purposely put in this hexagon shape, because as many of you know, authors of packages love to create what's known as hex logos for their packages. And here at Posit, we are no exception to this rule. We've created tons of open source packages, and here I'm just showing a handful of them, some of our more popular ones. So, for example, we have Shiny for creating interactive web apps using R. There's also a Python equivalent as well. R Markdown for static reports, GT for tables, a few other packages. And then you can see down here, ggplot2, which we've already introduced to your team for data visualization. And we're really just going to focus in on ggplot2 today for this workshop.
So when it comes to packages, there's really two big questions that we need to answer. How do you install it, and how do you then load it? So looking at the first question, how do you install a package? And going back to the car analogy, this is on par with trying to purchase the car. So how do you purchase the package, install the package? Fortunately, within R, there's a very intuitive function called install.packages. And then in parentheses and quotes, we just give it the name of the package. So that will install the package so it becomes yours.
But once you've purchased said package, you need to be able to turn it on, just like you would turn on a car so you can drive it home. So to load a package into your environment so you can use all the great functions inside of it, we use the library function. And same syntax, open parentheses. The quotes here are actually optional, but sometimes it's useful just to know that you can use quotes. So for both of these functions, you can always have the package name in quotes, and you just give it the package name.
So knowing what you know now about packages, let's go back to that error message we saw on the first slide where we tried to load the ggplot2 package, but we got an error message saying there's no package called ggplot2. So what's going on here? Well, it turns out we tried to load a package without first installing it. This is basically trying to turn on a car without actually owning the car.
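For anyone following along, here is a minimal sketch of what that failure looks like in code, and of one way to check for a package before loading it. The package name `notARealPackage123` is a made-up placeholder chosen so the failure is reproducible, and the install call is commented out because it needs access to a repository.

```r
# Hypothetical package name, used only to trigger the error reliably.
pkg <- "notARealPackage123"

# library() signals an error when the package is not in any active library.
msg <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# requireNamespace() returns FALSE instead of raising an error, so it is a
# convenient way to ask "do I own this car yet?" before trying to start it.
is_installed <- requireNamespace(pkg, quietly = TRUE)
print(is_installed)

# If the package is missing, install it first, then load it:
# install.packages("ggplot2")
# library(ggplot2)
```

The order matters: install once per library, then load in every new session.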
Live demo: installing and loading ggplot2
So we're going to jump over to our first live demo. And again, if you have RStudio open on your computer or if you're logged into Posit Workbench, anywhere, feel free to follow along and you can run some of these commands with me. But we're going to go ahead and walk through the process of installing and loading your very first R package. And again, we're going to focus in here on ggplot2.
Now I'm going to use Posit Workbench for my environment. So this is an RStudio session hosted in Posit Workbench. And what you're seeing over here in this top left quadrant, this is an R Markdown document. I have some code here in these code chunks interspersed with some text right here. So let me just go ahead and set the stage here. And we're going to come down to this first code chunk where we're going to try to load the ggplot2 package. Now before I run this code chunk, I want you to think of this environment like a blank slate, like you just installed R for the first time. You just installed RStudio. You go to load ggplot2. So if I run this code, we get that same error message we just saw. There is no package called ggplot2. But now we know what to do. We need to install it. We need to purchase said package.
So we'll come down here to the second. Actually, let me go ahead and I'm just going to clear all my output here. All right, so we're going to go ahead and install ggplot2. So I'm going to go ahead and hit play. And again, feel free to run this command in your own environment. When you run this, you may see some output that's different from mine. And that's totally okay. As long as you're not seeing an error message, then you can be pretty confident that this package was installed correctly. But there's actually some other ways you can check to make sure it was installed correctly. And one of the easiest ways is just to see what version you installed. So there's another function you all can take note of called packageVersion. You just feed it the name of the package. We can hit play. And you can see I have installed 3.4.4. That's the version of ggplot2 I just installed.
So now that I've purchased the package, it is mine, I can now turn it on so I can start using some of the functions inside of it. And so that's going to be the role of the library function. I'll hit play. You might see some output. You may not see some output. Again, as long as you're not seeing an error message, then you can be pretty confident that the ggplot2 package is turned on and ready to rock.
All right. So coming back here to this code chunk, now that I have ggplot2 turned on and ready to rock, I can start using some of the functions inside of it. So here we're going to use the ggplot function to create a plot. We're going to use the mtcars dataset, and we're going to plot on the x-axis, mpg, on the y-axis, wt, and we're going to create a scatter plot or a point plot. So I'll go ahead and hit play. And then here's our beautiful ggplot2, again, using the ggplot2 package that we just installed.
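The demo steps can be sketched roughly as follows. This is a reconstruction from the description, not the presenter's actual R Markdown document. The install line is commented out since it needs a repository connection, and the rest is guarded so it only runs when ggplot2 is already installed.

```r
# Run once per library; needs access to a repository:
# install.packages("ggplot2")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Confirm which version is installed.
  print(packageVersion("ggplot2"))

  # Load the package so its functions are available by name.
  library(ggplot2)

  # Scatter plot of the built-in mtcars data: mpg on x, wt on y.
  p <- ggplot(mtcars, aes(x = mpg, y = wt)) +
    geom_point()
  print(p)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first.")
}
```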
Repositories
But we still have some lingering questions here. Where did I install ggplot2 from? So what was my repository or car dealership? And then once we installed it, we had to put it somewhere on our computer. Where did we put it? So what's our library? Let's go ahead and talk about repositories first.
So within R, there are plenty of repositories you can choose from, and here I'm just listing a few of them. Probably the most popular R repository currently out there is something known as CRAN. CRAN stands for the Comprehensive R Archive Network. And how it works is pretty cool. So it's actually not like a single server. It's actually a series of servers scattered throughout the world. And every single one of those servers is completely identical, so they all store the exact same information. And because of that, they're often referred to as mirrors. So you may hear someone say a CRAN mirror. As of March of this year, so a couple months ago now, there were over 19,000 R packages on CRAN. So, again, CRAN is by far probably the largest repository of R packages.
But there's other repositories you can choose from. So there may be some folks on the line here that have leveraged packages within Bioconductor before. So if you're doing any type of bioinformatics, Bioconductor could be a great repository of packages. And as of, again, March of 2023, there's a little over 2,000 packages on Bioconductor. But anywhere you can go shopping for a package can serve as a source of packages, a repository. So that can be R-Forge, it can be GitHub or GitLab, anywhere you can install a package from.
So that begs the question, well, what repository am I using? So in your RStudio environment, R environment, when you install ggplot2, where did you go shopping? You can check that by running this function. So the options function, open and close parentheses, and in quotes, repos, which is just shorthand for repositories. And that will print to the console whatever repository you use to install that package from. So here in this screenshot, you can see I'm actually pointing to an instance of Posit's own CRAN mirror. Yours may be this, it may be something different. That's totally fine, but this is how you can check.
It's also worth noting that you can see what repository you're using, and also change your repository, using the RStudio IDE, if you go to Tools, Global Options, and Packages. And I'll go ahead and show you this here in a second. So let's go back to RStudio and just ask the question: what repositories are you using for your environment? So again, feel free to run some of these commands in your own environment. But all I'm going to do right here is run options("repos"). And that's just going to print my repository to the console, right underneath this code chunk.
It's worth noting that for the most part repositories are just going to be URLs, websites. So here you can see this website right here. Let me quickly talk about this, but we're going to focus in on it a little bit later. So in the first part of this URL, you see colorado.rstudio.com. This is our demo environment here at Posit. We call it Colorado. I have no idea why. We just do. And then we can follow the path here. We see rspm. RSPM is RStudio Package Manager, now Posit Package Manager. We see all, Linux, and jammy, so we're pulling in Linux binaries. We'll talk about that a little bit later. And then I'm pulling in the latest and greatest packages.
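A small sketch of checking and changing the repository from code rather than through the IDE menus. The mirror URL below is an assumption chosen for illustration; your team may point at an internal repository instead.

```r
# Show the repositories this session installs from (a named character vector).
print(getOption("repos"))

# Remember the current setting so it can be restored afterwards.
old_repos <- getOption("repos")

# Point the session at a specific CRAN mirror (assumed URL for illustration).
options(repos = c(CRAN = "https://cloud.r-project.org"))
current <- getOption("repos")
print(current[["CRAN"]])

# Restore the original setting.
options(repos = old_repos)
```

Setting the option this way only lasts for the current session; a startup file such as `.Rprofile` is a common place to make it persistent.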
Libraries
Okay, so we talked about packages. We talked about repositories. Now we're going to focus in on libraries. And of the three, libraries tends to be the most complex. So we're going to spend a lot of time talking about libraries. But before we really dig into it, let's just ask the question, where did we put ggplot2 once we installed it? What is our library?
So there's another function that's kind of built into R called .libPaths. And that will print to your console what libraries are currently active in your environment. So go ahead and run this in your own environment. And it's worth noting that what you're seeing on the screen, this is my library, at least in the screenshot. Yours 100% is going to be different from mine. Because again, this is going to be a directory, like a folder on your computer, which is unique to your environment. So yours is going to be different.
Let's go ahead and just check it. What libraries are you using? I'm going to come back here, and we're just going to run .libPaths. No arguments, just the function. We'll hit play. And you can see here, I actually have two libraries. So if you are using something known as renv, you may see multiple libraries. But it's going to use this top one by default. And so when I go to install a package, this is the path on my system where it's going to be placed. And I can follow this path if I want to. So using my file directory here in the bottom right-hand corner, you can see I'm currently within this package management R directory, which is right here. So we can just follow the path. So I can go to renv, library, 4.2, that guy. And here's all the packages that I currently have installed on my system. And you can see they're actually just directories within this folder.
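The same walk-through can be done from the console instead of the Files pane. A minimal sketch using only base R; the paths printed will differ on every machine.

```r
# All active libraries, in search order; install.packages() uses the first.
libs <- .libPaths()
print(libs)

default_lib <- libs[1]

# Each installed package is simply a subdirectory of the library folder.
pkg_dirs <- list.dirs(default_lib, full.names = FALSE, recursive = FALSE)
print(head(pkg_dirs))
```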
All right, some notes about R libraries. Some of these points we've already touched upon. The first one, R packages, again, are installed into libraries. These are directories on your system, and they contain a subdirectory for each package installed there, as you just saw. The second point here is something I have not mentioned yet, but I think is really important. Libraries, they're going to be associated with a specific version of R. So take a look at the path you see here on the bottom of your screen, and you'll notice at the very end you see 4.1. This library is associated with R version 4.1. If I was to install R version 4.2, so another minor version of R, that's going to create a new library for those packages.
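To see which R series your own library is tied to, something like the following should work. It is a sketch using base R only, and whether the version actually appears in your library path depends on how R was installed.

```r
# Full version string of the running R, in major.minor.patch form.
r_version <- as.character(getRversion())

# R.version$minor holds "minor.patch", so strip the patch level.
r_series <- paste(
  R.version$major,
  sub("\\..*$", "", R.version$minor),
  sep = "."
)
print(r_series)

# The user library location R is configured with (may be an empty string);
# on many setups this path includes the major.minor series.
print(Sys.getenv("R_LIBS_USER"))
```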
And then this last point here, library environments are different if you're using RStudio on your local machine, so on your laptop, for example, versus a Posit Workbench or a server-based environment. So let's actually talk about that last point here. And first, let's go through a scenario where you're running R in RStudio on your local machine, let's say your laptop, probably many folks here on the call right now. So in this environment, it's pretty simple because you are the owner of this laptop, so there's one user, and there's going to be one R library. And this is known as your own personal system library.
Now, when it comes to a server environment, it gets a little bit hairier because now you have a single environment, the server, but you have multiple people logging into the server at once. And so every single user that does log in, they will have their own user library. So every single one of these users will have a library that's called a user library. But there's one additional library which can be really powerful and helpful for larger data science teams, and that's known as a shared system library.
So if we take a look at this Packages tab in this environment, I'm running here within Posit Workbench, you can see all these packages listed here, and I actually have two libraries. This first one is my user library. So, again, these are packages that I've logged into Posit Workbench and I've installed. So these are my packages, my user packages in my user library. But when I logged in for the very first time without doing anything else, there were actually some packages already installed for me, and that's known as this shared system library. So this is something our system administrators set up. They installed packages onto the server and made them available to everyone. So, again, this is good if you have larger data science teams, and that way you don't have to install the same package for every single user. You can just share a package across users.
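Outside the Packages tab, base R can report the same split between libraries. A rough sketch: `.Library` is the library that ships with R, and `.Library.site` lists any site-wide libraries an administrator has configured. How a shared library is set up on a given server is an assumption here and will vary.

```r
# The library that ships with R itself (base and recommended packages).
system_lib <- .Library
print(system_lib)

# Site-wide libraries an administrator may have configured (often empty).
print(.Library.site)

# Count installed packages per library to see how they are split up.
ip <- installed.packages()
print(table(ip[, "LibPath"]))

# Base R always lives in the system library.
base_in_system <- "base" %in% rownames(installed.packages(lib.loc = system_lib))
print(base_in_system)
```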
Project libraries and renv
All right, but there's one more library I want to talk about, in case that wasn't confusing enough. And to introduce this library, let's go through a scenario. Let's say you have a project, and we'll start right here at Project 1. You come in Monday morning. Your boss sends you an email with a data set, and they want you to analyze it. And so you decide you're going to create some plots, so you're going to use ggplot2, and we'll just say version 1. You create the document or whatever. They're really beautiful visualizations, and then you send it off for review somewhere.
Some time goes by, and you get a second project thrown at you, and you decide to do the same thing, create some visualizations. But there's now a new version of ggplot2. We'll just call it version 2. So you install the new version. Things are going great. But let's say while you're working on Project 2, Project 1 actually kind of comes back to you. So you get an email saying, oh, we need you to go back to Project 1 and redo some of the plots. And now you're in a bit of a pickle because Project 1, as you remember, it used version 1 of ggplot2.
So what do you do? Well, you could try to kind of revert ggplot2 back to version 1, which is probably easier said than done. Or you can try to bring ggplot2 up to speed with the latest and greatest version, which might work or might break some things. So it's not great for reproducibility. So what would be much nicer is if we could just isolate these two projects so that Project 1 always used version 1 and Project 2 always used version 2. And you can do that with the help of something known as RStudio Projects in combination with a tool known as renv.
So renv is another open source R package that we've developed here at Posit. And it gives you access to a third library or fourth library at this point called a project library. So a little bit about RStudio Projects and renv. What are they? So when you use an RStudio Project, that gives this project, whatever you're working on, its own working directory. And inside of that working directory will be your code, data, results, tables, figures, everything associated with that project, including the packages needed for this project and only this project. And that's through the power of renv.
So it makes every single project its own isolated environment, which makes projects much easier to share and much more reproducible. And it also gives you access to something known as an renv lock file, which we'll briefly talk about here in a second. But if you've never used projects before, let me just quickly introduce you to how to create them. There's actually a few different ways you can do it in the RStudio IDE. I'm just going to show you one method. So if you have an RStudio session open, you can go to File, New Project. And that'll open this menu here where you can give your project a name. And this is going to create a new directory on your computer. So give that directory a name, and choose where you want to place that directory on your system. And then importantly, you want to check this box to use renv with this project, because then that's going to give you access to that project library. You hit Create Project. It's going to open you up in a new RStudio session. You should see the name of the project here in the top right corner. And then if you click on the Packages tab, you should now have access to a project library. So again, this library is specific to this project and only this project.
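The checkbox in the New Project dialog has a code equivalent. The sketch below wraps a typical renv sequence in a function so nothing runs until you call it. `use_project_library` and `project_dir` are hypothetical names introduced for illustration, and the project folder is assumed to exist already.

```r
# Sketch of a typical renv workflow; nothing runs until the function is called.
# project_dir is a hypothetical path to an existing project folder.
use_project_library <- function(project_dir) {
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("renv is not installed; run install.packages(\"renv\") first.")
  }

  # Create the project library and renv infrastructure for this project.
  renv::init(project = project_dir, bare = TRUE, restart = FALSE)

  # ... install the packages the project needs, then record their versions:
  renv::snapshot(project = project_dir, prompt = FALSE)

  invisible(project_dir)
}

# A collaborator can later rebuild the same library from the lock file with:
# renv::restore()
```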
And then within the file directory, you'll see a few additional files. So you'll always see this .Rproj file within the home directory of your project. And then we have this renv directory and we have this renv.lock file. So I believe we talked about lock files with your team before. But let me just quickly introduce you to lock files, in case this is the first time you're seeing them. This is an example lock file. And they're very ugly to look at because they're written in JSON format, but they're actually pretty easy to read. You can start right up here at the very top and see what R version was used for this project, so 4.1. Then what active repositories were used for the project. So where did this user go and get their packages? You can see it's CRAN. And then everything else is just information about the packages in this project. So here we have two listed, markdown and mime. You can see their associated versions. So that's something that renv is really good at: keeping track of what versions of packages you use and where you obtained them.
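To make the lock file structure concrete, here is an illustrative example parsed from R. The overall layout follows the description above, but the specific version numbers and the repository URL are placeholders, not values from the slide. Parsing assumes the jsonlite package is installed and is skipped otherwise.

```r
# Illustrative lock file contents; the version numbers are placeholders.
lock_json <- '{
  "R": {
    "Version": "4.1.0",
    "Repositories": [
      { "Name": "CRAN", "URL": "https://cloud.r-project.org" }
    ]
  },
  "Packages": {
    "markdown": {
      "Package": "markdown", "Version": "1.1",
      "Source": "Repository", "Repository": "CRAN"
    },
    "mime": {
      "Package": "mime", "Version": "0.11",
      "Source": "Repository", "Repository": "CRAN"
    }
  }
}'

if (requireNamespace("jsonlite", quietly = TRUE)) {
  lock <- jsonlite::fromJSON(lock_json, simplifyVector = FALSE)

  # R version recorded for the project.
  print(lock$R$Version)

  # Named vector: one recorded version per package.
  pkg_versions <- vapply(lock$Packages, function(p) p$Version, character(1))
  print(pkg_versions)
}
```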
Okay, so to summarize what we talked about with libraries, because, again, libraries is a bit confusing. Hopefully this flow diagram helps out a little bit. But if you just ask the question, if you install a package, where does it go? So if you start up here at the very top, we install ggplot2. The first question you should ask is, are you using an RStudio project? If yes, are you using renv? And if yes, then the answer is that package is going to go into your project library. But if you're not using renv, or if you're not within an RStudio project, the next question becomes, where are you running R? Are you running it on your local desktop or laptop? In that case, it'll go into your personal system library. Or are you running on a server, like Posit Workbench, in which case it'll go into your personal user library. So those are kind of like the three main libraries to keep track of as you're working through R.
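The flow diagram can be written down as a tiny decision function. This is only a restatement of the logic described above; the function and argument names are made up for illustration.

```r
# Where does a newly installed package go? Mirrors the flow diagram:
# project + renv -> project library; otherwise it depends on where R runs.
install_destination <- function(in_rstudio_project, using_renv, on_server) {
  if (in_rstudio_project && using_renv) {
    return("project library")
  }
  if (on_server) {
    return("user library")
  }
  "personal system library"
}

print(install_destination(TRUE, TRUE, on_server = FALSE))
print(install_destination(TRUE, FALSE, on_server = TRUE))
print(install_destination(FALSE, FALSE, on_server = FALSE))
```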
Package dependency conflicts
Okay, so now that we've covered packages, libraries, and repositories, let's talk about the not-so-fun stuff. And that is package dependency conflicts, or basically what happens when things go wrong. So let me run through a scenario with you. I'm going to pick a random date. This is completely arbitrary. We'll say January 1, 2020. You open up R and RStudio, and you install a package. We'll focus on the tibble package, which some of you may be familiar with. So we go and run install.packages on the tibble package. It is very common in R, and also in Python, for packages to depend on other packages. So by installing the tibble package, it actually, by default, will install three dependencies, the rlang, cli, and crayon packages.
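If you want to see which other packages an installed package declares as dependencies, its DESCRIPTION metadata can be read from R. A minimal sketch: `declared_dependencies` is a hypothetical helper name, and the example uses the stats package only because it ships with every R installation.

```r
# Return the Depends and Imports fields an installed package declares.
declared_dependencies <- function(pkg) {
  d <- suppressWarnings(utils::packageDescription(pkg))
  if (!is.list(d)) {
    stop("Package '", pkg, "' is not installed.")
  }
  list(Depends = d$Depends, Imports = d$Imports)
}

# "stats" ships with R, so this works in any session.
stats_deps <- declared_dependencies("stats")
print(stats_deps$Imports)

# The same call works for any installed package:
# declared_dependencies("ggplot2")
```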
Now, again, on this date, we installed tbl and all those dependencies, so they're all being installed at the exact same time. And there's actually a cool thing that CRAN does. So again, that big repository of R packages. CRAN always tests to ensure that the latest packages always work together. So if you install these packages on the exact same date, all the packages and the versions and dependencies, they should all work nicely together. You shouldn't get any conflicts.
But let's fast forward a month. I'm just taking some arbitrary time. And you go back into that same R environment where you installed tibble. And you now install another package, and we'll focus on the pkgdown package. So you install pkgdown. It turns out pkgdown actually has the same three dependencies, rlang, cli, and crayon. But because this is being installed a month later, it turns out there's actually some new versions of these various dependencies, which is why they're kind of shaded in this darker gray color. So by installing pkgdown on February 1st, it'll automatically update the rlang, cli, and crayon packages. But what it will not do is update the tibble package itself. And now you're in a bit of a broken state because you have an old version of tibble that's trying to use new versions of its dependencies, and you may run into some issues where functions aren't behaving as expected.
So how do we get around this? Well, I mentioned before that CRAN tests to ensure that the latest packages always work together. So because of that, it's very common for data science teams to do something known as freezing a repository to a set date. So let me explain how this works. So again, we have two dates right here, January 1st and February 1st, 2020. And let's say that you as the user or your administrator decides to freeze your repository to this date, January 1st. You install tibble, things go great, and then you fast forward to February 1st. But even though it's February 1st, your repository is still frozen to January 1st. So you go to install pkgdown in February, but it still installs it as if it's January 1st. So no matter when you install any subsequent packages, they're all being installed as if it's the same date. So all these packages, again, should work together because that's the added service that CRAN provides.
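One way to picture freezing a repository is to point R at a dated snapshot URL. The sketch below assumes a Posit Public Package Manager style URL of the form `https://packagemanager.posit.co/cran/<date>`. Treat both the URL pattern and the availability of this particular date as assumptions to check against whatever repository your team actually uses.

```r
# Date from the scenario in the talk; any snapshot date your repository
# offers would work the same way.
snapshot_date <- "2020-01-01"

# Assumed URL pattern for a date-based CRAN snapshot (verify for your repository).
frozen_repo <- paste0("https://packagemanager.posit.co/cran/", snapshot_date)

# Tell R to resolve packages from the frozen snapshot for this session.
options(repos = c(CRAN = frozen_repo))

# Confirm which repository install.packages() will now use.
getOption("repos")
```

After this, a call such as `install.packages("pkgdown")` made in February would still resolve versions as of the frozen January date. Putting the `options()` call in a project or site R profile is one way to make the setting persist across sessions.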
Package management strategies
Okay, now that we've talked about a bunch of different things, you might have some questions kind of circulating in your head as you start to devise your own package management strategy. You know, things like who is going to be responsible for reproducing the environment? Is that going to be up to the data scientists, the users, or is it going to be up to my system administrators? Also, how open is the environment? Can a user install a new package, or are they going to be locked down to a specific version? Should we be using these shared system libraries or just specific user libraries? And really, you have to weigh the pros and cons specific to your team. It's nice to talk to other teams as well and to kind of see what they're doing, but ultimately it's just to find a solution that kind of meets your team's needs.
Now having said that, we have worked with a bunch of teams here at Posit where we've seen package management strategies succeed and we've seen some strategies not succeed so well. And so what you're seeing on this plot right here, we're showing you a few different strategies that we're going to touch upon. On the y-axis, we're looking at package access. So that's asking, can a user install whatever package they want? So is it open, or is it fairly locked down, so they're kind of constrained to specific packages or specific versions? And then on the x-axis, we're looking at who is responsible for reproducing the environment. Is that going to be up to your system administrators, or is it going to be up to the data scientists?
We find teams have the most success somewhere along this diagonal. And we're actually going to talk about these three strategies. We're going to start with snapshot, then we'll talk about shared baseline, and then finally a validated strategy. And then we'll just briefly touch upon some of these other strategies where we see things go a little bit haywire. So let's go ahead and start with snapshot in the top right corner here.
So in a snapshot strategy, users are able to freely access and install whatever package they want, but you must be using RStudio projects and that renv package we talked about before. You have to be using that renv lock file. So in this scenario, users are going to have the full responsibility to record the dependencies needed for a project. So in a typical workflow, you would create a project, you then write some code, install some packages, write more code, install other packages, and you use that renv package to snapshot your environment, to record what packages you're using. That's going to update that lock file, which you can then use to restore your environment if needed.
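The snapshot workflow described above maps onto three renv calls. This is a minimal sketch, assuming renv is installed and you are working inside a project. The wrapper names `start_project`, `record_packages`, and `rebuild_environment` are hypothetical and exist only to label the steps. In practice you would call the renv functions directly at the console.

```r
# Step 1 (once per project): create a project library and an initial renv.lock.
start_project <- function() {
  renv::init()
}

# Step 2 (after installing or updating packages): record the packages and
# versions the project uses into renv.lock.
record_packages <- function() {
  renv::snapshot()
}

# Step 3 (later, or on another machine): reinstall the versions that
# renv.lock records.
rebuild_environment <- function() {
  renv::restore()
}
```

The lock file is plain text, so committing `renv.lock` alongside the project code is a common way to share the recorded environment with colleagues.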
Now the pros for snapshot are that it definitely gives users access to pretty much any package and any version they want to use. And the second point right here, I put it as a pro, but depending on your team, you may consider it a con. It's really going to require little to no IT or sysadmin involvement. It's really going to be on the shoulders of the users. And that kind of leads me to my major con here, which is that if you're brand new to R and now you're being tasked with keeping track of every package you're using and the versions, that can be a barrier. Now it's worth noting that if you are running RStudio locally on your personal laptop, we'd certainly recommend that you use this strategy for all of your projects.
The second strategy we're going to talk about is something known as a shared baseline. And just as a side note, this is actually what we use internally here at Posit. In this strategy, all or most of CRAN, so all 19,000 to 20,000 packages of CRAN, are made available to every single user via that shared system library. But users are still able to freely access and install any package they want, any version they want, and they can install it into their own project and/or user library.
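To see how shared and personal libraries sit side by side, you can inspect the library search path from any R session. This is a minimal sketch using base R only. How a shared baseline library is actually exposed (for example as a site library) depends on how your administrators have configured the server, so the output will differ between setups.

```r
# Every library R searches, in order. install.packages() installs into the
# first entry by default, which is typically a user or project library.
library_paths <- .libPaths()
library_paths

# The system library that ships with R (base and recommended packages).
.Library

# Count how many packages are installed in each library on the search path.
packages_per_library <- vapply(
  library_paths,
  function(lib) nrow(installed.packages(lib.loc = lib)),
  integer(1)
)
packages_per_library
```

On a server using a shared baseline, you would expect one entry with a very large package count (the shared library) and a much smaller user or project library ahead of it in the search path.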
So the reason why we really like this strategy, it's good for both those that are new to R and more advanced users. For those that are new, all these packages are already there available to them. But for the advanced R users, they can still reach out and grab a development version of a package or the latest and greatest. It can also reduce duplicate package installs across users. So I mentioned before that when you use that shared baseline strategy, that prevents every single