{slushy}: A Bridge to the Future - posit::conf(2023)

Presented by Becca Krouse

Scaling the use of R can present complications for environment management, especially in regulated industries with a focus on traceability. One solution is controlled (aka "frozen") environments, which are carefully curated and tested by tech teams. However, the speed of R development means the environments quickly become outdated and users are unable to benefit from the latest advances. Enter {slushy}: a team-friendly tool powered by {renv} and Posit Package Manager. Users can quickly mimic a controlled environment, with the easy ability to time travel between snapshot dates. Attendees will learn how {slushy} bolstered our R adoption efforts, and how this strategy enables tech teams and users to work in parallel towards a common future.

Presented at Posit Conference, between Sept 19-20 2023. Learn more at posit.co/conference.

Talk Track: Managing packages. Session Code: TALK-1078


Transcript

This transcript was generated automatically and may contain errors.

Hi, everybody. My name is Becca Krouse. I'm happy to be here. I work at GSK in our Statistics and Data Science Innovation Hub, where my team works on tools and solutions to support our users across our biostatistics department. And today I'll be talking about one of these tools that we created called the Slushy R package, which helps our teams build a bridge to the future. So we're no longer at RStudio Conf anymore. We're at posit::conf. So now more than ever, we have a wide variety of folks in the rooms with all sorts of different preferences when it comes to software and different experiences. But if there's something we all know and we all can agree on here, it's that penguins are really cute. And they're not just adorable little creatures, but they're really, really sophisticated, because they live in these very, very cold, windy conditions, right?

And so their sophistication comes from the fact that they work together to stay warm. And so they do this technique called Penguin Huddle, where they group in really, really close together, and it helps them generate heat and conserve energy. And they don't just stand still, they kind of move around in this coordinated fashion so that everyone has the opportunity to stay nice and cozy in there. So much like these penguins depend on having a nice, cozy, warm huddle that they can rely on, those of us using open source software depend on having safe places to do our work, right? So in regulated industries, this particularly rings true. So I come from pharma, that's one of several regulated industries, and we put a lot of care and attention into making sure we have stability and consistency when it comes to the software that we're using.

Frozen environments

So like I said, a lot of care and attention is put into making sure this happens. A common solution for ensuring this is creating a locked-down, centralized environment where people can come and do their work. These types of environments are sometimes nicknamed frozen environments because they do not change; they cannot change. A user cannot modify them, and they are snapshotted to a single point in time. The contents of these environments, in terms of the R packages that get installed, are carefully vetted, assessed, and tested to make sure that they are as accurate, reliable, and stable as possible.

If any of you attended the previous session, there was a pharma track in this room just a little bit ago. My colleague Ben talked about how we're growing the use of R at GSK. We have a lot of new teams, teams that are new to R, and these environments have this added benefit of being incredibly user friendly. They can start learning R without needing to, you know, learn about installing packages and managing versions and all sorts of things. It's just available for them. They can log in and hit the ground running.

Now, with the increased use of R, we start to see a wider variety of use cases. Different teams have different timelines. They have different needs. On top of that, we have people that are really excited to use new features. And the thing about the frozen environment is it doesn't change, right? The rest of the open source world moves along very, very quickly. And so, these environments get out of date quickly. When we're trying to support teams and figure things out for them, it starts to feel a little bit like my toddler and this toy here, jamming it in. So, how do we support our teams where the provided one-size-fits-all solution is not quite the right fit for them? You know, what do we tell them to do?

The open R world and renv

Well, we have another solution that is accessible to all of us, which is the open R world, right? Where you install R and you can use whatever packages you like. So, if teams opt for this solution, they can explore wherever they want. They can make changes and access new things and they make the rules. But within reason, right? Because these teams are still working on a collaborative project where there's more than one person and they're coming up against deadlines a lot of the times. So, they generally need to try to stay in the same area. So, luckily for us, we have an incredible tool at our fingertips developed by Posit, which is the renv package.

So, renv is a great way for people to begin to huddle up and get in the same area in terms of what types of software they're using. So, that they can begin to have something that's more reproducible. Our teams, they need a little bit more guidance when it comes to, you know, where are they forming their huddle? What are they forming their huddle around? How closely together do they need to be? What if they need to go off and get food and come back later, like these penguins do? So, this is why we created Slushy.

Introducing Slushy

Slushy is an R package that is designed to support our teams that want to use an open R environment and need to figure out how to huddle up effectively and efficiently. We originally created Slushy internally, and we decided to open source the code to share it with the community. So, I'll share the GitHub link at the end. But Slushy makes use of a couple of really awesome existing tools. I mentioned renv already. And it also makes use of another tool developed by Posit, which is Posit Package Manager. These two things alone get our teams a long way.

But they still need some additional supports. And so, in addition to these two things, we're adding some additional functionality to help guide our users through their journey, so it's nice and smooth. We're also putting in some guardrails for them, so they can stay on track better and make sure everything's nice and easy for them as they're adapting to this ever-changing open source landscape, right?

So, we had a few goals when it came to building Slushy. And one of those was to help people find a home where they could begin to feel warm and cozy and have the confidence to hit the ground running with their coding, just like they do in a frozen environment.

How Slushy works

So, first and foremost, the renv package is absolutely fundamental in helping our users begin to form that huddle, begin to form that home base. If you're not super familiar with renv, what it does is, well, it does a lot of awesome things, but one of its key features is helping everyone on the same team use the same packages. And not just the same packages, but the same versions of those packages, so that code that I write will work for you and vice versa.
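As a rough illustration of that idea, here is a minimal renv sketch. It is plain renv usage, not Slushy's own code, and the two wrapper function names are hypothetical; only `renv::init()`, `renv::snapshot()`, and `renv::restore()` are real renv calls.

```r
# Minimal renv workflow sketch (plain renv, not Slushy's implementation).
# The wrapper names below are hypothetical; they only group the renv calls.
# Nothing runs until the functions are called inside a project directory.

share_environment <- function(project = ".") {
  # One person sets up a project-local library and writes renv.lock,
  # which records the exact package versions the project uses.
  renv::init(project = project)
}

record_changes <- function(project = ".") {
  # After installing or upgrading packages, update renv.lock so the
  # lockfile matches what is actually in the project library.
  renv::snapshot(project = project, prompt = FALSE)
}

join_environment <- function(project = ".") {
  # A teammate installs exactly the versions recorded in renv.lock.
  renv::restore(project = project, prompt = FALSE)
}
```

Committing `renv.lock` alongside the code is what lets everyone on the team restore the same versions.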

Now that we have a way to begin to huddle up, we need to help our teams figure out what to huddle around. They're looking to us to give them guidance about, you know, what packages do I use? So, if we think back to our frozen environments, those centralized environments and all of that vetting that happens to decide what packages go in there, that vetting results in a subset of packages that are deemed to be accurate and reliable. If that decision has been made on an organizational level or department level, we can leverage that and get those packages in place to be used by that team, and that is their agreed-upon set of packages. They can also do that legwork themselves. In either case, a vetting step will lead to this subset of packages that the team aligns on, and it will help to kick off that huddle.

Also, through that vetting process, we can be sure that each one of these packages is going to work well individually. But with all of the moving parts in open source, we also want to be sure that all these packages will be compatible with one another. In our case, most of our packages are coming from CRAN. So, when a package gets released to CRAN, all of its checks and tests get run automatically, and because everything on CRAN is so interconnected, as a byproduct, any other package that's connected to that package in some way is also checked for consistency and compatibility. So, you know, everything that's up to date on CRAN as of today is compatible together.

Posit Package Manager is a way for us to access different points in time through what's called snapshots. These CRAN snapshots are dated, as you can see from the calendar, and we can hook into any one of these snapshots and get access to the CRAN ecosystem and have confidence that everything is going to work well together. Something else I want to point out here is, with a lot of our newer R users, this is a big boost in their learning journey because thinking about all the individual packages and their different versions is a little bit overwhelming, right? And so, when we shift to the snapshot mindset, it really simplifies things for them, and all they have to think about is a date, and communicating together is a lot easier around that.
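A small sketch of the snapshot mindset in base R follows. It assumes the public Posit Package Manager instance and its dated CRAN URL pattern, and that a snapshot exists for the chosen date; an internal instance would have its own base URL. The helper name `snapshot_repo` is hypothetical.

```r
# Build a dated CRAN snapshot URL for Posit Package Manager.
# Assumption: the public instance; swap base_url for an internal one.
snapshot_repo <- function(date,
                          base_url = "https://packagemanager.posit.co/cran") {
  date <- as.Date(date)  # errors if the input is not a recognizable date
  paste0(base_url, "/", format(date, "%Y-%m-%d"))
}

repo <- snapshot_repo("2023-09-19")

# Point this R session at the snapshot. install.packages() will then
# resolve package versions as they were on CRAN on that date.
options(repos = c(CRAN = repo))
```

With this in place, the only thing a team needs to agree on and communicate is the date.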


So, with these pieces, we have a way to huddle up. We know what we're huddling around, and our teams can be sure that everything they're using is going to work well together and individually. So, through Slushy, we have functionality to get that huddle into place for the team through a Slushy function, which calls renv under the hood.

Now, our teams are still working in the open R environment as part of this paradigm, right? So, we're not blocking access to things. They still have access to really anything they want out there. And it can be easy and tempting for people to try new things and veer off course a little bit. These teams using Slushy really want to keep everyone nice and tight together, and not veer off course, right? We've agreed upon a set of packages. We want to stick to those. So, it was important for us, as part of Slushy, to get some functionality in place to make sure that we are constantly monitoring what the teams are using. And so, renv does a great job at this as well, and we're just kind of augmenting it to make sure that people are sticking to the packages they intended to use, as well as the snapshot that they intended to use. So, it really makes sure that no one is lost along the way.

Dynamic huddles: adding packages and time travel

But sometimes, teams need to move around a little bit. You know, after all, huddles are dynamic. They are responsive to changes in the environment. And projects can be long, and people can change their minds about things. So, one example of somewhere people might want to change things a little bit is through the agreed-upon set of packages. So, let's say a team a couple months down the line finds a really great package. They've vetted it. They feel really good about using it. Through Slushy, we've given them the ability to make sure that that package can get added in to their environment through this function here, Slushy add. And by adding it in officially, it makes sure that that package is okay to use. It's no longer going to get flagged by that previous monitoring step, and they're able to expand things a little bit.

Another place people might want to move is through time. They commonly do. They've started at a snapshot, and, you know, a month later, two months later, things are moving ahead. They want access to new features, understandably. So, we want to give them the ability, or we've given them the ability, to shift things up ahead. In some cases, they might need to also go backwards if something gets deprecated, or maybe it's a little too risky to make that leap, and they want to go back. Either way, we've made it just as easy to go forward as it is back through the Slushy update function. People can optionally pass a date. So, you know, getting from snapshot to snapshot very specifically, you would pass a date to this function. Otherwise, it's just going to bring everything up to date. This function also makes sure that everything in that library is going to the new snapshot, and no dependencies get left behind or anything like that, or left ahead if they're going backwards.

So, making the leap from snapshot to snapshot can be a little bit intimidating. Sometimes it's risky. People might be on really tight timelines. So, we want to give them a peek ahead here. So, what we did with Slushy is add in this preview function that they can call before they actually update all their packages or revert all their packages. So, this feature allows them to get a peek ahead at what is going to change should they make that leap, and they can make plans accordingly. They can begin to do their research and anticipate how their code might be impacted by this change, and this really helps us support them through, you know, as they move around and make that process smoother.
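To make the idea of a preview concrete, here is a base R sketch that compares package versions between two points in time. It is not Slushy's preview function; the helper name `preview_changes` and the version numbers in the example are hypothetical, chosen only for illustration.

```r
# Compare two named character vectors of package versions
# (names are package names, values are version strings).
preview_changes <- function(current, target) {
  pkgs <- sort(union(names(current), names(target)))
  from <- unname(current[pkgs])  # NA when the package is not in `current`
  to   <- unname(target[pkgs])   # NA when the package is not in `target`
  # A package counts as changed if it is added, removed, or re-versioned.
  changed <- is.na(from) | is.na(to) | from != to
  out <- data.frame(package = pkgs, from = from, to = to,
                    stringsAsFactors = FALSE)
  out[changed, , drop = FALSE]
}

# Hypothetical versions at the current and the target snapshot.
current <- c(dplyr = "1.1.2", ggplot2 = "3.4.2", tidyr = "1.3.0")
target  <- c(dplyr = "1.1.3", ggplot2 = "3.4.2", tidyr = "1.3.0", gt = "0.9.0")

changes <- preview_changes(current, target)
print(changes)
```

In a real project, the two vectors could come from the `Version` column of `available.packages()` queried against each dated snapshot URL, which is one way to see what would move before committing to the leap.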

Simplifying the experience for users

So, it's a lot of different pieces, and we think back to our huddles. Penguins don't read a manual on how to huddle. They just know how to do it from the time they're little babies, and we want the same experience for our users. We want them to have success and feel like it's easy to get there, and feel like they can just focus on their jobs, get their programming done, enjoy the process of learning R.

So, one way to simplify things for them is, of course, by creating an R package, and we try to be really intentional about, you know, what is the minimum number of steps that our team needs to make and try to really simplify the functionality to reduce the burden on them so they don't have to think too much about all this stuff.

Another way we try to simplify things is through our training and recommendations about how to use Slushy. So, for instance, instead of allowing or recommending everyone on the team can adjust the environment, we assign that to one or two individuals who are responsible for initializing and maintaining the environment, and everyone else just follows suit, and they don't have to think too much. They just have their library in place, and they can start using it. This decreases the amount of, you know, the learning curve, of course, and the burden on the teams, but it also is going to decrease the risk that that environment might change in some way by somebody who's just trying something out.

Another way we've tried to simplify is by creating a config file that wraps up all of our settings, and Slushy will read that automatically. By putting a bunch of settings in a config file, an organization or a team can make a standard config file that they can apply to all of their projects. So, with all of these settings taken out of Slushy, the teams don't need to be concerned about them. We've done that for them. Through this config file, we can customize things like the package manager URL and whether there's a cache available that they should be using, which really makes it faster for the teams to get all of their packages in place, since everyone is installing the same things one after another. They can do all sorts of renv customizations in terms of those settings, and add special code that should be run before and after Slushy is initialized at the start of each session. And perhaps most important is the list of approved packages that should be used from project to project, or the list of agreed-upon packages.

So, again, teams don't need to ask that question of what packages should I use? We can just kind of get that information to them, or they can make that source of truth themselves and have that standardized.

Supporting teams looking to the future

So, all of this, Slushy package has really been great to give a predefined kind of off-the-shelf huddle for our teams to take and use and make their own, which, again, is really excellent, especially for our newer teams. And another place Slushy has been a big support is for our teams who are looking way out into the future for their deadlines. So, again, back to the frozen environments, they take a lot of effort and time and resource to create because of all the care that's put into them. So, the cadence at which they're released is spread out a little bit.

So, a team that's looking way out into the future has the opportunity to use a future frozen environment when it's available instead of kind of settling for something older right now. And this will allow them to have some access to the newer features. So, what they can do in the meantime is set up Slushy, work with the team using or creating the frozen environment to understand, you know, what packages are expected to be in there, get those into place for themselves, and update along the way. And they can fix on the snapshot, and it's kind of smooth sailing to the end.

This is not just great for our study teams who are empowered to do this on their own. It's great for our tech teams who can focus on the task at hand, and both teams can move along in parallel towards a common future. So, with that, Slushy has allowed us to build spaces where our users can thrive, just like these penguins and their huddle. If you want to learn more about Slushy, I'm happy to talk at the break, or you can visit our GitHub site. So, thank you.

Featured software