Resources

CRAN-ial Expansion: Taking Your R Package Development to New Frontiers with R-Universe - posit::conf

Presented by Mo Athanasia Mowinckel

Say goodbye to installation headaches and hello to a universe of possibilities with R-Universe! Take your R package development to new frontiers by organizing and sharing packages beyond the bounds of CRAN. R-Universe's reliable package-building process strengthens installation and usage instructions, resulting in fewer support requests and an easy installation experience for users. With webpages and an API for exploring packages, R-Universe creates a streamlined and tidy ecosystem for R-package constellations. Also, you can build a custom toolchain for your users, relieving your workload and empowering users to help themselves. Join me to learn how to explore the vastness of R-Universe and expand your package development possibilities!

Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.

Talk Track: Managing packages. Session Code: TALK-1080

Transcript

This transcript was generated automatically and may contain errors.

Okay, cool. So I want to talk to you all about how you can take your R packages to new frontiers with rOpenSci's R-Universe.

So a little bit about me. So two years ago, I had a lightning talk at RStudio Global about my first package, which was a huge milestone for me. So I come from psychology. We do some coding to do our stats and our stuff, but there's no formal training. So getting an R package out there, that was a huge deal for me.

And my package is about plotting brains in ggplot. This is so much fun to work on, and it uses simple features for those who know about that. So it's basically a map, but instead of the globe, it's your brain. And it makes me very happy. So there's two parts of this package. It's a ggplot2 package with simple features, and there's a second part with plotly and like a three-dimensional swively brain. Very, very fun. IT loves it when we want to put that online.

The challenge of getting on CRAN

So this package at that point lived only on GitHub, because I did not have the headspace or understanding of how to get it on CRAN, and we want our packages on CRAN. Like I wanted to be an R developer, so to be that, I needed to have a package on CRAN. Like I couldn't just have it on GitHub. And why do we want our packages on CRAN? Now, that's because that's where, like, everyone else is getting their packages from. That's where users expect the packages from. They don't have, like, a mental model necessarily about where the packages are coming from. They're coming from CRAN. You do install.packages(). They're coming from CRAN.

So I want them there, but, like, getting there... That is work. That is a lot of work. So I had this package. It had this stuff, and I, like, was working on my code, getting it ready for CRAN, and I hit this roadblock. My package is too big. There's a 5 megabyte limit. There's a lot of data behind these plots. And I was struggling. So the first thing I did, I split it up in two, and I no longer have one package to get on CRAN. I have two packages to get on CRAN. Right. Okay. But I did the work, and I got them both on there, and I'm super proud.

Now, a magic thing happened when this package went from living on GitHub to living on CRAN. And that is all the user installation problems just poofed away. Like, before, I would have, like, issues constantly with people, like, it's not installing, it's not installing. And I'm, like, I don't know why. Because I'm not a developer. I don't necessarily understand what's behind this. But CRAN just fixed all of that.

But the thing is... You know how you have different maps? Okay. So the globe, like, the map itself is the same. But you can kind of split the world up in different ways, depending on what you want to look at. By continents, by country, political borders, like, geographical landmarks, whatever. We do the same with the brain. Don't think, like, neuroscientists, we're not, like, the brain is one thing, no, it's so many features. And everyone works on, like, their own versions of all of these features. There's so many of them.

I have, like... I think we have about 30 requests right now for new what we call atlases. And we have 20 already. And all of these are over 5 megabytes. They will never go on CRAN. I cannot get them there unless I completely figure out how to get away from the size of them. And all of these have issues with installation. And it's frustrating for me and it's frustrating for my users, because I have no idea how to help them. I don't have an overview of what is making this installation not work.

Understanding binaries

So what the hell does CRAN do that makes this work? I just dived into it. I need to figure out why CRAN installations work. And I got to learn about binaries. Now binaries are a thing I know nothing about. But I'm still gonna try to explain it to you.

So imagine you're, like, these two R users and you want to get to this star, which is the package you want to go to. So when you're what we're calling compiling from source, you know, once in a while you write install.packages, blah, blah. And it goes, oh, there is no binary available. Do you want to compile from source? And you go, yeah, sure, I need this package compiled from source. And it does, like, all of this stuff. And you don't know what it is. And hopefully that's gonna work. And you're gonna be happy and you get your package. So you're, like, following this map. You're here. You're gonna go there. You're following all the steps. You're getting there. Great.

And then another user comes along. Their OS is something else. So they're starting their journey from another place. So they need to do something else. There's another road map for them to get there. And what if you're in uncharted territory? Like, you're on an OS that we don't know how to get you to this place. Compile from source failed. That is what my users are struggling with.

So why? So what are binaries? So binaries circumvent this entire thing. I love sci-fi. So for me, they're like a wormhole. So it's a prebuilt thing. Like, you know where you are. You know where you're going. And you just, like, there's a rift in space time and you just get there. That's what the binary does. Like, it just gets you there. Awesome.
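The difference between the two routes can be sketched in R. This is a minimal illustration, not something shown in the talk: `describe_type()` is a hypothetical helper, and the commented-out install calls assume a package named ggseg is available from your configured repositories.

```r
# Which package type does this R session ask for by default?
# Typically "both" on Windows/macOS CRAN builds of R and "source" on Linux.
default_type <- getOption("pkgType")
print(default_type)

# Hypothetical helper: describe in words what each type means.
describe_type <- function(type) {
  switch(type,
    source = "compile from source on this machine",
    binary = "download a prebuilt binary",
    both   = "use a binary when one is available and current, otherwise try source",
    paste("platform-specific binary:", type)
  )
}

print(describe_type(default_type))

# The two routes from the talk, not run here because they need network access:
# install.packages("ggseg", type = "source")  # follow the whole road map
# install.packages("ggseg", type = "binary")  # take the wormhole (Windows/macOS only)
```

On Linux, CRAN itself serves source packages only, which is one reason installation experiences differ between operating systems.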

So when you have your package on GitHub, you have no binaries. It's all compiling from source, which makes all of the madness. It's awesome for me. Easy. I just get my code up there. Just get it to pass, like, the most important checks of R CMD check so that it installs. But it can be hard for the user because the binaries are not there. CRAN is awesome because it's easy for the users. Binaries are there. There's vetting. There's checking. Like Joe was just talking about. Like, there's this whole infrastructure around it that makes it work for the users. But it is so much work for me. And I'm not a full time developer. I'm a scientist. So I need something, like, in the middle.

Discovering R-Universe

And like two months after I had that lightning talk, rOpenSci comes with their R-Universe. The R-Universe is a piece of software magic that you cannot believe. So Jeroen Ooms, who is the one who is behind this at rOpenSci, he's a magician, this stuff is so good. So it makes, like, its own little CRAN-like server. But without the vetting and stuff. But it works just like CRAN. There's no curation. You design your own. You put the packages you want in there. You get your own universe. And it's kind of organized through, like, users or organizations through, like, GitHub, GitLab, Bitbucket, stuff like that. It's quite easy to connect to.

So let's have a look. So this is my ggseg R-Universe with some of the packages that are in this universe. And there's a couple of things we want to look at. Number one, this is the main package. You see the little green ribbon? That's because it's on CRAN. So you get this nice little thing. If your package is on CRAN, R-Universe will detect it and it gives you this lovely ribbon and you can go, yay.

But the really exciting part is this part over here. Which is telling me that for R version 4.3 and R version 4.2, I have binaries for both Windows and Mac. And this is the most magic thing I know. Because the binaries are there, you don't have to think about them anymore. I don't need to understand what they are other than the wormhole. And it is just working. And once this came into place, I don't have installation issues anymore. Like, it's just working. It is magic.

And you know when you have your package on GitHub, you have to install either the devtools or the remotes package. And then you have to use the special install function to get the blah, blah. Notice this piece of code. This is all you need to install from this universe. You just have to plug in that I want you to look in this universe for packages that I'm looking for. And in this case, I'm asking to install the main ggseg package. It will look for it first in the ggseg universe. It will find it. And then for all the dependencies, it's gonna jump to the next because I don't have those in my universe. There's way too many. It's just gonna jump to the next, which is a CRAN mirror. Super simple. I just have that in my readme now. Don't have to think about it.
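The slide itself is not part of the transcript, so the following is a hedged reconstruction of what such an install snippet typically looks like. The helper `universe_repos()` is hypothetical, and the CRAN mirror URL is an assumption; any mirror should work in its place.

```r
# Hypothetical helper: build a repos vector that lists a universe plus a CRAN mirror.
universe_repos <- function(universe, cran = "https://cloud.r-project.org") {
  repos <- c(sprintf("https://%s.r-universe.dev", universe), cran)
  names(repos) <- c(universe, "CRAN")
  repos
}

repos <- universe_repos("ggseg")
print(repos)

# Not run here because it needs network access.
# The package itself is served by the universe; dependencies that are not in
# the universe are resolved from the CRAN mirror.
# install.packages("ggseg", repos = repos)
```

Compared with installing from GitHub, no extra helper package such as devtools or remotes is needed, since `install.packages()` is part of base R.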

Why use R-Universe?

So why would you want to have your package on the R universe? You might, like me, have all of these packages that are huge. And even if they weren't huge, I'm not completely convinced that the CRAN team would be super happy if I start flooding it with, like, brain atlas packages. It's extremely niche. And there's a lot of them. And they keep, like... It's just, like, exploding.

You might have learnr tutorials and you're giving it to your students and you need them to run stuff and they're not installing. Again, binaries, they're gonna install easily. You might have dev versions that you also want to distribute with binaries. Because again, it's gonna install easily. Or you might want to collect packages for your company or your lab. So you have, like, a common ground of, like, where your packages live. And it doesn't have to be packages you have made. You can put other packages into the R universe from other people if you want to lock them down or, like, I had a...

So one of these packages that I'm gonna talk to you about soon depended on another neuroscience R package that had a bug at some point. And I could go back to my R universe and say, I want you to depend on the previous version of this package. Because that makes my universe work again until they fix the bug. So I could fix it without, like, actually fixing it.
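A rough sketch of how such a pin might be expressed, assuming the universe is configured through a `packages.json` registry file whose entries have `package`, `url`, and an optional `branch` field that can name a tag. The package names, URLs, and the tag `v1.2.0` below are made up for illustration, and `pin_package()` is a hypothetical helper.

```r
# A universe registry as a data frame: one row per package to build.
# All names and URLs here are placeholders, not the speaker's real registry.
registry <- data.frame(
  package = c("myAtlas", "someNeuroPkg"),
  url = c("https://github.com/example-org/myAtlas",
          "https://github.com/example-org/someNeuroPkg"),
  branch = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pin one package to a given git reference (e.g. an older tag).
pin_package <- function(registry, package, ref) {
  if (!package %in% registry$package) {
    stop("Package not found in registry: ", package)
  }
  registry$branch[registry$package == package] <- ref
  registry
}

pinned <- pin_package(registry, "someNeuroPkg", "v1.2.0")
print(pinned)

# Write it out as JSON if jsonlite is installed (assumed registry format).
if (requireNamespace("jsonlite", quietly = TRUE)) {
  cat(jsonlite::toJSON(pinned, pretty = TRUE), "\n")
}
```

Once the upstream bug is fixed, removing the pin lets the universe go back to building the package's default branch.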

Organizing packages and the API

Now a thing I noticed when I was setting up this universe was that in the beginning, it was in a kind of disorganized state. Because these packages, this whole ecosystem comes from a large European research project I was on called LifeBrain. And we had several packages come out of LifeBrain that were not brain related. They were stats related or something else. And my packages were, like... It was making all these other packages less visible, and it was kind of cluttering it up. And it also... The neuroscience people weren't necessarily interested in these other packages. Some of them who are more, like, social science related. So I realized, like, I should probably switch things up a bit.

So because GitHub is quite easy when you're an academic, because you get free stuff, I just made a new organization specifically for these packages. And I only did that because of R-Universe. So that I could get, like, one unified universe where all of the packages for this suite live. And because they're now all collected in the same spot, it means that it's also easier for me to start getting contributions, because I just include them into this organization. It's not tied to my, like, Horizon Europe project. It's tied to just this contribution from the community.

And because, like, all the installation issues poofed with this thing, I could use that headspace to start making a package that would enable people to make their own atlases, so that I didn't have to make it for everyone, because I just don't have time for that. So I made this other package that contains tooling to create your own ggseg atlas. It's a monster of a thing, because it calls a lot of, like, other neuroscience stuff, and it's, like, Jesus. But it's there, and it works for me, and it has worked for a couple of other people. And I'm excited to see that it is being used.

So the R universe has a lot of extra stuff going on as well. So in this, like, extra package that I have, its first iteration also contained a dataset, a table in this case, about where all of the atlases that were compatible with this suite lived. With the idea that they would probably not be in the same spot, because R universe didn't exist when I started on this. I had started on this before that. And we would need to, like, manually update this every time, like, a new atlas came in. And that is, like, a bit of work. I'm, like, I prefer things to be a bit more automatic than that. A good friend of mine once said I became a developer because I'm lazy. I think that's kind of true for me, too. So anything that can just happen automatically, I'm very, very fond of.

And through the R universe, I can do that. So now instead of a table, I just have a bit of code and a function that just calls the rOpenSci API and asks, hey, can you list all the packages in this universe? And it does. And I just make it into a table. That's it. That is it. I'm so happy. You have no idea. It's such a powerful piece. And, like, I'm not going to cover everything that it does. It does so many things. It's the API. It will render your documentation. It will give you, like, pages to browse through. And I know there's a lot of things going on in the pipeline.
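The speaker's actual function is not shown in the transcript. A minimal sketch of the idea might look like the following, assuming the universe exposes a JSON listing at `/api/packages` whose entries carry DESCRIPTION-style fields such as `Package`, `Version`, and `Title`. The function names are hypothetical, and the network call requires the jsonlite package.

```r
# Hypothetical helper: URL of the package listing for a universe (assumed endpoint).
universe_api_url <- function(universe) {
  sprintf("https://%s.r-universe.dev/api/packages", universe)
}

# Turn a parsed list of package records into a plain table.
packages_to_table <- function(pkgs) {
  data.frame(
    package = vapply(pkgs, function(p) p[["Package"]], character(1)),
    version = vapply(pkgs, function(p) p[["Version"]], character(1)),
    title   = vapply(pkgs, function(p) p[["Title"]], character(1)),
    stringsAsFactors = FALSE
  )
}

# Ask the universe what it contains; needs network access and jsonlite.
list_universe <- function(universe) {
  pkgs <- jsonlite::fromJSON(universe_api_url(universe), simplifyVector = FALSE)
  packages_to_table(pkgs)
}

# Offline demonstration with made-up records of the assumed shape:
example_pkgs <- list(
  list(Package = "atlasA", Version = "0.1.0", Title = "First example atlas"),
  list(Package = "atlasB", Version = "1.2.3", Title = "Second example atlas")
)
print(packages_to_table(example_pkgs))

# Not run here: list_universe("ggseg")
```

Because the table is built from whatever the universe reports at call time, a newly added atlas shows up without anyone editing a dataset by hand.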

So while we do love CRAN for its consistency, for the fact that it's something we can connect to, it is safe, it is there, it does a lot of magic for us, we can all expand our horizons by a lot. By using the R universe. Because it is such a powerful tool. And I thoroughly, thoroughly believe if you have a package on GitHub, you should have a package in rOpenSci's R universe. Thanks.

Q&A

Are there any advantages installing the package from R universe versus CRAN if it's available on both?

So in my case, like, advantages and disadvantages, it kind of depends on the person that has set up the R universe. So in my case, if you're getting it... So ggseg, which is both on CRAN and on R-Universe, if you're getting it from the R universe, you're getting the development version. If you're getting it from CRAN, you're getting the release version. And I know several people who do it that way. It's not necessarily the way it's set up, but it might be. So the version on CRAN might be more stable to use, in a sense. So I think that's more of a preference. And probably, like, ask the developer which version is the preferred one.

And then just real quick, have you received any pushback from your users who were like, no, I only want it from CRAN? Or are they just happy from wherever it comes from?

Oh, no. I'm happy that it exists. No, like, so our neuroscience is like the hugest community in the world. So if something installs, they're just going to be happy. So if it's on R universe, CRAN, GitHub, doesn't matter. If it installs and they can use it, they're happy. And that's an amazing place to be.