CRAN-ial Expansion: Taking Your R Package Development to New Frontiers with R-Universe

Transcript

This transcript was generated automatically and may contain errors.

Okay, cool. So I want to talk to you all about how you can take your R packages to new frontiers with rOpenSci's R-Universe.

So, a little bit about me. Two years ago, I gave a lightning talk at rstudio::global about my first package, which was a huge milestone for me. I come from psychology. We do some coding for our stats and so on, but there's no formal training. So getting an R package out there was a huge deal for me.

And my package is about plotting brains in ggplot. This is so much fun to work on, and it uses simple features, for those who know about that. So it's basically a map, but instead of the globe, it's your brain. And it makes me very happy. There are two parts to this package: a ggplot2 package built on simple features, and a second part with plotly and a three-dimensional, swivelly brain. Very, very fun. IT loves it when we want to put that online.

The challenge of getting on CRAN

At that point, this package lived only on GitHub, because I did not have the headspace or the understanding to get it on CRAN. And we want our packages on CRAN. I wanted to be an R developer, and to be that, I needed to have a package on CRAN. I couldn't just have it on GitHub. So why do we want our packages on CRAN? Because that's where everyone else gets their packages. That's where users expect packages to come from. They don't necessarily have a mental model of where packages come from: you run install.packages(), and they come from CRAN.

So I want them there, but getting there is work. A lot of work. So I had this package, and I was working on my code, getting it ready for CRAN, and I hit a roadblock: my package was too big. There's a 5 megabyte limit, and there's a lot of data behind these plots. I was struggling. So the first thing I did was split it in two, and then I no longer had one package to get on CRAN. I had two packages to get on CRAN. Right. Okay. But I did the work, I got them both on there, and I'm super proud.
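For anyone hitting the same size wall, bundled data is usually the culprit. Below is a minimal Python sketch of a rough pre-submission check. It is not part of any R tooling. It reports the uncompressed size of each top-level folder in a package source tree, such as R/, data/, or inst/. It assumes "5 MB" means 5 * 1024**2 bytes, and it measures files on disk rather than the compressed tarball CRAN receives, so the numbers are only indicative.

```python
import os

# Assumption: "5 MB" is read here as 5 * 1024**2 bytes. CRAN's policy wording
# and the way R CMD check measures size may differ; this is a rough check only.
LIMIT_BYTES = 5 * 1024 ** 2


def dir_size(path: str) -> int:
    """Total size in bytes of the regular files under `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def size_report(pkg_dir: str, limit: int = LIMIT_BYTES) -> dict:
    """Uncompressed size of each top-level folder of a package source tree,
    the size of the whole tree, and whether the whole tree exceeds `limit`."""
    per_dir = {}
    for entry in sorted(os.listdir(pkg_dir)):
        full = os.path.join(pkg_dir, entry)
        if os.path.isdir(full):
            per_dir[entry] = dir_size(full)
    total = dir_size(pkg_dir)  # includes top-level files such as DESCRIPTION
    return {"per_dir": per_dir, "total": total, "over_limit": total > limit}
```

Calling size_report("path/to/package") shows which folder to move out first. In the situation described in the talk, that would be the atlas data, which ended up being split into a second package.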

Now, a magic thing happened when this package went from living on GitHub to living on CRAN: all the user installation problems just poofed away. Before, I constantly had issues from people saying it's not installing, it's not installing. And I'd say, I don't know why, because I'm not a developer. I don't necessarily understand what's behind this. But CRAN just fixed all of that.

But here's the thing. You know how you have different maps? The globe itself is the same, but you can split the world up in different ways depending on what you want to look at: by continent, by country, by political borders, by geographical landmarks, whatever. We do the same with the brain. Don't think that for neuroscientists the brain is one thing. No, it has so many features. And everyone works on their own versions of these features. There are so many of them.

I think we have about 30 requests right now for new atlases, as we call them, and we have 20 already. All of these are over 5 megabytes. They will never go on CRAN unless I completely figure out how to get around their size. And all of them have installation issues. It's frustrating for me and it's frustrating for my users, because I have no idea how to help them. I don't have an overview of what is making the installation fail.

Understanding binaries

So what the hell does CRAN do that makes this work? I just dived into it. I needed to figure out why CRAN installations work, and I got to learn about binaries. Now, binaries are a thing I know nothing about, but I'm still going to try to explain them to you.

So imagine you're one of these two R users, and you want to get to this star, which is the package you want. This is what we call compiling from source. Once in a while you run install.packages(), blah, blah, and it says there is no binary available, do you want to compile from source? And you go, yeah, sure, I need this package, compile it from source. And it does all of this stuff, and you don't know what it is. Hopefully it works, you're happy, and you get your package. So you're following this map: you're here, you're going there, you follow all the steps, and you get there. Great.

And then another user comes along whose OS is something else. They're starting their journey from another place, so they need to do something else; there's another road map for them to get there. And what if you're in uncharted territory, on an OS where we don't know how to get you to this place? Compile from source failed. That is what my users are struggling with.

So why? What are binaries? Binaries circumvent this entire thing. I love sci-fi, so for me they're like a wormhole. A binary is prebuilt: you know where you are, you know where you're going, and there's a rift in space-time and you just get there. That's what the binary does. It just gets you there. Awesome.


So when you have your package on GitHub, you have no binaries. Everything compiles from source, which causes all of the madness. It's awesome for me because it's easy. I just get my code up there and get it to pass the most important checks of R CMD check so that it installs. But it can be hard for the user because the binaries are not there. CRAN is awesome because it's easy for the users: the binaries are there, there's vetting, there's checking, like Joe was just talking about. There's this whole infrastructure around it that makes it work for the users. But it is so much work for me, and I'm not a full-time developer. I'm a scientist. So I need something in the middle.
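The road-map picture can be reduced to a toy lookup. The Python sketch below is a simplified model, not how R's install.packages() is actually implemented. In this model, a repository either has a prebuilt binary for your platform (the wormhole) or only the source (the long road through compilation). The package name mypkg and the platform labels are hypothetical. The CRAN entry reflects that CRAN publishes binaries for Windows and macOS but not for Linux.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy package repository: which packages have sources, and which have
    prebuilt binaries for which platform. A hypothetical model only."""
    name: str
    sources: set = field(default_factory=set)
    binaries: dict = field(default_factory=dict)  # platform -> set of packages


def resolve(repo: Repo, package: str, platform: str) -> str:
    """How `package` would reach a user on `platform` in this toy model."""
    if package in repo.binaries.get(platform, set()):
        return "binary"  # prebuilt: download and use, no compiler needed
    if package in repo.sources:
        return "compile from source"  # depends on the user's toolchain and OS
    return "not available"


# Hypothetical package "mypkg" hosted in two places.
github = Repo("GitHub", sources={"mypkg"})  # source only, no binaries
cran = Repo(
    "CRAN",
    sources={"mypkg"},
    binaries={"windows": {"mypkg"}, "macos": {"mypkg"}},  # no Linux binaries
)

for repo in (github, cran):
    for platform in ("windows", "macos", "linux"):
        print(f"{repo.name:<7} {platform:<8} {resolve(repo, 'mypkg', platform)}")
```

In this model, every GitHub user takes the "compile from source" road regardless of OS. That is the situation described above, where each user's toolchain becomes the maintainer's support problem.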

And I thoroughly, thoroughly believe that if you have a package on GitHub, you should have it in rOpenSci's R-Universe.

Q&A

Are there any advantages to installing the package from R-Universe versus CRAN if it's available on both?

So the advantages and disadvantages kind of depend on the person who has set up the R-Universe. In my case, ggseg is on both CRAN and our R-Universe. If you get it from the R-Universe, you get the development version; if you get it from CRAN, you get the release version. I know several people who do it that way. Not every universe is necessarily set up like that, but it might be. So the version on CRAN might be more stable to use, in a sense. I think it's more a matter of preference, and it's probably best to ask the developer which version is the preferred one.
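When a package is on both, the choice comes down to version versus stability. The Python sketch below is a hypothetical helper, not an R or R-Universe API. It splits R-style version strings into integers and compares them. It defaults to the CRAN release unless you explicitly want the development build. The example strings are made up; 1.6.4.9000 follows a common convention of marking development versions with a .9000 suffix, which a given package may or may not use.

```python
import re


def parse_version(version: str) -> tuple:
    """Split an R-style version string ('1.6.4.9000', '0.3-1') into ints."""
    return tuple(int(part) for part in re.split(r"[.-]", version.strip()))


def pick_repo(cran_version: str, universe_version: str,
              want_development: bool = False) -> str:
    """Hypothetical helper: which repository to install from.

    Defaults to CRAN (the release version). Picks R-Universe only when the
    user asks for the development version and it is actually newer.
    """
    if want_development and parse_version(universe_version) > parse_version(cran_version):
        return "R-Universe"
    return "CRAN"


print(parse_version("1.6.4.9000") > parse_version("1.6.4"))   # True
print(pick_repo("1.6.4", "1.6.4.9000"))                        # CRAN
print(pick_repo("1.6.4", "1.6.4.9000", want_development=True)) # R-Universe
```

Comparing integer tuples means 1.10.0 correctly sorts after 1.9.0, which a plain string comparison would get wrong.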

And then just real quick: have you received any pushback from users who say, no, I only want it from CRAN? Or are they happy wherever it comes from?

Oh, no. I'm happy that it exists. Our neuroscience community is, like, the hugest community in the world, so if something installs, they're just going to be happy. Whether it's on R-Universe, CRAN, or GitHub doesn't matter. If it installs and they can use it, they're happy. And that's an amazing place to be.

CRAN-ial Expansion: Taking Your R Package Development to New Frontiers with R-Universe - posit::conf
