
R-multiverse: a new way to publish R packages (Will Landau, Eli Lilly) | posit::conf(2025)
Speaker(s): Will Landau

Abstract: R-multiverse is a new dual repository for R packages, based on infrastructure from R-universe and GitHub. We would like to invite the developer community to contribute packages. With R-multiverse, users have a central place for installing packages. Automated quarterly production snapshots enforce quality. Package maintainers retain most of the freedom and flexibility of self-publishing. Maintainers directly control package releases through GitHub or GitLab. R-multiverse originated from the R Consortium Repositories Working Group. It has transparent governance, and it operates in a collaborative and open way.

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
Transcript
This transcript was generated automatically and may contain errors.
We are all in great company here. I count myself as a package developer as well. And when we write an R package, chances are we want to share it with as many people as will find it useful. We want it to see the light of day.
And that brings us to the dreaded ordeal of publishing. That's something we have in common with authors of books, of journal articles, and of other kinds of media. The standard practice is generally to go through a central publisher, like CRC Press for books or the Journal of the American Statistical Association for journal articles. And for us, we have repositories like CRAN, Bioconductor, and rOpenSci.
This is generally the best way to help our work see the light of day, to make sure people use it, to convince people to trust what we put out there. But it comes with a challenge.
The challenge of central publishing
Inherent to the model of central publishing, whether it's a book or an article or a package, is editorial review. And for sure, these review experiences can be extremely rewarding and edifying. But they also come with a lot of frustration and a long wait for feedback. And then there's ingesting that feedback, coming up with revisions, and you might even get rejected. Even if you aren't rejected, you might have to do something to your package that you find difficult or maybe don't even agree with.
And so a lot of folks turn to self-publishing, which is possible, again, for books, journal articles, and packages. We all have ways of doing this. The reason we might want to is that, as package developers, we control publishing. No one has to tell us what to do. The freedom is really nice, but it comes at a cost, the other extreme of the trade-off, because we have a hard time convincing users to trust our work.
As a user, I might express more skepticism toward something that's self-published than something that's formally vetted in a central repository. Your work is also harder to find if you self-publish. And speaking of obscurity, let's talk a little bit about siloing. If I'm a user and I install a package from GitHub or even from R-universe, not only do I need to know the name of each package I want to install, I need to know every package's individual repository. And as a user with hundreds of installed packages on my system, I don't want to have to do that for every single one.
Introducing R-multiverse
Which brings me to R-multiverse, a new dual repository built on top of the R-universe project. What it aims to do is provide a centralized place for self-published releases, and on top of that, automated checks and quality control for production scenarios. What we're trying to do is meet in the middle, to combine the best of both worlds of self-publishing and centralized publishing with review.
Now, let's get into how this works. R-multiverse is very new. It's a dual repository, so it's not just one repository, it's two. Our first repository is the community repository: plain and simple, an R-universe with all the latest releases of registered packages. And on top of that, we have a production repository with quarterly snapshots of the healthy releases from the community repository.
So my goal for the rest of this talk is to help you understand each of these repositories, what they mean for users, and what they mean for developers who want to participate and contribute packages.
The community repository
So a bit about this community repository. Like I said, it's plain and simple an R-universe, just like any other. It's got a nice dashboard with package documentation, metrics, and test results, and it's transparently available. We have a nice custom URL for it, both for the dashboard and for downloading packages. And the latest releases come directly from the individual source repositories on GitHub and GitLab, just like any other universe on the R-universe platform.
And here are the implications: this is a single universe that is offered to the entire community to participate in. Usually, individual universes are maintained by individual developers or labs who want to publish their own work. This is something we all share, and it's also focused on releases. That makes it a middle ground between development on the one hand and production on the other.
There's this term QA, or quality assurance, in industry and software development. To my knowledge, we've never had this for R before, but that's exactly what this community repository aims to provide. It's a middle ground: releases that the developers endorse for production, but that are not necessarily ready for production, precisely because we don't enforce checks or guarantees of compatibility at this level. That comes later, in production.
So that's what the community repository is. As a user, you might install a small number of packages from the community universe, packages you're familiar with and trust, where you want just the latest and greatest. And developers can get releases there very quickly and easily.
Registering a package
Now, how do you, as a package developer, get your package into the community universe and registered with R-multiverse in general? We keep our submission requirements very minimal at this level, because we want to make this experience as close to self-publishing as we can. What it really boils down to is that the artifact you register has to be an R package with at least one release on GitHub or GitLab. It has to be devoid of malicious activity: it has to be in good faith, not have malware, et cetera. And it's got to have a free and open-source license in the most obvious standard place.
That way we, as the administrators of R-multiverse, have permission to distribute the code. So all we're asking is that it's an R package and it's legal and safe. We do have policies, and there's maybe a bit more to it than that, but we generally take after the GitHub terms of service and acceptable use policies. So it's very straightforward for well-meaning authors to abide by.
And that is all we ask for registered packages in community. The registration procedure, again, is simple and open-source. We have a contributions repository with a collection of registered package listings. These are just text files, one per package, and each file usually contains the URL of the package's source code repository. All you do is go into the web interface and submit a pull request to add a new text file, where the name of the file is your package and the contents are just one line with the source code URL.
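For example, a listing file for a hypothetical package might look like this. The package name and URL here are made up for illustration; the file itself would be named after the package (e.g., a file called mypackage):

```text
https://github.com/myorg/mypackage
```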
And then what happens in the pull request is that a bot runs about every hour, scans the pull requests, and checks: is this a package? Does it have a license? Are there any conflicts with an existing package of the same name on CRAN, et cetera? If those very light checks pass and there's a high degree of trust, say the maintainer is a member of a trusted GitHub organization, then the bot can automatically merge and register the package. In a lot of cases, if a check turns something up, it doesn't necessarily mean a rejection, it just means a manual review is required.
And again, that review is according to these very light submission requirements. Downstream of that, another automated process compiles all these listings and automatically generates an R-universe package manifest with the package URLs, setting the branch field equal to "*release". What that means is that instead of pulling the latest commit, we pull the latest release off of GitHub or GitLab.
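As a sketch, an entry in that generated manifest (R-universe's packages.json format) for a hypothetical package might look something like this; the package name and URL are placeholders:

```json
[
  {
    "package": "mypackage",
    "url": "https://github.com/myorg/mypackage",
    "branch": "*release"
  }
]
```

The "*release" value is what tells R-universe to track the latest tagged release rather than the latest commit.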
The implication is that if you want to update a registered package that already exists in R-multiverse, you do not have to go through a human. You can just create a new release on GitHub. This is what a lot of us as package developers do anyway when we release a package and submit it to other standard repositories; it's good practice to create a release on GitHub or GitLab regardless. And this is all it takes to publish a new version to the community repository, after which your package is straightforwardly available to install.
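As a sketch, publishing an update can be as simple as tagging and releasing the new version on GitHub. One convenient way from R is the helper in the usethis package; the version released is whatever your DESCRIPTION file says:

```r
# Run from the package's source directory after bumping the
# version in DESCRIPTION and pushing to GitHub. This drafts a
# GitHub release for the current version; once the release is
# published, the community repository picks it up automatically.
usethis::use_github_release()
```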
No matter what package is in community, for users it's this one unified URL they can install from with install.packages(). It's very simple and easy. The dependencies might come from other repos like CRAN, so including getOption("repos") is useful to make sure dependencies are also installed if needed.
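A minimal sketch of installing from the community repository, with a hypothetical package name and the community URL as described in the talk:

```r
# "mypackage" is a placeholder. Listing the community repository
# first, with the default repos as a fallback, lets dependencies
# come from CRAN and friends if they are not in R-multiverse.
install.packages(
  "mypackage",
  repos = c(
    community = "https://community.r-multiverse.org",
    getOption("repos")
  )
)
```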
The production repository
So that's the community repository. The production repository has in mind that we use R for tightly regulated, tightly controlled situations, not just exploratory analyses. We run R for high-stakes analyses that really need to be correct and based on software we trust. That's what production is aimed at, and our approach is a bit like Debian's.
Jeroen Ooms came up with the spark for this idea, borrowing from Debian's philosophy of gradually building snapshots at intervals. And we're optimistic that this way of thinking will turn out to not only provide the same quality for users, but also provide the same convenience for developers.
So how this works is we take quarterly snapshots of the community universe, and we only include healthy releases with healthy dependencies. We build each snapshot gradually from the ground up, to give developers enough time to make the adjustments they need leading up to a snapshot, and to avoid last-minute volatility and panic.
What we require for production is a bit more. All the requirements of community apply, plus we enforce no warnings or errors in R CMD check on Mac, Linux, and Windows, for a particular version of R. We require that the version number of the current release is higher than previous ones, which is a benefit for users. And we also require that the package's dependencies in R-multiverse pass the same checks.
R-universe actually has this really nice feature: as soon as the checks of a package finish, all the degree-one downstream packages that depend on it in the same universe are also rechecked. R-multiverse waits for those checks to complete, and effectively that means we get the same reverse dependency check guarantees that we're used to and rely on, but from a different operating model.
Now, our operating model looks like this, in terms of where we want to go with a snapshot. We want a collection of releases from R-multiverse set in stone, built on a particular version of base R, and a snapshot of packages from CRAN, also set in stone.
That's where we want to arrive. What we start with is not a pyramid of stone but more like a pyramid of water, where any package could change at any time. From there, we gradually freeze the ecosystem from the ground up. We start by fixing a version of base R and a snapshot of CRAN packages from Posit Public Package Manager. Those aren't going to change for the entirety of the snapshot, and we give an entire month for developers to adjust their packages and checks.
After that, in the next month, we start to freeze the packages that already work in R-multiverse. We freeze the healthy packages and give time for bugs to be fixed. This is a nice, gradual soft freeze that allows the changes that enforce quality to happen, while providing stability along the way.
And what we end up with is maybe not all the packages from community, but certainly a lot of them. It's a nice model that guarantees the compatibility of the packages and their dependencies in this ecosystem.
And as a user installing from production, you would supply a dated snapshot URL, along with the Posit Public Package Manager snapshot from two months prior.
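As a sketch, assuming a hypothetical snapshot date and URL scheme (check the R-multiverse documentation for the actual production address), an installation from production might look like:

```r
# Both repository URLs below are illustrative, not authoritative.
# The production snapshot is paired with a Posit Public Package
# Manager (PPM) CRAN snapshot dated roughly two months earlier,
# per the talk.
install.packages(
  "mypackage",
  repos = c(
    production = "https://production.r-multiverse.org",        # dated snapshot (hypothetical)
    ppm = "https://packagemanager.posit.co/cran/2025-01-15"    # PPM CRAN snapshot (illustrative date)
  )
)
```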
Summary and call to action
And so to recap: R-multiverse tries to centralize these releases and enforce quality for users, while also putting power and control in the hands of the maintainers, to make it as much like self-publishing as we can.
And the reason I'm talking with all of you is that this is the debut of R-multiverse. We are ready to scale, and we are inviting package contributions from all of you. We also want to scale up the moderator team to make sure the review experience is fast and simple. So if you'd like to participate as a moderator as well, please come talk to me.
I maintain R-multiverse with three other folks listed here, and we couldn't have done this without the R Consortium, rOpenSci, or R-universe. Thanks very much, and I'll be happy to take questions.
Q&A
So we check against the R Consortium advisory database, which exists to report security findings. We check new submissions against that database, and we enforce it again in production. I didn't list that check in the slides because it's not unique to production, but we do enforce it.
Okay, so the question is about injecting a package with a name similar enough to an existing package that it might do something nefarious. We rely on the advisory databases for that as well, and we rely on the community to report those findings.
We also have a manual review process for most package submissions. In terms of automatically accepting packages, we take a people-first approach to safety rather than a package-first one. There's a short list of trusted organizations, like rOpenSci and the R Consortium, and we're open to growing that list. If a developer isn't part of one of those trusted organizations, then we always do a manual review of new packages. We've thought about automated checks, say fuzzy matching of package names, to account for that kind of spoofing, but we haven't implemented that yet.
