
Heather Turner - Contributing to the R Project

Posit provides an amazing set of products to support data science, and we will learn about many great packages and approaches from both Posit and the wider community at posit::conf(2024). But underlying it all are a number of open source tools, notably R and Python. How can we contribute to sustaining these open source projects, so that we can continue to use and build on them? In this talk I will address this question in the context of the R project. I will give an overview of the ways we can contribute as individuals or companies/organizations, both financially and in kind. Together we can build a more sustainable future for R!

Talk by Heather Turner
Slides: https://hturner.github.io/positconf2024/
R Contributor Site: https://contributor.r-project.org/

Oct 31, 2024
16 min


Transcript

This transcript was generated automatically and may contain errors.

Yeah, as users and developers of open source software, we get to stand on the shoulders of giants. That means we can build on the work of others to build something really epic, maybe like this human pyramid on water skis. But there's a danger that the picture can look a little bit different, maybe like this famous XKCD cartoon, where we have a towering jumble of blocks balanced precariously on one small piece that represents a project some random person in Nebraska has been thanklessly maintaining since 2003. I'm sure many of you have seen this before. Large open source projects like R or Python tend to fall somewhere between these two pictures, and in this talk I'm going to focus on R.

So if we think about the R Project, we have a small team of core developers who maintain the code base that is distributed as R. This is about 20 people, and only one of them is working full time on R development; most fit it in alongside their regular jobs as academics or data scientists. Beyond that, we have a slightly larger group of external contributors who also contribute to the code base, but on a more ad hoc basis. And beyond that, we have the massive community of R users and developers who use and build on base R, but don't necessarily contribute back to the core project. And of course, for R to be sustainable, we do need people to contribute back in some way to the core project.


Why contribute to R?

And I'm going to talk about the different ways you can do that. So why should you care? Well, hopefully you agree with me that R is special, right? It's great. It has state-of-the-art statistical methods. You can use it for your full data pipeline. It's a user-friendly language, and it has a great community. And many of you use it. Maybe you don't realize that you're using base R, but if you're using the tidyverse or Shiny, of course, that all relies on the core code base. And you've invested time and money in skills and in developing code, and you don't want that investment to go to waste.

And finally, R needs your help, like any open source project. Just as an indicator, it has over 600 open bugs on its bug tracker. I've also slightly alluded to the demographic cliff that we're facing. R is a 30-year-old project, and many of the core developers are near or even beyond the end of their careers; some have officially retired. It's also acknowledged that there's a lack of diversity, so it would be great to work on this at the same time as we think about bringing new people in.

And what have I got to do with it? Well, I'm a similar vintage to Colin. I started using R in 2001, during my PhD. I've worked in academia, in pharma, and as a freelancer, and in all these roles R has been fundamental to my work. A few years ago, I went back to academia to take up a research software engineering fellowship, where I'm focusing on sustainability and on diversity and inclusion in the R project. As for the R project itself, I made my first contribution to base R in 2004, in my first real job after the PhD. As part of that job, I was developing a CRAN package, so that was my first CRAN package, in 2005. Over time, I got more involved in the R community, and in 2015 I was elected as an R Foundation member. As one of my tasks on the R Foundation, I co-founded Forwards, the task force for underrepresented groups in the R community. Then in 2020, I started the R Contribution Working Group, with the aim of fostering a larger and more diverse community of contributors to R.

Ways to contribute to the code base

So there are many ways that you can contribute, and each will take some combination of skill, time and money. I'm going to use these icons to indicate the relative skill, time and money it takes to get started with each of the ways I mention. I won't go over them in detail, but you'll see the symbols on the slides. There are also lots of links you can follow for more information about everything I talk about, and there'll be a QR code at the end so you can grab a copy of my slides and all those links.

Okay, so the first way is the obvious one: contributing to the code base itself. But the code base doesn't just include R code. It includes text: the messages, warnings and errors that you see when you're using R. It includes documentation: the R help files, but also the R manuals. The help files are in Rd format, and the manuals are written in a format called Texinfo. And then there's code, but it's not just R code. There's a huge amount of C code in base R, and there are also other languages like Fortran and Bash. So many of you will have skills that you can potentially contribute here.

In terms of text, the main way you can contribute is by translating messages from English into other languages. For this, we have a Weblate interface that lets you step through, in the browser, the messages that need translation into a particular language. In the example here, we're looking at the R-level messages in the stats package and translating them into Spanish. You'll see the original English message, and underneath there'll be a blank box if there's no translation, or an existing translation that needs review. As a contributor, you can add or update the translation, then save and continue. The contributions you make will eventually be pulled together and added to R.

If you want to contribute to the documentation, for example by fixing a typo or making a small improvement, you can use the GitHub mirror. R itself is maintained in Subversion, but there's a mirror of the source code on GitHub, the r-svn repo. You can go over there and find the source file; here, this is the source file for the Multinom help page. Click the pencil icon to edit the file, which you can do entirely in the browser, and it will create a fork of the repository for you. You can make your change and then open a pull request back to the repo. From that, you can create a patch file that you can submit to R Core for consideration.

You can contribute to the code in a very similar way, but code is a little more complicated. You might want to edit more than one file, and if you're working on an internal function or compiled code, you'll want to rebuild R in order to see the change. For this, we can instead use the R Dev Container. This is a virtual environment that you can launch through GitHub Codespaces, and it has a VS Code IDE. In there, you can get a copy of the R sources and open the files you want to edit. Here I've got the source file for the askYesNo function, which is an R script, and I'm just going to make a simple change. The function offers three default choices, "Yes", "No" and "Cancel", and I'm going to change "Yes" to "Oh yeah". So there we are, I've done it. Using VS Code's version control features, we can see the change: the old line with "Yes" and the new line with "Oh yeah". At the moment, I've only changed the source file, but staying in the dev container, I can rebuild R, run the function with the change, and check in an example that it works as expected. Hopefully some of you can read that: when I call askYesNo, I now get "Oh yeah" as one of the options I can give. So this lets us test changes and develop fixes, for example bug fixes in the code or the documentation, that we would like to propose to R.

If you head to the contributor.r-project.org website, you'll find information on a range of support for people who want to contribute to R. We have a Slack group. We have the R Development Guide, an ebook targeted at people new to contribution that covers a lot of the information you'll need to know. We also have slides and video tutorials. And we have live help: monthly office hours in two different time zones, where you can drop in and ask for help getting started, or discuss a bug fix or anything else you've started to work on. We've also started organizing R Dev Days as satellites to conferences, and we'll be having a small group alongside the tidyverse dev day on Thursday. We have a few spaces left if anybody would like to join us then.

Contributing experience and expertise

So beyond contributing to the code base, there are other ways you can contribute your experience and expertise. One way is testing R before release. You may know that R has a major release once a year, around April. A couple of weeks before that, a pre-release version is made available on the R project website. You can help by testing your own programs and workflows, your particular ways of installing or setting up R, and things that interact with external libraries or interactive R packages. In other words, things that are pretty tricky for the small core team to test themselves. If you find any issues and report them, they may get fixed before the official release.

But of course it's not too late to report issues after the release, and it's really helpful for everyone to report any issues they find on the bug tracker, to help keep R to a high standard. Both bug reports and wishlist items, that is, feature requests, can be posted on R's Bugzilla. If you're reporting a bug, it's really helpful to include a minimal reproducible example, which many of you will be familiar with. For R, it's particularly helpful to restrict yourself to the base R packages and data, so it's clear that the issue really is with base R and it can be easily reproduced without any add-on packages.

If you've got a little bit more experience and time, then you can also help by reviewing bug reports. For example, you can help by trying to reproduce the bug in different versions of R or in different operating systems, different graphical devices. You might be able to simplify the reprex as another step towards analyzing the root cause, or you might be the one that helps find the root cause. You know, the few lines of R code or C where the bug is actually happening. And then you can discuss with others whether this is something that actually needs fixing and how to fix it. So it helps to have people contribute to that discussion rather than it just being one person having to think about this.

You can also contribute your expertise via a working group. I've already briefly mentioned the R Contribution Working Group and Forwards. The R Consortium also runs a number of working groups, some of which relate to core development or core infrastructure, and I mention a few of them here. There's one thinking about how to implement multilingual R documentation, in other words, having R help files in different languages. There's one that has been working for a number of years on the S7 package, a potential successor to the S3 and S4 classes and methods in R. And there's the R Repositories Working Group, which thinks about how to support CRAN and the development of new package repositories for the community.

Contributing financially

And finally, I'd like to mention contributing financially. So far, the contributions have mainly been contributing your time and skill, but of course, money is also useful. So how can you do that? If you're an organization, the most obvious way is through the R Consortium, which was basically set up to help businesses support the R ecosystem. And it has a number of membership levels for different sized businesses. And this funding goes towards key infrastructure projects and community conferences and meetups, which have been a real boost to the R community. So this is a really valuable contribution.

There's also the R Foundation, which I've briefly mentioned. This is the nonprofit organization that was set up a long time ago to support R. It works on a smaller scale, but it also has institutional memberships, starting from just €250 a year. Of course, if you want to contribute more, that's also welcome and will be recognized. The highest level is patron, with no specified monetary amount, and we're very happy to say that Posit is a patron of the R Foundation. This money goes towards funding things that the R project and the R Foundation are involved in: some support for R Core, CRAN, the R Journal, the useR! conference, and things like the R Dev Days. The R Foundation and the R Consortium also work together; for example, they jointly fund one of the R Core members at the moment.

But you can also contribute as an individual. The easiest way to do this is through the R Foundation, which has an individual membership level, or you can make a one-off donation of any amount. We also have an Open Collective. This was mainly set up to receive money from Google Season of Docs, but you can also contribute to it, and that funding goes towards small projects, such as maintenance of the R Development Guide or the useR! Information Board, which tracks information about past useR! conferences to help future organizers.

So in summary, there are actually quite a number of ways to contribute that have a low barrier to entry: message translation, documentation fixes, testing R, reporting bugs, and making an individual donation. If you have a bit more time and expertise, you can get into working on code fixes, reviewing bugs, or contributing to a working group. Probably the highest barrier is making an institution-level donation, but this makes a lot of sense if your institution relies heavily on R and wants to make sure it's sustainable for the long term. Here's the QR code I mentioned, which will take you to these slides and all the links. There's another page after this with all the links on one page, so you don't have to go through all the slides again unless you find that helpful. Thank you very much.

Q&A

We do have time for a couple questions.

We have: who comes up with the R version names, and how can we submit suggestions?

Okay, so you might have seen these quirky nicknames for the R versions. It's Peter Dalgaard, one of the R Core members, who comes up with these names, and they're based on the Peanuts cartoons. I don't know if he takes suggestions, but maybe you could email Peter. He's at the University of Copenhagen, I think, so you could probably find him online.

And then one more: are there any particular types of contribution the R Core team is actively looking for right now?

Some of the things I mentioned are things that R Core have particularly asked for, like testing R before release and helping to review bugs. They've written about these on the R blog, so that's another way to follow what's happening in the project. And they do engage with activities like the R Dev Days, so we try to find things that have R Core support, which gives you a good chance of your contributions being accepted.


Awesome, thank you so much.