
rOpenSci Champions: Building Communities of Open-Source Leaders (Noam Ross, rOpenSci)
Speaker(s): Noam Ross
Abstract: rOpenSci's Champions program is designed to build a diverse, inclusive, and sustainable community of scientific R developers. We identify emerging leaders in open science from underrepresented communities globally. We link them together with each other and with mentors who are experienced software engineers and teachers. In a months-long program of skill and project building, they create, contribute to and review R packages. Then they bring those approaches back to their local scientific communities. In this talk, I will discuss the methods and accomplishments of the Champions program, and discuss how we can use this approach to build sustainable communities to maintain open-source software.
posit::conf(2025)
Transcript
This transcript was generated automatically and may contain errors.
Good afternoon, everyone. Thank you for sticking around for the very last talk of the day here.
Thank you again for coming. My name is Noam Ross. I'm Executive Director of rOpenSci, and I'm going to be speaking about our rOpenSci Champions program and how we build communities of open-source leaders. I want to say from the start, I have the privilege of being the director of an organization, which also means that mostly I get to talk a lot about other people's work. This work in particular has been led by our spectacular community manager, Yanina Bellini Saibene. So I'll be talking about some of the work that I've done, but mostly about the work that she has facilitated through our organization.
About rOpenSci
First, I want to give you a little background about rOpenSci, what our goals and values are. rOpenSci was founded in 2011. It's a nonprofit organization. Our goal is to foster a culture that values open, collaborative, and reproducible research using shared data and reusable methods. And we do that by building both the technical and social infrastructure in the R language that's needed to enable that open science in a welcoming and diverse community. And I feel like, you know, in these times, it's really important to emphasize that last part. The diversity, equity, inclusion, and accessibility of our community is both a core value of the open science we do, that that science is accessible to everybody, and it's a key instrument of our goals, and that we need it in order to accomplish the science that we want to do.
We have a number of different program areas. The first one is software peer review, which we've been doing for 12 years. It's a program where people can contribute code to be reviewed by peers. We run a number of different technological platform programs. The marquee one these days is R-universe, which is our R package hosting and publishing platform. We have a number of projects in multilingualism and translation, providing resources in multiple languages and tools to enable that type of translation: guides, packages, and automated translation facilitation tools. We run community events, hackathons, community calls, and co-working times to help the community do our work. And then finally we do training and mentorship, and that's really what I'm going to focus on today.
Challenges in open source scientific software
So we do all of these things to support open source science. Open source science is a certain area of software development with its unique challenges, and people who support open source scientific projects have a number of challenges that they report when they are managing those over time. Those challenges are really in the social realm. They have to deal with burnout, dealing with funding issues, dealing with leadership and community development issues, stability, how they manage their communities, the internal communication, the external communication. People report that these are the challenges that cause their software packages and their projects to stutter and fail over time, and we want to think about how we can facilitate overcoming those challenges for scientific developers.
Some of the things that, you know, when interviewed, scientific developers have said about what it takes to sustain their work. They say, I just don't have enough time to do all the things that I want to do because they require a level of expertise that is difficult to teach, and I haven't been able to easily recruit people to continue the work. Or it takes longer than a year to develop people to move into leadership. The hardest part is finding the right person and having the time to invest in mentorship. So succession and mentorship, recruiting co-maintainers are a key challenge to sustaining scientific software.
Scientific software also has a problem of scale. Most tools developed in the scientific sphere are part of what we call the long tail of scientific software. There are sort of mega packages, be they commercial or scientific, the Tidyverse and SciPy, but most tools are small and niche and essential to specific communities but don't necessarily have the scale of other open source software packages. They're developed to meet a knowledge need, not necessarily a market, so they don't go into a place that has scale. The teams that develop them are often not full-time developers. They are small and they are splitting their time between development and research. They may have user communities that scale with the success of the project, so the user community grows, but the funding resources and the size of the team does not necessarily grow with the scale of the user community, and if that team is small, it doesn't necessarily have the critical mass for the learning cultures that, you know, we all learned about this morning from Cat.
So that size is a real challenge, and one thing that we deal with a great deal is that these small projects have a high risk of maintainer loss and abandonment. When you're dealing with a project that has one, two, or three people, and they reach a stage of burnout or a new job or a graduation or other challenges, there is the possibility that you're going to lose a maintainer and that there are no remaining maintainers, or that the ones who remain aren't able to move it forward. So a lot of the work we do in rOpenSci and the broader R package developer community is trying to recruit those maintainers, trying to bring someone into a project, either early enough or after the fact, so that it can continue when it needs to.
Communities of practice as the solution
So we have many of these challenges. These challenges are heavily driven by the scale of these software projects. They're small but important. So what do we do when we have lots of small teams that have a problem of scale because of the limits of their size? Well, there's a book I really liked as a kid about Swimmy, right? Swimmy is the eye of this fish, this fish right here, and Swimmy's insight, which everyone who has spoken in this room knows, is that you can do things together that you can't do alone, right? A small fish may not be able to do something that a big fish can, but a lot of small fish can, right?
You make an open source project together, you join a club, you join the community garden, you form a union together, you're able to accomplish things together. But when our small fish is a small software development team, what is the big fish that we form together? That big fish is a community of practice. Communities of practice are groups of people who share a passion for something that they know how to do and who interact regularly in order to learn how to do it better and how to support each other. Everyone in this room, at this conference, forms a community of practice around R. rOpenSci is working to build communities of practice specific to the open source scientific challenge. So our question is how do we foster a community of practice that has both the social and technical skills needed to sustain that open scientific software?
The rOpenSci Champions program
All of our programs cover this, but the one that addresses it most directly is our community champions program, where we build leaders for open source science, specifically building those combinations of skills. The rOpenSci Champions program is a 12-month, part-time, cohort-based fellowship where participants come together to learn the skills of contributing to and forming communities of practice, both together and with one-on-one mentorship from other rOpenSci members. So I'll talk a bit about the pieces that we have for this program and how and why we implement them.
So the first principle of our design of the champions program is to recruit for and design for maximum accessibility for historically underrepresented groups. The capacity to do this work is everywhere, but the circumstances in which people are able to participate are not, and that is the limiting factor. Both because expanding that limiting factor is what's going to enable us to have more open source scientific maintainers, and because expanding that limiting factor is the moral imperative in open science, we really prioritize making sure this program is accessible as broadly as possible. Part of that is a very active recruitment program where we partner with affinity groups and local R user groups, and do a great deal of outreach with ambassadors and academics in different parts of the world to make sure that people know about this program. Then we hold information sessions during the recruitment phase so people who may not be used to such programs, or used to applying to such programs, can learn about what it means and what they need to do to apply. We're very proud of how broadly that is able to reach. In our last phase... in the first phase, we had applicants from over 55 countries.
Then, in the design of the project, we really prioritize accessibility, support, and safety. Accessibility is very broad. One of the accessibility topics that we think about a lot is language. Our first two cohorts were in English, and we think very carefully about making sure the program is broadly accessible in English, but in our cohort this year we're running our first all-Spanish training program. That's going to repeat, and we're thinking about how we expand that to focus on more languages and regions. This is a supported program. It took a lot of effort to go to funders and say that what we wanted to do is give people stipends to participate, because we have heard that that is a really important step in them being able to take time away from their jobs for this. So we're able to provide stipends for participants and, where they need it, for mentors as well. And we think about accessibility from a disability perspective, and about safety, making sure that people understand that they have a safe space, they have a code of conduct, they have everything they need to be able to participate fully. All of those things are subject to regular inquiry and feedback. Multiple times during the course of the yearlong program we go back to our champions and ask what's working and what's not, and what challenges to participation they have that we can work on as well.
Curriculum and collaborative skills
Then the core of the program's design is to teach the skills of collaborative software development first, and to place development and maintenance work in the context of an open source community. So everything we teach is about the collaborative, interactive component of technology development. All of our champions have experience with R, experience with data science, lots of career experience in their different fields of expertise. So what we really focus on is this participation. On one hand we cover R package development. That's something that some learners know and others don't, but it's the core that we build around. Then the lessons that we work on are on building software together, which are essentially team collaboration techniques. We talk about code style best practices. How do you develop so something is readable and maintainable by others and can be picked up in that way? How do you do peer review of the software of others? There's code review, which is part of the team building practice, but rOpenSci also does what we call peer review, which is the practice of reviewing and giving feedback on other people's code, other people's projects. How do you give effective feedback? How do you review and enter a code base that you're not familiar with so that you can provide useful information? We have sessions on contributing to other open source projects. In our last cohort we had Heather Turner, who runs R Dev Days, speak on contributing to R core itself and how people can participate in that. And then we go beyond these particular technical topics because we're thinking about developing people as leaders within their open source communities. So we talk about organizing events. If you're going to have a hackathon or bring together maintainers, or hold a webinar about the thing that you're building, how do you organize an effective event? How do you do public speaking at those events? How do you communicate with your project team?
You know, an open source maintainer in a small project has a lot of hats to wear and we know that it goes beyond just the code.
How does this work in practice? For the collaborative teamwork, we think of collaboration-first Git. Some people teach Git by saying, first, Git is a version control system, here's a bunch of Git commands, and you get the first bit. But Git is first a collaboration tool, and teaching it as a collaboration tool is the way we design our lessons. This is our current version of that lesson, actually written by one of our first cohort's champions, Parker Ellis. Then we do peer-to-peer code review. Some people join in order to develop code and some people join in order to review and practice contribution, so we match them up and people are able to do that code review for each other. Those collaborative skills are what we're building.
Project-based learning and real-world impact
The next thing is that the program is project-based. The reason it's project-based is partially to build skills and build confidence, but it's as much for us, because what we want to do is empower people as peers within our community. I'm not looking in this program to find the person who is going to do the work for my package. I am looking to develop a peer who has a project that I am interested in contributing to, so that we have a mutual and equal exchange. So the projects that people create are not toy projects. These are things that have lasting and important impact in their open source communities. We have a lot of participation from people in ministries and national statistical agencies, and from research institutions. So here's an example. We have a package that processes Argentine census data, which did not exist in a standardized form from year to year prior to its development, and which has rapidly become an important tool in Argentine population and demographic science. Similarly, tools for Brazilian transport systems. You know, we have a global project. We have an ornithology team in Taiwan that makes a great deal of use of the Taiwan Breeding Bird Survey package. So these are important projects. We're recruiting important projects as well as champions, so that what they build is something that is of lasting value and attracts its own maintainers.
Evaluation and multiplying impact
One thing we do a lot in the program is evaluate and iterate on this design to improve it. We have a great partnership with the Center for Scientific Collaboration and Community Engagement, and that may be the first time that I got its acronym right on the first try. Every year we go back and we interview our champions and our mentors. We conduct a survey and evaluation to understand what worked and what didn't. That also enables us to collect a lot of information that helps us communicate about the program and the program's value. And what we're really trying to understand when we do these evaluations is not just, did people like this? Did people's skills improve? That's nice. We want to know, are communities of practice forming as a result of this work? And are those communities impactful?
So we ask survey questions and we ask free form questions to hear from people what they are getting out of the program and what is resulting. So we have things like this. The feedback and interest I received from other participants and mentors gave me the confidence and motivation to continue improving this package. If I were wanting to transform national data into something useful for the wider public. So we're getting that community activity happening and we are getting the impact as a result of the activity. And then we interview the mentors. Because as I say, our goal is to create peers, not students or apprentices. And so what do the mentors say? Being part of a cohort makes people feel they belong to a larger community. I learned from my mentee, too. That's us accomplishing our goals.
Now, the challenge of the program is we only have so many people, so many cohorts. We run two ten-person cohorts a year. There's a limit to that impact. And so we want to think about how it multiplies. Part of the design of the program is in identifying and cultivating leadership in our champions. We select people who have leadership potential in their local communities, right? And we encourage activities to facilitate that. We ask everyone who participates to do something in the realm of giving a talk or running a workshop in their city, in their university, in their agency, thinking about how to bring these practices to a larger group beyond the ten people. Because we don't just want rOpenSci participants to expand our own community of practice. We want to build these communities of practice and networks wider, which means we have a reach into all of those places.
And this has been spectacular. So one of our champions founded the R User Group for Rosario in Argentina. There is now a continuous user group in that city as a result of this. We have people doing international work for meetups through the international affinity groups, giving talks within their agencies about how they can use these packages, how they develop, how people can contribute to them. All of those let us have the impact in creating these communities of practice well beyond the ones that we run ourselves.
Cultivating the soil for future leaders
So this morning in the keynote, Dr. Hicks gave a part of her talk where she talked about the metaphor of the plants and the soil, where the developers are seeds, and the job of a manager is to cultivate the soil in which the seeds grow. And I think in our context, we have a much bigger field whose soil we need to tend, because what we are trying to do is not just build skills within our team, but make sure that we are building the community from which our future developers, maintainers, and leaders are coming. The community of practice is the thing we have to make sure is rich and fertile for all of those accomplishments to grow out of.
So that's it. I very much want to thank Yanina for being the leader of this program. Within rOpenSci, Maëlle Salmon and Steffi LaZerte are the other big participants in this program. But of course, well beyond the core rOpenSci team, I want to thank so much all of the champions and mentors we've had in this program over the last few years. We have several of them in the room today on all sides. They're what makes it a success. And that's it. Thank you all very much. I'm happy to take any questions.
Q&A
Thank you so much. We have a few questions here. What are the typical experience backgrounds of folks in the rOpenSci community? Are there ways for high schoolers or college students learning data science to get involved?
I mean, there's certainly a place. The typical experience of champions tends to be early and mid-career. We have a lot of people who are well into the research or other data science career that they're in. We have had high schoolers participate in rOpenSci. I think we've had a couple of examples. Our software peer review, our hackathon projects, they're all open. They're not age limited. I admit we haven't focused on youth development, but we have something called the contributing guide at rOpenSci. It's at contributing.ropensci.org, and it's actually like a book about all of the different ways to enter into the community of practice. Do you want to sign up to review? Do you want to join in one of our webathons or hackathons? Do you have one of these other projects we can contribute to? Do you want to contribute to the platform? All of those things are widely open, and we do actually get a very wide range of people who participate.
Great. Is rOpenSci affiliated with pyOpenSci? And do you think we're going towards a polyglot OpenSci in the future?
We are very good friends. We have a joint Slack channel where we talk about collective challenges. pyOpenSci has a similar software review process, and they do a lot of work in the Python packaging ecosystem to harmonize and document and help people understand the complexity of the Python packaging world. They started about three years ago. And so I think we collaborate a lot. We're both able to do what we do because we have a lot of focused expertise within the organization about the particulars of the language and the platforms and the communities that have sprung up around them. And so we've talked about, should there be one OpenSci? And we're not sure there should be, but there's a very tight loop between us. People who are interested in one can easily move to the other, and much of the design of, say, the peer review system or the philosophy of the community is really well aligned between us.
Perfect. One last question here. How do we apply for the program?
We have an announcement once or twice a year. There are one-cohort years and two-cohort years, depending. And we put out an announcement, and in addition to our website, as I said, we do really heavy recruitment. You will see it in LatinR and AfricaR and the Japan.R groups and the global R-Ladies groups. If you're a participant in any of those spaces, it will show up. We're able to do it pretty loudly. Of course, you can also follow along on our website or any of our social media, and it will show up. We did our last announcement in late spring, for the cohort that is now in its first month of training. And so I would expect the next one is probably going to be at a similar time next year. I think we're going to run two cohorts next year. Sometime in early to mid spring, we'll probably have the announcement for recruiting them.
Great. Thank you so much.
