Building sustainable open-source ecosystems: Lessons from the #rstats community and an NSF grant
The blessing and the curse of open-source software is that it lacks the infrastructure of a corporation. It can often be difficult to ensure that projects have stability and longevity. In this talk, I will discuss ongoing work on an NSF "Pathways to Enable Open-Source Ecosystems" grant focused on the {data.table} package. Like many R packages, {data.table} has incredible functionality and thousands of users - but no cohesive community or governance structure to support it long-term. We are working to build this ecosystem. I will provide my advice and insight for key aspects of a sustainable open-source project: Engaging casual users, supporting developers, generating content, emphasizing education, and creating a home base for the community.
Talk by Kelly Bodwin
Slides: https://github.com/kbodwin/positconf_2024
GitHub Repo: https://github.com/kbodwin/positconf_2024
Transcript
This transcript was generated automatically and may contain errors.
Hello everybody, I know it's like afternoon lull, end of a session, so thank you all for being here for topics that I think are really important. So extra credit to people in this room, a lot of cool stuff happening at this conference, but I think the question of, you know, how do we keep this awesome thing that we have rolling is a huge question.
So my talk today is about, you know, sustainable open-source ecosystems and specifically thinking kind of from the perspective that is personal for me about a grant that I'm on right now as well as the famous rstats hashtag on the website formerly known as Twitter and elsewhere. And so I'm a professor of statistics at Cal Poly, but perhaps that is not maybe the credential that matters today. The credential that matters in this talk is that I am an R groupie, and that all I do is follow R around to all the conferences and go to the talks and like act very excited.
What's been shaking up the R world
So basically over the past four years, a few things have happened that have made me really think about, like, what is the future and sustainability of R? One of them is obviously a global pandemic, which I've chosen to represent by RStudio Conf, Posit Conf going online, rather than one of the depressing pictures we keep seeing in the news, but obviously that sort of changed a lot of people's relationship, I suppose, with their work and with R.
And then the next thing that happened was RStudio rebranding to Posit, and I think we all as R folks had a lot of questions about, you know, what is kind of R's role in this world where Python is also very important and other languages too, of course, and our sort of fearless leaders, RStudio, don't have R in the name. What does that mean?
And then stuff happened on Twitter, namely it becoming kind of inhospitable to big parts of our community, and so a lot of the community left because they didn't want to be in a place where some of the valued members of our community didn't feel safe. So this was a big blow, right? I hate to say how much of my career depended on Twitter, but it was really where we were connecting and sharing things about R and having those conversations, and it's not really that anymore. There's not just one place anymore.
But then a good thing happened last year for me and for everybody and for R, which is that the NSF released this new program, POSE, which is Pathways to Enable Open-Source Ecosystems, and they're basically supporting kind of not the code part of open source, but the building of the infrastructure that is necessary to keep important open source projects alive. And so my friend Toby Hocking, my now friend, I only knew him from Twitter, is at Northern Arizona University, or he actually just moved from there, but he got this grant to reinvigorate the data.table package.
The people here that use data.table, if you're aware of data.table, it's one of these packages that has a cult following, a large cult following, it's not niche, but the people that use it really use it, really love it, and it was pretty much built and maintained largely by one person, and he burned out a little bit, and so it kind of got a little stagnant. So Toby brought me in on this grant because I am very much a tidyverse person, but I do love data.table. I think it's kind of a "both," not "or," sort of situation.
So I've been thinking about how to make data.table specifically more sustainable on this grant. So as these things are happening, as I've been thinking about it, I kind of am thinking about three questions here. First, R itself, like what keeps R, you know, relevant, popular, why is it that R is still growing even though Python is growing exponentially? You know, those are not mutually exclusive.
And then packages, like data.table, specifically what keeps these packages sustainable and current? You know, we just heard about the challenges involved, right, in keeping packages fresh and available. And the community, right, what keeps us engaged, or where are we engaged, and how are we connected to each other?
And so I think, you know, the answer to these three questions is, like, the people in this room. Specifically, the people in this room, in that room, on the internet, you know, it's us. And as I've been thinking about this talk and, like, who's my target audience, I'm thinking, you know, it's not the big-time developers and the full-time Posit employees, sorry. And it's not really beginners, people that are new to R. If you're here learning things for the first time, you know, your only job is to learn and grow. This is for the people in between, which is most of us here. Like those of us that are really benefiting from these tools, that are using them heavily, but maybe aren't quite, you know, the Hadley Wickhams of the world.
And so what I realized, putting this talk together, is that I'm speaking to myself. I'm thinking about, you know, these are the things, like, kind of post-pandemic, I have now, like, elevated maybe from beginner to not beginner around that time. And these are the things I need to be doing better. Like I have been behaving like a beginner and just accepting the gifts of the R community and not giving back. So think of all these things that I am telling you to do here as things that I'm also telling myself to do and kind of pledging that I want to be better at going forward.
Supporting software creation and maintenance
So let's think first about creation and maintenance of software. That dovetails nicely off that last talk. You know, what do we need to do to support this, even if we're not the ones building maybe the big packages? The first thing you can give to developers, the big developers, is your time. Yes, contributing code, as you heard from Heather, is very nice, but, like, there are other things you can do, right?
Filing issues with feature requests, bug reports, all that good stuff, primarily on GitHub is where you would do this. Translation. There's a big translation project in this data.table grant. So if you speak a language that is not well represented in R, would love if you reach out. We have funding for translation projects. Fixing typos, like little things that you find. My students sometimes find things and they get really excited when they can actually contribute, even though they're brand new, they found a typo.
And then I'm going to kind of beat this over the head today. Blogging and posting is so valuable. Like if you use a package and it solves a problem for you, mention it. And then the developer of that package will be so happy, I promise you. That is one of the best things you can do for the developers of this software.
And I thought that this was maybe, like, annoying, right? To be, like, filing issues or bothering developers of packages. But the more I talk to these developers, the more I understand they want this. They want the feature requests. Even if they have to say no to it, even if they can't do it, they want those suggestions. They want the bug reports so they know what's going on. These are very valuable.
Giving credit and money
The next thing you can give your developers is credit. And I've been sort of bringing this to all the academic conferences, especially, where they're writing papers. If you have a project, whether it is a formal paper or a blog post or, you know, just an analysis project, cite your packages. All of them. I mean it. All of them. Like, if you use some package to make your font prettier on ggplot, that was work that someone did to make that software so that you can do that thing, right? And you should be giving them credit, at least in some way, at least acknowledging that you used it.
Maybe not a formal citation like in a paper, but a list of packages that you used or mentioning it in your blog. But if you are writing formal papers of any kind, there's no excuse, because here are three lines of code that will just get you copy-paste BibTeX citations for every single package that that project depends on. So no excuse. There it is. Gift to you. And to me.
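The code from the slide is not captured in this transcript. As a rough sketch of the same idea, and not necessarily the speaker's approach, base R can generate BibTeX entries for a list of packages. The package list `pkgs` and the output file name are placeholders chosen for illustration; packages bundled with R generally point back to the citation for R itself, while third-party packages return their own entries.

```r
# Hypothetical package list; replace with the packages your project uses.
pkgs <- c("base", "stats", "utils")

# utils::citation() returns a bibentry object for an installed package;
# utils::toBibtex() renders that object as lines of BibTeX.
bib_lines <- unlist(lapply(pkgs, function(pkg) {
  c(as.character(utils::toBibtex(utils::citation(pkg))), "")
}))

# Write everything to one .bib file (a temporary directory is used here
# only so the sketch does not touch your project folder).
out_file <- file.path(tempdir(), "packages.bib")
writeLines(bib_lines, out_file)
```

The resulting file can be pasted into, or referenced from, a bibliography. Entries may come out without citation keys, so they may need a quick manual pass before use.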
And then, of course, Heather also mentioned money. We unfortunately live in a society and it happens to be a capitalist society where people need to buy food. So these developers are doing things out of the goodness of their heart. But you can help out. And I think open source is cool because you get to use the product before you decide if you want to buy the product, you know? No one's asking you to kind of give up front. But if you see one of these like GitHub donation, people have them on their personal GitHub sites. If you see a buy me a coffee button below a particularly nice app or analysis, if you have used and appreciated that thing, hit that button, even a little bit. Both the gesture means a lot and, of course, the dollars mean things to people.
And if you do have access to kind of company funds of any kind, please fund workshops and professional development. It's really hard to adopt these new tools, right? So the more that private companies can fund them, the more they will have people in the workspace that can do these things. Or at least, you know, support your employees to do things like this. If you're giving your employees at least a little bit of creative time, a lot of the greatest like open source work of our time has come from people at private for profit companies who are allowed to use a bit of their time creating these open source tools.
And then I'm glad Heather covered this in much more detail, but there are these foundations out there that are kind of clearing houses where you can donate larger sums of money. And then these organizations, these nonprofit organizations, help distribute it to projects that are going to help grow our ecosystem.
Making it easy to support your own projects
And then just a word because those of us kind of in this middle zone, we do sometimes develop. I have kind of like one and a half packages that I need to work on. Make it easy to support you. Don't think that this financial support or emotional moral support is only deserved by the people making the massive packages. Like if you've made something, make it easy to help you. So add a citation file if you want the citation to be in any way non-standard.
The kind of default from that code I showed you is just like the info from the description file of the package. But you can set it up so that instead it points to like if you have a paper, maybe in a journal that's related, or if you just want that citation to be different, add that in. If you're a big project, maybe join one of these foundations. We're working on this for data.table. If you're a small project, I think R-universe and rOpenSci are a good place to go just to get your package kind of part of a larger ecosystem.
Not so much for funding, but for, you know, the other types of support and also letting people know about your package. And you know, if you if you were kind of in that position where you're like, I really want to do this thing, but I'm not motivated or I don't have time or I just can't make it happen. I do think these little like donation buttons, you know, put them out there. There's no harm in putting them out there. And if someone appreciates what you did or wants you to do more, they might click it.
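On the citation file mentioned above: in an R package, a custom citation usually lives in a file named `inst/CITATION` that contains one or more `bibentry()` calls. The sketch below builds such an entry and previews its BibTeX form. Every field value is a placeholder invented for illustration, not a real publication.

```r
# Every value below is a placeholder, not a real publication.
# In a package, a bibentry() call like this one would go in inst/CITATION.
paper_citation <- utils::bibentry(
  bibtype = "Article",
  title   = "An Example Paper Describing the Package",
  author  = c(utils::person("First", "Author"),
              utils::person("Second", "Author")),
  journal = "Journal of Placeholder Software",
  year    = "2024"
)

# Preview the BibTeX that users would be able to copy.
print(utils::toBibtex(paper_citation))
```

With a file like this in place, `citation()` for the package should return the custom entry instead of the default built from the DESCRIPTION file.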
Governance and the data.table story
And another really important thing I want to mention if you're on any kind of development project is this idea of a governance document, contributor guide, and code of conduct. The contributor guide is going to make it easier for people to help be part of that development. Most of us don't live in a silo and we could use that help. Code of conduct is a good way to keep your community, you know, feel like a safe place for the whole community. And this governance document has been very powerful.
So this data.table project has a new governance document. It started as a GitHub issue discussion that looks like this. Proposal, structure, discussion between the contributors. And like none of this is official, right? There's no boss. There's Toby who has the grant, but he's not the boss. There's a lot of contributors to data.table, including the creator, Matt Dowle. And this was just a community discussion. So you don't have to be like a big fancy organization to have a governance document that sort of says whose responsibility is what in the maintenance of this project.
And then they did this pull request, you know, where the actual thing, the other one, you know, drafts of it were added. This was all happening on GitHub, right? But it was a real conversation. And I just think that's really, really cool and powerful. And then it was approved. You can see which date it was approved. It was in December of 2023 here. And this is the activity on the data.table repository. So you can see that kind of drought between, you know, I guess it's more like 2021 and present. And then there was the first major release of data.table in four years, right after this governance document was approved. Because everyone understood whose responsibility is it to review the code? Whose responsibility is it to, you know, do the release? All that good stuff really drove the project forward.
Supporting education and sharing knowledge
Next question in the room. How do we help people learn? This has a fairly easy answer. Please support education. I know sometimes it can feel like a big university asking for help is like this. But we really don't have, I'm a professor at a public university, we don't have that many resources for things like adopting and learning new tools. So the more that you can put out there, the better it is. Like my students are finding the stuff you're putting out in the world. I promise they're using it if you put it out there.
And I also think people are like, well, I'm not an expert in that thing. But a resource made by someone actively learning the thing about how they are learning it, like a blog post, is a lot more valuable than like the expert perspective, trying to boil it down to first principles. So your stuff is helping our students.
And then just, you know, the free access to tools, that is really important, too. I'm really grateful to Posit that makes a lot of their tools free. This is a secret they probably don't want me to tell, but they often will make their professional tools available for educators if you're using it in the classroom only. And that's really appreciated.
And a lot of people are kind of hesitant to do this blogging thing just because, you know, oh, well, this is me. This is me talking to me again. I have so many half-finished blog posts, and I'm like, it's not perfect, I can't post it. I want us to get away from that. Like, I really think we should get back to this world that we kind of had in Twitter, where you are just spamming your every idea and every new thing you learn. That was so fun, right? And it didn't have to be perfect, and it didn't have to be polished. I want more mini blogs. I want to hear, like, tonight, what you thought was cool today, what you learned at this conference, because I use those things.
So, a push towards short, imperfect blog posts, nothing against the beautiful, you know, Next Great American Novel style ones, but let's get more out there. And similarly with packages, I need to be better about this. Like, lots of little releases as you add stuff. Don't feel like you have to clear the board on every, like, bug and request to make it worthwhile to, like, update your package.
And this is what I mean by, like, the little stuff. So, this is the good old days on the website formerly known as Twitter, and I just took a little screen grab that should play. There we go, scrolling through my bookmarks. All these are just little things, some of them silly, right? But I've used all of these or most of them in the classroom. This was so cool to log on, and it's starting to be that way. In the last few months, I have felt like there is more out there in the social media sphere than I have time to consume, and that is a really fun problem to have.
A lot of this, what I'm saying now, has been said better in the keynote from 2019 by Dave Robinson, so I really, really recommend this video. That is a link if you went to the slides link. I'll show it again at the end. Basically, anything out in the universe is more valuable than anything not out in the universe. No matter how quality that thing is, if it lives on your computer, it's only helping you. So, I just want to encourage people to throw things out there, and I'm trying to, like, live this truth.
We have a new blog. It's not as new anymore. It has a lot of posts for data.table. It's called The Raft. We have, like, data.table dedicated posts and updates on the grant. And then I just threw this thing out into the sphere recently. It's kind of inspired at the UseR! conference. I'm calling it particles because it's, like, little articles. And my rule about this is if I sit down to write the blog post, I post it, like, I cannot step away and come back to it. So, it is a stream of consciousness. I don't edit it. I don't be like, oh, I could learn another thing and add to it. I just chuck it out there. There's only three right now, but I have ideas, and after this conference, I'll have more.
Finding and investing in community
Last thing I want to ask you with my remaining minute, what communities are you in? How are you investing in them? We're a little more fragmented. Twitter still exists. Just be aware that there's a lot of people not on there. You're missing a big chunk of the community. I'm seeing Mastodon, specifically the Fosstodon server grow. It's really great. Seeing a lot of cool stuff on LinkedIn, which I never thought I'd see. I was anti-LinkedIn as an academic, but it's actually been really nice. And then those conversations do happen on GitHub. There's a lot of dedicated Discords and Slacks for more, like, back and forth active dialogue. So, find the things that work for you.
I'm not going to tell you which one is the best. People keep asking me to do that, and I don't have that answer, but I'm on all of them, except Twitter. So, jump in. And then, not just on the internet, right? This world is important, too. I think Zoom also counts, but I mean the real-time interaction with humans, not the posting text online version. Find your local R user group, R-Ladies group. Go to these conferences or participate online. I think this is also an important thing that we sort of forgot during the pandemic and should remember exists and is fun, and preaching to the choir in this room, but also, hello, people on the internet. These things are important, too.
So, I'm going to make you do this with me, even if it's awkward. I'm doing it. So, I want you to raise your right hand. Put your left hand on, like, your laptop or your favorite hex sticker, whatever is meaningful to you that you have on your body right now. I'm going to say the thing, and then I'll point, and you're going to say the thing. You ready? We're doing it. We're going to make the other room think we're weird. All right. Number one. I will cite all my packages, thank my developers, and contribute what I can.
Number two. I will share my knowledge and encourage new users.
Number three. I will seek out my communities and find ways to get involved that work for me.
Now, this last one, loudest. I will remember to have fun.
Thank you. Yes, I think we live in a wonderful time for R. It is still growing. It is not dead. Twitter didn't kill it. Elon Musk cannot kill R. Let's just do this thing, all right? Let's keep contributing. If you're people like me that are just now leveling up out of beginner, make it happen. Make your silly blog. Make a particle or whatever the thing is. Do that with me, and I will do my best to do it for you, and we'll see what we can build.