
Tracy Teal - It's not just code: managing an open source project | PyData Seattle 2023
www.pydata.org An open source project starts with ideas and code, but it continues with people. We know that most open source projects rely on just one or two people for most of the work. And code is just a small part of these roles, which also include project management, conflict resolution, decision making in uncertain situations, building an inclusive community, and lots and lots of communication. Whether you’re just starting a project, interested in getting involved in open source, or already have a community of thousands, there are some tips, tricks and templates that you can use to make maintaining an open source project more manageable. PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
And welcome to "It's not just code: managing an open source project." I'd like to introduce the open source program director from Posit, Tracy Teal.
Thank you, I'm super excited to be here today speaking on this topic. You may wonder why this topic is in this session. And that's because I had a surprise opportunity to give this talk today because someone canceled. But this is one of my favorite topics, and I love the panel discussion yesterday. So I'm excited to get to spend a little bit more time on this topic.
And so I'm preaching to the choir in this conference to say that open source software is the roads and bridges of technology. So when I say that, you know, what do we mean? You open up your phone, your social media, your news, your medical records, your bank. They are all using free and public code. And this is a theme that we've been hearing throughout this meeting, especially in the panel yesterday, and this is something that Nadia Eghbal highlighted in a report that she did.
When she was a VC, she was looking for the biggest sort of unfunded opportunities in the tech space and what she encountered was open source. And so she worked on this report to generate ideas about why this was happening and what the structure of open source looked like.
But what we see in open source is this sort of inverse pyramid structure, right, where we have a lot of people who are using it, users, who use an open source solution as is without adapting or modifying it. This is the biggest group of folks out there using open source software. There are integrators, people who integrate one or more open source components to create an application. So they may be doing a little bit of modification, knowing a little more about what's going on under the hood in some of these projects. Then you have contributors, people who provide changes and improvements to the open source software ecosystem, and then finally you have maintainers, and these are the people who develop and control the main source code and manage the project.
So we see that this is not necessarily a very stable structure, the way that it's designed. And I think that a lot of us in this room have experienced this feeling of being somewhere in this pyramid where there is a weight on top of you in terms of the responsibilities that you have to your team and your communities. And a lot of times, too, these are volunteers, right? So the people that are supporting these structures are not necessarily even in positions that are paid.
Tracy's journey into open source management
So, why is this a topic that is so interesting, exciting and important to me? Well, I started in open source software development in bioinformatics. And I wrote some software that mainly I used, but, you know, my team used it too. And I started to write projects that got more use. And I started to encounter like some of the edge cases of some of these things. But I also realized that especially in bioinformatics, one thing that was really limiting the advancement of being able to use genomic data was people with the ability to use data science to answer the questions that were important to them.
So I was a part of founding an organization called The Carpentries, which is a non-profit organization teaching researchers how to do data science and software development. And for me that really marked this transition in this pyramid, where I thought, oh, I'm developing open source software, I'm a part of this pyramid. But when I transitioned to running an organization I really realized what that meant around maintaining free resources and everything that went into it, and that I had zero training in any of it. Alas, my software development skills were not particularly useful in conflict resolution, raising money, or running effective meetings. So I say I tried to get my MBA via Google. Every time I was like, I don't know how to do that, but I bet somebody knows how to do that, so I would Google it.
So that was kind of the beginning of this journey for me. And then I was on the board of NumFOCUS, super excited to be back here at this event. And I saw that this wasn't just a me problem. I wasn't the only one struggling with this. All of the projects under that NumFOCUS umbrella were really struggling with a lot of those same challenges.
So now I am at Posit, I am the open source program director, and so I have the opportunity to work with open source developers every day and talk about how kind of Posit fits into this ecosystem of open source. And I am the chair of the board for PyOpenSci.
So as I said, this journey where I didn't know what I was doing is not my journey alone. There are certainly people who do know what they are doing as they get into this. But in general, open source developers have been trying to build roads and then we are suddenly asking them to maintain and manage the road crews for a city. So where do they get those skills? Is this even something that people are interested in? How do we start to think about this?
So why do we want to talk about it at a meeting like this? And that's because shared challenges are also shared opportunities. Which opportunities? So we are all here together, we have a lot of shared challenges, but this also means we don't need to be alone. These are shared opportunities for us to navigate these waters together and talk about what some of the solutions are. So one thing in particular, we can provide support, training, and resources for open source maintainers together. We don't each need to be doing this on our own.
Five areas to consider when maintaining open source projects
So what does this look like? So I'm going to talk about five different areas that we can think about in maintaining open source projects. And none of this here is a should. These are all questions. These are things that you can consider. Every project is going to be different. If I'm going to say should for me, it's not should for you. So in general, what I'm posing are more questions than answers in this.
But one of the challenges when you're working in an open source project is that you get going on the project and that's sort of where all your energy is. And so you don't have that space to step back and think about why am I doing this? What's important? So just a sense in this room, how many people have contributed to an open source project? And how many people feel like they're in some kind of management or maintenance group? And how many of you have had training in business or conflict resolution?
Yeah, okay, great, perfect. If you all have, well, then you're going to tell me more, please. I'm always learning. So these are things that I only realized far into a project that I needed to consider. And then, again, especially working across all these different open source projects, I realized that those were things that we wanted to be thinking about from the start.
So the very first thing, this maybe seems a little bit strange, but it's identifying your project goals. Because a lot of times when you start a project you are trying to solve a problem that you have. And maybe you write it just for yourself or for your team. And then some people pick it up and they start using it. And you're like, wow, that's cool, that's amazing. And sort of before you know it other people have sort of like decided what this project is about without you. And that's not bad, but you actually do get to decide what the goals of this project are. Just because you start something doesn't mean that you need to maintain it forever. Doesn't mean that you need to implement every feature. So it's worth taking the time to consider your goals.
So the first thing you have to do is take a breath, because, again, this project may have gotten away from you a little bit. So who is this for? Who are you creating this project for? What's the kind of person that you're thinking about this tool is for? How reliable is this going to be? Is this something that people are going to be using in production? Big environments? Or is this something that someone's using in a lab and you can kind of print out a bunch of error messages or share it with some caveats? What's the importance of reliability for this software?
What kinds of input will be important? Not like data input, but what kind of feedback? What do you want to hear about? What do you need to learn as people use this? And then finally, and this is actually really hard, but what's your vision of what will be possible if this software exists? The world and the future will be like X if this software exists. And that seems like, oh, maybe that's a grand statement, but you're doing it for those reasons. What is your vision for this software?
So these are really hard questions. Everything I'm posing here, none of it is a jest. It is all something that you have to spend time with and it takes time. But it's something that is worth taking a day, or go take a walk in a park and take your notebook. It's hard to answer these questions for yourself or with your team when you're in front of your computer.
Managing expectations
All right, the next one is managing expectations. You actually get to set your own boundaries. The challenge in this is you have to create those boundaries, those expectations, and you have to communicate them, because it's about managing the expectations of the people who are using and contributing to this project.
So again, we have five things to consider here, there are certainly more. Stage of the project, like where is it in its development? How do you want people to treat each other in the space of your project? And how do you want them to treat you? How do you want to handle PR submissions and reviews? Templates, this is something that I'm excited about, ways that exist already to help you manage your expectations. And then licensing, something that came up in the panel yesterday.
So stage of the project. I really love this concept in this diagram. This is from the lifecycle package, which Posit is one of the developers of. And so there are other things that you can consider, but this idea is actually really important. Is this an experimental package? Are you going to be contributing to it a lot? Are things going to be changing really quickly? Is it stable? Is it something that people can use and they expect there not to be many changes, so you can rely on it to behave in a certain way? And then, importantly, superseded and deprecated. Again, if you create something, it doesn't need to last forever. One example is I wrote some software in graduate school that worked with 454 genomic data. That package does not need to exist anymore because people don't even generate 454 data anymore from genomic sequencers. That's a pretty easy one. But there are other reasons why a package might not really need to be actively maintained anymore.
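As a rough illustration of the stages described above, here is a minimal Python sketch of how a project might label a function with a lifecycle stage and warn callers when it is not stable. The `Stage` enum, the `lifecycle` decorator, and `parse_454_reads` are hypothetical names invented for this example; this is not the API of the lifecycle package (which is an R package), only one assumed way to express the same idea with Python's standard `warnings` module.

```python
import functools
import warnings
from enum import Enum


class Stage(Enum):
    """Lifecycle stages named in the talk (hypothetical Python mirror)."""
    EXPERIMENTAL = "experimental"
    STABLE = "stable"
    SUPERSEDED = "superseded"
    DEPRECATED = "deprecated"


# Assumed mapping from stage to warning category; stable code stays silent.
_WARNING_FOR = {
    Stage.EXPERIMENTAL: FutureWarning,
    Stage.SUPERSEDED: PendingDeprecationWarning,
    Stage.DEPRECATED: DeprecationWarning,
}


def lifecycle(stage, note=""):
    """Tag a function with a stage and warn callers when it is not stable."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            category = _WARNING_FOR.get(stage)
            if category is not None:
                message = f"{func.__name__} is {stage.value}. {note}".strip()
                # stacklevel=2 points the warning at the caller's line.
                warnings.warn(message, category, stacklevel=2)
            return func(*args, **kwargs)
        # Expose the stage so docs or tooling can read it.
        wrapper.lifecycle_stage = stage
        return wrapper
    return decorate


# Hypothetical stand-in for the retired 454 tool mentioned in the talk.
@lifecycle(Stage.DEPRECATED, note="454 data is no longer generated.")
def parse_454_reads(path):
    return f"parsed {path}"
```

Calling `parse_454_reads("reads.sff")` still returns a result, but it also emits a `DeprecationWarning`, so the expectation is communicated at the point of use rather than only in the documentation.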
How people treat each other. I think this is something that's been really positive in the Python community over the last several years: the expectations of how people behave in code repositories and at conferences. And a lot of this has been around, again, setting those expectations, not just saying, hey, we should be nice, but let's write it down. What do we mean when we say that? What are the consequences for that? How is it that you share that something isn't going well, and being explicit about that being important?
This is a place where templates are great. GitHub actually has code of conduct templates. So when you start a GitHub repository you can select this and add a code of conduct to your repository. I will say just putting a code of conduct in your repository is not enough. You have to think about how you might handle something that comes up. But this is a really important way of writing down what you mean when you say how you want people to treat each other. And this is important for you. How do you want people to treat you?
PR submission and review, contributing guidelines. You see contributing guidelines a lot in GitHub repositories. The timing of this is so awesome because I'm so excited about this. Within Posit, we have an open source reading group and we're reading a series of blog posts about good pull requests and doing good reviews of pull requests. And from this, Davis Vaughan put together these tidyverse tidy team code review principles. And so you can see there are a lot of categories here about what does a good PR look like? What does a good PR review look like?
So this is setting expectations for the team. We think this is already what we do, but we're trying it out. But it's also another great way to externalize what's a good PR to us. What does that look like? What are your expectations of our review? And one thing that I really love that Davis and Lionel worked on together is these patterns of collaboration. So close-knit collaboration, understudy, external contributors. So the key thing here, again, if you're thinking about managing expectations, the first thing to ask is which mode of collaboration are we in? Not every kind of pull request or every kind of thing that you're working on is going to necessarily have the same relationship. But if you can agree, this is where we are, then you have a set of ideas about what the timing is going to be. When will you get back to me on that PR? Will you just edit and merge? Things like that.
And again, it's like oh, well, yeah, we kind of all knew this already, but writing it down really helps us stay true to what we think we should be doing and helps other people understand our process and maybe something they can use if they're building a new project.
Okay, templates. This is my favorite. Whenever you're trying to do something new, it's like somebody's done it before and there might be a template. So always Google whatever you're trying to do and template. So there's a lot of great stuff in this space. There's READMEs, this README project around better documentation. We already talked about the code of conduct, READMEs straight from GitHub now.
Other kinds of things, they're not exactly templates, but are like, what are good package development structures? So the tidyverse has the usethis package, which you use to do package development. So it helps you have good practices in terms of what your package looks like. And then also your packages look like other people's, so people sort of know what to expect. PyOpenSci has been working on this Python open source package development guide. So if you're excited about Python packaging, and like who isn't? I'm serious. This is amazing. Leah Wasser is doing this work, and Juanita is maybe here, she's going to be doing that project for just a little while. They have put so much work into really having community conversations to say, you know, what works? What doesn't work? What's practical? Really incredible work that is really important in these spaces about, again, what does a Python package look like? When you're starting, what does it look like? What can you expect from the packages when you take a look at them?
And licensing. I actually don't have too much to say about licensing. I thought it was an interesting conversation yesterday. Yeah, one thing that was interesting was that "is it really open source?" conversation that was a part of that panel. And open source is defined by a set of licenses. So if it has an open source license, it is open source. And when we talk about governance, we can talk about maybe some distinctions there. But yeah, you have to make your choices on your open source licenses.
Governance
Who is involved in governance conversations in an open source project right now? Okay. I was like, no one wants it. Okay. So this is hard for sure. This is where having your project goals is important. When I said, what kinds of input do you need? Who is this for? What do you want to do? Right? How much time do you have for this project? So there are some models. The BDFL (benevolent dictator for life) model is a pretty popular one. Open governance, I think this is where we kind of got into this conversation of is it really open source? Open source is a license, but there's this open governance idea, where there's a community or a committee or something. Python now has that, but before it was BDFL.
Who's going to make your technical decisions versus your organizational decisions? And who's going to do these things? You might decide on an amazing structure and then be like, cool, and now who's going to do it? And everybody looks at their feet. Then it's not really going to be a structure that's going to work for your team, or you need to think about how are you going to recruit someone who's excited about those things?
So yeah, that's definitely a whole topic. I'd love to have a conversation about governance in particular.
Financial support
So the last one then I think is financial support. So this was again something that came up yesterday. And I think, from some of my experience having transitioned from a developer to leading an organization, this was the recognition that this is really, I mean, in our case it was a non-profit, but it's a non-profit or a business. NumFOCUS takes some of that away by having this project structure where they'll do your back office for you. But we still come back to these questions: how do you make decisions? How do you bring in money? How do you make decisions about how to spend money? And this is again just a really challenging thing. If you've been a developer, you might not have experience in thinking about how you generate revenue and how you spend money.
So that might be a little controversial, maybe you don't think that that's the way it should be, maybe it's not the way it is for you, maybe it's not what you've seen, but I think it's something to consider and appreciate. So then, if we do need to think about money, how much money do we need to meet the project goals? Again, what are you trying to do? How many people do you need? What roles do you need? How much does it cost to pay those people? Or how much does it cost for your infrastructure?
Align how you get your money with your project goals. This is some of the best advice that I heard. If your project goals are something and you create a way to generate money that is not aligned with the goals, you're actually running two businesses. You're running the business that generates the money and then you're running the business that does the other thing. And unless you can find a way to wed them together, the money one often wins in terms of the time, because you have commitments that you've made that you need to honor.
And so when we got a grant from the Moore Foundation, my grant manager, Chris Mentzel, gave us that advice, and I think that has been some of the best that I've heard. It's very easy to not recognize that at the beginning.
Can you receive money? Getting sponsors is great, that helps with some of this. But again, this is actually non-trivial sometimes. How do people actually pay you? And then do people know what they're paying for and how you'll use it? I hear this a lot in open source, I say it a lot myself: we're doing great work, why don't they just give us money? That's not really how things work, unfortunately. But actually not unfortunately. People want to understand what they're paying for and how you'll use it. And so all of these other questions about what's your governance structure, how do you make decisions, all of these things are really important actually for getting money.
Because you can say we need money for one and a half developers and someone to write documentation. If we have this money, this person will do this, this person will do this. You'll see at the end of the year that these features are implemented, we have a maintenance structure, and we have a set of docs that you and your team need. Again, that this is a kind of public good is really important and true. But people need to understand how you're using money and have confidence that you actually know how you're going to be spending money.
Okay, so Nadia Eghbal, as a part of this study, Roads and Bridges, also had this really nice Lemonade Stand, a handy guide to financial support for open source. So it's a GitHub repository, if you Google it. So she talks about a few different models for open source structure. And I'm not saying something's missing, but okay, this is how you bring the money in, and then how you spend it and how you talk about how you spend it is a complementary piece.
Posit's model and community of practice
So Posit. One of the reasons I am super excited to be at Posit and why I came to Posit in the first place is because I think it's a great model for supporting open source software. So we have this concept of a virtuous cycle where open source tools generate demand, they're great tools for data scientists, they're free, but then you often need to transition to working in a corporate setting where you have privacy requirements, or you need to work with teams, and so then you need enterprise tools to enable you to use those open source tools. And so those are the products that we build and sell, and then we use that to invest back in our pro products and our open source products.
And obviously this is hard, this is a big business, there are 300 people. JJ Allaire's going to talk next about Quarto, and he built this business, and this is non-trivial. But I think it hits a lot of these marks of aligning your mission with your ways of generating revenue, thinking about how you bring in income, thinking about governance structure. There are 43 people working full time on open source at Posit, and even more working on other pieces of open source within the company. But then even just within that, there are hundreds of open source products, and we actually don't necessarily govern them all the same way because they have different goals.
So it's a really interesting ecosystem around a lot of these different pieces of what we consider maintenance. And I think one of the things that's particularly great about it too, because we have so many open source developers, is this idea of a community of practice. So you in this room are not alone. There is a community that is working together in open source. And at Posit, we get to experience it every day. That I get to see my colleagues at this conference is amazing. Being on Slack with them, it makes such a difference to have other people working on open source at the same time.
But there are a lot of other organizations out there that help create these structures. The Carpentries, these PyData meetings, PyOpenSci, PyLadies, R-Ladies, NumFOCUS. And it's important to not go it alone, because when you're feeling like, maybe I'm not good at this, it's actually usually not true. It's just that it's hard. And so finding other people that you can share with and learn from is really important in this space.
So we in this room are building roads and bridges, maybe transitioning to running crews, but we don't have to do it by ourselves. We can talk to each other, we can develop resources, we can share them with each other. And so this was a little earlier stage of this talk than I would have planned on, but I'm giving a workshop on managing an open source project at posit::conf(2023) in Chicago in September. And I'll have all the materials online. But the ideas, like some of the things I talked about, templates, decision making, opportunities to practice with conflict, or how you handle a code of conduct, these are skills that we can learn just like anything else. And so we can develop resources to learn those things, as well as learning how to code.
You can hear JJ's talk next on Quarto, it's the best, it's like my favorite thing. I actually gave a talk on it being one of my favorites. Yeah, at RStudioConf, I said these are a few of my favorite things about Quarto presentations. And in that one I sang poorly and I will not do that here. And you can check out PyOpenSci, which is a great place around community practice and Python packaging, great community, developing good resources. So I will stop here. I think there's a few minutes for questions. I'm happy to talk about whatever you want to talk about, open the door on conversations that we can continue later as well. So thank you so much for this opportunity.
