
Running a polyglot data science community (Melissa Van Bussel, StatCan) | posit::conf(2025)
Speaker(s): Melissa Van Bussel
Abstract: Running a successful data science community is CHALLENGING! And it's even more challenging when you have a very large group of people, especially if they use different programming languages or are at different steps in their data journeys. It can be done, though, and in my talk I'll share the strategies that we've been using at Statistics Canada to keep the 1000+ members of our R and Python User Group interested and involved. I'll also talk about why you don't have to limit the target audience of your data science community to just one group: creating polyglot communities that embrace multiple programming languages is a great way to ensure the long-term success and relevance of these groups.
Transcript
This transcript was generated automatically and may contain errors.
I am so excited to be here. This is my fourth year at posit::conf, and I've been looking forward to this event for months.
For me, one of the most magical places on earth is posit::conf. Just like at Disney World, I get to meet my favorite celebrities, take home awesome souvenirs, and take photos in front of iconic landmarks.
And even though we can't be in this magical place every day, what we can do is create a community back at home that feels just as magical.
My name is Melissa Van Bussel. I'm a senior analyst at Statistics Canada, and I help run a large polyglot data science community at my workplace called the R and Python User Group.
Our group is pretty magical, and today I want to share with you some of the things that make it so magical, and I also want to share how you can create your own magical polyglot community.
About the R and Python User Group
The R and Python User Group at Statistics Canada has over 1,000 members, all of whom have different levels of experience with programming, and who all use different programming languages.
In our group, we really hold the philosophy that the community aspect is far more important than the programming language that someone uses. In the same way that Disney World isn't about Mickey Mouse, your data science community doesn't have to be about one specific language.
And in fact, part of what makes going to Disney World so magical is the fact that you can see all of your favorite characters all in one place. That's why the name of our group has both R and Python in it, so that our members don't feel like they have to choose between one or the other.
Don't be too misled by this name, though, because even though it's only got two languages in the title, we do tend to be the go-to for all things technical at our workplace. It's just that the acronym RPUG is a lot nicer to say than whatever this one would be.
By creating a group that embraces multiple programming languages, you're ensuring that your group is going to stay relevant, regardless of which language or tool ends up becoming the most popular over time. And if you want to create your own polyglot community, it turns out that you don't need a magic wand to do it.
I'm going to share a few of the strategies that have been working well for us at Statistics Canada, and hopefully that'll help spark some ideas for your own group as well.
Building a polyglot identity
If you ask somebody to introduce themselves, especially at this conference, you'll notice that people's introductions usually include the programming language that they use. Someone might describe themselves as an R person or as a Python person in the same sentence where they tell you what their name is and where they work.
For a lot of us, the programming language that we use can feel very core to our identity.
And so if you want to create a successful polyglot group, my recommendation is to create branding that people can identify with and associate with your group. More specifically, you want to make sure that the branding isn't overly specific to one language.
For the R and Python User Group, our branding is centered around our group's acronym, which is RPUG. And in 2021, one of our members, Thomas Wood, created a logo for the group, which is a pirate pug.
This logo was created entirely using code, so it's completely reproducible and also pretty easy to modify, and it wasn't long before other members of the group started creating spin-offs of the original logo.
As of today, we've got over 30 different versions of this logo, the stickers of which you can find all over the place at the StatCan office in Ottawa.
One of our other members, Alex McSween, who's actually here at Conf, created a 2025 RPUG calendar where each month of the year features a different pug based on that month's theme. Over the last year, we've also hosted a couple of merch orders that have been centered around these logos.
A brand isn't just a brand, and a logo isn't just a logo. It's something that helps your members feel more connected to something bigger than just the language that they use, but more importantly, it also helps them feel connected to your group.
Creating something for everyone
Once your group has its own identity, the next step is to make sure that there's something for everyone. Because a polyglot group is going to have a wide variety of people, sort of by definition, you wanna create a multi-park experience where every person in the group feels like there's something that's for them.
Disney World is itself a multi-park experience. Epcot is the theme park that's the most focused on learning and exploration. And in RPUG, we've got our own little version of Epcot that gives our members a number of ways to learn new skills related to open source.
For example, we have a monthly meeting where we share updates about what's new in the world of open source and at Statistics Canada, followed by a presentation or a workshop from a guest speaker.
We try to make sure that these presentations aren't super specific to one language by choosing topics that are broadly applicable to anybody who works with data.
Over the last year, some of the topics that we've had have included a showcase of internally developed R and Python packages, a roundup of success stories from working with the Parquet file format, and a presentation from the Statistics Canada library showing our members how they can access library resources to learn about open source topics.
We've also got an internal website that has guides and tutorials for working with R and Python in a way that's specific to our workplace's technical infrastructure. And we've got a webpage that outlines project templates and packages that have been developed by the members of our community.
Hollywood Studios is the theme park that's all about going behind the scenes and seeing the hidden work that goes into making the published product before it ever ends up on the big screen. For us, that means giving people a safe space to try things out, to make mistakes, to learn, and to ask questions.
We host low-stakes code challenges where the prompt can either be solved in just a couple of lines of code or in a lot more, depending on how creative the person wants to be with their solution. And participants are encouraged to have fun and be silly with it so that it doesn't feel like work.
You can use any language that you want for these challenges, and this tends to be a space where people will often use a language that they're less comfortable in.
We've also got a number of different ways that our members can ask for help with programming or other technical questions. We've got a very large chat on Microsoft Teams where people ask questions casually throughout the day, but we also have nine volunteers that offer drop-in office hours on a regular basis.
Because the group is polyglot, each office hour listing clearly outlines which topics that volunteer can answer questions about. It also points users to the full list of office hours in case they need help figuring out whom to reach out to for a specific topic.
Fun events and Magic Kingdom moments
If you've ever been to Disney World, then your favorite theme park is probably Magic Kingdom, which is a place all about having fun just for fun's sake. Over the last year, RPUG has had quite a few events that have had absolutely nothing to do with programming and have really just been an opportunity for like-minded people to hang out together and do something fun outside of work hours.
We've had a couple of trivia nights, and recently we had an event called Beads and Brews where we got together for dinner after work and then people could make friendship bracelets or keychains or whatever else they wanted to make.
This year, we're also launching an RPUG yearbook just for fun where we can put all of the pictures from this year's events. And we'll also have a section for RPUG awards so that we can recognize the contributions of our members.
Now a lot of the stuff that I've just talked about might seem kind of silly, especially in the context of a group that in theory is supposed to be focused on data science, but having these fun activities is actually really important for keeping members engaged with your group.
And if your group is polyglot, then I'd argue that this kind of thing is even more important because it's these types of events and activities that are going to give your members a chance to connect and figure out what stuff they do have in common. It's a lot harder for somebody to make a new friend during a programming workshop than it is for them to make a new friend while they're doing something fun and casual.
And the more that the members of your group are friends with each other, the more they're going to want to participate in your group's activities as a result.
Letting members be part of the story
Another way that you can increase participation in your group is by taking a page out of Disney World's book. When you go to Disney World, you don't just feel like you're watching a bunch of actors. You feel like you're a character that actually belongs in the story. And you want to create this same kind of feeling within your group.
One of the easiest ways that you can do this is by letting every member of the group be as involved as they want to be with the organization and the planning of the group. And if your group is polyglot, again, this is going to be even more important because nobody is an expert at everything. So you want to make sure you've got a wide range of voices so that each member of the group feels represented and welcome, regardless of which languages or tools they use.
In RPUG, we have the philosophy that anybody who wants to contribute can. There are a few different ways that we do this, and one of them is the RPUG Weekly Newsletter. Anybody at Statistics Canada can publish something in this newsletter. You actually don't even have to be a part of the group to submit an article to it.
What we get as a result is a weekly, community-curated collection of upcoming events, programming tips, and so on. We also have an internal RPUG website, which anyone can contribute to by submitting a merge request on GitLab. I took these numbers directly from the repository, and in the last four months alone, we've had 33 contributors across almost 2,000 commits.
This group is grassroots, it's really by users and for users, which means that the people who are organizing the group are also the same people who actually use the tools on a regular basis, rather than people from senior management, for example.
We try really hard to keep things casual. We try to avoid getting too bogged down with bureaucracy. But with that being said, of course, there is still a lot of work to be done.
To be honest, it would be basically impossible to have a group of 1,000 people and not have some type of structure. It definitely takes a village to keep a group this large running smoothly.
Part of that structure is our committee, which is open to anyone, although it does tend to be the more advanced programmers who join it. Personally, I really love being part of this committee. We meet every two weeks, and in addition to planning some of the fun stuff that I've talked about today, this is also the group that discusses the more technical topics. In particular, we talk a lot about how open-source tools are being used and supported in our workplace.
We've found that this model where anybody can join and everyone can be as involved as they want to be has worked pretty well for us.
Why you should try the polyglot approach
If you're in this room right now, it's probably because you're somebody who cares deeply about building and sustaining data communities. And if that's you, then there's probably a good chance that one of the following scenarios applies to you.
Maybe you're already part of an amazing group that's thriving. Maybe you're just starting out, and you're wondering how to get your new group off the ground. Or maybe you're in the third camp, and you're part of a group that might be struggling a bit to stay afloat.
Maybe you're in a smaller city or a smaller workplace, and so the pool of people who could participate in your group is just a little bit smaller.
If the second or third one applies to you, then I would really encourage you to give the polyglot approach a chance. By doing so, you're multiplying your potential members, you're multiplying your potential impact, and you might also learn a new programming language along the way.
At the end of the day, it shouldn't be about R versus Python. It should be about creating a space where everybody feels welcome and is excited to learn from each other.
So regardless of which of these scenarios applies to you, I hope that my talk gave you some ideas about how you can create a community that feels like the Disney World of data science.
If you're interested in creating your own polyglot community and you wanna connect to chat about it, I'm happy to take questions in just a second, or you can find my information up there and we can chat later. And if you're somebody who likes to collect hex stickers at conf, and you want your very own RPUG sticker, come say hi afterwards and I can give you one. Thanks.
Q&A
Thank you so much. We have lots of questions here. Several of them are about how you got the RPUG group started: how you went about asking people to join, and how you gathered the first group of people to start the group.
Yeah, great question. I'm probably not the best person to ask about that, because the group started in 2017 and I didn't work at Statistics Canada back then, so I wasn't there for the early days of the group. I joined StatCan in 2021, and by that point, there were already a few hundred people in the group.
Okay, but if you were to generalize that and give some advice about someone starting a group from scratch, would you be able to offer any things you've learned so far about polyglot groups to help someone get one off the ground?
Yeah, good question. I think the biggest tip I would have relates to the section of my talk about doing things that are fun. I think the mistake a lot of groups make is that they only focus on the technical side of things, so it feels like the group is just something people take part in because their career is related to data.
But if you want people to feel truly connected, I think having the fun stuff as well is really important. It also helps to recognize that not every person is going to want to participate in everything. Some people will only want to do the stuff that's productive in the traditional sense, like workshops, and other people will only want to do the fun stuff.
Really good advice, thank you. Very important question, which is your favorite RPUG variant sticker?
Ah, okay, that's a great question, and I have to be careful about my answer because Alex has made a lot of them and she's sitting in the third row here. My favorite one is probably the reindeer one, Rudolph the Red-Nosed Pug. There was a competition this year where people voted for their favorite ones, and I think that one was the winner?
Do you have trouble sharing demos between teammates for those teams with sensitive data that can't be shared with other teams?
Yeah, good question. Almost all of our meetings are for internal people only, so even if someone can't share the specific data they're working on, they can probably share the code that's on GitLab. We can't put any data on there anyway, so everything on GitLab already has the data abstracted away from the code. So I wouldn't say we have too much trouble with that.
Is the group all in Ottawa or are some remote? And if the latter, how do you keep the remote people engaged particularly with fun events?
Yeah, that's a great question. No, not everybody is in Ottawa. We have offices in a bunch of major cities across Canada. Some of our events are in-person only, like the trivia nights or Beads and Brews; our monthly meetings are hybrid; and some events are virtual only, like our RPUG virtual coffee hour, which is entirely online.
But I would say that's probably an area where we need to work on things a bit more. I think it's the Ottawa people who are most excited about the group. The other thing we've done is that a few times we've sent stickers or other merch to offices outside of Ottawa.
