
What R We Counting? (Ben Arancibia, GSK) | posit::conf(2025)
What R We Counting? Speaker(s): Ben Arancibia Abstract: GSK Biostatistics mandates that all new tools be written using open-source languages and that open-source code achieve parity with proprietary software by the end of 2025. Given this ambitious timeline, someone might ask, "How is the open-source adoption going?" This seemingly simple question involves complexities: What metrics do you track? How do you measure success? How do you show progress? GSK addressed these by leveraging our internal GitHub data, using open-source R packages like {gh}, scheduling our data pipeline on Posit Connect, and generating diverse reports/dashboards. We'll share our journey of transitioning to R and other open-source tools, offering insights on scaling full enterprise open-source adoption. posit::conf(2025) Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
Transcript
This transcript was generated automatically and may contain errors.
All right. Well, thank you. As mentioned, I'm here to talk about what R we counting. I had to do it; being here at posit, I had to do a little pun.
So, at some point in your open source journey, someone is going to come to you with a question. Might be a vice president. It might be a CEO. It might be some other stakeholder. They're going to come and ask you a question. And that question is really, really, really simple. They're going to say, hey, how is open source adoption going? It's a really simple question.
But it is incredibly difficult to answer. And when I see questions like this, one thing pops in my head. And that is Tom Hanks in the movie Apollo 13 with his famous line, Houston, we have a problem.
Now, you might be thinking, Ben, why are you here telling us about this question? And you might also be asking, Ben, why are you wearing Beetlejuice-looking pants? The second one, I just like them. The first question, it's because I lead open source adoption at GSK R&D. This question happened to me. A VP came and asked me, hey, how is open source adoption going? And just like the hypothetical I gave you, Houston, we have a problem flashed in front of my eyes as I fumbled around with holistic measures and things like that, trying to answer how open source adoption was going.
Context: what biostatistics does and why this matters
So before I tell you how we actually solved that problem, how we answered that question, you need to know a couple of things. First, what does a biostatistics organization do? We do two things. One, we write statistical tests to make sure that our drugs are safe. And two, we write statistical tests to determine if our drugs are effective. That is the core of our business. That is what we do.
The other question or the other thing that you need to know is why did our VP ask us this question? Well, we've been on a journey for a long time. Since 2020, we've been putting all the pieces together for R adoption. We've been putting together the platform where we do our R work. We've been putting together the process. How do you do QC in R? We've been doing all the training. How do we move people from SAS to R? How do we actually build them up to be able to use R? We had to get the pipeline on board. It sounds simple, but it's really hard to convince a clinical trial to move and adopt different tools or a different programming language. And then finally, the tools. We had to put the tools in place and surround R and our different platforms and process to make people effective and efficient.
If you're interested in sort of the in-depth of how we did this, there's a link. You can search Posit GSK Enterprise Adoption on YouTube. We go into a really, really in-depth conversation about this. I would check it out.
So we've been putting these pieces in place since 2020. And as a result, in August of 2023, we had two commitments from our leadership. First, all central tools had to be built using open-source languages. And second, 50% of our code needs to be written in an open-source language by the end of 2025. It's now coming up on the end of 2025, so we're approaching the deadline.
Using GitHub APIs to measure adoption
So you have the context now. How do we actually go about answering our vice president's question? Again, similar to Apollo 13, we needed to figure out how to make a square peg fit into a round hole using just the tools that we had in our organization. For those not familiar, pharma is very highly regulated. As a result, we're pretty limited in what tools we can introduce into our environments, because we need to make sure quality is good and we can trust all the different outputs.
So what was the tool that we had to pick? How did we go about doing this? Well, the tool we ended up using is GitHub. But more specifically, the GitHub APIs. So one of the tools that we introduced as part of our pieces was GitHub. And we said all studies must use version control. That was our mandate. So that was how we kind of started to figure out answering this question.
Now, if you've ever been on a GitHub repo page, there's tons and tons and tons of information. The one I care most about is on the bottom right. It's called language statistics. And what it does is it calculates in a repo how much of a certain programming language is actually there in the default branch or the main branch. We care about that because that helps us to be able to say, all right, if all our clinical studies are in GitHub, can we then determine what is our adoption?
So we started to play around with it and we said, great, we love this language statistics measure. Is there an API, or are we going to have to web scrape every single GitHub repo page at GSK and pull this out the hard way? Luckily, there's an API endpoint. Everything that you see on a GitHub page in the browser is accessible via an API endpoint. It's great. Thank you, GitHub. It makes our lives a lot easier.
So we said, great, the API endpoint exists. How do we actually access it? Are we going to have to write some custom scripts? Is there a package available? Well, as is typical in the R community, and hopefully you've experienced this, ask and you shall receive. We realized the {gh} package, which already exists out there, can be used to write these queries. You might have interacted with the gh package when setting up your personal access tokens; it sits underneath usethis, I believe. So it's a great way to get started, and it's a great way to access the API. You can actually write incredibly complex queries with it.
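As a sketch of what that looks like in practice, the call below hits the same languages endpoint with {gh}. The owner/repo names are hypothetical, and the live API call is shown commented out because it needs network access and a configured GITHUB_PAT; the helper that computes the R share runs on any languages list.

```r
# Share of R bytes in a languages list, as returned by the endpoint
# (a named list of language -> bytes of code on the default branch).
r_share <- function(langs) {
  bytes <- unlist(langs)
  total <- sum(bytes)
  if (total == 0) return(NA_real_)
  r <- if ("R" %in% names(bytes)) bytes[["R"]] else 0
  r / total
}

# Live query (hypothetical owner/repo; needs a configured GITHUB_PAT):
#   library(gh)
#   langs <- gh("GET /repos/{owner}/{repo}/languages",
#               owner = "gsk-biostats", repo = "study-001")
#   r_share(langs)

# On a made-up response, mostly SAS with some R:
r_share(list(SAS = 340000, R = 120000))
```

The endpoint counts bytes of code per language on the default branch, which is exactly the number behind the language statistics bar on the repo page.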
Choosing what to measure
So we figured that out. And then what did we start doing? What you would expect. We started to explore and pull all that data through the API and started to pull a lot of different information. It actually became overwhelming. What do you actually start to measure?
So, do you count the number of outputs in a study? Do you count the number of production programs or QC programs using R? Do you count both? Is that double counting? Do you count the memory size of programs? Does that matter to you? Do you count the number of studies using R, no matter how much or how little? Do you care about the growth of R versus other languages? Do you time box it? This is everything that went through our minds as we looked at all the GitHub API metrics available to us. And we really hit the analysis paralysis stage, because it's genuinely hard to figure out what you need to measure to satisfy that vice president's question, even though the question is so simple.
So we decided to sit back and think about it a little bit. We channeled our inner Kevin Bacon, again in Apollo 13, and kept it nice and simple: basically just add, and do some really simple summary statistics. These are the four things that are important to us. One, relative R size: the R code size out of the total programming language footprint, which for our use case mostly means R versus SAS. Two, the total number of repos with any usage of R. Three, repos where R code is above a target percentage. And four, the growth of R. The reason we care about the growth of R is that we have huge legacy code bases, but if we can see R growing faster than other languages, that's good for us, because it tells us that at some point people will be able to eclipse our legacy code. That's the context you have to add when you start to compare relative R size.
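Concretely, those four summary statistics can be computed from per-repo byte counts with nothing fancier than sums. The repo names, byte counts, 50% target, and prior-week figure below are all made up for illustration.

```r
# Toy portfolio: per-repo byte counts pulled from the languages endpoint.
repos <- data.frame(
  repo        = c("study-001", "study-002", "study-003"),
  r_bytes     = c(90000, 0, 40000),
  total_bytes = c(100000, 250000, 200000)
)

# 1. Relative R size across the whole portfolio
relative_r <- sum(repos$r_bytes) / sum(repos$total_bytes)

# 2. Repos with any usage of R at all
repos_any_r <- sum(repos$r_bytes > 0)

# 3. Repos where R is above a target share (50% here)
repos_above_target <- sum(repos$r_bytes / repos$total_bytes > 0.5)

# 4. Growth: this snapshot's R bytes versus a prior snapshot (made up)
prev_r_bytes <- 110000
r_growth <- sum(repos$r_bytes) / prev_r_bytes - 1

c(relative_r = relative_r, any_r = repos_any_r,
  above_target = repos_above_target, growth = r_growth)
```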
The production pipeline
Great. So we took our metrics and we put it in an incredibly fancy dashboard. And you'll get to see what it looks like. Well, sort of. This is our dashboard, but obviously all the metrics removed. I can't show you those metrics. I'm sorry. But just imagine, you put some fancy metrics there and it looks really, really cool.
So we were able to do this as a one-off. How do you actually put this into production so it's updated every single week? This is how we did it. There's a lot going on, so I'm going to walk you through it. Basically we go from GitHub, the source, all the way to our analysis, and it's orchestrated by Posit Connect.
So, first step: we use the gh package to do an API query. We then take our raw data and dump it into Azure Data Lake Storage. We do this once a week, Sunday night, all automated and coordinated by Posit Connect. After that raw data storage job is done, we use some R scripts to do processing and transformation, adding in a little extra information from our internal systems. Once that's done, we dump it back into Azure Data Lake Storage; you want a copy of raw and a copy of transformed. Again, coordinated by Posit Connect. After that job is done, we kick off a job to do data visualization, Quarto dashboards, and Shiny dashboard deployment. So on Monday morning, if our VP cares, he or she can take a look at it and say, all right, how's that R adoption going?
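As a skeleton, the weekly job breaks into those stages, each scheduled in sequence on Posit Connect. The transform step below is runnable on a toy payload; the extract and load steps are shown as comments because they need network access, and the {AzureStor} calls in them are an assumption about how one might talk to Azure Data Lake Storage from R, not GSK's actual code.

```r
# Extract (scheduled Sunday night on Posit Connect), sketched:
#   raw <- lapply(study_repos, function(r) {
#     gh::gh("GET /repos/{owner}/{repo}/languages",
#            owner = "gsk-biostats", repo = r)
#   })
# Dump the raw copy to Azure Data Lake Storage, e.g. with {AzureStor}:
#   cont <- AzureStor::storage_container(endpoint, "raw")
#   AzureStor::storage_save_rds(raw, cont, "languages.rds")

# Transform: raw languages lists -> one tidy row per repo, ready for
# the Quarto/Shiny dashboards deployed by the next scheduled job.
transform_langs <- function(raw) {
  do.call(rbind, lapply(names(raw), function(repo) {
    bytes <- unlist(raw[[repo]])
    data.frame(
      repo        = repo,
      r_bytes     = if ("R" %in% names(bytes)) bytes[["R"]] else 0,
      total_bytes = sum(bytes)
    )
  }))
}

# Made-up raw payload with two study repos:
raw <- list(
  "study-001" = list(R = 90000, SAS = 10000),
  "study-002" = list(SAS = 250000)
)
transform_langs(raw)
```

Keeping the raw dump separate from the transformed copy means the transform can be rerun or corrected later without hitting the API again.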
We also do a little predictive analytics and ML. That's more in the experimental phase and a lot smaller than the rest of what we do, but we like to throw it in there so we can say we're data scientists. Again, it's all coordinated by Posit Connect, which is a really great feature.
Expanding beyond adoption metrics
So Kevin Bacon appears again. We started working with our VP, showing what we could do in terms of answering that question, how is open source adoption going, as well as all the automation we put into place. And our VP loved it. Oh, man, that's incredible. But we became victims of our own success, because our VP went from Kevin Bacon to Ed Harris. He or she no longer really cared about what we had put together and seeing the current state. They cared about what more we could actually do with it.
Something I didn't tell you in the beginning, I'll tell you now. We pulled all the R metrics and language statistics information, but we also pulled a ton of other data: things like commits, pull requests, repo life, repo pull request reviewers, irregular commits, unmerged pull requests, and package names. We did that because we wanted to experiment with some other dashboards and answer, preemptively, other questions that might come up. So there are two other projects that came out of this. They're a bit more in the innovation and experimental phase, so less robust than the open source adoption work, but I'll tell you about them now.
One project that I'm very proud of, that I think is very cool, is GitHub health. If you have ever tried to get people to use GitHub in an organization, it is incredibly difficult. It's a steep learning curve and it's tough. The key thing about support is how you go from reactive support to proactive support. So the question we tried to solve is: can you proactively find studies that are struggling with GitHub and support them? And we can. There are a lot of different metrics you can pull, like the number of merge conflicts that haven't been resolved, the number of commits, the number of branches, things like that, that indicate a study is struggling. And you can proactively reach out and say, hey, do you need some help? They're very appreciative of that. So that's something I think is very cool, something I'm very proud of.
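A toy version of that proactive flag might look like the following. The signals are the ones mentioned above (unresolved merge conflicts, commit counts, branch counts), but the thresholds and numbers are invented for illustration; they are not GSK's actual rules.

```r
# Made-up per-study GitHub activity signals.
health <- data.frame(
  repo                 = c("study-A", "study-B", "study-C"),
  unresolved_conflicts = c(0, 4, 1),
  commits_last_30d     = c(25, 2, 14),
  stale_branches       = c(1, 7, 2)
)

# Flag a study for proactive outreach if any signal looks unhealthy
# (illustrative thresholds).
health$needs_outreach <-
  health$unresolved_conflicts >= 3 |
  health$commits_last_30d < 5 |
  health$stale_branches >= 5

# Which studies should the support team reach out to?
health$repo[health$needs_outreach]
```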
The other thing is the tool catalog. Another question a VP is going to ask you is around finance: return on investment. It is a dreaded question, but it's one you need to be prepared to answer. We spend a lot of time building tools; do our users actually use them? That question is going to come up. So what you'll have to answer is which tools are actually being used, and what's our return on investment for either internal tools or open source tools. Through the GitHub API and some regex, you can pull package names out of scripts pretty easily and get an idea of what is actually being used. For example, one of the things we use a lot at GSK is the admiral package. You could argue we should invest more time in the admiral package because we use it everywhere. This type of dashboarded data lets us answer those really important finance questions, because they're going to come for you at some point during that open source conversation.
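As a sketch of that package-name extraction: the regex below pulls library(), require(), and pkg:: references out of script text, as one might do file-by-file on content fetched via the GitHub API. The script lines, including the admiral call, are made-up examples, not code from a GSK repo.

```r
# Toy script contents.
script <- c(
  "library(admiral)",
  "require(dplyr)",
  "adsl <- admiral::derive_vars_merged(adsl, dataset_add = ex)"
)

# Match library(pkg), require(pkg), and pkg:: usages. Package names
# may contain only letters, digits, and dots.
pattern <- "(?:library|require)\\(([A-Za-z][A-Za-z0-9.]*)\\)|([A-Za-z][A-Za-z0-9.]*)::"
hits <- regmatches(script, gregexpr(pattern, script, perl = TRUE))

# Strip the wrappers, leaving just the unique package names.
pkgs <- unique(gsub("library\\(|require\\(|\\)|::", "", unlist(hits)))
pkgs
```

Tallying these names across every study repo is what turns "do our users actually use it?" into a number.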
So one of the things I think is really cool about this work through the GitHub API is that we were able to move our stakeholder, our VP, from "hey, how is open source adoption going?" to "I know exactly how adoption is going." And that feels really good. Probably not as good as mission control felt when Apollo 13 landed back on Earth, but close. And one thing I want to highlight here: in mission control, there are a lot of people. There's a team. Just like that, at GSK, we have a team that worked on this, and I want to thank them. Becca, who's in the audience. Hi, Becca. Alana, Aladri, Hamza, and Hashir. It's really crucial that we have a team to be able to do this, so I really want to highlight that. And that's it. I'm happy to take any questions you might have about open source adoption, or if you want to connect on LinkedIn, go for it. I feel like you have to put that on there. But, yeah, I'm an open book. So thank you.
Q&A
Okay. Thank you, Ben. Great presentation. We have a few questions that have come in from the audience. So the first one is from Scott. And Scott asks, how did you come up with the two specific goals, all open source central tools and 50% code written in open source?
The way we came up with those commitments is that there's a big transition in pharma toward open source. We prefer it in terms of the workforce coming out of schools, especially stats programs, and we think it provides us more innovation. And honestly, some of those proprietary tools are really expensive. So we made those commitments for financial as well as, I guess you'd call them, holistic reasons.
I think choosing targets is hard, but why 50%?
It sounded good at the time. We had 50% as sort of like the base, and then we have a stretch of 70%. We were just like, that sounds good. We'll put our finger in the wind and see how it goes over two years. So the beauty about goals is you can always change them later on if it's not going well or if it's going really well.
Okay. The next one is from anonymous. What are the main reasons practitioners resisted adopting open source?
What is the main reason? Well, it's hard. If you've been coding in a language for 20 plus years and you're told, hey, you can't use this tool anymore, you have to think about data in a different way, you have to stop using macros and start thinking about packages. It's hard, and I don't think we should overlook how hard it is for people. I think we have to have a lot of empathy. Even if you've only been doing your job for five years, it's hard to change, especially with really tight deadlines. In the clinical trial projects, we have really, really insane deadlines. So it's just hard. I think we have to have a lot of empathy.
Okay. This is an interesting question from anonymous again.
Did closed source vendors try to push back when they saw how things were going?
Of course. Depends on who in the closed source vendors saw it, whether it's sales, technical, whatever. But, yeah, of course. I mean, it's a business, you know? Like, that's just how it is. But I think you just have to have backing from your leadership, and you have to have the desire to want to have those tough conversations. I would say if you're not willing to have that tough conversation, maybe you are a little bit too early in your open source adoption journey. But you're going to have to have it at some point. But I feel like that could be a totally different talk.
Okay. Let's do one more question. This is from Raphael. Was there already a culture of versioning code with GitHub before R adoption? If not, are you measuring all the old/new SAS and new R code that exists elsewhere?
I mean, you could argue maybe there's not a culture now. It's tough. Let me be very frank. GitHub is really, really difficult, because you are taking people from a paradigm where they save their code and don't want to share it unless it's 100% totally done. With GitHub, obviously, we want to commit frequently, almost like an autosave function. So it's very, very difficult to get people to make that transition. Before GitHub, we did everything on file shares; think of OneDrive, essentially, for studies. I would say our version control is very good now, in the sense that people are pushing. It's not perfect; we still have a lot of bumps along the way. But that's just how it is, I think, when you need to teach a big organization GitHub.
Okay. Well, thank you, Ben. Really appreciate it. One more round of applause.
