
Uniting the pharma industry with data science | Ross Farrugia | Data Science Hangout

To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We'd love to see you! We were recently joined by Ross Farrugia, Data & Insights Engineering Product Family Lead at Roche/Genentech, to chat about Pharmaverse, open source in pharma, career progression in the industry, and the impact of AI on the pharma industry. In this Hangout, Ross tells us all about the Pharmaverse, an open-source initiative he co-founded, which aims to unite companies in the pharmaceutical industry. Instead of independently repeating similar work, companies co-develop and share the effort for building, maintaining, and testing code, especially for clinical trials and submissions to regulatory bodies. This collaborative approach, primarily built around R but open to other languages like Python, helps accelerate the delivery of drugs to patients and improves the review and approval process by health authorities. The Pharmaverse fosters a strong community, allowing for shared intellectual property (IP) and increasing trust and sustainability of packages, as demonstrated by the Admiral package with 10 co-development companies. The community, which includes nearly 2,000 people and 400 code contributors, democratically decides on package inclusion, connecting individuals across the industry to supercharge development efforts.
Resources mentioned in the video and Zoom chat:
Pharmaverse Website → https://pharmaverse.org/
Pharmaverse Slack Community → Access via https://pharmaverse.org/FAQ
Posit Podcast: The Test Set → https://posit.co/thetestset/
Monthly Workflow Demos (Orbital package) → https://posit.co/workflow-demo/seamless-r-python-model-deployment-with-snowflake-and-orbital/
Book Recommendation: The Situational Leader → https://store.situational.com/collections/books/products/situational-leader-book?utm_source=google&utm_medium=ppc&utm_campaign=CLS+Store&utm_term=the+situational+leader+book&gad_source=1&gad_campaignid=21067538390&gbraid=0AAAAACkXJNBKGqigIGr_i6NfE2yjK6IZr&gclid=CjwKCAjw1ozEBhAdEiwAn9qbzXXYLJFHz9wJSI6iVuEPZ-zO2NcrqjOTlPSOyTfRqWOvgT_zA99pDRoCsNEQAvD_BwE#gad_source_1
Libby's Podcast Recommendation: Change Technically by Cat Hicks & Ashley Juavinett → https://www.changetechnically.fyi/
Phuse (A Global Healthcare Data Science Community) → https://phuse.global/
Ross's Blog on Contributing to Open Source → https://pharmaverse.github.io/blog/posts/2024-03-11_tips_for__first_.../tips_for__first__time__contributors.html

If you didn’t join live, one great discussion you missed from the Zoom chat was about top tips for contributing to open source (though we talked about it quite a bit live!). Ross and community members emphasized starting small, learning basic Git, and contributing not just code but also by commenting on issues, raising bugs, and providing feedback. It was highlighted that helping write documentation, such as vignettes, can be a valuable contribution for newcomers, as fresh eyes can identify gaps that developers might miss. Yes, your fresh, newbie eyes are so valuable!

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Timestamps
00:00 Introduction
03:50 "How is Pharmaverse different from a CRAN task view?"
06:20 "How is Pharmaverse curation handled?"
08:29 "How to convince competing companies to collaborate on open source?"
13:07 "What does career progression look like in the pharma industry?"
16:21 "Is it possible to break into pharma without prior experience?"
18:42 "Are there roles in pharma for open source package development?"
21:50 "Thoughts on the merging of R and Python communities in pharma?"
27:30 "Top tips for contributing to open source?"
31:40 "Is the pharma industry slow to adopt new software?"
35:25 "Focus on randomized control trials versus observational data?"
40:40 "Is clinical trials experience a requirement for entering pharma?"
41:27 "What career advice do you have for us?"
44:13 "Book and podcast recommendations"
46:26 "What data science professional bodies would you recommend joining?"
48:51 "How will AI influence the pharma industry and Pharmaverse?"
52:56 Wrap-up

Jul 25, 2025
53 min


Transcript

This transcript was generated automatically and may contain errors.

Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12 p.m. U.S. Eastern Time. If you are not joining us live, you miss out on the amazing chat that's going on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.

I am so excited to welcome our featured leader for today, Ross Farrugia, Data and Insights Engineering Product Family Lead at Roche slash Genentech. And I will let him explain what that slash Genentech means.

Sure. Hi, Libby. Thanks for having me along. So yeah, the slash Genentech is our kind of U.S. company, but we merged with Roche many years back now. So most of the folks in the U.S. will have known Genentech, but you might not have heard of Roche so much because it's a Swiss company. In the rest of the world, and especially in Europe, Roche is a much bigger name.

So my role involves supporting kind of overseeing a lot of data science tooling in the clinical reporting space. So all of the tooling that we use for analysis about clinical trials. So I sponsor packages in the open source space, such as Admiral, or maybe people have heard of Teal for helping to create Shiny visualizations. And in my spare time, apart from chasing around my young children, I like to play football or soccer as the U.S. folks would call it.

Introducing Pharmaverse

Yeah, absolutely. So if anybody hasn't heard of Pharmaverse, this is essentially an open source effort for the pharmaceutical industry to bring companies together who, in the past, were all doing very similar things in this post-competitive space. So once the drugs have been identified, essentially you have to run the clinical trials, analyze the clinical trials, and build the submission packages that are going to go to the health authorities. We're lucky that we have very harmonized industry level standards from an organization called CDISC. But yeah, we were all still working in silos, doing very similar activity, but each company repeating the same work.

So Pharmaverse was an effort built initially around R, but we're also open to Python and other open source languages. And we've grown to have eight council companies involved. We have 400 code contributors. And in our community, we have almost 2,000 people and many more using our packages. So essentially, bringing companies together so that, instead of repeating the same work independently, we now co-develop and share the build effort, share the maintenance effort, share the testing effort, to build code, which is ultimately going to help to bring all drugs to patients a lot quicker in the future. And it's not just us that can use the solutions. Also, the health authorities are open to use them. So it even helps to improve and accelerate the review and approval process.

Pharmaverse vs. a CRAN task view

Yeah. I mean, hi, Ross. One question I had was like, people might be familiar with a CRAN task view, where basically you're saying, here's a set of packages that are associated with a public or a specific data area. So how is Pharmaverse kind of different from a task view?

Yeah, that's a great question. What we realized, though, is in the early stage of our adoption across our industry, we needed a little bit more. We needed to build a community around this. And actually, what we wanted, instead of having just packages where individuals from solo companies were kind of putting something out there, which, as you know as well as I do, in our industry takes a little bit more for you to adopt and trust. So say Pfizer put out a package, and my company, Roche, we would look at it and think, hey, do we really want to trust that? Do we really want to embed that in our workflows that ultimately are going to go towards submissions? But what about if we can come in and work with Pfizer and we can share the effort, we can become co-license holders and ultimately share that IP, it's a lot more likely then that you're going to take ownership and you're actually going to believe and trust that.

And then it gets to the point where you have things like Admiral now, where we've got 10 co-development companies behind the Admiral package. And then companies that aren't even co-developers, they're more likely to trust it because they're saying, hey, Roche aren't going to pull out now. Or even if they do, there's nine other companies there that are maintaining this. So with something like open source, where I think the fear is always that it's not going to be sustainable. And people are going to put something great out and it'll just be one person behind it. And then six months later, they'll find something new to attract their time. And then it disappears. With a community like Pharmaverse, for a start, if there is such a package, but we really like it, we can find members of the community that can help to maintain it.

But generally, we prefer that from the start, these things are built with numerous companies, numerous individuals, and then you tend to get better solutions because they all bring their individual experience and you build something that's not just fit for one, but potentially fit for all.

Do you have some curators or some kind of people providing a little bit of overview?

Yeah, we used to have working groups dedicated to different areas like data engineering and tables and graphs, visualizations, and they would be the ones that would ultimately kind of decide what goes in the Pharmaverse. And we started off trying to be quite opinionated, right? We didn't want lots of repeat packages doing the same thing, things like that. Actually, over time, what we decided was let's just democratize this across the whole community. We've got like almost 2,000 people in this Slack. Let's let them all have a say. So now if anybody approaches us and they say, hey, I wanted to add my package to Pharmaverse, I'd like to come in and work with you, we open it up to the whole community to say, hey, do you like this? People comment, people say, oh, it would be really nice to fill this gap or to enhance it this way. And then quite often, the maintainer of the package says, okay, come and be a contributor, come work with me.

And yeah, it's kind of nice because a lot of the time what we're doing now is just connecting people across the industry who were going to build that solution anyway. They didn't need us, but we can help them to supercharge it by bringing more people into that work. So it's become lightly curated because as long as the community are happy, we're happy. We do have some inclusion criteria, of course, as well that we try to monitor. But yeah, generally, it's getting to the point where it doesn't need us so much now. Like in the early days, we were the ones kind of banging the drum, and now it's become like a really natural kind of ecosystem and a natural community, which is what we always hoped it would be.

Getting competing companies to collaborate

So I was just curious about, you know, I love this idea of sort of taking competing companies but coming together to solve a common goal. And I'm just curious how the heck you did that. How do you convince people from competing companies to be like, yeah, we'll work on this together?

Absolutely. So when we first spoke to our legal group, actually, and leadership's another thing, company leadership. But we went to legal first because the first thing any company feels is, OK, what's your actual intellectual property? What's the stuff that, you know, is actually kind of making you money and that you don't ever want to open up to others. And we had a really great IP expert lawyer and he's very progressive thinking and very much an open source enthusiast. And the way that he talked us through it is that, OK, what value is there to open source and are you, by open sourcing, by bringing this together with other companies, ultimately going to get more in return than what you invest? And we said, look, even if we do this and only one company across the whole of our industry steps up and gives what we're giving, then we cut our development and maintenance budget by 50 percent. So we just need one.


So, OK, go find that one. Go find someone who can meet you halfway and let's give this a try. Let's make a pitch to our leadership to show that this has potential. We were very lucky in that a colleague from GSK actually was on LinkedIn and was messaging something similar and made a connection with somebody from my team. And it kind of snowballed from there. So we thought, OK, let's prove it. And the first piece that we really looked to do this in was a piece of work where we hadn't actually put pen to paper. We hadn't developed any code at that time. So it was like a no risk strategy, right? Either it doesn't work and we're no worse off than when we started or it works through working together as one team, not like, OK, we'll do half the work over here. You do half the work over there and then we'll throw it over the fence. No, let's work openly together on a public GitHub repo where we're all in it from the start, open from the start.

And we were just amazed. Firstly, that one company really stepped up and met us halfway. But then what we did is we reached out to leaders from across the industry and said, hey, we have this idea. Would you be willing to bring people forward? And you don't have to develop right now, but just test out what we're building. Tell us, is this suitable for your needs? And we asked a number of companies. We were amazed that over 10 companies stepped up and we ended up with like 40 people testing. We had thousands of comments, loads of really useful things that helped us. And this was just one package and it really helped us to go into that design.

And from there, we started to reach out and understand, well, there's already a number of other companies doing similar stuff in other areas of the work that we do that we're also interested in. So once you get one win, you show your leadership, they buy in, they invest in you more, and they start to make more and more connections. And you find the like-minded people. They're going to be in whatever industry. They're going to be people that believe in the greater good and that a rising tide lifts all boats. Sometimes it's just finding them and having that time.

Career progression in pharma

Ross, what does career progression look like in your industry? Where can data scientists go for either leadership or IC roles?

Yeah, that's a great question. And we get this a lot in our industry because it's quite a technical career, you know, data scientists, but there's obviously a lot of kind of people leadership naturally needed. There's a lot of kind of project management tasks as well. So you can go numerous routes. We all tend to start with the technical route. We want to learn the basics. We want to really understand clinical trials and how to do analysis. And then as we progress, you tend to go one of two ways. Either you're progressing toward kind of leadership where you start to kind of oversee teams of people, project manage deliveries, and then maybe go into people line management, which is very different, or maybe go the more technical route where you get more and more advanced technical skills and you take on larger projects. So instead of writing code that, say, is going to be used by a team of five people, you end up writing code that's going to be used in a product for 1000 plus people.

In my personal journey, I went more the leadership route. So I became a line manager in our statistical programming group. I was there for five years, I really, really enjoyed it. I loved working with people, helping them to develop and helping them to make the best of themselves. But actually, it got to a point where we had a reorganization. And I got an opportunity to move sideways into more of a product leadership role and into a very technical group. And I've really enjoyed that as well, because I've stretched myself so much. I had moved away a little bit from the technical side, and getting back in now and seeing how in just those five years, all the tech has advanced so much. And now we've got AI coming and all these great advancements that actually enabled me to get back to some of my earlier passion and get back involved there, whilst also still overseeing teams, still leading product deliveries, but not having the people management side of things.

Breaking into pharma from outside the industry

Is it possible to break into the pharma world if you are a data scientist slash statistician with no experience there?

So on the people side, like one of my proudest things of Pharmaverse is that Pharmaverse is there for the patients. Ultimately, we're doing this to help to speed up all drug submissions, not just the ones from Roche/Genentech. We're trying to help all companies to do submissions quicker and ultimately get treatments to patients all over the world a lot faster. Actually, on the people side though, I've been really like proud to see where folks that aren't in the pharma industry have heard about this, and especially kind of younger generations, like, you know, they really want to see open source thrive and pharma doesn't have the greatest reputation for many reasons.

So it's really nice to see a younger generation being attracted to our industry, and then through the open source collaborations, actually finding opportunities to jump in and learn the skills that will then be able to go on to their resumes and help them to find employment. So it's very nice when you see somebody say, hey, I've been learning R, I've done it in academia, but I have no idea of the standards that we use, the kind of data standards that we use. I have no idea what you mean by clinical reporting, building analysis data sets called ADaM, and, you know, very specific industry terms, but because they're passionate and they know enough about R, they can add some value, they get started with the good first issues, the, you know, the real beginner kind of things, and naturally the teams that they join are then willing to give back to them and to help say, hey, go read, there's open training where you can learn those skills here, and people actually helping them to upskill in those new areas.

Tips for contributing to open source

Yeah, great question. So Libby, I do actually have a blog on this topic on the Pharmaverse blog site, so I'll share that with you later, and it'd be great if we could share that link out with everybody after. Essentially, I see some folks commenting, definitely start small, find something which you're passionate about, or where you have some background experience, don't feel like you have to be an expert, because you learn as you go, and be patient with yourself. I think knowing some basic Git is very essential, so if you're coming at this totally new, you don't know any background, learn Git, because most of the, especially Pharmaverse, so most of the open source work I'm involved in will be on some kind of Git open organization, so understanding Git is the way.

You don't have to give code, as you said, so one really great way to start is just by commenting on issues, by trying out packages, by raising issues, raising bugs, giving feedback, that's a really good way to make connections as well, because often it's the connections you make with the teams that then lead to them wanting to invest in you and help you to develop. Also, when you do come to take your first issue, don't, I made this mistake, I went and took an issue that ended up kind of affecting like five different R scripts that needed a bunch of testing, and it was totally the wrong thing, because I almost got scared off from the start. Luckily, I had really patient people around me, they helped me to understand, I made a bunch of mistakes, and they were okay, like this is part of the learning curve, so they coached me, and they helped me to understand where I'd gone wrong, but if you can start with like a really small issue, so something like adding a few unit test cases, or something like that, where you can help them to beef up their code coverage, or start really small, or the documentation kind of issues, and then build up, build confidence, get to know the team.

One of the best things about kind of open source is that you actually get to meet people from all different backgrounds, all different companies, and you do become like, you feel like an actual team, whereas when you work internally with people, you know, hey, they're my colleagues, I'm meant to work with them. Here you're all putting in your own time, you're all doing this through passion, and it makes such like a nice camaraderie. You're not there because you're all getting paid to do that, and paid to be together, you're there because you have a shared interest, so enjoy the fun of open source as well, and embrace those kind of relationships that you build. Yeah, the biggest thing is just don't be scared, just put yourself out there, and people will be patient, they'll understand that you're on a learning curve, be humble, be open, be transparent about where your gaps are, people will help you to get there, they certainly did with me.

R, Python, and language diversity in pharma

How do you feel about the merging between the R and Python communities when it comes to pharma? I usually see whenever I talk to my pharma folks or anybody in the FDA or anybody over there, they're using either SAS or R. I've kind of seen this trend recently, especially with the Apache Arrow ecosystem with analytics. You're really starting to see the language wars not really exist. It's like open source or proprietary is kind of what I'm starting to see. Do you feel like that's kind of something you see on your side of the fence as well in R-Land?

Yeah, great question. One thing I'd say is I don't like to say I live in R-Land. I like to live in open source land. You can absolutely open source SAS code as well and build collaborations around SAS. So yeah, I'm not here to bash SAS in any way. Where I see a lot of movement in our industry is we used to call ourselves like SAS programmers or statistical programmers, and we used to be very narrowly focused. The beauty of this is just purely making a step outside, thinking outside the box, and actually opening ourselves up to new languages because we know there's going to be new languages advancing all the time. So really when we're calling ourselves more like data scientists now, it's just about kind of opening ourselves to becoming multilingual.

So a lot of companies in our industry aren't actually abandoning SAS. They're just entertaining R and saying, okay, what are the areas like Shiny where it adds real value? And then also, we're looking at things like the Pharmaverse packages where there's a lot of reuse, and it just saves us doing something ourselves and building our suite of SAS macros that we would have had to build and maintain ourselves in the past. So a lot of people are using SAS and R, but you mentioned Python as well. Python, of course, adds value in numerous places like machine learning and others, and sometimes in building certain systems and tools, Python is way beyond R. Maybe not always. Some of the statistical analysis pieces are naturally seeming more advanced in R or easier in R. Some of the visualization pieces in R we're really liking, but actually there is definitely scope, and people in the team that I work with are definitely building tooling in Python.

One thing that I would say though is, people say, hey, yeah, and Pharmaverse is open to Python as well. We've got a couple of Python solutions, but some people say, hey, why are you going so hard into R and you're not going to say, let's all learn R and Python at the same time? For that, you have to kind of understand the culture: the group of people that we're working with have predominantly been SAS programmers for many, many years. Imagine if the way we solved this change was, hey, you're going to become data scientists, learn R and Python in the next six months, and then we'll tell you which one we're going to use for which task. It would have just scared people, right? Because learning one new language, especially if you're kind of older, like I am, I've been around for a long time, and I used SAS for many years, it was daunting for me to think, hey, I need to learn that language. If they'd said, hey, you've got to learn two languages, that's like a completely different selling proposition and it would have affected our change management.

Career advice

Yeah, definitely. So I think learn from the people around you, and learn from the people that you respect, and learn pieces of their behavior, and the way that they do their work, that you want to aspire to replicate. I used to see great presenters, and I would just watch them for enough hours, and pick up little things from the way that they did presentations. I also worked under great leaders where, you know, people would be overseeing kind of hundreds and hundreds of people, but they would still make you feel like they always had time for you. They would always be super approachable, no matter what level of seniority they grew to. I thought, wow, I really want to be that approachable, no matter what level I get to in this business.

Also, I saw people go the other way, and I saw things where their behaviors, I thought, wow, I absolutely don't want to emulate that. There's a lot of egos in leadership, and there's a lot of people where you actually think, wow, I wouldn't want to work with that person. So I always, you know, try to then stay humble, and try to stay approachable, and try to actually know that you don't need to be the cleverest person in the room all the time. Actually appreciate people around you, understand where they're better at things than you are, and learn from them. And especially when you become, like, older, and you kind of go up some kind of career ladder, it's very easy for you to rest on your laurels, and think, hey, I've learned all I need to now, I can survive. Actually, go speak to the people that are coming in, the junior new starters, and they're coming in with things that they studied, which weren't even, you know, existing back when you studied. So they're coming in with all these new great areas that you can really learn from, but be open to that, and don't feel like, you know, you ever need to feel embarrassed about asking kind of questions.

People will respect you more if you show that humility, and that vulnerability. That's how you build human connections, and I think that that's the biggest thing that I'd say, especially coming from, like, a technical background. I think, don't forget that the human connections are the most important thing about everything we do.


The book I recommend is around, like, situational leadership, so anything around situational leadership 2.0 is really nice. So when I first became a people manager, I was like, what kind of leader am I going to be? What kind of manager am I going to be? And then, actually, somebody who is, like, a great kind of mentor of mine, he was kind of coaching me, and he said, you need to be the leader that each individual needs you to be for them. So it's not like one size fits all. You can't just say, hey, this is my management style, and that's going to work on anyone. So I did a lot of reading about situational leadership, and that, for me, really helped. You're going to see that, you know, you can have two people, and they can be at a similar level of experience, but you need to kind of support them and coach them in totally different ways. And I think that's something that really kind of stuck with me.

AI in pharma

Yeah, great question. So I can say like, it's already started to change. I think the majority of people working in data science in pharma now are using AI regularly. I use it daily. So firstly, it can be a great coding assistant. So whether you use something like a copilot and it's there in your computing environment and day-to-day you're interacting with it, or it's just something like Gemini, where it's kind of something you're chatting to and you're having that kind of discussion and that context. I personally prefer the discussion mode because I like to see, usually it's my prompt that's the bad thing. I haven't given it enough information. I've assumed too much knowledge, but quite often with some back and forth, and through my knowledge and my experience, I know the bits where the hallucinations are, and from the right discussion, I can get out some great code that I can reuse or something that can save me a lot of time.

So, but generally in our industry, it's kind of a couple of things. So one is around kind of knowledge management and trying to kind of help that there's so many processes, so many things that you have to learn in a highly regulated industry and the human brain can only absorb so much. So having AI kind of go over the massive amount of standard operating procedures and documentation that we throw at our people, it really helps for kind of chatbots and quick Q&A. Then also there's the coding assistance side. It's definitely helping people to learn new languages and to become more confident in coding.

But the most interesting thing that I'm looking forward to is as we're starting to kind of use these multi-agent approaches and kind of workflow automation combined with AI, because we're finding that now that we've got like this foundation of these amazing tools like the Pharmaverse, a lot of our work you can kind of harmonize and standardize to the point where you can automate, but then there's always like the 20% which is like study specific, protocol specific, and you need that bit of context or understanding or you need to make a choice. So if the AI can make predictions there, well ultimately you automate the bit you can, but then the human is in the loop to ultimately make the final decision, that's the bit that's kind of really taking us forward.

Because yeah, automation kind of works so far, but then it always hits a bottleneck where you say we can't automate that because it's always going to need some human input. But when you combine that automation and AI, the AI can make a prediction if you give it enough context. But as long as the human's there to kind of ultimately decide is that the right prediction or to correct the AI where needed, I think that's really exciting. So I see that it's basically speeding us up. It's an accelerant. There's a lot of people out there who fear it's going to replace us and yeah, naturally, I understand people's fear, but for me I see we could do so much more. And in a field like data science where our demand is only growing, there are so many other areas outside of clinical trial reporting where my company are crying out for data science skills, but we just don't have the resource because we're so focused on delivering our clinical trial submissions. So anything we can do to speed up and to make it easier to do our day-to-day work, it's opening us up to actually solving more fun problems, solving new areas and bringing new areas of value to the business. So I find it a really exciting time.