Open Source in Clinical Reporting | A Conversation with Ben Arancibia at GSK

Posit's Director of Life Sciences, Phil Bowsher, sat down with GSK's Director of Data Science, Ben Arancibia, to discuss various topics within the open-source clinical reporting space. More about Posit's work in the pharma space: https://posit.co/use-cases/pharma/ Watch GSK's latest web event with Posit: https://www.youtube.com/watch?v=xDrt6txplek

Transcript

This transcript was generated automatically and may contain errors.

Ben, welcome to Posit Conference, thanks so much for coming, it's great to see you as always. Great to see you. You came in for the summit on Sunday, you and some colleagues, awesome, well I'm so glad we got a chance to connect and I really want to highlight the awesome work that you're doing, your team's working on, and just let the community know about the things you're doing.

But I think like, before we jump into that, I'd love to take a few steps back into like how it all started, like how did you get into pharma, what did you do to Bridget and of all things like Shiny too? Yeah, yeah, for sure. So how I got into pharma, it's a weird story. So I'd been a data scientist for a long time, I'd worked in consulting and different things like that, and had lived that consulting lifestyle, you know, fly out Monday morning to some place and then fly home kind of Thursday night.

So I think you and I have similar backgrounds with IBM and everything. And one day GSK, they reached out and they said, look, we're trying to build a data science capability, we'll teach you pharma, you teach us data science, and you'll be part of what's called the ESPRIT program, which is like this executive leadership training. And you do some rotations, and we'll find you a job eventually in the organization. So that was kind of my entrance into pharma.

Like what really appealed to me, why I wanted to work in pharma was, I had a lot of great tools and toolkits with data science, open source, cloud platforms, things like that. But I hadn't found kind of like that passion area, that passion industry. And a lot of my family, so like my mom, her brothers and sisters, her parents, they've all had cancer. At some point, I'm going to have cancer probably. So being able to work in an industry to actually show impact, I don't need to know the science, but I can, you know, work on the pipeline or help people move assets through our pipeline to, you know, actually have impact on those patients, really hit close to home. And that's kind of really what appealed to me and why I wanted to join the pharma org.

Bringing agile practices to GSK

It's such an awesome story of bringing data science into pharma. And the work that you were doing before was GIS. GIS, cybersecurity, yeah, just trying to solve problems. And it's amazing to see the influence that you're having in the community and at GSK on work like agile and better practices around software development. Did you bring that with you into that role from?

Yeah, so when they initially hired me, the idea was for me to really focus on helping them think through like how to build our SCE, our scientific compute environment, so cloud architecture and things like that. And kind of the way it played out, it was, there's actually a big niche for like, how do people actually think about user design? How do you think about actually coming up with values within development teams to interact with users? How do you think about, okay, we built something, but just because you build something doesn't mean it's going to be adopted. So how do you actually think about putting in those support structures, but also at the same time using those support structures as feedback to improve our data products? So that was kind of the niche I found.

So eventually just kind of building out tools, you know, obviously I still spent a lot of time coding up things, working with different people, but at the same time, really leaning on support structures in order to understand what users need. Because at the end of the day, with this big transition that we're making, we have to make our users feel the least amount of friction possible in order to be able to adopt our different open source tools.

And you work with such a rockstar team. I mean, Christina Fillmore and Andy Nicholls and Ellis and Becca. What is that like to have such an amazing crew around you? I'm lucky. I mean, they make the job easy because I think what is really crucial is it's not only the individuals who can go out and talk to people, but it's also who can actually take that user feedback and then turn it into a reality. And I think being able to pair teams together, not only a very solid engineering team, but also a very solid user feedback support team, you know, consulting or voice of the customer, if you will. Being able to merge those two together is what allows us to really make those very solid data products with such a small team. Because at the end of the day, we are not big, but we're small and agile.

The Accelerate R program

Can you talk a little bit about what was the spark for the Accelerate R team and how that came about? Yeah, sure. So Accelerate R is one of my, I guess, things I'm most proud about. I've talked about it here at Posit. I've talked about it in some of the other conferences. But I think the thing that we saw and realized is we were doing tons of workshops, tons of classes on R. And then all of a sudden, no one was using it. And we were trying to figure out for a long time, why is no one using R?

What we realized is a user or an individual within our Biostats organization, they would take a training, they would do a workshop, but they actually wouldn't get the chance to use our open source tools until 12, 18 months later on in their study. And I can't remember what I had for breakfast. No one's going to remember what to do with tidyverse or anything 18 months down the road.

And so the spark was really thinking about not training in the sense of how do we train people but thinking about training and then learning. And I think that's a really important concept because I think training is really easy. You find some documents, you create the documents, you create whatever it is, and people can go and do it. But how does someone learn? How does someone learn while trying to deliver against deadlines? And that is a really important question to think about because we're not university students. We're not college students. We don't have the ability to spend a semester to learn something and then go apply it. We have to learn on the fly.

And starting to think about our user in that different way about how does someone actually learn within an organization is really crucial. And so that's why we started Accelerate R. We go and sit with a clinical study team. It's a very intense eight to ten weeks where we're literally training them and then we're using those eight to ten weeks to learn about what's wrong in our workflow to then build a tool during those eight to ten weeks. So it's not only us supporting people, teaching, having them learn, and then upskilling those capabilities as we make our transition, but it's also giving us the feedback that we need to figure out what are the actual tools instead of us making guesses. And I think that balance is really what's leading to a lot of success for us.

And I think it's such an impactful way to get feedback from your users because the output of that has brought forth some really important packages like tfrmt and slushy, where you identified challenges and issues that your user base had and, instead of trying to find a workaround or something from the community, you said, hey, let's build this and solve it for them. Exactly. And I think the thing that is great is because of our time-boxed engagements, we have to get something out. So like slushy, I think Becca was like, all right, let's do this. And she built it maybe in like a month, because with that time box we don't have long periods of time where we can go away and think about it; we have to deliver it and we have to see the impact. And being able to see the impact quickly and then continuously get that feedback again is another input for how we make really strong data products.

And I think it's something that speaks so much to me because groups will reach out to me to do a workshop and I always tell them, look, I'm happy to do a workshop for you. And I hope that it sparks further learnings, but really what's needed in-house is a competency center or some type of group that manages the change that's happening in the drug development space. Absolutely. And your team just tackled that. And I joke all the time that last year your talk was about the downside of workshops, which is so funny because I do so many workshops, right? But I do always preach the side of it needs to be part of a center that your group manages that you think about how do we take people from the commercial software into the open source side.

Exactly. And that's the other thing that we love about it is we try to eat our own dog food. So as soon as we kind of finish an iteration, we try to get some of those individuals to start contributing to the open source. So like, for example, the team that we did slushy with was an oncology team, and then they started to directly contribute to admiralonco based off some of those lessons and some of those learnings that we had as an Accelerate R iteration. So like being able to not only train, not only upskill, but also figure out a way to inspire individuals to start contributing to the open source. It just makes everything a lot more solid, if you will, based off that.

Open source contributions and the beastt package

And talking about contributing to open source, you contribute to open source, right? I've seen you have a package with Christina for Bayesian analysis. What's that like? Complicated. So, yeah, it's our first stats package, which I think is pretty special for us within GSK. It's focused on Bayesian dynamic borrowing for robust mixture priors. Good luck saying that ten times fast. But it's really focused on some of our cutting edge stats methodology, where we partner with some of the other individuals on our team and basically try to take it out of their brain and translate it into something that, you know, the community can use. And that, you know, it's always super exciting to be able to be part of that. Plus, it has a great name. Yeah. Beast. Yeah. We're thinking maybe a gritty logo or something as our hex icon.

openstatsware and upskilling statisticians

So I feel like one of the things that you often hear in the open source community is that good software development and engineering is a critical part of statistical computing. Sure. There's a group, openstatsware, that you're part of, that really is about this. What motivated you to get involved? Yeah. I mean, I think what really motivated me to get involved was seeing that there's a gap. So you have a lot of our PhD level statisticians that come into the organization and their intelligence is just something I can't comprehend. But watching them fumble around with GitHub, as everyone does when you first start it, was kind of eye opening. And also, a lot of statisticians live in the base R world and they don't have a ton of exposure to tidyverse or packages and things like that, and how it can improve their workflows.

So being involved with that group and some other groups internally at GSK to upskill our statisticians to think about, how do I actually do version control? How do I use it to my benefit? How do I use it in order to be able to track things and share it so I'm not just passing code around in the organization? That has been really crucial for us being able to upskill and it's a partnership. They teach us stats or more sets, but we teach them some of the other tools that make their lives easier. And at the end of the day, that's the goal. We all want to try to do things well and being able to have kind of more tools in our tool belt in order to accomplish that is how we do it.

Tables, tfrmt, and tfrmtbuilder

So we've talked about going from commercial software into open source and learning new tools like Git. It seems like a critical component of this for most of the organizations that I work with is tables. Yeah. GSK has thought a lot about tables. We have. Yeah. So obviously we have our table solution called tfrmt, which Christina wrote, but also supported by Becca Krouse and Ellis Hughes. And then Becca is the current owner and maintainer of that. And we like to think about how to leverage tfrmt obviously in our workflows, but also how do we leverage services around tfrmt to help others?

So obviously if you're a hardcore R user, you know how to use R. Great. Go to the tfrmt package and things like that. But what if you're not an R user, say you're a medical writer or say you're a scientist or something like that? How do I interact with tfrmt? And Becca just gave a talk about it, but we have an open source tool called tfrmtbuilder, which is essentially a Shiny app GUI that allows you to go through step by step and initialize and create a tfrmt table. So it helps guide you along the way. And I think one of the great things about tfrmtbuilder is it gives an entry point for someone that's R curious or open source curious, but doesn't know where to start, and it kind of guides them. So I think being able to have that entry point that's not so overwhelming for someone that wants to dabble is really crucial for how you build your community.

Gen AI, R adoption, and ARDs

I feel like you've done so many awesome things in the group that's managing these new users or existing users that are hitting roadblocks. Are there new things on the horizon that you have thought about incorporating? You hear a lot of times the buzz of Gen AI. Has that come up for you? We've been thinking about it, sure. But I think for us, where we want to focus potentially is how we apply it. And I don't know if we're the right people, honestly. It's so new. I think there are a lot of smart people out there who are going to figure it out. But I think at the end of the day, what we like to focus on is how do we have impact now?

So I think, as I mentioned, one of the things is around R adoption and where the industry is going. So one of the things that we were talking about in the summit on Sunday is how do you know how much R adoption is going on? How many studies are actually using R? So within GSK, we've piloted some ideas using version control, since we have a new policy that everyone within Biostats needs to put all their code in GitHub. So it allows us to answer the question of, at a study level, how much R is being used? And being able to use that kind of information allows us to be better with how we target things and how we support things. And to me, that's kind of like a new frontier, because then we can really start to pick up acceleration.
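As a rough illustration of the study-level question above, here is a minimal sketch of how the share of R versus SAS programs per study could be estimated from repository file listings. It is not GSK's method, which the conversation does not describe; the one-top-level-folder-per-study layout, the extension mapping, and the function name `language_share_by_study` are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath

# Assumed mapping from file extension to language (hypothetical, extend as needed).
LANGUAGE_BY_EXTENSION = {".r": "R", ".rmd": "R", ".sas": "SAS"}


def language_share_by_study(paths):
    """Return, per study, the fraction of recognized program files in each language.

    `paths` is an iterable of strings shaped like "study_id/sub/folder/file.ext".
    The first path component is treated as the study identifier (an assumption).
    """
    counts = defaultdict(Counter)
    for raw in paths:
        path = PurePosixPath(raw)
        if len(path.parts) < 2:
            continue  # no study folder, skip
        language = LANGUAGE_BY_EXTENSION.get(path.suffix.lower())
        if language is not None:
            counts[path.parts[0]][language] += 1

    shares = {}
    for study, counter in counts.items():
        total = sum(counter.values())
        shares[study] = {lang: n / total for lang, n in counter.items()}
    return shares


if __name__ == "__main__":
    # Made-up file listing, for illustration only.
    listing = [
        "study_a/adam/adsl.R",
        "study_a/tlf/t_demog.sas",
        "study_a/README.md",
        "study_b/tlf/t_ae.sas",
    ]
    for study, share in language_share_by_study(listing).items():
        print(study, share)
```

Counting files is a crude proxy. Lines of code or commit activity would give a different picture, and which measure is appropriate depends on how studies are organized in version control.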

I think Gen AI is definitely interesting. We have the problem of reproducibility. How do you answer the question of if I've created some code with a Gen AI product, it's a black box. And some of our consumers, our regulators, might not like that. So I think there's still a lot to think about that we'll figure out over a few years. But it's coming down. We know it's coming.

I heard you mention that critical word of reproducibility, and it's such an important part of what GSK does, the metadata-driven analysis, and touching back to packages like metacore and tools like that. Yeah. It's another set of tools that we thought were really important, that are open source, part of the pharmaverse, things like that. But yeah, we have a ton of things around metadata. And then there's a new paradigm coming out around ARDs, Analysis Results Datasets. And we're all going to it. I think everyone is waiting for the CDISC standard to come out. That way we can then all apply our own standard on the set of standards.

But I think what's exciting about that is we're able to have an end-to-end pipeline using R. And I don't think that was true two years ago. So now we can say, from metadata to table, we can do it. Metadata to figure, put ggplot at the end, we can do it. And I think that's what's really exciting, especially for us this year, is being able to say, as soon as your metadata hits within your clinical trial database, you can start your pipeline. And that's really exciting.

And I feel like GSK and tfrmt were very early to that ARD, ARS format from CDISC. And you're seeing tools now like gtsummary that are incorporating that as well. So it's going to be interesting to see where that takes off.
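To make the ARD idea concrete, here is a minimal sketch of the general shape: results stored in long form, one row per group, variable, and statistic, then pivoted into a display layout as a separate step. The column names (`group`, `variable`, `stat_name`, `stat`), the values, and the helper functions are hypothetical illustrations. They are not the CDISC model and not the interface of tfrmt or gtsummary.

```python
# Hypothetical ARD rows: one row per group, variable, and statistic.
# Column names and values are made up for illustration, not a CDISC-defined schema.
ard = [
    {"group": "Placebo", "variable": "AGE", "stat_name": "n", "stat": 10},
    {"group": "Placebo", "variable": "AGE", "stat_name": "mean", "stat": 61.5},
    {"group": "Treatment", "variable": "AGE", "stat_name": "n", "stat": 12},
    {"group": "Treatment", "variable": "AGE", "stat_name": "mean", "stat": 59.0},
]


def ard_to_wide(rows):
    """Pivot long ARD rows so each (variable, statistic) maps to per-group values."""
    wide = {}
    for row in rows:
        key = (row["variable"], row["stat_name"])
        wide.setdefault(key, {})[row["group"]] = row["stat"]
    return wide


def render(wide, groups):
    """Render the pivoted results as tab-separated text, one column per group."""
    lines = ["\t".join(["variable", "stat", *groups])]
    for (variable, stat_name), cells in wide.items():
        values = [str(cells.get(group, "")) for group in groups]
        lines.append("\t".join([variable, stat_name, *values]))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(ard_to_wide(ard), ["Placebo", "Treatment"]))
```

Keeping the computed results separate from their presentation is what lets the same dataset feed a table, a figure, or a downstream check.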

Python in the drug development space

I feel like there's another component to the drug development space that's creeping in that's quite popular is Python. Has your team thought about how to approach that within the Accelerate R team? We've thought about it. But the current way of thinking, essentially, is we think Python is great, but for those kind of underlying data engineering tasks. At the end of the day, R is a statistical compute software. And we need to do stats. So right now, I think there's going to be, as we start moving into different environments, whether it be cloud environments and data being stored in cloud in Parquet format or XYZ, I think we will start touching some of those other languages. But I think they'll be how we serve it up to our statistical programmers or statisticians who fundamentally want to use R because of the stats power behind it. So we will. I just don't know how much. Not everyone's going to be touching it, but there is a future.

Looking ahead and cross-industry learning

So you've been here for the summit, the workshops, the conference. You're going to go back, hopefully relax. What are you going to do? I'm sure answer a bunch of emails. I'm really excited about a lot of the things around some of the R pharma topics. To me, there was a topic around how do you go from SAS programmer to R that obviously speaks very close to my heart. But there were a lot of things also thinking about how can we use some of the products you all have put out, like usethis for package development. How can we take that and maybe put a wrapper on top to make GitHub easier for a statistician who might not have that background in computer science or how version control works, but they need to use it.

And so to me, what's exciting is being able to take what's out there, put our spin on it and then put it out for others to use. Maybe when we put something out, either another large pharma can take it and use it or a small pharma. And I think that's where I'm kind of curious about where we go is what happens to those medium and small pharmas in this big transition and how do we help them along in the process.

It's a pretty amazing thing and you see this quite often in the ecosystem where tools like admiral were built off of dplyr and gtsummary is built off of gt and you have tfrmt that extends gt. And it's, I think, a really cool way to take that foundational package and start to build the things that we need for the organization that we have. Yeah, for sure. And I think the other thing that's great about it is since we're all leveraging these open source tools and putting them all out, it allows us to kind of standardize and it allows us to really focus on the things that matter. Like at the end of the day for pharma, what we want to focus on is the science. We don't get a competitive advantage from who has a better tabling package or anything. It's something that we like to do because we like to think about how users interact with data. But at the end of the day, we like to focus on science and if this allows people to focus more on the science, then it's a big win.

You know, it's a topic that's come up so much in my interactions with pharma: how do we standardize things? And it seems like one of the lowest hanging fruit in that space is around TLGs and standardizing on the standard reports that they make. There's a new initiative, which was originally called Falcon, has now moved into Cardinal, I believe. Have you seen that or been part of it? A little bit. I've seen where it's going. I think if it works for your organization, use it. I think the beauty of some of our open source communities like ASA openstatsware or pharmaverse is there's a real strong push for everyone to contribute and then you figure out what works for your organization. And I think that's kind of the beauty of these package ecosystems. Your ability to take something that is fully modular and figure out how to plug it into your workflow is what's great about it.

And so to me, I think if Falcon or some of these very, not rigid, but some of these ideas on how it is that we standardize, if that works for you and your company, great. If it doesn't work for you and you want to use something else that's available out there that someone is maintaining, great. And it really depends on what it is that works best for you. I think it's a great story because pharmas are all so different in the way that they process things. And it's basically saying, here's an ecosystem of packages that you can pick off of that are reflective of the processes that you have. Exactly.

You know, one thing I've always thought with GSK is that, you know, Andy Nicholls being critical on the R Validation Hub and the R Validation Hub white paper being such a critical piece of pharma on the open source side, you know, I'm sure that must have helped get things underway for GSK. For sure. I think being able to clearly define what do we mean by validation is the most crucial thing. I mentioned it in my talk a little bit earlier, but one of the things that I remember early back before everything was validated is like, what does code execution mean? Does it mean a .exe file? Like is it literally an executable file or does it just mean you can run the code interactively? And I think being able to figure out and define these are the things we care about, reproducibility, the ability to trace, or traceability, the ability to say, yes, we trust this. We trust the outputs that are going to come from these tools that we're using, and that is really crucial. And being able to take that framework and then apply it internally. So we apply it in a certain way at GSK. Another company can take those ideas and apply it for their QA department, but being able to say, yes, these are the things that we care about, I think is a big step forward for us to really feel comfortable with how we deliver and how we trust what we create.

Well, I have absolutely enjoyed chatting with you. You've got a lot of the conference left. Any talks you're looking forward to attending? The thing I love about coming here is being able to see not only what other pharmas are doing, but what everyone else is doing. And I think one of the things that I learned coming into GSK is we can learn a ton from financial services. We can learn a ton from the United States Geological Survey. Being able to take lessons learned from people's individual organizations and then think about what it is that they did and put our own twist on it is always a great way for us to keep innovating and pushing things forward. If we just stay within our internal pharma land, we're all going to say a lot of similar things. But being able to see that cross-industry problem solving is really crucial for how we continue to innovate.

I think it's fantastic. You get to mix with different people, diverse groups, and different industries here. And hopefully you take a lot back to your team. Hopefully. Yeah. Well, thank you so much. I've had a great time. Enjoy the rest of the conference. And hopefully we'll connect. Any other conferences coming up this year you're going to go to? My hope is to get into the R/Pharma conference. Unfortunately, I won't be at PHUSE EU, but I'll potentially be at PHUSE. And then, you know, wherever things pop up. And Orlando. Oh, of course. Yeah. Orlando in March after a Northeast winter, I'm ready to go there. Sign me up. Exactly. Exactly. All right. Thank you, Ben.