
The Need for Speed - AccelerateR-ing R Adoption in GSK - posit::conf(2023)
Presented by Ben Arancibia. How does a risk-averse pharma biostatistics organization with 900+ people switch from proprietary software to R and other open-source tools for delivering clinical trial submissions? First slowly, then all at once. GSK started the transition to using R for its clinical trial data analysis in 2020 and now uses R for its regulatory-reviewed outputs. The AccelerateR Team, an agile pod of R experts and data scientists, rotates through GSK Biostatistics study teams, sitting side by side with them to answer questions and mentor during this transition. We will share our experience from AccelerateR and how other organizations can use our learnings to scale R from pilots to full enterprise adoption and contribute to open-source industry R packages. Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference. Talk Track: Pharma. Session Code: TALK-1068
Transcript
This transcript was generated automatically and may contain errors.
Hi everyone, my name is Ben Arancibia, as you can see. I'm here to talk about accelerating R adoption in GSK, but really what I want to talk to you about is learning.
So learning, it's really, really, really hard. Everyone knows that. Within GSK, I think we're supposed to spend 20% of our time learning and developing. I don't know if we're able to do that. It's difficult. And it's really hard to take what you learn and apply it to your job when you're working through books or articles, things like that.
So we're faced with this question: how do organizations enable people to learn by doing, especially with these deadlines? As you can probably see, I have Top Gun here, and I promise I'm going to land this plane, but Top Gun is going to run throughout the presentation, and I'm going to relate the two.
So you might be thinking, why is this bozo here able to talk about learning? Within GSK, I'm a director of data science. I lead our R enablement, which means I work a lot on training, enabling people, building tools, things like that. The most important thing is that I'm really passionate about teaching and learning. Right now, I'm reading a book about learning how to draw, which is a very humbling experience if you've ever tried to learn to draw as an adult. Very humbling, but totally worthwhile.
GSK Biostats and the move to R
All right, a little context. What does a biostats organization do? At the core, we do two things. One, we write statistical tests to determine whether a drug or an asset is safe. Two, we determine whether it is effective. That's it. That is what we do. We move things through the pipeline, and we write the tests that determine that.
So, a huge change. In 2020, GSK Biostats decided to make R a primary language for statistical reporting. That is a huge deal for us. If you attend any pharma conversation or presentation, you'll hear that this is a big deal for us as an industry as we move away from some other tools.
So we made this decision. We moved away from some proprietary tools. So what did we do? We built a center of excellence. Everyone builds a center of excellence. That's kind of the thing to do. You've got to do it. One of the things within that center of excellence was to build a training program. So that's what we did. And after a lot of failures, we finally figured out one that worked. Today, we call it AccelerateR. Very clever name. You had to put R in it, right?
Why classroom training failed
So when we started training people, when we started moving into R, what did we do? We started doing in-person training classes and virtual training classes. Why? It's what we did last time. It worked really well. It was perfect. And as in Top Gun, they also had training in the classroom.
So we said, all right, let's just do what we did last time. Let's do in-person training and virtual classes. We'll create some documentation. We'll tailor some of it for our study teams. We'll figure out what it is that they want and what it is that they need. And guess what? They'll be able to do it, no problem.
It did not work. GSK Biostats has about 900 people, and about 80% of them went through that training. But we just were not getting R adoption. People weren't actually able to apply it to their clinical studies. So it wasn't working.
Classroom training didn't work for Tom Cruise either. So what did we do? We looked into the distance, like handsome Tom Cruise, or Maverick, and we reflected: what was wrong? Why was adoption so hard? Everyone had gone through all these trainings. Everyone was doing a lot of book clubs. Everyone was attending virtual meetups, things like that. Why were people not able to apply it to their clinical studies?
This was our big insight. In clinical trials, projects have huge lead times. You will be on projects for years because of the nature of drug development. So one of the things we saw frequently is that someone would take an R training class and then wait 12 to 18 months before applying it on a study. And guess what? They'd forgotten everything. I can't remember what I had for breakfast yesterday. How are you going to remember what you learned in a training 12 months ago?
So they forgot everything, and they had to relearn it all on the fly. It just wasn't sustainable. People were not willing to do it.
The other problem with our R training was that people were learning a huge number of new things at once. When we say R training, we also mean GitHub, packages, and a new way to think about data, because how you interact with data in R is very different from how you might interact with it in other, proprietary languages. It's incredibly overwhelming. So it's not just, oh, I need to learn R, let me call library(tidyverse). I actually need to figure out how to install R, and what the package equivalent of a macro library is. Things like that. It's too much.
The AccelerateR approach
We had this insight, so we proposed a change to our leadership team. No more classes, period. No more training classes. We're going to move all of our training into on-demand training and documentation. Why should I write a tidyverse training when someone else has done it a lot better than I ever could? I'm not going to do it. I'm just going to link out to it and use that.
So what did we decide to do? Instead of spending all that time building out training courses, we created a small, agile team called AccelerateR, and that team sits side by side with a clinical study team, solving their problems with them and training them at the same time. So instead of building documentation and virtual trainings, you actually spend the time with your study team. That's what we did. We focus on support and mentoring.
So this is how we do it. This slide is really overwhelming, I fully admit it, but I'm going to take you through the process step by step: how AccelerateR actually works to train people and sit with them.
First off, once we've identified a clinical study that wants to work with the AccelerateR team, this small team of R experts, we say, great, can you actually access R within our organization? You cannot imagine the number of times people have said, I'm ready to use R, and you say, great, and then their first question is, how do I access it? What system? So the first thing is making sure they actually have access to the system. It's a silly thing, I realize, but it's a huge hurdle for a lot of people when they start using a new piece of technology within a large organization.
After they get access, we go through onboarding. The AccelerateR team does a week-long training up front, about an hour a day, and we ask: what is it that you need to learn? What is your R maturity level? Are there specific problems that you're working on? Then we craft training for them around specific packages. So maybe it's a team that says, we know a ton about R already. I know how to use the tidyverse. I know how to use ggplot. I'm struggling with metadata. Then we'll spend the whole week purely on metadata. It's probably not the most exciting week, but we'll do it for them.
So first, we give an intro to the relevant packages based on the team's maturity level, and then we say, great, we did this week-long training with you, let's deliver. Then we sit side by side with them for four two-week iterations. We work on the backlog with them. We build out their outputs with them. We do code reviews. We answer questions. We have drop-in sessions. We are there to support them and help them create their outputs.
So we took training and decided: instead of running a virtual training that teaches you how to use something, we're going to train you by working on an output that is going to go to a regulatory agency. That is how we do our training now, and that is our core tenet. No more training classes. No more virtual trainings. Teach what's needed to create those outputs. Learn by doing.
After that, we have a closeout. The other big problem in pharma is that we have a 30-year history of using other tools, where we built out huge code libraries that we can reuse. In R, we don't have that yet. So while sitting side by side, we learn what the team's problems are and how they go about solving them. We then take that code and turn it into a code library. That is how we are organically creating code libraries. The closeout is where we capture what the team actually needed to do, and how we can take what we learned by doing and spread it across the organization.
Challenges and what went well
We did encounter some challenges. So we were Maverick again: we looked into the distance and thought to ourselves, there are some issues, let's think about them and fix them. Timing with AccelerateR is crucial. One of the big issues we had as the AccelerateR team was that we were engaging study teams way too early. The way resourcing works in pharma, no one works on one study for the entire length of that project.
So when you actually jump in and help on a submission, it's really time-dependent, and it's really important that you nail the timing. If you start working with a submission team too far in advance, not everyone who will work on it may be there yet, and as a result your attendance, and your ability to train and interact with people, won't be what you want. So you really need to think about that timing component.
The other big thing is the learning curve. I don't know who here has tried to teach anyone how to use Git or GitHub before. It is a journey. So we experience this huge, steep learning curve a lot. People say, ah, this is really hard. Then they gain more and more confidence, and as soon as they think they've got it, something comes up, like some internal process, and they dip again. For example, one of the issues I dealt with recently involved grobs in ggplot. I had no idea what I was doing, and I've used ggplot for seven or eight years now. So I encounter this all the time myself, and it's really important to realize it's going to happen when you're working with individuals, training them, and teaching them.
Some things did go well, all right? One, we were able to expose lots of different people to lots of different tools. As I mentioned at the start, R training is not just "here, learn R"; it's "here, learn R, here, learn GitHub, here, learn X, Y, Z other tool." Exposing people to all these different tools gives them a lot of confidence, and it lets us as an organization say, great, we actually are progressing. We are moving.
The second thing is that we provided a route for the business to give feedback. As I mentioned, we have this center of excellence, and there are a lot of tools that the center of excellence wants to push out and also get feedback on. Because the AccelerateR team sits side by side with study teams, it sees the problems those teams are having. That gives the center of excellence a channel for feedback on its tools and on the needs of the business. One of my colleagues, Becca, is going to talk about Slushy, which was a direct output of that interaction with business teams.
The third thing is the pharmaverse. It's an open-source collaboration, with everyone across the industry working on different packages. It gives us the ability to say, yes, we can help contribute, because we're training people on R. It also gives people who are super excited about the ecosystem, the programming language, and the community a place to actually go and make a contribution.
The other thing is organizational resilience. If you have ever been part of change management, or any change within a large organization, you know it's very, very hard. There are a lot of times when you hit peaks and troughs, and you actually wonder whether you made the right decision. Having champions who say, yes, I learned a lot, the support and mentoring helped me change, helps the organization get through that big cultural change that we in pharma are encountering now, and that other organizations encounter as well when they make these big changes.
So, how do organizations enable people to learn? It's simple. Support them. Mentor them. Teach them on the job, and focus on enabling and supporting them to deliver using these new tools and technologies. Make that change with them, and don't force them to learn it on their own through separate trainings and things like that. Really just sit side by side and enable them through support and mentoring. So thanks, everyone. That's my talk. Hopefully Tom Cruise was there along the way to help us talk about training.
Q&A
Thank you so much, Ben. Appreciate it. Just as a reminder to those in this room, you can submit questions. It's under POSIT, P-O-S-I-T, forward slash Slido, S-L-I-D-O, dash B, which is this room. I've got a couple of questions here, but you're more than welcome to submit some.
So one of the questions here, Ben, is: why bundle Git with R learning instead of teaching one at a time? Yeah, the reason we bundled it is that if you have ever tried to install an R package that is not on CRAN, you have to learn how to use devtools and install from GitHub. So we decided, you know what, let's just teach version control at the same time. It might have been a bad mistake, but it is a mistake that we fully embrace and continue with. We also use version control a lot within the organization for our study code, so we keep it all together as part of the training.
When you are training a submission team, how do you balance teaching the core concepts without getting trapped into doing the analysis for them? It's a good question. When we engage with a team, we have an up-front conversation: here's what we're going to do, and here's what we're not going to do. We sit side by side and help them solve their problems, but we are not part of the core submission team. So we scope that out and have very frank conversations. I live by a value called radical transparency, so with these study teams, I am radically transparent that I am not going to do their work, but I am going to help them do their work.
Also, what did "primary" in 2020 mean? For example, was SAS, JMP, Solar, et cetera, used in the science doc? Did this impact the code used to generate CSRs? Good question. So primary, what that means is, what's the best way to put this? At GSK, we have a philosophy of a multilingual world, which is the right tool for the right person. As an example, if someone's been using SAS for 35 years, they might not want to use R, and that's fine. So being a primary language means that if you choose to use R for a submission, use it. I don't care. It's basically about putting programming languages on the same level in terms of acceptance within the organization.
The next question is: what is the most common hurdle you see as you train new teams? There are two things that I think are the most common hurdles. One, how you think about data is dramatically different in R compared to SAS, and I think that's really difficult for people to wrap their minds around. The other is that people think of SAS macros and R packages as one-to-one equivalents. They are not. So breaking people out of that mindset, and teaching them how to find an existing function to use, is really important. The number of times I've had to tell people, don't write this custom function, it already exists in the tidyverse or in this package, is too many to count at this point.
How many people are on the AccelerateR team, and what is the typical ratio of AccelerateR staff to members of the research team being supported? Good question. Full time, we have three people on the team, but we also have a secondment model. Often, after we work with a clinical study team, one or two individuals from that team will devote time to AccelerateR as we move on to the next study team. It's a good way to avoid asking for headcount, while also giving people the chance to train and work with teams they might not have had the opportunity to work with in the past. Well, thank you again, Ben. Appreciate it.
