
Andy Nicholls & Michael Rimler | Using R to Drive Agility in Clinical Reporting | RStudio (2020)
The R language is used extensively throughout the pharmaceutical industry. But its use within tightly regulated clinical reporting workflows has remained limited. GSK Biostatistics has embarked upon a journey to embed R as a primary statistical analysis tool for clinical reporting. Enabling R within a global department of over 600 Statisticians, Programmers and Data Scientists is challenging! It requires planning, patience, and a strong foundation that enables consistency across the enterprise. We invite you to learn more about how we achieved this at GSK. You'll learn about our tidyverse-centric training program, a future-ready Working Area for R Programming (WARP) environment, and a leading-edge R for Clinical Reporting (R4CR) initiative. The goal: help embed R in everyday clinical reporting output.

About Andy: Andy Nicholls has a long history with the R language and in Data Science, having authored the book 'R in 24 Hours'. He is currently Head of Statistical Data Sciences within GSK's Biostatistics department. One of his team's main objectives is to embed the R language within Biostatistics: developing training materials, overseeing various adoption initiatives and provisioning a world-class environment for R. He is also the lead for the cross-industry R Validation Hub initiative that aims to support the use of R for regulatory work.

About Michael: Michael Rimler is a clinical programmer in GSK Biostatistics and is passionate about influencing the evolving role of open source technologies and data science capabilities in clinical data analytics. He is involved with numerous internal initiatives aimed at moving the organization in this direction, including leading the effort to fully integrate R into the clinical reporting process. Michael is a co-lead of the Open Source Technologies in Clinical Research PHUSE working group project, has chaired a PHUSE US Single Day Event on Data Visualization, and will serve as a co-chair for the 2021 PHUSE US Connect.
Transcript#
This transcript was generated automatically and may contain errors.
For those of you who don't work in the pharmaceutical industry, and indeed for those of you who do, the diagram in front of you is essentially an org chart, and it's also a timeline. So this is how the biostatistics department at GSK is structured. I say it's like a timeline because our structure essentially reflects the drug development life cycle. This can take up to 15 years, which is what the arrow at the bottom is representing, and it takes us through the research, in vivo and in vitro analysis, all the way through to clinical development.
And by and large, whether or not you're from industry and this is familiar to you, you can slice this up into two distinct ways of working, really. The first is the sort of non-clinical area, and that's actually where my group, Statistical Data Sciences, sits, but it's also where our research statistics and our manufacturing statistics groups sit as well. That's a large group of statisticians, programmers and data scientists, around about 100 people. But overwhelmingly, the vast majority of people in the department sit within the clinical space. And the clinical space has very, very different controls and ways of working to the non-clinical space.
And it's the clinical space that we're going to be talking about today. So a team of 500 clinical 'data scientists', as we're going to call them. And I put data scientists in quotes on this slide because, really, if you speak to anyone in that department, very few people would actually call themselves a data scientist. They are made up of people whose job title is either statistician or programmer, generally.
And one of the reasons why I think that I would also probably not call them data scientists in a sense is that, for me, a data scientist is someone who is multi-skilled in a number of different ways. So they're not simply a statistician. They can code a bit as well. They can use different languages. They use the right tool for the job. And although there's a huge amount of talent in that 500 statisticians or programmers, at the start of our journey, they weren't really data scientists in the sense that I'm describing.
Skills and language proficiency in 2018
Just to give you an idea of what kinds of skills we do have, away from just pure statistics and pure programming skill, I've put up some rough estimates of proficiency within the department at the start of this journey in 2018. And as you can see, the overwhelming majority can program in SAS. I've said 99% rather than 100%, allowing myself a little bit of room for one or two people who can't. But essentially, everybody can program in SAS.
There is some proficiency in R in the clinical space. But largely, R has been used historically for the generation of graphics. And SAS still dominates in that area. It's normally bespoke graphics that are particularly difficult to generate in SAS that R has been used for, along with a few very proactive R users pushing R for graphics, and for clinical trial simulations where R arguably lends itself better towards simulations. Python, again, I've tried to cover myself with 1%. Essentially, nobody's using Python in this clinical space. But just in case, I put 1% there. And Julia, essentially, I've put up there to make it sound like I know what I'm talking about. Essentially, no one is using that either.
Why R hasn't been used in clinical reporting
So why are we not using R? Well, a lot of it comes down to the past 20 or 30 years of drug development. Skills have been built up over that time in SAS. And alongside those skills, the tools we use, what are called SAS macros, which, if you're not familiar with SAS, would be akin to R functions. Think of all the functions and packages you might build in R: all those kinds of tools that we build to make our job easier have been built in SAS. And the clinical space is very, very heavily regulated, so we have systems that control our workflows to a very fine level of detail, and these systems, too, are entirely built for SAS.
Add all that to the fact that it's very conservative. The further along the pipeline you get, the more conservative it gets within our industry. And you can start to see why R hasn't been used. But the real one that I hear a lot is this statement here that regulators only accept analyses performed in SAS. This is essentially a myth that has been around for a long time. It gets repeated in various different forms.
So, for example, when you're filing for submission with the FDA, the US Food and Drug Administration, you have to provide data sets in the SAS transport format. And that's an open format. This statement here is from the FDA; it's clearly an open standard. I put a line of code, a bit of pseudocode, on the right-hand side to highlight just how easy that is to generate in R. In fact, it takes less code in R than it would in SAS, ironically. But these kinds of statements get put out there, and I think because 'SAS' appears in them, there's always this assumption that SAS has to be used.
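The pseudocode on the slide isn't captured in this transcript, but as a sketch (the dataset and file names here are invented for illustration), writing the SAS transport format from R is a single call with the haven package:

```r
# Write a data frame out in the SAS transport (XPT) format used for FDA
# submissions. 'adsl' is a made-up subject-level analysis dataset.
library(haven)

adsl <- data.frame(USUBJID = c("001", "002"), AGE = c(34L, 41L))
write_xpt(adsl, "adsl.xpt", version = 5)  # one call produces the open XPT format
```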
On the regulatory note, I could talk for an entire hour. I won't be, but I could talk for an entire hour on this topic that Phil mentioned in the introduction. If you are interested in the regulatory barriers or perceived regulatory barriers that we face in the pharmaceutical space, I would recommend you go to www.pharmaR.org. This is the R validation hub that Phil mentioned. And you can see here, in particular, our white paper on a risk-based approach to package validation.
Why we're looking at R now
So given all those barriers and challenges, why are we looking at R now? I think one of the key reasons is the influx of new talent and skill around R coming from college, from university. We may have people who have been using SAS for 20 or 30 years, but these new people coming in with skills in R are generally not coming out with skills in SAS. And because no one else understands R very well, we've been having to retrain them. All their education has been in R, and we retrain them in SAS so that they can work with everybody else.
And now, I think it's probably fair to say that we've reached critical mass. We've got so many people who know how to use R that we're now looking at it the other way around: let's retrain some of the people that have been here for 20 or 30 years in R. The other piece is the drive for innovation. That's not new; we're always driving for innovation. But there are a few things in particular that I've highlighted on this slide. What we're seeing is an increased drive to use the latest statistical methodologies to try and design more efficient trials.
And when I talk about more efficient trials, what I mean is, if you think about a drug getting to market, up until the point where you have proved the efficacy and the safety of that drug, your drug is something that you want to expose as few people to as possible. It's a potentially dangerous compound that we want to make sure that we're minimizing that exposure. So we need to reduce that. And equally, if it is effective and it's safe, we want to get it to market as quickly as possible.
And there's been a lot of investment in things like the reuse of placebo data, BSK, and other methodologies to try and create a more streamlined process for developing drugs. A lot of that requires heavy amounts of simulation, and there are a lot of tools available in R that have been built around that. So they're really, really pushing things forward as well.
Before I move on to the last two and I talk about visualization and the dynamic reporting aspect, I just want to mention a couple of others that often get talked about in relation to adoption of R. The first one being cost. I think cost is important in certain sectors. I wouldn't put it as important as any of the other things on this slide, actually. When you think about cost, yes, R is free and tools like SAS are not. But there's a huge cost in training 500 people and updating all of our standard tools and systems and so on as well.
So in a sense, yes, there are some long-term cost savings, but in the short term it's probably the other way around. And then the last one, the gateway to Python and open source: everything I'm talking about here with respect to R, you could switch out R and put in Python or any other open source tool. But R being more statistical in nature, at least in its origins, compared to something like Python, it is a more natural language to use in the pharmaceutical space. And that's why we've chosen it as the tool to sit alongside SAS.
GSK use cases: quantitative decision making
So the two GSK use cases I wanted to talk about, the first one is a Shiny use case. And this is something we call quantitative decision making at GSK. In a nutshell, it is the probability of being able to make a decision at the end of the trial, given the operating characteristics of the study, so the study design, and various other pieces of information that we know up front.
Essentially, what we try and do with quantitative decision making at GSK is take known values. Up in the top right-hand corner there's a diagram looking at treatment difference. On the x-axis, you've got a difference, for example between our drug and placebo, or our drug and some competitor on the market. In a standard clinical trial, you might look at the difference between those two compounds and generate some kind of p-value. And if it's a late phase study, you might be looking for something like p less than 0.05 to show that there is some kind of treatment difference.
And that's great at the study level. But what QDM is really looking at is the lifecycle of the compound. So in the early phases of development, in phase 2, where QDM is particularly used at GSK, you can't power a study sufficiently to be able to make those p less than 0.05 kinds of statements. What you want to be able to do, given everything you know about the drug, is estimate probabilities: what's the probability that the difference between our two compounds is greater than some clinically meaningful, commercially viable difference?
Now, all of this is quite complicated. It uses a Bayesian framework. We put priors on all the information we know. There's a lot of study design elements that we consider. And what we're trying to get to at the end is the probability of success of the study. So the probability that we're going to be able to make a clear decision as to whether to continue into the next phase of development or stop development of our compound.
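As a rough illustration of the idea (this is not GSK's actual QDM implementation, and every number here is invented), the core probability-of-success question can be sketched as a conjugate Bayesian update on the treatment difference:

```r
# Sketch of a QDM-style calculation: probability that the true treatment
# difference exceeds a clinically meaningful target, given a normal prior
# and a normally distributed estimate from the study. Values are illustrative.
prior_mean <- 0; prior_sd <- 5   # sceptical prior on the treatment difference
est_diff   <- 3; est_se   <- 2   # estimated difference and its standard error

# Conjugate normal update (precision-weighted average of prior and estimate)
post_var  <- 1 / (1 / prior_sd^2 + 1 / est_se^2)
post_mean <- post_var * (prior_mean / prior_sd^2 + est_diff / est_se^2)

target <- 2  # clinically meaningful, commercially viable difference
p_exceed <- 1 - pnorm(target, mean = post_mean, sd = sqrt(post_var))
```

A real QDM calculation would also fold in the study design elements mentioned above, such as sample size and randomization ratio, rather than taking the standard error as given.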
And because it's quite complicated, when we trained a lot of our clinicians in the methodology, they weren't immediately receptive to everything that we were saying. It's a lot of complex mathematics, and mathematics is generally not their first job. So we developed an app that takes all those factors: things like the sample size, the randomization ratio, the standard deviation, things you can hopefully see on this screen. You combine that with information about the minimum value and the target value, and potentially lots of other operating characteristics as well, and you allow the clinicians and the clinical teams to play around with the sliders and change different options. Then it starts to become real for them. They can start to see what the impact of the study design is on the ultimate probability of success.
And because this is such an important initiative at GSK, it's been rolled out across all of our studies in that phase of development. As a result, we're seeing widespread adoption of the QDM methodology. Everyone is seeing the Shiny apps, and there's a huge number of requests for additional Shiny apps.
Dynamic reporting
Second example, which I'll go over a little more briefly, is dynamic reporting. Essentially, our submission process, or our development process, is such that we generate summary datasets in a very old-fashioned, boring text tables format at the moment, and we generate these in the same way we've done for 20 or 30 years. That's fine for our submissions. But then when we want to present those in a PowerPoint presentation, or publish those results in some other way, what we find ourselves doing is reformatting all that information.
In fact, even in the clinical study report, we hand over the data to our medical writing teams, and they end up doing things like manually merging columns and reformatting some of the data that we've given them before it goes into the clinical study report. If you weren't using Microsoft products, something like R Markdown would be equally suitable for this. We actually used the officer package to prove this concept last year: that we could just take the summary datasets that we generate in Biostatistics and automatically place all of the outputs into each of these different formats, without the need for the huge amounts of transcription QC that takes place today.
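As a minimal sketch of the officer approach (the table contents, slide layout and file name are invented, not GSK's production outputs), a summary dataset can be dropped straight into a slide:

```r
# Place a summary dataset directly into a PowerPoint slide with officer,
# avoiding manual transcription of results. The data here are made up.
library(officer)

summary_df <- data.frame(Treatment  = c("Drug", "Placebo"),
                         N          = c(100, 98),
                         Responders = c(62, 40))

deck <- read_pptx()  # default template; a corporate template could be supplied
deck <- add_slide(deck, layout = "Title and Content", master = "Office Theme")
deck <- ph_with(deck, "Response summary", location = ph_location_type("title"))
deck <- ph_with(deck, summary_df, location = ph_location_type("body"))
print(deck, target = "response_summary.pptx")
```

The same summary data frame could equally feed an R Markdown document, which is the point of the proof of concept: one source, several output formats.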
Building capability
So I hope that's given you a bit of an introduction, and a couple of the things that we're looking at that are really accelerating the adoption of R within GSK. What I want to move on to now is how, in general, we're trying to build this capability. I guess some of the marketing, some of the advertising, that we've done internally to promote R. In building capability, I've put up something here which will probably horrify anyone working in change management: what I'm calling a simple recipe for building capability. Because the components here, the ingredients, if you like, in this recipe are not rocket science; it's more about how you do it.
So Tilo's going to talk about the environment that we've built, the WARP environment. And Michael will talk about the way that we're promoting on-the-job learning through an initiative called R4CR. Before I move on to training, I just want to highlight as well that, given this is a heavily regulated industry, processes and standards also play a big part. We actually have standard operating procedures that effectively reference SAS directly. Not intentionally, but our workflows are so built around the SAS workflow that certain elements don't even make sense for R.
So let's move on and look at training before I hand over to Tilo to talk you through the WARP environment. What you can see on the slide now, which is a build slide, is three prongs of how we talk about data science within biostatistics. This is largely based around our traditional roles. So at the top, we talk about modeling, and modeling is the role that's traditionally performed by a statistician in GSK biostatistics. A programmer would fulfill a data wrangling role within biostatistics.
The role that my team fulfills in promoting R is more of an engineering role. Engineering can mean lots of different things to lots of different people; what we mean in this space is things like building the environment, building applications, building R packages. So, if you like, the tooling that helps support data science analyses. And the way we've gone about training everybody is to focus on the tidyverse and on ggplot2. ggplot2 is clearly part of the tidyverse; the reason I've separated them out in this diagram is because they are two separate face-to-face courses that we created, which are now virtual courses for everyone in biostatistics.
And we want to make sure that everyone's got to a base level in the tidyverse before we move on. We chose the tidyverse for all of the reasons you would normally choose it: things like the simplicity of the functions, the consistency, and so on. But also because, if you're trying to take people who know SAS and train them in R, R is a lot more expansive than SAS. It's a language. You have things like matrices and lists and so on, and it can get quite complicated when you get into it. SAS, on the other hand, is built around very simple flows: data steps, as we call them in SAS. And the way those work is very, very similar to the way the tidyverse is built around data frames, or tibbles in the tidyverse vernacular. So it's a lot easier for someone coming across from SAS to pick up tibbles and that way of working than it is to pick up the full breadth of the R language.
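To make the analogy concrete (a hypothetical example, not one of our standard tools), a simple SAS data step and its tidyverse counterpart are both linear flows over one rectangular dataset:

```r
# The SAS data step
#   data adsl2; set adsl;
#     where AGE >= 18;
#     BMI = WEIGHT / (HEIGHT/100)**2;
#   run;
# maps naturally onto a dplyr pipeline over a tibble:
library(dplyr)

adsl <- tibble::tibble(AGE    = c(17, 34, 52),
                       WEIGHT = c(60, 75, 82),    # kg
                       HEIGHT = c(165, 180, 172)) # cm

adsl2 <- adsl %>%
  filter(AGE >= 18) %>%                   # the data step's 'where'
  mutate(BMI = WEIGHT / (HEIGHT / 100)^2) # the derived variable
```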
And in particular, the magrittr pipe and everything like that is very familiar to people coming from a SAS background. So we build them up in the foundations, and then in each of the core areas we look to strengthen core competencies. We have bespoke training in various different forms; I've put on the right-hand side some of the methods of delivery. So this will not all be face-to-face training: lots of this is guidebooks, webinars, all sorts of different things that we do to promote the learning. There'll be core stats training using real GSK data. In programming, ProgPlus here represents more targeted training in things like stringr and lubridate, those kinds of packages for more advanced data manipulation. And then we look at things like function writing from an engineering perspective.
And then the value comes in around these kinds of things. Shiny clearly plays a big part in our future, so we've got some expansive targets around building Shiny capability. Publishing here I'm using to refer to things like officer, as well as R Markdown, Bookdown, Blogdown, those kinds of tools. And you can see that through this, we're building capabilities in each direction across the group.
The WARP environment
OK. In this particular section, what I'm going to focus on is the environment that we've built to support statisticians, programmers, and data scientists within biostatistics, and increasingly now beyond biostatistics as well. We call this environment WARP because it's part of a wider initiative called SPACE. And to make that into an acronym, we talk about the Working Area for R Programming.
So as has been mentioned in the introduction, the majority of our users are clinical users, statisticians and programmers. And the kind of data they work with, clinical trial data, is relatively small in the data science space. So a few hundred subjects visiting a clinic a handful of times, maybe fewer than 10 times, generating at most a few megabytes of data in some of our laboratory data sets. Certainly not generally in gigabyte terms.
And the kind of analyses that we're running are standard statistical models: linear models, generalized linear models, mixed models, Cox proportional hazards, these kinds of things. And when we simulate that data, what we have is computationally demanding simulations where we fit that model over and over again, as opposed to needing to work with very, very large data sources and trying to solve problems with memory and so on. The only exception is that we are seeing increasing use of molecular and genomics data as well, and that's something that we'll have to adapt and respond to in the future.
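The "fit the same model over and over" pattern looks something like this minimal power simulation (all design values are invented for illustration):

```r
# Estimate the power of a two-arm trial by simulating it many times and
# fitting the same analysis to each simulated dataset. Values are illustrative.
set.seed(42)
n_per_arm <- 50; true_diff <- 0.5; sigma <- 1

one_trial <- function() {
  placebo <- rnorm(n_per_arm, mean = 0,         sd = sigma)
  active  <- rnorm(n_per_arm, mean = true_diff, sd = sigma)
  t.test(active, placebo)$p.value < 0.05  # did this simulated trial 'succeed'?
}

power <- mean(replicate(5000, one_trial()))
```

It's this embarrassingly parallel shape, thousands of small model fits rather than one big dataset, that makes the workload a natural fit for the HPC capability described in this section.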
So in terms of what that means our environment needs to cover, well, we need an interactive environment at its core. We need a high performance capability to support those simulations that I mentioned. And with respect to our new capabilities, our Shiny apps and our markdown-based workflows, we need an ability to share that kind of content. And of course, to do anything, we need access to all of our data sources.
We've provided this through a centralized environment. That probably isn't a surprise to anyone who works in this industry. But our aim is to get people off desktops where we can and provide a level of consistency. Particularly in the clinical setting, we need consistency, traceability, transparency in what we do. And we have a large number of users that we need to support when doing this. So we have capacity at the moment for around about 1,000 data scientists with 300 concurrent users, we estimate.
This diagram is just to highlight some of the technologies that are included in this environment. You'll see as well as R, you'll see technologies like Python and Stan mentioned. Generally, we access those through R as opposed to direct access through their own IDEs. So everything is built around RStudio Server Pro, and it's very much designed as an R-based environment. All of those other components there are really just supporting the use of R. So even take something like Git and GitHub, that's actually maintained completely separately, but we'll provide a connection in R to allow our code to be stored and managed through GitHub.
So we've chosen RStudio Server Pro. And again, maybe this is an obvious choice for many people; clearly the RStudio IDE is the standard IDE for R anyway. Increasingly in the data science space I do see a trend towards more notebook-like environments, but really, in a clinical space, that doesn't work. We work with scripts primarily, and for those users coming across from SAS, all they've ever known, really, is scripts. So first and foremost, we need an environment in which to develop and execute scripts. RStudio Server Pro is really nice in that the RStudio IDE provides notebook access, and obviously easy use of R Markdown if you want it, and is therefore the best of both.
But it also provides the ability to load balance. So for us with a high number of users, we actually have two servers with RStudio Server Pro running. And users will log onto a single address, and they'll be put onto whichever server has the most capacity. And this is also really, really useful for things like maintenance to minimize the downtime that users experience as well. There are some challenges with the interactive environment, the load balancing with respect to high-performance usage, but I'll come onto that after the next slide.
Managing the environment
So in this slide, what I want to talk about is our means of managing such an environment. For a regulated environment, I suppose at the moment we are quite open: we're not locking much down for our users. First and foremost, we need to get people off laptops with R, and on a laptop they're used to high levels of flexibility. So we provide a central installation of R, and that installation is updated every six months. We have many, many versions of R, and all of the recent versions are exposed to users. So, at six-month intervals over a period of three years, you can go back and select a version of R that you want to use.
And when we talk about a version of R, we don't just mean the core and recommended packages. We currently have around 450 packages centrally installed in a system library as well. So when a new user logs onto the system, they get access to the tidyverse and other popular packages for our kind of work immediately; they don't have to go and install those packages. Moreover, because these are centrally installed in the system library, everyone has the same version of those packages, which ensures a degree of stability of usage. We do allow flexibility, so users can also install their own packages in their own personal user library. We're not restricting that at the moment, but it's very likely that we'll supplement this environment in the future with more static, stable versions of R that are a bit more locked down.
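The layering described here relies on R's library search path; a sketch (the paths below are typical examples, not GSK's actual locations):

```r
# R searches library paths in order, so a read-only central system library
# can sit behind a writable per-user library.
.libPaths()
# e.g. "~/R/x86_64-pc-linux-gnu-library/4.0"  <- user library (writable)
#      "/opt/R/4.0.2/lib/R/site-library"      <- central system library
#      "/opt/R/4.0.2/lib/R/library"           <- base and recommended packages

# install.packages() writes to the first writable path (the user library),
# while library() searches every path, user library first.
```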
The only element of control that we really have on the environment is provided through RStudio Package Manager. RStudio Package Manager provides access to packages that users would typically go and install from CRAN, Bioconductor, GitHub, and so on. With respect to CRAN and Bioconductor, we provide the whole of both. For GitHub, users have to request that particular repositories are added to our environment before they can install them. So they can't directly access CRAN, Bioconductor, GitHub, or anywhere else. And we also have our own GitHub Enterprise at GSK, which we release packages onto, and users can use RStudio Package Manager to access those.
What's actually really, really nice about RStudio Package Manager is that it allows us to monitor the downloads. So when we see many users installing the same package, what we can then do is install that in the central system library in a future release. Again, this is just to limit the amount of personalization and customization that might affect the transferability of our code.
And speaking of central monitoring, another really nice feature that our admin support team use quite extensively is the admin screens that are available through RStudio Server Pro. So here you can see a view of our system. What I haven't mentioned thus far is that these two servers that we use are very, very powerful. They were originally intended for high-performance computing, so they each have 192 cores and three terabytes of RAM, which is why you can see around 1% CPU and memory usage here. So generally, we are not pushing these servers to their limits.
What you can see, though, is these little spikes that occur here. And these spikes occur when a user tries to parallelize, or does successfully parallelize, code and executes that in the background. So we can monitor the system, see these spikes, and then talk to those users. Users are allowed to parallelize code and they are allowed to perform HPC runs, but our expectation is that they run them on a separate HPC system, which I'll talk through next. That system is fully accessible from our RStudio Server Pro environment.
So we built an add-in to access Slurm, which is the tool that we use for accessing HPC. You can see on here that each one of these is a different server. When users in RStudio Server Pro access that add-in, the information they enter essentially writes the necessary script to launch that job onto the two separate HPC nodes. Again, every single one of these servers in this diagram has three terabytes of memory, which is far too much for our management node; it doesn't need anything like that, and no doubt we'll change that in the future. But certainly for the HPC, that means we've got some very powerful servers available.
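A hedged sketch of what such an add-in might generate behind the scenes (the job name, resource requests and script name are all invented, not GSK's actual configuration):

```r
# Write a Slurm batch script from the interactive session and submit it,
# so the heavy simulation runs on the HPC nodes rather than the
# interactive servers. Everything here is illustrative.
job_script <- c(
  "#!/bin/bash",
  "#SBATCH --job-name=sim-run",
  "#SBATCH --cpus-per-task=16",
  "#SBATCH --mem=32G",
  "Rscript run_simulation.R"   # hypothetical simulation script
)
writeLines(job_script, "submit.sh")
system("sbatch submit.sh")  # hand the job to the Slurm scheduler
```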
One point to note, I mentioned that we built an add-in for this. We were sort of bleeding edge when we were doing this and the functionality wasn't readily available in RStudio. I understand that it is available now. You can use Launcher within RStudio to work with Slurm as well as Kubernetes. But until GSK migrates to start using Kubernetes, we are still using our own custom add-in at this moment in time.
Content sharing with RStudio Connect
So beyond the obvious, I just want to now highlight the value add, I guess, of this environment: the new things that we can do now that we couldn't already do in our SAS-based environments. For content sharing, so for sharing Shiny apps, R Markdown documents, Bookdown, Blogdown, we use RStudio Connect. We have one server at the moment running RStudio Connect. In the future, again, that might be two; we might evolve to have a development and a production server. But at the moment it's all in one, and it's a pretty liberal model: users can publish anything they like.
So you see a lot of apps, development apps with the word test next to them. The Shiny apps are a mixed bag; we get all sorts: the QDM, quantitative decision making, app that I mentioned earlier, various dashboards that people have created, training apps, and so on. The image on the left-hand side is taken from a proof of concept we did using Shiny to report clinical trial data. So we've got an ongoing initiative at the moment where we're looking at replacing a lot of our standard outputs with a Shiny app.
Otherwise, we also use RStudio Connect for all of our training materials. Most of those are produced using Bookdown, and we host those on Connect. One particular feature I want to highlight that isn't mentioned on this slide, but that I'm really, really thrilled about, is a feature in RStudio Connect that allows you to publish directly from Git. We use Git and GitHub extensively, particularly within my team, and you can point RStudio Connect directly at a particular branch on GitHub. So, for example, we'd have our production apps pointing at the master branch, and a test branch that we point RStudio Connect at for more regular updates and testing. That's a really, really nice feature that often doesn't get talked about, which I just wanted to highlight.
Otherwise, I think RStudio Connect generally works really, really well for us. The only bit of guidance that we often need to work through with our users is around data access. Sometimes people don't understand how RStudio Connect runs: you can configure it to run as the user that's trying to access the content, or you can configure it to run as a generic user, and that affects things like data access. So we're often having to work with users to explain that. Equally, you can design an app such that the data is deployed as part of the app, or you can write apps to access external data sources, where some of what I just mentioned becomes more of an issue. But there are lots of different mechanisms for controlling access within Connect, and that's one of the key advantages we see in it for Shiny apps: making sure that the right people see the right content.
Back-end data sources
Before I round off, I wanted to mention some of the back ends. I've mentioned things like relational databases; these are ideal for clinical trial data. Our clinical trial data is actually stored on file shares at the moment, but in future it will live either in a relational database or a graph database. So we've provided access to a number of internal file shares, and we're able to access relational databases as well.
Outside of the clinical setting, a number of users want to access things like real-world data: electronic health record databases stored on separate Hadoop clusters within GSK. So we provide access to those. Then there's the second item down on the slide, our object store. It's perhaps not talked about too much these days, but I think we'll see it a bit more in the future. We have some fairly large structured and unstructured data sources that we access through the object store, and it's something that helped us develop our first clinical RNA sequencing pipeline, which we'll be able to announce later this year. It also enabled another pipeline that allowed us to analyse COVID-19 data and respond to a request from our clinical operations group to understand what the impact of COVID-19 is going to be on clinical trial recruitment.
So there have been a couple of really nice use cases where connecting to the object store has been particularly powerful, and we're very grateful for the work that's gone into getting the aws.s3 package back onto CRAN. I'll finish with an overview of the whole system. I've spoken about each of these components; the only one to highlight that I haven't mentioned before is the management node. This contains our centralised R distribution: whether it's Connect, the interactive environments, or the HPC nodes, all of these work off the same base R installation, which sits on there, along with RStudio Package Manager to manage packages and the Slurm workload manager. Having this extra node independent from the rest is useful. It's not strictly necessary to have it as a separate node, and, as I mentioned on the previous slide, it's a little overpowered at the moment for what we need, but it's a really useful component, alongside the load balancing that takes place everywhere else. And, as mentioned, a future improvement will be to balance content across the Connect servers as well.
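To give a flavour of what object-store access looks like from R, here is a minimal sketch using the aws.s3 package mentioned above. The bucket and object names are hypothetical, and credentials for an internal object store would typically come from environment variables rather than being hard-coded.

```r
library(aws.s3)

# Assumed setup: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and, for an
# internal (non-AWS) object store, AWS_S3_ENDPOINT are set in the
# environment. The bucket and object names below are invented.
recruitment <- s3read_using(
  FUN    = read.csv,
  object = "covid19/site_recruitment.csv",
  bucket = "clinical-ops-data"
)

head(recruitment)
```

`s3read_using()` streams the object through any reader function, so the same pattern works for larger structured files via, say, `arrow::read_parquet` instead of `read.csv`.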
R4QC: on-the-job learning in clinical programming
So with that, I will round off and hand over to Michael to talk about our on-the-job learning, and in particular an initiative called R4QC. Yeah, my name is Michael Rimler. I am in clinical programming within GSK Biostatistics, and I am here to talk about our adoption story within clinical programming, and really in the clinical reporting process.
The objectives of this project are twofold at a top level. One is driving capability. As Andy mentioned, the vast majority of our staff are historically SAS programmers, but we're moving into a space where we want to integrate R into the clinical reporting process, and to do that we need people to be able to program in R. People have developed their SAS skills over time. I've been in the industry for 10 or 11 years now, and virtually everything I've learned in SAS has been on the job: I had a challenge I needed to resolve, I asked questions, I looked things up, and I figured it out. It wasn't from sitting in classes, taking courses, and working on problem sets. And so we want to be able to build that capability the same way.
And secondly, we want to establish an environment. By environment, I don't mean the technical environment, although that, of course, is very important, but really making sure that people have the permission to engage in certain activities in the clinical reporting process, that we know what the ways of working are, and that someone coming in has the tools available to do that work. So when we hire someone in and, apples to apples, they look the same as a newly hired SAS programmer but happen to know R, which, as Andy mentioned, is what a lot of the folks coming out of university are arriving with, they can be just as productive. They don't have to go through additional steps of learning and additional training; they can be productive right out of the gate.
Along this process, we've been asking questions like: why are we doing this? What's the role of R versus SAS in our future? To Andy's point, that also brings in open source more broadly, but here we're focusing on R. My perspective in leading this project, and I feel I have the support of our senior management on this, is that it's about integrating R and SAS. It's not about flipping a switch from SAS to R; it's about how we integrate R into our processes. How are we doing this? What are the elements needed to declare our teams R-ready? And what's next?
This started with a project called R for QC, where we wanted to look at using R for the QC process. For those unfamiliar with a somewhat conventional QC process within clinical reporting: a production-side programmer creates the output, whether that's the data set or the table. Then an independent programmer creates code using the same inputs but through a different, independently developed process. If they get the same outcome at the end, that satisfies a significant level of the requirements from a validation perspective.
And so that's what we were looking at with the R for QC project. In parts one and two we focused on displays only, not data sets, and it was more of a proof-of-feasibility, proof-of-concept model. We only did about nine displays in part one; then we expanded the team, and in part two we were able to replicate, that is, doubly validate and compare against, the production-side SAS-generated displays using R as opposed to SAS. There were about 150-some displays in that proof of concept.
That opened the door for part three, where we sort of go live, which was one of our initial objectives: how do we open the door for all of our staff to be able to use R instead of SAS for something? That allows them to take any training that Andy's group is providing and use it on the job, so they can immediately go and do their normal, everyday work, but start to do it in the language of R as opposed to the language of SAS.
And so we had a number of components, which I'll go through, in order to get endorsement to do that. Part three has now been ongoing for about four or five months, and we continue to bring more and more people into the fold. At roughly the same time, we launched into part four, which is a proof of concept that moves us into data sets and ADaM data set mapping. These two projects have now come under the umbrella of a new and broader initiative within clinical programming on R adoption.
And there, I was tasked with: what does the roadmap to R look like for clinical programming? What do we need to do to get to the point where we're using R as much as SAS, or at least where they're interchangeable, or where we have a strategic direction? Before I could figure out a roadmap, which is when you're going to do things, I needed to know what we're going to do. What are the components of broad adoption of R within the department? That created six or seven work streams, which included the remaining elements of R for QC parts three and four, but also looked at things like central tools.
So in the SAS world, in the SAS workflow, we have, as many organisations do, our SAS macro library suite that people use. We don't have the same thing in R. There are packages out there, and they have useful functionality, but they're not written for the clinical data pipeline. So what can we do to make uptake easier for our SAS programmers moving into R? We're not asking them to program from first principles; we give them a set of tools that allows them to do the work. Andy's group, on that engineering side he was talking about, provided the initial packages that we've been using through the R for QC project.
There's also a work stream on training and support for our staff, of course. It's a massive organisation, and this is a big change in the way we do our work, so there's a change management and communications piece to this. There's outreach too. We're really driving this within the clinical programming department, but there's outreach outside of clinical programming, still within Biostats and still within GSK, and also external outreach, for example this type of platform, where we're sharing our knowledge and our experience. We're not the first ones to go through this, but we are carving our own path, and we do have different experiences. The more we share, the more we can pull the rest of the industry along with us, alongside the folks like Genentech and Roche.
And then, of course, we're always thinking: we're doing this on the QC side of things, but what does it mean to get R into production? Some of that falls within Andy's group, some within the tech group, and some within the work that we're doing. And there's always the question out there about submission with R-generated analyses and data.
So, the R for QC project, which I've talked about a little already and which Andy has also touched on. Why did we decide to use R in the R for QC project? We have a long-term objective within clinical programming to become programmatically multilingual. It's not driven by SAS licence costs. Andy already mentioned recruitment; I also think that flips to retention for the folks who have been within our group for a long time, because this offers new and exciting things they can do if they have an interest. But operationally, it also gives us flexibility: the ability to choose the right tool for the job at hand, and an expansion of the things we might deliver to our internal customers.
Why did we look at QC? It's a lower-risk entry point. That's what we thought from the beginning: QC programming tends to have lower regulatory scrutiny. It's not typically included in a submission package, so you can have QC code developed during the reporting workflow in R as opposed to SAS, and it's not going to be challenged as much. And we're still producing everything using SAS at the moment, so we're taking things produced in a language and a workflow that are commonly accepted, and we're simply performing our validation processes in R. And, as mentioned, it facilitates organic on-the-job upskilling, because we know that if I need to validate a summary of demographics, I can go and do that work in R within our RStudio environment instead of launching SAS.
Successes and challenges
Those first few parts, parts one and two on displays, and now part four, are all proofs of concept. We're really thinking about demonstrating the capability of using R for stage two QC (at GSK, independent validation is referred to as stage two QC), and, back to that earlier point, facilitating the on-the-job use of R for upskilling.
We had a number of successes with that. We had to figure out how to actually compare data frames: we can use PROC COMPARE in SAS, but how do you do that in R? We identified a couple of packages that could satisfy this, and we landed on the diffdf package, which has proven successful in our experience so far. We've had successful cross-functional collaborations, most specifically with Andy's group; that was needed because we didn't have the internal skill set and knowledge, so we had to work cross-functionally. I mentioned the central tools. It also gave us a better vision of our roadmap towards R, and we developed a significant amount of supporting documentation to help our staff.
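As a small sketch of how a diffdf comparison works in this double-programming setup (the data here is invented for illustration):

```r
library(diffdf)

# Production-side and QC-side versions of the same (illustrative) data set
prod <- data.frame(
  USUBJID = c("001", "002", "003"),
  AGE     = c(34, 57, 61)
)
qc <- data.frame(
  USUBJID = c("001", "002", "003"),
  AGE     = c(34, 57, 62)  # one deliberately discrepant value
)

# Compare by key, similar in spirit to PROC COMPARE with an ID statement;
# the printed report lists the differing variables, rows, and values
diffdf(base = prod, compare = qc, keys = "USUBJID")
```

A clean comparison prints "No issues were found!", while any discrepancy produces an issue report, which is what makes the package usable as a diagnostic tool rather than a simple pass/fail check.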
As for the challenges we faced, I would argue that one is onboarding into what I'd call the R ecosystem. That's the work environment; that's working with Git and GitHub; that's, as a programmer, having to worry about loading packages and installing packages and things like that. It's just different from SAS. So how do you take folks who have been using SAS for 3, 5, 10, 15, 20 years and say, now we want you to do this in R? How do you lower the entry cost so that uptake can be higher and adoption stronger?
PROC COMPARE and diffdf: diffdf seems to have been designed to be very similar to PROC COMPARE, but it has a different look and a different feel, so we had to work through how to use it as a diagnostic tool, not just an "everything compares" sort of result. Good programming practice in R is different too, and we're also trying to institute some sort of code review culture, which, and I don't know what it's like in every organisation, is not as strong as we would like within GSK. Within clinical reporting, I think that's primarily driven by the independence-of-QC culture, but we're trying to build up a code review mentality and good programming practice.
And because we're comparing SAS to R results, we also noticed that there are differences between SAS and R on some fundamental things, not that either is right or wrong. When we noticed differences in the compare output and dug into it, we realised that SAS was doing what SAS said it should be doing, and R was doing what R said it should be doing. So how do we reconcile that from a reporting standpoint, from an analysis standpoint? If our general expectation is that things should match exactly, and they don't, how do we sift through that? More importantly, how do we support our teams in navigating it? When you open the door and let loose 250 or 350 programmers to do this, they need to know: I now see this difference, what do I do about it?
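One concrete example of such a fundamental difference, which the talk doesn't name but which is well documented in both languages, is the rounding of halves: base R's round() follows the IEC 60559 "round half to even" convention, while SAS's ROUND() rounds halves away from zero.

```r
# Base R rounds halves to the nearest even digit (IEC 60559),
# whereas SAS ROUND() rounds halves away from zero.
round(0.5)  # 0 in R; SAS ROUND(0.5) returns 1
round(1.5)  # 2 in both
round(2.5)  # 2 in R; SAS ROUND(2.5) returns 3
```

Neither behaviour is wrong, but a QC comparison expecting exact matches will flag outputs whose rounded summary statistics fall on a half, so teams need guidance on which convention the reporting standard requires.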
Part three: going live with R for QC
That brought us to part three. Part three is where we go live with stage two, or independent, QC using R instead of SAS. You could think of running it like a pilot study, where you pick one study and say, we're going to validate everything on it using R. Instead of that model, we chose an organic, grassroots effort: any individual programmer, at the output level, can say, I've been assigned 25 different displays, so I'm going to choose these three and validate them using R instead of SAS. That's the opportunity to learn R on the job, taking what you've learned from the training coming out of Andy's group and using it for the work you typically do.
Part of the requirements was making sure that, of course, the leadership on the study was aware of what was happening, because, while we don't think it introduces a significant amount of risk, it's something leadership needs to know about from a resourcing and timelines perspective. We have mechanisms in place to help with ways of working and to provide visibility on what's happening. But really, it was about putting in place a model of encouragement from above. We had leadership in the team saying: you can do this, we allow you to do this, we know it's going to take you a little longer than it might in SAS because you're so fluent in SAS and just learning R, but we're going to provide all this support from below in terms of documentation and a support team to answer your questions.
Support mechanisms and documentation
We put together a quick-start guide to be a one-stop shop that links out to a number of pieces of internal documentation, and, on the left here, a guidance document that's pretty thorough. It goes through not just how to get started, but also what it means to develop code in RStudio, how to execute code, and some mini-vignettes walking through some of our standard display templates, such as a standard summary-of-AEs template that's very GSK-specific. It gives people sample code, things they can copy and paste and slightly tweak.
When I was learning SAS along the way, I learned by someone pointing me to a similar type of output in a different study. I would copy that code in and start to manipulate it to make it work for my purposes, so I wasn't starting from a blank sheet of paper every time. That's what we wanted to do here. The document on the left, built using the Bookdown package, was designed entirely to lower that transition cost for someone moving from SAS into R, in order to improve uptake and adoption. On the right here is a workplace page where GSK
