Open Source Solutions to Next-Generation Submissions, After 30 Years of Industry Experience

Presented by Mike K Smith

The pharmaceutical industry is undergoing rapid change, driven by a desire from both industry and regulatory agencies to move to more interactive visualizations and web applications to review data and make decisions. These changes would have been unthinkable 30 years ago when I started working at Pfizer. In this talk, I'll consider the drivers for these changes, how open-source tools can help achieve this, and why collaboration across the industry is vital to achieving this goal. I'll contrast this with my experience of 30 years working in the pharma industry - when the R language had only just been released, when the internet was new, and when submissions to agencies were printed out, loaded onto trucks, and shipped to their doors.

Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.

Talk Track: Pharma. Session Code: TALK-1067

Transcript

This transcript was generated automatically and may contain errors.

So, thank you for coming to this presentation. My name is Mike K. Smith, I lead up the R Center of Excellence in the SWAT team. And on the 3rd of October this year, 2023, I'll be celebrating 30 years employed at Pfizer.

And in the talk today, what I want to try and talk to you about is the fact that right now in industry, things are changing. Things are changing fast because the industry is looking at R and other open source tools more and more for clinical trial reporting and for regulatory submissions. So I want to look at that in the context of what was it like 30 years ago, how far have we come, and also a little bit of crystal ball gazing about what the future might be like.

Back to 1993

So let's jump in the DeLorean, set the destination for 1993, October, and go back and see what things are like. I will be your Doc Emmett Brown guide to this, but when we get back there we might meet this young lad, right? So please, if you find him, be gentle. This is his first week in industry. This is his first job. He's moved 500 miles from home to come and work for Pfizer down in East Kent. He's also slightly bewildered because he's started into this industry where people know how to do clever things with SAS, and he's never used SAS before, so he's going to have to learn.

Now, think back 30 years. So if you are less than, oh, I would say 40, the thing on the far left is not just the save icon, but it's an actual data storage device that holds a whopping 1.4 megabytes of information, megabytes. But obviously, over time, things have changed. Technology has moved on. We've gone through CD-ROMs, DVDs, and right now you can essentially download almost an arbitrary amount of information from the internet and from the cloud. So 30 years ago, what's a cloud? Well, cloud computing isn't just something that happens in the west of Scotland where it rains a lot, right?

But 30 years ago, the industry really was predominantly SAS-based, right? SAS was the lingua franca for all the work that was done for clinical trial reporting, and I would say 90%, 95% of the work was done in SAS. Back in 1993, a full-stack developer used one tool, this one, but what they did was data manipulation, analysis, programming of tables, listings, and figures, right? So that was your full stack. You did end-to-end work, but all of it in SAS.

The other thing that happened in 93, or very shortly before 93, was that R became a thing. Now, back in 93, R was more of an academic statistical software for statisticians to fit models and do interesting things like that. It certainly wasn't the programming language that we know about today.

Regulatory submissions then and now

Let me talk also about regulatory submissions. So at the end of the clinical trial development process, we have to gather together all of the information that we have on our therapeutic or our treatment, bundle it all together, and back in 93, that involved printing it out, binding it, putting those binders together onto a pallet, shrink-wrapping it, and literally trucking it to the regulatory agency. This is how we did it. So that's an example regulatory submission, and you can see what kind of amount of volume of information is in there.

Fast forward a little bit, 30 years on, we're still producing largely the same static information, but it's now in electronic format. And there's an electronic submission to the regulatory agencies. So we've come a long way, but to someone from 30 years ago, what they would see in this electronic submission would still be very familiar.

So that says to me that there's a well-founded perception that pharma is very slow to change. And I would also argue that there's, across all of the pharma companies, there's a certain amount of hubris that says, well, if this isn't developed by me within my company, I'm not sure that the quality is really up to Pfizer standards. And that, I think, prevents us from engaging with the open source community and bringing in outside work to help us with our jobs.

Change is happening now

But let me tell you right now, change is happening. Pay attention to the tense there. It is happening. It's not coming. It's here now.

Last week, we heard from Novo Nordisk that they had completed and filed an NDA, a new drug application to the FDA, predominantly using R. So using R and SAS, but for the whole of the tables, listings, and figures and data manipulation, they used R. This is the single biggest question I get from anyone that says, yes, using R for clinical trial reporting is a nice idea. But has anyone actually done it in practice? And the answer now is, yes, yes, they have. If you haven't seen that video, I encourage you to go and have a look. It's really a great eye-opener because Novo are talking in very candid ways about what they've done and how they've done it.

The other thing is that the change is accelerating. The pace of change is accelerating. And if you're sitting at the sidelines looking in and going, yep, this looks interesting, but I'll give it a little while and see how other people get on with it, you might find yourself far further back in the change curve.

The Pharmaverse and industry collaboration

So what does this change look like? Well, there's a collection of R packages called the Pharmaverse that helps the industry with all aspects of preparing for this regulatory submission. So data manipulation, tables, listings, and figures, and presentation of those tables, listings, and figures, as well as things like infrastructure bits or environment management, also validating R packages and testing R packages. So there's a huge amount of work done in here. The other great thing is this is work done across a huge number of different companies. And as you can see here, a significant number of people.

So this is all great information that you can tap into and exploit and see how much of it is useful to you for your process today. But it's not just R packages that people are developing. People are developing training modules. So this is a great example where Genentech have made their training course on open source development and using GitHub to develop new things and how to work in that framework, and then made it available to any of us in this room to go and evaluate this course and see how it works. And that's fantastic, because it brings all of our skill levels up. So when we ask, how do we do this, we can all now work off that same level playing field of knowledge.

A lot of these initiatives are powered by consortia. So there are three listed here: PHUSE, R Consortium, and Open Source in Pharma. And what these consortia do is that they give a platform for multiple different companies to come together and discuss freely. But not just companies. It's also the regulatory agencies. In this R Submissions working group from R Consortium, the FDA are active members and engaged, and they are defining how they want things to be, because they want to shape what that next generation reporting looks like. And I think that's really powerful. Rather than it being powered by pharma companies who want to do something fancy, the regulators themselves are sitting at the table and going, well, we have a stake in this. We want to tell you, we want to shape what you're going to do with this.

But these three organizations are driving that change in a way that, you know, I've never seen anything like this in the last 30 years. I would say that in the last five years, the pace has picked up enormously. Even in the last two years, it's accelerated again. So these consortia really are powering that change.

You might say to yourself, why would companies share this knowledge? Isn't that our intellectual property? And 30 years ago, yes, if I wanted to talk to someone in Roche about how Pfizer do things, our internal legal people would get rather twitchy. But if we do it through consortia now in a working group, that gives us this level playing field that says anyone can come and talk within this working group. And it frees us to be open and candid about how we're doing things and try to find the best way. Because it's not just that Pfizer want to find the best way. We want to find the best way and hear what Roche are doing and Genentech are doing and all these other companies, and then let that inform how we move forward.

So on IP, people are now saying, well, I think the IP is in the treatments we develop and the data that we collect, not in how I code that visualization. And I think that's eminently sensible.

Looking ahead

So let's have a wee bit of a guess about what the future might bring. Please, believe me, I am not telling you that in 30 years we'll have hoverboards because by now we should have had them, right? So it's always dangerous. Niels Bohr, the physicist, says prediction is difficult, particularly where the future is concerned. So bear that in mind.

But what I want to talk about now is some of the things I see and the kind of directions that I think are going to be important. So if you don't know about Quarto, you will by the end of this conference. But the principle behind Quarto is reproducible reporting. And I think that's going to be really important, to be able to demonstrate that this table, listing, or figure that I'm producing can be done again and again and again. And if I feed that same data through, I'll predictably get the same answer.
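As a rough illustration of that "same data in, same answer out" principle (this is not something shown in the talk, and it is not how Quarto works internally), here is a minimal Python sketch. The `summary_table` and `fingerprint` helpers and the toy records are hypothetical names made up for this example. The idea is that a report step is deterministic if rendering it twice from the same data gives byte-identical output, which a hash makes easy to check.

```python
import hashlib
import statistics


def summary_table(records):
    """Render a plain-text table of n and mean response per treatment arm.

    `records` is a list of (arm, value) pairs. This is a hypothetical toy
    layout, not a real clinical data standard.
    """
    by_arm = {}
    for arm, value in records:
        by_arm.setdefault(arm, []).append(value)

    lines = ["arm,n,mean"]
    # Sort the arms so row order never depends on the order of the input.
    for arm in sorted(by_arm):
        values = by_arm[arm]
        lines.append(f"{arm},{len(values)},{statistics.fmean(values):.2f}")
    return "\n".join(lines)


def fingerprint(text):
    """Return a SHA-256 hex digest of the rendered output."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


records = [("placebo", 1.0), ("active", 2.5), ("placebo", 2.0), ("active", 3.5)]

first = summary_table(records)
# Same data, different row order: the rendered table should not change.
second = summary_table(list(reversed(records)))

print(first)
print("identical output:", fingerprint(first) == fingerprint(second))
```

In practice the fingerprint of a validated output could be stored next to the report, so a later re-run can be compared against it, but that workflow is an assumption of this sketch rather than anything described in the talk.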

In the middle, I'm talking about WebR, but really what I'm meaning here is the regulators want to have a little bit more dynamic review of what we're presenting, as opposed to those static images and listings and tables. WebR is a good example because you can bake that interactivity into a single HTML document, and that's easy to send to an agency, whereas sending a Shiny app depends on a server, which depends on infrastructure, which depends on package versions and a whole bundle of other things that get a bit squirrelly and difficult.

Lastly, on the right-hand side, the theme I'm talking about here is interactive graphics. So the fact that it's ECharts, you know, pick your own. It could be Observable, it could be D3 or anything else. But the point is, we are all looking for that little bit more interactivity in our visualizations so that we can drill down into something and go, what subjects are involved in this data point? Or who is this person here, and can I see what other endpoints pertain to that individual?

The other thing is that data is all around us now. 30 years ago, if you wanted to know how a patient felt about a treatment and their daily snapshot of their endpoints, you gave them a paper diary, which they filled in every day, they brought back and we digitized it. Or you gave them a meeting with their clinician at the end of treatment, and they would ask questions. But today, with wearables, apps, all of these things, we can get a much richer view, a daily view, an hour-by-hour view of the patient's experience. So for example, you could have a wearable that measures tremor and shake in a Parkinson's patient, continuously. Which I think makes the analysis a lot more complicated, but it gives a much richer and more nuanced view of exactly what's happening for that patient.

Another thing is that patients are having a much greater say in what the industry does with their data. So they may give consent for their data to be used in its primary use of clinical trial reporting and for the submission to the regulatory agencies, but they may opt out of secondary use of that data for things like building ML models or AI models, and we have to respect that. Patient advocacy groups are now much more data literate than they were 30 years ago. They're used to seeing data, they have people in those advocacy groups who can analyze that data, interpret it, understand it, and then make a case for what they would like to happen. And lastly, patients have the right for their data to be forgotten. So a patient could come back and say, I don't want my data in your database anymore, and we have to respect that, honor it, and expunge it.
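To make those two obligations concrete, here is a minimal Python sketch, offered only as an illustration and not as a description of any company's actual system. The `PatientRecord` fields and the two helper functions are hypothetical. It assumes each record carries a per-patient consent flag for secondary use, and that an erasure request removes every record for that patient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    # All field names are hypothetical, chosen for this example only.
    patient_id: str
    secondary_use_consent: bool  # may this record feed ML/AI model building?
    tremor_score: float


def records_for_secondary_use(records):
    """Keep only records whose patient consented to secondary use."""
    return [r for r in records if r.secondary_use_consent]


def forget_patient(records, patient_id):
    """Return the records with every row for `patient_id` removed."""
    return [r for r in records if r.patient_id != patient_id]


records = [
    PatientRecord("P001", True, 0.42),
    PatientRecord("P002", False, 0.77),
    PatientRecord("P003", True, 0.15),
]

# P002 opted out of secondary use, so model building must not see that row.
training_set = records_for_secondary_use(records)

# P003 later asks for their data to be forgotten.
remaining = forget_patient(records, "P003")

print([r.patient_id for r in training_set])
print([r.patient_id for r in remaining])
```

A real implementation would also have to deal with backups, derived datasets, and audit trails, which this sketch deliberately leaves out.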

Building trust

So all of this change means that we need to build trust. We need to build trust between the developers of these packages, who are across all of these different organizations. We need to build trust that if I pick up your package and use it for my purposes, I trust that you're doing a good job, that you've thought of all the edge cases, or you've tested it adequately. There's trust between the industry and the regulatory agencies, and that trust is built largely through those consortia working groups, because we're talking about it monthly when we get together. And obviously, it depends on trust between the patients and the pharma industry to make sure that we are treating their data with respect.

Let me give you a little cautionary tale about predicting future technology. The books that you see here are what's called an encyclopedia. You used to find them in buildings that were called libraries, and it's a printed volume, and they had hundreds of articles about all kinds of topics, and if you wanted to know about that specific topic, you'd go to the library, you'd extract the volume, you'd look it up, and you'd find out your information. Those things went out of date really quickly, but guaranteed you'd find a set in a library.

And then Microsoft said, well, with the advent of the CD-ROM and the DVD, we can now go from hundreds of articles about information to thousands, tens of thousands of articles, and we can have embedded media clips so that people can see and get more information about this. And, you know, largely because information goes out of date very quickly, obviously, they could sell annual updates to this product. That's a great idea, right? That's revolutionary, until something like this happens.

And now you've got an open, community-curated encyclopedia with millions of articles, tens of millions of articles, edited by all of us. If we have a deep knowledge of a certain sitcom, we can go in and we can contribute to an article on it. And that's amazing, right? It's revolutionary. And the other thing is it's available in any browser, on any device, on your phone, on your tablet, on your computer, everywhere you go. If you're at a trivia night and you have a debate with your colleague about who was it in that film that did that thing, you can now look it up.

Thirty years ago, this kind of community-led thing would have been astounding, right? Still is quite astounding. But 30 years ago, you would have been laughed out of town. So believe me that change is here. It's coming. It's accelerating. And as with Wikipedia, don't let yourself be the Microsoft Encarta team, sitting there trying to sell a product that is just about to be trumped by an open source thing that will blow you out of the water and change your world, okay?

So with that, I must say I'm out of time. But like Natalia, here's my QR code for all kinds of links to the things I've talked about and how to connect to me. So thanks very much.

Q&A

So we wait for some questions here. What is the thing that excites you the most right now that you're looking forward to happening as you're seeing all this sea of changes?

I'm really excited about moving away from static outputs. So the idea that you can have nice, clear, informative visuals, which are also rich. So you can delve into them. You can see different layers of information. I think that is going to be just amazing. But like I say, it's dangerous to predict where we're going to go because anything can happen.

And when you say you, do you mean you as in the analyst, or are you talking about your end users being able to do that?

I think from the end user perspective, we can offer them a much richer experience and much more insight into the different layers of data that we're collecting.

Well, again, thank you so much, Mike. Appreciate it.