
Open Source in Drug Development | Thomas Neitmann and Posit

Thomas Neitmann, Associate Director at Denali Therapeutics, sat down with Posit to talk about open source in clinical trials, his work at Roche, the creation of admiral, his career beginnings, and his predictions for the future of data science in the pharma space. Posit's work in Pharma: https://posit.co/solutions/pharma/


Transcript

This transcript was generated automatically and may contain errors.

Hello Thomas, great to meet you. How's it going? Very well. It's such a great honor to finally get to meet you. I feel like so many of our engagements have been virtual, through R/Pharma, and I feel like I know you, but I haven't actually gotten to meet you, so this is nice. It's good to finally sit down, for sure.

You have such a fascinating background: coming into the pharmaceutical drug development space, working with legacy software, moving into open source, and creating Admiral, one of the most important packages in the entire late-stage clinical trials process, covering such a critical part of working with data. It'd be great to learn more about what that journey was like for you.

Yeah, in many ways it started by accident. For starters, I never thought I would work in pharma, and definitely not become a programmer. I went into sports science because I had loved sports all my life; that was really what I was passionate about. Then, I think in the fourth semester of my undergraduate degree, I got a job at a research lab at the University Hospital in Berlin, where I'm from. There we had a PhD student who was quite proficient in R. I collected some data with a measurement device that produced XML output, which, as you might imagine, is not easily analyzable, so he wrote a script to turn it into a nice tidy data frame, and I was supposed to run it. He gave it to me and said, sort of, install R and RStudio and then click the Source button. That was way over my head. For starters, it didn't work, so I had to dig in a bit to make it work, and when I finally managed to run the script and see the output, I thought, okay, this is pretty powerful. It became a huge passion. Later on, when I decided to join the pharmaceutical industry, I looked for an intersection between working on clinical trials and programming, and I found the role of statistical programmer.

From sports science to open source packages

It's such an amazing journey, and it speaks to so many people in the pharmaceutical industry as they transition to open source. What's also really exciting about what you did is that you came to R for a use case at work, and fast forward only a few years, you created a very popular package for creating graphics.

Obviously, with open source, what you really want to do is create something others can also use, because when you face a problem, most likely someone else faces the same issue, or at least a similar one. So it makes sense to say: I created this, it's probably useful for someone else, so let's make it open source. That's what I did with this package called ggcharts, which tried to make it easier to create certain plots that tended to be code-heavy with ggplot2, turning each into a single function call, for example a sorted bar chart, which I thought should be something very simple but actually turned out to be quite an exercise.

So you took that and created a package, and what's such an awesome thing to me is that you were taking dplyr workflows, piping data into ggplot2, and you said to yourself, gosh, I can simplify even this, which was very much a simplification of various workflows. Tell us about the motivation to simplify the programming interface and help people do such core things as graphics.

Yeah, really, when you develop software, what you want to do is create as easy an interface as possible while providing as much power as possible. You can create very powerful tools that are hard to use, and no one will ever use them. So instead, what you should opt for is simplicity, so that people are empowered to readily adopt it. That was what I was aiming for: take something that looked kind of hard with the existing tools, streamline it, and make it very easy, so if you want to create a certain type of graph, you just have a single function call with a couple of meaningfully named arguments, and off you go.

Building Admiral for ADaM datasets

And then onto CRAN. Was that a big day for you?

For sure, yeah, that was a big day, and I'm pretty sure I did a post and celebrated it.

Oh, absolutely. So, taking that into the work you have done on Admiral, are there lessons learned from that package, and from the simplicity of its interface for creating graphics, that you carried over into the interface for ADaM datasets?

I think certainly the aspect of trying to keep it as simple as possible for the user was a big one when we were developing Admiral for ADaM datasets, because the existing solution we had, which was not in R but in a proprietary language, could in many ways be hard for people to use. As long as we were in the realm of standard datasets and it just worked, it was good, but if something failed or you needed to customize it, it was a black box and people struggled with it. So we wanted something extremely simple to use: people just have to read the documentation once and then they know, okay, if I need to do this I use this function, and if I need to do that I use the other function. Simplicity was really, really key.

One area that's so interesting is the idea of Admiral being extended by other groups, people, and pharmaceutical companies into different areas of pharma and life sciences, right? Vaccines, oncology. What is it about this use case that extends so well into these other areas of ADaM data creation? Why not just use Admiral, or why not just use dplyr?

Well, certainly you could use dplyr, and that's what we do; it's the basis of what we use. We kind of say the tidyverse is great, let's take that as our basis and build on top of it. But Admiral obviously streamlines a lot, as you said, because there are common algorithms you apply in these derivations which are specific to this domain. You could take all these dplyr functions, put them together, and do that thing, but it's much easier if we give you a single function to do it. The idea of Admiral is that we built the core layer: all the stuff anyone would need for general-purpose, let's call them, analysis datasets for clinical trials in pharma. But then there are all the intricacies of whether you're working on oncology, vaccine development, or any other therapeutic area, and I don't have the expertise for that. If a certain company is heavily investing in research and development on that front, they are the experts, so it makes sense for them to say: what you've built is good, we'll take it as the next layer of the pyramid, so to speak, and build on top of it again. That has worked out well so far with these three extensions, and I really hope this ecosystem flourishes and grows larger and larger.
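
Admiral itself is R code built on dplyr; the sketch below is only a plain-Python illustration of the layering idea described above: one common, domain-specific derivation wrapped in a single, well-named function instead of a chain of generic grouping, filtering, and mutating steps. The `Record` fields (loosely modeled on ADaM variable names such as USUBJID, PARAMCD, ADY, AVAL, ABLFL), the hypothetical `derive_baseline_flag` function, and the baseline rule (last non-missing value on or before study day 1) are all assumptions for illustration, not Admiral's API or a standard definition.

```python
from dataclasses import dataclass, replace
from itertools import groupby
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """One row of a hypothetical, heavily simplified analysis dataset."""
    usubjid: str           # subject identifier
    paramcd: str           # parameter code, e.g. "SYSBP"
    ady: int               # analysis study day (assumed: day 1 = first dose)
    aval: Optional[float]  # analysis value; None if missing
    ablfl: str = ""        # baseline flag: "Y" on the baseline record


def derive_baseline_flag(records: Iterable[Record],
                         last_baseline_day: int = 1) -> List[Record]:
    """Hypothetical helper: within each subject/parameter group, flag the
    last non-missing record on or before `last_baseline_day` (assumed rule)."""

    def sort_key(r: Record):
        return (r.usubjid, r.paramcd, r.ady)

    def group_key(r: Record):
        return (r.usubjid, r.paramcd)

    result: List[Record] = []
    for _, group_iter in groupby(sorted(records, key=sort_key), key=group_key):
        group = list(group_iter)
        # Positions of records eligible to be baseline within this group.
        eligible = [i for i, r in enumerate(group)
                    if r.aval is not None and r.ady <= last_baseline_day]
        baseline_idx = eligible[-1] if eligible else None
        # Keep every record; set the flag only on the chosen one.
        result.extend(replace(r, ablfl="Y") if i == baseline_idx else r
                      for i, r in enumerate(group))
    return result


if __name__ == "__main__":
    records = [
        Record("01", "SYSBP", -7, 120.0),
        Record("01", "SYSBP", 1, None),    # missing value, not eligible
        Record("01", "SYSBP", 15, 118.0),  # after the baseline window
        Record("02", "SYSBP", 1, 130.0),
    ]
    flagged = [(r.usubjid, r.ady)
               for r in derive_baseline_flag(records) if r.ablfl == "Y"]
    print(flagged)  # [('01', -7), ('02', 1)]
```

The caller only states what to derive; the grouping, ordering, and filtering live in one place, which is the kind of simplification Thomas attributes to building domain functions on top of general-purpose tools.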

Do you think there will be more unification or standardization in other areas of late-stage clinical work?

I would certainly hope so, because at the end of the day we all do the same work. There is the submission package if you want to file your new drug with the FDA, for example, and for all of us, whether you're working at Roche, GSK, or Pfizer, at the end of the day it's the same. So why use different tools and spend the effort within each company to build such a tool? That's kind of what we did for the last 20 years, and now there's this nice shift to say we'll adopt open source and, as I said, also embrace the mindset. Then we can pool our resources and say: we all face a common problem, let's get together and solve it in a way that we can all actually use, and that way everyone benefits.

Teal, Ocean, and the open source platform at Roche

Not only do you have this fantastic team, but you also have a great focus on building out the infrastructure and tooling. Kieran was talking in the webinar about how Shiny was the spark that helped focus the creation of Teal, and you have other frameworks like Ocean. Tell me more about that, and about creating tooling internally to help with content creation and process.

Right, so you mentioned Teal, and that's an interesting case, because when you say we'll stop doing things the way we did for 20 years and do them differently now, everyone asks why we should; it works, right? But when you show people something like Shiny, which we didn't previously have as a solution, people get excited; they see that it adds value. So in many ways that was the initial foot in the door, and once it was readily adopted, we went on to ask what other things we could leverage R and open source for, and then Admiral, Oak, and these other things got developed. But all these tools are, quote-unquote, useless unless you have a platform that packages them all together. Pharma being a fairly regulated environment, you need to make sure the analyses you do are done in what we call a validated environment: basically, the software you use has to be reasonably reliable for what you do. To be fair, that makes sense, because imagine you make an error in the analysis used to get a drug approved, and the drug then turns out to be not beneficial but potentially even harmful. So you have to make sure everything works, and you need that base layer, the computing environment in which you can do that. That's the work that has been done with Ocean, led by Ian Healey within Roche, which is a huge undertaking: building a novel environment to really embrace all this open source. And it's not only about R; there's Python in there, there can be Julia in there, and it's really based on great open source technologies, for example Docker containerization. So it's really taking all this cutting-edge development and making sure we can use it within our drug development space.

In other interviews and in the community, you're a big advocate of the RStudio IDE. Tell people about the RStudio products as part of those environments at Roche.

Sure. If you spin up an R container in the Ocean environment, you basically have an RStudio Server Pro instance ready for you. I can still remember, going back to that initial R script I got, that I downloaded R and opened the R GUI that comes with it, and that wasn't too exciting. Then I got to something like RStudio, which streamlines your work so much: you get all these nice tools and your development just becomes more efficient. You can actually focus on the hard things, like design, and have all these little bits and pieces that help you along the way.

Looking back at your career, just a couple of years ago you listed Shiny and projects you had worked on in Shiny. What is it like to publish Shiny applications internally at Roche to Posit Connect, and are there other types of content there, like APIs, that you use in the clinical process?

Yeah, so when we develop these Teal applications internally, we obviously want to make sure the stakeholders who want to use them have ready access to them. Data scientists are not DevOps or IT people; they don't know how to spin up a server. I certainly don't, and I'm sure many don't even want to bother with that. They want to build the app, click a button, and say, okay, now it's available to this group of people. It's great to have a product like Posit Connect, which streamlines this for you so it just works, and you can concentrate on your expertise: making sure the application you build is fit for purpose for your clinician, your safety scientist, or whoever wants to use it in the end.

Preparing for the first R-based FDA submission

One of the key things the industry is focused on right now, and I feel like other pharmaceutical companies are looking to Roche on this, is the buzz around doing submissions fully in R next year and in the coming years. There's been some pilot work done around that, but I believe Roche has been the key organization saying, we're going after that. What's that been like internally as you prepare for it?

Obviously a huge challenge, because we made the decision that every new study starting this year, 2023, should go on our new platform, Ocean, and use all these next-gen tools like Admiral, Oak, Teal, etc. to deliver their studies. We cannot influence whether or not a study reads out positively in the end, but if it does, that means it's time for a submission that uses R and open source as its backbone and gets submitted to the health authority. Actually, there's one phase 3 trial we have in oncology which has been on our platform since last year. They were early adopters and got ready to be able to do this, and they plan to read out their study by the end of this year, potentially November or December 2023. So there could be a first clinical trial submission to the FDA using all these tools in R, which is huge. Obviously that puts a lot of pressure not only on the people working on that particular study, because everyone is looking at them, but also on people like me building the tools, because we feel the pressure that if our tools are not up to the job, then they will fail. Everyone within the organization is, I would say, laser-focused on making sure we achieve this goal, and it's really interesting to see; there are so many great developers within Roche, all converging on this one goal and trying to make it work.

It's very interesting, because if you look at new drug applications on the FDA side, open source and R have been used for quite some time; I think I've even found submissions as far back as 2007 where R was used. Organizations are also starting to use Python in some of these submissions, so there is interest in interoperability and in moving to a world where open source languages work together. In an area like PK/PD, there's a lot of tooling used for modeling and simulation, nonlinear mixed-effects models, and things like that. How much do you see Teal and these frameworks evolving in the future to include other languages as part of these submissions?

It's an interesting question, and I think it's really hard to say, but the way we built the Ocean platform, for example, is to make it language-agnostic, so to speak: let's not repeat the, quote-unquote, error we made 20 years ago of locking into a single tool stack, but build something extensible. If in the future, for example, Julia turns out to be very interesting for PK, as you mentioned, and people want to use it, they should be able to. We should empower them to use whatever tool they feel is best for the job. I think that's a critical shift in mindset: instead of saying we have a tool and need to make it work for use case X, we say we have use case X, which of all the tools out there is best for the job, and then enable people to say, this is the best one, let's take it and bring it into production, so to speak.

Validation, reproducibility, and the future of R in pharma

And there seems to be such an obsession in our industry, rightfully so, around validation, or should I say qualification, and reproducibility, right? Do you see this being built out further into different infrastructure, tooling, and technology? What do you think the next two to three years in that space will look like?

I hope that within three to five years we have a convergence on what you mentioned, and I think that's already starting to happen, because there is the R Validation Hub, which again is lots of pharma companies coming together to tackle that problem jointly, and there are ideas around having a CRAN-like repository of, quote-unquote, validated packages for clinical trial data. So people are very much aware that this is a problem we can only tackle efficiently if we work together, because if Roche, GSK, and J&J each have their own validation approach, that's probably not the best approach. I really hope that, just as on the code development side, we can come together and make something work, and it certainly seems to be going in that direction, which is good to see.

Talking about going in new directions, do you think there will ever be a Python package in the pharmaverse, or some other language?

That's an interesting question. I wouldn't say no, because you never know what the future brings. Now that we have shown that one particular open source language is viable, why shouldn't another be viable as well? That being said, I think there's a reason why R is particularly interesting for the pharmaceutical space. If you think about the kinds of analyses we do for the primary endpoint of a study, that's really classical biostatistics, often frequentist, and R is built just for that: it's a language for statistical computing. Python may offer that to some extent as well, but it's really another beast; it's a general-purpose language. It has certainly attracted a lot of data scientists, particularly in the ML and AI space, and that is coming to pharma as well, but it's not the bread-and-butter work of showing with a clinical trial that a particular drug actually works; it's more on the periphery, I would say. So there are use cases, for sure. Would someone port Admiral to Python? Maybe, but I'm not sure it's worth the effort.

Thomas, I can't tell you how much I've enjoyed this. The fact that we're actually able to hang out and chat is much better than engaging over email or virtually through R/Pharma. It was a pleasure.

Well, thank you so much.

Thanks, Thomas.