
Ellis Hughes | R Package Validation Framework | Posit
From rstudio::global(2021) Pharma X-Sessions, sponsored by ProCogia: in this talk I discuss the process developed for validating internally generated R packages at SCHARP (Statistical Center for HIV/AIDS Research and Prevention) - the R Package Validation Framework. I cover the elements of the framework and the basics of applying it, with some examples. By using tools native to the R package building infrastructure, validation can become an integrated part of your package development, improving the quality of both the package and the validation. About Ellis Hughes: I am a statistical programmer at Fred Hutch Cancer Research Center, where I work on a team that evaluates potential HIV vaccine candidates. Having graduated from Washington State University with a degree in Bioengineering, I found a passion for programming in R. I now organize the Seattle useR group and enjoy building packages to automate my workflows. Learn more about the rstudio::global(2021) X-Sessions: https://blog.rstudio.com/2021/01/11/x-sessions-at-rstudio-global/
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Today, I'll be talking about the R Package Validation Framework, a project I've been working on for a little over a year now. I'm really excited to share with you all what we've been working on and the updates that have come out of it.
My name is Ellis Hughes. I'm a statistical programmer with a background in statistical genetics. Currently, I work at Fred Hutch under SCHARP, working on HIV vaccine research. I'm pretty heavily involved in the R community: I'm one of the Seattle useR organizers, I'm also one of the organizers of Cascadia R Conf, and I run a screencast called TidyX, where we go through and explain how R code works, usually from Tidy Tuesday submissions. You can find me on Twitter at @ellis_hughes.
What is validation?
A common question whenever any of us tries to introduce new software into the pharma world is: is it validated? That's a question we're always going to get. But what is validation? Yes, there's an official definition, but it can mean different things to different people.
As I understand it, the definition of validation is establishing documented evidence that our software performs some process, procedure, or activity in compliance with our specifications, with a high degree of assurance. In layman's terms, we're creating documents to prove that our software does what we say it does.
But why do we even care about validation? Why is that question even asked? A lot of people point to the fact that we have to validate our software for FDA submission. But there are a lot of unspoken benefits of validation that don't always get talked about, such as improved quality and safety of our code: because we've gone through and checked it, we're confident in the quality of the code and that it will return proper results. It also results in faster processing: because the code is already set up, we can use it across multiple projects and trust that it will work. And it promotes trust: because we've performed this validation and vetted the code to the best of our abilities, we're confident that when we give it an input, it will give us a consistent output, and throw an error when things go out of bounds.
Validation practice can be a really high bar, and a lot of documents go into it. First, you fill out a form for the specifications, planned uses, and the environments you're going to use the software in. Then you write your code based on those specifications and record the function authorship in some external file, potentially an Excel file. Then there's another form to document your test cases, your test environment, and how you plan on doing your testing. Then maybe a last form to show that your testing plan is comprehensive, proving that all your specifications are met by it. A third party comes through, manually evaluates your tests, and takes screenshots of the results. And then you review your documentation and combine it into a final validation packet for release. But information is shared across all of these documents, so any time one of them updates, you need to go back and update the others. And if you manually evaluated all your tests and have to make an update, you're going to have to rerun everything. It's incredibly inefficient, and at any point it can feel like game over, because you're just redoing your work over and over again.
R and validation as best friends
But today, I want to tell you that validation and R can be best friends. They can work together to provide a validation framework that achieves all the goals of validation without as much of the additional stress and work that can be involved. Using a combination of R Markdown, testthat, and roxygen2, we can essentially make validation push-button and generate a document that looks like this: we capture a place for signatures from everyone involved with the project; we record information about the environment in which we performed our validation and who wrote which pieces of the project (the specs, the functions, the test cases, the test code); we share the coverage between our specs and our test cases; and finally, we record all of our content, including our test results, all at the click of a button, without having to redo as much work.
And this is the R Package Validation Framework. There are five key elements to the framework: recording your specs, writing your code, recording your test cases, writing your test code, and finally generating the documentation. The advantage of using this framework is that it's integrated into the R package development process: it takes you from the ideation of "I want to create a package" all the way through to a package you've now validated. It's native to R programmers: the framework lives within the R package itself, so you don't have to leave your R package environment to perform your validation or keep the documents there. It allows for iterative development: because the docs don't live across multiple locations, we can update pieces without having to make sure five other documents also got that piece of information. It's reusable because it relies on code: we can just rerun things and confirm they work. And it's extensible: if you want to move pieces out of your package into a more utility-focused package, you can do that and just copy out the test cases, specs, and so on into the new package, without putting in a whole new round of effort.
Specifications
So we're going to start at the beginning: specifications. Specifications define the expectations of a package; you can think of them as the blueprints for your package, defining what its goal is. A good specification has several elements: what will the thing be doing? What are the expected inputs? What are the expected outputs? What warnings or errors should be triggered, letting the user know when things are going out of bounds? But critically, a specification will not rely on external knowledge. You can assume basic knowledge on the programmer's side, but do not assume they have contextual knowledge of the statistics or the process you're trying to build into your R package.
So if I were writing a specification for my presentation today, I'd say: my presentation will cover my team's approach to validation, it will be roughly 15 to 20 minutes long, and it will be entertaining. Now, when I convert that into a specification file to save into my package, it needs to have several properties. First, it needs to be machine readable, because we'll need to read it back later on. It needs to be independent: we don't want multiple specification files that link back to one another, because that creates codependencies and makes it difficult to separate pieces out. And it needs close proximity to the task: we want it saved within our R package, so it lives where the programmers are doing their work. When documenting our specifications, beyond the actual specifications themselves, we really want to capture who wrote them and when they were written.
So if I were taking the specification from earlier and converting it into what I'd use for the R Package Validation Framework, I would write something like this. It's written in Markdown, with a header at the very top saying that I wrote this spec and I wrote it on this date. My specifications are: my presentation will clearly explain the validation procedure; it will be 15 to 20 minutes long; and it will be entertaining, which I'll measure by causing at least three people to laugh. And optionally, spec 1.4: fame, glamour, and maybe starting a branded accessory chain. We'll see.
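As a sketch, a specification file in this style might look like the following Markdown (the exact header labels and numbering scheme here are illustrative, not a format prescribed by the framework):

```markdown
#### Specifications: my_presentation

**Written By:** Ellis Hughes
**Written On:** 2021/01/21

+ Spec 1.1: The presentation will clearly explain the validation procedure.
+ Spec 1.2: The presentation will be 15 to 20 minutes long.
+ Spec 1.3: The presentation will be entertaining, causing at least three
  people to laugh.
+ Spec 1.4 (optional): Fame, glamour, and a branded accessory chain.
```

Because it is plain Markdown with a predictable structure, the file stays both human readable and easy to parse back into the validation report later.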
Code development and documentation
All right. So let's get down to business. We have our specs; it's code development time, the piece we've all been waiting for. And don't worry, I'm not going to sit here and tell you how to write code; I'm going to trust that you're following good programming practices. What I am going to talk about is documenting your code for validation. We want to capture who wrote the code and when they wrote it, and make sure the documentation of ownership gets updated whenever the code gets updated.
The way we're going to do that is with Roxygen tags. If you're writing your code according to good programming practice, you're going to be adding documentation, and a lot of folks use roxygen2 to do so. We suggest using two Roxygen tags, @section Last Updated By and @section Last Updated Date, to capture this information. The value of doing this is close proximity to the task: whoever is working on the function at the time doesn't have to go to an external Word document to record that they updated this function and when; it's right there with the function, which makes it simple to do. It's a natural extension of the documentation already being performed, because we're already using roxygen2. And as an added benefit, because it's a Roxygen tag, the information gets added to the function's help page when you compile your documentation.
So here's a simple function I wrote. It tells a joke, with a setup and a punchline. But what we really care about are these two Roxygen tags: I was the one who wrote this function, and this is when I wrote it. It's a very simple function. But say Joe King came through and updated my joke with a new parameter, so the user could specify how long it waits before delivering the punchline. Joe updated the normal Roxygen tags, but he also updated Last Updated By and Last Updated Date to his name and the date of his update.
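A minimal sketch of what such a function might look like after Joe's update; the function name, body, and tag values are illustrative, not taken from the slides:

```r
#' Tell a joke
#'
#' Prints the setup, pauses, then delivers the punchline.
#'
#' @param setup character. The setup of the joke.
#' @param punchline character. The punchline of the joke.
#' @param wait numeric. Seconds to pause before the punchline.
#'
#' @section Last Updated By:
#' Joe King
#' @section Last Updated Date:
#' 2021/01/15
tell_joke <- function(setup, punchline, wait = 1) {
  message(setup)
  Sys.sleep(wait)   # Joe's new parameter: delay before the punchline
  message(punchline)
  invisible(TRUE)
}
```

The two @section tags compile into their own sections of the help page, so the ownership record travels with the function's documentation.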
Now that that's in there, when he regenerates the documentation, the help pages show who wrote that function and when it was updated. So Joe is now the owner of that function for validation purposes.
Test cases
Cool. All right. So now we're going to connect the dots. We have our code and we have our specs, but we need to prove that we've met our specs. The way we do that is through test cases. Test cases tie what you wanted to do to what you did. Now, a single test case can satisfy multiple specifications: if it covers many of the functions you'll be using, it can prove you've met several specifications at once. However, every single specification must be satisfied by at least one test case. And the risk of a specification drives the level of testing performed, that is, the number of times you'll test that function with new inputs.
Now, a good test case will share which specifications are being met and how. It will detail the required setup: if there's any data I need to load or any environment variables I need to set, it spells out what needs to be done. It's a recipe, really, for how to get from the setup and inputs to the desired output, with clear expectations for what you're testing against. So if you're reading in a data set, rather than saying "check the length," say "check that this data set is thirty-four rows." But critically, no code is provided, because you want whoever performs the testing to be unable to copy and paste the code; otherwise you didn't actually test it.
When documenting your test cases, we're going to capture the same information as before, who wrote the test and when, but we're also going to capture which specifications are being satisfied, because that can change as tests update. And, surprise, surprise, we're going to use Roxygen tags to do it: @section Last Updated By, @section Last Updated Date, and @section Specification Coverage. So if I were writing a test case for my presentation today, I'd have my sections at the top: I wrote this test case, this is when I wrote it, and these are the specs being covered. I have a setup where I create my presentation. Then I'll go through and prove that I met my specifications: my first test is that the presentation was informative, which I'll test by asking my audience what they learned.
I'll time my presentation to make sure it's between 15 and 20 minutes long, and I'll test that it's entertaining by counting the number of chuckles from the audience (which, unfortunately, I can't hear right now; hopefully you're laughing) or, when I practice it on my significant other, the number of eye rolls. And trust me, there were a few of those.
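Pulling those pieces together, a test case file in this style might be sketched as the following Markdown; the layout and labels are illustrative, not the framework's required format:

```markdown
#### Test Case: T1 presentation quality

**Written By:** Ellis Hughes
**Written On:** 2021/01/21
**Specification Coverage:** 1.1, 1.2, 1.3

Setup: deliver the presentation to an audience.

+ T1.1: Ask the audience what they learned; confirm the validation
  procedure was clearly explained (Spec 1.1).
+ T1.2: Time the presentation; confirm it runs between 15 and 20
  minutes (Spec 1.2).
+ T1.3: Count chuckles (or eye rolls); confirm at least three people
  laughed (Spec 1.3).
```

Note that the steps describe what to check and against what expectation, but deliberately contain no code for the tester to copy and paste.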
Test coding
All right. So now we have all the setup, and we're going to fill in the lines with test coding. Test coding is the actual implementation of the test cases in code, recording the results to prove that we've met our specifications. But, very importantly, a third party should be doing this: somebody who was not involved in writing the package or the test cases goes through and performs the testing. (They could have written the specifications.) The benefit of this is that it helps resolve interpretation errors in your documentation and examples, because this person doesn't actually know how to use your package. They'll be using your documentation to see how to achieve the test cases you laid out, and they can raise questions: "this wasn't very clear to me." As the author, you intrinsically understand how the function is supposed to behave; this new person doesn't, so they can ask you to improve your documentation or examples. If the test case itself isn't clear, they'll come to the wrong conclusion, the test will fail, and you can go back and improve your test cases. A failing test is not necessarily a bad thing; it just means you need to improve your documentation. Likewise, if there are interpretation errors in your specifications (assuming they didn't write them), they can once again help improve those. And in general, it helps identify improvements: if your code runs slowly, or if an argument is missing or incredibly unclear, they can come back and say, "hey, we were missing this argument here, can you please add it?" or "as I was going through the test cases, it didn't make much sense to have to set it up this way."
And so it really allows for iterative improvement before you even get to the first release.
So we're going to use a combination of testthat and, surprise, surprise, roxygen2. We'll document our test code with who wrote it and when, using the same Roxygen tags, Last Updated By and Last Updated Date. Here's an example of what some test code might look like: we have the header at the top, and each test piece has its own test_that section, exactly like any other testthat test, but with Last Updated By and Last Updated Date attached. There's the setup, the code they run, and the expectations, completely like any other test. The reason we're using testthat is that it's a familiar framework, used by a ton of packages on CRAN right now; it's basically the de facto standard for unit testing at this point. So we're going to steal that and use it for our purposes.
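A sketch of a single piece of test code in this style, with the authorship tags written as Roxygen-style comments above the test_that() block. The helper time_presentation() and the object my_presentation are hypothetical stand-ins for whatever the test case actually exercises:

```r
library(testthat)

#' @section Last Updated By:
#' Third Party Tester
#' @section Last Updated Date:
#' 2021/01/18
test_that("T1.2: presentation runs between 15 and 20 minutes", {
  # time_presentation() is a hypothetical helper for this sketch
  minutes <- time_presentation(my_presentation)

  expect_gte(minutes, 15)
  expect_lte(minutes, 20)
})
```

Everything inside test_that() is ordinary testthat; only the comment header above it is the framework's addition, which the validation report later scrapes for authorship.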
It can also be run and developed either interactively or in batch, so we don't have to have somebody manually run it, though they can as they're building up their tests. But the big reason we use testthat is the reporter objects that live within it. Reporter objects are special objects inside testthat that track each expectation and report whether it succeeded or, if it failed, why. That's very important, because we need to track that for validation purposes. A standard reporter object's output looks like this: if you've ever built a package and pressed Ctrl+Shift+T after putting unit tests in, you've actually used a reporter object; you just didn't know it, because it's under the hood. What we do is execute the reporter object, then use a custom function to extract the results for us and make a nice table showing which test was run, whether the results were as expected or different, what the difference was, and a pass or fail. And we can put that into a table.
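One possible way to capture results into a table, assuming the test code lives under a validation/test_code/ directory; exact behavior of the coercion depends on your testthat version, so treat this as a sketch rather than the talk's actual extraction function:

```r
library(testthat)

# Run every test file, collecting results quietly instead of
# printing them to the console.
results <- test_dir("validation/test_code", reporter = "silent")

# Coerce the collected results into a data frame: one row per test,
# with counts of passed/failed expectations, ready to render as a
# table in the validation report.
results_df <- as.data.frame(results)
```

From there, the data frame can be rendered with knitr::kable() or similar inside the validation vignette.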
Validation documentation
Right, the proof is in the pudding. What do we have so far? We have our specifications, our code, our test cases, and our test code. But as we said, validation is all about documenting that we did all this. And that's our validation documentation. We're going to use R Markdown to create a vignette, where we record all the information we've created so far. We'll have a sign-off/approval section based on your organizational requirements. We'll capture environmental information, such as the R version and package dependency versions, in the R Markdown; because it's dynamic, it updates whenever you rerun your validation documentation. Here you can also capture any other organizational requirements that might have lived in other validation documents. Then, using R Markdown, we dynamically read in our specifications (once again, it was critical that they were machine readable), scrape the test code for authorship, read in the test cases and drop them into the document, and then execute all the test code and capture the results into the tables we saw.
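The dynamic pieces of such a vignette might be sketched as R chunks along these lines; the directory paths are assumptions matching the layout described here, not a prescribed API:

```r
# Capture the validation environment: R version, attached packages,
# and their versions, regenerated on every render.
sessionInfo()

# Read the machine-readable specification files into the report.
spec_files <- list.files("validation/specifications",
                         pattern = "\\.md$", full.names = TRUE)
specs <- lapply(spec_files, readLines)

# Execute all the test code and capture the results for tabulation.
results <- testthat::test_dir("validation/test_code", reporter = "silent")
```

Because everything is computed at render time, rebuilding the vignette refreshes the environment section, the specs, and the test results together.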
And this is the general format of what you might see in your package under the vignettes folder. We have validation.Rmd, which holds all the code for setup and recording of the environment; this then generates the results, typically as a PDF. And we have our validation folder, with our specs, our test cases, and our test code, and all the content underneath.
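That layout might look something like this (the file and folder names are illustrative):

```
vignettes/
├── validation.Rmd        # builds the validation report (typically PDF)
└── validation/
    ├── specifications/   # machine-readable spec files
    ├── test_cases/       # test case descriptions
    └── test_code/        # testthat test code
```

Keeping everything under vignettes/ means the report is rebuilt, and the tests rerun, whenever the package vignettes are built.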
So, all together now. Like I said, we're going to use R Markdown, testthat, and roxygen2, and with their powers combined, we're able to generate a reproducible validation report at the click of a button, or at the build of a package, because it lives in the vignettes folder. When you build the package, you can tell it to execute your vignettes, so you can make sure you don't have an R package that has failed its validation. And we generate this exact same report we had before, but now you know where it all came from. We wrote in these signature pages. The validation environment section was generated at the build of the document. We pulled in all this information about who wrote the specifications, the functions, the test cases, and the test code. We showed coverage of all the specifications by the test cases. And we dropped it all into this document dynamically; we don't have to manually go through and update it every time.
What's next
So that's the framework. Where are we at now? Currently, I'm working on a white paper with a team from PHUSE to describe the framework process in more detail, as well as generalizing and recording the optimal processes we suggest you follow when using this framework. It's very much under construction; hopefully it'll be coming out later this year. I'm really excited for this. We're also working on the valtools R package. It's based on the white paper and very much in progress, but it will provide tooling not currently available in usethis or devtools to help folks perform their validations using this framework. Hopefully that'll be coming out later this year as well.
Many thanks to all the folks who have been involved with this framework at Fred Hutch (Marie Vendettuoli, Anthony Williams, Jimmy, Barthi, Raphael, Alicia, Shannon, Paul, and Kate), as well as the folks in the PHUSE R Package Validation Framework working group, because you really helped me figure out how to expand this framework and make it general enough to work for more folks. Hopefully you now understand that validation and R can truly live together forever. Thank you very much, RStudio as well as ProCogia, for having me talk today and for putting this together. You can find the code from this presentation at github.com/thebioengineer/validation_rstudio_2021. Thank you so much.
