Resources

Marie Vendettuoli | Lessons learned developing a library of validated packages | RStudio

Full title: Towards an integrated {verse}: lessons learned developing a library of validated packages

Developing R packages as a unified {verse} – a set of packages that work well together but with each focusing on individual tasks – is an efficient strategy to structure support for complex workflows. The ongoing challenge becomes managing the growth of related packages in a holistic manner. This is especially problematic in industries with a heavy emphasis on stability, for example if packages need to be validated prior to use in production. In this talk, I will discuss a paradigm for developing and maintaining validated R packages, emphasizing the following areas:

1. Strategies for organizing packages to prevent excessive re-work
2. Facilitating responsive, iterative development
3. Empathy for developer and user experiences

About Marie: Marie Vendettuoli is a Senior Statistical Programmer at the Statistical Center for HIV/AIDS Research and Prevention (SCHARP - https://www.fredhutch.org/en/research/divisions/vaccine-infectious-disease-division/research/biostatistics-bioinformatics-and-epidemiology/statistical-center-for-hiv-aids-research-and-prevention.html) at Fred Hutch. She holds a PhD from Iowa State University in Human Computer Interaction and started developing R packages for use within regulatory frameworks while working as a Data Scientist at the USDA Center for Veterinary Biologics (https://www.aphis.usda.gov/aphis/ourfocus/animalhealth/veterinary-biologics/sa_about_vb/ct_vb_about). Before discovering R, Marie worked in a CBER (https://www.fda.gov/about-fda/fda-organization/center-biologics-evaluation-and-research-cber)-regulated laboratory. Her main interest is developing analytical infrastructure to facilitate scientific analysis for fellow data scientists working in a regulatory environment.

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Hi, I'm Marie Vendettuoli. I'm a Senior Statistical Programmer at Fred Hutch in the Statistical Center for HIV/AIDS Research and Prevention. At rstudio::conf 2020, my colleague Ellis Hughes shared technical details demonstrating how our organization is tackling the challenge of software validation for R packages. It's now a year later. We have three packages released and are tackling the extended challenge of ensuring that our entire verse is fit for purpose as an integrated unit. Today, I'll be sharing my thoughts regarding multi-package development in the pharma environment.

Setting the team up for success

My first recommendation is to set the team up for success by getting infrastructure in order. Moving away from historical validation practices means updating or generating new SOPs. Another area for infrastructure support is to establish common development guidelines that get shared across all packages. This is especially important if packages are implementing legacy code that was previously siloed. The ultimate benefit, however, is code portability within the overall project.

Addressing development from an empathy perspective

My next recommendation is the need to address development from an empathy perspective. At SCHARP, we observe four distinct roles. One, users rapidly processing data. These would be primarily statistical programmers. Two, data analysts reviewing code for context. These would be individuals who need to be able to rapidly assess context from function names alone. Three, stakeholders interacting with functions solely from the package validation report. So, this could be someone involved in writing specifications and test cases but who isn't participating in coding activity. Four, package developers and maintainers who seek to eliminate duplication or re-implementation with package updates.

Now, one common need that transcends all of these user groups is to have a vocabulary which allows us to place functions within the larger context of assay-specific data processing tasks. That is, we want to survey what has been implemented, filter either by assay or task, and call up help page documentation on demand. So, this is actually a reference to Shneiderman's information-seeking mantra, which is typically cited in visualization literature. In this context, we are treating the function names as the identifying handle and leveraging RStudio's autocomplete as the filtering mechanism as we navigate to a particular help page or call the function. For example, one step of data processing that occurs for all assays is the transformation to a CDISC ADaM basic data structure (BDS). Because of the function naming convention we have chosen, a list of all available BDS transformation functions is displayed and we can select for the assay of interest.
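The talk does not show the convention itself, but a task-first naming scheme of the kind described can be sketched as follows. Function and assay names here are hypothetical illustrations, not SCHARP's actual API:

```r
# Hypothetical convention: <task>_<assay>(). Because the task comes
# first, typing "bds_" in RStudio's autocomplete lists every BDS
# transformation across the verse, and "bds_nab" narrows to one assay.

bds_nab <- function(data) {
  # Transform neutralizing-antibody assay results to a BDS layout
  # (placeholder body for illustration only).
  data
}

bds_elisa <- function(data) {
  # Transform ELISA assay results to a BDS layout.
  data
}

# Survey what exists for a given task without opening any files:
ls("package:base")            # the same idea applied to a package:
# ls("package:myassayverse", pattern = "^bds_")
```

The payoff is exactly the overview / filter / details-on-demand loop: the shared prefix gives the overview, autocomplete does the filtering, and `?bds_nab` supplies details on demand.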

Package validation and systematic mapping

Focusing on the package validation elements, we also need a systematic approach to how we map package requirements, test cases, and test code files. Validation is a form of user acceptance testing, where we have to show that for every software requirement there was an appropriate test successfully executed. The modular approach we use at SCHARP is incredibly flexible, which has the development benefits I will discuss later, but it does require some organization to track. In addition to enforcing a file naming convention, we also scrape these files to generate a tabular display in the validation report, which links requirements and tests. Between these two implementations, both developers and validation report stakeholders are able to verify at a glance that complete coverage exists.
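One way to scrape such files into a coverage table can be sketched in a few lines of R. The directory layout and `req_`/`tc_` naming scheme below are hypothetical stand-ins for whatever convention a team enforces:

```r
# Minimal sketch: pair requirement files with test-code files by a
# shared numeric ID embedded in the file name, e.g.
#   validation/requirements/req_001_bds.md
#   validation/test_code/tc_001_bds.R
# An NA in either column of the result flags a coverage gap at a glance.

map_validation_files <- function(dir) {
  reqs  <- list.files(file.path(dir, "requirements"), pattern = "^req_\\d+")
  tests <- list.files(file.path(dir, "test_code"),    pattern = "^tc_\\d+")

  # Extract the shared numeric ID from each file name.
  id <- function(x) sub("^[a-z]+_(\\d+).*$", "\\1", x)

  merge(
    data.frame(id = id(reqs),  requirement = reqs),
    data.frame(id = id(tests), test_code   = tests),
    by = "id", all = TRUE
  )
}
```

Rendering the resulting data frame as a table in the validation report gives both developers and stakeholders the at-a-glance traceability the talk describes.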


Managing technical debt across packages

The last couple recommendations I have address the topic of technical debt. You may be wondering, why are we splitting assays into separate packages? A couple reasons. First, FDA guidance asks us to revalidate the entire system when one element changes. Now, change is expected, input data structures get updated, analysis needs drift. However, an update to one assay type should not introduce the appearance of change affecting other assays. Secondly, from the user perspective, we need to put some conceptual structures around processes. Sure, every data set will be converted to BDS, but the definition of that basic data structure will vary from assay to assay, and we need to capture that distinction in a manner that is easy to digest.

What we also want to avoid is excessive re-implementation. As we have expanded to multiple assay packages, we have identified utility functions that can be shared and moved into their own foundation package. We can enforce package code dependencies through routine use of the DESCRIPTION file, but the power of modular validation means that we can move the associated specifications, test cases, and test code files as well. Validation of the lighter assay package with the new utility package is simply compiling the R Markdown document without any need to rewrite content. Likewise, when we need to add additional specifications, test cases, or test code to an existing package, an updated validation report will add the new child documents to existing sources. This has the added benefit of being able to use a simple diff to compare validation reports across package versions.
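The child-document mechanism described above can be sketched as a parent R Markdown report that discovers its children on disk; the paths and file layout here are hypothetical, not SCHARP's actual structure:

```r
# validation_report.Rmd (parent) -- a minimal sketch.
# Because the parent knits whatever child files exist, moving a
# specification or test file into a foundation package, or adding a
# new one, requires no edits to the report source itself.

children <- list.files(
  c("validation/specifications",
    "validation/test_cases",
    "validation/test_code"),
  full.names = TRUE
)

# Inside the Rmd, a single chunk such as
#   ```{r, child = children}
#   ```
# knits each file, in order, into the compiled validation report.
# Recompiling after a change is then just:
# rmarkdown::render("vignettes/validation_report.Rmd")
```

Since each child contributes a stable, self-contained block of the report, two compiled reports from different package versions differ only where the underlying specifications or tests actually changed, which is what makes the simple-diff comparison work.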


So, I wanted to take a moment to thank those who made this work possible, especially the preceding efforts by Ellis Hughes and the combined contribution of the SCHARP data standardization team. Future work includes formalizing our validation practice through a PHUSE collaboration with deliverables expected later in 2021.