Claire Bai - Translating clinical guidance to actionable insights with R
COTA’s team of oncologists and data scientists curate real-world data used by life science companies and healthcare partners to inform drug development and patient care. Over time, we have received many of the same questions from our data users, which indicated a dire need for translating our internal clinical guidance and data model knowledge into a tool for successfully navigating our data. We developed rwnavigator, an R package that helps users easily prepare COTA data for analysis with time-to-event packages. As first-time package developers, we ran into many challenges as we created, tested, and deployed rwnavigator. We hope to share with the greater R community our motivations for developing this package and best practices we learned along the way.
Talk by Claire Bai.
Slides: https://github.com/rstudio/rstudio-conf/blob/master/2024/clairebai/rwnavigator_FINAL.pptx
Transcript
This transcript was generated automatically and may contain errors.
Good afternoon, everyone. My name is Claire, and I'm so excited to be here at posit::conf this year, on behalf of COTA, to tell you about our solution to navigating complex real-world data using R. (The slides disappeared in the back.) At COTA, we use technology, analytics, and oncology expertise to create clarity from real-world cancer data. However, like any other data user, we've had a lot of challenges refining data for a given study analysis. To solve this, our team of data scientists and statisticians has developed an R package. It's called rwnavigator, or real-world navigator.
I'll be talking about rwnavigator, our package whose sole purpose is to help users easily prepare real-world oncology data for use with gold-standard survival analysis packages.
Okay, to summarize the essence of rwnavigator, this package is a fast-forward button for creating sleek, clinically-informed, one-row-per-patient data tables to facilitate cancer outcomes research. Being able to speed up research studies that could result in important discoveries about cancer treatment and care sounds pretty nice, right? Well, it wasn't exactly an easy path to success, and even though the genesis of rwnavigator itself happened only about a year and a half ago, the journey actually goes back years in COTA's history.
The origin of rwnavigator
For as long as we've collaborated on research studies, we've been asked by our provider and life sciences partners about how we get from the complex cancer data that they see to the clear study results we present. Some common questions include: Why are there so many patients whose characteristics remain unknown? How do I pick a censor date across all of these tables? What variables should I consider as an event for a given time-to-event endpoint? The good thing is, these are all questions that have already been fielded by our own research team over time. Using the knowledge of our medical experts and oncologists, we've created a unique clinically-informed perspective on the optimal use of our data.
To make this perspective scalable across all our internal projects and also give external users access to the same knowledge, we took a step back to look at the bigger picture. We quickly discovered common threads connecting many of our data preparation processes for time-to-event analysis. We were already leveraging variables across our data tables that were specific to our data model to answer key research questions. So why not build functionality that would not only help users replicate our research from beginning to end, but also simplify and standardize our own code? Thus, the idea for rwnavigator was born.
Package design and goals
So the overarching goal of our package was to have well-tested, clearly-defined functions that could identify, clean, and organize patient data across multiple tables in a one-row-per-patient fashion. Most data users we work with, along with our own team of analysts, including myself, are proficient users of R. So we wanted to create something that would not reinvent the wheel for survival analysis, since there are existing packages that do that really well. Rather, we wanted to add something that could be easily implemented into our workflow in order to elegantly produce these one-row-per-patient tables. This would accelerate the manipulation of our data into something compatible with standard survival analysis.
There were other important considerations on top of these prerequisites. One was to give users autonomy in selecting data from specific tables, rather than from all tables in the environment that included a given variable. For example, in calculating real-world overall survival for a set of patients, if there was no date of death recorded for a given patient, they may have last been recorded to be alive anywhere from being contacted by their provider to going in for a lab assessment. However, a researcher may not consider lab result times a potential event indicator, and thus they could eliminate that particular data table from consideration for identifying the last time a patient was recorded as alive.
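As a rough illustration of the table-selection idea just described, here is a minimal base R sketch. It is not rwnavigator's actual API: the function `derive_os_endpoint()`, the toy tables, and the column names `patient_id`, `death_date`, and `contact_date` are all hypothetical and chosen only for this example. The sketch assumes each selected table carries a date on which the patient was known to be alive.

```r
# Hypothetical toy tables; names and columns are illustrative only
# and do not reflect COTA's data model.
patients <- data.frame(
  patient_id = c("A", "B", "C"),
  death_date = as.Date(c("2021-06-01", NA, NA))
)
visits <- data.frame(
  patient_id   = c("A", "B", "C"),
  contact_date = as.Date(c("2021-03-10", "2022-01-15", "2021-11-02"))
)
labs <- data.frame(
  patient_id   = c("B", "C"),
  contact_date = as.Date(c("2022-04-20", "2021-09-30"))
)

# Hypothetical helper: returns one row per patient with an event flag and
# an end date (death date if recorded, otherwise the latest contact date
# found in the tables the user chose to include).
derive_os_endpoint <- function(patients, sources) {
  # Stack contact dates from only the user-selected tables.
  contacts <- do.call(
    rbind,
    lapply(sources, function(tbl) tbl[, c("patient_id", "contact_date")])
  )

  # Latest contact per patient, computed on the numeric representation
  # of the dates and converted back afterwards.
  by_patient <- split(as.numeric(contacts$contact_date), contacts$patient_id)
  latest_num <- vapply(by_patient, max, numeric(1))
  latest <- data.frame(
    patient_id   = names(latest_num),
    last_contact = as.Date(unname(latest_num), origin = "1970-01-01")
  )

  # Left join keeps every patient, even those with no contact records.
  out <- merge(patients, latest, by = "patient_id", all.x = TRUE)

  # Event = 1 if a death date is recorded, 0 if the patient is censored.
  out$event <- as.integer(!is.na(out$death_date))
  out$end_date <- out$death_date
  censored <- is.na(out$death_date)
  out$end_date[censored] <- out$last_contact[censored]

  out[, c("patient_id", "event", "end_date")]
}

# Including labs versus excluding them changes the censor date for patient B.
with_labs <- derive_os_endpoint(patients, list(visits, labs))
no_labs   <- derive_os_endpoint(patients, list(visits))
```

In this toy data, dropping the `labs` table moves patient B's censor date earlier, which is the kind of choice the researcher in the example above would be making.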
Once we were able to identify and calculate the event of interest (again, here it's death), we had to consider an index date to calculate the amount of follow-up time to that event. So in oncology research, a good anchor point is the date of disease diagnosis or the initiation of a treatment regimen. However, cancer patients may often receive more than one treatment cycle in their entire journey, and not all patients will receive the same number of treatment cycles for a given disease, resulting in multiple index dates. I'm really sorry, what happened to one-row-per-patient?