
Eric Nantz, Alex Lauer, Rich Iannone - A Pivotal Year of Milestones in R Consortium Working Groups
A Pivotal Year of Milestones: R Submissions Working Group and R Tables for Regulatory Submissions updates - Eric Nantz, Alexandra Lauer, and Rich Iannone

Resources mentioned in the presentation:
- R Submissions Working Group: https://rconsortium.github.io/submissions-wg/
- R Tables eBook: https://rconsortium.github.io/rtrs-wg

Abstract: Within the life sciences industry, Shiny has enabled tremendous innovations to produce web interfaces as frontends to sophisticated analyses, dynamic visualizations, and automation of clinical reporting across drug development. While industry sponsors have widely adopted Shiny as part of their analytics and reporting toolset, a relatively unexplored frontier has been the inclusion of a Shiny application inside a clinical submission package to regulatory agencies such as the FDA. The R Consortium R Submissions Working Group has continued the positive momentum of previous submission pilots to achieve substantial progress in this domain. In this talk, we will share the development journey of the working group's successful Pilot 2 submission of a Shiny application to the FDA, along with progress on the use of novel technologies such as Linux containers and WebAssembly to bundle a Shiny application into a self-contained package, facilitating a smoother process for both transferring and executing the application.

The R Consortium's R Tables for Regulatory Submissions (RTRS) Working Group has released the first edition of [Tables in Clinical Trials with R](https://rconsortium.github.io/rtrs-wg/) as a free and openly accessible ebook. The book contributes to the development of a theory of displaying tabular information by identifying a small number of table archetypes that may be used to generate the most common tables employed in clinical submissions. Chapters in the book demonstrate how these tables may be rendered with different R packages, including flextable, gt, rtables (with and without tern), tables, tfrmt, and tidytlg.
All tables are generated from CDISC-compliant data. Comparing the code showcases the robustness of R for aggregating and displaying tabular information and illuminates the flexibility and design tradeoffs of the various R packages. The talk will discuss the motivation for the book, present the idea of table archetypes, show some representative tables, and make the case for R as a superb language for analyzing clinical trial data. The RTRS working group expects Tables in Clinical Trials with R to become a primary resource for clinical programming teams.

Speaker Bios:

Eric Nantz is a director within the statistical innovation center at Eli Lilly and Company, creating analytical pipelines and capabilities for advanced statistical methodologies used in clinical design across multiple phases of development. Outside of his day job, Eric is passionate about connecting with and showcasing the brilliant R community in multiple ways. You may recognize his voice from the R-Podcast, which he launched in 2012. Eric is also the creator of the Shiny Developer Series, where he interviews authors of Shiny-related packages and practitioners developing applications, as well as sharing his own R and Shiny adventures via livestreams on his Twitch channel. In addition, Eric is a curator for the R Weekly project and co-host of the R Weekly Highlights podcast, which accompanies every issue.

Alexandra Lauer is a Senior Principal Statistical Analyst at Merck KGaA, Darmstadt, Germany, with a background in mathematics. She specializes in bridging psychometrics and biostatistics, with a primary focus on Health-Related Quality of Life evaluations. Alex is an R enthusiast, co-leading the Merck-internal R User Group.

Rich Iannone: My background is in programming, data analysis, and data visualization. Much of my current software engineering work on R packages is intended to make working with data easier. I truly believe that with the right approach, tools like these can be both powerful and easy to use.
Presented at the 2023 R/Pharma Conference (October 26, 2023)
Transcript
This transcript was generated automatically and may contain errors.
All right, yeah, let's formally get this going here. So yes, it is my pleasure to be joined by Alex and Rich to talk about this really pivotal year of milestones that we've had across two working groups within the R Consortium. I'll be speaking for a few minutes about the R Submissions Working Group that you've heard mentioned in previous talks at this conference, and then I'll turn it over to Alex and Rich shortly thereafter to cover the R Tables for Regulatory Submissions Working Group progress.
Yep, so I believe most of you are familiar with the R Consortium by now, but it never hurts to set the stage here. It's very important to note that this is a non-profit organization whose core mission is to provide tremendous support for the R language through the R Foundation and community initiatives all in one. And the biggest impact you see is the grants that are given to specific working groups or specific projects to improve the infrastructure of R and the overall ecosystem. They have a tremendous blog as well, where you can keep up to date with the latest happenings. They often put spotlights on really influential community members, and again, we've seen just tremendous value in its history already.
R Submissions Working Group overview
And so, like I said, I'm going to be speaking about the R Submissions Working Group, where I want to emphasize the cross-industry collaboration, and the collaboration with regulators, as really being the life of this working group. We have been able to have successful pilots, which you're going to hear about shortly, where the overarching goal is to evaluate a clinical submission package that's produced entirely in R for analysis programs, analysis results, data sets, and yes, even pushing the envelope with interactive applications. Through these pilots, we really want to see where the gaps are and where solutions are needed, so that we can inform the rest of our life science community on best practices and the issues to watch out for, while also, and I'll get to this later, using the collaborative nature of our relationship with regulators to bring their feedback into these pilots as well. But again, like the R Consortium itself, everything is in the open.
So the foundation is actually what you've heard about in previous conferences, such as R/Pharma. In fact, in November 2021, now a couple of years ago, this set the stage for us: we had a successful R-based test package that was submitted to the FDA, which met the requirements that we see in typical clinical submissions, such as following the eCTD specification and supplying a very detailed Analysis Data Reviewer's Guide, or ADRG, if you're familiar with that acronym. Back then we were, or at least at the time thought we were, required to convert all of our R scripts to text files via the pkglite package, and I'm happy to say that's now a thing of the past. With this pilot, we also made sure we used R to create all the typical tables, figures, and listings that you would see in a mock submission.
Pilot 2: Shiny application submission
But why stop there, right? We're all big fans of Shiny. We've seen Shiny transform our industry. So we thought, why not take those results from Pilot 1 and surface them into a Shiny application? You heard about this earlier in the conference, but to recap the structure: it took all those analyses from Pilot 1 and bundled them into an application as an R package. At its core are Shiny modules, with dependencies managed by renv, so that we could recreate that R package environment on the reviewer systems just as easily as we had on our respective development systems in the working group. And then this ADRG took most of the material from Pilot 1, but then yours truly put in pretty precise instructions to execute this application: bootstrap it from the eCTD transfer, bootstrapping with renv.
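The renv-based bootstrap Eric describes can be sketched roughly as follows. This is a minimal, hypothetical sketch rather than the pilot's documented procedure (that lives in the ADRG); the application package name `pilot2wrappers` and the `run_app()` entry point follow common golem-style conventions and are assumptions here.

```r
# Hypothetical sketch of an renv bootstrap on a reviewer's system,
# after unpacking the files from the eCTD transfer.

install.packages("renv")   # one-time setup on the reviewer machine

# renv::restore() reads the renv.lock file shipped with the
# application and reinstalls the exact package versions recorded
# there, reproducing the development environment.
renv::restore()

# With dependencies pinned, install the app package itself and launch it.
# "pilot2wrappers" and run_app() are illustrative names, not quoted
# from the pilot materials.
install.packages("pilot2wrappers", repos = NULL, type = "source")
library(pilot2wrappers)
run_app()
```

The key design point is that `renv.lock` travels with the submission, so the reviewer's restore step is deterministic rather than pulling whatever package versions happen to be current.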
And I'm happy to say that as of last month, we had a successful completion of this pilot with our FDA reviewers. Their feedback was tremendous throughout this whole process.
So with Shiny, we get tremendous power at our hands for letting the user customize, say, the data sets they're looking at and the types of results they're looking at. And in fact, in an earlier version of the Pilot 2 app, we included more dynamic data filtering. In this case, we were powering the filter modules with the teal framework and specific modules within it, so that the reviewer could easily see what happens when, say, they have a different age level in the population or a different lab measurement value. The initial version applied these filters to the source data sets for all results, including the results of statistical models that were specified in the ADRG. But, just like anything in Shiny, it would update all results, including p-values, dynamically.
We found out, and this is very important feedback, that for those analyses that were pre-specified, dynamic filtering could lead to potential confounding, confusion, or misinterpretation if it produced subgroups that were not pre-specified for that particular analysis, say a regression model, a linear mixed model, what have you. But we found a great middle ground: we have a visualization in there, a Kaplan-Meier plot, and we only allow filtering in that module because we're not doing inference on it. It's really the time to an event that's specified in the analysis plan, but it still takes advantage of some of the great features that Shiny can bring us. So I think there's more to this story coming in the future, but as a community leveraging Shiny and interactive applications, we need to make sure we use it responsibly.
Pilot 4: containers and WebAssembly
We have a new frontier. You've heard about this a little bit. Pilot 4 is looking at a way to share this same Shiny application, but in a more efficient manner, using two novel technologies. One is based on container technology. You may have heard of container engines like Docker, which gets a lot of the mind share. In essence, a container is a way to encapsulate both the application dependencies and the compute environment, managed by a container runtime. So it's not a full-blown virtual machine; it's a container built from an image. We're actually using the Podman container engine for this pilot, because there are some licensing concerns with Docker at the moment. Podman is open source, and our regulator colleagues at the FDA are now going to try using Podman to take our application and see if they can run it in this more efficient manner.
WebAssembly. This is the really new frontier, and extremely fast moving. But we have a lot of excitement as a community, and equal excitement on the regulatory side as well: imagine an application that just runs in their web browser. It eliminates so much of the overhead that we had in Pilot 2 for bootstrapping the application. We are not there yet, but we've had significant progress thanks to our partnership with Appsilon, who has made tremendous contributions in this space, as well as Posit's George Stagg, who's been a heavy consultant on this. We hope to have results to share next year, maybe paving the way for new innovations in the submission space.
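For the curious, one current route to a browser-only Shiny app is the shinylive R package, which builds on webR; whether Pilot 4 uses exactly this tooling is not stated in the talk, so treat this as an illustrative sketch.

```r
# Sketch of exporting a Shiny app to a static WebAssembly site with
# the shinylive package (one available tool for this; the pilot's
# final tooling may differ). "app" and "site" are placeholder paths.

install.packages("shinylive")

# Converts the app directory into a static site that runs entirely
# in the browser via webR -- no R server process is needed.
shinylive::export(appdir = "app", destdir = "site")

# The exported site can then be previewed with any static file server:
httpuv::runStaticServer("site")
```

Because the output is just static files, "transferring the application" reduces to shipping a folder, which is exactly the efficiency gain Eric describes.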
R Tables for Regulatory Submissions Working Group
Thanks, Eric. That's really exciting stuff. Love following all the different things you're doing in the pilot group. I can't believe we're up to four now. Things are moving so fast. Anyways, we're going to talk about our group, the R Tables for Regulatory Submissions Working Group. A bit of a mouthful. What I usually do is remember RTRS. It's easily searchable on GitHub.
And who are we? We're basically a bunch of people from different pharmaceutical companies, along with table package developers. We all meet together to try to solve this problem: how do we efficiently create tables for regulatory submissions? There are a lot of R packages that work with tables. So what we've done is look at all of them, try to see which ones are most effective at this task, and create documentation. Our goal is basically to demonstrate and clarify the various aspects of table creation using R, with a select number of packages. And we believe that having all this documentation at the ready enhances the ability to create tables with R.
So we actually created a book. That's one of our initiatives. It's a tables ebook, published online and available now, called Tables in Clinical Trials with R. What it does is contrast table creation across seven different R table packages. Again, there are more than that, but we whittled the list down to that short list of seven. And we have a whole bunch of example tables, with five different table types, built with all the different packages. So you can compare across the packages and see which one suits you, or just take the examples and run with them. Alongside that, we have a detailed discussion of topics like formatting and pagination. We wrote the book using Bookdown, so of course it's available online. And the great thing about Bookdown is that it includes runnable code, which runs as the book is built, so you can just copy and paste that code and adapt it to your own solutions.
Walkthrough of the tables book
Yeah, absolutely. So yeah, this is the book. I hope you can all see it. I have the pleasure to walk you through what we've done. Basically, we start with a little bit of introduction, like why we've put together this exercise. On the left-hand side, you can see all seven of the table packages that have contributed to this exercise. So we have gt, we have rtables, the combination of tern and rtables, flextable, tfrmt, tables, and tidytlg. These are the seven. And we start with a little bit of methodology and background. Here, for example, you can see in the titles and footnotes section how to add titles and footnotes to an existing table, and you can take that code and compare, say, gt to tables.
Then, what do we see here? It's, of course, a table. We love tables here. And it's a great overview of which output formats are actually available to you with each of the package options. You can see that they pretty much all produce HTML output, which is amazing, right? HTML is a great format, and I would love to see it used more prominently. If you want to go for PDF, then everything but tidytlg will work for you. Likewise, if you want to render to RTF, everything but tables will work. But if you would like to go for something more exotic and maybe render to PowerPoint, then flextable and rtables are your only options.
But let's look into clinical tables now, because that's why we're here. Let's start with demographics and have a look at rtables first. This is the rtables code. We've written it in Bookdown, so you can click on the right-hand side here, highlight the code, copy it to your own console, and rerun everything. All of the code examples use the same underlying data sets; the data we're using here, for example, is the ex_adsl data set. They follow ADaM standards, so it's super applicable to our use cases. But let's look into rtables here. The majority of the code is actually composed of two helper functions, one for the numeric summary and one for the factor summary. And then what Gabe, the package author of rtables, does is create a layout. The basic_table() function here creates that layout with the title, subtitles, and footer applied. Columns are split by the values in the arm column. And then we analyze the variables age, sex, and country, depending on whether they're numeric or factors. This only creates the layout; the resulting table is then created by applying that layout to the ex_adsl data set.
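The layout-then-build pattern Alex describes can be condensed into a minimal sketch like the one below. This is not the book's actual code: it uses rtables' default summary functions rather than the book's custom helpers, and it assumes the example ADaM data set `ex_adsl` shipped with the formatters package.

```r
library(rtables)

# Build a layout: a declarative recipe for the table, not the table itself.
lyt <- basic_table(
    title = "Demographic Summary",       # title/footer applied in the layout
    main_footer = "Source: ADSL"
  ) |>
  split_cols_by("ARM") |>                # one column per treatment arm
  analyze(c("AGE", "SEX"))               # default summaries: numeric vs. factor

# Applying the layout to the data produces the table.
tbl <- build_table(lyt, formatters::ex_adsl)
tbl
```

Separating the layout from the data is the distinctive rtables design choice: the same `lyt` can be rebuilt against a refreshed data set without touching the table code.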
And this is what we see. This is the standard demographic table you would expect, right? It has three columns for the three values in the arm column: drug X, placebo, and the combination therapy. And we have a numeric summary for age and factor summaries for sex and country. You can now compare that code to gt and see, okay, that code is significantly longer. Why is that? Not only because I've written it and I like indentation a lot, but also because gt requires you to do all of the cell value derivation up front, which means you have to feed gt a data frame or tibble that almost has the shape of the table you want to get out. Then you apply your usual table header functions and format the values. In the end, the resulting table looks pretty similar to the one we saw in rtables.
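By contrast, the derive-first style that gt expects looks roughly like this minimal sketch. Again, this is illustrative rather than the book's code, it covers only one variable, and it reuses the assumed `formatters::ex_adsl` example data.

```r
library(dplyr)
library(gt)

# Step 1: derive the cell values up front -- the data frame handed to
# gt already has (almost) the shape of the final table.
age_summary <- formatters::ex_adsl |>
  group_by(ARM) |>
  summarize(
    n    = n(),
    Mean = mean(AGE, na.rm = TRUE),
    SD   = sd(AGE, na.rm = TRUE)
  )

# Step 2: hand the summary to gt and apply headers and formatting.
age_summary |>
  gt(rowname_col = "ARM") |>
  tab_header(title = "Age Summary by Treatment Arm") |>
  fmt_number(columns = c(Mean, SD), decimals = 1)
```

The extra length Alex mentions comes from step 1: the aggregation that rtables performs inside its layout engine has to be written explicitly (here with dplyr) before gt ever sees the data.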
The same underlying data is used, so you can directly take that and compare it to tfrmt, and it's super easy to compare these. We can look into disposition quickly. Let's look at flextable, for example. This is the code that will produce a disposition table for you. All of these code pieces use the same underlying data again, so you can directly copy them to your console, run them, and see what works best for you. And this is how the disposition table looks. Comparing that again to, let's say, the tables package here, you will see that it's a little bit longer, because tables is one of two package engines that do this cell value derivation for you under the hood; the other packages don't do that for you. So yeah, I hope I was able to convince you to at least have a look at the tables book. And please share your feedback.
What's next for RTRS
That was great. And the code you wrote for gt was super excellent. As I mentioned before, we have a lot of people in our group who are actually the table package authors, so they were able to personally write the code. So what's next for the RTRS working group? We're going to focus on other topics like table value formatting, a pretty big topic in and of itself. Listings: still a thing, basically just lots of big tables, so they have their own sort of difficulties and solutions. Orchestration of table creation: kind of important, since we're making lots of tables, and any efficiencies we gain from orchestration should be talked about and documented somewhere. And the final topic we want to explore is document integration and medical writing.
Getting involved
So yeah, we have great opportunities to get involved in the R Submissions Working Group, and for the Shiny enthusiasts out there, definitely get in touch with us by going to the working group site. In fact, on GitHub we have a link to our Submissions Pilot 4, which is currently ongoing, and we're also going to be talking about next steps for the Submissions Working Group in general very soon. So definitely get in touch with us on GitHub to get more involved. Yeah. And for RTRS, same sentiments. Join us. We meet every six weeks. It's quite a good group; I've been in it since the beginning. If you have anything to do with tables at all in your work, whether you're part of pharma or just a table enthusiast, reach out. We're totally available and we'd love to collaborate with you.
Okay, actually I do want to surface a question from Harvey real quick on the Submissions Working Group side. He says: it's very exciting, I agree, about WebAssembly, where you have everything running in the client. How would you address the concern of security? So certainly keep in mind what Max Kuhn said in his talk yesterday: whenever you have a Shiny app in WebAssembly, you don't want to pass in sensitive credentials if you can avoid it. We're definitely positioning this as a more self-contained application. The datasets are already loaded as part of the quote-unquote repository that the application is bundled with. But we have had conversations with George Stagg at Posit, who has said that the browser running the application is actually a more tightly controlled sandbox, from a security standpoint, than even a container such as one run with Docker, which is really almost too good to be true, but we're going to keep pursuing that. That will be a hot topic with our colleagues at the FDA as we sort this out further.
We do have one more question for you, Alex and Rich. What table packages are used to make the tables rendered in Bookdown? Oh, seven of them. Yeah. That's what I thought. Yeah, yeah. They all rendered nicely in Bookdown. It's brilliant. That is the beauty of them. That whole book shows you... let me count, math is hard... five different archetypes of tables, and all of them feature seven ways to produce them. So all seven packages are covered. Check that out. It's super cool. You can lay the code side by side and choose whatever works best for you.
All right. Well, I think that will probably wrap us up here. So thank you again, Alex and Rich, for joining me for this presentation. I'm glad we could collaborate on this. It was fun to have our brainstorming sessions for these slides, and I've learned a lot more about your working group through that effort as well. We're looking forward to collaborations in the future. Thanks for having us, Eric.

