
Sean Lopp | Posit Investments in Pharma | Posit

From rstudio::global(2021) Pharma X-Sessions, sponsored by ProCogia: R/Pharma is an organization of R enthusiasts who work in the pharma and biotech industries. This presentation summarizes the group and presents some goals for 2021. More about Sean Lopp: Sean has a degree in mathematics and statistics and worked as an analyst at the National Renewable Energy Lab before making the switch to customer success at RStudio. In his spare time he skis and mountain bikes and is a proud Colorado native. Learn more about the rstudio::global(2021) X-Sessions: https://blog.rstudio.com/2021/01/11/x-sessions-at-rstudio-global/


Transcript

This transcript was generated automatically and may contain errors.

Awesome. Thank you, Sam, and welcome, everyone. We're really excited to have this pre-rstudio::global session. As another person from RStudio, I'm really glad to have you all here, and thanks for taking part of your week to spend time with us. I'm going to jump right in, since we only have a couple of minutes. The first thing I want to do is talk a little bit about how we think about data science conceptually here at RStudio and how that applies to pharma, and then we'll look at some of the exciting updates we're making to support that conceptual framework.

So we think about data science as a journey. It's a very iterative process, as I'm sure you're all aware. It starts with a hypothesis. There's development, and data involved in that development. There's collaboration with others, with other teams and other stakeholders, and then at the end there's some type of deliverable, whether that's automation or a handoff to the FDA.

And so in pharma, oftentimes, what this looks like is early-stage research, where you're developing a hypothesis that might evolve into a full-blown experimental design and trial, and then into something like a report or a submission. It might also look more like manufacturing or marketing. Those are also key parts of the pharma journey: determining some type of objective, iterating to decide whether that was the right objective, and maybe coming up with models and doing production scoring. So there are a lot of different ways this journey can be realized in the pharmaceutical space.

RStudio's tools for the data science journey

What I want to show you today is how we're building tools to help, and where those tools fit into the different parts of this journey. On the top, you have some major updates we've released in the last year to RStudio's professional products. On the development side, we've done a lot of work to ensure that RStudio scales as your data, and the richness of that data, grows, by letting you tap into resources like the cloud. In the middle, where you're thinking about collaboration and iteration, we've done a lot of work in a tool called RStudio Connect to help you share content dynamically. And when it comes to hardening and automation, we've done a lot of work in a product called Package Manager to ensure that, whether you're doing analysis in R or in Python and whether you're using CRAN or Bioconductor, you have the correct packages and dependencies you need to be certain that your work is reproducible and reliable.

So those are the things in blue. They're all available today as part of RStudio Team, the professional platform for doing data science that you'll hear a little bit about this week. But on the bottom, and maybe more exciting, are some of the developments in the open source community, where we've partnered with pharmaceutical companies to build tools that support the workflows you use day to day.

The big one we're going to look at today is the new, reimagined R Markdown. It's helpful throughout the journey: creating hypotheses, collaborating, and sharing. So we're going to spend some time looking at that. There are new packages for the middle layer of work, where building models, specifically with tidymodels, is going to be a major part of the RStudio conference. There's also work being done on Shiny and R Markdown; one example of that work is the thematic package. And on the far end, something that's really specific to the pharmaceutical space is the work we're doing around validated submissions.

One key element of that work is a tool called gt, an R package for creating tables. Another is work around validation: how the different platforms and tools that support submissions can be validated.

Validation documents

So let's drill into a few of those and look at what's new. Going backwards, we'll start with the validation documents I mentioned. These documents cover the packages affiliated with RStudio, such as Shiny and R Markdown, the tidyverse, and tidymodels, as well as the RStudio Team professional products. They outline how to think about each product's software development lifecycle for compliance. Your QA team may have asked you: how can we be sure that R is producing valid and correct results?

That's a big question, and R is a huge ecosystem. Some of the later presentations today will talk through how to think about answering it. But we're excited to announce documents that will hopefully answer that question for you for these RStudio-affiliated packages. So be sure to check them out. We won't read through them today; that would take a lot more than the couple of minutes we have left.

Bioconductor support in Package Manager

The second thing I wanted to mention: as we talked about, in the submission space it's really critical that you have a package management solution. RStudio offers RStudio Package Manager as one option there, and we're excited to announce Bioconductor as a new addition to that offering.

To quickly show you what that looks like: if you're using packages from Bioconductor, you might be familiar with the fact that Bioconductor releases about twice a year. Package Manager makes those same releases available to you, and there are instructions inside Package Manager for how to set this up. One of the things we're really excited about is that previously there was no easy way to keep CRAN and Bioconductor in sync. With RStudio Package Manager, you can keep the two in sync, so that when your Bioconductor packages install their dependencies from CRAN, you're guaranteed to get the right versions.

I'm showing you an internal version of this today, but you can also go to packagemanager.rstudio.com, which is a free version of the Package Manager service, and the Bioconductor offering is available there as well. So if you want to source Bioconductor packages along with a frozen version of CRAN that matches the Bioconductor release, that's available to you today.
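A minimal R sketch of pointing a session at Package Manager for both Bioconductor and CRAN follows. It assumes the `BiocManager` package, which reads the `BioC_mirror` option, and uses URL paths modeled on the public service mentioned above. The exact paths, including the frozen CRAN snapshot that matches a given Bioconductor release, are assumptions here and should be copied from the setup instructions inside Package Manager.

```r
# Sketch only: URL paths are assumptions modeled on the public service at
# packagemanager.rstudio.com. Copy the exact values (including any frozen
# CRAN snapshot matching your Bioconductor release) from the setup
# instructions inside your own Package Manager instance.
ppm <- "https://packagemanager.rstudio.com"

options(
  # BiocManager reads this option to decide where Bioconductor packages come from
  BioC_mirror = paste0(ppm, "/bioconductor"),
  # CRAN dependencies are then resolved from the same Package Manager server
  repos = c(CRAN = paste0(ppm, "/cran/latest"))
)

# With these options set, installation works as usual, for example:
# install.packages("BiocManager")
# BiocManager::install("limma")
```

Because both repositories come from one server, a pinned CRAN snapshot can be chosen to match the Bioconductor release, which is the synchronization described in the talk.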

R Markdown updates and the GT package

I know we're going in rapid-fire succession, but the last thing I wanted to talk about is some of the changes in the open source ecosystem around R Markdown, and some of the packages that work with R Markdown to create the polished documents you might be using internally or for things like submissions. I'm going to talk about two things, an update to R Markdown and a package called gt, and we'll combine both of those into a single demo.

But before we hop into the IDE, to give you an understanding of gt: its name is short for "grammar of tables," so it's very similar to ggplot2, which implements the grammar of graphics. What gt is all about is allowing you to write really expressive and flexible code for creating tables, and then to use that code to render those tables wherever you need them. The same code can create tables that render to RTF, for things like Word documents, or to HTML.
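To make the ggplot2 analogy concrete, here is a minimal sketch of building a gt table in layers. The `labs` data frame and its columns are hypothetical; `gt()`, `tab_header()`, `cols_label()`, and `fmt_number()` are gt functions, but argument details vary between gt versions, and this assumes a recent release plus R 4.1 or later for the native `|>` pipe.

```r
library(gt)

# Hypothetical example data: one lab value per subject
labs <- data.frame(
  subject = c("S01", "S02", "S03", "S04"),
  arm     = c("Placebo", "Placebo", "Active", "Active"),
  value   = c(101.23, 98.51, 87.12, 90.44)
)

# Like ggplot2: start from the data, then add one layer at a time
tbl <- gt(labs) |>
  tab_header(title = "Lab values by subject") |>
  cols_label(subject = "Subject", arm = "Arm", value = "Value (mg/dL)") |>
  fmt_number(columns = value, decimals = 1)
```

Each layer returns a new table object, so the same `tbl` can later be handed to whichever renderer is needed.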


That's the concept. Now I'll show you what the package looks like and how you use it inside R Markdown. Let me switch over to the RStudio IDE. This is all open source, so you can use the latest version of RStudio on your desktop or on a server with these packages. The workflow you'll see is similar to what we're looking at here. I have some data, just raw data inside a data frame. Turning that data into a table with gt is very similar to creating a plot with ggplot2: you start with the gt() function and then add pieces to get to the final result. In this case, we're adding some summary rows, adding a grand summary across all the rows, and then telling gt how to format the different text cells.
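The summary-row workflow described above might look roughly like this. The data, group names, and summary labels are hypothetical; `summary_rows()` and `grand_summary_rows()` are the gt functions for per-group and whole-table summaries, and this sketch assumes a recent gt release, since the `fns` argument has changed over time.

```r
library(gt)

# Hypothetical lab data; subjects form the stub, arms form row groups
labs <- data.frame(
  subject = c("S01", "S02", "S03", "S04"),
  arm     = c("Placebo", "Placebo", "Active", "Active"),
  value   = c(101.2, 98.5, 87.1, 90.4)
)

tbl <- gt(labs, rowname_col = "subject", groupname_col = "arm") |>
  # Summary rows computed separately within each arm
  summary_rows(
    groups  = c("Placebo", "Active"),
    columns = value,
    fns     = list(Mean = "mean", SD = "sd")
  ) |>
  # Grand summary rows computed over every row in the table
  grand_summary_rows(
    columns = value,
    fns     = list(Overall = "mean")
  ) |>
  # Formatting of the body cells
  fmt_number(columns = value, decimals = 1)
```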

I'll go ahead and run this code, and we can take a look at the output. This creates a gt table, and by default that table is rendered as HTML, so it shows up right here inside our R Markdown document. You can see what I described: we start out with the data in a table, then add summary rows, add a grand summary, which is this top row here, and then do some formatting. Footnotes are fully supported, as are different table sections, custom formatting, and column headers, so there's a lot of flexibility in creating the table just the way you want.
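Footnotes, column groupings, and source notes are each added as another layer. A sketch, again with hypothetical data and labels, using gt's `tab_spanner()`, `tab_footnote()`, and `tab_source_note()` and assuming a recent gt release:

```r
library(gt)

# Hypothetical data: a baseline and a follow-up value per subject
labs <- data.frame(
  subject  = c("S01", "S02"),
  baseline = c(101.2, 98.5),
  week12   = c(95.0, 97.1)
)

tbl <- gt(labs, rowname_col = "subject") |>
  tab_header(title = "Lab values", subtitle = "Hypothetical example") |>
  # A spanner groups several column headers under one label
  tab_spanner(label = "Visit", columns = c(baseline, week12)) |>
  cols_label(baseline = "Baseline", week12 = "Week 12") |>
  # Footnote marks are placed on the targeted cells automatically
  tab_footnote(
    footnote  = "Values in mg/dL.",
    locations = cells_column_labels(columns = c(baseline, week12))
  ) |>
  tab_source_note(source_note = "Source: hypothetical data.")
```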

That's already pretty exciting, but it gets even better: with a single line of code, you can take the same table and save it to a different output format. So if you need this table inside a Word document, you can create a version of the table that renders inside Word or whatever other formatting tool you want to use. That's the really exciting update to gt.
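The single line of code for switching formats is most likely gt's `gtsave()`, which picks the output format from the file extension; that identification is an assumption, since the demo code isn't shown. A sketch with a hypothetical table, writing RTF (which Word can open) and HTML to a temporary directory:

```r
library(gt)

# Hypothetical table
arms <- data.frame(arm = c("Placebo", "Active"), n = c(12L, 13L))
tbl <- gt(arms) |> tab_header(title = "Subjects per arm")

# gtsave() chooses the output format from the file extension
out_dir <- tempdir()
gtsave(tbl, filename = "subjects.rtf", path = out_dir)   # RTF, which Word can open
gtsave(tbl, filename = "subjects.html", path = out_dir)  # standalone HTML page

# The rendered RTF is also available as a string, without writing a file
rtf_text <- as_rtf(tbl)
```

The table definition is written once; only the final save call changes with the target format.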

But I mentioned there was something else. The last thing I want to show you is the update to R Markdown. In the latest version of the IDE, there's a new option to essentially render the document as you go. By clicking this "magic" button here, I get an interactive editor for the R Markdown document. Our code is still here, and our table is still here, but now I see the rendered R Markdown in real time.

Why would you do this? Because it makes it really easy to create rich and powerful documents. As an example, I'll insert another table and use it to show some of the features available here. As I type Markdown, it's rendered in real time. I can even put in things like emojis, so there's full support for interactive editing. You can even render LaTeX inline: the preview shows up, and then the output shows up. So as you're writing, it becomes a really powerful way to create documents and see exactly what you're doing.

But under the hood, it's all still R Markdown, so I can always switch back to the original document, which is just a plain text file. We're excited about how all of these things together can hopefully help you in your day-to-day work along that data science journey.

And I'll end things there. This is just a preview, a taste, a sampling, if you will, of what's to come both today and at the RStudio conference. So I'll hand things back over to you, Sam. Thanks for having me.