How to deploy a Shiny application using clinical trial data to Posit Connect
Episode 2: Publishing a Shiny application in R to Posit Connect - Using Clinical Trial Data
Led by: Ryan Johnson, Data Science Advisor
Follow-up links:
* Posit Team: https://posit.co/products/enterprise/team/
* Talk to us directly: https://posit.co/schedule-a-call/?booking_calendar__c=RST_YT_Demo
* Follow-along blog post: https://posit.co/blog/publishing-a-shiny-app-in-r-with-clinical-trial-data-to-posit-connect/
* Source code for example: https://github.com/ryjohnson09/adam_analysis
* Posit Team demo resources: pos.it/demo-resources
Timestamps:
1:35 - High-level overview of Posit Team
3:30 - Overview of clinical trial data used
5:31 - Opening an RStudio session on Posit Workbench
7:51 - Creating a new directory in RStudio
9:16 - Upload the ADaM dataset to Posit Workbench
10:17 - Using packages from a validated repository on Posit Package Manager
12:37 - Install packages for your Shiny application
13:49 - Pasting the code for the Shiny application (https://github.com/ryjohnson09/adam_analysis)
16:16 - Publishing your Shiny application to Posit Connect
18:36 - Changing access controls for the published Shiny application
20:25 - Using renv to record your R environment
On the last Wednesday of every month, we host a Posit Team demo and Q&A session that is open to all. You can use this to add the event to your own calendar.
Who are these monthly demos for?
Everyone is welcome to join us, regardless of industry, background, or experience! We will discuss topics that will speak to:
* Data scientists and administrators who are new to Posit Team or looking to grow their understanding of our toolchain,
* Teams searching for a new analytic platform built to support open-source data science,
* And those that are just curious about Posit Team!
What you can expect from the monthly Posit Team demo:
During the session, we will walk through an end-to-end data science workflow and demo the core functionality of Posit Team while highlighting some of our latest features!
While each session's content will vary slightly, here are a few core topics we will address each month:
* Open Source Analytics: The future of data science is open source. We'll discuss methods for leveraging open-source tools and packages in a secure and scalable way!
* Deployment: How to share the amazing data science assets your team has built, including web applications, machine learning models, APIs, and more!
* Data Access: Data comes in various forms and is stored in various ways. We'll discuss best practices for accessing, reading, and writing data!
* Job Scheduling: Do you have recurring data science jobs? We'll show you how to automate these processes using Posit Connect.
What is Posit Team?
Posit Team is a bundle of our popular professional software (Posit Workbench, Posit Connect, and Posit Package Manager) for developing data science projects, publishing data products, and managing packages.
Registration is not required. The event will be streamed through YouTube Premiere.
Transcript
This transcript was generated automatically and may contain errors.
Hello, everybody. My name is Ryan Johnson, and I'm a data science advisor here at Posit. And welcome to this month's Enterprise Community Meetup, where we'll discuss an end-to-end data science workflow using Posit Team.
As a reminder, this will be a recurring event on the last Wednesday of every month, and we hope it'll serve as a good primer for those new to Posit Team, but also help current Posit Team users become more familiar with our tools. Every month, we will cover a different topic and highlight all three tools within Posit Team: Posit Workbench, Posit Connect, and Posit Package Manager.
So this month, we have a pretty exciting topic for everyone, especially those within the life sciences, healthcare, and pharmaceutical space. So we are going to use Posit Team to create a Shiny application of clinical trial data in R, all while discussing best practices for working within what's known as a validated environment. As we go through today's session, I'll be sure to stop and note various ways Posit Team can help support a validated compute environment and encourage good coding practices.
High-level overview of Posit Team
I first want to make sure everyone on the call has a solid, high-level understanding of Posit Team. So starting up here at the very top, we have our data scientists and analysts. These are the folks writing code and creating insights using Posit Workbench. And they can use whatever language they want, including R and Python, as well as any development environment they choose, including the RStudio IDE, JupyterLab, Jupyter Notebooks, and VS Code.
Now for the R developers, they may be creating insights like Shiny applications, pins, R Markdown documents, or APIs using the Plumber package. And for Python developers, they have a home to create interactive applications using things like Streamlit, Dash, Bokeh, Shiny. They can also create documents using Jupyter Notebooks or even Quarto, and APIs using Flask and FastAPI.
Now once the developers create content, they need a way to share it with the people that need to see it, including decision makers, coworkers, clients, or maybe even just friends and family. And so that's the role of Posit Connect, which is our professional publishing platform. And finally, we have Posit Package Manager, which helps organize and centralize the great open source R and Python packages your team uses, as well as host and distribute any internally developed R and Python packages you've created.
So for today's session, we are going to start within Posit Workbench, and we will leverage the RStudio IDE to create a Shiny application in R using clinical trial data. And then we'll publish this application to Posit Connect, all within the context of a validated environment. Now just for awareness, the clinical trial data that we'll use today is publicly available. It's been properly de-identified and cleansed for demonstration purposes.
About the clinical trial dataset
So before we actually dive into this workflow, let's take a few moments just to talk more about this clinical trial data set. Now what is it? Where does it come from?
Now, as a high-level overview, clinical trials are usually conducted to assess the efficacy of some disease intervention. During the trial, data will be collected and organized, and eventually that data will find its way into an analysis-ready dataset called an ADaM, or Analysis Data Model, dataset. These ADaM datasets are usually what's submitted to various regulatory agencies, such as the Food and Drug Administration, or FDA.
So here's a quick glimpse into what this example ADaM dataset looks like. We are only going to focus on the columns that you see here, which include things like subject ID number, treatment group, age, BMI, height, weight, and years of education. These subject-level variables are what you'd typically find in an ADSL, the ADaM subject-level analysis dataset. While this data is not particularly large, it does contain numerous descriptive columns, and it can be a challenge to explore them all with static outputs such as plots and tables. For this reason, we will create an interactive web application using this dataset.
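For readers following along in code, here is a minimal sketch of narrowing the data to these subject-level columns. The column names (USUBJID, TRT01A, AGE, BMIBL, HEIGHTBL, WEIGHTBL, EDUCLVL) are an assumption based on the CDISC pilot ADSL, and `select_adsl_columns()` is a hypothetical helper, not part of the demo; check `names()` on your own file before relying on them.

```r
# Assumed ADSL column names, based on the CDISC pilot ADSL; verify with names(adsl).
adsl_columns <- c(
  subject_id      = "USUBJID",  # unique subject identifier
  treatment       = "TRT01A",   # actual treatment arm (TRT01P is the planned arm)
  age             = "AGE",
  bmi             = "BMIBL",    # baseline BMI
  height          = "HEIGHTBL", # baseline height
  weight          = "WEIGHTBL", # baseline weight
  education_years = "EDUCLVL"   # years of education
)

# Hypothetical helper: keep only these columns, stopping if any are missing.
select_adsl_columns <- function(adsl, columns = adsl_columns) {
  absent <- setdiff(columns, names(adsl))
  if (length(absent) > 0) {
    stop("Missing ADSL columns: ", paste(absent, collapse = ", "))
  }
  adsl[, unname(columns), drop = FALSE]
}
```

Stopping on a missing column, rather than silently dropping it, is a deliberate choice here: a variable that quietly disappears is easy to overlook in an analysis that has to be reviewed.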
Setting up the RStudio project
So to start, let's open an RStudio session within Posit Workbench. Here we are in Posit Workbench, and on the right-hand side you can see some of the projects I've been working on, but we'll go ahead and click New Session. Right off the bat, you can see that I have a choice between Jupyter Notebooks, JupyterLab, RStudio Pro, and VS Code. For today, we're going to select RStudio.
And now, before I start this session, I want to briefly describe our demo computing environment here at Posit. We host Posit Workbench within a clustered environment, which is managed by a tool called Kubernetes, as you can see here. For validated compute environments, it's often necessary to control certain aspects of this RStudio session, including which R version is used and which R or system packages come pre-installed. That's where tools like Docker come into play. Here at the very bottom, you can see that I have a choice of various Docker images to launch this session in, and these can be customized to your team's needs. For this demo, I'm just going to stick with the default image right here, and we'll go ahead and click Start Session.
So while this is booting up, and before we do any analyses, it's really important that we establish a new RStudio project for this clinical trial analysis. This will provide us with a working directory for this project and only this project, which helps ensure that the analyses we do here won't impact anything else on our system or any other project. This is generally considered good practice not just for clinical trial data, but for really all data science projects.
So to create a project, we have a few different options. We can go to File at the top of my screen and select New Project. There's also a button right underneath the Edit menu down here. Or, if you go to the right-hand side of your screen, you can see Project: (None), which is actually a good way of knowing that I'm not currently within a project. So let's go ahead and create one. I'm going to click on this button and select New Project.
So this is going to be a brand new project. So we're going to go ahead and create a new directory. But I do have a choice to create a project within an existing directory or pull in a project from Version Control. But we'll stick with New Directory. Select New Project, the first option. And the directory name, I'll just say testADSLAnalysis. I'll place this directory, this project, under my PositProjects directory. And then you'll see two boxes right here, Create Git Repository and Use renv with this project. So I'm going to check those boxes and I'm going to hit Create Project.
So while this is booting up, a quick note on those two boxes that we just checked. So to ensure reproducibility for your project, it's always a good idea to use Version Control and to initialize the project with something known as renv. Now once we finish our workflow today, we'll actually go back and discuss renv in more detail and how it can be used to record our project's package environment.
Installing packages from a validated repository
Alright, so here we are within our new project, and you can see its name in the top right corner. The first step in our workflow is to upload that ADaM ADSL dataset, which can be found right here on the CDISC website or the GitHub page. This is the dataset right here, ADSL.xpt. I've downloaded this file to my local machine, and to upload it to Posit Workbench, all we have to do is select this Upload button, choose the file, select ADSL.xpt, and hit Open and OK. Now we can see the file right here at the bottom in our Files pane.
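Once the file is in the project, it can be read with the haven package, which is installed later in this walkthrough. A minimal sketch, assuming the file is named ADSL.xpt and sits in the project root; `read_adsl()` is a hypothetical helper, and file names are case-sensitive on the Linux hosts that run Workbench sessions.

```r
# Hypothetical helper: read the uploaded ADSL transport file into a data frame.
read_adsl <- function(path = "ADSL.xpt") {
  if (!file.exists(path)) {
    stop("Cannot find ", path, "; upload it to the project directory first.")
  }
  # haven::read_xpt() reads SAS transport (XPORT) files, the format used for ADaM submissions
  haven::read_xpt(path)
}

# Interactive check: list the column names once the file is present
if (interactive() && file.exists("ADSL.xpt")) {
  adsl <- read_adsl()
  print(names(adsl))
}
```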
So now that we have the data in our environment, we need to install some packages, some R packages that will be used for the creation of our Shiny application. Now in a validated environment, developers are often limited to a select number of packages that their team has approved for clinical data analyses. So the question really becomes, how do I make sure I use packages from a validated environment? How do I use a validated repository? Well, enter Posit Package Manager.
So here in our demo environment of Posit Package Manager, you can see I have a few customized R repositories over here on the left-hand side of your screen, including right down here a validated repository, which includes packages that we have classified as validated for clinical work. Now this is just an example repository to demonstrate that it's possible to create customized R repositories depending on your team's needs.
Alright, so we can start exploring these various R packages, but how do we actually install them from this repository into our RStudio session on Posit Workbench? So to do this, we're going to select the Setup tab at the top of your screen, and I'm going to scroll down to the bottom right here, and we see Using Packages Inside the RStudio IDE. And you have the choice. You can change the repository URL using the global options, or you can set it more programmatically by running the code right here at the very bottom.
So I'm going to copy this code, come back into Posit Workbench, and paste it into the .Rprofile file that you see right here. I'll click on it and add a few new lines. I'll write a comment to myself saying Validated Repo, and then paste the code I just copied, which points to that validated repository on Posit Package Manager. The .Rprofile file runs whenever a new R session starts, so I'm going to save this file and restart R by clicking Session, Restart R.
So if I close out of this .Rprofile and come into my console, I can check my active repository by running options("repos") and hitting Enter, and you can see that my validated repository is ready to install packages from.
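The .Rprofile addition might look roughly like the following. The URL is a placeholder rather than the demo's real address; the Setup tab generates the exact line for your server, and the name CRAN in the snippet is only the label R gives that repository. Because the project was created with renv, the file already contains renv's activation line, which should stay in place.

```r
# Project .Rprofile (sketch). renv already added a line like:
#   source("renv/activate.R")
# Keep that line and add the repository setting below it.

# Validated Repo: placeholder URL, copy the real line from Package Manager's Setup tab
options(repos = c(CRAN = "https://packagemanager.example.com/validated/latest"))

# After Session > Restart R, confirm which repository is active:
getOption("repos")
```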
All right, with that in place, we need to install the packages our Shiny application depends on before we bring in the application itself. I've copied an install script, which I'm just going to paste into my console. We're going to install the shiny package, obviously, to create our Shiny application. The haven package is going to be used to read in this ADSL.xpt dataset. bslib is used to customize the user interface of our Shiny application. And then ggplot2, scales, and plotly will all be used to create the box plot in our Shiny application. I'll go ahead and hit Enter to install all these packages from that validated repository.
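A sketch of the same step that installs only the packages not already present. It assumes the repository option from .Rprofile is active, since `install.packages()` defaults to `getOption("repos")`; `app_pkgs` and `missing_pkgs` are illustrative names, not from the demo script.

```r
# Packages named in the demo for the Shiny app
app_pkgs <- c("shiny", "haven", "bslib", "ggplot2", "scales", "plotly")

# system.file(package = ) returns "" when a package is not installed
installed_paths <- vapply(app_pkgs, function(p) system.file(package = p), character(1))
missing_pkgs <- app_pkgs[!nzchar(installed_paths)]

# Install only what is missing; the repos option set in .Rprofile decides the source
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```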
Building and running the Shiny application
All right, and there we go. Now the stage is set, so let's create a Shiny application to explore this ADaM dataset. To do this, I'm first going to create a blank R script: right here in the top left corner, I have this little drop-down menu, and I'm going to select R Script. Then I'm just going to paste in the code for the Shiny application. We're not going to dive too much into its contents, but I do want to demonstrate its functionality. Before we do that, let's first save it: I'll hit this little floppy disk symbol and save it as app.R.
And you can see right here, I can run this application within the RStudio IDE running on Posit Workbench. So I'll select run app, and I'll open up my viewer pane down here.
This is a simple application that explores the subject-level variables in this study and compares them between the various treatment arms. Here we have those that were given a placebo. In the middle, we have those that were given a low dose of a drug known as Xanomeline, and then another treatment group that was given a high dose of Xanomeline. On the y-axis, we have the various subject-level variables that we can explore, such as age, baseline BMI, baseline height, baseline weight, and years of education.
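The demo app's code lives in the linked GitHub repository and is not reproduced in the transcript. As a rough, hypothetical illustration of the same idea (a variable picker and a plotly box plot per treatment arm), here is a minimal sketch. `build_app()`, `adsl_vars`, and the ADSL column names are assumptions and are not taken from the original app; `bslib::page_sidebar()` needs a reasonably recent bslib.

```r
# Hypothetical sketch of an ADSL explorer, not the app from the demo repository.
# Assumes CDISC pilot column names; TRT01A holds the treatment arm.
adsl_vars <- c(
  "Age" = "AGE",
  "Baseline BMI" = "BMIBL",
  "Baseline height" = "HEIGHTBL",
  "Baseline weight" = "WEIGHTBL",
  "Years of education" = "EDUCLVL"
)

build_app <- function(adsl) {
  # bslib layout: a sidebar holding the variable picker next to the plot
  ui <- bslib::page_sidebar(
    title = "ADSL explorer",
    sidebar = bslib::sidebar(
      shiny::selectInput("yvar", "Subject-level variable", choices = adsl_vars)
    ),
    plotly::plotlyOutput("boxplot")
  )

  server <- function(input, output, session) {
    output$boxplot <- plotly::renderPlotly({
      y_label <- names(adsl_vars)[match(input$yvar, adsl_vars)]
      p <- ggplot2::ggplot(
        adsl,
        ggplot2::aes(x = .data[["TRT01A"]], y = .data[[input$yvar]])
      ) +
        ggplot2::geom_boxplot() +
        ggplot2::scale_y_continuous(labels = scales::label_comma()) +
        ggplot2::labs(x = "Treatment arm", y = y_label)
      # Convert the static ggplot into an interactive plotly widget
      plotly::ggplotly(p)
    })
  }

  shiny::shinyApp(ui, server)
}

# In app.R, make the app object the last expression, for example:
# build_app(haven::read_xpt("ADSL.xpt"))
```

Shiny expects a single-file app.R to end with the app object, hence the commented last line. Taking the data as an argument keeps loading separate from the UI, so the app can be tried on a small data frame before pointing it at the real file.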
Publishing to Posit Connect
So now that we've built the Shiny application within our validated environment, it's now time to publish it to Posit Connect so that we can easily share it with the people that need to see it. So for instructions on how to publish a Shiny application from within the RStudio IDE, I'm actually going to enlist the help of one of our jumpstart examples on the Posit Connect homepage.
So let me switch over to Posit Connect and on our homepage right here, I can select publish and I can go to jumpstart examples. And you can see right here, I have a Portfolio Dashboard Shiny application. So we're not actually going to use this specific app within this example, but we will use the instructions for how to publish to Posit Connect, which starts on step five. So we'll select get started using Portfolio Dashboard and then I'll skip forward to step five, which is publish.
It tells us that we need to push the Publish button, this little blue button, from within the RStudio IDE. If you're publishing for the very first time, you'll also need the URL for your Posit Connect server, so I'll copy that to have it on my clipboard. All right, let's go ahead and do this. We'll come back to Posit Workbench, where I have my Shiny application running, and I'll go ahead and stop it. Here's our Shiny application, the app.R script, and right at the top we see that blue publishing button. I'll go ahead and click on it.
You can see I'm publishing this for the very first time, so I'll select Next, and I'm going to publish to Posit Connect. Here's where you want to paste in the address for your Connect server, which for us is actually auto-populated. I'll select Next, and a small pop-up window appears asking me to connect. That's what I want to do, so I'll select Connect. It reports a successfully activated token, so I can close out of this and connect my account. Now you can see the Posit Connect instance over here in the top right quadrant.
We can give our Shiny application a title; I'll just leave the default right here. This is also where you choose what goes into the deployment bundle. We really only need the ADSL dataset and the code for the Shiny application, app.R, which you can see over here. Then we'll hit Publish.
Once we do this, a Deploy tab opens up, and RStudio pretty much takes care of the rest for us. What it's doing right now is capturing my environment: which R version I'm using, which packages I'm using, the versions of those packages, and some other information about my environment. It then replicates that environment on the Posit Connect server, and once the environment has been replicated, it deploys my Shiny application, which has now opened up right here.
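The Publish button is built on the rsconnect package, so the same steps can also be scripted. A minimal sketch, assuming rsconnect is installed; the server URL, the server nickname "connect", and the title are placeholders. The calls are wrapped in `interactive()` because `connectUser()` completes token activation in the browser.

```r
# Sketch of the publish dialog's steps with rsconnect (placeholders throughout).
connect_url  <- "https://connect.example.com"   # placeholder Connect server address
bundle_files <- c("app.R", "ADSL.xpt")          # the two files chosen for the bundle

if (interactive()) {
  # One-time setup: register the server and activate a token for this user
  rsconnect::addServer(connect_url, name = "connect")
  rsconnect::connectUser(server = "connect")

  # Bundle the listed files, capture the R and package versions, and deploy
  rsconnect::deployApp(
    appDir   = ".",
    appFiles = bundle_files,
    appTitle = "ADSL explorer",                 # hypothetical title
    server   = "connect"
  )
}
```

Listing `appFiles` explicitly mirrors the checkbox selection in the dialog and keeps unrelated project files out of the bundle.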
All right, so here's the running Shiny application. I can interact with it just like when it was running within Posit Workbench, but now it's hosted on Posit Connect. And again, the main purpose of Posit Connect is to make it super easy to share this content with the people that need to see it.
Here within the Access tab, as a publisher, I have control over who has access, and I can change the sharing settings. Currently, it's set to specific users or groups, and I'm the only one that can view or edit this application. If I want to be very specific, I can share it with, say, Rachel, and then we would be the only two that can view this content.
I have a few other options as well. I can select All users - login required, which means that anyone with credentials to log into the Posit Connect server can view this content. Or I can select Anyone - no login required, which means that anyone who has the URL at the top of my screen, which you can customize down here, can view this content. That pretty much opens it up to the world, and the people viewing it don't have to know anything about Shiny, R, or Connect. They can just view it like they would any other website.
Recording the R environment with renv
Okay, so to wrap things up, let's go back to Posit Workbench. We're going to take a moment to record our R environment so that anyone else, or more likely our future self, could reproduce our work. Some of the things we'll want to record include which R version I used, which R packages I used and their versions, and which repositories I used to install those packages. As I mentioned earlier, we're going to use an R package called renv to help out here.
renv is an open source R package, and its name is shorthand for reproducible environments. We're going to use renv to record our R environment in what's known as an renv lockfile, renv.lock, which you can see right here. I'll go ahead and click on it.
This is the current state of our lockfile, which was created when we first initialized this RStudio project, and we can read it from top to bottom. You can see we're using R version 4.2 for this project. Here are the active repositories from when I first initialized this project, which we'll obviously need to update to reflect our validated repository. Everything below this, starting on line 11 right here, is information about the packages in this project, including which version of each I need. As you can see, we are only seeing one package here, the renv package, so we'll need to update this lockfile to reflect the current state of our package environment.
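Because renv.lock is a plain JSON file, its contents can also be inspected from code. The sketch below is a hypothetical helper that assumes the jsonlite package is available. It pulls out the R version, the recorded repository URLs, and each package's pinned version, which is a quick way to confirm that the validated repository ended up in the lockfile.

```r
# Hypothetical helper: summarise an renv lockfile (plain JSON) with jsonlite.
summarise_lockfile <- function(path = "renv.lock") {
  lock <- jsonlite::fromJSON(path, simplifyVector = FALSE)
  pkgs <- lock$Packages
  list(
    r_version = lock$R$Version,
    # URLs of the repositories recorded at snapshot time
    repos = vapply(lock$R$Repositories, function(r) r$URL, character(1)),
    # One row per recorded package with its pinned version
    packages = data.frame(
      package = vapply(pkgs, function(p) p$Package, character(1)),
      version = vapply(pkgs, function(p) p$Version, character(1)),
      row.names = NULL
    )
  )
}

# Example, run from the project root:
# summarise_lockfile()$repos
```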
So I'm going to open up my console down here and clear my screen, and I'm going to run the snapshot() function from the renv package, renv::snapshot(). We'll just ignore that message for right now, and we're going to see some text right here.
This is one of my favorite parts about the renv package: it'll actually tell us what will happen if we decide to proceed. It's letting us know that all these packages that are not currently in our lockfile, as marked by this asterisk, will be written to the lockfile with the versions reflected in my current R environment. Then it asks whether we want to proceed. This all looks pretty good to me, so I'll type Y for yes and hit Enter.
Now if we open up our lockfile, you can see there are a lot more packages in here, and I didn't have to edit it myself at all. It now reflects my current R environment, with all of its packages at the correct versions. This file can be shared with other users, or used as a companion file for an FDA submission, for example, and it can be used to reproduce your R environment and your analysis results.
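The renv steps from this section, plus the step a collaborator would run to rebuild the environment, look roughly like this. It assumes the project was initialized with renv, as it was when the project was created; the calls are guarded so they only run in an interactive session, and `lockfile` is an illustrative variable name.

```r
# Sketch of the renv workflow around the lockfile (assumes an renv-initialized project).
lockfile <- "renv.lock"

if (interactive()) {
  renv::status()    # report packages used or installed but not yet recorded
  renv::snapshot()  # write the current library to renv.lock, after confirmation
}

# On another machine or in a fresh copy of the project:
# renv::restore(lockfile = lockfile)  # reinstall the recorded package versions
```

`renv::restore()` installs from the repositories recorded in the lockfile, which is why it matters that the validated repository was active when the snapshot was taken.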
And so with that, we have come to the end of our workflow for today. I hope everyone found this month's demo helpful, and we'd certainly love to chat more about how Posit Team can assist with your data science workflows and support a validated compute environment. Feel free to stick around; we'll have a few Posit folks available to answer any questions you have. Thanks, everyone, for joining, and we look forward to seeing you again next month.
Thank you so much for the demo, Ryan. And thank you, everybody, for joining us today. I see there have already been a few great questions in the chat. As Ryan mentioned, a few of us from Posit are going to stick around for another 15 minutes and answer additional questions. As a reminder, if you want to ask anything anonymously, you can use the Slido link, which we'll share in the chat again: pos.it/demo-questions. I'll keep that open for the rest of the week if you want to add anything there, and we'll add the answers to the questions there too.
While we are answering questions using only the chat today, if you'd like to talk with us live, we can set that up too. It's a little bit weird hopping from one platform to another in one session, but I'm going to share a link in the chat right now where you can sign up to go through any follow-up questions with Ryan and me live. With that, thank you again for joining us today, and we're going to jump over to the chat for Q&A. Bye, everybody.