Joe Rickert | R Consortium initiatives in medicine | Posit (2019)

Transcript

This transcript was generated automatically and may contain errors.

I work at RStudio, and I'm also RStudio's person on the Board of Directors of the R Consortium. So I'm here today in a dual capacity, and what I want to do is give you some idea of what the R Consortium is helping to make happen in medicine and pharma.

So the R Consortium, we are a non-profit organization organized under the Linux Foundation with three goals. I mean, we're here to promote the growth and development of R and the R community, to support the R Foundation, the people who develop R and take care of it, and also to enable the use of R in large-scale operations and organizations.

So what you see here are the companies that are presently members of the R Consortium. I'd particularly like to point out Genentech Roche, which is a new member that joined the consortium just this year. And I think, in part at least, their membership was inspired by some of the events that I'm going to tell you about today.

So how do we accomplish these goals? We do it by supporting projects and initiatives that are mostly community-initiated. We go out twice a year with a call for proposals, and we award grants to proposals that are accepted. So far, in the three years or so that the consortium has been in existence, we have awarded over $740,000 in grants directly to community members who've submitted proposals.

We sponsor conferences and other community initiatives like R-Ladies. And we also help to organize what we call working groups, which are generally community-initiated projects that need organization and sometimes consensus building. So working groups are the way to try to organize the resources of the R community under the sponsorship of the consortium to get something done that maybe you can't do on your own or in your own company or with a small group.

So what I'm going to be talking about today are all R Consortium working groups in process. So what's going on? Well, before I move on, let me emphasize that these initiatives that I'm talking about here are not something that was thought up by the board of directors of the R Consortium. These are really grassroots, community-started projects that have come to fruition through the enthusiasm of the R champions in various companies throughout the world, who managed to organize and get their employers and supervisors excited about what they're doing.

R/Pharma conference

So let's look at some of these. First is R/Pharma. R/Pharma started out last year, envisioned as an annual conference, a kind of complement to this huge gathering: a small, almost working-group kind of environment where people are there just to get actual work done and make progress.

The conference itself is summarized here. Reading left to right, what you should notice is that the organizing committee was made up of people from lots of different pharmaceutical companies and device companies. You can see in the middle graph there that there were even more companies attending the conference. We had keynote speakers from the FDA, so there was FDA involvement, and the word cloud over there gives you an idea of some of the things that interested them. In particular, Shiny was on the minds of almost everybody. More than half of the talks mentioned Shiny in some way or another.

So just to emphasize that, what I'd like to show you here are a couple of projects that were presented at this R/Pharma conference, and in particular, this BioWarp application from Roche. It was written up in the R Views blog. You can see it under how to build the Shiny truck. It's 100,000 lines of code, and what you should take away from here is that it is a production Shiny application that totally permeates all of the regulatory tasks that go on in order to get a drug out.

So you can see here that they're advancing the use of R and Shiny in an environment that is extremely important to how Roche operates.

Another talk that was presented at the same conference, R/Pharma, also by a Roche person, emphasized the kind of environment that they're putting together there, and it gives you some insight into the kind of nitty-gritty steps that need to get done in order to go through their workflow. In particular, they call out the creation of tables in the particular formats that have to go into regulatory submissions, a pipeline for producing those tables, and the need for a modular framework to keep that all happening in a dynamic way.

So Shiny is, you know, now an extremely important and embedded part of how the pharmaceutical industry is doing drug development.

Validated R package hub

A parallel initiative, and this is another working group, was organized at about the same time. This project was awarded last year and was initiated through PSI, a worldwide organization of statisticians working in the pharmaceutical industry. They had a special methods working group, and what they're trying to produce is a validated hub for R packages, validated in a way that's particularly important for the pharmaceutical industry.

So what does it mean? You know, validation can mean lots of things, and what they're struggling with is to come up with a concept and a workflow to help people build packages and repositories that would contain just those packages that are validated according to their kind of particular environment in their company.

So the goals, in the beginning, are to come up with an idea of validation that seems appropriate for the regulatory environment that they're in, to develop a concept of risk that might be associated with a package, and to provide standards for how to go about validating packages. Hopefully it will result in perhaps a repository that's out in the open that will contain packages that have been validated this way. But I think more importantly, companies will use it behind their firewalls to create repositories that have packages that are meaningfully validated according to their particular company standards.

So this is a big deal. You can see that these are just a sample of some of the companies that are involved, and, you know, it goes across the spectrum of the pharmaceutical industry. I'm very hopeful that this kind of thing will set a precedent for other kinds of validation in different industries. I think perhaps people working in financial organizations will find this of use. They'll have to change the details, but with any luck, we'll be able to produce something that would be easy to propagate throughout different industries, and hopefully the R Consortium could be of help in making that happen.

R/Medicine conference

Now, in a parallel effort, last year we also started an R/Medicine conference. Originally I thought, you know, why not have one conference to deal with both pharma and medicine? But the two groups were very, very adamant that they each needed their own space, and as it turns out, it would not have worked to have just one conference, because these working groups are really working at a level of detail that's specific to the particular kinds of problems that they're solving.

So R/Medicine 2018 was really focused on clinicians and clinical uses of R, and there was something that I did not expect: many of the attendees were MDs and MD-PhDs who were teaching themselves R, or in some cases R and statistics all at the same time. Here are a couple of examples of the kinds of applications that they were talking about.

This was presented by Rob Tibshirani of Stanford, and what you see is a very practical problem of predicting blood platelet usage. Platelets are a tremendously, you know, valuable resource. People give their blood for this, and the platelets have only about a five-day shelf life. Two days go to getting everything into the system, so they are only usable for three days. The graphs you see there are indicative of the kind of waste over time.

So there was a tremendous amount of enthusiasm, not only to help with the human problem of saving this valuable resource, but also to save money by not having these things go to waste. And the solution that they came up with, which is up and running right now, I believe still in parallel with the original system, was to predict how many units to order. So they changed the, you know, the nature of the question while they were working, and it turned out to be a kind of standard convex optimization problem, a linear programming problem. I think they solved it using the CVXR package in R, which is really its own kind of DSL for formulating and solving convex optimization problems. And they're hopeful that they'll be able to save a considerable amount of money.

So at the link there you can see the actual analysis, but the CVXR package itself, I think, might be of use to many of you.
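To make the shelf-life arithmetic above concrete, here is a minimal sketch in Python of a first-in, first-out inventory where each delivery is usable for three days. It is not the Stanford team's model and does not use CVXR; the demand figures, the flat order size, and the function name `simulate` are all hypothetical and chosen only for illustration.

```python
from collections import deque

USABLE_DAYS = 3  # five-day shelf life minus two days of processing, as in the talk


def simulate(demand, orders):
    """Run a first-in, first-out inventory; return (wasted_units, short_units)."""
    stock = deque()  # each entry is [units, days_left]; oldest stock is on the left
    wasted = 0
    short = 0
    for need, order in zip(demand, orders):
        stock.append([order, USABLE_DAYS])  # today's delivery arrives fresh
        # Serve today's demand from the oldest units first.
        while need > 0 and stock:
            batch = stock[0]
            used = min(need, batch[0])
            batch[0] -= used
            need -= used
            if batch[0] == 0:
                stock.popleft()
        short += need  # demand that could not be met today
        # Age the remaining stock and discard anything that has expired.
        for batch in stock:
            batch[1] -= 1
        while stock and stock[0][1] == 0:
            wasted += stock.popleft()[0]
    return wasted, short


# Hypothetical daily demand for one week.
demand = [10, 12, 8, 11, 9, 10, 13]
flat_policy = simulate(demand, [15] * len(demand))  # always order 15 units
ideal_policy = simulate(demand, demand)             # order exactly what is used
print(flat_policy, ideal_policy)
```

A fixed order that is comfortably above demand never runs short but eventually lets units expire, while a perfect forecast wastes nothing. The real problem sits between those two extremes, which is why choosing the order quantity is the question worth optimizing.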

And the next application here I think is particularly interesting for a group of people, you know, data scientists who are used to machine learning. What we have here are fast-and-frugal trees. They're a special, highly restricted kind of decision tree. One way to look at it is that it's a standard classification algorithm, but it's highly optimized for decision making. So you can see that every node of the tree has at least one branch that terminates in a leaf. So there's a decision to be made.

And this is the kind of thing that is particularly appealing to doctors who have to do a kind of very quick decision making. What they'd like to have is a tool that is sophisticated and can help them make a good decision, but is not complicated and is easy to explain and interpret. So these trees seem to be doing pretty well.
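The tree structure described above can be sketched in a few lines of Python. This is only an illustration of the idea that every node offers an immediate exit; it is not the FFTrees package or its API, and the cue names, thresholds, and decisions are hypothetical.

```python
# Each node tests one cue. If the test fires, the tree exits immediately with
# that node's decision; otherwise it falls through to the next node.
# Cue names and thresholds are hypothetical, not taken from FFTrees.
NODES = [
    ("chest_pain_score", lambda v: v >= 3, "high risk"),
    ("age", lambda v: v < 40, "low risk"),
    ("st_depression", lambda v: v > 1.0, "high risk"),
]
DEFAULT = "low risk"  # the second exit of the final node


def decide(patient):
    """Return (decision, cue_that_decided) for a dict of cue values."""
    for cue, test, decision in NODES:
        if test(patient[cue]):
            return decision, cue
    # No test fired, so the final node's other exit applies.
    return DEFAULT, NODES[-1][0]


patient = {"chest_pain_score": 1, "age": 55, "st_depression": 0.5}
print(decide(patient))
```

Because the function returns the cue that triggered the exit, the explanation for any decision is a single sentence, which is the interpretability property that makes this kind of tree attractive for quick decisions.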

And they provide an example of how, you know, almost every statistical inference problem can be conceptualized as a decision problem. Sometimes going back and forth between the two when you're doing this kind of analysis can be really fruitful. And I think what you'll see with this fast-and-frugal tree kind of approach is that you don't necessarily need the best algorithm in order to have a successful outcome. You don't have to have the thing that perfectly beats everything else on the ROC curve. What you need to do is have an algorithm that's good enough to make the right decision.

So I like it because it's at the interface of machine learning and decision making in, you know, a very fast and dynamic environment. And I think it could be helpful for all data scientists to delve deep enough into a problem to see whether it can be conceptualized as a decision problem, and not necessarily take a mechanistic approach of seeing how to grind through a solution.

So the package itself, FFTrees, is up on CRAN. And you know, I review a lot of R packages every month. I blog about that. And I particularly like this one because of the extensive documentation. I would like to offer it as an example of an R package where it seems to me that the authors put almost as much work, or maybe more work, into the documentation than into the package itself. So I encourage everybody to look at that. What I have here is a standard diagram out of their vignette. They paid a lot of attention to presenting the work and helping people think through the analysis that they're presenting. So I highly recommend that.

Looking ahead to 2019

Now, I want to say a couple of words about what we might be looking at for 2019. This comes from Joseph Chu, an MD at the neonatal intensive care facility at the Massachusetts General Hospital for Children. This is actually from an email he sent to me, where he describes what he calls his glorified chart review program, his process for chart review.

So are there physicians out here? Yes. So I'm sure then you'll relate to this kind of thing. Not only his sense of humor, but it's pretty amazing the amount of sophistication that he has put into taking data from Epic, you know, the system that is a given that they have to work with, and getting it into a place where you can manipulate it with tidy techniques and put it out there in terms of flexdashboards and reproducible reports. So I think this is the kind of thing that's setting a standard for doing reproducible medicine at a micro level, at the individual clinician level.

Well, I hope that we can learn a lot more from Joseph's participation in the organizing committee for 2019.

Another group that's doing incredibly interesting and prolific work with Shiny is the Cleveland Clinic Department of Quantitative Health Sciences. All of these are different Shiny-based risk calculators that they have up online for all of these different diseases, and they're up there for everybody. Here's an example of one with, you know, the disclaimer: don't take this as medical advice. But it's a beautiful kind of thing that anyone who's interested in a disease, and in where they might be at risk, can access. Again, it's all Shiny-based, and obviously these are production applications that are up there, you know, 24-7.

So I'm hoping, in fact, that we will be able to interview some of the people involved in this, and I'll publish it on the R Views blog when that happens.

And then a final example of analyzing glaucoma. Has anybody here ever taken a visual field test? Yes. So glaucoma is, you know, one of the leading causes of blindness. Prevalence is about 4 percent in the population of people 40 to 80 years old. So I'm in there. And it's a degenerative, progressive disease. You know, your optic nerve deteriorates.

So one of the ways they measure this is by making you take this visual field test. So you sit there and you've got your eyes, one eye at a time, focused on the dot in the center. And you've got this little clicker in your hand. And it's really nerve-wracking, because what you're supposed to do is click every time you see a flash of light. And, you know, the first thing that happens is you say, where's that flash going to come from? So the flashes of light are randomly distributed around your visual field. And the most annoying part is that they vary in intensity. So sometimes it's a sharp speck and sometimes you don't even know whether you really saw it. But what they're doing is measuring the differential light sensitivity, that is, your sensitivity to the changes in light at various parts of the visual field.

And there's something called a Humphrey Field Analyzer. What this does is map like 54 sections of the visual field, 52 if you take out where your blind spots are. And it produces a number. So you get an individual number for a test. And what you see here are time series in these little boxes, because they represent longitudinal data for a patient. And what these statisticians are doing, Berchuck at Duke and Warren at Yale, is building models to predict the progression of the disease by analyzing the changes in these visual field records, using spatial statistics in a very sophisticated way. That's because they need to measure nearness not only in terms of which of these boxes are adjacent to each other, but also in terms of how they actually go into the optic nerve, which is not the same. It's a different kind of mapping.

So they do that, and they're using these packages, womblR and this new spCP package. And they've developed models for the progression of the disease, again at an individual level. So this is the kind of thing where, if you had this disease, you might go to your physician and say, would this be useful in tracking the progression of glaucoma for me over the course of my lifetime?
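As a rough illustration of what longitudinal data per visual field location looks like, here is a minimal Python sketch that fits an ordinary least-squares slope to each location separately. This is a deliberately simple baseline and is not the spatial model described above; it ignores the neighborhood structure entirely. The visit times, readings, location names, and the cutoff of -1.0 dB per year are all hypothetical.

```python
def slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


# Hypothetical data: visit times in years, and sensitivity readings in dB
# at three visual field locations for one patient.
years = [0.0, 0.5, 1.0, 1.5, 2.0]
field = {
    "loc_01": [30.0, 30.0, 30.0, 30.0, 30.0],
    "loc_02": [28.0, 27.0, 26.0, 25.0, 24.0],
    "loc_03": [25.0, 26.0, 24.0, 25.0, 25.0],
}

# One trend per location, in dB per year.
trends = {loc: slope(years, readings) for loc, readings in field.items()}

# Flag locations losing more than a hypothetical 1 dB per year.
declining = [loc for loc, s in trends.items() if s < -1.0]
print(trends, declining)
```

Treating each location independently like this throws away exactly the information the spatial models are built to exploit, namely that neighboring locations, and locations sharing a path into the optic nerve, tend to change together.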

So you can find a series of three blog posts written on R Views describing this in some considerable detail. And of course the packages and the statistics speak for themselves. You can find them on CRAN.

Call to action

Well, that's basically what I want to say, and I want to leave you with an observation and a call to action. The observation is that all of these efforts are accomplished by both people and organizations getting involved in something bigger than themselves.

So when you think about the R Consortium, you could think about it as kind of the gateway for companies to become involved in the R community. So just as you may write packages and contribute to the community in various ways, this is a possibility for companies to organize contributions in a way that is much bigger than what an individual can mount on his or her own. So help us keep up the momentum. Get involved if you can. Your company doesn't have to be a member of the R Consortium in order to join a working group, and that's where the action happens. So that's it. Thank you.