Posit Meetup | Jake Riley, Children's Hospital of Philadelphia | Translating Facts to Insights

Transcript

This transcript was generated automatically and may contain errors.

Welcome to the RStudio Enterprise Community Meetup. I'm Rachel Dempsey. I'm calling in from Boston today. For today's meetup, we'll learn how the team at the Children's Hospital of Philadelphia is translating facts into insights and learn more about a new package for generating useful talking points. But with that, thank you all again so much for joining us. I'd love to turn it over to our speaker for today, Jake Riley. Jake is a data analyst at the Children's Hospital of Philadelphia and the author of several R packages related to data visualization and automated exploratory analysis. I'll stop my share here and turn it over to you, Jake.

So hi, everyone. It's cool to see so many people from all over the world joining the talk today. So yeah, my name is Jake Riley. I'm a data analyst at CHOP, the Children's Hospital of Philadelphia. I love doing data visualization and geospatial analysis. I've also built a couple of R packages. One is called simplecolors. That was my first package, which is just nice, simply named colors. I'll share a link at the end to that vignette. One is around working with Shiny dashboards, and then there's headliner, which is the one we're going to talk about today.

This is my first time using ggtext, and it's an awesome package, but the headline part was very fast and easy for me to incorporate.

Using that Pixar films data, so this one, I just told it to compare two conditions, and then facet these charts based on the original columns in the data set. So there was the box office domestic, box office international, the Metacritic and Rotten Tomatoes scores, and then the runtime. So these are all columns that were in my data. So what I did here is I pivoted the data to be long, because that's, I think, the easiest way to build a faceted chart like this, and then I said, there's a column in the data set around the film order, so I said, tell me when the film order is less than or equal to 10. So that's like our first films, so those are the ones I want to compare against, and the ones I want are the more recent films. So I'm comparing, yeah, how are more recent Pixar films doing in comparison to the first 10 that were released?

So I'm able to use compare_conditions() because the data has been pivoted long. When I did that, I have a column called metric, which holds these different groups, and the thing that's being aggregated is a field called value, so it's going to group by these conditions and then grab the means of those values. And then I also created a little lookup table for how I want these headlines to be presented: the ones around the box office are going to be talking about USD, so they get a format with a dollar sign, the Metacritic one is really talking about points, and the runtime is going to be talking about minutes. So I just created a little lookup table to give it, you know, the original name was bo_domestic, and I want that to say Box Office Domestic, in title case here, and then I'm giving it the template I want to use for the headline. So I compared the conditions, I left joined to this table here to get the headlines I want to use, and then I just pass both of those off to ggplot, and you get this nice chart.
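The workflow described above (pivot long, average each metric under two conditions, join a lookup table of labels and templates) can be sketched in base R. This is only an illustration of the logic, not headliner's own helpers: the `films` data frame, its numbers, and the `sprintf()` templates are invented for the example, and only two metrics are shown.

```r
# Toy stand-in for the Pixar films data; all numbers are invented.
films <- data.frame(
  order       = 1:12,
  bo_domestic = c(190, 160, 245, 255, 340, 260, 245, 205, 225, 295, 420, 190),
  run_time    = c(81, 95, 92, 92, 100, 115, 117, 111, 98, 99, 103, 107)
)

# Pivot long: one row per film per metric.
metrics <- c("bo_domestic", "run_time")
long <- do.call(rbind, lapply(metrics, function(m) {
  data.frame(order = films$order, metric = m, value = films[[m]],
             stringsAsFactors = FALSE)
}))

# The two conditions being compared: the first 10 films vs. the more recent ones.
long$group <- ifelse(long$order <= 10, "first_10", "recent")

# Mean of `value` for each metric under each condition.
means <- tapply(long$value, list(long$metric, long$group), mean)
compared <- data.frame(
  metric   = rownames(means),
  first_10 = means[, "first_10"],
  recent   = means[, "recent"],
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Lookup table: a display title and a sentence template per metric (hypothetical).
lookup <- data.frame(
  metric   = c("bo_domestic", "run_time"),
  title    = c("Box Office Domestic", "Run Time"),
  template = c("$%.0fM %s than the first 10 films",
               "%.0f minutes %s than the first 10 films"),
  stringsAsFactors = FALSE
)

# Left join, then fill each template with the size and direction of the change.
labeled <- merge(compared, lookup, by = "metric", all.x = TRUE)
delta <- labeled$recent - labeled$first_10
labeled$headline <- sprintf(labeled$template, abs(delta),
                            ifelse(delta >= 0, "more", "less"))
```

The resulting `labeled` data frame has one row per metric, with a title and a headline that could be passed on to a faceted chart.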

How CHOP uses headliner in production

So I want to talk a bit about how we use this at CHOP. This is one of my favorite uses of headliner to date. The team built this really nice PDF that's housed on RStudio Connect, and it updates every two weeks, talking about the first half of the month and then the second half of the month that just passed. When the user gets to this content on RStudio Connect, there's a front page with high-level stories about what's changed in the data, and next to the talking points it'll say, like, more on page two, more on page four. And they're using headliner to write a lot of these.

So it's saying inpatient test positivity increased. As folks are admitted to the hospital, we're constantly testing to see everyone's COVID status. So it's saying, you know, the positivity increased in May in comparison to prior months: the weekly test positivity averaged a certain percentage since the last update in mid-May, in comparison to a certain value earlier in May, and then also the percentage that it was in March. So yeah, they're using headliner all over the place here to create these sentences, saying here's the value, here's the value in context, and then sometimes also comparing to the value in March, which was meaningful to the team.

And then inpatient census, census is just the number of beds that are filled, so the beds available and the beds that are filled. So you want to know like what's the average midnight census for the hospital, and here we're talking about what it was in May. So say the average midnight daily census for May was blank, it's a 14-point increase from April's end of month average daily census of that value, and then it's giving more context, saying this is blank above the budgeted, that's kind of like our expected daily census of whatever that value was. And then yeah, there's like more talking points here about like where some of those increases are from, and then it tells them like more on page four. So when the user goes to those pages, there's like nice charts and stuff describing or showing trends over time, and there's often more details again using Headliner.

So an example on the inpatient census page is it has that talking point that was on the front page, but then there are some cool ways that they've said the following trends increased and the following trends decreased. So they're using some of that logic, using that raw delta value, to say show me the things that increased a lot or decreased a lot, and bubble those up to the top. So what I love about this is that the format of the dashboard remains pretty consistent. The users always know, because we've been using this for I think years at this point, where to look to find data, but the data that's displayed is always bubbling up what's new since you were last here, what's most interesting. So it's not like they're skipping over a lot of content because it's the same every time.

Instead, what we're including here is the content that's most interesting, newest, you know, the thing that's got the biggest change since we last saw the report, and for that reason I think it has high engagement. So yeah, this is one way we've used it at CHOP, but it's my favorite use case, so I just wanted to share that with you all.
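The "bubble up what changed most" idea can be sketched in a few lines of base R. This is a guess at the general logic rather than the CHOP team's actual code: the `census` data frame, its unit names, and its numbers are invented, and `raw_delta` here is simply current minus previous.

```r
# Invented example data: average census per unit, previous vs. current period.
census <- data.frame(
  unit     = c("Unit A", "Unit B", "Unit C", "Unit D"),
  previous = c(20, 35, 12, 40),
  current  = c(26, 31, 12, 49),
  stringsAsFactors = FALSE
)

# Signed change for each unit.
census$raw_delta <- census$current - census$previous

# Drop units with no change, then put the largest absolute changes first.
changed <- census[census$raw_delta != 0, ]
changed <- changed[order(-abs(changed$raw_delta)), ]

# Split into the two lists shown on the page.
increased <- changed[changed$raw_delta > 0, ]
decreased <- changed[changed$raw_delta < 0, ]
```

Because the ordering is recomputed on every run, the layout of the report stays fixed while the rows at the top change with the data.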

And then the way they're doing it under the hood is they're creating a headline, and they have this month look-back function that they're using. So they say the average census for one month ago was, and then they're using some stuff to color the values, and then they're using those talking points, like this is a {delta}-point {trend}, so that's the increase or decrease, since two months ago's end-of-month average daily census of whatever the original y value was. And then they're using the plus operator to string together two headlines, so it's saying write that headline, so it's going to look like this, and then they're stringing together a second headline saying this is a certain value above the budgeted May average daily census of whatever that value is. And then, yeah, here they want it to say above and below, so they're using the trend_phrases argument to specify not increase or decrease, but above and below.
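To make the pattern concrete, here is a minimal base R sketch of building two sentences with different trend wording and joining them. It does not use headliner's API: `phrase_change()` is a hypothetical helper, the census numbers are invented, and `paste()` stands in for the plus operator mentioned above.

```r
# Hypothetical helper: size of the change plus a word for its direction.
# Ties are treated as `more` here; a real report would handle them separately.
phrase_change <- function(x, y, more = "increase", less = "decrease") {
  list(
    delta = abs(x - y),
    trend = if (x >= y) more else less
  )
}

census <- 214  # this month's average daily census (invented)
prior  <- 200  # last month's average (invented)
budget <- 205  # budgeted census (invented)

# Default wording for the month-over-month change, custom wording for the budget.
vs_prior  <- phrase_change(census, prior)
vs_budget <- phrase_change(census, budget, more = "above", less = "below")

first <- sprintf(
  "The average daily census was %.0f, a %.0f-point %s from last month's %.0f.",
  census, vs_prior$delta, vs_prior$trend, prior
)
second <- sprintf(
  "This is %.0f %s the budgeted census of %.0f.",
  vs_budget$delta, vs_budget$trend, budget
)

# String the two headlines together into one talking point.
talking_point <- paste(first, second)
```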

Next steps and roadmap

So that's kind of an overview of headliner. I do have some next steps. It is already on CRAN, so you can install it with install.packages("headliner"). I have a vignette on there; I'll share that after I stop sharing my screen here. I do know that there's an issue with NA values. Someone was trying this out live in a meetup, and the first thing they did had NA values and threw an error, so that bug is fixed, I just need to send it off to CRAN. I do want to increase language support, so right now that "an 8" versus "a 33", that a/an language is really specific to English at this point, and I would love this to be something that folks can use in other languages as well. I am looking into the option of setting default values globally, so if instead of x vs. y you always want it to say x from y, or there are ways that you always want certain things to be formatted, or you don't want it to default to increase and decrease, you always want it to be more and less, I'm looking into a way to set those values globally, kind of the way you might set a ggplot theme at the top of the script. And I've also started building out a widget to help build headlines, so you can give it two values to play around with, and then you can put your cursor in this box and click on different talking points, and it'll string them together, show you what it will look like, and give you code that you can then copy and paste into your script.

So that's the overview of Headliner, I'd love to get any feedback you have, this is my email address, there's an issues page on the GitHub if you find anything that's not working, but yeah, that's what I have for you all.

Q&A

Awesome, thank you so much, Jake. I always use the reactions down on the bottom so you can hear us clapping and see it through our emojis, but thank you for a great talk, this is awesome. But I see, Jake, there were a few questions that came through already, and one was, can you use every package you want within your work environment, or are there regulations?

What do you mean? I'm not sure what that means. I'm assuming it means, like, do you have access to CRAN, or how do you keep track of what packages you can use? Are you talking, like, at the hospital, like, are there regulations? Yeah. Okay, yeah, I think anything that's on CRAN is fair game, CRAN does a really good job of making sure that there's nothing, like, dangerous coming through, anything that's going to, like, send confidential information over the web, or I think that they do a really good job of ensuring that their content is safe. So I think, for the most part, anything on CRAN is an option, and then we have our internal, we have, like, an internal CRAN that we have where we have in-house packages that we've built. We do a lot of code review as well, so if something is kind of weird in there, hopefully someone in code review will catch it and say, I'm not sure if we should use this package. We do really try to encourage that folks use, like, our core packages as much, not our core, but, like, a set of core packages, so anything on the tidyverse is fair game, but if there's, like, a package that does something just slightly better than something in the tidyverse does, I think we would just recommend to use the tidyverse function just because a wider portion of the team is going to know how to interact with that package versus something that's, like, more niche.

I know there's another question that just came through on the chat, which was, how do you load previous datasets to compare to current datasets? So if you were to use compare_conditions() to say, yeah, there are some values in a different dataset and some in this dataset, and I want to compare those conditions, I would probably row bind them. I'd create a column telling me the source of where that data came from, row bind them, and then use compare_conditions() to find the difference. For just comparing two data frames generally, well, I don't think that's what you're asking, but the waldo package is great for comparing the differences across data frames. In terms of using the compare_conditions() function, yeah, I would probably row bind them and then figure out the difference between the two. Or you can aggregate them, so you could say headline the sum of my first data frame's column, and then the y value would be the sum of a column in a different dataset, and headline() will work with that as well.
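Both approaches from this answer can be sketched in base R. The data frames and the `visits` column are invented, and the comparison is done by hand here rather than with headliner's functions.

```r
# Two invented datasets with the same column.
previous_data <- data.frame(visits = c(10, 12, 14))
current_data  <- data.frame(visits = c(15, 17, 19))

# Approach 1: tag each dataset with its source, row bind, then compare by source.
previous_data$source <- "previous"
current_data$source  <- "current"
combined <- rbind(previous_data, current_data)

means <- tapply(combined$visits, combined$source, mean)
delta_mean <- means[["current"]] - means[["previous"]]

# Approach 2: aggregate each dataset separately and compare the two numbers.
x <- sum(current_data$visits)
y <- sum(previous_data$visits)
delta_sum <- x - y
```

In the second approach, `x` and `y` are plain numbers, which is the shape of input the answer describes passing to a headline.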

Oh, waldo, but I think that's something different. That's, like, my datasets are showing change, and I want to know what changed. Like, you're trying to find a data entry error. But waldo is really good for that. waldo::compare() is the function. Oh, I get it. Like, where's Waldo? Where's Waldo? Yeah. Okay. Where is this bug? Yeah.

So, there was another anonymous question that was, when you mentioned a front page on RStudio Connect with the headlines, what is that? It's not really a front page. It's a PDF, and so it's just the first page that the users see has these talking points. It's not really, like, a different page that the users are seeing. There's ways you could do it if you were building a dashboard. You could have, like, yeah, different pages to navigate through, but it's a PDF. It's just the first page of the PDF that we're using here. Sorry if that was confusing.

Hi. Sure. Jake, this is awesome. I wish I had this when I was working as a data analyst in a health system. I see a lot of value here, and one of the things I'm thinking about is all the, you know, scientific manuscripts and preparing manuscripts, and actually, you know, as you change one little thing about, you know, you get rid of your outliers, and then, you know, you're going to run your results, and it's going to change very slightly, but you want to kind of capture that in the narrative of a manuscript. I see a lot of value for this. Is anyone using it for that at CHOP right now or that you know of?

Not that I know of, but headliner is pretty brand new. In the past, you could only get it on my GitHub, so I wasn't really promoting it that much, but it just got on CRAN, like, two weeks ago, so I am now trying to spread it. So, yeah, hopefully someone will start using it for manuscripts at CHOP, but to my knowledge, I don't think anyone has yet. Yeah, that'd be a cool example to get out there, too. Yeah, I can see what you're saying. Yeah, you, like, make some slight changes, and in all the places you said increased, you might now even be talking about decreases or no significance or something. Yeah.

Thanks, Jake. I see someone else put a question in Slido that was, what has the reception or usage been of this? People really like it. I did get a lot of feedback. I had terrible argument names when it started, so I have worked with folks to rename it, so the stuff that's on CRAN, I feel pretty good about. We workshopped it a lot, but yeah, the reception's been really positive. I know a couple, I know at least two teams that are using it at CHOP, and I've used it myself for some of my projects, but yeah, folks seem to like it. They find it to be pretty versatile to say a wide range of sentences, so that's what I was hoping for.

That's great. A question that I had was, how has this helped kind of change the conversations that you're having with people across the hospital? Like, does giving them the headlines there, like, help spark conversation or ideas? I wish I could speak to that more. I didn't build the executive report, so I haven't been in conversation with those stakeholders. I've just been able to talk with the analysts who did it, and it sounds like their teams do really like the report that they now have, but unfortunately, I don't really know how they're using it, but I feel that it is, because it's so dynamic and, like, bubbling up the most interesting talking points and, you know, kind of keeping the report fresh, I would guess that it does have, like, high engagement and high value to its end users.

Cool. Thank you. I see one anonymous question was, what has the adoption of R or Python been within the hospital analytics team? Yeah, we love R. I think we're one of the bigger R groups for RStudio, if I'm not wrong. I mean, we have a humongous team. I think we have almost 100 people, and almost everyone's using R and RStudio Connect to host our assets, so it's been really positive. We have a lot of folks who come over from Python, and they pick up R pretty quickly. We're getting more Python people, and I know that RStudio is doing a lot of work to make sure that RStudio can be a bit language agnostic, so you can, like, write stuff in Python. You can, like, mix and match your R and Python code together with reticulate. I'd be curious to know if we start seeing more Python output from our team, but, yeah, everyone has, like, some comfort with Python, or with R, and then even our Python folks have adjusted pretty easily, but I would say it's probably, like, 90% R and 10% Python at this point.

Thank you. Another anonymous question I see that came in is, I work at a hospital, and my manager is very leery of R because of technical debt. Do you have any suggestions for convincing her of R's value? Set up a meeting with us. I mean, I'd be happy to help chat. We've done that with another hospital, just kind of a conversation around why we use R, why we find it to be valuable. So, we also use Qlik Sense, which is a point-and-click application similar to Tableau, and while Qlik Sense, I think, can be a bit of an easier learning curve for folks, especially folks who aren't, like, data analysts, because, like, our stakeholders on our teams, some of them are trained in Qlik Sense as well, it has a lower learning curve, but if you're doing something that's really repetitive over and over and over again, I feel like that adds a lot of technical debt, just doing, like, you know, I'm, like, in all these little boxes saying, like, well, the title is this, and, like, the aggregations are these, and the labels are whatever, and the tooltip has this, where it's actually, like, less technical debt to do that in R, because I can just create a function that does that, and I'm just calling that function over and over again.

So, I could see feeling, like, it's, yeah, like, now you have all this code you have to maintain, and, like, not everyone knows R, that could feel like technical debt, but I feel like our team has ramped up on R pretty easily, and, you know, we have code reviews, so that is both, you know, does the code make sense, but also is the code, like, documented well, did you name your variables well, and those things together, I think, make the code easier to maintain, so, you know, if the person's trying to do something that's, like, more complicated, and we know there's an easier function, you know, we'll help, like, incorporate that, so I think the code review helps with the technical debt, but, you know, a lot of point-and-click stuff, I think, adds up with that, and then it's version-controlled, too, so if you want to say, you know, I made this change for my stakeholder, they actually don't like it, they want what we had before, I can look in my version history and go back to that code, and, like, reinstate it. It's going to be a lot harder to do something like that with a Qlik Sense app.

So, I see the risk of, you know, having, if there's not a lot of people who know R, they could feel like more technical debt, but, again, we also use, like, a core set of packages, so, like, you know, everyone is, like, well-trained on the tidyverse, that's kind of our bread and butter, everyone's well-trained on our internal R package that does a lot of our charts, color themes, etc., and then, yeah, just, like, good variable names and documentation, I think, does a lot to reduce technical debt, and, like I said, feel free to, like, schedule some time with me, we'd be happy to talk with you or, you know, your manager to figure out what some of the barriers are.

So, there was another anonymous question that was, when you mentioned a front page on RStudio Connect with the headlines, what is that? Yeah, I don't currently see a process where we're, like, capturing that data and then, like, carrying it forward. I think in the slides that I showed, we have, like, something that's doing, like, a month back of N, so I think that number just kind of keeps looking back one month and two months and comparing it, so hopefully the numbers are staying consistent from report to report, but we aren't really, like, holding on to the value from last time and bringing it forward. We're just, yeah, like, filtering for values that were two weeks ago versus that used to be.

But you certainly could. So there are the talking points: headline() returns the headline, but compare_values() returns a list of the values, so you could be doing that. There are different ways you could do it, but one way is to use pins on RStudio Connect: figure out the talking points today, pin them, figure out the talking points the next time you run it, and compare the pins. So there are talking points like delta, delta_p, et cetera.
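The "pin them and compare next time" idea can be sketched as follows. To stay self-contained, this uses base R's `saveRDS()` and `readRDS()` on a temporary file as a stand-in for pinning on RStudio Connect. The `compare_to_last_run()` helper and the talking-point values are hypothetical, and the lists only mimic the shape of a list of talking points.

```r
# Stand-in for a pin: a file that persists between report runs.
store <- tempfile(fileext = ".rds")

# Hypothetical helper: how much has `delta` moved since the stored run?
# Returns NULL when there is no earlier run to compare against.
compare_to_last_run <- function(points, path) {
  if (!file.exists(path)) {
    return(NULL)
  }
  previous <- readRDS(path)
  points$delta - previous$delta
}

# First report run: nothing stored yet, so there is nothing to compare.
run_1 <- list(x = 110, y = 100, delta = 10)
first_comparison <- compare_to_last_run(run_1, store)
saveRDS(run_1, store)

# Next report run: compare against the stored talking points, then overwrite them.
run_2 <- list(x = 125, y = 110, delta = 15)
change_in_delta <- compare_to_last_run(run_2, store)
saveRDS(run_2, store)
```

With real pins, the two `saveRDS()`/`readRDS()` calls would be replaced by writing to and reading from a pin board.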

So, one last question that came through on Slido was, so is your team embedded within a centralized analytics team for the hospital? Yeah, yeah, we have a central team, and then we partner with different clinical groups, so there's a QI portion of the hospital, so QI is quality improvement, so maybe the emergency department will say, you know, we want to decrease length of stay or triage time or, you know, they'll have, like, some goal that they have that they think will improve patient care, patient experience, so that would be, like, a quality improvement project that they would be working on, so we have, like, teams devoted to that, and then we have analysts who are, like, partnered with, like, urology, orthopedics, radiology, etc., so they're, like, dedicated to that team, but they sit within our central group, so the managers are all managers and analysts, and then, yeah, there's, like, someone closely on that clinical team that, like, will partner with us.

Were there any major challenges in creating the hospital's internal package? I think just time, like, giving dedicated time for people to work on it, I think that the, we're very busy, we've spent a lot of time, like, getting the hospital invested in having an analytics team, so we've just been growing, growing, growing, growing, so there's, like, even though we're a large team, there's still just so much work for us to do, so I think the biggest barrier has just been dedicated time to work on our package. I think that the team has recognized the value that it brings, but it's been hard to, like, carve out time to work on it, so, because it's, like, well, I really want to work on that, but I have all this other stuff on my plate, like, I kind of feel guilty working on it, so just really, like, working with leadership to say, like, no, no, no, it's, like, really worth everyone's best interest, or it's, like, for everyone's best interest if we, like, make all the time for this, because it's going to save so much work down the road for people.

Also, like, we had to learn how to do package development, you know, that was, like, new for, I think, all of us, but that's, like, where I learned how to build packages. headliner would have never happened if I didn't work with our central group, so there was already an R package that had been built when I joined, and then I just started, like, working with them, and I learned a lot there, but it's a little bit of a barrier, learning package development, but there are great resources online for that. Did you use Hadley's book, and Jenny's book, or what was it? Oh, yeah, yeah, absolutely, yeah, the r-pkgs R Packages book, that bookdown site is great.

Perfect, I just put that into the chat, too. Okay, awesome, thanks. Well, thank you so much, Jake, for sharing your experience and sharing Headliner with us, that was awesome. Yeah, thank you for letting me present. Please reach out to me, I love to hear, yeah, how folks are using things, places where folks are stuck. Thank you so much, Jake, really appreciate it. Thank you all for joining, and for all the great questions, too. Have a great rest of the day, everybody.

Featured software

rstudio

tidyverse

tidyverse.org

waldo