
Posit Meetup | Jake Riley, Children's Hospital of Philadelphia | Translating Facts to Insights
RStudio Healthcare Meetup: Translating facts into insights at Children's Hospital of Philadelphia
Led by Jake Riley, data analyst at The Children's Hospital of Philadelphia

Abstract: {headliner} is a new R package to add dynamic, insightful text to plots and reports. {headliner} generates useful talking points that users can string together using {glue} syntax. This makes it easy to write informative sentences without adding a lot of technical debt to a project. Learn how to get started with {headliner} and ways we have used it at The Children's Hospital of Philadelphia.

Speaker Bio: Jake Riley is a data analyst at The Children's Hospital of Philadelphia. He is the author of several R packages related to data visualization and automated exploratory analysis. You can find his published work [simplecolors] and [shinyobjects] on CRAN with more packages on the way.

Timestamps:
0:49 - Start of talk
1:25 - Dashboards focused on facts vs. insights
2:56 - What's a good title for a chart?
5:09 - Intro to headliner package
7:41 - Using glue() under the hood
14:04 - Helpers for working with data frames: compare_conditions()
18:41 - Using ggtext
21:27 - Example using pixar_films
23:40 - How they've used it at CHOP
28:05 - Next steps for headliner package
29:32 - Start of Q&A session

Questions:
29:32 - Can you use any package you want in your organization?
31:13 - How do you load previous datasets to compare to current datasets?
32:48 - When you mentioned a front page on RStudio Connect (with the headlines), what is that?
33:25 - Is anyone using this for manuscripts at CHOP now?
36:24 - What has the adoption of R or Python been within the hospital analytics team?
37:28 - My manager is very leery of R because of technical depth. Any suggestions for convincing her of R's value?
42:22 - How does CHOP use R for non-clinical analysis?
43:36 - How do you train new people to use R?
46:28 - How do you compare last week's analysis to this week's?
49:37 - Were there any major challenges in creating the hospital's internal package?

Resources/links shared:
Jake's LinkedIn: https://www.linkedin.com/in/jake-riley-70736a3/
headliner package: https://github.com/rjake/headliner
waldo package: https://www.tidyverse.org/blog/2020/10/waldo/
Examples of R in Life Science & Healthcare: https://www.rstudio.com/champion/life-science
Chris Bumgardner's talk on building an R-based analytic practice at Children's Wisconsin: https://youtu.be/pHZ8dsc0PhY
simplecolors package to generate hex codes using uniformly named colors: https://rjake.github.io/simplecolors/
R Packages book by Hadley Wickham & Jenny Bryan: https://r-pkgs.org/

Meetup Links:
Future events: rstd.io/community-events-calendar
If anyone's interested in speaking at a future meetup, we’d love to hear from you too! rstd.io/meetup-speaker-form
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Welcome to the RStudio Enterprise Community Meetup. I'm Rachel Dempsey. I'm calling in from Boston today. For today's meetup, we'll learn how the team at the Children's Hospital of Philadelphia is translating facts into insights and learn more about a new package for generating useful talking points. But with that, thank you all again so much for joining us. I'd love to turn it over to our speaker for today, Jake Riley. Jake is a data analyst at the Children's Hospital of Philadelphia and the author of several R packages related to data visualization and automated exploratory analysis. I'll stop my share here and turn it over to you, Jake.
So hi, everyone. It's cool to see so many people from all over the world joining the talk today. So yeah, my name is Jake Riley. I'm a data analyst at CHOP, the Children's Hospital of Philadelphia. I love doing data visualization and geospatial analysis. I've also built a couple of R packages. One is called simplecolors. That was my first package, which is just simple, nicely named colors. I'll share a link at the end to that vignette. One is around working with Shiny dashboards, and then headliner, which is the one we're going to talk about today.
Facts vs. insights in dashboards
So when I think about building dashboards, I think there's kind of a spectrum of kind of how the data can be presented. You could have a dashboard that's really more focused on facts, like the one on the left, versus one that's more focused on insights. So although these two examples have the same underlying data, I think they have a really different user experience. The one on the left is really just totals, percentages, and I would say it's really presenting facts, whereas the one on the right is sharing more context. So that number of 989 total applicants, what does it mean? Is it good? Is it bad? Is it high? Is it low? Is it expected?
On the one on the right, there's a little more context here. So it's saying there's a 10% decrease in candidates hired. And then it's telling more, there's a 20% increase in total applicants. And if that were to continue, this will decrease the revenue by 8%. So the one on the left, I think, is a little easier out of the box, but because it doesn't really provide that context, I think there's a risk of lower adoption. The one on the right, I think, has more context. I find it to be more meaningful, but I think it's much harder to produce. And for that reason, I think it has a high risk of technical debt.
If technical debt isn't a term you've heard before, think of it this way: if I had to pick this up in six months, or if I had to hand it off to somebody, how much time and energy is going to be used just trying to get up to speed or to make any modifications to it? So yeah, I think the one on the right is really interesting, but I think it's really hard to accomplish.
Choosing a chart title
So if we look at an example of this plot: there's a package called cranlogs, and you can see how often a package has been downloaded from CRAN. So I just looked at the tidyverse, and in particular, I picked the ggplot2 package and just plotted the number of downloads by month. So if we were to make a title for this chart, there's a couple of different titles we might pick. We could say something like ggplot2 downloads by month. It's straightforward, but it's also kind of static. So it's the same title every time we look at it. It's not really providing that context or that insight I was talking about.
The next one we could say is something like ggplot2 monthly downloads changed from 1.5 million, and this is looking by year, so this is all 2020 versus all 2021. So we could say it changed from 1.5 million to 2.1 million. So that's going to be a little more dynamic. I would say the insights are better, but still not, I don't quite have a lot of context. And it involves a little more lift, so I have to now kind of create some aggregations upstream to figure out what the totals are by year, and then feed that into the title.
What I think is a more interesting title is something like ggplot2 monthly downloads increased by 44%, so 2.1 million versus a prior value of 1.5 million. I would say that's a very dynamic title. I think it provides more context and I think has some good insight, but it's a much heavier lift. So that phrase of increased versus decreased, I think just gets tricky, and so I work on a team of a lot of, we have a very large team of R users with all levels of experience. So someone might be able to produce a sentence like that with a relatively short amount of code, and someone might write it in like several hundred lines of code, depending on kind of how they are trying to deal with the grammar and syntax. And so for that reason, I like that title, but I just thought it had this like risk of really producing a lot of technical debt or being hard to maintain and edit over time.
Introducing the headliner package
So headliner, that's this package I built to kind of help write those dynamic sentences. So right out of the box, the main function is headline. So we just pass headline two values. We say like compare x to y. So here I'm comparing the value of 32 to the value of 24. Just by default, it's going to spit out increase of 8, 32 versus 24. If I reverse those numbers, I'm saying tell me about 24 in comparison to 32, that number is lower. So it's going to just out of the box say it's a decrease of 8.
So what's going on under the hood is that headliner, and this particular function headline, is using the glue package. And so what I'm able to do is kind of string together different talking points to create these phrases. So under the hood of the headline function is this function called compare_values. So when I give compare_values two numbers, it's going to spit out a list object with all these different, what I call, talking points. So it's going to say the x value is 24, the y value is 32, the delta as an absolute value is 8. So I want to say increase of 8, decrease of 8. So even if the value is negative, I'm always going to display it as the positive. So delta is going to be that absolute value. Delta p is the percentage difference. So it's a 25% difference. And I've multiplied that by 100 so that it's not a fraction.
But then I also have some phrases where I'm appending an article like a or an to the value. So an 8 versus a 25. And then I also have the raw delta value, the raw delta percentage. I have articles for both of those. And then I've also brought back the sign. So negative 1, positive 1 or 0 if there's no change. And that can be really useful in this package if you want to like color a value box on a Shiny app based on if the value went up or down, you could use the sign to kind of control the color. And then there's just like the shorthand of orig_values, which is going to say like x versus y. And then there's a trend value, which in this case is bringing back the decrease. So with those talking points, you can use glue to like string them together by putting them in curly braces. So I can say curly brace trend of curly brace delta, parentheses, and then curly brace orig_values. So when headline reads that string, it's going to put together this increase of 8 with this comparison, or decrease of 8.
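The arithmetic behind those talking points can be sketched in base R. This is only an illustration for the 24 versus 32 example, not headliner's own code; the variable names and the "no change" wording are assumptions made for the sketch.

```r
# Illustrative only: re-creating a few talking points by hand for x = 24, y = 32.
x <- 24
y <- 32

raw_delta   <- x - y                          # signed difference
delta       <- abs(raw_delta)                 # always shown as a positive number
raw_delta_p <- round(raw_delta / y * 100, 1)  # signed percentage relative to y
delta_p     <- abs(raw_delta_p)               # percentage shown as a positive number
sign_value  <- sign(raw_delta)                # -1, 0, or 1

# Pick the trend word from the sign ("no change" is a placeholder for this sketch)
trend <- if (sign_value > 0) {
  "increase"
} else if (sign_value < 0) {
  "decrease"
} else {
  "no change"
}

# String the talking points together into a sentence
sentence <- sprintf("%s of %s (%s vs. %s)", trend, delta, x, y)
print(sentence)
```

The sign value is kept separate from the displayed delta so that it can drive things like the color of a value box while the text stays positive.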
How glue works under the hood
So a little bit about glue. So what glue does is it embeds R expressions in curly braces and then combines them together to create a longer string. So an example of using glue would be to say my age is 37, which it is. And then with glue, I can say I am and then in curly braces, say my age, and then it will like find my age in the environment and then bring that back in the string. It's kind of like paste on steroids. You don't have to like kind of keep separating with commas and quotes. You can just kind of like string it all together using the curly braces to plug in your variables. And I think it's a very, very good package.
So we could like try to write these headlines using just glue. So you can create temporary objects as you call glue as well. So I can say, I'm going to set up X is 24, Y is 32. And then in the curly braces, I can give it an expression to evaluate. And so this will kind of create that percentage difference. And then I can say like X versus Y. So I can write a sentence that's like pretty similar to what headline is spitting out, but headliner just simplifies those steps. So if I want to write the same type of sentence, I can just say Delta P percent, and then I can call trend. So this is lower. So it's going to be a decrease. And then that original value shorthand is going to bring back what the before and after values are.
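A small sketch of the glue behaviour described above, assuming the glue package is installed. The sentence wording and the object names are made up for illustration.

```r
library(glue)

# A variable found in the calling environment
my_age <- 37
age_sentence <- glue("I am {my_age}")

# The paste0() version needs the quotes and commas that glue lets you skip
age_pasted <- paste0("I am ", my_age)

# Temporary objects can be passed as named arguments, and any R expression
# inside the braces is evaluated before being inserted into the string
by_hand <- glue(
  "{abs(round((x - y) / y * 100))}% lower ({x} vs. {y})",
  x = 24,
  y = 32
)

print(age_sentence)
print(by_hand)
```

Writing the percentage expression by hand like this works, but it is the part that headliner's talking points are meant to replace.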
So some examples of like putting these together in different combinations. So you can pass it a vector. So I passed it 8 through 12 for x and 12 through 8 for y. So I have positive, negative, and also an example of no change. So when I pass that to the headline function, and I give it this string to evaluate, it's going to write there was a 33% decrease, there was an 18% decrease. If there's no difference, there's some control, you can like tell it what to write if there's no difference at all. By default, it says there was no difference. But you can specify what you want that to say. And then when the numbers were higher, then it's saying a 22% increase, a 50% increase.
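The vector case can be checked by hand. The base R sketch below reproduces the percentages from the example (x is 8 through 12, y is 12 through 8); it is not how headliner is implemented, and it leaves out the a/an article handling.

```r
x <- 8:12
y <- 12:8

# Signed percentage change of x relative to y
raw_delta_p <- (x - y) / y * 100
delta_p <- round(abs(raw_delta_p))
trend <- ifelse(raw_delta_p > 0, "increase", "decrease")

# Use a fallback phrase when the two values match
headlines <- ifelse(
  x == y,
  "There was no difference",
  paste0(delta_p, "% ", trend)
)
print(headlines)
```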
So increase and decrease are the default values for trend. So because they both end in an E, we can just add a D to the end. So if I just say trend in the curly braces and just add a D, I'm going to get decreased, increased. And then again, I'm using that delta P value to say by 20%. Or I could say decreased by 2, and then show like the raw delta P, so negative 20%. Or I could write a sentence like, you know, the difference, and then the curly braces and bring back the raw delta, which is negative 2. So 8 from 10 is 2 less. So difference is negative 2. And then I'm writing a phrase like today equals the original x value, the prior is the original y value. And so it kind of creates this whole sentence for me, I think, relatively easily. So there's a lot of perks of using glue under the hood.
So as I mentioned earlier, you can make objects as you are calling the function. So if I said A is 1 and B is 2, I could write a sentence calling A and B, and then inside the curly braces, add them together. You can do other stuff because headline is using glue under the hood, so you can actually use a lot of the same tricks. So maybe I want to capitalize the first letter, I could just create a little function. I mean, I could call the whole function, which is from the stringr package, str_to_sentence, that's going to capitalize the first letter. But just for shorthand, I'm just going to write cap equals, call that function. And then inside the curly braces here for the trend talking point, which is going to be that decrease, I can say capitalize that, add a D to the end. So now we have decreased by delta P, percentage. And then I want to say as of today. And then today, I am just formatting today's date. And this is like a little shortcut to build out a date as like month, day, year. So yeah, so you can build objects inside as you call it, and then string those together as well.
You can also concatenate them just using the plus operator. So I could say, yeah, decrease of one, semicolon, increase of three. You can string them together like that.
And then there's a lot of control on how this stuff comes back. So by default, it's going to say increase and decrease. But there's a trend_phrases argument that headline takes. And you can use this trend_terms function to specify what to display if x is higher than y. In this case, I want it to display higher. If it's less, then I want it to display lower. So then when I call this headline of trend by delta, I get lower by two. You can also pass it a named list. So here in trend_phrases, I'm passing a list of higher, which is going to be higher versus lower, and then more, which is going to be more and less. So then when I string this together, I can call that higher object that got created and say higher by this percentage. And then I want to say two more. So the more is getting called from the trend phrase, and the higher is getting called from this one up here. So I've tried to make it pretty robust so that you can write a lot of different types of sentences.
And then there's a similar method to deal with plurality. So if I want to say something like, we hired one person, versus we hired two people, I can create an object called people and one called employees, and it's going to look at the difference between the two values and say, if the value is one, then bring back person, otherwise bring back people. So in the end, I'm able to make phrases like we hired one new person, this employee will start Monday, June 20, or we hired two new people, these employees will start Monday, June 20. So yeah, we've got our X and Y values, these plural phrases, I'm using headline using the methods I described, and then I'm creating an object called next Monday, where I'm just figuring out what the next Monday is and formatting it nicely. And then they just all get strung together and I get these nice dynamic phrases.
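The plural logic amounts to choosing one of two phrases from the size of the difference. Here is a base R sketch of that idea; `pluralize()` and `hire_sentence()` are hypothetical helpers written for this illustration and are not headliner functions.

```r
# Hypothetical helper: return the singular wording when n is 1, otherwise the plural
pluralize <- function(n, single, multi) {
  ifelse(n == 1, single, multi)
}

# Hypothetical sentence builder using two plural phrases at once
hire_sentence <- function(n_new) {
  sprintf(
    "We hired %d new %s, %s will start Monday.",
    n_new,
    pluralize(n_new, "person", "people"),
    pluralize(n_new, "this employee", "these employees")
  )
}

print(hire_sentence(1L))
print(hire_sentence(2L))
```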
Helpers for data frames
There's also some helpers for dealing with data frames. I created one helper that's just called add_date_columns. And there's a built-in function called demo_data, which is just going to build a little data set with some groups and columns to aggregate on, but there's also a date column. So it's really just these first five columns here. So that's going to come from demo_data, but then I can use add_date_columns. And what that's going to do is, you tell it what column to look at. So add_date_columns is going to look at this demo data set, and then look at the date column and then figure out how far away that value is from today. You can tell it what date to reference, but by default, it's going to count from today. So it's going to say, this is one day ago, this is one week ago. But if I pick an earlier date, like from February, it's 120 days ago, it's 18 weeks ago, it's four months ago, one quarter ago, same calendar year, same fiscal year.
And so by using this helper function, add_date_columns, I can then use this compare_conditions function that is in headliner to say, compare when month is zero, so when it's the current month, versus when month equals negative 12. So it's going to compare the sales for this month versus 12 months ago. And it's really using the across function. So it's like filtering on a condition within the data set, and then using across to aggregate it. So you can tell it how to aggregate, like using means and standard deviations, to sum them, to count the number of values. There's a lot of different ways you can aggregate the data, but by default, it's going to grab the mean. So I just created this year over year object, and using compare_conditions, it's going to return a data frame, and it's going to say mean sales of x and mean sales of y.
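To make the offsets concrete, here is a base R sketch that computes day and month offsets from a reference date. It only illustrates the idea; the fixed reference date, the sample dates, and the column names are assumptions, and headliner's own helper returns more columns than this.

```r
# Fixed reference date so the output is reproducible (the helper defaults to today)
ref_date <- as.Date("2022-06-15")
dates <- as.Date(c("2022-06-14", "2022-06-08", "2022-02-15"))

lt_ref <- as.POSIXlt(ref_date)
lt <- as.POSIXlt(dates)

offsets <- data.frame(
  date = dates,
  # negative numbers mean "this many days ago"
  day = as.integer(dates - ref_date),
  # calendar months between each date and the reference date
  month = (lt$year - lt_ref$year) * 12 + (lt$mon - lt_ref$mon)
)
print(offsets)
```

With offsets like these in the data, a condition such as "month equals 0" or "month equals negative 12" is enough to pick out the current period and the comparison period.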
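The filter-then-aggregate step can be sketched in base R as follows. The data is made up, and the output column names are chosen for the illustration rather than taken from the package.

```r
# Made-up data: a month offset relative to today and a sales value per record
sales <- data.frame(
  month = c(0, 0, -12, -12, -3),
  sales = c(100, 102, 105, 109, 90)
)

# Condition x: the current month; condition y: the same month a year ago
mean_sales_x <- mean(sales$sales[sales$month == 0])
mean_sales_y <- mean(sales$sales[sales$month == -12])

# One-row data frame holding both aggregates side by side
year_over_year <- data.frame(mean_sales_x, mean_sales_y)
print(year_over_year)
```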
So with that data frame, I can then pass it to a function to create a headline. It's currently called headline_list, I'm probably going to rename that at some point. But if I give it a data frame with one row, it will know to return a headline from that because it's going to return a string. So if I give it that year over year data frame that I just showed, pass it to headline_list, it's going to write for us a decrease of 6, 101 versus 107. If you have many rows, which is a use case I have quite often, I want to create a headline for every row in the data set. You can use add_headline_column. So there's a built-in data set for headliner called pixar_films. And from here, I'm just going to select the name of the film, its Rotten Tomatoes score, and its Metacritic score. And I just want to compare, like, for which ones is the Rotten Tomatoes score higher than the Metacritic score? And by how much?
So I can use add_headline_column. And this time, when I call x and y, those are going to be names of columns in my data set. So I'm going to compare Rotten Tomatoes to Metacritic, and then I'm going to write a headline. So when I write the headline, this time, I have not only the talking points that I described earlier, but I can also reference columns in my data. So I can say, film, that's from my data set, that's the title of the film, and then I can use those talking points like delta, trend, etc, to string together sentences. So I've picked the trend terms higher and lower. And so I'll get some like Onward was 27 points higher, Monsters, Inc. was 17 points higher, Cars 2 was 17 points lower, just to describe the difference between the Rotten Tomatoes and the Metacritic scores.
I also can tell it to return columns as it does that. And I use this quite a bit to kind of bring back the most interesting talking points. So I am able to say, yeah, bring back the delta and the raw delta, and then arrange it in descending order by just the absolute delta value. So when I do that, the biggest difference is with the movie Onward, followed by Monsters, Inc. and then Cars 2, but I can see both the absolute value and the raw delta, showing not only that it's a big difference, but whether it's positive or negative.
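A base R sketch of the row-by-row idea: one headline per row, with the delta columns kept so the rows can be sorted by the size of the difference. The films and scores here are invented placeholders, not values from the pixar_films data.

```r
# Invented scores for three placeholder films
films <- data.frame(
  film = c("Film A", "Film B", "Film C"),
  rotten_tomatoes = c(90, 75, 60),
  metacritic = c(70, 78, 72),
  stringsAsFactors = FALSE
)

# Talking points per row
films$raw_delta <- films$rotten_tomatoes - films$metacritic
films$delta <- abs(films$raw_delta)
films$trend <- ifelse(films$raw_delta > 0, "higher", "lower")

# One headline per row, mixing a data column (film) with the talking points
films$headline <- sprintf("%s was %s points %s", films$film, films$delta, films$trend)

# Largest absolute difference first
films <- films[order(-films$delta), ]
print(films[, c("headline", "delta", "raw_delta")])
```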
Using headliner with ggplot charts
So in practice, I just want to share some ways that I use this to create charts. So I use this in conjunction with the ggtext package. So again, I use the tidyverse downloads from CRAN. And I just want to know which package had the biggest increase, what package had the lowest change overall, and what package had the biggest decrease. So I was able to use that add_headline_column, saying, yeah, look at the package, tell me the trend, show it to me as a percentage. And then I wanted it to be nicely labeled, because the numbers were quite huge. These are in the millions.
So in the scales package, there's this label_number formatter, where I can say, you know, give it accuracy, like to the tenths. So it's like 2.1, 1.5. And then there's this really nice scale_cut argument where you can tell it to abbreviate these as like millions, billions, trillions, or use K when you think of thousands. So yeah, so as it's filling that headline, I am just creating this temporary object of ln, which just stands for large number. I usually have better names when I write functions, but because the headline is like kind of all wrapped in the curly braces, having them as short as possible, I think just makes it easier to kind of follow the logic. Yeah, so I'm saying the package trended this percentage, and then in parentheses here, I'm then doing a large number format for the x value and a large number format for the y value. So I get these nice facet headers, where I'm saying increased 44.4%, this million versus that million, increased by these different percentages. And then for the inverted package, it decreased.
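For readers who want to see what a large-number formatter does, here is a base R stand-in. It is a simplified sketch written for this transcript, not the scales implementation, and the cut points (K, M, B) are assumptions.

```r
# Simplified stand-in for a "large number" formatter: one decimal place plus a suffix
ln <- function(x) {
  ifelse(
    abs(x) >= 1e9, sprintf("%.1fB", x / 1e9),
    ifelse(
      abs(x) >= 1e6, sprintf("%.1fM", x / 1e6),
      ifelse(abs(x) >= 1e3, sprintf("%.1fK", x / 1e3), sprintf("%.1f", x))
    )
  )
}

labels <- ln(c(2100000, 1500000, 45000))
print(labels)
```

Keeping the helper name short matters here because it ends up inside the curly braces of the headline template.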
Then what I did, so I added the headline column, I brought back the delta and the raw delta values, and then I just did a little logic saying case when, sorry, at the bottom. So case when the raw delta was the biggest value, that's going to be the biggest increase. When the raw delta is the smallest value, that's the biggest decrease. And then when it's the min delta overall, that's going to be the values closest to zero, that's the lowest change overall. So then I just, I created those categories, and then everything else, just the way that case when works, is going to return NAs. So once I've built those, I can say drop NA, and now it's only going to keep the records where it was either the lowest change, the biggest increase, or the biggest decrease. Yep, and then I just piped that through to ggplot, and it built this chart for me.
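The categorisation step can be sketched in base R. The package names and deltas below are made up, and plain assignment is used in place of case_when; rows that match no rule stay NA and are dropped at the end.

```r
# Made-up change values for four placeholder packages
pkgs <- data.frame(
  package = c("pkg_a", "pkg_b", "pkg_c", "pkg_d"),
  raw_delta = c(40, -25, 2, 10),
  stringsAsFactors = FALSE
)
pkgs$delta <- abs(pkgs$raw_delta)

# Start with NA so that unmatched rows can be dropped afterwards
pkgs$category <- NA_character_
pkgs$category[pkgs$raw_delta == max(pkgs$raw_delta)] <- "biggest increase"
pkgs$category[pkgs$raw_delta == min(pkgs$raw_delta)] <- "biggest decrease"
pkgs$category[pkgs$delta == min(pkgs$delta)] <- "lowest change overall"

# Keep only the rows that earned a category
highlights <- pkgs[!is.na(pkgs$category), ]
print(highlights)
```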
This chart, I'm not going to lie, it was a little bit of work to make, but it mostly was just getting the formatting correct, and the headliner part, to me, wasn't the hardest part. The hardest part was getting different colors in different parts of this, bolding certain values. This is my first time using ggtext, and it's an awesome package, but the headline part was very fast and easy for me to incorporate.
Using that Pixar films data, so this one, I just told it to compare two conditions, and then facet these charts based on the original columns in the data set. So there was the box office domestic, box office international, the Metacritic and Rotten Tomatoes scores, and then the runtime. So these are all columns that were in my data. So what I did here is I pivoted the data to be long, because that's, I think, the easiest way to build a faceted chart like this, and then I said, there's a column in the data set around the film order, so I said, tell me when the film order is less than or equal to 10. So that's like our first films, so those are the ones I want to compare against, and the ones I want are the more recent films. So I'm comparing, yeah, how are more recent Pixar films doing in comparison to the first 10 that were released?
So I'm able to use compare_conditions, the data has been pivoted long. When I did that, I have a column called metric, which are these different groups, and then the thing that's being aggregated is a field called value, so it's going to group by these conditions and then grab the means of those values. And then I also created a little lookup table for how I want these headlines to be presented, so the ones around the box office are going to be talking about USD, it's going to have a format with a dollar sign, the Metacritic is really talking about points, and the runtime is going to be talking about minutes. So I just created a little lookup table to give it, you know, the original title was bo_domestic, I want that to say box office domestic, in title case here, and then I'm giving it the template I wanted to use for the headline. So I compared the conditions, I left joined to this table here to get the headlines I want to use, and then I just passed both of those off to ggplot, and you get this nice chart.
How CHOP uses headliner in production
So I want to talk a bit about how we use this at CHOP. So this is like one of my favorite uses of headliner to date. So the team built this really nice PDF that's housed on RStudio Connect, and it updates every two weeks talking about, yeah, like the first half of the month, and then the second half of the month that just passed. And so when the user gets to this content on RStudio Connect, there's like a front page, which is really high level stories about what's changed in the data, and then next to the talking points it'll say like more on page two, more on page four. But they're using headliner to write a lot of these.
So it's saying inpatient test positivity increased, so as folks are admitted to the hospital, we're constantly testing to see everyone's COVID status. So it's saying, you know, the positivity increased in May in comparison to prior months, the average weekly test positivity averaged a certain percentage since the last update in mid-May, in comparison to a certain value earlier in May, and then also the percentage that it was in March. So yeah, so they're using headliner all over the place here to just kind of create the sentences, saying here's the value, here's the value in context, and then sometimes also comparing to the value in March, which was meaningful to the team.
And then inpatient census, census is just the number of beds that are filled, so the beds available and the beds that are filled. So you want to know like what's the average midnight census for the hospital, and here we're talking about what it was in May. So say the average midnight daily census for May was blank, it's a 14-point increase from April's end of month average daily census of that value, and then it's giving more context, saying this is blank above the budgeted, that's kind of like our expected daily census of whatever that value was. And then yeah, there's like more talking points here about like where some of those increases are from, and then it tells them like more on page four. So when the user goes to those pages, there's like nice charts and stuff describing or showing trends over time, and there's often more details again using Headliner.
So an example on the inpatient census page is it has that talking point that was on the front page, but then there's some cool ways that they've said the following trends increased and the following trends decreased. So they're using some of that logic to say yeah, show me things that kind of using that raw delta value that increased a lot, decreased a lot, and kind of bubble those up to the top. So what I love about this is that the format of the dashboard remains pretty consistent. Like the users always know, because we've been using this for I think years at this point, users always know where to look to find data, but the data that's displayed, it's always bubbling up like what's new since you were last here, what's like most interesting. So it's not like they're kind of like skipping over a lot of content because it's the same every time.
Instead, the content we're including here is stuff that's like most interesting, newest, you know, the thing that's got the biggest change since we last saw the report, and for that reason I think it has really high engagement. So yeah, this is one way we've used it at CHOP, but it's my favorite use case, so I just wanted to share that with you all.
And then the way that they're using it under the hood is they're creating a headline, so they have like this month look back function that they're using. So when they say the average census for one month ago was, and then they're using some stuff to like color the values, and then they're using those talking points, like this is a delta point trend, so that's the increase or decrease since two months ago, or two months ago's end of month average daily census of whatever the original y value was. And then they're using that plus operator to string together two headlines, so it's saying write that headline, so it's going to look like this, and then they're stringing together a second headline saying this is a certain value above the budgeted May average daily census of whatever that value is. And then, yeah, here they want it to say above and below, so they're using that trend_phrases argument to specify not increase or decrease, but above and below.
Next steps and roadmap
So that's kind of an overview of headliner. I do have some next steps. It is already on CRAN, so you can install it with install.packages("headliner"). I have a vignette on there, I'll share that after I stop sharing my screen here. I do know that there's an issue with NA values. Someone was like live, like trying this out in a meetup, and the first thing they did had NA values and threw an error, so that bug is fixed, I just need to send it off to CRAN. I do want to increase language support, so right now that an 8 versus a 33, like that a/an language is really specific to English at this point, so I would love this to be something that folks can use in other languages as well. I am looking into the option of setting default values globally, so if you always want that, you know, instead of like x versus y, you want to say x from y, or there's like ways that you always want certain things to be formatted, you don't want it to default to increase, decrease, you always want it to be more and less, so I'm looking into a way to set those values globally, kind of the way you might set a ggplot theme at the top of the script. And I've also started building out like a widget to help build headlines, so you can kind of give it two values to play around with, and then you can put your cursor in this box and click on different talking points, and it'll string them together, show you what it will look like, and give you code that you can then like copy and paste into your script.
Q&A
Awesome, thank you so much, Jake. I always use the reactions down on the bottom so you can hear us clapping and see it through our emojis, but thank you for a great talk, this is awesome. But I see, Jake, there were a few questions that came through already, and one was, can you use every package you want within your work environment, or are there regulations?
What do you mean? I'm not sure what that means. I'm assuming it means, like, do you have access to CRAN, or how do you keep track of what packages you can use? Are you talking, like, at the hospital, like, are there regulations? Yeah. Okay, yeah, I think anything that's on CRAN is fair game, CRAN does a really good job of making sure that there's nothing, like, dangerous coming through, anything that's going to, like, send confidential information over the web, or I think that they do a really good job of ensuring that their content is safe. So I think, for the most part, anything on CRAN is an option, and then we have our internal, we have, like, an internal CRAN that we have where we have in-house packages that we've built. We do a lot of code review as well, so if something is kind of weird in there, hopefully someone in code review will catch it and say, I'm not sure if we should use this package. We do really try to encourage that folks use, like, our core packages as much, not our core, but, like, a set of core packages, so anything on the tidyverse is fair game, but if there's, like, a package that does something just slightly better than something in the tidyverse does, I think we would just recommend to use the tidyverse function just because a wider portion of the team is going to know how to interact with that package versus something that's, like, more niche.
I know there's another question that just came through on the chat, which was, how do you load previous datasets to compare to current datasets? So if you were to use compare_conditions() to say, I have some values in a different dataset than this dataset and I want to compare those conditions, I would probably row-bind them. I'd create a column telling me the source of where that data came from, row-bind them, and then use compare_conditions() to find the difference. Just comparing two data frames generally, well, I don't think that's what you're asking. The waldo package is great for comparing the differences across data frames, but in terms of using the compare_conditions() function, yeah, I would probably row-bind them and then figure out the difference between the two. Or you can aggregate them yourself, so you could call headline() where the x value is the sum of a column in my first data frame and the y value is the sum of a column in a different dataset, and headline() will work with that as well.
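Both approaches from this answer can be sketched roughly as follows, assuming dplyr and headliner are installed. The two data frames, the visits column, and the source labels are invented for illustration.

```r
library(dplyr)
library(headliner)

# Invented data standing in for a previous and a current extract
last_month <- tibble(unit = c("A", "B", "C"), visits = c(120, 95, 140))
this_month <- tibble(unit = c("A", "B", "C"), visits = c(131, 90, 152))

# Option 1: label each data frame with its source, row-bind, then compare conditions
combined <- bind_rows(current = this_month, previous = last_month, .id = "source")

by_condition <- combined %>%
  compare_conditions(
    x = source == "current",
    y = source == "previous",
    .cols = visits,
    .fns = list(total = sum)
  ) %>%
  headline_list(headline = "Total visits: {trend} of {delta} ({orig_values})")

# Option 2: aggregate each data frame yourself and pass the two numbers to headline()
by_totals <- headline(sum(this_month$visits), sum(last_month$visits))

by_condition
by_totals
```

Option 1 scales better when several columns or summary functions are compared at once; option 2 is the shortest path for a single number.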
Oh, waldo, but I think that's something different. That's, like, my datasets are showing change, and I want to know what changed. Like, you're trying to find a data entry error. But waldo is really good for that. waldo::compare() is the function. Oh, I get it. Like, where's Waldo? Where's Waldo? Yeah. Okay. Where is this bug? Yeah.
So, there was another anonymous question that was, when you mentioned a front page on RStudio Connect with the headlines, what is that? It's not really a front page. It's a PDF, and so it's just the first page that the users see has these talking points. It's not really, like, a different page that the users are seeing. There's ways you could do it if you were building a dashboard. You could have, like, yeah, different pages to navigate through, but it's a PDF. It's just the first page of the PDF that we're using here. Sorry if that was confusing.
Hi. Sure. Jake, this is awesome. I wish I had this when I was working as a data analyst in a health system. I see a lot of value here, and one of the things I'm thinking about is all the, you know, scientific manuscripts and preparing manuscripts, and actually, you know, as you change one little thing about, you know, you get rid of your outliers, and then, you know, you're going to run your results, and it's going to change very slightly, but you want to kind of capture that in the narrative of a manuscript. I see a lot of value for this. Is anyone using it for that at CHOP right now or that you know of?
Not that I know of, but Headliner is pretty brand new, so I haven't been, like, in the past, you could only get it on my GitHub, so I wasn't really promoting it that much, but now that it's on, it just got on CRAN, like, two weeks ago, so I am now trying to spread it. So, yeah, hopefully someone will start using it for manuscripts at CHOP, but to my knowledge, I don't think anyone has yet. Yeah, that'd be a cool example to get out there, too. Yeah, I can see what you're saying. Yeah, you, like, make some slight changes, and all the places you said increased, you might not even be talking about decreases or no significance or something. Yeah.
Thanks, Jake. I see someone else put a question in Slido that was, what has the reception or usage been of this? People really like it. I did get a lot of feedback. I had terrible argument names when it started, so I have worked with folks to rename it, so the stuff that's on CRAN, I feel pretty good about. We workshopped it a lot, but yeah, the reception's been really positive. I know a couple, I know at least two teams that are using it at CHOP, and I've used it myself for some of my projects, but yeah, folks seem to like it. They find it to be pretty versatile to say a wide range of sentences, so that's what I was hoping for.
That's great. A question that I had was, how has this helped kind of change the conversations that you're having with people across the hospital? Like, does giving them the headlines there, like, help spark conversation or ideas? I wish I could speak to that more. I didn't build the executive report, so I haven't been in conversation with those stakeholders. I've just been able to talk with the analysts who did it, and it sounds like their teams do really like the report that they now have, but unfortunately, I don't really know how they're using it, but I feel that it is, because it's so dynamic and, like, bubbling up the most interesting talking points and, you know, kind of keeping the report fresh, I would guess that it does have, like, high engagement and high value to its end users.
Cool. Thank you. I see one anonymous question was, what has the adoption of R or Python been within the hospital analytics team? Yeah, we love R. I think we're one of the bigger R groups for RStudio, if I'm not wrong. I mean, we have a humongous team. I think we have almost 100 people, and almost everyone's using R, and RStudio Connect to host our assets, so it's been really positive. We have a lot of folks who come over from Python, and they pick up R pretty quickly. We're getting more Python people, and I know that RStudio is doing a lot of work to make sure that RStudio can be a bit language agnostic, so you can write stuff in Python. You can mix and match your R and Python code together with reticulate. I'd be curious to know if we start seeing more Python output from our team, but, yeah, everyone has some comfort with R, and even our Python folks have adjusted pretty easily. I would say it's probably, like, 90% R and 10% Python at this point.
Thank you. Another anonymous question I see that came in is, I work at a hospital, and my manager is very leery of R because of technical debt. Do you have any suggestions for convincing her of R's value? Set up a meeting with us. I mean, I'd be happy to chat. We've done that with another hospital, just kind of a conversation around why we use R, why we find it to be valuable. So, we also use Qlik Sense, which is a point-and-click application similar to Tableau. Qlik Sense, I think, has an easier learning curve for folks, especially folks who aren't data analysts, because some of the stakeholders on our teams are trained in Qlik Sense as well. But if you're doing something that's really repetitive over and over and over again, I feel like that adds a lot of technical debt, just being in all these little boxes saying, well, the title is this, and the aggregations are these, and the labels are whatever, and the tooltip has this. It is actually less technical debt to do that in R, because I can just create a function that does that, and I'm just calling that function over and over again.
So, I could see feeling like, yeah, now you have all this code you have to maintain, and not everyone knows R, and that could feel like technical debt. But I feel like our team has ramped up on R pretty easily, and we have code reviews, so that is both, you know, does the code make sense, but also is the code documented well, did you name your variables well, and those things together, I think, make the code easier to maintain. So if the person is trying to do something that's more complicated, and we know there's an easier function, we'll help incorporate that. So I think the code review helps with the technical debt, but a lot of point-and-click stuff, I think, adds to it. And then it's version-controlled, too, so if you want to say, you know, I made this change for my stakeholder, they actually don't like it, they want what we had before, I can look in my version history and go back to that code and reinstate it. It's going to be a lot harder to do something like that with a totally Qlik Sense-based workflow.
So, there was another question, which was, how do you compare last week's analysis to this week's? Yeah, I don't currently see a process where we're capturing that data and then carrying it forward. I think in the slides that I showed, we have something that's doing a look-back of n months, so I think that number just kind of keeps looking back one month and two months and comparing it. So hopefully the numbers are staying consistent from report to report, but we aren't really holding on to the value from last time and bringing it forward. We're just, yeah, filtering for values that were two weeks ago versus what they used to be.
But you certainly could. So headline() returns the headline, but compare_values() returns a list of the values, the talking points, so you could be doing that. You could use pins or something. There's different ways you could do it, but one way is to use pins on RStudio Connect: figure out the talking points today, pin them, figure out the talking points the next time you run the report, and compare the pins. There's ways you could do that, so there's talking points like delta, delta_p, et cetera.
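A minimal sketch of that pinning idea, assuming the headliner and pins packages are installed. A temporary board from board_temp() stands in for an RStudio Connect board, and the pin name and census numbers are invented for illustration.

```r
library(headliner)
library(pins)

# Temporary board used as a stand-in for a board hosted on RStudio Connect
board <- board_temp()

# Earlier run: compute the talking points and pin them
earlier <- compare_values(x = 398, y = 380)
pin_write(board, earlier, name = "census_talking_points", type = "rds")

# Later run: read the pinned talking points, then compute fresh ones
previous <- pin_read(board, "census_talking_points")
current  <- compare_values(x = 412, y = 398)

# Compare one talking point (percent change) across the two runs
run_over_run <- headline(
  x = current$delta_p,
  y = previous$delta_p,
  headline = "The month-over-month change went from {y}% to {x}%"
)
run_over_run
```

In a scheduled report, the pin_write() call would typically run at the end of each run so the next run can read it.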
So, one last question that came through on Slido was, so is your team embedded within a centralized analytics team for the hospital? Yeah, yeah, we have a central team, and then we partner with different clinical groups, so there's a QI portion of the hospital, so QI is quality improvement, so maybe the emergency department will say, you know, we want to decrease length of stay or triage time or, you know, they'll have, like, some goal that they have that they think will improve patient care, patient experience, so that would be, like, a quality improvement project that they would be working on, so we have, like, teams devoted to that, and then we have analysts who are, like, partnered with, like, urology, orthopedics, radiology, etc., so they're, like, dedicated to that team, but they sit within our central group, so the managers are all managers and analysts, and then, yeah, there's, like, someone closely on that clinical team that, like, will partner with us.
Were there any major challenges in creating the hospital's internal package? I think just time, like, giving dedicated time for people to work on it, I think that the, we're very busy, we've spent a lot of time, like, getting the hospital invested in having an analytics team, so we've just been growing, growing, growing, growing, so there's, like, even though we're a large team, there's still just so much work for us to do, so I think the biggest barrier has just been dedicated time to work on our package. I think that the team has recognized the value that it brings, but it's been hard to, like, carve out time to work on it, so, because it's, like, well, I really want to work on that, but I have all this other stuff on my plate, like, I kind of feel guilty working on it, so just really, like, working with leadership to say, like, no, no, no, it's, like, really worth everyone's best interest, or it's, like, for everyone's best interest if we, like, make all the time for this, because it's going to save so much work down the road for people.
Also, we had to learn how to do package development. That was new for, I think, all of us, but that's where I learned how to build packages. headliner would have never happened if I didn't work with our central group. There was already an R package that had been built when I joined, and then I just started working with them, and I learned a lot there. It's a little bit of a barrier, learning package development, but there's great resources online for that. Did you use Hadley's book, Hadley and Jenny's book, or what was it? Oh, yeah, yeah, absolutely, yeah, the r-pkgs.org R Packages book, that bookdown site is great.
Perfect, I just put that into the chat, too. Okay, awesome, thanks. Well, thank you so much, Jake, for sharing your experience and sharing Headliner with us, that was awesome. Yeah, thank you for letting me present. Please reach out to me, I love to hear, yeah, how folks are using things, places where folks are stuck. Thank you so much, Jake, really appreciate it. Thank you all for joining, and for all the great questions, too. Have a great rest of the day, everybody.


