Resources

epoxy: Super Glue for Data-driven Reports and Shiny Apps - posit::conf(2023)

Presented by Garrick Aden-Buie

R Markdown, Quarto, and Shiny are powerful frameworks that allow authors to create data-driven reports and apps. But truly excellent reports require a lot of work in the final steps to get numerical and stylistic formatting just right. {epoxy} is a new package that uses {glue} to give authors templating superpowers. Epoxy works in R Markdown and Quarto, in markdown, LaTeX, and HTML outputs. It also provides easy templating for Shiny apps for dynamic data-driven reporting. Beyond epoxy's features, this talk will also touch on tips and approaches for data-driven reporting that will be useful to a wide audience, from R Markdown experts to the Quarto and Shiny curious.

Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.

Talk Track: Elevating your reports. Session Code: TALK-1155

Transcript

This transcript was generated automatically and may contain errors.

Well, hi, everybody. My name is Garrick, and today I'd like to talk about a package that I wrote called Epoxy. But first, I'd like to tell you a little bit about my kids. My daughter Ruth is turning four soon, and my son Augie is about to turn seven. And six going on seven is a really fun age. It's about the time when you start to ask questions like, Dad, how much money do you make? Or recently, okay, so tell me, you sit in front of a computer all day and you type on it, and then they give you money?

And I was thinking about this because I ran into a dataset on the Tidy Tuesday project about childcare costs. Did you know that infant daycare in the United States costs up to $15,417 a year? Now, I know what you're thinking. Children are expensive. And they are. You're right. And also, as a room full of data scientists, I also have a suspicion that you're kind of thinking about this number. How did that number get there?

Like, which dataset did it come from? It came from the Department of Labor, for example. But what decisions were made in the process of grouping and summarizing and aggregating and calculating this statistic? And that's a very valid question. As someone who spent a lot of time thinking about how data scientists write reports and spending time building tooling for data scientists to write reports and build apps, when I ask how did that number get there, I'm thinking very specifically, like, how did that number get there? Right?

I'm imagining somebody sitting down at a word processor and they start typing. Infant daycare costs up to... how much does it cost? And then they maybe open up a spreadsheet and they find the cell where they did the calculation and they copy it and then they come back here and they paste it. A year. Okay, cool. That's how much it costs a year.

I like to imagine a data scientist who is working in R Markdown and they're writing this in kind of the same way. They write the sentence, infant daycare costs up to, but when they get to this point, they have some extra context available to them. In this R Markdown report, just a little bit above this sentence, there's a code chunk where they've been doing this calculation. They took childcare costs and they grouped and summarized and filtered and they came up with a number, they put it in a variable, and they called that variable max_median_cost. Already you know something more than you did before about that number because we see the label that it was given when we saved that number to a variable.

And having this context, they can take that variable and use R's inline code syntax, right, to embed the max median cost of childcare right into the sentence with a little backtick R and then the variable and another backtick. This is the power and the promise and just what makes R Markdown wonderful, right? That you can work with your data and also talk about it, summarize it, report on it, describe it to other people here in the same place, right? It's kind of like a source document for the report, right? And at some point you hit the render button and you get this amazing sentence.

Infant daycare costs up to 1.541696 and then that part, the little bit at the end, means like times 10 raised to the fourth power. This doesn't make any sense. It doesn't hit in the same way as this statistic. This problem of what happens right before, of formatting things in a way that is readable and useful, is exactly the problem that Epoxy is trying to solve. It gives you an inline syntax that makes it easy to use just-in-time formatting so you can say that that thing should be formatted as a currency. And as a result, we also get to use some of knitr's powerful tools like these chunks to create little templates that we can reuse throughout our document.
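To make the formatting problem concrete, here is a minimal sketch in plain R. It assumes the value and the variable name max_median_cost from the talk, and it uses the scales package for the currency version; it illustrates the contrast and is not necessarily how the slide was produced.

```r
# Value and variable name taken from the talk's example.
max_median_cost <- 15416.96

# Scientific notation, similar to what the unformatted inline value looked like.
unformatted <- format(max_median_cost, scientific = TRUE)
unformatted  # "1.541696e+04"

# The same number formatted as a currency with the scales package.
as_currency <- scales::label_dollar()(max_median_cost)
as_currency  # "$15,416.96"
```

The second form is the kind of just-in-time formatting the talk is asking for at the point where the number enters the sentence.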

How Epoxy is built on Glue

If this hex sticker looks familiar to you, it's because Epoxy is built on another really cool R package called Glue. Glue is really, really cool and we'll see the syntax for Glue and how Glue powers Epoxy. But Glue has a different target audience. Glue is made primarily for developers. It's very lightweight, so developers can feel comfortable bringing it into their packages. It has no dependencies, which means that it's kind of limited. It's pretty good in files that end in .R or places where you're writing R scripts.

Epoxy, on the other hand, takes the assumption that we're providing a tool for data scientists. We will build on packages that you're probably already using to do the formatting. And then you can use Epoxy in a bunch of different ways to kind of make Glue more powerful for you in R scripts and reports like R Markdown and Quarto and in Shiny apps. We're going to see a little bit of all three of those things today.

Epoxy syntax and inline formatting

Okay, let's go back to our sentence. Infant daycare costs. This is the R Markdown version. And we're going to turn this into an Epoxy version. The first step is to put the whole thing into a chunk, an Epoxy chunk. This is kind of like an R chunk, except what's inside is going to be Markdown. Once we've done that, we can now use Epoxy syntax for inline code. Instead of the backticks R space expression backtick, we wrap the expression in curly braces. You can even say we embrace the expression. And then the final little bit of flair is to say how we want this thing to be formatted. So, we add dot dollar, borrowing from some syntax that I learned from the cli package. Dot dollar says format this as a currency. When we render this, you get a nicely formatted number in the sentence.

Okay. Let's step back and see how this magic works just a little bit. When you're using Glue, Glue works great for strings. It's a little bit like paste, except it uses this embracing syntax to mark an expression. And it will go find max_median_cost in your global environment and fill it in. It doesn't do much formatting, though. But it does give us a key piece that Epoxy builds on. And that's the transformer argument. Normally, Glue uses the identity transformer. And it basically says find this variable and replace it with its value. Epoxy steps in here and uses the transformer argument to provide its own logic for what should happen when we see these embraced expressions. And that lets us do things like use the dot dollar syntax. And now we have a number formatted like a dollar.
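The transformer mechanism described here can be sketched with glue alone. The function dollar_transformer below is a hypothetical, simplified stand-in written for this example; it is not Epoxy's actual transformer. It assumes the glue and scales packages are installed.

```r
library(glue)

max_median_cost <- 15416.96

# Plain glue: finds the variable and pastes in its value as-is.
plain <- glue("Infant daycare costs up to {max_median_cost} a year.")

# Hypothetical transformer: evaluate the expression the usual way,
# then format numeric results as a currency.
dollar_transformer <- function(text, envir) {
  value <- identity_transformer(text, envir)
  if (is.numeric(value)) scales::label_dollar()(value) else value
}

formatted <- glue(
  "Infant daycare costs up to {max_median_cost} a year.",
  .transformer = dollar_transformer
)

plain
formatted
```

A transformer like this formats every numeric value the same way, which is why an inline marker such as dot dollar is useful: it lets each expression say how it wants to be formatted.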

And, of course, this is a lot of stuff that you don't want to type every time you use Epoxy. So, Epoxy wraps it all up into one function called Epoxy. And this also then powers the chunk, the Epoxy chunk.

All right. Let's see some of the other things that you can do with this inline formatting syntax. So, we saw that if you have a number that should be a currency or a dollar unit, you can use dot dollar. And we get a number formatted like a dollar. If you have a number between zero and one that should look like a percentage, you can say dot percent. And it's formatted like a percentage. A very large number that should probably have some commas to mark the thousands, you can use dot comma. Epoxy isn't just for numbers. You can also use it for text. Suppose you have... oh, wait, no, there's one more number thing. And that is an ordinal. So, this is something like saying first, second, third, fourth, et cetera. And then it's not just for numbers. It's also for text. It's pretty cool. That you can do things like take a character string and say I want this to be formatted in title case. You could take a string and say I would like it to be bold. You could take another string and say I'd like this to be italic.
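The formatters listed above might look like this when called through the epoxy() function rather than a chunk. This is a sketch under assumptions: the formatter names (.dollar, .percent, .comma, .ordinal, .titlecase, .bold, .italic) are spelled the way they are spoken in the talk, and the values are illustrative, with population made up for the example. Check the package documentation for the exact names in your installed version.

```r
library(epoxy)

# Illustrative values; population is made up for this example.
cost <- 15416.96
share <- 0.24
population <- 1234567
rank <- 1
county_size <- "very large"

# Numbers
dollars <- epoxy("Cost: {.dollar cost}")
epoxy("Share of income: {.percent share}")
epoxy("Population: {.comma population}")
epoxy("Rank: {.ordinal rank}")

# Text
epoxy("County size: {.titlecase county_size}")
epoxy("County size: {.bold county_size} or {.italic county_size}")
```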

Building reusable templates

Okay. Let's put this all together into a real sentence that actually uses all these things and makes sense. We'll throw it in a blender and we get this. First, we start with a chunk, the epoxy chunk. We give it a label. This time we're gonna give it a label of cost summary. So, remember that. We're gonna come back to that in a second. And in here, I'll just use the embrace syntax with these little dot transformers. And I can write: infant daycare in very large counties ranks first in child care costs at $15,416.96 per year or 24% of median household income.

Okay. That's pretty cool. It's nice. I like it. I like parts of it. But there's a couple little things that are bugging me still. And we're gonna fix those. The first one is that 15,400 something dollars is kind of a big number. And I would like to, you know, express this in terms of how much is that per month? What is the monthly budget of child care costs? So, to do that, you would take, you know, 15,000 and whatever and divide it by 12. You might write a little function to do that. Take x, divide it by 12. And if you give it to another function called epoxy_transform_set with a little name like .cpm, you've now created your own inline transformer that you can use. So, if I go back to my sentence, I can add a phrase like about .cpm max median cost per month, which turns into 12,800 something.

It's not right. But we'll get there. Hold on. So, you can actually format that like a dollar also by adding another level of embracing. And now we have 1,284. The previous slide should have said 1,284 as well. But just without the decoration of dollars. But, yeah. So, you can compose these. So, you can have multiple transformations as you move from the number to the thing that you actually want to show. And you know, we could do even more. And we could add bold or italic around that as well.
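The custom formatter and the nested embrace described above could be sketched as follows. This follows the talk's description: epoxy_transform_set() is given a function under the name .cpm, and a second level of braces composes it with the dollar formatter. The exact behavior may differ between epoxy versions, so treat it as an illustration and not as a reference.

```r
library(epoxy)

max_median_cost <- 15416.96

# Register a custom inline formatter, as described in the talk.
# The name .cpm (cost per month) follows the talk.
epoxy_transform_set(.cpm = function(x) x / 12)

# Monthly cost without currency formatting.
epoxy("about {.cpm max_median_cost} per month")

# Nest the expression so the monthly cost is then formatted as a currency.
monthly <- epoxy("about {.dollar {.cpm max_median_cost}} per month")
monthly
```

Here 15416.96 divided by 12 is roughly 1284.75, which matches the 1,284 figure the speaker corrects to.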

These dot dollar and dot percent and the other ones that we saw that are built into Epoxy are basically just functions as well. They come from packages that we use already and that we know, like the scales package. So, dot dollar really is just label_dollar from the scales package and dot percent is label_percent from the scales package.

So, now we get to the other little thing that's bugging me about this. It's like, do we really need to include the number of cents in our summary of per year, right? But maybe I do want a little bit more accuracy in terms of the percentage because I'm comparing between counties. And you know, the median household income. You know, I want it to sound really official. So, like we need another decimal place on that percentage. So, label_dollar and label_percent have arguments. And you can say I want dollars rounded to the nearest hundred. And with label_percent, you can say I want that rounded to the nearest 0.1. So, now we have 23.8% of median household income. And we have about $1,300 per month. That's a lot of money. That number hits differently than our original statistic did, right?
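The rounding described here maps onto the accuracy argument of the scales labellers. A minimal sketch, assuming the annual cost from the talk and taking 0.238 as the income share implied by the 23.8% figure:

```r
library(scales)

max_median_cost <- 15416.96  # annual cost from the talk
income_share <- 0.238        # implied by the talk's 23.8%

# Round dollars to the nearest hundred.
monthly <- label_dollar(accuracy = 100)(max_median_cost / 12)

# Show percentages to one decimal place.
share <- label_percent(accuracy = 0.1)(income_share)

monthly  # "$1,300"
share    # "23.8%"
```

In the talk, labellers configured like these replace the defaults behind dot dollar and dot percent, so every use of those formatters picks up the new rounding.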

Using data frames with Epoxy

So, so far I've been kind of pretending like these variables come from the global environment. And they could. You could be calculating each one of these and you could save them in a variable or something. But most likely you're going to put this in a data frame. You have a data frame. You've been using that to summarize. And you can see now this whole time I've actually been sneakily using a data frame. And you can see the columns that we've seen so far. Like max_median_cost right there in the middle. And age.

So, if I would like to have this epoxy chunk use a data frame, say a row from the data frame, perhaps the first row from the data frame, I can use the .data chunk option and give it a data frame or a row from a data frame. Here's the first one. This is the same stat that we saw so far. And if I wanted to, for example, compare infant daycare with infant home care, then I could go back and look at this table and say, oh, it's row number 8. And I can change the data argument to row number 8. And now epoxy is going to take that row and apply the values in that row of the data frame to the template. And now the chunk itself kind of looks more like a template that we can reuse.

Most people, however, don't really like, you know, having to use chunk options as a way to interface with this. And certainly you're not going to just be like, oh, I want the eighth row. That's not how we think about this. Instead, what you would do is you take your child care summary, you pipe it into dplyr's filter. You say I want where age is infant, where type is home care, where county size is very large. And imagine this is in an R chunk in your document. And we have that other epoxy chunk that we talked about earlier that has the label cost summary. So, I can pipe this data frame, which has one row, into epoxy_use_chunk, tell it to use the chunk with the label cost summary, and now re-render that sentence using these new values.
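The filter-then-template workflow can be approximated outside of a knitr document by passing a one-row data frame to epoxy() through its .data argument. This sketch uses that instead of epoxy_use_chunk, which needs a labelled chunk in a rendered report. The data frame childcare_summary, its column names, and the home care cost are hypothetical stand-ins for the talk's table; only the infant daycare figure comes from the talk.

```r
library(epoxy)
library(dplyr)

# Hypothetical stand-in for the talk's summary table.
childcare_summary <- data.frame(
  age = c("infant", "infant"),
  type = c("daycare", "home care"),
  county_size = c("very large", "very large"),
  max_median_cost = c(15416.96, 12000)  # second value is a placeholder
)

# A reusable template; column names are looked up in the data frame.
template <- "{.titlecase age} {type} in {county_size} counties costs up to {.dollar max_median_cost} per year."

# Select the row by what it means, not by its row number.
home_care <- childcare_summary |>
  filter(age == "infant", type == "home care", county_size == "very large")

sentence <- epoxy(template, .data = home_care)
sentence
```

Swapping the filter conditions is enough to produce the same sentence for a different age group, care type, or county size.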

HTML, LaTeX, and script usage

Sometimes you don't write in just Markdown. So, the epoxy chunks we've been seeing so far assume that what is in that chunk is basically Markdown. But sometimes you need to actually write HTML or you're writing an HTML report and you want the stuff coming out of these epoxy chunks to be HTML. Sometimes you might be writing in LaTeX and you need to write some LaTeX if you have to. And for those cases, epoxy gives you two chunk types. We have an epoxy HTML chunk engine and an epoxy LaTeX chunk engine. They have slightly different semantics, just a little bit. It's just tweaked enough to make it really make sense for that language. So, in HTML, it's double embracing. And instead of dot bold, we would use at bold. What you get, though, is you get actual HTML instead of Markdown formatting for bold, for example. In LaTeX, it's basically the same except the embrace characters are kind of taken already in LaTeX. So, we use chompy alligator things (angle brackets) and we put that around county size. And we can still use dot bold. And in the end, you get whatever that is that LaTeX uses to make bold text.

Okay. All of these chunks are powered by R functions ultimately. So, under the hood, you can just easily call these in R. So, we have an epoxy function, epoxy_html, epoxy_latex. Epoxy itself is basically kind of like a drop-in replacement for glue with all these extra fun little bits. But because we don't have chunks in the R console or in R scripts, you can also use epoxy_use_file and read a template from a file.

Epoxy in Shiny

So, we've seen how you can use epoxy in reports, R Markdown and Quarto reports, with Markdown, HTML, and LaTeX outputs. We've seen epoxy in scripts kind of. I mean, you get the idea, right? We can use it with these three functions. And I really can't talk at PositConf without mentioning my favorite web framework of all time. Shiny. So, let me show you a Shiny app.

In this case, epoxy brings these same ideas to a Shiny app with a little bit of a twist. So, it's not quite the same as what we've seen, but it's kind of the same idea. So, here's a Shiny app. Don't try to read all the code, please. It makes a little app like this. We have three inputs where we get greeting, company, and year. And then we have a section that we mark with ui_epoxy_html. We give it an ID so we can find it. And inside of that, we can just put Shiny things, UI elements, labels, pretty much anything. Here I picked p. So, it's making a paragraph tag and there's some stuff written in there. And you can see kind of the template for it. We have greeting, company, and year, right?

So, on the server side, we take render_epoxy and match that to the ID of the template, which was hello. And then we send it some things that are going to end up being text. And they come out at the bottom. They come out in the paragraph tag over here. Hello, RStudio. Oh, hello, PositConf.
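The app described above might look roughly like the sketch below. The input IDs (greeting, company, year) and the template ID hello follow the talk; the labels, default values, and wording of the paragraph are assumptions made for this example, not the code shown on the slide.

```r
library(shiny)
library(epoxy)

ui <- fluidPage(
  # Three inputs, as in the talk; labels and defaults are assumed.
  textInput("greeting", "Greeting", "Hello"),
  textInput("company", "Company", "Posit"),
  numericInput("year", "Year", 2023),
  # The template region: ordinary Shiny tags with double-embraced fields.
  ui_epoxy_html(
    .id = "hello",
    p("{{greeting}}, {{company}} conf {{year}}!")
  )
)

server <- function(input, output, session) {
  # Match the output to the template ID and send values for each field.
  output$hello <- render_epoxy(
    greeting = input$greeting,
    company = input$company,
    year = input$year
  )
}

app <- shinyApp(ui, server)

# Launch only in an interactive session.
if (interactive()) runApp(app)
```

Because the template lives in the UI, the server only sends values, and the paragraph updates as the inputs change.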

So, that's a pretty quick summary of what epoxy does. There's a whole lot more that it can do. And just to remind you what we talked about, epoxy is a great way to blend data and prose, building on the ideas of R Markdown. It gives you reusable templates and just-in-time formatting, and you can use it in your reports and your scripts and your apps. Generally, just go have fun with it. And you can find it here and online in general. And thank you very much.