epoxy: Super Glue for Data-driven Reports and Shiny Apps - posit::conf(2023)

Transcript

This transcript was generated automatically and may contain errors.

Well, hi, everybody. My name is Garrick, and today I'd like to talk about a package that I wrote called Epoxy. But first, I'd like to tell you a little bit about my kids. My daughter Ruth is turning four soon, and my son Augie is about to turn seven. And six going on seven is a really fun age. It's about the time when you start to ask questions like, Dad, how much money do you make? Or recently, okay, so tell me, you sit in front of a computer all day and you type on it, and then they give you money?

And I was thinking about this because I ran into a dataset on the Tidy Tuesday project about childcare costs. Did you know that infant daycare in the United States costs up to $15,417 a year? Now, I know what you're thinking. Children are expensive. And they are. You're right. And also, as a room full of data scientists, I also have a suspicion that you're kind of thinking about this number. How did that number get there?

Like, which dataset did it come from? It came from the Department of Labor, for example. But what decisions were made in the process of grouping and summarizing and aggregating and calculating this statistic? And that's a very valid question. But as someone who has spent a lot of time thinking about how data scientists write reports, and building tooling for data scientists to write reports and build apps, when I ask how did that number get there, I'm thinking very specifically, like, how did that number get there? Right?

I'm imagining somebody sitting down at a word processor and they start typing. Infant daycare costs up to... how much does it cost? And then they maybe open up a spreadsheet and they find the cell where they did the calculation and they copy it and then they come back here and they paste it. A year. Okay, cool. That's how much it costs a year.

I like to imagine a data scientist who is working in R Markdown and they're writing this in kind of the same way. They write the sentence, infant daycare costs up to, but when they get to this point, they have some extra context available to them. In this R Markdown report, just a little bit above this sentence, there's a code chunk where they've been doing this calculation. They took childcare costs and they grouped and summarized and filtered and they came up with a number and they called that, they put it in a variable and they called that variable max median cost. Already you know something more than you did before about that number because we see the label that it was given when we saved that number to a variable.

And having this context, they can take that variable and use R Markdown's inline code syntax, right, to embed the max median cost of childcare right into the sentence with a little backtick R and then the variable and another backtick. This is the power and the promise and just what makes R Markdown wonderful, right? That you can work with your data and also talk about it, summarize it, report on it, describe it to other people here in the same place, right? It's kind of like a source document for the report, right? And at some point you hit the render button and you get this amazing sentence. Yeah.

Infant daycare costs up to 1.541696, and then that little bit at the end means, like, times 10 raised to the fourth power. This doesn't make any sense. It doesn't hit in the same way as this statistic. This problem of what happens right before the number lands on the page, of formatting things in a way that is readable and useful, is exactly the problem that epoxy is trying to solve. It gives you an inline syntax that makes it easy to use just-in-time formatting, so you can say that that thing should be formatted as a currency. And as a result, we also get to use some of knitr's powerful tools, like these chunks, to create little templates that we can reuse throughout our document.

That's a lot of money. That number hits differently than our original statistic did, right?
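As a rough illustration of the just-in-time formatting described above, here is a minimal sketch that calls epoxy() directly from R rather than from a chunk. The value is reconstructed from the rendered output quoted in the talk, the variable name max_median_cost is assumed from the spoken "max median cost", and the exact currency string depends on the package's default formatter settings.

```r
library(epoxy)

# Value reconstructed from the talk's rendered output (1.541696 x 10^4);
# the variable name is an assumption based on the spoken "max median cost".
max_median_cost <- 15416.96

# Plain glue-style interpolation inserts the number without any formatting
raw <- epoxy("Infant daycare costs up to {max_median_cost} a year.")

# The .dollar inline formatter formats the value as currency at the point of use
formatted <- epoxy("Infant daycare costs up to {.dollar max_median_cost} a year.")

print(raw)
print(formatted)
```

The formatting decision sits next to the value inside the sentence, so the stored number itself stays untouched for later calculations.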

Using data frames with Epoxy

So, so far I've been kind of pretending like these variables come from the global environment. And they could. You could be calculating each one of these and you could save them in a variable or something. But most likely you're going to put this in a data frame. You have a data frame. You've been using that to summarize. And you can see now this whole time I've actually been sneakily using a data frame. And you can see the columns that we've seen so far. Like max median cost right there in the middle. And age.

So, if I would like to have this epoxy chunk use a data frame, say a row from the data frame, perhaps the first row from the data frame, I can use the .data chunk option and give it a data frame or a row from a data frame. Here's the first one. This is the same stat that we saw so far. And if I wanted to, for example, compare infant daycare with infant home care, then I could go back and look at this table and say, oh, it's row number 8. And I can change the data argument to row number 8. And now epoxy is going to take that row and apply the values in that row of the data frame to the template. And now the chunk itself kind of looks more like a template that we can reuse.

Most people, however, don't really like, you know, having to use chunk options as a way to interface with this. And certainly you're not going to just be like, oh, I want the eighth row. That's not how we think about this. Instead, what you would do is you take your child care summary and you pipe it into dplyr's filter(). You say, I want where age is infant, where type is home care, where county size is very large. And imagine this is in an R chunk in your document. And we have that other epoxy chunk that we talked about earlier that has the label cost summary. So, I can pipe this data frame, which has one row, into epoxy_use_chunk(), tell it to use the chunk with the label cost summary, and now re-render that sentence using these new values.
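The row-based templating described above can be sketched outside a document with epoxy() and its .data argument. Everything in the table below is hypothetical: the column names follow the talk's description, the second cost is a made-up placeholder, and base R's subset() stands in for dplyr's filter(). Inside a knitted report, the talk's epoxy_use_chunk() would take the place of the final epoxy() call.

```r
library(epoxy)

# Hypothetical summary table: column names follow the talk's description,
# and the second cost is a made-up placeholder, not a real statistic.
childcare_summary <- data.frame(
  age = c("infant", "infant"),
  type = c("daycare", "home care"),
  county_size = c("very large", "very large"),
  max_median_cost = c(15416.96, 10000)
)

# One template, reused for any row that supplies these columns
cost_summary <- "For {age} {type}, costs run up to {.dollar max_median_cost} a year."

# Pick a row by position, as with the .data chunk option
first_row <- epoxy(cost_summary, .data = childcare_summary[1, ])

# Or pick it by meaning with a filter (base subset() here, an assumption
# standing in for the dplyr filter mentioned in the talk)
home_care <- subset(
  childcare_summary,
  age == "infant" & type == "home care" & county_size == "very large"
)
by_filter <- epoxy(cost_summary, .data = home_care)

print(first_row)
print(by_filter)
```

Selecting by condition rather than by row number keeps the sentence correct even if the summary table is later reordered.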

HTML, LaTeX, and script usage

Sometimes you don't write in just Markdown. So, the epoxy chunks we've been seeing so far assume that what is in that chunk is basically Markdown. But sometimes you need to actually write HTML, or you're writing an HTML report and you want the stuff coming out of these epoxy chunks to be HTML. Sometimes you might be writing in LaTeX and you need to write some LaTeX if you have to. And for both cases, epoxy gives you two more chunk types. We have an epoxy HTML chunk engine and an epoxy LaTeX chunk engine. They have slightly different semantics, just a little bit. It's just tweaked enough to make it really make sense for that language. So, in HTML, it's double embracing. And instead of dot bold, we would use at bold. What you get, though, is you get actual HTML instead of Markdown formatting for bold, for example. In LaTeX, it's basically the same except the embrace characters are kind of taken already in LaTeX. So, we use chompy alligator things, angle brackets, and we put those around county size. And we can still use dot bold. And in the end, you get whatever that is that LaTeX uses to make bold text.

Okay. All of these chunks are powered by R functions ultimately. So, under the hood, you can just as easily call these in R. So, we have the functions epoxy(), epoxy_html(), and epoxy_latex(). epoxy() itself is basically kind of like a drop-in replacement for glue with all these extra fun little bits. But because we don't have chunks in the R console or in an R script, you can also use epoxy_use_file() and read a template from a file.
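To make the delimiter difference concrete, here is a small sketch calling the HTML and LaTeX variants from a script. The county_size value and template text are illustrative only. The LaTeX delimiters are passed explicitly through .open and .close rather than relying on the package defaults, which may differ between versions, and the inline formatters are left out to keep the sketch minimal.

```r
library(epoxy)

# Illustrative value; the name follows the column mentioned in the talk
county_size <- "very large"

# HTML: double curly braces mark the expression to fill in
html_out <- epoxy_html("<p>County size: {{county_size}}</p>")

# LaTeX: single braces already have a meaning in LaTeX, so angle brackets
# are used instead. They are set explicitly here (an assumption about the
# convention, not a statement of the package defaults).
latex_out <- epoxy_latex(
  "County size: <<county_size>>",
  .open = "<<",
  .close = ">>"
)

print(html_out)
print(latex_out)
```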

Epoxy in Shiny

So, we've seen how you can use epoxy in reports: R Markdown and Quarto reports, with Markdown, HTML, and LaTeX outputs. We've seen epoxy in scripts, kind of. I mean, you get the idea, right? We can use it with these three functions. And I really can't talk at posit::conf without mentioning my favorite web framework of all time. Shiny. So, let me show you a Shiny app.

In this case, epoxy brings these same ideas to a Shiny app with a little bit of a twist. So, it's not quite the same as what we've seen, but it's kind of the same idea. So, here's a Shiny app. Don't try to read all the code, please. It makes a little app like this. We have three inputs where we get greeting, company, and year. And then we have a section that we mark with ui_epoxy_html(). We give it an ID so we can find it. And inside of that, we can just put Shiny things, UI elements, labels, pretty much anything. Here I picked p. So, it's making a paragraph tag and there's some stuff written in there. And you can see kind of the template for it. We have greeting, company, and year, right?

So, on the server side, we take render_epoxy() and match that to the ID of the template, which was hello. And then we send it some things that are going to end up being text. And they come out at the bottom. They come out in the paragraph tag over here. Hello, RStudio. Oh, hello, posit::conf.
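The app on the slide is not reproduced in the transcript, so the following is only a sketch of the pattern described: three inputs, a ui_epoxy_html() template with the ID hello, and a render_epoxy() call on the server that supplies the values. The input types, labels, default values, and the template wording are assumptions.

```r
library(shiny)
library(epoxy)

ui <- fluidPage(
  # Three inputs, named after the template fields mentioned in the talk
  textInput("greeting", "Greeting", "Hello"),
  textInput("company", "Company", "posit::conf"),
  numericInput("year", "Year", 2023),
  # The template: ordinary Shiny UI with {{ }} placeholders inside it.
  # The sentence itself is a guess at what the slide showed.
  ui_epoxy_html(
    .id = "hello",
    p("{{greeting}}, {{company}} {{year}}!")
  )
)

server <- function(input, output, session) {
  # The output ID matches the template ID; each named value fills the
  # placeholder with the same name.
  output$hello <- render_epoxy(
    greeting = input$greeting,
    company = input$company,
    year = input$year
  )
}

app <- shinyApp(ui, server)
# In an interactive session: runApp(app)
```

Because the template lives in the UI and only the values travel from the server, the surrounding markup does not need to be rebuilt each time an input changes.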

So, that's a pretty quick summary of what epoxy does. There's a whole lot more that it can do. And just to remind you what we talked about, epoxy is a great way to blend data and prose, building on the ideas of R Markdown. It gives you reusable templates and just-in-time formatting, and you can use it in your reports and your scripts and your apps. Generally, just go have fun with it. And you can find it here and online in general. And thank you very much.

Featured software

Quarto

Shiny