
Using Quarto to Improve Formatting/Automate the Generation of Hundreds of Reports (Keaton Wilson)
Speaker(s): Keaton Wilson

Abstract: This presentation showcases how KS&R’s Decision Sciences and Innovation (DSI) team modernized a legacy reporting pipeline to automate and scale custom survey report generation. Using tidyverse and Quarto, the team produced hundreds of personalized PDFs weekly over three months. Hosted on GitHub, the project integrated version control and streamlined collaboration while documentation ensured easy onboarding and adaptability. Attendees will gain insights into automating report workflows, overcoming implementation challenges, integrating custom formatting, and fostering collaboration using tidyverse, Quarto, and GitHub.

Materials: https://github.com/ksrinc/posit_conf_2025_quarto_automation

posit::conf(2025). Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
I'm Keaton. I'm a solutions developer at KS&R. We're a market research company, about 40 years old, and we do database work with a variety of clients across a variety of sectors. I feel like this talk is appropriately placed because a lot of the data that we get is from surveys, so I don't have to explain survey data. But I want to talk today a little bit about some work we've done over the past year and a half to leverage Quarto and some data wrangling tools to automate reporting.
Before I do that though, I want to talk about my great grandmother's zucchini pickles. Bear with me. So I'm a bit of a foodie and a collector of recipes. I have a lot of these recipe cards that have been handed down in my family. And they provide a really nice window into history. But they also have some interesting facets about them.
So the first is that they're kind of hard to read. I don't know about you all. I don't read or write cursive on a regular basis very much anymore. And when I first looked at this recipe, I actually thought it called for a half a cup of suet instead of salt. And I think those pickles probably would have turned out a lot differently.
You can also see some self-edits here. My great grandmother is correcting her cursive or her spelling, which is pretty great and pretty interesting to see. There's also some ambiguous instructions, right? What is a fairly fine chop? That means different things to different people. There's retro or hard-to-find components, right? Pickling spice varies by region, right? The same spices in one part of the world in pickling spice might be very different from another. There's some scaling challenges, right? If I wanted to become a zucchini pickle maven and make thousands of gallons of these, this recipe might not scale well. And finally, I can't ask for clarification, right? My great grandmother passed away a number of years ago. I can't call her on the phone and ask her questions about this recipe.
Reporting pipelines as recipes
So why am I talking about zucchini pickles? I would argue that reporting and data pipelines share a lot of similarities with recipes. Often we don't have up-to-date or thorough documentation on our pipelines. If those pipelines don't contain mechanisms for version control, or are based on GUI (graphical user interface) clicking workflows, it's hard to see the history and understand those self-edits. Often we're working in orgs with outdated toolkits, and there are conversations like: why should I fix this? It isn't broken. Why are we spending time updating things? There are scaling challenges; just like scaling up to that thousand-gallon batch of pickles, it's often hard. And sometimes we can't ask for clarification, right? We have team members who leave or move on, and we can't ask them specifics about knowledge that only exists in their heads.
The recipe challenge
So I want to talk a little bit about our recipe challenge that we've been working on. So we have survey data from a variety of global respondents, folks all over the world, for this particular project. Our client wanted PDFs that represent the answers for every respondent in the survey. So one PDF per respondent. We have hundreds of respondents per wave, and that wave is repeated and this project is repeated every year, so we're doing this on a rolling basis. Individual reports when we're in that period where we're collecting data from the surveys need to be generated and passed to the client on a weekly basis.
We had a historic pipeline, but it was built in maybe a bit of a retro toolkit, right? A Microsoft mail merge, not a lot of version control, and it was hard to understand what's going on. That's compounded by the fact that we had a team member leave who had that knowledge in his head, and we can't ask him how the original pipeline was built. So it's a challenge, but it's also an opportunity to rebuild things from scratch.
At KS&R, we're often confronted with really wide SPSS data, lots of columns, so we need to think a little bit about cross-functional data structure and naming conventions, so that when we go from the raw survey data that comes out of our platform, it'll eventually play nice with Quarto and the tools that I think a lot of us in the room are familiar with.
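To give a flavor of the kind of reshaping that implies, here is a minimal sketch of going from a wide, SPSS-style export to a longer, tidier shape with tidyr; the column names and data are hypothetical, not the project's actual conventions (the real code is in the linked GitHub repo).

```r
library(tidyr)

# Hypothetical wide survey export: one row per respondent,
# one column per question
raw <- data.frame(
  respondent_id   = c("r001", "r002"),
  q1_satisfaction = c(4, 5),
  q2_recommend    = c(3, 4)
)

# Reshape to one row per respondent-question pair, which plays
# much more nicely with downstream templating
tidy <- pivot_longer(
  raw,
  cols = starts_with("q"),
  names_to = "question",
  values_to = "response"
)
```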
And finally, we really wanted to de-silo this entire pipeline, right? We don't want knowledge living in the head of one person. We wanted to build replicable infrastructure on GitHub hosted internally and thorough documentation throughout, so multiple team members can engage with the process.
How the pipeline works
Okay. So we have our recipe challenge, right? Let's talk a little bit about the how. I'm going to take a 30,000-foot view of this, mostly because of time constraints. There is a QR code throughout. It'll show up again here in a second, and it'll show up again at the end, where we have a GitHub repo that has a completely anonymized reproducible example for folks who really want to do the deep dive on all the code.
But it starts with raw survey data, and we need to get to something a little closer to the tidy data that we want to work with in our pipeline. This was a people process, right? It involved a lot of collaborative effort with our survey programming team and conversations around why the data needed to be wrangled a bit and reshaped a little longer. Eventually we settled on a format that worked, and they also helped us build a pipeline where this data is automatically generated or updated hourly. So we have a flow of data coming through that we can now put into the pipeline.
So this is the pipeline. A lot of components here. On a given run of this, the data, that sort of semi-wrangled data, kind of tidy data, gets put into a data store, and that data store then gets fed into this main automation script. That main automation script ingests a lot of other things, so we have some helper functions that do more complete data wrangling and deal with some other issues, and then we also have a Quarto template and a YAML header, and those define the basic structure of what the PDF reports should look like, right? And we have a template and a YAML header so we can parameterize some of that data, pass it through the entire system, and create the individual reports.
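As a rough sketch of that template-plus-YAML-header pattern, a parameterized Quarto document might open with something like this; the field names here are illustrative, not the project's actual header.

```yaml
---
title: "Survey Respondent Report"
format: pdf
params:
  respondent_id: "r001"
  wave_date: "2025-06-01"
---
```

The main automation script can then override `params` per respondent at render time, so one shared template produces many individual PDFs.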
A lot of, again, familiar data wrangling tools are leveraged throughout, particularly in those helper functions. The output of a run via the main automation script is that the PDFs get pushed to the report store, and then finally someone, often me, who is doing these runs, moves them to our shared network drive and hands them off to the business team so they can do QA, QC, and eventually deliver them to the client.
Hurdles: automation vs. flexibility
Okay, 30,000 foot view, some basics on overall how it works, but like most things, there's some hurdles, right? Like Julia Child here, confronted with this massive piece of tuna. How do we cut this up? And I want to talk about two hurdles today. The first is a little bit of thought when we were designing this about where we wanted things to sit on the automation and flexibility spectrum. The second is we wanted to update the design.
There were a lot of conversations about how far down that road we wanted to go, if we wanted to update it at all, and then a little bit about the mechanisms of changing that formatting. So early in our conversations with project teams, this quote came up. They said, we really need it to be automated, but we also need to pick and choose which dates we want to run reports for, ad hoc, at a given time. To me, that doesn't sound a lot like automation. It sounds like we need to be somewhere in the middle, right, where we don't have a fully automated solution. Components of it are automated, so it's easier and faster, but we can jump in where we need to and be flexible to get the clients what they need.
And there were a couple of places where I think this reared its head. I'm going to focus on one, and that is the story of date ranges. So I'm going to show just a little bit of code here. Again, this is straight from the reproducible example on GitHub. This is the top of that main automation script. We're reading in the data; here it's a tab-delimited file. We're employing a couple of those custom helper functions, clean_df() and strip_html_tags(), to clean up some of the survey text that we're getting from the platform. And then we're setting up this filter window. And this is pretty straightforward, right? We're just saying, filter the data between this start date and this end date, and pass that forward. But it's a really easy way to give folks across our team, who have more or less experience coding, a really clear ability to go and run this on their own. And to that end, we also use functions from the cli package pretty frequently, providing good context in the console, again thinking about the spectrum of experience that folks have on our team.
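Mirroring that top-of-script pattern, a simplified sketch might look like the following. The helper internals, column names, and data here are stand-ins for illustration, not the repo's actual code.

```r
library(dplyr)
library(cli)

# Hypothetical stand-ins for the custom helper functions mentioned
clean_df <- function(df) { names(df) <- tolower(names(df)); df }
strip_html_tags <- function(x) gsub("<[^>]+>", "", x)

# Hypothetical slice of the semi-wrangled survey data
responses <- data.frame(
  RESPONDENT_ID = c("r001", "r002", "r003"),
  COMPLETED_AT  = as.Date(c("2025-06-02", "2025-06-09", "2025-06-16")),
  COMMENT       = c("<p>Great</p>", "Fine", "<b>Poor</b>")
)

# The ad hoc filter window for this run: easy for any team member to edit
window_start <- as.Date("2025-06-01")
window_end   <- as.Date("2025-06-10")

run_data <- responses |>
  clean_df() |>
  mutate(comment = strip_html_tags(comment)) |>
  filter(completed_at >= window_start, completed_at <= window_end)

# cli gives readable console context for less experienced runners
cli_alert_info("Generating reports for {nrow(run_data)} respondents.")
```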
Design improvements
On to design. So, again, there were a lot of conversations about if we should change the PDF, and if we do want to change it, how we should change it. We ultimately did decide to make some upgrades, and I think that comes down to three reasons. The first is that some minimal upgrades in the formatting and design improved trust and clarity on the client side. They can see that we're engaged with providing a great product for them and continually upgrading their experience. That professional look and those upgrades equal more engagement from them, even on something as simple as this, where it's really just survey responses. And finally, there were internal impacts: better formatted PDFs allow for faster QA and QC internally and a better overall experience for our team.
So, let's look at the before and after. On the left is what clients were getting before. Obviously, I've done some modifications to the content here to anonymize it. And on the right is after. This is not the most beautiful PDF report in the world, by any stretch of the imagination. These are small formatting changes, but we think they've made a real impact. We gave things a little bit of space and breathing room in appropriate places. We added quality-of-life improvements, like subsections, right? You might want to know which portion of the survey you're in and be able to quickly navigate to it. We removed all of the boxing from the legacy PDF formatting, and we added some custom headers and footers. Again, pretty basic quality-of-life improvements, but generally enhancing the design of the PDF.
Okay. So, how did we implement all of those formatting changes? Well, the answer, and again, this isn't meant to be read in detail; if you want to dive into all of this code, it all comes from the GitHub repo. It's a lot of LaTeX. I'm not an expert in LaTeX, and it turns out no one at our company is an expert in LaTeX. So that makes this portion of the pipeline pretty brittle, right? We're trying to de-silo it. We're trying to make sure that it's maintainable into the future. And so one thing that's on our roadmap is converting this over to Typst in the next year, I think. That comes out of great talks at last year's posit::conf about Typst integration with Quarto. That's something we're really excited about, to make these formatting changes easier to understand and maybe de-silo that a little bit.
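For context, LaTeX customization like this is typically injected through the Quarto YAML header along these lines; this is a generic sketch of the mechanism, not the project's actual header.

```yaml
format:
  pdf:
    include-in-header:
      text: |
        \usepackage{fancyhdr}
        \pagestyle{fancy}
        \fancyhead[L]{Survey Respondent Report}
        \fancyfoot[C]{Page \thepage}
```

Part of the appeal of Typst is replacing exactly this kind of opaque LaTeX preamble with more readable styling.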
Results and takeaways
So, take-home story. We saw really big efficiency gains. This is the second time that we've run this pipeline; I think it finished about a month and a half ago. The project teams reported efficiency gains of 50%, which is pretty big. We cut in half the amount of time that they have to spend generating these reports and interacting with the process, which is pretty huge. The design improvements were well received by the client. We had an opportunity to level up our team skills by migrating away from the mail merge approach to a modern toolkit with Quarto and the tidyverse. Consistency was essentially the same, and errors were the same; part of that is because of the speed of the QA and QC with the improved formatting. And then we built collaboration and some shared ownership of this pipeline, again moving it from the head of one person into a replicable, well-documented example.
I think we started with something like this. It works. The concept is great, and there's good bones here. But if we want to scale it and we want to bring it into 2025, we ended up making something like this, where the idea is multiple chefs or cooks can interact with a recipe that's well-documented and replicable. We employed some automation here in the form of a multi-handed chef robot to do some of the hard work for us. And then this allows us to build into the future. Again, shooting towards this goal of the 1,000-gallon jar of pickles.
So a big thank you to a lot of folks, colleagues on the DS&I team, Jamie Favada and Mike C. They are part of our survey programming crew. Really instrumental in, again, that first step of getting the survey data into a place where we could start working with it on our pipeline. And then, of course, the project team, Gunnar and Amy, those folks are the ones working with the clients directly and are seeing the efficiency gains and really instrumental in this whole process.
Again, feel free to connect with me. And if you have any questions on the code, feel free to submit an issue or shoot me a message on GitHub or find me somewhere over the next day or two. And happy to answer any questions.
Q&A
All right. So question number one. Do you have any logic for flagging or removing outlier responses?
A lot of that happens before we get the data. So we have mechanisms in our survey reporting platform. And as part of that wrangling process, that's where a lot of the sort of first step QA, QC happens. So luckily, we didn't have to worry about that too much once we got to our portion of things.
Is this workflow for web-based surveys? They're wondering if you use any validation tooling like Pointblank to flag issues among respondent data.
Yeah, that is a great question. It is from web-based surveys. Our survey platform is Forsta Plus, although, I don't know, surveydown may be in our future. But I think the pointblank validation is a good idea and really, really cool; I think it's an amazing package. I think we just need to figure out where in the entire process would be best to implement it. And again, there's so much data QA, QC that happens before we even get the data.
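For readers curious what such a check might look like, here is a minimal pointblank sketch on hypothetical respondent data; these rules and columns are made up for illustration and are not part of the actual pipeline.

```r
library(pointblank)

# Hypothetical slice of respondent data
survey_tbl <- data.frame(
  respondent_id   = c("r001", "r002"),
  q1_satisfaction = c(4, 5)
)

# Build a validation agent, declare rules, then interrogate the data
agent <- create_agent(tbl = survey_tbl) |>
  col_vals_not_null(columns = respondent_id) |>
  col_vals_between(columns = q1_satisfaction, left = 1, right = 5) |>
  interrogate()
```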
Yeah, if you start looking at the code base, there are opportunities for improvements on the automation front. The way the parameterization works now is by passing through respondent IDs tied to the particular date range we selected. Those get passed through to each individual report: a dynamic set of Quarto files based on the template gets rendered, and those ultimately get rendered into the PDFs. So, short answer: there's some parameterization, and I think more could be done.
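That per-respondent render step could be sketched roughly like this; the template path, parameter name, and filename scheme are hypothetical, not the repo's actual code.

```r
# Hypothetical helper: one output filename per respondent
report_filename <- function(respondent_id) {
  paste0("report_", respondent_id, ".pdf")
}

# Render one PDF per respondent by passing the ID through as a
# Quarto parameter to a shared template
render_reports <- function(ids, template = "report_template.qmd") {
  for (id in ids) {
    quarto::quarto_render(
      input = template,
      output_file = report_filename(id),
      execute_params = list(respondent_id = id)
    )
  }
}
```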
Were there any challenges in convincing the team to switch to Typst?
How are the client reports commented on in the process?
Yeah, so the way that it works is we do an internal QA, QC check of all the reports for a particular batch, right? So, we have eyeballs on every report that gets sent to the client before it goes, and then they're sent in batch. And if there's additional comments, right, that goes back to the business team and may get filtered back to us. I hope that's what they meant about commenting.
Yeah, it's not insanely speedy, but again, we leverage a lot of Posit infrastructure like Workbench, so it can just run in the background. Generally, I would say 100 reports takes about five to seven minutes.
