Generate 100s of custom reports in minutes with Python & Quarto! (Parameterized report automation)

A practical guide to generating hundreds of customized reports using Quarto and Python. Learn how to leverage Quarto's parameter system to create PDFs and HTML reports at scale. Using a movie dataset example, we'll cover:

- How to automate report generation with Python and Quarto
- Create dynamic templates with data visualization and formatting
- Adapt parameters to work with both Python and R code
- Build scalable reporting workflows for any dataset

This tutorial demonstrates how to transform what could be hours of manual reporting work into an efficient automated process.

Quarto Crash Course Video: https://youtu.be/_VKxTPWDhA4?si=Q09V5xXlyo1YVQqs
Generating Analytics Reports (typst) Video: https://youtu.be/Q3phTByW138?si=FIMiAWAHP6HhUnhG
Link to Code (starting): https://github.com/KeithGalli/quarto-crash-course
Link to Code (finished): https://github.com/KeithGalli/quarto-crash-course/tree/parameterized-reports

Information on the dataset can be found here: https://github.com/KeithGalli/quarto-crash-course/tree/master?tab=readme-ov-file#resources

Video timeline!
0:00 - Video Overview
0:58 - Walking through some starter code (linked in description)
3:20 - Passing parameters with Python into Quarto Markdown Files
4:27 - Passing parameters using R
5:18 - Using a Python script to render many reports with variable parameters.
6:36 - Dynamically generating markdown sections with Python (output: asis - execution option)
11:56 - Adding images, table of contents, & pagebreaks to our reports
15:00 - Adding graphs to our reports
17:54 - Tying everything together
21:47 - Adding fenced divs & styling/formatting dynamically with code


Transcript

This transcript was generated automatically and may contain errors.

Hey, what's up everyone? Wanted to share a quick tip today that I find very valuable in my own professional experience working with Quarto, and that is the ability to use parameters to generate hundreds of different reports quickly using Quarto and Python.

So some context for why you might want to do this, imagine you are working for your local government and you need to create different types of analytics reports for each town in your state, or imagine you're working for a big retail store and you need to generate sales reports for each department in your store, or maybe you are working at a software company and you need to generate financial reports for each one of your clients. I think traditionally this might require a bunch of manual work where you're just copy and pasting like time after time and it can get very tedious, whereas using Quarto strategically you can just do this in minutes.


In this video we will be using a movie dataset to generate our reports, but after watching the video you should be able to apply the same concepts to your own real-world use cases and datasets pretty easily.

So to get started, make sure you have Quarto installed and know a little bit about the basics of Quarto. If you don't, that's totally fine. We recently posted a Quarto crash course that you can see right here, or I'll link it in the description if not, and that covers, you know, setup and installation, the basics of the markdown syntax and what you can do with Quarto.

Overview of the movie dataset

At the end of that video specifically, we highlighted an example where we could create these parameterized reports. So for context, we have this, the movie database dataset, which I'll link information about in the description, but it has all sorts of different movies and information like what did people rate them out of 10? How many votes did they get? And you know, other information such as like an overview of the movie and the genres that the movie is in, etc.

So at the end of the Quarto crash course video, we had a little report that we could generate based on the top movies for a certain genre. So we'll see an example of that real quick. So we can see for the top movies in comedy from this dataset, we have these five movies. In this video, we'll build off of that and see how we can do all sorts of things and actually generate hundreds of PDF analytic reports for all these different movie categories.

Rendering a report with a parameter

The code is linked in the description. So we have an example Quarto markdown file right here. If we look at some of the information in here, so let's run this cell and run this cell. We have this data frame of movies and that's what I just showed.

So I'm going to just highlight this real quick. We have all of these different movies and we specifically are sorting it by genre. So we can see the genre by looking at this. So this is specifically movies that have comedy as their first genre listed. And then we're basically getting the top movies from that, which we can see right here are these five movies in this situation.

Let's imagine that we wanted to be able to set any genre and any number of movies and create a unique report for each of these situations. So to start out, let's see how we could dynamically create a report for a different genre. So I'm going to open up a terminal window and it's as easy as doing something like quarto render. We have parameterized report.qmd here. We're going to render this, but we're going to specifically pass a parameter and we will call this parameter as we see here.

We have this cell that defines what parameters we can use, but we'll pass in the genre is action. If we look in our files, basically a new report will be created and we can open that and we see we get the top movies in action.
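As a rough sketch of what this step looks like: with the Jupyter engine, Quarto reads default values from a cell tagged `parameters`, and `-P name:value` on the command line overrides them. The file name `parameterized-report.qmd` and the default value below are assumptions for illustration; check the linked repo for the actual ones.

```python
import shlex

# --- Inside the .qmd file: the parameters cell holding the defaults ---
#| tags: [parameters]
genre = "Comedy"

# --- Outside the .qmd file: the render command that overrides the default ---
qmd_file = "parameterized-report.qmd"  # assumed file name
command = ["quarto", "render", qmd_file, "-P", "genre:Action"]

# Print the command exactly as you would type it in a terminal.
print(shlex.join(command))
```

Running the printed command in a terminal (with Quarto installed) renders the report with `genre` set to `Action` instead of the default.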

Real quick, I just want to mention that while we're using this syntax to specify our parameters using Python and Jupyter, if you're doing the same type of thing with R, you can specify your parameters within your actual YAML. So I would basically paste these params. I would edit my header to include these params and then I could reference them by doing the following.

But you can repeat basically the same exact process to create many, many reports very quickly using this type of command line syntax. If we wrote a Python script, we could very easily do this for many different genres. So if you look at the file generate all reports, again, GitHub code is linked in the description. We see we take all of the different genres we have and we can iterate a loop over these genres and we can output a specific report for each genre and we save it in the file genre.html.

This code works. You could look at any one of these reports and you can find the information about these different categories.
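The loop described above might look roughly like the following. This is a minimal sketch, not the script from the repo: the genre list is a hard-coded placeholder (the real script derives it from the dataset), and the file name is assumed.

```python
import subprocess

QMD_FILE = "parameterized-report.qmd"   # assumed file name
GENRES = ["Action", "Comedy", "Drama"]  # placeholder subset of genres


def build_command(genre):
    """Build the quarto render command for a single genre."""
    return [
        "quarto", "render", QMD_FILE,
        "-P", f"genre:{genre}",
        "--output", f"{genre}.html",
    ]


def render_all(genres, run=subprocess.run):
    """Render one report per genre; `run` can be swapped out for testing."""
    for genre in genres:
        run(build_command(genre), check=True)


# render_all(GENRES)  # uncomment to render; requires Quarto to be installed
```

Passing the command as a list rather than a single string means genre names containing spaces do not need extra quoting.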

Adding dynamic parameters and visualizations

But imagine we wanted to take this a step further. How could we do that? So one task, and feel free to try this on your own, but imagine you wanted to have a dynamic number of movies here. How might you do that? Feel free to pause the video, try it out on your own before you see the solution that I would use.

And also imagine maybe you wanted to make this analytics report more interesting. Maybe you wanted to add a graph or something, and maybe instead of HTML, make it PDF. How could we do all of these things?

Okay, so starting out, if we wanted to dynamically list a certain amount of movies, we could delete all of this, or maybe I'll copy one cell just so we have it for reference, but we could delete all this. And basically, if you think about Python, think about code, if you have the same type of thing repeated, it's kind of prime for a for loop. So let's make this a for loop.

For i in range number of movies, we basically want to output the movie. So we could, you know, do print, how did we get the index, and we might also actually want to iterate over this. Okay, so for i in range number of movies, we want to get our movie. So movie equals top movies dot iloc i, how about? We want to print out maybe the movie title. And we also want to print out the overview of the movie.

Okay, it didn't really execute the code, it looks like. Because we didn't close our Python code cell. All right, run that again. Okay, we see it all there. But it's not formatted in any way. So maybe we'd be like, okay, let's make this an f string and add a header tag here. And then, you know, pass this in as a parameter. So now we're using f strings to actually pass in markdown, we do the same down here with our description, and maybe I make this, you know, description. And then pass in movie overview as a parameter.

Close that off. Make sure that this is closed too. Run this. And we see it's all there again, still it's not being rendered as markdown. So there's a cool output option that you have available in both Python and R within Quarto. And you can pass in output, I think it's output: asis, so the output is used "as is". But let me pull in the documentation.

So on this page, quarto.org/docs/computations/execution-options.html, we have this output: asis option, which allows you to generate raw markdown output. And it's different based on if you're using the Jupyter Python engine or the knitr engine for R. It's slightly different. But we're going to set that option.

And now rerun this. Not quite working how we see we want it. But I think if we add some new lines here, so a new line for our headers, and a new line for our descriptions, then these will print out properly. Look at that. And maybe we also wanted the number of the movie. So instead of doing for i in range number of movies, we want to kind of iterate over this backwards, I believe.

So range number of movies to zero, iterate negative one, because we want it backwards. So this would be number i, print, maybe we make this bold, we can pass in the same type of bold parameters, title. And now let's run this. And we'll make this as number of movies minus one. And this is a little quirky, but if we're iterating backwards, we want it to stop at zero. So we could do to negative one to get it to go to the right range. And we could just do number i plus one here just to make it more, you know, because we started indexing zero with Python, but we, you know, count from one. So here we go. Top movies and comedy.
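Putting the last few steps together, the cell might look roughly like this. It is a sketch under assumptions: a plain list of dictionaries with made-up rows stands in for the `top_movies` data frame (with pandas you would fetch each row with `top_movies.iloc[i]`), and the exact heading level is a guess.

```python
#| output: asis
# Made-up rows standing in for the top_movies data frame from the video.
top_movies = [
    {"title": "Movie A", "overview": "Overview of movie A."},
    {"title": "Movie B", "overview": "Overview of movie B."},
    {"title": "Movie C", "overview": "Overview of movie C."},
]
num_movies = 3


def movie_markdown(rank, movie):
    """Return raw markdown for one movie; blank lines keep the blocks separate."""
    return (
        f"## {rank}. **{movie['title']}**\n\n"
        f"**Description:** {movie['overview']}\n\n"
    )


# Count down from the last index to 0; the stop value -1 is exclusive.
for i in range(num_movies - 1, -1, -1):
    print(movie_markdown(i + 1, top_movies[i]))  # i + 1 so ranks start at 1
```

With `output: asis` set on the cell, Quarto treats the printed text as markdown, so each movie gets a real heading instead of literal `##` characters.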

And if you wanted to make this a little bit more fancy, one thing you could also do is add a URL of the poster image. So in that data set, there is a poster path image. And if you add this poster path to a base URL, so we can say URL equals the base path is base path is this. And then we want to add our poster path. So I can just pass in movie poster path. And now we could also include this in our data. I really just need to pass in the URL, I don't need to actually pass in the title here. So this is now an image. Run this.

Okay, that you can make it smaller. If you wanted to, you could do width equals 50% or something here. Maybe we can make it 25% I think we have to pass this in as a string. Oh, it should be in the print statement. Run this. And now we're using the same level of quotes inside of quotes. So it's getting a little bit messed up. Let's just try 25%. Oh, it's getting confused because we're putting variables within an F string. Okay, that works 25%.

Cool. So just to understand what's going on: inside an f-string, single curly braces mark a variable, so we use double curly braces, which basically just lets you use the raw braces in your string. Okay, so now we have top movies in comedy looking like this.
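Here is a small sketch of that escaping. The base URL and poster path are placeholders, not the values used in the video.

```python
BASE_URL = "https://example.com/posters"  # placeholder base URL
poster_path = "/abc123.jpg"               # made-up poster path

url = BASE_URL + poster_path

# {url} is substituted by the f-string; {{ and }} come out as literal braces,
# which is what Quarto needs for the image attribute {width=25%}.
image_markdown = f"![]({url}){{width=25%}}"
print(image_markdown)
# prints: ![](https://example.com/posters/abc123.jpg){width=25%}
```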

Generating PDF reports

We could very easily change this to a PDF format. We see we get the same looking thing here in PDF format. If you wanted to make this more interesting, you could, let's say, add a table of contents. Now we see a table of contents. We could also, let's say, add page breaks. So maybe I didn't want anything to show up. Maybe I wanted the top movies to show up on a different page, on a page break. See now it starts there.

We could also add, let's say, a graph of all of the movies in that genre. So how about we do ratings distribution or something. I'm going to change up the syntax of this a little bit. So we're going to do top movies here, we're going to have that double header. We're going to make these numbers triple headers, and we're going to have something that's just called ratings distribution for, how about, the genre.

And that's going to be a Python graph, Python histogram, import, just close this off real quick, import matplotlib.pyplot as plt, plt.hist. We'll pass in the genre of movies, and we'll specifically be plotting the vote average. And look at how well it automatically auto-completes this with GitHub Copilot. I'll add an x label, which is just going to be the rating. And then the y label will be the vote count, or the frequency. So the number of times, the number of movies that have this rating.

And you could, using Quarto syntax, do like a fig-cap, rating histogram for genre. You could add a fig-cap-location, we'll make it below. Let's see what the options are, bottom. And we'll align this to the center, so fig-align, center. We run this now. How about we add a page break here, too. Run this. See top movies, we get a histogram.

Then we want to do a plt.show to just tie it all together. Cool, so we get this on the second page, because we're using page breaks, then we see the top movies.
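A histogram cell along these lines might look like the sketch below. The ratings are made-up numbers standing in for the genre's vote averages, the caption is static (the video builds it from the genre), and the bin count is an arbitrary choice.

```python
#| fig-cap: "Rating histogram"
#| fig-cap-location: bottom
#| fig-align: center
import matplotlib.pyplot as plt

# Made-up ratings standing in for the vote averages of one genre.
vote_averages = [6.1, 6.8, 7.0, 7.2, 7.2, 7.5, 8.1]

# plt.hist returns the count per bin and the bin edges.
counts, bin_edges, _ = plt.hist(vote_averages, bins=5)
plt.xlabel("Rating")
plt.ylabel("Frequency")
plt.show()
```

The `#|` lines are Quarto cell options; outside of Quarto they are ordinary Python comments, so the same code also runs as a plain script.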

And with this now, we can render it with a different set of parameters. So quarto render, parameterized report.qmd, -P, number of movies, how about is going to be 20. And how about the other, the genre will be action. So now we have 20 movies, and they're action. See all of them. So cool, it's dynamically working.

Scaling up to hundreds of reports

We could then go to our generate all reports. In addition to iterating over the genres, maybe we also iterate for num movies. So maybe we make the number of movies a range. So maybe it's between, you know, 15 and, or 5 and 25, trying to think how we can get to 100 reports. This is 18 categories. So if we do 20 iterations for each category, we'll get to hundreds of reports that we can generate.

So how about for i in range 5 to 25, this is the number of movies we're going to include. For genre movies, with i movies, and then we could also do genre underscore i. We can make this a PDF. We could also specify where we save this. So maybe we save it in a PDF report, PDF reports directory. So I'm going to change my _quarto.yml to do that. And so here we will be generating tons of reports in just a matter of minutes. And then we pass in another parameter. Number of movies is going to be i.

So we run this command, I'll make it a little smaller so you can see. And now if we run this, we see this folder PDF reports get created, and it is populating different PDF reports for each of these parameters. And if we look in that folder, we can go to PDF reports. We can click on one of these. So here's the one for action 5. We see 5 movies with the image in the description. We could go to action 11, and we see this one has 11 movies.
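To make the arithmetic concrete, here is a sketch that builds every genre and movie-count combination without rendering anything. The genre names are placeholders for the 18 categories, and the parameter name `num_movies` and the file name are assumptions.

```python
from itertools import product

GENRES = [f"Genre{n}" for n in range(1, 19)]  # placeholders for the 18 genres
MOVIE_COUNTS = range(5, 25)                   # 5 through 24: 20 values


def build_jobs(genres, movie_counts, qmd_file="parameterized-report.qmd"):
    """Return one quarto render command per (genre, movie count) pair."""
    jobs = []
    for genre, n in product(genres, movie_counts):
        jobs.append([
            "quarto", "render", qmd_file,
            "-P", f"genre:{genre}",
            "-P", f"num_movies:{n}",
            "--to", "pdf",
            "--output", f"{genre}_{n}.pdf",
        ])
    return jobs


jobs = build_jobs(GENRES, MOVIE_COUNTS)
print(len(jobs))  # 18 genres x 20 movie counts = 360
```

Each entry in `jobs` can be passed to `subprocess.run(job, check=True)` to render it. The output folder itself is set in the project's `_quarto.yml`, as mentioned in the video.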

And yes, this use case might not be exactly what you need for your own project. But the cool thing here is that we're running a Python script to dynamically set these parameters, and we can just change up the variables, run new reports, and generate all these super quickly. So if you have a data set where you're doing this for each town, doing this for each department in your store, et cetera, you can use these same concepts and do it super quickly.


Using fenced divs for layout

One final concept worth touching on quickly is even in this for loop, we can also add, let's say, fenced divs if we wanted to. So let's say we wanted to align the image and the description on the same row. We could do that using a layout fenced div. So imagine we pass in another print statement. We pass in the three colons to specify a fenced div. We pass in the layout parameter. So layout equals something like 100. That's going to affect the title because that's our first paragraph. It's not going to affect the heading. And then I can pass in how about 30% and 70%.

We can even just keep it as the 25% that we have right now for the image. And we can now delete this width. And so if we look at what happens when we do this, run this, I guess I need to close the fenced div as well. So let me just do that real quick, print, and we should probably add some new lines here. Let's add a new line here and a new line here. We might have to add multiple, I'm not sure. But run this again.

Okay, it didn't quite work. Let's just add another new line. I'm going to add two new lines to each of these just to make sure it's separated plenty. Make sure that this is closed, looks like we screwed up the layout. I also didn't close the layout tag. Keep making small mistakes. There we go. That should be good.

Now we see that it is aligned with 30% of this in this column and 70% in the next column. It looks a little clunky though right now. One cool trick you can also do is you can specify these negative spaces. So a negative number is just basically a blank space column. And then I make this 65. So it's saying, okay, the first item here takes up 25% of the real estate, which is the image URL. And then the next one takes up 65% and we basically leave this blank 10% width. So if now I run this, we see we get a little bit of space there too. So you can play around with formatting things like this. You can incorporate fenced divs using the print statements and whatnot too. Cool to check out.
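A sketch of printing such a layout from Python follows. The 25/10/65 split comes from the video, but placing the blank column between the image and the description is an assumption, and the URL and overview are placeholders.

```python
def movie_row(url, overview):
    """Return markdown placing the poster and the overview side by side."""
    return (
        # Opening fence; -10 is a blank spacer column between the two items.
        '::: {layout="[25,-10,65]"}\n\n'
        f"![]({url})\n\n"
        f"{overview}\n\n"
        # Closing fence; without it the layout runs into the next movie.
        ":::\n\n"
    )


print(movie_row("https://example.com/posters/abc123.jpg", "A made-up overview."))
```

The blank lines between the pieces matter: as in the video, the layout only works once the image, the text, and the closing `:::` are each separated by an empty line.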

All right, while we were doing that, I think all of the reports have run. Let's check it out. And if I go to this directory, as you can see, we have hundreds of reports created in a short amount of time. If you want to see how we can generate reports like this with more complex formatting using Typst and just some more complex concepts, you can check out the video that I posted on my own channel that will be linked right here or also linked in the description. All right, that's all we're going to do in this video. Hopefully you enjoyed it. Hopefully you found it useful. If you did, throw it a thumbs up, subscribe to the channel if you haven't already, and let us know if you have any recommendations for future videos. Until next time everyone, peace out.
