
Great Tables: Make beautiful, publication quality tables in Python | Rich Iannone & Michael Chow
Tables are undeniably useful for data work. We have many great DataFrame libraries available in Python, and they give us flexibility in terms of manipulating data at will, but what happens when presenting tables to others? It's nice to display tables. Tables can efficiently carry information, just like plots do, and at times it is the better way of presenting data. Indeed, it is time to bridge the divide between raw DataFrame output and wondrously structured tables suitable for publication. Now, let us turn our attention to the state of 'display tables' in 2024. Let us go over what comprises key components for building effective information displays in tables. It may surprise one how new a well-crafted table can be hewn. We'll take a look at the combinations of Python packages that fit together to make this important task possible, and marvel together at the tabular results they can provide!
Learn more at https://posit-dev.github.io/great-tables/
Timestamps:
0:00 Intro: Meet Rich and Michael
0:41 What we mean by "publication ready tables"
1:29 Overview of what we'll talk about in this video
1:44 Table Goals: Ways to make a table beautiful
4:41 Tables made from reproducible code!
5:11 The history of table generation to influence our API
6:15 Our modern take on a table display framework
6:35 The problem with Excel
7:38 Introducing Great Tables!
8:00 Key Ingredients of making a Great Table
8:24 Structure: Title, column spanners and nice column labels
8:52 Format: Compact dollar values and percentages
9:23 Styling: Fill color and bold text
10:09 Imports and Polars Selectors
11:08 Coding the structure
12:27 Coding the format
13:07 Coding the styling
15:03 Putting images and plots in your table cells
15:49 Advanced Design
16:07 .fmt_nanoplot(): Small plots within table cells
19:08 .data_color(): Heat maps in tables
20:51 Powerful and plentiful methods to format cell values
22:48 To sum up: TABLES RULE
Transcript
This transcript was generated automatically and may contain errors.
All right, hey everybody, I'm Michael Chow and this is Rich Iannone, and we wanted to talk today about how making beautiful, publication quality tables in Python is possible in 2024. And we're really hoping that at the end, you don't just think that they're possible, but a total delight.
So just to give you a little bit of background, we are software engineers at Posit PBC, and between the two of us, we have two PhDs, two dogs, and five cats, and we are fanatics of table display. We're just so interested in beautiful, publication-ready tables. We each have two and a half cats. That's the joke.
Yeah, what I mean by display tables and publication-ready tables is less of the table on the left, which is a Polars data frame. So this is like a raw data frame. If you're a data frame plumber, you're a data scientist and you're working on the data, this view makes sense. You know, the column names have underscores in them. You can see the types of the columns, and you can really see the nitty-gritty raw values.
We mean less of that kind of table that has its guts hanging out for analysis, and more of the table on the right, which is a table that you might send to someone if you're trying to convey something about the data. So this is the table we'll be talking about. It's a coffee table, and we'll be walking through it today.
Just to give you a sense of where we'll be going, first we want to talk about tables as data visualization, then key ingredients to creating a beautiful display table, and then last, some more advanced considerations when creating tables for display. So I'll be talking about our table goals. So basically the first part of the talk.
Tables as data visualization
And I want to start just by showing you a few beautiful tables I got from the internet. So the cool thing about this table here is that it's pretty visually stunning. It has all the parts that I like in a table. Like look at this header. Basically it just has a title and a subtitle, but what it does is it explains the purpose of the table. Before you see anything else, you sort of see what the table really is supposed to drive at.
Another cool thing I like is that there's a spanner. What that means is there's a label above a number of columns. It just gives you some sort of grouping and some structure to the table. And another cool thing is these team logos. They quickly convey the identity of each row. And formatting. That's a thing here. We see that we have percentage values, and they're nicely formatted for readability. And the really cool thing is that this row here, the second row, is highlighted, and it essentially draws attention to the main subject of the table. I won't get into what it is right now, but it's a key part of the data visualization. It gives you the focus and it drives you to the insight pretty fast.
And so I have another table from the same author, coincidentally. What really grabbed me about this table is that there's bar charts inside this table. You can see right here, quite a few of them, they're front and center. And what they do is they enable really fast comparisons between values here. If you just have the numbers, like the percentage values, it wouldn't be so highly readable, but we have both. We have the bars plus the precise values.
Another cool thing is that we have lots of rows, but what the author did was subdivide the rows into categories or little groups for a better organization. So that's great. If you want to focus on, like, the box scores or the shooting part or some advanced stats, you can focus on them one bit at a time without the entire table overwhelming you.
Another cool thing is that we have a footer here, and in that footer, the author chose to put in some footnotes, and that provides additional detail. It didn't have to be put in the text. It could just be put right in the table. So that's a really cool thing that the author did.
Okay, here's a third beautiful table I grabbed from the internet. This is sort of a big one. It's a lot of numbers, but the cool thing about it is that there's essentially a heat map in this table. We see lots of color, and the color is actually pretty good because it tells you what the values are without you having to look and parse each and every single value. You sort of get an idea just from, like, the colors presented.
And there's a number of good things I like about this table. Besides the heat map, we have nicely formatted percentage values, and we noticed that if you look really closely, the percentage values have decimal alignment, which makes for easy readability of the values, which is really a nice touch. And of course, the heat map, like I said before, it helps you scan the values, and it really aids comparisons without having to read every single value.
So the really cool thing about these tables is they're not made in Illustrator, they're not made in some graphics program, they're made from code. And of course, the benefit from that is that we get a reproducible workflow. You go from input data to analysis to visualization, which is this, and you can bring that right to reporting. And you don't have to, you know, go to another program, dump the value, dump the data out, and then, like, you know, use some other tool. You have one continuous reproducible workflow, which is very cool.
So how did we get here to this point where we can make tables from code? Well, we had to generate an API, but before we create that API, we had to look, you know, at, you know, the state of the art in terms of, like, what's done for tables. Surprisingly, and this is something that made me a little bit upset, is that there weren't many texts at all for table design. So we had to look really, really hard. You know, lots of books here. These are great for everything else. But luckily, we found one book, which was great. It's a pretty old one, but it's a great one. It's the Census Manual of Tabular Presentation.
What does it do? Well, it essentially dials the concepts on table display to 11. It provides many solid and useful recommendations. It has nearly 300 pages of, you know, just table stuff, which is really great. The most important thing is right away it formalizes the structure of a table. We don't have to go too far. It's actually in figure two of this whole book. It gets right down to just showing you what the different parts of a table are called and how they're structured and what they do.
So we took that because these are really great ideas, and we adapted it to a sort of a more modern, you know, version of a table. And we got this. So we have the different parts, like the table header, the stub, and table footer, and all the other parts. And we gave them names, which was super important.
So how do you make tables today? Sort of like looking at how people do it. You may take a raw data frame. This is, I believe, a Polars data frame. You can present it to other people like this, just like, here you go, the raw data frame. This is your results. Not really recommended because it doesn't look too good and also doesn't have all the data. It's kind of like not so great. It's a bit raw.
Another idea. This is what a lot of people do. I'm guilty of this. You can bring your data to Excel and then make the table there. Now, the problem is, and this is what I discussed before, is that now your workflow is not reproducible. It's a bit broken in that way. So not great. I mean, you got the table done, but, you know, at the expense of like breaking your reproducible workflow.
So our idea is you can use an API like Great Tables and work entirely in Python. It's reproducible. Probably less effort overall. And the tables look really good. So you're not losing anything by keeping it in Python. You're actually getting a lot because it's pretty great.
Okay. So Great Tables. It's our package. It's focused purely on the display of tables. That's its only concern. It's definitely not the only approach in Python. But we do promise you that it's comprehensive, actively developed. And it tries to go deep on all table related problems. So throughout the rest of this talk, we'll use Great Tables and illustrate the process and design behind making presentation quality tables. And for that, I'm going to give you Michael again, and he'll focus on the key ingredients of making a great table.
Key ingredients: structure, format, and style
Oh, yeah. Nice. Thanks. So I'm going to walk through how to go from a sort of plain data frame to a beautiful display table. And the three key ingredients I'll talk about are structure, format, and style.
So basically, structure is adding parts to the table that are sort of outside the data itself. So notice that here we've added a title. And we have some column spanners, which actually go over the column labels and give us some good grouping information around columns. And then we formatted the column labels themselves so that they're a little bit cleaner. There are no underscores. Things are in title case and a little bit easier to read.
The next piece is formatting. So formatting, for example, is changing the actual data values. So notice that we have compact dollar values. So we're using things like K to mean thousands and M to mean millions. This is common in some domains like some government domains on tables. They like to report out using these compact formats. We've also formatted the percentages just to clean them up. So they're whole numbers, and they have a percentage sign after them.
And the last piece is styling. So for style, we're going to fill the background of some columns to group them together. So notice the revenue columns have a blue background. And the profit columns have a papaya whip background. In the bottom row, the text is bolded to emphasize that it's a total. I just want to emphasize that one of these columns, a percentage column, sums up to 102%, which is not a Great Tables problem. It's a Michael Chow problem. I screwed the data up before this talk. But hopefully, it's not too distracting. I just wanted to own that before you owned me and dragged me through the internet for my terrible addition skills.
All right, so we're going to go through these step by step to show what they might look like in code. The very first thing we need to do is some imports and a little bit of Polars explanation. So first, we import from Great Tables the GT object. That's the thing we use all throughout Great Tables to format the tables. So for Polars, we're going to use Polars DataFrames. So Polars is a library for data analysis. And the one thing we need to mention is these Polars selectors. So we're going to import Polars selectors as cs. And the one thing that this lets us do is it lets us select groups of columns really easily. So we can say things like cs.starts_with("revenue") to choose all the columns that begin with revenue. So it's a really convenient tool that we'll use throughout. But we just wanted to flag it up front so you can get a sense for what's going on in the examples.
All right. So looking at the first piece structure, this is the plain table output. And we're going to add some parts to it. So the first part is tab header. Tab header just adds a simple title to the table. This is a really small piece, but it's probably the most important part of the table. Because in the information hierarchy, the title says what people are even looking at. It's probably the most important piece of information if someone has three seconds to figure out what's going on.
So the next piece we'll add is a tab spanner. And this is just a label over the columns to group them together. So here we have revenue. And notice that we're just putting this label over the revenue columns so that people can see they're grouped a bit more easily. And we'll go ahead and do it to the profit columns just to have a similar spanner. And then to finish off, we're going to rename the columns. So notice before we were repeating in the column name words like revenue that we now have in the spanners. So we're going to go through and clean them up so that extra information is gone. And people can just see revenue in the spanner and amount in the column label. So now things are quick to read.
So the next piece is formatting. And here we use these methods that start with .fmt. So the first piece is format percent. And here we're using a Polars selector to get all the columns ending with percent, PCT. And we're cutting out their decimals. And so notice the columns flagged in purple now are formatted as percentages. And they don't have any trailing decimals. They're just whole numbers. The second piece is format number. This is the piece we use to make the number compact. So instead of being written out with all of its zeros, now it's shortened to have K for thousands and M for millions. And so that's it for formatting.
The last piece is styling. So for styling, we'll use the .tab_style method. And the key to this is .tab_style takes two arguments. So one is location. Basically, that's where in the table you want to apply the style. And the second is style. And that's the actual styling you want to do. So it's sort of the where versus the what that you want to do for style. So in this example, we're styling the body. So loc.body means the actual data in the table. And we're selecting specific columns, the revenue columns. And style.fill means we want to fill the background. And in this case, we're filling it Alice blue. So this is just to make the revenue columns group together visually.
So the code flagged in green is where we apply it again to profit. And we do something similar, but this time we make the profit columns have a papaya whip background. And then the last thing is a little bit different. Instead of styling columns, so before we made the backgrounds of the columns a color, we target a row. So if you look at the very bottom of the table, the text is bolded. What we do here is we use the location. We can use Polars expressions. So these are little pieces of data analysis, essentially. So here, the rows equals part is using a Polars expression to basically pick out the bottom row of the table for styling. And then the style piece, we use style.text to make it bold. So the key is there's a whole range of styles that we can apply to a variety of places.
All right. So that was the basics of structure, formatting, and style. One other exciting piece is we can add images and plots. So on the left, we've added some images with format image. This is really nice because people can now see really quickly different products. So if they're looking for a specific thing like filters, they can hopefully pick it out really fast visually. Nanoplots on the right, so those are those tiny bar charts, are neat. They can display little trends in the table. And these are really powerful because they combine some of the quick insight and trend spotting of a traditional plot with the compactness of a table. So those are two bonus ingredients that Great Tables can put in.
Advanced designs: nanoplots, data color, and formatting
So I want to talk to you about one method we have in our package, and it's really quite nice, format nanoplot. I like to call them small plots within table cells that give you really compact visualizations that can reveal trends in the data. And they're quite fun to use as well. Like, for instance, you might notice right away that the cold brew row is, well, it's a little bit strange. I mean, you can just sort of see the pattern right there. It increases in summer to a maximum, and other parts of the year aren't so great. And also, espresso machines. Seems like we have intermittent sales, and, you know, like some months are fantastic. Other months have zero sales. And you can get that right away with nanoplots. These bar plots, this bar plot version of nanoplots, you can see those patterns right away. So really kind of cool. Better than just numbers by themselves, for sure.
So why did we make this sort of method to generate nanoplots? Well, they were inspired by sparklines, and you may have heard of those. They were popularized by Tufte. And if you didn't hear of it that way, you probably heard of it through Microsoft Excel, because there's an implementation of that in Excel. And people really like that feature. I see that stuff all over the place when people are talking about Excel. So sparklines are really cool, and they work in tables, as we can see that it works in Excel.
So we have a different version of a nanoplot here. Basically, these are like line-based nanoplots. And it's really kind of cool, because in a table like this, you want your time to insight to be really minimized. This is essentially a table of daily tests performed on a patient, measurements from day three to day eight. And we can sort of see the progression of these different lab tests day after day. Some of these trends go up, some go down. Maybe they're not so good. Maybe they're just fine. We can see right here, WBC, white blood cell count. It's increasing beyond the normal range, a little bit scary. But no matter, we can sort of see that right away with this nanoplot.
And nanoplots are really cool, because they're interactive. You can just hover over different values, like different points, and get a readout, which is really kind of neat. So you can either just look at it or explore a little bit. So inviting that interactivity is kind of a cool feature we wanted from day one when designing these nanoplots.
Okay, and they're really flexible, too. They can be styled in many different ways. And because of that, they can be used in many different contexts. Like for instance, stock prices, there's a table right there showing them. Nanoplots are pretty good for that sort of thing. Weather data, there's temperatures, also pretty good, because you can sort of compare lots of different places and get your weather data that way. You can also make it such that you have a little scatterplot. So sales data is like throughout a day, that's not a bad thing to plot, and you can hover over those values and sort of see how the sales went throughout the day.
Another thing we have is data color. It allows you to create heatmaps in tables. So here's that table we showed earlier. It's actually generated through Great Tables. And with data color, it allows you to generate a heatmap with a palette of your choosing. So essentially, it makes the large amount of data presented in this table much easier to digest at a glance. You don't have to look just at a wall of numbers. You've actually got these colored cells to help you out. And because of that, you can see large values right away. Any sort of negligible or small values, they sort of lack that color, because what's been done is that the low ends are sort of faded out. So that's quite nice to see and really easy to parse.
So two big advantages we take from plots in tables with this are emphasizing differences in values and revealing trends in the data. So let me just show you. In a row, for instance, you can compare across measures, right? So you're just looking at the France row. You can see what energy mix there is just right away just by scanning back and forth. Looking up and down a column, you can compare values across observations. So we can see here, CO2 intensity, it increases as you go down the table. And so we sort of see the ranking, the relative ranking of these different observations. And globally, we can see a pattern across the table, which is really nice. So the trend we see here is sort of like moving from the top left to the bottom right. Basically, it boils down to energy from fossil fuels. It leads to higher CO2 intensity values. That's how the table's organized. But you can sort of see the extent to how this plays out in the table.
Okay, another big thing we have inside Great Tables is formatting methods. And we saw some of that earlier in Michael's demonstration. They are really powerful, and there's quite a few of them. So what we can do with them is format numbers, dates, and strings with flexible and powerful methods. So, for instance, we take this unformatted column of values. We can just use format number. And right away, we can get a thing like a fixed number of decimal places and also these grouping separators in the large part of the number. So it makes it easier to read these numbers. If you don't really care about the decimal part, we can use format integer, and it'll do the rounding for you and still have grouping separators. So these are great for counts, for instance.
Sometimes your values are either extremely large or very small. And for that, and especially in certain domains, you might want to use format scientific to get your values in scientific notation. And it's quite easy to do that with that method. Another thing is formatting as percentage values. We can do that with format percent. And if you have currency values, we can use format currency. And the great thing about that is you can provide any currency code that you want, and Great Tables will understand how to format that in terms of the number of decimal places and the currency sign to use. We run pretty deep when it comes to formatting methods. So we have all sorts of them, and that includes formatting as bytes as well. So you provide a number, and format bytes will give you the byte size of that number. And with all these formatting methods, we have certain common arguments, and one of them is pattern. It just allows you to decorate your formatted value with any sort of string literals you want around that. Yeah, and as I said before, there's many more available. You can format dates, times, you can add images, you could format text as markdown. And there's many more in development because it's a pretty sort of fundamental part of this package.
Wrapping up
So to sum up, Michael, would you take this away? I hope we've made the case that making beautiful publication-quality tables in Python is not only possible in 2024, but it's actually really great. We think tables are really cool, and we think, well, we're pretty sure people love good-looking tables. Sending a polished table is a chance to make a really great first impression, and it tells people it's worth paying attention to. And there's so much at your disposal to help group information and highlight things that there's a lot of really great stuff in Great Tables for really bringing out the best in your data.
We talked a bit about the reproducibility chain that you probably already spend a lot of work on. You worked really hard to be sure that your data and your analysis and a lot of your visualizations are already in Python. The neat thing about Great Tables is now you don't have to pluck your data into Excel. You can keep going with your Python scripts and have one single sort of tool chain to go from input data to reporting.
The best way to get started is to pip install great-tables. We have a website, which is https://posit-dev.github.io/great-tables/. We've tried to really fill it with a lot of examples. And along the way, as we've developed Great Tables, we've tried to blog and document what we think are some core problems that we've been chipping away at and trying to solve. So hopefully there are a lot of really nice nuggets there about table display and things we've learned along the way.
Thank you so much for watching this. And we hope that you have a lot of beautiful table use cases that you'll be able to throw Great Tables at. See y'all. Hope you make a lot of beautiful tables. Keep making those tables.


