
Talks - Michael Chow, Richard Iannone: Making Beautiful, Publication Quality Tables in Python...
Full title: Making Beautiful, Publication Quality Tables in Python is Possible in 2024
Tables are undeniably useful for data work. We have many excellent DataFrame libraries in Python and they give us the flexibility to manipulate data to our hearts' content. But what happens when it comes to presenting tables to others? The display of tables can be beautiful. Tables can convey information effectively, just as plots do and, sometimes, a table is the better way to present data. Truly, the time has come to bridge the divide between raw DataFrame output and wondrously-structured tables suitable for publication. Let's review the state of ‘display tables’ in 2024. We’ll go over which table components make for effective displays of information. It’s surprising but there are many considerations that go into making a well-crafted table. We’ll take a look at the combinations of Python packages that fit together to make this important task possible, and marvel together at the tabular results they can provide.
Transcript
This transcript was generated automatically and may contain errors.
All right, good morning. Welcome to Ballroom BC. This next session is Making Beautiful, Publication Quality Tables in Python is Possible in 2024. Everyone in the correct spot? All right, please welcome Michael Chow and Richard Iannone.
All right, morning or noontime, thanks for having us. I'm Michael Chow and this is Richard Iannone, and we are software engineers at Posit PBC, and I'm so excited to be talking about how making beautiful publication quality tables in Python is possible in 2024.
Collectively, we have two PhDs, two dogs, and five cats, and we're big fans of table display.
And what I mean by table display is less of the one on the left, which is Polars DataFrame output. So this is more like if you're a table mechanic or a table plumber, this is the kind of table you might look at a lot as you're analyzing data. Notice that the column names are all lowercase, they use underscores rather than spaces, and there's a lot of kind of nitty-gritty that you need to do if you're a table mechanic. What we want to talk about and what we're so excited to talk about is the one on the right, which is a table for display. So this is a table for communicating out to humans who are interested in what the data means.
And so this table is styled for display and to help people pull key insights out quickly. All right, so just to give a quick overview, we're going to talk about three big things. So we want to talk about what the goals are of tables as data visualization. Then we want to talk about three key ingredients to making a table for presentation. And last, we want to talk about some more advanced table design topics, like nanoplots, which are also known as sparklines.
All right, so I'm going to turn it over to Rich to talk a little bit more about table goals.
Table goals and beautiful examples
So I'm going to show you some beautiful tables I found on the internet. So this one right here, it's beautiful because it's a complete visualization. It has a title, explains the purpose of the table. Team logos really help to show you what the rows even mean. And it has other little touches, like this spanner that groups columns together and nicely formatted percentage values. And most importantly, it has a highlighted row, which draws attention to the main part of the table, like basically the whole point of the table, which is great.
I have another one from the same author. And this one is a player report card. And the great thing about this one, and the thing that strikes me right away, is that it has bar charts, which enable fast comparisons within the table. So plots have bar plots, but tables can have them too. And it's great because you have the value, and also you can sort of see what the values are just visually. And another nice touch is that these rows are subdivided for great organization. We call these row groups. And it also has footnotes, and they provide additional detail on things which may be confusing to the reader.
Here's one more table from a different author. And it's a brilliant table. It's all about CO2 emissions from energy production. And there are two things I want to point out in this table. It has very nicely formatted percentage values. You notice that the values are decimal aligned, and there are other little touches, like only one decimal place. Really nice, very thoughtful. And it also has a heat map. Essentially, this whole thing is not just a wall of numbers. We have color to sort of denote the values. And it's done in two different ways, or actually three different ways in this table. So it makes it very easy to scan and get to the insight that you want.
So the great thing about these tables is they're all made from code. You didn't have to go to InDesign or Illustrator to make these tables or touch them up. And when we make tables from code, we benefit from a reproducible workflow. So we have input data coming in, we do our analysis, and we do the visualization right after that, and then finally get reporting.
So how did we get here? So obviously, we had to make an API. But even before we did that, we had to sort of get the best ideas on table generation. Surprisingly and also frustratingly, there weren't too many books on tables at all. We looked around. There's lots of design books on data analytics and visualization. So we had to look really hard. But we found one. We found one from way back in 1949. It's the Census Manual of Tabular Presentation. What does it do? It basically dials up concepts on table display to 11. It's over 200 pages, just tons of advice on tables. So it provides tons of, like, ideas on what to do and what not to do. And kind of importantly and quickly, it gets to the structure of a table. It formalizes it. It gives it names. Now, this is really important because if you don't have names for things, you can't really, you know, use them in an API or discuss them at all. So we took that and we took the best parts of it and used this in our table display framework.
So there's lots of ways to make tables today. You could just, like, present a raw data frame to people. It's not really recommended. Wouldn't do it. Or another idea, and I've been guilty of this, take your analysis, write it to CSV, bring it to Excel, make the table there. But the problem with that is that your reproducible workflow is now totally broken. So we propose: use Great Tables. It lets you work entirely in Python, reproducible, less effort, and the tables look really good.
So Great Tables, or we can call it GT, it's focused purely on tables. It's not the only approach in Python, but it is comprehensive and actively developed. And we try to solve all the table problems, basically try to let you make any table that's conceivable.
And so for the rest of this talk, we'll use Great Tables to illustrate the process and design behind making presentation quality tables. So basically, that's table goals. I'm going to hand it off to Michael for the key ingredients part.
Three key ingredients
All right. I'm loving this dance we're doing. Okay. So, yeah, that was some of the background on tables and what makes good table visualization and the value of reproducible workflows. I'm going to talk a little bit more, dive a little bit more into how you can make these tables and three key ingredients behind them.
All right. So the three key ingredients I want to talk about are structure, format, and style. So basically, structure lets you add things like a title or a column spanner. This is sort of a label on top of your columns. And it also allows for relabeling columns. This is really important because it lets you manage the information hierarchy of your table. If someone has three seconds to figure out what your table's about, you really need a title up top, you really need good grouping.
Formatting is really important because different industries really care about different ways of presenting numbers. It's really incredible. A number can be reformatted a million ways depending on the industry you're in. And lastly, style: by filling in the background colors, we're able to group columns and make it really quick to tell apart groups of columns. And on the bottom, another piece of style is this bolded text. So the bottom row is actually a total. And Great Tables tries to make it very easy to, say, bold specific rows of data.
Before we go on, I have to admit I have a shameful secret. In preparing this data, I made an analysis error. You might notice the percentage in the blue column adds up to 102%. I just want to own that right up front. It's not a Great Tables problem. It's a Michael Chow problem. And I'm so sorry. But now that that's out of the way, now if anyone mentions that flaw, I'm going to point them to this exact timestamp in this talk just to get ahead of it. All right. With that out of the way, structure, format, and style.
Okay. So just a little setup. For this example, we're going to write from great_tables import GT. So that's the main kind of table object. And because we're using Polars, a DataFrame library in Python that's really neat, I just want to do one little bit of setup, which is import polars.selectors as cs. This is a piece that we use all the time. It lets you select groups of columns really easily. So in this example, cs.starts_with("revenue") says give me every column that starts with revenue. Pretty straightforward, but I want to make sure that that's out of the way up front because we'll use it a ton when formatting tables.
And the nice thing is you can even save these as variables. So you can say, like, select_rev = cs.starts_with(...). That lets you reuse this logic all over and kind of choose columns really easily.
All right. So let's get to it. Okay. So this is what the table looks like just straight out of the GT constructor. It's nice enough, no real formatting applied, just kind of dumped out. So the first thing we'll do is we'll use tab_header to add a title to the table. Next, we'll use tab_spanner to format the columns. So notice we're using the selector now. So over each column that starts with revenue, we've put a new label called a spanner, and we've labeled that Revenue. So now we're able to group these columns together.
Next, we'll do the same thing with profit. So now we've put a label over profit. And lastly, so I just want to flag, notice that the column names though now are a bit redundant. They say what's in the spanner. So we're just going to relabel everything to be a little bit cleaner. Now notice we have, right, like, revenue up top, amount and percent down below. So we have this nice kind of hierarchy of information and grouping in our column labels.
Okay. So that's structuring. Next, we're going to look quickly at formatting. So notice that in the purple, we formatted the percentages to have a percentage sign and to remove decimals. Just to clean up the display a little bit, we're using these formatters that start with .fmt. So this is .fmt_percent, grabbing every column that ends with pct just to quickly clean them up. In the green, so down below, we have .fmt_number. This is a way of restructuring numbers to match various, like, business requirements. So in this example, we've reformatted the numbers to have a K for thousands and M for millions and a dollar sign in front of them. I have to say, people are always really surprised to see a K and an M. A lot of feedback we got was, these should all be Ms. It doesn't make sense to have K for thousands, M for millions. I think the thing I want to flag is that there are a lot of domains, like government, where they do this. So it's really important to meet users where they are. Like, if your boss wants Ks and Ms, it's not the time for a holy crusade, right? And so Great Tables tries to meet people, or your boss, where they are.
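To make the K and M idea concrete, here is a minimal plain-Python sketch of compact number formatting. It is not Great Tables code and not how fmt_number works internally; the function names, the one-decimal rounding, and the sample values are assumptions made purely for illustration.

```python
def compact_dollars(value: float) -> str:
    """Format a dollar amount with a K (thousands) or M (millions) suffix."""
    for threshold, suffix in ((1_000_000, "M"), (1_000, "K")):
        if abs(value) >= threshold:
            return f"${value / threshold:.1f}{suffix}"
    # Values below one thousand are shown without a suffix.
    return f"${value:.0f}"


def whole_percent(fraction: float) -> str:
    """Format a fraction such as 0.25 as a percentage with no decimals."""
    return f"{fraction:.0%}"


print(compact_dollars(904_500))    # $904.5K
print(compact_dollars(2_045_000))  # $2.0M
print(whole_percent(0.25))         # 25%
```

The point of the sketch is only that the suffix is chosen per value, which is why one column can mix Ks and Ms.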
All right. So that's formatting. The last bit will be styling. So styling has the most steps. So I want to break it down really quickly. Styling uses the .tab_style method and two helpful little submodules, loc and style. Essentially, how styling works is there's where you want to do the style and there's what you want to do. So loc handles where on the table should get styled and style handles what kind of styling we should do. So in this case, we styled the body of the table, all the columns starting with revenue. That's the where. And the what is we're filling them with the color aliceblue. So two pieces that get combined to create a style, the where and the what.
I'm going to show the other two bits. So we do the same thing to add the orange. So a similar command but now on top of profit columns. And then this last bit, the one flagged in red, this tab_style, we use style.text to bold the text. And then we use the where to pick the rows. So we're using it to filter rows of data where product is Total. And that lets us apply styles to specific parts of the table. So I want to breeze through but wanted to give you a sense for kind of how these things build on top of each other and how we've tried to break apart what people want to do. So that's styling.
So we went through three pieces, structuring, formatting, and styling. There's one bonus piece which is sprucing up the table with images and plots. So there are extra formatters like .fmt_image that can add a small image to your table. And there's fmt_nanoplot that can actually add a small graph to your table. And this is really nice because this visual information, people can grab really quickly. So if you're looking for a scale, you can use the images to find the row really quickly. So this helps people navigate.
All right. So those are three key ingredients of tables, structure, formatting, and style. They're really involved in customizing tables for display. And it's really important that they meet people's business requirements that we can format in a wide range of ways to meet the millions of ways that people want to format numbers and things in their domains. All right. So next, we're going to talk a little bit more about advanced design. So going a little bit more into the plots that you saw and some other moves. So I'm going to hand it over to Rich.
Advanced design: nanoplots and data color
So yeah, this is the advanced design part or really cool things you can do with a table. So fmt_nanoplot. It's a great method we have. It gives you nanoplots, which are compact visualizations that reveal trends in the data. Why do you want this? Sometimes things stick out. Like you see in like this third row, cold brews for hot summers. We see a pattern there, and we can immediately see it. It's better than numbers. It's visual. And espresso machines, we see something that jumps out at us. Sales are sort of intermittent. So kind of cool. We can get that right away from nanoplots.
And where did nanoplots come from? Oh, yeah, sparklines. Yeah, they were popularized by Tufte and talked about a lot and actually implemented in Excel. So people know them and use them a lot, and they like the feature. So we decided to extend that a bit and be inspired by it. So with nanoplots, basically, your time to insight can be minimized. So for instance, in this table of a patient summary of like, you know, tests day after day, we see right away in this second row that, oh, yeah, the value is well outside the normal range. Kind of bad. Faster than reading the numbers. Kind of important, too. And the cool thing about nanoplots is they're actually interactive. You can just hover over the points and then get the values, essentially. So it kind of splits the difference between a plot and a table. It gives you the exact value, but also gives you a plot, which is kind of cool. And because they're so customizable, you can get bars and line plots as well. So you have lots of options to style it any way you want. And because you can do that, you can use it in many different contexts. So I have a few tables here. Stock prices, weather data, sales data. Nanoplots can make it work in all three of these cases.
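The sparkline idea can be illustrated in a few lines of plain Python using Unicode block characters. This is only a text-based sketch of why a tiny inline plot reads faster than a row of numbers; it is not how Great Tables draws nanoplots, and the function name and sample values are assumptions.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block heights, scaled to the data range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series gets the lowest block everywhere.
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - lo) / span * top)] for v in values)


print(sparkline([1, 2, 3, 4, 5, 6, 7, 8]))  # ▁▂▃▄▅▆▇█
print(sparkline([0, 5, 0, 0, 9, 0]))        # ▁▅▁▁█▁
```

The second series shows the kind of intermittent pattern that is hard to spot in raw numbers but obvious at a glance.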
Okay. Moving on to another method we have called data_color. It basically is the method that allows you to add heat maps to a table. So basically, you just have to specify where you want the heat map, what palette you want, and it just goes ahead and does that. So basically, this is the table you saw before. And if you didn't have a heat map here or didn't use data_color, it would be a little bit harder to digest. You have to go through each one of these numbers just to see the patterns. And so, like, we can see right away that large values, you can see them. And, like, negligible values are basically lacking color in this case.
So the big advantage you get from plots in tables with data_color is you can emphasize differences in values and reveal trends in the data. So, for instance, you might go row-wise, just go across a row, and sort of, like, see visually what are the big values or the small values. You might go top to bottom, like down a column, and compare across observations with a heat map or a color ramp. And then globally, you can see, like, a pattern of values in the table. You can see right here, right away, the trend here is that energy from fossil fuels, it leads to higher CO2 intensity values. So as you go down to the right, you see more dark brown, you see less bright green.
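A heat map boils down to mapping each value onto a color ramp. The sketch below does that with linear interpolation between two hex colors in RGB space. It is a plain-Python illustration, not the data_color implementation; the function names, the white-to-brown default palette, and the RGB interpolation are assumptions.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to a tuple of three integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def ramp_color(value, lo, hi, start="#ffffff", end="#8b4513"):
    """Return the color for `value` on a linear ramp from `start` to `end`."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp values outside the domain
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    mixed = (round(x + (y - x) * t) for x, y in zip(a, b))
    return "#" + "".join(f"{c:02x}" for c in mixed)


# Small values stay near white; large values approach dark brown.
print(ramp_color(0, 0, 100))    # #ffffff
print(ramp_color(50, 0, 100))   # #c5a289
print(ramp_color(100, 0, 100))  # #8b4513
```

Applying such a function cell by cell, with lo and hi taken from the whole table, gives the global pattern the talk describes; taking them per row or per column gives the row-wise and column-wise comparisons.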
Formatting methods
Okay. Another advanced thing we have, and this is, like, kind of like the bread and butter of Great Tables: formatting methods. And you saw Michael demonstrate some of those, but we're going to show you what's available so you just know what's in there. And so, we can format numbers, dates, strings. And, for instance, I have this one column. It's sadly unformatted. So we can, like, apply different formatters to it, like fmt_number. It gives you decimal places, grouping separators as well, and gives you lots of control over formatting a number. If you don't care about, like, the decimal part, you can just break it down to integers with the fmt_integer method. Sometimes you have, like, really small or really huge numbers, and scientific notation is great for that. So we have a method to do just that. Percentage values: sometimes you just have fractional values, or sometimes you have the value itself, which is already, like, the percentage value, and you just need to add a percent sign. This method gives you an option to do both. And Great Tables knows all about currencies. You just give it the number, the currency code, and, you know, you can tweak some options as well. But you generally get, like, the right currency glyph associated with the value. And we even have methods for doing things like formatting bytes, like, for megabytes. And with each of these methods, you can decorate the formatted value with a pattern. In this case, I just, like, use angle brackets around each of these. And we've got many more I didn't show here, like formatting dates, times, creating images, and also formatting markdown in the table. Let's end off with Michael.
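For readers without the slides, the kinds of output these formatters produce can be approximated with Python's built-in format specifications. This is a plain-Python sketch of the formatting ideas only, not the Great Tables methods themselves; the helper names, the decimal (1000-based) byte units, and the sample value are assumptions.

```python
def decorate(text: str, pattern: str = "{x}") -> str:
    """Wrap formatted text in a pattern, where {x} marks the value."""
    return pattern.replace("{x}", text)


def percent(value: float, scale_values: bool = True, decimals: int = 1) -> str:
    """Format as a percentage; scale by 100 only when the input is a fraction."""
    shown = value * 100 if scale_values else value
    return f"{shown:.{decimals}f}%"


def human_bytes(n: int) -> str:
    """Format a byte count using decimal (1000-based) units."""
    size = float(n)
    for unit in ("B", "kB", "MB", "GB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"


value = 1234567.891
print(f"{value:,.2f}")                      # 1,234,567.89 (decimals, separators)
print(f"{value:,.0f}")                      # 1,234,568    (integer)
print(f"{value:.2e}")                       # 1.23e+06     (scientific)
print(percent(0.1234))                      # 12.3%        (fraction scaled)
print(percent(12.34, scale_values=False))   # 12.3%        (already a percent)
print(human_bytes(1_500_000))               # 1.5 MB
print(decorate(f"{value:,.2f}", "<{x}>"))   # <1,234,567.89>
```

The two percent calls mirror the option described in the talk: the same displayed result whether the stored value is a fraction or is already a percentage.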
Wrapping up
Yo, thanks. Okay. So, to sum up, yeah, Rich talked a little bit more about some advanced design for tables, like nanoplots, data color, and formatting, the whole range of activities you might do to really make it quick to pull out trends and insights and patterns across the table.
So, to sum up, making beautiful publication quality tables in Python is not only possible, it's actually really great. Tables are cool. People love good looking tables. And you can basically keep your chain: if you're analyzing your data in Python and you're producing plots in Python and reports, it would be a real shame to have to drag your data out to Excel and then back. So, you can keep that workflow with GT, or Great Tables, and your life will be vastly improved.
I have to admit, I'm a convert to table displays. I, eight months ago, I was just a person analyzing data in the dark of my bedroom, making plots. I have to admit, like, I hadn't done almost any table styling. And then Rich burst into the scene, waving the 1949 U.S. Census Manual, and I have to admit, I was, at very first, very confused. I didn't really understand what was happening, or all the names, parts of tables he kept mentioning. But I think it's a really neat area, and a widely overlooked one. I think we often reach for plots when the compactness and the, like, raw data-ness of a table would do such a great job.
So, if you want to get started, it's a pip install great_tables away. We have a website with lots of examples and a blog where we talk about just the freaky joy that table display brings into our lives. So, yeah, we'd love your feedback. We'd love to hear about the areas where you're using tables. I feel like we're always surprised by new freaky uses of tables. So, definitely find us and let us know. And thanks so much for watching, and we hope that Great Tables brings a lot of beautiful, publication-ready tables into your life in 2024.
Q&A
All right, we have a couple of minutes for questions. Any questions?
Thanks. Just had a quick question. If you have too much data to display on that single table, and I saw some interactivity with the sparklines, do you have that for, like, scrolling and other extensions?
I actually don't have that yet, but it's in the works. We plan to have an interactive type of table where basically everything from this goes into interactive tables. So, you have pagination and all those usual trappings of lots of data in a table. Yeah, it's in the works.
Great talk, guys. I love tables as well. Just two quick questions. One is, are there white space formatting options, like, to expand column sizes, like, you know, row heights and things like that?
Yeah, there are methods for that. We have, like, opt_horizontal_padding and opt_vertical_padding. They sort of, like, stretch your table in the x-axis and the y-axis, as it were.
Okay, great. And just quickly, what kind of output formats do you have? I think you might have mentioned JPEG and HTML?
Yeah, we have HTML, and we have a bunch of image formats. I guess quite a few of them, right? Like, all the usual ones, PDF, PNG, TIFF. Not JPEG, unfortunately.
Also, if I can make a quick note, sometimes when we post about Great Tables on LinkedIn, Matt Harrison shows up, and he says, how's the LaTeX rendering? And I just want Matt Harrison to know that these comments have haunted us for the past six months, and we do want to do LaTeX rendering, Matt Harrison. We're thinking about it so hard, and every comment pushes us a little closer. So I just wanted to leave that note.
All right, I have another question over here for you.
Hi, I'm wondering, also, in terms of output formats, once you have an object that is created through Great Tables, are you able to introspect that and, like, modify it? Can you, like, take, like, a Great Tables object that was produced by some function and then, you know, change, like, one part of it?
Yeah. I believe that's possible, as long as it's a Great Tables object. We don't have any way to, like, remove parts. Say you get, like, a table, and it's got some parts you don't want. We don't have methods yet to excise those parts from the table. But once we start seeing, like, libraries starting to make Great Tables objects, then we'll more quickly get that in there. But, yeah, partially you can, yeah.
I was just wondering if you had accessibility options for, like, my co-worker has CVD, so I was wondering if you could do shading, things like that, or patterns, that kind of thing.
I think not right now for colors. We don't really have that yet. Although we do have accessibility in the HTML we produce, and we're trying to go along that road to make it better for screen readers, at least. But not so much on the color side yet.
All right. Can you summarize the subcategories? So have a number of different subcategories, with different analytics for each, totals and subtotals?
That's a great question. So you can do it in a table just by sort of making your input table have that, or kind of finagle it now. But we want to do a little more beefed up support for, yeah, summary rows. I think that's one of the things in the next quarter that we're really aiming at. Because it seems to happen a lot. Yeah. Summary rows, kind of like inside tables, groupings of rows is a really common activity.
Thanks for the talk. I'm horrible at graphic design, and I would love if your tool could provide some suggestions as to which colors to use or fonts to make the text more readable. I'm assuming there's a font default, but do you have anything for colors and other aspects of the formatting as well?
Yeah. One of the things we plan is to have some information type methods that give you, like, basically great tables of information, like recommendations. We don't have a way yet to get, like, Google Fonts into the table. Once we do, we'll have some associated documentation on recommended fonts of different types for different table displays. So we're working towards that. Right now, just a bit lean on that one aspect.
So it seemed like a lot of the examples were in Polars, but I think you also mentioned that it works for pandas. So, like, does it make a difference? Does it change a lot of how you do things? Is Polars the way to use this library, or does it not really matter?
Yeah, that's a great question. So both are supported, really, in the most native way we thought made sense for each. So the one trick is pandas doesn't have lazy expressions, so it's really easy in the examples to show these Polars expressions to grab columns and things. But you can use, with Great Tables, lambdas with pandas, and then you can do any kind of, like, analysis to select rows of data or conditionally select things to style. So the biggest difference is just in the spots where you saw Polars selectors, you would use probably a lambda to select based on the string names, or to select a row of data, you would use a function that takes a data frame and returns the row that you care about. So it all works out of the box, and actually a lot of our documentation uses pandas since it was the one we used a ton when starting out, and our tests probably most heavily run on pandas data frames.
All right, and this will be our last question.
Hello. One of the absolute banes of my existence at work is we put together a lot of tables that go in reports that get sent by email, and Outlook ruins every table we make. Do you have anything of, like, subsets of features that Outlook won't break, or any support for that, or is this just basically sticking with Outlook will always make my life horrible?
Just a quick note on Rich: Rich also works on a lot of libraries for emailing things, so whatever Rich says, I just want to preface that he has done a lot of email stuff. I don't want him to undersell himself.
Yeah, I'm really glad you brought up email. So basically, what you need for an HTML table in an email that doesn't break is, well, we have a lot of styles in, like, our HTML tables, and that's usually kind of bad, so we have to inline the styles first, right, to make sure that they're high priority, and they survive the Outlook client, as it were. That's something we're thinking of doing, and we're planning on. Basically, you just need something like, like, juice to inline the styles, and then we'll have a nice embeddable table you can put into an email, and it should survive Outlook, one hopes.
All right, thank you so much. Can we get a round of applause for Michael and Rich?

