Resources

Great Tables 3: Data Color and Polishing

This workshop is all about using Great Tables to make beautiful tables for publication and display purposes. We believe that effective tables have these things in common:

• structuring that aids in the reading of the table
• well-formatted values, fitting expectations for the field of study
• styling that reduces time to insight and improves aesthetics

These materials are for you if:

• you have some experience with data analysis in Python
• you often create reporting that involves summarizations of data
• you were often frustrated with making tables for display purposes outside of Python
• you found beautiful-looking tables in the wild and wondered: 'How could I do that?'

Other videos in this series:

Great Tables 1: Structure, Format, and Style: https://youtu.be/QM7DbsY-nc4
Great Tables 2: Introducing Units Notation: https://youtu.be/SN0_vIL1Rhk

About us:

Michael Chow, Senior Software Engineer, Posit
Michael is a data scientist and software engineer. He has programmed in Python for well over a decade, and he obtained a PhD in cognitive psychology from Princeton University. His interests include statistical methods, skill acquisition, and human memory.

Richard Iannone, Senior Software Engineer, Posit
Richard is a software engineer and table enthusiast. He's been vigorously working on making display tables easier to create and display in Python. And generally Rich enjoys creating open source packages so that people can do great things in their own work.

Workshop repo: https://github.com/rich-iannone/great-tables-mini-workshop?tab=readme-ov-file

Learn more at https://posit-dev.github.io/great-tables/articles/intro.html

Transcript

This transcript was generated automatically and may contain errors.

Okay, for this third and final part, I'm going to focus on a table, which is really nice to see because it has a lot of color in it and color based on value. It's called the Power Generation Table. Let's get right into that. So this is how the table will look in the end. This is the table that Michael showed in his intro slides.

It's a beautiful table because it's not just a wall of numbers by itself, which in itself is quite interesting. But it actually uses color by value to sort of show a pattern, to show trends in the table. We can see that higher amounts of green indicate lower carbon intensity, and higher amounts of brown mean fossil fuels and higher amounts of CO2 intensity overall.

We see that in this column. We see that in these columns. And this is basically a really nice table that's improved by adding color to the table.

Overview of methods

So a few different methods will be used, but not too many, surprisingly, for this large table. We will add, of course, a table title with tab_header(). But the new thing that we'll show here is that we can add source notes at the bottom of the table. This is the footer section of the table, and we fill it with tab_source_note(). We will change the width of the different columns with the cols_width() method. We have a lot of percentage values in the table, and those will just be formatted with fmt_percent().

In quite a few uses of data_color(), we will color different subsets of the table. So there'll be different treatments of color as we move across the table with data_color(). And then finally, we will align values away from their default alignment with the cols_align() method.

tab_source_note and cols_width

So let me show you what tab_source_note() and cols_width() do. So we can add notes to the table in a footer component. This is similar to adding a title in terms of how it operates. But we're adding it to the bottom, and these are table notes. We'll do that with the tab_source_note() method.

So using our example table, we'll just use GT, of course, to encapsulate that and to make that table. We'll add a source note with tab_source_note(), and we'll just provide a bit of text as the note. And we'll see that at the bottom of the table right here.

And the thing that we can do to change the width of columns, because sometimes you might want to take control of the column widths, is to use the cols_width() method. We can do that by providing a set of cases in a dictionary, providing the column name and then widths in pixel values. And we can do other things, too. We can also use percentage values. Basically anything that HTML tables accept in terms of CSS length units is accepted here.

So we see that when the table widths are determined by content size, we get this tight packing of columns and values together. These are the default widths. But we just customize them by providing that dictionary of values. And so we see here that this column here is 150px, this one here is 100px, and this date column is 200px, just as we provided in that cols_width() call.

Data color

So data_color() is a method that we use to apply color to tables. And it serves us sort of like making a heat map in a table. And to do that, we provide a palette, the columns that we want to apply it to, and also the range of values that the palette is going to span over. So, yeah, it really makes this large amount of information here much easier to digest at a glance.

So you can see that large values can be seen right away because they have a really intense color value. And in some cases, negligible values, like things which are very close to zero, they lack color. So that's really nice to see.

So we get two advantages from providing heat maps in tables. We emphasize differences in values, and we also reveal trends in the data. So for instance, here, we have this one row. We can scan across this row and see that, in this case, nuclear is the dominant form of energy in France, just by looking at the color. We didn't even have to look at the value.

And column-wise, we can compare across observations. If we're just looking at CO2 intensity, we can see that the top values here, they're ranked, but we can just see by color that these are the lowest CO2 intensity values. And if we see these as a trend, we can see a global pattern of values in the table. So the trend I see here is that more energy from columns to the right leads to higher CO2 intensity values. We see a sweep from top left to bottom right in terms of high amounts of green on the top left and large amounts of darker brown on the bottom right.

So data_color() is pretty easy to use initially. We can actually use it without any supplied arguments at all to colorize a GT table. If you just take exibble, pass it to GT, and then use data_color() without any arguments, we get all the columns colored in different ways, which is not really desirable. We don't really want a table like this. So what's more often the case is we supply certain columns along with a palette.

So in this case, the num and the currency columns will receive colorization, and we see here that they go from red to green. So red is like the lowest values, and green are the highest values. They're in order here, but they're out of order here. So we sort of see that right there.

Another way to do it is to constrain the cells which the coloring can be applied to. So we do that, of course, as you saw, with columns, but we can also constrain the rows. Sometimes you don't want these very large values to be included for whatever reason. We can do that just by selecting certain rows. In this case, it's a pandas DataFrame, so we can use a lambda here to only color the values where the currency value is less than 50. So the rows are resolved from this expression.

Another thing we're doing here is we're choosing a domain. So we're actually setting the limits of the colorization. So zero to 50, that's what we'll use for coloring, and anything outside of that will just not be used.

Building the power generation table

And for the power generation table, we're going to set up by getting our imports. In this case, we want to use pandas. So we're going to import pandas as pd. And for Great Tables, we're going to import a few things. We're going to import GT, md for Markdown formatting, and style and loc for use with tab_style().

Okay, so we imported those successfully. We're going to read in our power generation table as a CSV. And we're going to output that right here into the side. And what we see here inside the console, or sort of like the printed out version of the table, is a very wide table with lots of numbers. So we're going to transform that into a Great Tables table with lots of color.

So let's get that right away into Great Tables. We do that just by supplying this power generation table to GT. And let's run that. So it doesn't look much different than what we see here in the console. Instead, we just get a nicer formatting because it's actually an HTML table.

So the first thing we're going to do is use tab_source_note() and cols_width(). And along with that, we're going to add a title to explain the contents. Okay, so the first thing is the title. So I'm going to erase this code right here and use the title supplied above, with tab_header(), and paste in this bit of text right there.

And I said right here I want to make sure that carbon intensity and power consumption are in bold lettering. To do that, we have to use Markdown. So I'm going to surround the text with md() and then go to carbon intensity. I'm going to use double asterisks and then do the same thing around power consumption.

Okay, great. So I'm going to run that cell. And we see if you scroll back, we have the title up here and we do have certain parts of that title in bold text, which is great.

Another thing we're going to do is add some explanations about the table to the footer of the table. And to do that, we use tab_source_note(). And I have a large amount of text I want to put in. I'm going to copy it from elsewhere. But the first thing to note is we have to use tab_source_note() right there. And then the bit of text will be coming in momentarily.

And this is Markdown formatting, and it can accept things like URLs, like right here. And these will be formatted properly because they're surrounded by md(). So if I run this, I will see that at the bottom, I do have links that do work. If I click on these, it will take me to a browser, to the different sites, and it's a great way of providing additional information.

One more thing we're going to do in this first part is make the width of the zone column 120px, and any other column will be 85px. So to do that, we use cols_width(). And again, offscreen, I'm going to paste in a set of cases for that argument where zone is 120px, and every other column within this table will be 85px. So a dictionary here is being used, which is great.

So now we changed the column widths, added a header and a footer. We can format the values in this table body right here. Some of them are going to be percentages for the most part, and these values right here are not going to be percentages. They're going to be, I believe, formatted as integers.

So let's do the fractional values first. What I'm going to do is use fmt_percent() like so. And then I'm just going to focus on the columns which are numeric, which are basically all these columns except for the zone column. So we can do that with cs.numeric(). What I'm doing here is using a Polars selector, and it's selecting all the columns which are numeric.

I'm going to format those as percentage values. So I'm going to run that. I apologize. First thing you have to do is make sure that cs, which is column selectors, is actually imported. And what I'm going to do actually is not use pandas but use Polars. So I'm going to import polars as pl. I'm going to ensure that I import polars.selectors as cs. And then what I'm going to do is change this to be not pd.read_csv but pl.read_csv.

Now, once I make these changes and rerun all the code above and then run this cell, this now works. So we can flip between pandas and Polars DataFrames and we get different sorts of things. We get these column selectors, which are really nice to use with Polars DataFrames. And so long as you're using a Polars DataFrame, you get to use these. For pandas, you have to resort to lambdas, which is not nearly as nice as using Polars.

So we're doing this. And we see that we do get lots of percentage values, which is great. But one thing is that they're applied to all the columns because we asked for that. So CO2 intensity is not going to be a percentage. But the cool thing about formatters is that you can format over columns that have been formatted before. So we can actually just use fmt_integer() on that one column. And it's totally fine because essentially it's the last formatting call that wins.

So we're going to type in CO2 intensity right here and then run that cell. And we see that everything is still as percentages except for the CO2 intensity column, which is kind of nice. So you can format a wide swath of columns and then just target certain other ones with different types of formatting. And it's totally fine.

Applying data color to the table

So that's all the formatting we were doing in the table because we handled all the numeric columns, which are just the things that require formatting. And then we're going to move focus to coloring the different cells with data_color(). So in this case, I have pretty large, sophisticated palettes. I have them right here so I don't have to type those in manually. Suffice to say, I want to color everything in the CO2 intensity column first with that palette.

So to do that, we use data_color(). And then the first argument will be columns. And that will be CO2 intensity. The second argument will be palette. And this is where I'm going to paste in this very large palette here. So that's moving from green to brown, lower values to higher values. And then finally, we're going to have another argument, which is domain. So we see here in the text we want it in the range of 0 to 900. So we're going to supply that in a list of integers, 0 and 900, right here.

And if you run this, we will get that first column colored like we saw in the slides, where we have a dark green color up here for low values, moving towards sienna and tomato and then brown at the bottom right there. Great. So that's our first bit of coloring. So we're going to sweep across the table and do different applications of data color.

So in this case, the cells from hydro to geothermal, which will be this column here all the way over here, are going to be colored according to value with this palette right here. So I'm going to do the same sort of thing. I'll comment this code and then use data_color().

The columns in this case will be a larger set of columns. I have it as a list here. I didn't show you this before, but you can just have a list of columns. And I won't use a selector because it's actually a little bit complex to use a selector here, since nothing is really common across these columns in terms of their naming. And then the next argument is palette. And I have a palette, which is right here. So I will copy that in. And then finally, domain. In this case, I've written that I wanted it to be from 0 to 1 because these are fractional values. So that is the domain there.

So I'm going to run this. And we see that we do have either very light green or dark green values here, depending on how high it is in the range from 0 to 1, or in this case, 0% to 100%.

We do the same thing for the biomass column, just that one column. So this is pretty much going to be another data_color() call right here. In data_color(), columns is the argument we need. And it's just one column, biomass. And then it's going to be the palette. The palette is right here, snow to this other color. I'm not very good at reading hexadecimal color values, but I'm assuming that's going to be brown. And then we're going to have a range, which is going to be represented by the domain from 0 to 0.3.

Running that cell, we can see that right here we have the biomass column colored much differently than these columns right here.

So now we have a few more to do. In this case, we want gas to oil to be this palette right here, which is pretty similar to the other palette. So data_color(). It's going to have certain columns. And in this case, it's going to be gas, coal, and oil. I'm just going to type those in. Gas, coal, oil. The palette in this case will be this palette. And then we're going to have a domain of 0 to 1. That's a list of two items. There we are. Going to run that cell as well. Okay, and we see it right away. It's these columns right here.

And then finally, we're going to ensure the remaining cells have the snow color applied. They're not going to have any sort of data color. Well, we're actually going to use data_color(). We're going to use a trick with it. We're not going to move between multiple colors. We're just going to apply the one color. So the remaining columns are going to be zone up to battery discharge. So I'm going to type in data_color(). Columns, zone to battery discharge. And then in this case, the palette is going to be the same color repeated twice. Snow and snow.

So the reason this works is two colors is what's needed as a minimum, but we're not changing the value. So we don't even need a domain. Essentially, it's going to be the same color for all these columns. Zone up towards battery discharge altogether. I'm going to run that cell. Great. And because snow is so similar to the background, we barely see a difference, but the difference is there upon closer inspection.

Alignment and final polish

So now what we want to do is align all the values in these cells to be centered. It's a stylistic choice. You don't often center numeric values, but in this case we're just going to go for it and do it because we can. So to do that, we'll use cols_align(). And then the first argument for that is align. And in this case, we want to use center.

And then the second argument will be which columns do you want to center? So there's a cool little trick in Polars where you can select all the columns by using pl.col and then, in quotes, an asterisk. So this just means that every column we have will be targeted here. So let's run that. And we see from the table that we do have every value centered, including these text values, which doesn't look so good. So why don't we tweak that? Well, as a separate thing, we'll use cols_align() again. And maybe we'll just move it back. So it'll be cols_align(). Align will be left. And columns in this case will be zone. Great. So running that, we see that zone is back to the left, but everything else is centered because we just did a tweak on one column.

Now for this table, I want to use, again, the humanist font stack. And to do that, we use the opt_table_font() method. So I'm going to type that in. opt_table_font(). There's an argument for that called stack. And we just have to provide the name, humanist, as a string. And in doing so and running the cell, we see that the font did change. It has quite a bit of a different look compared to before. And that's really what we wanted for this table.

As a final thing, we're going to make quite a few more tweaks with tab_options(). As I mentioned before in the slides, there's a huge number of options for changing many parts of the table all at once. You can, for example, change the font sizes for different locations of the table. In this case, for the source notes, I'll make that small, extra small in this case. Background colors you can change for different parts of the table. And you can change the table font size that way as well, reducing it quite a bit. So upon running this, we get a much smaller table where everything's a little bit more condensed. But we still see that we have lots of color within the table. It looks pretty much like the table we have in the slides.

I'll run this one more time. You can see the entire table right here, which does match what we have in the slides. So that's pretty much what I wanted to show in this code-along for the third table, which is the power generation table.

Closing thoughts

So now I want to implore you to go on and make your own excellent tables with the knowledge you got from this workshop. Just make tables that inspire, tables that matter. And with Great Tables, you can always ask us questions. You can join our Discord that we mentioned earlier. And show us your tables, ask questions, get some feedback on what you made. And I'm pretty sure that in no time at all, you'll have one or more tables that you can be really proud of.

And to work with tables, you need datasets. Great Tables comes with quite a few of them. There's 16 right inside here. There's also TidyTuesday's catalog of data. They have tons and tons of datasets. Just look for rfordatascience on GitHub. Kaggle has a huge number of datasets that are open. And Plotly also has a number of datasets which are useful within tables. And I just want to end it there. Thank you.