Great Tables 2: Introducing Units Notation

This workshop is all about using Great Tables to make beautiful tables for publication and display purposes. We believe that effective tables have these things in common:
• structuring that aids in the reading of the table
• well-formatted values, fitting expectations for the field of study
• styling that reduces time to insight and improves aesthetics

These materials are for you if:
• you have some experience with data analysis in Python
• you often create reporting that involves summarizations of data
• you were often frustrated with making tables for display purposes outside of Python
• you found beautiful-looking tables in the wild and wondered: 'How could I do that?'

Other videos in this series:
Great Tables 1: Structure, Format, and Style: https://youtu.be/QM7DbsY-nc4
Great Tables 3: Data Color and Polishing: https://youtu.be/Huteb5OmcrA

About us:
Michael Chow, Senior Software Engineer, Posit
Michael is a data scientist and software engineer. He has programmed in Python for well over a decade, and he obtained a PhD in cognitive psychology from Princeton University. His interests include statistical methods, skill acquisition, and human memory.

Richard Iannone, Senior Software Engineer, Posit
Richard is a software engineer and table enthusiast. He's been vigorously working on making display tables easier to create and display in Python. And generally Rich enjoys creating open source packages so that people can do great things in their own work.

Workshop repo: https://github.com/rich-iannone/great-tables-mini-workshop?tab=readme-ov-file
Learn more at https://posit-dev.github.io/great-tables/articles/intro.html

Transcript

This transcript was generated automatically and may contain errors.

So in this second table, we're going to go through a few things, and it's going to introduce a pretty large concept called units notation. The table we're going to make is going to be full of units, but let's show you why we use units at all. It's because tables are often full of values, and we need to know what those values actually signify. So it's common to provide measurement units of values within the table.

And we see this both in tables and also in plots. You see the name of the axis and also the units that the values are represented as. A table is really no different. So in this table, running down the column called units, we have basically inline units. We have copies per milliliter. We have micromoles per liter. These are somewhat complex to write in general.

So we make it a bit easier with units notation, especially this right here. We have things like exponentiated values like minus two or two or even degrees N and degrees W. Not so easy to type in, but units notation really helps with that sort of thing. And this is another type of units where it's inline and we have inverse unit values with minus one. And we see that quite a bit. So this would be joules per mole per K. And we have a way to do that quite easily with units notation.

Understanding units notation syntax

So in Great Tables, units notation helps you create inline units. So let's try quickly to understand this specialized syntax. First of all, we need to identify what is in units notation. We do that with the double curly braces on each side. So it's sort of bracketing what the units are. And then spaces separate the different units, each of which is part of the whole unit, essentially.

And then each of the different parts is formatted. And we see right here that we have a name of the unit and then we have the exponent part. And so for centimeters cubed, we use cm, then caret three, and it gets rendered as centimeters cubed. Same goes with inverse seconds. We have the name, which is s, and then a caret minus one to create that superscripted part.

If you want more guidance on units in general, you can check out this NIST website, physics.nist.gov. There's a checklist on expressing units there. Inside the documentation for Great Tables, we have a basic guide to units notation. And of course, it's a table and we have several rules for how to build up a unit or set of units using this notation.

And we have a number of example inputs which sort of show the rule. And then we see the output, essentially, which is the formatted unit itself. So quite a few things to gather here. We have things like ways to italicize and bold the text. We have ways to insert different glyphs which are hard to type in, like per mille, for instance. We can show chemistry by surrounding with percent signs. And we have different ways of showing a subscript and a superscript typeset in different ways.

So we can experiment with units notation. Inside the package, we just have to import from Great Tables the define_units function. And if you play with that in a notebook, we can just use define_units and then put in a piece of text in units notation. And in the output, you'll have a formatted HTML representation of the units. So we're going to try that in the notebook a bit later.

Formatting methods for science

But for now, we'll talk about formatting for science with the fmt_units and fmt_scientific methods. So we have a huge collection of format methods, and they all begin with fmt underscore. We will use two of them here. The first is fmt_units. It can take in chemical formula text between percent signs and properly format it. So for instance, this H2O2 will become H2O2 where the two 2s are subscripted.

Format scientific is really good for large or really small values, and it puts them in scientific notation, which is what you often see within lots of places where you represent small or very large numbers. And the cool thing it does is that if the number is in the range of like 1 to 10, it doesn't include a times 10 to the 0 part because the convention is to avoid null exponents. So that is just essentially rounded.

Another thing we can do is in a table, we sometimes have missing values. They show up as None most of the time. But we can replace that with different bits of text. So we can use the sub_missing method. It has a columns argument, and by default, it will target all the columns in a table. So if you just use sub_missing by itself on the example table, which has lots of missing values, it will replace any None text with a long dash, which is called an em dash. And you can specify some other bit of text that you want as well.

Like, for instance, say you want the missing text to be the word missing. That can be done, and you see within this table that anything that was None now has the text missing.

Another thing you can do, which is great, if you want to hide certain columns from the final output, there's actually a method for that called cols_hide. So why would you even do this? Maybe sometimes in a table you might use an expression which takes a value from a column, but you don't necessarily want to show that column because it's not very suitable for display. You can hide those columns with the cols_hide method. And in this case, you just provide a single column or a list of columns to not show in the final table.

Styling options

Also within Great Tables, we have a large number of opt underscore methods, and these give you an opportunity to quickly style the table in different ways. So we can style the table very quickly by using opt_stylize, even without any arguments. It'll take this pretty boring default table and apply a theme throughout. So now we notice if we use this, we have some blue. We have some blue lines and we have a blue background color in the stub.

And there's lots of options within that method. There's like six style options and also six color treatments. So any combination of those will give you a radically different looking table. So you can experiment with those.

And another thing you can do is change the font for the entire table really quickly with the opt_table_font method. In this case, we're taking the input table, which applies a default font, but we can change that to Times New Roman. And in doing so, we see that every bit of text on the table is changed to that typeface.

Aside from that, you could apply a stack, which is a themed set of fonts that work well across systems. So with opt_table_font, we could use the stack argument. And there's a number of keywords that work with that. In this case, I'm using humanist. And what it does is it just chooses a family of fonts which work well across different systems. So in this case, it provides a humanist look. If you know all about typefaces, you'll know a bit about this.

But the idea here is that you just choose a keyword, and it works across different types of OSs and different platforms. So there's many different stack types, and these are just a number of keywords. So you can choose monospace fonts. You can choose things which are script or handwritten. And there's lots of information about these in the help docs for system_fonts.

Sometimes what you want to do is change the amount of space that a table takes up. And to do that, you usually want to change the padding in a table. So there's actually two methods for this. One is called opt_vertical_padding. And this takes a scale argument. Anything below 1 means you're shrinking the table, or reducing the amount of vertical space. If you go above 1, anywhere between 1 and 3, you're increasing the height of the table because you're increasing the vertical padding.

There's another one. So basically the analog of this is opt_horizontal_padding. In this case, you're either compressing the table or expanding it in the horizontal direction. So in this case, I'm taking this table right here in the bottom left, applying a pretty large scale value, and what it's doing is it's expanding the table both left and right.

Code along: building the reactions table

Okay, so let's try this out right now. We'll go to our code along and we will actually start using this notation. So I'm going to move to VS Code. I have the file I want to modify right here. And basically we're going to run through making that reactions table. So the first thing we have to do is to import quite a few things.

We're going to import Polars and the Polars selectors in order to do column selection. We're of course going to import Great Tables, the md helper function for markdown conversion, and from the Great Tables data module, we're going to import the reactions dataset.

Okay, so the first thing we're going to do is modify reactions, that dataset, because frankly it's quite huge. We're going to cut that down to a much smaller selection of compounds, just the ones where the compound type is mercaptan. It's a pandas data frame initially. We're going to convert that to a Polars data frame by using pl.from_pandas. We're going to select only a few columns because there's many more columns than this. You can see right away we're using the column selector from Polars. Anything that ends with k298, which are the reaction rates, that's going to be included as the set of columns initially to work with.

And then we're going to just do a mutation of the compound formula column and surround the text with these percent signs. And we'll show you exactly why we do that a little bit later. So let's run this. And as we run it, we get a printout of the polars data frame right here inside our notebook, which is great. So we only have a few columns and not that many rows. We have 11 rows in this table, which is great.

So the first thing we've got to do is get that data into Great Tables. Plus we'll make a stub. So we're going to use the GT class constructor to get that going. Our new table is called reactions mini. And we want a stub. So we're going to use rowname_col and then flag the column that we want to use as the row names. In this case, it would be the compound name. So I have it. So I assign it and then I print it out and we'll see the table on the right hand side. And we do see it here. So this is our starting off table. We're just getting all the data into the GT API. And we're going to run different methods to make this a nice looking table.

So to start making that a nice looking table, we're going to add a title to explain what is in this table. So we're going to do it like this. We're going to call up the first table we made and just run a method on it. So it's going to be tab_header. And then title is going to be md, because we're going to make this markdown text. Gas phase reactions of selected. And in this case, we're using markdown, so we're going to put mercaptan in bold. And then compounds.

So we use md to signal to Great Tables that this is markdown text. And we did include a bold word in the middle. So I'm going to run this. And we can see right here, we have the title in a revised table. And mercaptan is indeed in bold, thanks to using md.

Okay. And now we're going to do another thing. We're going to add a spanner above these columns right here, all the ones that end with k298. Okay. These are the reaction rates. So our label over that is going to be reaction rate constant 298 K and then a line break. And then we're going to have units notation right here in these double curly braces.

Okay. So I'm going to copy this right here. And then what I'll do is I'll make a spanner like so. The tab_spanner method is what you need. And the first argument it takes is the label argument. And we just paste in what we have there. And then the second argument, which is really essential, is which columns do we span over? Okay. So in this case, we need the columns argument. And we want to select all those columns which end with k298. So we'll actually use Polars to do that, the Polars column selector that we imported. So it's going to be cs.ends_with k298.

Okay. So now it's going to select all these columns right here. It's going to put that label above it. So I'm going to run that. Great. So now we see that spanner over top. And we see that, indeed, we have units notation because we have nice formatting on this little bit right here, which is the units for these values.

Great. But the problem is we still have these ugly-looking column labels which remain. I mean, they're great in a data table, but they're not so great in a presentation table. So we're going to change all that. So we can do that with the cols_label method, like so. Okay. Now I've got quite a few of these. So I'm going to copy that over from somewhere else. I have a scratchpad open just so you don't see me painfully typing in all these little column names plus their labels.

So there they all are. A cool thing I'm doing is I want O3 not to be together, like a letter O and a 3 right next to it. I want it to be an O with a subscript 3, to be ozone. So I'm going to actually use units notation just to create that column label. And I'll do that with double curly braces like this. And the same thing happens with NO3. I'm going to make that have a subscript 3.

So I'm running that. I do see I have OH, O3, NO3, and Cl. And it wasn't too hard to do. I just had to identify which columns and what the new label is. Another thing is that I just got rid of the label for the compound formula. Sometimes you can just do that. If the column is obvious as to what it is, you can just omit the label by using an empty string. So an empty string is totally fine. You can omit labels that way. They won't be seen.

Great. So that is done. Now we have all these values here in the body of the table. And we want to format them nicely. So we're going to do a thing with the chemical formulas. What I had to do initially was to surround these values with percent signs. The reason for that is because when you use fmt_units, the percent signs encapsulate the chemistry notation part of units. So we just have that there. So if we do that, fmt_units will correctly interpret that as something that is units notation first, obviously, because that's what we're doing on this column, but also something that's in chemistry notation, which is a subset of units notation. So let's just choose columns. And we know the column name. It is the compound formula.

Great. So running this cell will show me a table that has this column full of chemical formulas, nicely formatted, which is great.

So another bit of formatting we have to do on the rest of these values is scientific notation formatting. So I'm going to do that in our next code cell. So in that case, it's just going to be fmt_scientific. And again, we choose columns to format. In this case, it's the same selector as before, which is ends_with, as a Polars selector. And it's going to be k298. This comes in really handy if you have systematic column names. You can reuse these selectors many times. It's really quite useful.

Great. So I have that now. Now I have values in scientific notation. If they weren't in scientific notation, they'd be really small values. These are like 10 to the minus 11, so much better in scientific notation. And that's really usually what you see.

So we have a lot of None values here. These are essentially missing values in the table. We really don't want to see None written many times. We can choose something better than that to replace it. And to do that, we use sub_missing. Okay. And what we're doing here is we're going to select the columns which end in k298 as before, like so. Okay, if I run this without anything else, the default replacement text will be this em dash, a long dash essentially. Yeah, so we see that all the way down. It could be a nicer way to flag a missing value compared to just having None.

But one thing also I see in this table is that we have an O3 column with no data in it whatsoever. This may be fine, but I think in this particular table, I'd rather just get rid of that column altogether. And we can do that inside the Great Tables API. And to do it, we would just have to use the cols_hide method. Great. And we just specify the single column that we want to hide. So in columns, it's just going to be O3 underscore k298 because I remembered what column name that was. And then I'm going to run that cell and it's just gone.

So we're hiding it at the very end, right, essentially. We didn't have to omit that column initially when we worked with the table using our Polars data frame manipulation methods. We just did it right here because you can change your mind and just unhide it or not hide it at all, which is great.

And now we want to move ahead to some styling because I believe that the table looks pretty good in terms of formatting and structuring. So the last few things that remain are a little bit of styling for this table. So let's do that with opt_stylize. We're going to use a number of opt methods to do some quick styling. So opt_stylize gives us a few styling themes and we can just use it as it is. So I'll run the cell and there it is.

But within opt_stylize, there's actually a number of different styles, numbered from one to six. And we also have a number of colors that you can use. And these are named colors. The best way to find out what these are is to look at the opt_stylize documentation, either through the API reference or by hovering over the method. You can see what is in the color part of the docs here. So it seems to be blue, cyan, pink, green, red, and gray. So by default it's blue and that's what we see here. So it looks pretty good to me.

What doesn't look good to me is the default font. So let's change that. We can change the font for the entire table with the opt_table_font method. Okay, so we're going to do that. And just as an extra thing, we don't have to choose a font that exists on our system. We have a little extra thing called system_fonts. And that's a helper function that lets us use a set of themed stacks of fonts. Okay, so they take a different name depending on which one you use. You can think of this as a font theme that works across different systems. The only thing we have to do is import that helper function. So from Great Tables, import system_fonts.

Great. So there's a number of different names here. We have things like system-ui, which is the default, transitional, old-style. Humanist is what I'm using. It just shows that it's taking this font family right here. So it's pretty resilient in moving across systems. Say you're running on a Linux system, it should work and provide you something very similar. If you're on a phone, again, this should give you a font which is of that certain theme. So I'm going to run this. And we see that the font did change. The table width changed a little bit because the font is a little bit thinner. But we see we no longer have that default font in our table. It's all been changed for every bit of text on the table.

Okay. And speaking of spacing, if you find that the table is a bit too scrunched in or there's not enough space for it to breathe, you can change that. And there's two opt methods for that. We have opt_horizontal_padding, which is what we'll use here. And it takes a scale argument. And what you do with that argument is you just change the amount of space. If you have a value above one, it increases the amount of horizontal space, or padding, on the left and right sides of each cell. So let's actually pump that up to three. And we'll check out the effect of that. There we go. So you can see the table is a bit wider than before. And we just have a little more space between the neighboring columns for the values. So they're not so encroached on each other.

And that is basically the entire table here.

Featured software

Great Tables