
Rich Iannone | Introducing the gt package | RStudio (2019)
With the gt package, anyone can make great-looking display tables. Though the package is still early in development, you can do some really great things with it right now! I'll walk through a few examples that touch upon the more common table-making use cases. These will include features like adding table parts, integrating footnotes, styling/transforming table cells, using tables in R Markdown documents, and even including gt tables in email messages.

Materials: https://github.com/rich-iannone/presentations/tree/master/2019_01-19-rstudio_conf_gt

About the Author

Rich Iannone

My background is in programming, data analysis, and data visualization. Much of my current work involves a combination of data acquisition, statistical programming, tools development, and visualizing the results. I love creating software that helps people accomplish things. I regularly update several R package projects (all available on GitHub). One such package is called DiagrammeR and it's great for creating network graphs and performing analyses on the graphs. One of the big draws for open-source development is the collaboration that comes with the process. I encourage anyone interested to ask questions, make recommendations, or even help out if so inclined!
Transcript
This transcript was generated automatically and may contain errors.
Thank you. Yeah, this is a new package. It was only made public last December, but we've actually been working on it since March. It's not just me. It's also Barret, who's over there, and Joe Cheng. Huge contributions from them to make this possible.
So this is my name, making gt is my game apparently, and I'm available on Twitter, email, GitHub, all the places.
So the gt package lets you build display tables with easy-to-use functions. Now, the key word was easy. We wanted to make these functions easy to use, so that you fall into a pit of success, really. The output can be HTML, LaTeX (that's how I pronounce that), and RTF. Now, the first time I saw RTF, it was a real WTF. So I'm hoping you never have to generate RTF code yourself. It's actually just hell.
So the package is available on GitHub. It's going to be a while until we get it on CRAN, we just really want to get the internals right.
So the basic workflow is: you've got table data, which can be a tibble or a data frame. You introduce it to gt(), it becomes a gt object, and you can use all the functions inside gt. There are lots of them for modifying the table. What you get in the end is a gt table, and that, again, is HTML, LaTeX, or RTF. That can be done in R Markdown, it can be done in the console, wherever.
Table features and parts
When we first set out to make this package, we thought, what are some useful features in display tables? What do we bring into the table, as it were? First thing is kind of obvious, a table header, it's nice to have a title and a subtitle. It's fantastic. It sort of identifies the table, describes it.
Row labels. So here's a cool thing. We've got cars, and we have these countries where the cars came from. So we basically have groups of rows, we'll call them row groups. So it allows you to sort of structure your rows, essentially.
This is probably a bit more familiar. It's column labels, just having column spanners over certain columns. So here we can see performance going over top of, you know, a few other columns.
A really big thing is data formatting. We want to make it super easy to have your data transformed any way you want. This is actually a gt table. So as you can see here, the data, of course, doesn't come in this way. You format it yourself, and there are easy ways to target different cells and to make the formatting easy.
And this was a big one. Footnotes. I love footnotes. The idea is that you can put in little notes; you can annotate your table. A real nightmare sometimes when making a table is, like, getting them in the right order. So gt always puts them in the right order. You don't have to worry about what order you specify the footnotes in. gt will handle it. So you can see right here, footnotes are kind of small to see, so I called them out. It's always going to be left to right, top down in terms of numbering. And you can see the notes at the bottom, they're also in order. So you don't have to worry about order when you're doing footnotes in gt.
So in order to make this work really well, we need to define a structure for a table. Some sort of, like, structure that will solve 90% of your table needs. So we call them table parts.
So the most basic table is this. It's very familiar. You've got column labels. You've got a table body. The table body has cells. It's arranged in columns and rows. Very, very familiar. It's the most basic form of table.
But here's a cool thing. A stub. It's a little known term in table technology. It's essentially like row names. But we can do better than row names, like in data frames. We can have a stub that defines, we can have labels for each row. Again, it's not always needed, but it can be useful in certain circumstances. Like if rows are representing some sort of thing that you want to call out.
And because we can conceivably have groups of rows, it's nice. We can have space there for a row group label that extends out into the table body, which is really quite nice.
And a feature of gt is creating summary rows. So if you have summary rows, you can have summary labels. You can identify what is in the summary cells. It can be, like, an average, a range, what have you. It's nice to describe what that is. Otherwise you wouldn't really know.
So at the top, we have a table header. It's a great place to add a title and a subtitle. The best place, really. And the bottom is the footer area. That's where you throw in the footnotes, as many as you want, and source notes. So it's like a special type of note that just... You can use it for all sorts of things, but typically use it to just, you know, say the source of the data or other things. Other little notes that are not really labeled.
Functions and formatting
So a question you might be asking is, do we have the functions to get these display tables made? Yes. We totally do. We have lots of functions. That's why it took so long to get this package out. We were just writing functions like crazy.
So the main one is gt(). That's how you introduce your table data to gt. So it's great. You have table data. Again, a tibble or a data frame. Just pipe that into gt(), and you've got that object. And just right there, it looks pretty good. I'll show you some demos. The back half of this talk is like a live coding situation inside RStudio, which is always risky, but I like to live dangerously.
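As a minimal sketch of the entry point described here (not the code shown on screen in the talk), the snippet below passes a small data frame to `gt()`. It assumes the gt package is installed and R 4.1 or later for the native pipe; the choice of `mtcars` columns is purely illustrative.

```r
library(gt)

# Any data frame or tibble works as table data; mtcars ships with base R.
table_data <- head(mtcars[, c("mpg", "cyl", "hp")])

# gt() turns the data into a gt table object that later functions modify.
tbl <- table_data |> gt()

# Printing the object renders the table (in RStudio, in the Viewer pane).
tbl
```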
It may be the case that your table data is huge. It's not really summarized. But luckily we have tools like the tidyverse. We can summarize our data, do cool things to it, reshape it. And then when it's ready, you bring it to gt for the final touches, as it were.
So we have lots of families of functions. I'm going to just introduce you to two families. We have these tab_*() functions. And these are just concerned with adding parts to a table. So they're each targeting a different element. So tab_header() is creating a table header. tab_spanner() is creating spanner columns. And tab_footnote() is creating footnotes. As many as you want.
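The `tab_*()` family might be combined as in the sketch below. It assumes a recent gt release (argument names may differ from the 2019 development version), and the labels and note text are made up for illustration.

```r
library(gt)

tbl <- head(mtcars[, c("mpg", "hp", "qsec", "wt")]) |>
  gt() |>
  # Table header: a title and a subtitle.
  tab_header(title = "Selected cars", subtitle = "First rows of mtcars") |>
  # A spanner column label sitting over several columns.
  tab_spanner(label = "Performance", columns = c(mpg, hp, qsec)) |>
  # A footnote attached to one column label.
  tab_footnote(
    footnote = "Weight in 1000 lbs.",
    locations = cells_column_labels(columns = wt)
  ) |>
  # A source note in the table footer.
  tab_source_note(source_note = "Source: the mtcars dataset in base R.")
```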
We have a series of formatter functions. They all begin with fmt_. And these are great for making the data appear as you want them to appear. It's kind of like in Excel, where you're selecting a bunch of cells and applying a format. It's the same sort of idea here.
Formatting numbers is really simple. Use fmt_number(). Here's a before and after sort of shot of, like, the data. So we have this column right here. It's got some data. It's to three decimal places. And we can use fmt_number(). Just put in the argument decimals = 1. And yeah. We have it to one decimal place. And it even throws in a comma for you, which you can disable as well.
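A small sketch of this before-and-after, assuming a recent gt release. The data values are invented, and `use_seps = FALSE` is the argument that, to my knowledge, switches off the thousands separator.

```r
library(gt)

# Illustrative "before" data with three decimal places.
before <- data.frame(value = c(1234.567, 89.123, 45678.901))

# One decimal place; digit-group separators (commas) are on by default.
tbl <- before |>
  gt() |>
  fmt_number(columns = value, decimals = 1)

# The same formatting with the separators disabled.
tbl_no_seps <- before |>
  gt() |>
  fmt_number(columns = value, decimals = 1, use_seps = FALSE)
```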
So gt knows about currencies. And actually locales, too. It's crazy. So it can do all sorts of currencies. You just provide a currency code, which is easy to look up. And we actually have a function that gives you a table of currency codes. And yeah, just use fmt_currency(), and then that before column becomes a set of currencies right here. And the locale here just ensures that the formatting is correct for the locale, here using French and France.
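A hedged sketch of currency formatting with a locale, assuming a recent gt release. The prices are invented, and the currency code `"EUR"` and locale ID `"fr"` are my choices for illustration; the exact locale string used in the talk is not given in the transcript.

```r
library(gt)

# Illustrative prices.
prices <- data.frame(price = c(1234.5, 19.99, 250))

tbl <- prices |>
  gt() |>
  # The currency code selects the symbol; the locale controls the
  # decimal mark, digit grouping, and symbol placement.
  fmt_currency(columns = price, currency = "EUR", locale = "fr")
```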
So there may be some NAs in your data. And they appear in gt as NAs. Those can be replaced. You just use fmt_missing(). Choose some replacement text. In the default case, we just use an em dash. And it looks pretty good in the presentation table. But you can choose any sort of text that you want and target any sort of NAs that are applicable for that text.
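A sketch of replacing missing values. Note the assumption: in recent gt releases the function introduced here has, as far as I know, been superseded by `sub_missing()`, which is what the example uses; with the 2019 development version the call would have been `fmt_missing()`. The data is invented.

```r
library(gt)

# Illustrative data with one missing score.
scores <- data.frame(name = c("a", "b", "c"), score = c(1.5, NA, 3))

tbl <- scores |>
  gt() |>
  # "---" is rendered as an em dash; any replacement text can be used.
  sub_missing(columns = score, missing_text = "---")
```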
There's tons more functions. There's just no time to even mention them here. I got to move quickly. Because I really want to get into the sort of live coding bit. Which is going to happen in a sec.
Live demo in RStudio
Tons of demos I want to show you. The gt package actually comes with six example datasets: countrypops, sza, gtcars, sp500, pizzaplace, and exibble. We're just going to take a look at some simple examples with the exibble dataset. It's an example tibble. It's only eight rows. But it's great for making little small tables, just the right size.
And we'll use the pizzaplace dataset. So we're going to summarize that a bit, because it's a huge dataset of 50,000 rows. Take that, use a bunch of functions on it. Get inline HTML. And this is like the secret sauce you need to send an email. You need to have the chunk of HTML to send it out.
So let's head on over to RStudio, which I have here.
So we have two packages we need: gt and tidyverse. Can everybody see this? Especially in the back. Do I have to pump up the font?
So let's have a look at exibble. It's, again, a dataset. It's just eight rows. The feature here is that we can use these different columns just to experiment with the API in gt. And importantly, we have these two categorical columns, row and group. We can use those to actually create row labels and actually create the groups, like row groups.
So let's look at exibble just piped to gt() by itself, how it looks. I mean, not bad. You know, you have the table as it looks there. Some row striping. It's reasonable. It's HTML. But here's the cool thing. We can create a stub with row labels and include row groups just by using a few arguments to gt(). We're just passing in column names here.
So let's try this. See what we get. There we go. So here's a stub right here. And here's the row groups. So group A, group B, and different rows inside those groups.
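The stub and row groups described here can be sketched as follows, assuming the `rowname_col` and `groupname_col` arguments of `gt()` as they exist in recent releases; this is a reconstruction, not the exact code from the demo.

```r
library(gt)

tbl <- exibble |>
  gt(
    rowname_col = "row",      # values of `row` become the stub (row labels)
    groupname_col = "group"   # values of `group` define the row groups
  )
```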
So now we're going to do a bunch of, like, little function tests here. And we can sort of play around because we're in a live coding environment. So here's a function test. Great. fmt_scientific(). So if you want to do some scientific notation, this is nice. All you have to do is supply the column name. We wrap that in vars(). And we set the decimal places, in this case, three. And so this is going to act on the num column. Let's have a look and see how its scientific notation looks. It's a bit nicer, right? It's got the times 10 to the power of, as you can see here. So it's better than that E notation you get by default.
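A sketch of the scientific-notation step, assuming a recent gt release in which a bare column name can be passed to `columns` (the talk's version wrapped it in `vars()`).

```r
library(gt)

tbl <- exibble |>
  gt() |>
  # Show `num` as a mantissa with three decimal places times a power of 10.
  fmt_scientific(columns = num, decimals = 3)
```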
So we can also format dates in this date column right here. So we have dates as ISO-formatted dates right here. They're just, like, character text, essentially. So here's a really cool thing. We can specify the column date, but we can also specify rows and provide a condition. In this case, we're looking at the column char, and there are a number of different fruits here. So we're saying for the columns in char, sorry, the rows in char that begin with a to d, those are the cells we're going to format right here in date. And we're using date style 6. I'll show you about date styles in a second here. So we'll just see how that works. Yeah. See, there we go. So it's isolated to those two cells. So we can target certain cells just by using a combination of columns and rows.
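A sketch of targeting cells by column and row condition, assuming a recent gt release. The regular expression is my guess at a condition like the one described; the exact expression used in the demo is not in the transcript.

```r
library(gt)

tbl <- exibble |>
  gt() |>
  fmt_date(
    columns = date,
    # Only rows whose `char` value starts with a letter from a to d;
    # grepl() returns FALSE for the NA in `char`, so that row is skipped.
    rows = grepl("^[a-d]", char),
    date_style = 6
  )
```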
So date style 6, how do we know that it's actually a good date style? We can use these info functions that give you a table that lets you, like, sort of see what the different styles are. So there we go. It's an info table, and right here, this is number six. That's what we've used, and that's how this date would be formatted. That's great. We've got a ton of other info tables, too, like info_currencies(), because, you know, who knows all the currency codes? info_time_style(), because we might want to format times. And here's a really, really cool one. We can use colors inside gt to color cells with the data_color() function, and use palettes. But this is a great way to discover all these palettes, and essentially, this is from the paletteer package, which is fully supported in here. So tons and tons of, like, colors you can explore just inside gt.
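The info tables mentioned here can be produced as in this sketch, assuming the `info_*()` function names from recent gt releases. Each call returns a gt table that can be printed to browse the options.

```r
library(gt)

date_styles <- info_date_style()   # date styles usable with fmt_date()
time_styles <- info_time_style()   # time styles usable with fmt_time()
currencies  <- info_currencies()   # currency codes usable with fmt_currency()

# Print any of them to view it, for example:
date_styles
```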
So we can get rid of some columns, too, if we find that we don't need them, just by using this cols_hide() function. So let's try that. And see what that looks like. Now we have a narrower table, for sure. Just hid those columns. Even though they're hidden, they're still usable in these expressions that rely on other columns. So it just means they're not being displayed in the end, but they're still usable for all sorts of things.
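A sketch of hiding columns while still using one of them in a row condition, assuming a recent gt release. Which columns were hidden in the demo is not stated, so the selection here is illustrative.

```r
library(gt)

tbl <- exibble |>
  gt() |>
  # Hidden columns are not displayed in the rendered table...
  cols_hide(columns = c(char, fctr, time, datetime)) |>
  # ...but they stay available to expressions such as this row filter.
  fmt_number(columns = num, rows = grepl("^a", char), decimals = 0)
```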
So we have a column called currency. It's not really formatted as currencies right now. We can format it to be currencies. So we can use the fmt_currency() function here. So we're gonna specify this column, provide the currency code, and this will become currencies in pounds. There we go.
Now, here's my favorite thing. Adding footnotes, as I said before. It uses the tab_footnote() function. It's a bit more complicated, because you're targeting potentially anywhere in the table to put a footnote. So we have this locations argument, and it takes a number of location helper functions. So we have cells_data(), cells_column_labels(), cells_title(), all sorts of things like that. It just allows you to target where you want to put the footnote, and it attaches it to the end of the text. So in this case, we're using cells_data(), because we want to target some data. And the interface is that we have columns and rows, and it's just like the formatters in this case. We can provide an expression, provide some column name. And so it's gonna attach to the currency column where the value is less than 20. So just because these are formatted doesn't mean the value is not still there. It's an immutable table inside the actual object, so we can still use the values. So anything less than 20 will get this footnote. These are lower prices, at the bottom.
We're gonna add one more footnote to the currency column label itself. We're gonna attach it right to the column label. Oftentimes you want to do that, so we'll use cells_column_labels(), and just provide the column name. And here's a really cool thing. We provided this second, but it's gonna auto-order the footnotes. It's become one here, and two for all these other ones. It changes. It doesn't matter what order we present these footnotes in. It sort of just listens to you and handles that in the end.
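The two footnotes can be sketched as below. One assumption to flag: the location helper called `cells_data()` in the talk is, to my knowledge, named `cells_body()` in recent gt releases, which is what this example uses. The footnote wording is illustrative.

```r
library(gt)

tbl <- exibble |>
  gt() |>
  fmt_currency(columns = currency, currency = "GBP") |>
  # Footnote on body cells; the condition uses the underlying values,
  # not the formatted text. NA is excluded explicitly.
  tab_footnote(
    footnote = "These are lower prices.",
    locations = cells_body(
      columns = currency,
      rows = !is.na(currency) & currency < 20
    )
  ) |>
  # Footnote on the column label, declared second; gt numbers the marks
  # by position in the table rather than by declaration order.
  tab_footnote(
    footnote = "All values are in GBP.",
    locations = cells_column_labels(columns = currency)
  )
```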
One thing I neglected is to add a title and a subtitle to the table, so I'm going to do that just for this. I want to explain that you can throw in Markdown in here. Just use the md() helper, and then we have this rendered as code, and this will be in bold right there. So let's have a look at that. Great.
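A sketch of a Markdown-formatted header using `md()`, assuming a recent gt release; the title and subtitle text are illustrative rather than the strings used in the demo.

```r
library(gt)

tbl <- exibble |>
  gt() |>
  tab_header(
    # **...** renders as bold, backticks render as code.
    title = md("Data listing from **exibble**"),
    subtitle = md("`exibble` is an R dataset")
  )
```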
There are all sorts of other options sort of stowed away in tab_options(). It's huge. You can see the popup text. It's truncated because there's, like, millions of arguments. But basically you can change almost all aspects of the table. You can style things. One thing you can change is the footnote glyph. If you're tired of, like, numbers, they can just be letters. So let's try that. Really hard to see because footnotes are tiny, but these became a and b. Because we care about case, this can be lowercase letters or uppercase LETTERS like that.
So it's a whole pit of success thing. So we're trying to make it easy. Although this is not easy. You have to know what this is. But I can see this is going to be a vignette one day describing what these options are. There's lots of them. But essentially this one allows you to color the background of the column labels blue. Now let's see what happens there. Turned blue. But the cool thing is the text turned white. It does that for you because it wants to give you the maximum contrast. So it does that with SCSS in the background. If it doesn't do it with that, it does it with some code. So it's trying to give you the best looking table we possibly can.
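A sketch of the two `tab_options()` settings discussed above. The option names `footnotes.marks` and `column_labels.background.color` are taken from recent gt releases and are an assumption relative to the 2019 version, where the footnote option was described as a "glyph". The footnote is only there so the marks are visible.

```r
library(gt)

tbl <- exibble |>
  gt() |>
  tab_footnote(
    footnote = "An illustrative note.",
    locations = cells_column_labels(columns = num)
  ) |>
  tab_options(
    footnotes.marks = letters,                # a, b, c, ... instead of 1, 2, 3
    column_labels.background.color = "blue"   # label text contrast is adjusted by gt
  )
```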
gt in R Markdown and email
And because we're in a track that cares about R Markdown, here's the whole thing in R Markdown. I'm just going to knit it and we'll see how it looks. All the chunks have been processed. Here it is in HTML. All the chunks are there. And here are all the tables as HTML. Again, you can do this as PDF. Just change the output type. Or as RTF, if you dare. And it's all there.
Now, the final thing. Sending an email. I don't know. Getting an email could be kind of cool. It's got to be the right type of email. If I had an email with a table in it that was useful, that would be kind of cool.
So let's load a bunch of packages here that are concerned with that sort of thing. And we're just going to set this up. I'm just going to gloss over this as much as I can, because it's not important. It's just getting sort of stuff for the email body itself.
So let's look at pizzaplace. It's one of my favorite datasets in this package. 50,000 rows. It's like a year of pizza sales, essentially. Every transaction in that year of a fictional pizza place. So we've got this huge data frame or tibble. If you were to render this in gt, it would be massive. It would just be a long table. Nobody wants that. So let's do some dplyr. I'm not going to go through the details. But if you just run this code up to here, we'll see that the end table becomes a more manageable 14 rows, which looks great for a gt table.
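The dplyr code from the demo is not in the transcript, so the following is only one plausible summary, assuming the `type`, `size`, and `price` columns of `pizzaplace` and the dplyr package. It collapses the transactions into one row per pizza type and size before handing the result to `gt()`.

```r
library(gt)
library(dplyr)

# One row per pizza type and size, with a count and total revenue.
sales_summary <- pizzaplace |>
  group_by(type, size) |>
  summarise(pizzas_sold = n(), revenue = sum(price), .groups = "drop")

# The summary is small enough to display; sizes go in the stub and
# pizza types become the row groups.
tbl <- sales_summary |>
  gt(rowname_col = "size", groupname_col = "type") |>
  fmt_currency(columns = revenue, currency = "USD")
```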
So I'm going to pass that right into gt(). And there it is. We'll do a few more things. And I'll show you what that looks like. Right here. Great. Fantastic. Very nice. You can put an emoji in. That's very fun.
And here's a cool thing: as_raw_html(). That makes it mailable. So we can just create this email. We'll preview it right inside RStudio. And then I'm going to flip over to my private email. This is a... I won't show you my embarrassing emails, but I'll show you just this one message. Here it is. It came across. The emoji is there. The pizza slice is there. All the colors are there. That's an email I want to see. This is important stuff. Pizza sales.
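A sketch of the HTML step only, assuming `as_raw_html()` from a recent gt release. It produces a self-contained HTML string with inlined styles, which is the piece an email body needs; composing and sending the message is left out because the packages used for that are not named in the transcript.

```r
library(gt)

# Convert a gt table to a single HTML string with inlined CSS.
table_html <- exibble |>
  gt() |>
  as_raw_html()

# The string can then be embedded in the body of an HTML email.
nchar(table_html)
```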
So that's about it. And yeah. If you want more information, go to the repo. It's github.com/rstudio/gt. This very presentation is in my personal GitHub, if you want to see that. And all the data is there. And otherwise, thanks. Thank you very much.
Q&A
We've got time for some questions. Do you want to turn around? Let's see how good your catching is.
My name is Klaus. So this was great. My question is more of a request. I think you'd make a lot of analysts eternally grateful if you could add Excel output.
Yes. I've been asked for various things, Excel, Word, plain text. These are all in the pipeline. For now, we really just want to concentrate on getting parity between the formats I specified, HTML, LaTeX, and RTF. Because right now, certain things are ignored between them, and we want to get style parity between them. The whole idea is that you write it once, and hopefully, depending on your context, it just works. But essentially, it should be compatible, portable across the types. But once that's done, in the next few months, we're going to focus on more output types. Even things like SVG natively. It's just we have high hopes for this.
Hi. My name is Kamal. I was just wondering, can we merge two cells? Is that a functionality in there?
It's there. Yeah. You can do that. There's a function that I haven't covered. And just one more. You can take a table format and apply that to a table. There's a fmt() function, which you can use to make your own formatter function out of. But essentially, it's very flexible. I'm going to make a vignette out of that. But essentially, what you do is provide your own code for all the different contexts, which are default, HTML, RTF, as many as you want. And depending on the context you're in, it will format like that. You can just use a default as well. But yeah, you can totally do that. It basically just relies on a vector. Some vector processing, essentially.

