Resources

Build Captivating Display Tables in Python With Great Tables | Real Python Podcast #214

Do you need help making data tables in Python look interesting and attractive? How can you create beautiful display-ready tables as easily as charts and graphs in Python? This week on the show, we speak with Richard Iannone and Michael Chow from Posit about the Great Tables Python library.

Links from the show: https://realpython.com/podcasts/rpp/214/

Michael and Richard discuss the design philosophy and history behind creating display tables. We dig into the grammar of tables, the background of the project, and an ingenious way to build a collection of examples for a library. We briefly cover how Richard and Michael started contributing to open source. We also discuss practicing data skills with challenges and resources like Tidy Tuesday. This episode is sponsored by Mailtrap.

Topics:

- 00:00:00 -- Introduction
- 00:02:00 -- Michael's background in open source
- 00:04:07 -- Rich's background in open source
- 00:05:27 -- Advice for someone starting out
- 00:08:55 -- What do you mean by the term "display" table
- 00:11:32 -- What components were missing from other tables?
- 00:13:31 -- Using examples to explain features
- 00:16:09 -- Why was there an absence of this functionality in Python?
- 00:19:35 -- A progressive approach and the grammar of tables
- 00:21:26 -- Sponsor: Mailtrap
- 00:22:01 -- The design philosophy of Great Tables
- 00:25:31 -- Nanoplots, sparklines, and column spanners
- 00:27:06 -- Building a gallery of examples
- 00:28:56 -- Heat mapping cells and automatically adjusting text color
- 00:32:54 -- Output formats for the tables
- 00:34:46 -- Building in accessibility
- 00:36:55 -- Dependencies
- 00:37:42 -- What is the common workflow?
- 00:41:39 -- Video Course Spotlight
- 00:43:15 -- Adding graphics
- 00:46:41 -- Using a table contest to get examples
- 00:49:47 -- quartodoc and documenting the project
- 00:55:00 -- Tidy Tuesday and data science community
- 01:00:29 -- What are you excited about in the world of Python?
- 01:03:46 -- What do you want to learn next?
- 01:08:05 -- How can people follow the work you do online?
- 01:09:57 -- Thanks and goodbye


Transcript

This transcript was generated automatically and may contain errors.

Welcome to the Real Python Podcast. This is Episode 214. Do you need help making data tables in Python look interesting and attractive? How can you create beautiful, display-ready tables as easily as charts and graphs in Python? This week on the show, we speak with Rich Iannone and Michael Chow from Posit about the Great Tables Python library. Michael and Rich discuss the design philosophy and history behind creating display tables. We dig into the grammar of tables, the background of the project, and an ingenious way to build a collection of examples for a library. We briefly cover how Rich and Michael started contributing to open source. And we also discuss practicing data skills with challenges and resources like Tidy Tuesday.

I'm excited today to have a couple people from Posit come and visit me. We've been doing a few additional data science-focused things lately, and I'm excited to have Michael and Rich from Posit on the show today to talk about Great Tables. So, welcome to the show.

Thank you. Yo, thanks for having us.

Getting into open source

Yeah, awesome. So, one of the things I've been trying to do lately is talk a little bit about how people got not necessarily involved in programming, but maybe how they got involved in open source, creating projects like this. Maybe we can start with you, Michael. Do you have a background in open source before working on Great Tables?

Yeah, that's a great question. I guess it all started in grad school. I did a degree in cognitive psychology, so I was using R a lot for research and Python a lot for some neuroscience work. I think that's what started me, like a lot of people, interacting with open source. As I went along, I built a lot of really bad open source tools that never really went anywhere. From there, I joined a company called DataCamp, where I was actually able to put out the tools we were using. After that, I worked on some other open source tools, including a data wrangling tool called siuba. I think that's where I went from fledgling open source stuff to more serious work. So it started in grad school, and then went from crappy open source to more reasonable-ish tools.

What were the things that defined that change for you? It sounds like you felt you graduated when you started doing siuba. What was different about it?

I think open source really helped in the beginning. I was learning by building open source stuff, and that let me show my friends and try out putting things into the world.

Yeah. But I think I needed more like professional experience with software engineering. And so it's kind of like over time, it just got a little bit better until I felt a little more kind of ready to put stuff out. But I think it started with like research and then interacting a lot with open source. And then just doing a bad job a lot until it kind of started to feel okay to share with people.

That sounds good. Yeah. Awesome. Rich, what is your background in open source?

For me, it happened a bit after grad school, and it's wild. I started with R just because I heard about R a lot. People around me mentioned it: he did this in R, she did it in R, you should learn R. Okay, fine, I'll get to learning R. And it was great. I started using it for work just after grad school, when I got a real job. It made things automatable; things I had to do daily were just a bit easier. Then I developed some personal tooling, and I thought, I'm going to just release this stuff on GitHub and see where it goes. Nobody's going to care. But eventually, maybe some people cared. I got into more general stuff after that instead of domain-specific things, and then I started interacting with people. That's when I really caught the open source bug.

Yeah. When I made broader, more useful things, people started to interact and make PRs, and there was a lot of communication going on. It just felt good, and I wanted to keep going with that. So I did more of it. It kind of snowballed from basic, timid personal projects to more generalized, accessible projects for everybody.

Yeah. This is maybe for both of you: do you feel like that's an optimal way to get into open source, dabbling and sending out pull requests to different projects, or building your own? Do you have suggestions for people on a path into open source? Not necessarily an easier path, but a path in.

Well, for me, it was pretty low pressure. It wasn't my actual job, so no one was telling me what to do. So it was like a personal thing. And so it felt leisurely in a weird way. And I thought that was great. Like there's no pressure, there's no work pressure to do this stuff. It's totally optional. So that did the trick for me. How about you, Michael?

Yeah. I mean, I think for getting started, it's just so valuable contributing through like issues or PRs, but I actually think just choosing an open source place and kind of trying to be useful in whatever way you feel most comfortable is really, it's good for everyone. Like the, I feel like when you stick around, open source people notice and are really happy.

And you get some practice and often guidance. We have a really great contributor right now in Great Tables, Jerry, who answers GitHub discussions and responds to issues. I feel like we're so Team Jerry. When Jerry opens PRs, I want Jerry to be really successful because he's been so helpful. I think that's a really nice way to get involved: even if you're just responding to issues or discussions, it really builds relationships with the maintainers, so that when you want to do other things... I would take time out of my day to do anything that would be useful to Jerry.

And I think that that's, that's the kind of exciting opportunity that open source is, you can kind of start with issues and through relationships move to even like mentorship and guidance and new stuff.


That's cool. Do you feel like it's helped your career also, like kind of, you know, finding positions and things like that? Like if there was an open spot, you, you say, Oh, maybe Jerry or, you know, something like that. Is that a common occurrence that you've seen?

I've certainly seen it. Yeah. And even my own sort of story is, is like that, like contributing open source leads to like, you know, eventually getting a position cause you get good enough and you get known enough to, to be in the running for, for a job.

But for others, yeah, I've totally seen it, where people just contribute and they're in there talking on social media as well. That really helps a lot because, you know, they're known.

Yeah. I do think it helps too, that if you're working on open source, yeah, you do kind of have to explain like, what did I do and why is it useful for people?

And I think that if you're a software engineer and what you're building is more proprietary, you do almost have the same challenge for your career. Like you do need to explain to people, like what did I do and why was it useful, but it can be a lot harder to kind of talk about it. You kind of have to find creative ways to talk about it sometimes. So I think it helps having open source where it's just kind of laid out and you can point to it and kind of talk about it.

What is a display table?

Yeah, that's cool. Awesome. Well, I brought you here to talk about Great Tables. I had seen the project and I was like, this is super interesting. Like this idea that so many people focus on data visualizations as the primary way to communicate what's being created. But to me very often, the amount of information that you could present in a table is just like dramatic. And I think your examples just really show it off. And so maybe we can start with this. What do you mean by the term display table?

Oh yeah. Yeah, it's tough. It's tough to choose a word to put before table, because tables can be a lot of things. But I think we stuck with display table because it's the sort of table you see published in a journal, or in sports analytics, like a standings table, for instance. Yeah. It's hard to get that across with just data table, which is something you manipulate. This is more a thing for presentation. Yeah. And it's different from other types of tables you might see on web pages in that it doesn't have hundreds or thousands of rows you just page through. And it's also not like a raw data frame, as in a console. Yeah. So yeah, it's tough. There aren't really great names for this sort of thing, but we went with display table. It sort of fits and gets the point across. It's basically something you want to gussy up and show people. It's for presentation.

So, think about a common thing in a presentation: everybody's been in the presentation where there are 5,000 bullets on a slide. It's like, no, this is the opposite of that. This is detailed information in a tabular format. Yeah. And it's trying to get the point across as quickly as possible. Exactly. It might be a summary table, not raw data, so it's really distilled. You're trying to get something across pretty quickly, but you're doing it in table form, and you can still do that. You don't have to have a giant wall of numbers. You can make it more easily parsable, you can narrow down the scope, and you can give it annotations and things like that. Yeah. So that's really what it is. It's more like taking the best parts of plots and putting them into tables.

You mentioned briefly the idea that this might be something you'd see in print form or web form. There aren't as many magazines and newspapers as there used to be, but those are definitely the kinds of places you would see this. There are so many features I definitely want to dig into, and you mentioned some of them. You're trying to go beyond just a data frame, right? Exactly. Just columns and rows and maybe headers. What were some of the things you felt were missing for doing this job of being a display table, the components where you felt, oh, this definitely needs to be there?

I definitely think it's interesting because, yeah, unlike plot, the term display table needs some support, because the word plot is almost like cheating. With a plot, you know you're communicating. And to your question of what's missing, I think it's tricky, because when we think of a table, we think of a big rectangle, like you mentioned, potentially a spreadsheet kind of thing. But it's almost like the question is: when we're using a table as a visualization, what does it need? And I think a lot of the really beautiful parts are about information hierarchy, being able to drive attention to the right pieces at the right time.

And so like a plot, oftentimes plots have titles and that's, that's probably the biggest, one of the biggest pieces of the information hierarchy is that you're drawn right away to the title. It's really big. And it says, if you have two seconds, this is what you're seeing. So I think it's funny, like that might not technically seem like part of a table to people, but for display tabling, like plotting, it's like really critical.

And then having all those other trappings, like subtitles, footers, ways to kind of break up, like what's a row of the table and groupings. Okay. I think those really big structural elements are important because they really break things up and make it easy to kind of like group and move your attention right from kind of big things to like details. So I think structure is one really huge piece of tables that's often missing. Yeah. That kind of guides attention.

Yeah. I feel like these are skills that a higher-order Excel person who makes presentations might learn by creating sub-level headers, sections, and things like that. In our space of data science, we typically aren't moving to a tool that does both jobs, that is, also communicating the information. The tools have just been data frames, you know? So there's the idea of, okay, I want this subheader. I'm looking at the New York air quality measurements example, and it has the subheadings of time and measurements, with nice underlining indicating the columns grouped below, which, like you said, draws your eye to it.
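The structural pieces described here (a title and subtitle on top, spanner labels grouping columns, a source note at the bottom) map naturally onto plain HTML table markup. The sketch below is a stdlib-only illustration of that hierarchy, not Great Tables' code; in Great Tables itself you would reach for methods such as tab_header(), tab_spanner(), and tab_source_note() instead. The function name render_display_table is hypothetical, and the row values are placeholders shaped after the air quality example.

```python
from html import escape


def render_display_table(title, subtitle, spanners, columns, rows, source_note):
    """Render a minimal display table as an HTML string.

    spanners is a list of (label, span) pairs that must cover the columns
    from left to right, e.g. [("Time", 2), ("Measurement", 4)].
    """
    if sum(span for _, span in spanners) != len(columns):
        raise ValueError("spanners must cover every column exactly once")
    parts = ["<table>"]
    # Title and subtitle: the top of the information hierarchy.
    parts.append(
        f"<caption><strong>{escape(title)}</strong><br>{escape(subtitle)}</caption>"
    )
    parts.append("<thead>")
    # Spanner row: each label is stretched over its group of columns.
    parts.append(
        "<tr>"
        + "".join(f'<th colspan="{span}">{escape(label)}</th>' for label, span in spanners)
        + "</tr>"
    )
    # Ordinary column labels sit underneath the spanners.
    parts.append("<tr>" + "".join(f"<th>{escape(c)}</th>" for c in columns) + "</tr>")
    parts.append("</thead>")
    parts.append("<tbody>")
    for row in rows:
        parts.append("<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>")
    parts.append("</tbody>")
    # Source note: the bottom of the hierarchy, spanning the full width.
    parts.append(
        f'<tfoot><tr><td colspan="{len(columns)}">{escape(source_note)}</td></tr></tfoot>'
    )
    parts.append("</table>")
    return "\n".join(parts)


table_html = render_display_table(
    title="New York Air Quality Measurements",
    subtitle="Sample rows",
    spanners=[("Time", 2), ("Measurement", 4)],
    columns=["Month", "Day", "Ozone", "Solar.R", "Wind", "Temp"],
    rows=[[5, 1, 41, 190, 7.4, 67], [5, 2, 36, 118, 8.0, 72]],
    source_note="Placeholder values for illustration.",
)
print(table_html)
```

A spanner here is just a header cell whose colspan covers its group, which is why the sketch checks that the spans add up to the number of columns.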

Design philosophy and history

Some of the things you've added that I think are really impressive are like the coloring inside the cells of the table; that solar zenith angles example is just a piece of art. I love that example. And a lot of the sports ones definitely have the same kind of vibe, where you have icons inside the columns. And then there are small bar graphs and things in there to, again, sort of blend tables, plots, and graphs together. I feel like a lot of this stuff has been built into plotting and graphing libraries for a long time. Why do you feel like it was absent in tables? Were there other attempts at this before you explored it with Great Tables?

Yeah, that's a great question. I feel like there were almost two phases this went through. One is what you mentioned: Rich worked on an R library to do a lot of this, so he'd had a few years leading up to working on Python of exploring some of that space. Okay. And then in the Python space, there are libraries that have tried this. I think there's plottable, which works with Matplotlib, ITables, which does some simpler interactive tables, and then things like tabulate, which is a dead simple, one-function tool that puts out a simple Markdown table.
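For contrast with display tables, the "dead simple" one-function kind of table mentioned here can be approximated in a few lines. This is a hypothetical helper written for illustration, not tabulate's implementation, and it only handles left-aligned plain-text cells:

```python
def markdown_table(headers, rows):
    """Return a pipe-style Markdown table (illustrative helper, not tabulate)."""
    cells = [[str(h) for h in headers]] + [[str(v) for v in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in cells) for i in range(len(headers))]

    def fmt(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(cells[0]), separator] + [fmt(row) for row in cells[1:]])


print(markdown_table(["name", "n"], [["a", 1], ["bb", 22]]))
```

Everything a display table adds (titles, spanners, row groups, styling) sits outside what a helper like this can express.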

Rich had worked on the more complex parts, like how you create labels on top of your columns and things like that. Yeah. I don't know, Rich, if you have thoughts on what was there in the R space when you started to think about how to handle structuring, formatting, and styling?

Yeah. R was a bit different in this space because it actually has lots of table packages. Oh, it has? Okay. Yeah. At the time we started building the R implementation in 2018, there were upwards of 10, maybe 12. They were at different stages of development. Some of them were a little bit abandoned, and some were less cared for. Sure. But still, there was lots there. Right. I don't know what that speaks to. Maybe because R is a stats-heavy language with a native data frame, you just expect to take that and move it into a publishing workflow, which was kind of there with LaTeX, and with R Markdown we had that along with other things as well.

At the time, we wanted to make something, I don't know, not better, but in the same spirit: something a little bit academic, where you can replicate the tables you find in journals. But we wanted to go beyond that, because we're not just dealing with tables on a paginated page. We have HTML and, you know, the web. So we wanted to take the best ideas from tables on pages and see what we could do with them in HTML, for instance. Yeah. And a lot of it just came from the different buckets of features, which are formatting, structuring, and styling.

I think it's worth noting, too, that the R library is called GT, which is short for grammar of tables. So I feel like the grammar part's important. Does that come from the grammar of graphics, rhyming with that? Yeah, a little bit. It's a bit of a riff on that. Yeah. Okay. That's one thing that struck me with Great Tables and these tools: it felt like there's a grammar for all the tables out there in the world, and you're trying to figure out the right way to express them, one that's simple but also flexible and able to hit the different examples. It's also worth noting that GT was not available on PyPI when we began building Great Tables. Yeah. So we had to have a discussion about what this thing should be named. Great Tables is actually a little rewriting of history: we just changed what the acronym stands for, from grammar of tables to great tables. Yeah. It's good marketing, too. Yeah. It's positive. It's optimistic. You can make great tables with this, not adequate tables.

Going into that a little bit: you talk about this idea of the grammar of it. Was that a process? You have this article, which I think is really interesting, on the design philosophy of Great Tables, and it includes historical research into how to design great tables, going back to literal tablets with tabular layouts, you know, table forms and so forth. Is that something you did at the time, or did it come up a little later, or was it a mix of things?

It's a mix. I mean, we went back, right, Michael? Like, I mean, maybe you weren't there at the time. With GT.

I mean, with GT, we looked at some old stuff. We wanted to get a lay of the land, not just R packages but work done before. It's remarkably hard to find any information or any books on tables. I think people just take them for granted. There's lots of stuff on plots and graphics, sure, but not so much on tables. It's hard to find any writing on them, so we really had to research, and the hardest part was finding any writing on tables at all. There are things like style guides you find in journals, and they have recommendations, but nothing more. We were lucky. We found something; I'm not even sure how I found it. I think someone mentioned it a long time ago. It's a large work on tables from the Census Bureau. Oh, okay. It's called the Manual of Tabular Presentation. You might've seen it in the post. Yeah, it's in the post. Yeah, it's really cool. Yeah, yeah, yeah. And that's from 1949. Yeah, '49. Good year. It's a great book, and it survives as a scanned PDF that you can find if you just search for it. And what a book: it goes through well over 250 pages of table recommendations, what to do and what not to do. It's focused on the census, basically a census table style guide, but I think it's useful everywhere. I mine ideas from it all the time, and there's so much good stuff in there. There's nothing else like it. There's no other book on just tables.

I think about the history of that because like, you know, that's the dawn of civilization of like, you know, we're going to count the people, partly so we can tax them. Dang. Of course. Yeah.

You know, and then early forms of accounting too, I'm guessing, right? But yeah, it would make sense that when suddenly there's an organization of government that's going to count people, they would want to codify it.

Oh yeah. Basically, way back then in Mesopotamia, they had big-city problems and trading issues. It's amazing when you look at those tablets, the scans, drawings, and recreations. They have all the features we have now: sub-rows, headers, footnotes and notes, even missing cells. It's pretty wild, actually. Yeah. How much they got right.

Favorite features of Great Tables

What are some of your favorite features of Great Tables that you want to call out? Hmm. What do you think, Michael? What are your favorite features?

Good question. I think some of my favorites... there are some heavy hitters worth mentioning, like nanoplots, where you can make a small bar plot or a line, called a sparkline, in your table. Yeah. I think these are really great because they balance the value of a plot, quick patterns, with the compactness of tables. So you can have a bunch of lines just indicating how something is doing over time. But some of my favorite parts have been even the silly parts, like column spanners, which put labels over your columns. Yeah. When I started working with Rich, I realized I hadn't really done a lot of styling. In grad school, I was copying data out and using Excel or Microsoft Word to format. Right. Even the concept of putting a label over your columns felt a little surprising and foreign to me. And one thing I've really loved is that these are really small, basic things you might do to make a table better. Yeah. And now they're quick and easy to do, and people love it.

When we had a constrained feature, only one level of spanners, people were like, we want more. Come on, go higher. Yeah. That's not enough for this guy. So, yeah.
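A sparkline packs a whole series into a single cell. Great Tables draws its nanoplots as graphics through its fmt_nanoplot() method; the sketch below swaps in a much cruder technique, mapping each value to one of eight Unicode block characters, only to show the core step of scaling a series into a fixed vertical range. The sparkline function is hypothetical.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest


def sparkline(values) -> str:
    """Render a numeric series as a one-line text sparkline."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:  # a flat series gets the lowest bar everywhere
        return BLOCKS[0] * len(values)
    # Scale the data range onto the available bar indices 0..7.
    scale = (len(BLOCKS) - 1) / (hi - lo)
    return "".join(BLOCKS[round((v - lo) * scale)] for v in values)


print(sparkline([3, 5, 2, 8, 6, 9, 4]))
```

Each cell only shows shape, not values, which matches the trade-off described above: quick patterns at the cost of precision.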

Yeah. I think the aesthetic stuff really jumps out to people, too. If you see a research paper and it's just raw table after table, page after page, they aren't summarizing or leading your eye in this sense. But it's hard, right? As a programmer or even a data scientist, you may understand what you want to draw attention to, but you don't have the design skill. Yeah. So do you feel like this is something that would help them quickly get good results? Yeah. One of the first things Michael did was make a gallery for Great Tables. It's so important to have good examples up front. It gets your imagination going about what's possible. Right. And then the code's right there if you want to see it. Yeah. I think that's super important. With the examples we have, we try to make them nice, so we try to impress people a little bit, but also just let them know what's possible, what you can do.

Yeah. I think one thing that helped with the examples, too, is that I am a table barbarian. I didn't have four years leading up to it and the US Census manual of tables at my disposal. So I think it helped that I was just pulling things out of Rich's brain. The examples gallery is almost like me taking notes on neat tables as Rich is flinging tables left and right. So hopefully one thing that makes it nice is that it was like an explainer for me as a total novice to tables: what's the deal, and how deep does the rabbit hole go?

Yeah. It's kind of wild: when you have a heat map, it's almost like the colors come first and the numbers become the annotations, right? You can pretty much read it just based on where things are on the grid. It becomes like a graphic. And then if you want to know the number, you just look at that secondarily, right? It sort of turns the problem on its head, which is great.
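The heat-map idea boils down to mapping each value onto a color scale. This minimal sketch assumes a linear scale between two hex colors and interpolates each RGB channel independently; Great Tables' data_color() method handles palettes, domains, and missing values for you and may interpolate differently. The helper names and default colors are illustrative choices.

```python
def hex_to_rgb(hex_color: str) -> tuple:
    """Parse '#rrggbb' into a tuple of three 0-255 integers."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb) -> str:
    """Format three 0-255 integers as '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def heat_color(value, lo, hi, low_color="#ffffff", high_color="#08306b") -> str:
    """Map value in [lo, hi] to a color on a linear two-color scale."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range values to the ends of the scale
    start, end = hex_to_rgb(low_color), hex_to_rgb(high_color)
    # Interpolate each RGB channel independently.
    return rgb_to_hex(tuple(round(a + (b - a) * t) for a, b in zip(start, end)))


for v in [0, 25, 50, 75, 100]:
    print(v, heat_color(v, 0, 100))
```

Interpolating in plain RGB is the simplest choice; perceptually uniform color spaces usually give smoother-looking scales.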

One of the things that is a feature in that is with the, let's say lighter shades of the color that are not maybe as deep, you are using, you know, black for the text that's in the, in the cells. And then for the deeper, darker shades that are more opaque, you're switching to white to kind of highlight that there. So it also makes it stand out. Is that something that someone has to implement themselves or is it something that it does automatically?

That's totally Great Tables. That'd be so annoying to implement, right? You'd have to know color levels and the optimal contrast. Thank goodness that's handled.

I think it's a really good example of our dynamic, how we work together. On this library, I approach things a lot like a software engineer. I'm here to help batten down the hatches and build out the guts. But for that type of thing, bending over backwards to help a user get the right contrast, like switching the text from dark to light, there's a temptation in my engineering brain to say, oh, that's your problem: give them a function, hand it over, give them the tools, and say the power's in your hands. But I really appreciate that Rich hit on the reality that if we give people too much flexibility and task them with the contrast, it might not turn out as nice.

Was there a lot of thought that went into that, then, Rich? The idea that I don't want to control what you can do, but I want you to have a nice result by default? Yeah. The defaults. Yeah. Certainly you can escape out of that: there's an option to just turn it off, and then you handle the text and we won't touch it. But yeah, certainly. And there have even been revisions along the way in our implementation to make the contrast better, right? We started with something that was, oh, I forget the standard, the one used on the web, but there's an upcoming one that's starting to gain currency called APCA. Okay. It offers even better contrast. There are certain color combinations where you just see it and you're like, oh, that's not right. APCA really handles those nicely. It's just an improvement over the older one. So yeah, it's definitely something I've cared about a lot and kept up with, because I think it's so important to make tables readable no matter what you do.
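The automatic text-color switch can be approximated with the WCAG 2 contrast-ratio formula: compute the fill color's relative luminance, then pick whichever of black or white text contrasts more with it. This is a sketch of the older WCAG approach, not the APCA algorithm mentioned above and not Great Tables' own code; the function names are hypothetical.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (sRGB transfer function)."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """WCAG 2 relative luminance of a '#rrggbb' color, from 0 (black) to 1 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(lum_a: float, lum_b: float) -> float:
    """WCAG 2 contrast ratio between two luminances, from 1:1 up to 21:1."""
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def text_color_for(fill: str) -> str:
    """Pick black or white text, whichever contrasts more with the fill color."""
    lum = relative_luminance(fill)
    black_text = contrast_ratio(lum, 0.0)  # black has luminance 0
    white_text = contrast_ratio(lum, 1.0)  # white has luminance 1
    return "#000000" if black_text >= white_text else "#ffffff"


for fill in ["#ffffff", "#ffff00", "#08306b", "#000000"]:
    print(fill, "->", text_color_for(fill))
```

Making this the default, with an option to turn it off, keeps the common case readable without taking control away from the user.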

Output formats and interactivity

I wonder about that on the output side. How are these output? When you run the code, is the table output as a graphic, as text, or something else?

Yeah. Basically, what we have right now is HTML as the main output. We also have a way to capture tables as images: PNGs, PDFs, and other image formats. So we've got those two things down. In the future, which hopefully is not too far off, we're going to work on other open formats like LaTeX, and maybe Typst, which is an upcoming one that may or may not turn out to be a good challenger to LaTeX. And also things like Word output: basically having a table as XML that's friendly for that. Okay. Exactly. That could just be put into a fresh Word document, or possibly embedded in an existing one.

And we do have what's probably one of the hottest issues: people really want interactive tables, like paginated ones. In the name of display, we've gone hard on static, really getting it right for publication and print. But people love... Because it's totally missing. Yeah. People really want interactive features, like pagination or other things like collapsible sections.

The thing I thought you were going to hit on, Rich, was something I was surprised to learn about with Great Tables: when you open tables in the browser, there's a lot behind tables in terms of accessibility, like screen readers and things like that.

Yeah, we collaborate regularly with a blind professor, a computer science professor at the University of Illinois, so we did a lot of work on making tables accessible to screen readers. When you make a table as a true table, using a table element, that's different from interactive tables, which typically go to divs that may be less accessible to screen readers. Yeah. But maybe not; maybe they've solved that problem. In our collaboration, we implemented so many of the recommendations for making tables screen reader friendly. When you introduce complexity, like complex headers with spanners and even row groups, you have to do a few things in the HTML code to make that not sound terrible to a screen reader, right? To let it orient itself. Right. Right. Yeah. So we followed all those recommendations. And the feedback we got in the end was, yeah, this is probably the most successful table package available in R. Nice. We took all that and wrapped it into our outputs for Great Tables as well, so it comes along for the ride. It's a great example of learning a lot in a very long development cycle and taking that distilled learning to hit the ground running really hard in Python.
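Screen-reader-friendly markup for complex headers usually relies on semantic th cells with scope attributes (colgroup for spanners, col for column labels, rowgroup for row-group labels, row for stub cells), as described in the W3C WAI tables tutorial. The sketch below generates that markup with the standard library; it illustrates the technique and is not necessarily what Great Tables emits. The function name accessible_table and the sample values are hypothetical.

```python
from html import escape


def accessible_table(caption, spanners, columns, groups):
    """Build table HTML whose headers are exposed to screen readers via scope.

    spanners: list of (label, span) pairs covering all columns; "" means no label.
    groups: list of (group_label, rows); each row's first item is its stub label.
    """
    n_cols = len(columns)
    if sum(span for _, span in spanners) != n_cols:
        raise ValueError("spanners must cover every column exactly once")
    out = ["<table>", f"<caption>{escape(caption)}</caption>"]
    # Declare the column groups so that scope="colgroup" has groups to refer to.
    out += [f'<colgroup span="{span}"></colgroup>' for _, span in spanners]
    spanner_cells = "".join(
        f'<th scope="colgroup" colspan="{span}">{escape(label)}</th>'
        if label
        else f'<td colspan="{span}"></td>'
        for label, span in spanners
    )
    label_cells = "".join(f'<th scope="col">{escape(c)}</th>' for c in columns)
    out += ["<thead>", f"<tr>{spanner_cells}</tr>", f"<tr>{label_cells}</tr>", "</thead>"]
    for group_label, rows in groups:
        out.append("<tbody>")
        # A row-group label applies to the remaining rows of this tbody.
        out.append(f'<tr><th scope="rowgroup" colspan="{n_cols}">{escape(group_label)}</th></tr>')
        for stub, *values in rows:
            cells = "".join(f"<td>{escape(str(v))}</td>" for v in values)
            # The stub is a row header, so each value is announced with its row label.
            out.append(f'<tr><th scope="row">{escape(str(stub))}</th>{cells}</tr>')
        out.append("</tbody>")
    out.append("</table>")
    return "\n".join(out)


html_out = accessible_table(
    caption="Sample measurements",
    spanners=[("", 1), ("Measurement", 2)],
    columns=["Day", "Ozone", "Temp"],
    groups=[("May", [(1, 41, 67), (2, 36, 72)])],
)
print(html_out)
```

Using real table elements rather than divs is what lets a screen reader use these scope relationships to announce each cell with its spanner, column, and row context.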

Working with pandas and Polars

What would the normal workflow be for a Python data scientist adding Great Tables to their project? You've now mentioned three different formats the data can come in: pandas, NumPy, and Polars. Does the format matter, and what does it look like to start adding Great Tables to a project like that?

It's really simple. One exciting thing is that in Polars v1, which is coming out soon, the style property will return a Great Tables object. So you'll just be able to say my data frame dot style and get a great table out of it. For things like pandas, it's easy: we have a GT class that wraps the data frame, and then you can call the Great Tables methods to add things like headers and to organize or highlight information.
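The workflow described here, wrapping a data frame in a GT object and chaining methods onto it, is easy to see with a toy. With the real library, the entry points mentioned in the episode are GT(df) for pandas and df.style in Polars v1. The sketch below is a hypothetical, stdlib-only TinyTable class that borrows the method names tab_header and fmt_number but none of Great Tables' implementation: it stores rows plus display options, and each method returns a modified copy so calls can be chained without touching the original data.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class TinyTable:
    """Hypothetical display-table wrapper (not the Great Tables GT class)."""

    rows: list[dict]
    title: str | None = None
    subtitle: str | None = None
    formatters: dict = field(default_factory=dict)

    def tab_header(self, title: str, subtitle: str | None = None) -> TinyTable:
        # Return a modified copy so the original table is never mutated.
        return replace(self, title=title, subtitle=subtitle)

    def fmt_number(self, column: str, decimals: int = 2) -> TinyTable:
        # Record how to *display* a column; the underlying values stay untouched.
        new = dict(self.formatters)
        new[column] = lambda v: f"{v:,.{decimals}f}"
        return replace(self, formatters=new)

    def render_text(self) -> str:
        # Plain-text rendering: title lines, then one "a | b | c" line per row.
        lines = []
        if self.title:
            lines.append(self.title)
        if self.subtitle:
            lines.append(self.subtitle)
        for row in self.rows:
            cells = [self.formatters.get(k, str)(v) for k, v in row.items()]
            lines.append(" | ".join(cells))
        return "\n".join(lines)


if __name__ == "__main__":
    table = (
        TinyTable([{"item": "alpha", "value": 1234.5}])
        .tab_header("Demo", "Toy data")
        .fmt_number("value", decimals=1)
    )
    print(table.render_text())  # Demo / Toy data / alpha | 1,234.5
```

Keeping formatting as recorded instructions, rather than rewriting the data, is what lets the same wrapped frame be restyled or re-rendered later.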

I do think some of the most exciting things are Polars-related. This is something we learned early on, but after we'd started developing Great Tables: Polars has a few really nice features that make it work really well with Great Tables. They have some special functions and interfaces to, say, select columns of a table. So if you wanted to grab a group of columns to put a label over them, it's really easy in Polars to say, give me every column starting with ABC. Okay. For styling, those same features, lazy expressions, make it really easy to just tell Great Tables, I want to make every cell yellow where this value is equal to something, or I want to fill every NA cell with yellow. Okay. This is just part of Polars' design, with its lazy expressions, but whether you use pandas or Polars, all of this is possible.
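Two ideas from this answer, picking columns by a name pattern and styling only the cells that meet a condition, can be sketched without any data frame library. In Polars these would be a selector such as polars.selectors.starts_with() and an expression such as pl.col(...).is_null(). In the stdlib sketch below, plain functions stand in for those deferred expressions: the test is built first and evaluated later, cell by cell. Every name here is hypothetical and not Great Tables' API.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical stand-ins: a row is a dict of column -> value, and None means NA.
Predicate = Callable[[Any], bool]


def columns_starting_with(columns: list[str], prefix: str) -> list[str]:
    """Select column names by prefix, e.g. to put one spanner label over them."""
    return [c for c in columns if c.startswith(prefix)]


def is_missing(value: Any) -> bool:
    """Predicate for NA cells (None here; a real frame has its own null type)."""
    return value is None


def equals(target: Any) -> Predicate:
    """Build the test now and evaluate it later against each cell."""
    return lambda value: value == target


def cells_to_style(
    rows: list[dict], columns: list[str], predicate: Predicate
) -> list[tuple[int, str]]:
    """Return (row_index, column) pairs whose value satisfies the predicate.

    A column absent from a row is read as None, i.e. treated as missing.
    """
    return [
        (i, col)
        for i, row in enumerate(rows)
        for col in columns
        if predicate(row.get(col))
    ]


def fill(
    rows: list[dict], columns: list[str], predicate: Predicate, color: str = "yellow"
) -> dict[tuple[int, str], str]:
    """Map each matching cell to an inline CSS background declaration."""
    return {
        cell: f"background-color: {color};"
        for cell in cells_to_style(rows, columns, predicate)
    }


if __name__ == "__main__":
    rows = [{"abc_1": 1.0, "abc_2": None, "z": 3.0}, {"abc_1": None, "abc_2": 2.0, "z": 2.0}]
    picked = columns_starting_with(["abc_1", "abc_2", "z"], "abc")
    print(fill(rows, picked, is_missing))
    print(cells_to_style(rows, ["z"], equals(2.0)))
```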

Yeah, that's right. And we pretty seriously air-gapped it early on: we moved all of our data-frame-specific stuff into its own module, so it wasn't spread throughout the code. That really helped. When we rolled out Polars support, it was pretty quick, because we had all our data frame stuff concentrated and could just slot it in. I think that's something I've seen in a lot of libraries more recently: a lot of libraries are adding pandas and Polars support, and so they're keeping their data frame code in one place.
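Keeping every data-frame-specific operation in one module is essentially an adapter layer: the rest of the code calls a few generic functions, and only the adapter knows about each backend. A minimal sketch using functools.singledispatch follows. To stay dependency-free, a dict of lists and a list of dicts stand in for two different frame libraries; that substitution is an assumption for illustration, and this is not how Great Tables' own module is organized.

```python
from functools import singledispatch

# --- Adapter layer: the only place that knows about concrete table types. ---


@singledispatch
def get_column_names(data) -> list:
    raise TypeError(f"unsupported table type: {type(data).__name__}")


@get_column_names.register
def _(data: dict) -> list:
    # Column-oriented stand-in: {column_name: [values, ...]}.
    return list(data)


@get_column_names.register
def _(data: list) -> list:
    # Row-oriented stand-in: [{column_name: value, ...}, ...].
    return list(data[0]) if data else []


@singledispatch
def get_cell(data, row: int, column: str):
    raise TypeError(f"unsupported table type: {type(data).__name__}")


@get_cell.register
def _(data: dict, row: int, column: str):
    return data[column][row]


@get_cell.register
def _(data: list, row: int, column: str):
    return data[row][column]


# --- Backend-agnostic code: talks only to the adapter functions above. ---


def first_row_as_text(data) -> str:
    names = get_column_names(data)
    return ", ".join(f"{n}={get_cell(data, 0, n)}" for n in names)


if __name__ == "__main__":
    print(first_row_as_text({"a": [1, 2], "b": [3, 4]}))
    print(first_row_as_text([{"a": 1, "b": 3}, {"a": 2, "b": 4}]))
```

Adding a new backend then means registering a few more implementations in one file, rather than hunting for type checks across the whole codebase.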

Yeah. I didn't research this, but what is it written in? I know you started in R, but what's the core language Great Tables is written in? Oh, it's Python. It's pretty much all Python. Yeah. Okay. Yeah. I think it's a fair question, though. In some world we would have said, oh, it's a Rust library, you know. Rust is coming on so hot right now.

One of the things I was wondering about: some of the really cool examples you've shown have to do with sports analytics, the kind of thing people would normally have seen online or in a magazine. This idea of adding graphics, is that a difficult thing to do inside these tables?

I mean, it's not difficult for the user. It's a little difficult for us to plan the API for that. It relies on having the graphics on disk or accessible through HTTPS, and then referencing those graphics in a nice way, right? Okay. So we did a few things so that you can use patterns and the like to construct a path, a template for a path, I suppose. Yeah. And then we just made it work, and we have sizing options as well. Graphics in a table are always going to be a little bit difficult because cells are small.
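Building an image path from a template is simple to sketch. Great Tables' own take on this is its fmt_image() method; the stdlib helper below, image_cell, is hypothetical and does not reproduce that method's parameters. It fills a file-name pattern with the cell value, joins it to a base directory or HTTPS prefix, and emits an img tag with a fixed height and alt text. The example.com URL is a placeholder.

```python
from html import escape
from urllib.parse import quote


def image_cell(value: str, base: str, pattern: str = "{}.png", height: str = "30px") -> str:
    """Turn a cell value into an <img> tag by filling a file-name template.

    base may be a local directory or an HTTPS URL prefix; pattern is a
    str.format template where {} is replaced by the cell value.
    """
    file_name = quote(pattern.format(value))  # e.g. spaces become %20
    src = f"{base.rstrip('/')}/{file_name}"
    # Fixed height keeps small cells tidy; alt text keeps the cell readable
    # for screen readers if the image is not announced.
    return f'<img src="{escape(src)}" alt="{escape(value)}" style="height: {height};">'


if __name__ == "__main__":
    for team in ["boston", "new york"]:
        print(image_cell(team, "https://example.com/logos/", pattern="{}.svg"))
```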

Okay. I mean, okay. So Michael mentioned nanoplots. I love those myself. I also just love, as a sort of bag of features, the whole collection of formatting methods. I don't know why I love that so much. It's such a simple thing, but it goes so far. Yeah. Just a simple thing like formatting numbers to a fixed one or two decimal places. Okay. I mean, that's wonderful. It's so simple, but it runs deep. You can have things formatted for specific domains. Yeah. Like financial stuff. Yeah, exactly. Yeah, exactly. So it fits within the expectations of that domain. I think it's great. And there's so much you can do, because it's really just transforming data into another form that's more palatable for a table reader, I guess you could say.
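The formatting methods Rich praises boil down to turning a raw value into a display string that fits a domain's conventions. Here is a stdlib sketch of two such formatters: fixed decimal places with a thousands separator, and a simple currency format with an optional accounting style for negatives. The names echo Great Tables' fmt_number() and fmt_currency() methods, but these functions are hypothetical stand-ins; the real methods offer many more options.

```python
def fmt_number(value: float, decimals: int = 2) -> str:
    """Fixed number of decimal places, with a comma as thousands separator."""
    return f"{value:,.{decimals}f}"


def fmt_currency(
    value: float, symbol: str = "$", decimals: int = 2, accounting: bool = False
) -> str:
    """Currency with a leading symbol.

    Negatives get a minus sign by default, or parentheses when
    accounting=True, a common convention in financial reports.
    """
    magnitude = fmt_number(abs(value), decimals)
    if value < 0:
        return f"({symbol}{magnitude})" if accounting else f"-{symbol}{magnitude}"
    return f"{symbol}{magnitude}"


if __name__ == "__main__":
    print(fmt_number(1234.5678))                 # 1,234.57
    print(fmt_currency(-42.5, accounting=True))  # ($42.50)
```

The underlying numbers never change; only their presentation does, which is why the same column can be shown one way in a finance table and another way elsewhere.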

One of the things you mentioned, Michael, about coming on board was creating this really great set of examples to get going. Are there ones you've wanted to add that you haven't gotten to yet? Are there additional things where you're like, oh, I want to show examples of this? Or have the examples rolled out as the features have rolled out?

Yeah, it's a good question. I feel like there's a ton that we want to add. We just did, well, what is this, the third table contest, Rich? How many table contests have there been? This might be the fourth. Okay. Wow. Okay. That sounds fun. It's hard to keep count. So yeah, we have this table contest. It's great. Posit puts it on. It's a celebration of tables, with lots of prizes. You just submit tables and the code. And wow, we posted on it, and I hope people are just agog at all the tables.

Yeah. Previous contests. Yeah. I think the winners are being announced within a couple of days, too, for this contest. Okay. So we might have it. We should have it then. We should have the link. Okay. But yeah, in terms of coming up with examples, even the existence of the table contest confused me. I was like, oh, of course, you've been running contests where people submit tables for the past three years. Right. Just classic open source software development. With this latest contest, I think there were around 60 submissions? Yeah. About that. Yeah. And they're so inspiring. Every single time this thing rolls around, I'm like, whoa, that's neat. I've seen some crazy stuff. You know, I've heard this about video game designers: they make the games, but sometimes they're not so good at the game. The players are way better. It's kind of like that. We make tables too, but they're not as good as the ones people in the community make. Like speedrunning and things like that. Yeah, exactly. Yeah. It's exactly that same parallel.

Quarto and documentation

Can we bring up quartodoc? Because we probably wouldn't have such good examples or a great project website without quartodoc. I mean, not as easily, for sure. This is Michael's thing. This is Michael's thing. Yeah. I've wanted to get somebody on to talk about that generally. You're saying Quarto, or... I'm not going to pronounce it right. Yeah, quartodoc. It's kind of tricky. I think Rich is schmoozing me, because I maintain a tool called quartodoc, which generates API documentation for Python. So the Great Tables API reference is powered by a tool called quartodoc. Okay. I get the confusion, because there's a tool, quartodoc, that works with a tool called Quarto. Yeah. Yeah. So Quarto is kind of the real star. It's a tool that makes it easy to produce reports. So if you're doing a data analysis, you can generate an HTML report, going beyond a standard notebook to something a little more presentable but still interactive. Yeah. And it uses a really nice format called QMD. It's all text-based. Okay. It's like beefed-up Markdown, where you can put in code or Markdown. The Great Tables documentation is built with Quarto. Okay. So that's a neat thing. You can create a single standalone report, a full-fledged website, or slides pretty easily.

And I do think it's funny in open source: lowering the barrier for executing code and having examples in your docs is, I think, surprisingly impactful. As it turns out, if there's a little bit of friction, you just don't add examples. Right. One kind of nuts case is the Great Tables docstrings. If you look in the methods, there are tons of really long examples, and the examples produce pretty rich outputs. That's a little bit unusual. If you look at tools like pandas or Polars, or things that use Sphinx for documentation, or just the standard built-in Python doc format, what's it called, where you use three greater-than signs? The dots. Oh yeah. The examples tend to be pretty short because you hard-code the outputs of your examples. Right. In your docstring, to be able to run doctest on it. Right. Yeah. Right. Yeah. Doctest. But that creates a kind of funny cycle where, as you're writing docs, you're like, oh, I have to create an example small enough to hard-code the output. Yeah. Okay. For a table library, that's a bananas task, because we have to output a table. So it's nice that with Quarto we're just executing large blocks of code, and we're able to give people pretty rich docstrings with nice long examples. That's been a really nice cycle to get into.
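Michael's point about hard-coded outputs is easiest to see with a doctest. In the sketch below (format_ratio is a hypothetical function), the expected output sits verbatim in the docstring after the >>> prompt, and the standard-library doctest module reports a failure if the actual output differs. That is painless for a one-line string, but pasting a fully rendered table into every docstring is exactly the friction he describes.

```python
import doctest


def format_ratio(numerator: float, denominator: float) -> str:
    """Format a ratio as a percentage with one decimal place.

    Each >>> line is executed, and its output must match the next line exactly:

    >>> format_ratio(1, 3)
    '33.3%'
    >>> format_ratio(0, 5)
    '0.0%'
    """
    return f"{numerator / denominator:.1%}"


if __name__ == "__main__":
    # Re-run every >>> example in this module; silent on success.
    doctest.testmod()
```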


Tidy Tuesday

I wanted to build on top of the contest thing. I feel like this is semi-related, and I haven't had anybody on to talk about it, but I wanted to talk about Tidy Tuesday a little bit. Sorry, is that the right name? Yeah. Yeah. Yeah. We were talking right before we began, and you said that you have some experience and have been involved in it somewhat. So maybe you can tell us a little bit about what it is and how you're involved with it.

I mean, I heard about it a long time ago. It used to be a thing, it's not so much now, but in R we had a lot of social media activity on Twitter, mainly through the #rstats hashtag, and I think also the Tidy Tuesday hashtag. It used to be full of stuff. My feed would just be, oh my God, filled with examples from Tidy Tuesday. I would know what the Tidy Tuesday dataset was because I'd see tons of examples on the dataset of the week. It was wild to see that. It was a kind of overload, but it was always inspiring. I think I owe a lot to the examples from Tidy Tuesday; seeing all this stuff is, like I said, really inspiring. So I want to inspire people too, obviously, in my own work. The spirit of Tidy Tuesday is wrapped up in gt and Great Tables.

Yeah. I use it a lot for mentorship, which has been really nice, because there's this funny thing where, if you watch certain tutorials, sometimes things are a little bit too clean. And I think it's really good for mentorship. Yeah. You're like, wow, this really worked out sweet. This path was nice.

It's not very real-world at all. It's the whole 80/20 thing: 80% of your time is just getting the data ready so you can start working with it. Right. And I do think those tutorials are valuable in their own right to get people started. But Tidy Tuesday has been really nice for saying, let's see what will go wrong. Let's see what happens when you hit an error and you have no idea what caused it, or your CSV won't read for mysterious reasons. It's been really good for data science mentorship, not only to show people that and have them try it, but to hit it myself and go, hmm, I don't know why this broke, or to reshape data and just work through it. And then to figure out what I did: how did I even approach this problem?

Yeah. So there's a really great data scientist, Dave Robinson. He does this all in R, but he used to record himself once a week for an hour analyzing Tidy Tuesday data. I found it so inspirational. Getting to watch him work through analyses live and see how quickly he could do things was one of the big inspirations for working on siuba and tools for data wrangling. I also got to see how he edits his code, because when people work, they go back, they adjust code, they move things around. Right. Right. Yeah. That's really hard to capture when you write something out. So just seeing what the first five minutes look like. How does he get to a plot quickly? And what would I have to do to keep pace with him?

So I've really enjoyed those aspects: the more in-the-weeds, rough, unclean, realistic analytic settings. Yeah. It feels like this is the data community's answer to code challenges and things like that. You know, here's something to puzzle through, like Advent of Code and other things in that vein.

Yeah. It has amazing staying power. And Dave has a lot of this stuff on YouTube, right? Pretty much