
Making Things Nice in Python (Rich Iannone, Posit) | posit::conf(2025)
Speaker(s): Rich Iannone

Abstract: When working on the Great Tables and Pointblank Python packages, we've tried to make them 'nice'. These packages give you a lot of convenient options and a large volume of docs and examples. In the Python world, this might be received differently than it would be in R. Whether it was integrating Polars selectors in Great Tables or accepting a multitude of DataFrames and DB tables in Pointblank, these design choices can be seen as surprising to established Python developers. However, I argue it's good to be doing this! People are benefitting from these approaches. I'll share a few of these developer stories, with the takeaway being that Python packages could and should pay attention to good user experience.
Transcript
This transcript was generated automatically and may contain errors.
Well, I think users generally like nice things in their packages. But what is nice? Here are a few things I want users to think when experiencing niceties in a package. One is: the maintainer understands that different users can have different needs. The next one is: the package made something otherwise cumbersome pretty easy to deal with. Another: learning how to use the package was a snap, thanks to the great docs. Lastly: the provided datasets are convenient and make it easy to try out the package.
Now I'll provide examples on how I added nice things to two Python packages, namely pointblank and Great Tables. So we'll start with the first one, where the maintainer understands that different users can have different needs.
Supporting many table types in pointblank
First off, let's get a quick introduction to pointblank. It's a package for validating tables. Do we have duplicate rows? Are column values in the defined range? We can find this out. We define a validation plan, we interrogate the data table, and you get a nice report in the end.
But let's look at the `data` parameter in this pointblank validation code. This is the part I really wanted to make nice. Many other libraries that accept a table tend to support only one or a few types of tables, which gets kind of restrictive. The problem with this is that users whose table types aren't supported are generally less likely to use the package.
I really wanted to, just like the R version of pointblank, support many types of tables. This includes a variety of DataFrames and a plethora of database tables. Luckily, I didn't really have to write too much custom code to support each of these table types, because there are actually two Python libraries that could help make "bring your own table" pretty painless.
The implementation uses two nice libraries: Narwhals and Ibis. Narwhals can handle all sorts of DataFrames, like Polars, pandas, and Spark, and Ibis handles a huge number of other types of tables, like many database tables. By incorporating these libraries into pointblank, it was actually a cinch to support all the tables that Narwhals and Ibis can work with.
Supporting all sorts of tables meant that we had to build two internal processing pipelines, one for Narwhals and the other for Ibis. This doubled the amount of code to maintain, but it still felt like a win, because it was so nice for the user. We can see in this diagram that the user just has to provide a data table, and the program will handle it in one of two code paths. The way it was set up ensured that the user will see the correct results no matter what type of table was used, and crucially, the user will get output artifacts in the same format as the input.

Not very long ago, Narwhals gained the ability to directly use Ibis tables. This was amazing, since it meant that pointblank, with a little work, would no longer need branching code for both Narwhals and Ibis. It just goes to show that Narwhals is itself a very nice Python package.

Anyway, we jumped at the opportunity to simplify our code, and this is not the only way that Narwhals got nicer. Narwhals gained the ability to use Spark DataFrames, so pointblank no longer needed a Spark DataFrame to first be converted to an Ibis table. For a Spark user, this is so much nicer: no need to transform the DataFrame before using it in pointblank. In the future, we can go even further by removing the user's need to prepare Ibis tables for other table types.
Making column selection easier in Great Tables
Let's now get to know the Great Tables package. As the name might imply, it's a package for making tables. One could say it's a package for creating beautiful tables for display purposes. On the left is a chunk of Great Tables code. On the right is a very nice-looking table. It's the kind of table that would look great in a Quarto document or in a Shiny app.
Because we are using a DataFrame to create a table for display, we easily run into the problem that dealing with lots of table columns can be a bit tedious for the user. Some tables just have lots of columns, and with the way that Great Tables works, we often have to declare column names in order to format values within them, or to style cells.
We can always use lists of column names with the Great Tables API. Here's an example where we are trying to add a column spanner above four different columns. Notice that the user has to write out in full each of the column names. It's a lot of typing. Of course, this is problematic, and we could and should make the experience nicer for users of the package.
The solution to this problem was to incorporate the use of Polars column selectors with the Great Tables package. Polars, if you don't know it, is a very nice and performant DataFrame library. The column selectors that it offers are very similar to the tidyselect helper functions; you might have used those if you've worked with tidyverse packages like dplyr.
Now, compare the verbose code block with all the column names with the code at the bottom. Using a selector to reference the common piece of text across all four column names makes for a shorter chunk of Great Tables code, and to me, it's quite a bit more readable, owing to its brevity. Using Polars selectors in Great Tables is what I would call extra nice.
If you are coming from gt, which is the R version of Great Tables, this is similar to the tidyselect helpers used in that package. Let's compare some bits of tidyselect with Polars selectors. As you can see in this comparison table, much of the nomenclature is the same. Sure, everything() becomes all() in Polars, and by_name() does the work of two tidyselect helper functions, but by and large, these separate implementations of selectors largely use the same naming.
Having selectors available can encourage the use of smart column naming. Let's have a look at the small table of column name motifs and how Great Tables code with column selectors can take advantage of certain styles of naming.
If you have columns that end with `_pct`, they are all percentage values, and you'd probably want to format them as such. Using fmt_percent() along with the ends_with() column selector, we can perform percentage formatting all in one statement. Maybe you have columns with count values. One naming convention you could hold to would be prefixing those column names with `n_`. The Great Tables formatter fmt_integer() could then be used with the starts_with() column selector, and all those count columns can be formatted.
Specific names like lat and lon, for latitude and longitude, could get the fmt_number() treatment; just use the by_name() selector to programmatically pass in those columns. Styling in Great Tables can also take advantage of Polars column selectors. Columns that begin with `id_` are understood as ID values. The tab_style() method requires locations as a parameter, and if you want to target body cells, you can use loc.body(). That itself has a columns parameter, and we can pass in starts_with() along with the `id_` text fragment. All your ID columns will then be styled. In this example, we set the font as monospace, but you can do anything you want.
I just want to conclude this whole section, section 2, by saying that having to enter a ton of column names is not nice. Using column selectors? Very nice. On top of that, having some consistency with the R version of Great Tables (and I'm talking about gt) lowers the switching cost if you're coming from R and moving to Python.
Making documentation great
The next section, section 3, is all about making the documentation great in the project website. This is a nice thing to do because it may elicit the response, learning how to use the package was a snap thanks to the great docs.
This slice of nice is all about making a nice project website. The challenge with the pointblank API was the bigness of it all. It has 7 classes, 41 methods, and 25 functions. Not exactly a small package, to say the least. So making the documentation nice for first-time users and returning users is essential. And docs are totally my bag, so I was fired up to make them nice anyway.
I wanted to make four really nice corners of the project website. They are the user guide, the examples gallery, the API reference, and the pointblank blog. And I had goals for each of these sections. For the user guide, I really wanted to have a treasure trove of useful information. And having tons of info is useless unless you have instructional chops. We used a spiral sequence to introduce large topics and then cover bits of those topics in subsequent sections. Finally, links from the user guide to the appropriate parts of the API reference were deemed essential.
We wanted to build out an examples gallery with tons of examples that are easy to understand, aren't text-heavy, and cover a lot of ground. The API reference had to be nicer than nice. I wanted it to be comprehensive, have wonderful examples, and I wanted each article to be easy to navigate. Finally, the pointblank blog. It's really a blog, and the goals were to keep things interesting, post often, and have it provide insights that the rest of the docs don't really have.
We continuously improved the pointblank user guide. It started off small, but it now has 26 pages. As I mentioned before, we embraced the spiral sequence. What that does is balance introductory breadth with subsequent depth. Put another way, what we do in practice is introduce parts of a large concept and then break it down in small sections.
As the name of the game with user guides is continuous improvement, we kept improving on the quality of the examples provided. The examples typically consist of code, output, or, usually, both. We tried to place the example early in each new section. To make it really nice, we show you the actual output you'll see in your own environment.
The pointblank website has a gallery of examples. We now have 25 examples, and they cover a wide range of use cases. To make the examples gallery the best it can be (in other words, really, really nice), we followed a particular style. First, we kept exposition down to a minimum; the code and output were the focus of each example page. Secondly, we flipped each example around and presented the output first, and then the code needed to produce that output. This is nice because you start each example page with an interesting visual.
Next, we wanted to reduce any sort of example noise, so any code statements that didn't serve a purpose for teaching the main concept of the example were eliminated. We did provide code comments in the example code. They're usually off to the side, and they are mainly for clarifying the code. We also provided a preview table of the input data table, and that's at the bottom of each example page and initially hidden.
The API documentation part of the website has been overhauled to add more structure. We were able to make quite a few improvements over the defaults. The first is a visual indicator that shows us whether we're looking at documentation for a class, a method, or a function. The summary line just below that gives readers quick info on what the class, method, or function does. The usage information is displayed next, and under that, we put one or more description paragraphs underneath. Any further information is typically provided as special sections further down in the doc. The parameter section, since it has very important information, is clearly marked with an outline. The example section has even more prominent styling. This is helpful for getting a sense of place, as one might scroll quickly toward the bottom of the doc where the example section is placed.
pointblank has a blog on the package website, and actually Great Tables has a blog on its own package website too. Blogs for a single package are nice, and since we use Quarto and quartodoc for the package websites, adding a blog is very easy. We put up these blogs to keep people informed of new ideas. It's also useful just as a place to talk about the subject that the package is trying to address. So in the pointblank blog, we've covered topics in data validation, and in the Great Tables blog, we've talked more generally about tables at certain points.
Adding datasets to a Python package
Well, that's all I wanted to say about making documentation great for a package website. Let's move on to the fourth section, and the topic there is adding lots of datasets to a Python package.
Great Tables has a lot of datasets. Here they all are. There are 16 datasets in total, and each one has a decorative medallion. Those serve to make the datasets nice and fun.
Having lots of datasets in Great Tables means we can provide great examples with them. For example, if I wanted to show a super text-heavy table, I would use the peeps dataset. If the demonstration involves adding graphics to the table, then I would likely use the metro dataset; I use the fmt_image() method to get those SVG graphics into the table cells. If I wanted to show users how to make nanoplots, I would probably reach for the illness dataset. And as a final example of how datasets make for nice examples, I like the films dataset when showing tables with country flag icons within them.
From what I've seen, not too many Python packages include data. Nevertheless, I still think it's nice to have them. And there's no real convention for datasets in Python packages, both in the including part and in the consuming part.
In terms of dataset access, we did things a little differently in Great Tables and pointblank. Both are nice, I just want to point them both out in case you want to try including datasets in your own Python package. In Great Tables, we expose a data module. The user would then import any dataset needed. This is easy for the user and great for the docs as each dataset would get its own documentation page.
In pointblank, it's a little different. That package provides the load_dataset() function. You can choose from multiple datasets and output table types with that function. Again, there are no real conventions for datasets in Python packages, but both these approaches are nice enough.
Providing a selection of datasets in a package is good for users in other ways. One way it helps a lot is when experimenting with the API. It's something of a pain to create your own data frames on account of all the typing you have to do for what is essentially throwaway experimentation code. A good pattern instead is to draw from a set of datasets and pick one close to your immediate needs. You might trim it down a bit before testing it out, but it's a good pattern.
With Great Tables, a user might use the exibble dataset to test out formatting methods. It's small, and the rendered table will easily fit on a screen. If you want to make a table that takes advantage of categorical value columns, you might use the pizzaplace dataset. It's a big dataset, representing an entire year of pizza sales from a pizza place. For that reason, the user might just want one slice of it. As a final example of a dataset that's nice and useful for the user, the photolysis dataset is great if you want to experiment with nanoplots.
Well, that's really everything I wanted to say about datasets. They are good to have and really add a lot to a Python package.
Closing notes
I'm all done with the examples on how I added niceties to two Python packages. Let's move on to some closing notes right now. Having lots of nice things in a Python package will probably be appreciated by your users. As for myself, I'll keep on improving the packages I work on. There are real benefits to making packages nice.
There can be lowered friction for the user, where many small conveniences can go a long way. Users may get productive faster. Whenever you can reduce boilerplate, people will get meaningful results sooner. If you have nice docs and datasets, you might not have to address as many issues. Things will just be more clear for the user. And lastly, a strong set of nice qualities will lead to stronger credibility for the project and likely better discovery. After all, the package will be easier to showcase and easier for others to recommend.
This concludes the talk. I hope that some of the information here is useful to your own Python package work. Thanks for watching.

