
Making Things Nice in Python (Rich Iannone, Posit) | posit::conf(2025)
Speaker(s): Rich Iannone

Abstract: When working on the Great Tables and Pointblank Python packages, we've tried to make them 'nice'. These packages give you a lot of convenient options and a large volume of docs and examples. In the Python world, this might be received differently than it would be in R. Whether it was integrating Polars selectors in Great Tables or accepting a multitude of DataFrames and DB tables in Pointblank, these design choices can be seen as surprising to established Python developers. However, I argue it's good to be doing this! People are benefitting from these approaches. I'll share a few of these developer stories, with the takeaway being that Python packages could and should pay attention to good user experience.
Transcript
This transcript was generated automatically and may contain errors.
Well, I think users generally like nice things in their packages. But what is nice? Here are a few things I want users to think when experiencing niceties in a package. One is: the maintainer understands that different users can have different needs. The next one is: the package made something otherwise cumbersome pretty easy to deal with. Another: learning how to use the package was a snap, thanks to the great docs. Lastly: the provided datasets are convenient and make it easy to try out the package.
Now I'll provide examples on how I added nice things to two Python packages, namely pointblank and Great Tables. So we'll start with the first one, where the maintainer understands that different users can have different needs.
Supporting many table types in pointblank
First off, let's get a quick introduction to pointblank. It's a package for validating tables. Do we have duplicate rows? Are column values in the defined range? We can find this out. We define a validation plan, we interrogate the data table, and you get a nice report in the end.
But let's look at the `data` parameter in this pointblank validation code. This is the part I really wanted to make nice. Many other libraries that accept a table tend to support only one or a few types of tables, which gets kind of restrictive. The problem with this is that users whose table types aren't supported are generally less likely to use the package.
I really wanted to, just like the R version of pointblank, support many types of tables. This includes a variety of DataFrames and a plethora of database tables. Luckily, I didn't really have to write too much custom code to support each of these table types, because there are actually two Python libraries that could help make "bring your own table" pretty painless.
The implementation uses two nice libraries: Narwhals and Ibis. Narwhals can handle all sorts of DataFrames, like Polars, pandas, and Spark, and Ibis handles a huge number of other types of tables, like many database tables. By incorporating these libraries into pointblank, it was actually a cinch to support all the tables that Narwhals and Ibis can work with.
Supporting all sorts of tables meant that we had to build two internal processing pipelines, one for Narwhals and the other for Ibis. This doubled the amount of code to maintain, but it still felt like a win, because it was so nice for the user. We can see in this diagram that the user just has to provide a data table, and the program will handle it in one of two code paths. The way it was set up ensured that the user will see the correct results no matter what type of table was used, and crucially, the user will get output artifacts in the same format as the input.

Not very long ago, Narwhals gained the ability to directly use Ibis tables. This was amazing, since it meant that pointblank, with a little work, would no longer need branching code for both Narwhals and Ibis. It just goes to show that Narwhals is itself a very nice Python package.

Anyway, we jumped at the opportunity to simplify our code, and this is not the only way that Narwhals got nicer. Narwhals gained the ability to use Spark DataFrames, so pointblank no longer needed a Spark DataFrame to first be converted to an Ibis table. For a Spark user, this is so much nicer: no need to transform the DataFrame before using it in pointblank. In the future, we can go even further by removing the user's need to prepare Ibis tables for other table types.
Making column selection easier in Great Tables
Let's now get to know the Great Tables package. As the name might imply, it's a package for making tables. One could say it's a package for creating beautiful tables for display purposes. On the left is a chunk of Great Tables code. On the right is a very nice-looking table. It's the kind of table that would look great in a Quarto document or in a Shiny app.
Because we are using a DataFrame to create a table for display, we easily run into the problem that dealing with lots of table columns can be a bit tedious for the user. Some tables just have lots of columns, and with the way that Great Tables works, we often have to declare column names in order to format values within them, or to style cells.
We can always use lists of column names with the Great Tables API. Here's an example where we are trying to add a column spanner above four different columns. Notice that the user has to write out in full each of the column names. It's a lot of typing. Of course, this is problematic, and we could and should make the experience nicer for users of the package.
The solution to this problem was to incorporate the use of Polars column selectors with the Great Tables package. Polars, if you don't know it, is a very nice and performant DataFrame library. The column selectors that it offers are very similar to the tidyselect helper functions; you might have used those if you've worked with tidyverse packages like dplyr.
Now, compare the verbose code block with all the column names with the code at the bottom. Using a selector to reference the common piece of text across all four column names makes for a shorter chunk of Great Tables code, and to me, it's quite a bit more readable, owing to its brevity. Using Polars selectors in Great Tables is what I would call extra nice.
If you are coming from gt, which is the R version of Great Tables, this is similar to the tidyselect helpers used in that package. Let's compare some bits of tidyselect with Polars selectors. As you can see in this comparison table, much of the nomenclature is the same. Sure, everything() becomes all() in Polars, and by_name() does the work of two tidyselect helper functions, but by and large, these separate implementations of selectors largely use the same naming.
Having selectors available can encourage the use of smart column naming. Let's have a look at the small table of column name motifs and how Great Tables code with column selectors can take advantage of certain styles of naming.
If you have columns that end with `_pct`, they are all percentage values, and you'd probably want to format them as such. Using fmt_percent() along with the ends_with() column selector, we can perform percentage formatting all in one statement. Maybe you have columns with count values. One naming convention you could hold to would be prefixing those column names with `n_`. The Great Tables formatter fmt_integer() could then be used with the starts_with() column selector, and all those count columns can be formatted.
Specific names like lat and lon, for latitude and longitude, could get the fmt_number() treatment; just use the by_name() selector to programmatically pass in those columns. Styling in Great Tables can also take advantage of Polars column selectors. Columns that begin with `id_` are understood as ID values. The tab_style() method requires locations as a parameter, and if you want to target body cells, you can use loc.body(). That itself has a columns parameter, and we can pass in starts_with() along with the `id_` text fragment. All your ID columns will then be styled. In this example, we set the font as monospace, but you can do anything you want.
I just want to conclude this whole section, section 2, by saying that having to enter a ton of column names is not nice. Using column selectors? Very nice. On top of that, having some consistency with the R version of Great Tables (and I'm talking about gt) lowers the switching cost if you're coming from R and moving to Python.
Making documentation great
The next section, section 3, is all about making the documentation great in the project website. This is a nice thing to do because it may elicit the response, learning how to use the package was a snap thanks to the great docs.
This slice of nice is all about making a nice project website. The challenge with the pointblank API was the bigness of it all. It has 7 classes, 41 methods, and 25 functions. Not exactly a small package, to say the least. So making the documentation nice for first-time users and returning users is essential. And docs are totally my bag, so I was fired up to make them nice anyway.
I wanted to make four really nice corners of the project website. They are the user guide, the examples gallery, the API reference, and the pointblank blog. And I had goals for each of these sections. For the user guide, I really wanted to have a treasure trove of useful information. And having tons of info is useless unless you have instructional chops. We used a spiral sequence to introduce large topics and then cover bits of those topics in subsequent sections. Finally, links from the user guide to the appropriate parts of the API reference were deemed essential.
We wanted to build out an examples gallery with tons of examples that are easy to understand, aren't text-heavy, and cover a lot of ground. The API reference had to be nicer than nice. I wanted it to be comprehensive, have wonderful examples, and I wanted each article to be easy to navigate. Finally, the pointblank blog. It's really a blog, and the goals were to keep things interesting, post often, and have it provide insights that the rest of the docs don't really have.
We continuously improved the pointblank user guide. It started off small, but it now has 26 pages. As I mentioned before, we embraced the spiral sequence. What that does is balance introductory breadth with subsequent depth. Put another way, what we do in practice is introduce parts of a large concept and then break it down in small sections.
As the name of the game with user guides is continuous improvement, we kept improving on the quality of the examples provided. The examples typically consist of code, output, or, usually, both. We tried to place the example early in each new section. To make it really nice, we show you the actual output you'll see in your own environment.
The pointblank website has a gallery of examples. We now have 25 examples, and they cover a wide range of use cases. To make the examples gallery the best it can be (in other words, really, really nice), we followed a particular style. First, we kept exposition down to a minimum; the code and output were the focus of each example page. Secondly, we flipped each example around and presented the output first, and then the code needed to produce that output. This is nice because you start each example page with an interesting visual.
Next, we wanted to reduce any sort of example noise, so any code statements that didn't serve a purpose for teaching the main concept of the example were eliminated. We did provide code comments in the example code. They're usually off to the side, and they are mainly for clarifying the code. We also provided a preview table of the input data table, and that's at the bottom of each example page and initially hidden.
The API documentation part of the website has been overhauled to add more structure. We were able to make quite a few improvements over the defaults. The first is a visual indicator that shows us whether we're looking at documentation for a class, a method, or a function. The summary line just below that gives readers quick info on what the class, method, or function does. The usage information is displayed next, and under that, we put one or more description paragraphs underneath. Any further information is typically provided as special sections further down in the doc. The parameter section, since it has very important information, is clearly marked with an outline. The example section has even more prominent styling. This is helpful for getting a sense of place, as one might scroll quickly toward the bottom of the doc where the example section is placed.
pointblank has a blog on the package website, and actually Great Tables has a blog on its own package website too. Blogs for a single package are nice, and since we use Quarto and quartodoc for the package websites, adding a blog is very easy. We put up these blogs to keep people informed of new ideas. It's also useful just as a place to talk about the subject that the package is trying to address. So in the pointblank blog, we've covered topics in data validation, and in the Great Tables blog, we've talked more generally about tables at certain points.
Adding datasets to a Python package
Well, that's all I wanted to say about making documentation great for a package website. Let's move on to the fourth section, and the topic there is adding lots of datasets to a Python package.
Great Tables has a lot of datasets. Here they all are. There are 16 datasets in total, and each one has a decorative medallion. Those serve to make the datasets nice and fun.
Having lots of datasets in Great Tables means we can provide great examples with them. For example, if I wanted to show a super text-heavy table, I would use the peeps dataset. If the demonstration involves adding graphics to the table, then I would likely use the metro dataset; I use the fmt_image() method to get those SVG graphics into the table cells. If I wanted to show users how to make nanoplots, I would probably reach for the illness dataset. And as a final example of how datasets make for nice examples, I like the films dataset when showing tables with country flag icons within them.
From what I've seen, not too many Python packages include data. Nevertheless, I still think it's nice to have them. And there's no real convention for datasets in Python packages, both in the including part and in the consuming part.
In terms of dataset access, we did things a little differently in Great Tables and pointblank. Both are nice, I just want to point them both out in case you want to try including datasets in your own Python package. In Great Tables, we expose a data module. The user would then import any dataset needed. This is easy for the user and great for the docs as each dataset would get its own documentation page.
In pointblank, it's a little different. That package provides the load_dataset() function. You can choose from multiple datasets and output table types with that function. Again, there are no real conventions for datasets in Python packages, but both these approaches are nice enough.
Providing a selection of datasets in a package is good for users in other ways. One way it helps a lot is when experimenting with the API. It's something of a pain to create your own data frames on account of all the typing you have to do for what is essentially throwaway experimentation code. A good pattern instead is to draw from a set of datasets and pick one close to your immediate needs. You might trim it down a bit before testing it out, but it's a good pattern.
With Great Tables, a user might use the exibble dataset to test out formatting methods. It's small, and the rendered table will easily fit on a screen. If you want to make a table that takes advantage of categorical value columns, you might use the pizzaplace dataset. It's a big dataset, representing an entire year of pizza sales from a pizza place. For that reason, the user might just want one slice of it. As a final example of a dataset that's nice and useful for the user, the photolysis dataset is great if you want to experiment with nanoplots.
Well, that's really everything I wanted to say about datasets. They are good to have and really add a lot to a Python package.
Closing notes
I'm all done with the examples on how I added niceties to two Python packages. Let's move on to some closing notes right now. Having lots of nice things in a Python package will probably be appreciated by your users. As for myself, I'll keep on improving the packages I work on. There are real benefits to making packages nice.
There can be lowered friction for the user, where many small conveniences can go a long way. Users may get productive faster. Whenever you can reduce boilerplate, people will get meaningful results sooner. If you have nice docs and datasets, you might not have to address as many issues. Things will just be more clear for the user. And lastly, a strong set of nice qualities will lead to stronger credibility for the project and likely better discovery. After all, the package will be easier to showcase and easier for others to recommend.
This concludes the talk. I hope that some of the information here is useful to your own Python package work. Thanks for watching.

