
How to name files - Jennifer Bryan

Low-tech common sense about filenames. The holy trinity is: machine readable, human readable, and sorted in a useful way. More at https://github.com/jennybc/how-to-name-files and https://normconf.com/

Dec 4, 2022
5 min


Transcript

This transcript was generated automatically and may contain errors.

I'm here to talk about an exceedingly low-tech, very normcore topic, which is how to name files. My name is Jenny Bryan. I'm a software engineer at Posit, formerly known as RStudio. I work on a team that maintains a bunch of open source R packages known as the tidyverse.

While I was procrastinating on preparing this talk, I watched the latest Jason Bourne movie, which actually has this great little bit of filename porn within it. These filenames aren't exactly my style, but I have to say there's a lot of good stuff going on here with these filenames.

My overall goal is to inspire you to have a filename convention, even if it differs from mine. It will eliminate a lot of irritating micro-decisions and make your filenames easier to compute on downstream.

Here's a before-and-after comparison of filenames. Up top, the before filenames are problematic because they're generic or mysterious or contain challenging characters, whereas the after filenames down below are easy to read and easy to compute on.

You want a system that gives you filenames that are machine readable, human readable, and sorted in a way that's useful to you.


Machine readable filenames

The most basic thing you might want to do is use globbing to isolate files that match a simple pattern. A more sophisticated operation that comes up is the need to extract data from a filename using a regular expression. I'm showing a bit of R code that turns a bunch of filenames into a data frame, and I assume you can do this in your preferred language.
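The R code from the slide isn't reproduced here. As a rough stand-in, this Python sketch shows both operations: `fnmatch.filter` applies a shell-style glob to a list of names, and a regular expression with named groups pulls each matching name apart into a dict, one "row" per file, much like a data frame. The filenames, the `date_kind_slug.ext` layout, and the helper names `select` and `to_records` are all hypothetical.

```python
import fnmatch
import re

# Hypothetical filenames following a date_kind_slug.ext convention.
FILENAMES = [
    "2022-11-30_assay_plate-a.csv",
    "2022-12-01_assay_plate-b.csv",
    "2022-12-01_notes_plate-b.txt",
    "2022-12-04_assay_plate-c.csv",
]

# Named groups: ISO date, file kind, descriptive slug, extension.
PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_(?P<kind>[a-z]+)_(?P<slug>[a-z0-9-]+)\.(?P<ext>\w+)$"
)


def select(names, glob):
    """Keep only the names that match a shell-style glob pattern."""
    return fnmatch.filter(names, glob)


def to_records(names):
    """Turn each filename into a dict of its fields (one 'row' per file)."""
    rows = []
    for name in names:
        match = PATTERN.match(name)
        if match:  # skip names that break the convention
            rows.append(match.groupdict())
    return rows


if __name__ == "__main__":
    assays = select(FILENAMES, "*_assay_*.csv")
    for row in to_records(assays):
        print(row)
```

The glob keeps the three `assay` CSVs and drops the notes file; the regex then yields their date, kind, slug, and extension as separate fields.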

I use the underscore to delimit fields and the hyphen to separate words within fields. I think it has to be this way because a very common field is a date and a hyphen is how we separate year from month from day.
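A minimal Python sketch of that two-level convention, using a hypothetical filename: split the stem on underscores to get fields, then split each field on hyphens to get words. Because the date keeps its hyphens, it stays a single field and still breaks cleanly into year, month, and day. The `parse_fields` name is illustrative only.

```python
from pathlib import Path


def parse_fields(filename):
    """Split a filename into fields on '_' and each field into words on '-'.

    Assumes the convention described above: underscore between fields,
    hyphen between words within a field (including the parts of a date).
    """
    stem = Path(filename).stem  # drop the extension, e.g. ".md"
    return [field.split("-") for field in stem.split("_")]


# Hypothetical filename: date field, venue field, title field.
fields = parse_fields("2022-12-04_normconf_how-to-name-files.md")
year, month, day = fields[0]
```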

The overall guiding principle is to make things easy for future you. So a machine readable filename is easy to target with glob patterns and regular expressions, and it's going to have a very intentional use of delimiters.

Human readable filenames

Let's say you drop in on a project at 3 a.m. before a deadline. Which set of filenames are you hoping to see? I think it's the ones on the right, because they embrace the slug. We're borrowing this term from the web, where a slug is the human-readable part of a URL, and a good one indicates something about the page content.
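To make slugs consistently, you could use a small helper. The `slugify` function below is a hypothetical Python sketch, not something from the talk: it strips accents, lowercases the text, and collapses every run of other characters (including underscores, which the convention above reserves for separating fields) into a single hyphen.

```python
import re
import unicodedata


def slugify(text):
    """Turn a free-text description into a lowercase, hyphen-separated slug.

    Hypothetical helper: drops accents, replaces each run of characters
    outside a-z and 0-9 with one hyphen, and trims hyphens at the ends.
    """
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
```

For example, `slugify("Plot Heat Map (final!)")` gives `plot-heat-map-final`.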

So human readable filenames make it easy for humans to infer what the content or the purpose of a file is based on its name.

Sorting and dates

Here are two examples of useful sort order, logical and chronological. On the left are files from a small data analysis pipeline that basically sort in order of execution. On the right are some ad hoc housekeeping scripts where I record the date I did a little piece of work.

I want to put in a plug for left-padding numbers with zeros, so that you don't end up in the sad situation you see up top, where the last script in a pipeline sorts before the first one. Left padding fixes this problem for you.
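A quick Python check of the problem and the fix, using hypothetical script names. With unpadded numbers, `10_` sorts before `1_` because the character `0` comes before `_`. Zero-padding every step number to the same width restores execution order. The `pad_step` helper is illustrative, and its `width=2` default assumes fewer than 100 steps.

```python
def pad_step(step, slug, width=2):
    """Build a script name whose step number is zero-padded to `width` digits."""
    return f"{step:0{width}d}_{slug}.py"


# Hypothetical three-step pipeline.
steps = [(1, "clean-data"), (2, "fit-model"), (10, "make-report")]

unpadded = sorted(f"{n}_{slug}.py" for n, slug in steps)
padded = sorted(pad_step(n, slug) for n, slug in steps)

print(unpadded)  # ['10_make-report.py', '1_clean-data.py', '2_fit-model.py']
print(padded)    # ['01_clean-data.py', '02_fit-model.py', '10_make-report.py']
```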

I live in Canada now, but I was born in the U.S., so I'm going to apologize on behalf of that nation for the horrible things that Americans do with dates. In any context that is remotely data-oriented, all dates should follow the ISO 8601 standard, which puts year, then month, then day (YYYY-MM-DD). That is all.
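To see why the order of the parts matters, here is a small Python sketch with hypothetical dates. Sorted as strings in a US-style month-day-year format, the dates end up scrambled; sorted as `date.isoformat()` strings, they come out in chronological order.

```python
from datetime import date

# Hypothetical dates of three housekeeping tasks, deliberately out of order.
dates = [date(2022, 12, 4), date(2021, 3, 15), date(2022, 1, 9)]

us_style = sorted(d.strftime("%m-%d-%Y") for d in dates)
iso_style = sorted(d.isoformat() for d in dates)

print(us_style)   # ['01-09-2022', '03-15-2021', '12-04-2022']  (not chronological)
print(iso_style)  # ['2021-03-15', '2022-01-09', '2022-12-04']  (chronological)
```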

By default, you're going to see your files in alphanumeric order. You should just resign yourself to that and plan accordingly. So we want to see dates or a left-padded number at the beginning, followed by a highly informative slug.
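Putting the pieces together, a hypothetical Python helper can build names of the form `YYYY-MM-DD_slug.ext`. The slug rule inside it is an assumption, not part of the talk. Because the ISO date comes first, a plain alphanumeric sort lists the files chronologically, whatever their slugs say.

```python
import re
from datetime import date


def dated_name(day, description, ext):
    """Build 'YYYY-MM-DD_slug.ext': ISO date first, then a descriptive slug.

    Hypothetical helper; the slug rule (lowercase, each run of other
    characters becomes one hyphen) is an assumption.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{day.isoformat()}_{slug}.{ext}"


a = dated_name(date(2022, 12, 4), "Archive old logs", "sh")
b = dated_name(date(2022, 2, 1), "Rename raw files!", "sh")
print(sorted([a, b]))  # ['2022-02-01_rename-raw-files.sh', '2022-12-04_archive-old-logs.sh']
```

Alphabetically, "archive" would come before "rename", but the date prefix wins, so the earlier task is listed first.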

I hope this helps you develop a file naming scheme that works for you. These slides and some other resources are available at the short link you see here and I look forward to seeing more of you on Twitter or GitHub or Mastodon. Thanks and bye bye.