Jeroen Janssens - Package Your Python Code as a CLI | PyData London 25

Transcript

This transcript was generated automatically and may contain errors.

Welcome to this tutorial where we're going to examine how you can package or transform or convert your Python code into a command line tool. My name is Jeroen. A lot of the material today is based on a book I wrote a couple of years ago, Data Science at the Command Line, which you can read for free. A couple of months ago, another book that I wrote came out together with Thijs. What's good to mention is that we're giving away 25 signed hard copies on Sunday. If you want to participate in the raffle, you can go to this URL or scan the QR code.

For the next 90 minutes, we're not going to be focusing on Polars. We're going to be focusing on that other technology which has been around for a bit longer, the command line. I'm not going to explain why you should use the command line. The fact that you're here pretty much shows that you are already convinced that the command line is a wonderful way of working, very complementary to using an IDE or a programming language.

One of the beautiful things about the command line is that you can very easily extend it. You can create your own tools. There are already a lot of tools available. Even if you just install a Linux distribution or if you have macOS, there are, I'd say, about 100 tools already at your disposal. Then there are thousands more tools available to you. New tools are being written every day. The good thing is that it's actually not that hard to write your own. That's what we're going to be looking at today.

I really believe that if you're able to create your own tools, they'll eventually, or actually immediately, make you more efficient, productive, and effective. Because the command line has been around for over 50 years, we can safely say that it's going to be around for the rest of your career. Many tools and many programming languages will come and go. Plenty of JavaScript frameworks. But there's this one technology which is going to be sticking around for a very long time, and that is the command line.

And you know, that is the beauty of the command line, is this flexibility to solve a big puzzle, right, by combining all these pieces together in various ways.

Has anyone ever tried reading and writing to the same file in one pipeline? Well, you can't do it, so you shouldn't. If you do that, what happens is that the last process, the one that is supposed to write to the file that you're also reading from, immediately opens it for writing, thereby truncating the file. So the first process, which might be the same one, starts with an empty file, and as a result, you just delete your data. But there is a solution. You can either write to a different file first and then rename that, of course, or there is this tool called sponge, which is not installed by default, but Homebrew has it, and it's also part of moreutils.
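The "write to a different file first, then rename" workaround can also be done from within Python. What follows is a minimal sketch using only the standard library; the helper name rewrite_in_place is made up for illustration, and this is not a description of how sponge itself works.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Apply `transform` to every line of the file at `path` without
    truncating it before it has been read."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that the final
    # rename stays on the same filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as target, open(path) as source:
            for line in source:
                target.write(transform(line))
        # Only now does the new content take the place of the old file.
        os.replace(temp_path, path)
    except BaseException:
        os.unlink(temp_path)
        raise
```

For example, rewrite_in_place("names.txt", str.upper) would upper-case the file safely. Note that the temporary file gets restrictive default permissions, so a fuller version might copy the original file's mode as well.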

Break

All right. I think that's enough time to think about streams. We're also halfway through this tutorial, so I suggest that we take a quick break. You know, today's actually my birthday. To celebrate this, I brought stroopwafels from the Netherlands. And I hope I have enough stroopwafels. But if you would like one, you're very welcome to come over here and grab one. So let's take a five-minute break, stretch your legs, go to the restroom, whatever, eat a stroopwafel, and then we'll reconvene.

Six steps to turn a Python script into a CLI tool

So, so far we've talked mainly about concepts. And that's actually what I care most about. Syntax itself is not that exciting. Those are things you can look up or even let some LLM do for you. What I really want you to take home are the underlying concepts. But I also want you to know how to proceed. So, we're going to look at some concrete steps now in how to turn that very simple Python script, greet-simple.py, how we can turn that into a command line tool.

So, we're going to use very, very basic steps for this. We're going to copy the existing code. Maybe you have your code in a notebook or a Quarto markdown file or, you know, you need to just extract some portion of a larger code base. So, we're first going to make sure that we have the required code in a separate file. That's our very first step. Then, again, we're going to think about the arguments. We're going to be thinking about the standard input. Or are there other streams that we want to make use of? Then, as a fourth step, we're going to be looking at the very first line, the shebang line that our interpreted scripts could really use in order to start using them as a proper command line tool. I'll explain a little bit about making your code executable by changing the permission bits. And then, lastly, I'll show you how to change your search path so that it really feels like a proper command line tool.

So, this tool, which you can already look at if you have the GitHub repository open, that's our end state. I'll walk you through the steps.

So, we have our starting position and the final position where we want to go to now open next to each other. You've already seen this one here on the left. That's our very basic script. It's not a command line tool yet because it's not executable. We still need Python. It doesn't take any arguments. It doesn't read from standard input. So, those are the things that we're going to change.

So, now let's save this. It's not yet a proper command line tool. You're right, but bear with me. It still works the same, but I can also pass in one argument and then, hey, we have different output. So, we have now already added some flexibility to this script, this command line tool in the making.
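The code on screen is not part of the transcript, so here is a minimal sketch of what accepting one optional argument through sys.argv might look like. The function name pick_greeting, the default "Hello", and the sample names are assumptions made for illustration.

```python
import sys


def pick_greeting(argv, default="Hello"):
    """Return the first command-line argument, or a default when none is given.

    argv[0] is the name of the script, so the first real argument is argv[1].
    """
    return argv[1] if len(argv) > 1 else default


greeting = pick_greeting(sys.argv)
# A fixed list stands in for the names that the script reads from names.txt.
for name in ["Alice", "Bob"]:
    print(f"{greeting}, {name}!")
```

Run without arguments, the script behaves as before; run with one argument, that argument replaces the default greeting.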

The third step. Our current tool reads from names.txt and that's a fixed value. It's hard-coded. So, we want this tool to take in, yeah, any data, right? Whether that comes from the internet or a database or a local file, we want to be greeting everybody.

So, instead of reading from names.txt, we read from standard input. This is also a file. It's just a special file here in our Unix-like environment. And it's already open, so we don't need to open it. We can immediately start iterating. For our purposes, we can just treat it as a file.
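Because standard input behaves like an already open file, the greeting logic can be written against "any iterable of lines". A minimal sketch, assuming a hypothetical greet function; an in-memory file stands in for standard input so that the example runs on its own.

```python
import io


def greet(lines, greeting="Hello"):
    """Yield one greeting per non-empty input line.

    `lines` can be any iterable of strings: an open file, sys.stdin, or a list.
    """
    for line in lines:
        name = line.strip()
        if name:
            yield f"{greeting}, {name}!"


# In the tool itself the call would be greet(sys.stdin), with no open() needed.
fake_stdin = io.StringIO("Alice\nBob\n")
for message in greet(fake_stdin):
    print(message)
```

Writing the function this way means the same code serves a local file, a pipe, or test data.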

That brings us to step number four. Oh, yeah, the shebang. You know, funny story. I once participated in a beatboxing competition. I did that once and only once. And I had to come up with a stage name. And I was, of course, already, you know, nerding out on the command line. So, I thought, like, shebang. Sounds awesome. Sounds powerful. Of course, when I was announced, it was pronounced as shebang.

So, the shebang. The term comes from these two characters, the hash and the bang, the exclamation mark. Together, this particular combination is a particular set of bits that instructs the shell that the remainder of this line is the executable that should be interpreting the rest of the file. So, here we have /usr/bin/env. And it takes one argument here, python3. So, all that this does, we can say env, and we can say python3, and it runs python3 regardless of where the executable python3 is on my system. So, using /usr/bin/env makes your tool more portable. Because it might be that on someone else's system, python3 is in a different location. So, this is not something that you want to hard code. So, this is very common: /usr/bin/env python3.
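To make the structure of that first line concrete, here is a small sketch that starts with the shebang itself and then takes such a line apart. The helper parse_shebang is hypothetical and only mimics the general idea (an interpreter path followed by at most one argument); the real handling is done by the operating system, not by Python.

```python
#!/usr/bin/env python3
"""Show which interpreter a shebang line names."""
from typing import List, Optional


def parse_shebang(first_line: str) -> Optional[List[str]]:
    """Split a shebang line into the interpreter and its optional argument."""
    if not first_line.startswith("#!"):
        return None  # no shebang: the file does not name an interpreter
    # Everything after "#!" is the interpreter path, optionally followed
    # by a single argument.
    return first_line[2:].strip().split(maxsplit=1)


print(parse_shebang("#!/usr/bin/env python3"))  # env looks up python3 for us
print(parse_shebang("#!/usr/bin/python3"))      # hard-coded location
```

The first form leaves finding python3 to env, which is what makes the script portable across systems.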

Now it says permission denied. Okay. What's going on? This brings us to step number five. Every file and every directory in a Unix-like system has certain permissions associated with it. What is allowed? If we have a look here at the files in our directory, then on the very left, we see, yeah, rwx and some dashes. These indicate the permission bits, the permissions that are associated with each file and each directory. So, when we go to greet-simple.py, the file we're currently working with, we don't see an x. So, we do not have the permission to execute this file, and we need to change this. To do that, we use chmod, or change mode. We're going to turn the execution permission bit, the x, on, which is the plus, for the user.
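On the command line this step is done with chmod. For readers who want to see the same permission bit from Python, here is a minimal sketch using the standard os and stat modules; the two helper names are made up for illustration, and the behavior described assumes a Unix-like system.

```python
import os
import stat


def make_user_executable(path):
    """Turn on the execute bit for the file's owner, leaving other bits as is."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR)


def is_user_executable(path):
    """Report whether the owner's execute bit (the first x) is set."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)
```

Calling make_user_executable("greet-simple.py") would have the same effect as turning on the x for the user with chmod.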

And if we then check again, you can see that the execution bit is now set, because we do see an x here. And so, if we try to run this. So, previously, we got permission denied. I'm going to try again. It works. And notice that we didn't need to specify python3, right, because of the shebang line. We're almost there. I would say that in a lot of cases, this is sufficient. But that ./, what's up with that? That doesn't really feel like a real command line tool.

The reason is that this file is in a directory which is not on our search path. The search path is an environment variable, which is a list of directories where the shell will look for your executable when you use its name. Our current directory is not in here. So, we can do two things to fix this. We can either move our script to a directory that is in our search path, or we can add the current directory to our path. What I really like is to have just one directory, somewhere in my home directory, maybe .bin or whatever, a place where you store all your own command line tools, and then just change your search path to add that one directory. So, it's a one-time setup, and then you're done and you just have all your tools in one place.
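The lookup the shell performs can be sketched in a few lines of Python. The function find_on_path below is a hypothetical, simplified stand-in (the standard library's shutil.which does this job properly); it walks the directories in order and returns the first executable match.

```python
import os


def find_on_path(name, search_path):
    """Return the first executable called `name` in `search_path`, or None.

    `search_path` is a string of directories joined by os.pathsep (":" on
    Unix-like systems), the same shape as the PATH environment variable.
    """
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# The directories the shell searches, in order.
for directory in os.environ.get("PATH", "").split(os.pathsep):
    print(directory)
```

If the directory holding your tool does not appear in that list, the tool is only reachable through an explicit path such as ./.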

And you know what? There's actually one more step that I like to do. We don't really need that extension. .py, that's not needed. That's not needed in Unix or Linux. So, we can just get rid of it.

Now, ladies and gentlemen, we can say greet-simple. That's it. That's how you turn, even though it was just a simple script, an existing Python script into a reusable command line tool with arguments, with streams, and whatnot. And it doesn't matter which approach you're going to use, whether that's the basic sys.argv, or as we'll see in a moment, the argparse module or Typer, or what have you. These steps are always there.

Upgrading with argparse and Typer

So, so far, we've been using argv. Now, there is a built-in module, right? It's part of the Python standard library, and it's called argparse. This gives you a ton of flexibility when it comes to defining your command line arguments. It can automatically generate help. You can denote whether your arguments should be optional or required. It can do some post-processing. If you indicate, like, hey, this is a Boolean, you can actually use it as a Boolean instead of first parsing that string. It handles short and long options.

Again, this is our greeting script, but now there's a lot more code. These first lines over here, that's me setting up argparse, right? Instructing argparse. Okay, what arguments does this script accept? Again, it uses argv under the hood, yeah? But it does a lot of work for you. So, I recommend that you try this out. It's a lot more convenient. And if you have more than two arguments, I can definitely recommend moving on from argv and using argparse.
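The argparse setup shown on screen is not in the transcript. As a rough sketch of what such a setup can look like, here is a parser with one option and one switch; the program name greet and the --greeting and --shout arguments are assumptions chosen for illustration.

```python
import argparse


def build_parser():
    """Describe the arguments that the greeting tool accepts."""
    parser = argparse.ArgumentParser(
        prog="greet",
        description="Greet every name read from standard input.",
    )
    # An option with a short and a long form, plus a default value.
    parser.add_argument("-g", "--greeting", default="Hello",
                        help="word to greet with (default: %(default)s)")
    # A switch: present means True, absent means False, no string parsing.
    parser.add_argument("-s", "--shout", action="store_true",
                        help="print the greeting in upper case")
    return parser


# Passing a list parses that list; parse_args() with no argument
# would read sys.argv[1:] instead.
args = build_parser().parse_args(["--greeting", "Howdy", "-s"])
print(args.greeting, args.shout)
```

With this in place, running the tool with --help prints a usage message generated from the descriptions above.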

Or use Typer, which is a relatively new Python package created by Sebastián Ramírez, the person behind FastAPI. This is a very modern way of declaring your command line interface, so to speak. It's built on top of Click, which also uses argv under the hood. The main difference is that with Typer, you use type hints. And from that, it generates your interface and the help associated with it. And it is generally a lot less code than if you were to use argparse.

And you can see here, I have a function main with some arguments, like functions have. What I want you to focus on right now is, it's not so much the code itself, but just, you know, the takeaway here is that Typer uses type hints. And this function is still usable as a regular Python function. And that is not the case when you use Click. Click really changes the signature of the function. So that is one advantage of Typer. See, so that's a lot less code than if you were to use argparse.
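A minimal sketch of that idea follows. The parameter names are assumptions, not the code from the talk, and Typer is a third-party package that has to be installed; it is imported inside cli() so that the plain function can be used without it.

```python
def build_message(name: str, greeting: str = "Hello", shout: bool = False) -> str:
    """Build the greeting text."""
    message = f"{greeting}, {name}!"
    return message.upper() if shout else message


def main(name: str, greeting: str = "Hello", shout: bool = False) -> None:
    """Greet NAME."""
    # From the type hints, Typer derives a required NAME argument,
    # a --greeting option, and a --shout switch.
    print(build_message(name, greeting, shout))


def cli() -> None:
    """Hand main to Typer, which builds the interface from its signature."""
    import typer  # third-party; imported here so main() works without it

    typer.run(main)


# main is still an ordinary function that any Python code can call.
main("Alice", greeting="Hi")
```

In an actual tool, the last line of the script would call cli() rather than main(...) directly.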

Self-contained tools with uv and PEP 723

Now, what is this here at the top? This is super awesome. This here is the recently approved PEP 723, which allows you to specify dependencies inside the script itself, right? So the syntax, you know, is perhaps a little bit weird. You have to start with a comment containing three slashes and the word script, and then it's something that looks like pyproject.toml. You know, you'll get used to it. And uv, you know, this new tool to handle your packages, can help with this. This allows you to make your tool standalone. And this is definitely a good thing when you use external dependencies. You can specify the Python version that your tool needs, and any other external dependencies go in here, in between these brackets.

And there is at least one, which is Typer itself. But also if your tool uses, say, the requests library or Polars, yeah, it works. And because we now use a slightly different hashbang line, it's no longer Python which runs this code. No, it's uv run. It detects whether you have these packages, these dependencies, already installed. And if not, it installs them, making it all the easier to share this script with others.
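Since the slide is not part of the transcript, here is a sketch of what such a header can look like, together with a simplified reader for it. The Python version, the dependency list, and the exact shebang line are assumptions for illustration; the extraction function is a hypothetical simplification and not the reference implementation from PEP 723.

```python
from typing import Optional

# A hypothetical script header with an inline metadata block.
EXAMPLE_SCRIPT = '''\
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = ["typer"]
# ///
import typer
'''


def extract_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text inside the `# /// script` block, or None."""
    lines = source.splitlines()
    try:
        start = lines.index("# /// script")
    except ValueError:
        return None  # no metadata block in this script
    body = []
    for line in lines[start + 1:]:
        if line == "# ///":
            return "\n".join(body)
        if not line.startswith("#"):
            return None  # the block was never closed
        # Strip the comment prefix to recover plain TOML.
        body.append(line[2:] if line.startswith("# ") else line[1:])
    return None


print(extract_script_metadata(EXAMPLE_SCRIPT))
```

The text that comes out is plain TOML, which is why it resembles pyproject.toml; a tool such as uv reads it to decide what to install before running the script.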

Wow, look at this. It's beautiful. That's the help auto-generated by Typer. And normally, if you would run this script, this tool, for the very first time, you will see that it installs some packages. Now, this was not shown here because I already had these packages installed because I already ran it. So it's being run in its own virtual environment. These packages are not installed globally. This process gets its own virtual environment. Pretty cool.

Wrapping up

There are some best practices for you to review. I don't have time now to go over all of these, although some of them have been mentioned along the way, and some anti-patterns as well that you want to avoid. And so this is your take-home exercise, right? It's to start building, and preferably as soon as possible, because otherwise, you know, you will have forgotten all about this. And if you have built something, I would love to hear from you. If you have any other questions after this tutorial, you know, I want to hear from you. So with that, I want to thank you very much for attending this tutorial. I think you've made the right choice, right? These 90 minutes, they're going to benefit you for the rest of your career. So again, thank you very much. Enjoy the rest of the conference.
