
Jeroen Janssens - Package Your Python Code as a CLI | PyData London 25
www.pydata.org Learn how to transform your Python code into a command-line tool. Jeroen Janssens, author of Data Science at the Command Line, guides you through the process of turning your scripts into reusable, executable tools, integrating them into your data workflows and harnessing the power of the Unix command line. PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
Transcript
This transcript was generated automatically and may contain errors.
Welcome to this tutorial, where we're going to examine how you can package, transform, or convert your Python code into a command line tool. My name is Jeroen. A lot of today's material is based on a book I wrote a couple of years ago, Data Science at the Command Line, which you can read for free online. A couple of months ago, another book that I wrote together with Thijs came out. Worth mentioning: we're giving away 25 signed hard copies on Sunday. If you want to participate in the raffle, you can go to the URL on the slide or scan the QR code.
For the next 90 minutes, we're not going to be focusing on Polars. We're going to be focusing on that other technology which has been around for a bit longer, the command line. I'm not going to explain why you should use the command line. The fact that you're here pretty much shows that you are already convinced that the command line is a wonderful way of working, very complementary to using an IDE or a programming language.
One of the beautiful things about the command line is that you can very easily extend it: you can create your own tools. There are already a lot of tools available. Even if you just install a Linux distribution, or if you have macOS, there are, I'd say, about 100 tools already at your disposal. Beyond those, thousands more are available, and new tools are being written every day. The good thing is that it's actually not that hard to write your own. That's what we're going to be looking at today.
I really believe that being able to create your own tools will make you more efficient, productive, and effective, eventually, or actually immediately. And because the command line has been around for over 50 years, we can safely say that it's going to be around for the rest of your career. Many tools and many programming languages will come and go, plenty of JavaScript frameworks too. But there's this one technology which is going to be sticking around for a very long time, and that is the command line.
Overview of today's tutorial
So this is roughly what we're going to be covering today. First of all, I want to make clear what it is we're actually talking about when we say CLI. Then the interface: how do you design your next tool? There are two components to that: on the one hand, arguments, options, and switches; on the other hand, how you deal with streams of data. Then we're going to go into the fundamentals: how can you transform an existing piece of Python code into something that is usable from the command line, in six easy steps?
Then I'll show you how you can upgrade from that fundamental way of doing things using either argparse, a module in the Python standard library, or Typer, a relatively new Python package that you can use to easily build modern command line tools. We'll have a look at uv and how you can use it to create self-contained tools. And along the way, I'll share some best practices. This is a tutorial, and I really want to encourage you to ask questions along the way. This is your opportunity to extract as much information from me as you want.
What we mean by CLI
All right, then: what are we talking about here? We're talking about the Unix command line. So if you're running either macOS or a Linux distribution, the most popular one being Ubuntu, then you're good. If you're running plain Windows, that's not enough. Windows does offer two types of command lines (Command Prompt and PowerShell), but they're a different breed of command line interface. So if you want to follow along today and you're running Windows, you have two options: you can either install the Windows Subsystem for Linux (WSL), or you can use Docker with an image that has a Linux distribution on it.
We're not going to be talking about textual user interfaces, or TUIs. They're great; here we have LazyGit, for example. But they're more like applications that run in your terminal, which is a very different approach from working with command line tools. If you are interested in creating a TUI using Python, then I recommend you check out the Textual package.
So, command line tool. It's an umbrella term: there's a variety of tools and ways in which they're implemented. We can have simple aliases or shell functions defined in our configuration file, interpreted scripts, or even programs that are compiled into binary executables. We're going to be looking at interpreted scripts, because of course today our language of choice is Python. And I believe that this really hits a sweet spot when it comes to creating your own tools: scripts give you all the flexibility you need from a programming language, and still all the power, if maybe not the speed, of a binary executable.
The good thing, and you might already be aware of this, is that the command line itself doesn't really care which language a script is written in, as long as you adhere to certain standards. It doesn't matter whether it's Bash or Python or Rust; we do see more and more command line tools being implemented in Rust. This also means that you can easily combine tools written in different programming languages.
And then there's this other part of the so-called Unix philosophy: most tools, not every tool, do one thing, but they do it really well. Because you can string these tools together, that gives you a lot of flexibility, and you can solve many different problems that way. So keep that in mind as you are designing your next command line tool.
So you shouldn't try to implement everything in your own tool. It's good to be aware of the existing tools that are already out there. These can be features that the shell provides, such as piping, reading from a file, writing to a file, or globbing. Or command line tools such as cat, or curl for downloading things. If your command line tool downloads something in order to process it, you may want to consider: how about I just rip that out of my tool and let curl do the work, so that it can pipe its output into my tool's input? That requires a slightly different way of thinking.
So in other words, stay in your lane. Try to focus on what your tool needs to do well.
Exercise: designing your next tool
So here we are with our very first exercise. Don't worry, it's not a coding exercise; this is a pen-and-paper, thinking exercise. I would like you to think about what your next command line tool could be. Try to remember something that you have created in the past, whether that was a helper script or a one-off notebook. Does it contain something that can be generalized if you strip out all of the hard-coded values? Is it a task that you may need to do again in the future? Then let's think about how you can turn it into a command line tool.
Arguments, options, and switches
Let's talk about how we can design the interface of our next command line tool. Like I said, the interface is roughly based on two aspects. On the one hand, we can give our tool arguments, including options, switches, and flags; on the other hand, we can let our tool deal with streams: standard input, standard output, and error messages on standard error. So let's first focus on those arguments.
Now, there are a lot of terms when it comes to those arguments, and various sources use different lingo, but this is what I believe to be the case. Here we have one example: the git commit command. git is the command. commit, in this case, is the subcommand. -v, which stands for verbose, is an option. Or, no, it's a switch. You see, it doesn't really matter what we call these things; just know that sometimes they are called options, switches, or flags. Because this particular one, -v, doesn't take a value, I believe it can be called a flag or a switch. And then -m, which does take a value (the commit message), would be an option.
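To see the pieces a tool actually receives, here's a small sketch using Python's standard-library shlex module, which splits a string with shell-like quoting rules. The commit message is a made-up example.

```python
import shlex

# Split the command the way a POSIX shell would; quotes keep words together.
tokens = shlex.split('git commit -v -m "Fix typo in README"')

command, subcommand, *rest = tokens
print(command)     # git
print(subcommand)  # commit
print(rest)        # ['-v', '-m', 'Fix typo in README']
```

Note that -m and its value arrive as two separate tokens; pairing them up is the job of the tool or its argument parser.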
So here's a very short piece of code. It just prints the value of argv (short for argument vector): the arguments passed to this script.
We use sys.argv in this case. This is as low-level as you can get: just a list of strings, starting with the name of the script itself, followed by the command line arguments. If you want to do anything fancy, such as supporting both short and long options or validating input, then you have to do that yourself, which might be a lot of work. But if you're creating a tool that needs zero, one, or two arguments, then I would say that's fine. And if the tool is just for yourself, which I'd say describes 99% of the tools I expect you to create, then it's fine to use sys.argv.
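A script along those lines might look like this sketch; the file name print-args.py is a hypothetical choice. It prints the raw list and then each argument with its type, to show that everything arrives as a string.

```python
#!/usr/bin/env python3
"""print-args.py (hypothetical name): show what the script receives in sys.argv."""
import sys

print(sys.argv)  # the whole list, script name first

# Everything after the script name is an argument, and every one is a string.
for position, argument in enumerate(sys.argv[1:], start=1):
    print(f"{position}: {argument!r} ({type(argument).__name__})")
```

Running python3 print-args.py -v --name Alice 42 should print ['print-args.py', '-v', '--name', 'Alice', '42'] on the first line; note that 42 arrives as the string '42'.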
When it comes to the options you can use for your next command line tool, there are certain conventions out there. Not all tools follow them, as there are always exceptions, but these short and long options are generally understood. For example, to provide help, you would offer both the short option -h and the long option --help.
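With plain sys.argv, honoring that convention is up to you. A minimal sketch for a hypothetical greet tool; the usage text is made up for illustration.

```python
import sys

# Hypothetical help text for a hypothetical greet tool.
USAGE = "Usage: greet [-h | --help] [GREETING] < names.txt"


def wants_help(argv: list[str]) -> bool:
    """True if either the short or the long help switch was given."""
    return "-h" in argv or "--help" in argv


if __name__ == "__main__":
    if wants_help(sys.argv[1:]):
        print(USAGE)  # help was explicitly requested, so print it to stdout
```

A check this naive would also fire if -h happened to be the value of some other option, which is exactly the kind of edge case an argument parser handles for you.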
Standard streams
In a way, command line tools are kind of like functions. Why do we use functions in our code? Well, to abstract away some complexity. Functions can take zero, one, or more arguments, which gives them some flexibility, and ideally they do one thing and do it well. But there is at least one thing that sets command line tools apart from your Python functions: each process in a Unix-like system has three standard streams available.
And again, it's all text; that's the universal interface. So if you want to output numbers, or ingest numbers from some other tool or from a file, you need to do some conversion yourself. That's the price of having a general interface that always works. So there are three standard streams, standard because they're always available: standard input, standard output, and standard error, often abbreviated as stdin, stdout, and stderr. Not every tool uses all of them; your tool may not need them.
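As an illustration of all three streams, here's a sketch of a hypothetical tool, sum-numbers, that reads one number per line from standard input, writes the total to standard output, and reports unparseable lines on standard error. The explicit float() call is the text-to-number work mentioned above.

```python
#!/usr/bin/env python3
"""sum-numbers (hypothetical): add up the numbers read from standard input."""
import sys
from typing import Iterable


def total(lines: Iterable[str]) -> float:
    result = 0.0
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            result += float(text)  # input is text, so convert it explicitly
        except ValueError:
            # Diagnostics go to stderr so they don't end up in the data on stdout.
            print(f"line {line_number}: not a number: {text!r}", file=sys.stderr)
    return result


if __name__ == "__main__":
    print(total(sys.stdin))  # the result is written to stdout as text again
```

Piping printf '1\n2\noops\n3.5\n' into it should print 6.5 on standard output and a complaint about line 3 on standard error; adding 2> errors.log would send only the complaint to a file.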
Instead of redirecting to a file, we can pipe our output to another tool, which then reads from its standard input. And that is the beauty of the command line: the flexibility to solve a big puzzle by combining all these pieces in various ways.
Has anyone ever tried reading from and writing to the same file in one pipeline? It doesn't work, so don't do it. What happens is that the shell opens the output file, the one you're also reading from, for writing, and thereby truncates it, typically before the first process has had a chance to read it. So the first process, which might be the same one, starts with an empty file, and as a result, you've just deleted your data. But there are solutions. You can write to a different file first and then rename it, or you can use a tool called sponge, which is not installed by default. Homebrew has it, and it's part of moreutils.
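The "write to a different file first, then rename it" workaround can also be done from Python. Here's a sketch of the core of a hypothetical sponge-like tool (soak is a made-up name, and this is not how moreutils' sponge is implemented): a command-line wrapper would call soak(sys.stdin.read(), sys.argv[1]), so all input is read before the target file is touched.

```python
import os
import shutil
import tempfile


def soak(data: str, path: str) -> None:
    """Write data to path via a temporary file in the same directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".soak-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
        if os.path.exists(path):
            shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)  # swap the new content into place
    except BaseException:
        os.unlink(tmp_path)  # don't leave a half-written temp file behind
        raise
```

os.replace overwrites the destination in one step, so readers of the file see either the old or the new content, never a truncated one.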
Break
All right. I think that's enough time to think about streams. We're also halfway through this tutorial, so I suggest that we take a quick break. You know, today's actually my birthday. To celebrate, I brought Stroopwafels from the Netherlands, and I hope I have enough. If you would like one, you're very welcome to come over here and grab one. So let's take a five-minute break: stretch your legs, go to the restroom, eat a Stroopwafel, and then we'll reconvene.
Six steps to turn a Python script into a CLI tool
So far, we've talked mainly about concepts, and that's actually what I care most about. Syntax itself is not that exciting; those are things you can look up or even let an LLM do for you. What I really want you to take home are the underlying concepts. But I also want you to know how to proceed. So we're now going to look at some concrete steps for turning that very simple Python script, greet-simple.py, into a command line tool.
We're going to use very basic steps for this. First, we copy the existing code. Maybe you have your code in a notebook or a Quarto markdown file, or you need to extract some portion of a larger code base, so the first step is to make sure the required code is in a separate file. Second, we think about the arguments. Third, we think about standard input, or whether there are other streams we want to make use of. Fourth, we look at the very first line, the shebang line, which an interpreted script needs in order to be used as a proper command line tool. Fifth, I'll explain how to make your code executable by changing its permission bits. And sixth, I'll show you how to change your search path so that it really feels like a proper command line tool.
The finished tool, which you can already look at if you have the GitHub repository open, is our end state. I'll walk you through the steps.
So we now have our starting position and the final position we want to get to open next to each other. You've already seen this one on the left: that's our very basic script. It's not a command line tool yet. It's not executable, so we still need to invoke it with python3; it doesn't take any arguments; and it doesn't read from standard input. Those are the things that we're going to change.
So now let's save this. It's not a proper command line tool yet, you're right, but bear with me. It still works the same as before, but I can now also pass in one argument, and then, hey, we get different output. So we have already added some flexibility to this script, our command line tool in the making.
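The transcript doesn't show the code for this step, so here's a minimal sketch under two assumptions: that the single argument is the greeting, and that the original hard-coded greeting was "Hello". The function name parse_greeting is hypothetical. The default keeps the old behavior when no argument is given; the loop over names.txt is unchanged at this step and omitted here.

```python
import sys


def parse_greeting(argv: list[str], default: str = "Hello") -> str:
    """Return the greeting given on the command line, or a default.

    argv[0] is the script's own name, so the first real argument is argv[1].
    """
    return argv[1] if len(argv) > 1 else default


if __name__ == "__main__":
    print(parse_greeting(sys.argv))
```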
The third step. Our current tool reads from names.txt, and that's a fixed, hard-coded value. We want this tool to take in any data: whether it comes from the internet, a database, or a local file, we want to greet everybody.
So instead of reading from names.txt, we read from standard input, sys.stdin. This is also a file; it's just a special file in our Unix-like environment, and it's already open. So we don't need to open it; we can immediately start iterating over it. For our purposes, we can just treat it as a file.
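A sketch of what this step could look like, again assuming "Hello" as the default greeting; the helper name greetings is hypothetical, and skipping blank lines is an added design choice.

```python
import sys
from typing import Iterable, Iterator


def greetings(names: Iterable[str], greeting: str = "Hello") -> Iterator[str]:
    """Yield one greeting per non-empty line."""
    for line in names:
        name = line.strip()
        if name:  # skip blank lines
            yield f"{greeting}, {name}!"


if __name__ == "__main__":
    greeting = sys.argv[1] if len(sys.argv) > 1 else "Hello"
    # sys.stdin is already open: no open() call needed, just iterate over it.
    for message in greetings(sys.stdin, greeting):
        print(message)
```

Because greetings accepts any iterable of lines, it works the same with sys.stdin, an opened file, or a list in a test. On the command line, python3 greet-simple.py Hi < names.txt and cat names.txt | python3 greet-simple.py Hi should both greet everyone in names.txt.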
That brings us to step number four. Oh, yeah, the shebang. You know, funny story. I once participated in a beatboxing competition. I did that once and only once. And I had to come up with a stage name. And I was, of course, already, you know, nerding out on the command line. So, I thought, like, shebang. Sounds awesome. Sounds powerful. Of course, when I was announced, it was pronounced as shebang.
So, the shebang: the term comes from these two characters, the hash and the bang (the exclamation mark). When a file that starts with this combination is executed, they tell the operating system that the remainder of this line is the executable that should interpret the rest of the file. Here we have /usr/bin/env, and it takes one argument, python3. All this does is run python3, regardless of where the python3 executable lives on my system, because env looks it up on the search path. So using /usr/bin/env makes your tool more portable: on someone else's system, python3 might be in a different location, and that's not something you want to hard-code. So this is very common: #!/usr/bin/env python3.
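Python's shutil.which performs a PATH lookup similar to the one env does, so this sketch shows which interpreter a #!/usr/bin/env python3 line would most likely pick on your machine. The result differs per system.

```python
import shutil
import sys

# Look up "python3" on PATH, similar to what `/usr/bin/env python3` does.
interpreter = shutil.which("python3")
print(f"/usr/bin/env python3 would run: {interpreter}")

# For comparison: the interpreter running this very script.
print(f"this script is running under: {sys.executable}")
```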
Now it says permission denied. Okay, what's going on? This brings us to step number five. Every file and every directory in a Unix-like system has certain permissions associated with it: what is allowed? If we look at the files in our directory with ls -l, then on the very left we see rwx and some dashes. These are the permission bits, the permissions associated with each file and each directory. When we look at greet-simple.py, the file we're currently working with, we don't see an x, so we do not have permission to execute this file. We need to change this, and to do that we use chmod, short for change mode: chmod u+x turns on the execute permission bit, x, for the user who owns the file, u; the plus means add.
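The same change can be made from Python, which also makes the bits explicit: os.stat reports the mode, and stat.S_IXUSR is the owner's execute bit, the u and x in u+x. This sketch uses a throwaway temporary file as a stand-in for greet-simple.py.

```python
import os
import stat
import tempfile

# A throwaway file standing in for greet-simple.py.
fd, path = tempfile.mkstemp(suffix=".py")
os.close(fd)

before = stat.S_IMODE(os.stat(path).st_mode)  # permission bits only
print(stat.filemode(os.stat(path).st_mode))   # e.g. -rw------- (no x anywhere)

# Equivalent of `chmod u+x`: keep the existing bits and add execute for the owner.
os.chmod(path, before | stat.S_IXUSR)
print(stat.filemode(os.stat(path).st_mode))   # e.g. -rwx------
```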
If we then check again, you can see that the execute bit is now set, because we do see an x here. Previously, running it gave us permission denied; let's try again. It works. And notice that we didn't need to specify python3, because of the shebang line. We're almost there, and I would say that in a lot of cases this is sufficient. But that ./ in front, what's up with that? That doesn't really feel like a real command line tool.
The reason is that this file is in a directory that is not on our search path. The search path is an environment variable, PATH, which holds a list of directories where the shell will look for an executable when you use its name. Our current directory is not in there. So we can do one of two things to fix this: we can move our script to a directory that is on our search path, or we can add the current directory to our PATH. What I really like is to have just one directory somewhere in my home directory, maybe ~/.bin or whatever, where you store all your own command line tools, and then add that one directory to your search path. It's a one-time setup, and then you're done and have all your tools in one place.
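To check whether a directory is on your search path, you can inspect the PATH environment variable. This sketch assumes a personal tools directory at ~/.bin; the name is only an example.

```python
import os
from pathlib import Path

tools_dir = Path.home() / ".bin"  # hypothetical personal tools directory

# PATH is one string of directories separated by os.pathsep (":" on Unix).
search_path = os.environ.get("PATH", "").split(os.pathsep)
for directory in search_path:
    print(directory)

on_path = str(tools_dir) in search_path
print(f"{tools_dir} on PATH: {on_path}")
```

If it reports False, adding a line such as export PATH="$HOME/.bin:$PATH" to your shell's configuration file is the one-time setup described above.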
And you know what? There's actually one more step that I like to do. We don't really need that .py extension; Unix and Linux don't require it, so we can just rename the file and get rid of it.
Now, ladies and gentlemen, we can just type greet-simple. That's it. That's how you turn an existing Python script, even a simple one like this, into a reusable command line tool with arguments, streams, and whatnot. And whichever approach you use, whether that's the basic sys.argv or, as we'll see in a moment, the argparse module, Typer, or what have you, these steps always apply.
Upgrading with argparse and Typer
So far, we've been using sys.argv. But there is a built-in module, part of the Python standard library, called argparse, which gives you a ton of flexibility when it comes to defining your command line arguments. It can automatically generate help. You can indicate whether your arguments should be optional or required. It can do some post-processing: for example, if you indicate that something is a Boolean switch, you can actually use it as a Boolean instead of first having to parse a string. And it handles short and long options.
Again, this is our greeting script, but now there's a lot more code. These first lines are me setting up argparse: telling it which arguments this script accepts. It still uses sys.argv under the hood, but it does a lot of work for you. So I recommend that you try it out; it's a lot more convenient. And if you have more than two arguments, I definitely recommend moving on from sys.argv to argparse.
Or use Typer, a relatively new Python package created by Sebastián Ramírez, the person behind FastAPI. It's a very modern way of declaring your command line interface, so to speak. It's built on top of Click, which also uses sys.argv under the hood. The main difference is that with Typer, you use type hints, and from those it generates your interface and the help associated with it. It's generally a lot less code than if you were to use argparse.
You can see here I have a function main with some arguments, as functions have. What I want you to focus on right now is not so much the code itself; the takeaway is that Typer uses type hints, and this function is still usable as a regular Python function. That is not the case when you use Click, whose decorators turn the function into a command object. So that is one advantage of Typer, and, as you can see, it's a lot less code than with argparse.
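Here's a comparable sketch with Typer, assuming Typer is installed (it's a third-party package); typer.run is its shortcut for turning a single function into a CLI. The parameter names are hypothetical. Parameters with defaults become options, and a bool with a default becomes an on/off flag. Because main stays an ordinary function, you can also call it directly from Python, for example main(greeting="Hi").

```python
#!/usr/bin/env python3
import sys

import typer  # third-party: pip install typer


def main(greeting: str = "Hello", shout: bool = False) -> None:
    """Greet every name read from standard input."""
    for line in sys.stdin:
        name = line.strip()
        if name:
            message = f"{greeting}, {name}!"
            print(message.upper() if shout else message)


if __name__ == "__main__":
    # Typer inspects the type hints and defaults to build the options and --help.
    typer.run(main)
```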
Self-contained tools with UV and PEP 723
Now, what is this here at the top? This is super awesome. This is the recently approved PEP 723, which allows you to specify dependencies inside the script itself. The syntax is perhaps a little weird: the block starts with a comment line containing three slashes and the word script (# /// script), ends with # ///, and in between is something that looks like pyproject.toml. You'll get used to it. uv, this new tool for managing your packages, can work with this, which allows you to make your tool standalone. This is definitely a good thing when you use external dependencies. You can specify the Python version that your tool needs, and any external dependencies go in here, in between these square brackets.
There is at least one dependency, which is Typer itself. But if your tool also uses, say, the requests library or Polars, that works too. Because we now use a slightly different shebang line, it's no longer python3 that runs this code directly; it's uv run. uv detects whether these dependencies are already installed, and if not, it installs them. That makes sharing this script with others all the easier.
Wow, look at this. It's beautiful. That's the help auto-generated by Typer. Normally, if you run this tool for the very first time, you will see that it installs some packages. That wasn't shown here because I had already run it, so the packages were already installed. The script runs in its own virtual environment, so these packages are not installed globally. Pretty cool.
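One way to confirm this from inside a tool is to ask the running interpreter. Per the Python venv documentation, inside a virtual environment sys.prefix points at the environment, while sys.base_prefix still points at the base installation. A small sketch:

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a virtual environment, such as one uv creates."""
    return sys.prefix != sys.base_prefix


print(f"virtual environment: {in_virtualenv()}")
print(f"sys.prefix: {sys.prefix}")
```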
Wrapping up
There are some best practices for you to review, as well as some anti-patterns you want to avoid. I don't have time now to go over all of them, although some have been mentioned along the way. So this is your take-home exercise: start building, preferably as soon as possible, because otherwise you will have forgotten all about this. If you have built something, I would love to hear from you, and the same goes if you have any other questions after this tutorial. So with that, thank you very much for attending this tutorial. I think you've made the right choice: these 90 minutes are going to benefit you for the rest of your career. So again, thank you very much, and enjoy the rest of the conference.

