Resources

KEYNOTE: Dr. Jeroen Janssens - Embrace the Unix Command Line and Supercharge Your PyData Workflow

Discover why the Unix command line remains a powerful and relevant tool for data scientists, even in a Python-dominated landscape. This talk will demonstrate how embracing the command line and leveraging its many tools can significantly enhance your productivity, streamline data workflows, and complement your Python skills.

Jeroen Janssens, PhD, is a polyglot data science consultant and certified instructor. His expertise lies in visualizing data, implementing machine learning models, and building solutions using Python, R, JavaScript, and Bash. Jeroen is passionate about open source and sharing knowledge. He is the author of Data Science at the Command Line (O’Reilly, 2021) and is currently writing Python Polars: The Definitive Guide (O’Reilly, 2025). Every now and then he blogs at https://jeroenjanssens.com.

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Hello, everybody, and welcome to our very first keynote of PyData Global 2024. I am very excited to introduce our keynote speaker, Dr. Jeroen Janssens, and he is going to be sharing a talk with you, Embrace the Unix Command Line and Supercharge Your PyData Workflow. Thank you so much for being here today. Thank you so much for keynoting. The floor is all yours.

Thank you, Tamara, and thank you for joining me. Today, I'm going to talk about how you can embrace the Unix Command Line so that you can supercharge your PyData workflow. You see, I'm a huge fan of the Unix Command Line. I use it every single day for a variety of tasks, and I really believe that it makes me more productive and more efficient.

So, of course, I'm very enthusiastic about this, and whenever I tell a colleague or friend or some random stranger on the street that they should also start embracing the Unix Command Line, I'm often met with questions such as, what? Why? How? So, the time has come to answer all these questions and more, once and for all. So, today, I'm going to share with you every tip and trick that I know of that's going to help you embrace the Unix Command Line.

What is the command line?

So, let's start with the what. What is the Command Line? Here's what the Command Line looks like. On your system, it may look different. The Command Line is an interactive environment that allows you to run Command Line tools, right? If you know how to wield it and which spells to cast, then the Command Line can help you perform all sorts of magic. I mean tasks. But, hey, look at it. It's hideous. Sure, the penguin is cute, but don't be fooled. With a single command, you can wipe out your entire system. No wonder people are intimidated and reluctant to embrace it.

Still, I think you should embrace the Command Line. Says who? Well, says me. I started using Linux about 16 years ago when I was doing my PhD. Now, I wasn't quite ready to leave Windows, so I installed Ubuntu Linux next to it, a setup known as dual boot. Back then, there was no Windows Subsystem for Linux, aka WSL. After my PhD, I moved to New York City to work as a data scientist for a number of startups. I started using more and more tools for my daily job. And so, in 2013, I wrote a small blog post called Seven Command Line Tools for Data Science.

This hit the front page of Hacker News, and lots of people were commenting on it. So, I thought, like, hey, there might be something in here. Other people seem to be, you know, interested in this. And so, one thing led to another, and in 2014, I got the opportunity to write a book called Data Science at the Command Line. Then, in 2021, I wrote the second edition, which you can read for free on my website. This book, you know, also led me to start giving workshops, and eventually, I decided, all right, let's build a company out of this called Data Science Workshops. You know, and then thanks to the pandemic, this wasn't all that fun anymore, so I decided to join a company again called Xomnia, where I'm currently employed as a senior machine learning engineer. And that's actually the place where I came in touch with yet another interesting tool, Polars, which I'm not going to talk about today, but I just want to mention that I am currently writing a book together with Thijs Nieuwdorp about Python Polars, which should come out in February.

Why use the command line?

So, let's move on to the second question. Why? Why should you care about the Command Line? See, as researchers and developers, we have many, many tools at our disposal. Programming languages, IDEs, spreadsheets, pen and paper, and of course, the Command Line. In any case, I really believe that you should use the tool that gets the job done.

So, where does the Command Line fit into this? As this diagram tries to convey, I believe it hits a sweet spot between using the mouse and using a programming language, right? For example, renaming or, as in this diagram, converting 100 JSON files to CSV, right? How would you do this? Try to think about this for a moment with the tools that you're currently using. Would you write a script for this, right? Perhaps a Python script. Is it worthwhile to do this for a one-off task? Or do you have some graphical application, some GUI that is able to convert JSON to CSV, or maybe a website, right? That might work for one or two files, but what if you have, right, 100 of them? It gets tedious quite quickly. So, this is something that you can do very easily using the Command Line.

And another example, and this is a little bit more complex, right? Let's say you have a log file with over 10 million lines, and you need to know the top 10 errors in these logs. How would you do that, right? It's such a one-off task. And I'm, of course, going to try to convince you that the Command Line is really well suited for these kinds of tasks.
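To make the log example concrete, here is a minimal sketch of how such a one-off count might look with standard tools. The log format (date, time, level, message) and the sample lines are hypothetical; the talk does not show the actual file, so the field positions would need adjusting for real logs.

```shell
# Build a tiny sample log; the format (date, time, level, message) is hypothetical.
log=$(mktemp)
cat > "$log" <<'EOF'
2024-12-03 10:00:01 INFO Started worker
2024-12-03 10:00:02 ERROR Connection timed out
2024-12-03 10:00:03 ERROR Disk full
2024-12-03 10:00:04 ERROR Connection timed out
2024-12-03 10:00:05 WARN Retrying request
2024-12-03 10:00:06 ERROR Connection timed out
2024-12-03 10:00:07 ERROR Disk full
2024-12-03 10:00:08 ERROR Permission denied
EOF

# Keep the error lines, drop the first three fields, then count identical
# messages and list the most frequent ones first.
top_errors=$(grep ' ERROR ' "$log" |
    cut -d' ' -f4- |
    sort |
    uniq -c |
    sort -rn |
    head -n 10)

echo "$top_errors"
```

Each stage reads text and writes text, so the same pipeline applies unchanged to a file with millions of lines.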

Now, what other kinds of tasks can we do? Let's have a look here. So, when it comes to general tasks, the Command Line is really close to the file system, yeah? So, it makes it really easy to work with files and directories and to search through text. I'm sure that you've used the Command Line to pip install a package or two, or maybe even create a virtual environment, right? But the Command Line can do much, much more than that. If you're more on the operations side, it allows you to schedule jobs, configure servers, monitor your resources, and deploy software, right? Also, if you take a look at your CI/CD pipelines, right? If you peel away all those layers of YAML, then very often you find that it's actually a Command Line tool under the hood doing all the work.

Okay. But I think we all know what the real reason is, right? Why we want to use the Command Line: it makes you look like a hacker. And then, of course, I'm not talking about hacking into someone else's system. No, I am talking about solving a problem in a timely manner, right? This is the infamous data science Venn diagram by Drew Conway. And, you know, when it comes to hacking, I believe that the Command Line plays a very important role.

And if you don't believe me, then perhaps you believe this article in Nature. Nature argues that researchers should embrace the Command Line. They say that it can help wrangle with big files and that it allows you to parallelize your work and even automate it.

Now, I was convinced already 30 years ago when I saw Jurassic Park for the first time. And in this scene, Lex is able to close the doors because she knows Unix. Yeah? So, that the velociraptors don't get lunch that day. But there are plenty of other reasons.

So, one of the biggest reasons is perhaps that the Command Line has been around for, well, it says almost, but it should now be over 50 years, right? It's even older than I am. And because it's been around for so long, thanks to what is known as the Lindy effect, we can be pretty sure that it's going to be around for the rest of our careers. Yeah? And that's comforting to know. It means that the time that you invest now is going to be paying off for a very, very long time.

Yeah? Right. I already mentioned this. It is very close to the file system. Right? It's what we work with on a daily basis: files and data. And so, you can do most of these things using the mouse, right? But as I said earlier, the Command Line is much more suited for these one-off things. Of course, and I'll talk about that later as well, at a certain point there is definitely a benefit to using a programming language.

Now, here's another reason. I think that a lot of us have used the Command Line not only for installing Python packages, but also for Git. Sure, there are graphical user interfaces for this. But when shit hits the fan, right? When things are really messed up, then the Command Line is the only interface that allows you to clean up that mess, right? If you want to do it properly. Of course, you can throw it away and clone the repo again, right? With these graphical user interfaces, there's always some level of abstraction. And because of this, they have limitations and they cannot really scale as well.

Command line tools in action

Okay. So, let me give you a couple of examples of some Command Line tools. Now, I've been using the word tool as an umbrella term, right? In fact, there are five types of tools. There are binary executables, commands that are built into the shell, scripts that get interpreted, such as your Python scripts, and then there are shell functions and aliases.

And I'll talk more about those in a moment. Here's a slide to intimidate you. A very long list of Command Line tools that I, well, that I have used at some point. Or that I'm still using. Not all of them. And most of these tools I've actually put in a Docker image that I prepared for my book, Data Science at the Command Line. Now, I'm going to talk a little bit more about Docker in a moment. But if you're interested, then in Chapter 2, there are instructions for how you can get a hold of this Docker image and run it yourself.

So, there's no way I can cover all of these tools. And, you know, this is only the tip of the iceberg. New tools are being developed every single day. Not a day goes by when I'm browsing Hacker News (I should probably do something about that) that I don't see, all right, there is a new Command Line tool in town. And, you know, it is difficult to keep up.

But that's okay. Because in the end, it's not so much about the tools, but it's about the environment. The way of working. Yeah? And after a while, you cannot imagine working on a system without a Command Line. Trust me.

Okay. So, I do want to highlight a couple of tools that I, you know, use every now and then. So, let's head over to the Command Line.

Here we go. So, what do we see here? In the top left corner, we have the prompt. We see my cursor here indicating that the Command Line is ready for my command. I'm also inside a program called tmux, which I'm going to talk about in a moment as well. But for now, let's just run a couple of commands here.

And I believe that I have some... I don't know. What can we look at? Well, we can look at chapter one of the book, right? The Python Polars book. So, what bat does: bat is like cat. And I guess I should also then demonstrate cat. cat just outputs every file that you pass it. So, in this case, a single AsciiDoc file. Yeah? That's the format that we've been writing this book in. It is just text. And cat shows everything. That's it. And without any colors. bat, on the other hand, you know, allows you to scroll through the text or the file at your own pace. And it adds colors to this.

So, for example, let me try another file here. Maybe some data over here. Right? If the file is small enough, then it immediately shows it. This is just 11 lines. This is a CSV file. And you can see that we have, well, we have colors. Yeah. It just looks a lot nicer than just using cat.

So, that's bat. Not every tool has to be that big. Right? fx. I like this whenever I'm working with JSON data. Right? Imagine that you get JSON data from your colleague or from some API that is deeply nested. Right? When you're in Python, how do you get a hold of its structure? How do you, you know, investigate or inspect what's in there? Now, fx is something that I like to use for this. So, let's see here. I have some data about Pokémon. This comes from the Pokémon API. And this allows me to really navigate in all the objects.

And what's more, here at the bottom, you can see the, well, the path of each object, so that you can use this information in, say, Python for when you want to access certain fields in certain objects. Right? So, this really allows me to get a hold of some JSON data. Compare that to, well, I mean, I'm sure that bat is able to display this as well. But then you get everything. And if it's really deeply nested, it can be very difficult to, you know, understand where you are in the structure of all these objects.

So, that's fx. Now, then, some coreutils. Yeah. So, let's say, again, we have here these power tools. That's not a JSON file. It's a CSV file. Right? This is the raw data. It's just text. And Unix command line tools are very good at processing text. So, what we can do is we can pipe the output from cat to another tool. Well, let's say what is something interesting to do here? We could, for example, count the number of tools per brand. Yeah? So, I try to think about how would I approach this? How would I split this task or this problem into multiple subproblems? Well, okay. I first want to extract a single column in the CSV file. And then I want to count this.

So, let me just show you how that is typically done. And of course, you can do this in Python using pandas or Polars. But I just want to demonstrate, you know, for these one-off things, how you can use the tools that already come preinstalled with the Linux distribution. Right? So, what I'm going to use now are all what are known as coreutils. Okay. So, split on comma. And then I want the third field. So, this gives us the third column, so to speak, in quotes, because it's just text. It's not being interpreted as a CSV. I want to get rid of that first line here. So, for that, I'm going to use tail. Now, in order to count it, I first need to sort it. Now, it's counted. And if I want to have it sorted in reverse order, I can sort it again. And now we see that we have five Makita power tools, three from Bosch and two by DeWalt.

And this is usually how it goes. The way of working is that you iterate on your command by adding additional tools to this.
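The commands typed during this demo are not captured in the transcript, so the following is a reconstruction under stated assumptions rather than the speaker's exact pipeline. The sample file, its column names, and the choice of flags are hypothetical; only the brand counts and the use of the third field come from the talk.

```shell
# Hypothetical stand-in for the power tools CSV: a header plus ten rows,
# with the brand in the third column.
csv=$(mktemp)
cat > "$csv" <<'EOF'
id,tool,brand
1,Drill,Makita
2,Circular saw,Bosch
3,Jigsaw,Makita
4,Impact driver,DeWalt
5,Sander,Makita
6,Router,Bosch
7,Planer,Makita
8,Angle grinder,DeWalt
9,Multitool,Bosch
10,Hammer drill,Makita
EOF

# Built up one tool at a time, as described above:
#   cut   splits on commas and keeps the third field
#   tail  drops the header line
#   sort  groups identical brands so that uniq -c can count them
#   sort  (again) orders the counts from high to low
counts=$(cat "$csv" |
    cut -d, -f3 |
    tail -n +2 |
    sort |
    uniq -c |
    sort -rn)

echo "$counts"
```

Because `cut` treats the file as plain text, this only works when no field contains a comma of its own; a quoted CSV would need a CSV-aware tool.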

Parallel, which I'll skip for now, is very powerful for whenever you want to run some process multiple times. Right? As the name suggests, it allows you to utilize all your cores and even multiple machines if you have access to those. Yeah. And then you can very quickly, say, convert those 100 JSON files to CSV, for example, by running a number of jobs in parallel. It's like a glorified for loop, but then much more efficient. If you're interested, if you want to learn more about Parallel, see chapter 8 of Data Science at the Command Line.
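The talk does not show any GNU Parallel syntax, so this sketch swaps in a different tool, `xargs -P`, to illustrate the same idea of running one job per file with several jobs at a time. The file names and the "conversion" are hypothetical: the conversion is a toy stand-in that only handles these one-key files, where a real job would call a proper JSON-to-CSV converter.

```shell
# Create eight tiny JSON files to work on.
workdir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
    printf '{"id": %s}\n' "$i" > "$workdir/file$i.json"
done

# Feed one file name per line to xargs, which runs up to four jobs at once.
# Each job writes a header plus the digits found in its input file.
printf '%s\n' "$workdir"/*.json |
    xargs -P 4 -I{} sh -c '{ echo id; tr -cd "0-9\n" < "$1"; } > "${1%.json}.csv"' _ {}

ls "$workdir"
```

Passing the file name as `$1` to `sh -c`, instead of pasting it into the command string, keeps names with unusual characters from being interpreted by the shell.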

Creating your own command line tools

All right. So, you can easily extend the command line with your own tools. Right? You're not confined by the limits or by the tools that currently exist. You can easily create your own tools. Yes, you can. So, I'm going to show you how.

So, first you should know that most of these tools do one thing, and they do it well. Right? Remember in my previous example, I used tools like cut and sort and uniq, and they're all built for a single purpose. And because of that, they also get really good at it. But more importantly, the command line allows you to let these tools work together. Yeah? In various ways. And they can do that because they adhere to another idea of the Unix philosophy, and that is that text is the universal interface. Yeah? So, one tool outputs text, which can then be fed into another tool. Yeah? And text is more abundant than you think. Log files, for example. Who here works with log files? Log files are just text. Source code and configuration files, they're text. CSVs, JSON, XML, HTML, all just text. And many other data sources can often be converted to text. So, lots of options there.

The command line doesn't care in which language a tool has been implemented. As long as tools adhere to this Unix philosophy, they can all work together. And there's this interesting trend in that more and more tools are being written or rewritten in Rust, right? Think uv, a relatively new package manager, or Ruff, right?

I think my camera just stopped working. That means I have to switch to a different camera. This is a little bit unfortunate. But I can fix this. You'll just see me with another camera.

There we go. And now we have this glaring light here in the back. And that's something I'm going to fix as well by just turning off this light. There we go. Not to worry, this isn't about me anyway.

Hey, so... Where was I? As long as they adhere to the Unix philosophy, they can work together.

Oh, yeah. I was talking about the Rust tools that are being implemented these days. That's very interesting. I don't know Rust myself just yet. I really ought to learn it. But it does mean that, as a result, many, many more tools are getting faster and faster. And that's, of course, for us Python developers, a good thing, when all these additional utilities get faster and faster.

So, let's go back to creating your own tools, right? All you have to do is follow these six easy steps. Yeah? So, of course, you have to copy your source code, wherever it comes from, into some file. Yeah? That's a no-brainer. Then you have to think about, okay, what things in my script do I want to be variable? Right? What are the arguments that my command line tool should take? Yeah? So, that is the argv part. I'll show you that in a moment. Maybe you want your tool to accept data from standard input. Yeah? So, when one tool produces text, do you want to be able to read in this text? Now, number four, the shebang, is needed to make your tool executable. So is step number five, where you change the permissions of your Python script, or whatever language it is written in. But in a moment, we're going to focus, of course, on Python. And step six, that is very convenient. You add your tool to what is known as the path, so that your command line is able to find your tool and you can invoke it wherever you are.

Okay? So, I can imagine that right now, you're feeling a little bit like Alice, tumbling down the rabbit hole. Well, let's not worry. Let's all take the red pill, and I'll show you how deep the rabbit hole goes. So, for this, I'm going to use the command line again. And I'm going to show you an example, which is not that sexy. But it is something that we've actually used while writing the book.

In chapter eight of the book, right? Doesn't matter. There are a lot of code examples that are of a similar form, where we use the method with_columns, right? Here's an example. We have this very small data frame. And we want to demonstrate in this chapter lots of different methods that we can apply to some column in this data frame. All these methods. So, here we have lots of mathematical expressions that we can use. And you can see there is a certain pattern here. We have lots of repetition here. And I felt like, okay, we can automate this. I want to be smart about this. So, what I did is I created a small tool that's able to help me with this.

And that's this tool. Yeah? So, let me demonstrate how this works. So, let's say we have a column called x. And I want to apply two methods to this. ceil and floor, right? x, ceil and floor. These three values here, I've passed those as arguments to the script, right? On the left here, on line 20, we can see that we get these values by accessing argv, right? From the sys module. And then it's actually quite a simple application. But it is very, very useful. And this is how I like to think about creating your own tools. Extending your toolbox. These don't have to be very elaborate tools. It's really the little things that count. Yeah? And because, right, we see the result down here on the right, because this is, again, text, well, I could either put this in my clipboard. I'm working on a Mac, by the way. Or I can save it to a file, like so. Yeah? Lots of different options.

Okay. So, about those six steps, right? We have it in a file. I've shown you argv. This tool doesn't need to read from standard input. There is a shebang, right? The shebang or hashbang comes from these first two characters here in the first line. These let the shell, the Z shell, in my case, know that this script can be executed. And it should be executed by this executable, right? Python. And the rest of the file will be passed to Python. And that's the tool that will execute this file. That's the shebang.

Now, chmod. This is a tool, short for change mode. You can set the permission bit so that the file is actually executable. If we were to disable this, and I wanted to execute it again, I would get an error, permission denied. Yeah? So, that's a step that you have to take. It is a thing that you only have to do once.

And this tool, by the way, is not on the path. That's why I need to prepend it with the directory name bin. But if you were to put it on your path, then you can use it like any other command line tool. And, yeah, I won't go into detail on how to do that, but it's quite easy to figure that out.
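The steps can be sketched end to end as follows. The speaker's tool is a Python script that reads `sys.argv`, and its source is not reproduced in the transcript; to keep the example in one language, this version is a shell script instead, where `"$@"` plays the role of argv. The name `withcols` and the exact shape of the generated snippet are hypothetical.

```shell
project=$(mktemp -d)
mkdir "$project/bin"

# Steps 1, 2 and 4: put the code in a file, read the arguments, and start the
# file with a shebang naming the interpreter that should run it.
cat > "$project/bin/withcols" <<'EOF'
#!/bin/sh
# Usage: withcols COLUMN METHOD...
column=$1
shift
echo 'df.with_columns('
for method in "$@"; do
    printf '    %s=pl.col("%s").%s(),\n' "$method" "$column" "$method"
done
echo ')'
EOF

# Step 5: set the executable bit. This only has to be done once.
chmod +x "$project/bin/withcols"

# The tool is not on the PATH yet, so the directory has to be spelled out.
"$project/bin/withcols" x ceil floor

# Step 6: add the directory to the PATH so the tool can be found from anywhere.
PATH="$project/bin:$PATH"

# The output is just text, so it can be saved to a file or piped elsewhere.
withcols x ceil floor > "$project/snippet.py"
```

Changing `PATH` like this only lasts for the current session; to make it permanent, the same assignment would go in the shell's configuration file.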

So, if you're serious about creating your own tools, then there are a number of Python packages that can help you with this. You can do everything using the standard library, but here is, for example, Click, which is a Python package that allows you to build beautiful command line interfaces in a rather composable way. I like it. We've used it at work. There's another package called Rich, which allows you to produce beautiful output with color and style, even Markdown output or syntax-highlighted code. And by the same creator, there's even a package called Textual, which allows you to produce textual user interfaces, or TUIs.

Now, that is something that I sometimes use, but that we're not really going to talk about here, because TUIs, these textual user interfaces, are a different beast altogether. You can invoke them from the command line, but then you're inside this application, and you don't really get to use all the other benefits that the command line has to offer. Still, they can be very, very useful, and you can produce beautiful things with this Textual package, as you can see here.

Now, something that I should have clarified from the start is that, okay, it's in the title, Unix command line, but another thing that I'm not talking about here today is the PowerShell or the command prompt. These are two environments that the Windows operating system offers. So, they're a completely different animal altogether. Luckily, if you are on Windows, you can install something that is called Windows Subsystem for Linux, WSL. I have a link at the end of this presentation if you're interested in that, and that allows you to run Ubuntu Linux or some other distribution inside Windows. It's quite amazing how that's done.

The CLAIM method for embracing the command line

All right, then. I think now we come to the core of this talk, and that is, okay, how can you then embrace the command line, right? You have in front of you this stark and unforgiving environment, and you're like, all right, how am I now going to get comfortable with this? Yeah, so I have devised the CLAIM method. We're going to claim the command line, and the CLAIM method spans five categories of steps that you can take that make working with the command line easier and more enjoyable. These categories are in no particular order, other than that they make for a convincing acronym. Now, keep in mind that there are a lot of possible steps that you can take, and you don't have to do everything. In fact, there are a couple of things that I haven't done either, so be sure to just mix and match whatever works for you.

All right, so first up, creating shortcuts. Now, with all these commands and incantations, it is so easy to lose track, right? The command line is very ad hoc in nature. Luckily, the shell, or the command line, keeps a history, making it easy to figure out what you performed, how you performed a certain task two weeks ago. Also, you can create aliases, which I like to think of as global shortcuts because you can invoke them from everywhere.

So, those are the global shortcuts. There are also local shortcuts, and you can create those using just, and then there are bookmarks, which allow you to quickly navigate to a certain directory. Let me demonstrate this.

All right, so I've installed fzf, which is short for fuzzy finder, and when you do so, it integrates with the shell in a very interesting way. When I press Ctrl-R, I get this list, and I can start typing. For example, when was the last time I used sudo? Yeah, so it then brings me back to all the incantations that I've performed in the past, and it allows me to select a command from this list, right? So, it really allows you to browse through your history.

Okay, so that's fzf. I can really recommend that tool. Now, aliases. For example, l is an alias. l is aliased to this long command, the eza command, which is a Rust reimplementation of ls, right, for list files. This is how it would look by default, but this is, of course, not a very nice thing to look at. So I like to call eza with a number of different arguments, but I don't want to remember all those arguments. I just want things to work, so I type l. The command line then figures out, like, okay, this is an alias. Let me run this long command for you. And these can be invoked anywhere on your system. That's why they're global shortcuts.
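A minimal sketch of defining such an alias. The speaker's alias expands to an eza invocation whose arguments are not given in the transcript, so plain `ls` with a few common flags stands in for it here; the flags are an assumption.

```shell
# Stand-in for the speaker's alias: l expands to a longer ls command.
alias l='ls -lAh'

# Print the definition to check what l expands to.
alias l
```

An alias defined at the prompt only lasts for the current session; placing the `alias` line in the shell's configuration file (for example `~/.zshrc` for the Z shell) makes it available every time.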

All right, then local shortcuts, as in local to a project, and for this, we can use the command line tool called just. just uses a file called a justfile, and, let me see here, I have defined a number of different, well, targets that we can run in this justfile. So, let me just give you an example here. I can now do just check. You see here there is command completion, or tab completion. This shows me all the shortcuts that I have defined in this justfile. For example, check unused images. Yeah, so it prints out the command that is actually being run, the one that I'm highlighting right here. Now, this is something, of course, that I'll never be able to remember. So, that's why I created this shortcut, but this one is local to the project because it only makes sense for this project. Now, what does it do? It goes through all the AsciiDoc files, right? All of our chapters, and then searches for the images that we're using. There's a certain AsciiDoc syntax that you have to use. And then we compare that with the images that we actually have in the images directory. Fantastic stuff. And this is only using tools that come, well, preinstalled with a Linux distribution.
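The command behind that shortcut is not reproduced in the transcript. As a rough guess at how such a check could be assembled from preinstalled tools, the sketch below lists the images referenced with AsciiDoc's `image::` macro and compares them with the contents of an images directory. The file names and directory layout are hypothetical, and it assumes images are referenced by bare file name.

```shell
# Hypothetical book layout: two chapters and three images, one of them unused.
book=$(mktemp -d)
mkdir "$book/images"
touch "$book/images/plot.png" "$book/images/diagram.png" "$book/images/old-draft.png"
printf 'Some text.\n\nimage::plot.png[A plot]\n' > "$book/ch01.adoc"
printf 'image::diagram.png[A diagram]\n' > "$book/ch02.adoc"

# Image names referenced from any chapter, sorted and de-duplicated.
grep -oh 'image::[^[]*' "$book"/*.adoc |
    sed 's/^image:://' |
    sort -u > "$book/referenced.txt"

# Image files that actually exist.
ls "$book/images" | sort > "$book/present.txt"

# comm -13 keeps the lines that appear only in the second file:
# images that are present but never referenced.
unused=$(comm -13 "$book/referenced.txt" "$book/present.txt")
echo "$unused"
```

Wrapped in a justfile recipe, a command like this would become a one-word project shortcut.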

Now, fasd. This is bound, in my case, to j. For example, if I were to do j movies, it immediately goes to a directory that I've previously visited, or that I visit very often, called movies. This one allows you to rapidly navigate to a certain directory because, on the command line, you're always in some working directory.

Yeah. So, that's it for shortcuts. Then, of course, there are certain concepts that you have to learn, and this can be quite intimidating. And for that, I advise you to experiment with Docker. Yeah. It allows you to create an isolated environment. It's very difficult to mess up. You can even map a local directory so that you can still get data in and out if you want to. And like I said, there is a Docker image that I have prepared. In chapter two of Data Science at the Command Line, I describe how you can install this, but there are many, many Docker images available that allow you to try things out.

It's fantastic. So, many of these tools come with their own help, their own documentation. And rather than immediately going for Stack Overflow, I can really recommend that you first check out their help. So, for example, man, I don't know, tar, right? A tool to compress or extract files. It's quite a lot. There's always a lot of information in here. All the arguments that you can think of. Now, not every manual page is equally well suited. In that case, I can recommend a tool called tldr, too long didn't read. If you pass it the name of a tool, it gives you a very, very short cheat sheet on how you can use this tool. So, what's another one? tldr git, right? Of course, you would need a little bit more than this, but it just goes to show that these cheat sheets can be quite useful. Now, there's nothing wrong with using Stack Overflow or even using your favorite LLM to come up with incantations. My only advice then is: don't just blindly copy and paste whatever you find on the internet. Always double check, and try to figure out what it does.

So context, right. Where am I? What is the Git status? Am I in some virtual environment? Is it activated? These are things that you can show in your prompt. Yeah. So I have a pretty fancy prompt right here. It shows my current directory with some icons, even the Git branch that I'm on, and which Python version I'm currently using. And so you can really customize this to your heart's content. This is done using a tool called Starship, which is, well, quite fancy, as you can see. And you know, this gives you context. It gives you some confidence as well, as in, okay, I know that I'm in the right place.

And then sessions, right. tmux, a terminal multiplexer, allows you to have multiple sessions and multiple windows, which can be very useful. Let's say you have a very long running process, where you want to start a process and then keep track of it using htop. Yeah. Usually, if you're just working with a single terminal, you would have to do other tricks or maybe have an additional tab. But now, let's say you have another long running process, or all of a sudden you want to work on a different project. That's possible too. You can come back to that later. I'm really not giving tmux the explanation that it deserves, but I can really recommend that you check it out, especially if you often have to work on remote instances, on some cloud instance or some server, and you want to keep the processes running, even when the connection gets interrupted.

Okay. Improved looks, right? In Dutch, we have a saying, het oog wil ook wat: the eye also wants something, meaning that looks matter, right? If you're being greeted with this black and white interface, it's not very inviting to do some tasks on it. So I advise you to set up some colors. I use the Dracula theme for everything. If you install a Nerd Font, that's what it's called, it's basically a font, but with a whole bunch of icons and glyphs added to it, you can also get these nice-looking icons. You may have seen those when I did a listing of my directory contents. Here you have all these icons. They're not essential, but it's the little things that count.

All right. We all make mistakes. I still make plenty of mistakes, but there are ways in which you can minimize the number of mistakes that you make. And if you do make a mistake, you can mitigate the result, right? So, syntax highlighting: you get immediate feedback. If you type something and it doesn't exist, the Z shell automatically colors it red. That gives you some feedback, and you're like, okay, this probably doesn't exist.

Whenever you move files, or you remove them, or you copy them, if you add -i to the command, it becomes interactive and it asks you for confirmation. So that helps. There's even a setting that you could put on to prevent overwrites. So let's say here, let me first create a new directory. Let's say I want to write something. So we have cowsay, hey, right? Produces a cow. And I write this to a file called cow, right? We don't need an extension. That's perfect. Okay. So this works. Now we have some other tool. Let's say we want to produce 10 numbers and also write that to cow. Whoops. Now our cow is gone and we have these 10 numbers. If I were to set this option, and you would do this in your configuration files so that it's activated every time you start a new session, then if I want to do this, we get an error saying, okay, the file exists, you probably don't want to overwrite this. I don't have this activated by default. But I can imagine that if you're just starting out with the command line, this can be useful for you.
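
The overwrite protection described above can be sketched as follows. The talk does not name the setting; the assumption here is that it is the shell's `noclobber` option (`set -o noclobber`), which POSIX shells and zsh provide. To keep the examples in Python, the sketch drives `sh` through `subprocess`. The file name `cow` and the helper `try_overwrite` are illustrative only.

```python
import subprocess
import tempfile
from pathlib import Path

# Assumption: the setting shown in the talk is the shell's noclobber option.
SCRIPT = """
set -o noclobber   # refuse to overwrite an existing file via >
echo moo > cow     # first write: the file does not exist yet, so this works
seq 10 > cow       # second write: the file exists, so the shell reports an error
"""


def try_overwrite(workdir: str) -> tuple[int, str]:
    """Run SCRIPT in workdir; return the shell's exit code and the final file contents."""
    result = subprocess.run(["sh", "-c", SCRIPT], cwd=workdir,
                            capture_output=True, text=True)
    return result.returncode, Path(workdir, "cow").read_text()


with tempfile.TemporaryDirectory() as tmp:
    exit_code, contents = try_overwrite(tmp)
    print(exit_code != 0, repr(contents))  # True 'moo\n'
```

In an interactive session the same `set -o noclobber` line would go into the shell's startup file, and `>|` still forces an overwrite when that is really intended.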

And then there's another thing that I don't use, but that might be interesting to some: you can actually integrate with the macOS trash, so that when you remove a file, it's not immediately gone, as is usually the case. It first gets moved into your trash, and there are a couple of tools that allow you to do this. So when you accidentally remove something, it's not completely gone.

When and where to use the command line

Right. And that concludes the claim method for embracing the command line. So the last two questions that I want to answer are when and where. When should you use the command line, and where can you use the command line?

Remember, I believe that you should use the right tool for the task. And the command line doesn't care if you want to continue in your favorite programming language or your favorite IDE. The command line is cool with that. But still, I can imagine that you're now wondering: okay, where's that threshold where you say, now's the time to turn it into a proper Python script? Or is it worthwhile to just do it by hand using the mouse and keyboard? This is a tough question, and this is something that you can only learn yourself by doing, by practicing on a daily basis. Even if it's just for 10 to 15 minutes, you'll be able to develop this intuition of when the command line is a suitable tool. And I can really recommend that, whenever you are faced with a problem, you chop it up into subproblems and try to think: okay, is this something that I might use the command line for?

So, the command line can be used in a variety of other places besides the terminal that I've been showing you all the time. For example, it's integrated into Visual Studio Code. It's fantastic. It allows you to very quickly do one-off smaller tasks without leaving the comfort of your favorite IDE. And it's also available in JupyterLab. At the bottom, we have a full terminal. It can do the same things as what I've done, but it's a little bit more bare bones, with no fancy colors. You can use cell magic. So, here on the right, we see %%bash, and that allows you to have these small, multi-line Bash incantations, right? And then, finally, this is something that we haven't really talked about yet, but you can actually use the command line from your Python script. There are various ways in which you can integrate Python with the command line. Now, I'm not saying this is always a good idea, but it is interesting to know that this is possible.
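
One of those ways, sketched here with the standard library's `subprocess` module. The talk does not say which approach was on the slide, so this is only one possibility; the helper name `run_lines` and the choice of `seq` are illustrative.

```python
import subprocess


def run_lines(command: list[str]) -> list[str]:
    """Run a command without a shell and return its standard output as lines."""
    # check=True raises CalledProcessError if the command exits with an error.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Let a command-line tool produce the numbers, then continue in Python.
numbers = [int(line) for line in run_lines(["seq", "10"])]
print(sum(numbers))  # 55
```

Passing the command as a list avoids shell quoting issues, which is usually the safer default when arguments come from user input.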

And lastly, here's a picture that I took from the book Spark: The Definitive Guide by Bill Chambers and Matei Zaharia, and I highlighted here: the pipe method is probably one of Spark's more interesting methods. So, it allows you to pass an RDD, or a partition of your dataset, through a command line tool. So, what it does here, wc -l, it counts the number of rows. Now, I think that's quite the compliment from the original author of Apache Spark, and it's really interesting that they've decided to add this functionality to leverage a 50-year-old technology. So, the command line can be used in all sorts of situations.
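
The idea behind the pipe method can be imitated in plain Python. This is not Spark code: the partitions are ordinary lists and `pipe_partition` is a hypothetical helper, used only to show what it means to send each partition through `wc -l`.

```python
import subprocess


def pipe_partition(lines: list[str], command: list[str]) -> list[str]:
    """Feed one partition to the command's standard input; return its output lines."""
    payload = "".join(f"{line}\n" for line in lines)
    result = subprocess.run(command, input=payload,
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Two made-up partitions of a small dataset.
partitions = [["a", "b", "c"], ["d", "e"]]

# Each partition is counted separately by the external tool.
counts = [int(pipe_partition(part, ["wc", "-l"])[0]) for part in partitions]
print(counts)  # [3, 2]
```

The tool only ever sees lines on standard input, which is why any command-line program that reads and writes text can be plugged in this way.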

Key takeaways

And with that, here are just a couple of things that I would like you to take away from this session. The first one is that the command line is here to stay. It's not going anywhere, unlike the latest hip framework that you might be using. The command line is going to be around. So, it is definitely worthwhile to invest your time into this. The claim method, right? All these steps that you can take in order to really own, or claim, and become comfortable with this interactive environment. And remember, you don't have to do every step, and you don't have to start using every command line tool that there is. You can start small, take baby steps, and build it from there.

And then lastly, my advice is to be creative. So, in two ways. The first one is: try to see where you can fit the command line into your daily tasks, right? Is there something small that you can do on the command line? But also creative in the sense that you can create your own tools. Think about how the code that you've already written in Python can be turned into a command line tool. Are there certain static things that you can make variable? Or is this something that can be reused, either by yourself or maybe even by others?
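
A minimal sketch of that last idea, assuming a small word-counting function as the existing code. The hard-coded number of results becomes a `--number` option and the input comes from standard input, so the tool can sit in a pipeline. All names here (`top_words`, `main`, `--number`) are made up for illustration.

```python
import argparse
import io
import sys
from collections import Counter


def top_words(text: str, n: int) -> list[tuple[str, int]]:
    """The existing logic: the n most frequent whitespace-separated words."""
    return Counter(text.lower().split()).most_common(n)


def main(argv=None, stdin=sys.stdin, stdout=sys.stdout) -> int:
    parser = argparse.ArgumentParser(
        description="Print the most frequent words read from standard input.")
    # What used to be a static value is now variable.
    parser.add_argument("-n", "--number", type=int, default=3,
                        help="how many words to print (default: 3)")
    args = parser.parse_args(argv)
    for word, count in top_words(stdin.read(), args.number):
        print(f"{count}\t{word}", file=stdout)
    return 0


# Demo with explicit arguments and input instead of the real command line.
main(["-n", "2"], stdin=io.StringIO("a b a c a b"))
```

Saved as a script that calls `main()` under the usual `if __name__ == "__main__":` guard, it could be used like any other filter, for example by piping a text file into it.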

So, with that, should you have any remaining questions, I just want to say that I'm organizing a round table in a few moments so that we can continue the conversation. And on my website, jeroenjanssens.com, you can find my email address and all my socials, should you want to get a hold of me after this conference. For now, thank you very much for your attention, and I wish you good luck in embracing the command line.

Fantastic. Thank you so much for that awesome keynote. The chat was hopping. So, attendees, you are more than welcome to join the lounge, which you can access through the main reception area. It should be at your top left. That will give you a lounge space, and he will create a table, and you all can sit around the round table virtually and continue this awesome discussion. Thank you so much for kicking off PyData Global with your keynote. It was such a pleasure, and thank you so much for your time. Attendees, thank you for your time. We are so glad and so thankful that you are here, and we will see you in sessions later on today. Thank you.

Thank you. Take care.