10 Years of Data Science Tools...and What Happens Next (Jonathan McPherson)

Transcript

This transcript was generated automatically and may contain errors.

Good morning and welcome to PositConf 2025. So my name is Hadley Wickham. It's my great pleasure to welcome you to PositConf here in Atlanta. And my job is just to take a couple of minutes to help you be as successful as possible with conf so you have a great time.

To that end, the first point I want to make is if you don't already have it, make sure to grab the app. This is where you can find all the information about what's going on, including the talks and all the other things that are happening. This is also where you can find a link to our Slido. This is where you will ask questions of the speakers, and it also has a link to a Discord if you want to engage with any of the other participants online or in person.

New this year, we also have a really fun competition, both online and in person. Do some fun activities around conf, collect points, and you can get some pretty cool swag at the end of it.

On your way to this keynote, you've all already walked through the lounge. This is the place if you want to ask questions about Posit, about our products, our open source packages, whatever you want. New this year, we also have a bunch of live demos going on. Again, you can find out about those in the app. This is a great way if you want to find out what's new with Posit products this year.

But at Posit Conf, we also want to make sure that everyone feels safe, comfortable, and welcome. And to that end, when you registered, one of the things you did was sign a code of conduct. That's really important to us. And if at any point you feel unsafe or notice anyone else may be feeling unsafe, please reach out to any Posit employee. You can also go to the registration desk or email conf@posit.co.

If you want to know how to recognize a Posit employee, well, you can spot us by our t-shirts. Apart from me, obviously.

And we've done a bunch of things to hopefully make you feel comfortable. So please respect the keep your distance pins or the hugs okay pins. If you see someone wearing a red lanyard, that means they prefer not to be photographed. So please keep them out of any photos. And finally, everyone has their pronouns listed on their badges.

We also have a bunch of rooms available to you, regardless of whether that's a quiet zone just to hang out, whether you want to meditate or pray. We have a lactation room and gender-neutral bathrooms on every floor.

And that brings me to really the only rule of Posit Conf, and that's the Pac-Man rule. Whenever you are standing in a circle with your friends, please make sure to leave Pac-Man's mouth open so new people can join your group. We want Posit Conf to feel welcoming to everyone, regardless of whether it's your fifth Posit Conf or your first Posit Conf.

And I would kind of encourage you, if you're a Posit Conf veteran, you know, please do your best. Like if you see someone hanging out and they look a bit lonely, you know, go up and strike up a conversation with them. Please take that as like your kind of mission from me to make everyone feel as welcome here as possible.

If you're looking for like-minded individuals, a great way to find those people is our Birds of a Feather session happening at lunch. You can spot them with the big flags. New this year, you can also create your own Birds of a Feather session. You'll find an easel with some stickies outside the entrance to the dining room.

Now I don't know about you, but one of the things I am most excited about at this Posit Conf is our evening event at the Georgia Aquarium. It is an amazing aquarium. It is going to be so much fun. The one thing I want to point out is please, please don't forget to bring your badge. We cannot let you in without it.

So without much further ado, I would like to introduce our first keynote speaker, Jonathan McPherson. Jonathan is a long-term colleague of mine at Posit. And as you may know, I like to write little poems about our keynote speakers with the help of AI, because I'm not very good at poems.

Jonathan submitted the shortest bio ever for a keynote speaker, so Jonathan is just going to get a couple of haiku.

Architect of code, Jonathan builds Positron, visions made precise. At RStudio's heart, he helped shape enduring tools guiding many hands. From Redmond to now, wisdom gathered through the years, foundations endure. Please join me in welcoming Jonathan.

Jonathan's opening: tools and the brain

Good morning, everyone. So like Hadley said, my name is Jonathan. And today we're going to talk about data science tools. But before we really get into it, I'd like to show you a picture of my brain.

This is a knowledge graph. It's made up of notes. I've taken notes all of my life, but I first started doing it a little bit more obsessively back in 2016. And in this knowledge graph, every little dot that you see is a note that I took. Every connection between dots is a connection between notes. And so this is kind of a picture of all those notes put together.

If you look at this picture, you'll probably see a couple of clumps starting to emerge of notes that are related to each other. This one over here, for example, is books that I've read. I like to read. This one over here is people I've interviewed. And I think this little one up here may be house projects that I will never finish.

This graph comes from a program called Obsidian. Before Obsidian, I used a program called Vimwiki. But over the last ten years, as I've been accumulating all these notes, I noticed that something was happening to the way that I think and process information. In particular, I noticed that I had stopped thinking and then writing, and started using writing as a way to think. This tool actually did something to my brain.

I'm not the first person to notice something like this happening. Another person who noticed this happening was this guy. This is Friedrich Nietzsche. He is famous for two things. First of all, for being one of the most influential German philosophers of the 19th century. And secondly, for having an absolutely stupendous mustache.

Back in the year 1881, Nietzsche's eyesight began to fail him. That was a real problem, because Nietzsche was an author. He wrote his books, and he wrote them longhand. Because he was unable to see what he was writing, he was compelled to purchase a typewriter in order to continue writing. This is the typewriter that he got. This is called the Malling-Hansen writing ball. It may already be familiar to those of you who love ergonomic keyboards.

As Nietzsche began to write with this thing, it actually had an effect on what he wrote. Critics also noticed this. Nietzsche's first book that he wrote after he got this typewriter was called The Gay Science. One critic noted that his prose changed from arguments to aphorisms, from thoughts to puns, and from rhetoric to more of a telegram style. Nietzsche himself noticed this as well. It led him to make this observation, which I think is very profound. He said that our writing tools are also working on our thoughts.

If this is true, then it means that tools are not really interchangeable. I think in the world of data, in the world of software, we live in a world very full of symbols and manipulations, and it's easy to think that one tool can just kind of be plugged in and substituted for another as some kind of layer of abstraction. But in this talk, I'm going to argue that that's actually not the case. Tools actually change the output. This would be a very different talk if I gave it with PowerPoint, for example, instead of Keynote here. I'm going to argue that the tools that you use and the tools that you make matter quite a lot.

And so if it's true that a tool's output is shaped by its design and the world we experience is made from the output of many tools, it follows that a tool is a statement about what you want the world to become.

So I want you to think about what statement this thing is making about what it wants the world to be. I would argue that what this thing is saying is that writing should be closer to thinking. You know, this thing removes a step between writing and thinking because you no longer need to form the letters by hand, you just press a button and the letter appears, right? It moves your thoughts closer to your output. And as a result, what you write becomes closer to what you think.

Imagine something as simple as a garden variety leaf blower. This is also a tool that says something about the world, you know? You might think that this says something like, sidewalks should be clean, but what this tool actually says is that no one deserves to sleep after 6 a.m.

Or even think about this VS Code Pets extension, right? What does this tool say about the world? What does this tool want the world to become? I would say this tool says the world should be more fun and whimsical, right? Even something as simple as this has a point of view.

A few slides ago, I showed you this diagram of Positron's architecture. And this API that we made that lets you plug different languages into Positron is also a tool. And because it's a tool, it says something about what we want the world to be, right? We think that there should be room for many languages and they should work together.

What Positron's tools say about the world

So I want to talk for a couple minutes about the tools at Positron and the things we want them to say to the world. One thing that our tools always want to say is that science should be reproducible. So I've worked on IDEs. Let me talk a minute about how we do that in our IDEs. For example, again, looking at RStudio's data viewer, you'll notice one thing that this data viewer doesn't have is an edit button. And the reason it doesn't have this isn't because no one's ever asked for it or because it wouldn't be handy. It's because it is so easy, if you have an edit button, to start editing your data by hand and creating a workflow that's not at all reproducible.

I know you're thinking, wait a minute, Jonathan. A few slides ago, you told me to put in stuff that users want. And the truth is that this is not actually an easy decision. You kind of have to decide on the tradeoffs here. In Positron, we've tried hard to make this even more reproducible by adding a button that actually converts the sorting and filtering that you've done in the data viewer into code. So you can take the buttons that you clicked and the sorts and filters that you added and actually create code that does those same steps so that you can do them again and again.
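
As a rough illustration of that idea, here is a minimal Python sketch that records sort and filter steps and emits equivalent pandas code. This is not Positron's actual implementation; the names `Filter`, `Sort`, and `to_pandas_code` are hypothetical and exist only for this example, and the choice of pandas as the target is an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Filter:
    """A filter the user clicked in a (hypothetical) data viewer."""
    column: str
    op: str        # one of ALLOWED_OPS
    value: object


@dataclass(frozen=True)
class Sort:
    """A sort the user clicked in a (hypothetical) data viewer."""
    column: str
    ascending: bool = True


Step = Union[Filter, Sort]
ALLOWED_OPS = {"==", "!=", "<", "<=", ">", ">="}


def to_pandas_code(steps: list[Step], df_name: str = "df") -> str:
    """Translate recorded viewer steps into pandas source code, in order."""
    lines = [f"result = {df_name}"]
    for step in steps:
        if isinstance(step, Filter):
            if step.op not in ALLOWED_OPS:
                raise ValueError(f"unsupported operator: {step.op!r}")
            # Boolean-mask row selection on a single column.
            lines.append(
                f"result = result[result[{step.column!r}] {step.op} {step.value!r}]"
            )
        elif isinstance(step, Sort):
            lines.append(
                f"result = result.sort_values({step.column!r}, "
                f"ascending={step.ascending})"
            )
        else:
            raise TypeError(f"unknown step: {step!r}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: keep rows with mpg above 20, then sort by hp, largest first.
    clicked = [Filter("mpg", ">", 20), Sort("hp", ascending=False)]
    print(to_pandas_code(clicked))
```

The point of the sketch is the design choice: the clicks are stored as data, and the code is generated from that record, so the same steps can be rerun on new data instead of being repeated by hand.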

I want to take you back for a minute just to RStudio Conf 2019. You may recall at the time, these low code and no code tools were kind of having a moment. And Tareef gave this presentation about basically just coming out in support of code and saying, we actually love code. It's repeatable. It's inspectable. It's reusable. It's diffable. And in the age of AI, I actually think a couple more bullets should be added here. Code is ingestible. Code is what these models read. And if you have expressed your tools and your workflow in the form of code, the models can actually read that code and understand what you're doing. Similarly, code is disgorgeable, is that a word? These models also write code. So if you are kind of embracing a code first data science workflow, you've actually kind of already positioned yourself to get the best out of what this new generation of AI tools has to offer.

So science should be reproducible. How about this one? Science should be free and open. This is something you've heard us say a lot. When you hear it, you probably think a lot about our open source packages and models. But this is also true of our IDEs. You know, when we built RStudio, everything was kind of packaged into one unit, right? We have R execution and debugging and code formatting and all this stuff was kind of in one bit. But in Positron, we've tried to make the system more free and open.

So again, I'd like to think of Positron as like a data science workbench. And for the R components of Positron, we actually built a couple of reusable systems that can be used outside of Positron. So Positron's code execution engine and its R formatter and so forth, they're actually separate subsystems that we've contributed to the community to use in other ways. This is not just theoretical. For example, I don't know if any of you use the Zed text editor, but if you go and look at how to run R code in Zed, what they'll tell you is to go get the Ark kernel, which is Positron's code execution engine, and plug it into Zed so you can run R code there.

If you have used tree-sitter-r, I'm guessing you think you haven't, but if you've ever looked for R code on GitHub in the last few months, you have. The R code navigation system on GitHub is actually powered in part by code that we wrote for Positron. We've contributed that back to the community and back to GitHub. The same thing is true of that R code formatter. So this is the VS Code plug-in, you can get this today, it's on the marketplace. You can use Positron's code formatter inside of VS Code, and you know what? You can also use it inside of RStudio. It works great there.

And so I guess what I'm saying here is we really do believe that the universe of software should be more free and open, and this is a statement that we want to make about the world with our tools.

RStudio and Positron: complementary tools

I just want to say one more thing here before we wrap up. I have kind of experienced a lot of feedback about what's going to happen to RStudio in the world of Positron, and people seem to think that we're going to remove RStudio in favor of Positron, which is not true. So let me kind of tell you how I think about this.

This is the picture that kind of comes to my mind. If you're like me, you actually have both of these tools in your toolbox. I've got a Phillips screwdriver that I often reach for, but I also have one of these screwdrivers that has multiple bits that I can kind of swap out as I need. And the truth is that some of these things are useful or not at different times.

One thing that is true about making tools is that as you make a tool more and more generic, it kind of becomes less and less good at any one particular thing. RStudio is a great tool for R in part because it only does R. It is very well built for that purpose, and if that is all you need, then it is always going to be better than a tool that's more like a Swiss army knife that's got a bunch of attachments hanging off of it. Sometimes you do need the Swiss army knife, and I often find myself reaching for Positron. But RStudio is very fit for purpose, and a lot of times when I'm reaching in my toolbox what I want is the tool built for that specific task.

Summary

So just to kind of remind you of what I've talked about here, I think it's really important to make tools for people. You need to listen to them, you need to watch them, not in a creepy way. You need to empower them. People are one of the most important components of a good tool. You need to make tools for results. I want you to remember that you need to shape the tool according to the good that you want it to do.

And finally, if you literally remember nothing else from this keynote, this is the thing I want you to remember as a tool builder, and that is that a tool is a statement about what you want the world to become. As people who build tools, this is an enormous responsibility, but it is also a huge, huge privilege.