Resources

How Open Source, Python and AI Are Shaping the Data Future with Wes McKinney

The future of analytics isn’t just about bigger models — it’s about building smarter, more interoperable data systems. Wes McKinney, Principal Architect at Posit PBC, Chief Scientist of Voltron Data and a General Partner at Composed Ventures, joins us to explore how the modern data stack is evolving and what it means for the future of analytics. Wes reflects on his journey building pandas and Apache Arrow, sharing how open-source ecosystems grow, transform and shape the way organizations work with data today. Wes also highlights the rising importance of semantic layers, agentic workflows and defensive coding practices as teams embrace AI-driven development.

Key Takeaways:
00:00 Introduction.
02:32 Wes didn’t expect pandas to drive AI, but he recognized Python’s unrealized potential.
05:09 A lucky convergence helped Python’s tools snowball into the AI standard.
10:40 Early big data focused on essentials, not the interoperable stacks we rely on today.
15:44 The composable data stack grew through bottom-up, grassroots open-source momentum.
21:56 Many “data science” roles ultimately became business intelligence and dashboard work.
25:24 Complex statistical work still depends on human judgment, not fully autonomous agents.
30:27 Frontier models retrieve table data reliably, while smaller models fail dramatically.
35:16 Better models and coding agents shifted Wes from an AI skeptic to an adopter.
40:07 AI-driven code demands stronger testing and review to avoid costly failures.
45:14 An AI-built finance project ballooned, revealing how agents inflate codebases.
Resources Mentioned:
Wes McKinney: https://www.linkedin.com/in/wesmckinn/
Posit PBC | LinkedIn: https://www.linkedin.com/company/posit-software/
Posit PBC | Website: https://posit.co/
Voltron Data | LinkedIn: https://www.linkedin.com/company/voltrondata/
Voltron Data | Website: https://voltrondata.com/
Composed Ventures | LinkedIn: https://www.linkedin.com/company/composedvc/
Composed Ventures | Website: https://composed.vc/
pandas: https://pandas.pydata.org/
Apache Arrow: https://arrow.apache.org/
DuckDB: https://duckdb.org/
DataFusion: https://datafusion.apache.org/
Jupyter Notebook: https://jupyter.org/
Parquet: https://parquet.apache.org/
Iceberg: https://iceberg.apache.org/
Delta Lake: https://delta.io/

Thanks for listening to the “Data Masters Podcast.” If you enjoyed this episode, be sure to subscribe so you never miss our latest discussions and insights into the ever-changing world of data. #DataStrategy #DataManagement #DataMastersPodcast

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

You're tuned in to the Data Masters podcast. In each episode, we dissect the complexities of data management and discuss the data strategies that fuel innovation, growth, and efficiency. We speak with industry leaders who share how their modern approaches to data management help their organizations succeed. Let's dive straight into today's episode with Anthony Dayton.

Welcome back to Data Masters. If you've ever written import pandas as pd in Python, then our guest today needs no introduction. He is the creator of pandas, the seminal open source library that practically defined Python as a language for data science. But he didn't stop there. He's also the co-creator of Apache Arrow, which created the de facto standard for how high-performance systems exchange data, in particular tabular data. He is currently a principal architect at Posit, the chief scientist at Voltron Data, and a general partner at Composed Ventures. Wes McKinney has spent nearly two decades building the foundational infrastructure that powers our industry. And today, we're going to talk about what comes next, from the rise of the composable data stack to his contrarian view on why LLMs might be heading for a trough of disillusionment when it comes to actual analytics. Wes, welcome to Data Masters.

Thanks for having me on the podcast.

Origins of pandas and the Python data ecosystem

So I wanted to start by going back in history a little bit. And I know this is almost an unfair question, to cast your eye back 15 years ago when you started the pandas project. But I think it's fair to say, with the benefit of hindsight, that it really built this Python-based data science ecosystem, and in that sense Python has become the de facto standard for a whole bunch of AI platforms: TensorFlow, PyTorch, et cetera. So I'm just curious, from your vantage point, sitting here today and looking back: when you were first building pandas, did you have a sense that it would become this sort of foundational layer for AI?

It would be hard to predict that far into the future. But I definitely saw there was a lot of untapped potential in Python, and that if only there were a toolkit for basic data manipulation and data wrangling, it would help unlock whatever potential future Python had as a mainstream data language. But back in 2008, when I started working on pandas, that was not at all the case and was not a foregone conclusion. Even using Python for professional business data work was seen as fairly risky at the time, because Python was unproven and had a fairly immature open source ecosystem for statistics and data analysis work. Initially, pandas started out as a toolkit for myself, to do my work at my job at a quant hedge fund. I enjoyed building the toolkit for myself, and then eventually I was building it for my colleagues, who were excited about using it. Eventually we open sourced the project, and I started engaging with the Python community to see: is there an appetite for this? Do people want this? Is this something that the world needs? It turned out the answer was yes. It was a little bit of being at the right place at the right time, around 2011 or 2012, when people were starting to talk about data science and big data, and there was suddenly a massive need for people with data skills. Python was an open source, accessible programming language that people could learn easily. And the sudden availability of a toolkit to read data out of databases, load CSV files, read Excel files, and then do meaningful work with that data, with code that was easy to write and easy to reason about, was one of the things that helped unlock Python as a language that could be accepted in the business world.

And I don't know if this was causal, or something that really factored into Google choosing Python as the language for TensorFlow. I think it was a little bit of an accident. I was partly inspired by the fact that Google used Python as one of their three languages, the other two being C++ and Java. Python was their scripting language, which they would use to build interpreted interfaces on top of mainly C++ libraries using SWIG and other wrapper generators. So that was probably the main reason why Google chose to do TensorFlow in Python. And eventually Meta started building PyTorch. Initially, Torch was all in Lua, I believe, and eventually they migrated that to Python to create PyTorch. So there is a combination of being lucky and making the right prediction, but also this lucky confluence of open source projects and major AI research labs needing to choose a programming language to build their AI frameworks in. It just happened that everyone chose Python. And so we were all rolling around our little snowballs, and suddenly the snowballs merged together and became one really gigantic snowman that now powers the world. It's been an interesting time, but I've been resistant to patting myself on the back and saying, oh, I predicted this was going to happen, I knew it was going to end up like this, because that's definitely not the case. I was hopeful that things would end up like this, but I would have been satisfied with a much less successful outcome.

Were there other features of Python that lent themselves to this? Clearly you solved the big unmet need around access to data, and also munging that data, to use your term. But are there other features of Python that made it particularly relevant for this use case?

I think it's because Python is really easy; it was originally created as a teaching language. So it's easy to learn and easy to read. People back in the day would often describe Python as readable pseudocode, similar to the code you would write to describe algorithms. You could hand a piece of Python code to somebody who'd never written Python before, and they would pretty much be able to get the idea of what the code was doing without a lot of type annotations. Of course, now Python has types, so that's changing. But the language had an accessibility and a readability that made it really appealing to do scientific work in. Python also had an existing numerical computing ecosystem. There was a group of folks who were essentially building an alternative to working in MATLAB. So if you were doing neuroscience research or physics research or things like that, you had NumPy and SciPy as a basic computing foundation for numerical algorithms, optimization, linear algebra: the essential things you would need to start doing statistics and data analysis work. Whenever I needed to run a linear regression in Python, I didn't have to wrap linear algebra libraries myself; that work was already done.
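That last point is easy to illustrate: with NumPy already providing the linear algebra, an ordinary least-squares fit is a few lines. A minimal sketch (the data is invented for illustration, not from the episode):

```python
import numpy as np

# Fit y = a*x + b by ordinary least squares using NumPy's built-in
# linear algebra -- no hand-wrapped LAPACK bindings needed.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # exactly y = 2x + 1

A = np.column_stack([x, np.ones_like(x)])  # design matrix [x, 1]
(coef, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(round(float(coef), 6), round(float(intercept), 6))  # 2.0 1.0
```

This is the kind of glue that NumPy and SciPy made unnecessary to write by hand, which is his point about the foundation already being in place.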

I think another thing that really helped tie the room together was the computing and development environment, which started out as the IPython shell and eventually the IPython notebook, which turned into the Jupyter notebook, and now the Jupyter ecosystem and JupyterLab. That was one of the first mainstream open source computing notebooks. People were familiar with Mathematica and other closed source computing notebooks in the past, and this one was inspired by things that had come before, out of Mathematica and MATLAB. But it was open source, and it just worked with Matplotlib, the plotting libraries, and all the things that existed at the time. So essentially, very quickly, by 2013 or so, we had this full stack environment with the bare essentials: an interactive computing environment through the IPython notebook and Jupyter, numerical computing through NumPy and SciPy, plotting through Matplotlib, and data wrangling through pandas. And so if you were an aspiring data scientist, or somebody looking to do business analytics or build a data analysis application in a business setting, at that point you could credibly make the case to your colleagues and your boss that you had the tools at your disposal to build something without getting pulled down into a rabbit hole.

Some people underestimate how important it is to be able to read CSV files. But it turned out that just being able to point pandas at a CSV file was its initial killer app. Like, oh, I have a data file, I can say pd.read_csv and read it. People take that for granted now, but circa 2012, that was a big level up for Python at the time.
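The "killer app" he describes is still a two-liner today. A self-contained sketch (using an in-memory buffer so it runs anywhere; in practice you would pass a file path or URL, and the data here is made up):

```python
import io
import pandas as pd

# Point pandas at CSV data and get a DataFrame back.
csv_data = io.StringIO("name,year\npandas,2008\nArrow,2016\n")
df = pd.read_csv(csv_data)

print(df.shape)          # (2, 2)
print(df["year"].sum())  # 2008 + 2016 = 4024
```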

Apache Arrow and the composable data stack

So, shifting closer to the present: you've also shifted your energy and focus towards Apache Arrow, and I think it would be fair to say that you would frame Arrow as the foundation for the composable data stack. There are tools like DuckDB and others that are based on that. So talk a little bit about what the composable data stack is, for those who are not familiar, and maybe build a little bit on these lessons you talked about: building the bigger snowball, this idea of interoperability and building an ecosystem, and how Apache Arrow fits into that as well.

Yeah, so you can think about the late 2000s and early 2010s as being a little bit of the food, water and shelter era of big data and open source data processing in general. You had disparate communities building solutions to solve the immediate, essential problems that were right in front of them. And so there was relatively little consideration given to building larger, more heterogeneous data stacks full of multiple programming languages, different data analysis tools, processing engines, and storage engines. People were in general building vertically integrated systems, where they would build a solution for a particular problem with a layer of technologies. Classically, you would have a database system with everything tightly integrated together: the storage engine, the data ingestion, query planning and optimization, query execution, as well as the SQL front end. That would all be present within a vertically integrated, full stack system.

But one of the key ideas from the Hadoop and big data era, which was originally kicked off by Google's MapReduce paper, was this idea of disaggregated storage, or storage being decoupled from compute. And so you can start to think about, okay, how can we store data in a way that multiple compute engines can work on it? That gets you thinking about standardizing data formats: having open, open-source standards for data formats. And that was stage one of what was happening in the big data ecosystem. But as time dragged on, there was a collective realization in the mid 2010s that the cost of building one-to-one interfaces, like pairwise interfaces between programming language A and computing system or storage system B, was not only hampering the performance of systems by introducing a lot of serialization and interoperability overhead, but was also fragmenting effort and weighing down progress in the overall open source data ecosystem. So the original idea of Arrow was: what if we could define an open standard for data interoperability for tabular data, in particular column-oriented data similar to what you would find in pandas or in an analytic database, and use that format for moving data efficiently between programming languages, compute engines, storage systems and execution layers? That would not only improve performance, but also reduce the amount of glue code that has to be written by developers to make these systems work together.

And so we initially started with this data interoperability problem of essentially having a better wire protocol for hooking systems together, and that came through the Arrow project. We already had some standardized file formats at the time, like Parquet and ORC, and Arrow was designed to work really well with those as a companion technology. But what we've seen as time has moved on is that we can start to modularize and decouple the other layers of the stack. So we can start to think about modular execution engines, or decoupled front ends: different types of user interfaces for interacting with compute engines, not only SQL but also more data-frame-like interfaces, like what you would see from using pandas. That's the way we described this composable data stack. Another way we looked at it was the deconstructed database. If you think about the architecture of a database system and try to separate all the different layers of a traditional vertically integrated system, you can build open protocols and open standards to connect those pieces with each other and enable interchangeability. Maybe somebody comes along and develops a better storage format, and we can incorporate it; maybe that storage format only works really well for certain types of use cases. But you can choose to use Parquet files for one set of use cases, whereas now there are new file formats specialized for multimodal AI data, including images and video, like the Lance format. And you'd like to be able to take advantage of those new developments in the ecosystem without having to do a full tear-out of your system, basically throwing the baby out with the bathwater, in order to get access to new functionality in one layer of the stack.

What I'm curious about is how this interacts with the more traditional closed source database and analytic software providers. Have you found support from them? Or has this felt like competition? How do you get folks like that on board? Or is that not even a consideration?

Well, I think it's one of those open source model success stories, in the sense that a lot of the adoption, growth and success of the composable data stack (essentially Arrow, DuckDB, DataFusion, and a collection of related open source technologies) has been really driven by open source adoption and grassroots, bottom-up pressure on some of the larger, more powerful forces in the ecosystem. DataFusion, for example, is a modular, customizable columnar query engine, similar to DuckDB. It's written in Rust, but it's really designed for customizability. So rather than being a batteries-included, ready-to-go system, which is more like DuckDB, DataFusion is meant to be customized. It wants you to mess around and modify and add operators to its logical planner layer. It wants you to be able to hack on the optimizer, to introduce new features in its SQL dialect. And the idea of DataFusion is that you could take this off-the-shelf, high-performance, Arrow-native query engine and use it to build your custom database engine or your custom query processing solution.

And as time went on, DataFusion just got more and more popular. It got better and better in terms of performance and extensibility, to the point where Apple decided to go all-in on using DataFusion to build its Spark accelerator layer, called DataFusion Comet. So now there's a team at Apple accelerating Apache Spark with DataFusion. The creator of DataFusion, Andy Grove, works for Apple and leads that team there. But that wasn't necessarily a top-down thing; rather, it was people looking in on the evolution of the data stack and seeing, okay, these technologies are becoming integrated all over the place, they're getting better and better over time, and they're attracting more and more contributions from the open-source ecosystem. It's better to get involved in these projects, hire people to work on them, and influence their direction in a beneficial way, rather than go some totally different way or build something that's completely proprietary.

I think another thing that has driven the adoption of these composable data stack technologies has been the trend towards open data lakes, or what is now called the lakehouse architecture, where you have a structured data lake with a scalable metadata store like Apache Iceberg. You can think of it as an evolution, or a formalization, of some of the ideas from the Hadoop era. Originally, the way datasets and their metadata were managed in Hadoop was that you stored the data in HDFS, and then there was a metastore called the Hive Metastore, basically a MySQL or Postgres database that contained all of the details about what constituted a table. Whenever you were planning a query, you would read data from the Hive Metastore, and that would tell you what files you needed to read to be able to run a particular SQL query with your chosen compute engine. But Hive ran into scalability challenges, especially in very large data lakes. And so that led to the creation of these new open data lake or lakehouse technologies like Iceberg, Hudi, and Delta Lake, which present a more scalable and high-performance approach for some of the really massive data lakes that you find among the biggest companies in the world.

So it's been interesting, but again, it's the success of the open source model. And I think also of cutting-edge research like that at CWI, one of the birthplaces of analytic columnar databases (it's basically CWI, MIT and a handful of academic database research labs). A research group there chose to build DuckDB as an open source project. They could have gone and built another commercial analytic database company, but they chose to build a SQLite-type system for analytics, to be one of the best batteries-included, embeddable SQL engines out there. And now they have a project that is open source and has massive adoption, and you'd be crazy to go and build a brand-new embedded database engine from scratch now. Either you need something customizable and you want to work in Rust, so you use DataFusion, or you want something that's batteries-included and ready to go out of the box, so you choose DuckDB. And that's what we're seeing across the board.

LLMs, tabular data, and the future of data science

So let's shift the conversation a little bit and cast our eye, not totally forward, but maybe to the present, which is this whole idea of how, building on top of this stack, people generate insights and conclusions from this data. There's maybe a working theory that there is no future for data scientists and data engineers, because smart models (being careful not to say LLMs, but maybe LLMs) are going to become so good at understanding and working with data that the notion of asking an analyst to tackle a problem will come to seem silly, and rather we'll just ask our smart agent to figure it out itself. And you have a bit of a contrarian view here.

Yeah, there's a lot to that question, many layers to unpack. Well, first, one of the things I've observed, and maybe one of the dirty secrets of the term data science, and maybe why we're hearing the terms data science and data scientist thrown around less and less these days, is that a lot of the teams that were hiring data scientists and building data science teams in the early and mid 2010s ended up essentially doing business intelligence: doing engineering on data pipelines, doing ETL or reverse ETL or data plumbing, ultimately to create a dashboard or a series of dashboards that could be updated. So it was an evolution of the traditional, old-school BI engineer or ETL engineer building a database to power somebody's Tableau instance.

And so I do think that a lot of that work, the mundane building of dashboards, and allowing business users to ask in natural language for custom, bespoke dashboards that answer the exact question they need, is increasingly going to be done by LLMs, especially as there's a lot of work happening recently on semantic layers. Semantic layers are, I think, frankly necessary in a lot of cases to make LLMs effective at reasoning about the relationships between tables and at generating correct queries against the data. Without a semantic layer, there are lots of examples that have been shown of how an LLM can reason incorrectly about the join relationships between tables and make the kinds of errors in writing SQL queries that a first-year analyst writing SQL would make: double counting and things like that. But I think that ecosystem will become more mature, semantic layers will become more widely deployed and standardized, and more and more of that dashboard building and custom dashboard building work will be taken care of by agents.
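The double-counting error he mentions is easy to reproduce: join a measure to a one-to-many table and then aggregate the original measure after the join. A sketch in pandas (table and column names invented for illustration):

```python
import pandas as pd

# One order with total 100; the order has two line items.
orders = pd.DataFrame({"order_id": [1], "order_total": [100]})
items = pd.DataFrame({"order_id": [1, 1], "sku": ["A", "B"]})

# The one-to-many join fans the order row out to two rows.
joined = orders.merge(items, on="order_id")

# Naive aggregation after the join double-counts the total.
print(joined["order_total"].sum())  # 200, not 100

# Correct: aggregate at the right grain, before (or instead of) joining.
print(orders["order_total"].sum())  # 100
```

This is exactly the grain mistake a semantic layer is meant to prevent, by encoding which tables relate one-to-many and at what level each measure may be summed.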

But I think there is still data science work which involves modeling and asking nuanced or subtle questions: work that requires judgment and intuition, domain expertise, and an understanding of the business context where the questions are being asked, as well as choosing the right techniques to build a statistical model or a machine learning model. Maybe you're trying to determine a causal relationship, or do some type of forecasting. A lot of this type of data science work is part science and part art, relying on experience from past modeling or statistics work. And I do think that a lot of that statistical work and data science still requires a lot of human judgment. Maybe eventually, once we have AGI, it will be taken over by an AGI statistician. We'll see.

But in the short term, I think there still is a need for that. That's one of the areas where the most human judgment is needed, where if you just turn over all of that work to an agentic data scientist, it is likely to run into pitfalls, or to only explore the types of questions or analyses that the LLMs are well suited for. There might be some study you need to run that requires dozens of queries, where the results need to be stitched together and compared in lots of different ways. And LLMs are still at a stage where they often struggle to count. So asking them to reason about, say, 35 queries you ran, stitching the results together and then reasoning about them, is hard. Right now, a lot of that work is being offloaded to tool calling, because LLMs are not great at looking at datasets. But I think we're still at early days in terms of data scientists being put out of a job by AI agents. We'll see where things land in a few years.

So I want to maybe make this super basic. It strikes me that with large language models, the answer is in the name: it's a language model, and language by its nature is sequential. It also has these slightly arcane rules; all languages have a grammar, et cetera. And tabular data is decidedly not language, and it's not really even code. To the extent that we believe LLMs are good at writing code, for example, code is a much closer analogy to language. In fact, it's in a way a simpler version of language, because it has a very tight grammar and a very tight syntax, whereas, as many people learning the English language for the first time will note, there are lots of poorly implemented rules in natural languages. Tabular data, by contrast, has a very specific set of features that LLMs are just not at all competent with. You gave a very simple example of counting, but even simple notions of sorting and querying and filtering, basic behaviors, are things that are just a mystery to them. Am I framing that right?

Yeah, I think so. I haven't run any studies myself, or set up structured evals to get the accuracy data for myself. But I remember early on, I think it was in the Claude Sonnet 3.5 or maybe Claude Sonnet 4 era, I built a little system to collect and summarize data from Git repository history, and it would create little tables, because there was a lot of data to summarize. And then I asked it to analyze the data that was summarized in the tables, and it would struggle to do basic arithmetic in combining really small tables of data together. That was the first time a light bulb went off in my head. I was like, oh, gosh, these models are for language. They struggle to do things like adding or combining datasets unless all the work is delegated to tool calling or writing Python. They'd be better off writing Python code to do the work than actually trying to do the arithmetic or the logic in the language model itself.

But yeah, you're absolutely right. And there's been some research lately around the retrieval problem. The idea of the retrieval problem is that you present a table, let's say a spreadsheet or a table of data: say it's students in a class, with a bunch of columns of attributes about those students, let's say every student has 10 attributes stored in the table. And you ask the model: for this student, can you tell me their attribute C, or their attribute F? So it's essentially looking at the table and straight up looking up a value in it. The frontier models have gotten to a point where they have high accuracy, over 90%, on the retrieval problem. But the smaller models fail catastrophically at it. And I'm blanking on the name of the blog post, but there have been people who have done studies even trying to determine the best data format in which to present a table to an LLM, especially the smaller, more cost-efficient models, to get the best accuracy on the retrieval problem. And the results are a little bit surprising. You would think that CSV would be a good format for an LLM to look at data in. But it turns out that amongst the 10 or 15 different ways you could format a tabular dataset, there are some weird formats; markdown key-values, I think, was one that I saw, which was a format I'd never heard of. I think it was invented for the study. It turned out that presenting the data in this markdown format, where each row is a markdown section, would yield better retrieval than putting the data in XML or JSON in the prompt.
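To make the comparison concrete, here is a sketch (not code from the study he cites, which he doesn't name) of two ways to serialize the same small table into a prompt: plain CSV, and one markdown section per row with key: value lines, the shape he describes:

```python
# Two serializations of the same table for inclusion in an LLM prompt.
rows = [
    {"student": "Ada", "grade": "A", "age": 12},
    {"student": "Ben", "grade": "B", "age": 13},
]

# Format 1: plain CSV, header plus one comma-separated line per row.
csv_text = "student,grade,age\n" + "\n".join(
    f"{r['student']},{r['grade']},{r['age']}" for r in rows
)

# Format 2: each row as its own markdown section of key: value lines,
# the "markdown key-values" shape described in the interview.
md_text = "\n\n".join(
    "## " + r["student"] + "\n"
    + "\n".join(f"{k}: {v}" for k, v in r.items() if k != "student")
    for r in rows
)

print(csv_text.splitlines()[0])  # student,grade,age
print(md_text.splitlines()[0])   # ## Ada
```

The markdown form repeats every column name next to every value, which plausibly helps a model associate attributes with the right row, at the cost of many more tokens.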

So I suspect, and I'm not an AI scientist, that some of it may have to do with the autoregressive, next-token-prediction design of these models. I'm sure they'll get better over time, but these large frontier models are really expensive to run. The hope is that as LLMs advance, the small models can become more and more effective, so that we can run the model at the edge, on our phones. I just got myself one of NVIDIA's Spark AI mini computers to experiment on and to build some of my own fine-tuned models. I'd be really excited if we could do really great work with local LLMs, not requiring $30,000 or $40,000 GPUs. But to really get the performance, you've got to run on these really expensive hardware configurations.

So getting cutting-edge inference is quite expensive these days. It's an interesting problem. There are a number of companies working on foundation models specifically for tabular data, basically an AI approach to prediction, forecasting, regression and things like that, and I think we'll definitely see more interest in that area. Maybe the frontier AI research labs have internal projects they haven't announced, but as some of the hype shifts away from chatbots, more work might shift toward building foundation models for tabular data, because ultimately a lot of the value to unlock for businesses does lie in their data. To get the most value out of AI in a business context, we've somehow got to reconcile this incongruence between current-generation LLMs and tabular datasets. Even MCP, which was developed to provide a standardized interface for LLMs to interact with external systems and tools, is not an especially efficient way to expose data to an LLM. Even if LLMs were really good at looking at datasets, MCP is not the interface you would want to use to provide a hundred-thousand-row table to a model. Just thinking about all the engineering work we've done on Arrow to achieve high-performance interoperability in all these contexts, the AI equivalent, how we expose data to an LLM, looks like caveman tools by comparison.

Well, that's exactly the connection I was hoping you would draw, because we've done 10 or 15 years' worth of work to make data, to use your example, interoperable, and then we turn to this new world and just start from scratch. It doesn't make any sense. I was also going to ask you about another failure point with these models: they aren't deterministic. The one thing we can say confidently about any analytical problem is that if you run the analysis twice, you should get the same answer. It's not okay to say "sort of around two," or for it to be two today and 20 tomorrow. Whereas that's actually a feature, I think, of these language models. Obviously you can adjust the temperature, but I think something people like about them is that they give you different answers every time; otherwise it's just a rules-based system. But I don't know if that resonates with you.
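The temperature knob mentioned above is worth unpacking: at temperature zero, decoding degenerates to always picking the highest-scoring token, which is deterministic; at positive temperature, the model samples from a softened distribution and can answer differently each run. A toy sketch with made-up logits (not a real model):

```python
# Toy illustration of temperature in next-token sampling.
# temperature == 0 -> greedy argmax, same answer every time.
# temperature  > 0 -> sample from softmax(logits / temperature).
import math
import random

def sample_token(logits, temperature, rng):
    if temperature == 0:
        # Greedy decoding: deterministic, the RNG is never consulted.
        return max(logits, key=logits.get)
    # Softmax over temperature-scaled logits, then weighted sampling.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())
    weights = {tok: math.exp(s - m) for tok, s in scaled.items()}
    r = rng.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # numerical edge case: return the last token

logits = {"2": 3.0, "20": 1.0, "about 2": 0.5}
greedy = [sample_token(logits, 0, random.Random(seed)) for seed in range(5)]
print(greedy)  # identical answer across all five "runs"
```

Even at temperature zero, real deployed systems can still vary run to run (batching effects, model updates), which is part of why exact reproducibility for analytics remains hard.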

AI coding agents and the future of software development

It does. Throughout all of last year, I was pretty AI skeptical, let's put it that way. This year, in part because the models have gotten a lot better, and also because of the emergence of CLI coding agents like Claude Code, there was a big unlock for me. I wasn't particularly enthused about using the AI IDEs like Cursor and Windsurf, but within a couple of weeks of using Claude Code to delegate mundane coding work and refactoring, stuff that was taking up my time and didn't seem particularly high value, and seeing really quick returns and productivity benefits, I've become a big believer. Now I use Claude Code almost every day, and I'm definitely costing Anthropic money on my Max plan, given the couple billion tokens I consume every month.

But at the same time, the imperfection, the inconsistency, the non-determinism, is problematic, especially for data work, where you need an exact answer every time you run the system. If you have the model call tools and you're leaving some interpretation of the results to the model, and from the same input you get one answer one day and a different answer the next, that could lead to a business decision being taken that's detrimental to the business. I often find myself playing whack-a-mole with the prompts to get consistent behavior out of the models, particularly in trying to create a consistent development environment where, each time I pick up Claude Code, I can count on the agent to predictably do the same things. But it seems that 20 to 50% of the time, from day to day, without modifying the prompts or the CLAUDE.md or anything at all, I'll open Claude Code the next day and it will forget to do things. I'll say, "Hey, you forgot to do this, CI is failing," and it's like, "You're absolutely right, I ignored the instructions we wrote in CLAUDE.md." The idea that these tools can just casually forget things, even with their massive context windows... they have the memory of a goldfish.

I'm sure it will get better. And again, I use these tools every day, and they bring a lot of value for me. But I'm also not quite drinking the Kool-Aid and believing this is the next great step for humanity that's going to lead to a world without work.

Yeah. And maybe to say it in a really simple way, I think you would agree that it's made you much more productive, but it's not made you obsolete.

No, if anything, my experience building software feels essential when I'm using these tools, because I'm able to read the code and review it, as though I were reviewing the work of a junior developer, and tell it all of the things it messed up: architectural problems, incorrect or missing unit tests, incorrect documentation, incorrect implementations. I have a lot of experience reviewing other people's code, and I feel like that is one of my biggest assets when working with these coding agents. I tell myself, when I'm working with Claude Code, to treat all the work coming out of the agent like that of a very motivated, very productive junior developer who's prone to errors and, frankly, to creating messes. Being able to spot, at a glance, design and architectural problems, things that need to be refactored, code duplication, code smells: all of that feels essential to getting the most value out of these tools.

And I've heard from talking to other people that the AI users who are able to get the most value out of the coding agents are the most experienced developers, who can bring their experience and judgment not only to write better prompts, being very specific and articulate about what they're asking for, but also to judge the output and give high-quality feedback, to corral things in the right direction and get what they want. But I do think we're likely to have a vibe-coding epidemic, a kind of amplified Dunning-Kruger effect of people building software without reading the output carefully: AI slop, no code review, basically letting Codex or Claude Code do its thing and then slapping up a pull request without giving it a second thought. And there are likely to be substantial business losses because developers are deploying vibe-coded software into production without sufficient code review.

Of course, you can protect yourself against some of this by taking a test-driven development approach and asking the agent to build a test suite, which, of course, you have to review before you set it to implementation work. In the past I was never really a hardcore TDD adherent, but now, using coding agents, I've become much more so, because each time I sit down with Claude Code I treat it as a defensive exercise: how do I protect myself from the agent doing incomplete work, or insisting that it's solved the problem in the way I asked for when it's deceived itself into believing it's finished? So test coverage, automated checks, benchmarking suites, all the defensive things you would already need to do to create a piece of production software, it's even more important to do them with these agents. Otherwise, any software they create becomes a huge liability.
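The defensive, test-first loop described above can be sketched in miniature: the human writes and reviews the spec as assertions before any implementation exists, then delegates the implementation and re-runs the checks. The `categorize` function below is hypothetical, invented here to illustrate the workflow (loosely in the spirit of a personal-finance tool), not code from any real project.

```python
# Sketch of the test-first loop: the human writes (and reviews) the spec
# as assertions *before* asking the agent to implement. `categorize` is a
# hypothetical transaction-labeling function used only for illustration.

def categorize(description: str) -> str:
    """The implementation slot: in the workflow above, this is the part
    delegated to the coding agent, then reviewed against the spec below."""
    desc = description.lower()
    if "grocery" in desc or "market" in desc:
        return "groceries"
    if "airline" in desc or "hotel" in desc:
        return "travel"
    return "uncategorized"

# The spec, written first and reviewed by a human. When the agent claims
# the work is done, these checks say whether that's actually true.
assert categorize("Whole Foods Market") == "groceries"
assert categorize("United Airlines") == "travel"
assert categorize("Mystery Vendor") == "uncategorized"
print("spec passed")
```

The point of the design is that the assertions, not the agent's self-report, are the acceptance criteria: an agent that "insists it's finished" still has to get past checks the human authored.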

Yeah, I think that's a wonderful insight, actually: these practices become more important. And I also want to loop it back to something you said before, which is that the future is a smaller number of really experienced, thoughtful architect-engineer types marshaling an army of these agents. Then, to your point, layering in these defensive coding practices, maybe that model starts to feel closer to the future. I'm sort of waiting for the first spaceship crashing into Mars due to a vibe-coding error, as opposed to a unit-conversion problem.

Right, right. Yeah, it's interesting. I think one of the existential problems is how junior developers will become senior developers. The old working model was that you became a senior developer by doing junior work over a long period of time, getting good not only at writing code but at reviewing code and seeing what good code looks like. As your career progresses, the mix shifts: maybe as a junior developer you're doing 90% writing code and 10% code review, while a senior developer is doing 10% development and 90% code review. Now, delegating more of the coding work to these agents means that more and more of our work is shifting to code review, and I think code review is still going to be a bottleneck. Even with a senior developer commanding an army of AI agents, who's going to review all of that code? You can have the agents review their own work; I have friends who've told me about having Claude Code implement and Codex review, or vice versa, basically using the agents to review each other's slop and make it better.

But yeah, I do think on the whole we're going to have fewer software engineers, and really experienced engineers especially will spend more of their time reviewing the output of agents. Still, there's a human bottleneck of being able to do code review, assess the work, and determine whether it should be accepted or whether, in many cases, these AI-generated pull requests and patches should just be thrown out altogether so you can start over. One of the nice things is that the cost of starting over is so much smaller. Sometimes it's basically the Panama Canal approach versus going around the bottom of South America: with an AI agent, you can generate an impressive amount of code, thousands and thousands of lines in an afternoon, to hack your way around a problem, when maybe there's a simpler, more elegant solution the agent just didn't see, one that wasn't obvious given all the data in its training set.

So if everyone is always using the agents and creating solutions that are sometimes circuitous, or that miss the more elegant, more maintainable, more sustainable approach, that's going to lead to projects with 100,000 lines of code that maybe should only be 15,000 or 20,000. Written by a human, it might be a much smaller codebase that's easier to maintain and more robust over time. Whereas with a large codebase, you start to reach a limit where even having the agent look at the codebase becomes unwieldy; with files that are thousands of lines long, it starts to choke on the input pretty quickly. I recently created a little personal finance tool with Claude Code called Money Flow, for interacting with Monarch Money, the personal finance tool I use. It's a project I created from scratch with Claude Code, so it's 95 to 99% created by Claude Code, with a lot of feedback and code review from me. Including the test suite and all of the infrastructure, it's pushing 40,000 lines of code. If I'd written it by hand, it probably would be a lot smaller, in part because I couldn't have spent anywhere near the amount of time it would take to write a codebase that large; I would have cut more corners or made simplifying assumptions to accomplish the same things with a lot less code.

So yeah, it's interesting. I'm learning day to day, and I'm no expert by any means, but I feel like the more I use these tools, the better I understand them and the more I get out of them.

Well, Wes, I really appreciate you taking the time. This was a bit of a journey, from where you started through to the present day, and also a way of thinking about a future where we're marshaling agents on our behalf. I really appreciate you sharing your thoughts and insights.

Yeah, thanks again for having me on.

Thanks for joining us for the latest episode of the Data Masters podcast. You'll find links in the show notes to any resources mentioned on today's show. And if you're enjoying our podcast, please subscribe so you never miss an episode.