Resources

AI missteps as stepping stones (Ryan Timpe, The LEGO Group) | posit::conf(2025)

AI missteps as stepping stones: Opportunities gained when your LLM coding assistant gets it wrong

Speaker: Ryan Timpe

Abstract: LLM coding assistants have become a valuable companion for learning and productivity in data science. (Hey Siri! Import this csv!) While their ability to generate code and explanations is impressive, I have found more value and personal growth from the mistakes they make. This talk focuses on embracing coding assistants as imperfect companions and succeeding when they fail. I'll share insights from using these assistants to facilitate my transition to Python, highlighting the pitfalls of accepting their recommendations without question. Through real examples where LLMs fell short, I'll demonstrate how these challenges provided frameworks for problem-solving and led to a deeper understanding of data science tools and methodologies.

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/


Transcript

This transcript was generated automatically and may contain errors.

Awesome. Hi, I'm Ryan Timpe. I'm a data scientist at The LEGO Group. It's a really cool job: I get to combine my passion for data science with a really awesome brand. But I have to say, today I'm representing myself, not them. So, got that out of the way. I'm here to talk about data science in 2025, because being a data scientist this year is a lot different than being one in 2020, or even maybe last year. One of those big differences is that I have AI and large language models to help me be more productive. I know we're on day two of posit::conf, it's the afternoon, you've probably heard so much about LLMs, and it's not stopping here. But bear with me, because this talk is a little bit different. It complements the ones we've been hearing, because I'm talking about everything LLMs have gotten wrong for me and how I've dealt with that.

So if you use a coding assistant at all, or a chat interface, you might recognize this window. You can ask the AI some questions, and it responds with grandiose, yes-man statements about how super helpful it can be for you. How many of you have looked at the fine print, though? "Gemini can make mistakes, so double-check it." Or this one: "ChatGPT can make mistakes. Check important info." Two really short sentences. The first one admits that the tool is flawed, and the second one encourages human collaboration. I've come across so many mistakes using AI in the past year, but this talk is all about embracing those mistakes, dealing with them, overcoming them, and why that human collaboration is really important.

And I've used AI professionally for a few things now, like learning new coding languages, trying out new methodologies, and replicating previous work I've done. Each one of these has come with mistakes, and I've learned a lot coming out of them.

Learning Python with AI assistance

So my first professional real engagement with LLMs was learning Python. My tools and platform at work have changed, and I'm being forced to be as productive in Pandas and PySpark as I have been for years in R and SQL. And AI has been great here. I know the data science I need to do, and the AI can do the new language for me.

And when I first started using AI and coding assistants to help me learn Python, some things worked very, very well. Very discrete, individual prompts were great. I could say, import this data. Yes, it knows how to do that. Transform this data, or group it by and summarize. Very easy tasks going from point A to point B; the AI was great at giving me code for that. Really big, greedy tasks worked as well. So: build a mixed effects model with this data and these three columns. That worked great. I could even say, build me a Shiny app, and it had code to do that. It mostly worked great.
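A "group it by and summarize" prompt, for instance, usually comes back as a couple of lines of Pandas. Here is a minimal sketch of that kind of discrete A-to-B answer, with invented column names for illustration:

```python
import pandas as pd

# Hypothetical data; column names are invented for illustration
df = pd.DataFrame({
    "region": ["EU", "EU", "NA", "NA"],
    "units": [10, 20, 30, 40],
})

# "Group it by and summarize": the Pandas analogue of
# dplyr's group_by() |> summarize()
summary = df.groupby("region", as_index=False).agg(
    total_units=("units", "sum"),
    mean_units=("units", "mean"),
)
print(summary)
```

Each of these small steps has one input frame and one output frame, which is exactly why they chain together so naturally.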

The way I work is I take really small tasks and tie them together. I import my data, I review it, I summarize it, I transpose it, and I put it into models. I go A to B, B to C, C to D, and so on. And when I was linking together the output of those discrete prompts from the AI and putting that code into different chunks of my notebook, I was getting failures all over the place. Specifically, I was seeing that my LLM was proposing very different approaches to similar prompts. New functions popped up all the time. I was seeing syntaxes I didn't understand. And I was getting type errors; my notebook seemed to be forgetting what data frames were. I was very confused. The exact problem was that my LLM was swapping between PySpark and Pandas, and I had skipped all the basic Python, so I didn't know the difference. It's fine.
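That "forgetting what data frames were" failure has a simple first debugging step: check which library's DataFrame you actually have. A minimal sketch, using a plain Pandas frame (the PySpark side is described in comments only, since it depends on a Spark session):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Which library's DataFrame is this? type() tells you immediately.
print(type(df))  # <class 'pandas.core.frame.DataFrame'>

# A Pandas DataFrame supports Pandas-only accessors like .iloc:
first_row_x = df.iloc[0]["x"]

# In a PySpark notebook, spark.read.parquet(...) returns a
# pyspark.sql.DataFrame instead, which has no .iloc, so pasting
# Pandas code into it raises AttributeError. Converting with
# spark_df.toPandas(), or being explicit about the target library
# in the prompt, resolves the mismatch.
```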

So the LLM and I were working under very different assumptions. My environment was very much PySpark dependent, and I hadn't known that. But since I had been a Tidyverse person for so long, all I had really heard about was Pandas. So in my head, I was sometimes trying to do Pandas work, and that didn't work out. We had a very big miscommunication, the LLM and I.

And I also had very imprecise prompts. I was letting the AI make all the decisions for me, because I wasn't telling it what to do. This is all fixable, though. So after I realized my issue, I could revisit my prompts and write better prompts and really make that point A to B, B to C process work a lot better. So now, of course, when I'm working with prompts, I'm very clear about what tools I'm using, what libraries to use. I'm very clear about my input data. If it's in the notebook, I can reference old chunks. If I'm in a chat interface, I can describe my data very well before I just ask the AI to give me solutions. And same with the output. I'm very, very clear about what output I want out of the AI. It's a little bit more effort, but it saves me so much more time and pain by just being more precise with my prompts.

The danger of over-reliance on AI

So at this point, I've been producing a lot of new data science projects in Python for a few months, and I had overcome this hurdle. And I think, yeah, I must be a Python data scientist by now, because I'm producing Python. It's great. If a recruiter or my boss had asked me, I'd have said, yeah, I know Python now, because I was delivering in Python, so that counts. And then one day I go to test myself. I open up a notebook, I import my libraries, and then... nothing. My mind went blank. Turns out I did not know Python.

I was way too reliant on the AI. Nothing was retaining. It was so bad that I was still asking the AI to import my data and read my parquet file rather than just typing pd.read_parquet myself. I was so lazy. So one, this was embarrassing and not good for me, and I wasn't happy with myself, but it was also really inefficient. I was spending so much time writing prompts and then manually editing their output that it would have been faster to actually know Python and write the code. This was another big wake-up moment for me. I had to be a lot more engaged with my code. I had been doing passive reviews (okay, I see the words, everything makes sense), but now I'm a lot more active in my review.

I study the code that comes out of it. I make sure I know what the functions are, and it's nice, because if I don't understand a function or what it's doing, I can ask the AI coding assistant, why did you do it this way? It's usually really good at answering that. So this was another big turning point for me. It slowed me down a little, but I also feel a lot more comfortable with my work, because I actually understand what I'm producing now.

I was way too reliant on the AI. Nothing was retaining.

Using AI to tackle new methodologies

But it also turns out that I'm a paid data scientist, so I can't just be learning all the time. I'm expected to produce once in a while. But AI has been great here as well.

So I've been at my company for quite a few years now, and so I'm working with the same stakeholders year after year, quarter after quarter. They're getting very data savvy, because they're getting used to seeing data science output, and so they're asking more exciting questions and more interesting questions. Sometimes I don't have the tools to answer those questions, and AI can do that. So for one example, I had to come up with a new type of demand model with some very specific constraints for price and substitution effect. I didn't know how to do it, but then online, I found an academic paper that showed me exactly how to do that.

The problem is that academic papers are very long and very dense. I'm a business person; I don't read those. But it's fine. I turned to my AI and said, hey, here's this 20-page paper. You read it, and give me the Python code to run this model. And it said sure, and it gave me trash.

No surprise there. It gave me some code, but I didn't understand it. I didn't know the inputs. I didn't know what it was trying to do. It was basically a nonstarter; I could not work this way. But I needed these models, because I was under a deadline, and I couldn't give up, and I knew AI could help.

So here, I started over. I took the paper, but this time I actually tried to read it, and I pulled out the snippets that I thought were important. Then I added a few things. When I went to the LLM, I gave it some business context: this is how I need to use this paper, these are the questions I'm getting, this is the methodology I need to use. I gave it pseudocode; the paper had some MATLAB pseudocode in it, and I didn't know what to do with that, but the AI did. I gave it sample data, exactly the input data I would be working with and what I expected to put into the models, and also a description of the output I needed. Then I fed all of that to my LLM, and I actually got something: a few hundred lines of very well-documented code. It still did not work on the first try, but I could see how it approached the problem, and it was way more informed than I could have been. I could read the code and follow roughly what it was trying to do, and now I could read it side by side with the original paper. That really helped me understand the paper a lot better, and by understanding the paper better, I was able to go back to the not-working Python code, find the bugs, fix them, and actually get those models running after a few hours. Without AI, this would have taken me quite a few days.
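The ingredients of that second, successful prompt can be sketched as a template. Everything below is hypothetical; the angle-bracket placeholders stand in for material pulled from the actual paper and data:

```python
# A hypothetical skeleton of the "paper-to-code" prompt: business
# context + key snippets + pseudocode + sample data + expected output.
prompt = """
Context: I need a demand model with specific constraints on price and
substitution effects, to answer stakeholder questions about pricing.

Methodology (excerpted from the paper): <key equations and snippets>

Pseudocode from the paper (MATLAB): <pseudocode>

Sample input data (CSV):
product,price,units
A,9.99,120
B,12.49,80

Expected output: a fitted model plus a per-product elasticity table.
"""
print(prompt)
```

The point is that every section closes a gap the LLM would otherwise fill with a guess.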

And this was another big changing point of how I use my LLMs. So working with an LLM now is a collaboration. I remain the expert wherever possible. I know my data better than the AI. I know my business better than the AI, at least right now. I know my stakeholders better. So I need to keep that expertise every time I'm asking an LLM to do work, and then where there's knowledge gaps, I then rely on the LLM to fill in the gaps. So here it was the exact methodology, and it was the Python code.

I remain the expert wherever possible. I know my data better than the AI. I know my business better than the AI, at least right now.

Testing the LLM: the Lego mosaic challenge

And basically now, with this kind of help, I'm producing a lot of things in Python, and I'm mostly understanding it, and I'm putting a lot of work out into the business, way more than maybe a year ago. I'm smiling and delivering it to the business. I'm super happy, but in the back of my mind, I'm panicking: is this any good? I'm producing a lot, I'm hoping it's giving value, and people are making decisions based off of it, but I'm also wondering, can I actually trust it?

And so this is a very hard question to answer, and it probably depends on every single business case or use case. But I'm a data scientist. I really like asking these questions and doing training sets and testing sets, so I decided to put my LLM through a test. A few years ago in R, I made a package that takes images and turns them into Lego mosaics, something you could build out of Lego bricks. It took me a few weeks to do. I learned a lot, and it was really fun. I was wondering, could the LLM reproduce that? How fast could I get an LLM to produce the same thing?

So I gave it this prompt: write me a function that takes an image and renders it as a mosaic made of Lego bricks. To do that, I'm going to use this image of my dog. This is Hunter; he might look familiar from the cartoons I've been showing. So I took the Python script the LLM gave me, I gave it my image file, and I got... well, nothing. It didn't run the first try, but that's fine, because I'm really good at debugging LLM output at this point. After a few minutes, I found some bad transpose statements, and I got this.

It's not great. It's not what I asked for. I can see what it's trying to do, though, and probably from the back it looks like my dog, but there are some issues. The colors aren't real: it chose the colors based off the image, but these are not Lego colors. I can't go to the store, buy these bricks, and build this, so that's a nonstarter for me. The output size was very arbitrary; the LLM chose it based off the JPEG file size, which doesn't make much sense to a Lego builder. It's all one-by-one bricks, which isn't wrong, but also not fun for me to build. And the code was very inflexible. Every single time I edited the prompt to add a new feature, it had to rewrite every single thing it had given me, because it couldn't add the new feature to the established code, and that lost all the edits I had made manually. That was just very frustrating.
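The fix for the color problem is conceptually simple: snap every pixel to the nearest color in a fixed palette of buyable bricks. A minimal sketch of that one step, with a tiny hypothetical palette (the real package and the LLM's script do far more, such as brick sizing and rendering):

```python
import numpy as np

# Hypothetical palette of buyable brick colors (RGB); the real
# list of available Lego colors is much longer.
PALETTE = np.array([
    [255, 255, 255],  # white
    [0, 0, 0],        # black
    [200, 0, 0],      # red
    [240, 200, 0],    # yellow
])

def to_brick_colors(image, palette=PALETTE):
    """Snap every pixel of an (H, W, 3) RGB array to its nearest
    palette color, so the mosaic only uses colors you can buy."""
    pixels = image.reshape(-1, 1, 3).astype(float)
    # Squared Euclidean distance from each pixel to each palette color
    dists = ((pixels - palette[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    return palette[nearest].reshape(image.shape)

# A 1x2 "image": a near-white and a near-red pixel
img = np.array([[[250, 250, 245], [190, 10, 5]]], dtype=np.uint8)
mosaic = to_brick_colors(img)
```

In a full pipeline this would run after downscaling the photo to the brick grid, so the output size is a deliberate choice rather than an artifact of the file.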

But after a few prompt changes, as I kept nudging it in my direction, I got this. From the back, I think it looks like my dog. It's all Lego bricks, in different sizes, in colors I can buy. Here, the LLM did what I asked, and if I compare it to my R code from a few years ago, it's basically the same thing: a few style changes, some color algorithm differences, but I got the LLM to produce what I wanted. And that's really cool, because I got it in a matter of minutes, whereas doing it manually took me a few weeks of free time.

So in this one case, I can trust the LLM, but not because it gave me what I wanted on the first try. It's because I was working with the LLM along the way, nudging it in the direction I wanted to go, and that's how I gained that trust. This is a silly example of my dog and Lego bricks, but it also mirrors a process I follow a lot at work. I have a very tried and trusted, huge code base of models in R that I'm slowly porting over to Python. That doesn't work if I just copy and paste the whole repo and say, hey, translate this. I still have to mimic how I developed it. I do small chunks, and I'm testing those chunks along the way, making sure they produce exactly what I expect. Then I check against the truth. I know the truth; I've been with my business a long time, and I want to make sure I can trust the result. And when I'm happy with that, I move on.
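That "small chunk, then check against the truth" loop can be sketched in a few lines. The data and the expected answer here are invented stand-ins for a result the trusted R code has already produced:

```python
import pandas as pd

# One translated chunk: the Python version of an R step like
# df |> group_by(theme) |> summarize(n = n())
df = pd.DataFrame({"theme": ["castle", "castle", "space"]})
counts = df.groupby("theme").size().to_dict()

# Check against the truth: the trusted R pipeline already told us
# the answer, so the port must reproduce it exactly before moving on.
expected = {"castle": 2, "space": 1}
assert counts == expected
```

Only once a chunk matches the known-good output does the next chunk get translated.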

A checklist for working with LLMs

So from all these little mistakes I've come across in the past few years, I now have a pretty solid checklist of things I go through when I'm working with an LLM, and it has made me a lot more productive. So, just to rehash: I write clear, precise prompts. I have to make sure that the LLM and I are working under the same assumptions, and I'm the one making those assumptions.

I work in really small increments, doing small things at a time and trusting the increments along the way. I treat the LLM as a collaborator, not a replacement; I'm an expert in a lot of things, and I have to keep that. I stay engaged. I'm the one responsible for the code at the end of the day. When my business stakeholders are using it, I can't blame the LLM for being wrong; it comes back to me. I want to make sure I can vouch for every single line of that code. And then I continuously validate the output against what I know to be true.

So, going back to my initial statement about how data science is different this year: yeah, I'm a lot more productive. I'm doing things I never really imagined, and that's kind of cool after being a data scientist for maybe ten years (it's hard to define exactly when I started). I've had a supercharged year where I'm doubling my productivity, and that's because of LLMs, but it's also because I know that LLMs are imperfect. They make a lot of mistakes, but I can now anticipate those mistakes. So with that, I thank you, and we definitely have some time for questions. There are my socials; it's just Ryan Timpe everywhere, if you're interested. Thank you.

Q&A

Thank you so much, Ryan. We do have a few questions. First one: do you think data science will eventually change in essence, from writing code from scratch to knowing how to use AI to write code that solves your data science problem? It continues: do you think this paradigm shift will happen, since a good data scientist is good at solving problems rather than writing code in specific languages?

Anything could happen. I mean, I wouldn't like my job as much if that were the case, and I think there are a lot of new frontiers that a human is still there to push, but, again, I didn't know a year or two ago that we'd be in this spot, so I can't answer that.

Right, 100%. So we do have a few questions about which LLMs you like to use, and whether you ask them to collaborate, such as giving a prompt to one LLM and asking another to review it, in terms of the different providers.

Yeah, it really is context dependent. I use Databricks a lot at work, and I like their inline assistant, but if I'm doing something from scratch, I like going to Claude. Depending on whether it's a professional or a personal project, I have to change what tools I'm using. I'm not a snob or elitist about any tools right now; they're all pretty amazing to me. But I have found that, depending on my situation, like with debugging, I go back and forth between Claude and the inline assistant. I like keeping my mind open, and since every model is being updated all the time, I don't want to form solid habits yet.

Right. Definitely. You mentioned a situation where you as a data scientist needed to translate some R code into Python. Can you give an idea of when a situation would require something like that?

When my corporation tells me to.

Totally. Okay. Very important question. I'm representing myself today.

All right. Very important question: is the build-your-dog-out-of-Lego script available publicly, so people can build their own dogs?

It's all R code, but also, as you just saw, you could spend five minutes with an LLM and have it do that as well. I have an R package that's on my repo. I want to update it with some new features, but since I work with Lego bricks all day, I don't want to spend my free time doing it anymore. Again, though, you can do it now. This is the LLM version, not my version, so just go to your favorite LLM and nudge the prompt until it gives you what you want.

Awesome. Is the Lego version of you from R or Python?

No, this is Python. It makes me cringe, but I'm doing it now.

What's your favorite Lego creation?

I'm a big fan of the classic castles from my childhood. We had the big vacuum-molded base plates, and I would just always destroy it and rebuild it over and over, so I have very fond memories of those.

Awesome. Thank you so much.

Thank you so much.