Resources

AI missteps as stepping stones (Ryan Timpe, The LEGO Group) | posit::conf(2025)

AI missteps as stepping stones: Opportunities gained when your LLM coding assistant gets it wrong

Speaker: Ryan Timpe

Abstract: LLM coding assistants have become a valuable companion for learning and productivity in data science. (Hey Siri! Import this csv!) While their ability to generate code and explanations is impressive, I have found more value and personal growth from the mistakes they make. This talk focuses on embracing coding assistants as imperfect companions and succeeding when they fail. I'll share insights from using these assistants to facilitate my transition to Python, highlighting the pitfalls of accepting their recommendations without question. Through real examples where LLMs fell short, I'll demonstrate how these challenges provided frameworks for problem-solving and led to a deeper understanding of data science tools and methodologies.

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/


Transcript

This transcript was generated automatically and may contain errors.

Awesome. Hi, I'm Ryan Timpe. I'm a data scientist at The LEGO Group. It's a really cool job: I get to combine my passion for data science with a really awesome brand. But I have to say, today I'm representing myself, not them. So, got that out of the way. I'm here to talk about data science in 2025, because being a data scientist this year is a lot different than being one in 2020, or even maybe last year. One of those big differences is that I have AI and large language models to help me be more productive. I know we're on day two of posit::conf, it's the afternoon, you've probably heard so much about LLMs, and it's not stopping here. But bear with me, because this talk is a little bit different. It complements the ones we've been hearing, because I'm talking about everything LLMs have gotten wrong for me and how I've dealt with that.

So if you use a coding assistant at all, or a chat interface, you might recognize this window. You can ask the AI some questions, and it responds with grandiose, yes-man statements about how super helpful it can be for you. How many of you have looked at the fine print, though? "Gemini can make mistakes, so double-check it." Or this one: "ChatGPT can make mistakes. Check important info." Two really short sentences. The first one admits that the tool is flawed, and the second one encourages human collaboration. I've come across so many mistakes using AI in the past year, but this talk is all about embracing those mistakes, dealing with them, overcoming them, and why that human collaboration is really important.

And I've used AI professionally for a few things now, like learning new coding languages, trying out new methodologies, and replicating previous work I've done. Each one of these has come with mistakes, and I've learned a lot coming out of them.

Learning Python with AI assistance

So my first professional real engagement with LLMs was learning Python. My tools and platform at work have changed, and I'm being forced to be as productive in Pandas and PySpark as I have been for years in R and SQL. And AI has been great here. I know the data science I need to do, and the AI can do the new language for me.

And when I first started using AI and coding assistants to help me learn Python, some things worked very, very well. Very discrete, individual prompts were great. I could say, import this data. Yes, it knows how to do that. Transform this data, or group it by and summarize. Very easy tasks going from point A to point B; the AI was great at giving me code for that. Really big, greedy tasks worked as well. So: build a mixed effects model with this data and these three columns. That worked great. I could even say, build me a Shiny app, and it had code to do that. It mostly worked great.
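A "group it by and summarize" prompt, for instance, usually comes back as a couple of lines of Pandas. Here is a minimal sketch of that kind of discrete A-to-B answer, with invented column names for illustration:

```python
import pandas as pd

# Hypothetical data; column names are invented for illustration
df = pd.DataFrame({
    "region": ["EU", "EU", "NA", "NA"],
    "units": [10, 20, 30, 40],
})

# "Group it by and summarize": the Pandas analogue of
# dplyr's group_by() |> summarize()
summary = df.groupby("region", as_index=False).agg(
    total_units=("units", "sum"),
    mean_units=("units", "mean"),
)
print(summary)
```

Each of these small steps has one input frame and one output frame, which is exactly why they chain together so naturally.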

The way I work is I take really small tasks and tie them together. I import my data, I review it, I summarize it, I transpose it, and I put it into models. I go A to B, B to C, C to D, and so on. And when I was linking together the output of those discrete prompts from the AI and putting that code into different chunks of my notebook, I was getting failures all over the place. Specifically, I was seeing that my LLM was proposing very different approaches to similar prompts. New functions popped up all the time. I was seeing syntaxes I didn't understand. And I was getting type errors; my notebook seemed to be forgetting what data frames were. I was very confused. The exact problem was that my LLM was swapping between PySpark and Pandas, and I had skipped all the basic Python, so I didn't know the difference. It's fine.
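That "forgetting what data frames were" failure has a simple first debugging step: check which library's DataFrame you actually have. A minimal sketch, using a plain Pandas frame (the PySpark side is described in comments only, since it depends on a Spark session):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Which library's DataFrame is this? type() tells you immediately.
print(type(df))  # <class 'pandas.core.frame.DataFrame'>

# A Pandas DataFrame supports Pandas-only accessors like .iloc:
first_row_x = df.iloc[0]["x"]

# In a PySpark notebook, spark.read.parquet(...) returns a
# pyspark.sql.DataFrame instead, which has no .iloc, so pasting
# Pandas code into it raises AttributeError. Converting with
# spark_df.toPandas(), or being explicit about the target library
# in the prompt, resolves the mismatch.
```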

So the LLM and I were working under very different assumptions. My environment was very much PySpark dependent, and I hadn't known that. But since I had been a Tidyverse person for so long, all I had really heard about was Pandas. So in my head, I was sometimes trying to do Pandas work, and that didn't work out. We had a very big miscommunication, the LLM and I.

And I also had very imprecise prompts. I was letting the AI make all the decisions for me, because I wasn't telling it what to do. This is all fixable, though. So after I realized my issue, I could revisit my prompts and write better prompts and really make that point A to B, B to C process work a lot better. So now, of course, when I'm working with prompts, I'm very clear about what tools I'm using, what libraries to use. I'm very clear about my input data. If it's in the notebook, I can reference old chunks. If I'm in a chat interface, I can describe my data very well before I just ask the AI to give me solutions. And same with the output. I'm very, very clear about what output I want out of the AI. It's a little bit more effort, but it saves me so much more time and pain by just being more precise with my prompts.

The danger of over-reliance on AI

So at this point, I've been producing a lot of new data science projects in Python for a few months, and I had overcome this hurdle. And I think, yeah, I must be a Python data scientist by now, because I'm producing Python. It's great. If a recruiter or my boss had asked me, I'd have said, yeah, I know Python now, because I was delivering in Python, so that counts. And then one day I go to test myself. I open up a notebook, I import my libraries, and then... nothing. My mind went blank. Turns out I did not know Python.

I was way too reliant on the AI. Nothing was retaining. It was so bad that I was still asking the AI to import my data and read my parquet file rather than just typing pd.read_parquet myself. I was so lazy. So one, this was embarrassing and not good for me, and I wasn't happy with myself, but it was also really inefficient. I was spending so much time writing prompts and then manually editing their output that it would have been faster to actually know Python and write the code. This was another big wake-up moment for me. I had to be a lot more engaged with my code. I had been doing passive reviews (okay, I see the words, everything makes sense), but now I'm a lot more active in my review.

I study the code that comes out of it. I make sure I know what the functions are, and it's nice, because if I don't understand a function or what it's doing, I can ask the AI coding assistant, why did you do it this way? It's usually really good at answering that. So this was another big turning point for me. It slowed me down a little, but I also feel a lot more comfortable with my work, because I actually understand what I'm producing now.

I was way too reliant on the AI. Nothing was retaining.

Using AI to tackle new methodologies

But it also turns out that I'm a paid data scientist, so I can't just be learning all the time. I'm expected to produce once in a while. But AI has been great here as well.

So I've been at my company for quite a few years now, and so I'm working with the same stakeholders year after year, quarter after quarter. They're getting very data savvy, because they're getting used to seeing data science output, and so they're asking more exciting questions and more interesting questions. Sometimes I don't have the tools to answer those questions, and AI can do that. So for one example, I had to come up with a new type of demand model with some very specific constraints for price and substitution effect. I didn't know how to do it, but then online, I found an academic paper that showed me exactly how to do that.

The problem is that academic papers are very long and very dense. I'm a business person; I don't read those. But it's fine. I turned to my AI and said, hey, here's this 20-page paper. You read it, and give me the Python code to run this model. And it said sure, and it gave me trash.

No surprise there. It gave me some code, but I didn't understand it. I didn't know the inputs. I didn't know what it was trying to do. It was basically a nonstarter; I could not work this way. But I needed these models, because I was under a deadline, and I couldn't give up, and I knew AI could help.

So here, I started over. I took the paper, but this time I actually tried to read it, and I pulled out the snippets that I thought were important. Then I added a few things. When I went to the LLM, I gave it some business context: this is how I need to use this paper, these are the questions I'm getting, this is the methodology I need to use. I gave it pseudocode; the paper had some MATLAB pseudocode in it, and I didn't know what to do with that, but the AI did. I gave it sample data, exactly the input data I would be working with and what I expected to put into the models, and also a description of the output I needed. Then I fed all of that to my LLM, and I actually got something: a few hundred lines of very well-documented code. It still did not work on the first try, but I could see how it approached the problem, and it was way more informed than I could have been. I could read the code and follow roughly what it was trying to do, and now I could read it side by side with the original paper. That really helped me understand the paper a lot better, and by understanding the paper better, I was able to go back to the not-working Python code, find the bugs, fix them, and actually get those models running after a few hours. Without AI, this would have taken me quite a few days.
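The ingredients of that second, successful prompt can be sketched as a template. Everything below is hypothetical; the angle-bracket placeholders stand in for material pulled from the actual paper and data:

```python
# A hypothetical skeleton of the "paper-to-code" prompt: business
# context + key snippets + pseudocode + sample data + expected output.
prompt = """
Context: I need a demand model with specific constraints on price and
substitution effects, to answer stakeholder questions about pricing.

Methodology (excerpted from the paper): <key equations and snippets>

Pseudocode from the paper (MATLAB): <pseudocode>

Sample input data (CSV):
product,price,units
A,9.99,120
B,12.49,80

Expected output: a fitted model plus a per-product elasticity table.
"""
print(prompt)
```

The point is that every section closes a gap the LLM would otherwise fill with a guess.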

And this was another big changing point of how I use my LLMs. So working with an LLM now is a collaboration. I remain the expert wherever possible. I know my data better than the AI. I know my business better than the AI, at least right now. I know my stakeholders better. So I need to keep that expertise every time I'm asking an LLM to do work, and then where there's knowledge gaps, I then rely on the LLM to fill in the gaps. So here it was the exact methodology, and it was the Python code.

I remain the expert wherever possible. I know my data better than the AI. I know my business better than the AI, at least right now.

Testing the LLM: the Lego mosaic challenge

And basically now, with this kind of help, I'm producing a lot of things in Python, and I'm mostly understanding it, and I'm putting a lot of work out into the business, way more than maybe a year ago. I'm smiling and delivering it to the business. I'm super happy, but in the back of my mind, I'm panicking: is this any good? I'm producing a lot, I'm hoping it's giving value, and people are making decisions based off of it, but I'm also wondering, can I actually trust it?

And so this is a very hard question to answer, and it probably depends on every single business case or use case. But I'm a data scientist. I really like asking these questions and doing training sets and testing sets, so I decided to put my LLM through a test. A few years ago in R, I made a package that takes images and turns them into Lego mosaics, something you could build out of Lego bricks. It took me a few weeks to do. I learned a lot, and it was really fun. I was wondering, could the LLM reproduce that? How fast could I get an LLM to produce the same thing?

So I gave it this prompt: write me a function that takes an image and renders it as a mosaic made of Lego bricks. To do that, I'm going to use this image of my dog. This is Hunter; he might look familiar from the cartoons I've been showing. So I took the Python script the LLM gave me, I gave it my image file, and I got... well, nothing. It didn't run the first try, but that's fine, because I'm really good at debugging LLM output at this point. After a few minutes, I found some bad transpose statements, and I got this.

It's not great. It's not what I asked for. I can see what it's trying to do, though, and probably from the back it looks like my dog, but there are some issues. The colors aren't real: it chose the colors based off the image, but these are not Lego colors. I can't go to the store, buy these bricks, and build this, so that's a nonstarter for me. The output size was very arbitrary; the LLM chose it based off the JPEG file size, which doesn't make much sense to a Lego builder. It's all one-by-one bricks, which isn't wrong, but also not fun for me to build. And the code was very inflexible. Every single time I edited the prompt to add a new feature, it had to rewrite every single thing it had given me, because it couldn't add the new feature to the established code, and that lost all the edits I had made manually. That was just very frustrating.
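The fix for the color problem is conceptually simple: snap every pixel to the nearest color in a fixed palette of buyable bricks. A minimal sketch of that one step, with a tiny hypothetical palette (the real package and the LLM's script do far more, such as brick sizing and rendering):

```python
import numpy as np

# Hypothetical palette of buyable brick colors (RGB); the real
# list of available Lego colors is much longer.
PALETTE = np.array([
    [255, 255, 255],  # white
    [0, 0, 0],        # black
    [200, 0, 0],      # red
    [240, 200, 0],    # yellow
])

def to_brick_colors(image, palette=PALETTE):
    """Snap every pixel of an (H, W, 3) RGB array to its nearest
    palette color, so the mosaic only uses colors you can buy."""
    pixels = image.reshape(-1, 1, 3).astype(float)
    # Squared Euclidean distance from each pixel to each palette color
    dists = ((pixels - palette[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    return palette[nearest].reshape(image.shape)

# A 1x2 "image": a near-white and a near-red pixel
img = np.array([[[250, 250, 245], [190, 10, 5]]], dtype=np.uint8)
mosaic = to_brick_colors(img)
```

In a full pipeline this would run after downscaling the photo to the brick grid, so the output size is a deliberate choice rather than an artifact of the file.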

But after a few prompt changes, as I kept nudging it in my direction, I got this. From the back, I think it looks like my dog. It's all Lego bricks, in different sizes, in colors I can buy. Here, the LLM did what I asked, and if I compare it to my R code from a few years ago, it's basically the same thing: a few style changes, some color algorithm differences, but I got the LLM to produce what I wanted. And that's really cool, because I got it in a matter of minutes, whereas doing it manually took me a few weeks of free time.

So in this one case, I can trust the LLM, but not because it gave me what I wanted on the first try. It's because I was working with the LLM along the way, nudging it in the direction I wanted to go, and that's how I gained that trust. This is a silly example of my dog and Lego bricks, but it also mirrors a process I follow a lot at work. I have a very tried and trusted, huge code base of models in R that I'm slowly porting over to Python. That doesn't work if I just copy and paste the whole repo and say, hey, translate this. I still have to mimic how I developed it. I do small chunks, and I'm testing those chunks along the way, making sure they produce exactly what I expect. Then I check against the truth. I know the truth; I've been with my business a long time, and I want to make sure I can trust the result. And when I'm happy with that, I move on.
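That "small chunk, then check against the truth" loop can be sketched in a few lines. The data and the expected answer here are invented stand-ins for a result the trusted R code has already produced:

```python
import pandas as pd

# One translated chunk: the Python version of an R step like
# df |> group_by(theme) |> summarize(n = n())
df = pd.DataFrame({"theme": ["castle", "castle", "space"]})
counts = df.groupby("theme").size().to_dict()

# Check against the truth: the trusted R pipeline already told us
# the answer, so the port must reproduce it exactly before moving on.
expected = {"castle": 2, "space": 1}
assert counts == expected
```

Only once a chunk matches the known-good output does the next chunk get translated.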

A checklist for working with LLMs

So from all these little mistakes I've come across in the past few years, I now have a pretty solid checklist of things I go through when I'm working with an LLM, and it has made me a lot more productive. So, just to rehash: I write clear, precise prompts. I have to make sure that the LLM and I are working under the same assumptions, and I'm the one making those assumptions.

I work in really small increments, doing small things at a time and trusting the increments along the way. I treat the LLM as a collaborator, not a replacement; I'm an expert in a lot of things, and I have to keep that. I stay engaged. I'm the one responsible for the code at the end of the day. When my business stakeholders are using it, I can't blame the LLM for being wrong; it comes back to me. I want to make sure I can vouch for every single line of that code. And then I continuously validate the output against what I know to be true.

So, going back to my initial statement about how data science is different this year: yeah, I'm a lot more productive. I'm doing things I never really imagined, and that's kind of cool after being a data scientist for maybe ten years (it's hard to define exactly when I started). I've had a supercharged year where I'm doubling my productivity, and that's because of LLMs, but it's also because I know that LLMs are imperfect. They make a lot of mistakes, but I can now anticipate those mistakes. So with that, I thank you, and we definitely have some time for questions. There are my socials; it's just Ryan Timpe everywhere, if you're interested. Thank you.

Q&A

Thank you so much, Ryan. We do have a few questions. First one: do you think data science will eventually change in essence, from writing code from scratch to knowing how to use AI to write code that solves your data science problem? It continues: do you think this paradigm shift will happen, since a good data scientist is good at solving problems rather than writing code in specific languages?

Anything could happen. I mean, I wouldn't like my job as much if that were the case, and I think there are a lot of new frontiers that a human is still there to push, but, again, I didn't know a year or two ago that we'd be in this spot, so I can't answer that.

Right, 100%. So we do have a few questions about which LLMs you like to use, and whether you ask them to collaborate, such as giving a prompt to one LLM and asking another to review it, in terms of the different providers.

Yeah, it really is context dependent. I use Databricks a lot at work, and I like their inline assistant, but if I'm doing something from scratch, I like going to Claude. Depending on whether it's a professional or a personal project, I have to change what tools I'm using. I'm not a snob or elitist about any tools right now; they're all pretty amazing to me. But I have found that, depending on my situation, like with debugging, I go back and forth between Claude and the inline assistant. I like keeping my mind open, and since every model is being updated all the time, I don't want to form solid habits yet.

Right. Definitely. You mentioned a situation where you as a data scientist needed to translate some R code into Python. Can you give an idea of when a situation would require something like that?

When my corporation tells me to.

Totally. Okay. Very important question. I'm representing myself today.

All right. Very important question: is the build-your-dog-out-of-Lego script available publicly, so people can build their own dogs?

It's all R code, but also, as you just saw, you could spend five minutes with an LLM and have it do that as well. I have an R package that's on my repo. I want to update it with some new features, but since I work with Lego bricks all day, I don't want to spend my free time doing it anymore. Again, though, you can do it now. This is the LLM version, not my version, so just go to your favorite LLM and nudge the prompt until it gives you what you want.

Awesome. Is the Lego version of you from R or Python?

No, this is Python. It makes me cringe, but I'm doing it now.

What's your favorite Lego creation?

I'm a big fan of the classic castles from my childhood. We had the big vacuum-molded base plates, and I would just always destroy it and rebuild it over and over, so I have very fond memories of those.

Awesome. Thank you so much.

Thank you so much.