Resources

Keynote: I Wrote This Talk with an LLM - Hadley Wickham

In this keynote, I’ll explore the evolving relationship between data scientists, statisticians, and large language models through a unique experiment: this entire talk was created in collaboration with an LLM. From outline to slides, from code examples to key insights, I’ll share the practical realities of using AI as a thought partner in the R ecosystem. Drawing on my experience developing tidyverse packages and teaching data science, I’ll demonstrate how LLMs can augment (rather than replace) the R user’s workflow. We’ll examine specific examples where AI assistance shines—rapid prototyping, documentation generation, and creative ideation—alongside areas where human expertise remains irreplaceable. Most importantly, I’ll reflect on what this experiment reveals about the future of our community: How might AI change the way we teach R? What new skills should we prioritize? And how can we ensure that the tools we build remain accessible and empowering for all users? Join me for this meta-exploration of AI’s role in our work, with honest reflections on both the promise and limitations of these new collaborators in our statistical computing journey. (This abstract was generated by Claude 3.7 Sonnet and lightly edited by me. I used the prompt: “I am Hadley Wickham, chief scientist at RStudio/Posit and I’ve been invited to give a keynote on AI at the useR conference. Please write a talk abstract for a talk entitled ‘I wrote this talk with an LLM’.”)

https://hadley.nz/

Oct 11, 2025
34 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Unfortunately, my talk abstract and title were a bit of a lie: I did not write this talk with an LLM, but I did make that abstract with one.

So what I want to actually give you is kind of a no-bullshit guide to LLMs, and I think there's a whole lot of bullshit about LLMs in the air, so my goal is to hopefully clear a little bit of that, if my clicker works.

And I think probably many of you have kind of like big feelings about AI, I certainly do, there's a lot, you know, if you read about AI, like on the one hand there's like so much unbridled optimism, and on the other so much like skepticism, and as you start to engage with it yourself, I think you kind of experience both of those emotions, because AI, LLMs can seem both like incredibly, surprisingly intelligent and also incredibly, surprisingly stupid as well.

So I'll show you a few examples of that, but I think it's really, like if you don't feel conflicted about AI, that's weird.

My kind of feelings of like existential dread kind of swing between mild and moderate, they're kind of currently on mild at the moment, but I certainly do, it does feel like my kind of identities as a software engineer and a data scientist are under threat, and it's very easy to feel kind of disempowered right now.

So like one of the things I do, well one of the things I do that's not so helpful is I get drunk and like donate money to good causes, I'm not going to promote that as a strategy, but the other thing that I found really useful is just to try and do like one small thing to make the world a better place. So I like occasionally when I'm feeling particularly down, I'll just like go out and pick up some trash around my neighborhood, and that kind of reminds me like it's not my job to fix the world, all I can do is make my little bit of the world a little bit better.

And so I kind of want to give you a bit of that today, I want to give you some ways that you can engage with AI, with LLMs that hopefully like empower you and make you feel a little bit more in control of your future.

And so my two goals today are to give you a handful of useful skills, and to demystify a few of the many, many, many jargony terms that come with AI.

Introducing the ellmer package

I'm going to do this mostly in R, using the ellmer package. ellmer is a package that me and a few others at Posit have been working on recently. As you might guess from the name, it's about using LLMs in R. And it has the honor of being the first package name I've created by adding vowels, rather than taking them away.

So here's some ellmer code. We're going to start by loading ellmer, obviously, and then I'm going to create this chat object. Here I'm using chat_anthropic(); I'll show you a few other models today, but this is my starting place, this is my favorite model. And then I can interact with it, and I can interact with it from R in exactly the same way you can interact with it on the web.
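The code described here looks roughly like the sketch below. The prompt is illustrative, the reply will vary run to run, and you need an Anthropic API key configured for this to work:

```r
library(ellmer)

# Create a chat object backed by Anthropic's Claude.
# (Requires an ANTHROPIC_API_KEY environment variable.)
chat <- chat_anthropic()

# Interact with it from R exactly as you would on the web:
chat$chat("Who created the R language?")
```

Each call to `chat$chat()` continues the same conversation, so the object accumulates history just like a chat window does.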

What LLMs are bad at

I think one of the things that make LLMs so hard to reason about, or so hard to think about, is they kind of turn upside down, like what we think of as computers being good at, and what we think of as computers being bad at. Because you think of computers being good at multiplying numbers together, but it turns out that LLMs are pretty bad at multiplying numbers together. They'll certainly do their darnedest and get something that looks kind of right, but they'll be a little bit off.

In fact, they can't even count. So I asked this version of Claude: how many Ns are in the word unconventional? And I won't make you think too hard, it's actually four Ns, but Claude confidently said two Ns.

And then I asked it again, and it said there were three Ns. And the cool thing about doing this in R is that I could run that a hundred times: fourteen times it said there was one N, seventy-six times it said there were two Ns, and ten times it said there were three Ns. So zero times it got it correct. This is a pretty easy problem.
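You could reproduce that experiment with something like the sketch below. It is an assumption-heavy illustration: parsing a number out of free-form model output is fragile, each call costs money, and your tallies will differ from run to run (and from the talk's, since the default model has since changed):

```r
library(ellmer)

# Ask the same question many times and tally the answers.
# A fresh chat per call avoids the conversation history
# influencing later answers.
ask_once <- function() {
  chat <- chat_anthropic()
  answer <- chat$chat(
    "How many Ns are in the word 'unconventional'? Reply with just the number."
  )
  as.integer(answer)
}

answers <- vapply(1:100, function(i) ask_once(), integer(1))
table(answers)
```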

Now you might notice I've specified this model here, and that's because Claude fixed this problem. So now if you ask Claude how many Ns are there in unconventional, it gives you the right number. And this is not because the model has become more intelligent, this is because Anthropic is embarrassed by the fact that its LLM can't count the number of letters, and has kind of adjusted their prompt to make it do this correctly.

And kind of interestingly, if you've been following any of the news about OpenAI's latest GPT model, it has exactly the same problem. How many Bs in blueberry? This is a simpler problem, I think; it thinks there are three Bs rather than two.

So LLMs kind of suck, right? The results are stochastic, the models are constantly changing, and the difference between good results and bad results can be razor thin (hence the term "the jagged edge"). But they always give results that look plausible, and it's pretty unusual for them to say, oh, I don't know, or, this is just my best guess. They don't admit doubt or lack of knowledge.

What LLMs are good at

So I think all of those things are true, and LLMs can do things that computers kind of could not. We can ask them to write poems, like give me a limerick about the tidyverse. A coder who worked with great zeal said tidyverse makes data feel real. With dplyr to wrangle and ggplot2's angle, my pipe dreams are now a good deal. Maybe not the best limerick in the world, but certainly a better limerick than I can come up with in five seconds.

Or you can also use limericks to roast your colleagues. I gave a talk at the JSM a couple of days ago, and I came up with a few roasts. I particularly like the one for Roger Peng: Roger believes in reproducibility, which is great, because his jokes and lectures reproducibly bomb every time.

As Matt pointed out, you can of course combine the two, roast yourself with a limerick. And you can use LLMs to create art, and you can use them to make fun games.

So I'm going to give you a live demo. So I've asked it to create a fun quiz game that asks challenging questions about the tidyverse. So welcome to Tidy or Die!

Okay, so let's see, you have a data set called sales, and you want to keep only rows where the revenue column is greater than 1000. Which tidyverse function? Okay, let's do a vote. So who thinks A? Okay, who thinks B? Okay, who thinks C? And who thinks D? Okay, let's say B.

But that was way too easy. Okay, so let's do a nightmare question. I have to say, when it gets up to these really hard levels of difficulty, I'm like, yeah, this is totally wrong. But we'll see how it goes. Okay, I want to fit a linear model predicting values from row number. Extract the R-squared value and then return a tibble with group and... wow, okay. Which approach is most tidyverse-elegant?

So it's fun, right? And you know, you can certainly use LLMs to stop learning about anything, but you can also use them to learn about things in new ways.

Using LLMs as a discovery tool

Well, there are lots of other things you can do that are a little bit more practical. Here I'm going to take advantage of the first argument to chat_anthropic(), which lets you provide a system prompt. A system prompt is not something you can really use when you're using a chat app on the web. It's something that sets the tone for all of the following conversation. And I just say "be terse" here so the answer will fit on one slide.
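A minimal sketch of what that looks like; the question is just an illustration:

```r
library(ellmer)

# The system prompt sets the tone for the whole conversation;
# here it just keeps answers short enough to fit on a slide.
chat <- chat_anthropic(system_prompt = "Be terse.")

chat$chat("What R function would I use to read a CSV file?")
```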

But just to demonstrate that I think like one of the things that is like challenging about Google or search in general is if you know the name of the thing you're looking for, you can find out a ton of information. If you don't know what you're looking for, you can't. And kind of LLMs often give us that ability to say I'm kind of looking for this thing that I vaguely remember, like it does something like this. And now you can get an answer. And I think that's tremendously empowering.

One thing that I did use recently: as I was working on httr2 and needed to add a new feature, I thought, someone must have thought about this problem before. So I asked it, and the key thing it told me about was this leaky bucket algorithm. That algorithm and that metaphor were so powerful and so useful, but it would have been very, very difficult for me to discover that previously.

Or, as JD puts it succinctly: I think another thing that's challenging about LLMs is that they can be bad at simple things and good at complex things. You can't just give up on them because they make really silly mistakes on very simple problems. That's not really what they're designed for. They're designed to do things that computers are not traditionally good at.

Structured data extraction

So again, my goal in this talk is to give you some useful skills and to demystify some jargon. So let's dive in and talk about three topics that I think are useful. The first is using LLMs to extract structured data, to turn all sorts of data into nice rectangles. Then I'll talk a little bit about tool calling, which is a way to add extra capabilities to LLMs and to create interactive agents. And then we'll finally talk a little bit about how you can use LLMs to help you with your programming.

So if you've ever collected data from humans, you've encountered situations like this. How many people from Posit are here? There are about eight people from Posit here, and Mine was pointing out that everyone put a different name for our company, Posit. Some different variation.

And if you look at this, maybe you could come up with some regular expressions: for the age you could use digits, but some people spelled it out, and we've got "the big 5-0". Names are even harder: they're mostly one word, but not always; they often start with a capital letter, but there are lots of other capital letters in there. So trying to do this deterministically with code is going to be really, really hard. But this is something that LLMs are generally pretty good at.

And you can do this just using text. You can say, hey, extract the name and age from each sentence I give you, and I can give it a few sentences. And by and large, this is the sort of task that LLMs do like pretty reliably, they're not going to hallucinate because all the data's there, it's just finding the right thing.

But wouldn't it be nice, like if rather than getting this string, which we're now going to need to pass in some way, wouldn't it be nice if we can get our data structure back? Maybe something like this. Well, you can.

And all you really need to do here is give the LLM a description of the schema: exactly what you want to get back from it. Here I'm saying I want an object. This is a JSON schema; JSON is kind of the lingua franca of the web, and so it's what LLMs are pretty good at speaking as well. I want an object, which is like a named list. The name is a string and the age is a number. And now I can use this with chat_structured() instead of chat(), and I get a nice list back.

And I can do this to many prompts all at once, in parallel, using another ellmer function called parallel_chat_structured(). I think this is a really cool little tool, a data science superpower, because you can now go from all sorts of unstructured data (not just text but, as I'll show shortly, images and video and audio) into nice rectangles. And if there's one thing that data scientists are good at, it's working with rectangles.
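Put together, the schema and the extraction calls look roughly like the sketch below. In ellmer you build the schema with the type_*() helpers rather than writing raw JSON; the exact method names here follow a recent ellmer release and may differ slightly in older versions:

```r
library(ellmer)

# Describe the shape of the data you want back: an object (named list)
# with a string name and a numeric age.
type_person <- type_object(
  name = type_string(),
  age  = type_number()
)

chat <- chat_anthropic()

# Extract structured data from a single sentence:
chat$chat_structured(
  "My name is Susan and I'm 13 years old",
  type = type_person
)

# Or run many prompts at once and get structured results back
# for each one:
prompts <- list(
  "My name is Susan and I'm 13 years old",
  "John Smith just turned the big 5-0"
)
parallel_chat_structured(chat, prompts, type = type_person)
```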


So here's another little example. Here I've got four images, and I'll just give you a little bit of code. I'm going to find all of the files in the animals directory. I'm going to use this content_image_file() function; that's how you tell ellmer you want to send an image to the LLM. I'm going to say what I want back, the animal and the background color, and then I'm going to call that same function again.
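A sketch of what that might look like. The "animals" directory and file layout are assumptions from the talk's demo, and, as noted below, the answers will vary a little between runs:

```r
library(ellmer)

# Schema for what we want back from each image.
type_animal <- type_object(
  animal           = type_string(),
  background_color = type_string()
)

# Find all the image files in the animals directory.
paths <- list.files("animals", full.names = TRUE)

chat <- chat_anthropic()

# content_image_file() is how you tell ellmer to send an image
# to the model; extract one structured record per image.
results <- lapply(paths, function(path) {
  chat$chat_structured(
    content_image_file(path),
    type = type_animal
  )
})
```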

And again, there's a little bit of stochasticity there. Like if I call this four times, I get four slightly different answers. But when you look at how they vary, this is kind of fair. Like what animal would you say this is? Like I think kitten and cat are both valid answers. What's the background? Well, I mean, it's kind of a beige-y gray.

So again, you're not going to get reliable, 100% reliable answers, but I think the problem here is that we have not been clear enough with exactly what we want. And I think the other thing that's interesting here is that once you've got that rectangular data frame, now you can use all of your statistical tools with it. And statistical tools don't assume that the data is measured without error. That assumption that there's going to be mistakes and variation and differences there is fundamentally baked into statistics. And I think this combination of structured data extraction and statistical modeling is super, super powerful.

But of course, you're going to really want to think about that: is this the right prompt? Am I getting the right answers from it? If you want to do that in the LLM world, that's called evals, evaluating your LLM. And my colleague Simon Couch has been working on the vitals package to make that a little bit easier. Simon's on the tidymodels team, and so the goal is to bring what we know about evaluating statistical models, combine that with the state of the art of evaluating LLMs, and give you something that's well-principled and well-thought-out.

Tool calling and agents

The next thing I want to talk about is tool calling, and I want to illustrate it with a problem, another thing that LLMs are bad at. If you ask an LLM what day it is today, best-case scenario, it's going to say, I don't know. If you ask this of ChatGPT, it actually just makes up a date, which is not very reassuring. And LLMs can't do anything, right? They're stuck in a box. You can't say, hey LLM, delete the CSV files in my directory. We can get around both of these limitations by using these things called tools.

And a tool is actually something that's really simple. It's just a function along with some metadata. So I'm obviously going to write R functions, you can write them in any language, and then you're going to provide a little bit of metadata to tell the LLM when it might want to use that tool.

So, for example, I can create a tool; this one I've called "today". It's going to call Sys.Date(), which gives you today's date in R. I'm going to say what it does, and I'm going to say what arguments it takes, which in this case is none. And now that I've empowered my chat with that tool, I can ask it what day is it today, and it can tell me.
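A sketch of that tool definition. The exact tool() signature has changed between ellmer versions, so treat the argument names here as illustrative rather than definitive:

```r
library(ellmer)

# A tool is just a function plus some metadata: a name, a description
# telling the LLM when to use it, and (here) no arguments.
tool_today <- tool(
  function() Sys.Date(),
  name = "today",
  description = "Returns today's date."
)

chat <- chat_anthropic()
chat$register_tool(tool_today)

# The model can now answer this by calling the tool:
chat$chat("What day is it today?")
```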

And when you look at this, you might be a little surprised: is Anthropic running R on their servers? That would be cool, but certainly a little surprising. That's not actually how tool calls work. When you send your request to the LLM, the LLM is going to say, oh, okay, I recognize that I need to call this tool, and it's going to send that information back to you. That gets intercepted by ellmer, and ellmer is going to call the tool and return the result back. Now the LLM has all the information it needs to answer your original query.

So the neat thing about this is because the code is running on your computer, you can do anything that you can do on your computer. So one thing you could do, for example, is to give the LLM kind of traditional computing powers. I could give it the power to multiply two numbers together, and now if I ask it to multiply two large numbers together, it can do that correctly because it's going to use that tool.

And indeed, if you ask many of these LLMs on their websites to do these things, they'll just write a little bit of JavaScript code, because this is just another tool that they can use to answer the questions, and the models have been trained to know, or specifically the ones they kind of present on their websites, like don't try and do this stuff yourself, write a little program to do it.

That brings me to one of the most powerful things you can do with tools, and that's to create an agent. So if you've read anything about AI on the internet lately, you've probably heard the term agent or agentic a million times, and you probably still have no sense of exactly what an agent is. But it turns out if you know what a tool is, and that's just a function plus some metadata, it's really easy to explain what an agent is, because an agent is just a chatbot that's been empowered with two tools. One that can read the state of the world, and one that can write the state of the world.

So let's make an agent. I'm going to give my chatbot the ability to list all of the files in the current directory, and I'm going to give it the ability to delete files. And now I can say, hey, delete all the CSV files in the current directory. It's going to say, well, okay, to do that I'm going to need to find out what files are in the current directory. So it requests that ellmer call ls(), and ellmer returns the results back to the LLM. It says, okay, I found two CSV files, A and B; I'm going to delete those. It sends that tool call request back to ellmer, ellmer deletes the files, and now it has deleted those files on my computer.
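A sketch of that agent, with the same caveat as before that the tool() signature varies by ellmer version. And to be clear, this genuinely gives the model the power to delete files on your machine, so it is a reconstruction of the talk's demo, not a recommendation:

```r
library(ellmer)

# A "read the world" tool: list files in the current directory.
tool_ls <- tool(
  function() dir(),
  name = "list_files",
  description = "Lists the files in the current directory."
)

# A "write the world" tool: delete a file. DANGEROUS.
tool_rm <- tool(
  function(path) file.remove(path),
  name = "delete_file",
  description = "Deletes a file from the current directory.",
  arguments = list(path = type_string("Path of the file to delete."))
)

chat <- chat_anthropic()
chat$register_tool(tool_ls)
chat$register_tool(tool_rm)

# The model will call list_files, pick out the CSVs, then call
# delete_file on each one.
chat$chat("Delete all the CSV files in the current directory")
```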

And hopefully when you look at this, you're like, Hadley, what on earth are you thinking? You have just empowered an LLM, which is notoriously unreliable, to delete files on your computer. And certainly you can put a few guardrails around it, and there are some built into the LLMs: if you ask it to delete all the files on your computer, the model itself is going to say, that's a bad idea, I'm not going to do that.

Basically, this is the state of security today with agents. It's the YOLO model. And I think discovering ... so agents are interesting because they offer so much potential. You can empower an LLM to do anything, and so much danger. You've now empowered an LLM to do anything. And it's going to take us, I think, collectively quite some time to figure out how to do this safely.

Using LLMs to help you code

So that brings me to my third topic, which is using LLMs to code. I want to talk about ... I'll sort of first center you into what the main techniques for using LLMs to code are. Hopefully you've seen many of these before. The first one is kind of the most basic. You just go into chatgpt.com and say, hey, give me a function that does blah, or I need to write some R code that solves this problem. That's kind of fine, but you're constantly copying and pasting back and forth.

This is kind of like the old days, pre R Markdown, where you're copying and pasting the results from Excel into your Word document, and then something changes. Constantly copying things back and forth, and it's very, very easy to get out of sync.

So now most modern IDEs are starting to bring in AI tooling through either inline autocomplete, so that's just when you start typing, and the LLM helpfully sometimes, and not so helpfully most of the time, suggests finishing your thought. I think one of the skills of programming today is being able to ignore that when it's bad, and accept it when it's good.

You can also have chat that's inside your IDE. This is just sort of a convenience. It's exactly the same thing as chatting with chatgpt, but you don't have to copy things back and forth. You can just kind of click a button, and the chat knows more about what you're doing because it can see what you're working on.

And then finally, the newest trend is agentic coding. This is things like Cursor or Claude Code, where the LLM is just going to go off and do a bunch of work. It's going to write some code, it might write some tests, run those tests, and iterate for a little while.

There are lots of ways in which LLMs do pretty useless and horrible things when coding, but I want to talk about the things that I think are clear wins. And there are two. The first clear win, I think, is translations. Whenever all of the information the LLM needs is already there, and it just needs to be expressed in a different form, they tend to do pretty well.

So if you can really accurately write in pseudocode what you're trying to do, LLMs are going to turn that into code in any language pretty reliably. One thing I use it for: often websites give you a curl command that you run on the console, and you can ask an LLM to translate that to httr2 code. Or you can convert your LaTeX docs to Quarto, you can convert your SQL to dplyr, you can convert your SAS to R code, you can convert your words to regular expressions. Many of these things it's going to get correct a lot of the time, and even when it doesn't, just having that first shot at a translation can be really, really useful.

And I want to show you, kind of finish with like one kind of cool example from my colleague, that my colleague George came up with. So I'm going to take a screenshot of this shiny app that George drew, and I'm going to pop open Positron Chat, and I'm going to drag the screenshot in, and I'm going to be like, please, oops, please make this into a shiny app.

It's not done a very good job, actually. So this is one of these stochastic things that's certainly done better in the past, but you can see it's got the basic idea of the shiny app, right? It's got the variable, it's got a distribution, the distribution is not very useful because it only offers one thing.

It's done a pretty good job for a first pass. And that kind of brings me to the other area where I think LLMs give you a clear win, which maybe this is not the best name, but I kind of think of this as like lowering activation energy, where you can just do something very, very quickly. It might not be super correct or perfect, but it gives you something.

Maybe that's a quick demo or prototype that you can kind of look at and be like, oh, okay, this is cool. That's not cool. It gives you some immediate feedback. Maybe it's something that's just so awful, like you feel compelled to fix this. This is something that I find useful for writing. I'll ask it to write about something, not because I think it's going to do a good job, but because I know it's going to do a bad job, and then I feel compelled to correct it.

This kind of reminds me of one of my strategies back in the old days of the R-help mailing list. Sometimes you'd ask a question and it was just crickets; no one cared to answer. So instead of asking a question, you just state something confidently, and you know if you're wrong, someone will dive in and correct you.

The other thing I've been using it for lately: I've been working on testthat. There are issues there that have been lingering for four years, sometimes longer, because I'm like, oh, this is going to be a lot of work. And so I've just been throwing Claude Code at them, like, take a stab. And sometimes it comes up with an answer, and I'm like, oh, that problem actually wasn't that hard after all; it's just been in my head this whole time.

Also super great, like if you're working in an unfamiliar area, whether that's a new programming language or a new package or a new domain, LLMs get you good enough to be dangerous, for better or worse, very, very quickly.

And I think, and Claude actually came up with a pretty good summary of this, which is, I think, LLMs can make the cost of trying so low that you attempt things you'd normally postpone, avoid, or never get around to. And I think this is like a pretty compelling use case. Like you're not going to want to trust everything it produces, but you're now, like you've gotten unstuck, you're moving forward, sure it's going to make mistakes, but you're going to learn from those.


Concerns and closing thoughts

So I'm going to sum up. I feel like you can't really give a talk about LLMs without talking about some of the many, many, many concerns. The first kind of concern is the cost and the quality of access. I'd really encourage you to sign up for an API key, put $5 on it, and just see how far that takes you, because it takes you a surprisingly long way. LLMs are getting cheaper and cheaper, and you can do quite a lot of cool stuff for only $5.

Also, fortunately, Google's Gemini model is kind of in a distant third place, and so Google is spending cash like anything to try to make it popular, and that means it has a very generous free tier right now, so you should take advantage of it. By and large, I think local models are not worth it. They can be useful in general, but if you're just getting started with LLMs, start with something that's much more likely to give you positive results: Anthropic, OpenAI, or Gemini.

I think another concern: I genuinely, genuinely, genuinely believe you should care about the environment, you should worry about climate change, you should think about how much water and electricity you're using. But if you have flown on a plane in the last 12 months, that blows away, by orders of magnitude, any environmental impact of your LLM usage. So while I think it makes sense to think about this on a societal level, the actual impact of an individual ChatGPT query is very, very small, and you should be thinking about other ways to save water and power.

There are also lots of concerns about data privacy around LLMs. I think this is, surprisingly, something that affects you more as an individual. If you're in a large company, basically all of your data already lives on Azure or GCP or AWS; the data is already in the cloud, and the LLM models are running in the same cloud. Most large enterprises do not have these concerns, because these big companies already technically have access to all of their data, but there's lots of protection in place to make sure it's not misused. So it is something you should think about as an individual: certainly if you're working with any confidential data, you need to be very, very cautious about sending it to these models. But most businesses can negotiate contracts with these providers that ensure your data is not going to be used to train future models.

Finally, the kind of the biggest concern I have, which I don't have a good answer for, is like evil billionaires. I feel like I'm becoming increasingly radicalized that billionaires shouldn't exist, but definitely a concern.

So again, to kind of come back to my first point, like if you are feeling like disempowered or out of control, or you like you don't understand how like AI and LLMs are going to affect your future life, I'd really encourage you to just do one little thing, do one little experiment that helps you kind of understand what's going on. I think you'll find that empowering. I think you'll learn that while there's lots of hype around models replacing data scientists and software engineers, we're a very, very long way from that happening in practice. They're very good at some things that can be very useful and empowering tools, but it's a long way from replacing our jobs.

The other thing I will say, which I'm a little tempted to do, is a bit of "the best defense is a strong offense". If you're worried about your job being replaced by AI, it feels like managers are way more vulnerable to being replaced by AI, because so much of what a manager does is just summarize the information they receive from multiple channels and report it out. So feel free to go on the attack at people higher up in your company.

So, in summary, I showed you three areas where I think LLMs are particularly interesting and empowering today. Use structured data extraction to turn unstructured data (text, images, audio, video) into rectangles. Then you can apply all of the rectangle skills that you already have.

Tools are a really neat way to extend the capabilities of LLMs. Tools are just functions plus metadata. You already know how to write functions. You can now extend LLMs to do whatever you want, including creating agents that do really dangerous and stupid things.

And then finally, LLMs are sometimes super amazing at coding and sometimes super terrible at coding, but I think the two areas where they give the clearest wins today are translations (whenever you have expressed an idea clearly in one language or one image, you can translate it to another) and lowering the activation energy: giving you the ability to experiment, to try things out, to expand into new areas where you don't know so much. Thank you.