Resources

Keynote: I Wrote This Talk with an LLM - Hadley Wickham

In this keynote, I’ll explore the evolving relationship between data scientists, statisticians, and large language models through a unique experiment: this entire talk was created in collaboration with an LLM. From outline to slides, from code examples to key insights, I’ll share the practical realities of using AI as a thought partner in the R ecosystem. Drawing on my experience developing tidyverse packages and teaching data science, I’ll demonstrate how LLMs can augment (rather than replace) the R user’s workflow. We’ll examine specific examples where AI assistance shines—rapid prototyping, documentation generation, and creative ideation—alongside areas where human expertise remains irreplaceable. Most importantly, I’ll reflect on what this experiment reveals about the future of our community: How might AI change the way we teach R? What new skills should we prioritize? And how can we ensure that the tools we build remain accessible and empowering for all users? Join me for this meta-exploration of AI’s role in our work, with honest reflections on both the promise and limitations of these new collaborators in our statistical computing journey. (This abstract was generated by Claude 3.7 Sonnet and lightly edited by me. I used the prompt: “I am Hadley Wickham, chief scientist at RStudio/Posit and I’ve been invited to give a keynote on AI at the useR conference. Please write a talk abstract for a talk entitled ‘I wrote this talk with an LLM’.”)

https://hadley.nz/

Oct 11, 2025
34 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Unfortunately, my talk abstract and title were a bit of a lie: I did not write this talk with an LLM, but I did make that abstract with one.

So what I want to actually give you is kind of a no-bullshit guide to LLMs, and I think there's a whole lot of bullshit about LLMs in the air, so my goal is to hopefully clear a little bit of that, if my clicker works.

And I think probably many of you have kind of like big feelings about AI, I certainly do, there's a lot, you know, if you read about AI, like on the one hand there's like so much unbridled optimism, and on the other so much like skepticism, and as you start to engage with it yourself, I think you kind of experience both of those emotions, because AI, LLMs can seem both like incredibly, surprisingly intelligent and also incredibly, surprisingly stupid as well.

So I'll show you a few examples of that, but I think it's really, like if you don't feel conflicted about AI, that's weird.

My kind of feelings of like existential dread kind of swing between mild and moderate, they're kind of currently on mild at the moment, but I certainly do, it does feel like my kind of identities as a software engineer and a data scientist are under threat, and it's very easy to feel kind of disempowered right now.

So like one of the things I do, well one of the things I do that's not so helpful is I get drunk and like donate money to good causes, I'm not going to promote that as a strategy, but the other thing that I found really useful is just to try and do like one small thing to make the world a better place. So I like occasionally when I'm feeling particularly down, I'll just like go out and pick up some trash around my neighborhood, and that kind of reminds me like it's not my job to fix the world, all I can do is make my little bit of the world a little bit better.

And so I kind of want to give you a bit of that today, I want to give you some ways that you can engage with AI, with LLMs that hopefully like empower you and make you feel a little bit more in control of your future.

And so my two goals today are to give you a handful of useful skills, and to demystify a few of the many, many, many jargony terms that come with AI.

Introducing the ellmer package

I'm going to do this mostly in R, using the ellmer package. ellmer is a package that me and a few others at Posit have been working on recently. As you might guess from the name, it's about using LLMs in R. And it has the honor of being the first package name I've created by adding vowels, rather than taking them away.

So here's some ellmer code. We're going to start by loading ellmer, obviously, and then I'm going to create this chat object. Here I'm using chat_anthropic(); I'll show you a few other models today, but this is my starting place, this is my favorite model. And then I can interact with it, and I can interact with it from R in exactly the same way you can interact with it on the web.
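The code described here looks roughly like the sketch below. The prompt is illustrative, the reply will vary run to run, and you need an Anthropic API key configured for this to work:

```r
library(ellmer)

# Create a chat object backed by Anthropic's Claude.
# (Requires an ANTHROPIC_API_KEY environment variable.)
chat <- chat_anthropic()

# Interact with it from R exactly as you would on the web:
chat$chat("Who created the R language?")
```

Each call to `chat$chat()` continues the same conversation, so the object accumulates history just like a chat window does.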

What LLMs are bad at

I think one of the things that make LLMs so hard to reason about, or so hard to think about, is they kind of turn upside down, like what we think of as computers being good at, and what we think of as computers being bad at. Because you think of computers being good at multiplying numbers together, but it turns out that LLMs are pretty bad at multiplying numbers together. They'll certainly do their darnedest and get something that looks kind of right, but they'll be a little bit off.

In fact, they can't even count. So I asked this version of Claude: how many Ns are in the word unconventional? And I won't make you think too hard, it's actually four Ns, but Claude confidently said two Ns.

And then I asked it again, and it said there were three Ns. And the cool thing about doing this in R is that I could run that a hundred times: fourteen times it said there was one N, seventy-six times it said there were two Ns, and ten times it said there were three Ns. So zero times it got it correct. This is a pretty easy problem.
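You could reproduce that experiment with something like the sketch below. It is an assumption-heavy illustration: parsing a number out of free-form model output is fragile, each call costs money, and your tallies will differ from run to run (and from the talk's, since the default model has since changed):

```r
library(ellmer)

# Ask the same question many times and tally the answers.
# A fresh chat per call avoids the conversation history
# influencing later answers.
ask_once <- function() {
  chat <- chat_anthropic()
  answer <- chat$chat(
    "How many Ns are in the word 'unconventional'? Reply with just the number."
  )
  as.integer(answer)
}

answers <- vapply(1:100, function(i) ask_once(), integer(1))
table(answers)
```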

Now you might notice I've specified this model here, and that's because Claude fixed this problem. So now if you ask Claude how many Ns are there in unconventional, it gives you the right number. And this is not because the model has become more intelligent, this is because Anthropic is embarrassed by the fact that its LLM can't count the number of letters, and has kind of adjusted their prompt to make it do this correctly.

And kind of interestingly, if you've been following any of the news about OpenAI's latest GPT model, it has exactly the same problem. How many Bs in blueberry? This is a simpler problem, I think; it thinks there are three Bs rather than two.

So LLMs kind of suck, right? The results are stochastic, the models are constantly changing, and the difference between good results and bad results can be razor thin (hence the term "the jagged edge"). But they always give results that look plausible, and it's pretty unusual for them to say, oh, I don't know, or, this is just my best guess. They don't admit doubt or lack of knowledge.

What LLMs are good at

So I think all of those things are true, and LLMs can do things that computers kind of could not. We can ask them to write poems, like give me a limerick about the tidyverse. A coder who worked with great zeal said tidyverse makes data feel real. With dplyr to wrangle and ggplot2's angle, my pipe dreams are now a good deal. Maybe not the best limerick in the world, but certainly a better limerick than I can come up with in five seconds.

Or you can also use limericks to roast your colleagues. I gave a talk at the JSM a couple of days ago, and I came up with a few roasts. I particularly like the one for Roger Peng: Roger believes in reproducibility, which is great, because his jokes and lectures reproducibly bomb every time.

As Matt pointed out, you can of course combine the two, roast yourself with a limerick. And you can use LLMs to create art, and you can use them to make fun games.

So I'm going to give you a live demo. So I've asked it to create a fun quiz game that asks challenging questions about the tidyverse. So welcome to Tidy or Die!

Okay, so let's see, you have a data set called sales, and you want to keep only rows where the revenue column is greater than 1000. Which tidyverse function? Okay, let's do a vote. So who thinks A? Okay, who thinks B? Okay, who thinks C? And who thinks D? Okay, let's say B.

But that was way too easy. Okay, so let's do a nightmare question. I have to say, when it gets up to these really hard levels of difficulty, I'm like, yeah, this is totally wrong. But we'll see how it goes. Okay, I want to fit a linear model predicting values from row number. Extract the R-squared value and then return a tibble with group and... wow, okay. Which approach is most tidyverse-elegant?

So it's fun, right? And you know, you can certainly use LLMs to stop learning about anything, but you can also use them to learn about things in new ways.

Using LLMs as a discovery tool

Well, there are lots of other things you can do that are a little bit more practical. Here I'm going to take advantage of the first argument to chat_anthropic(), which lets you provide a system prompt. A system prompt is not something you can really use when you're using a chat app on the web. It's something that sets the tone for all of the following conversation. And I just say "be terse" here so the answer will fit on one slide.
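A minimal sketch of what that looks like; the question is just an illustration:

```r
library(ellmer)

# The system prompt sets the tone for the whole conversation;
# here it just keeps answers short enough to fit on a slide.
chat <- chat_anthropic(system_prompt = "Be terse.")

chat$chat("What R function would I use to read a CSV file?")
```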

But just to demonstrate that I think like one of the things that is like challenging about Google or search in general is if you know the name of the thing you're looking for, you can find out a ton of information. If you don't know what you're looking for, you can't. And kind of LLMs often give us that ability to say I'm kind of looking for this thing that I vaguely remember, like it does something like this. And now you can get an answer. And I think that's tremendously empowering.

One thing that I did use recently: as I was working on httr2 and needed to add a new feature, I thought, someone must have thought about this problem before. So I asked it, and the key thing it told me about was this leaky bucket algorithm. That algorithm and that metaphor were so powerful and so useful, but it would have been very, very difficult for me to discover that previously.

Or, as JD puts it succinctly: I think another thing that's challenging about LLMs is that they can be bad at simple things and good at complex things. You can't just give up on them because they make really silly mistakes on very simple problems. That's not really what they're designed for. They're designed to do things that computers are not traditionally good at.

Structured data extraction

So again, my goal in this talk is to give you some useful skills and to demystify some jargon. So let's dive in and talk about three topics that I think are useful. The first is using LLMs to extract structured data, to turn all sorts of data into nice rectangles. Then I'll talk a little bit about tool calling, which is a way to add extra capabilities to LLMs and to create interactive agents. And then we'll finally talk a little bit about how you can use LLMs to help you with your programming.

So if you've ever collected data from humans, you've encountered situations like this. How many people from Posit are here? There are about eight people from Posit here, and Mine was pointing out that everyone put a different name for our company, Posit. Some different variation.

And if you look at this, maybe you could come up with some regular expressions: for the age you could use digits, but some people spelled it out, and we've got "the big 5-0". Names are even harder: they're mostly one word, but not always; they often start with a capital letter, but there are lots of other capital letters in there. So trying to do this deterministically with code is going to be really, really hard. But this is something that LLMs are generally pretty good at.

And you can do this just using text. You can say, hey, extract the name and age from each sentence I give you, and I can give it a few sentences. And by and large, this is the sort of task that LLMs do like pretty reliably, they're not going to hallucinate because all the data's there, it's just finding the right thing.

But wouldn't it be nice, like if rather than getting this string, which we're now going to need to pass in some way, wouldn't it be nice if we can get our data structure back? Maybe something like this. Well, you can.

And all you really need to do here is give the LLM a description of the schema: exactly what you want to get back from it. Here I'm saying I want an object. This is a JSON schema; JSON is kind of the lingua franca of the web, and so it's what LLMs are pretty good at speaking as well. I want an object, which is like a named list. The name is a string and the age is a number. And now I can use this with chat_structured() instead of chat(), and I get a nice list back.

And I can do this to many prompts all at once, in parallel, using another ellmer function called parallel_chat_structured(). I think this is a really cool little tool, a data science superpower, because you can now go from all sorts of unstructured data (not just text but, as I'll show shortly, images and video and audio) into nice rectangles. And if there's one thing that data scientists are good at, it's working with rectangles.
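Put together, the schema and the extraction calls look roughly like the sketch below. In ellmer you build the schema with the type_*() helpers rather than writing raw JSON; the exact method names here follow a recent ellmer release and may differ slightly in older versions:

```r
library(ellmer)

# Describe the shape of the data you want back: an object (named list)
# with a string name and a numeric age.
type_person <- type_object(
  name = type_string(),
  age  = type_number()
)

chat <- chat_anthropic()

# Extract structured data from a single sentence:
chat$chat_structured(
  "My name is Susan and I'm 13 years old",
  type = type_person
)

# Or run many prompts at once and get structured results back
# for each one:
prompts <- list(
  "My name is Susan and I'm 13 years old",
  "John Smith just turned the big 5-0"
)
parallel_chat_structured(chat, prompts, type = type_person)
```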


So here's another little example. Here I've got four images, and I'll just give you a little bit of code. I'm going to find all of the files in the animals directory. I'm going to use this content_image_file() function; that's how you tell ellmer you want to send an image to the LLM. I'm going to say what I want back, the animal and the background color, and then I'm going to call that same function again.
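A sketch of what that might look like. The "animals" directory and file layout are assumptions from the talk's demo, and, as noted below, the answers will vary a little between runs:

```r
library(ellmer)

# Schema for what we want back from each image.
type_animal <- type_object(
  animal           = type_string(),
  background_color = type_string()
)

# Find all the image files in the animals directory.
paths <- list.files("animals", full.names = TRUE)

chat <- chat_anthropic()

# content_image_file() is how you tell ellmer to send an image
# to the model; extract one structured record per image.
results <- lapply(paths, function(path) {
  chat$chat_structured(
    content_image_file(path),
    type = type_animal
  )
})
```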

And again, there's a little bit of stochasticity there. Like if I call this four times, I get four slightly different answers. But when you look at how they vary, this is kind of fair. Like what animal would you say this is? Like I think kitten and cat are both valid answers. What's the background? Well, I mean, it's kind of a beige-y gray.

So again, you're not going to get reliable, 100% reliable answers, but I think the problem here is that we have not been clear enough with exactly what we want. And I think the other thing that's interesting here is that once you've got that rectangular data frame, now you can use all of your statistical tools with it. And statistical tools don't assume that the data is measured without error. That assumption that there's going to be mistakes and variation and differences there is fundamentally baked into statistics. And I think this combination of structured data extraction and statistical modeling is super, super powerful.

But of course, you're going to really want to think about that: is this the right prompt? Am I getting the right answers from it? If you want to do that in the LLM world, that's called evals, evaluating your LLM. And my colleague Simon Couch has been working on the vitals package to make that a little bit easier. Simon's on the tidymodels team, and so the goal is to bring what we know about evaluating statistical models, combine that with the state of the art of evaluating LLMs, and give you something that's well-principled and well-thought-out.

Tool calling and agents

The next thing I want to talk about is tool calling, and I want to illustrate it with a problem, another thing that LLMs are bad at. If you ask an LLM what day it is today, best-case scenario, it's going to say, I don't know. If you ask this of ChatGPT, it actually just makes up a date, which is not very reassuring. And LLMs can't do anything, right? They're stuck in a box. You can't say, hey LLM, delete the CSV files in my directory. We can get around both of these limitations by using these things called tools.

And a tool is actually something that's really simple. It's just a function along with some metadata. So I'm obviously going to write R functions, you can write them in any language, and then you're going to provide a little bit of metadata to tell the LLM when it might want to use that tool.

So, for example, I can create a tool; this one I've called "today". It's going to call Sys.Date(), which gives you today's date in R. I'm going to say what it does, and I'm going to say what arguments it takes, which in this case is none. And now that I've empowered my chat with that tool, I can ask it what day is it today, and it can tell me.
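A sketch of that tool definition. The exact tool() signature has changed between ellmer versions, so treat the argument names here as illustrative rather than definitive:

```r
library(ellmer)

# A tool is just a function plus some metadata: a name, a description
# telling the LLM when to use it, and (here) no arguments.
tool_today <- tool(
  function() Sys.Date(),
  name = "today",
  description = "Returns today's date."
)

chat <- chat_anthropic()
chat$register_tool(tool_today)

# The model can now answer this by calling the tool:
chat$chat("What day is it today?")
```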

And when you look at this, you might be a little surprised: is Anthropic running R on their servers? That would be cool, but certainly a little surprising. That's not actually how tool calls work. When you send your request to the LLM, the LLM is going to say, oh, okay, I recognize that I need to call this tool, and it's going to send that information back to you. That gets intercepted by ellmer, and ellmer is going to call the tool and return the result back. Now the LLM has all the information it needs to answer your original query.

So the neat thing about this is because the code is running on your computer, you can do anything that you can do on your computer. So one thing you could do, for example, is to give the LLM kind of traditional computing powers. I could give it the power to multiply two numbers together, and now if I ask it to multiply two large numbers together, it can do that correctly because it's going to use that tool.

And indeed, if you ask many of these LLMs on their websites to do these things, they'll just write a little bit of JavaScript code, because this is just another tool that they can use to answer the questions, and the models have been trained to know, or specifically the ones they kind of present on their websites, like don't try and do this stuff yourself, write a little program to do it.

That brings me to one of the most powerful things you can do with tools, and that's to create an agent. So if you've read anything about AI on the internet lately, you've probably heard the term agent or agentic a million times, and you probably still have no sense of exactly what an agent is. But it turns out if you know what a tool is, and that's just a function plus some metadata, it's really easy to explain what an agent is, because an agent is just a chatbot that's been empowered with two tools. One that can read the state of the world, and one that can write the state of the world.

So let's make an agent. I'm going to give my chatbot the ability to list all of the files in the current directory, and I'm going to give it the ability to delete files. And now I can say, hey, delete all the CSV files in the current directory. It's going to say, well, okay, to do that I'm going to need to find out what files are in the current directory. So it requests that ellmer call ls(), and ellmer returns the results back to the LLM. It says, okay, I found two CSV files, A and B; I'm going to delete those. It sends that tool call request back to ellmer, ellmer deletes the files, and now it has deleted those files on my computer.
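A sketch of that agent, with the same caveat as before that the tool() signature varies by ellmer version. And to be clear, this genuinely gives the model the power to delete files on your machine, so it is a reconstruction of the talk's demo, not a recommendation:

```r
library(ellmer)

# A "read the world" tool: list files in the current directory.
tool_ls <- tool(
  function() dir(),
  name = "list_files",
  description = "Lists the files in the current directory."
)

# A "write the world" tool: delete a file. DANGEROUS.
tool_rm <- tool(
  function(path) file.remove(path),
  name = "delete_file",
  description = "Deletes a file from the current directory.",
  arguments = list(path = type_string("Path of the file to delete."))
)

chat <- chat_anthropic()
chat$register_tool(tool_ls)
chat$register_tool(tool_rm)

# The model will call list_files, pick out the CSVs, then call
# delete_file on each one.
chat$chat("Delete all the CSV files in the current directory")
```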

And hopefully when you look at this, you're like, Hadley, what on earth are you thinking? You have just empowered an LLM, which is notoriously unreliable, to delete files on your computer. And certainly you can put a few guardrails around it, and there are some built into the LLMs: if you ask it to delete all the files on your computer, the model itself is going to say, that's a bad idea, I'm not going to do that.

Basically, this is the state of security today with agents. It's the YOLO model. And I think discovering ... so agents are interesting because they offer so much potential. You can empower an LLM to do anything, and so much danger. You've now empowered an LLM to do anything. And it's going to take us, I think, collectively quite some time to figure out how to do this safely.

Using LLMs to help you code

So that brings me to my third topic, which is using LLMs to code. I want to talk about ... I'll sort of first center you into what the main techniques for using LLMs to code are. Hopefully you've seen many of these before. The first one is kind of the most basic. You just go into chatgpt.com and say, hey, give me a function that does blah, or I need to write some R code that solves this problem. That's kind of fine, but you're constantly copying and pasting back and forth.

This is kind of like the old days, pre R Markdown, where you're copying and pasting the results from Excel into your Word document, and then something changes. Constantly copying things back and forth, and it's very, very easy to get out of sync.

So now most modern IDEs are starting to bring in AI tooling through either inline autocomplete, so that's just when you start typing, and the LLM helpfully sometimes, and not so helpfully most of the time, suggests finishing your thought. I think one of the skills of programming today is being able to ignore that when it's bad, and accept it when it's good.

You can also have chat that's inside your IDE. This is just sort of a convenience. It's exactly the same thing as chatting with chatgpt, but you don't have to copy things back and forth. You can just kind of click a button, and the chat knows more about what you're doing because it can see what you're working on.

And then finally, the newest trend is agentic coding. This is things like Cursor or Claude Code, where the LLM is just going to go off and do a bunch of work. It's going to write some code, it might write some tests, run those tests, and iterate for a little while.

There are lots of ways in which LLMs do pretty useless and horrible things when coding, but I want to talk about the things that I think are clear wins. And there are two. The first clear win, I think, is translations. Whenever all of the information the LLM needs is already there, and it just needs to be expressed in a different form, they tend to do pretty well.

So if you can really accurately write in pseudocode what you're trying to do, LLMs are going to turn that into code in any language pretty reliably. One thing I use it for: often websites give you a curl command that you run on the console, and you can ask an LLM to translate that to httr2 code. Or you can convert your LaTeX docs to Quarto, you can convert your SQL to dplyr, you can convert your SAS to R code, you can convert your words to regular expressions. Many of these things it's going to get correct a lot of the time, and even when it doesn't, just having that first shot at a translation can be really, really useful.

And I want to show you, kind of finish with like one kind of cool example from my colleague, that my colleague George came up with. So I'm going to take a screenshot of this shiny app that George drew, and I'm going to pop open Positron Chat, and I'm going to drag the screenshot in, and I'm going to be like, please, oops, please make this into a shiny app.

It's not done a very good job, actually. So this is one of these stochastic things that's certainly done better in the past, but you can see it's got the basic idea of the shiny app, right? It's got the variable, it's got a distribution, the distribution is not very useful because it only offers one thing.

It's done a pretty good job for a first pass. And that kind of brings me to the other area where I think LLMs give you a clear win, which maybe this is not the best name, but I kind of think of this as like lowering activation energy, where you can just do something very, very quickly. It might not be super correct or perfect, but it gives you something.

Maybe that's a quick demo or prototype that you can kind of look at and be like, oh, okay, this is cool. That's not cool. It gives you some immediate feedback. Maybe it's something that's just so awful, like you feel compelled to fix this. This is something that I find useful for writing. I'll ask it to write about something, not because I think it's going to do a good job, but because I know it's going to do a bad job, and then I feel compelled to correct it.

This kind of reminds me of one of my strategies back in the old days of the R-help mailing list. Sometimes you'd ask a question and it was just crickets; no one cared to answer. So instead of asking a question, you just state something confidently, and you know if you're wrong, someone will dive in and correct you.

The other thing I've been using it for lately: I've been working on testthat. There are issues there that have been lingering for four years, sometimes longer, because I'm like, oh, this is going to be a lot of work. And so I've just been throwing Claude Code at them, like, take a stab. And sometimes it comes up with an answer, and I'm like, oh, that problem actually wasn't that hard after all; it's just been in my head this whole time.

Also super great, like if you're working in an unfamiliar area, whether that's a new programming language or a new package or a new domain, LLMs get you good enough to be dangerous, for better or worse, very, very quickly.

And I think, and Claude actually came up with a pretty good summary of this, which is, I think, LLMs can make the cost of trying so low that you attempt things you'd normally postpone, avoid, or never get around to. And I think this is like a pretty compelling use case. Like you're not going to want to trust everything it produces, but you're now, like you've gotten unstuck, you're moving forward, sure it's going to make mistakes, but you're going to learn from those.


Concerns and closing thoughts

So I'm going to sum up. I feel like you can't really give a talk about LLMs without talking about some of the many, many, many concerns. The first kind of concern is the cost and the quality of access. I'd really encourage you to sign up for an API key, put $5 on it, and just see how far that takes you, because it takes you a surprisingly long way. LLMs are getting cheaper and cheaper, and you can do quite a lot of cool stuff for only $5.

Also, fortunately, Google's Gemini model is kind of in a distant third place, and so Google is spending cash like anything to try to make it popular, and that means it has a very generous free tier right now, so you should take advantage of it. By and large, I think local models are not worth it. They can be useful in general, but if you're just getting started with LLMs, start with something that's much more likely to give you positive results: Anthropic, OpenAI, or Gemini.

I think another concern: I genuinely, genuinely, genuinely believe you should care about the environment, you should worry about climate change, you should think about how much water and electricity you're using. But if you have flown on a plane in the last 12 months, that blows away, by orders of magnitude, any environmental impact of your LLM usage. So while I think it makes sense to think about this on a societal level, the actual impact of an individual ChatGPT query is very, very small, and you should be thinking about other ways to save water and power.

There are also lots of concerns about data privacy around LLMs. I think this is, surprisingly, something that affects you more as an individual. If you're in a large company, basically all of your data already lives on Azure or GCP or AWS; the data is already in the cloud, and the LLM models are running in the same cloud. Most large enterprises do not have these concerns, because these big companies already technically have access to all of their data, but there's lots of protection in place to make sure it's not misused. So it is something you should think about as an individual: certainly if you're working with any confidential data, you need to be very, very cautious about sending it to these models. But most businesses can negotiate contracts with these providers that ensure your data is not going to be used to train future models.

Finally, the kind of the biggest concern I have, which I don't have a good answer for, is like evil billionaires. I feel like I'm becoming increasingly radicalized that billionaires shouldn't exist, but definitely a concern.

So again, to kind of come back to my first point, like if you are feeling like disempowered or out of control, or you like you don't understand how like AI and LLMs are going to affect your future life, I'd really encourage you to just do one little thing, do one little experiment that helps you kind of understand what's going on. I think you'll find that empowering. I think you'll learn that while there's lots of hype around models replacing data scientists and software engineers, we're a very, very long way from that happening in practice. They're very good at some things that can be very useful and empowering tools, but it's a long way from replacing our jobs.

The other thing I will say, which I'm a little tempted to do, is a bit of "the best defense is a strong offense". If you're worried about your job being replaced by AI, it feels like managers are way more vulnerable to being replaced by AI, because so much of what a manager does is just summarize the information they receive from multiple channels and report it out. So feel free to go on the attack at people higher up in your company.

So, in summary, I showed you three areas where I think LLMs are particularly interesting and empowering today. Use structured data extraction to turn unstructured data (text, images, audio, video) into rectangles. Then you can apply all of the rectangle skills that you already have.

Tools are a really neat way to extend the capabilities of LLMs. Tools are just functions plus metadata. You already know how to write functions. You can now extend LLMs to do whatever you want, including creating agents that do really dangerous and stupid things.

And then finally, LLMs are sometimes super amazing at coding and sometimes super terrible at coding, but I think the two areas where they give the clearest wins today are translations (whenever you have expressed an idea clearly in one language or one image, you can translate it to another) and lowering the activation energy: giving you the ability to experiment, to try things out, to expand into new areas where you don't know so much. Thank you.