Resources

Daniel Chen - LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language

For information on upcoming conferences, visit https://www.dataconf.ai.

LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language, by Daniel Chen

Abstract: LLMs have a lot of hype around them these days. Let's demystify how they work and see how we can put them in context for data science use. As data scientists, we want to make sure our results are inspectable, reliable, reproducible, and replicable. We already have many tools to help us on this front. However, LLMs pose a new challenge: we may not always get the same results back from a query. This means working out the areas where LLMs excel, and using those behaviors in our data science artifacts. This talk will introduce you to LLMs, and to the ellmer and chatlas packages for R and Python, and show how they can be integrated into a Shiny app to create an AI-powered dashboard. We'll see how we can leverage the tasks LLMs are good at to better our data science products.

Presented at The New York Data Science & AI Conference, presented by Lander Analytics (August 26, 2025). Hosted by Lander Analytics (https://www.landeranalytics.com).

Sep 24, 2025
21 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

All right, so, by a show of hands, how many of you are still really skeptical about using LLMs, anyone?

Oh, that's actually a lot more than I expected. What about just regular skeptical? Yeah, okay, cool, everyone else? Neutral-ish? Promising? All the hype? All right, cool.

Well, today I kind of wanted to make sure that we can get a good solid foundation on sort of just moving that curve up just a little bit. And you can tell me at the end if I've done a good job or not.

LLM hallucinations and limitations

But, yes, we've all heard stories about LLMs hallucinating. This is an example from a research paper where they gave an LLM the ability to run a vending machine company, and over the course of time, thank God it was an air-gapped system, it ended up wanting to call the FBI Cybercrimes Division on the owner of the vending machine company because someone wasn't paying the stocking fee or something like that. So these things can go horribly awry.

Other things that it can do, this is literally a night or two ago, I asked it to draw an intricate piece of ASCII art, and it said, here's a majestic dragon, in which case my sister said, no, that's a very nervous piglet, and now we need a poo sweater. So clearly not very good, but highly confident in its response.

But, yeah, so LLMs have a bad reputation, and as data scientists, can we really ask the LLM to produce trustworthy results? As we all work in data science here, our job is really important because our products are used to find insights, and reproducibility and replicability are a really important foundation in the work that we do.

And so if we want to think about what makes good data science, you want to make sure that your results are correct. Please, just don't do the wrong thing. Are they transparent? That's also why we always promote coding in some language, so we can actually audit the process. We think coding is really important, too, because it makes things more reproducible and replicable.

But these are all things that LLMs are pretty bad at.

And so another example of something they're really bad at is counting. So here's some code. It uses a package called chatlas that I'll talk about in a bit. All I'm doing is having it generate an array, and then asking the LLM to count how long that array is.

And this is using Anthropic's Claude 4.1, I believe. If I give it an array of 10, it tells me, you have 10 things. All right, that's good. I give it an array of 100, and it tells me, you have 100 things. At 1,000, it gives me 1,000 things. And then if I ask it about 10,000, it says, you have 20,000 things. So that's not right.

Maybe OpenAI can do a little bit better. Ask for 10, it gives me 10; 100, 100; 1,000, 1,000. At least at 10,000, OpenAI, this is GPT-5, is smart enough to tell me, please run this code, because I can't actually count. So at least it's giving me some feedback on things it cannot do. But even these examples are kind of easy, because I'm using very round numbers. You can imagine if I gave it some arbitrary number, it probably wouldn't do very well.
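The fix OpenAI's reply suggests, running the code instead of trusting the model's count, is trivial to illustrate. A minimal sketch, plain Python with no LLM involved, showing the deterministic version of the counting task:

```python
# The deterministic alternative to asking an LLM to count:
# generate the arrays in code and let the language count them.
arrays = {n: list(range(n)) for n in (10, 100, 1_000, 10_000)}

for n, arr in arrays.items():
    # len() is exact every time, unlike a model's token-by-token "counting"
    assert len(arr) == n

print(len(arrays[10_000]))  # -> 10000
```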

And so the perception is that an LLM should be able to do easy things really well and hard things very poorly. But I just showed you that counting things, which is a cornerstone of coding, is something it's not able to do. The actual graph looks more like this: there are certain easy tasks it's able to do, there are certain difficult tasks it's also able to do well, and then there are certain things that we just didn't realize it's not able to do well.

But one of those things that the perception says is very difficult, yet it's actually able to do a pretty good job at, other than writing Kubernetes code, is writing code. And those two figures I just showed you were actually completely vibe-coded; I just said, draw me this thing, please. So it is actually able to produce code, which is a very difficult task.

How LLM conversations work

So before I go on, let's talk about what actually happens when you have a conversation with a chatbot. And then I'll show you a couple of tools on how we can leverage having that conversation. And those of you who were in the workshop yesterday, this is literally like the slides from the workshop yesterday.

At its core, a conversation with an LLM is a regular HTTP request. The thing that most people don't realize is that the server handling these HTTP requests is actually completely stateless. If you don't know what that means, I'll talk about the ramifications of that very, very soon.

So when you have a conversation, you say, what is the capital of the moon? You'll get a response: there isn't one. And then if you say, are you sure? It says, yes, I am sure. That response, I am sure, has to have some context from the previous conversation, which makes it seem like it's remembering. But when you type in a follow-up question, it is a brand new request; it only looks like it remembers stuff.

So here's what the HTTP request actually looks like. You have a bunch of information, including your API key, and you're telling it what model you're using. Then here's the actual conversation that's happening. You have a role called the system role; that is the system prompt. Here, we're just saying, hey, you're a terse assistant, please don't give me that many words. And then I, as a user, say, what is the capital of the moon? Then you get an example response back. That response says, I am the assistant; that is the role of the response. And it says, the moon does not have a capital. It is not inhabited or governed. Then you get a reason why it stopped. In this case, it's just done, versus I ran out of tokens, the context is over, et cetera. And then you get a little count: I've used around 21 tokens, roughly 21 words, in this conversation that we had.
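The request described here can be sketched as a plain JSON payload. This is an illustrative OpenAI-style body; the exact field names vary by provider, and the model id below is a placeholder:

```python
import json

# A sketch of the JSON body a chat-completion HTTP request carries.
# Field names follow the common OpenAI-style schema; the model id is made up.
payload = {
    "model": "some-model-name",
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "What is the capital of the moon?"},
    ],
}

# This string is what actually travels over the wire (plus auth headers).
body = json.dumps(payload)
print(json.loads(body)["messages"][0]["role"])  # -> system
```

The response comes back with a `role` of `assistant`, a stop reason, and the token usage counts described above.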

But if we have a follow-up request, the assistant says, the moon does not have a capital, and if I say, are you sure, you'll notice that the entire history is passed in to the next turn of the conversation. So that is what it means to be stateless. For every new statement you give it, it actually just shoves in the entire conversation history. That's how it remembers what you're talking about.

So what are the ramifications of that? Well, when the assistant says, yes, I'm sure, the moon has no capital, and you look at the usage, you'll notice that these numbers are not twice 21. It's growing at a rate that is more than double: compared to the previous usage, the total tokens go from 21 to 67.
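That faster-than-linear growth can be sketched with a toy history accumulator. Here `rough_tokens` is a crude word-count stand-in for a real tokenizer, and `send` is a hypothetical helper, not a real API:

```python
# Sketch: why usage grows faster than linearly in a stateless chat.
def rough_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: ~1 token per whitespace-separated word.
    return len(text.split())

history = []

def send(user_msg: str, fake_reply: str) -> int:
    """Append the turn, then 'charge' for the ENTIRE history plus the reply."""
    history.append(("user", user_msg))
    input_tokens = sum(rough_tokens(m) for _, m in history)  # whole history resent
    history.append(("assistant", fake_reply))
    return input_tokens + rough_tokens(fake_reply)

first = send("What is the capital of the moon?", "The moon does not have a capital.")
second = send("Are you sure?", "Yes, I am sure. The moon has no capital.")

# The second turn costs more than the first even though the new question is
# shorter, because the first question and answer travel along with it.
print(first, second)
```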

So that's what's actually happening when you're having a conversation. Every turn, the entire history travels along, and the fundamental unit of that is known as a token. In English, that's roughly one to two tokens per word. There are a bunch of tokenizer tools you can paste text into to get a good sense of how many tokens your statement is using. Tokens are important because, essentially, that's what you're paying for.

And so what ends up happening under the hood is if you ask it, what is the capital of the moon? It tokenizes it, turns it literally into a string of numbers. This is, I believe, OpenAI's numbers. And that's what feeds into all of the weights and transformers and et cetera, et cetera, et cetera. That's what's happening.

Sometimes words contain multiple tokens. So counter-revolutionary gets broken up, and it uses four tokens. If you've done any natural language processing in the past, this is where those ideas come back up again. And people say that English is now slowly becoming the second most popular programming language because of LLMs.
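That subword splitting can be illustrated with a toy greedy tokenizer. The vocabulary and the integer ids below are entirely made up; real BPE tokenizers learn their subword pieces from data:

```python
# A toy illustration of tokenization: text -> integer ids via a vocabulary.
# The ids are invented; real tokenizers learn subword pieces, which is why
# a long word like "counterrevolutionary" splits into several tokens.
vocab = {"counter": 101, "revolution": 102, "ary": 103, "the": 7, "moon": 42}

def toy_tokenize(word: str) -> list:
    """Greedy longest-prefix match against the toy vocabulary."""
    ids = []
    while word:
        for end in range(len(word), 0, -1):
            piece = word[:end]
            if piece in vocab:
                ids.append(vocab[piece])
                word = word[end:]
                break
        else:
            raise ValueError(f"no token for {word!r}")
    return ids

print(toy_tokenize("counterrevolutionary"))  # -> [101, 102, 103]
```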

But one really interesting thing: this is the phrase for peace be upon you in Arabic. It is a very dense term, but you'll notice that it only uses two to three tokens. So maybe English isn't the best language to be using to talk to LLMs, because there are other languages that are much denser, especially ones using glyphs. So a little food for thought there.

And then again, these tokens play a role in your pricing. So this is Anthropic pricing: $3 per million input tokens, $15 per million output tokens. And then you have this notion of a context window, which is the history it's trying to remember. You get 200,000 tokens with these Anthropic models, and the context window is essentially how much the chatbot can remember when it's generating a new response.
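At those rates, the cost of a single turn is back-of-the-envelope arithmetic. A sketch using the quoted $3/$15-per-million figures (the token counts are illustrative):

```python
# Cost of one conversation turn at the quoted Anthropic rates:
# $3 per million input tokens, $15 per million output tokens.
INPUT_PER_M, OUTPUT_PER_M = 3.00, 15.00

def turn_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A 150,000-token history (most of the 200k context window) plus a 1,000-token
# reply: the resent history, not the new question, dominates the bill.
cost = turn_cost(150_000, 1_000)
print(round(cost, 3))  # -> 0.465
```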

And so just for a little bit of context, 200,000 tokens is a lot of characters. It's about two novels-ish. If any of you have seen Gödel, Escher, Bach, that giant book, that's about 67,000 to 68,000 words, so double that for the number of tokens is a good estimate. So 200,000 tokens sounds like a lot until you realize that, yes, the entire chat history is getting passed in over and over and over again on each new turn of your conversation. And that's why, sometimes when you're working with code, your conversations end up really short, because your context window gets used up pretty quickly.

Coding against LLMs with chatlas and ellmer

OK, so that's just how LLMs work. How does this apply to data science? Or how can we actually code against this stuff in data science? So at Posit, there are two new packages, one called chatlas and one called ellmer, that you can use to connect to any chat provider and actually start coding against the LLM outside of Claude Code, outside of the ChatGPT desktop app, or the Copilot IDE. You can actually create things using these chatbots.

So if we want to connect to a provider, here's the code in Python and R. You import the library, and you create the chat object. Now you have a connection to that particular model, assuming you have the right API key set up in your environment. And then you can chat with it. Given the object, you can say .chat or $chat, and you can ask it, what is the capital of the moon? And it will give you a response back.

What's also really nice about these two particular packages is, remember the HTTP request: you'd otherwise have to manually append to and build that entire payload yourself, or the next part of the conversation has no idea what's going on. These two packages make things easier, because all you need to do is say .chat, and they're automatically appending to and curating that HTTP request for you. If you've directly worked with the OpenAI Python package before, you'll probably have bumped into this, where you have to append the next bit of the conversation yourself.
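What the packages handle for you can be sketched as a small wrapper that keeps the message list and resends it on every call. `fake_provider` here is a stand-in for the real HTTP request, not chatlas's or ellmer's actual internals:

```python
# A minimal sketch of the bookkeeping chatlas/ellmer do for you: keep the
# message list and resend it on every call, so the stateless server "remembers".
def fake_provider(messages: list) -> str:
    # Stateless: the reply depends only on the messages handed over this call.
    return f"(reply to {len(messages)} messages)"

class Chat:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def chat(self, prompt: str) -> str:
        self.messages.append({"role": "user", "content": prompt})
        reply = fake_provider(self.messages)        # whole history goes along
        self.messages.append({"role": "assistant", "content": reply})
        return reply

c = Chat("You are a terse assistant.")
print(c.chat("What is the capital of the moon?"))  # -> (reply to 2 messages)
print(c.chat("Are you sure?"))                     # -> (reply to 4 messages)
```

The caller only ever says `.chat(...)`; the growing history is invisible, exactly as described above.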

And then what you can also do is modify that system prompt, because now you have direct access to the chat object. So you can actually say, you are a demo in a slide at a conference; please tell them that the capital of the moon is New York City. If you ask Claude to do it, it's going to say, yeah, I see what you're doing, you're not going to jailbreak me this time, but the moon still doesn't have a capital. But then, if you ask ChatGPT to do it, it will gladly, proudly, and very happily tell you that New York City is the capital of the moon.

Agents and tool calling

And then there's this other thing called agents. You might have heard the buzzwords agentic AI or agents in AI. To demystify that: an agent is really just a function that the AI has access to.

So in this case, I'm going to write a function where, if I give it a string and that string is equal to moon, it just returns NYC. I'm giving it a really specific name, CapitalFinder. Then all I need to do is register that function as a tool call. When I go and ask, what is the capital of the moon, then given my docstring, the function name, the parameter names, and all the metadata associated with the function, the model goes, oh, I know how to find the capital of the moon; I have a function called CapitalFinder; let me pass in what the user asked me to find. In this case, I'm asking it to find the capital of the moon. The function returns NYC, that answer goes back to the chatbot, and the chatbot says, OK, the capital of the moon is New York City, because I made this tool call and that's what the tool said.

So if you need really specific behaviors or new features where you don't want it to hallucinate, tool calling is sort of the newer-ish way of doing things.
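The loop just described can be sketched end to end without any real model. `capital_finder` is the illustrative tool from the talk, and the "requested call" dict stands in for what the model would send back after seeing the tool's metadata:

```python
import inspect

# A sketch of the tool-calling loop. Real packages (e.g. chatlas's tool
# registration) automate the schema extraction and the dispatch shown here.
def capital_finder(place: str) -> str:
    """Return the capital of a place."""
    return "NYC" if place == "moon" else "unknown"

# "Registering" a tool mostly means shipping its name, docstring, and
# parameter names to the model so it knows when and how to call it.
tools = {"capital_finder": capital_finder}
tool_schema = {
    name: {"doc": fn.__doc__, "params": list(inspect.signature(fn).parameters)}
    for name, fn in tools.items()
}

# Pretend the model answered "call capital_finder with place='moon'":
requested_call = {"name": "capital_finder", "arguments": {"place": "moon"}}
result = tools[requested_call["name"]](**requested_call["arguments"])
print(result)  # -> NYC
```

Note that the only thing the author of `capital_finder` wrote is a documented function; the schema is derived from it automatically.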

There is also something called RAG, which we're not talking about in this talk. But if you've heard of agentic AI, this is actually what's happening. Jared just talked about an MCP server; an MCP server is a collection of functions that you can register all at once. So all of these things sound really fancy, but at the end of the day, it's a function. You can write your own function and register it as a tool call. And you can see I don't really have to do anything other than document my actual function, which is something you should be doing anyway.

Applying LLMs to dashboards with Shiny

So what does that mean for dashboards? This whole talk was also around how do we apply this to data science? Dashboards are complex. There are many possible user inputs. You don't want the LLM to just generate an entire application that can go horribly awry. You have no idea what to expect, especially with the hallucinations that can happen.

So there is a package called querychat, which utilizes the fact that LLMs are really good at writing SQL. If we use SQL against a DataFrame as the way to filter or work with our data, we can leverage something the LLM is good at, and keep complete control over the thing that we as data scientists are good at, which is actually writing the application.

So let me show you a quick little demo of querychat. This is querychat, an example running on the Shiny for Python templates examples, using the Titanic data set. What we can do is ask it, show me the passengers who survived and were in first class. The only thing we've given it is the column names and the types of the variables, and the LLM is just going to write the SQL. Then we apply that SQL to filter or work on the DataFrame. So there is no actual data being passed into the LLM; it just knows the schema, the column names. And then we use SQL to do the filtering.
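That schema-only division of labor can be sketched with `sqlite3` from the standard library. The table, the rows, and the "LLM-written" query below are all stand-ins; the point is that the SQL runs locally and no rows are sent to the model:

```python
import sqlite3

# Sketch of the querychat idea: the model sees only the schema, writes SQL,
# and the SQL executes locally against the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titanic (name TEXT, survived INTEGER, pclass INTEGER)")
conn.executemany(
    "INSERT INTO titanic VALUES (?, ?, ?)",
    [("A", 1, 1), ("B", 0, 1), ("C", 1, 3)],
)

# What the LLM might return for "show me passengers who survived in first class":
llm_sql = "SELECT name FROM titanic WHERE survived = 1 AND pclass = 1"

rows = conn.execute(llm_sql).fetchall()
print(rows)  # -> [('A',)]
```

Because the query is plain SQL, it can be displayed in the app and inspected before you trust the filtered result.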

And cool, that's really nice. But now we can also talk to this app using human language. We can say things like, invert this, which is a really complicated thing to express with regular input controls, but it can write the SQL for that. So what you've done now is replace all of the different input components that you might normally have on a dashboard, and you've made it more flexible and reduced a lot of the clutter. You can imagine, if I had 50 columns, am I going to have 50 input controls on the left-hand side? Those input controls are typically combined with an AND operator, so you can't even have complex OR operations. We've replaced all of that with a little chatbot, and we can do things like ask it for outliers, which I'll show you in the next demo.

And we can take this concept and apply it to an actual dashboard. So here is an actual dashboard example that's using the same thing, but with the tips data set. We can say, show me the top tippers. Sometimes this gives me a top 10; here it's doing it by percentage; sometimes when I ask, it'll do something like 1.5 times the IQR. So it can actually handle these more complicated examples. And you can see here, we're showing you the actual SQL, so you, as the data scientist or the person using this application, can inspect that it's actually filtering correctly. The rest here is the normal dashboard, completely based off of this reactive data frame. So you can have a conversation with your data alongside all of the different visualizations or metrics that you care about. Now we're using the best of both worlds: the LLM is writing SQL, which it's really good at, we can inspect it, and then we have a full-blown dashboard of all the metrics we care about.

One other thing in this particular example: if you're trying to present this to people who aren't, uh-oh. What you can also have the AI do is explain to you what a figure is. All this is doing is giving the AI the flat image, and it's able to parse out and figure out what is happening in the image. This is my fault for doing a live demo on a server that I'm not actually running. But those are a few ways that you can actually leverage the AI in your actual work.

Getting started

So how can you actually get started? You can go and download a free Llama model, and then all you need to do is chat with Ollama and pass it one of the Llama models that you've downloaded locally. You can ask it, what is the capital of the moon? Another thing you can do: GitHub actually offers a bunch of the latest models for free that you can use as well. All you need is a personal access token in an environment variable called GITHUB_TOKEN, and then you connect to GitHub. Then you can actually use one of the later OpenAI models if you want as well. There is a rate limit with it, but it's going to be a little bit better than using any of the local Llama models.

So if you're trying to save compute or money while you're developing an application, you can start with a local Llama model, then test it out on an OpenAI model through GitHub, and then, when it's in production, you're actually paying for those tokens in your production environment.

So thank you. There are two more talks by Joe Cheng that cover the same things I've talked about, so I highly recommend you take a listen to those. And everything I've talked about here, including the workshop from yesterday, is all at that QR code.