Resources

Daniel Chen - LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language

For information on upcoming conferences, visit https://www.dataconf.ai.

LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language, by Daniel Chen

Abstract: LLMs have a lot of hype around them these days. Let's demystify how they work and see how we can put them in context for data science use. As data scientists, we want to make sure our results are inspectable, reliable, reproducible, and replicable. We already have many tools to help us on this front. However, LLMs pose a new challenge: we may not always get the same results back from a query. This means working out the areas where LLMs excel, and using those behaviors in our data science artifacts. This talk will introduce you to LLMs, and to the ellmer and chatlas packages for R and Python, and show how they can be integrated into a Shiny app to create an AI-powered dashboard. We'll see how we can leverage the tasks LLMs are good at to better our data science products.

Presented at The New York Data Science & AI Conference, presented by Lander Analytics (August 26, 2025). Hosted by Lander Analytics (https://www.landeranalytics.com).

Sep 24, 2025
21 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

All right, so, by a show of hands, how many of you are still really skeptical about using LLMs, anyone?

Oh, that's actually a lot more than I expected. What about just regular skeptical? Yeah, okay, cool, everyone else? Neutral-ish? Promising? All the hype? All right, cool.

Well, today I kind of wanted to make sure that we can get a good solid foundation on sort of just moving that curve up just a little bit. And you can tell me at the end if I've done a good job or not.

LLM hallucinations and limitations

But, yes, we've all heard stories about LLMs hallucinating. This is an example from a research paper where they gave an LLM the ability to run a vending machine company, and over the course of time, thank God it was an air-gapped system, it ended up wanting to call the FBI Cybercrimes Division on the owner of the vending machine company because someone wasn't paying the stocking fee or something like that. So these things can go horribly awry.

Other things that it can do, this is literally a night or two ago, I asked it to draw an intricate piece of ASCII art, and it said, here's a majestic dragon, in which case my sister said, no, that's a very nervous piglet, and now we need a poo sweater. So clearly not very good, but highly confident in its response.

But, yeah, so LLMs have a bad reputation, and as data scientists, can we really ask the LLM to produce trustworthy results? As we all work in data science here, our job is really important because our products are used to find insights, and reproducibility and replicability are a really important foundation in the work that we do.

And so if we want to think about what makes good data science, you want to make sure that your results are correct. Please, just don't do the wrong thing. Are they transparent? That's also why we always promote coding in some language, so we can actually audit the process. We think coding is really important, too, because it makes things more reproducible and replicable.

But these are all things that LLMs are pretty bad at.

And so another example of something they're really bad at is counting. So here's some code. It uses a package called chatlas that I'll talk about in a bit. All I'm doing is having it generate an array, and then asking the LLM to count how long that array is.

And this is using Anthropic's Claude 4.1, I believe. If I give it an array of 10, it tells me, you have 10 things. All right, that's good. I give it an array of 100, and it tells me, you have 100 things. At 1,000, it gives me 1,000 things. And then if I ask it about 10,000, it says, you have 20,000 things. So that's not right.

Maybe OpenAI can do a little bit better. Ask for 10, it gives me 10; 100, 100; 1,000, 1,000. At least at 10,000, OpenAI, this is GPT-5, is smart enough to tell me, please run this code, because I can't actually count. So at least it's giving me some feedback on things it cannot do. But even these examples are kind of easy, because I'm using very round numbers. You can imagine if I gave it some arbitrary number, it probably wouldn't do very well.
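The fix OpenAI's reply suggests, running the code instead of trusting the model's count, is trivial to illustrate. A minimal sketch, plain Python with no LLM involved, showing the deterministic version of the counting task:

```python
# The deterministic alternative to asking an LLM to count:
# generate the arrays in code and let the language count them.
arrays = {n: list(range(n)) for n in (10, 100, 1_000, 10_000)}

for n, arr in arrays.items():
    # len() is exact every time, unlike a model's token-by-token "counting"
    assert len(arr) == n

print(len(arrays[10_000]))  # -> 10000
```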

And so the perception is that an LLM should be able to do easy things really well and hard things very poorly. But I just showed you that counting things, which is a cornerstone of coding, is something it's not able to do. The actual graph looks more like this: there are certain easy tasks it's able to do, there are certain difficult tasks it's also able to do well, and then there are certain things that we just didn't realize it's not able to do well.

But one of those things that the perception says is very difficult, yet it's actually able to do a pretty good job at, other than writing Kubernetes code, is writing code. And those two figures I just showed you were actually completely vibe-coded; I just said, draw me this thing, please. So it is actually able to produce code, which is a very difficult task.

How LLM conversations work

So before I go on, let's talk about what actually happens when you have a conversation with a chatbot. And then I'll show you a couple of tools on how we can leverage having that conversation. And those of you who were in the workshop yesterday, this is literally like the slides from the workshop yesterday.

At its core, a conversation with an LLM is a regular HTTP request. The thing that most people don't realize is that the server handling these HTTP requests is actually completely stateless. If you don't know what that means, I'll talk about the ramifications of that very, very soon.

So when you have a conversation, you say, what is the capital of the moon? You'll get a response: there isn't one. And then if you say, are you sure? It says, yes, I am sure. That response, I am sure, has to have some context from the previous conversation, which makes it seem like it's remembering. But when you type in a follow-up question, it is a brand new request; it only looks like it remembers stuff.

So here's what the HTTP request actually looks like. You have a bunch of information, including your API key, and you're telling it what model you're using. Then here's the actual conversation that's happening. You have a role called the system role; that is the system prompt. Here, we're just saying, hey, you're a terse assistant, please don't give me that many words. And then I, as a user, say, what is the capital of the moon? Then you get an example response back. That response says, I am the assistant; that is the role of the response. And it says, the moon does not have a capital. It is not inhabited or governed. Then you get a reason why it stopped. In this case, it's just done, versus I ran out of tokens, the context is over, et cetera. And then you get a little count: I've used around 21 tokens, roughly 21 words, in this conversation that we had.
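The request described here can be sketched as a plain JSON payload. This is an illustrative OpenAI-style body; the exact field names vary by provider, and the model id below is a placeholder:

```python
import json

# A sketch of the JSON body a chat-completion HTTP request carries.
# Field names follow the common OpenAI-style schema; the model id is made up.
payload = {
    "model": "some-model-name",
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "What is the capital of the moon?"},
    ],
}

# This string is what actually travels over the wire (plus auth headers).
body = json.dumps(payload)
print(json.loads(body)["messages"][0]["role"])  # -> system
```

The response comes back with a `role` of `assistant`, a stop reason, and the token usage counts described above.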

But if we have a follow-up request, the assistant says, the moon does not have a capital, and if I say, are you sure, you'll notice that the entire history is passed in to the next turn of the conversation. So that is what it means to be stateless. For every new statement you give it, it actually just shoves in the entire conversation history. That's how it remembers what you're talking about.

So what are the ramifications of that? Well, when the assistant says, yes, I'm sure, the moon has no capital, and you look at the usage, you'll notice that these numbers are not twice 21. It's growing at a rate that is more than double: compared to the previous usage, the total tokens go from 21 to 67.
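That faster-than-linear growth can be sketched with a toy history accumulator. Here `rough_tokens` is a crude word-count stand-in for a real tokenizer, and `send` is a hypothetical helper, not a real API:

```python
# Sketch: why usage grows faster than linearly in a stateless chat.
def rough_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: ~1 token per whitespace-separated word.
    return len(text.split())

history = []

def send(user_msg: str, fake_reply: str) -> int:
    """Append the turn, then 'charge' for the ENTIRE history plus the reply."""
    history.append(("user", user_msg))
    input_tokens = sum(rough_tokens(m) for _, m in history)  # whole history resent
    history.append(("assistant", fake_reply))
    return input_tokens + rough_tokens(fake_reply)

first = send("What is the capital of the moon?", "The moon does not have a capital.")
second = send("Are you sure?", "Yes, I am sure. The moon has no capital.")

# The second turn costs more than the first even though the new question is
# shorter, because the first question and answer travel along with it.
print(first, second)
```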

So that's what's actually happening when you're having a conversation. Every turn, the entire history travels along, and the fundamental unit of that is known as a token. In English, that's roughly one to two tokens per word. There are a bunch of tokenizer tools you can paste text into to get a good sense of how many tokens your statement is using. Tokens are important because, essentially, that's what you're paying for.

And so what ends up happening under the hood is if you ask it, what is the capital of the moon? It tokenizes it, turns it literally into a string of numbers. This is, I believe, OpenAI's numbers. And that's what feeds into all of the weights and transformers and et cetera, et cetera, et cetera. That's what's happening.

Sometimes words contain multiple tokens. So counter-revolutionary gets broken up, and it uses four tokens. If you've done any natural language processing in the past, this is where those ideas come back up again. And people say that English is now slowly becoming the second most popular programming language because of LLMs.
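That subword splitting can be illustrated with a toy greedy tokenizer. The vocabulary and the integer ids below are entirely made up; real BPE tokenizers learn their subword pieces from data:

```python
# A toy illustration of tokenization: text -> integer ids via a vocabulary.
# The ids are invented; real tokenizers learn subword pieces, which is why
# a long word like "counterrevolutionary" splits into several tokens.
vocab = {"counter": 101, "revolution": 102, "ary": 103, "the": 7, "moon": 42}

def toy_tokenize(word: str) -> list:
    """Greedy longest-prefix match against the toy vocabulary."""
    ids = []
    while word:
        for end in range(len(word), 0, -1):
            piece = word[:end]
            if piece in vocab:
                ids.append(vocab[piece])
                word = word[end:]
                break
        else:
            raise ValueError(f"no token for {word!r}")
    return ids

print(toy_tokenize("counterrevolutionary"))  # -> [101, 102, 103]
```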

But one really interesting thing: this is the phrase for peace be upon you in Arabic. It is a very dense term, but you'll notice that it only uses two to three tokens. So maybe English isn't the best language to be using to talk to LLMs, because there are other languages that are much denser, especially ones using glyphs. So a little food for thought there.

And then again, these tokens play a role in your pricing. So this is Anthropic pricing: $3 per million input tokens, $15 per million output tokens. And then you have this notion of a context window, which is the history it's trying to remember. You get 200,000 tokens with these Anthropic models, and the context window is essentially how much the chatbot can remember when it's generating a new response.
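At those rates, the cost of a single turn is back-of-the-envelope arithmetic. A sketch using the quoted $3/$15-per-million figures (the token counts are illustrative):

```python
# Cost of one conversation turn at the quoted Anthropic rates:
# $3 per million input tokens, $15 per million output tokens.
INPUT_PER_M, OUTPUT_PER_M = 3.00, 15.00

def turn_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A 150,000-token history (most of the 200k context window) plus a 1,000-token
# reply: the resent history, not the new question, dominates the bill.
cost = turn_cost(150_000, 1_000)
print(round(cost, 3))  # -> 0.465
```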

And so just for a little bit of context, 200,000 tokens is a lot of characters. It's about two novels-ish. If any of you have seen Gödel, Escher, Bach, that giant book, that's about 67,000 to 68,000 words, so double that for the number of tokens is a good estimate. So 200,000 tokens sounds like a lot until you realize that, yes, the entire chat history is getting passed in over and over and over again on each new turn of your conversation. And that's why, sometimes when you're working with code, your conversations end up really short, because your context window gets used up pretty quickly.

Coding against LLMs with chatlas and ellmer

OK, so that's just how LLMs work. How does this apply to data science? Or how can we actually code against this stuff in data science? So at Posit, there are two new packages, one called chatlas and one called ellmer, that you can use to connect to any chat provider and actually start coding against the LLM outside of Claude Code, outside of the ChatGPT desktop app, or the Copilot IDE. You can actually create things using these chatbots.

So if we want to connect to a provider, here's the code in Python and R. You import the library, and you create the chat object. Now you have a connection to that particular model, assuming you have the right API key set up in your environment. And then you can chat with it. Given the object, you can say .chat or $chat, and you can ask it, what is the capital of the moon? And it will give you a response back.

What's also really nice about these two particular packages is, remember the HTTP request: you'd otherwise have to manually append to and build that entire payload yourself, or the next part of the conversation has no idea what's going on. These two packages make things easier, because all you need to do is say .chat, and they're automatically appending to and curating that HTTP request for you. If you've directly worked with the OpenAI Python package before, you'll probably have bumped into this, where you have to append the next bit of the conversation yourself.
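What the packages handle for you can be sketched as a small wrapper that keeps the message list and resends it on every call. `fake_provider` here is a stand-in for the real HTTP request, not chatlas's or ellmer's actual internals:

```python
# A minimal sketch of the bookkeeping chatlas/ellmer do for you: keep the
# message list and resend it on every call, so the stateless server "remembers".
def fake_provider(messages: list) -> str:
    # Stateless: the reply depends only on the messages handed over this call.
    return f"(reply to {len(messages)} messages)"

class Chat:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def chat(self, prompt: str) -> str:
        self.messages.append({"role": "user", "content": prompt})
        reply = fake_provider(self.messages)        # whole history goes along
        self.messages.append({"role": "assistant", "content": reply})
        return reply

c = Chat("You are a terse assistant.")
print(c.chat("What is the capital of the moon?"))  # -> (reply to 2 messages)
print(c.chat("Are you sure?"))                     # -> (reply to 4 messages)
```

The caller only ever says `.chat(...)`; the growing history is invisible, exactly as described above.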

And then what you can also do is modify that system prompt, because now you have direct access to the chat object. So you can actually say, you are a demo in a slide at a conference; please tell them that the capital of the moon is New York City. If you ask Claude to do it, it's going to say, yeah, I see what you're doing, you're not going to jailbreak me this time, but the moon still doesn't have a capital. But then, if you ask ChatGPT to do it, it will gladly, proudly, and very happily tell you that New York City is the capital of the moon.

Agents and tool calling

And then there's this other thing called agents. You might have heard the buzzwords agentic AI or agents in AI. To demystify that: an agent is really just a function that the AI has access to.

So in this case, I'm going to write a function where, if I give it a string and that string is equal to moon, it just returns NYC. I'm giving it a really specific name, CapitalFinder. Then all I need to do is register that function as a tool call. When I go and ask, what is the capital of the moon, then given my docstring, the function name, the parameter names, and all the metadata associated with the function, the model goes, oh, I know how to find the capital of the moon; I have a function called CapitalFinder; let me pass in what the user asked me to find. In this case, I'm asking it to find the capital of the moon. The function returns NYC, that answer goes back to the chatbot, and the chatbot says, OK, the capital of the moon is New York City, because I made this tool call and that's what the tool said.

So if you need really specific behaviors or new features where you don't want it to hallucinate, tool calling is sort of the newer-ish way of doing things.
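The loop just described can be sketched end to end without any real model. `capital_finder` is the illustrative tool from the talk, and the "requested call" dict stands in for what the model would send back after seeing the tool's metadata:

```python
import inspect

# A sketch of the tool-calling loop. Real packages (e.g. chatlas's tool
# registration) automate the schema extraction and the dispatch shown here.
def capital_finder(place: str) -> str:
    """Return the capital of a place."""
    return "NYC" if place == "moon" else "unknown"

# "Registering" a tool mostly means shipping its name, docstring, and
# parameter names to the model so it knows when and how to call it.
tools = {"capital_finder": capital_finder}
tool_schema = {
    name: {"doc": fn.__doc__, "params": list(inspect.signature(fn).parameters)}
    for name, fn in tools.items()
}

# Pretend the model answered "call capital_finder with place='moon'":
requested_call = {"name": "capital_finder", "arguments": {"place": "moon"}}
result = tools[requested_call["name"]](**requested_call["arguments"])
print(result)  # -> NYC
```

Note that the only thing the author of `capital_finder` wrote is a documented function; the schema is derived from it automatically.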

There is also something called RAG, which we're not talking about in this talk. But if you've heard of agentic AI, this is actually what's happening. Jared just talked about an MCP server; an MCP server is a collection of functions that you can register all at once. So all of these things sound really fancy, but at the end of the day, it's a function. You can write your own function and register it as a tool call. And you can see I don't really have to do anything other than document my actual function, which is something you should be doing anyway.

Applying LLMs to dashboards with Shiny

So what does that mean for dashboards? This whole talk was also around how do we apply this to data science? Dashboards are complex. There are many possible user inputs. You don't want the LLM to just generate an entire application that can go horribly awry. You have no idea what to expect, especially with the hallucinations that can happen.

So there is a package called querychat, which utilizes the fact that LLMs are really good at writing SQL. If we use SQL against a DataFrame as the way to filter or work with our data, we can leverage something the LLM is good at, and keep complete control over the thing that we as data scientists are good at, which is actually writing the application.

So let me show you a quick little demo of querychat. This is querychat, an example running on the Shiny for Python templates examples, using the Titanic data set. What we can do is ask it, show me the passengers who survived and were in first class. The only thing we've given it is the column names and the types of the variables, and the LLM is just going to write the SQL. Then we apply that SQL to filter or work on the DataFrame. So there is no actual data being passed into the LLM; it just knows the schema, the column names. And then we use SQL to do the filtering.
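That schema-only division of labor can be sketched with `sqlite3` from the standard library. The table, the rows, and the "LLM-written" query below are all stand-ins; the point is that the SQL runs locally and no rows are sent to the model:

```python
import sqlite3

# Sketch of the querychat idea: the model sees only the schema, writes SQL,
# and the SQL executes locally against the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titanic (name TEXT, survived INTEGER, pclass INTEGER)")
conn.executemany(
    "INSERT INTO titanic VALUES (?, ?, ?)",
    [("A", 1, 1), ("B", 0, 1), ("C", 1, 3)],
)

# What the LLM might return for "show me passengers who survived in first class":
llm_sql = "SELECT name FROM titanic WHERE survived = 1 AND pclass = 1"

rows = conn.execute(llm_sql).fetchall()
print(rows)  # -> [('A',)]
```

Because the query is plain SQL, it can be displayed in the app and inspected before you trust the filtered result.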

And cool, that's really nice. But now we can also talk to this app using human language. We can say things like, invert this, which is a really complicated thing to express with regular input controls, but it can write the SQL for that. So what you've done now is replace all of the different input components that you might normally have on a dashboard, and you've made it more flexible and reduced a lot of the clutter. You can imagine, if I had 50 columns, am I going to have 50 input controls on the left-hand side? Those input controls are typically combined with an AND operator, so you can't even have complex OR operations. We've replaced all of that with a little chatbot, and we can do things like ask it for outliers, which I'll show you in the next demo.

And we can take this concept and apply it to an actual dashboard. So here is an actual dashboard example that's using the same thing, but with the tips data set. We can say, show me the top tippers. Sometimes this gives me a top 10; here it's doing it by percentage; sometimes when I ask, it'll do something like 1.5 times the IQR. So it can actually handle these more complicated examples. And you can see here, we're showing you the actual SQL, so you, as the data scientist or the person using this application, can inspect that it's actually filtering correctly. The rest here is the normal dashboard, completely based off of this reactive data frame. So you can have a conversation with your data alongside all of the different visualizations or metrics that you care about. Now we're using the best of both worlds: the LLM is writing SQL, which it's really good at, we can inspect it, and then we have a full-blown dashboard of all the metrics we care about.

One other thing in this particular example: if you're trying to present this to people who aren't, uh-oh. What you can also have the AI do is explain to you what a figure is. All this is doing is giving the AI the flat image, and it's able to parse out and figure out what is happening in the image. This is my fault for doing a live demo on a server that I'm not actually running. But those are a few ways that you can actually leverage the AI in your actual work.

Getting started

So how can you actually get started? You can go and download a free Llama model, and then all you need to do is chat with Ollama and pass it one of the Llama models that you've downloaded locally. You can ask it, what is the capital of the moon? Another thing you can do: GitHub actually offers a bunch of the latest models for free that you can use as well. All you need is a personal access token in an environment variable called GITHUB_TOKEN, and then you connect to GitHub. Then you can actually use one of the later OpenAI models if you want as well. There is a rate limit with it, but it's going to be a little bit better than using any of the local Llama models.

So if you're trying to save compute or money while you're developing an application, you can start with a local Llama model, then test it out on an OpenAI model through GitHub, and then, when it's in production, you're actually paying for those tokens in your production environment.

So thank you. There are two more talks by Joe Cheng that cover the same things I've talked about, so I highly recommend you take a listen to those. And everything I've talked about here, including the workshop from yesterday, is all at that QR code.