
querychat in Python: Query Your Data with Natural Language | Shiny + LLMs | Veerle van Leemput
Learn how to use QueryChat to add natural language querying to your Python data applications. This tutorial shows you how to let users ask questions about data without building complex filters or dashboards. In this video, data scientist Veerle Eeftink demonstrates how to integrate QueryChat with Shiny for Python to create interactive data exploration tools powered by LLMs (Claude, GPT-4, Gemini). What You'll Learn: - Set up QueryChat in minutes with any data source - Understand how QueryChat translates natural language to SQL - Build custom dashboards with reactive data frames - Configure LLM behavior with greetings, data descriptions, and instructions - Ensure safety and read-only access for production environments Key Features: Works with Claude, GPT-4, Gemini, and other LLMs Transparent SQL generation (no black box) Integrates with Shiny, Streamlit, and other frameworks Uses DuckDB or bring your own SQL database Read-only operations for data safety Enterprise-ready with private LLM support Resources: - QueryChat Documentation: [https://posit-dev.github.io/querychat/py/index.html] - Blog Post: "Where Questions Become Queries" [https://shiny.posit.co/blog/posts/querychat-python-r/] - SheScores Demo App [https://github.com/hypebright/shescores-dashboard/tree/main/Python] #Python #DataScience #LLM #AI #Shiny #QueryChat #NaturalLanguage #SQL #DataVisualization Timestamps 00:00 - Introduction: The dashboard problem 01:28 - What is QueryChat? 02:36 - Demo: Three things you'll learn today 03:00 - Setup: LLM API keys and installation 04:04 - Quick start: First QueryChat app in 5 lines 04:27 - Live demo: Natural language queries 06:10 - How QueryChat works under the hood 07:19 - Four benefits: Reliability, transparency, reproducibility, safety 08:17 - Tool calling explained 08:54 - Schema information and DuckDB integration 09:29 - Building custom interfaces 09:45 - Customizing behavior: Greetings and data descriptions 11:16 - Adding custom instructions to the LLM 11:35 - Building a Shiny dashboard with QueryChat 13:09 - Code walkthrough: Sidebar chat and reactive data frames 14:14 - Complete dashboard demo 14:52 - Safety and security deep dive 16:22 - Resources and links
image: thumbnail.jpg
Transcript#
This transcript was generated automatically and may contain errors.
Hey, my name is Veerle Eefting and I'm a data scientist slash consultant slash corporate trainer who loves building data solutions and likes to be involved with basically everything web dev. Today I'm going to talk about something called querychat. Okay, so I'm kind of a data nerd who loves data and you, you probably love some data too. And dashboards? Well, you probably really love building them. You did everything right, you know, you have a polished layout, you collected feedback, handled edge cases, and made sure your dashboard is reliable and works wonderfully.
Now, you're very proud of it, so you show it to a colleague and they say, ah yeah, that looks awesome. Can you show me just, you know, the FIFA World Cup matches? And you say, well, of course I can. And you start filtering it down. Then they ask, well, that's cool. Now, can you show me all the matches where the Netherlands played? And suddenly, well, things get a little bit tricky because, well, there is no country filter. And adding one means you need to edit the code, update the UI, redeploy the app. It's, you know, quite a hassle. And then comes the next question and another one and yet another one. And eventually it's kind of obvious, you know, even the most perfectly designed dashboards cannot answer every question.
even the most perfectly designed dashboards cannot answer every question.
Now, that's exactly where querychat enters the picture.
What is querychat?
So, what exactly is querychat? Well, querychat makes it easy to query data using natural language. It offers a drop-in component for multiple popular frameworks, including Shiny, which we'll use here in this video. It also offers a console interface and other programmatic building blocks. You ask questions, querychat translates it to SQL, executes it, and returns the results. The results are then available as a reactive data frame, which makes it easy to display or further process the data. There's no clicking, no predefined filters. It's just you, your data set, and, you know, questions like, show me the matches where the Netherlands played in the World Cup. Or can you filter the matches from the 2025 UEFA Euro? Or which team has the best win rate in the Canada versus United States matchup?
Today, I'll show you three things. One, how to get started with querychat really quick. Two, how it all works and why it's reliable, transparent, and reproducible. And three, how to build custom polished interfaces around querychat.
Getting started
Now, of course, you're eager to see what this is all about. So, let's get started. querychat uses a large language model, aka LLM, to translate your questions into SQL, which means we need an API key for such a model. Now, there are many choices when it comes to provider and or model, but in this demo, we'll be using Claude 4.5. Do you fancy another model? That's also perfectly fine. Other good choices would be GPT 4.1, which is the current default for querychat, or Google Gemini 3.0. Basically, as long as Chatlas, the package that is powering querychat, supports it, you can use it. Right, once you have your key, you put it in your environment file, your .env file, and it's recommended to use the .env package to load the environment file into your environment. So, we'll do that. Then, we can get started with querychat.
First, we install querychat from PyPi. We import querychat then, point querychat to our dataset, results with scorers. Then, we specify our provider model, and that's all. Then, we use the app method, where we can specify any framework of our choice. In this case, we opt for Shiny, which is the current default, but you could also use Streamlit, for example. And in that case, you can use the framework parameter of the app method. Now, we save the file, and we run it.
So, okay, let's see what this all is about. Let's ask a question. For example, here, can you show me all the matches where the Netherlands played? Okay, then we hit enter. We do have to wait for a bit until it loads. Once it's finished, we see that our data is being updated. We also see the query which was executed. It's a simple select statement that finds rows that contain the Netherlands in the home or away team.
Now, let's ask a follow-up question. Which players have the most goals for the Netherlands? Again, we do have to wait a bit for the results to be finished, and then a query and some results pop up. The top scorer here, as you can see, is Viviana Miedema, followed by Lincoln Martens. Okay, cool. So, querychat even proposes some follow-up questions. Let's click the first one here. All right, show me all the matches where Viviana Miedema scored. We hit enter, click this button, wait a bit, and again, the data updates accordingly, and we can review the SQL that was executed.
Now, within seconds, you have a working chat interface connected to your data. You type your question in the chat window, and querychat interprets the question, then generates SQL, executes it, and shows you the result. This works for anything, grouping, filtering, summaries, comparisons, you name it. And all of that with just a tiny bit of code. No complex UI needed. You're not defining every possible input. You're just letting users ask the questions they actually care about. This is, of course, extremely powerful, and I can imagine it makes you a little bit curious too. How does querychat work?
How querychat works
You could think about querychat as, you know, just a chat window sitting on top of a dataset, but it's more. With querychat, you never have to think about how to get to an answer. You don't have to think about what filters to apply or how to extract information from the filter data. No, you simply ask a question, and querychat just handles the rest. And okay, okay, I hear you thinking. You could also ask all those questions in a typical LLM chat tool. Yeah. But querychat is a bit different. You're not relying on the LLM to invent answers or reason about the data internally. No, with querychat, every question is translated into SQL. The SQL is then executed on the actual dataset and returned exactly as the data dictates. And also very important, the SQL is always shown. This means you can see precisely what is being executed.
So there are four benefits of querychat. One, reliability. The LLM does not touch the raw data itself. It doesn't analyze or, you know, transform it. It only generates SQL. querychat handles the execution of that SQL via tool calling, and all results come from the real data engine and not from the model's internal guesswork. Two, transparency. Every run reveals the full SQL statement. Nothing is hidden, nothing is adjusted, and you always know how the answer was actually produced. Three, reproducibility. Since every SQL query is visible, analysis can be reused, shared, and audited. And lastly, four, safety. querychat's tools are designed with read-only actions in mind, meaning the LLM is essentially unable to perform any destructive actions.
Great. Now, I mentioned querychat handles execution of the SQL via tool calling. Tool calling is essentially a bridge between an LLM and your Python session. The model itself does not execute code. Instead, it requests your Python session to execute a certain function with certain inputs. Once Python gets the results, the result is passed back to the model for interpretation. But, you know, you can't leave the LLM completely clueless and in the dark. The LLM does need to know something about your data, of course. For example, the columns, their meaning, their type. Now, this schema information is shared with the model, but not the raw data itself. With this schema information, it produces a SQL query as a tool call. Now, running SQL requires a database engine, and querychat's weapon of choice is DuckDB. If you have your own database, that's no problem, too. You can also bring your own. If your database is SQLAlchemy supported, it can be used by querychat. That's pretty smart, right?
Building a custom interface
So, while this quick app is nice, we want to do a little bit more, like building our own custom interface. And it all starts with tweaking querychat's behavior. We can customize the querychat object parameters to our liking. For example, to start, we can and should add a custom greeting, which we can provide in a Markdown file with nice pan HTML tags that make the suggestions clickable and put them right in the chat box when being clicked. Now, the greeting is the first thing people see, so, you know, it's best to leave a good first impression. Now, besides a greeting, we can also add a data description, giving the LLM a little bit more context about the data. This is great for data with cryptic column names or data that is very niche. Again, we provide this information in a Markdown file, and here there's no specific format needed.
So, as an example, here we provide a general description of the data, list all the columns with their type, providing context where needed. And more context means better results. The more the model knows about the columns and their types, the better the SQL it eventually generates. It is good to note, though, that you don't necessarily have to provide this extra information. And if you don't, querychat will just send the column names and their types, and it lets the LLM figure out the rest. Now, a last thing we can do is adding some extra instructions. And no surprise here, but we also provide these in a Markdown file. These instructions further tweak the LLM's behavior, and you can add anything you like. Let the LLM use a certain tone, add emojis after every sentence if you want, or add an occasional joke here and there. Or you can be a bit more serious and let the model use British English spelling or directions on the terminology to use, like we're doing here.
Right, that's all looking good. We have a greeting, a good data description, and extra instructions. The next step would be to create the interface. Now, there are several ways to build dashboards in Python, as you know, but let's just say we've built one with Shiny for Python. Shiny for Python provides a way to build fast, beautiful web applications in Python. You can build apps really quickly and create interactive visualizations in an afternoon. Unlike other frameworks that are targeted at data scientists, Shiny doesn't limit you. Shiny remains extensible enough to power large production-grade applications, too.
So yeah, Shiny, great choice for our dashboard. And to be able to ask natural language questions in our dashboard, we need a chat window. Now, querychat plays especially nice with a chat window in a sidebar, which you call with the sidebar method. Now, normally such a sidebar is a good place to put things like a year filter or a country filter or a tournament filter, but now we can just have a chat window sitting there. Obviously, we would want to use the results that querychat returns, and as said earlier, querychat returns a reactive data frame that can be used to further process or display the data. How do we get it? Well, by simply calling the df method on the QC object, which returns reactive data. And this reactive data can be used as the basis of our, let's say, value boxes, maps, graphs, tables, everything. As soon as the data then changes, the dashboard view changes as well.
Now, let's inspect our code a little bit closer. First, we import Shiny Express and querychat Express. Then we load our data, a simple CSV file here. We define the locations of our custom files, and we initialize the querychat object with these files. Of course, we also have to specify which data to use and what the provider and model is that we have chosen. Now, over here, we add a sidebar chat control with QC sidebar, and then our main page consists of a column layout with some value boxes. Here we have the top-scoring country value box, and it uses the reactive data that querychat returns. Then the only thing we have to do is call QCDF for the data, and we do this for every value box. Besides the top-scoring country, we also have the top scorer and the total number of countries. Then over here, we have a map that is displaying the tournaments, which, surprise, also uses this DF. Now, this is also the same for our chart and basically anything that is in our application.
The results? Well, a nice interface that can handle any question. Remember the conversation with our colleague from the very beginning? Well, all the questions are not an issue anymore. There are no custom filters. There are no limits. We have just querychat.
Safety and enterprise use
In just 10 minutes, you learned how to use querychat to build your next application, but before you take off, I want to come back to one important aspect of querychat, though, and that is safety. Because working with LLMs, you'll always wonder, is this safe? It's a fair question, but luckily, querychat is designed entirely around control. The LLM never executes anything itself. It never touches your data or your database, and it never sees your raw data. It only sees the schemas and the extra information that you provide yourself. Its only job is to propose read-only SQL. If you ask to edit the data or to drop tables, it will not do it. It's instructed to do so. Now, combine that with an underlying database, like could be DuckDB or could be your own, that only provides read-only access and your data will be left untouched. No destructive actions on your production database when you set querychat's database permissions to read-only.
The LLM never executes anything itself. It never touches your data or your database, and it never sees your raw data. Its only job is to propose read-only SQL.
The other nice thing is that querychat doesn't do black box magic. Every query can be logged, inspected, audited, and if you need to use private or managed LLMs, you're also covered. Azure, AWS Bedrock, Google Vertex AI all provide versions of popular models that support tool calling and can thus work with querychat. So yes, you can use querychat in enterprise regulated environments.
Now, hearing me talk for 10 minutes is nice. Well, at least I hope. But if you want to read more about querychat at your own pace, I have some resources for you too. First of all, the querychat documentation, obviously. Secondly, where questions become queries on the Shiny blog written by me. Oh, and all the code I showed you is available in full length, of course, on GitHub. You can find the relevant links in the description. Thanks for watching.
