
querychat in R: Query Your Data with Natural Language | Shiny + LLMs | Veerle van Leemput
Learn how to use querychat, an R package that lets users query data using natural language instead of predefined filters. Perfect for data scientists who want to build flexible, interactive Shiny dashboards without anticipating every possible user question. In this tutorial, you'll discover: - How querychat translates natural language to SQL using LLMs - Setting up querychat in seconds with your data - Building custom Shiny interfaces with reactive data frames - Why it's reliable, transparent, and safe for production use - Enterprise deployment with Azure, AWS Bedrock, or Google Vertex AI Key Features: Drop-in Shiny component Works with any DBI-compatible database Read-only by design for data safety Full SQL transparency for reproducibility Supports Claude, GPT, Gemini via ellmer Resources: - querychat Documentation: [https://posit-dev.github.io/querychat/py/index.html] - Blog Post: "Where Questions Become Queries" [https://shiny.posit.co/blog/posts/querychat-python-r/ - Github repo: [https://github.com/posit-dev/querychat/tree/main/pkg-r] Perfect for R users working with Shiny, data exploration, and interactive analytics. Timestamps 00:00 - Introduction: The Dashboard Problem 00:50 - Why Dashboards Can't Answer Every Question 01:47 - What is querychat? 02:46 - Getting Started (Installation & Setup) 03:08 - API Key Setup (Claude, GPT, Gemini) 03:20 - First querychat App in 5 Lines of Code 04:04 - Live Demo: Querying Data with Natural Language 05:01 - How querychat Works Under the Hood 06:44 - Four Key Benefits (Reliability, Transparency, Reproducibility, Safety) 07:37 - Tool Calling & DuckDB Integration 08:46 - Building Custom Shiny Interfaces 09:01 - Customizing querychat Behavior (Greetings, Data Descriptions, Instructions) 11:00 - Creating Custom Dashboards with Reactive Data Frames 12:00 - Complete App Structure Walkthrough 13:10 - Safety & Security Deep Dive 14:00 - Enterprise Deployment (Azure, AWS, Google Vertex AI) 15:12 - Resources & Documentation
image: thumbnail.jpg
Transcript#
This transcript was generated automatically and may contain errors.
Hi, I'm Veerle. I'm a data scientist, consultant and corporate trainer who loves working with R and enjoys anything web development related. In this video, I'm going to talk about a package that is called querychat. Okay, so simply put, I'm a bit of a data nerd and I really enjoy working with data. Since you're watching this video, you probably love working with data too.
And perhaps you also really love building dashboards with it. Now, let's say Shiny is your happy place. You design carefully, you collect feedback, you refine layouts, you handle edge cases. Your dashboard is solid, bulletproof even. So let's say you build a dashboard showing women's international soccer matches. You're proud of it and you show it to a colleague. They say, wow, this looks amazing. Can you show me only FIFA World Cup matches?
You say, of course, and you confidently click a filter. Then they say, oh, that's nice. Can you show me all the matches where the Netherlands played? Oh, suddenly things get awkward because, well, there is no country filter. And adding one means changing the UI, touching the code, redeploying the app, and probably getting another coffee first. And then comes the follow-up question and another one and another. And at some point it becomes painfully clear. Even the best dashboard can't answer every question that you have. And that's exactly where querychat comes in.
Even the best dashboard can't answer every question that you have.
What is querychat?
So what is querychat? Well, querychat makes it easy to query data using natural language. It offers three things. One, a dropping component for Shiny. Two, a console interface. And three, other programmatic building blocks that you can use. You ask questions, querychat translates it to SQL, executes it, and returns the results. The results are then available as a reactive data frame, which makes it easy to display or further process the data. There's no clicking, no predefined filters, just you, your data set, and questions like, show me matches where the Netherlands played in the World Cup? Or can you filter two matches from the 2025 UEFA Euro? Or which team has the best win rate in the Canada versus United States matches?
Well, it sounds and looks cool, right? In this video, I'll show you three things. First, how to get started with querychat in seconds. Secondly, how it works under the hood and why it is reliable, transparent, reproducible, and safe. Thirdly, how to build custom polished interfaces around it.
Getting started
Let's start with the best part, getting started. Since querychat uses a large language model, aka LLM, to translate your questions into SQL, we need an API key for such a model. There are a lot of choices when it comes to provider and model, but in this demo, we'll be using Cloud 4.5. Feel free to pick your favorite provider and model. Other good choices would be GPT 4.1, which is the current default for querychat, or Google Gemini 3.0.
Basically, as long as ellmer supports it, you're good to go. Right, once you have your key, you put it in your R environment file, and then we can get started with querychat. First, we install querychat from CRAN, we load querychat, then point querychat to our data set, results with scorers, then we specify our provider and model, and that's basically it. We can now run a Shiny application with the app method.
Okay, so let's see what we can actually do here. So I type a question, let's say, well, can you show me all the matches where the Netherlands played? Then we hit enter, we wait for a bit, and then we can see that our data is being updated. We also see the SQL query, which is a simple one that filters based on the home or away team, and then querychat proposes some follow-up questions. So I could just click on one, like, well, which players have scored the most goals for the Netherlands? And it appears in the chat box. Again, we have to wait a bit, but once it's finished, we see a nice table with the number of goals for each Dutch player. The top scorer here is Vivianne Miedema, followed by Lieke Martens, and you see how easy it is to explore the data like this.
This is great. So within seconds, you have a working chat interface connected to your data. You type questions, and querychat interprets the question, generates SQL, executes it, and shows you the results. Yes, this works for grouping, filtering, summaries, comparisons, you name it. It's amazing how little code you need, right? You're not building a complex UI here. You're not defining every possible input that is out there. You're just letting users ask the questions they actually care about. This is extremely powerful, and it's also intriguing at the same time. So I can imagine you wonder, well, how does querychat actually work?
How querychat works under the hood
querychat is more than just a chat window sitting on top of a dataset. With querychat, you never have to think about how to get to an answer. You don't have to think about what filters to apply or how to extract information from the filter data. No, you simply ask any question that comes to mind and querychat handles the rest. And yeah, yeah, yeah, I hear you. You could also ask all those questions in a typical LLM chat tool, but querychat is different. You are not relying on the LLM to invent answers or reason about the data internally. No, every single question is translated into SQL, and this SQL is executed on the actual dataset and returned exactly as the data dictates. And very important, the SQL is always shown. This means you can see precisely what is being run.
You are not relying on the LLM to invent answers or reason about the data internally. No, every single question is translated into SQL, and this SQL is executed on the actual dataset and returned exactly as the data dictates.
So that's great. All right. Four benefits of querychat. One, reliability. The LLM does not analyze or transform the raw data itself. It only generates SQL text. querychat handles the execution of that SQL via tool calling, so all the results come from the real data engine and not from the model's internal guesswork. Two, transparency. Every query reveals the full SQL statement. Nothing is hidden, nothing is adjusted, and you always know how the answer was produced. Three, reproducibility. Since every SQL query is visible, analysis can be reused, shared, and audited. And four, safety. querychat's tools are designed with read-only actions in mind, meaning the LLM is essentially unable to perform any destructive actions.
All right, let's go back for a second. I mentioned querychat handles execution of the SQL via something called tool calling. Well, tool calling is essentially a bridge between an LLM and your R session. The model does not execute code. Instead, it requests your R session to execute a certain function with certain inputs. Once R performs the execution, the result is then passed back to the model for interpretation. But the LLM does need to know something about your data, you know, things like which columns are there, what do they mean, and what type are they.
This schema information is shared with the model, but not the raw data itself. With this information, it produces a SQL query as a tool call. Now, to run SQL, you do need a database engine, and querychat's weapon of choice is DuckDB. If you have your own database, that's no problem too. Whatever DBI, this is a database interface package for R, supports can be used by querychat. Pretty clever, huh?
Building a custom Shiny interface
Now, we can do much more with querychat than just pass it a data frame and launch a basic Shiny app. We can also build our own custom interface, and that all starts with customizing querychat's behavior. We can tweak the querychat object parameters to our liking. For example, we can, and we should, add a custom greeting, which we can provide in a markdown file, with nice span HTML tags that make the suggestions clickable and put them right in the chat box when you click them. The greeting is the first thing people see, you know, so you better make it good.
Besides adding a greeting, we can also add a data description, which will give the LLM some more context about the data that we're using. And this is especially helpful if you have cryptic column names or data that is, you know, very niche. Again, we provide this information in a markdown file, and there's no specific format needed in this case. So what we do here, we provide a general description of the data, and we specify all the columns with their type, providing context where needed. The more the model knows about the columns and their types, the better SQL it generates. Now, you don't have to provide this extra information. If you don't, querychat will just send column names and types, and it lets the LLM figure out the rest.
Finally, we can add some extra instructions too. And no surprises here, but we also provide these in a markdown file, and these instructions can further tweak the LLM's behavior. You can add anything you like. For example, let the LLM use pirate speech or, you know, an overkill of emojis, or, you know, you can be a bit more serious and let the model use British or English spelling or directions on the terminology we use, like we're doing here. Right, so that's all looking solid. We have a greeting, we have a good data description, and we have extra instructions.
Now, the next step would be to create an interface. If we want to build a custom interface, we need a chat window. querychat plays especially nice with a chat window in a sidebar, which you can call with the sidebar method. Now, normally, a sidebar is the perfect place to have all your filters, a year filter, a country filter, a tournament filter, you name it. Now, this sidebar method, resulting in a chat window, is all you need. And obviously, you would want to use the result querychat returns. As said earlier, querychat returns a reactive data frame that can be used to further process or display the data. So, how do you get it? Well, with the server method. In this case, we're storing our reactive data frame in a reactive called FilterData, and this reactive can be used to base value boxes, maps, graphs, tables on. As soon as the FilterData changes, your dashboard view changes as well.
Now, let's inspect our code a little bit closer. First, we need to load some packages like Shiny and, of course, querychat. Then, we load our data, results with scores, and we apply an initial filter to it. The next thing we do is building our QC object, calling querychat new, and we provide all the information that we talked about earlier. The data, the name of the table, our client's greeting, a data description, and, of course, the last thing, some extra instructions. In our UI, in the page sidebar function, we then add a QC sidebar. And then, we have the rest of our UI. We have three value boxes right here. Then, we have here our leaflet map. Then, we have a graph built with e-charts for R, and we have a reactable table. In our server, it's as simple as assigning the results of QC server to a variable called FilterData. Now, then, subsequently, in every output, we use that reactive data. So, here, if you want to determine the top-scoring country, we also do that here and here. You'll get the idea. We do this for every output in our app. And the result? Well, this nice interface that can handle any question we want. Remember the conversation with our colleague from the very beginning? All the questions are not an issue anymore. There are no custom filters. There are no limits. It's just querychat.
Safety and enterprise use
And in just 10 minutes, you learned how to use querychat to build your next app. I want to come back to one important aspect of querychat, though, which is safety. Because is this safe? And yeah, it's a fair question. But luckily, querychat is designed entirely around control. The LLM never executes anything itself. It never touches your database or data, and it never sees the raw data. Its only job is to propose read-only SQL. If you ask to edit the data or to drop tables, it will simply refuse. And that's not because the model has your best interests at heart, but because it's instructed to do so. Now, combine that with an underlying database, DuckDB or your own, that only provides read-only access, and your data will be left untouched. So, no destructive actions on your production database when you set querychat's database permissions to read-only. And, you know, it's not a black box either. Every query can be logged, inspected, or audited.
And do you want to use querychat in enterprise or regulated environments? Well, if you need to use private or managed LLMs, you're covered as well. You can use Azure, AWS Bedrock, Google Vertex AI, all provide versions of popular models that support tool calling and can thus work with querychat.
Now, you heard me talking for 10 minutes, but if you want to read more about querychat at your own pace, I have some resources for you too. First of all, the querychat documentation, obviously. And secondly, where questions become queries on the Shiny blog written by me. You can find the relevant links in the description. I'll also add a link to the GitHub repo where you can find the complete code for the SheScores app that you just saw. Thanks for watching!
