querychat in R: Query Your Data with Natural Language | Shiny + LLMs

Transcript

This transcript was generated automatically and may contain errors.

Hi, I'm Veerle. I'm a data scientist, consultant and corporate trainer who loves working with R and enjoys anything web development related. In this video, I'm going to talk about a package that is called querychat. Okay, so simply put, I'm a bit of a data nerd and I really enjoy working with data. Since you're watching this video, you probably love working with data too.

And perhaps you also really love building dashboards with it. Now, let's say Shiny is your happy place. You design carefully, you collect feedback, you refine layouts, you handle edge cases. Your dashboard is solid, bulletproof even. So let's say you build a dashboard showing women's international soccer matches. You're proud of it and you show it to a colleague. They say, wow, this looks amazing. Can you show me only FIFA World Cup matches?

You say, of course, and you confidently click a filter. Then they say, oh, that's nice. Can you show me all the matches where the Netherlands played? Oh, suddenly things get awkward because, well, there is no country filter. And adding one means changing the UI, touching the code, redeploying the app, and probably getting another coffee first. And then comes the follow-up question and another one and another. And at some point it becomes painfully clear. Even the best dashboard can't answer every question that you have. And that's exactly where querychat comes in.

You are not relying on the LLM to invent answers or reason about the data internally. No, every single question is translated into SQL, and this SQL is executed on the actual dataset and returned exactly as the data dictates.

So that's great. All right. Four benefits of querychat. One, reliability. The LLM does not analyze or transform the raw data itself. It only generates SQL text. querychat handles the execution of that SQL via tool calling, so all the results come from the real data engine and not from the model's internal guesswork. Two, transparency. Every query reveals the full SQL statement. Nothing is hidden, nothing is adjusted, and you always know how the answer was produced. Three, reproducibility. Since every SQL query is visible, analysis can be reused, shared, and audited. And four, safety. querychat's tools are designed with read-only actions in mind, meaning the LLM is essentially unable to perform any destructive actions.

All right, let's go back for a second. I mentioned querychat handles execution of the SQL via something called tool calling. Well, tool calling is essentially a bridge between an LLM and your R session. The model does not execute code. Instead, it requests your R session to execute a certain function with certain inputs. Once R performs the execution, the result is then passed back to the model for interpretation. But the LLM does need to know something about your data, you know, things like which columns are there, what do they mean, and what type are they.

This schema information is shared with the model, but not the raw data itself. With this information, it produces a SQL query as a tool call. Now, to run SQL, you do need a database engine, and querychat's weapon of choice is DuckDB. If you have your own database, that's no problem either. Whatever DBI (a database interface package for R) supports can be used by querychat. Pretty clever, huh?
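A minimal sketch of that flow, assuming the DBI and duckdb packages are installed. The `matches` data frame, the `run_query` function, and the hand-written tool request are made up for illustration; in querychat the request would come from the model, and the tool itself is defined by the package, so this is not querychat's actual internals.

```r
library(DBI)

# Hypothetical stand-in for the soccer data used in the video.
matches <- data.frame(
  home_team  = c("Netherlands", "Spain", "England"),
  away_team  = c("Sweden", "Netherlands", "Germany"),
  tournament = c("FIFA World Cup", "Friendly", "UEFA Euro"),
  stringsAsFactors = FALSE
)

# Schema information: column names and types, but no rows.
schema <- vapply(matches, function(col) class(col)[1], character(1))

# Register the data frame as a table in an in-memory DuckDB database.
con <- dbConnect(duckdb::duckdb())
duckdb::duckdb_register(con, "matches", matches)

# The "tool": an ordinary R function that runs SQL and returns a data frame.
run_query <- function(sql) dbGetQuery(con, sql)

# Simulated tool call. A real model would produce this request itself.
tool_request <- list(
  name = "run_query",
  arguments = list(
    sql = "SELECT * FROM matches
           WHERE home_team = 'Netherlands' OR away_team = 'Netherlands'"
  )
)

# R executes the requested function; the result goes back to the model.
result <- do.call(tool_request$name, tool_request$arguments)

dbDisconnect(con, shutdown = TRUE)
```

The model only ever sees `schema` and, afterwards, `result`. The SQL itself runs in DuckDB inside the R session.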

Building a custom Shiny interface

Now, we can do much more with querychat than just pass it a data frame and launch a basic Shiny app. We can also build our own custom interface, and that all starts with customizing querychat's behavior. We can tweak the querychat object parameters to our liking. For example, we can, and we should, add a custom greeting, which we can provide in a markdown file, with nice span HTML tags that make the suggestions clickable and put them right in the chat box when you click them. The greeting is the first thing people see, you know, so you better make it good.

Besides adding a greeting, we can also add a data description, which will give the LLM some more context about the data that we're using. And this is especially helpful if you have cryptic column names or data that is, you know, very niche. Again, we provide this information in a markdown file, and there's no specific format needed in this case. So what we do here, we provide a general description of the data, and we specify all the columns with their type, providing context where needed. The more the model knows about the columns and their types, the better SQL it generates. Now, you don't have to provide this extra information. If you don't, querychat will just send column names and types, and it lets the LLM figure out the rest.

Finally, we can add some extra instructions too. And no surprises here, but we also provide these in a markdown file, and these instructions can further tweak the LLM's behavior. You can add anything you like. For example, let the LLM use pirate speech or, you know, an overkill of emojis, or, you know, you can be a bit more serious and let the model use British or American spelling, or give directions on the terminology we use, like we're doing here. Right, so that's all looking solid. We have a greeting, we have a good data description, and we have extra instructions.
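As a rough sketch, the three pieces could be wired into a querychat object like this. The file names, the `read_md` helper, and the `make_qc` wrapper are hypothetical. The argument names mirror the ones mentioned in the video and should be checked against the querychat documentation for your installed version. The client is left out here, on the assumption that querychat then falls back to its default.

```r
# Read a markdown file into a single string.
read_md <- function(path) {
  paste(readLines(path, warn = FALSE), collapse = "\n")
}

# Hypothetical wrapper: nothing is created until you call make_qc().
make_qc <- function(matches) {
  querychat::QueryChat$new(
    matches,
    table_name = "matches",                                  # assumed table name
    greeting = read_md("greeting.md"),                       # first message users see
    data_description = read_md("data_description.md"),       # context on columns
    extra_instructions = read_md("extra_instructions.md")    # tone, terminology
  )
}
```

Keeping the text in separate markdown files means the greeting and instructions can be edited without touching the app code.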

Now, the next step would be to create an interface. If we want to build a custom interface, we need a chat window. querychat plays especially nice with a chat window in a sidebar, which you can call with the sidebar method. Now, normally, a sidebar is the perfect place to have all your filters, a year filter, a country filter, a tournament filter, you name it. Now, this sidebar method, resulting in a chat window, is all you need. And obviously, you would want to use the result querychat returns. As said earlier, querychat returns a reactive data frame that can be used to further process or display the data. So, how do you get it? Well, with the server method. In this case, we're storing our reactive data frame in a reactive called FilterData, and this reactive can be used to base value boxes, maps, graphs, tables on. As soon as the FilterData changes, your dashboard view changes as well.

Now, let's inspect our code a little bit closer. First, we need to load some packages like Shiny and, of course, querychat. Then, we load our data, results with scores, and we apply an initial filter to it. The next thing we do is build our qc object by calling QueryChat$new, and we provide all the information that we talked about earlier: the data, the name of the table, our client, a greeting, a data description, and, of course, the last thing, some extra instructions. In our UI, in the page_sidebar function, we then add the qc sidebar. And then, we have the rest of our UI. We have three value boxes right here. Then, we have here our leaflet map. Then, we have a graph built with echarts4r, and we have a reactable table. In our server, it's as simple as assigning the result of the qc server method to a variable called FilterData. Then, in every output, we use that reactive data. So, here, if you want to determine the top-scoring country, we also do that here and here. You get the idea. We do this for every output in our app. And the result? Well, this nice interface that can handle any question we want. Remember the conversation with our colleague from the very beginning? All the questions are not an issue anymore. There are no custom filters. There are no limits. It's just querychat.
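The code itself is not part of this transcript, so the following is only a stripped-down sketch of the structure described above, assuming the shiny, bslib, and querychat packages. The `build_app` wrapper, the output names, and the page title are made up, and the assumption that the server method returns a list with a reactive `df` element should be verified against the querychat documentation. The map, chart, and reactable table from the video are replaced by one value box and a plain table.

```r
# Hypothetical app skeleton; `qc` is a QueryChat object created beforehand.
build_app <- function(qc) {
  ui <- bslib::page_sidebar(
    title = "Women's international soccer",
    sidebar = qc$sidebar(),  # chat window instead of hand-built filters
    bslib::value_box(
      title = "Matches",
      value = shiny::textOutput("n_matches")
    ),
    shiny::tableOutput("matches")
  )

  server <- function(input, output, session) {
    qc_vals <- qc$server()
    # Assumption: the filtered data is exposed as a reactive called `df`.
    filtered_data <- qc_vals$df

    # Every output reads the same reactive, so all of them update together.
    output$n_matches <- shiny::renderText(nrow(filtered_data()))
    output$matches <- shiny::renderTable(head(filtered_data(), 20))
  }

  shiny::shinyApp(ui, server)
}
```

Because every output depends on `filtered_data()`, a new question in the chat updates the whole dashboard without any extra filter inputs.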

Safety and enterprise use

And in just 10 minutes, you learned how to use querychat to build your next app. I want to come back to one important aspect of querychat, though, which is safety. Because is this safe? And yeah, it's a fair question. But luckily, querychat is designed entirely around control. The LLM never executes anything itself. It never touches your database or data, and it never sees the raw data. Its only job is to propose read-only SQL. If you ask to edit the data or to drop tables, it will simply refuse. And that's not because the model has your best interests at heart, but because it's instructed to do so. Now, combine that with an underlying database, DuckDB or your own, that only provides read-only access, and your data will be left untouched. So, no destructive actions on your production database when you set querychat's database permissions to read-only. And, you know, it's not a black box either. Every query can be logged, inspected, or audited.
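To illustrate the database side of that argument, here is a small sketch assuming the DBI and duckdb packages. It opens a DuckDB file in read-only mode and shows that a destructive statement is rejected by the database itself, regardless of what SQL is sent. The table contents and file path are made up, and how you hand such a connection to querychat is not shown here.

```r
library(DBI)

db_path <- tempfile(fileext = ".duckdb")

# Create a small database file with a writable connection first.
con_rw <- dbConnect(duckdb::duckdb(), dbdir = db_path)
dbWriteTable(
  con_rw, "matches",
  data.frame(home_team = c("Netherlands", "Spain"), home_score = c(3L, 1L))
)
dbDisconnect(con_rw, shutdown = TRUE)

# Reopen the same file read-only: SELECT works, writes do not.
con_ro <- dbConnect(duckdb::duckdb(), dbdir = db_path, read_only = TRUE)

n_rows <- dbGetQuery(con_ro, "SELECT COUNT(*) AS n FROM matches")$n

drop_result <- tryCatch(
  {
    dbExecute(con_ro, "DROP TABLE matches")
    "dropped"
  },
  error = function(e) "rejected"
)

dbDisconnect(con_ro, shutdown = TRUE)
```

With permissions enforced at the connection level, the instruction to the model to propose only read-only SQL is a second layer of protection, not the only one.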

And do you want to use querychat in enterprise or regulated environments? Well, if you need to use private or managed LLMs, you're covered as well. Azure, AWS Bedrock, and Google Vertex AI all provide versions of popular models that support tool calling and can thus work with querychat.

Now, you heard me talking for 10 minutes, but if you want to read more about querychat at your own pace, I have some resources for you too. First of all, the querychat documentation, obviously. And secondly, "Where questions become queries" on the Shiny blog, written by me. You can find the relevant links in the description. I'll also add a link to the GitHub repo where you can find the complete code for the SheScores app that you just saw. Thanks for watching!

querychat in R: Query Your Data with Natural Language | Shiny + LLMs | Veerle van Leemput

Featured software

DBI

ellmer

querychat

Shiny