Resources

Getting Started with LLM APIs in R

Sara Altman

Abstract: LLMs are transforming how we write code, build tools, and analyze data, but getting started with working directly with LLM APIs can feel daunting. This workshop will introduce participants to programming with LLM APIs in R using ellmer, an open-source package that makes it easy to work with LLMs from R. We'll cover the basics of calling LLMs from R, as well as system prompt design, tool calling, and building basic chatbots. No AI or machine learning background is required, just basic R familiarity. Participants will leave with example scripts they can adapt to their own projects.

Resources mentioned in the workshop:

* Workshop site: https://skaltman.github.io/r-pharma-llm/
* ellmer documentation: https://ellmer.tidyverse.org/
* shinychat documentation: https://posit-dev.github.io/shinychat/

Feb 8, 2026
1h 37min


Transcript

This transcript was generated automatically and may contain errors.

So yeah, hi, I'm Sara. I'll introduce myself in a little bit. I have some setup instructions here, but I'm going to move past this and give you time to get set up in about five minutes, so don't worry about this.

Okay, but yeah, hi, I'm Sara. Like Eric said, I work at Posit. I'm a senior developer advocate on the AI core team, so I do things like this: I teach workshops, I write blog posts, I do other things. It's nice to, well, not exactly meet all of you, but to be talking to all of you. So this workshop is going to be an intro to working with LLM APIs from R, so you don't need to bring any knowledge about how LLMs work, or machine learning knowledge, or anything like that. As long as you can write code in R and know how to write an R function, this workshop is for you. We're going to take it from the ground up and learn how LLMs work at a really basic level, how to work with LLM APIs from R, how to make basic Shiny apps that use LLMs, and some other things.

So this is really intended as an introduction to these tools. Everything that you should need, all the information, lives on this website; I think that link is in the chat. All the slides and information about what we're going to cover today are on this site, so if you want to follow along, you can take a look here.

So the workshop format: I'm going to talk, I'm going to lecture, there are going to be slides like the ones I'm showing right now. I'm also going to have demos where I'll show you a little bit of code, I'll run an app or something, and then we'll talk about how it works. And then there will also be a series of interactive exercises that you will do on Posit Cloud. Like I said, I'll give you a minute to get set up in a second. But everything you'll need today is on Posit Cloud, so you shouldn't have to install any packages or set anything up. Everything's done for you. We have all the packages, all the files, and the API keys set up for you on Posit Cloud.

And like Eric said, if you have questions at any point, you can put them in the chat. I can't promise that I will see them while I'm talking, but I will try to monitor during the exercise portions, and then I'll consolidate questions to answer when we regroup.

Okay, so here's a rough schedule of what we're going to be covering. I'm not going to promise I'm going to stick to these timestamps exactly, but we'll do a little bit of intro, then I'll talk about how conversations with LLMs work, talk more about programming with the APIs, then we'll talk about prompt engineering, and finally we'll talk about tool calling, which is the process of hooking LLMs up to external tools to give them new abilities.

Getting set up on Posit Cloud

Okay, so in a minute you'll see the Posit Cloud project. There are a bunch of files in there, because it is sort of the whole ecosystem for this workshop, but the things you need to look at are the exercises and, if you want to see them, the solutions to the exercises. There's also a demos folder, which has the code for all the demos that I'll do today, and you can basically ignore everything else.

Okay, so let's get started. So the first thing that you need to use an AI API is an API key, which might not come as a surprise. And just quickly, if you want to do the kinds of things we're going to do in this workshop afterwards, you will need your own API key. Generally, how you do this is you sign up for an account with OpenAI or Anthropic or wherever you want to get your model from. You add a payment method, and then they'll give you an API key, and then you store that in your R environment. This looks really similar to what you'd do with basically any kind of API that needs a key: you put the key in your R environment, and then you can start using these tools from R. So this is just to give you an idea of what you'll need to do afterwards, but for today we're going to give you API keys, and these are already set up in Posit Cloud.

You'll use these for all our activities. Feel free to play around with the exercises; you don't have to stick exactly to the code there. But please be generally considerate: don't start pasting in entire books or something for the LLM to consider. And we're going to turn these keys off at the end of the workshop, so if you want to rerun the code later, you will need your own API keys.
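To give a concrete picture of that setup step for later, storing a key in your R environment usually looks something like this. This is a sketch: OPENAI_API_KEY is the variable name ellmer expects for OpenAI, and the key value shown is a placeholder.

```r
# Open your user-level .Renviron file for editing
# (edit_r_environ() is a helper from the usethis package)
usethis::edit_r_environ()

# Add a line like this to .Renviron, then restart R:
# OPENAI_API_KEY=sk-your-key-here

# Check that R can see the key (this prints the key,
# so be careful where you run it)
Sys.getenv("OPENAI_API_KEY")
```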

Okay, so now I'm going to give you a little bit of time to get set up. Hopefully this link is everywhere to join our workspace. If you don't already have a Posit Cloud account, you will need to sign up for one, then join the workspace, and then go to our r-pharma-llm project. It's possible that when you go to this bit.ly link, it'll take you directly to the r-pharma-llm project; it seems to differ depending on whether you already have an account. But at the end of it, you should see an RStudio project open with a bunch of files, and then you're going to run the code in exercises/01-hello-llm.R. You don't need to write any code, just run it. This is just to check that everything is working, and I will monitor the chat to make sure everyone can get set up.

Okay, so I'm going to start this. There's a timer up here. I'm going to give you six minutes, so go ahead and get started. Yeah, just a friendly reminder for those that joined a few minutes late: I'm going to recopy the links that Sara provided and put them in the chat again so you can get access, just like on the slide there. One thing to note is that after you sign up for the space on Posit Cloud, you'll see a kind of welcome page message about getting started with LLM APIs in R. Make sure to click the content tab if you see that page, and then you'll get to the actual workshop space. Sometimes that can trip people up if they're not used to Posit Cloud, but I'm doing it on my system right now, and everything seems to be working.

And yes, to Ivo's comment there, there can be a bit of a load time to deploy the project depending on bandwidth and also the structure of the project itself, so it's pretty common for it to take a minute or a couple of minutes. That's why Sara is very correctly budgeting for this time, just in case. Yeah, if it still hasn't loaded by the end of this time, post in the chat, and we'll try to get it figured out, because it shouldn't take too, too long.

Yeah, it took about a minute for me. Yep, I can see it, all set to go. Great. And Sara, from your experience, are there any particular web browsers that are better for Posit Cloud than others, or do they all seem to work? I don't know. Yeah, I always use Chrome, but yeah. Typically, the Chrome-based browsers seem to play a bit nicer than Firefox sometimes, but I'm on Firefox right now, and it seems to work. If you have multiple browsers and you have any issues, you might try a different one.

And I assume, Sara, that once they're in the interface, they don't have to run any commands to restore the package environment or anything, it all just works? All the packages should be installed. Each file will load the packages that that file needs. So the packages won't be loaded, but they are installed. If you can run this 01-hello-llm.R exercise file and it works and you see some kind of result in the console, everything's working and you're good. If that file produces an error, please let me know in the chat.

Great, I'll keep an eye on that as well. So far, so good on my end. Okay, great. Yeah, and this is just to confirm that the API keys are working, your setup is okay, the packages are installed correctly, everything like that.

Okay, great. Glad it seems to be working for at least most people, hopefully everyone. Just keep, if you're still having problems, just keep posting in the chat and we'll try to figure it out. We want everyone to be able to run the code. Okay, great. So again, yeah, that was just to check that everything's working. You didn't have to write any code. So now we can move on.

How conversations with LLMs work

Great, so now we're going to talk about how conversations with LLMs work. We'll look at this at a high level first, and then you'll see some code. So you've probably talked to an LLM before, maybe it was ChatGPT or Gemini or something like that, but what's actually going on when you talk?

So you might have done something like this, where you ask an LLM a question or you tell it to do something, and then you get a response back. And you keep asking questions, and ChatGPT or whatever LLM keeps responding, and it kind of feels like you're having a conversation with a person. So how is this happening? At a general level, this entire conversation is happening through HTTP requests and responses. And this isn't too different from just normally interacting with the internet; HTTP requests power the internet at a high level. So when you visit a website, your browser sends an HTTP request to the server that's hosting that website. The server then responds to that request by sending back the HTML of the webpage. And then your browser does a bunch of other stuff to get images and things, but that's generally what's happening. And it's not really that different with an LLM. Talking to an LLM like ChatGPT also happens via these HTTP requests. When you send a message to ChatGPT, you're sending an HTTP request to OpenAI. OpenAI processes that request, runs the model to get the response, and then sends that response back to you.

But when you send a message to an LLM, you're not just sending text, you're actually sending text with a specific role attached to it. So there are three roles that messages can have, and understanding these is pretty important to working with LLMs programmatically. The first is the user role. This is attached to what you, the person typing, are sending. And then there's the assistant role, which is attached to the text that the LLM is sending back to you. This might seem like a small detail, but these roles are kind of fundamental to how you control and guide LLM behavior. The third role is the system role, or the system prompt. This is kind of your instruction manual for the LLM. It's where you as the developer can guide the model's behavior, tell it what constraints it needs to follow, tell it what kind of format you want to see the responses in, that kind of thing. And we'll talk a lot more about system prompts in a later section.

And this isn't really a thing that you as a user of something like ChatGPT would see. You don't have access to the ChatGPT system prompt; you can't see it. It's kind of behind the scenes, but it is particularly powerful because it persists across the entire conversation, and it's going to shape how the assistant responds to every user message. Okay. So again, there are three roles: the system prompt, the user, and the assistant.
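To make the three roles concrete, here's roughly what the body of a request looks like in OpenAI's chat completions format; other providers use slightly different field names, and the content shown here is illustrative:

```json
{
  "model": "gpt-4.1",
  "messages": [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Tell me a quick fact about sheep."},
    {"role": "assistant", "content": "Sheep have excellent memories."},
    {"role": "user", "content": "Tell me another one."}
  ]
}
```

Notice that the system message comes first and each subsequent message is tagged user or assistant.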

Programming with ellmer

Okay. So now let's see how you actually do anything with LLMs in R. The package that's going to form the basis of everything we do today is ellmer. ellmer is an open-source R package from Posit that makes it really easy to work with LLMs. It handles all the HTTP request and API details for you, so you can focus on building things with LLMs rather than wrangling with these lower-level details. So let's see what the code looks like. First, we load ellmer, just like you would any other R package.

And then the next step is to create a chat object. You'll see this bit of code pretty much everywhere we write code today; this is the foundational thing that you're going to do. And this is making a chat object, which sets up a connection to the LLM provider's API. In this case, it's OpenAI's API. And this chat object is what we're going to use to send messages and handle the responses from the LLM. Now that we have a chat object, we can talk to the LLM from R. To do that, we call the chat method on the chat object. It's a little confusing that it's chat$chat(), but we're just calling a method on this object. And this is sending a message to the LLM. So I've asked it to tell us a fact about sheep. And then if you run this in the console, you'll see a result that looks like this. And it's telling us, apparently, that sheep have great memories.
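Put together, the code described above looks something like this (the model name is illustrative; ellmer picks a default model if you don't specify one):

```r
library(ellmer)

# Create a chat object: a connection to OpenAI's API
chat <- chat_openai(model = "gpt-4.1")

# Call the chat method to send a message and print the response
chat$chat("Tell me a quick fact about sheep")
```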

OK. So earlier I mentioned that there are different roles these messages can have. So here, just think about what the user and assistant roles are in this example. The user role is assigned to "tell me a quick fact about sheep," because that is what I, as the user, am sending to the LLM. And then the assistant is sending back the fact about sheep, so that has the assistant role attached to it. One nice thing about ellmer is that you can inspect this chat object and see the full conversation history really easily. So if you print the chat object, you're going to see an output that looks like this. It's giving us the entire conversation history. It's a pretty brief conversation history, because there's only one back and forth, but you can see that it has flagged what has the user role and what has the assistant role. And you can also see the number of tokens that were used, and how much it cost; we'll talk more about tokens in a bit. It says it cost zero because the number of tokens used was so small it was approximately zero dollars and zero cents.

OK. What about the third role, the system prompt? Where is that? We didn't set one up, so the model is just using whatever default prompt it has; in this case, whatever default system prompt GPT-4.1 has. But you can control the system prompt with ellmer in that chat function with the system_prompt argument. So let's say we tell it to always answer in haiku. This is the system prompt. This is what is going to inform the LLM's behavior across the entire conversation. And now if we ask it a question, it's going to give us the answer in a haiku. So it is obeying our system prompt. This is kind of a silly example, because it probably isn't very useful, but you can do a lot with system prompts, and we'll talk a lot more about this later. The ability to shape every response the model gives is very useful. You can exert quite a bit of control over how the model behaves just by writing text for the system prompt.
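As a sketch, setting the system prompt looks like this; the prompt text is whatever instruction you want to persist across the whole conversation:

```r
library(ellmer)

# The system prompt shapes every response in this conversation
chat <- chat_openai(
  model = "gpt-4.1",
  system_prompt = "Always answer in the form of a haiku."
)

# The answer comes back as a haiku
chat$chat("Tell me a quick fact about sheep")
```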

OK, so now let's take a look at the chat object now that we have a system prompt in place. So now we see that in the chat object, the system prompt is stored here and you see it under system. And then we still have the user and assistant responses.

Exercise: word game

OK. So now we're at a second exercise. This time you are going to write a little bit of code. If you want to refer back to the slides, I'd recommend opening up the website that we shared in the chat, so you can go back through the slides and look at the code that I shared. OK, so this time you're going to open up 02-word-game.R, and your goal is mostly just to play around with the bits of code that I've explained in the past couple of slides. So you're going to add a system prompt, and then you're going to ask a couple of questions. Then you're going to create a new, empty chat and ask a second question again. And if these instructions are a little confusing, the point is for you to investigate what's going on with the LLM and try to figure out how it's working.

OK, so it looks like my timer is off the screen, but I'm going to give you five minutes and I'll let you know when that's up.

OK. I don't know if no one posting in the chat means you didn't get it or you did, but we're going to talk a little bit about it. So just if you had particular questions, please ask them so that we know. And I'm happy to answer any questions.

We're looking at my local Positron instead of Posit Cloud, just because this is a little bit easier for me, but the code should be the same; everything should work the same. So, what I had you do was write a system prompt, then chat with the LLM a little bit, then make a new chat object and ask the same question. I'm just going to talk a little bit about this. So let's run this code first.

Eric, can you see this? OK, is it big enough? You may want to zoom in one more notch, but that's great. OK, great, thanks. OK, so we run this. So now we have a system prompt in the chat. Let's just take a look at the object. OK, so we see our system prompt appears here. And now let's ask the LLM a question. We're asking it to guess the word for a person who lives next door, in British English, and we'll see why we asked in British English in a second.

And so it gives us back its guess, "neighbour," spelled the British English way. And that's not very interesting. But what happens if we now ask another question? So, what helps the car move smoothly down the road? My guess is it will say tyres. So, a couple of things to note about this. The first is that each time, it is taking into account our system prompt: it knows that each time we ask a question or send a query, we are still playing this word guessing game that we told it about in the system prompt. And the second is that even though this "in British English" instruction was not in the system prompt, it still answered this second question in British English. In American English, this is not how you spell tires. So it kind of remembered that, or at least it seems like it's remembering that.

So those are two things to note. And now let's try the second example. Now we're making a new chat object. Previously we were using the same chat object each time, and we asked two questions. Now we're making a new chat object and asking what helps the car move smoothly down the road. OK, and now we're seeing something that you might expect from any LLM: it's giving us so much text. It gave us way more information than we need about tires. And this is because we made a new chat object. This one does not have a system prompt; it doesn't know we're playing a guessing game. And it also does not have the history of us telling it to answer in British English, because it is a new object with none of that information.
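The exercise can be sketched like this. The system prompt text here is paraphrased from the exercise, not the exact wording in the workshop files:

```r
library(ellmer)

game_prompt <- "We're playing a word guessing game. Reply to each
question with a single-word guess and nothing else."

# One chat object: system prompt and history carry across questions
chat <- chat_openai(system_prompt = game_prompt)
chat$chat("In British English, what do you call a person who lives next door?")
chat$chat("What helps a car move smoothly down the road?")  # terse, British spelling

# A brand-new chat object: no system prompt, no history
fresh <- chat_openai()
fresh$chat("What helps a car move smoothly down the road?")  # long, generic answer
```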

Demo: ClearBot

So I'm going to briefly show you a demo now of an app that makes this a little bit clearer.

OK, so this is a Python app. You don't need to know what the code is doing. Sara, we can't see the app right now. You can't see it? No.

I'll stop and retry. Sometimes that happens when you share an app versus the desktop. How's this? Oh, you should just see my Positron window. Oh, you weren't showing the app itself. No, I wasn't. Here, got it. OK. Sorry for that. No worries.

OK. Great. So this is the Python app. It's called ClearBot. You don't need to know how it works. The demo code is in your Posit Cloud project, though you won't be able to run it in Posit Cloud without doing a bunch of setup. But the code is there if you ever want to run a Python Shiny app on your own and play around with it; it is pretty cool. OK. So how this works is that it gives us a window into what's going on with the LLM, just to make it a little bit clearer what's actually happening.

So now let's try our same thing that we've been doing. You're playing a word guessing game. Guess the word. I don't think that was exactly what it was. And now let's ask it in British English. What's the name of the person who lives next door? I did not get my system prompt, but that's OK.

I didn't get the system prompt. We're going to ignore that for now because that's not the most important part about this. Why this is useful is we can see the structure of the request and the response. So when we sent a request, we see, again, the user role and my message. And now we can see the structure of the response as well. So we see it has the assistant role, and then it has all of this context where it didn't play the guessing game correctly, but it still kind of got the right answer. OK, but now let's see what happens when we ask it another question.

Sorry, bad job typing. Again, let's just ignore the guessing game for a second. We're just looking at the structure of these responses. OK, and so the thing I want to point out is that in this request, even though you might think I only sent this second request, what makes cars go, it's getting all of the messages that I have sent previously in this conversation. So we're sending the entire history of the conversation to the LLM each time we make a request.

And this is sort of important to know, and I'll explain a little more later, but just know that each time, you're sending the whole conversation. And that's how it sees this context, and that's how it sort of remembers that we are speaking in British English. And then you can see the response down here. OK, and ordinarily you might not really need to know how these requests and responses are structured, but it is useful to be able to see it so you can visualize what's happening when you use LLMs in R.

LLM memory and statelessness

Great. So we demoed ClearBot. And now we're going to talk a little bit more about how LLMs work. So like I said, you're probably used to chatting back and forth with LLMs. You ask a question or send a request, it gives you a response. You say another thing, it gives you another response. And eventually it kind of feels like you're having a normal conversation.

But as ClearBot showed, and as you might have inferred from the exercise, there's something a little bit interesting happening behind the scenes. And I guess it is a conversation, but it might work differently than you would think if you didn't know how LLMs worked. So the first thing I want to point out, like I was hinting at, is that the core LLM technology is stateless. This means that it forgets everything that happened after each response. Your requests to the LLM aren't stored somewhere; ChatGPT and other tools layer other things on top, but the core LLM technology does not have memory in this way. And that's why we're sending the entire history of the conversation each time we make a request. That's how the LLM knows what happened before, and how it can give the feeling that it has memory and that it's remembering. You might think of this kind of like talking to someone with amnesia, someone who forgets what happened more than a minute ago. Every time you say something new to them, you have to recap the entire conversation if you want to continue.

This isn't really a limitation of LLMs, it's just how they work. So, does it remember anything between requests? No: you have to send the entire conversation history, and then it reconstructs the conversation from what you send.
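With ellmer, this bookkeeping happens for you: the chat object stores each turn and resends the whole history on every request. You can inspect what's stored with the get_turns() method (a sketch; the questions are illustrative):

```r
library(ellmer)

chat <- chat_openai(system_prompt = "Answer in one short sentence.")
chat$chat("Name a famous fictional sheep.")
chat$chat("What story is it from?")  # works because the history is resent

# The stored turns: this is what gets sent back with each request
chat$get_turns()
```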

How LLMs generate responses

So now that you understand how conversations work with LLMs, let's briefly talk about how they are actually making these responses. How do they understand what you're asking and how do they construct these responses that feel at least somewhat human-like?

At a high level, LLMs understand things because they're trained on massive amounts of data: essentially everything written on the internet, plus books and whatever else their creators could get their hands on. And by learning patterns from all of this text, they develop the ability to understand, or at least seem to understand, context and meaning, and to put together words that feel like natural language to humans. Okay, so once they do all that, they're able to do all these things that we know they can do, like answer questions, write stories, tell jokes, and translate.

Okay, so let's talk a little bit about how LLMs are actually constructing their responses. They don't just generate full sentences at once. What they're doing is generating responses token by token, essentially choosing the most likely next piece based on everything that they've seen so far. This is really core to the idea of LLMs. They think in tokens. This is the fundamental unit they use to construct responses. These tokens might be words, as you'll see in a minute, but they can also be parts of words or individual characters.

So hello is one token, but unconventional happens to be three tokens. And this is important because different models will have different input and output limits given in tokens. API pricing is also usually given by tokens, often in millions of tokens.
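Because pricing is quoted per million tokens, estimating the cost of a request is simple arithmetic. A sketch with hypothetical prices (check your provider's current rate card; input and output tokens are usually priced differently):

```r
# Hypothetical prices, in dollars per million tokens
input_price_per_m  <- 2.00
output_price_per_m <- 8.00

# Token counts for one request/response pair
input_tokens  <- 1500
output_tokens <- 400

cost <- input_tokens  / 1e6 * input_price_per_m +
        output_tokens / 1e6 * output_price_per_m
cost  # well under a cent for a short exchange
```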

And tokens aren't just for words. Images can also be broken into tokens.

Okay, so now I'm going to demo an app that'll give you a better sense for how these tokens work.

Token possibilities demo

Okay, here's my Positron screen again. I'm just going to navigate to demos and then token possibilities. So you have this app as well, and you should be able to run this in your own browser. So I'm just going to go ahead and run this app.

Okay, so let's just go with the prompt that's already there. I'll ask it to write a limerick about a cat. Okay, and I just want to point out a couple of things with this. The first is that you can see the response broken down by tokens: each bubble is one token. You'll see that sometimes words are tokens, sometimes a comma is a token, and some words are multiple tokens, like "fell." And the other thing you can do with this app is click on a token and see the different token possibilities.

So let's click on the second word. We see that "once" had a 99% probability, but there were other words it could have been. Some of them are different versions of "once," but there's also "was," so it could have been "there was." And the LLM, at a general level, is picking the token that has the highest probability. So it's interesting that it thinks the cat is from Kent. This one has a little bit more interesting distribution: where "once" had a 99% probability, "Kent" only had a 25% probability. And the other options, I guess, were Belize or Peru.

And so at a high level, this is what the LLM is doing. It's going through and picking the most likely tokens, and then it constructs something that gives you the sense that it understands what it is doing, and that generally makes sense to a human. I don't really know what this "limbic" means, but generally the things it's doing make sense.

Thinking empirically about LLMs

Okay, great. One thing to know is that what makes LLMs good is that they are not just looking at each token and thinking, what word often comes after "the"? They are able to look at a higher level, at longer-range dependencies that traditional models would miss. And that is one of the things that makes them so powerful.

Okay, but after this, we're not really going to talk about how LLMs work. And honestly, if you forget a lot of that, it's okay. To do what we're going to do today, you don't really need to have a detailed understanding of what's going on under the hood. And instead, I would encourage you to think empirically and not theoretically.

What I mean by this is: if you don't know what an LLM is going to do, or whether it's going to be good at a task, just try it out and see what happens. We've found that it's okay to mostly treat LLMs as black boxes, and just try it. So experiment rather than theorize. Because there are a lot of things that you might think LLMs could not possibly do that they are actually very good at. And conversely, there might be some things that you think surely an LLM can do that it turns out they're terrible at.

One common example is that it turns out they're actually very bad at counting the number of elements in a vector. If you give it a normal R vector with five elements, it's okay. But give it one with 102 elements and it gets it wrong. So this is a seemingly easy task, but it turns out they're terrible at it. And this is because LLM performance is often jagged. You might think that the graph of task difficulty versus model performance looks something like this, where very easy tasks are very easy for the model to do, and harder tasks are hard and it doesn't do a good job on them. But the graph actually probably looks something like this: there are some easy tasks they're great at, and there are some hard tasks they're great at, but there are also some easy tasks that they are terrible at.

And so we really encourage you to just try things out and experiment, instead of trying to reason from how the LLM works to what you would expect it to be able to do.

So we encourage you to embrace the experimental process. This is new technology; there's a lot to explore and a lot you can do. So just experiment, play around with it. You might find that you make an app or something that doesn't work out, or you thought an LLM was going to be great for something and it turns out it's not going to work. But this is useful; it's useful to fail and try again. Your attempts don't have to be successes. This is new technology, and there's a lot to discover and play around with and learn. So I would just encourage you to experiment.

Another guiding principle is to start simple and then build up understanding. We're going to focus on pretty simple tasks today, and some of them might seem a little silly or not necessarily relevant to your job. But the goal is for you to build intuition about how LLMs work, and how ellmer and other R packages that work with LLMs work, and just to get some experience building with these things, so that you can then apply those skills to your own work or your own goals.

Live chat and shinychat

Okay, we're going to switch gears really quickly before we move on to the next section. You might be wondering: what if you want to keep chatting back and forth? I showed you how to call the chat method and send a request to the LLM, but that seems a little clunky, because you might want to keep chatting back and forth with the LLM like you can do in ChatGPT or Claude or whatever. It's pretty easy to do this with ellmer as well. There are two functions that can help you do this: live_console() and live_browser(). So live_console() will create a continuous chat in the console, and live_browser() will set up a little Shiny chatbot app.
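Both functions take an existing chat object, so your system prompt and any history carry over. A minimal sketch:

```r
library(ellmer)

chat <- chat_openai(system_prompt = "You are a helpful R tutor.")

# Chat back and forth in the R console
live_console(chat)

# Or open a small browser-based chat interface instead
live_browser(chat)
```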

Just know that if you run live_console() or live_browser(), these are really easy ways to chat with an LLM more interactively from R. But now we're also going to talk about shinychat. shinychat is a way to add an AI chatbot, or LLM capabilities, to a Shiny app. It's available for R and Python; we're only going to talk about R today, but it's useful to know. And shinychat and ellmer work together. Like I said at the beginning, ellmer is going to be the core of everything we do today; you're going to see that code from the beginning over and over again. ellmer handles the LLM API call, essentially the communication back and forth with the model, and then shinychat provides the chat interface for Shiny apps and some of the other logic that you need.

Okay, we're not going to talk too much about this, but I wanted to show you how to set up a basic shinychat app. If you've never used Shiny before, that's okay. Just know that Shiny is a way to build web apps from R, and this is sort of boilerplate Shiny code making a basic app. To make it a shinychat app, we add the packages, and then we really only need two functions from shinychat to make this work: chat_mod_ui(), which creates the UI for the chat in our app, like the little chat bar, and sets up a place for the conversation to appear; and chat_mod_server(), which does the server logic that makes everything work.

And then the last thing that we need is the thing at the beginning that I said you'd always need, which is the chat object. So here we're setting up a chat with chat_openai(). This is the same thing we did before; it's called client here to avoid confusion with the chat module ID, but it's the same code. You pass that to chat_mod_server(), and if you run this code, you'll have a working chatbot in your Shiny app. I'm not going to run this because you're going to try it out on your own.
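Putting the pieces just described together, a minimal sketch might look like this (the system prompt is a placeholder, and exact signatures may vary slightly across shinychat versions):

```r
library(shiny)
library(shinychat)
library(ellmer)

ui <- bslib::page_fillable(
  # Chat bar plus a place for the conversation to appear
  chat_mod_ui("chat")
)

server <- function(input, output, session) {
  # Same ellmer code as before; called client to avoid
  # confusion with the chat module ID
  client <- chat_openai(system_prompt = "You are a helpful assistant.")
  chat_mod_server("chat", client)
}

shinyApp(ui, server)
```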

Exercise: word guessing chatbot

So the next exercise is to open up the word-games exercise. There is a basic Shiny app snippet and a system prompt in there, and your goal is to create a chatbot that plays the word-guessing game with you. But you are guessing the word this time.

OK. And again, if you want to look back at the code that I've had on these slides to help you out with the exercise, everything, all these slides are on that website.

And I will take a look at the chat to see if there are any questions I can answer. Yes, we do have a few questions if you want to take some of those while people are working. I think the first two that I see go back to that really great tokenizer app that you shared. That was really interesting. First question, from Saurabh: what about words that have different meanings depending on the context in English, such as "ship" the vessel versus "ship" as in sending something to someone? What's been your experience with how LLMs pick the right one depending on context?

Sure. They're pretty good at handling that kind of thing. And this is partially, I think, because of what I was saying earlier: they're not just looking at what the word right before is and then what word to put next. They're looking at the entire context and those longer-range dependencies to figure out what's going to make sense. And because of that, they're pretty good at mimicking those kinds of language differences. I feel like you'll find this out if you just chat a lot with any LLM; they're surprisingly good at that sort of thing. But again, if you have specific things you're curious whether they'll get right, just try it out. You can either chat in a normal chat interface or try it out with ellmer and just experiment and play around.

Yep. And one other one on that tokenizer app: it looked like it was showing possibilities for even parts of a word, such as when it had the word "fell," it actually had different coloring for the letter "f" versus "ell." That was quite interesting. Any insights on that? So, other times I've run this, it hasn't given me that exact limerick, obviously. But I guess "f" is one token, at least in this context, and "ell" was another token. That was the only word that was split into two tokens. So it's not that it's looking letter by letter, just that for that word, it happened to be two tokens. A token does not necessarily correspond to a word.

Q&A: packages and providers

Excellent. Excellent. And then now we're getting into some of the packages themselves that we're utilizing here. Raj asks, what's the difference between the shinychat package that we're looking at here versus another package called QueryChat that we've seen from Posit? Yeah. So yeah, I wish we had more time to talk about QueryChat. So QueryChat uses shinychat. So shinychat is a sort of general way to add LLM capabilities to your Shiny app. QueryChat is a more specific package that lets you ask in natural language questions about your data and then update the Shiny app accordingly. So if we have time, I can show a demo of this. But for example, like if you have an app with some plots on it and then you'll have a conversation in the sidebar, you can say like, show me only data from Thursday. The LLM will do its thing and then update the plots over on the right-hand side. So QueryChat helps you make apps like that.

Okay, thank you. And if I'm not mistaken, I think it was a year ago at posit::conf that Joe Cheng showed a very early stage of QueryChat with a restaurant-tipping example. That certainly blew me away, along with many others in the community. So it's great to see QueryChat out there.

Rodriguez is asking: what's the difference between tokens from OpenAI's API and tokens from the ChatGPT browser itself, if that makes sense? I'm not quite sure either. I mean, they're all the same models. If you use GPT-5 in the UI versus from the API, it's the same underlying model, so I would think it would break text into tokens the same way. Maybe there's nuance to that that I don't know, but I'd say your experience is probably not going to be meaningfully different, at the token level at least.

I think I may know the answer; Ivo is confirming. QueryChat actually gets SQL statements from the LLM, which can then be used to do the filtering, kind of client-side in the application. Whereas with shinychat, you're just getting the verbatim text from the LLM when you get the answer back. Is that fairly accurate?

I guess like – so I'd say mostly. It is correct that in QueryChat, like the LLM is providing SQL, which is being used, which is part of the power of QueryChat. We'll talk about this when we talk about tool calling. But like the LLM is sort of always returning text. That's kind of all it does. And then you have to do something with that text for the code to get run. So they're both sort of returning text. And it's not really that the package is returning text. It's that like the LLM is returning text and these packages are doing stuff with it. But you're correct that it's doing something with SQL, which is sort of the power of QueryChat and why you might want to use it. It writes SQL queries that then get run on the data. And then the data is updated. And then you can see filtered data in your app based on your request.

Providers, models, and ellmer

So let's see. I'm not going to show the code for this one, just in the interest of time. We're going to return to this example in a little bit, so you will see the code in another form in a second. Also, if you have any questions or you weren't able to get it working, like I said earlier, the solutions are in your Posit Cloud project, so you can always take a look at the code there.

Great. Okay. So now we're going to talk about a little bit more detail about programming with LLMs. I'm going to go a little bit fast through the first part of this section so we can get you doing some more exercises.

But first, we're going to talk about providers and models. So now that you know how to use ellmer, let's talk about your options for different providers and models. First, just to clarify: a provider is a company that hosts and serves models, like OpenAI or Anthropic or Google. And a model is a specific LLM with particular capabilities. Examples of models are Claude Sonnet 4.5 or GPT-5.

I'm going to kind of skip over these slides. But different models are different: they accept different amounts of tokens, some are faster than others, they cost different amounts, some are smarter than others, and some have different abilities. There's lots of information you can find about the different models, and lots to know. Again, I'd encourage you to experiment and find which ones work best for your particular needs.

Generally, when picking a model, you're trading off cost, speed, and intelligence. But I recommend just picking one of the most recent frontier models, especially when you're getting started. These would be things like GPT-5, Claude Sonnet 4.5, or Gemini 2.5 Pro. This might depend on which provider your work gives you access to or which one you want to pay for. So I'd say when you're getting started, just pick a model that we know is pretty good and play around with it. Then, if you find you really need to worry about cost-effectiveness or speed later, you can experiment.

Okay, so let's see how to actually use the different providers and models in ellmer. ellmer supports all the major providers; there are functions for OpenAI, Anthropic, and Gemini. You can also use local models with ellmer, that is, models you've downloaded to your computer or some other computer and are actually running on your own machine. And ellmer also has functions to support enterprise options. For example, through work you might get access to LLMs via AWS Bedrock; you can use ellmer that way as well.

Okay, that being said, the newest version of ellmer makes it even easier to switch between models and providers. You don't even have to use different functions: you can just use the chat() function and provide the provider as a string. So it's the same function with different arguments. In this one, we're setting the provider to Anthropic, and it just picks a model for us if we don't provide one. Same with OpenAI. If you want a particular model, you can use the "provider/model" name format.
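As a sketch of what that looks like (the model string is illustrative; check your provider for current model names):

```r
library(ellmer)

# Provider as a string; ellmer picks a default model for you
chat_claude <- chat("anthropic")
chat_gpt    <- chat("openai")

# Or pin a specific model with the "provider/model" format
chat_gpt5 <- chat("openai/gpt-5")
```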

Exercise: comparing models

Okay, so in the fourth exercise, you're going to use ellmer to list all the available models from Anthropic and OpenAI. I think the code gives you the names of the functions. And then you're just going to send the same prompt to different models and compare the responses. This is more just like an experimental thing, like just play around with it, see how they do, see what kinds of models are available.
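For reference, the model-listing functions the exercise points you at look roughly like this:

```r
library(ellmer)

# List the models currently available from each provider,
# including the exact model ID strings you'd pass to chat()
models_anthropic()
models_openai()
```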

And I will look through the chat. Yep, looks like we got a couple new ones here. You may have touched on this a little earlier, but Somya is asking: how can we connect to our company's internal AI? Well, I guess that would depend on what they mean by internal AI. I can speak to what you just said about AWS Bedrock; a lot of organizations are using that, and I know Azure also has its own version for enterprises, I believe. If it's a custom solution, that internal solution would need a way of interacting via an API key or something similar in order to be even close to compatible with what ellmer can provide. But maybe, Sarah, you can take that: if someone does have a custom internal AI solution, is it even possible for ellmer to connect to it, or would they have to build their own ellmer-like package to do it?

Yeah, I guess I think it depends on what you mean by internal AI. Like if this is sort of a custom agreement that your company has with something like OpenAI, you likely can use ellmer unless you have something extremely custom going on. If your company has trained its own LLM, I'm not sure about that. I can get back to you. But I don't think that's common.

Yeah, absolutely. And somewhat along those lines, Sass is asking: we've often heard, especially in enterprises, about Microsoft Copilot being used quite a bit in various services. Does that itself have an API that can be queried with a package like ellmer? Or is it its own service that's, in essence, kind of a black box? Yeah. So my understanding is that Copilot doesn't have its own models; it's an interface to other providers' models. Like, you can use GPT models through Copilot. So I think you can use the normal ellmer functions to use these APIs, or at least you should be able to soon. There's a concept of OpenAI-compatible APIs: APIs that OpenAI isn't controlling entirely but that you can still use with chat_openai(). That might work; it might also depend on the specifics.

OK. Last quick one here, hopefully, although I don't know what this is referring to. Ryan's asking if Amazon Q is supported. And I'm not sure what Amazon Q is. Yeah. I don't think so. Maybe I, feel free to add more context. I don't know. I don't think so.

Yeah. This, I haven't heard, I just Googled it, but it quickly looks like it's its own generative AI assistant. My guess is it's wrapping a bunch of things underneath that we don't necessarily have access to. I just know from experience that Bedrock seems to be the more enterprise-friendly way to get to LLMs in an organization.

Yeah. OK. I can also do some digging and get back to you about some of these questions as well. OK. Great. I hope you are all able to play around and look at the different models and get a feel for how different ones behave. These functions where you can list different models available from different providers are really helpful because they are always releasing new models. And to get the particular model, you need like a very specific string. So I'm calling these functions all the time so that I can get that string.

Multimodal input

So, we already kind of talked about this, and we're a little short on time, so I'm going to go through this pretty fast. We also have a package called vitals where you can write evaluations for models. This is very helpful if you have a particular task you want to do with an LLM and you're interested in comparing between LLMs to decide which model to use. This is an eval that we ran on R code generation to see how well the different models did. You can also use vitals to compare performance and cost.

OK. So now we're going to talk a little bit about multimodal input. So, like I mentioned earlier, modern LLMs don't just do text. They can also handle images as well as PDFs. This is called multimodal input. You might have heard that a picture is worth a thousand words, but for an LLM, a picture is roughly 170 tokens. Images get tokenized just like text.

OK. So ellmer includes functions for working with images and PDFs, and these have the prefix content_. You can use content_image_file() to pass an image from your local file system. This is useful if you have an image and you want to give it to the LLM and ask it to do something with it, for example, asking what it sees in the image. You can do the same with content_image_url() if the image is at a URL.

This was just supposed to show you how to use content_image_file().

Another useful thing you can do is pass the LLM PDFs. This is great if you have a giant PDF and you want the LLM to understand it and then give you information from it, summarize it, anything like that. To do that, you use similarly named functions: content_pdf_file() or content_pdf_url().
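As a sketch of both (the file names and prompts here are placeholders):

```r
library(ellmer)

chat <- chat_openai()

# Ask about an image on your local file system
chat$chat(
  "What do you see in this image?",
  content_image_file("plot.png")
)

# Ask about a PDF on disk (content_pdf_url() works the same for URLs)
chat$chat(
  "Summarize this document in three bullet points.",
  content_pdf_file("report.pdf")
)
```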

Structured output

OK. So now we're going to talk about structured output. This is a way to get LLMs to return data in a predictable format instead of free text. At some point in your R career, you might have encountered something like this, where you have a bunch of free text and you want to extract the data from it. This is pretty messy, but as a human you can see there are names and ages in here that you might want to get out. This is kind of hard to do in R: you have to write a lot of code, and it might look something like this. Very painful. And at the end of all that, you might still get a bunch of NAs just because the data is so messy.

The LLMs are great at this kind of task. Extracting data from extremely, you know, messy strings is very easy, generally, for an LLM to do. So with the LLM, you can just ask it to extract the name and age. And this is much simpler. Here we just set up our chat, and then we're just passing it the elements of that vector. And it gives us pretty reliable data.

But this isn't really all the way there. We might want an actual R object out of this, like a list. To do this, use the ellmer chat method chat_structured(). We pass it that first element of the vector, and it gives us back a list. This is very useful if you have very messy data and you want structured data out of it: the LLM will look through, pull out the relevant bits, and put them into your data structure.

If you want to be more specific about the type of data you get back, you use the type functions to provide a data structure specification. So you can be really specific about the kind of data you want. And then again, we call chat_structured(), and now we get exactly what we wanted, with a name element and an age element. (Oh, this is a character vector, sorry, not a list.) There are a lot of type functions you can use, so you can craft your specification exactly how you want it to be. These all have the prefix type_.

And you can also use the description argument in these type functions to tell the LLM a bit more about what this piece of data is.
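A sketch of the type specification plus description arguments just described (the input string is made up):

```r
library(ellmer)

chat <- chat_openai()

# Describe the structure we want back; the descriptions tell
# the LLM what each piece of data means
person <- type_object(
  name = type_string("The person's full name"),
  age  = type_number("The person's age in years")
)

chat$chat_structured("madison w -- age 27", type = person)
```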

Prompt engineering

Okay, so we're also going to skip this exercise. I don't know if I mentioned this earlier, but if you want to save a permanent copy of all of your work today, at the top of Posit Cloud you should see a little Save a Permanent Copy button, and you can click that. Then you can go back to these exercises and do them on your own.

But I want to have enough time to get through all the tool calling, and this is sort of, I'd say, less important. Okay, next we're going to talk about prompt engineering.

Okay, so prompt engineering is the process of, generally, getting LLMs to do what you want. We talked about this a little at the start when we asked the model to put things in the format of a haiku or a limerick, but we're going to go into more detail now.

Okay, so just a reminder: you can set the system prompt using the system_prompt argument in the chat function, either by directly passing a string or by putting your prompt in a file. The latter is generally what we recommend: put your prompt in a markdown file. This makes it easier for you, the human, to write, and for other humans to read. It lets you write a lot more text and structure it better. Then you just need to read it in some way and pass that to the system_prompt argument.
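A minimal sketch of the file-based approach (the path is a placeholder):

```r
library(ellmer)

# Read the markdown prompt file in and pass it as the system prompt
prompt <- paste(readLines("prompts/system-prompt.md"), collapse = "\n")

chat <- chat_openai(system_prompt = prompt)
```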

Okay, and I want to emphasize that you can get a lot out of just adjusting the system prompt. Generally, if you do something with an LLM and it doesn't work, here are some questions to ask yourself before giving up or trying something extremely fancy. First: did you use the best model? Are you using the latest models that are known to be good at the kind of task you want to do? Second: did you clearly explain what you wanted the model to do in the system prompt? Did you tell it exactly what you needed and provide examples?

As you'll see in a minute, examples are very helpful for the LLM. Generally, you can think about designing a system prompt just like giving instructions to a human. It's not really that different. There are some things LLMs like to have in the system prompt that you might not give to a human, but the process is generally the same: you want to be clear, provide examples, structure the prompt well, and be very explicit about what you want.

Okay, you might ask: what's the difference between the system prompt and the user prompt, meaning the questions the user themselves is asking? The short answer is that anything you want to persist, like background knowledge or instructions that should govern all of the LLM's responses, goes in the system prompt. Anything that's a one-off request can go in the user prompt, or you can tell the user they need to put it in their query.

Okay, some tips for writing system prompts. First, you can use LLMs to help write your prompts. Claude has a pretty good prompt generator that you can find if you search "Claude prompt generator," but you could also just use a general LLM and ask it to help you write a prompt; they're pretty good at this. The other thing is to add structure to your prompt: use markdown headings or XML tags. You can also use variables, which is what's in these little curly brackets here, to insert dynamic content into your prompts, or content that lives in other files. This can be a security concern, so just be aware and make sure you're not inserting content that's open to the public or that might have nefarious content in it.
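As a sketch of those curly-bracket variables, ellmer's interpolate() fills in {{ }} placeholders (the template text here is made up):

```r
library(ellmer)

# {{ }} variables get replaced with dynamic content
prompt <- interpolate(
  "You are a quiz host. Today's theme is {{theme}} and the date is {{today}}.",
  theme = "R packages",
  today = Sys.Date()
)
```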

Okay, like I said, we really recommend putting your prompts in separate files. This is better probably for the LLM, but also just for you as a human, especially if you're sharing those prompts or putting them on github or anything like that. And you can have multiple files, you know, either for different uses or one prompt that's reading in information from different files. This is a much better way to organize it than putting it all in one string in your script.

If you've done all this and it still isn't working, one tip is to force the model to say things out loud, or say things out loud in an LLM way, which is sort of like echoing information. This is helpful for getting the LLM to stop and think about what it's doing. When we talk about tool calling in a little bit, you'll see that one thing that sometimes happens is that LLMs will not call a tool even though you've asked them to, and might act as if they've called the tool when really they haven't. Asking the model to say what it has done or what it's going to do can be helpful for getting it to behave exactly how you want when there are complex rules or constraints.

If you want to learn more about prompt engineering, these are some useful resources. ellmer also has a prompt design vignette on the ellmer website. I can share the link to that in a second.

Okay, we have time for this. So now you're going to play around with system prompts. This is the quiz game exercise. Your goal is to teach the model to play a quiz game with you. I've started the prompt for you, and the entirety of this exercise is just writing in that markdown file. You're just writing text; there's no code. So any debugging you do is in the system prompt itself. Your goal is just to write words that get the LLM to do what you want it to do.

I will quickly show you what this game should look like so it's a little easier to understand. So it's this little quiz game. Your prompt should get the model to come up with five themes (they don't need to be these exact five themes). Then the user picks a theme, and the LLM asks trivia questions about it.

And then it gives another round. Okay, so your job is to put all this information in the prompt so it behaves like you want it to.

For the R version: you probably want a recent version of R. With a very old version, some things might not work as expected. That's just a general recommendation when working with new packages, not necessarily specific to this.

The question about prompt engineering, fine-tuning, and RAG is interesting. I'd say it's sort of like a ladder, like try prompt engineering first. Do not move directly to fine-tuning or RAG. If you really cannot do what you need to do with prompt engineering, then you would move on to one of those other methods. But in general, I'd say there's a lot that you can do with prompt engineering and moving to these more complex, higher investment techniques is better to do after you've exhausted what you can do with prompt engineering. And that might take some experimenting. There's not a clear divide between when you definitely can't do it with prompt engineering and you need to move to fine-tuning. You might have to do some experimentation.

The question about QueryChat and the metadata versus the underlying data: I'd say if you ever have security concerns, you should take those seriously with LLM things and verify with yourself and your security team at work, especially if you have sensitive data. But for QueryChat itself, part of what makes it very useful is that the LLM is only shown the metadata, and it can only do things it knows how to do based on that metadata. The data itself is not shown to the LLM in any way. There is, of course, an open chat window, so if you as the user paste in data, that will get sent to the LLM. But on the back end, QueryChat's LLM just sees the metadata, so it only knows how to construct SQL queries based on the format of the data: the column names, the types, how the data is structured. It's not seeing the actual data file.

OK. I hope everyone was at least able to get part of the game working. You can take a look in the solutions folder; there is a solution prompt so you can see how we structured it. It has some examples and it's very straightforward; there's nothing particularly fancy going on. The primary point of this exercise is just to show you how much you can control with the system prompt: things that you might be used to controlling with code, you can move to controlling with words in the system prompt.

Tool calling

OK. So the last 25 minutes, we're going to talk about tool calling.

OK. So as a recap, let's just think again about how LLMs work. This is going to be important for understanding why we need tools. So generally, with an LLM, you write some words, the LLM writes some words back to you, and then you do something with those words. You know, you use the code that it generated, you paste the poem somewhere, you use that information to inform what you're doing, that kind of thing. But this is all word based. So does that mean that LLMs can do things like access the Internet or run code, send an email or like generally interact with the world?

Let's try it out. What if we ask it something that would not be in the training data, like: who are the keynote speakers at R/Pharma 2025? I asked GPT-4.1 nano this the other day, and it says its training data only includes data up to October 2023, and it doesn't have access to real-time updates. So basically, it doesn't know and can't answer. What if we ask what the weather's like? Again, it basically says it can't provide real-time weather updates. Or if we ask something that seems very easy, like: what day is it? It doesn't know that either. This was yesterday, and it told me it was October 27th, 2023, which is the cutoff date from its training, and why it perpetually thinks it is October 2023. This is particularly interesting because the model didn't even say it doesn't know; it just gives you the wrong date.

This seems like a major limitation of LLMs, like they can't even tell you what the day is, and they can't interact with new things on the Internet. They don't know what the weather is like. But luckily, there is a solution to this, which are tools. So tools, you can think of essentially as functions. They are functions that we're giving the LLM that give it new abilities. You can use tools to give the LLM access to up to date or real time information, like, you know, real time weather. And you can also use tools to let the model interact with the world. You can build a tool that lets the LLM run code or interact with your file system, browse the Internet, that kind of thing. So generally, these are like add-on abilities for the LLM. You're giving it abilities that it wouldn't otherwise have.

How does tool calling work? We're going to walk through the process at a high level, step by step. Earlier, I asked what the weather was like in Minneapolis, and it said it didn't know. But what if the LLM has the appropriate tool? If I ask something like "what should I wear today in Minneapolis?" and it has access to some kind of tool that can provide weather information, it's going to request that that tool is called with, in this case, a zip code for Minneapolis. And this is the important part: the LLM itself is not calling the tool, it's not calling the function, it's requesting that the tool is called. Then your computer, a computer somewhere, runs that tool and pings the weather API, which sends back data about the weather.

The LLM receives that data, it takes a look at it, and then it sends you back what you should wear. I don't know if this is what you should wear, if it's 58 degrees, but this is what the diagram says, we're going to go with that. And this looks a little complicated, but all of this, like all the stuff in this little triangle here, this is all handled by ellmer. So you're going to see in a minute, you don't really need to worry about like what's happening with the communication back and forth to the LLM. The main thing you need to do is write the tool function itself.

And again, I want to emphasize that the LLM can't run code by itself. Like I've been trying to emphasize, the LLM is just text: it takes text in and, generally, provides text output. It doesn't have the ability to run code; it needs something like your R session to run code in for it to be able to do stuff like this. The LLM is not executing the tools on its own; it's asking you, or your computer, to run them. So then you might think: well, I could already do that. I could already run a function that pings a weather API on my own. Why do I need an LLM if it can't even run code?

And the answer is that the LLM can be very good at deciding when and how to call a tool. Instead of you needing to figure out what zip code to pass to get the weather, the LLM can do that for you. It can figure out when it needs to call get_weather() and when you're asking something that doesn't require knowledge about the weather. So the LLM is choosing when and how the tool gets called, and based on your request, it will ask for different kinds of tool calls. If you ask about San Francisco, it picks the appropriate zip code. You might also have more complicated tools with multiple arguments, and the LLM can pick all of those arguments for you. It can also call multiple tools if it needs to gather different pieces of information from different spots. So again, the sort of magic of what the LLM is doing is in how and when it calls the tool.

Okay, so let's take a look at an example. You have this demo in Posit Cloud as well, so you can run it on your own if you want. This is an app that's hooked up to a weather tool. Let's see how it works. Earlier, if you remember, I asked about the weather and it basically said "I don't know." Now let's ask: what's the weather in Minneapolis? And it tells us the weather is currently around 54 degrees. It's also showing us that it did a tool call. This weather function works a little differently than the one in the diagram, in that it takes a latitude and longitude instead of a zip code, but it's the same idea: the LLM has decided which latitude and longitude to pass to the function to get the data it wants, and it gets some information back. Because it's good at synthesizing information and producing responses, it takes all that in and crafts the response you see below. I think there are two tool calls here because it's doing it for slightly different latitude/longitude pairs.

Writing tools in R

Okay, so I'm just showing you how this works so you can get an idea of how an LLM might incorporate a tool call into its response. But now we'll take a look at the code, and I'm going to show you how to write tools in R. This might seem complicated — you're giving an LLM a new ability — but I want to emphasize that most of this is just writing R functions. If you can write a normal, everyday R function, you can write tools for an LLM, because the first step is just to write a normal R function.

If you want to provide the LLM with a tool that can get weather information through a weather API, you might write a function called get_weather that takes a zip code or some other kind of location information. I didn't fill this in, to make it easier to understand what's happening, but you would have some normal R code that sends a request to the weather API and gets data back. These should be functions that you can test out yourself: run them in the console with normal inputs, as a human. There's nothing really special about them. It's just an R function.
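As a sketch of that first step, the endpoint and response fields below are hypothetical stand-ins for whatever weather API you'd actually use — everything here is just plain R:

```r
library(httr2)

# Step 1: a normal R function. You can call this yourself from the console;
# the URL and response fields are hypothetical stand-ins for a real API.
get_weather <- function(zip_code) {
  resp <- request("https://example.com/weather") |>  # hypothetical endpoint
    req_url_query(zip = zip_code) |>
    req_perform() |>
    resp_body_json()
  paste0("Currently ", resp$temp_f, "F and ", resp$conditions)
}
```

The important property is that nothing about this function is LLM-specific: you can test it with ordinary inputs before ever wiring it up to a model.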

And then the next step is to define the tool. Now we're getting into the more LLM-specific things you need to do to create the tool. tool() is a function from the ellmer package, and it's how you turn a normal R function into a tool. You pass it the function — here this is just a stand-in for whatever function your tool is — and then the rest is basically documentation for the LLM about how to use your tool. You might notice that we're doing a lot of things that feel like documenting: you wrote a system prompt, which kind of feels like documentation, and now we're writing function documentation. I think there's a lot of crossover between getting LLMs to do what you want and providing useful information to help people use your functions or tools or products; often it feels very similar. So all of this is essentially documentation for how your tool works. You have a description, and then your arguments: a list of the arguments your tool function takes and their types. You can also say whether an argument is required with the required argument.

So this is the same thing, but for our get_weather example. Again, we pass the name of the function, get_weather, a description, and then a list of the arguments and their types. This function only takes one argument, the zip code, and we want it as a string.
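A sketch of what that definition might look like — note that ellmer's tool() interface has evolved across versions, so check ?tool for your installed release; the stub body here stands in for a real weather lookup:

```r
library(ellmer)

# A stub standing in for the real get_weather() function
get_weather <- function(zip_code) "54F and partly cloudy"

# Step 2: everything besides the function itself is documentation for
# the LLM -- what the tool does and what each argument means.
weather_tool <- tool(
  get_weather,
  description = "Gets the current weather for a US zip code.",
  arguments = list(
    zip_code = type_string("A 5-digit US zip code, e.g. '55401'.")
  )
)
```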

Okay, so we wrote a function and defined the tool. The third step is to register the tool. The chat object has a method called register_tool, and you just pass in that tool object. This tells the chat object that the LLM can use this tool — that it's available — and it passes along all of that information, all the documentation.
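Putting the registration step together with a tool like the get_weather one described above, a sketch might look like this (it assumes chat_anthropic() with an API key already set in your environment, and a recent ellmer tool() signature):

```r
library(ellmer)

# A weather tool as described above, with a stub body for self-containment
get_weather <- function(zip_code) "54F and partly cloudy"
weather_tool <- tool(
  get_weather,
  description = "Gets the current weather for a US zip code.",
  arguments = list(zip_code = type_string("A 5-digit US zip code."))
)

# Step 3: register the tool so the chat object (and the LLM) knows it exists
chat <- chat_anthropic(system_prompt = "You can look up the weather.")
chat$register_tool(weather_tool)

# The LLM now decides whether and how to call the tool
chat$chat("What's the weather in Minneapolis?")
```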

We have time to do this, so you're going to open Quiz Game 2. This builds on the quiz game you just wrote, and it already has a prompt, so you don't need to do anything with the prompt. There's a function in that app file called playSound that plays a sound when you call it. Your job is to create a tool that uses this function and then hook it up to the LLM — register the tool so that, in your app, the LLM can request that the tool is called and a sound plays.

I might call you back a little bit earlier than six minutes, but we'll start the clock anyway.

Hi, Sarah. Do you mind taking a question from the chat? I can read it to you. Yeah, sure. Jared's asking, what is your favorite LLM to use right now? Yeah, I was just answering that in the chat. I'd say I generally just use Claude Sonnet 4.5 for everything. That's partially because of how we have things set up at Posit, but also it's good at most things. That's the latest Claude Sonnet model.

I guess a follow-up to that: how much have you ventured into vibe coding recently? I think using LLMs to help you code is very useful. I'm wary of 100% vibes with no understanding or oversight — not that that can't work, but I like to make sure I still understand things. But if you're using tools like Claude Code or Codex to speed up your coding or help you write better code, I think it's very useful and can let you do things that you wouldn't otherwise have been able to do, which is pretty cool.

Very useful. I feel like there's a downside to everything being called "tools" — do you mean tools in the sense of the function tools we're hooking up to the LLM, or tools in the sense of LLM tools generally?

There are a couple of R packages that we won't have time to talk about today that are essentially collections of useful tools for R. There's btw — I'll get the link.

This is the btw page. It has a variety of tools that make it easier to do things in R. These aren't tools that I wrote, but they're useful tools stored in a package.

We're just going to move on just for time, but we might have a minute at the end for additional questions. I hope you all got this working, or at least were able to experiment with the code. Again, you can take a look at the solution if you got stuck.

Tools in Shiny

Oh, I think I was looking at the wrong timer — I might have cut you off early, but regardless, it's over now. Okay. This slide says "Tools in Shiny." You just wrote a tool for a Shiny app, so just some things to keep in mind: the tool definition goes inside the server function. A useful thing you can do is update reactive values with a tool — if something changes, you can write a tool that updates a table or a plot or something like that.

You can see this logic if you go through the querychat code; this is what's happening there. It's very useful because the user can say something in the chat, and a tool can then update reactive values, like a data frame.
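A rough sketch of that pattern — a chat-driven tool that updates a reactive value. This assumes shinychat's chat_ui()/chat_append() interface and a recent ellmer; the cylinder-filter tool is invented for illustration:

```r
library(shiny)
library(shinychat)
library(ellmer)

ui <- bslib::page_fillable(
  chat_ui("chat"),
  tableOutput("cars")
)

server <- function(input, output, session) {
  chat <- chat_anthropic(system_prompt = "You help filter a table of cars.")

  # A reactive value that the tool can update
  shown <- reactiveVal(mtcars)

  # The tool is defined inside server() so it can reach shown()
  filter_cyl <- function(cyl) {
    shown(mtcars[mtcars$cyl == as.numeric(cyl), ])
    "Table updated."
  }
  chat$register_tool(tool(
    filter_cyl,
    description = "Show only cars with the given number of cylinders.",
    arguments = list(cyl = type_string("Cylinder count: 4, 6, or 8."))
  ))

  output$cars <- renderTable(shown())

  observeEvent(input$chat_user_input, {
    chat_append("chat", chat$stream_async(input$chat_user_input))
  })
}

shinyApp(ui, server)
```

When the user types "show me the 4-cylinder cars," the LLM can call filter_cyl, the reactive value changes, and the table re-renders — the chat is driving ordinary Shiny reactivity.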

Okay. There are a lot of packages that we weren't able to talk about today — I know there were a lot of questions about querychat. I don't know if there's a sticker, but we also have a variety of open-source R packages for working with LLMs. We talked about ellmer, vitals, and shinychat, and I just mentioned btw. There's also ragnar, for RAG (retrieval-augmented generation). I have chatlas listed here — it's a Python package, but it fits into the landscape. And chores, which is another tool-collection package.

I encourage you to check out these other packages if you want to learn more.

Okay. And then I also have a link to querychat. We talked a bit about it, but it's a really useful package. There's also a package called ggbot, which lets you talk out loud to create and iterate on plots. It's useful to look at and play around with, and it's also a useful example of an app that uses speech to do something — you might extend it for your own uses.

We also have a tool — a tool in the general sense of the word — called Databot, which is an EDA assistant for Positron that I've linked here. And if you're interested in keeping up with AI news coming out of Posit: along with my coworker Simon Couch, I write an AI newsletter that we release every other week, and I have the link to it here. It's Posit news, but we also cover general AI news.

Okay. And yeah, thank you. This was great. I enjoyed answering all your questions, and I hope you got some use out of this workshop. Again, if you want to save a copy, click that "Save a Permanent Copy" button at the top of Posit Cloud. At the end of this workshop, unfortunately, we will shut off those API keys, so you'll need to get your own if you want to run the code. You store those in your .Renviron file.
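For reference, storing a provider key usually looks like the sketch below. The key name depends on your provider — ANTHROPIC_API_KEY is what ellmer's chat_anthropic() looks for; OpenAI models use OPENAI_API_KEY:

```r
# In your .Renviron file (e.g. opened with usethis::edit_r_environ()),
# add a line like this, then restart R:
#
#   ANTHROPIC_API_KEY=your-key-here

# ellmer's chat functions read the key from the environment automatically;
# you can check that it's set with:
Sys.getenv("ANTHROPIC_API_KEY")
```

Keeping keys in .Renviron rather than in your scripts means you can share or commit the code without leaking credentials.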

Closing remarks

Great. I guess we have one minute if there's anything I can answer, but otherwise, thank you for coming. This was great. Well, thank you so much, Sarah. We know how much work it is to prepare for these workshops, and we had a lot of attendees very engaged with all the material here. Again, a lot of thank-yous and kudos in the chat — well-deserved.

I'll put this in the chat one more time in case you missed the badge credential link. But otherwise, hopefully everyone can join me virtually in saying thank you so much, Sarah, for this great material. We'll make sure the recording of this workshop gets posted on the R/Pharma YouTube channel, hopefully in a month or so, along with all the great links that were shared throughout Sarah's presentation and in the chat. You may want to save that chat if you want those links on your own and don't want to wait for us. But yeah, Sarah, any last parting words for our attendees as they begin their LLM journey with open source?

I guess one thing, I tried to say this a couple times, but I would just encourage you to experiment with the tools and play around with it. I think sometimes it's easy to feel either like overwhelmed by LLM things or like, you know, it's kind of pointless because the LLM can just do things for you. But there really is a lot that you as, you know, an R user can do yourself, new things to build, new things to experiment with. So, I just encourage you to be curious and experiment with the tools.

Yes, it felt a little intimidating for me in the very beginning, but thanks to your colleagues at Posit and all of your material, you can start small and get pretty far pretty quickly. In fact, at last year's R/Pharma conference, I built an app that was heavily influenced by LLMs to generate random facts about haunted places, because R/Pharma was around Halloween. So there are lots of fun domains you can apply this to.

Well, okay. We'll wrap it up there, but thank you so much once again, Sarah. For those of you online, there's another workshop happening right now about the use of the Cardinal package for clinical reporting, and there's a whole great lineup tomorrow as well. So thank you so much, and we'll see you later at the R/Pharma conference. Thanks, Eric. Thank you.