Resources

Small boosts here and there - Simon Couch

Rather than writing an entire R package or carrying out a data analysis in one fell swoop, I'm interested in large language models (LLMs) doing things for me that I don't like to do: tedious little refactors, transitioning from deprecated APIs, and templating out boilerplate. Chores, if you will. This talk introduces chores, an R package implementing an extensible library of LLM assistants to help with repetitive but hard-to-automate tasks. I'll demonstrate that LLMs are quite good at turning some 45-second tasks into 5-second ones and show you how to start automating drudgery from your work with a markdown file and a dollar on an API key.

Oct 11, 2025
14 min

image: thumbnail.jpg

Transcript#

This transcript was generated automatically and may contain errors.

So, this is a talk about the sort of other end of the spectrum of turning 45-second tasks into 5-second tasks with LLMs. If you're the sort of person who in the past has been frustrated by this process of steering LLMs and correcting their errors, this might be the time to revisit and explore what it feels like for LLMs to be very good at small, specific tasks.

So for an example, let's imagine we're writing this function in R. It's a silly little wrapper around Sys.getenv(): we grab the value of an environment variable, and if that environment variable isn't set, we raise an error that says we can't find it.
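The transcript doesn't show the actual code, but a minimal sketch of the kind of wrapper being described might look like this. The function name and error message here are assumptions:

```r
# Hypothetical sketch of the wrapper described above; the real
# function's name and exact error message may differ.
key_get <- function(name) {
  value <- Sys.getenv(name)
  if (identical(value, "")) {
    # Sys.getenv() returns "" for unset variables, so raise an error
    stop("Can't find environment variable `", name, "`.")
  }
  value
}
```

Calling key_get() with the name of a set variable returns its value; an unset name raises the error.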

So Matthew queued this up really nicely. What do we not like to do? Write documentation. In my gig as an R developer day-to-day, writing documentation is something that I do all the time. There are many parts of writing documentation that I think are pretty interesting, like eliciting the connections between different functions in my packages and the packages that I'm interfacing with. But there's a lot of parts of it that are pretty boring, like templating out boilerplate.

Introducing the chores package

So today I'm going to be introducing the chores package, which helps you automate hard-to-automate, sort of fuzzy tasks using a markdown file. So in this example, I have this key_get() function, and I need to start writing Roxygen documentation.

So in this video, I highlight the function, press a key command to pull up a small Shiny applet, select the Roxygen helper from inside that applet, and a template of the documentation begins streaming in. Something like this already exists inside of RStudio and will be coming soon to Positron: the "Insert Roxygen Skeleton" command, which gives you bare @param name and @param error_call entries. We can get a little further along with the chores package, because the LLM can actually infer the formats of the arguments to that function.
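For reference, the kind of templated output being described might look like the following roxygen block. This is an illustrative sketch, not the actual output from the demo:

```r
#' Get the value of an environment variable
#'
#' @param name A single string: the name of the environment variable.
#' @param error_call The environment from which the error should be
#'   signaled when the variable is unset.
#'
#' @returns The value of the environment variable, as a string.
#' @export
```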

So this prompt is sort of tuned to give me the least amount of documentation possible while still being thorough enough that I don't want to delete anything. I don't want to read through a bunch of stuff that's not exactly how I would write it. This isn't complete documentation, but it's just complete enough that all of it seems reasonable and I don't want to delete any of it; no reading through slop, if you will.

So this is a talk about the chores package. The chores package is intended to help you with repetitive, hard to automate tasks. They're sort of like smart RStudio snippets.

How chores works

So let's talk a little bit about how chores works. If you were tracking along with Hadley's keynote yesterday, you might actually find this somewhat intuitive. So the first thing that you do as a user is make a code selection; in our example, I selected the full body of that key_get() function. And then I trigger an add-in and select a helper.

So I highlight the function, and once you've installed the chores package, you'll have an RStudio add-in available to you. That add-in can be configured to any key command; unfortunately, it would be kind of naughty for me to just set that up for you, as much as I wish I could as a developer.

And so once I press that key command, I have a small applet. You can see there are three default selections in the chores package, called testthat, roxygen, and cli. These are references to three different tasks that I do almost every day in developing R packages. But you can add a helper to this list by just creating a markdown file.

And so, next: each of these helpers corresponds to a system prompt. Hadley introduced this idea yesterday; a system prompt lets you steer the behavior of a model to do something specific. So in this example, I'm saying: you're a terse assistant. I'm going to show you some function code. Just respond with Roxygen documentation, and so on. The nice part about working with chores is that once you've typed this out, interacting with the package isn't like a chat at all. You're literally pressing a key command, and your interaction with the LLM is done.
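A helper's system prompt lives in a plain markdown file. It might read something along these lines, an invented example rather than the package's actual roxygen prompt:

```markdown
# Roxygen helper

You are a terse assistant. I will show you some R function code, and
you will respond with a roxygen documentation template for it. Infer
argument types from how each argument is used in the function body.
Respond with only the roxygen comments: no explanation, no code fences.
```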

Then that prompt, plus the selection you've made inside of your code editor, is sent to the LLM. So if you're an ellmer user, it looks sort of like this: you initiate a chat object. The chores package is compatible with any model that's compatible with ellmer, so that could be Anthropic's Claude, OpenAI's GPT models, and so on. Then the code that you select inside of your editor is sent to the model, the model responds with, in this case, some Roxygen documentation, and finally that response is written directly into your source editor using the RStudio API.
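In code, that round trip might look roughly like the following. This is a sketch assuming ellmer's chat interface, with an abbreviated placeholder prompt; actually running it requires an Anthropic API key:

```r
library(ellmer)

# Create a chat object carrying the helper's system prompt
# (abbreviated here for the example)
chat <- chat_anthropic(
  system_prompt = "You are a terse assistant. Respond only with roxygen documentation."
)

# The code selected in the editor is sent as the user turn; the
# model's response streams back and chores writes it into the
# source file via the RStudio API.
selection <- 'key_get <- function(name) Sys.getenv(name)'
response <- chat$chat(selection)
```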

So the interesting thing about this particular part of how this package works is that this is all happening in R code. So if you're an RStudio user, this works in RStudio. If you're a Positron user, this works inside of there. There's an implementation of the RStudio API that several other IDEs have made. And so if you're interested in whether this package works there, you can just try it, and it might.

Use cases and custom helpers

So I called out that this package supports a couple of things that I do every day as a package developer. But for any repetitive task in your day-to-day work in your IDE: if you can describe it in a markdown file, and the context sufficient for resolving a given request is just a code selection, then you can make that happen with chores.

So the first one that I showed you was templating out Roxygen documentation. You can also transition erroring code to use the cli package. So if I have a call to stop() or message() or warning(), sprintf(), take your pick, any of those can be converted to this more modern interface that the tidyverse team has been transitioning to. And the same goes for testthat code. So a few years ago, the third edition of testthat was released, and it was so painful to convert packages over to it. We're really happy to have this as a resource now.
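As a concrete illustration of the cli transition, here's the kind of before-and-after such a helper would produce. The specific check and messages are invented for the example:

```r
# Before: a base R error
check_positive <- function(n) {
  if (!is.numeric(n) || n <= 0) {
    stop("`n` must be a positive number, not ", class(n)[1], ".")
  }
  invisible(n)
}

# After: the equivalent using cli's structured errors and
# inline styling
check_positive_cli <- function(n) {
  if (!is.numeric(n) || n <= 0) {
    cli::cli_abort(
      "{.arg n} must be a positive number, not {.obj_type_friendly {n}}."
    )
  }
  invisible(n)
}
```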

Choosing a model

So again: if you can describe a coding task that you do all the time in a markdown file, and all the additional context needed to say "this is the specific instance I'm working on this time" can be encapsulated in a code selection, then you can make it a chore.
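Per the chores documentation, scaffolding a new helper is a one-liner. A sketch, with the argument values invented for the example; check the package docs for the exact interface:

```r
library(chores)

# Scaffold a markdown prompt file for a new helper. "replace" means
# the helper's output replaces the selected code (as opposed to being
# prefixed or suffixed). The slug is a made-up example name.
prompt_new(slug = "boilerplate", interface = "replace")
```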

This is the part of the talk where I tell you that you have to put your credit card information down to use my R package, which is like the lamest part of starting to work on language models now. But hopefully soon, this won't be the case. I'll talk a little bit right now about what the landscape looks like. Once you download the chores package, you will have to make the decision of what model you want to use to power it.

Okay, familiar territory, we're on the XY plane. On the X axis, we have the price per 100 refactorings. So if I generate Roxygen documentation 100 times, this is the price that we're looking at. And on the Y axis, we have an evaluation score. So this is based on an evaluation of models specifically for sort of being the engine of the chores package. And there are two important pieces for a model to be able to effectively carry out chores. One of them is that they respond very quickly.

So kind of the current frontier of LLM research right now is reasoning. But when models take two or five or ten seconds to reason, you're just waiting in your IDE for something to happen, which is an unpleasant experience; for this use case, the snappier, non-reasoning models tend to perform better. And then, also, models need to be able to follow instructions really well. So I had that super long system prompt where I said: here's a bunch of examples of how you can carry out this task; be terse; just do that. There are a lot of models, especially local Llama models right now, that will just, like, describe rlang to you inside of your source editor.

So we're looking at how well different models do on this eval. You can see in the top right, we have Claude Sonnet 4 coming in something like $1 per 100 refactorings. So especially compared to these more agentic processes that we're starting to see these models capable of, this is like pretty cheap in comparison. But cheap is relative to your perspective.

In a sort of close second is Gemini 2.5 Pro. It's a thinking model, but it's actually pretty snappy, so it does well on this eval. And there's a sort of drop-off from there in terms of price. There are two points here for Gemini 2.5 Flash: one of them is the thinking version, one is the non-thinking version. The one that does better and is cheaper happens to be the non-thinking one; that's just a chart crime.

You might notice that there are no local Llama models on this chart. I'm doing my best. We're so close; we're almost there. I really wanted to be able to mic-drop here with "you don't need to put your credit card info down," but we're not quite there yet.

Earlier this week, OpenAI dropped GPT-OSS, which is a model that I can run on this laptop. And actually, on the correctness side of the evaluation (can it write good Roxygen documentation?), it nails it. But it takes something like 30 seconds per response. So, almost there.

The other way to get around putting your credit card information down to try my R package is to use GitHub as a provider. They have a sort of generous free tier. The exchange is that any information you submit to them is theirs, and they're going to do whatever they want with it. So it's only free in that sense.

Here is the more screenshot-able table. So, I did write this package, and as I was writing it, I was using Anthropic's Claude, which is the default model when you call chat_anthropic() with ellmer. If you just want to get a sense of whether this is something you're interested in using, put $0.50 on an API key and give it a whirl. I think that will give you the best results and show you what's possible there.
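Configuring which model powers chores happens through an R option pointing at an ellmer chat object. The option name below follows the chores README at the time of writing and may change; verify against the current docs:

```r
# In .Rprofile: tell chores which ellmer chat to use as its engine.
# chat_anthropic() defaults to a current Claude model.
options(.chores_chat = ellmer::chat_anthropic())
```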

Again, with chat_github() you have access to some really good leading models; usually as OpenAI puts out new GPT releases, they're available on GitHub within a few days. For an ultra-low-budget but higher-privacy option, you can use GPT-4o-mini from OpenAI. Before I went out of office five days ago, this was the newest snappy model from OpenAI, and of course five business days is like a decade in LLM time. Don't use Llama for now, sorry.

If you want to keep track of this process, where I'm trying to find cheaper models and local models that we could run on our laptops to do these sorts of smaller tasks: I'm developing an evaluation called the chores eval, which is what generated the data behind the graph I showed you a second ago.

So if you'd like to learn more, the chores documentation is a good place to start. You can install the package from CRAN using the regular degular install.packages(). This repository, github.com/simonpcouch/usr-25, has the source code for these slides as well as some links out to various resources. I'll just end by saying I'm super stoked to be here. This is my first useR!, and it came together sort of last minute to be able to come out here, so I'm super grateful to be here and having so much fun already.