Resources

How non-programmers can make sparks fly when using testthat during package development (L. McKenna)

Enemies to lovers: How non-programmers can make sparks fly when using testthat during package development

Speaker(s): Libby McKenna

Abstract: You’re just a data scientist, self-taught in R, trying to find your way in the world of package development. It’s just a package named testthat, hoping to help a developer make sure their package is operating as intended. You meet. You hate each other. The package seems a little daunting and quite frankly, a little tedious. Surely only “real” programmers use this! The package thinks you’re inept for not immediately putting it to use. As fate would have it, you attend this talk and discover you are indeed compatible. Sparks fly, a package is born. This talk will help less experienced programmers learn about testing, automated workflows, how to write good tests, and why it’s all worth it when it pays off in quality and efficiency. Steamy.

posit::conf(2025)

Transcript

This transcript was generated automatically and may contain errors.

Thanks, Sarah. So today I'm going to be talking to you about this common romance trope. To get into this, like any modern romance, you might end up on the dating apps. But since we're at posit::conf, we're going to use TindR.

So the reason I'm using Tinder is because I'm looking for a package to help me with my very first package development journey. I've only been using R for about four years, and I'm an environmental engineer. So that means while I do use some data science and statistics, it's not the main component of my job, so I don't identify as a real programmer. So getting into package development is a little bit daunting.

So let's get to swiping and see what packages are out there. The first one that comes up is plumber. plumber helps with web APIs and prototyping. And while that sounds really cool, I don't think that's going to help me with my plan for what I'm going to develop. So I'm going to leave that on CRAN.

But purrr, I know purrr. I love purrr. I'm definitely going to use the list mapping and functional programming with some of my helper functions. So I'll go ahead and install that.

Now testthat, I don't really know what testthat is. Let's look into this. It catches mistakes and requires perfection. That sounds daunting. It sounds like something only real programmers would use. And I would prefer to leave that on CRAN, because if any errors come up, I can Google it. I'm a perfectly capable programmer in that respect.

But the thing is testthat comes highly recommended by the package development book by Jenny and Hadley. And when a friend, or a friend of a friend of a friend, recommends something, it can't hurt to give it a shot. So I'll go ahead and install this and get to know testthat a little more.

What testthat does

So testthat's been around for about 15 years, and its job is to be a code tester. So you want to use this during package development to test your code for specific conditions, like warnings and errors, or test for expected outputs, like specific numbers.

The thing is, you're probably already doing testthat's job. So as an example, we're going to look into what I've been developing for our package. This is tidywater. It's on CRAN if you want to check it out. As environmental engineers, we work with drinking water treatment plants to optimize their treatment processes to provide better water quality to the general public.

So this first function here, define_water(), is what we start with for every single modeling scenario. And this creates a water. This is like making a cup of water with specific parameters like pH, temperature, and alkalinity. I can then pipe my define_water() output into other models. In this case, chemdose_ph(), which calculates the pH and alkalinity after we dose a chemical. Here we're dosing alum. We dose this at the beginning of most treatment processes to coagulate out a bunch of dissolved material in the water that you don't want to ingest.

So if I want to test that this chemdose_ph() function is operating as expected, I might run this code chunk and then say, what's my water's alkalinity? And I can see that it's dropped from that 33 starting value down to 17. But with testthat, all I have to do is wrap all of this code in a test_that() function and label it so I remember what I'm testing. So: test that chemdose_ph() returns the expected values. Then I replace that manual alkalinity check with a testthat function like expect_equal(). And here I'm expecting that my calculation from that code chunk equals a specific number, 17.87.
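In code, that before-and-after looks something like this sketch. Note that dose_alkalinity() is a made-up stand-in with a fabricated coefficient, purely for illustration; the real tidywater chemdose_ph() does actual equilibrium chemistry.

```r
library(testthat)

# Hypothetical stand-in for a tidywater-style calculation. The real
# chemdose_ph() models real chemistry; this toy just subtracts a
# made-up linear effect so the test pattern is visible.
dose_alkalinity <- function(start_alk, alum_dose) {
  start_alk - alum_dose * 0.3  # fabricated coefficient, illustration only
}

test_that("chemical dosing returns the expected alkalinity", {
  result <- dose_alkalinity(start_alk = 33, alum_dose = 50)
  # expect_equal() compares with a small numeric tolerance by default,
  # which is usually what you want for floating-point chemistry results
  expect_equal(result, 18)
})
```

The manual workflow (run the chunk, eyeball the number) becomes a labeled, re-runnable expectation.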

So testthat seems kind of easy to use, kind of interested so far. And if you're interested too, you can get started at this QR code, and that will take you to that package development book by Jenny and Hadley, specifically to this testing basics. It's a little bit more involved than just doing install packages, testthat. But you can go to this link, and it'll tell you the exact steps to get through, and it's not too difficult.

First date: testing inputs and outputs

So to get to know testthat a little bit more, we're going to go on a few dates. Our first date will be a coffee date: super simple, get to know testthat a little superficially. Then we'll go on a second-date hike where we start to trust testthat a little bit more. And finally, we're really going to commit to testthat with a late-relationship dinner with the parents.

So on this first date, we're just going to test function inputs and outputs and understand some of the functions inside of the testthat package. This first test is testing that define_water() warns when no pH is input. Every water will have a pH, and because of that, all of our downstream models require pH. So if I don't include pH in my define_water() call, I should get a warning, and the reason it's giving a warning is because in my define_water() code, I have a conditional that says, if you're missing pH, throw a warning.

But you can see I have a second warning on the bottom if I'm missing alkalinity. So if I want to be really specific and make sure that the correct warning is being thrown, the one related to pH, I can add some regex, or I could add the entire warning message. Similarly, you can use expect_error(). In this case, I've entered pH as a character instead of a number. So expect_warning() and expect_error() are really useful for evaluating the outcomes of your conditional statements. They're also useful for figuring out when your user is breaking your code.
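A minimal sketch of that pattern. define_water_sketch() here is a hypothetical stand-in, not the real define_water() signature:

```r
library(testthat)

# Hypothetical toy version of a define_water()-style constructor,
# just to show where the warning/error conditions live.
define_water_sketch <- function(ph = NA, alkalinity = NA) {
  if (is.na(ph)) warning("Missing pH; downstream models require it.")
  if (is.na(alkalinity)) warning("Missing alkalinity.")
  if (!is.na(ph) && !is.numeric(ph)) stop("pH must be numeric.")
  list(ph = ph, alkalinity = alkalinity)
}

test_that("define_water_sketch flags the specific bad input", {
  # regexp pins the expectation to the pH warning, not the alkalinity one
  expect_warning(define_water_sketch(alkalinity = 33), regexp = "pH")
  # a character pH should hit the error branch, not just a warning
  expect_error(define_water_sketch(ph = "7", alkalinity = 33),
               regexp = "numeric")
})
```

Without the regexp argument, expect_warning() would pass as long as any warning fires, which can hide the wrong condition being triggered.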

Next we have testing that define_water() simply works. testthat has a lot of really intuitive function names. Take expect_s3_class(): we expect the code inside of this function to output an S3 class. You can also expect an S4 class or an S7 class. There's expect_true(): here I'm saying my water's pH should be a numeric. There's also an expect_false() function. And finally, there's expect_lt(), expect less than: my pH should be less than 14. Similarly, there's an expect_gt(), which means expect greater than. So you can see these testthat functions are really intuitive, easy to understand without even getting into the documentation too much.
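Those expectations look something like this. make_water() is a toy S3 constructor standing in for tidywater's water class, not the real implementation:

```r
library(testthat)

# Toy S3 object standing in for tidywater's "water" class (hypothetical).
make_water <- function(ph = 7.5) {
  structure(list(ph = ph), class = "water")
}

test_that("the water constructor returns a sane object", {
  w <- make_water(ph = 7.5)
  expect_s3_class(w, "water")    # output has the expected S3 class
  expect_true(is.numeric(w$ph))  # pH is stored as a number
  expect_lt(w$ph, 14)            # physically plausible upper bound
  expect_gt(w$ph, 0)             # and lower bound
})
```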

Finally, we have what I showed you earlier. You can write a whole chunk of code inside of a test_that() function and then evaluate the outputs from that code chunk. In this case, we're using expect_equal() again, but you'll notice I've set that expected alkalinity to 200, which is really high. So we should be getting a test failure. So we're going to take a little sidestep into how testing is useful for error catching.

So in your console, you can run devtools::test(). What this does is start running through all of your testing code. And what's going to show up is a yellow F for failure, a pink W for warning, and an S for skipped. In this left column, you can see the number of tests that are run for each function you've developed. So I ran 33 tests for balance_ions, 25 tests for biofilters, and every single one of those tests, my team manually wrote.

When it gets to the define_water() function at the end there, it's run 50 tests, but one of them has failed. And it's gotten even more granular and told me which test failed and why it failed: we expected that the calculation should be 200, but my code was outputting 17. So this helps us isolate the problem. If things go wrong, I don't need to manually go through all of my code to figure out where my error is. If I've written good tests, I can go straight to this code and understand what's going wrong.

Second date: testing interconnected functions

So this is pretty cool. What else can testthat do? So on a second date, we're going on this hike. We're trusting that it's not going to push us off a cliff in the wilderness, but rather look for obstacles and prevent us from tripping over them.

So as I mentioned earlier, we always start with define_water() in our package, creating that first cup of water. And into that cup of water, I can dose chemical A and chemical B and get a resulting pH and alkalinity. But there's another way I can get to this same answer. In one cup of water, I can dose chemical A. In a second cup of water, I can dose chemical B. I can mix them together using our blend_waters() function, and I should get that same pH and alkalinity.

So this shows how your package can be really interconnected. And we really want our tests to fail when something breaks. If something changes in that upstream function, like chemdose_ph(), I want my blend_waters() to either reflect that change, or, if it doesn't, my test should fail.

So what does this look like using testthat? In this chunk of code, we're going to test that blend_waters() handles our equilibrium chemistry correctly. First, I make that starting water quality. I dose two chemicals into that first cup. Then I dose them separately and blend them using blend_waters(). And then I expect that my pH from the doubly dosed water should be the same as my pH from the blended waters. So this is a step up from testing hard-coded values. Instead of saying my pH should be 7 — why is it 7, what does 7 mean, what else can I compare this to? — we compare two independent computation paths. This is really helping us make our package more resilient and account for the interconnectivity.
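The two-path pattern can be sketched with toy functions. dose() and blend() here are made-up linear stand-ins, not the tidywater API; in this toy model, doubling each dose in the split cups keeps the total chemical mass equal across the two routes, so they should agree:

```r
library(testthat)

# Hypothetical stand-ins: dosing lowers alkalinity linearly, and
# blending two equal-volume cups averages their alkalinity.
dose  <- function(water, amount) list(alk = water$alk - amount)
blend <- function(a, b) list(alk = (a$alk + b$alk) / 2)

test_that("blending separately dosed cups matches dosing one cup twice", {
  start <- list(alk = 100)
  # Path 1: dose chemical A then chemical B into a single cup
  one_cup <- dose(dose(start, 10), 20)
  # Path 2: dose each chemical (at double strength, since the final
  # volume doubles) into its own cup, then blend the two cups
  two_cups <- blend(dose(start, 2 * 10), dose(start, 2 * 20))
  # No hard-coded answer here: the two computation paths must agree
  expect_equal(two_cups$alk, one_cup$alk)
})
```

The expectation compares one function's result against another's rather than against a magic number, so a change to the upstream dosing logic that breaks consistency will fail this test.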

Late relationship: committing to tests during refactoring

So now that we know that tests can catch hard-coded values like numbers, test our conditional statements, and test the interconnectivity, we can really start to commit to testthat and see how far we can fall.

So here we're in the late relationship stage, and we're just going to trust that our tests catch everything. In the example here, we're going to do a refactor. Refactoring is just changing your code, maybe to be more organized, but your outputs shouldn't change. This might mean changing your naming consistency or adjusting your functions to reflect tidyverse standards.

The risk is that you could introduce bugs, and hopefully your tests will catch that. In our case, we needed to really speed up our package. It was very slow when we were running thousands of scenarios. To do this, we changed all of our functions from tidyverse functions to base R. The thing is, our entire team learned R using the tidyverse, so we're a little rusty on base R. So you can imagine there's a lot of room for errors to sneak in.

So as an example, we have this code chunk here where I've highlighted the different tidyverse functions that we know and love. When we change that to base R, suddenly there are three separate lines of code, with a lot of brackets and dollar signs. The thing is, as I'm running this, and I'm a little unsure about base R, I'll make sure that this works by running each line of code. And it does: I don't run into any errors. The problem is, when I run my tests, I do get a failure. And that's because this merge() is replacing a left_join(). The merge() default behavior is an inner join, and to get that left join behavior, I need all.x = TRUE. So my tests were catching that the merge() default was dropping data that a left_join() would have kept.
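The merge() pitfall described above is easy to demonstrate in base R with toy data:

```r
# Base R merge() defaults to an inner join: rows of x without a match
# in y are silently dropped. all.x = TRUE restores the left-join
# behavior of dplyr::left_join().
x <- data.frame(id = c(1, 2, 3), ph  = c(7.0, 7.5, 8.0))
y <- data.frame(id = c(1, 2),    alk = c(33, 40))

inner <- merge(x, y, by = "id")                # drops id 3 entirely
left  <- merge(x, y, by = "id", all.x = TRUE)  # keeps id 3, alk is NA

nrow(inner)  # 2
nrow(left)   # 3
```

No error is raised in either case, which is exactly why line-by-line spot checks miss it and a test comparing row counts or expected values catches it.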

So my point is that you can code along, never see an error, and still miss edge cases that your tests should catch. You might also be lulled into a false sense of security while you're coding: I don't need to run my tests, my code's working fine, it's not breaking.

And this is where you can add automated tests. When you automate your tests, you can write a GitHub Actions workflow that will execute on a push or pull request. For us, we have it set up such that when you create a pull request, you cannot merge that pull request until all of your tests and checks are passing.

So down here, you can see there are two commits. One has a red X; one has a blue check mark. The red X means that some checks were not successful. This first red X is our R CMD check. This is a very standard check for your code. In addition to running your tests like we've been talking about, it executes some extra checks: it might check your documentation, it checks that your dependencies are set up correctly, and you need to pass R CMD check if you're planning on submitting to CRAN. The second check that's failing is the test coverage, and that's the equivalent of running devtools::test() in the console. If you are failing these tests, you can click on the details link there, and it'll take you right to where your code is failing. So similar to running devtools::test() or devtools::check() in your console, it's all here on GitHub and documented.

The thing about tests is that they're very frustrating sometimes. These are a bunch of commits that are failing. There's a lot of red Xs. And this is where we finally get to that blue check mark. So if you start to implement tests, you should know that it might take a lot of failure before you see success. But imagine doing this without tests: you might think you're fixing something and break something else in the process. Your tests help you make sure that your code is really robust.

And this is where I started to feel like a real programmer. I felt like our package was not built on a house of cards, where any change might crumble everything to the ground and we can't add any new cool features because it might break everything. Tests made us feel like it's very robust and very reliable.

Takeaways

So my takeaways are that you should use testthat at the beginning of package development because your tests help you manage complexity as you get your package more interconnected and complicated. It's easier to build coverage as you go and a little harder to reverse engineer that later. You might miss out on those different edge cases. And building tests into your package allows you to focus on cool development opportunities. For us, that meant being able to speed up our package significantly by refactoring without having to worry too much about the consequences.

For you, it might look like an open-source project where you can crowdsource cool ideas and contributions from people who aren't as familiar with the intricacies of your package and not worry about how they might break your package. Your tests should catch that.

So overall, I was wrong to judge testthat so harshly. Luckily, we committed to the package early and developed a really positive relationship with testthat and with package development. So I'm hoping that if you're considering your first package or you're in the early stages of developing a package, that you too can find love with testthat and start to feel like more of a real programmer. Thank you.

Q&A

We have time for one question. There it goes. Have you found yourself writing code differently once you've started implementing tests?

I don't know about writing code differently, but I do notice that every time I test my code, I run over and document it with a test.

Okay, maybe another question. What is your advice about when to write tests?

Just as you develop. So as you're writing your code and you're doing those typical checks like, did this output what I expected it to? Go write a test for that. If you have conditional statements where you have a bunch of if-else, evaluate every single one of those and make sure they work. For us, we had a bunch of different chemicals we can add to each function, so we want to make sure each chemical is behaving with the expected chemistry. Yeah, so make sure you're as covered as possible.

One more quick question. What was the most difficult thing to learn about testthat?

I think it was really difficult just to get into the mindset of using tests. When we first started, our project lead was like, if you're writing a function, everyone's writing their own tests. I'm like, what does that mean? That sounds so intimidating. That sounds like extra work. But once you do a couple tests, you realize that's actually really easy. I'm already doing this.