Resources

Amanda Gadrow | Getting it right: Writing reliable and maintainable R code | RStudio (2019)

How can you tell that your scripts, applications, and package functions are working as expected? Are you sure that when you make changes in one part of the code, it won't break something in another part? Have you thought deeply about how the consumers of your code (including Future You) will use it, maintain it, fix it, and improve it? Code quality is essential not only for reliable results but also for your script's maintainability and your users' satisfaction. Quality can be measured in part with targeted testing, and fortunately, there are several effective and easy-to-use code testing tools available in R. This talk will discuss some of the most useful testing packages, covering both concepts and examples.

Materials: https://github.com/rstudio/rstudio-conf/tree/master/2019/Testing_R_Code--Amanda_Gadrow

About the Author

Amanda Gadrow is a software engineer with many years' experience writing automated test frameworks for enterprise software. She started learning R when she joined RStudio in 2016, and has been basking in its glory ever since. Amanda leads the QA and Support teams, and spends a significant amount of time analyzing customer data to improve the products and optimize support. She is a co-organizer of R-Ladies Columbus, and an avid musician on the side.


Transcript

This transcript was generated automatically and may contain errors.

Let's start by making sure we understand why we're writing code in the first place. What we're trying to do is data science through code. We're trying to answer questions and make predictions that are difficult to do manually, so we automate it in the code. Ideally we do so in a way that is reproducible and easily augmented and that facilitates collaboration and outcome distribution.

And if we come up with some code that we think would be useful to others, we can share that too, by writing a package for it.

So code is written in the service of analysis. It's a means to an end, but it's also an artifact. This is really important. Your code is not temporary, at least not if you ever want to run it again. So we should not be treating it as an afterthought. It should be a primary concern because you are going to need to run it again someday and you need to trust that it will produce accurate results every single time you do run it.


What we want from our code

So since the code behind our analysis or prediction is important, we should know what we want out of it. Ideally we want reliability, the ability to depend on our code producing the correct results, and we want confidence that our functions won't break when new data comes in, and that a change to one area of the code won't break another area. We also want reproducibility, so that you and others can run the same code with the same data and get the same results every single time.

Flexibility goes hand in hand with extensibility. In many cases we want to be able to handle different data sets, inputs, and contexts, and we need to be able to extend what we've already written to make incremental improvements to the code in our projects. We want our projects to have longevity, to be relevant for the long term, and to accomplish this it has to be straightforward to update. We want to make it easy to iterate on the code, since small changes over time are easier to absorb than a huge code drop, in much the same way it's easier to put $20 a week away for a college fund than to find $1,000 to drop all at once at the end of the year. Finally, we need the code to be scalable: flexible enough to handle more users, additional variables or inputs, and potentially larger data sets as well.

So the overarching need here is to be able to trust the outcome of our analysis, which is directly related to the quality of the code that is producing the results.

Measuring code quality

Code quality does matter, and we should be examining it to make sure it's up to snuff. How do we do that? Well, we get some feedback during script execution in the form of console output, logs, errors, warnings, and things like that when the code runs locally. We also get user feedback from the people running the script or launching a Shiny application. We could wait until it breaks in the field, but it's better to check it yourself before you release it to the rest of the world.

The best way to do this is to create tests to verify the quality of the code. Tests check the output of the script to make sure it's doing what it should. The tests themselves, unit tests especially, are separate R files that are designed to exercise the functional code to see how it handles various inputs and integration points. By integration points I mean pieces of code that rely on the output of other pieces of code, like a function that relies on the output of another function. You can also examine how quickly the code runs and how well it runs under load, to gauge its performance.

One of the biggest benefits of testing is that you're able to make faster updates to your projects: if you run your tests after you make changes, you'll get quick feedback on the impact of those changes. The act of writing the tests increases your visibility into internal dependencies, which are those integration points I just mentioned. And you'll potentially gain a greater understanding of the meaning and the intent of the code, because you really need to understand the code extremely well in order to test it effectively.

Ideally, when you write these tests, you'll want to cover all of your code paths. Practically speaking, though, start with the major functions, inputs, and integration points. You're not going to be able to test everything all at once, so focus on the parts that you think may be the most fragile. Once those major areas are covered, you can expand to other parts of the code.

Manual vs. coded tests

Manual testing is what you do as you develop your code: you run things in the console, spot-checking to make sure that what you're writing is actually producing the results you expect. Those checks are valuable and you should run them, but the problem is that you won't remember what you ran three months later when you go to change the code. So it's really easy for things to break: areas that used to work might fail because you don't remember which checks you ran the first time to prove they were working.

So to avoid this problem, you should convert some of those manual tests into a set of coded tests. Coded tests are called unit tests in this context because they exercise small units, or pieces, of a larger whole. You'll want to run these often, because you'll get feedback right away when you make changes. If you find an issue several weeks later, after you've moved on to other things, tracking down the cause of that problem is really difficult. It's much better to build these tests in from the beginning if you can, rather than to wrestle with the fallout later. Writing tests now rather than waiting for bugs to appear is the classic case of an ounce of prevention being worth a pound of cure.

Getting started with testthat

So now that I've convinced you that writing unit tests is a good idea, let me show you how easy it is to get started. One of the most popular and useful packages for R code testing is testthat. It's part of the tidyverse, and it provides functions that make it easy to describe what you expect a function to do, including catching errors, warnings, and messages. You can use the functions in testthat to exercise any project, but here I'm going to give an example of testthat on a package.

To get set up, it's three easy steps. You install the testthat package. You also want to install usethis, a workflow management package that makes it easy to get going with projects. In this case, it has a really nice function called `use_testthat()` that sets up the infrastructure for your tests. Then, to write the tests themselves, you use the `use_test()` function from usethis, which will create and open your very first test file.
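The three steps above look roughly like this in the console (a minimal sketch; the test-file name "add" is just a placeholder):

```r
# Step 1: install the testing and workflow packages
install.packages(c("testthat", "usethis"))

# Step 2: from within your package project, set up the test
# infrastructure; this creates tests/testthat/ and tests/testthat.R,
# and adds testthat to your package's Suggests field
usethis::use_testthat()

# Step 3: create and open your first test file; this one would be
# saved as tests/testthat/test-add.R
usethis::use_test("add")
```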

So let's see how that looks. Here's a small demo package with just three files in it, a couple of dummy functions for the purposes of this demonstration. I'm going to show you how easy it is to get the testing infrastructure set up.

So here's my package; it looks pretty similar to what you've seen before. I've already loaded my libraries and installed usethis and testthat. I'm going to run the `use_testthat()` function here, and you'll see that it created a brand new tests directory. Within that is the testthat directory, and this is where most of your tests are going to live. To create the very first test, I'll call `use_test()`. It doesn't really matter which set of functions I pick; I'll pick the add set of functions. If I go back into the testthat directory, it's already created an R file for me and opened it up with some dummy text. Again, this is very simple, and you're going to want to change it, obviously; nobody needs to prove that two times two is four. But that's just how easy it is to get going.

So I'll give you an example here. This is from the stringr package, taken straight out of Hadley's book, and it will give you an idea of what your tests are going to look like. At the top you have a context, which is a description of the tests in this file. Then each `test_that()` call is a test itself: it exercises the output of a single function with one or more expectations. The expectations describe the expected result of a computation, and you can have multiple of them. Here they're all using `expect_equal()`, but if you look at the documentation for testthat, you have a lot of different choices: you can test for a known error, test for matching, test for all kinds of things using these functions.
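The stringr example from the R Packages book looks roughly like this (reproduced here as an illustration of the structure, not a verbatim copy of the slide):

```r
library(testthat)
library(stringr)

# The context describes the group of tests in this file
context("String length")

# One test_that() call per behavior, each with one or more expectations
test_that("str_length is number of characters", {
  expect_equal(str_length("a"), 1)
  expect_equal(str_length("ab"), 2)
  expect_equal(str_length("abc"), 3)
})
```

Other expectation functions cover other kinds of checks, for example `expect_error()`, `expect_warning()`, `expect_match()`, and `expect_true()`.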

So I'm going to go back over into the code. This is basically that same dummy package I've been working with, but I've already created tests here. You'll see in my R directory I've got three different related sets of functions, and then I have tests for each of those in the testthat subdirectory. What I'll be looking at is the uniqueify function, which takes a character vector and returns the unique values. It's not very interesting, but it's pretty good for a demo. There are some related tests that I've already written. To run them in RStudio, you can press Cmd/Ctrl+Shift+T, which runs all the tests in a package, and you'll see the output in the Build pane. You'll notice that it's just running `devtools::test()` under the hood. In the output you'll see the different contexts that it ran, the ones from the top of each test file, and then you'll get feedback on whether the tests passed or failed. These all passed, which is great.

So at this point, I'm going to make a change to my code. Right now it only expects a character vector, but I want it to handle factors as well, so I'm going to add some very simple code to do that. I'll save it and run the tests again, because I do this as a habit, just to make sure everything's still working. And it did not like that.

In this case, you'll see that one of my tests failed. You still have the contexts over in the output, but one of them has failed, and it tells me why: it says uniqueify does not handle factors. That's the benefit of having a really descriptive name in your test; I can go looking for it and see what happened. Because the function didn't used to handle factors, I had a test proving exactly that. Now it does handle them, so I need to change the test: now I expect it to handle factors. I run the tests again, and life is good; they're all passing now.
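The demo's actual source isn't shown in the materials here, but a minimal sketch of what a `uniqueify()` that handles both character vectors and factors might look like (my reconstruction, not the speaker's code):

```r
# Return the unique values of a character vector; coerce factors
# to character first so factor input is handled the same way.
uniqueify <- function(x) {
  if (is.factor(x)) {
    x <- as.character(x)
  }
  if (!is.character(x)) {
    stop("uniqueify() expects a character vector or factor")
  }
  unique(x)
}

# Quick spot checks; the coded version would wrap these in
# test_that("uniqueify handles factors", { expect_equal(...) })
uniqueify(c("a", "b", "a"))          # "a" "b"
uniqueify(factor(c("x", "y", "x")))  # "x" "y"
```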

So once you have this set up, it's a really simple development workflow: you modify your code and your tests, you test the package with Cmd/Ctrl+Shift+T or `devtools::test()`, and you repeat until all the tests pass.

Designing good tests

So as you're designing these tests, put yourself in the shoes of the user, the consumer of each function. What would you expect to happen? What common mistakes or typos might you anticipate a user making? Focus on your external interfaces, your input types, and your integration points, since those are the areas that tend to be most fragile. Write one test per behavior, so that you always know where to find the test for a given behavior. And when you discover bugs, write tests for those as well, so you're notified if they crop up again later.
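For instance, if a reported bug revealed that a function crashed on empty input, a regression test pinning the fixed behavior might look like this (a sketch; `summarize_scores` and its behavior are hypothetical, not from the talk):

```r
library(testthat)

# Regression test for a hypothetical bug: summarize_scores() used
# to error on an empty data frame. This test pins the fixed
# behavior, so the bug is caught immediately if it ever reappears.
test_that("summarize_scores handles an empty data frame", {
  empty <- data.frame(score = numeric(0))
  expect_silent(summarize_scores(empty))
  expect_equal(nrow(summarize_scores(empty)), 0)
})

# And one test per anticipated user mistake, checking that the
# error message is actionable:
test_that("summarize_scores rejects non-data-frame input", {
  expect_error(summarize_scores(42), "data frame")
})
```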

As you're writing these tests, you may find that parts of the functional code would be a little easier to test if they were more isolated. We'll come back to that later in the talk, but generally speaking, if some area of the code is hard to test, it's also going to be hard to maintain.

Testing Shiny applications

But before we get to that, let's take a look at Shiny applications. Joe talked about this a little this morning, and Winston gave a really nice talk on it last year, so I'm not going to go too deeply into it, but I wanted to give you a quick idea of how it works. This is also really easy to set up; it's three steps, just like testthat. But shinytest works a little differently: because Shiny apps are interactive, they're harder to test in a straightforward way, so shinytest compares the current state of the application to a previously recorded, expected state. It's a snapshot-based testing strategy.

So, easy to set up; there are basically three things to do. You install your libraries, you record your test, taking that initial snapshot, and then later you run the `testApp()` function and compare the result against your previous snapshot. You start by running `recordTest()`, which opens an interface with your target app on the left and a recorder app on the right. In your target app (number one in the slide), you interact with the application's controls as if you were a user. In the recorder app (number two), you'll notice that it's recording everything you do. At some point you take a snapshot (number three), then give it a name and save it (number four). You're basically recording what a user would do and capturing the resulting state.

Once you've recorded and saved it, shinytest opens the test file for you; you can edit it if you want, but in this case we won't. We'll move on and run `testApp()`, which runs the same app and plays back the actions you recorded earlier. If everything is the same, it simply reports a pass; if anything is different, it gives you the option to see why. When you choose that, you can look at the JSON encapsulation of the state (number one here), and you can also look at screenshots (number two). Again, I'm not going deep into this because it's documented online and Joe covered it this morning, but you can toggle back and forth and see what the differences are. If you're okay with the differences, or you've made changes and the updated version is what you want, you can update and quit (number three); otherwise you can just quit and investigate why it went wrong.
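Assembled from the steps above, a minimal shinytest workflow sketch (the app directory path is a placeholder):

```r
# Step 1: install shinytest and the headless browser it drives
install.packages("shinytest")
shinytest::installDependencies()

# Step 2: record a test interactively; this opens the target app
# alongside the recorder and saves your actions and snapshots
# under the app's tests/ directory
shinytest::recordTest("path/to/myapp")

# Step 3: later, replay the recorded actions and compare the app's
# current state against the saved snapshots
shinytest::testApp("path/to/myapp")
```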

Designing for testability and maintainability

So, getting back to something I alluded to earlier: designing for ease of testing is also designing for ease of maintenance, and there are some good software practices involved here, so let's talk about that quickly. Again, your goals are reliability, reproducibility, flexibility, longevity, and scalability. You want a good user experience, and you want a high degree of confidence that your changes aren't going to break what already works.

You also want a modular design, for ease of testing and maintenance. Break things down: it's much easier to test a small function than a huge one, because it's much easier to control the inputs and outputs that way. Decouple your code and pull out helper functions to make it easier to maintain later. Don't repeat yourself, and keep it simple if you can. I don't mean keep your analysis simple; I mean keep the pieces of your analysis, your pieces of code, as simple as possible, so they're isolated and easy to test and maintain. And use a consistent coding style, so the code is easier for Future You to read, or for your colleagues, for that matter.
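As a concrete illustration of that decoupling (my own example, not from the talk): pulling a small, pure helper out of a larger function makes the helper trivially testable with controlled inputs.

```r
# Helper: normalize a numeric vector to the 0-1 range. Small and
# pure, so it's easy to test in isolation with known inputs.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# The larger function now just composes small, tested pieces
# instead of mixing all the logic into one untestable block.
summarize_column <- function(df, col) {
  scaled <- rescale01(df[[col]])
  mean(scaled, na.rm = TRUE)
}
```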

On the shinytest side of things, you want your user experience tested: the state of the application after a given set of user actions should be as consistent as possible, and you can prove that with some shinytest tests. Finally, make sure your error handling is good: if something does go wrong, the message you give your user should be actionable.

So at that point, yay, profit. Remember, though, that the tests themselves don't ensure the high quality of your script or application; they just verify it when it's there. You need to develop a workflow of adding these tests as you go along, to make sure you're actually building good code in the first place.

Resources and next steps

So I'm going to close with some resources; there are some really good things you can read or watch. The R Packages and Advanced R books by Hadley Wickham are an excellent place to start, not only for testing but also for good software design in general. And I want to add: it is never too late to start. If you've got a big package and you're intimidated by it (I've certainly inherited packages that intimidated me), just start small. Add something in and follow Hadley's advice; it's really good advice, and it's not hard to get started. It's three steps.

There's also a Charles Gray blog post on testthat's `auto_test()`. I don't have time to talk about it, but basically you can turn on auto-testing, and every time you save your file, it runs your tests for you. It's immediate feedback, and it's a really cool way to work if you have enough tests to make it worth your while. And finally, I'll again refer you to Winston Chang's talk on shinytest from last year; it's an excellent video if you have a chance to watch it.
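The auto-test workflow mentioned above amounts to a couple of calls (a sketch; the directory paths are placeholders for your own project layout):

```r
# Watch the code and test directories; whenever a file in either
# is saved, testthat re-runs the affected tests automatically.
testthat::auto_test(code_path = "R", test_path = "tests/testthat")

# For a package, there is a convenience wrapper that watches the
# standard package layout for you:
testthat::auto_test_package(".")
```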

As for resources on testing packages, I didn't talk about most of these; I just didn't have time. But all of these pink words are links, so if you pull down my slides at the end of the conference, you can go through them. These packages are really excellently documented, so you can read all about them there as well.

Next steps: start small and give it a go. There's no reason not to; it's three steps, and you don't have to do everything in the world at the very beginning. You can start with your major functions and move on from there. Look for opportunities to improve the functional code to be more testable, and therefore maintainable; your future self will thank you. Trust me, I've been there. If you're using source control, check in your tests along with your functional code, so they always stay in sync with what you're actually building. And if you have a continuous integration system like Jenkins or Travis, once the tests are checked in with your code, there are pretty easy ways to plug them in and have them run as part of your regular build for your packages.

And then you can bask in the warm glow of reliability and maintainability, because your tests are telling you whether or not you've broken something; you don't have to guess, and you don't have to wait for your users to find it. Thank you very much. If you have any questions about this or anything else, feel free to email me; I'll be around in the Pro Lounge later, and the slides and code examples will be available after the conference. Thank you. Bye-bye.
