
Hadley Wickham | testthat 3.0.0 | RStudio (2020)
In this webinar, I'll introduce some of the major changes coming in testthat 3.0.0. The biggest new idea in testthat 3.0.0 is the idea of an edition. You must deliberately choose to use the 3rd edition, which allows us to make breaking changes without breaking old packages. testthat 3e deprecates a number of older functions that we no longer believe are a good idea, and tweaks the behaviour of expect_equal() and expect_identical() to give considerably more informative output (using the new waldo package). testthat 3e also introduces the idea of snapshot tests which record expected value in external files, rather than in code. This makes them particularly well suited to testing user output and complex objects. I'll show off the main advantages of snapshot testing, and why it's better than our previous approaches of verify_output() and expect_known_output(). Finally, I'll go over a bunch of smaller quality-of-life improvements, including tweaks to test reporting and improvements to expect_error(), expect_warning() and expect_message(). Webinar materials: https://rstudio.com/resources/webinars/testthat-3/ About Hadley: Hadley Wickham is the Chief Scientist at RStudio, a member of the R Foundation, and Adjunct Professor at Stanford University and the University of Auckland. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. You may be familiar with his packages for data science (the tidyverse: including ggplot2, dplyr, tidyr, purrr, and readr) and principled software development (roxygen2, testthat, devtools, pkgdown). Much of the material for the course is drawn from two of his existing books, Advanced R and R Packages, but the course also includes a lot of new material that will eventually become a book called "Tidy tools"
Transcript
This transcript was generated automatically and may contain errors.
So today I am excited to talk about the 3rd edition of testthat. I'm going to start by explaining a little bit about what I mean by that: you certainly know what a version of a package is, but what do I mean by this being the 3rd edition of testthat?
Then I'm going to talk through three big new features in this edition: how testthat now displays differences much more clearly using the waldo package; a new type of test, snapshot tests; and, very briefly, testing in parallel across multiple processes. Finally, I'm going to give you a real quick goody bag of a bunch of small stuff that will hopefully make your life a little bit easier when using testthat.
The edition concept
So the big news with this upcoming release, testthat 3.0, is that we're going to introduce the idea of a 3rd edition. And I should say that anything you hear about today is also explained in more detail in vignettes on the testthat website, so if you want to learn more about anything I say today, feel free to take a look at those vignettes, and if they don't help you understand what's going on, please file an issue so we can make them better.
So the basic idea of an edition is that we want to make breaking changes to testthat. testthat is about 10 years old, and there are a bunch of things in the package that we now regret. But testthat is kind of a victim of its own popularity: something like 4,000 or 5,000 packages on CRAN use testthat, and many of them are not actively developed. So there's no way we can change testthat without breaking a bunch of packages, and since many of those packages aren't maintained, they'd just end up off CRAN because no one would have the energy to fix them.
So the idea of an edition is that it's something you have to deliberately opt into. If you want to use the new features of testthat I talk about today, you have to deliberately choose to use the 3rd edition, and to do that you basically add a line to your DESCRIPTION file: Config/testthat/edition. If you don't do this, you'll continue to use the 2nd edition, which is the existing behaviour of testthat. If you do, you'll get a bunch of new features, and a bunch of things, which I'll explain shortly, will have gone away.
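As a sketch, the opt-in is a single field in your package's DESCRIPTION file:

```
Config/testthat/edition: 3
```

Removing the line (or setting it to 2) keeps the existing 2nd-edition behaviour.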
The idea is that it should hopefully take less than 30 minutes to convert a package to the 3rd edition. It's a little bit of work, hopefully not too much, and it generally gets you up to date with our current best practices. And I should say that if this edition idea is successful, it's something we're likely to try out in other packages: a way of allowing existing code to continue working while giving you the choice to opt into a new set of behaviours.
Editions will always be coupled with a major version. That means testthat 2 and everything before it counts as the 2nd edition; testthat 3.0 uses the 2nd edition by default, or you can opt into the 3rd edition. And if one day, probably multiple years down the line, we decide to introduce a 4th edition, that would be tied to testthat 4.0.
What's going away in the 3rd edition
So what's going away in the 3rd edition? These are the things most likely to force you to change your tests. First, the context() function is going away. We've been moving away from it for quite some time in favour of just using the name of the test file; there's no need to duplicate that information in another function that you have to remember to update whenever you move your test files.
We've also lately been building, in devtools and usethis, a stronger coupling between files in the R directory and files in the tests directory, so that there's a one-to-one correspondence, which gives you a bunch of handy keyboard shortcuts and development tools.
The very old expect_that() function is going away. This was testthat's first, excessively clever API, where every test read like a sentence: "test that blah is true". I pretty quickly decided that was too clever and not very good, and we're finally deprecating it for good.
expect_is() is deprecated in favour of expect_s3_class(), expect_s4_class(), and expect_type(), just to be more precise about what exactly you're trying to express. expect_equivalent(), which I'll talk about a little later, basically becomes an argument to expect_equal() or expect_identical(). Some of the mocking functions are going away in favour of more featureful packages. And finally, we're moving away from setup and teardown in favour of test fixtures, which I'm not going to talk about today, but there's a whole vignette about them if you want to learn more.
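A sketch of the replacements, assuming the 3rd-edition API (the data frame here is invented for illustration):

```r
library(testthat)
local_edition(3)

df <- data.frame(x = 1:3)

# Instead of expect_is(df, "data.frame"), say exactly what you mean:
expect_s3_class(df, "data.frame")  # S3 class
expect_type(df, "list")            # underlying base type of a data frame

# Instead of expect_equivalent(a, b), make ignoring attributes explicit:
expect_equal(c(a = 1), c(b = 1), ignore_attr = TRUE)
```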
And hopefully that's all I need to say: these are the functions that are going away. If you really, really believe any of them are important, now is the time to speak up and say you want to keep them. Otherwise, when you switch to the 3rd edition and rerun your tests, you'll get deprecation messages telling you what to replace these functions with.
Waldo: clearer comparison output
So that's the stuff that's going away; now I want to focus on the cool new features. The feature you're most likely to encounter, and the most likely to give you pleasure in your life, is that test failures from expect_equal() and expect_identical() are now much easier to understand.
Previously, expect_equal() used the base all.equal() function. This was never really what all.equal() was intended for, but it did an OK job. There are a few cases, though, where it just doesn't give very useful output. Here I'm comparing mtcars without its first column to the full mtcars data set, and I get a lot of output: it tells me there are 10 string mismatches in the names, which doesn't really help me; it tells me there's a length mismatch; and then it gives me a bunch of mean relative differences. This certainly tells me that there's a difference, but it doesn't concisely point me to the fact that a column is missing from this data frame.
So now we use waldo. To demonstrate, I'm going to use a function called local_edition(). This is not something you generally need in your tests, because you'll convert the whole package to the 3rd edition, but we provide local_edition() for writing things like this presentation, and for vignettes explaining the new edition. It just temporarily sets the edition to the 3rd edition.
Now we're using the 3rd edition, and this is the new waldo output. The first principle is that waldo gives you the most important differences first, and in this case, I think the most important difference is that the lengths are different. Then it tells us how the names have changed, lining up the actual names with the expected names, and it uses colour to highlight the fact that the expected output has an extra column called mpg. And finally, down here, we get an exact description: the names are different, but the values are also different.
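As a sketch, the comparison just described looks roughly like this (mtcars is a built-in data set; run interactively to see waldo's coloured diff):

```r
library(testthat)
local_edition(3)

# 2nd edition: all.equal()-style "string mismatches" and "mean relative
# differences"; 3rd edition: waldo points at the missing column by name.
try(expect_equal(mtcars[-1], mtcars))
```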
So: I've switched to the 3rd edition, and waldo shows the most important differences first, uses colour to help highlight differences, and always uses the names of things where possible, because that makes it easier to understand exactly where the differences lie.
Another little example: here I have a factor, compared to an ordered factor with one extra level. When you look at the expect_equal() output from the 2nd edition, it kind of tells you they're different, but because it doesn't include the values, it's hard to know exactly what's changed. When we switch to the 3rd edition, we can see there's a new element of the class vector, "ordered", and a new element of the levels, "d", which hopefully makes it much, much easier to see exactly what's gone wrong. And certainly in my own work, this is the number one reason I've been converting packages to the 3rd edition of testthat: it makes it so much easier to see when something's gone wrong.
If you want to learn more about waldo, you can go to the waldo website, which shows a bunch of examples and some of the principles. A lot of the diffing of values is powered by a really nice package called diffobj, by Brodie Gaslam, which uses the same algorithm as the Unix diff utility, and makes it really easy to narrow in on exactly what has changed between two vectors.
This change also makes it possible to be precise about the difference between expect_equal(), expect_identical(), and expect_equivalent(), which was always a little vague in the past. Now all of these functions are equivalent to expect_identical() with some extra arguments set. expect_equal() is equivalent to expect_identical(), except that it ignores small floating point differences. And expect_equivalent(), which is now deprecated, is the same as expect_equal() with ignore_attr = TRUE; ignoring attributes is the only thing it does, and now we make that precise in a function argument. Any other arguments to expect_identical() or expect_equal() are passed on to waldo::compare(), which gives you the ability to fine-tune your comparisons as needed.
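A sketch of those relationships under the 3rd edition:

```r
library(testthat)
local_edition(3)

# expect_equal() is expect_identical() plus a numeric tolerance:
expect_equal(1, 1 + 1e-10)           # passes: within the default tolerance
try(expect_identical(1, 1 + 1e-10))  # fails: compared exactly

# the deprecated expect_equivalent(a, b) becomes:
expect_equal(c(a = 1), c(b = 1), ignore_attr = TRUE)

# any other arguments (e.g. tolerance =) are passed on to waldo::compare()
```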
Now, if this is such a great idea, you might wonder why it can't work for the 2nd edition as well. Unfortunately, when I implemented expect_equal(), it turns out I made a rather silly mistake in the tolerance comparison: depending on exactly which code path it goes down, it computes the tolerance in slightly different ways. One path always uses absolute tolerance, and the other uses absolute or relative tolerance. I did a few experiments, and regardless of which behaviour I pick, it causes hundreds of CRAN packages to break. That's the main reason this change has to be part of the 3rd edition.
Snapshot testing
The next big feature I want to talk about is snapshot testing. Again, this comes with a vignette, so if you want to learn more about it, you can read the vignette, and if the vignette doesn't make sense, please file an issue so we can make it better. The basic idea is that this is a new type of testing.
Normally in unit tests you describe the expected output using code, and in the vast majority of cases this is a really, really good idea: because tests are code, and code is a means of communication, the tests help document the expected behaviour of your functions. But sometimes describing the expected output is just really annoying. If it contains a bunch of special characters, like quotes or backslashes, you have to spend a bunch of time carefully escaping them, and then when something goes wrong, you've got to unescape them in your head, which is just a pain. Or maybe it's very large, like an entire HTML page or multiple paragraphs of text. Or maybe it's not even something you can easily describe with text: it's an image.
So that's the idea of a snapshot test, or golden test, an idea used in other programming languages and other testing frameworks. testthat draws its inspiration primarily from Jest, a JavaScript testing framework; Joe Cheng shared his experiences with it a bunch and persuaded me that this was something really useful.
The key idea of a snapshot test is that instead of recording the expected results inline in the test itself, they're stored in a separate file, and testthat provides a bunch of tools for managing that file: it creates it automatically the first time you run the test, and it gives you tools to update it when you decide there really should be a change. And if you've used verify_output() or expect_known_output(), which we never really advertised because we were never particularly happy with them, snapshot tests basically supersede those functions.
So what does a snapshot test look like? I'll give you a quick simulation in the presentation, and then show you what it looks like in a real package. Here I've got two files. In foo.R I've got a very silly, simple function, which has a mistake in it that we'll fix shortly, and then I've got a test: I run foo(), I expect that it returns a character vector, and then I use this new expectation, expect_snapshot_output(). Notice that it doesn't say what the expected value is, because that's going to be saved to disk.
The first time I run this, it warns that it's creating a new snapshot, and it creates a new file. Inside the tests/testthat directory it creates a new directory called _snaps, and inside that there's a file called foo.md. So our R file is called foo.R, our test file is called test-foo.R, and the snapshot file is called foo.md. All of these follow a strong naming convention, so you can easily find which snapshot corresponds to which test, which corresponds to which R file.
So what does that snapshot contain? It's a markdown file; I'll explain the syntax shortly, but we use a heading to indicate the test, and the output of that test goes directly into the file. Now if I run that test again, nothing has changed, so the test passes. If I change the function to fix that typo and run the test again, I get a failure, telling me that the previous value (with the typo) no longer matches the new value, "something complicated".
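A sketch of the test just described; foo() and its output are hypothetical stand-ins for the function on the slide, and the recorded value lands in tests/testthat/_snaps/foo.md:

```r
# R/foo.R (hypothetical)
foo <- function() "something complicated"

# tests/testthat/test-foo.R
library(testthat)
test_that("foo() output is stable", {
  x <- foo()
  expect_type(x, "character")
  # no expected value here: it is recorded in _snaps/foo.md on first run
  expect_snapshot_output(print(x))
})
```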
Now, the downside of snapshot tests is that there's no way for testthat to know which value is correct, so you as a human have to step in and intervene. If you really did mean to make this change, you run snapshot_accept(), and that accepts the change. The way this works is that when a snapshot changes, testthat writes a new markdown file containing the new value; if you accept it, that file replaces the old one.
So let's dive into a slightly more realistic example, starting with a simple one. I have a package containing the same test I showed before, and I'm going to run it by pressing Cmd+T, the keyboard shortcut for devtools::test_file(). That shortcut takes advantage of the naming convention: if you've got a file called foo.R, the corresponding test file will be test-foo.R.
This says I've added a new snapshot, and this is its value; I can look at it, and there's that markdown file again, which I'll explain shortly. And if I run the test again, you'll see that all the tests pass.
The other thing, if you've used devtools::test_file() before, is that there's now a slightly more compact display, to hopefully make it easier to run the tests for a single file interactively.
So I'm going to change this, correcting that typo, and test again, and now the test fails because the snapshot has changed. It shows you the current value, "something complicated", and the previous value, with the typo. If this is a deliberate change, you run snapshot_accept(). If it wasn't deliberate, oh, that was just a typo, I fix it, rerun, and all my tests pass again.
If you've ever used verify_output(), this is a little different, because verify_output() automatically updates the known value on disk, which basically forced you to use it with Git. Here, when the test fails, you can see in my _snaps directory that I've got foo.md, the previous value, and foo.new.md, the new value.
Okay, so let's look at a slightly more complicated example. So this is a bullets function, and it's basically used to create HTML bullets.
And this is a little annoying to test, because if you were going to put the expected output in a test, you'd have to escape all of these newlines and carefully manage all the whitespace; it's just a pain. It's a little easier to test this code with a snapshot test.
So I run this test, and here is a snapshot file with multiple expectations in it. The first uses bullets() with a single bullet, and the second also has a single bullet but sets the id as well. If I later want to change this bullets function, maybe I decide I don't want this indent, so let's get rid of it and rerun the test. Again, the failure uses waldo to highlight what's changed: it's a little hard to see, but what's the same is in grey, and we can see there's now no space where there was before.
I look at this, decide that yes, it's a deliberate change, and then run snapshot_accept("bullets") to update that snapshot.
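The accept step, as a sketch; snapshot_accept() takes the snapshot file name without its extension, and the guard here just keeps the example inert outside a package directory:

```r
# Run from the package root after reviewing a failed snapshot diff.
# Accepting replaces _snaps/bullets.md with _snaps/bullets.new.md.
if (dir.exists("tests/testthat/_snaps")) {
  testthat::snapshot_accept("bullets")  # accept one file's snapshots
  # testthat::snapshot_accept()         # or accept everything pending
}
```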
And that, basically, is snapshot testing. Of course there's more documentation, and I think it will take a little while to get your head around it all. I've shown you expect_snapshot_output() here. There's also expect_snapshot(), which, as well as printed output, captures messages, warnings, and errors; expect_snapshot_error(), if you want to capture specifically just an error message; and expect_snapshot_value(), which captures a return value. That last one is a little different: if, for example, you want to test the output of a complicated function and just make sure it doesn't change without warning, you can use expect_snapshot_value().
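A sketch of the four variants just listed (the values and messages are invented examples; run inside a package so the snapshots have somewhere to live):

```r
library(testthat)

test_that("snapshot variants", {
  # printed output only:
  expect_snapshot_output(letters[1:3])
  # code plus output, messages, warnings, and errors, reprex-style:
  expect_snapshot({
    message("loading data")
    1 + 1
  })
  # just an error message:
  expect_snapshot_error(stop("no such column"))
  # an arbitrary return value, serialised into the snapshot file:
  expect_snapshot_value(list(a = 1, b = 2), style = "json")
})
```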
The only other thing to mention is what these files look like, and why they're markdown files. If your package is a collaborative project and you're using GitHub or some other tool where you do code review, it's really important that these snapshots be human readable, because when someone reviews your code, they need to look at the snapshot and decide whether the change is reasonable. So they use markdown. There's one snapshot file per test file, which normally corresponds to a file in the R directory; there's a heading for each test name; and if you have multiple snapshot expectations in a single test, they're separated by a horizontal rule of three dashes.
One thing I'm working on at the moment, with some help from Joshua Kunst, is a Shiny app that will help you review all of these differences and accept or reject them by clicking buttons rather than typing at the console. The other big part of that is providing tools for image snapshots as well. We've implemented this in two places already: shinytest, which is used for testing Shiny apps, and vdiffr, which is used for testing ggplot2. The idea is to pull out the common code, centralise it in testthat, and invest a bunch in this whole snapshotting idea, so you've got a really nice workflow if you do need to do image tests.
Image tests are complicated because they can change for all sorts of reasons unrelated to your code, but sometimes they're all you have, and they're a really important part of testing both Shiny itself and ggplot2 itself.
OK, so that's snapshot tests. The main idea is that, compared to regular unit tests, which keep the expected results in the test file, snapshot tests store the expected results in a separate file, which makes them suitable for testing large output, output with quote marks and backslashes in it, and things you simply cannot describe with text, like images.
Parallel testing
Next, I want to talk briefly about parallel tests. This is still a work in progress by Gábor Csárdi, and again, you'll have to activate it specifically for your package by putting another line in your DESCRIPTION: Config/testthat/parallel. The payoff is pretty big: it runs your tests on multiple processes, so if you have long-running tests, this can make a big improvement to the total running time of your test suite.
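As a sketch, the opt-in is another DESCRIPTION field:

```
Config/testthat/parallel: true
```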
Now, the downside is that there's some overhead in starting multiple R processes and loading all your code into each of them. So if your tests are very fast, say they all run in under a second, it's probably not going to have a huge impact. But if you've got tests that take five or ten seconds, and the whole suite takes minutes, this should hopefully have a really big positive impact on your workflow.
There are other downsides alongside the big upside of speed. Tests will now effectively run in stochastic order, because you'll have, say, four processes, and each process takes the next test file in the queue; depending on exactly how long each test takes to run, the tests might run in a different order each time. This means that if you have any dependency between your test files, which is relatively easy to introduce because they normally run in alphabetical order, you'll get random test failures that occur sometimes and not others, depending on the order the tests ran in. In other words, a debugging nightmare.
So we're still thinking about tools to help if that happens to you, some debugging mode you can switch on to get more insight. And if you've used global setup or teardown, for example setting up a database or a CSV file for all of your tests, you'll need to think that through in a little more detail, because those setup and teardown files are now run by multiple processes. So there will be a little bit of work to convert your tests to run in parallel. We're still working on this, and there's a vignette if you want to learn more and try it out. But we're hopeful this will make a big impact if you've got long-running tests.
Goodies: reporters and other improvements
So far I've talked about the idea of the 3rd edition, a special mode you have to switch on if you want the latest and greatest testthat features. We've talked about waldo, which makes test failures from expect_equal() and friends much, much easier to understand; about snapshot tests; and about running tests in parallel. Now I just want to show you a bunch of little features that I think are cool.
In the course of working on testthat for this release, I made a bunch of improvements to the reporters. Reporters are the things that actually display the results: when I test a single file, that's done by one reporter, and when I press Cmd+Shift+T to run all the tests, that's a different reporter.
One reporter that you never normally call explicitly is the stop reporter. That's the reporter that runs when you run a test interactively. Here I'm running a test which is not working, probably for reasons I need to look into.
But if I create another example over here and run this test, this is the stop reporter. It now clearly tells you that your test passed, and it gives you some emoji. If your test does not pass, it nicely displays the failures, and it also displays any warnings, and, really conveniently, the backtrace for each warning. So if a warning occurs in any of your tests, you get a full backtrace and can figure out exactly where it came from.
One of my pet peeves is this: I have partial-matching warnings turned on, and if one occurs somewhere deep inside a function, inside a function, inside another function, it's really hard to track down. Now, in your tests, you get a nice backtrace, so you can figure out exactly what sequence of calls leads to the problem.
OK. So that's the stop reporter, which is used when debugging: it now uses colour and emoji, and generally gives you more information about problems when you're running tests interactively. I also showed you earlier the compact progress reporter, the single line here, which gives you a running progress bar of all the tests that are running, so you know exactly what's going on.
I've also updated the regular reporter, which you see most often when running all the tests in your package. The biggest change is that I've added a bunch of new praise with more emoji, because I think emoji are fun. The random praise veers a little into dad-joke territory, but hopefully it's a fun little feature of testthat that keeps you motivated and keeps you going when your tests aren't working well.
And finally, the last reporter is the check reporter, which runs inside R CMD check. It now reports all of the problems, including warnings, and it reports all your skipped tests by type, which is really useful for checking that you haven't accidentally skipped tests you meant to run. It also creates an RDS file with a machine-readable list of all the test results. That's probably not something you'll use directly, but it is something we'll start to build into our tooling, so that things like GitHub Actions can give you nicer displays of your test results.
We've also made a few changes to the way the condition expectations work. Here I have a function that calls warning() and message(). Now, if a condition, a message, warning, or error, is not explicitly caught by your expectation, it continues to bubble up. So if I run this line, the expectation passes: it finds the warning "hi", but the message "bye" still bubbles up. Or, if I expect the message "bye", I'll see the warning "hi". If I want to capture both of them, I need to use expect_message() and expect_warning() together. This hopefully gives you better control over exactly what's going on, though it's going to cause a few more warnings in your tests.
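A sketch of the behaviour just described, with an invented function that signals both conditions:

```r
library(testthat)
local_edition(3)

f <- function() {
  warning("hi")
  message("bye")
}

# catches the warning, but the message still bubbles up:
expect_warning(f(), "hi")

# to handle both conditions, nest the expectations:
expect_message(expect_warning(f(), "hi"), "bye")
```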
These won't cause your package to fail R CMD check, but they will require a little bit of work to make sure those tests behave as you expect. If you just want to ignore the conditions, you can switch from expectations to the base functions suppressMessages() and suppressWarnings().
Now, you might notice here that we've got expectations nested inside expectations, and you might naturally think: why can't I use the pipe for that? Unfortunately, you currently can't, because the pipe eagerly evaluates everything. If I do this, you get a "bye" and a "hi", and then the expectation fails, because the function is evaluated before the expectation is called. There's new work underway on magrittr that will make this work, and hopefully make magrittr a little more compatible with the native pipe that's likely to appear in the next version of R.
OK, so that also means that if you're one of the, I don't know, ten people in the world who use the all argument to expect_warning() or expect_message(), that's now deprecated. You'll have to take a slightly different approach, but I'm pretty confident the new API is much nicer overall and shouldn't change existing behaviour too much.
And the last thing: if you've ever used expect_error() and seen the message suggesting you set the class, we no longer encourage that. It fixed one type of test fragility at the cost of introducing a different type, so now you're better off using expect_snapshot_error() if you want to check that a specific error occurs, because that gives you a bunch of nice features for managing the change over time.
Summary and next steps
OK, I'm just about to wrap up, which is great, so there's plenty of time for questions. What have I talked about today? First, testthat 3.0 is coming out soon. We'll probably start the release process in about a month, which means it's at least two months before it's on CRAN. It would be really great if you'd try it out, and this is the right time to let us know if it's causing you pain, so we can fix it before release.
Plenty of time to do that. The 3rd edition will be part of testthat 3.0, which will be on CRAN in two months at the soonest, likely a little longer. You have to deliberately opt into the 3rd edition, which gives you a bunch of new features at the cost of a little bit of work cleaning up old APIs.
The new edition uses waldo to make comparisons, which will hopefully make your test failures much easier to debug. It provides snapshot tests, which are an alternative form of testing where the expected results are stored in separate files rather than inline in code, and then testthat provides some functions to help manage those files so that you can accept changes when you have made them deliberately, or revert changes when you've made them accidentally.
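As a small sketch of what a snapshot test looks like (`my_fun()` is a hypothetical function under test), the expected output is recorded in a `.md` file under `tests/testthat/_snaps/` on the first run, rather than written out by hand:

```r
library(testthat)

test_that("my_fun() gives a useful error for bad input", {
  # error = TRUE tells expect_snapshot() that an error is expected;
  # the error message itself is recorded in the snapshot file.
  expect_snapshot(error = TRUE, my_fun("not a number"))
})
```

When the recorded output changes, the test fails with a waldo-style diff, and you accept the new snapshot or revert the code depending on whether the change was deliberate.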
testthat 3.0 will also introduce parallel testing which, again, will take a little bit of setup work, just to make sure your global setup and teardown works appropriately and you don't accidentally have any dependencies between your tests. But the payoff is that if you have slow tests, they should run much, much faster because they'll run in parallel. And then, finally, I showed you a bunch of goodies, many of them featuring an emoji, that will hopefully make your day-to-day life using testthat a little bit more fun, a little bit more pleasant.
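Like the edition itself, parallel testing is opted into per package via a DESCRIPTION field (a sketch of the mechanism at the time of writing):

```
Config/testthat/parallel: true
```

Each test file then runs in its own worker process, which is why per-file setup has to be self-contained and tests can't depend on state left behind by another file.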
So if you do want to try it out today, you'll need to get the development version of testthat and the development version of devtools. And then in your package, if this is a package that uses continuous integration or similar, you'll need to add testthat to Remotes. And you'll need to opt into the third edition to take advantage of all the latest and greatest features.
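A minimal sketch of that setup, assuming the development versions live in the r-lib GitHub organization:

```r
# Install the development versions from GitHub
# install.packages("remotes")
remotes::install_github("r-lib/testthat")
remotes::install_github("r-lib/devtools")
```

And in your package's DESCRIPTION, so continuous integration also installs the development version:

```
Remotes: r-lib/testthat
```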
So that was a lot of content in 40 minutes. Hopefully, if you didn't take in everything I was saying, there's plenty of additional material in all of the vignettes. And if you don't find those understandable, please let me know so we can make them better. Thank you. And now, hopefully, Jenny will have some questions for me.
Q&A
So the first big one is, I think people want to hear more about why you're doing a third edition. So why not create an entirely new package, the same way that plyr led to dplyr? Or why not create testthat3? Or why can you not do this through semantic versioning of the existing package?
Okay. So the first question is, like, why not a new package? And that was something I considered. But, like, 90% of the code between the second edition and the third edition is the same. The biggest difference with the third edition is it just takes stuff away from you, stuff that we now regret. So if we created a new package, we'd have two packages with 90% overlap in the code, and whenever there was a bug, we'd have to remember to fix it in both packages. So I think for this case, where most of the stuff is the same and we're just trying to get rid of some old things, creating a new package would be overkill.
On the question about semantic versioning: testthat does use semantic versioning; this release is 3.0.0. But that doesn't really help in the R community, because you only have a single version of a package installed in a library. You can kind of work around that, but it's a little fiddly. And packages on CRAN always use the latest version of a package on CRAN. So if I just released testthat 3.0 with all of these new features in it, something like 500 packages would break on CRAN, because they would automatically use the new version of testthat. So that's why we want a little bit of friction: you have to deliberately choose to use these new conventions and basically stop using old stuff that you should have stopped using a while ago, without us duplicating a bunch of code that we'd then have to maintain in two places.
So now I have a series of questions, and the list is growing longer, that are smaller, and we'll just work through as many as we can. This first one is: have you tested whether snapshotting plays nice with things like covr or the goodpractice package? Does this third edition work have any effect on shinytest? So, how this works with other packages.
Yeah. So covr: I mean, snapshot testing is just testing. It works exactly the way you'd expect with covr. I don't know that much about goodpractice; I don't think it would cause problems for goodpractice. shinytest: it's not clear precisely how this will play out. I've been having some bigger discussions with the Shiny team about how testing Shiny should work, because my vision for testing Shiny is quite different from their vision of testing Shiny apps, and we're working towards that. I think in an ideal world, eventually shinytest would use more of this new snapshotting infrastructure that testthat provides. But there are some other big changes that have to happen to shinytest first, so it's hard to say today whether shinytest will eventually integrate this better, or whether there'll be a new successor package to shinytest that's more focused on the snapshot testing provided by testthat.
All right. The next one is, do test snapshots need to be included in the package or could you exclude them if they're large and then developers can recreate them from a stable version?
I think if you want people to be able to run your tests, you have to include them in your package, because they are literally the correct output. One thing I did forget to mention is that snapshot tests will be skipped on CRAN automatically. You can opt out of that with an argument if you want. That's because if a snapshot test fails, it isn't necessarily a real, meaningful failure; it might be an incidental difference, or a difference from a downstream package, that you don't want to cause a failure on CRAN. But yes, if you want people to run your tests, you have to include the expected results in the package.
All right. Is it possible to use snapshot tests if the function does not return anything, but rather has a side effect? So I mentioned briefly that expect_snapshot() captures the side effects of messages, warnings, and errors. If it's making other changes to the global state, you would need to make that explicit. For example, if your function attaches a package, you might do expect_snapshot(search()), which would capture the search path. Or maybe it's creating files in a directory. So it doesn't directly handle side effects apart from messages, warnings, and errors, but you could easily wrap up a little function that captures those side effects and makes them explicit, and snapshot that.
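As a sketch of the wrapping idea (`attach_pkg()` is a hypothetical function whose only job is the side effect of attaching a package):

```r
library(testthat)

test_that("attaching changes the search path", {
  # Hypothetical function under test: its side effect is attaching a package
  attach_pkg("stringr")
  # Make the side effect explicit by snapshotting the state it changed:
  # the search path is printed and recorded in the snapshot file.
  expect_snapshot(search())
})
```

The general pattern is the same for any state: write a small expression that prints the state your function mutated, and snapshot that expression.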
All right. The next person missed a little bit about what's going on with context(), but sees that it's deprecated and thinks that it's needed when using the JUnit reporter. So how are those two things going to work together?
Okay. So this is just per package. I just searched to find all the uses of context(), and you can see the test-lmap.R file contains a call to context("lmap"). test-mapper.R contains a call to context("as_mapper"), which is mildly inconsistent. test-win.R contains a call to context("win"); test-utils.R contains a call to context("utils"). So by and large, it's a one-to-one mapping between what you put in context() and the name of the file. And you just don't want that duplication, because you end up with inconsistencies.
Well, maybe if I can find one... here we can see that this is a test for pmap(). It probably used to be test-pmap, but we renamed the file and forgot to update the context, and so now there's this mismatch between the two. So that's the reason why we're getting rid of context(): it just introduces duplication for little real gain. But the context is still there implicitly; we just take the file name. So the context that the JUnit reporter sees, for example, will just be derived from the file name, so it shouldn't change anything practically with the JUnit reporter.
Okay. So this is a person, like me, who uses with_mock() and local_mock(). If they're going to be deprecated, what is the suggested way to mock functions in external packages? I read that the goal of the mockr package is to provide a drop-in replacement, but it does not have this feature.
Yeah. So basically, the way that with_mock() works is truly an abuse of R. It does something that I now believe to be tremendously ill-advised, and the chances of it stopping working in a future version of R are high, because what it does is so bizarre and horrific.
So basically, you have to use another approach. Both the mockr and mockery packages provide slightly different techniques; unfortunately, you just have to use those. I'm just so concerned that the way with_mock() works is not good, and it's surprising that it hasn't caused any problems to date. So yeah, unfortunately, it's something that is really nice and really useful, but was achieved in a really, really horrible way.
Does snapshot testing have a limit on the size of the data? And can you combine snapshot tests with more explicit tests of the snapshot?
Yeah, so there's no limit on the size. And I think, generally, you will want to mingle snapshot testing with regular testing. For example, if the result is a list, you might want to test that one component is a specific value, while another component is just a big blob that you don't want to type out. So I'd imagine you will want to mingle those as much as possible.
I think it will be tempting to overuse snapshot testing, just because it's so convenient: you don't have to write the right-hand side of expect_equal(), you don't have to make clear in your head exactly what you're expecting. But I think if you use snapshots too much, you'll find your tests are a little bit fragile and break more often than you'd otherwise like, and that will naturally push you back towards the finer-grained unit testing that you're used to in testthat.
Of course, putting really large files in your package is going to be annoying for other reasons, so I don't think you want to put megabytes of data in snapshots. But kilobytes of data, which would be a pain to put in a test file, is a really nice sweet spot for snapshot testing.
Okay, I have two questions that are a little bit of emoji pushback, not to

