Barret Schloerke: Lessons Learned Testing 2500+ Shiny Apps Every Day

Transcript

This transcript was generated automatically and may contain errors.

Okay, everyone, we are just chugging right along here. I'm going to thank everybody every time I come out, so just get used to it. So thank you guys so much for sticking around. I think we're seven hours into day one of the 2023 Shiny Conference, and we have two speakers left.

So this next speaker is Barret Schloerke, who in my world goes without needing an introduction, but for anybody who doesn't know who Barret is, he is a software engineer and Shiny developer at RStudio. He currently develops and maintains many R packages in the Shiny ecosystem at RStudio, including Shiny, reactlog, plumber, learnr, leaflet, and shinyloadtest.

And you really can't encapsulate how much of an impact Barret has had on Shiny. I'll let him speak to that, but a little bit of a background on Dr. Schloerke is that he received his PhD in statistics from Purdue University, specializing in large data visualization. And today Barret is going to be speaking about lessons learned from testing 2500 plus Shiny apps every day. So Barret, it's such a pleasure to have you once again, and I'll leave the floor to you.

Awesome. Thank you so much for the introduction, Ian. It's always wonderful to see you and present here at the Shiny Conference done by Appsilon. It's such a wonderful conference.

So as Ian said, today is actually going to be a little bit different talk for me. Normally I'm doing technical talks deep into the weeds, and today is going to be a little bit more of a story time. And I think it's kind of fun because I'm going to unveil the curtain behind the great development of Shiny over time. And it's kind of fun to see where we've come from and actually where we're at today.

Background on testing

So just a little bit of background, testing, never heard of them. If this kind of rings a bell for you, I recommend two chapters to check out. There's one in the R Packages book. It's been updated recently. It's now chapter 14, and it's testing basics, and there's a couple of follow-up chapters as well. And then also check out chapter 21 in Mastering Shiny, it's the testing chapter there. Both of those are wonderfully written and give a good explanation of the motivation for testing, why we need tests, and how they can actually even apply to Shiny as well, even if it's for an R package.

And then last year, I presented at ShinyConf, and if you want a refresher on shinytest2 and testthat, which I recommend highly if you have an app or a package that has Shiny applications, please watch that presentation on the link there. At the end, I will include a link for my slides, and it has hyperlinks, PDFs, you know, HTML, you can get all the things from there. So don't worry about trying to screenshot or grab things right away.

Also throughout this presentation, do not be discouraged by the amount of testing that we are doing, or the approach that we are doing to testing. The testing that Shiny does is pretty exhausting, both mentally and also in like computing cycles as well. I think it's a bit overkill for virtually everything else. But as a Shiny developer, I'm trying to make sure that all of you have smooth user experiences, even when we are doing updates that are breaking things. And we want to make sure that everything works across packages, because local development may not be the same as across package development.

Yeah, so use shinytest2, you should be good, and that should cover most of it.

The early days: manual testing

Okay, so yes, as Marlene said, yeah, tell me a story, Uncle Barret. Well, all right. Back in 2018, I joined the Shiny team, and that was a fun year. And, you know, did a little bit of work on reactlog, did a little bit of work on leaflet, and then, okay, now we're ready to release Shiny. And I was like, where's the test suite?

And Joe looked at me and said, we have 150 applications, and what we're going to do is everyone on the team is going to stop work for two weeks, and we're going to manually test all 150 applications in as many places as we can do this. And if we find any bugs, what's going to happen is maybe in, like, second day of week two, if you find a bug that we think is critical, which most bugs are, we're going to restart this testing process and keep doing it until we can release. And so this is kind of like a motivation as to why Shiny didn't release all the time. It's because testing took, like, a solid three weeks, maybe four weeks. Sometimes it took six because we just kept finding more and more that would pop up. So it's very, very expensive.

And when we're testing, I know I was guilty of this. Like, if you're given a paper to-do list of, like, install these packages and test these applications, what about, like, if I have another package that's not the latest CRAN version? What if I have, you know, something else? And, like, I know I installed, like, not the bleeding edge, but, like, two commits back, which maybe contained the bug, and then I was testing things. And so I wasted two hours because I realized I was behind and I needed to come back to the latest version. And so there's no guarantee that everyone had the best testing environments, and it was really, really expensive for the company. And it was just like, oh, man, so much pain. I hope no one has to do this. I highly recommend shinytest2 to try to at least get out of manual testing.

Lessons learned

Lesson two, a failure may not be your fault. But it's still my problem.

I know I've talked about this with shinytest2, but let's see if you can spot the difference on this one. I'm actually going to just switch slides and have the images overlaid. All right, so here's baseline and new, baseline and new, baseline and new. There's not much of a difference, but you can actually see that the baseline has like a boldish font in Alabama, and then the new has like a thinner font. So the font weight was reduced, and that was a system change. Like, can't fix it.

Test failure, test opportunity. Nah, that's test failure. So a recommendation is to actually use threshold. This will allow for minor differences when comparing screenshots within shinytest2. So in this code here, in expect_screenshot() with shinytest2, just set threshold to something like two, or something less than five. If it was zero, then we would say the images have to be exactly the same. In the case of the modal dialog, we are doing this because the rounded corners were coming out differently, since they were sometimes rounding to the left or to the right, and so we just wanted to make sure that they were not throwing failures when comparing the images. Please check out the shinytest2 docs for compare_screenshot_threshold(). This will help you see what values you need to use, or even what methods you can use to expose how different those images are.
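As a rough illustration of the threshold advice, here is a minimal sketch in R. The demo app, the helper name check_modal_screenshot(), and the default value of 2 are illustrative assumptions and not code from the talk; AppDriver and the threshold argument of expect_screenshot() come from shinytest2. The helper launches headless Chrome, so it is meant to be called from inside a testthat test.

```r
library(shiny)

# Hypothetical demo app: a button that opens a modal dialog.
ui <- fluidPage(actionButton("show", "Show modal"))
server <- function(input, output, session) {
  observeEvent(input$show, {
    showModal(modalDialog("Hello from the modal"))
  })
}
demo_app <- shinyApp(ui, server)

# Hypothetical helper, wrapped in a function so that nothing is launched
# until it is called from inside a testthat test.
check_modal_screenshot <- function(app = demo_app, threshold = 2) {
  drv <- shinytest2::AppDriver$new(app, name = "modal-demo")
  on.exit(drv$stop(), add = TRUE)

  drv$click("show")
  # As described in the talk: a threshold of 0 demands identical images,
  # while a small positive value tolerates minor rendering differences
  # such as rounded corners.
  drv$expect_screenshot(threshold = threshold)
}
```

In a test file this could be used as test_that("modal looks right", { check_modal_screenshot() }), with the threshold raised only as far as the known, harmless differences require.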

Lesson three, test values that you can control. Yeah, Kaylee has totally said this in the movie. So test those values that you can control. So if there are dependency changes, you know, they will produce new output values like in Plotly. So Plotly will change, you know, its internals, and that's not your fault. And so really what you should be doing is possibly testing the dataset that's going into Plotly and assuming the plot will work. So use shiny's exportTestValues() to expose those dataset values from your server side. They won't be exposed anywhere else except in testing mode, so you're safe to expose possibly dangerous things.
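A minimal sketch of that idea in R, under stated assumptions: the app, the export name plot_data, and the helper check_exported_data() are made up for illustration, and a base plot stands in for Plotly to keep the example small. exportTestValues() is from shiny; AppDriver, set_inputs() and expect_values() are from shinytest2.

```r
library(shiny)

ui <- fluidPage(
  sliderInput("n", "Rows", min = 1, max = nrow(cars), value = 10),
  plotOutput("plot")
)

server <- function(input, output, session) {
  # The data feeding the plot is the part we control.
  plot_data <- reactive(head(cars, input$n))

  output$plot <- renderPlot(plot(plot_data()))

  # Exposed only when the app runs in test mode (for example under shinytest2).
  exportTestValues(plot_data = plot_data())
}

export_demo_app <- shinyApp(ui, server)

# Hypothetical helper; call it from inside a testthat test, since it
# launches headless Chrome.
check_exported_data <- function(app = export_demo_app) {
  drv <- shinytest2::AppDriver$new(app, name = "export-demo")
  on.exit(drv$stop(), add = TRUE)

  drv$set_inputs(n = 5)
  # Snapshot only the exported dataset, not the rendered plot.
  drv$expect_values(export = "plot_data")
}
```

The snapshot then changes only when the data changes, not when a plotting dependency alters how it draws.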

And then also, you can call snapshotExclude() on your outputs, or you can preprocess your input or output with snapshotPreprocessInput() or snapshotPreprocessOutput(), and I'll show you a quick demo of preprocessing your output with snapshotPreprocessOutput(). The snapshot is the set of JSON values that are saved to disk when you call expect_values() within shinytest2.

So running out of time, so that's why I'm kind of rushing here. So in a Plotly example from shinytest2's internal testing, and this is adjusted for the demonstration, what you'd normally do is assign to output$my_plot, and you'd have plotly's renderPlotly(), and you do your plot. Great. But we can then pipe this into shiny's snapshotPreprocessOutput(), and what it'll do is it will provide that JSON object. So we're going to parse that JSON. We're only going to pull out the x data, the first data entry, and only the x and y columns, and hopefully we get back the first and second columns of cars for our x and y data. And this way, it allows Plotly to do whatever it wants internally, and we're not testing against it. So it can make changes freely without breaking our tests.
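The slide code is not part of this transcript, so the following is only a sketch of the approach described, in R. It assumes the value handed to the preprocessing function is the widget's JSON payload as a string, with the traces under x$data; the helper name keep_first_trace_xy() and the plot itself are illustrative. snapshotPreprocessOutput() is from shiny, renderPlotly() and plot_ly() are from plotly, and parse_json() is from jsonlite.

```r
library(shiny)

# Hypothetical helper: keep only the x and y values of the first trace.
# Assumes `value` is a JSON string shaped like {"x": {"data": [ {...} ]}}.
keep_first_trace_xy <- function(value) {
  parsed <- jsonlite::parse_json(value)
  first_trace <- parsed$x$data[[1]]
  # parse_json() returns nested lists; flatten x and y into plain vectors.
  lapply(first_trace[c("x", "y")], unlist)
}

server <- function(input, output, session) {
  output$my_plot <- snapshotPreprocessOutput(
    plotly::renderPlotly({
      plotly::plot_ly(
        x = cars[, 1], y = cars[, 2],
        type = "scatter", mode = "markers"
      )
    }),
    keep_first_trace_xy
  )
}
```

With this in place, the snapshot written by expect_values() would hold only the two data vectors for my_plot, so changes to Plotly's other internals would not show up as test failures.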

And finally, lesson four, routine maintenance is maintainable. I know this is a little bit of, like, tautology, but if you don't do routine maintenance, you might make a PR so big that GitHub won't let you merge it because you can't view the website. It finally rendered about 15 minutes later. It was a lot of fun to get that in.

So lessons learned. Minimize your screenshots, the count of them. Just because it failed doesn't mean it's your fault, but it's still your problem. And test values that you can control and perform routine test maintenance to keep on top of it.

Q&A

All right, Barret, well done, as always. I especially love that Brad Pitt from Troy meme that's had me giggling for a second here. So we're going to be slim on time for questions, but the first one I wanted to ask comes from Eric Hans. What are your thoughts on other services that offer similar capabilities, like GitHub, sorry, GitLab CI? Well, I'm just trying to see how much of this is, like, GitHub Actions specific, and could this be generalized to other services?

I'm not fully familiar with GitLab CI. It is very similar. I know Mr. Novartis uses it thoroughly. I have run into issues with chromote on it, but I can't reproduce it, can't debug it, so it's really frustrating. But that also happens with GitHub Actions. So if it's integrated in your system and it's less friction for you, by all means, use it. If you're automating things, that's the goal here. Less manual work, more automation.

Okay, makes sense. And we also got a comment from Paul Ruki. I can see that shinycoreci is testing logistics. Where are your actual tests?

They are actually within each app. So if you're on the repo, it's inst/apps, like, let's say, 001-hello, and then there's another testing file structure within that app itself. That's actually where I'm going with package helping later, or package enhancements for shinytest2 later. Those tests may move into your package structure, but for the most part, shinycoreci, let's just pretend it's custom and not what I will recommend for users going forward.

Okay, great answer. All right, Barret, I wish we had more time for questions because your input is always incredibly valuable. So I'll just go ahead and end and say, you know, thank you once again for joining us at this 2023 Shiny Conference. And for everybody in the audience, we have one more speaker, it's going to be Peter Solomos. And we'll be back in just one minute with that conversation. Thanks, Peter. Thank you so much.