Resources

Olga Mierzwa-Sulima | Best Practices for Developing Shiny Apps | RStudio

From rstudio::global(2021) Shiny X-Sessions, sponsored by Appsilon: Best practices for developing Shiny apps presentation covers organizing app's code with modules and R6 classes, setting up development environment, and testing. About Olga Mierzwa-Sulima: Olga is experienced in production applications of analytical solutions, especially for FMCG companies. Recently she developed a price elasticity model for Unilever. Learn more about the rstudio::global(2021) X-Sessions: https://blog.rstudio.com/2021/01/11/x-sessions-at-rstudio-global/

Transcript

This transcript was generated automatically and may contain errors.

So now let's dive into the best practices for developing Shiny apps, so you can actually build apps that last. In my talk, I'm going to cover three areas: organizing your Shiny app code, organizing your development environment, and organizing your testing.

Since we have a limited amount of time, this will be a high-level overview talk, and I'm going to introduce concepts and tools you should be using and tell you why. So if you are not familiar with them, you can explore them in more detail after the talk.

Organizing app code with Shiny modules

So how should you organize the code inside your Shiny app? The first answer is to use Shiny modules. Shiny modules let you decompose your application, so you can follow the don't-repeat-yourself coding principle. They offer encapsulation, modularity, and reusability, and allow you to test whole components. They let you organize your app into a server part and a UI part, which in essence means that you can define your own pair of server and UI logic and use it inside your app or embed it in another module.
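As a minimal sketch of that UI and server pairing, here is a hypothetical counter module. The names counterUI and counterServer are illustrative and not from the talk, and the sketch assumes shiny 1.5.0 or later for moduleServer().

```r
library(shiny)

# UI half of the module: NS(id) prefixes every input and output ID,
# so several copies of the module can live in one app without clashing.
counterUI <- function(id, label = "Increment") {
  ns <- NS(id)
  tagList(
    actionButton(ns("increment"), label),
    textOutput(ns("value"))
  )
}

# Server half of the module: the logic only sees its own namespaced inputs.
counterServer <- function(id) {
  moduleServer(id, function(input, output, session) {
    count <- reactiveVal(0)
    observeEvent(input$increment, count(count() + 1))
    output$value <- renderText(count())
    count  # return the reactive so the caller can use the current value
  })
}

# The same module is reused twice under different IDs.
ui <- fluidPage(
  counterUI("first", "First counter"),
  counterUI("second", "Second counter")
)

server <- function(input, output, session) {
  counterServer("first")
  counterServer("second")
}

# Uncomment to run the app interactively:
# shinyApp(ui, server)
```

Because each instance gets its own namespace, the two counters keep separate state even though they share one definition.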

Secondly, you can also organize your code using R6 classes. R6 is a modern, fast, and simple implementation of object-oriented programming in R.

You can look at an R6 class as a more organized Shiny module. R6 classes introduce a clear system for getting the current state of a piece of functionality, the operations that can be performed with it, the auxiliary functions, and the initial state of the object, which can depend on the particular user's rights. That might actually be super useful.

So how do you organize your code using R6 classes? You can keep each class in a separate script with a name similar to the class name. You can store the script in a folder separated from the other code. You can import the code using a package or, as here in the example, using the use function from the modules package. And then you can initialize the class with the new method, which is automatically available for all classes, and assign the result to some object.
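To make that concrete, here is a small sketch of an R6 class whose initial state depends on user rights. The Cart class, its fields, and the can_edit flag are hypothetical and only illustrate the pattern.

```r
library(R6)

# Hypothetical class: public methods are the allowed operations,
# private fields hold the state and the user's rights.
Cart <- R6Class("Cart",
  public = list(
    # The initial state can depend on the rights of the current user.
    initialize = function(can_edit = FALSE) {
      private$can_edit <- can_edit
    },
    # An operation that changes the state.
    add = function(item) {
      if (!private$can_edit) {
        stop("This user is not allowed to modify the cart")
      }
      private$items <- c(private$items, item)
      invisible(self)  # returning self allows method chaining
    },
    # A getter for the current state.
    get_items = function() {
      private$items
    }
  ),
  private = list(
    items = character(0),
    can_edit = FALSE
  )
)

# $new() is generated by R6 for every class.
cart <- Cart$new(can_edit = TRUE)
cart$add("apples")$add("pears")
cart$get_items()
```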

And the third option is to use modules from the modules package. I would like to avoid some confusion with Shiny modules here: the modules I'm going to talk about right now are something completely different from Shiny modules. These modules are an organizational unit for the source code, and they can contain Shiny modules, R6 classes, or just functions you define.

And the cool thing about them is that they enforce rigor when defining the dependencies. It means that you have to explicitly declare which functions or packages your code is using and you have to import them.

And you also have full control over the things that are available outside the module, because you have to export them using the export function as shown here in the example.

And additionally, they have a local search path and they can be used as a subunit within the package or in scripts.
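A short sketch of what explicit imports and exports look like with the modules package. The module name summary_tools and its functions are made up for illustration; module(), import(), and export() come from the package.

```r
library(modules)

# A hypothetical module defined inline; in a project this body would
# usually live in its own file and be loaded with modules::use().
summary_tools <- module({
  # Dependencies must be declared explicitly.
  import("stats", "median")
  import("stats", "sd")

  # Only exported names are visible from outside the module.
  export("summarise")

  # Not exported, so this helper stays private to the module.
  centre <- function(x) median(x)

  summarise <- function(x) {
    list(median = centre(x), sd = sd(x))
  }
})

summary_tools$summarise(c(1, 2, 3, 4, 100))
```

Anything that is not exported, such as centre here, is not reachable through summary_tools, which is what gives you control over the module's public surface.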

So to put it all together: Shiny modules, R6 classes, and modules. Shiny modules let you reuse the UI and server parts of components. R6 classes let you build objects that have a single responsibility and implement given logic. And modules encapsulate dependencies and let you organize files, which could contain Shiny modules and R6 classes, as separate units.

Organizing the development environment

So, OK, we covered how we organize the code. So now let's talk about the development environment and how you can organize it.

So, continuous integration. To ensure quality and automate checks, you can run lint and tests automatically for every pushed change. Especially with tests, you will probably forget to run them every time, so just automate it and you don't have to think about it.

Additionally, you can automatically add a pull request template to every pull request, containing a checklist for the pull request creator and the reviewer. You can block pushing directly to master, and block merging pull requests if the tests fail.

Here are some popular continuous integration options: Bitbucket, CircleCI, Travis, Jenkins, and GitHub Actions. So I'm sure you will find something that fits your technology stack.

And remember, you don't have to start from scratch every time and you can use a project template for it.

So at Appsilon, we have an internal project pattern with a Shiny boilerplate template, and we use it to initialize the repository structure when we start a new project. It contains a simple Shiny app with modules and Shiny modules set up. It also contains a sample unit test and CI with lint and automatic unit tests set up.

OK, so what should your development environment actually look like? Let's start from the other side: what happens if you don't take care of it?

The development environment is a crucial part of working in R, and this is not only Shiny specific. If you don't take care of it, you can end up with the same code giving different results on different machines. Other projects can be affected, because in the global environment a shared package can change and crash unrelated projects. Deployment becomes long and difficult, as establishing the infrastructure is challenging if it's not being tracked. And last but not least, the team will waste time setting up a new environment rather than jumping straight to work.

So this might not be a problem when you're starting a project, but imagine that you have to urgently add a new team member in the middle of one.

So to solve all those problems, we recommend the following setup. You can do development in RStudio running inside a Docker container, which is fixed and dedicated per project. And you can leverage Docker and the renv package to control the underlying system, system dependencies, and R packages. Docker and renv together make team collaboration easy.

All the changes to the Dockerfile and to the renv lockfile, which records the packages used in the project, are committed to the Docker repository and to the code repository, for example the GitHub repository that your team is using.
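On the renv side, the usual cycle comes down to three calls: renv::init(), renv::snapshot(), and renv::restore(). The sketch below wraps them in hypothetical helper functions (the wrapper names are not part of renv) so that nothing runs until you call them inside a project.

```r
# Run once by whoever starts the project: creates a project-local
# library and an initial renv.lock file.
start_project_library <- function() {
  renv::init()
}

# Run after installing or upgrading packages: records the package
# versions currently in use into renv.lock, which is then committed.
record_dependencies <- function() {
  renv::snapshot()
}

# Run by teammates (or inside the Docker build) after pulling the
# repository: installs the versions recorded in renv.lock.
restore_dependencies <- function() {
  renv::restore()
}
```

A new team member would then only need to clone the repository and call restore_dependencies() to get the same package versions as everyone else.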

On the other hand, the previous solution still requires some level of DevOps skills. What we observe with our clients, especially the enterprise ones, is that they choose to work with RStudio Server Pro, which allows their data science teams to focus purely on delivering value using their core competencies in a highly secure and flexible environment.

Obviously, here you still want to use renv to control the R dependencies and keep them separate for every project, so different team members can easily collaborate.

Testing

So the last bit I'm going to talk about is testing. And my message here is: just do it. The only question is what tests you should be writing and in what proportion. We will use a testing pyramid to answer that.

So you should aim to have the most unit tests and data validation. Next up, test components, that is, test your Shiny modules. Then test whole scenarios with end-to-end testing, perform load tests, and finally test usability with user interviews.
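For the component level of the pyramid, one option is shiny's testServer(), which drives a module's server logic without a browser. The sketch below tests a hypothetical counter module; the module itself is illustrative and assumes shiny 1.5.0 or later.

```r
library(shiny)
library(testthat)

# Hypothetical module under test: counts clicks on an "increment" button.
counterServer <- function(id) {
  moduleServer(id, function(input, output, session) {
    count <- reactiveVal(0)
    observeEvent(input$increment, count(count() + 1))
    output$value <- renderText(count())
    count
  })
}

test_that("counter increments on each click", {
  testServer(counterServer, {
    # setInputs() simulates the button's click counter changing.
    session$setInputs(increment = 1)
    expect_equal(count(), 1)

    session$setInputs(increment = 2)
    expect_equal(count(), 2)
    expect_equal(output$value, "2")
  })
})
```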

And now let's quickly talk about the tools you have available.

So for unit tests, I'm sure you all know the testthat package. So just use it. But please set up your test architecture early, and don't give yourself the excuse that you will do it later, because that probably won't happen: your Shiny application will be awesome and your clients or your business partners will want to move on to something else. So start each project with test architecture. You already know that you can have a project template to set it up at the beginning. You also know that you can use continuous integration to trigger the tests automatically, so you don't have to do it manually.

And please write at least some tests for each piece of code, so it will be easier to expand them later. And obviously, keep your standards high: do not accept pull requests that don't include tests or that actually break them.
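A minimal testthat sketch of "at least some tests for each piece of code". The apply_discount() helper is hypothetical and stands in for any small function in your app.

```r
library(testthat)

# Hypothetical business-logic helper kept outside the Shiny server code,
# which makes it easy to unit test.
apply_discount <- function(price, rate) {
  stopifnot(is.numeric(price), is.numeric(rate), rate >= 0, rate <= 1)
  price * (1 - rate)
}

test_that("apply_discount reduces the price by the given rate", {
  expect_equal(apply_discount(100, 0.2), 80)
  expect_equal(apply_discount(c(10, 20), 0.5), c(5, 10))
})

test_that("apply_discount rejects invalid rates", {
  expect_error(apply_discount(100, 1.5))
})
```

In a project, these test_that() blocks would typically sit in a tests folder so that the CI job can run them on every push.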

So now the part about data validation. This is functionality that you should be working on at the same level as the unit tests. As we all know from our experience, data can be very messy. And actually, Kaggle users reported that this is their biggest pain: dirty data.

So I want to introduce you to the data.validator package, which is a tool for creating reports based on rOpenSci assertr results. It allows you to create user-friendly reports that can be generated automatically, for example with RStudio Connect.

So how can you use the data.validator package in production? Here's an example workflow. You can run the RStudio Connect scheduler, and the scheduler can source the data from a database and validate it. Based on the validation results, a new data.validator report is created, and we have two scenarios here. In the first, the validation fails. In that case, a report can be sent to the responsible stakeholders, so a responsible person can actually take action and fix things. On the other hand, we also have the positive scenario: everything is great, so the Shiny app can be refreshed with the new data.
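The branching in that workflow could look roughly like the sketch below. It uses the built-in mtcars data in place of a database query, the checks are made up, and it assumes that the table returned by get_results() has a type column with the value "error" for failed checks; verify that against the version of data.validator you have installed.

```r
library(assertr)
library(magrittr)
library(data.validator)

# Start an empty report that validation results are added to.
report <- data_validation_report()

# mtcars stands in for data pulled from a database by the scheduler.
validate(mtcars, name = "Cars data") %>%
  validate_if(mpg > 0, description = "mpg is positive") %>%
  validate_cols(in_set(c(0, 1)), vs, am,
                description = "vs and am are binary") %>%
  add_results(report)

results <- get_results(report)

# Assumption: failed checks are marked with type == "error".
has_errors <- any(results$type == "error")

if (has_errors) {
  # Failure scenario: write the report so it can be sent to stakeholders.
  save_report(report)
} else {
  # Success scenario: this is where the app's data would be refreshed.
  message("Validation passed; the app data can be refreshed")
}
```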

And the next step that I'm going to cover is end-to-end testing, the phase when you are actually testing the full components in your app. Luckily, we have the shinytest package from RStudio, which is easy for developers to use. You can record scenarios, and this provides a quick and simple way to test your Shiny apps. On the other hand, for more complex apps, we recommend using Cypress. Please be aware that it requires knowledge of JavaScript, but it's really, really powerful.

And now we've reached the point where your application is actually ready to go live, so you want to make sure that it's able to handle the traffic. Here you can use the shinyloadtest package, which enables you to load test your deployed apps. You can estimate how many users your app can support, you can identify bottlenecks, and you can use the test output to guide you through the changes if optimization is actually necessary.

OK, and the last bit that you really have to think about. So you also want to make sure that what you're actually building is useful for your users. So please talk to them.

And you don't need to have a big budget to do that, because you can always use hallway tests. The idea here is that you show the application to your colleagues or to end users, let them click through it, and have them describe what they are doing and what they understand is happening. Research showed that if you do it with five people, you will learn about around 85 percent of the usability problems. And obviously, if you don't do it at all, you will learn about zero problems.
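The 85 percent figure is usually derived from a simple model in which each test user independently uncovers a fixed share of the problems. The sketch below assumes the commonly quoted per-user rate of 31 percent from Nielsen and Landauer's work; that rate is not stated in the talk and varies from project to project.

```r
# Share of usability problems expected to be found by n_users testers,
# assuming each tester independently finds per_user_rate of all problems.
# The default of 0.31 is an assumption taken from the usability literature.
share_found <- function(n_users, per_user_rate = 0.31) {
  1 - (1 - per_user_rate)^n_users
}

share_found(0)            # 0: no testers, no problems found
round(share_found(5), 2)  # 0.84, i.e. roughly 85 percent
```

The curve flattens quickly, which is why a handful of hallway tests already pays off.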

So the things I want you to take home: organize your code with Shiny modules, R6 classes, or modules, depending on app complexity. Use CI to run lint and tests. Use renv to control R dependencies, and automate code testing and data validation. Thank you very much.