
Emily Riederer | RMarkdown Driven Development | RStudio (2020)
RMarkdown enables analysts to engage with code interactively, embrace literate programming, and rapidly produce a wide variety of high-quality data products such as documents, emails, dashboards, and websites. However, RMarkdown is less commonly explored and celebrated for the important role it can play in helping R users grow into developers. In this talk, I will provide an overview of RMarkdown Driven Development: a workflow for converting one-off analysis into a well-engineered and well-designed R package with deep empathy for user needs. We will explore how the methodical incorporation of good coding practices such as modularization and testing naturally evolves a single-file RMarkdown into an R project or package. Along the way, we will discuss big-picture questions like “optimal stopping” (why some data products are better left as single files or projects) and concrete details such as the {here} and {testthat} packages, which can provide step-change improvements to project sustainability.
Transcript
This transcript was generated automatically and may contain errors.
All right. Hi, everyone. Thank you so much for coming. My name is Emily Riederer and I'm here today to talk about RMarkdown Driven Development. I assume many of us in this room are already familiar with RMarkdown and what an amazing tool it is for literate programming and combining both code and narrative in plain text to create an amazing variety of different types of outputs. But something I'm particularly fascinated by is how RMarkdown can also be used as a prototyping tool for more advanced analytical tools.
I contend that any analysis that you've created in an RMarkdown document actually contains a latent underlying implicit package or analytical tool that is custom tuned to solve your domain specific workflow from everything from dealing with custom nuances in your data to making the specific types of deliverables that the customers of your analysis need. And the goal of RMarkdown Driven Development is to take this implicit tool and make it an explicit one.
We'll do this by harvesting all of the assets that you've already created in that implicit tool. For example, from the development side, you've already probably had to go through and curate a wide set of packages that relate to your problem and play nicely together. And you hopefully have developed in your one-off analysis a consistent set of code that works to solve your problem and ideally is at least somewhat well tested and well styled.
But beyond pure code development that we have to work with, we also have a lot of great design elements already in place. One of the hardest parts generally of creating a software product is understanding user requirements and getting deep empathy for user needs. But as the person that did the original RMarkdown analysis, you get all of that for free. You understand how all of the pieces of analysis need to come together, a sane workflow for doing the work and processing your data, and again, answering those questions that are important to the people that you want to share your analysis with. And finally, you have already a complete and compelling marketing example of how your latent tool goes into use and can solve real-world problems in the wild.
So there are five main steps to our RMarkdown driven development that I want to talk about today. And along the way, we'll talk about how these steps can result in many different types of outputs and many different types of analytical products. You can think about producing well-engineered single files, projects, or packages. And while I depict this as a spectrum, I think it's important to call out at the beginning, there's not necessarily a clear value judgment in here. There's no better or worse in terms of quality, user experience, or your talent as a coder, depending on what part of the spectrum you stop at. Really, all it is, and we'll see as we go through this, is whether you want to tune your final analysis tool more to solving another instance of your very specific problem or a tool that solves more of a generic class of problems.
Step 1: Cleaning up your RMarkdown
So now we'll go through the steps of our RMarkdown driven development one at a time. The first step's pretty easy because we don't actually have to generate anything new. We just simply have to delete a lot of things that shouldn't be in our RMarkdown to begin with. So for example, some of these things might include hard-coded variables, which are common when we're doing a one-off analysis. But thinking about ourselves or a future user that might want to update this analysis later on, it can be really detrimental to have hard-coded variables late in our analysis that users may not realize when or where they have to change. Or even worse, they might change them inconsistently some places and not the others and introduce inconsistencies in their analysis.
Instead, RMarkdown parameters let us front load all of this in the YAML header of your RMarkdown document, which means that it's very easy for a future user to see exactly in front of them what changes they need to make and have everything flow on through your analysis. Effectively then, your entire RMarkdown is acting kind of like one mega function through which everything can flow. This is also an effective strategy for dealing with credentials that you might not want in your final output in case you end up sharing your code with a colleague and even just for good data security practices. There's really great synergy here with the RStudio IDE because if you click knit with parameters, you can get a nice little pop-up box and replace dummy credentials in your header with your actual credentials, so you've never hard-coded them anywhere.
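As a minimal sketch of what this looks like (the parameter names and values here are hypothetical), the YAML header of the RMarkdown front-loads everything a future user might need to change:

```yaml
---
title: "Monthly Sales Report"
params:
  month: "2020-01"       # surfaced in the Knit with Parameters dialog
  region: "midwest"
  db_user: "dummy-user"  # dummy credential, replaced at knit time
---
```

Inside code chunks, these values are then available as `params$month`, `params$region`, and so on, so nothing downstream needs to be hard-coded.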
Next up, I think we already learned from Jenny a little bit this morning about file paths, and we know that we don't want to include global file paths in our RMarkdown because it makes it very unresilient to working in the future. Local file paths are a slight improvement over these, but can still break as we change across operating systems or later on as we move our files around inside of our working directory. So best of all, we can use the here package to create very resilient file paths to make sure our analysis always works as planned.
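A quick sketch of the progression, with a hypothetical file name, might look like this:

```r
library(here)

# Global path: breaks on any other machine or operating system
# dat <- read.csv("C:/Users/emily/project/data/raw-sales.csv")

# Local relative path: breaks if files move or the working directory changes
# dat <- read.csv("../data/raw-sales.csv")

# here() resolves the path from the project root, so this works
# from anywhere inside the project
dat <- read.csv(here("data", "raw-sales.csv"))
```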
And finally, and this step is by far the hardest for me, sometimes we have to kill our darlings. You can't let your RMarkdown be a junk drawer for code that you wrote, rabbit holes you went down, that you really, really thought would work, but then didn't end up being the direction your analysis went in. Take this time to remove package loads that you didn't need and delete code experiments that you maybe have commented out in your code just to create a very clean code script with a lot of hygiene on which to proceed with your refactoring.
Step 2: Rearranging code chunks
The next step in RMarkdown-driven development is simply rearranging your code chunks. RMarkdown is great because it allows us to capture our thought process as we go through our analysis, but the one downside of this is our thought processes are rarely linear and as a result, often neither are the resulting markdowns. The strategy here is to move to the top a lot of the heavy-duty infrastructure and computing chunks. These can be things that set up our environment, such as library loads, data loads, and a lot of our heavy-duty computation with data wrangling, data cleaning, model fitting, et cetera. And let's sink down to the bottom, the more communication and narrative aspects.
This has many benefits. First of all, it allows us users to clearly see the main dependencies of our script at the top as we have things like external file references and package loads right where the user opens the document. Similarly, by grouping like pieces of code together, we can more easily start to see patterns throughout the code and get a sense where we can later consolidate. And finally, often we'll be working on an analysis with colleagues that don't code and having all of the more narrative elements consolidated at the bottom may make it easier for them to find places where they can actually jump in and edit the text parts in RMarkdown without being kind of intimidated by large blocks of code sprinkled throughout.
Again, there's another great opportunity here to leverage the RStudio IDE for even more synergy and take advantage of some tips like naming your chunks and commenting your code with the quadruple dash, as you see in this example. The combination of these two features allows you to create a really friendly, nice table of contents that you can pop out to help a user navigate your script better and jump to the exact component that they want to work on.
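As a sketch (the section names are illustrative): in a chunk header you would write a name like `{r fit-model}`, and within code a comment ending in four or more dashes becomes an entry in the RStudio outline:

```r
# Load packages ----
library(dplyr)

# Wrangle data ----
clean <- filter(raw, !is.na(value))

# Fit model ----
fit <- lm(value ~ group, data = clean)
```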
Step 3: Extracting functions
So after we've done all of this rearranging, one of the main benefits is that it can more easily help you understand related parts of your script and find places where we can cut out duplicate code and replace it with functions for an even more streamlined approach. As a somewhat trivial example of this, in an exploratory data analysis part of your workflow, you probably end up visualizing the relationship between multiple variables in a lot of related plots. Doing this without functions leads to a lot of code that's both repetitive and inefficient, and it also makes it harder to tell exactly what piece of those lines is changing. All I'm actually changing there is the variable being plotted on the y-axis, but it kind of takes some staring at it to tell.
Instead, breaking these apart into separate chunks to define my function and then to use my function gives me numerous benefits. First, if I want to change something about those plots, like theme, I can easily just change that code in one place instead of multiple places. And secondly, from a more literate programming perspective, it's much easier in the view on the right to see what stayed the same versus what's changed between iterations.
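A sketch of this refactoring, using a made-up dataset and function name, might look like the following; the `{{ }}` tidy-evaluation operator lets callers pass a bare column name, so the only thing that varies between calls is visible at a glance:

```r
library(ggplot2)

# Hypothetical example data standing in for the analysis dataset
sales <- data.frame(
  date     = as.Date("2020-01-01") + 0:5,
  revenue  = c(10, 12, 9, 14, 15, 13),
  n_orders = c(2, 3, 2, 4, 4, 3)
)

# One function replaces several near-identical plotting chunks;
# a theme change now happens in exactly one place
plot_by_date <- function(data, yvar) {
  ggplot(data, aes(x = date, y = {{ yvar }})) +
    geom_line() +
    theme_minimal()
}

plot_by_date(sales, revenue)
plot_by_date(sales, n_orders)
```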
This is also a good time to start thinking about adding roxygen2 comments to your R Markdown. For those that haven't ever built a package, roxygen2 is the tool that you'll use to document functions and create the help pages that we've all likely seen at some point when we've pulled up R documentation. But even before you're building a package, you can easily, in the RStudio IDE, add in this roxygen2 skeleton and use it to document your function. This is nice because it's a format that users are already used to seeing and gives them the content they expect to know about a function. It can also help you think more critically about the design of your functions because it forces you to almost make a contract with yourself about precisely what those inputs and those outputs ought to look like.
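Continuing with the hypothetical plotting function from before, a roxygen2-documented version might look like this:

```r
#' Plot a variable over time
#'
#' @param data A data frame with a `date` column.
#' @param yvar Unquoted name of the column to plot on the y-axis.
#'
#' @return A ggplot object.
#' @examples
#' plot_by_date(sales, revenue)
plot_by_date <- function(data, yvar) {
  ggplot2::ggplot(data, ggplot2::aes(x = date, y = {{ yvar }})) +
    ggplot2::geom_line() +
    ggplot2::theme_minimal()
}
```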
Step 4: Cleaning up style
So now that we've cleaned up the actual content of our code a lot, one final step to consider at this point is also cleaning up the style. And we have many great R packages to help out here, including the lintr, styler, and spelling packages. Linters and stylers help us adhere to a specific style guide for our script by either flagging issues for us to change or by proactively editing our script themselves to make it adhere to certain rules, such as rules about spacing and indentation. And spelling similarly can help us catch typos in our prose for the cleanest possible view of our report.
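A sketch of how these three packages might be invoked on a report (the file name is hypothetical):

```r
lintr::lint("analysis.Rmd")                  # flag style issues for manual review
styler::style_file("analysis.Rmd")           # auto-reformat the file in place
spelling::spell_check_files("analysis.Rmd")  # catch typos in the narrative text
```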
From file to project
So going through all these steps, we've now really done a lot of work to move our R Markdown to something that's pretty sustainable, pretty readable, pretty navigable. And in some cases, this in and of itself may be a really good stopping point. In resource-constrained environments, if you don't have a good way to share files with a colleague, a single file can be very easy to send. If you don't have access to a good version control system, it can be very easy to just take a diff between versions. So there are some benefits to leaving things in a single file. Finally, you can also quickly refresh your code by just running one single file in isolation.
However, these can still sometimes be lengthy, monolithic, and sort of intimidating to new users. And you may actually be forcing R Markdown to do a lot more computation than necessary any time you want to knit. So if these sound like problems to you, we may want to consider moving from a file to an R project.
An R project looks something like this. And at a high level, our goal is to take a lot of the assets we have in our single file R Markdown document and break them apart and modularize them into many different folders with semantic meaning. This can be very helpful if you or your organization uses a very standardized file structure across projects, because everyone will know where to find specific data assets within a project. For example, one structure I like is shown here on the screen, where I would save my actual R Markdown report and the output of that in my analysis folder. I put a lot of those functions that I defined and other little bits of R code, such as those that read and process data, in a variety of R scripts in my src folder. And I can save both my raw data as well as data artifacts that I create throughout the process in different data folders. And finally, it's a good place to have catch-all files for different types of documentation or external pieces of context.
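One possible layout along these lines (the exact folder names are a matter of convention):

```
analysis/   # the R Markdown report and its rendered output
src/        # reusable R scripts: function definitions, data reads
data/       # raw, read-only input data
output/     # intermediate data artifacts created along the way
doc/        # supporting documentation and external context
```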
This allows me to create a much leaner R Markdown, where I just read in the minimal amount of resources from these files that I actually need. To take two more specific examples of what this looks like, instead of defining both a function and calling that function in the same R Markdown, I can break that up and define my R function in a modularized script that could be easily stolen and repurposed in another analysis. And I could simply load that in by executing that code remotely with the source function in my R Markdown.
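As a sketch, with hypothetical file and function names, the R Markdown then just sources the modularized script:

```r
# src/plot-helpers.R defines plot_by_date(); the report only loads and uses it
source(here::here("src", "plot-helpers.R"))

plot_by_date(sales, revenue)
```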
Another big kind of step change benefit of going through this process is in your actual data processing. Instead of doing all of your data processing within your R Markdown, I tend to think about doing it in three different steps. First, I use some R scripts to access my data from my database or my API or wherever that data is coming from, and save that raw data in a read-only data file. Then, through, again, a variety of R scripts that might help me with a lot of my data cleaning and wrangling and model fitting, I can create a lot of smaller data artifacts and have only those be what I read into my R Markdown. This means that I can run my R Markdown much faster without having to repeat many long, lengthy simulations or model builds or whatever it is you may be doing if I just want to make some small change. And it also removes upstream dependencies on external systems, like APIs or databases, to be live and working. If you just want to do something really small, like fix a typo in your R Markdown, you don't want to be unable to do that just because for some reason your database wasn't running.
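The three steps might be sketched like this; the connection object, table name, and file names are all hypothetical:

```r
# 1. src/get-data.R: pull from the source system once, save read-only raw data
raw <- DBI::dbGetQuery(con, "select * from sales")
saveRDS(raw, here::here("data", "raw-sales.rds"))

# 2. src/clean-data.R: heavy wrangling and modeling, saved as small artifacts
clean <- dplyr::filter(readRDS(here::here("data", "raw-sales.rds")),
                       !is.na(revenue))
saveRDS(clean, here::here("output", "sales-clean.rds"))

# 3. In the R Markdown: read only the light artifact -- fast to knit,
#    and no live database connection required
sales <- readRDS(here::here("output", "sales-clean.rds"))
```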
And one final note on projects. There exist a lot of great packages out there to help you set up these projects, and some of them also have different opinionated file structures, maybe a little different than the exact one I showed here, but very similar in spirit. So I definitely encourage you to check out usethis project templates and starters to find the one that works best for you. And making a project is also a great time to think more about managing the package dependencies that are going into that project.
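One possible combination of tools for this step (the project name is made up):

```r
# Scaffold a new project with a standard structure
usethis::create_project("sales-analysis")

# Record the exact package versions the project depends on
renv::init()
renv::snapshot()
```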
From project to package
So again, stepping back a minute to take a view of where we are, we now have a pretty well refactored analytical project as opposed to a file. And one benefit of this is that we've really broken apart all sorts of pieces that might be independently useful and put them in a place where they're very easily found and accessed to be repurposed. But at the same time, because we're in sort of project mode, we've really preserved a lot of the context specific to our individual analysis while doing this. So for people that are more interested in building off of our analysis, or taking the actual learnings from the question we were trying to answer, it's still very easy to do that.
However, for that very reason, projects can still cause some confusion because they kind of blur the line sometimes between what's a tool versus what's part of a higher level process that solves a class of problems versus your individual problem. If you want to focus mostly on solving that broad class of problems, that's when we can move to a full-blown R package.
To me, one magical thing about R packages that I didn't realize when I first started using R was they're really as simple as saving a lot of R code in the right places. And there's pretty clear one-to-one mapping between the kind of structure you might use for a project that I've talked about previously to the kind of structure you'd use for an R package. All of the functions that we just broke up into separate R scripts translate easily, go right into a single R folder.
The actual R Markdown that we've been working with this whole time can actually serve numerous purposes. It can go in your package as an R Markdown template, so users of your package immediately have some code they can start playing and experimenting with, a working R Markdown example of how to use what you've built. It can also be repurposed as a vignette, the long-form documentation that really teaches people more deeply how to think about the problem they're working on.
Similarly, all of the data artifacts we created, if we're careful to anonymize them and take out anything that we shouldn't be sharing with the public, can be very useful as example data in your package. By shipping example data in the data folder, this gives us a great way to build more vignettes, to conduct unit testing, and to give examples in your roxygen2 documentation. Really, the only two things we're missing at this stage are a man folder, where the documentation manual lives, and the tests folder for unit tests.
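A sketch of shipping an anonymized slice of the data with the package (the dataset and column names are hypothetical):

```r
# Truncate and anonymize before sharing: drop identifying columns,
# keep just enough rows for examples and tests
sales_demo <- head(subset(sales_clean, select = -customer_id), 100)

# Saves data/sales_demo.rda so the package can use it in vignettes,
# roxygen2 @examples, and unit tests
usethis::use_data(sales_demo)
```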
And at this point, I doubt it should surprise anyone here that there are more great R packages to help us go this last mile. First of all, once again, the usethis package is phenomenal for helping us set up the package infrastructure we need and making sure we do, in fact, save all those individual package assets in the right places with the right file structures and configuration files to make everything flow through seamlessly.
devtools can automatically generate our R documentation from the roxygen2 comments that we already wrote many steps back in this process. And testthat provides a really friendly interface for doing unit testing, which, quite honestly, if we want to trust the outcome of any analysis, we probably could have started doing many steps earlier in this process. And beyond all of those absolute requirements, the R ecosystem even goes above and beyond what we need. There also exists pkgdown, which can similarly reuse all of those great assets you made throughout your project and throughout the R Markdown-driven development process and create a very user-friendly, attractive, polished website to help people learn about and use your analytical tool.
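Taken together, the last mile might look something like this (the package and test names are illustrative):

```r
usethis::create_package("salestools")  # scaffold the package skeleton
devtools::document()                   # build man/ pages from roxygen2 comments
usethis::use_test("plot-by-date")      # creates tests/testthat/test-plot-by-date.R
devtools::test()                       # run the testthat suite
pkgdown::build_site()                  # generate the documentation website
```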
Wrapping up
So now we've created kind of the spectrum of possible outputs and possible analytical outcomes of our R Markdown-driven development process. I think all of us are probably very well aware what incremental benefits the package provides. It's, of course, one of the most formal distribution mechanisms, especially if you have access to a CRAN or a CRAN-like repository to share your work. And it also adds benefits because it provides a very formal and familiar way for people to learn about and engage with your tool. Pretty much any R user is used to working with an R package. That said, kind of the tradeoff again that exists here is that at this point, we've abstracted very far away from our initial problem. And if simply recreating your work is, you think, going to be the primary goal of future users, then maybe we've gone a bit too far.
So in summary, no matter which path you choose, your original R Markdown analysis can be a great starting point for a very wide variety of data products. And by leveraging all the work you did in conducting your initial analysis, you're much closer than you may realize to building a very sustainable and empathetic data tool. Thank you all very much for coming and for your time today. If you're interested in learning more, I have a blog post on this topic that you can find at tiny.cc/rmddd.
Thank you, Emily. That was excellent. We have time for just a couple questions, maybe just one. The most popular is, how does this workflow change if the end result is an automated scheduled job?
I think that probably can vary in a couple of ways depending on how complicated the job is. I know I've been reading about and learning about RStudio Connect a bit recently, and I think that actually has a lot of good options for automation based purely out of that original R Markdown. So I think that could be another good use case where the one-file R Markdown approach, even if it is handling kind of the full ETL process, is actually a good solution, because some of the rerunning problems aren't really problems there: rerunning your work is actually what you're trying to accomplish. But for some more complicated batch processes, even if you're automating them, you'd still probably want to go all the way, build the package, and then go back and use the tools you've created to build the job, if you want more robust and well-tested and well-documented components to that batch job.
