Tiger Tang | Saving 1,000 hours with RStudio: selling R in your workplace | RStudio (2022)
There are many benefits to using R and no shortage of packages that help you solve technical difficulties, but you may still get stuck selling it to decision-makers or implementing it at work. Tiger's recommendation is to start a project that focuses on automating work with R and gets everyone involved. Once the value of R has been established, selecting RStudio Workbench and RStudio Connect for streamlining tasks will not be a difficult choice. Several years ago, Tiger's organization moved away from SAS in favor of R for modeling projects, but there wasn't much initiative taken company-wide to move everything to the new tool. To help change that, he started a work automation project using R that has saved 12K+ hours of manual work. In this talk, he shares the key parts of the project, lessons learned, and a structure you can follow if you would like to do something similar in your organization.

Talk materials are available at https://tigertang.org/rst_conf_2022_talk/

Session: Take a sad process and make it better: project and process makeovers
Transcript
This transcript was generated automatically and may contain errors.
Good afternoon. So, many of you probably have seen or tried to help a workplace adopt R.
So this is the typical case I see. Your company hired an R trainer, who went over a lot of the fancy things possible in R with the team. After the training, everyone finds R very cool. Tidyverse, amazing. R Markdown, awesome. Shiny, fantastic. But then after a few days, most people just went back to their go-to tools. I'd rather not call them out, but something that's not R.
Several years ago, my company moved to R, and we faced the same situation. To help change that, I initiated a work automation project with R, and now we have 20-plus team members who consider R their go-to tool, 60-plus stakeholders who use our product on a daily basis, and this has saved over 12,000 hours and counting. In the next 15 minutes, I'm going to talk about the key parts of the project, lessons learned, and the structure you could follow if you would like to implement it.
Why R adoption stalls after training
Now, let's revisit the situation. We want to move things to R, but why isn't there much movement? Well, we all have a learning curve as beginners. Some of us get through it quicker. Most of us don't. The other scenario I can think of is that after the training, we are not asked to handle the right tasks. Some of those tasks might not even be covered in the training. They may look similar, though.
Even if you were able to avoid the first two, you may still end up in the last situation, where you have a grand plan of what to do with your project using R, but have to go back to the old ways because your deadline is the same. Now, unfortunately, before realizing any of these, we assumed everything would work after the training. When it didn't, we just thought we needed to work harder on the next one. In fact, in one of those follow-up trainings I provided, I even remade the Lion King song into "Can You Use the R Tonight," summarizing the R functions we talked about, but it still did not help.
It turns out what we need is to create a soft landing that starts small and helps the team build confidence, identifies the tasks for them, and plans the effort for the transition. My soft landing idea was to build a work automation project using R. Next, I'm going to talk about the six key parts of the project, and in the end, you will see why it was perfect for this situation.
What's possible with R automation
So to build an automation project, first, let's look into what's possible. At that time, I was a member of a data team, and for most data teams on this planet, you cannot escape from reports. So let's take a look at a typical report. We can describe it step by step, like traveling on a map: first you open your map, maybe a SQL database, a data lake, and so on. Then you update your parameters and run. Then you wait. Export the data, clean the results, analyze them, update the format, draft an email, and send.
But then if you look again, it almost looks like all of these steps can be divided into three portions, where we just use different groups of applications. For some of those reports, I would start with Oracle, move on to Excel, and then end with Outlook. For other reports, I would just use other applications. But regardless of the differences among those applications, we all seem to start from getting the data, then do the data wrangling, analysis, and visualization, and end with communication. So if you have a report with any parts that fit any of those three portions, we can replace that portion with R code with the help of R packages. And the good thing is, there are so many of these R packages.
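Sketched in R, the three portions of a typical report might look like this. This is a minimal, hypothetical example: the talk doesn't prescribe specific packages, so the DBI/RSQLite in-memory database here stands in for whatever your real source is (Oracle, a data lake), and the made-up "sales" table stands in for your real data.

```r
library(DBI)

# Portion 1: get the data. Here an in-memory SQLite database is a
# stand-in for the real connection you would open (Oracle, etc.).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "sales", data.frame(
  region = c("East", "East", "West"),
  amount = c(100, 250, 175)
))
sales <- dbGetQuery(con, "SELECT region, amount FROM sales")
dbDisconnect(con)

# Portion 2: wrangle and analyze. Base R is used here for
# self-containment; dplyr works just as well.
summary_tbl <- aggregate(amount ~ region, data = sales, FUN = sum)

# Portion 3: communicate. In practice you might build and send an
# email with a package such as blastula; here we just compose the body.
email_body <- paste0(
  "Totals by region:\n",
  paste(summary_tbl$region, summary_tbl$amount, sep = ": ", collapse = "\n")
)
cat(email_body)
```

Once each portion is R code, the whole report becomes one script you can run end to end instead of hopping between three applications.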
Now, this makes sense for the reports that take some effort. But how about the processes whose pain point is running the task itself and sharing the result on a daily basis? How about the stakeholder requests where all that's needed is running a SQL query and sharing the data? Well, they still fit in the scope of this automation project. We just need a little more help.
When defining automation, there are three types. The first type is perfect for the reports that still need some human involvement, and those are best done with R code. The second type is for processes that don't necessarily need human input but must happen at a certain time, and those are best handled with R plus RStudio Connect. Lastly, also my favorite, is a combination of the previous two. In this case, the human input comes from a stakeholder, who can kick off whatever process gets them their answer, with the help of Shiny.
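That third type might look something like this minimal Shiny sketch. Everything here is illustrative, not from the talk: the region input, the tiny lookup function, and the data are hypothetical stand-ins for a stakeholder-facing app you would deploy to RStudio Connect.

```r
library(shiny)

# A pure helper that does the actual work. In a real app this is where
# the SQL query a stakeholder would otherwise ask you to run would go.
lookup_total <- function(region, sales) {
  sum(sales$amount[sales$region == region])
}

sales <- data.frame(region = c("East", "West"), amount = c(350, 175))

ui <- fluidPage(
  selectInput("region", "Region", choices = unique(sales$region)),
  actionButton("go", "Run"),
  textOutput("result")
)

server <- function(input, output, session) {
  # The stakeholder's click is the "human input" that kicks off the process.
  result <- eventReactive(input$go, lookup_total(input$region, sales))
  output$result <- renderText(paste("Total:", result()))
}

# shinyApp(ui, server)  # uncomment to run locally or deploy to Connect
```

The point of the design is that the stakeholder serves themselves: nobody on the data team has to be in the loop for each request.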
Selling the project to decision makers
Now, if we look at these definitions, none of those are new. But the current R ecosystem was able to offer a brand new interpretation. So after knowing what can be done in the project and what the current R ecosystem can help you with, time to sell it to decision makers.
And whenever you're selling something, you want to talk about its benefits. So what are the benefits here? Reproducibility, the first word popping into my head. Then less human error, if we code everything correctly. And if we achieve the first two, we would be able to save some time.
So I got a few decision makers in the room and presented them my plan of automating all the tasks identified in step one, along with the benefits. Let me replicate their interest level using a gauge here. When I first talked about reproducibility, I gave them an example: if I'm out of office, someone else can run all the tasks I'm handling. Their interest level wasn't very high. Maybe because it sounded like travel insurance, which is not urgent at the moment; you would only miss it after things happen. Then I moved on to less human error, but no additional interest, maybe because they were thinking it is all theoretical, and code is written by humans, too. Lastly, hours saved. More interest! They even asked me if I had any numbers. But overall, I did not get the go-ahead.
This is when I realized I was so ready and so prepared to sell this whole thing to our users, who are more concerned about the day-to-day workflow, but not to decision makers, who have no programming experience and are more concerned about the return on investment as well as business urgencies.
So I waited a few weeks, just until everybody forgot about what I said, updated my strategy, and this is what I presented. I started with hours saved. In fact, I told them if we go through with this, we might be able to save 1,000 hours per year. They're interested. Then I moved on to less human error, if we code everything correctly. They're still intrigued. Lastly, I sold reproducibility like a free add-on travel insurance. Who wouldn't love that? Overall, we got the go-ahead. But if you look back, it is still the same number of benefits, just in a slightly different order. I guess sometimes it makes sense for us to start with the one that does not require too much context to understand. And now you see why I named the talk saving 1,000 hours with RStudio, not bringing reproducibility with RStudio. If that were the talk name, I don't know if half of you would show up, or even if I would show up.
I guess sometimes it makes sense for us to start with the one that does not require too much context to understand.
Gathering requirements and ranking tasks
So after getting the decision makers' buy-in, time to gather what we need to do the actual automation. At a high level, there are two things: a document of the current process, so that we know what needs to be done, and information to compare all the tasks, so that we know which one to start with. To get the documentation, we will need to understand the current processes. Well, you may say, wouldn't that just need tools and steps? Mostly, yes. But we will also need to understand the business reasons, so that we can accommodate any changes that may end up making the process better. We will also need to understand the occurrence, whether it happens daily, weekly, or ad hoc, so that we can choose the best platform or tool for it. Lastly, we tend to forget that communication is part of the automation, and I would always recommend saving a few communication examples in your document.
On top of that, we also want to know the current effort: the overall time for each run before the automation, and the manual versus processing time breakdown, identifying the tough items along the way. Ideally, the automation should always work if we code things right. But oftentimes it is not ideal, which is why it is critical for us to know how often we should update the process and the document, so that we know when it will be obsolete. And oftentimes, we're not the original report owner, so we need to know when to stop and call for additional help.
Now, all of this will get us a detailed document of the requirements. Of course, it's always a great opportunity to practice our R Markdown. But other than the document itself, we now know the complexity of potentially automating each process, the impact, as well as the stability, which allows you to rank all your tasks in a table that looks like this. So, if you're wondering which one to start with, you may not want to start with task number one, which is complicated, has small impact, and may require you to update the code every other week. You may want to consider starting with task number three, which is not overly complicated, earns you some recognition, and needs fewer code updates. Then, maybe, you can move on to the harder ones.
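One simple way to keep such a ranking is a small table in R itself. This base-R sketch is purely illustrative: the task names, the three scores, and the naive priority formula are all made up, not from the talk.

```r
# Hypothetical ranking of candidate tasks: lower complexity, higher
# impact, and higher stability make a task a better starting point.
tasks <- data.frame(
  task       = c("Task 1", "Task 2", "Task 3"),
  complexity = c(5, 3, 2),   # 1 = easy, 5 = hard
  impact     = c(1, 3, 4),   # e.g., hours saved per run
  stability  = c(1, 2, 5)    # 5 = rarely needs code updates
)

# One naive priority score: favor impact and stability, penalize complexity.
tasks$score <- tasks$impact + tasks$stability - tasks$complexity

# Best starting candidates first
tasks[order(-tasks$score), ]
```

However you weight the columns, the point is the same as in the talk: pick a first task that is winnable and stable, not the hardest one.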
Doing the actual automation
After knowing what is needed and identifying the task to start with, time to roll up our sleeves. Now, to do the actual automation, there are so many things to cover. In fact, I've been working on an automation book trying to cover all the common scenarios across two dozen chapters. So I will just be brief here with my top three recommendations.
My first recommendation is to always start with components. Say you have a process that separately involves SQL, Excel, and Outlook: you want to code them one by one, because within the same team and organization, different processes will involve similar components, and you can just reuse the code. My second recommendation is that we should definitely write plenty of tests to capture all the possible scenarios. Whether it be dependency tests, user tests, dev tests, or unit tests, we should do all the applicable ones. I know everybody trusts their own code. I do, too. But reality has taught me to trust the tests even more.
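As one concrete, entirely hypothetical example of such a test: suppose one of your reusable components cleans exported currency columns. A unit test for it can be as simple as this, written with base R's stopifnot() for self-containment (the testthat package offers the same idea with nicer reporting).

```r
# A small, hypothetical cleaning component: turn exported currency
# strings like "$1,200.50" into numbers.
clean_amount <- function(x) {
  as.numeric(gsub("[$,]", "", x))
}

# Unit tests: cover normal values, thousands separators, and NAs,
# so a change to the regex can't silently break a downstream report.
stopifnot(clean_amount("$1,200.50") == 1200.50)
stopifnot(clean_amount("300") == 300)
stopifnot(is.na(clean_amount(NA)))
```

Because the same component feeds many automated reports, one cheap test like this protects all of them at once.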
Lastly, be practical and stay on target. I know it feels great to be able to build all solutions within R. But the thing is, not everything needs to be fully automated. At the end of the day, it is not about building something cool with R, but building something impactful with R. I've gotten lost in that so many times. So let me say this again. It is not about building something cool with R, but building something impactful with R.
It is not about building something cool with R, but building something impactful with R.
Keeping the project going
Now, after the coding task is all done, we can hand off the process and move on. But there are still things we need to look out for in order to keep the project going. Overall, there are three deadly situations that could affect your project.
The first situation is that after you handed off your automated processes, someone ran the code. But the result isn't what's expected. Maybe because the script wasn't handled properly, like running it on a new machine or a new environment without the proper setup. Believe it or not, after all these years, it still happens to me.
So, to avoid this situation, what I would recommend is to always have a handoff document that covers the requirements of the job, instructions for running the process, testing the process, and what to do to maintain it. The second scenario is that as you build more and more automated processes, chances are multiple processes may run into issues at the same time. Sometimes someone will come to you needing help to fix it. Sometimes multiple people will show up, which can look pressing. So, my tip for you is this: always discuss the fail-safe in your handoff document. In most cases, a line like this will do the trick. This is almost like when the autopilot stops working: you don't stop flying or driving, you switch back to manual. And this often reduces the urgency and gives us the time to properly fix the issue.
Lastly, everything went well, and you just got a new feature request. Should you jump straight in? My recommendation here, well, two recommendations. The short-term one is to always treat it like a brand new task, so that you can start from gathering requirements and best determine when to work on it. The longer-term solution is that this is also a perfect opportunity to train individual team members to gradually take on the task and accumulate more knowledge.
Sharing progress
Now, after you're comfortable dealing with those three situations, we just need to give the project time to make progress. And just like any other project, from time to time, you want to share updates. To do that, I often find it helpful to start with the key stats: for example, hours saved, which is something you can easily extract from the requirements document you built in the previous steps; the number of process documents now available because of this project; as well as all the training that happened throughout the project, whether the official sessions or the one-on-one sessions you had with your team members. On top of that, you also want to share any success stories and learnings that happened along the way, and maybe give kudos to the team members you collaborated with. And lastly, you don't want to forget to talk about the hurdles you could use additional help on. Other than that, you also want to determine a good cadence for sharing progress. For me, it was once a month. But you should always set up your own, and know that you can share progress whenever you have critical updates.
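The headline hours-saved stat is just arithmetic over the figures already sitting in the requirements document. A hypothetical example, with every number made up for illustration:

```r
# Hypothetical figures pulled from one task's requirements document:
manual_minutes_per_run <- 45   # manual effort per run before automation
runs_per_year          <- 250  # a weekday-daily report

# Hours saved per year for this one task
hours_saved_per_year <- manual_minutes_per_run * runs_per_year / 60
hours_saved_per_year
```

Summing this across all automated tasks gives you the kind of number, like the 1,000 hours in the talk title, that decision makers respond to.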
A structure you can follow
Now, that wraps up the key parts of the project. If you would like to start something similar at your workplace, this is a structure you could follow. First, start by identifying the tasks in your workplace. Then, build your proposal with the benefits that would matter to your decision makers and workplace. After that, build a requirements document, possibly with R Markdown, and identify the right task to start with. Then, code by component, write plenty of tests, and stay on target, while at the same time trying to stay away from the three deadly situations. And don't forget to share progress from time to time.
Now, I would like to ask everyone to think back to the original situation. As we built more and more automated processes, we completely moved on from the original question of why the R adoption rate isn't what's expected after the R training, to now having most team members own several of these R processes, where they were at least involved in building the requirements, testing the process, and execution. Rather than expecting everyone to connect the R functionalities to the business needs through the R training, we connect the business needs to the R functionalities through this project. And we just happen to save thousands of hours.
Rather than expecting everyone to connect the R functionalities to the business needs through the R training, we connect the business needs to the R functionalities through this project.
Now, if you're a decision maker in this room, this might be a good project to make some impact and accumulate some R knowledge. If you're a team member in this room, bring this back and start by identifying the tasks. If you're one of the two team members in this room, text your boss now and say you have an idea. Thank you.