
Sean Lopp & Rich Iannone | Avoid Dashboard Fatigue | RStudio (2020)
Data science teams face a challenging task. Not only do they have to gain insight from data, they also have to persuade others to make decisions based on those insights. To close this gap, teams rely on tools like dashboards, apps, and APIs. But unfortunately, data organizations can suffer from their own success: how many of those dashboards are viewed once and forgotten? Is a dashboard of dashboards really the right solution? And what about that pesky, precisely formatted Excel spreadsheet finance still wants every week? In this webinar, we'll show you an easy way teams can solve these problems using proactive email notifications through the blastula and gt packages, and how RStudio pro products can be used to scale out those solutions for enterprise applications. Dynamic emails are a powerful way to meet decision makers where they live (their inbox) while displaying exactly the results needed to influence decision-making. Best of all, these notifications are crafted with code, ensuring your work is still reproducible, durable, and credible. We'll demonstrate how this approach provides solutions for data quality monitoring, detecting and alerting on anomalies, and can even automate routine (but precisely formatted) KPI reporting.

Webinar materials: https://rstudio.com/resources/webinars/avoid-dashboard-fatigue/

About Sean: Sean has a degree in mathematics and statistics and worked as an analyst at the National Renewable Energy Lab before making the switch to customer success at RStudio. In his spare time he skis and mountain bikes and is a proud Colorado native.

About Rich: My background is in programming, data analysis, and data visualization. Much of my current work involves a combination of data acquisition, statistical programming, tools development, and visualizing the results. I love creating software that helps people accomplish things. I regularly update several R package projects (all available on GitHub).
One such package is called DiagrammeR and it's great for creating network graphs and performing analyses on the graphs. One of the big draws for open-source development is the collaboration that comes with the process. I encourage anyone interested to ask questions, make recommendations, or even help out if so inclined!
Transcript
This transcript was generated automatically and may contain errors.
Thank you all for taking some time out of your day to be with us. Really looking forward to talking about dashboard fatigue and how to avoid it. And then just to reiterate what Rob said, because it's always our perennial question, the recording will be available. And then we're going to be going through some slides as well as some code today. That's available already. If you want to go to this link to follow along, it's a GitHub repository that has the slides as well as the code.
So dashboard fatigue, what are we talking about? Before we can address kind of that question, we actually need to take a step back and think for just a second about what the goals of a data science team really are. And at the end of the day, at RStudio, we believe that the main objective of a data science team is to help organizations and individuals make better decisions that are informed by data. And so that's a really admirable goal, but it comes with its set of challenges. And one of the challenges that we've seen time and time again is that there's often a gap between data scientists and those stakeholders they're hoping to inform.
And that gap a lot of times is the result of different mediums. So data scientists can be discovering insight, writing code in R, Python, whereas decision makers often are spending their time either in meetings or in their inbox or perhaps within spreadsheets. And more and more so, they're doing all of that kind of on the go as they take care of a thousand things at once, often on their phone. And so for data scientists and data science teams, we have to find a way to bridge that gap so that our analysis can still be engaging and inform and ultimately drive those decisions.
Ways to bridge the gap
And there's a few different ways that we've seen teams kind of close this gap in practice. So one way that at RStudio we're big fans of is building dashboards. And so with packages like Shiny in the R ecosystem or Dash, for example, on the Python ecosystem, you can create these interactive dashboards that are more engaging for those stakeholders. So they can explore your analysis, they can ask questions, and they can kind of have an understanding that then allows them to make better decisions. So that's one kind of practical and very popular way to close that gap.
Of course, there's other ways as well. A lot of time and energy has been spent explaining how data scientists can put their models into production. That often is in the form of an API so that other systems and services can help take advantage of those insights. And then sometimes here we might resort to doing things like modifying PowerPoint presentations or copying and pasting results in kind of more manual ways. So these are all different ways that data science teams can bridge that gap between mediums to make their work more accessible to decision makers.
Dashboard fatigue
But what we've seen is that sometimes, especially for those really awesome, interesting, exciting dashboards, data science teams can fall victim to their own success. And so what do I mean by that? Well, a few weeks ago on Twitter, Angela Bassa, who leads the data science team at iRobot (they make those robotic vacuum cleaners), asked this question: how do you keep track of all of your dashboards? And this led to a really interesting exchange, and I want to show just a few of the responses that she got.
So one response is that you could go about making a dashboard of all the dashboards, so kind of a meta dashboard. And this was actually the most popular response to the thread, a very kind of data science way of thinking about solving this problem. Maybe I can use the same hammer that I used to solve the original problem and just make more dashboards. Another answer was this one I found pretty interesting. A lot of teams will create kind of an index of the work. Maybe that is a spreadsheet with links to dashboards. Maybe it's a page on like Confluence or SharePoint. And of course, the challenge there is that you have to hope that you remember to keep it updated.
And then I think this answer was probably the most honest, which is that a lot of times, there's not a great response. Data science teams can produce all of these interesting dashboards, but there's no great way for others to interact and keep track of them. And ultimately, this is a problem because it kind of impairs our ability as data scientists to close that gap and inform those decisions. Often what we'll see is a really great dashboard will be created and users will engage with it once. But over time, in the coming days, weeks, and months, the dashboard falls by the wayside, especially if there's many of these dashboards.
It's just a pretty big burden to ask your decision makers to go play around with a dashboard every day to keep track of what's going on. And so that creates a problem for us: if we just wait for decision makers to proactively go visit our work, we're not actually closing that gap. But that's one problem. We also see something else kind of interesting happen as data science teams try to close this gap.
And so one thing, as we said, is that maybe the dashboards fall by the wayside. Another thing that can often happen is that stakeholders will get excited. They'll say, great, this insight makes sense. Can you bring it to my medium? Can you copy and paste those predictions into my spreadsheet? Or can you send me a screenshot of that plot so I can put it in my presentation? And that leads to a lot of painful, labor-intensive work.
One of the interesting things over the last couple of months is that I got to work at home with my wife, who is also an R user. And one day, looking over her shoulder, we kind of had this exchange where she was doing exactly what I just described. She was taking the results of some R code and plugging them into a spreadsheet for her boss. And me, being the RStudio fanboy, kind of said, why don't you create a Shiny app? That'd be really awesome. It'd save you all this time. And it could be a big win for your organization.
And I wish I could have captured the look on my wife's face in her response here. What she basically said was, my boss, Dave, has been looking at the same spreadsheet every week for years. Anyone who tries to make changes gets decimated, basically. And so she was kind of stuck in this equally problematic scenario. I mean, at least her boss was engaging with her work. But it was in a way that was really killing her productivity. We often refer to these as fire drills, where you have to answer ad hoc questions that require a lot of manual, labor-intensive work, especially on the formatting side, if you're copying and pasting results into different mediums.
And so to summarize the intro here briefly, data science teams often can find themselves trying to thread this challenging needle, where on the one hand, you might put a lot of effort into something really awesome, like a dashboard, only to see that it kind of falls by the wayside. On the other hand, you spend all of your time kind of in these manual, ad hoc fire drills. Neither of those are great ways to, in the long run, close that gap between data scientists and decision makers.
A proactive notification approach
But I'd like to propose for you today an alternative, a solution that we've found a lot of success using ourselves here at RStudio. And so what does this solution entail? Instead of creating a dashboard, what we like to do is create a proactive notifier, something that is going to alert me to the information that I need to know. And in our case, we spend most of our time inside of our inboxes, so that's where our alerts go: through email. But we still want to be able to do that as data scientists writing code.
And so that's what we're going to talk about today, is how you can use a rich set of R packages to allow you to create these proactive notifications. And as I mentioned, kind of with my wife's dilemma, oftentimes these notifications need to meet the expectations of our stakeholders for how they're formatted. So we'll look at some new packages that are available to help you precisely format these results. And then finally, we don't want to solve this dashboard fatigue problem by introducing another type of fatigue, which would be email fatigue. And so we're going to talk about how you can only send these alerts on conditions, so that instead of sending emails every day, you can send emails more specifically when there's a problem at hand.
The picture that you're seeing here is one of my favorite examples of this workflow. One of my roles at RStudio is to keep track of all the different R packages that we build and maintain. And so every day, we go and we build all of these packages. And we want to know if any of those builds fail, especially if it's a really popular package. We don't want to get a whole bunch of emails from users saying, hey, this package isn't available or doesn't work. We kind of want to stay ahead of those types of problems. And so what a data scientist named Greg and I came up with is this workflow, where every day we run these builds. And then if there's a problem, and critically, only if there's a problem, do we get these types of targeted emails that tell me exactly what the problem was, and allow me to then take further action.
Real-world examples
We're not the only ones who are kind of using this pattern successfully. So I just want to tell you a few quick stories before we get into the weeds of writing the code. One of my favorite stories is from a large infrastructure-as-a-service provider, so a company that maintains thousands, if not hundreds of thousands, of servers. And what they used this workflow to do was track their hardware uptime. And then if they saw a problem arising, they could send a proactive alert out to their vendors in the context of their service level agreement, or SLA. And what this team was really excited about is that before adopting this workflow, they had done this type of task in JIRA. And any time they onboarded a new vendor or modified their SLA, they had to manually click through a whole bunch of steps in JIRA to accomplish this task. The great thing about this workflow is that it's all based on code, so it's really easy to reuse the different rules and different R Markdown documents that you're going to write.
Another really awesome story of this type of technology applied was in a high-tech manufacturer's People Ops department, so kind of an HR group. And they had this challenging task where they needed to notify division managers about employee overtime problems. So say an employee worked more overtime than they were allowed, they wanted to send out an alert. Now before, they had a dashboard. And what they found was that their 30 different division managers were not reliably going to that dashboard every day to check if there were problems. And so they adopted this workflow to instead send out an email to a manager only if there was actually a problem with a specific employee's overtime submission. And they actually won an award of excellence for adopting this workflow, which I think is really exciting because it's ultimately a pretty simple concept that we're talking about here.
The tool belt
So let's get into the nuts and bolts a little bit to set the stage for Rich, who's going to walk you through the code here. I just want to talk about the different tools in our tool belt that are going to allow us to apply this workflow. And so the first thing that I mentioned is that we're going to be crafting emails. And to do that we're going to use two R packages. The first one is R Markdown. If you're not familiar with R Markdown, it's a package that allows you to combine code and text to create documents, beautifully kind of customizable documents.
And so historically R Markdown has been used to create PDF files or HTML files. But in our case, what we want to use R Markdown for is creating the actual body of an email. It turns out an email's body is kind of unique and different from other types of output formats. And so to help adjust R Markdown to create the body of an email, we've introduced a new package called blastula. That's what blastula is really good at: helping you craft the body of your email.
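To make this concrete, here's a minimal sketch of composing an email body with blastula directly. The addresses, subject, and credentials file are all invented for illustration; the webinar's workflow builds the body from an R Markdown document instead.

```r
library(blastula)

# Compose an email body from Markdown text
email <- compose_email(
  body = md("
## Daily status

All package builds succeeded today. No action needed.
")
)

# Printing the object previews the rendered email (e.g., in the RStudio Viewer)
email

# Sending requires SMTP credentials (see ?creds_file); these values are made up
# smtp_send(
#   email,
#   from = "alerts@example.com",
#   to = "team@example.com",
#   subject = "Daily build status",
#   credentials = creds_file(".smtp_creds")
# )
```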
We also mentioned that often when you implement these types of workflows, you need to format results in a very specific way. And because of kind of the legacy of spreadsheets, that very specific format tends to be a table. And so we're going to spend a little bit of time walking you through gt, which is a package that gives you a lot of control over the types of tables that you can produce from R. And it's also compatible with blastula and R Markdown, so you can embed those tables right into the email, like you saw in my example.
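As a taste of the gt API, here's a small sketch with a recent version of gt. The data and labels are invented, not the webinar's actual KPI table.

```r
library(gt)
library(dplyr)

# Invented KPI data for illustration
kpis <- tibble::tibble(
  kpi   = c("Daily Active Users", "Daily Revenue"),
  value = c(1520, 8425.5)
)

kpis %>%
  gt() %>%
  tab_header(
    title = "Business Health",
    subtitle = "Most recent day"
  ) %>%
  cols_label(kpi = "KPI", value = "Value") %>%
  fmt_number(columns = value, rows = 1, decimals = 0) %>%
  fmt_currency(columns = value, rows = 2, currency = "USD")
```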
So those are kind of the packages. And then we get to what I think is really the heart of this whole solution, which is how we can send these types of notifications in a more dynamic way, on a condition. And the amazing thing about writing code is that the range of possible conditions is really endless. With code, the answer is always yes, and that is really true here. Your condition might be something like anomaly detection. It might be a complex model that allows you to determine dynamic thresholds with uncertainty levels. In our case, it's going to be just a simple set of rules. But no matter what it is, the actual implementation of sending a notification on a condition boils down to a really simple if statement, which I think is at the heart of how powerful this workflow can be.
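That if statement can be sketched in a few lines. Everything here is invented for illustration (the exceedance vector especially); the demo later uses real threshold logic.

```r
library(blastula)

# Invented example: TRUE where a KPI broke its threshold today
exceeded <- c(users = FALSE, revenue = TRUE, churn = TRUE)
total_exceed <- sum(exceeded)

if (total_exceed > 0) {
  alert <- compose_email(
    body = md(sprintf("**%d** KPI(s) broke thresholds today.", total_exceed))
  )
  # Locally you might smtp_send(alert, ...); on RStudio Connect you would
  # attach the rendered email to the scheduled report instead
  print(alert)
}
```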
And then finally, I mentioned we don't want to introduce data science fatigue. So we don't want to use these tools and then still end up in a world where every day we have to press play on rendering these reports and sending out these alerts. And so we're going to talk a little bit about how to automate this workflow. Now, there's a lot of open source ways that you could go about doing this. You could use something as simple as CronTab to manage the execution of these reports. Or perhaps you could use a more advanced scheduler, something like Airflow. But in our case, we're going to look at RStudio Connect, which is a professional product that RStudio kind of sells.
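For instance, with plain cron the scheduling piece might look like this one-line crontab entry. The path and schedule are hypothetical.

```shell
# Hypothetical crontab entry: re-render the alerting report every day at 7am
0 7 * * * Rscript -e 'rmarkdown::render("/home/analyst/reports/business_health.Rmd")'
```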
And what RStudio Connect does really well is it takes care of some of the intricacies of putting R into production. So you don't have to worry about the different packages that your report might need. You don't have to worry about logging failures, about authenticating users, about securely sending email, or scaling out the system. And so it's a really powerful way to kind of adopt a lot of these automation features without having to reinvent them all yourselves.
Demo walkthrough
So to set the stage here for Rich, this is what we're going to walk through in our demo. We're going to start with some fake data, and then we're going to go from that data to a set of key performance indicators, or KPIs, that are going to serve as our threshold for sending alert notifications. Now, in our case, those are going to be simple rules. In your case, that's really where the data science magic applies. So you might apply some magic and give us the benefit of the doubt here.
And then what we're going to show you is kind of a natural first step. Once we have that data and those KPIs, we're going to build the dashboard. And don't get me wrong, the point of today's webinar is not to say that dashboards should go away. Dashboards serve a really important purpose in helping create mutual understanding between you and your stakeholders. It's a great way, especially in like a meeting, to play around with the analysis and allow stakeholders to bring their domain expertise to the table, and maybe informing what thresholds or KPIs should be based on their experience.
But it's not a great place to stop over time. And so what we're going to do, once we have our dashboard, is show you how pretty much the same code can be adapted with the packages I mentioned, blastula, R Markdown, and GT, to create these automated emails. And then we're going to put them into production on RStudio Connect and show you the output of what those emails in an automated fashion actually look like. And so with that, I'm going to go ahead and stop sharing my screen briefly. We're going to switch over to Rich, and then he's going to walk you through the code.
Code walkthrough in RStudio
All right. Here we are in my favorite IDE, RStudio Desktop. I'm going to walk you through a few R Markdown files. And don't worry, all the code is available on this site right here. It's called Beyond Dashboard Fatigue. There should be a link available. You don't have to furiously screenshot this code. It's all there, the exact same files that are here are in my IDE right now. So you can relax.
So my goal here is to walk you through the different R Markdown files which are necessary for creating first a dashboard, and then creating the report and email component, and not dwell too much on the actual code except where it's important, but show you the structure and basically the capabilities of what can happen. So if you haven't used R Markdown, the key is you have text and you have some code, codes in these chunks here. You can render this by hitting the knit button. So I'll do that right now to show us the dashboard.
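If you haven't seen one, a dashboard R Markdown file has roughly this shape: a YAML header, then headings and code chunks. This is a hypothetical skeleton using flexdashboard; the actual file is in the repo.

````
---
title: "Business Health"
output: flexdashboard::flex_dashboard
---

## Column

### Daily Users

```{r daily_users}
# ggplot chart goes here
```

### Daily Revenue

```{r daily_revenue}
# another ggplot chart
```
````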
Okay. So right now, it's basically working away to create that dashboard. We get this preview window, and here we are with the dashboard. Really easy. We just use a little bit of code here to create these panels for all the different parts. So what I'm doing is I'm using ggplot to create a daily users chart, same with revenue. This is our business data, which is all available inside the repo as well. It's basically fake data I made. But it's a great set of examples to show how we can use email functionality to deliver a report that's based on business health.
So that's what I call this. So these are different KPIs, and I'll walk you through that in a second, but I just want to show you that this is where we're starting from. We're starting from the dashboard, which in itself is great, but you have to actually look at this data to know if anything is alarming with these KPIs. So that's maybe not the best thing, unless you're extremely diligent with looking at dashboards, which may not always be the case.
So let's take that one step further. We'll go to an R Markdown report. Without showing you the actual code in these chunks yet, I just want to say that the code is essentially borrowed from the dashboard, but put into this main report document. And the same rule applies: if you want to render this, you hit the knit button, and you get a preview inside of RStudio. So I'm going to hit this, and what we'll see is actually two preview windows, because we have a sub-document here called business health email, and that's the email version of this report.
Okay, so we get a rendering here, and we get two documents. So this is the main report document. It's supposed to have lots of details. We have a gt table right here, first and foremost, and we have the same two ggplot charts, and also some raw data, because it's sort of a full-blown report. And this is the email version. So it's a bit stripped down. We don't have the final raw data. We just say the full report is available on Connect, which it will be. Check the links below, because it generates those links for you. And the raw data is attached as a CSV file, which isn't here, but in the final version that we'll send out, it will be. So essentially, this is a preview of the email, and the text is quite a bit larger, so we can read it on phones and smaller displays, which is great.
But this is what we want to send out, only if there's a problem. And we see those problems here, the cells highlighted in red. So let's have another look at the code. I'll walk you through that and show you how we get to that point. Okay, well, first of all, let's introduce the packages that we're using. We're loading in the tidyverse. That gives us things like dplyr to wrangle our data, to transform it, get it in the right shape. lubridate is for handling dates; it's very nice. Then we have gt and blastula. gt creates the table that you saw there, and blastula helps with emailing, so we'll use some blastula functions at the end of this document. And glue is a really wonderful package; it's like an alternative to paste(). So if you want to make a string, like say a custom subject, it's quite a bit easier to use glue() with some variables that you've generated in this document.
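A quick glue() example, since it comes up again when building the subject line. The variable and its value are invented here.

```r
library(glue)

total_exceed <- 3  # invented count of threshold violations

# Expressions inside curly braces are evaluated and interpolated
glue("One or more KPIs ({total_exceed}) broke thresholds")
#> One or more KPIs (3) broke thresholds
```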
So let me show you the first chunk here. The first part is essentially getting your data in. So get health KPIs. It's a helper function that's sourced in from these other R script files. And this just gets us our data to work with. You know, in different circumstances, this could be pulling in database data, could be pulling in data from an Excel sheet, whatever. The idea is just to get the data into this report.
And then we can apply other functions. What we're doing now is we're transforming the data. And then we have a set of thresholds. And what we're doing with that is we're saying that, for each of these KPIs (this one is daily active users, then daily active customers; we have new users, churned users, and daily revenue, and a ratio of DAC over DAU), there is an exceedance if the value falls on the wrong side of the threshold that we set in a different file. I'll show you that really quickly right here. So these could be set by a decision maker; this could probably be a meeting. Essentially, this is where the business health is not so great if it passes these values. Essentially, we're saying that all these ones should be above these values. And for churn, obviously, lower is better for that number. So this is our way of saying, oh, yes, okay, we have a problem. So we're calculating those right there.
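The exceedance logic might look something like this sketch. The column names, values, and the direction flag are all invented; the repo's helper functions differ.

```r
library(dplyr)

# Invented thresholds; higher_is_better flips the comparison for churn
thresholds <- tibble::tribble(
  ~kpi,             ~threshold, ~higher_is_better,
  "daily_users",          1000,              TRUE,
  "daily_revenue",        5000,              TRUE,
  "churned_users",          50,             FALSE
)

# Invented most-recent-day values
latest <- tibble::tribble(
  ~kpi,             ~value,
  "daily_users",       950,
  "daily_revenue",    7200,
  "churned_users",      65
)

exceedances <- latest %>%
  inner_join(thresholds, by = "kpi") %>%
  mutate(exceeded = if_else(higher_is_better,
                            value < threshold,   # too low is bad
                            value > threshold))  # too high is bad

total_exceed <- sum(exceedances$exceeded)  # 2 with this invented data
```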
And then what we're doing here is we're getting some other variables. And we'll use these later on to shape the subject line of our email, to give us a little more information up front. So it's a bit nicer to digest immediately. Okay, so we've got some variables. This is not shown to the user in the report. We have a lot of include = FALSE settings; that means in the final report, we don't actually see this code. You can reveal the code, but most of the time it's not very interesting, so most of the time we just hide it with include = FALSE.
So I'll show you this chunk here. This is where we make the gt table. Again, we're not showing the code. We're just showing the output of this, which is the table. And it's quite big here. Essentially, we start with the KPIs. And then we're doing a few things to the data. And then we pass it into gt(). And then we're doing things inside the gt API, where we're adding a heading, we're changing the column labels, we're using some helper functions that we've created in a different file to do some gt-related things, and we're doing some formatting of numbers and currencies. And this is quite a bit. I mean, we could stop right here at just gt(), and then we'd have a pretty good table. But this essentially adds more formatting and makes the table really shine.
Again, more information on this is available in the gt repository. I will show that to you. It's actually just right here: gt.rstudio.com. You can learn lots of stuff about gt through that website. Okay. So that's the gt table. Then ggplot; ggplot needs no introduction. It's been around for a very long time, and there are tons of examples. We're essentially reusing the plots that we had in the health dashboard and just putting them here. We're not showing the code; we're showing the figure. And we're aligning it center with this option.
The email condition
OK. So another key thing here is the email part. So if there's a problem on the last day, should we send the email or not? OK. So here's the condition right here: if the total number of exceedances, which is to say the total number of problems seen on the most recent day, is greater than zero. So if there are any problems, in other words, then we'll prepare this email. We're going to take this sub-document, which I'll show you in a second, businesshealthemail.rmd, and use it as the body of the email. And then we're going to attach it as a Connect email. And we're going to create a subject line with glue. So the subject here says that one or more KPIs broke thresholds. And then we have some more glue code here. These curly braces are where the variable total_exceed, which was defined up here, is being put into the string. So it's being interpolated into the string.
Another cool thing is we're actually generating a CSV file right here, just some data. And we're attaching it to the email through this attachments argument. This is really cool because we can create any files here. They won't be seen by the user in the main R Markdown document. But they can be attached to email, which is fantastic. So this is what will happen. This email will be sent if we have any exceedances or problems. If we don't, then we suppress the scheduled email.
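Putting that together, the final chunk follows this pattern. The count, file names, and subject string here are paraphrased stand-ins, not copied exactly from the demo.

```r
library(blastula)
library(glue)

# Invented count of KPI threshold violations, computed earlier in the report
total_exceed <- 2

if (total_exceed > 0) {

  # Hypothetical: write out the raw data so it can be attached
  # readr::write_csv(kpi_data, "kpi_data.csv")

  email <- render_connect_email(input = "businesshealthemail.rmd")
  attach_connect_email(
    email,
    subject = glue("One or more KPIs ({total_exceed}) broke thresholds"),
    attachments = c("kpi_data.csv")
  )

} else {

  # Tell RStudio Connect not to send the scheduled email on this run
  suppress_scheduled_email()

}
```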
Deploying to RStudio Connect
This is a bit confusing. What does this actually mean? So in Connect, you deploy it through this Publish button: just publish to Connect. And I'll take you to Connect and show you what this actually means in context, which is right here. So this is our main document, our main report, in Connect. So it looks great. We actually have a history of these documents; they'll be published every day. But to do that, you actually have to set it up. You have to go to the schedule and then schedule this output to be run every day. So it'll take all your code and run it every day, maybe on different data. That's kind of the hope, because what happens is it'll be connected to the same data source, but it'll be updated every single day. So we'll get different data and possibly different alerts.
So on this day, we did have a few problems. They're highlighted here in red. So that's great, but we'd have to check this every day to see that there's a problem; this is almost the same problem as the dashboard, having a report here. When we send email, we can send it when there's a problem, and we don't have to be checking. So the email sort of solves that problem. So to schedule that, we schedule it every day. And the key thing is to send an email after update. And then we set some recipients. It could be Sean, for instance, here.
Great. Now we can save that. And now we have a master list of people that can view or change this document. And we can schedule Connect to send email to those recipients through here. So I'll just do this one more time. This is kind of cool. Great. And we'll save this. Brilliant. So every day, it's going to be run. And we can change this, of course, but this is daily data, so it makes sense to send it every day. And it will send an email only if the thresholds were exceeded. And we can actually preview the email and send one to ourselves, just to debug the email and make sure it looks good before it actually goes out in production.
So I can send this. And I actually happen to have one open, so it shows it to me right here. This is my email client. So we can see that the subject here is "one or more KPIs," and in parentheses, three, "broke thresholds." That's the glue work that we did here with total_exceed. And we have the list of things here, which is great, because if you look at this in your list of messages, you see this right away before even opening up the email. And the body looks really good. This actually looks really wonderful on a small device, because the text is quite large. We have just the important things, which are the table and a little bit of text, just to show what's being shown. And importantly, here's the attachment. So here is the CSV data, in case it's important for whoever is looking at the data.
The email sub-document
So I want to show you how you craft this message body in this business health email sub-document. So it's a little bit strange; when I saw this previously, I was a little bit weirded out, because essentially these chunks have nothing inside them. What's actually happening is we're reusing the chunks in the main document here; we don't write anything in them. As long as the chunk names match across documents, all the content will be transmitted to the sub-document and then to the email, which is really pretty cool. It reduces typing, reduces code duplication, and we can have one source of truth, which is fantastic.
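A hypothetical sketch of what Rich is describing: the main report defines a labeled chunk with real code, and the email sub-document contains an empty chunk with the same label, which gets filled with that chunk's output when blastula renders the email. The chunk label and file names here are invented.

````
In business_health.rmd (the main report):

```{r kpi_table, echo=FALSE}
# code that builds and prints the gt table lives only here
```

In businesshealthemail.rmd (the email body):

```{r kpi_table}
```
````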
So yeah, we're essentially adding a section here, some text, the output, stating that the full report is available on Connect. Check the links below, because the links are given here at the bottom. And these are real live links that send us to the main report, if you want more details. And I say that the data is attached as a CSV file, which is the truth; it's right here. So this is really great. And we're reusing some of the code in the email as well, this paste(). And this is R Markdown, so we can use Markdown with the asterisks to make things bold.
So yeah, let me go back to this business health R Markdown document and show you this one more time, in case you missed it the first time around. If you knit this, you do get a preview locally, so you can debug it before it even gets to Connect. And again, the slightly strange thing is that we get two previews: one is the email, which appears in your browser, and one is the main report, which will appear in Connect.
Yeah, and that's really about it. I just want to impress upon you that the main thing is this very important bottom chunk. It relies on having blastula available, which is loaded right here in the setup chunk with library(blastula), because these two functions are from blastula: render_connect_email() and attach_connect_email(). And as long as you have the condition set properly, the email will be suppressed whenever we hit the false branch of the condition; otherwise, by default, it would be sent out on every scheduled run.
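A hedged sketch of what that bottom chunk can look like (the variable, file names, and subject wording are placeholders, not the exact demo code; `render_connect_email()`, `attach_connect_email()`, and `suppress_scheduled_email()` are real blastula functions):

```r
library(blastula)

# Final chunk of the main report: render the email sub-document and
# attach it for RStudio Connect, but only when at least one threshold
# was exceeded.
if (n_exceeded > 0) {
  email <- render_connect_email(input = "business-health-email.Rmd")
  attach_connect_email(
    email = email,
    subject = glue::glue("({n_exceeded}) KPIs broke thresholds"),
    attachments = "kpi-data.csv"  # the CSV attached to the message
  )
} else {
  # Tell Connect not to send anything for this scheduled run
  suppress_scheduled_email()
}
```

With this pattern, the report still renders and publishes on every scheduled run; only the email is conditional.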
Q&A
That's about it. I went through it pretty quickly, but that's okay, because I want to leave lots of time open for questions.
Yeah. Thanks so much, Rich. That was awesome. Hopefully everyone got a sense for the code and how easy it is to write and implement these types of workflows. Lots of excellent questions came in that we'll get to in just a moment. Just to quickly recap, in case you were late and missed a bit of the introduction: the goal here is to help data science teams empower really good decisions. What we've seen in our experience, and what Rich just showed you, is that teams can do that if they're able to more proactively distribute their work, instead of relying on decision makers to diligently visit a dashboard on a regular basis.
As Rich also showed, this workflow of proactive distribution is pretty easy to do, because there's a rich toolkit of code. You can write the code that you love to use, even while your decision makers receive results in the medium that they like. And as a general rule, the more tools we as data scientists have in our tool belt, the more likely we'll be able to impact decisions in an effective way. So that's the take-home message.
Rich, I can go ahead and answer, and we can tag-team this. One of the first questions I saw that I thought was really interesting asked what types of things you can embed in the emails. We demonstrated embedding ggplots and tables; this particular participant asked about interactive HTML widgets. So, Rich, maybe you can take a shot at answering that. Yeah, well, it's a little bit of a letdown, but not really, because email clients do not like any JavaScript in email; they tend to strip it away. So that is out. The most interactivity you can probably get is simple HTML elements, and linking out to, say, a full-featured app is what people are doing in email. Interactive bits like HTML widgets are just not possible until email gets beyond where it is now. Right now it's very stripped down because of the security model of email.
Yeah, a follow-on question: we think you showed Apple Mail, which got some applause from the audience, Rich, but we'll throw out there that the approach has been validated and tested on a whole bunch of email clients. So for the formats that are supported, you'll be able to use Outlook, Gmail, or whatever, on a phone or on a laptop. I'll add that we made a very large effort in our QA process to test as many email clients as possible, across a range of sending types with blastula: through Connect, or just sending through your own SMTP server, like via Gmail. We actually used a service called Litmus, where you send them the email and they test it on 50-plus email clients, so we can see the results and whether things break. We were able to fix a ton of problems and, more importantly, to ensure that your email will be delivered without being mangled, which is always a good thing.
Awesome. I'll take another group of the questions here, Rich: what if you don't want to use RStudio Connect? I can take a stab at answering that. The key to all of this is that there are two parts you would need to implement. One is a way to render an R Markdown document on a schedule, and there are lots of ways to do that: you can use something like cron, you can use something more sophisticated like Airflow, or you could manually click render on a report every day. The second bit is that you need a service that will send the rendered email. Rich mentioned briefly, and maybe you can expand on this, Rich, that blastula also has functions for integrating with non-Connect email senders, like an SMTP service. Yeah, that's right. You can use Gmail's SMTP service, you can use Microsoft's online Outlook one, and you can also hook up other ones, like SMTP2GO, which is a really nice one. The caveat is reliability: I've found that Connect is rock solid and always sends your email, whereas you sometimes get throttled with those other services. So that's a little bit of uncertainty you may have to deal with, but it's possible.
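For the non-Connect route, a minimal sketch with blastula's SMTP functions (the addresses, credentials file name, and message text are hypothetical; `compose_email()`, `smtp_send()`, `create_smtp_creds_file()`, and `creds_file()` are real blastula functions):

```r
library(blastula)

# One-time setup: store SMTP credentials on disk. Here Gmail is assumed
# as the provider; other providers or a custom host work similarly.
# create_smtp_creds_file(file = "gmail_creds",
#                        user = "me@gmail.com",
#                        provider = "gmail")

# Build an HTML email body; md() lets you write it in Markdown
email <- compose_email(
  body = md("**Heads up:** 3 KPIs broke thresholds today.")
)

# Send it through the SMTP server described by the credentials file
smtp_send(
  email,
  to          = "decision.maker@example.com",
  from        = "alerts@example.com",
  subject     = "KPIs broke thresholds",
  credentials = creds_file("gmail_creds")
)
```

Paired with a scheduler like cron or Airflow rendering the report, this covers both pieces Connect otherwise provides.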
Yeah, the other thing I'll add, Rich, is that you'll want to be a little careful about impersonation issues when sending these types of notifications. One of the things we've worked hard on in Connect is to make sure that when you send an email, even though it's an automated system, it comes from a reputable address and goes only to the people designated to see it. There are some security ramifications when you get into the realm of automating notifications, so you'll want to be careful if you're integrating with one of those other options yourself.
Let's see, in terms of other interesting questions, Swanet asked a pretty interesting one: do you have to use R, or is there a way to incorporate other languages like Python into these types of workflows? I'm sure there's a way, but the thing you'd be missing is something like blastula for creating the email message body. I've looked around for Python libraries that do this sort of thing. Certainly you can send an email right from Python, but the hard part is creating a message body, and the even harder part is creating an HTML message body, which is what blastula does. So I think it's possible; you'd just have to do a lot of upfront work. You'd basically have to reproduce blastula in Python to get that going.
I'll add to that, Rich. One way you could combine the two is actually within R Markdown. It's a bit of a misnomer, but R Markdown allows you to combine a whole bunch of different languages. So while it's an R Markdown document, you can certainly have Python code chunks commingled with R code chunks, or with bash or SQL chunks if you want. All of those languages can live inside the R Markdown document, and R in that workflow just becomes a facilitator for sending the email using Rich's awesome work on blastula. Yeah, I'm glad you brought that up, because you can definitely do all that other stuff in R Markdown; it has a lot of engines available. But the last piece is kind of critical.
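A sketch of that mixed-language pattern (the chunk contents are hypothetical): Python chunks in R Markdown run via the reticulate package, and their objects are visible to later R chunks through reticulate's `py` object, so the analysis can happen in Python while R handles the email side.

````markdown
---
title: "Mixed-language report"
output: html_document
---

```{python}
# A Python chunk does the analysis...
result = sum([10, 20, 30])
```

```{r}
# ...and R chunks read the Python results (via reticulate's `py`
# object) and take care of composing and sending the email with
# blastula.
library(reticulate)
py$result
```
````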
So this is an interesting question, Rich, a little more philosophical, I guess. The person asks: how is this approach influencing your thinking around dashboards, and when you would use a dashboard and when you wouldn't? Ah, yeah, a dashboard is still pretty essential, but I wouldn't use it as the way to notify people of anything important, because that requires the diligence of visiting the dashboard daily, or at whatever frequency is required, to stay ahead of important issues. So a dashboard is part of the whole thing: once you get notified, you can go to the dashboard and get the history of what led to certain problems up to the present day. This isn't a replacement for a dashboard; it basically adds to the whole analytics stack.
And one thing I'll echo there, which also addresses another question that was asked: a common pattern in these emails is to dynamically generate a link back to a dashboard. Something we do a lot at RStudio, in the example I mentioned at the beginning, is that I'll get an email that says, hey, these packages failed to build, and it includes a link to a dashboard where I can explore the error codes, the duration of the job, and all sorts of more interactive questions. So dashboards aren't out of the picture; they just, as Rich said, aren't the mechanism for doing the notification.
There was a question here: instead of hard-coding the thresholds, can they be dynamic? Yes, absolutely. This is your code; I just whipped something very unsophisticated together, checking whether a value is below or above a fixed number. You can definitely pull in data, generate the thresholds dynamically, and compare. There are lots of ways to do this: you can use anomaly detection packages, or the forecast package, for certain things. Essentially, you want to get to a place where you can make a final decision on whether to send the email or not, and then decide what to put in that email. You may have different versions of the email and select between them, or you may be more sophisticated with the dynamic content you include, like the subject line or even content in the email body, because the variables are transmitted from the main document to the sub-document, which is kind of cool. So yeah, because this is code, you can do whatever you want. You can go nuts, essentially.
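A minimal sketch of a data-driven threshold, assuming made-up daily values: flag today's number when it falls outside the mean plus or minus two standard deviations of recent history, and use that flag as the send/suppress decision.

```r
# Hypothetical recent history of a daily KPI
history <- c(102, 98, 105, 101, 97, 103, 99)
today   <- 120

# Dynamic thresholds: mean +/- 2 standard deviations of the history
upper <- mean(history) + 2 * sd(history)
lower <- mean(history) - 2 * sd(history)

# The final decision feeding the send-or-suppress logic
send_email <- today > upper || today < lower
send_email
#> TRUE
```

More sophisticated versions could replace the mean/sd bands with forecast intervals or an anomaly detection model; the shape of the decision stays the same.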
Yeah, a follow-on question: maybe as a data scientist I don't know the triggers or thresholds. Does it always fall on the data science team to set them, or is there some way for decision makers to influence the thresholds? Yeah, I think it's a conversation between the data team and the decision makers in the organization, for sure. You may come up with a set of recommendations, you may change them on a weekly basis or some other frequency, or it may be dynamic: a system that adjusts thresholds when values are changing fast, or when there's seasonality, things like that. So it really depends on the organization. But essentially it should be transparent, I think, because the decision makers are getting the notification, and they probably should know why. So yeah, basically, it's a conversation.
Yep. And the nice thing about code is that it's not a black box; it's very inspectable, and it gives you full flexibility. So you're not locked in; it's not like RStudio decides what the threshold should be for you. You get all the options there. And that leads to another question that came up that I thought was really awesome. We've talked a lot in this webinar about creating notifications to bridge the gap between data scientists and decision makers, but this person asked: could the same workflow apply to notify the data science team of data quality issues? Absolutely, yeah, definitely. It's totally up to you who the email is directed to. I would set up diagnostic emails to myself where important: if certain data is suspect based on some data validation rules you have, you'd want to know about it, and then let data engineering know about it too, to get at the root cause of things. So absolutely, definitely send yourself these notifications. That's my pitch.
Technical questions
One of them is from someone who's already playing around with the code, so kudos to you. They've hit a stumbling block that is maybe a little bit common as you're learning this workflow. Rather than reading the question verbatim, Rich, I'll basically diagnose it: essentially, they rendered the child document instead of the main document. So can you talk a little bit about which document to render, and when? Yeah. This is a trap I always fall into, because nothing stops you from rendering the child document, but nothing good comes from it; it's always the main document you want. There should probably be a warning somewhere, but there's nowhere to put it. Rendering the main document is what gives you the two previews. So the thing to ask yourself is: am I seeing two previews? If not, I probably did something wrong. Always hit Knit on the main document, and then you'll be A-OK.
Yeah. The error messages you'll tend to see if you accidentally knit the child document say that variables are undefined, and that's because those variables are defined in your main document. Now, of course, you could duplicate all the code into the child document so that it's also standalone, but then you've needlessly copied and pasted code, which is something we're always trying to avoid. Yeah, try to keep it DRY, even in your email message bodies and R Markdown documents. For people who don't know what DRY means: don't repeat yourself. Try as much as possible to have one place for code. That's the great thing about these sub-documents taking results from the main document: you don't have to rewrite code or worry about keeping the documents in sync.
Excellent. And then just for the sake of being in the weeds with our audience here, maybe one final question for you, Rich.

