
Sean Lopp & Rich Iannone | Avoid Dashboard Fatigue | RStudio (2020)
Data science teams face a challenging task. Not only do they have to gain insight from data, they also have to persuade others to make decisions based on those insights. To close this gap, teams rely on tools like dashboards, apps, and APIs. But unfortunately, data organizations can suffer from their own success: how many of those dashboards are viewed once and forgotten? Is a dashboard of dashboards really the right solution? And what about that pesky, precisely formatted Excel spreadsheet finance still wants every week? In this webinar, we'll show you an easy way teams can solve these problems using proactive email notifications through the blastula and gt packages, and how RStudio pro products can be used to scale out those solutions for enterprise applications. Dynamic emails are a powerful way to meet decision makers where they live (their inbox) while displaying exactly the results needed to influence decision-making. Best of all, these notifications are crafted with code, ensuring your work is still reproducible, durable, and credible. We'll demonstrate how this approach provides solutions for data quality monitoring, detecting and alerting on anomalies, and can even automate routine (but precisely formatted) KPI reporting.

Webinar materials: https://rstudio.com/resources/webinars/avoid-dashboard-fatigue/

About Sean: Sean has a degree in mathematics and statistics and worked as an analyst at the National Renewable Energy Lab before making the switch to customer success at RStudio. In his spare time he skis and mountain bikes and is a proud Colorado native.

About Rich: My background is in programming, data analysis, and data visualization. Much of my current work involves a combination of data acquisition, statistical programming, tools development, and visualizing the results. I love creating software that helps people accomplish things. I regularly update several R package projects (all available on GitHub).
One such package is called DiagrammeR and it's great for creating network graphs and performing analyses on the graphs. One of the big draws for open-source development is the collaboration that comes with the process. I encourage anyone interested to ask questions, make recommendations, or even help out if so inclined!
Transcript
This transcript was generated automatically and may contain errors.
Thank you all for taking some time out of your day to be with us. Really looking forward to talking about dashboard fatigue and how to avoid it. And then just to reiterate what Rob said, because it's always our perennial question, the recording will be available. And then we're going to be going through some slides as well as some code today. That's available already. If you want to go to this link to follow along, it's a GitHub repository that has the slides as well as the code.
So dashboard fatigue, what are we talking about? Before we can address kind of that question, we actually need to take a step back and think for just a second about what the goals of a data science team really are. And at the end of the day, at RStudio, we believe that the main objective of a data science team is to help organizations and individuals make better decisions that are informed by data. And so that's a really admirable goal, but it comes with its set of challenges. And one of the challenges that we've seen time and time again is that there's often a gap between data scientists and those stakeholders they're hoping to inform.
And that gap a lot of times is the result of different mediums. So data scientists can be discovering insight, writing code in R, Python, whereas decision makers often are spending their time either in meetings or in their inbox or perhaps within spreadsheets. And more and more so, they're doing all of that kind of on the go as they take care of a thousand things at once, often on their phone. And so for data scientists and data science teams, we have to find a way to bridge that gap so that our analysis can still be engaging and inform and ultimately drive those decisions.
Ways to bridge the gap
And there's a few different ways that we've seen teams kind of close this gap in practice. So one way that at RStudio we're big fans of is building dashboards. And so with packages like Shiny in the R ecosystem or Dash, for example, on the Python ecosystem, you can create these interactive dashboards that are more engaging for those stakeholders. So they can explore your analysis, they can ask questions, and they can kind of have an understanding that then allows them to make better decisions. So that's one kind of practical and very popular way to close that gap.
Of course, there's other ways as well. A lot of time and energy has been spent explaining how data scientists can put their models into production. That often is in the form of an API so that other systems and services can help take advantage of those insights. And then sometimes here we might resort to doing things like modifying PowerPoint presentations or copying and pasting results in kind of more manual ways. So these are all different ways that data science teams can bridge that gap between mediums to make their work more accessible to decision makers.
Dashboard fatigue
But what we've seen is that sometimes, especially for those really awesome, interesting, exciting dashboards, data science teams can fall victim to their own success. And so what do I mean by that? Well, a few weeks ago on Twitter, Angela Bassa, who leads the data science team at iRobot (they make those robotic vacuum cleaners), asked this question: how do you keep track of all of your dashboards? And this led to a really interesting exchange, and I want to show just a few of the responses that she got.
So one response is that you could go about making a dashboard of all the dashboards, so kind of a meta dashboard. And this was actually the most popular response to the thread, a very kind of data science way of thinking about solving this problem. Maybe I can use the same hammer that I used to solve the original problem and just make more dashboards. Another answer was this one I found pretty interesting. A lot of teams will create kind of an index of the work. Maybe that is a spreadsheet with links to dashboards. Maybe it's a page on like Confluence or SharePoint. And of course, the challenge there is that you have to hope that you remember to keep it updated.
And then I think this answer was probably the most honest, which is that a lot of times, there's not a great response. Data science teams can produce all of these interesting dashboards, but there's no great way for others to interact and keep track of them. And ultimately, this is a problem because it kind of impairs our ability as data scientists to close that gap and inform those decisions. Often what we'll see is a really great dashboard will be created and users will engage with it once. But over time, in the coming days, weeks, and months, the dashboard falls by the wayside, especially if there's many of these dashboards.
It's just a pretty big burden to ask your decision makers to go play around with a dashboard every day to keep track of what's going on. And so that creates a problem for us: if we just wait for decision makers to proactively go visit our work, we're not actually closing that gap. But that's one problem. We also see something else kind of interesting happen as data science teams try to close this gap.
And so one thing, as we said, is that maybe the dashboards fall by the wayside. Another thing that can often happen is that stakeholders will get excited. They'll say, great, this insight makes sense. Can you bring it to my medium? Can you copy and paste those predictions into my spreadsheet? Or can you send me a screenshot of that plot so I can put it in my presentation? And that leads to a lot of painful, labor-intensive work.
One of the interesting things over the last couple of months is that I got to work at home with my wife, who is also an R user. And one day, looking over her shoulder, we kind of had this exchange where she was doing exactly what I just described. She was taking the results of some R code and plugging them into a spreadsheet for her boss. And me, being the RStudio fanboy, kind of said, why don't you create a Shiny app? That'd be really awesome. It'd save you all this time. And it could be a big win for your organization.
And I wish I could have captured the look on my wife's face in her response here. What she basically said was, my boss, Dave, has been looking at the same spreadsheet every week for years. Anyone who tries to make changes gets decimated, basically. And so she was kind of stuck in this equally problematic scenario. I mean, at least her boss was engaging with her work. But it was in a way that was really killing her productivity. We often refer to these as fire drills, where you have to answer ad hoc questions that require a lot of manual, labor-intensive work, especially on the formatting side, if you're copying and pasting results into different mediums.
And so to summarize the intro here briefly, data science teams often can find themselves trying to thread this challenging needle, where on the one hand, you might put a lot of effort into something really awesome, like a dashboard, only to see that it kind of falls by the wayside. On the other hand, you spend all of your time kind of in these manual, ad hoc fire drills. Neither of those are great ways to, in the long run, close that gap between data scientists and decision makers.
A proactive notification approach
But I'd like to propose for you today an alternative, a solution that we've found a lot of success using ourselves here at RStudio. And so what does this solution entail? Instead of creating a dashboard, what we like to do is create a proactive notifier, something that is going to alert me to the information that I need to know. And in our case, we spend most of our time inside of our inboxes, so that's where our alerts go: through email. But we still want to be able to do that as data scientists writing code.
And so that's what we're going to talk about today, is how you can use a rich set of R packages to allow you to create these proactive notifications. And as I mentioned, kind of with my wife's dilemma, oftentimes these notifications need to meet the expectations of our stakeholders for how they're formatted. So we'll look at some new packages that are available to help you precisely format these results. And then finally, we don't want to solve this dashboard fatigue problem by introducing another type of fatigue, which would be email fatigue. And so we're going to talk about how you can only send these alerts on conditions, so that instead of sending emails every day, you can send emails more specifically when there's a problem at hand.
The picture that you're seeing here is one of my favorite examples of this workflow. One of my roles at RStudio is to keep track of all the different R packages that we build and maintain. And so every day, we go and we build all of these packages. And we want to know if any of those builds fail, especially if it's a really popular package. We don't want to get a whole bunch of emails from users saying, hey, this package isn't available or doesn't work. We kind of want to stay ahead of those types of problems. And so what a data scientist named Greg and I came up with is this workflow, where every day we run these builds. And then if there's a problem, and critically, only if there's a problem, do we get these types of targeted emails that tell me exactly what the problem was, and allow me to then take further action.
Real-world examples
We're not the only ones who are kind of using this pattern successfully. So I just want to tell you a few quick stories before we get into the weeds of writing the code. One of my favorite stories is from a large infrastructure-as-a-service provider, so a company that maintains thousands, if not hundreds of thousands, of servers. And what they used this workflow to do was track their hardware uptime. And then if they saw a problem arising, they could send a proactive alert out to their vendors in the context of their service level agreement, or SLA. And what this team was really excited about is that before adopting this workflow, they had done this type of task in JIRA. And any time they onboarded a new vendor or modified their SLA, they had to manually click through a whole bunch of steps in JIRA to accomplish this task. The great thing about this workflow is that it's all based on code, so it's really easy to reuse the different rules and different R Markdown documents that you're going to write.
Another really awesome story of this type of technology applied was in a high-tech manufacturer's People Ops department, so kind of an HR group. And they had this challenging task where they needed to notify division managers about employee overtime problems. So say an employee worked more overtime than they were allowed, they wanted to send out an alert. Now before, they had a dashboard. And what they found was that their 30 different division managers were not reliably going to that dashboard every day to check if there were problems. And so they adopted this workflow to instead send out an email to a manager only if there was actually a problem with a specific employee's overtime submission. And they actually won an award of excellence for adopting this workflow, which I think is really exciting because it's ultimately a pretty simple concept that we're talking about here.
The tool belt
So let's get into the nuts and bolts a little bit to set the stage for Rich, who's going to walk you through the code here. I just want to talk about the different tools in our tool belt that are going to allow us to apply this workflow. And so the first thing that I mentioned is that we're going to be crafting emails. And to do that we're going to use two R packages. The first one is R Markdown. If you're not familiar with R Markdown, it's a package that allows you to combine code and text to create documents, beautifully kind of customizable documents.
And so historically R Markdown has been used to create PDF files or HTML files. But in our case, what we want to use R Markdown for is creating the actual body of an email. It turns out an email's body is kind of unique and different from other types of output formats. And so to help adjust R Markdown to create the body of an email, we've introduced a new package called blastula. That's what blastula is really good at: helping you craft the body of your email.
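To make this concrete, here's a minimal sketch of composing an email body with blastula directly. The addresses, subject, and credentials file are all invented for illustration; the webinar's workflow builds the body from an R Markdown document instead.

```r
library(blastula)

# Compose an email body from Markdown text
email <- compose_email(
  body = md("
## Daily status

All package builds succeeded today. No action needed.
")
)

# Printing the object previews the rendered email (e.g., in the RStudio Viewer)
email

# Sending requires SMTP credentials (see ?creds_file); these values are made up
# smtp_send(
#   email,
#   from = "alerts@example.com",
#   to = "team@example.com",
#   subject = "Daily build status",
#   credentials = creds_file(".smtp_creds")
# )
```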
We also mentioned that often when you implement these types of workflows, you need to format results in a very specific way. And because of kind of the legacy of spreadsheets, that very specific format tends to be a table. And so we're going to spend a little bit of time walking you through gt, which is a package that gives you a lot of control over the types of tables that you can produce from R. And it's also compatible with blastula and R Markdown, so you can embed those tables right into the email, like you saw in my example.
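As a taste of the gt API, here's a small sketch with a recent version of gt. The data and labels are invented, not the webinar's actual KPI table.

```r
library(gt)
library(dplyr)

# Invented KPI data for illustration
kpis <- tibble::tibble(
  kpi   = c("Daily Active Users", "Daily Revenue"),
  value = c(1520, 8425.5)
)

kpis %>%
  gt() %>%
  tab_header(
    title = "Business Health",
    subtitle = "Most recent day"
  ) %>%
  cols_label(kpi = "KPI", value = "Value") %>%
  fmt_number(columns = value, rows = 1, decimals = 0) %>%
  fmt_currency(columns = value, rows = 2, currency = "USD")
```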
So those are kind of the packages. And then we get to what I think is really the heart of this whole solution, which is how we can send these types of notifications in a more dynamic way, on a condition. And the amazing thing about writing code is that the range of possible conditions is really endless. With code, the answer is always yes, and that is really true here. Your condition might be something like anomaly detection. It might be a complex model that allows you to determine dynamic thresholds with uncertainty levels. In our case, it's going to be just a simple set of rules. But no matter what it is, the actual implementation of sending a notification on a condition boils down to a really simple if statement, which I think is at the heart of how powerful this workflow can be.
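That if statement can be sketched in a few lines. Everything here is invented for illustration (the exceedance vector especially); the demo later uses real threshold logic.

```r
library(blastula)

# Invented example: TRUE where a KPI broke its threshold today
exceeded <- c(users = FALSE, revenue = TRUE, churn = TRUE)
total_exceed <- sum(exceeded)

if (total_exceed > 0) {
  alert <- compose_email(
    body = md(sprintf("**%d** KPI(s) broke thresholds today.", total_exceed))
  )
  # Locally you might smtp_send(alert, ...); on RStudio Connect you would
  # attach the rendered email to the scheduled report instead
  print(alert)
}
```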
And then finally, I mentioned we don't want to introduce data science fatigue. So we don't want to use these tools and then still end up in a world where every day we have to press play on rendering these reports and sending out these alerts. And so we're going to talk a little bit about how to automate this workflow. Now, there's a lot of open source ways that you could go about doing this. You could use something as simple as CronTab to manage the execution of these reports. Or perhaps you could use a more advanced scheduler, something like Airflow. But in our case, we're going to look at RStudio Connect, which is a professional product that RStudio kind of sells.
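For instance, with plain cron the scheduling piece might look like this one-line crontab entry. The path and schedule are hypothetical.

```shell
# Hypothetical crontab entry: re-render the alerting report every day at 7am
0 7 * * * Rscript -e 'rmarkdown::render("/home/analyst/reports/business_health.Rmd")'
```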
And what RStudio Connect does really well is it takes care of some of the intricacies of putting R into production. So you don't have to worry about the different packages that your report might need. You don't have to worry about logging failures, about authenticating users, about securely sending email, or scaling out the system. And so it's a really powerful way to kind of adopt a lot of these automation features without having to reinvent them all yourselves.
Demo walkthrough
So to set the stage here for Rich, this is what we're going to walk through in our demo. We're going to start with some fake data, and then we're going to go from that data to a set of key performance indicators, or KPIs, that are going to serve as our threshold for sending alert notifications. Now, in our case, those are going to be simple rules. In your case, that's really where the data science magic applies. So you might apply some magic and give us the benefit of the doubt here.
And then what we're going to show you is kind of a natural first step. Once we have that data and those KPIs, we're going to build the dashboard. And don't get me wrong, the point of today's webinar is not to say that dashboards should go away. Dashboards serve a really important purpose in helping create mutual understanding between you and your stakeholders. It's a great way, especially in like a meeting, to play around with the analysis and allow stakeholders to bring their domain expertise to the table, and maybe informing what thresholds or KPIs should be based on their experience.
But it's not a great place to stop over time. And so what we're going to do, once we have our dashboard, is show you how pretty much the same code can be adapted with the packages I mentioned, blastula, R Markdown, and GT, to create these automated emails. And then we're going to put them into production on RStudio Connect and show you the output of what those emails in an automated fashion actually look like. And so with that, I'm going to go ahead and stop sharing my screen briefly. We're going to switch over to Rich, and then he's going to walk you through the code.
Code walkthrough in RStudio
All right. Here we are in my favorite IDE, RStudio Desktop. I'm going to walk you through a few R Markdown files. And don't worry, all the code is available on this site right here. It's called Beyond Dashboard Fatigue. There should be a link available. You don't have to furiously screenshot this code. It's all there, the exact same files that are here are in my IDE right now. So you can relax.
So my goal here is to walk you through the different R Markdown files which are necessary for creating first a dashboard, and then creating the report and email component, and not dwell too much on the actual code except where it's important, but show you the structure and basically the capabilities of what can happen. So if you haven't used R Markdown, the key is you have text and you have some code, codes in these chunks here. You can render this by hitting the knit button. So I'll do that right now to show us the dashboard.
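If you haven't seen one, a dashboard R Markdown file has roughly this shape: a YAML header, then headings and code chunks. This is a hypothetical skeleton using flexdashboard; the actual file is in the repo.

````
---
title: "Business Health"
output: flexdashboard::flex_dashboard
---

## Column

### Daily Users

```{r daily_users}
# ggplot chart goes here
```

### Daily Revenue

```{r daily_revenue}
# another ggplot chart
```
````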
Okay. So right now, it's basically working away to create that dashboard. We get this preview window, and here we are with the dashboard. Really easy. We just use a little bit of code here to create these panels for all the different parts. So what I'm doing is I'm using ggplot to create a daily users chart, same with revenue. This is our business data, which is all available inside the repo as well. It's basically fake data I made. But it's a great set of examples to show how we can use email functionality to deliver a report that's based on business health.
So that's what I call this. So these are different KPIs, and I'll walk you through that in a second, but I just want to show you that this is where we're starting from. We're starting from the dashboard, which in itself is great, but you have to actually look at this data to know if anything is alarming with these KPIs. So that's maybe not the best thing, unless you're extremely diligent with looking at dashboards, which may not always be the case.
So let's take that one step further. We'll go to an R Markdown report. Without showing you the actual code in these chunks yet, I just want to say that the code is essentially borrowed from the dashboard, but put into this main report document. And the same rule applies: if you want to render this, you hit the knit button, and you get a preview inside of RStudio. So I'm going to hit this, and what we'll see is actually two preview windows, because we have a sub-document here called business health email, and that's the email version of this report.
Okay, so we get a rendering here, and we get two documents. So this is the main report document. It's supposed to have lots of details. We have a gt table right here, first and foremost, and we have the same two ggplot charts, and also some raw data, because it's sort of a full-blown report. And this is the email version. So it's a bit stripped down. We don't have the final raw data. We just say the full report is available on Connect, which it will be. Check the links below, because it generates those links for you. And the raw data is attached as a CSV file, which isn't here, but in the final version that we'll send out, it will be. So essentially, this is a preview of the email, and the text is quite a bit larger, so we can read it on phones and smaller displays, which is great.
But this is what we want to send out, only if there's a problem. And we see those problems here, the cells highlighted in red. So let's have another look at the code. I'll walk you through that and show you how we get to that point. Okay, well, first of all, let's introduce the packages that we're using. We're loading in the tidyverse. That gives us things like dplyr to wrangle our data, to transform it, get it in the right shape. lubridate is for handling dates; it's very nice. Then we have gt and blastula. gt creates the table that you saw there, and blastula helps with emailing, so we'll use some blastula functions at the end of this document. And glue is a really wonderful package; it's like an alternative to paste(). So if you want to make a string, like say a custom subject, it's quite a bit easier to use glue() with some variables that you've generated in this document.
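A quick glue() example, since it comes up again when building the subject line. The variable and its value are invented here.

```r
library(glue)

total_exceed <- 3  # invented count of threshold violations

# Expressions inside curly braces are evaluated and interpolated
glue("One or more KPIs ({total_exceed}) broke thresholds")
#> One or more KPIs (3) broke thresholds
```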
So let me show you the first chunk here. The first part is essentially getting your data in. So get health KPIs. It's a helper function that's sourced in from these other R script files. And this just gets us our data to work with. You know, in different circumstances, this could be pulling in database data, could be pulling in data from an Excel sheet, whatever. The idea is just to get the data into this report.
And then we can apply other functions. What we're doing now is we're transforming the data. And then we have a set of thresholds. And what we're doing with that is we're saying that, for each of these KPIs (this one is daily active users, then daily active customers; we have new users, churned users, and daily revenue, and a ratio of DAC over DAU), there is an exceedance if the value falls on the wrong side of the threshold that we set in a different file. I'll show you that really quickly right here. So these could be set by a decision maker; this could probably be a meeting. Essentially, this is where the business health is not so great if it passes these values. Essentially, we're saying that all these ones should be above these values. And for churn, obviously, lower is better for that number. So this is our way of saying, oh, yes, okay, we have a problem. So we're calculating those right there.
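The exceedance logic might look something like this sketch. The column names, values, and the direction flag are all invented; the repo's helper functions differ.

```r
library(dplyr)

# Invented thresholds; higher_is_better flips the comparison for churn
thresholds <- tibble::tribble(
  ~kpi,             ~threshold, ~higher_is_better,
  "daily_users",          1000,              TRUE,
  "daily_revenue",        5000,              TRUE,
  "churned_users",          50,             FALSE
)

# Invented most-recent-day values
latest <- tibble::tribble(
  ~kpi,             ~value,
  "daily_users",       950,
  "daily_revenue",    7200,
  "churned_users",      65
)

exceedances <- latest %>%
  inner_join(thresholds, by = "kpi") %>%
  mutate(exceeded = if_else(higher_is_better,
                            value < threshold,   # too low is bad
                            value > threshold))  # too high is bad

total_exceed <- sum(exceedances$exceeded)  # 2 with this invented data
```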
And then what we're doing here is we're getting some other variables. And we'll use these later on to shape the subject line of our email, to give us a little more information up front. So it's a bit nicer to digest immediately. Okay, so we've got some variables. This is not shown to the user in the report. We have a lot of include = FALSE settings; that means in the final report, we don't actually see this code. You can reveal the code, but most of the time it's not very interesting, so most of the time we just hide it with include = FALSE.
So I'll show you this chunk here. This is where we make the gt table. Again, we're not showing the code. We're just showing the output of this, which is the table. And it's quite big here. Essentially, we start with the KPIs. And then we're doing a few things to the data. And then we pass it into gt(). And then we're doing things inside the gt API, where we're adding a heading, we're changing the column labels, we're using some helper functions that we've created in a different file to do some gt-related things, and we're doing some formatting of numbers and currencies. And this is quite a bit. I mean, we could stop right here at just gt(), and then we'd have a pretty good table. But this essentially adds more formatting and makes the table really shine.
Again, more information on this is available in the gt repository. I will show that to you. It's actually just right here: gt.rstudio.com. You can learn lots of stuff about gt through that website. Okay. So that's the gt table. Then ggplot; ggplot needs no introduction. It's been around for a very long time, and there are tons of examples. We're essentially reusing the plots that we had in the health dashboard and just putting them here. We're not showing the code; we're showing the figure. And we're aligning it center with this option.
The email condition
OK. So another key thing here is the email part. So if there's a problem on the last day, should we send the email or not? OK. So here's the condition right here: if the total number of exceedances, which is to say the total number of problems seen on the most recent day, is greater than zero. So if there are any problems, in other words, then we'll prepare this email. We're going to take this sub-document, which I'll show you in a second, businesshealthemail.rmd, and use it as the body of the email. And then we're going to attach it as a Connect email. And we're going to create a subject line with glue. So the subject here says that one or more KPIs broke thresholds. And then we have some more glue code here. These curly braces are where the variable total_exceed, which was defined up here, is being put into the string. So it's being interpolated into the string.
Another cool thing is we're actually generating a CSV file right here, just some data. And we're attaching it to the email through this attachments argument. This is really cool because we can create any files here. They won't be seen by the user in the main R Markdown document. But they can be attached to email, which is fantastic. So this is what will happen. This email will be sent if we have any exceedances or problems. If we don't, then we suppress the scheduled email.
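Putting that together, the final chunk follows this pattern. The count, file names, and subject string here are paraphrased stand-ins, not copied exactly from the demo.

```r
library(blastula)
library(glue)

# Invented count of KPI threshold violations, computed earlier in the report
total_exceed <- 2

if (total_exceed > 0) {

  # Hypothetical: write out the raw data so it can be attached
  # readr::write_csv(kpi_data, "kpi_data.csv")

  email <- render_connect_email(input = "businesshealthemail.rmd")
  attach_connect_email(
    email,
    subject = glue("One or more KPIs ({total_exceed}) broke thresholds"),
    attachments = c("kpi_data.csv")
  )

} else {

  # Tell RStudio Connect not to send the scheduled email on this run
  suppress_scheduled_email()

}
```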
Deploying to RStudio Connect
This is a bit confusing. What does this actually mean? So in Connect, you deploy it through this Publish button: just publish to Connect. And I'll take you to Connect and show you what this actually means in context, which is right here. So this is our main document, our main report, in Connect. So it looks great. We actually have a history of these documents; they'll be published every day. But to do that, you actually have to set it up. You have to go to the schedule and then schedule this output to be run every day. So it'll take all your code and run it every day, maybe on different data. That's kind of the hope, because what happens is it'll be connected to the same data source, but it'll be updated every single day. So we'll get different data and possibly different alerts.
So on this day, we did have a few problems. They're highlighted here in red. So that's great, but we'd have to check this every day to see that there's a problem; this is almost the same problem as the dashboard, having a report here. When we send email, we can send it when there's a problem, and we don't have to be checking. So the email sort of solves that problem. So to schedule that, we schedule it every day. And the key thing is to send an email after update. And then we set some recipients. It could be Sean, for instance, here.
Great. Now we can save that. And now we have a master list of people that can view or change this document. And we can schedule Connect to send email to those recipients through here. So I'll just do this one more time. This is kind of cool. Great. And we'll save this. Brilliant. So every day, it's going to be run. And we can change this, of course, but this is daily data, so it makes sense to send it every day. And it will send an email only if the thresholds were exceeded. And we can actually preview the email and send one to ourselves, just to debug the email and make sure it looks good before it actually goes out in production.
So I can send this. And I actually happen to have one open, so it shows it to me right here. This is my email client. So we can see that the subject here is "one or more KPIs," and in parentheses, three, "broke thresholds." That's the glue work that we did here with total_exceed. And we have the list of things here, which is great, because if you look at this in your list of messages, you see this right away before even opening up the email. And the body looks really good. This actually looks really wonderful on a small device, because the text is quite large. We have just the important things, which are the table and a little bit of text, just to show what's being shown. And importantly, here's the attachment. So here is the CSV data, in case it's important for whoever is looking at the data.
The email sub-document
So I want to show you how you craft this message body in this business health email sub-document. So it's a little bit strange; when I saw this previously, I was a little bit weirded out, because essentially these chunks have nothing inside them. What's actually happening is we're reusing the chunks in the main document here; we don't write anything in them. As long as the chunk names match across documents, all the content will be transmitted to the sub-document and then to the email, which is really pretty cool. It reduces typing, reduces code duplication, and we can have one source of truth, which is fantastic.
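A hypothetical sketch of what Rich is describing: the main report defines a labeled chunk with real code, and the email sub-document contains an empty chunk with the same label, which gets filled with that chunk's output when blastula renders the email. The chunk label and file names here are invented.

````
In business_health.rmd (the main report):

```{r kpi_table, echo=FALSE}
# code that builds and prints the gt table lives only here
```

In businesshealthemail.rmd (the email body):

```{r kpi_table}
```
````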
So yeah, we're essentially adding a section here, some text, the output, stating that the full report is available on Connect. Check the links below, because the links are given here at the bottom. And these are real live links that send us to the main report, if you want more details. And I say that the data is attached as a CSV file, which is the truth; it's right here. So this is really great. And we're reusing some of the code in the email as well, this paste(). And this is R Markdown, so we can use Markdown with the asterisks to make things bold.
So yeah, let me go back to this business health R Markdown document and show you this one more time, in case you missed it the first time around. If you knit this, you do get a preview locally, so you can debug it before it even gets to Connect. And again, the slightly strange thing is that we get two previews: one is the email, which appears in your browser, and one is the main report, which will appear in Connect.
Yeah, and that's really about it. I just want to impress upon you that the main thing is this very important bottom chunk. It relies on having blastula available, which is loaded right here in the setup chunk with library(blastula), because these two functions are from blastula: render_connect_email() and attach_connect_email(). And as long as you have the condition set properly, the email will be suppressed whenever we hit the false branch of the condition; otherwise, by default, it would be sent out on every scheduled run.
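A hedged sketch of what that bottom chunk can look like (the variable, file names, and subject wording are placeholders, not the exact demo code; `render_connect_email()`, `attach_connect_email()`, and `suppress_scheduled_email()` are real blastula functions):

```r
library(blastula)

# Final chunk of the main report: render the email sub-document and
# attach it for RStudio Connect, but only when at least one threshold
# was exceeded.
if (n_exceeded > 0) {
  email <- render_connect_email(input = "business-health-email.Rmd")
  attach_connect_email(
    email = email,
    subject = glue::glue("({n_exceeded}) KPIs broke thresholds"),
    attachments = "kpi-data.csv"  # the CSV attached to the message
  )
} else {
  # Tell Connect not to send anything for this scheduled run
  suppress_scheduled_email()
}
```

With this pattern, the report still renders and publishes on every scheduled run; only the email is conditional.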
Q&A
That's about it. I went through it pretty quickly, but that's okay, because I want to leave lots of time open for questions.
Yeah. Thanks so much, Rich. That was awesome. Hopefully everyone got a sense for the code and how easy it is to write and implement these types of workflows. Lots of excellent questions came in that we'll get to in just a moment. Just to quickly recap, in case you were late and missed a bit of the introduction: the goal here is to help data science teams empower really good decisions. What we've seen in our experience, and what Rich just showed you, is that teams can do that if they're able to more proactively distribute their work, instead of relying on decision makers to diligently visit a dashboard on a regular basis.
As Rich also showed, this workflow of proactive distribution is pretty easy to do, because there's a rich toolkit of code. You can write the code that you love to use, even while your decision makers receive results in the medium that they like. And as a general rule, the more tools we as data scientists have in our tool belt, the more likely we'll be able to impact decisions in an effective way. So that's the take-home message.
Rich, I can go ahead and answer, and we can tag-team this. One of the first questions I saw that I thought was really interesting asked what types of things you can embed in the emails. We demonstrated embedding ggplots and tables; this particular participant asked about interactive HTML widgets. So, Rich, maybe you can take a shot at answering that. Yeah, well, it's a little bit of a letdown, but not really, because email clients do not like any JavaScript in email; they tend to strip it away. So that is out. The most interactivity you can probably get is simple HTML elements, and linking out to, say, a full-featured app is what people are doing in email. Interactive bits like HTML widgets are just not possible until email gets beyond where it is now. Right now it's very stripped down because of the security model of email.
Yeah, a follow-on question: we think you showed Apple Mail, which got some applause from the audience, Rich, but we'll throw out there that the approach has been validated and tested on a whole bunch of email clients. So for the formats that are supported, you'll be able to use Outlook, Gmail, or whatever, on a phone or on a laptop. I'll add that we made a very large effort in our QA process to test as many email clients as possible, across a range of sending types with blastula: through Connect, or just sending through your own SMTP server, like via Gmail. We actually used a service called Litmus, where you send them the email and they test it on 50-plus email clients, so we can see the results and whether things break. We were able to fix a ton of problems and, more importantly, to ensure that your email will be delivered without being mangled, which is always a good thing.
Awesome. I'll take another group of the questions here, Rich: what if you don't want to use RStudio Connect? I can take a stab at answering that. The key to all of this is that there are two parts you would need to implement. One is a way to render an R Markdown document on a schedule, and there are lots of ways to do that: you can use something like cron, you can use something more sophisticated like Airflow, or you could manually click render on a report every day. The second bit is that you need a service that will send the rendered email. Rich mentioned briefly, and maybe you can expand on this, Rich, that blastula also has functions for integrating with non-Connect email senders, like an SMTP service. Yeah, that's right. You can use Gmail's SMTP service, you can use Microsoft's online Outlook one, and you can also hook up other ones, like SMTP2GO, which is a really nice one. The caveat is reliability: I've found that Connect is rock solid and always sends your email, whereas you sometimes get throttled with those other services. So that's a little bit of uncertainty you may have to deal with, but it's possible.
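For the non-Connect route, a minimal sketch with blastula's SMTP functions (the addresses, credentials file name, and message text are hypothetical; `compose_email()`, `smtp_send()`, `create_smtp_creds_file()`, and `creds_file()` are real blastula functions):

```r
library(blastula)

# One-time setup: store SMTP credentials on disk. Here Gmail is assumed
# as the provider; other providers or a custom host work similarly.
# create_smtp_creds_file(file = "gmail_creds",
#                        user = "me@gmail.com",
#                        provider = "gmail")

# Build an HTML email body; md() lets you write it in Markdown
email <- compose_email(
  body = md("**Heads up:** 3 KPIs broke thresholds today.")
)

# Send it through the SMTP server described by the credentials file
smtp_send(
  email,
  to          = "decision.maker@example.com",
  from        = "alerts@example.com",
  subject     = "KPIs broke thresholds",
  credentials = creds_file("gmail_creds")
)
```

Paired with a scheduler like cron or Airflow rendering the report, this covers both pieces Connect otherwise provides.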
Yeah, the other thing I'll add, Rich, is that you'll want to be a little careful about impersonation issues when sending these types of notifications. One of the things we've worked hard on in Connect is to make sure that when you send an email, even though it's an automated system, it comes from a reputable address and goes only to the people designated to see it. There are some security ramifications when you get into the realm of automating notifications, so you'll want to be careful if you're integrating with one of those other options yourself.
Let's see, in terms of other interesting questions, Swanet asked a pretty interesting one: do you have to use R, or is there a way to incorporate other languages like Python into these types of workflows? I'm sure there's a way, but the thing you'd be missing is something like blastula for creating the email message body. I've looked around for Python libraries that do this sort of thing. Certainly you can send an email right from Python, but the hard part is creating a message body, and the even harder part is creating an HTML message body, which is what blastula does. So I think it's possible; you'd just have to do a lot of upfront work. You'd basically have to reproduce blastula in Python to get that going.
I'll add to that, Rich. One way you could combine the two is actually within R Markdown. It's a bit of a misnomer, but R Markdown allows you to combine a whole bunch of different languages. So while it's an R Markdown document, you can certainly have Python code chunks commingled with R code chunks, or with bash or SQL chunks if you want. All of those languages can live inside the R Markdown document, and R in that workflow just becomes a facilitator for sending the email using Rich's awesome work on blastula. Yeah, I'm glad you brought that up, because you can definitely do all that other stuff in R Markdown; it has a lot of engines available. But the last piece is kind of critical.
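A sketch of that mixed-language pattern (the chunk contents are hypothetical): Python chunks in R Markdown run via the reticulate package, and their objects are visible to later R chunks through reticulate's `py` object, so the analysis can happen in Python while R handles the email side.

````markdown
---
title: "Mixed-language report"
output: html_document
---

```{python}
# A Python chunk does the analysis...
result = sum([10, 20, 30])
```

```{r}
# ...and R chunks read the Python results (via reticulate's `py`
# object) and take care of composing and sending the email with
# blastula.
library(reticulate)
py$result
```
````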
So this is an interesting question, Rich, a little more philosophical, I guess. The person asks: how is this approach influencing your thinking around dashboards, and when you would use a dashboard and when you wouldn't? Ah, yeah, a dashboard is still pretty essential, but I wouldn't use it as the way to notify people of anything important, because that requires the diligence of visiting the dashboard daily, or at whatever frequency is required, to stay ahead of important issues. So a dashboard is part of the whole thing: once you get notified, you can go to the dashboard and get the history of what led to certain problems up to the present day. This isn't a replacement for a dashboard; it basically adds to the whole analytics stack.
And one thing I'll echo there, which also addresses another question that was asked: a common pattern in these emails is to dynamically generate a link back to a dashboard. Something we do a lot at RStudio, in the example I mentioned at the beginning, is that I'll get an email that says, hey, these packages failed to build, and it includes a link to a dashboard where I can explore the error codes, the duration of the job, and all sorts of more interactive questions. So dashboards aren't out of the picture; they just, as Rich said, aren't the mechanism for doing the notification.
There was a question here: instead of hard-coding the thresholds, can they be dynamic? Yes, absolutely. This is your code; I just whipped something very unsophisticated together, checking whether a value is below or above a fixed number. You can definitely pull in data, generate the thresholds dynamically, and compare. There are lots of ways to do this: you can use anomaly detection packages, or the forecast package, for certain things. Essentially, you want to get to a place where you can make a final decision on whether to send the email or not, and then decide what to put in that email. You may have different versions of the email and select between them, or you may be more sophisticated with the dynamic content you include, like the subject line or even content in the email body, because the variables are transmitted from the main document to the sub-document, which is kind of cool. So yeah, because this is code, you can do whatever you want. You can go nuts, essentially.
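A minimal sketch of a data-driven threshold, assuming made-up daily values: flag today's number when it falls outside the mean plus or minus two standard deviations of recent history, and use that flag as the send/suppress decision.

```r
# Hypothetical recent history of a daily KPI
history <- c(102, 98, 105, 101, 97, 103, 99)
today   <- 120

# Dynamic thresholds: mean +/- 2 standard deviations of the history
upper <- mean(history) + 2 * sd(history)
lower <- mean(history) - 2 * sd(history)

# The final decision feeding the send-or-suppress logic
send_email <- today > upper || today < lower
send_email
#> TRUE
```

More sophisticated versions could replace the mean/sd bands with forecast intervals or an anomaly detection model; the shape of the decision stays the same.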
Yeah, a follow-on question: maybe as a data scientist I don't know the triggers or thresholds. Does it always fall on the data science team to set them, or is there some way for decision makers to influence the thresholds? Yeah, I think it's a conversation between the data team and the decision makers in the organization, for sure. You may come up with a set of recommendations, you may change them on a weekly basis or some other frequency, or it may be dynamic: a system that adjusts thresholds when values are changing fast, or when there's seasonality, things like that. So it really depends on the organization. But essentially it should be transparent, I think, because the decision makers are getting the notification, and they probably should know why. So yeah, basically, it's a conversation.
Yep. And the nice thing about code is that it's not a black box; it's very inspectable, and it gives you full flexibility. So you're not locked in; it's not like RStudio decides what the threshold should be for you. You get all the options there. And that leads to another question that came up that I thought was really awesome. We've talked a lot in this webinar about creating notifications to bridge the gap between data scientists and decision makers, but this person asked: could the same workflow apply to notify the data science team of data quality issues? Absolutely, yeah, definitely. It's totally up to you who the email is directed to. I would set up diagnostic emails to myself where important: if certain data is suspect based on some data validation rules you have, you'd want to know about it, and then let data engineering know about it too, to get at the root cause of things. So absolutely, definitely send yourself these notifications. That's my pitch.
Technical questions
One of them is from someone who's already playing around with the code, so kudos to you. They've hit a stumbling block that is maybe a little bit common as you're learning this workflow. Rather than reading the question verbatim, Rich, I'll basically diagnose it: essentially, they rendered the child document instead of the main document. So can you talk a little bit about which document to render, and when? Yeah. This is a trap I always fall into, because nothing stops you from rendering the child document, but nothing good comes from it; it's always the main document you want. There should probably be a warning somewhere, but there's nowhere to put it. Rendering the main document is what gives you the two previews. So the thing to ask yourself is: am I seeing two previews? If not, I probably did something wrong. Always hit Knit on the main document, and then you'll be A-OK.
Yeah. The error messages you'll tend to see if you accidentally knit the child document say that variables are undefined, and that's because those variables are defined in your main document. Now, of course, you could duplicate all the code into the child document so that it's also standalone, but then you've needlessly copied and pasted code, which is something we're always trying to avoid. Yeah, try to keep it DRY, even in your email message bodies and R Markdown documents. For people who don't know what DRY means: don't repeat yourself. Try as much as possible to have one place for code. That's the great thing about these sub-documents taking results from the main document: you don't have to rewrite code or worry about keeping the documents in sync.
Excellent. And then just for the sake of being in the weeds with our audience here, maybe one final question for you, Rich.

