Matt Dancho | Using R, the Tidyverse, H2O, and Shiny to reduce employee attrition

Transcript

This transcript was generated automatically and may contain errors.

Oh man, this is exciting. This is my first RStudio conference, by the way, guys. Yeah. I'm pumped. I'm a big fan of RStudio, the company. I think you guys know why. It's probably the same reasons that you're here.

So today what we're going to be doing is talking about R for business. So that's kind of where I specialize. I marry up business with data science. My company is Business Science, so you can probably guess how I got the name. Business plus data science. Merge the two. You got it.

So what we're going to be talking about today is a lot of workflow. We're going to be talking about how to solve a specific business problem. It's called employee attrition, and it ends up being a huge problem for businesses. It's a big dollar figure, $15 million per year, and we'll talk a little bit about how I came up with that number. But more importantly, what we're going to be doing is showing how the different tool sets, including the tidyverse, my favorite modeling package, H2O, and also Shiny, combine to really help us solve this business problem.

So I'll be your host. Again, my name is Matt Dancho. I am the founder of Business Science, an educational company. I'm a lover of R. I've been using it for quite a while. And I've even contributed some open source packages as well, probably the most popular of which is tidyquant. And I do dabble in finance. But really, I'm an educator of data science. I specialize in teaching data science.

I do onsite workshops at companies, and I've also created an online platform called Business Science University, where students can come and take a range of courses that really up-level them, accelerating their careers. Oh, and one more thing, one of my special passions is converting business people to data scientists. So I just want you guys to know, if you are a business person in the audience, you don't need a Ph.D. to be a data scientist. Just throwing that out there.

The $15 million per year problem

All right, so agenda for today. We're going to be talking about a few different things. The $15 million per year problem, that's going to be our focus. We're focusing on a business problem. It's employee attrition, and we'll find out a little bit more about what that means. The second thing, and this is what I'm super excited about: I'm unveiling a new Shiny web app that we're going to be teaching in the 300 series course, part of the Data Science for Business program at Business Science University. So you guys are going to see it first, right here at RStudio.

And just one other thing about that app, just want to give credit to Kelly O'Brien. She's the one who developed it. So she's an RStudio employee, and I work with her quite a bit. Then we're going to talk about the internals of the app, the data science workflow, what powers this app: the tidyverse, H2O, and also another package called lime that I'm very excited about. And then finally, we're going to pull it all back together and talk about learning R and how you guys yourselves can figure out how to do all of this stuff that I'm teaching you.

So that $15 million per year problem. So you guys might recognize this gentleman's face, Bill Gates. He was once quoted as saying, you take away our top 20 employees and overnight, we, Microsoft, become a mediocre company. Let's dissect what he's saying there. Take away. He's talking about a concept called employee attrition, employee turnover. Top 20. He's talking about high performers.

This is the top 20 people in his company, and often what we find is the 80-20 rule applies. Top 20% of employees in a company tend to generate about 80% of the results. So you really want to do what you can to preserve and keep and retain those high performers. Otherwise, your company is going to become a mediocre company.

So let's dive into this a little bit further. I want to talk about this curve. It's called the economic value of an employee over time. It looks a little bit like this. So we'll see that there's four different boxes up here. The first box is the curve. So when an employee just starts at a company, they represent that green dot there right at the beginning. So time has not elapsed, and actually what is happening is that company is investing and actually losing money having you as part of that company. And that's an investment that they're making in you.

So they have to provide you training, mentorship, they have to integrate you, and it takes a while to do this before you can become a productive member of that company. Then eventually you make it to the second box, and this is what's called the break-even point. So you gradually get to that point where you begin to generate returns for that company. And they call that the return zone. So you've got the investment zone, you've got the return zone. And that process can take as little as three weeks for jobs that aren't overly difficult, or for highly technical jobs it can take upwards of a year or longer. So that's the type of investment that that company is making in you.

So what happens when a person decides to quit? That's what this third box represents. So that employee has become a productive member of the company. They're generating returns, doing really good work, and then all of a sudden they decide to quit. And that could be because they hate their boss. That could be because their work-life balance is all out of whack. Or they may just not like the work that they're doing. And what ends up happening is when that person quits, that line immediately drops down to zero, and it stays there for a period of time, and eventually that company decides to replace that person, typically. And then that cycle repeats itself. So they're reinvesting in the new person, getting them up to speed, and it takes a while before they get to the return zone. So as you can see in box number four, we've got lost time and lost productivity. This is what you want to avoid, especially for high performers.
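The curve described above can be sketched in a few lines of base R. This is only an illustration of the shape (investment zone, break-even, return zone, drop to zero on quitting, then a restart with the replacement). The function name `employee_value` and every number in it are hypothetical assumptions, not values from the talk.

```r
# Illustrative daily economic value of one employee.
# ASSUMPTION: linear ramp from a negative starting value to a steady return.
employee_value <- function(day, ramp_days = 120, start_value = -500, full_value = 800) {
  progress <- pmin(day / ramp_days, 1)               # 0 at hire, 1 once fully productive
  start_value + progress * (full_value - start_value)
}

days         <- 0:599
quit_day     <- 300   # assumed day the employee quits
vacancy_days <- 40    # assumed days the position stays open

# Value with attrition: original employee, then an empty seat, then a new hire
value <- ifelse(
  days < quit_day,
  employee_value(days),
  ifelse(days < quit_day + vacancy_days,
         0,
         employee_value(days - quit_day - vacancy_days))
)

# Break-even point: first day the value is no longer negative
break_even_day <- days[which(value >= 0)[1]]

# Lost productivity: gap between "never quit" and what actually happened
baseline   <- employee_value(days)
lost_value <- sum(baseline - value)
```

Plotting `value` against `days` (for example with `plot(days, value, type = "l")`) gives a rough version of the four-box picture: the dip below zero, the climb into the return zone, the drop at the quit date, and the repeated ramp-up.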

So there's two different types of attrition, and we're going to focus on trying to prevent your high-performing employees from leaving. So not all employees are created equally. Some employees just never quite get there. And that's what this first box here is, what we call necessary attrition. And this could be because that person just might not be a good fit for the culture of the company, may not mesh well, it might be a poor job fit. For whatever reason, they just don't quite cut it, and it's okay to lose these people. But what you really want to prevent is the right-hand box, which is bad attrition. And this is when you have high performers that have generated returns, and you don't want to see them leave the organization.

So you can actually assign a dollar figure to this particular problem, and it's actually a very simple calculation. In fact, this is some R code. I know it might be a little bit difficult to read, but this is a function that I created. It's called calculate attrition cost, and it takes a few different parameters, and then it performs just a vectorized calculation that incorporates direct costs, lost productivity, and some assumptions in there, and then it also subtracts out the salary and benefits, which is what the company actually saves when it loses an employee. So for a high-performing employee that is a productive member of that company, it can be upwards of $78,000 per employee that that company loses when you assign a cost to it.
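The slide's code is not reproduced in the transcript, so the following is a minimal sketch of what a calculation of that kind might look like, not the function shown in the talk. The function name follows the spoken name; every parameter name and default value is an illustrative assumption, chosen only so that the result lands near the roughly $78,000 figure mentioned above.

```r
# Sketch of a vectorized attrition cost calculation.
# ASSUMPTION: all parameter names and defaults below are illustrative only.
calculate_attrition_cost <- function(
  n                        = 1,       # number of employees lost
  salary                   = 80000,   # annual salary and benefits
  separation_cost          = 500,     # direct costs
  vacancy_cost             = 10000,
  acquisition_cost         = 4900,
  placement_cost           = 3500,
  net_revenue_per_employee = 250000,  # productivity assumptions
  workdays_per_year        = 240,
  workdays_position_open   = 40,
  workdays_onboarding      = 60,
  onboarding_efficiency    = 0.50
) {
  # Direct costs of losing and replacing one person
  direct_cost <- separation_cost + vacancy_cost + acquisition_cost + placement_cost

  # Lost productivity while the seat is empty and while the new hire ramps up
  productivity_cost <- net_revenue_per_employee / workdays_per_year *
    (workdays_position_open + workdays_onboarding * onboarding_efficiency)

  # Salary and benefits the company does not pay while the position is open
  salary_benefit <- salary / workdays_per_year * workdays_position_open

  n * (direct_cost + productivity_cost - salary_benefit)
}

cost_one <- calculate_attrition_cost()  # about 78,483 with these assumed defaults
```

Because the body uses only element-wise arithmetic, any argument can be a vector, for example `calculate_attrition_cost(salary = c(60000, 80000, 100000))` to compare salary bands.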

So the funny thing that happens is typically companies don't just lose one person; it ends up being more of a systemic issue. And so what happens when a competitor, say Walmart or Target, moves in and starts stealing your high performers? Before you know it, you've got a couple hundred that you've lost, and if you lose 200 high performers each year, that's a $15 million problem. Yikes. So we want to prevent these high performers from leaving the company. I think that's pretty obvious by now. So the way we can do that is through using data science.
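The arithmetic behind the headline number is easy to check. This sketch uses the rounded $78,000 per-employee figure and the 200 employees per year quoted in the talk; the other head counts are hypothetical and included only to show how the cost scales.

```r
# Per-employee cost quoted in the talk (rounded)
cost_per_employee <- 78000

# 200 is the figure from the talk; 1 and 50 are hypothetical comparison points
employees_lost <- c(1, 50, 200)

# Vectorized: one annual cost per head count
annual_cost <- employees_lost * cost_per_employee

format(annual_cost, big.mark = ",")
```

At 200 high performers per year the total comes to $15,600,000, which is where the "$15 million per year problem" comes from.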

Shiny app demo

And I'm going to show you the product first, just because I want to show you guys the end result. So we're going to do a quick Shiny web app demo. And again, this is something that's exclusive to the RStudio conference, I'm really excited about this. This is what we're developing as part of the 300 series course, so there's a 100, a 200, and a 300, and those who take that course are going to be able to learn how to build this app. So I'm just going to click this run app button, and if all goes well, we get a Shiny app.

So what this app represents is the end product. You can imagine that you've got managers out there that are responsible for their employees. They are responsible for retaining them, making sure that they develop them into productive people, making sure that they're happy. And that's exactly what this app allows us to do. So I'm just going to scroll and kind of show you, each one of these numbers represents a specific employee. And this particular one, employee number 891, has a 47% prediction risk. So this employee is actually predicted to leave, because that prediction risk is above the threshold for deciding whether or not that employee leaves. And this is actually H2O under the hood that's generating this predictive model, and the app is encapsulating it.
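The thresholding step can be sketched in base R. The model itself (H2O in the app) is left out here; the snippet starts from predicted probabilities. Only employee 891 and its 47% risk come from the talk. The other rows and the 0.30 threshold are hypothetical, since the talk does not state the actual cutoff.

```r
# Predicted probability of leaving, one row per employee.
# ASSUMPTION: only employee 891 (risk 0.47) is from the talk; other rows are made up.
predictions <- data.frame(
  employee_number = c(891, 1024, 1337),
  attrition_risk  = c(0.47, 0.12, 0.31)
)

# ASSUMPTION: the real threshold is not given in the talk; 0.30 is illustrative
threshold <- 0.30

# An employee is flagged as "predicted to leave" when risk exceeds the threshold
predictions$predicted_to_leave <- predictions$attrition_risk > threshold

# Highest-risk employees first, as a manager would want to see them
predictions <- predictions[order(-predictions$attrition_risk), ]
```

A threshold below 0.5 reflects a common choice in this kind of problem: missing an employee who is about to leave is usually more costly than flagging one who would have stayed.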

So you can imagine, put yourself in that manager's shoes. They see that they've got an employee that's 47% likely to leave, and then what we have over here are the reasons, the features, of why that person is predicted to leave. So this actually comes from the lime package, and this first feature here is stock option. So that person has stock option level 0. So this is something that the manager can actually affect, or change. It's what's called a lever. It's a feature that the manager can adjust. So maybe moving that employee from level 0 to level 1 and giving that employee stock options, that might be enough to help that employee to stay.

The next one is the employee has over 28 years at the company. The next one is that the number of jobs that that employee worked at is over 6. And then the next one is that that person has a training time last year of 2. So these two here are not levers that can be adjusted. You can't really toggle the number of years that that employee works, or how many jobs that they've had previously, but that manager can then take a look at maybe training times, maybe to help get them engaged, give them a few more training sessions a year.
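One way to make the lever idea concrete is to tag each explanation feature as adjustable or not. The sketch below does this in base R for the four features mentioned above. It does not call the lime package; the column names in `feature` are assumed names for illustration, and the `levers` vector encodes the speaker's judgment about which features a manager can change.

```r
# The four features shown for this employee, with the conditions described.
# ASSUMPTION: the feature column names are illustrative, not read from the app.
explanation <- data.frame(
  feature   = c("StockOptionLevel", "YearsAtCompany",
                "NumCompaniesWorked", "TrainingTimesLastYear"),
  condition = c("= 0", "> 28", "> 6", "= 2"),
  stringsAsFactors = FALSE
)

# Features a manager can actually adjust, per the talk
levers <- c("StockOptionLevel", "TrainingTimesLastYear")

explanation$is_lever <- explanation$feature %in% levers

# Keep only what the manager can act on
actionable <- explanation[explanation$is_lever, ]
```

Here `actionable` holds the stock option level and training rows, while years at the company and number of previous jobs are filtered out because nobody can change them.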

So this is really cool. We're able to actually bottle up data science into this machine learning app, and even further than that, we can develop predictive recommendations that are presented to the manager. So that way they don't have to think up strategies themselves; that's already incorporated for them. So for example, this management recommendation strategy has a work environment strategy of promote job engagement. So that manager should then focus on activities that will promote job engagement.
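A recommendation like this can be produced by simple rules applied to an employee's features. The sketch below is a hypothetical example of that pattern. Only the "Promote Job Engagement" label comes from the talk; the function name, the input scales (assumed to be 1 to 4 survey scores), the cutoffs, and the other two labels are assumptions made for illustration.

```r
# Hypothetical rule-based work environment strategy.
# ASSUMPTION: inputs are 1-4 survey scores; cutoffs and fallback labels are invented.
recommend_work_environment <- function(job_involvement, job_satisfaction) {
  ifelse(
    job_involvement <= 2, "Promote Job Engagement",
    ifelse(job_satisfaction <= 2, "Improve Job Satisfaction", "Retain and Maintain")
  )
}

# Vectorized over employees: one recommendation per row
recommend_work_environment(
  job_involvement  = c(1, 3, 4),
  job_satisfaction = c(4, 2, 4)
)
```

Keeping the rules in one small function means the app only has to call it with the selected employee's values and display the returned label.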

So this is what we're talking about. When we provide Shiny web apps to non-technical people, business leaders that have a stake in the game where they can actually affect and make decisions that improve the company, that can make a huge difference.

Matt Dancho | Using R, the Tidyverse, H2O, and Shiny to reduce employee attrition | RStudio (2019)
