Resources

How to Win Friends and Influence People (With Data) - posit::conf(2023)

Presented by Joe Powers. Too many great data science products never go into production. To persuade leaders and colleagues to adopt your data science offering, you must translate your insights into terms that are relevant and accessible to them. Attempts to persuade these audiences with proofs and model performance stats will often fall flat because the audience is left feeling overwhelmed. This talk will demonstrate the data simulation, visualization, and storytelling techniques that I use to influence leadership, and the community-building techniques I use to earn the trust and support of fellow analysts. These efforts were successful in persuading Intuit to adopt advanced analytic methods like sequential analysis that cut the duration of our AB tests by over 60%. Presented at Posit Conference, September 19-20, 2023. Learn more at posit.co/conference. -------------------------- Talk Track: Bridging the gap between data scientists and decision makers. Session Code: TALK-1077


Transcript

This transcript was generated automatically and may contain errors.

Welcome to my talk.

So pretty early in my data science career, I was introduced to a worrying statistic when I was told that 80% of data science projects never successfully go into production. This is an unverifiable statistic, but I think you relate to it or you would not have shown up for this talk.

This is worrying because we invest a lot of time in a data science project: a lot of our personal time, a lot of company resources. Looking back over six years, the bad news is that this is probably true, if not an underestimate of how many projects fail to go to production successfully.

But the good news is that I think you can reliably be among the 20% whose projects do successfully go to production if you implement the steps from this talk.

So we can start off with why a data science project wouldn't go to production after all the expense invested in it. First, many launch-ready projects die in the backlog, awaiting prioritization. Second, many projects get deployed, but their potential adopters are unaware of their existence. And finally, even for projects that get deployed and that people are aware of, the difficulty of using those projects can exceed the incentives to use them.

Influencing leadership

So how do you ensure that your data science project goes into production and gets widely adopted? I have found an enormous amount of value in a small book written 90 years ago called How to Win Friends and Influence People. This is a book that kept getting recommended to me over and over again for probably two decades before I finally picked it up, and I think it took me so long because my initial reaction to the title was, ick. It sounds like a smarmy, manipulative way to deal with leaders and colleagues.

But I highly encourage you to actually open the book, because what you'll find is that it's about empathy. It's a book about winning over colleagues and influencing leaders by listening to and designing around their needs. And I think it's a really productive approach to working with people whose needs, agendas, and goals are different from your own, but equally valid.

So step one, influence leadership. This is the first step in any successful data science project launch. You need to start by identifying a problem worth solving. For me at Intuit, this centered on A-B testing. Some of you may not be as familiar with what A-B testing looks like in industry, so I'll anchor with a small example: a race-mode timer that we tested last tax season.

Very simple. In recipe A, you get no timer. This is your control condition. Recipe B, you get a predicted duration of your tax filing, and you have a timer running against it. And it's just a simple question. Does this feature increase the probability that you'll complete your taxes?

So in 2022, our A-B tests, on average, were taking 30 to 60 days to complete. And in a precious traffic window like tax season, that's a lot of lost opportunity. So reducing A-B test duration by 25% became among the company's top goals.

So after identifying a problem worth solving, you need to sell your solution to leadership in terms that they understand and already care about. These terms should not be abstract statistical terms or mathematical proofs. You should still do that work for yourself, to convince yourself that this is a valid and sustainable solution, but that's not going to be the pitch to leadership. The terms that they care about are often going to be time and money.

So how does your solution save your customers' time, or how is it going to optimize their tax refund? Alternatively, how does this solution save our employees' time, or how is it going to make money for the business? These are the kinds of terms that are going to highly motivate and get the attention of your leaders to back your proposed solution.

So we've got a relevant problem: 30 to 60 days needed for fixed-sample A-B tests. And I have a proposed solution: increase test speed with sequential A-B testing. Fixed-sample tests are your classic Fisherian t-tests and chi-square tests; a fixed sample is needed before you can perform a valid analysis. Sequential A-B testing is a suite of methods, but it allows you to look at the data every day, or at fixed intervals, without undermining the integrity of the test. It's not just peeking at a fixed-sample test; it's a valid way to look at the data early.
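
As one illustration of the idea (not necessarily the specific method we used at Intuit), Wald's sequential probability ratio test lets you check after every observation whether to stop, with crossing thresholds chosen to keep the overall error rates near the nominal alpha and beta:

```python
import math

def sprt(stream, p0, p1, alpha=0.05, beta=0.20):
    """Wald's sequential probability ratio test for a Bernoulli rate.

    H0: the conversion rate is p0. H1: it is p1. Returns the decision
    and the sample size at which the test was able to stop.
    """
    upper = math.log((1 - beta) / alpha)   # crossing above accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing below accepts H0
    llr = 0.0                              # running log-likelihood ratio
    n = 0
    for x in stream:                       # x is 1 (converted) or 0
        n += 1
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "inconclusive", n

# A stream of all conversions stops early in favor of H1.
print(sprt([1] * 100, p0=0.5, p1=0.6))
```

Unlike repeatedly peeking at a fixed-sample t-test, the thresholds here are derived from alpha and beta, which is what makes the early looks valid.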

Simulating A-B testing scenarios

Okay, so how do you win leadership support for a solution like this? Again, by demonstrating time-and-money benefits, and in familiar business scenarios. You're going to want to avoid examples like dice games and marbles in a jar; anchor it to the kind of business context that they're already familiar with. And these are the actual slides that I used to sell our senior leadership on sequential testing as a promising way forward.

So what I did is I simulated thousands of A-B testing scenarios, and then I just observed the performance of fixed sample tests against the sequential testing methods that I was proposing. So in the first case, how much faster would our sequential tests end when no effect is present? So we're testing A versus B, but because I wrote the simulations, I know there is absolutely no real difference between these two conditions. How much faster will sequential tests end?

So on my Y-axis, I have a tally of the simulated tests. On my X-axis, I have the sample size at which those sequential tests ended. And the dotted line marks, for reference, the fixed sample size that would have been required under these testing conditions: in this case, 31,000. How quickly did the sequential tests end? On average, when no effect was present, a sequential test took just 14,000 samples to conclude. That's a 60 percent time savings over a fixed-sample test. And there's a long tail; nothing's free in life. But by the time you reach the fixed sample size, 91 percent of your tests have concluded. So I think this is a very worthwhile tradeoff.

This is just when no effect is present. Another familiar business scenario: how much faster would sequential tests end if a small improvement were present, say a 1 percent lift? Same plot again. The fixed-sample test always takes the same amount of time; it's a fixed sample. But now, with the 1 percent lift present, we're taking just 18,000 samples on average, a 40 percent savings compared to the fixed-sample test.

And finally, we could look at how much faster sequential tests end when a treatment is harmful. So you never want to do this, but this is also why we A-B test. We don't want to roll out features that hurt the customer. So if you roll out something and unintentionally it has a 1 percent harm, now you get enormous savings. That's 5,000 samples on average, and 100 percent of the tests are ending faster than the fixed sample required. That's an 85 percent time savings. That's an 85 percent reduction in the number of customers who are exposed to an unintentionally unhelpful feature. I think this slide alone may have sold leadership that this was a method worth pursuing.
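
To make that concrete, here is a rough sketch of that kind of simulation in Python. It uses Wald's SPRT on a single arm as a stand-in for the production sequential method; the rates are made up, and the fixed-sample baseline comes from the standard two-proportion formula computed separately. None of these numbers are Intuit's.

```python
import math
import random
import statistics

def stop_n(true_p, p0, p1, alpha=0.05, beta=0.20, max_n=50_000, rng=random):
    """Sample size at which one simulated SPRT run stops."""
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    llr = 0.0
    for n in range(1, max_n + 1):
        x = rng.random() < true_p          # one simulated visitor
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper or llr <= lower:
            return n
    return max_n

rng = random.Random(2023)
# Scenario 1: no effect is present (the true rate equals the null rate).
null_stops = [stop_n(0.10, 0.10, 0.11, rng=rng) for _ in range(300)]
# Scenario 2: a real one-percentage-point lift is present.
lift_stops = [stop_n(0.11, 0.10, 0.11, rng=rng) for _ in range(300)]

# Fixed-sample baseline for the same rates, alpha, and power,
# from the standard two-proportion formula (computed separately).
fixed_n = 14_749
print("median stop, no effect:", statistics.median(null_stops))
print("median stop, 1pp lift:", statistics.median(lift_stops))
print("done before fixed n:", sum(s < fixed_n for s in null_stops) / len(null_stops))
```

Tallying the stopping sample sizes across scenarios like this is what produces the histograms on the slides: a distribution of sequential stopping times against a single dotted fixed-sample line.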


But again, I just want to emphasize: there are no fancy statistical terms being used. We did our homework, we did the mathematical proofs, but that's not the pitch. When you're selling it to leadership, you want to put it in terms that they already care about and understand.

So for the summary on influencing leadership: you need to sell your solution by identifying a problem worth solving, pitching a solution in terms that leaders already care about, and building trust in your data science solution through simulated data trials in very familiar business scenarios. And this section closes with this: you need to leave with a clear endorsement from leadership for adopting your solution.

Winning over your colleagues

So this brings us to step two: winning over your colleagues. Leadership has endorsed your solution, and they're now saying, great, we want people to adopt it. You don't want this to be something that's jammed down your colleagues' throats, or that is just miserable for them to implement. Where your colleagues are concerned, this is where design comes in heavily. You want to design a user-friendly tool, and then provide adequate support for adoption.

So I can proudly say our first sequential testing apps were unusable. But luckily, we had done lots of user testing with the analysts who would use them, and they told us this early on. They would say things like, I don't even know where to look, I don't know where to start, I don't know what I should do.

And so we took this feedback, we heard it, but we didn't really know how to act on it. That's where the second book recommendation in this presentation comes in: The Design of Everyday Things. Philosophically, this is very closely aligned with How to Win Friends and Influence People. Same theme again: you have your needs and your agenda, but you need to recognize that the people you're trying to influence may have different needs, treat those needs as valid, and meet them halfway.

So you don't need to read the whole book; I think the first three chapters alone provide the framework for navigating these design challenges. It's going to allow you to combine your vision of how you want people to use your app or data science project with their feedback about how they expected to use it. And you can make a wonderful halfway meeting point.

And this is what it looked like for us. We were reacting to people who said, I don't know where to look, so we didn't need to hit them with everything all at once. Now we do a scaffolded introduction to the app. The first question you get asked: where are your test data? Are they in the data lake? Are they in a CSV file?

Second, it's often going to be several months between the release of an application or project, the training associated with it, and when people actually go to use it. So just expect that they'll have forgotten 95% of everything you told them, and always include a hyperlink to the original training materials, where they can see a recording and a brief wiki. A good lesson from the book: when your wiki is getting really long, that's a clue that you need to go back and redesign the product.

So after they've provided us with their data location, we do a quick validation that the data are in the correct structure for the app to ingest. And then we ask them a few questions about their data. For instance: what is your control name? What is the primary metric you're evaluating your A-B test with? And what is your minimum detectable effect?
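
A structure validation like that could be sketched as follows; the column names here are hypothetical, not the app's actual schema:

```python
def validate_upload(rows, control_name, primary_metric):
    """Return a list of problems with uploaded A/B test rows; empty means OK."""
    problems = []
    if not rows:
        return ["no rows found"]
    required = {"recipe", primary_metric}
    missing = required - set(rows[0])
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    else:
        recipes = {r["recipe"] for r in rows}
        if control_name not in recipes:
            problems.append(f"control recipe {control_name!r} not found in data")
        if len(recipes) < 2:
            problems.append("need at least two recipes to compare")
    return problems

rows = [{"recipe": "A", "converted": 1}, {"recipe": "B", "converted": 0}]
print(validate_upload(rows, "A", "converted"))  # an empty list: data looks usable
```

Failing fast with a readable list of problems, before any analysis runs, is what keeps users from hitting a confusing error deep inside the app.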

After we gather this from them, we run the analysis. In the upper right, in bold green, you get a clear call to action: our recommendation is that you stop the experiment for revenue, because your power has exceeded 80%, which is a widely used power threshold. And in the lower right, you can see the decision. In this case, recipe B is offering a statistically significant improvement in revenue, and the team can now move on from their test.

Support and adoption

So, the summary for winning over your colleagues: you need to promote your colleagues' adoption of your solution by designing a user-friendly interface and ensuring adequate support for adoption. To do this, we used user-friendly design. But the second step was really important: we chose to do team-by-team interactive trainings.

Most data science projects I see rolled out, they may have one or two mass trainings with 300 or 500 people present. That's not an environment where people are going to ask vulnerable questions or try to reconcile their needs with what you're offering them. Team-by-team interactive trainings, these are like 10 people or less. People get very comfortable asking those dumb, vulnerable questions that actually ensure that they're going to feel confident using the tool.

The second key piece is that it's interactive. If you're merely talked at for an hour on how to use an application, there's very low likelihood that you're going to retain or feel confident applying any of it. We always use a show-then-do rhythm: we would demonstrate a capability of the app, and then we would have them, for instance, change the primary metric and perform the same steps themselves. This makes the training a lot stickier.

Finally, we offered ongoing support in a Slack channel, sequential testing support. And I want to emphasize that because we had taken those first two steps, a user-friendly design and adequate training, we were not overwhelmed with minutiae requests in the Slack channel. A lot of the questions we were getting there were actually really provocative or really exemplary requests that other people could benefit from seeing the conversation around.

The tone in which you respond in the Slack channel is also going to set a lot of expectations around whether people are irritating you with their help requests or whether this is a collaborative effort that they're participating in. And I think, if you take these three steps, it is a lot more work than just doing a rapid rollout and mass training. But this is what it takes to see widespread adoption.

Implement, track, and report

Okay. So step three, implement, track, and report. So if leadership has endorsed your solution and your colleagues feel empowered to use it, you're going to get widespread adoption. You need to be thinking ahead and have a measurement plan in place beforehand for how you're going to track adoption rate and estimate the benefits of your project.

So we reached 100% adoption, but now we needed to go back to leadership and tell them, well, how do we fare against this goal to reduce test duration by 25%? And importantly, what was that worth?

So the way that we did this was that every sequential test under the hood was computing a fixed sample power analysis. So we weren't just going to rely on simulations anymore. We would know for every single test how long it actually took and how long it would have taken if they had run the same analysis in a fixed sample test. And what we found, on average, was right in line with our simulations. We had an average savings of 63%, and you can see in the negative territory that some tests did run longer. But on average, this was an enormous time savings, well exceeding our 25% goal.
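
As a sketch of what such a baseline computation can look like (the production analysis may well differ), here is the standard normal-approximation sample size formula for a two-sided two-proportion test:

```python
import math
from statistics import NormalDist

def fixed_sample_size(p0, delta, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided two-proportion z-test,
    detecting an absolute lift of `delta` over baseline rate `p0`."""
    p1 = p0 + delta
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)            # critical value for power
    variance = p0 * (1 - p0) + p1 * (1 - p1)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

print(fixed_sample_size(0.10, 0.01))  # roughly 15,000 per arm
```

Logging this number alongside each sequential test's actual stopping point is what makes the per-test savings comparison possible after the fact.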

But when tests end 63% faster, there's more time in the season for customers to enjoy the best experiences. So what we then did was calculate the incremental revenue gain from the faster rollout on every sequential test. In this plot, the sequential tests are on the Y axis. The full length of each bar is the revenue associated with that test's rollout, and the blue section is how much revenue there would have been under the delayed rollout of a fixed-sample test. The green section is the incremental gain from rolling out sooner.

So now we could sum up the green incremental revenue gain from sequential testing across all of our tests, and we could report back that not only did we save 63% time-wise, but that this method had increased revenue by 38% relative to the methods that we replaced.
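
With hypothetical numbers (the field names and revenues here are made up), that roll-up might look like this:

```python
def incremental_gain(tests):
    """Each test records revenue realized under the faster sequential rollout
    and the counterfactual revenue under a delayed fixed-sample rollout."""
    actual = sum(t["sequential_revenue"] for t in tests)
    baseline = sum(t["fixed_revenue"] for t in tests)
    return actual - baseline, (actual - baseline) / baseline

tests = [
    {"sequential_revenue": 140.0, "fixed_revenue": 100.0},
    {"sequential_revenue": 130.0, "fixed_revenue": 100.0},
]
gain, relative = incremental_gain(tests)
print(gain, f"{relative:.0%}")  # 70.0 35%
```

The relative figure is computed against the replaced method's revenue, which is the same framing as the 38% reported in the talk.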

Closing summary

So in my closing summary: you can be among that 20% of data science projects that go to production if you influence your leaders with accessible terminology, if you win over your colleagues with accessible design, and if you measure and report out regularly on your progress. And there's one other thing: don't go it alone. There are no 10x engineers. There are no 10x data scientists. But there absolutely are 10x partnerships. So if you want to run 10 times as far and 10 times as fast, find a great running partner, find a great coach, and then form a great running club.


Thank you all for attending today.

So you did a really lovely job of walking us through that entire experience. If you did that again, what would you do differently?

Before you launch, make fake data representing everything that you're expecting to come into your pipeline. This is for the tracking piece, that third piece. And then just make sure that all of the fake data is making it perfectly into the lake, exactly as you expected. This is so much easier. I would 100% delay release by a week or two to be 100% confident that all of the tracking is exactly as I expected. As you can guess, we didn't do that. And I can tell you that that one or two weeks easily cost us two to three months in trying to recover from just violated assumptions. Not statistical assumptions, just assumptions on how you thought the data was going to move. So yeah, run fake data through the pipelines before any release. It's like an ironclad rule now on future releases that absolutely nothing gets released until we're confident that the logging is in place.
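
A minimal sketch of that fake-data round-trip check, with hypothetical field names and the real pipeline replaced by a pass-through:

```python
import random

def make_fake_events(n, seed=0):
    """Generate synthetic events shaped like the real tracking payload
    (field names here are hypothetical)."""
    rng = random.Random(seed)
    return [
        {"user_id": i, "recipe": rng.choice(["A", "B"]),
         "converted": int(rng.random() < 0.10)}
        for i in range(n)
    ]

def check_landed(sent, landed):
    """Verify every event sent through the pipeline arrived intact."""
    landed_by_id = {e["user_id"]: e for e in landed}
    lost = [e["user_id"] for e in sent if e["user_id"] not in landed_by_id]
    mangled = [e["user_id"] for e in sent
               if e["user_id"] in landed_by_id and landed_by_id[e["user_id"]] != e]
    return {"lost": lost, "mangled": mangled}

events = make_fake_events(1000)
# Stand-in for the real pipeline: here it just passes events through.
report = check_landed(events, list(events))
print(report["lost"], report["mangled"])  # both empty: nothing lost or altered
```

In practice the second argument would be whatever actually landed in the lake; any IDs in `lost` or `mangled` are exactly the violated assumptions the answer warns about.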