
Failure (and Mistakes) (Laura Gast, USO) | posit::conf(2025)
Failure (and Mistakes) Speaker(s): Laura Gast Abstract: In a field driven by precision, the power of failure is often overlooked. This talk digs into the paradoxical benefit of error in data science, drawing on high-profile missteps in data handling and personal anecdotes of falling short. Using examples ranging from errors big and small, with impacts big and small, to the everyday misinterpretation or misuse of data that happens everywhere, we'll focus on how to get the best out of failure. While some level of error is inevitable in data science, the most resilient and forward-thinking teams realize that errors can drive innovative and creative solutions that may not have been discovered if everything had gone as planned.
Transcript
This transcript was generated automatically and may contain errors.
Magic. Okay. So, we've heard three great talks, and all week we've heard great talks about people failing their way into success, right? And so, I want to talk about failure itself, not just about how we fail to success. So, in my now almost 20-year career, I have failed a lot more than I have succeeded. Like, a lot more than I have succeeded. And this happens for all of us when we're tackling these really interesting or new or complex or stubborn problems, right?
And I've learned, as I've moved from the more hands-on individual contributor role into work leading on data governance in some particularly messy situations, that I've had to think about not just the failures, but how we fail in these situations. And as I've worked through this, I've identified kind of core tenets of failure. And we'll go through those really quickly. I think, first, it's safe to say failure is inevitable, right? And I don't mean this fatalistically or pessimistically. I mean, like, cool. Failure is inevitable. Run with it, right? We're not building easy things. We're not tackling easy problems. We're taking big swings. And so, failures are going to happen. And the second principle is that failure is helpful. Failure is how we learn. The best discoveries are born out of failing, right? The reason we have sticky notes is because someone failed miserably at making a permanent glue. The reason we have artificial sweeteners is because someone failed so badly at making anti-ulcer medications and they didn't wash their hands before having lunch, right? Big failures. Cool things that came out of it.
And I'm going to add a third one here, and that is that failure is a bias. What do I mean by that? I mean that the way you see failure is shaped by your experience, your role, your tools, your teams, and your culture. So, by its nature, it is distorting, not just framing.
The healthcare.gov case study
So, I'm going to do an example on this of a big public failure, and that will be the healthcare.gov website. Some of you may remember it. For those that either don't remember it or were in public health and have blacked it out of your memories, healthcare.gov was a federal insurance marketplace launched in 2013. Its main aim was to be a one-stop shop for Americans looking to compare, review, and purchase individual health insurance under the Affordable Care Act. Instead, the rollout was what could only be described as catastrophic. To give you an idea, on day one, 4.7 million unique users logged on. Only 12 people were able to click the link that said "I want to start an application." Twelve out of 4.7 million.
So, if you look into the ad nauseam levels of review of why this happened, you start to see that each person involved, or each person not involved but looking at this, had a different perspective, a different frame, and a different blame. The engineers see failures in trying to tackle the large, complex, technically difficult systems needed to build this site. The project managers see missed deadlines resulting from really compressed timelines and a lot of uncertainty. The GAO and the finance departments see blown budgets, the results of high-risk contracts. The senior leaders see failures of communication and coordination resulting from the lack of central leadership, and users felt frustrated, let down, annoyed, or outright scammed, right? So, this is the same failure, but with completely different lenses.
So, not every failure is going to come with wall-to-wall press coverage and a dozen congressional hearings and get written into textbooks for us all to study later. Most failures just fade away. So, we have to think about which failures become the stories that we tell, how those stories shape what failure means to us and to our community, and then what patterns those stories help us find.
A taxonomy of failure
So, to answer these questions: people have been trying to understand and categorize failure for as long as they've been trying to avoid failure, and there are a lot of different approaches. Some groups organize by mechanism: how did it fail? Was it technical? Was it a user error? Was it a decision failure? Or by effect: was it systemic, local, contained, catastrophic? Or by what we learned from the failure: you may have heard of intelligent failures, black swan failures, or complexity-induced failures. These are all incredibly useful ways of mapping a failure in any given arena, in any space, and I've used these and other models as I've mapped failures before, and some may be stickier for you in different places. But I've found one that works for me at kind of a 30,000-foot level to categorize all the failures, and that's what I'm calling a taxonomy of failure. I've broken it down into three flavors, as it were, three buckets: structural, symbolic, and ambient failure.
So, structural failure, what do I mean by that? I mean catastrophic brittleness that shows up only under stress. You may know the XKCD comic, this one, where all of modern digital infrastructure is resting precariously on a dependency, a project some random person in Nebraska has been thanklessly maintaining since 2003. That's the idea here, but I'm going to give a case study that some of you in this room might have been involved in, which was the 2022 Southwest Airlines holiday meltdown. To explain why this was a specific type of structural failure, I first have to explain how flagship and budget airlines operate.
So, the flagships, like United, Delta, or American, the big airlines, operate what's called a hub-and-spoke model. They centralize at their big hubs, O'Hare is a United hub, Atlanta is a Delta hub, and they keep extra staff and extra planes there so that they can be really guaranteed to get all their flights out on time. Budget airlines, like Southwest, JetBlue, and Spirit, operate what's called a point-to-point model. They decentralize their crews and assign them to something like a tour, and this enables them, A, to not rent out massive amounts of space at the biggest airports (O'Hare, for example, is expensive to rent space at), but also to serve more smaller-market locations by getting more planes there. So, there are pros to this approach, as well as cons.
So, what happened in 2022 was that, primarily, the Denver airport went down because of ice. Basically, for the whole day, Denver is out of commission. There were a couple of other small outages, but none of the major hubs in the United States went down. So, the flagships were mostly fine. They continued the routes to airports that were open, and as a weird kind of bonus, they had some stranded staff at their hubs, at O'Hare, an extra plane or two, so they could get some more people moving. So, they did pretty fine. But in the point-to-point model, one canceled flight propagates the delay through the network, because each leg is dependent on the aircraft and crew available from previous legs of the tour, right? And because this model decentralizes staff, crew, and materials, airports don't have standby crews available to cover those next points.
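That propagation logic is easy to see in a toy model. This is purely an illustrative sketch, not any airline's real scheduler; the slack and delay numbers are invented for the example.

```python
# Toy comparison (illustrative only): how one delayed leg plays out in a
# point-to-point "tour" versus a hub-and-spoke schedule with standby slack.

def propagate_point_to_point(leg_delays, turnaround_slack):
    """Each leg inherits the previous leg's leftover delay, because the
    same aircraft and crew fly the whole tour; the slack at each stop
    can absorb only part of it."""
    carried = 0
    realized = []
    for delay in leg_delays:
        carried = max(0, carried - turnaround_slack) + delay
        realized.append(carried)
    return realized

def propagate_hub_and_spoke(leg_delays, standby_buffer):
    """At the hub, spare aircraft and crews reset the chain: any delay
    up to the standby buffer is absorbed before the next departure."""
    return [max(0, d - standby_buffer) for d in leg_delays]

# One 90-minute weather delay on the first leg, no further disruptions.
delays = [90, 0, 0, 0]
print(propagate_point_to_point(delays, turnaround_slack=20))
# the delay cascades down the tour: [90, 70, 50, 30]
print(propagate_hub_and_spoke(delays, standby_buffer=60))
# mostly absorbed at the hub: [30, 0, 0, 0]
```

The design point is that point-to-point trades standby slack for coverage and cost, which is fine on a normal day and brittle on a bad one.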
So, this should have been recoverable. Notice that the other point-to-point carriers, Spirit and JetBlue, had no real problems. They came back online kind of slowly with everybody else, but Southwest melted down, just straight-up melted down. And that's because Southwest was operating with really outdated crew-scheduling software, which meant they really just had no idea where their pilots were, where their flight attendants were. Flight attendants were going to the app on their phones trying to check, okay, I'm stuck in Cleveland, now what do you want me to do? And the app wouldn't load. They'd call the help desk and be on hold for over 24 hours waiting to figure out what city to go to. They would wake up in the morning and see, oh, it's 11 a.m., and I'm assigned to the 3 p.m. to Cleveland. They would go to the airport, get there at 1, and be told at the airport, oh, that Cleveland flight got canceled yesterday.
Right? So, that breakdown resulted in about 70% of Southwest's flights being canceled over a week. Reminder: even the other budget airlines came back up pretty quickly, about in line with the flagships. It ended up being about $800 million in losses in quarter four alone for Southwest. And then in 2023, when the DOT investigated and specified that this was a business failure, not weather, they had to pay out another $740 million in fines, reimbursements, refunds, et cetera. So, that meltdown was catastrophic because of brittleness, right? But the brittleness was already there. It was just an accepted failure before. It was known. We knew this was broken, but it was acceptable. Just the year before, in April and October of 2021, Southwest was called out for having particularly high cancellation rates. And if you read into these, it's actually quite funny, because you'll find notes at the bottom of these articles saying Southwest canceled 10% of flights, but other airlines only had to cancel 2%. Very interesting. Even other budget airlines. So, their framing of accepted failure, tolerating this broken software system, led them to a situation where the system can bend, the system can bend, the system can bend, and then the system broke. So, that's structural failure: brittleness that hides until the system collapses.
Symbolic failure
But not every failure is going to bring your system down. Some are going to fail loudly, in public view, and the damage is reputational more than operational. I call that symbolic failure: loud, public, reputational damage. And for that, we're going to talk about the metaverse. Depending on your perspective, the metaverse is either a bold vision for the future of the Internet or the best tech joke of the last decade. Meta poured tens of billions of dollars into this project, and what they got out of it were headlines and endless social media posts about legless avatars and boring experiences. It was parodied endlessly. The system didn't break. There is still a good product underneath it all. It just failed publicly.
So, symbolic failure is not just about being laughed at. It's what that ridicule does. With the metaverse, leaders assumed that because they could envision this grand future, everyone else would see it, too. And hey, our reputation is technically solid. We have widely adopted, popular social media products. That biased them into believing that that stability and that popularity meant they couldn't really fail with this new thing, even though it's a bigger, different thing. As long as the system doesn't collapse, we're going to be a success. Our vision can't fail. Unfortunately, the next time someone tries something like this, the next big swing in this area, they're going to have to fight the ghosts of the metaverse's failure. We're all going to be biased against the idea because of what happened with the metaverse this specific time.
So, that's the symbolic failure, right? It's not about crashing planes or, you know, servers breaking or what have you. It's losing that credibility, and once the credibility erodes, it's really hard to rebuild.
Ambient failure
So, that's the spectacle of failure, and not every failure is that way. Some failures are creepy, insidious, and invisible. They don't break all at once. They erode over time, until suddenly one day you just go, wait, what? This isn't right. That's an ambient failure. And for this, I want to talk about Google Flu Trends. It launched in 2008, and if you worked in public health at the time, it seemed like magic. They were able to predict flu outbreaks weeks before the CDC saw them coming with its traditional clinical surveillance models, and they did this by tracking search terms for things like flu symptoms, or pharmacies near me, or how do I treat a fever, or ordering tissues, what have you. It looked like the future. If you were in public health at this time, you were like, oh, this is great. We're going to track the social movement of something before it shows up in our hospitals. It was a bold vision, and it was a sea change at the time.
But unfortunately, what happened is that it started to drift. The system started to become less reliable and seem weird, and because it had been so successful at the beginning, everyone said, well, it has to be true. What are we missing on the clinical side, then? That became the pivot: this can't be wrong, so that must be wrong. And so it was off by up to 140%, more than double the expected cases in a week compared to CDC surveillance. And the reason was that the definitions underneath were changing. It used those Google search trends, right? But the media started to talk about the flu more. We had big conversations about outbreaks in other places, and people were looking up those articles. The algorithms Google itself used started changing. And so it was no longer measuring people looking for care. It was measuring people paying attention to and hyping up the flu. We were no longer looking at what we thought we were.
The Google Flu Trends team didn't change anything. (Not "we," I was not a member.) But underneath, the definitions changed without anyone understanding. So there was no day when it went bad. It just slowly got worse. People stopped using it, they all moved away from it, and then one day Google just turned it off, in 2015. And this is what makes ambient failures insidious: they rot invisibly. They are hidden from you unless you're really paying attention, and by the time you notice, the damage is embedded in your system. You are already making bad decisions on it. You have already misplaced your confidence. You have already lost opportunities, because it rotted from underneath. The bias here was that overconfidence from those first years, when it looked so good: as it started to drift, like I said, it can't be Flu Trends, it's got to be our clinical surveillance that is incorrect. We right the ship later, but few people thought to ask, as soon as that ship started turning, is the data still measuring what we think the data is measuring? Few people asked that. So that's what I call an ambient failure. It's not loud, it's not spectacular, but it is corrosive, and sometimes this can be worse than a sudden collapse, because it fools you into thinking everything is okay, and you're making decisions on that okay, but it's not.
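The drift mechanism can be sketched in a few lines. This is purely illustrative: the searches-per-case ratio and the media-driven volumes are invented numbers, not anything from the actual Google Flu Trends model. The point is that a proxy calibrated once keeps being read the same way after its meaning changes underneath.

```python
# Toy sketch of ambient failure via proxy drift (invented numbers, not
# the real Google Flu Trends model): a search-volume proxy calibrated
# when searches tracked care-seeking is still read the same way after
# media attention inflates search volume.

def flu_estimate(search_volume, searches_per_case=3.0):
    """Calibrated once, early on: assume every ~3 symptom searches
    correspond to 1 real case."""
    return search_volume / searches_per_case

true_cases = 1000

# Early years: searches really do come from sick people.
early_searches = true_cases * 3.0
early = flu_estimate(early_searches)   # 1000.0, matches reality

# Later: media coverage adds curiosity-driven searches on top.
media_driven_searches = 4200
late_searches = true_cases * 3.0 + media_driven_searches
late = flu_estimate(late_searches)     # 2400.0, far above reality

overshoot = (late - true_cases) / true_cases
print(early, late, f"{overshoot:.0%} over")  # 1000.0 2400.0 140% over
```

The fix is not better math on the same proxy; it is periodically re-asking whether search volume still measures care-seeking at all.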
How the three failure types interconnect
So these types of failure, my three types of failure, don't live in isolation. They are interconnected, and there is a trap: when you overprotect against one, you create exposure to another. Southwest optimized for efficiency and allowed for some forgivable failures, and then the system collapsed. Meta, protecting against structural failure, took a big swing with a public promise and suffered a really bad failure in the symbolic world. And Google Flu Trends was the reverse: the system didn't snap overnight. It slowly drifted, and the proxies became incorrect. So in each case, bias played a role. The leaders never ignored the risk. They were just picking the risks they thought were going to lead to the worst outcomes, given what they knew. And that left them exposed.
So that brings us to today. We're entering a world with AI making risk decisions for us, and it is already doing that. And it's not just reflecting our personal biases, it is compounding them, right? It's automating them, it's scaling them, and do we have any input into that? Do we know all of the places where AI is making those decisions for us? Do you know where your AI is? You keep talking things out with your LLM, you're querying your results, but you're querying your results based on your history of failure and the things that you are worried about most.
So AI is going to start to break things in all three of my categories, in new ways, and in ways that we have not yet imagined. Such is the history of failure. We have imagined failure in many different ways, all the way from moral failing, to normal accidents, to high-efficiency systems where nothing can fail, to move fast and break things, and into wherever we're going now. And it's really important to understand that the challenge isn't whether failure is going to happen; it's whether we're going to recognize which failures we're inheriting, which ones we're creating, and which ones we're ignoring or don't see coming. So I don't want you to walk out of here thinking that failure is something to fear. We heard a great keynote this morning saying, don't be afraid of failure. Running headlong into failure is actually a good thing, because that's how you learn. But pay attention to your own historical bias, your organizational bias, and your societal bias. That doesn't mean eliminating the bias; it means paying attention to it and getting better at learning from it. And so I'm going to leave you with a Ben Franklin quote that I really like: perhaps the history of the errors of mankind, all things considered, is more valuable and interesting than that of their discoveries. Truth is uniform and narrow, but error is endlessly diversified. And that's what makes it really cool. So thank you.
Q&A
Thank you so much. We have a couple of questions here. The first one, in the year 2000, Blockbuster both passed on a $50 million purchase of Netflix and launched a failed video on demand partnership with Enron. Do either of these business failures fall into your taxonomy?
Yes. Well, I'm going to talk about the Blockbuster one, because we're going to go that way. That was a big structural failure, because there was a big swing that brought something down. It incorporates the others too; like I said, they're all interconnected. That's why there are so many taxonomies of failure: it's impossible to put them all cleanly in one bucket. But in taking that big swing, they didn't comprehend where the brittle pieces were. Their new venture into on-demand was that brittle piece, right? And that brought down the company. Had their move been just one part of the company, you know, like the metaverse, we're taking a big swing, but we're not putting all our eggs in one basket, people would lose their jobs and it would not be a good thing, but it wouldn't bring down the system. So I put that in structural.
What is the biggest failure you've seen related to LLMs so far, if you have seen any? Well, the first one that comes to mind is, what's the word, sycophancy: people using ChatGPT as a therapist, and the model just saying whatever keeps things positive, with bad results. There was an unintended consequence there, in that people used the thing in a way that was never really predicted, and it ended up particularly horrible. That's the biggest one I can identify right now. And then there are all the data leakages that we're all prone to, which horrify me and apparently lots of others. Right. Thank you so much.
