Angela Bassa | Data science as a team sport | RStudio (2019)

Transcript

This transcript was generated automatically and may contain errors.

So, when we're starting out in data science within an organization, especially a larger organization, there's usually a first one. And the first one is doing all of the things. And just because a data scientist can do all of the things doesn't necessarily mean that a data scientist should do all of the things. But it's fine. It's fine because it's where a lot of serendipity comes from. It's where a lot of innovation comes from.

So there's nothing wrong with being the first one, with being the only one, with being the only one for years. But, A, that only works as long as the product your company makes money from is not data-scientific, because otherwise there really should be a whole team doing this. And B, as more information is acquired and more analyses are performed, there's greater knowledge that can be applied, so it makes sense to start expanding that team. What I'm going to talk about is how you do that.

The thing that happens a lot when you have a single person doing all the things is that you start accruing not just tech debt, but foundational tech debt. You end up with rickety, duct-taped systems that suddenly need resilience tacked on top, rather than thought through from the start. And so one of the things that tends to happen in an organization when you reach this point is that you have to take things offline in order to start fresh, or you have to refactor things while they're burning, to steal the animated GIF that JD showed earlier of the weightlifter in the middle of the fire.

And that's very true. What you don't want is to reach the point where you have to take your entire system offline for, God forbid, days, or even hours, so that you can fix all of that duct taping. The data scientist's role really should be analyzing information, communicating that information, iterating on it, and coming up with novel solutions to problems. Instead, what they end up doing is building structures. A lot of the time, data science and data engineering become synonymous when they really aren't. They're different skill sets that a team needs to have.

When to grow the team

And so when do you grow? If you're the only data scientist on a team, when do you tell the people with hiring budgets that you need help, and how do you convince them? There are three telltale signs. The first is scope: there's too much to do for one area of expertise or for one pair of hands. The second is maturity: you've reached a point where it makes sense to introduce additional expertise into the team. And the third is speed. And here's the kicker: the last one doesn't happen.

Every time you add people to an organization, you add complexity. And when you add complexity, you are improving your ability to handle an expanded scope and your ability to mature as an organization, but you are, by definition, slowing down. So unless you are also increasing the quality of your underlying systems, just adding humans is not going to solve your problem. As you think about expanding a team and working together as a team, you have to realize that, as hard as it is to use sophisticated analytical tools to reach conclusions, getting humans to work together is at the very least no easier.

So how do you do it? How do you get to a point where you're not being penalized velocity-wise by having an expanded team? You do three things. You add specialization, you add process, and you add resilience. So what do I mean by that? By adding specialization, I mean that you want to have different people doing different things and doing the things that spark joy.

So you want data scientists sciencing on data. You want data analysts building analytical pipelines that are going to be monitoring your systems. You want data engineers to do the data engineering and to lay down the foundations that you're going to be using in your analysis. You want to increase the presence of operations talent, which is undervalued as far as I'm concerned, but incredibly important for smart organizations. And you want to have somebody who really truly understands the product, the financials. What is the pain point of the customer? What is the journey that that customer is going through?

You should not protect your teams from failure. You should prepare them for it.

And you know what? Bits flip. I would not have believed it unless it had happened to me personally, but things go haywire when there's no other reason for them to go haywire. Which brings me to... should you only hire experts? No. No, no, no, no. Have tons of interns. Have tons of people who have never done this before, for whom this is their first foray into data science. Because the luxury of ignorance is that these extremely junior folks don't know what they're supposed to know. They don't have any of the subliminal assumptions that are baked into the way we think, because it's the way we've always thought. The world moves, but our assumptions tend to be sedimented. And so as you bring in new talent, they know to question things that you've forgotten to question.

The other thing I would recommend, if you haven't already, is reading about Chaos Monkey at Netflix, where parts of the system are turned off on purpose just to make sure that things can still function when part of the system goes down. This is true for people, too. Maternity leave, paternity leave, vacation: your business should continue to function if somebody needs to go to a conference. So you should allow it. You should foster that kind of mentality, and you should use it as a test of the resilience of your own organization.

Where should the team live?

So I'll close with a couple of thoughts. Where should this new team live? If you're a single data scientist, you're probably embedded in some part of the organization, but where should the team live? I've been doing this for many, many years. I'm not going to tell you how many years; I have really great skin care. But it doesn't matter. It really doesn't matter. And I'm happy to get into a Twitter conversation about this with anybody who disagrees with me, because I have worked in data science within finance, financial operations, IT, engineering, R&D, and software. If you're doing good work, solving problems, and having people address those problems in new, innovative, creative, and competent ways, this is the least of your problems. And yet this is usually where everybody gets hung up.

Because when you're going from one to many, what matters isn't the nodes. What matters is the edges: how people, humans, communicate and interact. That's where a lot of the knowledge you're leveraging in your organization comes from. But the thing to remember is that teams don't scale. Once you reach a critical mass where you can't maintain that trust anymore, because those edges are frayed and stretched, it's time for a new team. And at that point, you add specialization, you add process, and you add resilience again.
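One way to see why the edges, rather than the nodes, dominate: if every pair of people on a team needs a working relationship, the number of pairwise channels is n(n-1)/2 for a team of n. The sketch below assumes a fully connected team, which is a simplification and not a model from the talk; `pairwise_channels` is a hypothetical helper written only for illustration.

```python
def pairwise_channels(team_size: int) -> int:
    """Distinct person-to-person links in a fully connected team (assumption: everyone talks to everyone)."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    # Each of the n people pairs with n - 1 others; divide by 2 so each pair is counted once.
    return team_size * (team_size - 1) // 2


if __name__ == "__main__":
    for n in (1, 3, 5, 10, 13):
        print(f"{n:>2} people -> {pairwise_channels(n):>2} channels")
```

Under this assumption, going from 5 to 10 people doubles the nodes but takes the edges from 10 to 45, which is one way to read "teams don't scale."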

If you've seen me talk before, you know that I give the caveat about my voice breaking every time. But you've also seen me talk about dynasties and intellectual inbreeding, and there's a reason why the royal families of 15th-century Europe looked the way they did. I've also talked about meritocracies and survivorship bias: just because you made it doesn't mean that's the only way to make it. And lastly, I don't know if anybody here has heard about super chickens, but if you haven't, come find me after, because it's a really cool story.

In essence, when you put together a whole bunch of chickens that are amazing at what they do, they kill each other. They don't make each other better. Because excellence isn't binary; excellence varies. And your job as you build out teams is to foster an environment where people can be excellent, but where they have the psychological safety to not have to be excellent. And they'll surprise you. And so with that, I know I'm the thing between you and lunch. So thank you very much, and if you have any questions, I'm happy to answer them.

Q&A

Now we have time for a couple of quick questions for Angela, if you have any. We still have the mics around.

Oh, a couple of questions. Hi. Thanks for the presentation. It was great. Thank you. So what does your team look like, and how long did it take you to assemble it to where it is today?

So for confidentiality reasons, I can't disclose too much about specifically how iRobot is organized and does its business. But I can talk in generalities, and you're smart enough to read between the lines. I think the most successful teams I have managed and built tend to have about five to ten data scientists of all levels, junior to senior, and about three data engineers to start. That's a really good size for a $2 to $5 billion market cap company, for context.

Hi. If you are currently the only data scientist in an organization, and you are doing all of the things, how do you make the business case for bringing in more people, beyond the budget discussion of what hiring costs? How do you make that case without simply dropping one of the pieces and saying you need someone else to do it now?

The best way I have found to make that case is to highlight the opportunity cost of not doing it. If you're only saying how much it'll cost and how good it'll be, that's one avenue, but it only gets you half of the way there. The important thing to highlight is: by not doing this, this is the revenue we're forgoing, because that number is going to be much bigger. Here are the insights we are failing to see. Here's where burnout is taking us. Here is how the scope of the responsibility has gotten so big that we're only doing the minimum viable solution, and so those systems are not resilient. Here is the likelihood, the risk-adjusted value of that decision: if a catastrophic event were to happen, it would take the systems down for X many weeks, and that's the cost. So highlighting the cost of not expanding, when it's evident that an expansion is warranted, has been a more fruitful conversation for me and has led to better results, for Angela's definition of better, than highlighting what an extra FTE would cost.
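The opportunity-cost argument above can be sketched as a small expected-value calculation: the risk-adjusted cost of an outage (probability times impact) plus forgone revenue, compared against the cost of new hires. This is a minimal illustration under stated assumptions, not the speaker's method; `NoHireScenario`, `case_for_hiring`, and every number in the example are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class NoHireScenario:
    """Hypothetical inputs for the cost of *not* expanding the team (all figures are placeholders)."""
    annual_outage_probability: float  # assumed chance per year of a catastrophic failure
    outage_weeks: float               # weeks the systems would be down if it happens
    cost_per_outage_week: float       # assumed cost of one week of downtime
    forgone_annual_revenue: float     # assumed value of insights/projects not pursued

    def expected_annual_cost(self) -> float:
        # Risk-adjusted outage cost: probability x duration x weekly cost.
        risk_adjusted_outage = (
            self.annual_outage_probability * self.outage_weeks * self.cost_per_outage_week
        )
        return risk_adjusted_outage + self.forgone_annual_revenue


def case_for_hiring(scenario: NoHireScenario, annual_cost_per_hire: float, hires: int) -> float:
    """Positive result: not hiring is expected to cost more than hiring (under these assumptions)."""
    return scenario.expected_annual_cost() - annual_cost_per_hire * hires


if __name__ == "__main__":
    # Placeholder numbers, chosen only to show the arithmetic.
    scenario = NoHireScenario(
        annual_outage_probability=0.25,
        outage_weeks=4,
        cost_per_outage_week=100_000,
        forgone_annual_revenue=250_000,
    )
    print(f"Expected annual cost of not hiring: {scenario.expected_annual_cost():,.0f}")
    print(f"Net case for two hires: {case_for_hiring(scenario, 150_000, 2):,.0f}")
```

With these placeholder inputs, the risk-adjusted outage cost is 0.25 × 4 × 100,000 = 100,000, so not hiring is expected to cost 350,000 a year against 300,000 for two hires. The point of framing it this way is that the comparison is between two costs, not between a cost and nothing.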

I think we have one more quick one here. How much pushback do you get on these ideas, and who would freak out the most?

That's a really interesting question, and there are two ways to answer it. I am privileged in that I get to choose to work for people who agree with my management philosophy. So how much pushback do I get? I don't get pushback, because I filter out employers who would have given me pushback; I'm very transparent in the interviewing process so that I don't get myself into that situation. Not everybody has that privilege, and there is substantial pushback, especially in organizations that look homogeneous, and not everybody can choose not to work there. But I have also found that making the argument monetary reduces the amount of friction. So, and I'm not saying this happens at iRobot, because I filtered that out, but were I to say: we are building robots for software developers, not for homemakers, for people who want to maintain a home, and so we're making decisions about how these robots operate that don't match the pain points of the people who would be paying us for them. Again, this is not true, but that's the kind of argument that can help open the eyes of people who see it monetarily, as forgoing revenue.
