Resources

Angela Bassa | Data science as a team sport | RStudio (2019)

How do you data science as a team sport? Oftentimes a data scientific initiative starts with just a single, lonesome data scientist. But when that germ of a team is successful and starts expanding, should the team be embedded in other disciplines or should it be centralized into its own function? Where should it live in the organizational structure? Should you focus on recruiting senior data scientists or is there a benefit to attracting junior talent as well? And in terms of capabilities, should you hold out for unicorns or hire several specialists to get all jobs done? Data scientists need to work on almost every aspect of a business, so how should a team composition set the data science discipline up for success? Great data scientists have career options and won’t abide bad managers for very long: if you want to retain them, you’ll need to care about their work, connect it to the business, and design a diverse, resilient, high-performing team. Materials: https://github.com/angelabassa/rstudioconf-2019

Transcript

This transcript was generated automatically and may contain errors.

So, when we are starting out in data science within an organization, within a larger organization, usually there's the first one. And the first one is doing all of the things. And just because a data scientist can do all of the things doesn't necessarily mean that a data scientist should do all of the things. But it's fine. It's fine because it's where a lot of serendipity comes from. It's a lot of where innovation comes from.

So there's nothing wrong with being the first one, with being the only one, with being the only one for years. But A, that only works as long as the product that your company is making money from is not data scientific, because otherwise there really should be a whole team doing this. But it's also because as more information is acquired, as more analyses are performed, there's greater knowledge that can be applied. And so it makes sense to start expanding on that team. And so what I'm going to talk about is how do you do that.

The thing that happens a lot when you have a single person doing all the things is you start having not just tech debt, but you start having foundational tech debt. You start having rickety, duct-taped systems put together that suddenly start having to have resilience tacked on top of, rather than thought through initially. And so one of the things that tends to take place in an organization when you reach this point is that you have to take things offline in order to start fresh, or you have to refactor things while they're burning to steal the animated GIF that JD had earlier with the weight lifter in the middle of the fire.

And that's very true. What you don't want to do is reach that point where you have to take your entire system offline for, God forbid, days, but even hours, so that you can fix all of the duct taping that has taken place. Because the data scientist, whose role really should be analyzing information, communicating that information, iterating on that information, and coming up with novel solutions to problems, is instead building structures. And a lot of the time data science and data engineering become synonymous when they really aren't. They're different skill sets that a team needs to have.

When to grow the team

And so when do you grow? If you're the only data scientist in a team, when do you tell other people with hiring budgets that you need help, and how do you convince them? There's three things that are telltale signs. The first one is scope, when there's too much to do for one area of expertise or for one pair of hands. When you've reached a level of maturity that it makes sense to introduce additional expertise into the team, and also to increase speed. And here's the kicker. The last one doesn't happen.

Every time you add people to an organization, you add complexity. And when you add complexity, what you're actually doing is you are improving your ability to handle an expanded scope, you are improving your ability to mature as an organization, and you are by definition slowing down. So unless you are increasing the quality of your underlying systems just by adding humans, you're actually not going to solve your problem. So as you're thinking about expanding a team and working together as a team, you have to realize that as hard as it is to use sophisticated data analytical tools to reach conclusions, getting humans to work together is at the very least no easier.

So how do you do it? How do you get to a point where you're not being penalized velocity-wise by having an expanded team? You do three things. You add specialization, you add process, and you add resilience. So what do I mean by that? By adding specialization, I mean that you want to have different people doing different things and doing the things that spark joy.

So you want data scientists sciencing on data. You want data analysts building analytical pipelines that are going to be monitoring your systems. You want data engineers to do the data engineering and to lay down the foundations that you're going to be using in your analysis. You want to increase the presence of operations talent, which is undervalued as far as I'm concerned, but incredibly important for smart organizations. And you want to have somebody who really truly understands the product, the financials. What is the pain point of the customer? What is the journey that that customer is going through?

Adding process

The other thing that you can do is you add process. So this is something that I harp on quite a bit. Have a documentation party. Take a day, move everybody to an offsite location so that they don't get distracted, and provide pizza or pastries or any other kind of healthier options. And get together to explain what it is that has been going on. It's a lot easier when you have everybody in a room going, like, what was the context of that? What were we thinking about? Why did that decision get made? What were the constraints that we were operating under?

Because one thing that happens quite a bit is there's a lot of judgment that happens after the fact, where somebody will go, why didn't they just do this? Or why the heck would anybody have done this in way A instead of way B? And the reason usually is very sensible, because we are all very smart nerds, all different kinds of nerds. But we tend to be smart nerds, and we don't do things for stupid reasons. So document those reasons so that you know how to modify those processes to fit the purpose as that purpose evolves.

Things like authentication, things like governance, provenance is also really important. So as your data traverses through a system and gets manipulated, modified, and used in different ways, you want to add information, metadata, about that data as it goes through, so that you use the correct piece of data at the correct point in the pipe for the reasons that you need. And also automation. I mean, I think JD also said it earlier, and I can't underscore it enough. Whenever humans touch data, they breed errors. They introduce confusion. And automation is great at catching those things. Automation is not great at many things, but this is one of the things that you really want to leverage it for.

Adding resilience

And then the third step is adding resilience. So have a well-thought-through hiring process. This is literally one of the most important things that you can do as both a team member and a team manager, is making sure that the people that you are hiring to work with you or that you are convincing to work with you in your volunteer endeavors, you don't want friends. You want people who could be friends with you, but the example that I have is I have somebody that I have worked with for years now, and upon meeting him, I recoiled. I was like, why? Who chose to hire this person? Are they out of their mind?

Because A, I wasn't mature enough to understand the value of what I'm about to say, but also because I think that person would never have hired me, because we have very different personalities. But what we have is a drive to do our jobs and to do it well. All of the different ways that we clash come from a very good place, and we are now able to understand that any kind of conflict that arises is not a conflict of character, but it's a conflict of understanding or a conflict of purpose. And those are things that you can iron out. So as you are hiring in your organization, hire people that could become friends, but that you wouldn't expect to become friends. Hire people who are competent, who are smart, who are ethical, and who are interested in doing what you're doing.

Onboarding is another really big thing. Don't just know that you have problems and then hire somebody and say, good luck, here's a third of a wiki that is aspirational and doesn't actually reflect the systems that we have, and ask me questions, because they won't know what questions to ask. So have a methodical onboarding process that at least gets somebody set up for, if not the first six months, at least the first three months. And that gives them context to know what to ask.

Culture is really important. So I have several friends of mine who have been killing it, and so they're now managers. And I've been doing this management thing for about five years now. So I wouldn't call myself an expert by any means. But I've made a few mistakes that I can steer them away from. And one of the things that one person came to me and said, you know, I don't like the culture that my team is developing. I don't like what's happening. And I asked him, have you tried to instill a culture, or are you just hoping that a nice culture happens?

Have you guys gone out to lunch? Have you done any breaking of bread? Have you talked about things that are not work-related? Because the only way to have a really constructive and additive relationship work-wise is to have trust. And the only way to have trust is to have trust. It doesn't magically happen just because somebody signs a W-9 form. You have to work at it. You have to be intentional about it. So set up situations where you are building a team. And I'm not saying go do trust falls if you think those are ridiculous. I don't think they're ridiculous, but you may very well. But there is something that matches the culture that you're trying to build, and build it. Don't expect it to just magically take place.

This is both about rules. But as we've learned politically over the last two years in this country, it's also about norms, things that aren't explicitly written out but that are important to you. Say those out loud and set those as expectations. And the last thing is diversity and inclusion. So don't just have people who went to the same schools that you did, who came from the same backgrounds that you did, who ask the same questions that you do, who understand the world as you do, who are part of the same systems as you are.

So for instance, in my role, we're a consumer hardware, software, and data company. And I'm here in a personal capacity. But for those of you who don't know about me, I run data science at iRobot. And one of the really important things is what kind of household do we want having our product in it? And do we have people developing that product and thinking about that product in ways that reflect those households? Because guess what? The people who are buying our product don't all look like the people who think computer science is fun. Because otherwise, they would all be working for iRobot instead of buying iRobot products.

And not just having that diversity of experience, of personality, and of thought, but including not having one person who thinks differently who is now responsible for reflecting all of the nuance that comes from that background. So you want to make sure that that person feels like they're part of the organization and they're not just there to represent that which the organization is not.

Being clueless and embracing uncertainty

So all this brings me back to the point of what it means to be clueless. And we are all clueless. I'm going to say that again. We're all clueless. Because there are more ways to think about things than there are atoms in the universe. This I've not read anywhere, but I am a firm believer in it. And I am only an expert in like three of them. And I am incredibly clueless about like 98,397,037,004.

So how do we know that we are ready to ask the right questions? Well, we're not. And I think being that vulnerable and having that humility to understand that there are things that you are going to hire for and there are things that you're not going to hire for and be mindful that those things are true. Not putting them behind a dark wall and not thinking about it and hoping for the best, but being mindful that those things will take place and learning from them.

This is a chart by David Whittaker. And I think a lot of you probably have seen this before. It's about imposter syndrome. And I don't want to talk about imposter syndrome. I think this is a very valuable chart in that sense. But I want to co-opt this and take it in a different direction. Because the blue is what we know. And the yellow is what we don't know. And what this chart is saying is we tend to assume that a single person knows all of the yellow when, in fact, you have lots of people who know lots of yellows. And I think the point that I want to make is that's why you want lots of people with lots of intersecting, overlapping, but not perfectly on top of each other yellow circles.

This is the reason for diversity is so that if you have a lot of people who have similar experiences, backgrounds, and expertise, there's going to be a whole host of yellows that you're never going to be aware that you should be looking at and investigating and investing in.

Which leads me to this point. You should not protect your teams from failure. You should prepare them for it. Because if you have a single data scientist, there's a lot of yellow that's missing. If you have two data scientists, there's still a lot of yellow that's missing. And I'll tell you what, if you have 100 data scientists working for you, there's still a lot of yellow that you're missing. Fewer and fewer shades and sizes of yellow, hopefully, as you improve on your processes. But still, there's always going to be something that is overlooked.

And you know what? Bits flip. I would not have believed it unless it had happened to me personally. But things go haywire when they shouldn't have gone haywire for any other reason. Which brings me to... Should you only hire experts? No. No, no, no, no. Have tons of interns. Have tons of people who have never done this before, and this is their first foray into data science. Because the luxury of ignorance is that these folks who are extremely junior don't know what they should know. They don't have any of the assumptions that are subliminal and are baked into the way that we think, because it's the way we've always thought. Because the world moves, but our assumptions tend to be sedimented. And so as you bring in new talent, they know to question things that you've forgotten to question again.

The other thing that I would recommend, if nobody has read the book Chaos Monkeys, about the resilience of systems at Netflix and how there are systems that are turned off on purpose just to make sure that things can still function when part of the system goes down. This is true for people. Maternity leave, paternity leave, vacation. Your business should continue to function if somebody needs to go to a conference. So you should allow it. You should foster that kind of mentality, and you should use it as a test for the resilience of your own organization.

Where should the team live?

So I'll close with a couple of thoughts. Where should this new team live? I mean, if you're a single data scientist, you're probably embedded in a part of some organization, but where should it live? I've been doing this for many, many years. I'm not going to tell you how many years. I have really great skin care. But it doesn't matter. It really doesn't matter. And I'm happy to get into a Twitter conversation about this with somebody who disagrees with me. Because I have worked in data science within the finance organization. Financial operations, IT, engineering, R&D, software. And if you're doing good work, and you're solving problems, and you're having people address these problems in new, innovative, creative, and competent ways, this is the least of your problems. And usually this is where everybody gets hung up.

Because when you're going from one to many, the importance is not the nodes. The importance is the edges. The importance is how people, humans, communicate and interact. And that's where a lot of the knowledge that you're leveraging in your organization comes from. But the thing to remember is teams don't scale. And so once you reach a sort of critical mass where you're not being able to have that trust anymore, because those edges are frayed and stretched, it's time for a new team. And at that point, you add specialization, and you add process, and you add resilience again.

If you've seen me talk before, A, you know that I give the caveat about my voice breaking every time. But you've also seen me talk about dynasties and intellectual inbreeding, and there's a reason why the royal families of 15th century Europe looked the way that they did. There's also meritocracies and survivorship bias, and just because you made it doesn't mean that that's the only way to make it. And lastly, I don't know if anybody here has heard about super chickens. But if you haven't, come find me after, because it's a really cool story.

In essence, when you take a whole bunch of chickens that are amazing at what they do, they kill each other. They don't make each other better. Because excellence isn't binary. Excellence varies. And your job as you build out teams is to foster an environment where people can be excellent, but where they have the psychological safety to not have to be excellent. And they'll surprise you. And so with that, I know I'm the thing between you and lunch. So thank you very much. And if you have any questions, I'm happy to answer.

Q&A

Now we have time for a couple of quick questions for Angela, if you have any. We still have the mics around.

Oh, a couple of questions. Hi. Thanks for the presentation. It was great. Thank you. So what is your team, and how long did it take you to assemble it to where it is today?

So for confidentiality reasons, I can't disclose too much about specifically how iRobot is organized and does its business. But I can talk in generalities, and you're smart enough that you can read in between the lines. So I think the teams that I have managed and built that are most successful tend to have about five to 10 data scientists of all walks, junior to senior, and about three data engineers to start is a really good size. And this is for a $2 to $5 billion market cap company, for context.

Hi. If you are currently the only data scientist within an organization, and you are doing all of the things, how do you make that business case that you need to bring in more people if you're already getting beyond the budget discussion of hiring more people? How do you make that case without just stopping doing one of the pieces and saying you need someone else to do it now?

The best way that I have found to make that case is to highlight the opportunity cost of not doing it. If you're only saying how much it'll cost and how good it'll be, that's one avenue, but that only gets you half of the way there. I think the important thing to highlight is by not doing this, this is the revenue we're forgoing, because that number is going to be much bigger. Here are the insights that we are failing to see. Here's where burnout is taking us. Here is how we've gotten to a point where the scope of the responsibility is so big that we're doing the minimum viable solution, and so those systems are not resilient. The likelihood, the risk-adjusted value of that decision, if a catastrophic event were to happen, is taking the systems down for X many weeks, and that's the cost of that. So highlighting the cost of not expanding when it's evident that an expansion is warranted, to me, has been a more fruitful conversation and has led to better results, for Angela's definition of better, than highlighting what the cost would be for an extra FTE.

I think we have one more quick one here. How much pushback do you get on these ideas, and who would freak out the most?

That's a really interesting question. There's two ways to answer that. I am privileged in that I get to choose to work for people who agree with my management philosophy. So how much pushback do I get? I don't get pushback, because I filter out employers who would have given me pushback. I'm very transparent in the interviewing process so that I don't get myself into that situation. Not everybody has that, and there is substantial pushback, especially in organizations that look homogeneous, and not everybody can choose not to work there. But I have also found that making the argument monetary reduces the amount of friction. So, and I'm not saying this happens at iRobot, because I filtered that out, but were I to say: we are building robots for software developers, we're not building robots for homemakers, for people who want to maintain a home, and so we're making decisions about how these robots operate that don't match the pain points of the people who would be paying us for these robots. Again, this is not true, but that's the kind of argument that can help open up eyes for people who see it monetarily as forgoing revenue.