Resources

Panel | Growth & change of careers, organizations & responsibility in data science | RStudio (2019)

Hosted by Eduardo Arino de la Rubia, Instagram

With: Hilary Parker, Data Scientist, Stitch Fix; Karthik Ram, Data Science Fellow, UC Berkeley; Angela Bassa, Director of Data Science, iRobot; Tracy Teal, Executive Director, Carpentries

About the Author

Eduardo Arino de la Rubia

Technologist and Data Scientist driven to create software that people use, find useful, and pleasant. From programming through architecture, from green field to maintenance, software is interesting technologically, socially, and intellectually. I enjoy contributing to the process, either through leadership or individual effort, of creating software that is deployed joyfully and is as bug free as possible.

Transcript

This transcript was generated automatically and may contain errors.

My name is Eduardo Arino de la Rubia, and I am fortunate enough to be the moderator for your panel this afternoon, evening, I don't know what time zone I'm in, to be honest with you.

The goal of our panel is to provide you, the audience, our dear friends, with an understanding of how the data science leaders who have been kind enough to share their time with us today think about growth and change in careers, organizations, and responsibility in data science, and we're very lucky to have them.

For just a little bit about me: I have been involved in data science since before it had a cool name like data science. I was fortunate enough to hire my first machine learning engineer in the late 90s, and ever since then I've had the incredible fortune of just supporting great teams. I'm currently a data science leader at Instagram.

Introducing the panelists

And so now I will be introducing the panel to you. Immediately to my left is Angela. For the past 15 years, she has helped make businesses intelligently use data to make decisions and trained others to do so. Her thoughts on data science management have been published in the Harvard Business Review and the Wall Street Journal.

Next we have Hilary Parker. She is a data scientist on the styling recommendations team at Stitch Fix. At Stitch Fix, she focuses on what sorts of data to collect from clients in order to optimize clothing recommendations, as well as building out prototypes of algorithms or entirely new products based on new data sources. You also might know her as the co-founder of the Not So Standard Deviations podcast, which she does with Roger Peng. It's a bi-weekly data science podcast that now has over half a million downloads.

Next we have Karthik. So Karthik Ram is a research scientist at the Berkeley Institute for Data Science and the University of California Museum of Paleontology at University of California Berkeley. Karthik is also the co-founder of the rOpenSci project, lead of the U.S. Research Software Sustainability Institute, founding editor at the Journal of Open Source Software, and has served on the boards of various organizations in this space, including Data Carpentry, Many Labs, Libraries.io, and more.

And finally, way down at the end right there, we have Dr. Tracy Teal. Dr. Teal is a co-founder of Data Carpentry and now the executive director of the Carpentries. She received her PhD in computational and neural systems from the California Institute of Technology, was an NSF postdoctoral researcher in biological informatics, then an assistant professor in microbiology at Michigan State University. While an associate professor, she saw researchers' need for effective data skills to effectively and reproducibly conduct research and co-founded Data Carpentry to scale data training along with data production.

Growing as a data scientist

So when I agreed to do this panel, I was shocked at the breadth that I was allowed to have. I really was not given any constraints on what to talk about. And that's both really freeing and a little terrifying. And I thought actually back to the previous panel that I was on at this conference, and I remembered that we had so many questions from folks about how to grow in data science.

You know, people saying, I want to get into data science, I'm a junior analyst, I want to become a data scientist; people asking about organizations. And I decided, let's just do a panel entirely on growth.

So I'm going to kick it off with my first question. The career path in data science is different for all of us. What is the most important lesson you've learned about growing as a data scientist, whether it be your personal growth experience or how you've fostered growth in others?

I can start. So yeah, I mean, I think you're totally right that careers in data science, I think we finally hired someone who had a master's in data science recently. So out of 120 people in the data science org at Stitch Fix, there's one person who was like formally trained in data science.

And so I think that some of the most important things from my perspective are being really nimble with the tools that you're learning. I think that keeping like being very flexible with tooling is super important, because in the various different jobs, you're going to be tackling different problems. And there's really no space for being a purist in terms of like, the most important thing is being able to get a job done.

And so being a purist with your tooling and trying to get every problem into a very specific format just isn't going to happen. At Stitch Fix specifically, we have this philosophy of kind of a full stack data scientist, which obviously has pros and cons. But one of the big pros is that you end up having to create like scrappy solutions for almost everything. So like learning how to create ETL from web logs all the way to like front end applications. So yeah, just like kind of having like a growth mindset and being really flexible is super important.

And then the other thing I'll say is that I actually think some of the more, I mean, I hate this term, but kind of like the softer skills are super important. So like, especially like cultivating empathy, I think is really important for the job and something that we don't talk about very much. But understanding users, understanding the people who are reading the analyses you're putting out there, I mean, that's in some ways the most important part because you can do a perfect data analysis, but if you're, you know, speaking a different language than the person who's consuming it, you might as well have not done it.

And so I think that's kind of an unsung hero of successful data science work.

A question for you about that. So I like that you were talking about not being a purist. How do you balance then, or what advice do you give for folks balancing actually creating breadth in the technologies they know with actually having enough depth in any one thing?

Yeah, I'm not sure I have a great, I don't know if I have a great answer to that, but I mean, certainly I'm obviously, I mean, for people who know me, I'm like obviously really into R. So I think that learning one language really fluently, I know that's advice that Hadley Wickham gives of like kind of like a depth in one language is very powerful. And I think a lot of the general concepts from one language to another translate really well. So I found it really effective to get a lot of depth in one language and kind of a deep understanding of like the work I'm doing through that.

And then being willing to be flexible and work in these other languages, like, you know, we probably all write SQL queries and we don't talk about it very much. I learned SQL best by learning dplyr and then being able to translate over.

Advice for junior data scientists

Angela, you know, I know you've managed data science teams, you've had junior data scientists. What advice do you give to junior data scientists who want to grow in their career?

I think I have extreme respect for junior data scientists, especially because being in the sexiest profession of the 21st century, yes, thank you for laughing, yes.

I think it made sense when DJ Patil wrote the article to have that articulation of the profession. But I think it's created a sense that it's really difficult to break into unless you have a PhD, unless you have two PhDs, unless you have two PhDs and a body of work and three published books and a collaboration with a Nobel laureate. And that's not true.

I don't have a PhD. And you stole my joke, but I've been doing data science before it had a cool name or before I had heard of said cool name.

And so I think, and I harp on this a lot. I think being a junior data scientist is a gift. It's a gift that you bring to any organization. And if you feel that joining an organization is too difficult because they're giving you a hard time because of how junior you are, I know not everybody has this privilege, but I would have you consider strongly whether or not you actually do want to join that organization.

Because what you bring is you bring the ability to question dogma. Because you ideally, by virtue of your tenure in the discipline, haven't been indoctrinated yet into the dogma. And so you bring a breadth of ignorance. And I say that in the best possible way. You don't know what's supposed to be done a certain way. You don't know what isn't allowed to be questioned anymore. And I think that's something that can be extremely useful for any organization.

And so I think that's not something to hide or to shy away from. I think the ability to be vulnerable and the ability to say I don't know are actually extremely valuable to me as a hiring manager. I actually have in my interviewing guide questions that are impossible to be answered. Or at least impossible to be answered succinctly in the time allotted. And that's because the thing that I'm testing is whether you have the humility to say I don't know. And whether you have the hubris to say but I can figure it out.

And that's the balance that I would recommend younger folks in the discipline try to strike.

One of my favorite data science executives was pushing back that his team only wanted to hire senior data scientists. And he was like, you really need to stop doing that. Junior data scientists are better. They don't know what is impossible yet.

Ruthless prioritization

The one thing that I'll add, I think these are all great answers. What do y'all think about ruthless prioritization? I think that as a junior data scientist, junior data scientists oftentimes in product teams are sort of like the target of product managers, of engineering managers. And they're thrashed in every which way. Do you have any great advice that you give junior data scientists on how to own their communication style or ruthless prioritization?

Yeah, I really care about this question because I think that so frequently as data scientists we're in these really reactive positions, especially as like a data analyst for a product team, especially if you're probably embedded on the product team. And I know like we talk a lot about kind of building tools to enable product managers and all of that. And I think that's important. But I also think that as a data scientist, you end up thinking about the holistic product in unique ways.

And I actually think that another way to solve that problem is to take on some of the responsibilities that a PM might traditionally take on. Like in many ways, I think that by thinking holistically about that system, by thinking holistically about the product, you can really guide decision making. And so, I mean, to your point about ruthless prioritization, I think that taking a more proactive, non-defensive stance about really solving problems can put you into a place where you're really a genuine partner with a product manager rather than just kind of being the person who crunches numbers for them.

And if I could add to that, I think one of the really important things that a data scientist can do, one of the biggest sources of value is the translational step, is the ability to articulate a business problem into a quantitative answerable hypothesis and then testing that and iterating. And so, going back to your point about ruthless prioritization, the ability to define that prioritization, to understand what is being asked and not perform a task but perform a service, to understand what is the question, formulating a path to answer. And if that path is not appropriate, having that dialogue with the product management, with senior management, with whomever it is that is posing the business question or the academic question and working that muscle of that translational muscle, I think that leads to better prioritization.

Leadership vs. IC work

So, I'm going to move on to the next question, if that's all right. So, data science organizations, whether it be a small team or a large multinational, require data science leaders. How do you think about how you choose to either become a lead yourself or how leadership is different from IC work or how we can do better to support leadership amongst our ranks? Tracy?

Well, so, I really appreciate the question around leadership because that's definitely one of my soapboxes, I have quite a few. But I think that there is really, like the talk this morning, where we know things about education and we should apply the things that we know about education when we think about teaching people about programming. We know things about leadership, about what makes effective leaders, that value people and create better products, better organization, that we have an opportunity to apply in the data science space.

But that we're not given opportunities to learn these skills, we're not necessarily given opportunities to grow these skills. And so, it is a real space where if we invest in thinking about leadership more strategically, more explicitly, that we can grow better leaders in data science. And I think so many people have the potential to be leaders that are doing it in small ways. You don't need to lead an organization to be a leader. If you're mentoring another person, you're a leader. If you're working on a small team, you're a leader.

So more people seeing themselves as leaders and approaching it in a way that values the people that they work with, I would love to see more of that and I think that investing in that would be really transformational in the data science space.

If you could go one click more specific, what would that investment look like?

Well, my secret project, which I'm not working on but would love to do: so as the Carpentries, our model is two-day workshops. I would love there to be something like a leadership carpentry, which is kind of a two-day thing about, I was talking to Hilary before, you know, what are the things that people don't know coming in about project management, about how to support people, how to be a mentor, active listening, facilitating a meeting, all like super exciting topics. I love all these things, but the impact of doing these things well scales so significantly that it's just one of those things that I think if we invested in would have such broad impacts.

But also just understanding, I think, an appreciation that you are a leader, and investing in learning more about that, looking to people who you see leading in a way that you want to lead, talking to them about that. How did they get there? What's their philosophy? What do they think about being more intentional in how we approach it and think about it?

Karthik, what do you think about the difference between leadership and IC work?

So I have a very nontraditional path into being a data scientist or sort of a data science organization leader. And so for me, I was trained as like a traditional academic. And so trying to manage a team and doing stuff to enable other people to produce data science work is not something that I'm trained to do. And so Tracy and I have all these conversations about, Jesus, I'm a terrible manager. How can I be better at this thing?

And we have, not just me and Tracy, but a handful of others talk about, how do we be better managers? And so this is like a constant and ongoing struggle. But I think Tracy makes an excellent point that a lot of people have potential to be leaders and trying to identify people and empower them is something critical. So I'm just figuring things out as I go.

But at least with some of these projects, one of the things I think about is, how do you get more diverse people in the room? And then you actually find people that are leaders and then you just give them more responsibility and then they thrive. And then you sort of recognize people and we just need to do a better job of that.

I think if we were more intentional, because it is a discipline unto itself and it is different. And so a little anecdote, when I first became somebody who is responsible for somebody else's ability to pay their bills, which is how I like to think about it, it's a huge responsibility. It's a huge weight on my shoulders to make sure that the people that are actually individually contributing are empowered, because I'm not. My job is to multiply their contribution.

And so when I had been doing the work, I had been mentoring, I had been project managing all of that without the label. And the person who was my supervisor at the time said, you know, we want to make you the official team lead. And I went home and I cried. And I called up my then boyfriend, my now husband, and I'm on the phone in tears saying, I don't want to do this. I want to be in the trenches, but it's going to be such a bad thing to bow out, to say, no, thank you, I appreciate the vote of confidence, but no, I'm going to go back to the computer. In hindsight, I love it. I absolutely love it.

And so just to build on, I don't think that everybody wants to be a leader, but in my gut, I know everybody can be. It's not a special skill set. It's not something that some people are gifted with genetically, the ability to do. No, I think it is a skill that is learnable, and it's about whether or not you derive the feeling of accomplishment from it. And if you do, great, rock it. And if you don't, that's fine, too.

But it isn't just if you are a great individual contributor, you're naturally going to be a great leader. No, they're different skill sets, and they're both things that anyone with drive can learn.

And personally, now, years later, I look back, and I love what I do today even more, because the way that I think about it is I'm programming these folks, because you can only pay somebody to keep a seat warm. You can't pay somebody to want to solve a problem in the best possible way. That takes mentorship. That takes providing ability. That takes providing opportunities. That takes all sorts of things that are learnable and that are incredibly empowering, because you see the work, you see people grow, and you see people reach success because of ways that you've cleared the midfield, and you've allowed them to run their play.

Oh, I would just love to add, I mean, I totally agree with what everyone's saying, and I think we are in a field that kind of lionizes IC work, and I think that, yeah, we sort of self-deprecate about management, where it's like, oh, I used to be an IC worker, and now I just, like, answer emails or schedule meetings, and I think that's, like, it's reflective of the fact that I think a lot of people, myself included, in this field, you're sort of, you likely came from a place where, like, your kind of personal intelligence, personal productivity was, like, you know, your self-worth, and I think that it actually takes a tremendous amount of personal work to kind of break free of that and feel like, you know, there's more to my value than, like, being the smartest person in a meeting, and there's more to success in this job than being, like, you know, the most productive coder.

And so I think, I think in some other fields, that is, like, more of an understanding than in technical roles, and so I think that, like, investing in kind of your personal identity work, for lack of a better term, and, like, kind of feeling settled inside helps to kind of understand, like, these multidimensional facets of being an effective leader, and so, again, I feel like that was kind of a personal journey for me, going from kind of an academic setting, where, like, there's kind of one thing, and it's being smart, into, like, this role where, like, partnership's so important, leadership, like, just, it's just, like, a more complete, I don't know, intersection with the world.

I'm kind of, I'm ecstatic that you talked about that. Whenever I talk to folks about transitioning from IC work to management work, it may not be the first question, but in the top three questions is, well, but my skills are going to atrophy, like, that is sort of, like, the fear. The fear is that your skills are going to atrophy, and we don't do a great job of understanding that, hey, first and foremost, last time I checked, like, loops are still loops, right? Like, that hasn't changed much, unless y'all know something I don't, but no, that actually, like, this gives you this opportunity to develop these incredibly rich skills.

Have y'all ever encountered that, where people, like, are, like, oh, I don't want to take on this responsibility because I'm going to lose something, and have you ever, like, have you provided any feedback to them about what you thought about that?

I'll take a stab at this. So, I felt this way all the time. I still feel that way sometimes, and so, adding to what Hilary said earlier, part of transitioning to being in more of a leadership role is trying to redefine what success means. So, for me, success is not being the smartest person in the room, or trying to be, like, the best coder, trying to write, like, the most compact, like, bit of code possible. I think of it as, like, Angela does, like, having been in this role for quite some time, is how can I enable people to do their best work, and how can I give them the resources to do it? So, that's sort of been, like, a slow transition, but it's, like, okay to be at peace with it after some time.

And I think it's okay to not want to be a manager. You can lead. You can speak at conferences, and find the tone of the conversation, and help spearhead the evolution of the discipline without having people whose time cards you approve. That's fine. The evolution of individual contribution is not management, but I don't think there's anything that should stop anybody who wants to pursue it, who has the opportunity to do so.

But I think if you have any hesitation, it's worthwhile to explore that. It's worthwhile to explore whether that hesitation is coming from a place of insecurity, and fear, and imposter syndrome. It's okay to wonder if you're going for the management role because, in some circles, it has more prestige, or if you're cowering away from it because, in our circles, it has perhaps less prestige, because our identities are so tied to the product of our code.

So, I think all of those are valid. I don't think there's any wrong way to explore this, as long as you have the knowledge that it's perfectly doable, and it's perfectly valid to not want to do it either.

Growing data science organizations

I love that. So, I'm going to move on to the next question, if that's all right. So, we've talked about ICs. We've talked about the transition of ICs to leadership. Let's talk about organizations for a second. Data science organizations grow and change. In the organization that you've seen grow and change, what is the most important lesson you've learned about what to do, and what is the most important thing that you've learned that you should absolutely not do?

I'll kick off. The big one that I've learned is you don't hire data scientists if you don't have data. It sounds goofy, but there are so many times that I've seen people hire data scientists. They're like, oh, and we'll build a system, and we'll collect data, and then you'll get to do stuff with it.

Tracy? What are things you've learned to do or not to do in growing organizations?

Yeah. So, we're not a data science organization. We have data, though.

Yeah. I think that's a really interesting question, and I think it also depends a little bit on the organization. I mean, for us personally as a non-profit, we went from one person to a couple organizations and mergers, and now we're a team of about 10.

And so, I think one of the things maybe not to do slash to do is to think about where you want to be when you're setting up your structures. I think some of the decisions made early on lock you into certain things because they don't envision where you're going. And it's really hard to envision that future sometimes at the beginning, but I guess what I say sort of all the time, and my team laughs at me, is let's take a step back. Forging ahead on this thing, okay, let's take a step back. Where is it that we want to go? How can we not overbuild it, but give ourselves when we're creating this thing the flexibility for it to be something bigger for it to scale?

And I think the point is that you'll get it wrong a lot, but only by getting it wrong do you kind of learn what to do next time.

Yeah. I mean, I think one thing that I've seen in the organizations I've been a part of is the importance of adapting as a business grows. And so, I think that a couple years ago, I was really into blameless postmortems, and I was, you know, reading a lot about them and talking about them. And I think that what I really like about them is that it sets up this paradigm for very explicit kind of ROI discussions.

I think, you know, there's some really important decisions in data work, especially about, you know, like, for example, being the full stack data scientist is very appropriate for an early stage startup where you need to build scrappy solutions quickly, and, you know, you want to be able to function independently. But as you become a more mature organization, like, that model sort of becomes less and less feasible. You have, like, more of a higher standard for the algorithms you put out, as well as the, you know, front-end experience for the clients or the back-end experience. You know, you consider technical debt in a different way.

And so, I think just, like, having very explicit regroups about, like, okay, here was the tradeoff we made for, like, nimbleness versus kind of future-proofing, and, you know, five years later, what does that look like? I think that's something. There's this great paper about how machine learning is the high-interest credit card of technical debt. I think that's absolutely true. And the paper goes on and, like, explains exactly why, but in general, machine learning systems are very complex. There's a lot of hidden assumptions in them, and the technical debt can scale really fast and in invisible ways. And so, I think, yeah, just, like, having these explicit check-ins about what's appropriate for our org right now is super important.

Can I say something? Yeah. I think two other things, now that I think about it, is one is investing in systems that allow your team to work effectively together, and that's, like, physical systems, maybe that's project management or communication strategies, but also around the values that you share as a team and having a shared sense of what that means, and so not just what communication channel you're going to use, but how are you going to communicate with each other. And that shared sense makes, I think, all the work a little bit easier and, again, is kind of that thing that helps set the stage for scaling.

I think, like Hilary was saying, knowing when to do that is always a little challenging, but it's usually sooner than you think you should, because when you wait, it gets a lot harder. So, it's that point where you, like, feel the pain, but not totally catastrophic yet, but even if it's totally catastrophic, you still can do it.

So, don't hire data scientists without data. I'm actually going to add another one that I'm curious what you all think. I think a big mistake I've seen in organizations is over-hiring. What I've seen is, basically, data scientists will get hired and there aren't clear success criteria, there isn't a clear thing for them to do, and they're just sort of, like, brought into the organization and told, figure stuff out. Have you ever experienced that? And how do you, you know, how do you think about that problem and combating it in the organizations you're in right now?

Yeah, no comment on if I've experienced it before, but I think one thing that's really challenging and exciting about this field is that, in general, like, the product management of machine learning systems is a new thing. I mean, I imagine, I think Facebook has some really excellent kind of technical machine learning PMs, but I think you're kind of not alone, but there's not many folks doing this well. And so, especially with kind of smaller orgs, I think that the data science can be a black box, you know, I think that's kind of a common thread, and so having a mature understanding of machine learning within product management can be very difficult.

And again, kind of this thing I was alluding to earlier about leadership, I think that data scientists can be really uniquely positioned to guide machine learning product development, because they understand the problem in a unique way that probably most people in the org don't understand it. And so, I think that, yes, I have seen teams kind of just pop up, and it's like, hey, have fun, and they'll throw a junior PM on the project, and I think that's the wrong thing. I actually think you need a pretty senior technical PM, or you need people, data scientists who are acting, like, taking on some of the responsibilities of product development, and thinking really creatively about, like, what can we realistically solve.

Responsibility in data science

All right, if it's okay with everyone, I'll move on to the fourth question. So, again, we've talked about ICs, we've talked about becoming leaders, we've talked about organizations, and all of these around growth, and now I'm going to talk about responsibilities, right? Our responsibilities as data scientists are significant. As we've seen the ways that data can be used or misused, I've personally felt the weight of the responsibility as a data professional grow. I think that when I hired my first ML engineer in 1999, I don't think that I had a great understanding of the responsibilities that this new world would create. When you look forward, what are the principles by which you hope to shape the policies and practices of data science to ensure that your work is responsible? Angela?

I'll tie the answer to this question with an addendum to the last one, which is, one of the things that I have learned not to do is, when you have established that you have a need, you have established what the success criterion is, or the criteria are, and you're out and you're recruiting, and you're attracting talent, and you have a team, perhaps, at your organization for talent acquisition, and they've brought in several resumes, and now I have my resumes, and I'm looking at them. The one thing that I would say you shouldn't do is you should not hire the best of the people you have at hand, or the best of the people that you've interviewed.

I think one of the things that we have a little bit is that you interview somebody, they're extremely smart, they're personable, and whatever, they would just be a bad fit. That's not a negative. That's not something that is a personal failure of them. It's just not a good fit, and so I think going forward, part of that is having a better group of people to choose from. Better not in the personal qualities, but better as in how they fit, what your goals are, and what the organization is going to be.

I would love to have organizations metabolize the fact that it is so important to have people working at your organization, people teaching at your academy, who are representative of the world, because one of the things that we learn in data science, and one of the things that I'm adamant about, is that data isn't, and Hilary and I have had lots of conversations about this, data alone isn't ground truth. Data are artifacts of systems. They're breadcrumbs.

There's this amazing book, which I'm going to butcher. I think it's All Tomorrows, but I'll tweet about it. It is, in essence, paleo artists imagining what animals could look like, so I don't know if anybody has ever imagined a T-Rex without the teeth glaring, right? Like, every representation that we see has that, and so these folks who are trained as paleo artists, who are trained to imagine what flesh and cartilage look like on these extant bones, they reimagine what a manatee looks like, and it's landfaring, or they imagine what a swan looks like, and it doesn't have a neck the way that we know they do, because these bones aren't truth. They're artifacts that we have access to, but they don't tell the whole story, and for you to be able to ask better questions, for you to be able to imagine different futures, for you to be able to allow for the fact that that data might be only a partial truth, you need people in your organization who have experienced that and who know to question that, and who know to question that things aren't what you expect them to be.

They are what they are, and our job is to try to build inference models and to try to understand what that data is telling us, but I think it's a problem when we use the data as the metric rather than as the channel for what it is that we're trying to do, so how I hope that informs the theory and the praxis of the discipline is for us to be mindful of the fact that we need lots of perspectives to ask lots of different questions so that we get a more well-rounded understanding of what data is telling us.

That's fantastic. I mean, I'll jump in on that because I actually share an office, or one of my offices I share with an actual T-Rex.

Like back in the day, someone, I don't remember who this was off the top of my head, but assembled like a stegosaurus skeleton as like a unicorn because they didn't have a mental model of what it looked like, but the real point is like having good data science teams means having very diverse data science teams, people that bring a lot of different ideas together and also thinking a lot about how to make your work just broadly reproducible and effective and owning your mistakes because we all make mistakes and owning issues that happen with data and then going back and talking about things like blameless postmortems. These are all things that are valuable, and so last year a handful of us got together and started writing down what we think are ethical guidelines to be a good data scientist, so we captured all this in a manifesto. I think it's datasciencemanifesto.org. I can't quite remember, but I'll tweet it out, but I try to think about all these things every time. I try to assemble a team for a project, and like Angela says, you don't just grab the best person out of like a list of CVs and just say like you. You think more about the team more holistically.

Tracy?

Yeah, I would echo what Karthik said. When we saw that question, I said, Karthik, what's the manifesto that you wrote? Because I think it is having something to kind of tie back every decision that you make, so you're faced every day with decisions that affect people, right? I think that is one of the great things and the terrifying things at the same time. What's your process like when you make that decision? What's your foundation? So something like this manifesto or your personal values or your company's values, how do you think about those things in everything that you do?

I think it is interesting as I give talks, especially in a public setting, the question we get most often is around the data ethics, and almost no one on the panel or anyone speaking has anything to say about it. So thinking about what we would say and being more intentional in communicating that to the people who use the products or the community at large and engaging them, being open to engaging them in that conversation as well.

Yeah, I think all very excellent points. I'll add that Cathy O'Neil wrote a book called Weapons of Math Destruction, and I think that one of the key takeaways she has is that we should have kind of a data science version of a Hippocratic Oath, which is the oath that doctors take, and I think that's a great idea. I mean, establishing kind of a community-wide standard of the values that we have, I think, is important. Obviously, it doesn't prevent everything; like, you know, not every doctor is necessarily ethical, but I think it's a step in the right direction.

And then the other quick thing I'll add is that frequently as data scientists, kind of to all the points about like diversity and having people bring things up, I think that frequently, again, with the black box machine learning thing, a lot of times, you can't rely on other parts of the organization necessarily understanding the issues. So, taking that responsibility to surface things and explain the implications, and yeah, taking ownership of that, I think is really important for data scientists.

Yeah, I'm just going to, that ownership is critical. Product isn't going to do it. Product is trying to shape the future, right? Engineering is trying to build it, you know. At the end of the day, we are the ones who have this responsibility, and I would love to see a world in which everyone, all the way from chief data officers to the entry-level data scientists, was keenly aware that this was them. They were the bar. They were where the buck stops, and I really hope that we're able to continue that.

Audience Q&A

So, we're almost running out of time. I think it's time for us to take some questions from the audience.

All right, over there.

Hi, I'm curious what is the most effective way you've seen product management and data science work together and actually accomplish things? Because there always seems to be a hierarchical shift depending on the project, and I don't necessarily know how it fits, especially with your experience.

I mean, I'm very lucky right now in that data science is one of our key competitive advantages at Stitch Fix. So, you know, there's obviously other personal styling services out there. So, it's kind of like a freebie for me because I'm like, oh yeah, just be a great partner and like come up with ideas. But I think, I mean, I'm just sort of repeating myself, but I think not assuming that your place is just to crunch numbers and like thinking much more broadly and systematically about the problems that you can solve. I think that, I mean, product managers want to put good products out there. And so, if you work on solving that problem, then I think that they will respond to that or at least that's been my experience. And so, just like coming to the table, assuming, again, this takes confidence, it takes personal development, but kind of assuming that you're a partner, not someone who's receiving tasks, I think that that can go a long way.

I think that's right. One thing that I'll add very tactically is whatever your planning cycle is, product has a roadmap. Make sure data science also has a roadmap. Make sure that that roadmap is correlated and complementary to the product roadmap, but not directly tied to it. And then make product see the value in that roadmap. Because that allows you to then have very, very real talks when product comes down and asks hard questions about the tradeoffs that are going to be required. I think that we as data science organizations oftentimes don't do that, and I think it's a critical thing to do.

Hi. We've talked about this a little bit, but I just want to make it explicit. What are common ways that data scientists fail? And if you want to answer it in another direction, in your own careers, if you've seen data scientists fired or volunteered to leave a team or a company, what happened to them? Why did that happen to them? And I don't mean this in a morbid way. I just believe that a lot of times we learn about what works by learning about how things break.

I'm going to answer first, but I want... I've seen data scientists fail by not saying no enough, committing to too many things, being unable to provide anything other than a cursory analysis, and then not having a competitive value over a replacement level sort of like Excel spreadsheet. That, to me, is the big one.

I'll add to that. Ways that I have seen data scientists fail is by assuming that the product manager knows enough to ask the question in the best way, not being collaborative and a partner in helping refine the question so that the ultimate product is the best version that it could be. And sometimes that means it's a 180 from what was originally imagined. I think product managers are very smart and they're very good at product management. They're not necessarily really good at understanding what's possible by sciencing on data. And that's where you come in as a data scientist, to step in and to say no or to say yes and what if.

I think having those conversations is part and parcel of what you get paid to do. And you're not just somebody charged with delivering an output. You're somebody charged with the success of a project, of a product, of an organization. So that's the number one way that I have seen failure.

And to answer your question, what has happened to that, so in a previous life I was leading a team for an organization that was not data scientific at its core, but they were expanding into data science. And the market took a turn and difficult decisions had to be made and the entire data science team got laid off. My boss, me, the people that worked for me, all the way down that whole branch just got chopped and the company sort of tightened in and focused on what had been their bread and butter. And this is one thing about this community that is remarkable. Within six months everybody had a fantastic job. Not a backup job, not a I got to do anything out of desperation because I got bills, but really great jobs that were sourced through the community because they knew people who knew of opportunities and who knew of good opportunities.

I'll add that one way that I've seen failure in data science is like caring more about the statistical method or the fancy model, caring more about that than about like solving a problem for the business. So I've seen a lot of people building some sort of fancy machine learning thing that essentially was like a cool intellectual project, but the person wasn't really tied in or thinking about what the product needed and like what the customers of the product needed, what the fundamental problem was that you were solving. And I think that's something that is the impulse within the community. We like to get together and talk about fancy algorithms and fancy, you know, data sources and whatever, but usually like many data science problems can be solved with a pretty simple model. And so kind of like letting go of the ego around that and really thinking instead about like, you will have more impact if you think a lot more about the product that you're working on and a lot less about like kind of these marginal gains by using slightly different algorithms.

Mental health and burnout

Hey, thanks. On the topic of soft skills, how do you manage your mental health or what advice do you have to manage mental health?

I have my advice. I have dogs I love, an incredible wife that supports me, and every Saturday I rent a room with a jacuzzi and a sauna for 90 minutes and I just hang out.

I'll just say, this is a great question, and burnout is a huge problem with every single person that I know. So just being aware of burnout and doing things like Eduardo does, I meditate, I do all kinds of things like him, just to like stay on the edge of burnout or push myself back from burnout. But one thing you can do as a manager is actually be aware of this. If you've already experienced this, I can spot this in people pretty easily. So then I try to enable them to like step back and then refocus and things like that.

And sometimes you're in an organization that isn't as respectful of people's mental health because of the culture and everything. And if you find yourself in that role and it's incumbent upon you to change that, you can make a monetary argument. You can say that if people burn out, they're going to leave and the cost to replace them is too much. You can say that them being burned out means they're making suboptimal decisions or they're making bad decisions and they're harming the product. I mean, there are ways to articulate this so that you can do your job of watching out for the people on your team to make sure that they don't have to raise their hand once they're past their point of no return.

I think I said it in my talk earlier, if folks have not read this, Chaos Monkeys, the story of how Netflix has their systems architected so that certain systems will go offline on purpose and randomly to ensure that there's resilience built in. Maternity leave, vacation, holidays are people chaos monkeys. And you have to ensure that your team can survive if somebody has an emergency, if somebody gets a scheduled vacation that they're going to go anyway and then something poops out. I mean, those are important things to build resiliency on the team and there is a business rationale for them. So if you think that you working yourself to death is in the benefit of the business, you've got it backwards.

And I have no stomach for anybody who brings themselves to the point of burnout because they think that's what's expected of them in my team. It's not. And we just had year-end books closing and I went through and I saw everybody who hadn't taken their vacation and I went and I talked to every single one of them. I was like, why? And what's going on? And so when is your next vacation? And when's that happening? And you're going to turn your phone off. You're going to take the SIM card out. I don't want you available because otherwise other people don't know to learn, to work around that absence, that planned absence, which is an incredibly important business skill to develop.

I would love to add like kind of in this like financial, you know, there's a reason. I mean, I have no doubt in my mind whatsoever that the thing that has helped my career the most is investing in my own kind of personal emotional development. I think, you know, like jobs are deeply personal and you end up exploring these parts of yourself and your personality and your insecurities and it's a unique lens that you see yourself through. And so, yeah, I, you know, invest in my mental health. I live at a Zen center, so it's helpful, but, you know, meditating and just like generally being interested in the human aspect of our job, I think is, again, like just if you want to be a better data scientist, this has been the thing that was most important to me.

I just want to quickly add, though, I think that there is an element of self-care that's more about shaming you into self-care. So then you feel badly that you haven't been taking care of yourself, which is like not the great cycle. So being kind to yourself that maybe you haven't been taking care of yourself, you're not doing the seven things in the magazine that they say you should do, like that's okay too. Don't beat yourself up for not taking care of yourself. But yeah, it's a journey. You figure it out over time and I think you grow into a role. What you can handle next year is not necessarily what you can handle right now.

Closing pitches

All right. Well, I was going to say, if you have anything you want to pitch or if you have anything you want to say, we can go around. If not, do you have?

I'm hiring.

If you are interested in data engineering, in data science, if you're a junior, if you're a senior, if you want to look at robot logs, if you don't want to look at robot logs, if you want to be a data steward, we have this awesome role that used to be called data hygienist and now is called data steward. And the sole purpose of this person is to look at provenance and governance and ensure that everybody who needs data has it and knows where to go. And if they don't, how do we fix that? There's tons of really cool jobs. So come find me.

Well, I guess I'll say Stitch Fix is hiring too. We're hiring. It's a great place to work. And then I'll also just pitch my podcast. I have a podcast called Not So Standard Deviations. And we talk a lot about kind of like the why of data science, the how of data science. So yeah, I would love to connect with listeners and love to have you as a listener.

I'll just add that if you're interested in talking about challenges around leadership or running open source teams or open source projects or data science teams, or how to transition from academia to data science, feel free to email me or talk to me.

I think that there are actually groups around those discussions is important. I think a lot of the things that we talked about here, talking about them and knowing that they're