Kshira Saagar @ DoorDash/Wolt | Data Science Hangout
We were recently joined by Kshira Saagar, Senior Director of Data Science & Analytics, International at Wolt & DoorDash, to chat about getting Analytics & Data Science teams a seat at the table: by ensuring that the work done by DS & Analytics folks impacts real-world outcomes, and that there is credibility and trust built on all sides of an organization to treat Analytics as a true partner.

Speaker bio: Kshira Saagar is currently the Senior Director of International Data & Analytics for Wolt & DoorDash, and has spent 32.1% of his life helping key decision-makers and leaders make smarter decisions using data. He believes that every organization can become truly data-driven. He was recognised among the Top 10 Analytics Leaders in Australia for four consecutive years, 2019 through 2022. At every place he has worked, is working, or will work, he likes to go back to the fundamentals, asking people to ask more questions of data and being a tough taskmaster on getting actionable outcomes from data, on all possible occasions. Outside work, Kshira spends a lot of time advancing data literacy initiatives for high school and undergrad students.

________________________

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
LinkedIn: https://www.linkedin.com/company/posit-software

To join future data science hangouts, add to your calendar here: https://pos.it/dsh (All are welcome! We'd love to see you!)

Thanks for hanging out with us!
image: thumbnail.jpg
Transcript#
This transcript was generated automatically and may contain errors.
Hi, everybody. Welcome back to the Data Science Hangout. I'm Rachel. I lead Customer Marketing at Posit. So excited to have you all joining us today.
The Hangout is our open space to hear what's going on in the world of data across different industries, chat about data science leadership, and connect with others who are facing similar things as you. So we get together here every Thursday at the same time, same place. So if you are watching this recording on YouTube at some time in the future and want to join us live, there's details to add it to your calendar below.
Is this anybody's first Data Science Hangout today? Say hi in the chat so we can all welcome you in and say hello too. We're all dedicated to keeping this a friendly and welcoming space for everyone and love hearing from you no matter your years of experience, titles, industry, or languages that you work in.
It is totally okay to just listen in here too, and you can be a part of the party that happens in the Zoom chat, but there are also three ways you can jump in and ask questions or share your own perspective. First, you can raise your hand on Zoom; I'll keep an eye out and call on you to jump in. Second, you can put questions in the Zoom chat, and if it's something you want me to read instead, just put a little star next to it. And third, we have a Slido link where you can ask questions anonymously; Isabella will share that link in the chat here in just a second.
One more thing. Oh, I guess I have two more things. We do have a LinkedIn group for the Hangouts if you would like to join. I know those groups aren't always the best for discussion, but it can make it easier to find each other after the fact and connect.
And the other quick note I wanted to share is registration just opened for the POSIT conference in August, and I know there's super early bird pricing for I think just three weeks. So I want to make sure that I let everybody know that here.
Well, with all that, welcome again. I am so excited to be joined by my co-host today, Kshira Saagar, Senior Director of Data Science and Analytics, International at Wolt and DoorDash. And Kshira, to kick us off here, would you be able to introduce yourself, share a little bit about your role, and also let us know something you like to do outside of work?
Well, thank you so much for having me, Rachel. I'm really excited to come in and share what little I know. The first thing I'll say before I get off the mark is that I tend to speak very fast. I tend to speak 245 words a minute, which somebody has measured for me, thankfully. So if you feel like I'm going too fast, I'm always happy to pause so that everything I say doesn't sound like gobbledygook.
The point is there's no filter between my mind and my mouth, so I tend to say a lot of things and then think, should I have said that? But yeah, hopefully it stays in a safe space.
For everyone I haven't met, my name is Kshira. I take care of analytics and data science for Wolt, which is part of DoorDash International, the international wing of DoorDash. I'm sure most of you know who DoorDash is and what we do; for those of you who don't, DoorDash is the largest food delivery and grocery delivery provider in the US. It has an international wing covering Europe, Canada, Australia, New Zealand, and other markets, and I take care of the analytics and data science team for those markets.
But to be very practical, what does that mean? The team and I act as decision enablers. The way analytics is structured at DoorDash, and also at Wolt, we are not consulted after the fact, which has always been my bugbear. It's not a team that gets told after everything's done, "by the way, we finished doing this thing." It's the other way around: the way the teams are set up from scratch, we are part of the conversation on whether to build something new. So the big theme for us is being true partners and having a seat at the decision-making table.
And that's typically my calling card when I try to explain to people what I do, what's different from other analytics roles I've had before, and why I really enjoy it. Here you get a seat at the table, which is quite hard to get in other places.
What do I like to do outside work? I used to love running. I used to do a lot of marathons and half marathons. I haven't done that in a while, but that's what I like to do outside work. It clears my mind; I think of it as a Ctrl+C and Ctrl+Delete for my brain. When I run, I try to process everything and throw it out.
Getting analytics a seat at the table
I love that you just mentioned making sure that analytics has a seat at the table, right from the start there. Could you tell us a little bit more about really what that means and how you make that happen?
So in essence, what that means is three things. One, we make sure that everything we build as a business, not just the product team or the ops team or whichever team, all ladders up to what the business eventually wants to achieve. We are the truth seekers who keep everybody honest and accountable that these are the right things to build for the business.
Number two, we are the team that keeps people accountable for the fact that we can all think something is a great idea, but somebody has to run an experiment to figure out if it's the right idea, if it actually works. Anybody can have an idea; measuring whether it actually works, and keeping everyone honest and accountable in an unbiased manner, is the other thing we do.
But the third and most important part that we do is the fact that we try to bring teams together and connect the dots. So the problem with having a much bigger business is everybody's focused on their part of the business. And this word that gets thrown around with the silo. So nobody connects the dots. I like to think of my role and my team's role as being the people who help connect the dots.
So my favorite character growing up from Sherlock Holmes was not Sherlock Holmes himself, who seemed to always solve the problems. For those of you who know, Sherlock's brother, Mycroft Holmes, was the person who could actually connect all the dots: sit in a room, see everything that's going on, and understand why something actually happened. Being able to see the second- and third-order effects and connect the dots. So those are the three things that we do.
And that's what we mean by being true partners: we are genuinely helpful to the business in making sure that what we do is the right thing to do.
Example use cases at DoorDash
Sure, quite a few things come to mind. One thing I can talk about: when you shop for something on DoorDash, you see a carousel that shows you what you need to see. If you let pure personalization alone decide what gets shown on the carousel, the challenge is there's a very delicate balance between completely personalizing it and, I don't want to name big brands that are not ours, going the music-subscription-platform route, where what you see slowly becomes completely irrelevant because it's super personalized.
So every time you open the app and see a list of restaurants or stores to shop from, the team has actually thought about a solution that can solve for both: it can augment the personalized side of things and also provide the operators the right set of inputs and parameters. We come up with heuristics: if you show people X carousels, Y times, with Z restaurants, there's some percent probability they convert. So we come up with these heuristics and magic numbers that can then be plugged into the personalization platform as inputs from the operator side.
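As a rough illustration of how a heuristic like that might be derived from historical data (the function names, thresholds, and numbers below are invented for this sketch, not DoorDash's actual method):

```python
from collections import defaultdict

def carousel_conversion_heuristic(sessions):
    """Estimate conversion rate as a function of how many carousels a
    user was shown, to derive an operator-facing "magic number".

    sessions: iterable of (n_carousels_shown, converted) pairs.
    Returns {n_carousels: conversion_rate}.
    """
    shown = defaultdict(int)
    converted = defaultdict(int)
    for n, did_convert in sessions:
        shown[n] += 1
        converted[n] += int(did_convert)
    return {n: converted[n] / shown[n] for n in shown}

def pick_carousel_cap(rates, min_lift=0.001):
    """Choose the smallest carousel count beyond which showing more
    carousels stops improving conversion by at least `min_lift`."""
    counts = sorted(rates)
    for prev, nxt in zip(counts, counts[1:]):
        if rates[nxt] - rates[prev] < min_lift:
            return prev
    return counts[-1]
```

The cap that falls out of a computation like this is the kind of plain number an operator can plug into a personalization platform as a constraint.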
The team works on all aspects of the business. There are three sides to it: the consumers, people like you and I who shop on DoorDash; the merchants, the restaurants and retail partners who sell on DoorDash; and the Dasher partners, the people who actually deliver. We touch the lives of all of these people in different ways, be it what they buy, where they buy from, and how it gets delivered.
What kind of things do we try to tackle with data? Like I said, the essence of what we do starts from the realization that a blanket approach does not work. If you want to come up with a pricing mechanism, having the same pricing mechanism across all countries does not work. And trust me when I say the opposite, extreme personalization, also does not work, because it fails for the edge cases.
So the kind of problems we try to solve is: how do you take something that's extremely personalized and apply the human element to it by giving the business the right levers to tweak? A lot of what we do is experimentation-driven work, which gives us the relationship between a set of input variables and output variables. If, say, conversion is your output variable, what is its relationship to all the input variables? We run multiple studies, short term, long term, and long-term holdouts, to understand, if a one percent improvement in conversion happens, what it means long term for retention, et cetera.
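The lever-to-metric relationship described here can be sketched with a simple one-variable least-squares fit over experiment cells. This is an illustrative toy, not the team's actual methodology, and the discount/conversion numbers are made up:

```python
def fit_lever_response(x, y):
    """Ordinary least squares for one input lever x (say, discount level
    per experiment cell) against an output metric y (say, conversion
    rate): fits y ~ a + b*x and returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

# Hypothetical experiment cells: discount level vs. observed conversion.
a, b = fit_lever_response([0, 5, 10, 15], [0.10, 0.12, 0.14, 0.16])
```

In practice a slope like `b` is only trusted once it is backed by randomized experiments and long-term holdouts, as described above, rather than by a curve fit alone.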
Team structure: data scientists vs. machine learning engineers
We have a separate machine learning department; we call them DSML, or machine learning engineers. They focus on building the software at scale, because these are applications that touch literally millions and millions of lives on any given day. We want to build it at scale and for it to work at scale.
Before we get to that stage, the data scientists, the analytics side, get involved: trying to understand what the right features are to think about, what we should optimize for, and what kind of business metrics it can move. We eventually want to plug it back, like I said, into the whole conversation of, yes, we can build this amazing algorithm, but what will it move for the business? We start from there, and then we think of it as us running the first leg of the relay: building it, seeing what works, and then handing it off to the machine learning engineers, who build it at scale, optimize it, and make it better. So that's how we like to think about it: it's a partnership.
Building an analytics culture: DoorDash vs. Wolt
So the honest answer is, when I came to DoorDash, it was already established. The people who established it made sure that was part of the secret of why DoorDash grew the way it did. What we're trying to do now, though, is that DoorDash has acquired Wolt and I've moved from DoorDash to Wolt, so we're trying to bring the same culture here.
And to your point, in the past I've worked in other companies where you could have the role, you could have the title, but you're consulted after the fact, where you can't effectively change things; you can only measure them. I was mostly the measurer rather than the influencer. Anybody can measure things. But having a say, being able to say, yep, don't do this, or do this because it makes sense, gives you a totally different responsibility and skin in the game. And that makes it all the more fun.
At any given point in time, my understanding is we have three to five thousand different experiments going on. If you accumulate that over multiple weeks and months, you're talking about tens of thousands of experiments. The point is not to experiment ourselves into irrelevance; it's about trying to understand truly what works and what doesn't.
The other side of the party also needs to come to the dance and actually dance. The business comes to us and says, we think this is an amazing idea, but we want to be really sure it's the right thing to put our money on; can we experiment with this? We experiment, we see what the increment is. If we get X percent more, let's go all in; if it doesn't work, let's try something else. I think that's a fun position to be in.
What we do have, like I said, is machine learning models at scale. The last number I remember from a publicly released document is around a billion different predictions made on a given day, because we have all the personalization systems: all the Dashers who deliver food, all the merchants who show up on the app. Everything is personalized at scale.
Data standardization and spatial analysis
So, unfortunately or fortunately, this data has never been standardized. Anybody who's worked in any kind of retail knows that, because all of this is manually created. Imagine how a restaurant sets up a menu: it's a human being at a restaurant setting up a menu for their own restaurant. There is no standard way to get it sorted out; even the biggest chains don't agree on what the data should look like. So we do a lot of work in-house to come up with our own taxonomies.
Retention, reactivation, and experimentation
Talking about the people who get reactivated, we call them reactivated users. The way we look at them is we understand that not everyone needs to use the platform the same way as everyone else. Some people are going to come back once in a while, go away, then open up the app again and come back again. We do understand there is a need and potential for people to behave that way, so we don't treat them any differently from how we treat others.
What we do want to understand is why it happens more than expected. We have a threshold for how much of it is acceptable; if it goes beyond that level, we want to understand why. But we accept that a certain share of people will behave that way. That's normal market behavior: people are going to open the app once a year or once in six months. It's only when it gets outside that threshold that we start looking into it.
On the other question, logged in versus logged out: most of our experimentation is on logged-in users, because you can't buy without logging in. So more often than not, anything conversion-related is logged in. For things that are just about app usage and app features, logged-out people, unless they have some other identifier that can tie them back, probably won't be considered in the exposure, because the experiment won't load for those people.
Workflow differences between DoorDash and Wolt
In terms of the workflow, given we're in a similar industry, almost the same company, the approaches to solving the problem are the same. It's just where we got involved and how we started solving the problem that differ. Like I said, the workflow at DoorDash is more like: we think about the problem, everyone comes together, there's a deep dive first, then it goes into running a lot of smaller experiments. Only after the experiments are done do we actually start building the feature, then roll the feature out, et cetera.
At Wolt, it's slightly different: we first build a feature, then we say, okay, this was the right feature to build, and then we come back and see whether it worked or not, and do a deep dive: why did it not work, what would we have done differently? There's a sunk cost fallacy in doing things that way, because once you've spent time building something, you'll probably always try to find some way to make it work rather than asking if it was the right thing to build.
So I probably won't compare them too much. What I would say is the benefit of the DoorDash approach is not wasting time and effort on things that don't work; the loss is that a lot of ideas that might not look good on paper can end up doing pretty well, because we can never say we know everything. If we knew everything and could figure out exactly how things would work, we would never come up with amazing ideas. So there are pros and cons to both techniques.
Making an impact on a small team
This is something I often get asked. I tell them: if the work that we do as an analytics team or data team or data science team doesn't directly change how the business works, be it the smallest metric or the most important metric, then it becomes more of a passion project. I've seen a lot of times where people come up with a great idea and then try to fit the problem to their solution, rather than asking what the company needs and working backwards from there.
Typically, that's what I've seen work, and that's when we get credibility and a seat at the table: because you're actually part of running the business, not just measuring the business.
Some businesses run with OKRs. I don't want to generalize, because not every business runs that way. But if you have OKRs, then make sure that everything you work on hits the capital-O OKRs, the top-level ones. If your business has three things it wants to achieve, its missions, are you tying your work to those missions exactly, and can you quantify it?
Fraud detection and tooling
I probably can't talk about the exact techniques as such, but the team does look into chargebacks, trying to find fraudulent refunds and people exhibiting fraudulent behavior. It's a combination of tooling. We have standardized tools that can flag this, so we don't do it directly; what we do is feed the tool. It's a graph database at the end of the day. We provide the features and the relationship entities it needs, and then it keeps flagging for us.
What the team actively works on is: are these the right rules to have, and are we losing a lot of revenue from these rules? Instead of identifying individual consumers, we try to come up with policies: people who look like this and do this, maybe stop their transaction. That's easy to input into the system. Like I said, the system is smart, but it needs more and more policies, and we come up with those policies based on what we see.
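A minimal sketch of the rule-as-policy idea: every policy name, field, and threshold below is made up for illustration; the real system described is a graph-database tool that policies like these would feed into.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Policy:
    """A human-readable fraud policy: 'people who look like X and do Y'."""
    name: str
    matches: Callable[[dict], bool]

# Hypothetical policies an analyst might propose from observed patterns.
POLICIES = [
    Policy("rapid_refunds",
           lambda txn: txn["refunds_last_30d"] >= 3
                       and txn["account_age_days"] < 14),
    Policy("chargeback_history",
           lambda txn: txn["chargebacks_last_90d"] >= 1
                       and txn["order_value"] > 100),
]

def flag_transaction(txn, policies=POLICIES):
    """Return the names of all policies a transaction trips; empty = pass."""
    return [p.name for p in policies if p.matches(txn)]
```

The appeal of encoding rules this way is that each policy stays legible to a human reviewer, which matches the "come up with policies, not consumer lists" framing above.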
Very simple: we use a lot of SQL at the end of the day; it's all about the data we have. Then a lot of analytical time, trying to understand why something happened, is spent in R and Python. People are very free to use whatever notebooks or workspaces they want.
When you start building models, we can build and deploy them ourselves. We have our own machine learning platform in-house, which can take a Python package and turn it into an endpoint that can start feeding anybody who queries it with a recommendation output. To visualize, we have multiple visualization platforms, as you can imagine any big business does.
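A bare-bones sketch of what "take a Python package and make it an endpoint" can look like, using only the standard library. The model here is a trivial stand-in, and none of this reflects DoorDash's actual platform:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def load_model():
    """Stand-in for loading a trained model artifact; a trivial scoring
    rule here so the sketch is runnable end to end."""
    return lambda features: {"score": 0.5 + 0.1 * features.get("orders_last_week", 0)}

MODEL = load_model()

class PredictHandler(BaseHTTPRequestHandler):
    """POST a JSON feature dict, get back a JSON prediction."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)
        payload = json.dumps(MODEL(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

def serve(port=8080):
    """Block forever serving predictions on the given port."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

A real platform would add model versioning, batching, and monitoring around this same request/response core.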
And the beauty is that everybody with a doordash.com or wolt.com email has access to almost all the dashboards, can play with almost all the workbenches, and can tweak the queries. We've built it intentionally so that nobody has to feel limited because they don't work on the analytics team. As long as they know how to run a query, everybody can run one and see what they need to see.
Open sourcing and the DoorDash blog
I think it's good on multiple fronts. One is for the team to talk about the great work they do. What we've always gotten from the blogs is that people who are interested in those kinds of things come to us and say, this part is good, this could be better. There's no better way to get feedback on what you do than by sharing it widely rather than keeping it closed.
So if you want to know more about our machine learning platform, Sibyl, or our data science platform, or how we run experimentation on our experimentation platform, Curie, which is also built completely from the ground up, you can search for Curie and DoorDash and find a blog post that talks a lot about it. We've had people reach out with product feature ideas, recommendations, and improvements on how we can experiment, based on those posts.
Surprising insights from data
I can talk about a very unique use case, something I often bring up. In the US, as most of you there will realize, if something is late, you get pissed off. Don't be late; be on time or be early. If food is supposed to arrive in 35 minutes, you want it at 34. Thirty-six is fine, but it had better not be 40 minutes; you really hate that.
So all our algorithms and models are set to deliver things earlier than what we promised. That's the insight; that's a common thing. What we don't always realize is that the world is a very diverse place and each culture is very unique. When we tried to build something similar and roll it out in Japan, we realized that people were really pissed off not by lateness but by earliness. The earlier we came, the fewer consumers came back.
We were quite surprised. Consumers were really pissed off because they wanted us to be on time, or even slightly later, but never early, because of who these people are and how they shop. They ordered so the food would arrive when they got home, and they didn't want it left outside. That's the kind of thing where, when you assume something is true and try to apply it across 29 or 30 countries, you get caught out. So then we start bringing in the lens of, are we doing the right thing by this country? Going to that level of country-level detail is very interesting.
Hypothesis testing and data quality
Like I mentioned, maybe I ran through it quickly: because we only roll out experiments and experimental features to people who are logged in, we are able to identify who these people are in some way. For all our features, we don't know their age, name, gender, or any of that; we just know what kinds of behaviors they exhibit, and so we're able to identify them as an individual unit.
Those kinds of audiences are naturally excluded from the experimental sample, because we don't know anything about them, or we might get erroneous data because we can't track them properly. In other cases we have bias reduction and bias identification techniques built in, and the whole experiment pipeline is automated. That's how we do it.
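One common bias-identification check of the kind mentioned here is a sample ratio mismatch (SRM) test: verify that the observed split between experiment arms matches the intended allocation, since a skewed split often means some audience (like logged-out users) is silently dropping out of one arm. A stdlib-only sketch, not the platform's actual implementation:

```python
import math

def sample_ratio_mismatch(n_control, n_treatment,
                          expected_ratio=0.5, alpha=0.001):
    """Two-sided z-test that the observed treatment share matches the
    intended allocation. Returns (is_mismatch, p_value); a significant
    result suggests biased exposure and an untrustworthy experiment."""
    n = n_control + n_treatment
    observed = n_treatment / n
    se = math.sqrt(expected_ratio * (1 - expected_ratio) / n)
    z = (observed - expected_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p_value < alpha, p_value
```

A very small `alpha` is conventional here because SRM checks run on every experiment, and a flagged experiment is typically discarded rather than analyzed.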
Collaboration between data science and the business
Like I said, we are driven by the business to actually do things, so the fundamental reason we exist is pretty clear to us: to make sure we are investing in the right things, putting money in the right places, and going after the right targets. And the only way to learn that is by experimenting.
The one thing I really love is how our CEO talks about it. We read these documents where people say, we ran this experiment and it failed, and we feel bad about it. And his comment would be: that's the whole point of experiments, to fail and to learn things. If you didn't fail, you would never learn. That culture is built into the business.
So what happens is the business comes to us and says, we want to go to a new market, or build this new product or feature, whatever it is. Let's do it in an incremental fashion, measure along the way, and make sure it's the right thing to do. If it's useless, let's throw it out the window; if it's great, let's evolve it.
Tools and decision-making culture
We don't use a lot of Microsoft products as such; there's nobody who doesn't use Excel, but we don't use much else. All our presentations are on Google Slides. We have a strong document culture: everything is written down. Anything you need to make a decision on, be it an experimental feature or a business decision, is written down, and people comment on the doc and weigh in on whether it's the right thing to do.
Um, and we use, uh, Looker from Google, which is the visualization platform to share dashboards and stuff like that. But dashboards are best dashboards. You know, the decisions are made on a document, uh, be it investing money in a place. We're shutting down a feature or being that pickup feature. Everyone writes up an experimental doc, um, talk about what the feedback has been or what the output has been. And then people ask questions there. And then the decision is made there in the document on what that's what we do.
Handling ambiguous data signals
Data quality and lineage is a massive thing for us. We spend a lot of time and effort understanding how we track something, where it's tracked from, and what the coverage is. We have monitors for it, so we know at any point in time what the quality of our data is. And because, like I said, we're extremely obsessed with metrics, even if something moves by a couple of basis points, you know immediately whether it's a movement due to us, due to the market, or due to the data being wrong.
For external data we don't have control over, for example market share data, or app download data from download partners and the like, the only way we give it credibility is by triangulating it: against one other external source, and against our in-house internal data, trying to find a proxy and seeing if it all makes sense. We never trust anything until we can triangulate it and know for sure that it makes sense.
Career advice: being vocal about your work
So the one thing I've heard, and this is probably tangential to what we're talking about, is that if we don't talk about the work we do, nobody knows what it is. I've taken that to heart. A lot of times, and this is something I tell my team, you do some amazing work, but you don't talk to anyone about it, because you assume people will automatically find out about your work, come to understand it, and then appreciate you for it. That doesn't happen.
At scale especially, if you have a great idea, you have to be vocal about it: talk about why it makes sense, what the great project is you've done, and how it impacts the business. If you just worry that nobody appreciates what you do, nothing happens for you; passive credibility never happens. You need to take an active role in it. That's what I was advised, I've taken it to heart, and that's what I tell my team.
So we create forums for the team, where the teams can come share their ideas and the things they've done. We push our teams to be vocal about things they find are wrong: if they see something, we ask them to say something and fix it. So there's a lot of effort put into being vocal about the work, especially in the data and analytics space, where everyone tends to keep to themselves and their stuff.
One thing we recognize is that the work the team does is incredibly technical and complicated. So instead of boring everyone with "we did the A/B test and we saw this," it's "if you do this, you get X back." Translating that, translating math into English, is what I do, and I try to teach the team: how do you translate your math into English?
Keeping up with the latest trends
The one thing I take as a true advantage of working with amazing people is that everyone's trying to push themselves constantly. People take an active interest in one particular topic and then take it upon themselves to either run a forum, bring a topic, or work with academia. For example, take experimentation: we're extremely obsessive about experimentation here, and there's only so much you can do with standard experimentation. After a point, you need to understand how to improve it with better variance reduction techniques.
So what the team does is go away and work with professors from academia who are also researching the topic, bring them in, and have them talk to us about a particular area everyone's solving. The team takes extreme interest in these topics, and each person has a passion project. Causal inference is a massive thing here, because everyone likes to know why something happens; we all live for that. There's a group of people who, every fortnight, bring in an interesting causal inference topic, either their own or from outside.
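One widely used variance reduction technique of the kind alluded to above is CUPED (controlled experiments using pre-experiment data): subtract the part of the experiment metric explained by a pre-experiment covariate, which shrinks variance without biasing the treatment effect. A minimal sketch, assuming a covariate such as each user's pre-experiment order count:

```python
def cuped_adjust(metric, covariate):
    """CUPED adjustment: remove the component of the experiment metric
    that is linearly explained by a pre-experiment covariate.
    Returns the adjusted per-user metric, same mean, lower variance."""
    n = len(metric)
    my = sum(metric) / n
    mx = sum(covariate) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(covariate, metric)) / n
    var = sum((x - mx) ** 2 for x in covariate) / n
    theta = cov / var
    return [y - theta * (x - mx) for x, y in zip(covariate, metric)]
```

Because the covariate is measured before the experiment starts, it is independent of treatment assignment, so the adjustment tightens confidence intervals (or shortens experiments) without moving the estimated effect.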
Book recommendation
The one book I typically recommend is Invisible Women. The reason I recommend it is that it talks about something real in our lives and uses data to show how we often miss the other 50% of people in our world, and how the world is not designed for them. I often use it as an example of how we make the wrong decision for the assumed majority, and therefore the wrong decision for the business. It's a book about women's empowerment that everybody should read, but it's also about how you can use data to actually make a case for making the world a better place.
Community and peer learning within the team
So definitely. The beauty is, if you ask somebody a question, you don't get a one-word answer. More often than not, you get: this is how I looked at it, this is how I solved it, here are five things you can look at. You get the whole thing. So people know that when they ask for help, they will get help.
And I think we've definitely built a great community of folks here, be it within DoorDash or between DoorDash and Wolt. If somebody from Wolt reaches out and says, we're thinking of building this thing, what did you do? They won't get a one-word or one-sentence answer; they'll pretty much get a six-paragraph answer about everything, with all the documentation filled in.
Does it happen instantly? Maybe not, but when it does happen, you get all the help you need. So that's been good. People do fall back on each other for peer reviews, to review their work, to get critiques of the things they do, or to get a third set of eyes on it.
One thing we do is have artifacts that people share once every six to eight weeks about what they're working on. Like I said, it's a forum for every small team of eight to ten people to come share what they're working on. That really helps anyone else who wants to take an interest to come and read. As leaders in the business, we read almost all the teams' write-ups once every six to eight weeks, and get to ask questions and engage with them.
Thank you so much, Kshira, for taking the time to join us today and sharing your insights and experience. And thank you all for the great questions.