Data Science Hangout | Jarus Singh, Pandora | Human in the Loop
The Data Science Hangout is a weekly, free-to-join open conversation for current and aspiring data science leaders. An accomplished leader in the space joins us each week and answers whatever questions the audience may have. We were recently joined by Jarus Singh, Director, Quantitative Analytics at Pandora.

A few key snippets from our conversation:

01:28 - Start of session
6:33 - Human in the loop
14:14 - Working with stakeholders and teaching communication
25:47 - What does your tech environment look like?
28:46 - Presenting work; pretty vs. impactful
33:18 - Skills for data science leadership
43:01 - Having data science rolling up to the CFO
49:00 - Applying our hobbies to work: cooking
51:41 - Getting motivated by personal projects
1:01:25 - Going with your passion

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
► Add the Data Science Hangout to your calendar: https://www.addevent.com/event/Qv9211919

Follow Us Here:
Website: https://www.rstudio.com
LinkedIn: https://www.linkedin.com/company/rstu...
Twitter: https://twitter.com/rstudio
Transcript
This transcript was generated automatically and may contain errors.
Thank you, everybody, for joining, and I know we just hit the top of the hour, so welcome to the Data Science Hangout. I know it's a bunch of familiar faces here, but anybody who's joining for the first time, welcome. This is an open space for the whole data science community, current and aspiring data science leaders, to really connect and chat about some of the more human-centric questions around data science leadership. We really want to create spaces where everybody can participate and hear from everyone. So you can jump in live and ask questions, you can put questions in the chat. And we also have a Slido link where you can ask anonymous questions too.
But with that, I'm so excited to be joined by my co-host for today, Jarus Singh. Jarus is a Director of Quantitative Analytics at Pandora. And Jarus, I'd love to turn it over to you and have you introduce yourself and maybe start off by sharing a bit about your team and the work that you do.
Yeah, thanks, Rachel. Thanks for having me today. Really excited for this. So I lead a team of three. We're quantitative analysts. And what's a little unusual is that we're embedded in a finance team. And as a quick aside, when I first joined, I was very skeptical of being on a data team sort of embedded in the finance org, just because they're not historically known for, I think, putting as much emphasis on data or the right emphasis on data relative to sort of other orgs that may have more of a history of doing so. But it's actually worked out really, really well for us.
So, you know, the way I view the team is that we have three sort of pillars that we're responsible for. One is reporting. So what happened? And this is all the metrics that Pandora cares about that inform its financial model. So it would be users. It would be how much people are listening. What platforms are they listening on? What are the demographics of listeners? All of these have impacts on what we pay for spinning the tracks or, you know, the revenue that we can make against people when they're listening on the free tier from advertisements or, you know, the propensity to join a trial and then eventually subscribe. So the first pillar is reporting. Second is forecasting. What does the future hold? We do that using a lot of R, and we try to rely on algorithmic and automated techniques as much as possible, but realize that they can't do everything for you. So there are times where you need sort of human-in-the-loop style modeling, where someone has to look at it and say, OK, I'm getting information from this team saying they're going to change their strategy, and we need to make sure that we're not relying entirely on an algorithm that is not aware of that change that's coming. And then the last pillar, I'd say, is data-driven decision making. So we have a wealth of information about our users. If we're going to go out and do marketing or special deals with, you know, it could be a telecom company or a device manufacturer, we want to understand the most we can about the sort of users they're going to bring in or the deal terms, so we can use our knowledge of Pandora's data to come up with a very refined estimate of what we think would happen under those deals.
Yeah. So my team is mostly dealing with business metrics, I would say. So there's not a lot of stuff we've done where I'd be like, oh, I worked on that. It's more like every time you go in and engage with the app, we're trying to predict how much we think you're going to use it and sort of the potential you have towards the business. LTV (lifetime value) is important as well. But yeah, you know, we're not selecting the next song.
Excitement for data science in 2022
Yeah, I guess generally in the space, what I'm seeing from some data groups that I'm in and just from reading blog posts is that data folks, I think, know they have a lot to bring to the table, but they're not always in the room where decisions are being made, and they're not always able to inform those decisions in the way that I think they can and should. And what I see is the space sort of maturing in that respect. I personally believe that data folks are going to get more ownership over things that should be in their purview but aren't necessarily at all companies. Some companies are doing this very well, where they're using data where data can help. But I think not all companies are at that stage of maturity. So I'm expecting more maturity from the space, where people are getting more sophisticated about using data where applicable.
And to throw out a company example that I've heard really great things about in the past, from seeing Hilary Parker talk when she was there, it would be Stitch Fix. Right. So the way she kind of presented it is that anything that can be informed by the data scientist skill set as well as or better than leaving it as the decision of someone who doesn't have that skill set gets put in the data team's hands. So I thought that was a really cool way of framing it. And I think more companies are going to be moving in that direction.
Human in the loop
Yeah, of course. So human in the loop is sort of this framework that says you can rely on an automated system or algorithms to generate some sort of result, but ultimately a human has to be in the loop somewhere at some stage, right, verifying the output, changing it as necessary, and basically able to improve on what you'd get from a computer alone or a human alone. So I can give some examples of that.
So the first one I'll give is from our world, right, where we're using a lot of time series techniques to forecast the future. And the time series technique, if it's univariate, will just look at historical data. How has this been trending? What is expected seasonality? And we'll extrapolate that forward. So that makes sense when the future state of the world is structurally similar to what we've experienced in the past. But if we were to make a change and it's an anticipated change, that's where the human has to get involved. So an example I'll give you is we were growing a certain platform on the service, you know, pretty quickly over time, and it looked like it was a predictable series. But we tried to make some changes to the user experience there that we thought would actually dampen growth. They would have an adverse effect on growth going forward. Now, the model has no idea that that's happening, right? There was no way for us to pass that in. So we created sort of what we would expect to happen under the old regime, and then we layered in assumptions we got about the new one. And sorry for being vague, but I probably shouldn't go into too much more detail. So that's where it's like you're accounting for the fact that the algorithm doesn't know this change is happening in the future, and you use an alternate set of data and assumptions to make that tweak.
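To make that concrete, here's a minimal sketch of that kind of layered adjustment in R, using the forecast package the team mentions later in the conversation. The series, horizon, and dampening factors are all invented for illustration; in practice the assumptions would come from the team planning the change.

```r
library(forecast)

# Hypothetical monthly listener-hours series; all numbers are made up.
hours <- ts(c(110, 112, 118, 121, 125, 131, 134, 140, 143, 149, 152, 158,
              160, 163, 171, 175, 180, 187),
            frequency = 12, start = c(2019, 1))

# Baseline: a univariate model that only knows historical trend and seasonality.
fit      <- auto.arima(hours)
baseline <- forecast(fit, h = 6)

# Human in the loop: we know an upcoming UX change is expected to dampen
# growth, so we layer an assumed haircut on top of the algorithmic forecast.
# The ramp down to a 5% haircut is an illustrative assumption, not a fitted value.
dampening <- c(1.00, 1.00, 0.98, 0.96, 0.95, 0.95)
adjusted  <- baseline$mean * dampening

cbind(algorithmic = baseline$mean, human_adjusted = adjusted)
```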
A really cool Stitch Fix example I heard is when they're actually selecting clothes for folks. So there's a lot of feedback that people can give on prior sets of clothes they've gotten, which a computer may or may not be good at really understanding. So if the NLP side of their business isn't fantastic, it may not realize, OK, this person's OK getting pants, and they still want some, but they don't want them every time. And a human can do a good job of saying, OK, maybe I should send pants in every other kit or every third or something. So it's an area in which you would not expect whatever algorithms you have to perform well. A human has to intervene and kind of adjust what you would get, and you get a better sort of outcome with that.
In terms of resources, I'm not sure if it's addressed in this book, but it's one of our favorites. It's by Rob Hyndman and it's on forecasting. If you go to OTexts.com, it's actually available for free. It's Forecasting: Principles and Practice. That's what my team relies on for a lot of the work that we've done.
Getting into data science and hiring
Yeah, that's a good question. I guess I would say the journey began in college, studying economics and mathematics. And it's funny, you see a lot of data science concepts covered then, but they weren't branded as data science in school. So I feel like I was picking up the techniques there. And my first role was in economic consulting, which is a very esoteric area, but it was in the litigation space: companies suing companies, companies suing governments, governments suing companies, and trying to understand what were the actual damages associated with wrongdoing, and was there wrongdoing? And you could rely on econometric analysis to do that.
And I think where I made the leap into data science was while I was doing that work. Where I got most interested and motivated was when I started learning how to write functions. I think that's where it really clicked for me: you could write a piece of code that makes your ability to do this in the future much, much easier, and I can make my life easier. And I started to talk to friends from my network who were going into data science and realized that is a space in which that sort of attitude gets rewarded. So it's like, OK, I'm interested in this. I'm excited by this. Let's start to find jobs that reward that, which I think is broadly the data science space. Right. So I made the jump into tech and started working on picking up the skills.
So I'm going to answer specifically for my team at Pandora, what we do, and hopefully this is helpful. Don't necessarily extrapolate this to all of the data science ecosystem. It's obviously very broad and people look for different things. But what I look for at the technical level is somebody has to be, maybe facile is the best word. They have to be very comfortable with, I think, the common coding concepts that are going to pop up on the job. And I want them to sort of proactively motivate those concepts. Right. I don't necessarily want to be the person that says, you should write a function here, right? The person should be comfortable enough with those concepts to motivate them in their day-to-day work.
On the personal side, what is tricky for us is that, you know, we end up working quite a bit with stakeholders, and very early on when I hire people, I may have them in the room with stakeholders, not alone, obviously; I'm there to guide. But the ideal is to get them to the point where they're communicating well and presenting well, where they can handle that themselves. And then they can also scope the sorts of questions or follow-up requests that they get from the stakeholders. So ideally, I want to get someone who says, OK, you know, I see you're asking for this, but if I understand your problem correctly, I'd rather do this. I think this is actually the thing that's going to solve it for you. So people who are able to do that. And what I've found is that you have folks who sometimes are very good at the technical side but not as good at the other sorts of skills that we're looking for, and vice versa. So when we do our interviewing, we kind of try to emphasize both.
Other beneficial personality traits I look for, in addition to communication, and this sort of comes out during the interview, include interest in the sorts of problems that we work on. I think it's really good to signal that during an interview. And what I've found is that folks who do that during the interview process, you know, tend to continue to do it on the job. I haven't seen very many good actors. So when they're like, oh, you know, do you all look at this data, or do you do that, or have you thought about this? Those folks tend to do very well in the role because they bring that sort of brainstorming mentality to it. And what I've found is they're able to solve things in ways that, you know, I wouldn't have thought of. Other team members wouldn't have thought of. It really does help out a lot.
Working with stakeholders and teaching communication
What I would recommend is to have a mentor at your company. It doesn't have to be your manager, but somebody who has those skills. Reach out to them and say, I would love to go over a presentation with you, or just invite them to your meeting. Sometimes that's a great way to cheat at getting feedback. For my part, I try to make myself available and give people feedback before important meetings, or sit in the meetings and guide them. And then I also give them the heads up that that's happening, because not everyone likes to be caught off guard when I say, OK, you know, let's move on to this topic because I think someone's interested. Or, can you explain that more? I don't think everyone got it. Stuff like that. And over time, I think that reinforcement really helps out, because eventually, kind of like training wheels, you won't need it anymore.
Data scientists, I think, on average, tend to overestimate how much everyone else knows about data science. So if you're wondering, am I over-explaining it or under-explaining it? Odds are you're under-explaining it. And again, depending on who's in the room, and we're thinking stakeholder management here, folks who generally don't have a background in the data side of things, I would err on the side of over-explanation and then always give them an out. Right. You say, if you already know this or I'm being redundant, I will skip the slide or skip this section and we can move on.
Pandemic effects on time series models
We did, yeah. The effect is much more muted now. But at the beginning of the pandemic, there was, you know, a huge change in the way in which people listen to Pandora that, of course, caused a lot of problems when extrapolated forward. So what I was kind of proud of was that our team has the knowledge of how our algorithms work. Not only do we implement them, but we understand how they work. And we were able to very quickly identify, look, the outputs of these are not going to make sense. And so we had to pull alternative models online, which I think we did within two weeks or so, which I'm pretty impressed by.
And I would say, if you're dealing with business-side data, you should consider that a part of your job, because, again, there are folks who place a lot of emphasis on, you know, can I put these things into production and can I run them? But you also need to make sure that they're resilient to things like this, or at least have a plan for how you would fix them. I realize not everyone wants to plan a possible pandemic into every month of the work that they do. But if you don't have some way of pivoting quickly, then in the eyes of the folks you work for, it's like, wow, that person's not really doing their job. And, you know, we pay them so much because they have all these skills. So in some ways, you want to be resilient to these sorts of changes.
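One simple way to build that kind of resilience is to monitor recent forecast errors and fall back to a model that only trusts recent history when they spike. Here's a rough sketch of the idea, again with the forecast package; the simulated series, the 2% "normal" error level, and the 3x trigger are all illustrative assumptions, not the team's actual setup.

```r
library(forecast)

set.seed(1)
# Stand-in daily series with a sudden level shift, like March 2020.
series <- ts(c(rnorm(60, mean = 100, sd = 3), rnorm(14, mean = 70, sd = 3)),
             frequency = 7)

# Score the long-history model's recent performance against what came in.
train      <- ts(series[1:60], frequency = 7)
fit        <- auto.arima(train)
predicted  <- forecast(fit, h = 14)$mean
actuals    <- series[61:74]
recent_ape <- abs(actuals - predicted) / actuals

# The 2% "normal" error level and the 3x trigger are illustrative assumptions.
if (mean(recent_ape) > 3 * 0.02) {
  # Structural break suspected: fall back to a model that only trusts
  # the most recent data instead of the full pre-disruption history.
  fc <- naive(series[61:74], h = 7)
} else {
  fc <- forecast(fit, h = 7)
}
fc
```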
So what I can tell you all about Pandora usage is that a lot of it, we found, is associated with routine. And so when routines get disrupted, you see large changes in how people listen. And we saw that happening in March with shutdowns and with panic buying. You know, if you're buying toilet paper, you may not remember to put on Pandora in your car when you're stressed out. So we did see that. And, yeah, accuracy is what we use. And it was just this seasonal kink that, you know, you would not expect but for a major disruption happening.
Tech environment at Pandora
Yeah, so we're on Google Cloud Platform for all of the raw and upstream data that my team relies on, and the data engineers get it there for us. So that's where we start working with Pandora data, and we'll do aggregations there. Some of that we have to handle ourselves; sometimes logic needs to be applied to classifying various platforms, and we don't want to deal with the laundry list that they have. So from there, we will get it into BigQuery, or a Postgres server we've just used for a long time and haven't migrated those jobs over to BigQuery. And then we'll connect to that with RStudio. So we're using either RStudio Server for the bigger projects that we have, or RStudio Connect if it's needed for a Shiny dashboard, and then sometimes a personal laptop, just when it doesn't really matter how much firepower you need.
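In R, that connection layer typically looks something like the sketch below, using DBI with the bigrquery and RPostgres backends. The project, host, table, and column names here are placeholders, not Pandora's actual infrastructure.

```r
library(DBI)

# BigQuery via the bigrquery backend; project and dataset names are placeholders.
bq <- dbConnect(
  bigrquery::bigquery(),
  project = "my-gcp-project",
  dataset = "listening_metrics"
)

# The long-standing Postgres server mentioned above; credentials from the env.
pg <- dbConnect(
  RPostgres::Postgres(),
  host     = "db.internal",
  dbname   = "metrics",
  user     = Sys.getenv("PG_USER"),
  password = Sys.getenv("PG_PASSWORD")
)

# Pull an aggregate down into R for modeling (hypothetical table and columns).
daily_hours <- dbGetQuery(bq, "
  SELECT date, platform, SUM(listening_hours) AS hours
  FROM daily_usage
  GROUP BY date, platform
")
```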
And then for visualization or the end product, you know, sometimes it's just dumping the data back into something like BigQuery or Postgres. Sometimes it's a CSV that other folks in finance can ingest and use in Excel, or something that can be uploaded to Anaplan, which is how we manage a lot of our financial reporting. Or a Shiny dashboard, which is what we sort of default to. And then we've inherited some work that's in Tableau, so we'll do that as well.
And the way I like to explain it to stakeholders who are used to working with folks doing a lot of manual reporting is I say it takes one click or zero clicks to send this report, to give them an idea. You know, you may have folks putting together stuff in Excel and pulling manually from different sources. We're able to create this whole pipeline or ecosystem for a lot of the problems that we do, where it requires one click or zero clicks. The one click is usually just to check to make sure everything makes sense before you send the report out.
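As a sketch of the distinction, assuming an R Markdown-based report (the file name here is hypothetical):

```r
# "Zero-click": a scheduler (cron, RStudio Connect, etc.) renders and
# distributes the report on its own, no human action required.
rmarkdown::render(
  "weekly_metrics_report.Rmd",
  output_file = sprintf("weekly_metrics_%s.html", Sys.Date())
)

# "One-click": an analyst opens the rendered output, sanity-checks the
# numbers, and only then hits send. That is the human check before it
# goes out the door.
```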
Presenting work: pretty vs. impactful
I would say the first thing is, it depends on who the executive is that you're presenting to. But by and large, at most companies, my experience has been that the executives who hire or green-light data science teams are not well versed in data science themselves. So I think a lot of the onus is on management and leadership in data science to explain the value to them. And, you know, a good executive is not going to try and pick you apart over how you're presenting stuff; they care ultimately about the bottom line, usually.
So, you know, better results should work, but I think it's the way in which you frame them. So there's the idea of making it pretty versus making it impactful. Pretty, you know, is what the i-bankers are good at, what the management consultants are good at, and execs are used to working with them and used to seeing things that way. But if you were to show a summary or output table of how much money you're saving the company through your optimization efforts, and that's in the millions of dollars, my hunch is that you don't really need to, you know, put that number in the same font as everything else in the deck and make sure it's standardized. At the end of the day, I'd say that's what matters.
A classic example is we had a model that had 80% accuracy, and I made it 84%. But without doing some work to explain what the value of that is to the business, you know, executives can sometimes be skeptical. Also, they may not be well versed in the problem space. So they might say, 80 to 84, that's not very good, right? Like that doesn't seem like you did a very good job, because they haven't been working at those problems as long as you have and don't understand sort of the hairiness of them. So I guess, more broadly than just dollars, I would say speak the language that they care about and are receptive to. If you're in people analytics, it might be what are the things that we've done that have resulted in employee retention, which, again, ultimately makes its way down to dollars. But make sure to couch it in what they care about and what they're expecting.
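The translation itself can be a back-of-the-envelope calculation. Here's a hypothetical version of the 80%-to-84% example in R; every figure is invented to show the framing, not taken from the conversation.

```r
# Illustrative only: all numbers below are made up.
volume         <- 2e6    # decisions the model touches per year
cost_per_error <- 0.75   # average dollar cost of one wrong decision

errors_at_80 <- volume * (1 - 0.80)   # 400,000 errors per year
errors_at_84 <- volume * (1 - 0.84)   # 320,000 errors per year

annual_savings <- (errors_at_80 - errors_at_84) * cost_per_error
annual_savings   # 80,000 fewer errors x $0.75 = $60,000 per year
```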
Because the way I like to view data science is, you know, at the end of the day, like they're hiring us because we have a skill set. It's a tool and it's a tool being applied to problems of the business. It doesn't matter how fancy your tool is if it's not solving the problems that the business has. So try to view it through their lens.
Skills for data science leadership
So to be a leader, I'm going to think about that piece particularly. So what is the difference between a good data scientist and a good data science leader? And this relates a little bit to what Bruno was asking. I sort of had this as a rude awakening during my career as a data scientist, right? A lot of emphasis is placed on, what are the tools and techniques that you have? You know, why are they better than doing it manually or doing it under older styles of, say, an Excel-based report or what have you? Why do we need you, and why do you do what you do? And so much emphasis was placed on that.
But less so, you know, on the communication; my manager was handling that. And certainly on project scope, right? It wasn't really my job to say, this project should entail this or it should do that. Where I see data science leadership, the skills that people who are doing well in that space have to develop, and may not be developing as an IC, are obviously communication and presentation, being able to understand the needs of their stakeholders, and then how data can or cannot solve those problems. I say can, obviously, because that's the job. But can't is also really important. Because if you have a stakeholder come to you and say, can you build a model that does XYZ? It's going to be great. And you say, yeah, I don't actually see how that's going to help the business. And you can sit down with them and explain why, or say, historically, we've tried to develop models here and we haven't had much success; there's nothing I can think of that I think would do better. So instead of just sitting down and going and doing that, you're the person who has some say around, I think this is more likely to be successful and actually advance the needs of the business.
And so, Rachel, that totally ties into what I was saying earlier. When you interview, day one or day zero or day negative one, depending on how you look at it, if you understand the business, I think it makes you much, much better able to do that than if you don't really know the inner workings of it. So the idea is proposing, here are the projects I think are most likely to succeed, and then getting people on board with that. Maybe this is a lazy analogy, but it almost feels VC-like, like you're a data VC in some ways. I'm betting on these three projects to deliver value for the business, and I need you to buy into them so I can have someone go out and do them and help us be most successful.
I haven't had a ton of experience with that, but I would say right now the state we're at for a lot of sort of data informed projects at a company is that human in the loop is a great way to go. And if a human is in the loop, then you're not really threatening someone's job. The idea is I think that you are advancing the value that they offer because you're giving them tools that they can use in order to do their job better. That is not always the case, so it's sort of dependent on how things are structured. You can't tell someone, oh, it's human in the loop and then cut them out entirely. But for a lot of things that we see, and I mentioned that example earlier where like we knew given the cultural climate and the global pandemic, people would be more interested in health and news related to that than history. We need smart, good marketers to say, okay, I'm going to let the algorithms pick certain things to put in front of people, but I need to reserve this space for something that I personally am willing to bet folks are really interested in or getting increased engagement.
In general, what happens with the progress we make on the machine side is that we kind of let people add more value because it's displacing sort of the easiest tasks. Again, this is on average, not always, but that's what I've been seeing, not just in my company, but at others as well. So I would say it's not so much threatening to your job as a whole, but it does mean that if you're somebody who's been doing a job without a machine learning model, you need to understand how do I work with this thing? Because, you know, it's not going to go away.
Advocating for data in decision making
So this is where being in finance has been hugely beneficial. And I mentioned I was skeptical when I first joined, because finance teams aren't generally associated with data prowess, no offense, finance teams. But being embedded in finance has helped with that hugely, because at the end of the day, the decisions that we are making are couched in those terms, terms that everybody cares about, obviously, the CFO and the CEO as well. And so once you've got that framing down, like, we're trying to understand if this proposed initiative will break even, or we want to understand the cost of missing our forecasts, like how bad is a 1% miss in accuracy? How bad is 2%? What are the downstream effects of making these mistakes? That has been hugely beneficial.
So I would say the advocacy that I had to do over time was not me trying to convince people of my team's worth. It was more just producing, hopefully, a constant drumbeat of wins, and then showing people, look, this is what we do and how we do it, and you need to trust us and involve us in the process. And we've had times where folks have tried to go around us and sell a decision without our involvement. And, you know, our leaders have always backed us up and said, no, you need to involve this group, or the finance team, or the group within the finance team, to make sure that that's the right decision.
Yeah, I guess while you all are thinking, I might add, I do expect more organizations to start embedding data people in the finance org, or in a hybrid org, right? So on one hand you have the data org, where we're decentralized. But if you have the hybrid org, a data org where everyone has different groups that they support, sort of dotted lines, with that ultimately reporting into the CFO, I think that's really powerful. And I've been seeing it more often just talking to folks in this space. And I think that's going to become more popular.
Finance basics for data scientists
I would say the basics will get you a long way. So revenue, cost, and then profit, which is sort of revenue minus cost. And then specifically, how does your company generate those revenues? How are those costs generated against those revenues? And then what are the ways in which your company is looking to sort of change over time or improve over time to improve those things? And then the last question that you can answer, right, is how can I use data to do that? And then that's you kind of doing the job.
But it is interesting how many folks you talk to with a very strong data skill set who, as I mentioned earlier, struggle with how to put a value on an accuracy improvement. They've built their whole careers around, I'm going to build models that can improve the accuracy over whatever historical thing you were using, and that's me doing a really good job. But in the eyes of the stakeholder, if there isn't someone to do the translation of, here's what those accuracy benefits mean in terms of dollars, and it doesn't have to be dollars today, it could be dollars tomorrow, it's really difficult to make the case of what it's worth.
Generalists vs. specialists and hobbies
My experience is generalists. But again, that's me as a leader saying I know that I am often unable to predict what the future needs of the team will be. And to account for that, you can't hire specialists, because if they're specialized in an area that gets de-emphasized or something, they're not going to be as good of a fit as someone who's a generalist. So my team hires generalists. Obviously, it's team dependent. But as I mentioned before, we don't expect you to be a 10 out of 10 in any specific thing, but you need to be like a seven or eight on a bunch of things, because you are going to get pulled in different directions and have to do different parts of projects.
Yes, to the first half. I haven't quite figured out how I want to apply it yet, but I think the applications are there. And I guess I'll share a cute example of me doing an explore-exploit algorithm manually. So, I love cooking. And I'm cooking kind of nonstop. And so, I do it every day. Even if I'm really tired, I find energy for it. So, it's a very sticky hobby that I find really rewarding.
And there are a lot of cooking analogies for the data science process. But for me, I think what has been most useful about understanding data, or maybe technology more broadly, is that there are all these things in the cooking process that are kind of tweakable. And I personally think the modern way in which people interact with recipes doesn't really account for that. And so, what I've been playing with on the side, and I don't know if this is a data thing or an app or something that I'm not smart enough to build, is how do you better represent that in a way that people who are either very good chefs or very good data people can better interact with? So, an example I'll give is I've tried a bunch of different ways of making the same recipe. But when you go to a website, you just see the one that someone did that they considered to be best. And you don't really see information about how many things they had to try to get there.
Because if you read comments on recipes, a policy I have is: read the most upvoted comment and do what that person says, always. Your default should always be to do what they say. Classically, it's you don't need that step, or that's too much salt. Do what they say and then adjust the recipe, because the recipe that gets posted on the blog is often not final. So, I like that a lot. And I'd love to see the ecosystem get somewhere where it's accounting for the fact that everybody's kind of experimenting. So, in two years, if I'm working on a startup in stealth, that's what I'm doing.
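For what it's worth, that manual explore-exploit habit maps neatly onto an epsilon-greedy bandit. A toy sketch in R, with invented recipe variants and ratings:

```r
set.seed(42)

# Epsilon-greedy over recipe variants: mostly cook the best-rated version
# ("exploit"), occasionally try another one ("explore"). Ratings are
# invented 1-10 scores.
variants <- c(roast = 8.5, braise = 7.0, grill = 9.0, sous_vide = 6.5)
epsilon  <- 0.2   # explore 20% of the time

pick_dinner <- function(ratings, eps) {
  if (runif(1) < eps) {
    sample(names(ratings), 1)    # explore: a random variant
  } else {
    names(which.max(ratings))    # exploit: the current favorite
  }
}

# Over 100 dinners, most are the favorite, with a steady trickle of experiments.
table(replicate(100, pick_dinner(variants, epsilon)))
```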
What I would recommend folks who are getting into this space do is pick a project you're motivated about. And most people are motivated by their own data. So, the two areas I like to point them to are health and personal finance. Again, you have to have some prior interest in them. But those are great ones, because on the one hand, you know, do I want to muck around with an iris data set, even though I don't really know what an iris looks like? Or do I want to figure out where I'm overspending and whether I can predict future spend? It's, I think, a little more fun and exciting and tangible.
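A starter version of that personal finance project might look like the sketch below, assuming a bank export with date and amount columns; the file name and structure are hypothetical.

```r
library(dplyr)
library(forecast)

# Hypothetical bank export with `date` and `amount` columns.
spend <- read.csv("transactions.csv") |>
  mutate(month = format(as.Date(date), "%Y-%m")) |>
  group_by(month) |>
  summarise(total = sum(amount))

# Turn monthly totals into a time series and predict the next three months.
monthly <- ts(spend$total, frequency = 12)
forecast(ets(monthly), h = 3)
```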
Chasing interests at work and closing thoughts
So, I have not been great at this myself. But the way I've seen people who do this well is, one, they do the core job and they do it well, and they understand what counts as doing a good job versus a great job versus an amazing job. And they're generally smart enough people to pick where on that spectrum they want to be. So these are usually very bright people. And they say, I could do an amazing job, but I'm just going to do a good job. And then I'm going to invest that remaining 20% of my time, or whatever it is, into stuff I care about. And I'm going to do it in a way that I can spin to my manager or leadership as, this has potential, we should look into it. And when you do that and you're doing a good job, I've found it rare for someone to say, no, I don't want you doing that. Right? Because you might leave, and now they have someone not doing a good job.
So I've found that that's the way to make it politically palatable: to say, look, I'm already doing all the stuff I need to do, but here's where I want to sink some extra time. And then you couch it in a way that is useful to them. So research and testing gets thrown around a lot. I want to research these techniques; I think they might be helpful. If you're doing a good job, it's usually easy to sell. If you're not doing a good job at your work, it's harder to say, well, I want to try this and I want to try that.
The forecast package is what we use for a lot of stuff. We've tried to implement Prophet as well, with mixed success. Again, we're doing a lot of time series work, so those are the ones we generally rely on. And then for plotting, I think we're still doing ggplot for a lot of stuff. But yeah, forecast is the one we rely on and have had the most success with.
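For anyone curious, Prophet's R interface is quite compact. A minimal example on simulated data (the series here is invented):

```r
library(prophet)

# Prophet expects a data frame with columns `ds` (date) and `y` (value).
df <- data.frame(
  ds = seq(as.Date("2021-01-01"), by = "day", length.out = 120),
  y  = 100 + 10 * sin(1:120 / 7) + rnorm(120)
)

m      <- prophet(df)                        # fit the model
future <- make_future_dataframe(m, periods = 30)
fcst   <- predict(m, future)                 # point forecast plus uncertainty
tail(fcst[, c("ds", "yhat", "yhat_lower", "yhat_upper")])
```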
Yeah. Thanks for having me and thanks for showing up, everyone. It was really great to see you and chat with you. It's been a lot of fun.