Data Science Hangout | Jarus Singh, Pandora | Human in the Loop
The Data Science Hangout is a weekly, free-to-join open conversation for current and aspiring data science leaders. An accomplished leader in the space joins us each week and answers whatever questions the audience may have. We were recently joined by Jarus Singh, Director, Quantitative Analytics at Pandora.

A few key snippets from our conversation:

01:28 - Start of session
6:33 - Human in the loop
14:14 - Working with stakeholders and teaching communication
25:47 - What does your tech environment look like?
28:46 - Presenting work; pretty vs. impactful
33:18 - Skills for data science leadership
43:01 - Having data science rolling up to the CFO
49:00 - Applying our hobbies to work: cooking
51:41 - Getting motivated by personal projects
1:01:25 - Going with your passion

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
► Add the Data Science Hangout to your calendar: https://www.addevent.com/event/Qv9211919

Follow Us Here:
Website: https://www.rstudio.com
LinkedIn: https://www.linkedin.com/company/rstu...
Twitter: https://twitter.com/rstudio
Transcript
This transcript was generated automatically and may contain errors.
Thank you, everybody, for joining, and I know we just hit the top of the hour, so welcome to the Data Science Hangout. I know it's a bunch of familiar faces here, but anybody who's joining for the first time, welcome. This is an open space for the whole data science community, current and aspiring data science leaders, to really connect and chat about some of the more human-centric questions around data science leadership. We really want to create spaces where everybody can participate and hear from everyone. So you can jump in live and ask questions, you can put questions in the chat. And we also have a Slido link where you can ask anonymous questions too.
But with that, I'm so excited to be joined by my co-host for today, Jarus Singh. Jarus is a Director of Quantitative Analytics at Pandora. And Jarus, I'd love to turn it over to you and have you introduce yourself and maybe start off by sharing a bit about your team and the work that you do.
Yeah, thanks, Rachel. Thanks for having me today. Really excited for this. So I lead a team of three. We're quantitative analysts. And what's a little unusual is that we're embedded in a finance team. And as a quick aside, when I first joined, I was very skeptical of being on a data team sort of embedded in the finance org, just because they're not historically known for, I think, putting as much emphasis on data or the right emphasis on data relative to sort of other orgs that may have more of a history of doing so. But it's actually worked out really, really well for us.
So, you know, the way I view the team is that we have three sort of pillars that we're responsible for. One is reporting. So what happened? And this is all the metrics that Pandora cares about that inform its financial model. So it would be users. It would be how much people are listening. What platforms are they listening on? What are the demographics of listeners? All of these have impacts on what we pay for spinning the tracks or, you know, the revenue that we can make against people when they're listening on the free tier from advertisements or, you know, the propensity to join a trial and then eventually subscribe. So the first pillar is reporting. Second is forecasting. What does the future hold? We do that using a lot of R, and we try to rely on algorithmic and automated techniques as much as possible, but realize that they can't do everything for you. So there are times where you need sort of human-in-the-loop style modeling, where someone has to look at it and say, OK, I'm getting information from this team saying they're going to change their strategy, and we need to make sure that we're not relying entirely on an algorithm that is not aware of that change that's coming. And then the last pillar, I'd say, is data-driven decision making. So we have a wealth of information about our users. If we're going to go out and do marketing or special deals with, you know, it could be a telecom company or a device manufacturer, we want to understand the most we can about the sort of users they're going to bring in or the deal terms, so we can use our knowledge of Pandora's data to come up with a very refined estimate of what we think would happen under those deals.
Yeah. So my team is mostly dealing with business metrics, I would say. So there's not a lot of stuff we've done where I'd be like, oh, I worked on that. It's more like every time you go in and engage with the app, we're trying to predict how much we think you're going to use it and sort of the potential you have towards the business. LTV (lifetime value) is important as well. But yeah, you know, we're not selecting the next song.
Excitement for data science in 2022
Yeah, I guess generally in the space, what I'm seeing from some data groups that I'm in and just from reading blog posts is that data folks, I think, know they have a lot to bring to the table, but they're not always in the room where decisions are being made, and they're not always able to inform those decisions in the way that I think they can and should. And what I see is the space sort of maturing in that respect. I personally believe that data folks are going to get more ownership over things that should be in their purview but aren't necessarily at all companies. Some companies are doing this very well, where they're using data where data can help. But I think not all companies are at that stage of maturity. So I'm expecting more maturity from the space, where people are getting more sophisticated about using data where applicable.
And to throw out a company example that I've heard really great things about in the past, from seeing Hilary Parker talk when she was there, it would be Stitch Fix. Right. So the way she kind of presented it is that anything that can be informed by the data scientist skill set as well as or better than leaving it as the decision of someone who doesn't have that skill set gets put in the data team's hands. So I thought that was a really cool way of framing it. And I think more companies are going to be moving in that direction.
Human in the loop
Yeah, of course. So human in the loop is sort of this framework that says you can rely on an automated system or algorithms to generate some sort of result, but ultimately a human has to be in the loop somewhere at some stage, right, verifying the output, changing it as necessary, and basically able to improve on what you'd get from a computer alone or a human alone. So I can give some examples of that.
So the first one I'll give is from our world, right, where we're using a lot of time series techniques to forecast the future. And the time series technique, if it's univariate, will just look at historical data. How has this been trending? What is expected seasonality? And we'll extrapolate that forward. So that makes sense when the future state of the world is structurally similar to what we've experienced in the past. But if we were to make a change and it's an anticipated change, that's where the human has to get involved. So an example I'll give you is we were growing a certain platform on the service, you know, pretty quickly over time, and it looked like it was a predictable series. But we tried to make some changes to the user experience there that we thought would actually dampen growth. They would have an adverse effect on growth going forward. Now, the model has no idea that that's happening, right? There was no way for us to pass that in. So we created sort of what we would expect to happen under the old regime, and then we layered in assumptions we got about the new one. And sorry for being vague, but I probably shouldn't go into too much more detail. So that's where it's like you're accounting for the fact that the algorithm doesn't know this change is happening in the future, and you use an alternate set of data and assumptions to make that tweak.
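To make that concrete, here's a minimal sketch of that kind of layered adjustment in R, using the forecast package the team mentions later in the conversation. The series, horizon, and dampening factors are all invented for illustration; in practice the assumptions would come from the team planning the change.

```r
library(forecast)

# Hypothetical monthly listener-hours series; all numbers are made up.
hours <- ts(c(110, 112, 118, 121, 125, 131, 134, 140, 143, 149, 152, 158,
              160, 163, 171, 175, 180, 187),
            frequency = 12, start = c(2019, 1))

# Baseline: a univariate model that only knows historical trend and seasonality.
fit      <- auto.arima(hours)
baseline <- forecast(fit, h = 6)

# Human in the loop: we know an upcoming UX change is expected to dampen
# growth, so we layer an assumed haircut on top of the algorithmic forecast.
# The ramp down to a 5% haircut is an illustrative assumption, not a fitted value.
dampening <- c(1.00, 1.00, 0.98, 0.96, 0.95, 0.95)
adjusted  <- baseline$mean * dampening

cbind(algorithmic = baseline$mean, human_adjusted = adjusted)
```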
A really cool Stitch Fix example I heard is when they're actually selecting clothes for folks. So there's a lot of feedback that people can give on prior sets of clothes they've gotten, which a computer may or may not be good at really understanding. So if the NLP side of their business isn't fantastic, it may not realize, OK, this person's OK getting pants, and they still want some, but they don't want them every time. And a human can do a good job of saying, OK, maybe I should send pants in every other kit or every third or something. So it's an area in which you would not expect whatever algorithms you have to perform well. A human has to intervene and kind of adjust what you would get, and you get a better sort of outcome with that.
In terms of resources, I'm not sure if it's addressed in this book, but it's one of our favorites. It's by Rob Hyndman and it's on forecasting. If you go to OTexts.com, it's actually available for free. It's Forecasting: Principles and Practice. That's what my team relies on for a lot of the work that we've done.
Getting into data science and hiring
Yeah, that's a good question. I guess I would say the journey began in college, studying economics and mathematics. And it's funny, you see a lot of data science concepts covered then, but they weren't branded as data science in school. So I feel like I was picking up the techniques there. And my first role was in economic consulting, which is a very esoteric area, but it was in the litigation space: companies suing companies, companies suing governments, governments suing companies, and trying to understand what were the actual damages associated with wrongdoing, and was there wrongdoing? And you could rely on econometric analysis to do that.
And I think where I made the leap into data science was while I was doing that work. Where I got most interested and motivated was when I started learning how to write functions. I think that's where it really clicked for me: you could write a piece of code that makes your ability to do this in the future much, much easier, and I can make my life easier. And I started to talk to friends from my network who were going into data science and realized that is a space in which that sort of attitude gets rewarded. So it's like, OK, I'm interested in this. I'm excited by this. Let's start to find jobs that reward that, which I think is broadly the data science space. Right. So I made the jump into tech and started working on picking up the skills.
So I'm going to answer specifically for my team at Pandora, what we do, and hopefully this is helpful. Don't necessarily extrapolate this to all of the data science ecosystem. It's obviously very broad and people look for different things. But what I look for at the technical level is somebody has to be, maybe facile is the best word. They have to be very comfortable with, I think, the common coding concepts that are going to pop up on the job. And I want them to sort of proactively motivate those concepts. Right. I don't necessarily want to be the person that says, you should write a function here, right? The person should be comfortable enough with those concepts to motivate them in their day-to-day work.
On the personal side, what is tricky for us is that, you know, we end up working quite a bit with stakeholders, and very early on when I hire people, I may have them in the room with stakeholders, not alone, obviously; I'm there to guide. But the ideal is to get them to the point where they're communicating well and presenting well, where they can handle that themselves. And then they can also scope the sorts of questions or follow-up requests that they get from the stakeholders. So ideally, I want to get someone who says, OK, you know, I see you're asking for this, but if I understand your problem correctly, I'd rather do this. I think this is actually the thing that's going to solve it for you. So people who are able to do that. And what I've found is that you have folks who sometimes are very good at the technical side but not as good at the other sorts of skills that we're looking for, and vice versa. So when we do our interviewing, we kind of try to emphasize both.
Other beneficial personality traits I look for, in addition to communication, and this sort of comes out during the interview, include interest in the sorts of problems that we work on. I think it's really good to signal that during an interview. And what I've found is that folks who do that during the interview process, you know, tend to continue to do it on the job. I haven't seen very many good actors. So when they're like, oh, you know, do you all look at this data, or do you do that, or have you thought about this? Those folks tend to do very well in the role because they bring that sort of brainstorming mentality to it. And what I've found is they're able to solve things in ways that, you know, I wouldn't have thought of. Other team members wouldn't have thought of. It really does help out a lot.
Working with stakeholders and teaching communication
What I would recommend is to have a mentor at your company. It doesn't have to be your manager, but somebody who has those skills. Reach out to them and say, I would love to go over a presentation with you, or just invite them to your meeting. Sometimes that's a great way to cheat at getting feedback. For my part, I try to make myself available and give people feedback before important meetings, or sit in the meetings and guide them. And then I also give them the heads up that that's happening, because not everyone likes to be caught off guard when I say, OK, you know, let's move on to this topic because I think someone's interested. Or, can you explain that more? I don't think everyone got it. Stuff like that. And over time, I think that reinforcement really helps out, because eventually, kind of like training wheels, you won't need it anymore.
Data scientists, I think, on average, tend to overestimate how much everyone else knows about data science. So if you're wondering, am I over-explaining it or under-explaining it? Odds are you're under-explaining it. And again, depending on who's in the room, and we're thinking stakeholder management here, folks who generally don't have a background in the data side of things, I would err on the side of over-explanation and then always give them an out. Right. You say, if you already know this or I'm being redundant, I will skip the slide or skip this section and we can move on.
Pandemic effects on time series models
We did, yeah. The effect is much more muted now. But at the beginning of the pandemic, there was, you know, a huge change in the way in which people listen to Pandora that, of course, caused a lot of problems when extrapolated forward. So what I was kind of proud of was that our team has the knowledge of how our algorithms work. Not only do we implement them, but we understand how they work. And we were able to very quickly identify, look, the outputs of these are not going to make sense. And so we had to pull alternative models online, which I think we did within two weeks or so, which I'm pretty impressed by.
And I would say, if you're dealing with business-side data, you should consider that a part of your job, because, again, there are folks who place a lot of emphasis on, you know, can I put these things into production and can I run them? But you also need to make sure that they're resilient to things like this, or at least have a plan for how you would fix them. I realize not everyone wants to plan a possible pandemic into every month of the work that they do. But if you don't have some way of pivoting quickly, then in the eyes of the folks you work for, it's like, wow, that person's not really doing their job. And, you know, we pay them so much because they have all these skills. So in some ways, you want to be resilient to these sorts of changes.
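One simple way to build that kind of resilience is to monitor recent forecast errors and fall back to a model that only trusts recent history when they spike. Here's a rough sketch of the idea, again with the forecast package; the simulated series, the 2% "normal" error level, and the 3x trigger are all illustrative assumptions, not the team's actual setup.

```r
library(forecast)

set.seed(1)
# Stand-in daily series with a sudden level shift, like March 2020.
series <- ts(c(rnorm(60, mean = 100, sd = 3), rnorm(14, mean = 70, sd = 3)),
             frequency = 7)

# Score the long-history model's recent performance against what came in.
train      <- ts(series[1:60], frequency = 7)
fit        <- auto.arima(train)
predicted  <- forecast(fit, h = 14)$mean
actuals    <- series[61:74]
recent_ape <- abs(actuals - predicted) / actuals

# The 2% "normal" error level and the 3x trigger are illustrative assumptions.
if (mean(recent_ape) > 3 * 0.02) {
  # Structural break suspected: fall back to a model that only trusts
  # the most recent data instead of the full pre-disruption history.
  fc <- naive(series[61:74], h = 7)
} else {
  fc <- forecast(fit, h = 7)
}
fc
```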
So what I can tell you all about Pandora usage is that a lot of it, we found, is associated with routine. And so when routines get disrupted, you see large changes in how people listen. And we saw that happening in March with shutdowns and with panic buying. You know, if you're buying toilet paper, you may not remember to put on Pandora in your car when you're stressed out. So we did see that. And, yeah, accuracy is what we use. And it was just this seasonal kink that, you know, you would not expect but for a major disruption happening.
Tech environment at Pandora
Yeah, so we're on Google Cloud Platform for all of the raw and upstream data that my team relies on, and the data engineers get it there for us. So that's where we start working with Pandora data, and we'll do aggregations there. Some of that we have to handle ourselves; sometimes logic needs to be applied to classifying various platforms, and we don't want to deal with the laundry list that they have. So from there, we will get it into BigQuery, or a Postgres server we've just used for a long time and haven't migrated those jobs over to BigQuery. And then we'll connect to that with RStudio. So we're using either RStudio Server for the bigger projects that we have, or RStudio Connect if it's needed for a Shiny dashboard, and then sometimes a personal laptop, just when it doesn't really matter how much firepower you need.
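In R, that connection layer typically looks something like the sketch below, using DBI with the bigrquery and RPostgres backends. The project, host, table, and column names here are placeholders, not Pandora's actual infrastructure.

```r
library(DBI)

# BigQuery via the bigrquery backend; project and dataset names are placeholders.
bq <- dbConnect(
  bigrquery::bigquery(),
  project = "my-gcp-project",
  dataset = "listening_metrics"
)

# The long-standing Postgres server mentioned above; credentials from the env.
pg <- dbConnect(
  RPostgres::Postgres(),
  host     = "db.internal",
  dbname   = "metrics",
  user     = Sys.getenv("PG_USER"),
  password = Sys.getenv("PG_PASSWORD")
)

# Pull an aggregate down into R for modeling (hypothetical table and columns).
daily_hours <- dbGetQuery(bq, "
  SELECT date, platform, SUM(listening_hours) AS hours
  FROM daily_usage
  GROUP BY date, platform
")
```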
And then for visualization or the end product, you know, sometimes it's just dumping the data back into something like BigQuery or Postgres. Sometimes it's a CSV that other folks in finance can ingest and use in Excel, or something that can be uploaded to Anaplan, which is how we manage a lot of our financial reporting. Or a Shiny dashboard, which is what we sort of default to. And then we've inherited some work that's in Tableau, so we'll do that as well.
And the way I like to explain it to stakeholders who are used to working with folks doing a lot of manual reporting is I say it takes one click or zero clicks to send this report, to give them an idea. You know, you may have folks putting together stuff in Excel and pulling manually from different sources. We're able to create this whole pipeline or ecosystem for a lot of the problems that we do, where it requires one click or zero clicks. The one click is usually just to check to make sure everything makes sense before you send the report out.
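As a sketch of the distinction, assuming an R Markdown-based report (the file name here is hypothetical):

```r
# "Zero-click": a scheduler (cron, RStudio Connect, etc.) renders and
# distributes the report on its own, no human action required.
rmarkdown::render(
  "weekly_metrics_report.Rmd",
  output_file = sprintf("weekly_metrics_%s.html", Sys.Date())
)

# "One-click": an analyst opens the rendered output, sanity-checks the
# numbers, and only then hits send. That is the human check before it
# goes out the door.
```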
Presenting work: pretty vs. impactful
I would say the first thing is, it depends on who the executive is that you're presenting to. But by and large, at most companies, my experience has been that the executives who hire or green-light data science teams are not well versed in data science themselves. So I think a lot of the onus is on management and leadership in data science to explain the value to them. And, you know, a good executive is not going to try and pick you apart over how you're presenting stuff; they care ultimately about the bottom line, usually.
So, you know, better results should work, but I think it's the way in which you frame them. So there's the idea of making it pretty versus making it impactful. Pretty, you know, is what the i-bankers are good at, what the management consultants are good at, and execs are used to working with them and used to seeing things that way. But if you were to show a summary or output table of how much money you're saving the company through your optimization efforts, and that's in the millions of dollars, my hunch is that you don't really need to, you know, put that number in the same font as everything else in the deck and make sure it's standardized. At the end of the day, I'd say that's what matters.
A classic example is we had a model that had 80% accuracy, and I made it 84%. But without doing some work to explain what the value of that is to the business, you know, executives can sometimes be skeptical. Also, they may not be well versed in the problem space. So they might say, 80 to 84, that's not very good, right? Like that doesn't seem like you did a very good job, because they haven't been working at those problems as long as you have and don't understand sort of the hairiness of them. So I guess, more broadly than just dollars, I would say speak the language that they care about and are receptive to. If you're in people analytics, it might be what are the things that we've done that have resulted in employee retention, which, again, ultimately makes its way down to dollars. But make sure to couch it in what they care about and what they're expecting.
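The translation itself can be a back-of-the-envelope calculation. Here's a hypothetical version of the 80%-to-84% example in R; every figure is invented to show the framing, not taken from the conversation.

```r
# Illustrative only: all numbers below are made up.
volume         <- 2e6    # decisions the model touches per year
cost_per_error <- 0.75   # average dollar cost of one wrong decision

errors_at_80 <- volume * (1 - 0.80)   # 400,000 errors per year
errors_at_84 <- volume * (1 - 0.84)   # 320,000 errors per year

annual_savings <- (errors_at_80 - errors_at_84) * cost_per_error
annual_savings   # 80,000 fewer errors x $0.75 = $60,000 per year
```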
Because the way I like to view data science is, you know, at the end of the day, like they're hiring us because we have a skill set. It's a tool and it's a tool being applied to problems of the business. It doesn't matter how fancy your tool is if it's not solving the problems that the business has. So try to view it through their lens.
Skills for data science leadership
So to be a leader, I'm going to think about that piece particularly. So what is the difference between a good data scientist and a good data science leader? And this relates a little bit to what Bruno was asking. I sort of had this as a rude awakening during my career as a data scientist, right? A lot of emphasis is placed on, what are the tools and techniques that you have? You know, why are they better than doing it manually or doing it under older styles of, say, an Excel-based report or what have you? Why do we need you, and why do you do what you do? And so much emphasis was placed on that.
But less so, you know, on the communication; my manager was handling that. And certainly on project scope, right? It wasn't really my job to say, this project should entail this or it should do that. Where I see data science leadership, the skills that people who are doing well in that space have to develop, and may not be developing as an IC, are obviously communication and presentation, being able to understand the needs of their stakeholders, and then how data can or cannot solve those problems. I say can, obviously, because that's the job. But can't is also really important. Because if you have a stakeholder come to you and say, can you build a model that does XYZ? It's going to be great. And you say, yeah, I don't actually see how that's going to help the business. And you can sit down with them and explain why, or say, historically, we've tried to develop models here and we haven't had much success; there's nothing I can think of that I think would do better. So instead of just sitting down and going and doing that, you're the person who has some say around, I think this is more likely to be successful and actually advance the needs of the business.
And so, Rachel, that totally ties into what I was saying earlier. When you interview, day one or day zero or day negative one, depending on how you look at it, if you understand the business, I think it makes you much, much better able to do that than if you don't really know the inner workings of it. So the idea is proposing, here are the projects I think are most likely to succeed, and then getting people on board with that. Maybe this is a lazy analogy, but it almost feels VC-like, like you're a data VC in some ways. I'm betting on these three projects to deliver value for the business, and I need you to buy into them so I can have someone go out and do them and help us be most successful.
I haven't had a ton of experience with that, but I would say right now the state we're at for a lot of sort of data informed projects at a company is that human in the loop is a great way to go. And if a human is in the loop, then you're not really threatening someone's job. The idea is I think that you are advancing the value that they offer because you're giving them tools that they can use in order to do their job better. That is not always the case, so it's sort of dependent on how things are structured. You can't tell someone, oh, it's human in the loop and then cut them out entirely. But for a lot of things that we see, and I mentioned that example earlier where like we knew given the cultural climate and the global pandemic, people would be more interested in health and news related to that than history. We need smart, good marketers to say, okay, I'm going to let the algorithms pick certain things to put in front of people, but I need to reserve this space for something that I personally am willing to bet folks are really interested in or getting increased engagement.
In general, what happens with the progress we make on the machine side is that we kind of let people add more value because it's displacing sort of the easiest tasks. Again, this is on average, not always, but that's what I've been seeing, not just in my company, but at others as well. So I would say it's not so much threatening to your job as a whole, but it does mean that if you're somebody who's been doing a job without a machine learning model, you need to understand how do I work with this thing? Because, you know, it's not going to go away.
Advocating for data in decision making
So this is where being in finance has been hugely beneficial. And I mentioned I was skeptical when I first joined, because finance teams aren't generally associated with data prowess, no offense, finance teams. But being embedded in finance has helped with that hugely, because at the end of the day, the decisions that we are making are couched in those terms, terms that everybody cares about, obviously, the CFO and the CEO as well. And so once you've got that framing down, like, we're trying to understand if this proposed initiative will break even, or we want to understand the cost of missing our forecasts, like how bad is a 1% miss in accuracy? How bad is 2%? What are the downstream effects of making these mistakes? That has been hugely beneficial.
So I would say the advocacy that I had to do over time was not me trying to convince people of my team's worth. It was more just producing, hopefully, a constant drumbeat of wins, and then showing people, look, this is what we do and how we do it, and you need to trust us and involve us in the process. And we've had times where folks have tried to go around us and sell a decision without our involvement. And, you know, our leaders have always backed us up and said, no, you need to involve this group, or the finance team, or the group within the finance team, to make sure that that's the right decision.
Yeah, I guess while you all are thinking, I might add, I do expect more organizations to start embedding data people in the finance org, or in a hybrid org, right? So on one hand you have the data org, where we're decentralized. But if you have the hybrid org, a data org where everyone has different groups that they support, sort of dotted lines, with that ultimately reporting into the CFO, I think that's really powerful. And I've been seeing it more often just talking to folks in this space. And I think that's going to become more popular.
Finance basics for data scientists
I would say the basics will get you a long way. So revenue, cost, and then profit, which is sort of revenue minus cost. And then specifically, how does your company generate those revenues? How are those costs generated against those revenues? And then what are the ways in which your company is looking to sort of change over time or improve over time to improve those things? And then the last question that you can answer, right, is how can I use data to do that? And then that's you kind of doing the job.
But it is interesting how many folks you talk to with a very strong data skill set who, as I mentioned earlier, struggle with how to put a value on an accuracy improvement. They've built their whole careers around, I'm going to build models that can improve the accuracy over whatever historical thing you were using, and that's me doing a really good job. But in the eyes of the stakeholder, if there isn't someone to do the translation of, here's what those accuracy benefits mean in terms of dollars, and it doesn't have to be dollars today, it could be dollars tomorrow, it's really difficult to make the case of what it's worth.
Generalists vs. specialists and hobbies
My experience is generalists. But again, that's me as a leader saying I know that I am often unable to predict what the future needs of the team will be. And to account for that, you can't hire specialists, because if they're specialized in an area that gets de-emphasized or something, they're not going to be as good of a fit as someone who's a generalist. So my team hires generalists. Obviously, it's team dependent. But as I mentioned before, we don't expect you to be a 10 out of 10 in any specific thing, but you need to be like a seven or eight on a bunch of things, because you are going to get pulled in different directions and have to do different parts of projects.
Yes, to the first half. I haven't quite figured out how I want to apply it yet, but I think the applications are there. And I guess I'll share a cute example of me doing an explore-exploit algorithm manually. So, I love cooking. And I'm cooking kind of nonstop. And so, I do it every day. Even if I'm really tired, I find energy for it. So, it's a very sticky hobby that I find really rewarding.
And there are a lot of cooking analogies for the data science process. But for me, I think what has been most useful about understanding data, or maybe technology more broadly, is that there are all these things in the cooking process that are kind of tweakable. And I personally think the modern way in which people interact with recipes doesn't really account for that. And so, what I've been playing with on the side, and I don't know if this is a data thing or an app or something that I'm not smart enough to build, is how do you better represent that in a way that people who are either very good chefs or very good data people can better interact with? So, an example I'll give is I've tried a bunch of different ways of making the same recipe. But when you go to a website, you just see the one that someone did that they considered to be best. And you don't really see information about how many things they had to try to get there.
Because if you read comments on recipes, a policy I have is: read the most upvoted comment and do what that person says, always. Your default should always be to do what they say. Classically, it's you don't need that step, or that's too much salt. Do what they say and then adjust the recipe, because the recipe that gets posted on the blog is often not final. So, I like that a lot. And I'd love to see the ecosystem get somewhere where it's accounting for the fact that everybody's kind of experimenting. So, in two years, if I'm working on a startup in stealth, that's what I'm doing.
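For what it's worth, that manual explore-exploit habit maps neatly onto an epsilon-greedy bandit. A toy sketch in R, with invented recipe variants and ratings:

```r
set.seed(42)

# Epsilon-greedy over recipe variants: mostly cook the best-rated version
# ("exploit"), occasionally try another one ("explore"). Ratings are
# invented 1-10 scores.
variants <- c(roast = 8.5, braise = 7.0, grill = 9.0, sous_vide = 6.5)
epsilon  <- 0.2   # explore 20% of the time

pick_dinner <- function(ratings, eps) {
  if (runif(1) < eps) {
    sample(names(ratings), 1)    # explore: a random variant
  } else {
    names(which.max(ratings))    # exploit: the current favorite
  }
}

# Over 100 dinners, most are the favorite, with a steady trickle of experiments.
table(replicate(100, pick_dinner(variants, epsilon)))
```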
What I would recommend folks who are getting into this space do is pick a project you're motivated about. And most people are motivated by their own data. So, the two areas I like to point them to are health and personal finance. Again, you have to have some prior interest in them. But those are great ones, because on the one hand, you know, do I want to muck around with an iris data set, even though I don't really know what an iris looks like? Or do I want to figure out where I'm overspending and whether I can predict future spend? It's, I think, a little more fun and exciting and tangible.
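A starter version of that personal finance project might look like the sketch below, assuming a bank export with date and amount columns; the file name and structure are hypothetical.

```r
library(dplyr)
library(forecast)

# Hypothetical bank export with `date` and `amount` columns.
spend <- read.csv("transactions.csv") |>
  mutate(month = format(as.Date(date), "%Y-%m")) |>
  group_by(month) |>
  summarise(total = sum(amount))

# Turn monthly totals into a time series and predict the next three months.
monthly <- ts(spend$total, frequency = 12)
forecast(ets(monthly), h = 3)
```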
Chasing interests at work and closing thoughts
So, I have not been great at this myself. But the way I've seen people who do this well is, one, they do the core job and they do it well, and they understand what counts as doing a good job versus a great job versus an amazing job. And they're generally smart enough people to pick where on that spectrum they want to be. So these are usually very bright people. And they say, I could do an amazing job, but I'm just going to do a good job. And then I'm going to invest that remaining 20% of my time, or whatever it is, into stuff I care about. And I'm going to do it in a way that I can spin to my manager or leadership as, this has potential, we should look into it. And when you do that and you're doing a good job, I've found it rare for someone to say, no, I don't want you doing that. Right? Because you might leave, and now they have someone not doing a good job.
So I've found that that's the way to make it politically palatable: to say, look, I'm already doing all the stuff I need to do, but here's where I want to sink some extra time. And then you couch it in a way that is useful to them. So research and testing gets thrown around a lot. I want to research these techniques; I think they might be helpful. If you're doing a good job, it's usually easy to sell. If you're not doing a good job at your work, it's harder to say, well, I want to try this and I want to try that.
The forecast package is what we use for a lot of stuff. We've tried to implement Prophet as well, with mixed success. Again, we're doing a lot of time series work, so those are the ones we generally rely on. And then for plotting, I think we're still doing ggplot for a lot of stuff. But yeah, forecast is the one we rely on and have had the most success with.
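For anyone curious, Prophet's R interface is quite compact. A minimal example on simulated data (the series here is invented):

```r
library(prophet)

# Prophet expects a data frame with columns `ds` (date) and `y` (value).
df <- data.frame(
  ds = seq(as.Date("2021-01-01"), by = "day", length.out = 120),
  y  = 100 + 10 * sin(1:120 / 7) + rnorm(120)
)

m      <- prophet(df)                        # fit the model
future <- make_future_dataframe(m, periods = 30)
fcst   <- predict(m, future)                 # point forecast plus uncertainty
tail(fcst[, c("ds", "yhat", "yhat_lower", "yhat_upper")])
```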
Yeah. Thanks for having me and thanks for showing up, everyone. It was really great to see you and chat with you. It's been a lot of fun.