Resources

Data Science Hangout | Elaine McVey at the Looma Project | Communicating the Value of Data Science

The Data Science Hangout is a weekly, free-to-join open conversation for current and aspiring data science leaders. An accomplished leader in the space will join us each week and answer whatever questions the audience may have. We were recently joined by Elaine McVey, VP of Data Science at the Looma Project.

21:30 - How to approach experimentation and running tests
43:30 - How do you communicate the value of data science to executives
52:40 - Ways to improve your communication skills as a data scientist
57:15 - How to package insights to executives
1:01:03 - What data scientists get wrong when communicating insights

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here:
Website: https://www.rstudio.com
LinkedIn: https://www.linkedin.com/company/rstu...
Twitter: https://twitter.com/rstudio
To join future data science hangouts, more info here: rstd.io/datasciencehangout

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

So welcome, everyone, to the Data Science Hangout, and welcome back to those of you who have joined before (I think most of you have), but welcome also to first-timers. This will be a discussion on data science leadership in the enterprise, ranging from very human-oriented topics to a little bit of technical talk as well, trying to hit that sweet spot and focusing on the questions that are most important to you all. As in prior weeks, please feel free to speak up and ask questions, or use the chat window. With that said, I'd love to introduce Elaine McVey, VP of Data Science at the Looma Project. Elaine, would you help kick things off by sharing a bit about the work that you and your team do?

Sure, yeah. The Looma Project is a small, early-stage startup. I've been going to smaller companies each time I switch jobs for quite a while now, and this is the smallest place I've been, although we're growing quickly. We do storytelling via film in grocery stores: we have tablets playing roughly 30 seconds of human-centric film, which is our take on a more human, storytelling-oriented replacement for traditional advertising. There are some interesting questions about how to understand what components of film drive connections with consumers, so we do some film analysis, and we also look at time series of sales to understand the causal impact of things we're doing. So a lot of different interesting angles. Going along with being a small company, it's a small team: there are three people on the data science team currently. A lot of my experience has been working with rapidly changing companies in smaller environments with smaller teams, which is a different set of challenges and opportunities than having a really big, established team at a larger company.

What's exciting in data science right now

I think that's something that probably resonates with a lot of people listening in. One thing we've been asking everyone each week is: what are you really excited about in data science right now?

Yeah, the one thing that's been on my mind a lot, and I don't know that this is quite at the core of data science, but it comes up frequently in my experience at work and is also driven by what's going on in the world, is how we communicate and package data science work to stakeholders, or the public, or whoever is relevant, in a way that balances the analytical integrity we're responsible for with some level of pragmatism about how humans actually make decisions, for better or worse. That can take a lot of different paths, but it's partly about really understanding where data science fits, which I think varies a lot from organization to organization. We can influence a huge range of things, but we generally aren't doing all of that in any given context. Then it's figuring out what the appropriate level of rigor is for what you're doing and how the results will be used, how you connect that to the decisions that will be made, and what the human factors are in making those decisions.

Some of what I've been thinking about in that regard is whether data scientists can get farther into the process of taking detailed results all the way to the packaged end result, which is often painfully simple for the people who understand all the caveats that go into it. By doing that, we can be better representatives of the integrity of the process and also have more impact by moving farther up the communication chain, as it were.

Is that something that your team is starting to do?

Yeah, we've thought a lot about how that plays out in different contexts. For operational decisions, a higher level of detail, and getting people to understand more detail, can be important, whereas for things that are more marketing- or sales-specific, we need to understand the detail ourselves but then be able to communicate in very simple, high-level ways. I think it takes a lot of concerted effort for people who are trained to do analysis, to think about uncertainty in very precise ways, about all the different outcomes and interpretations and caveats, to set that aside and ask: what are the big-picture takeaways that will matter to the audience receiving this, and how do I simplify all the complexity without losing the core of what's actually going on?

And watching the world try to deal with uncertainty through the COVID pandemic, watching science play out in real time in a way that is difficult for people who aren't scientists to handle, with things changing and a lot of uncertainty, has really made me think about how that applies in a business context too: whether it's better to provide results and let someone else decide how to present them to the audience, or whether it makes sense for us to level up our ability to do that ourselves and be more involved in where things end up.

I think about that a lot when I see things on Twitter as well: if I look at a visualization real quick and try to figure out what I learned from it, was I even able to answer a question?

Yeah, and I think it often takes a lot more effort than we'd like to get something to the point where someone can get the right takeaway really quickly, in a way that's still connected to the data. Part of the shift I've been making, and encouraging my team to make, is believing that that effort is worth putting in in most cases. We shouldn't think of our job as just getting the right numbers, with all the polish and communication treated as somehow not the right emphasis because we're focused on the analysis. We are also the best people, or at least valuable people, for doing that translation. Even if that takes just as much time as the analysis, you can almost not invest too much in it, in terms of the impact that you have.


Time series modeling and COVID impacts

I see a question that came in from Ian; he said he has a lot of background noise or else he would ask live. His question: you mentioned using time series to analyze or derive a causal link to your advertising. What kinds of models do you use for this, at a high level? Is it multivariate?

Yeah, we're just getting into the best ways to do this, and I think there will be a few different models that we land on for different purposes. There's a need both to forecast and to understand what's happened, and I think we'll have different models for those two purposes. We've just gone through a process of evaluating multiple modeling approaches and trying to determine the right one, and we're now identifying data that we really wish we had to add in, which will probably drive another iteration, as usual, of evaluating the modeling approach. Some of the models we're considering are specifically time series models and some are not, so it kind of depends on what the outcome is.

And again, the challenge is how you communicate what's going on to the stakeholders: how much do they need to know about the models and their differences, and what are the takeaways?

Christian just asked a question in the chat as well. Christian, if you feel comfortable and want to jump in, maybe introduce yourself, and feel free to ask it live.

Sure, can you all hear me? All right, cool. I'm Christian; I'm waving, but my camera's not on. I'm the data science manager here at Cosmic Pets. We're basically a pet hard-goods wholesaler, but of course we work fairly closely with a lot of our retail partners. One of the things I'm responsible for here is forecasting, making sure we have enough inventory, and obviously over the last 18 months that's been an exceedingly difficult challenge with how COVID has affected, well, everything. When it hit, sales were crazy for some things and really, really crazy for others, but even now that things are starting to level out, a lot of our models for the last year, all the patterns that were normally there in our data, are just different. Are you seeing that? Are you running into that? How are you working with it?

Yeah, I am by no means an expert on time series or forecasting, but my initial thought, looking at data a few months into COVID, was that every model everywhere is going to be broken for a long time, because there are just so many impacts at a time like this. We definitely saw major impacts on grocery purchasing, which spiked and then gradually leveled off and declined somewhat, but is still above previous levels. There were also a lot of changes in how people purchase things, and questions like how much of the curbside pickup and online ordering will stay versus people reverting to going in-store. It has certainly accelerated a lot of behavioral changes.

In terms of data, this is one of the reasons that some of our modeling approaches aren't time series based and instead compare things in different ways. There are things we can compare to others in the categories we're in; categories are things like wine and dairy in grocery, for people who don't live in this world. We're able to look at overall category trends and compare the impact of what we're doing against those, in a way that isn't explicitly dependent on modeling history. But even before the pandemic it was hard to forecast what to expect for a lot of specific products, because of new products and seasonal changes that affect them differently; there's a lot going on. So it was a challenging problem to begin with, and then COVID changed everything.

It will get easier now that we're getting historical data that's entirely within the COVID period, but obviously there are still effects, and I imagine there's no perfect solution except to think about it in various ways. We have the advantage of being able to compare things; if you're trying to forecast one very specific thing, you've got a harder problem.
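As an editor's aside: the comparison-to-category idea Elaine describes, measuring lift against concurrent category trends rather than against a model of history, can be sketched in a few lines of R. Everything below is simulated and hypothetical; it only illustrates the shape of the approach, not Looma's actual method.

```r
# Simulated sketch: measure campaign lift against a category baseline
# instead of forecasting from the store's own history.
set.seed(42)
weeks    <- 1:52
category <- 100 + 0.5 * weeks + rnorm(52, sd = 3)   # category-wide sales trend
lift     <- ifelse(weeks > 26, 8, 0)                # campaign starts week 27
store    <- 0.9 * category + lift + rnorm(52, sd = 3)

# Fit the pre-campaign relationship between store and category sales,
# then measure how far post-campaign sales sit above that baseline.
pre            <- weeks <= 26
fit            <- lm(store ~ category, subset = pre)
expected       <- predict(fit, newdata = data.frame(category = category[!pre]))
estimated_lift <- mean(store[!pre] - expected)
round(estimated_lift, 1)   # should land close to the simulated lift of 8
```

The point of the design is the one Elaine makes: the category series absorbs shared shocks (like COVID), so the comparison does not depend on the store's pre-pandemic patterns still holding.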

Just looking through the chat, Matias, it looks like you may have something to add on to that as well.

Yeah, no, I was just completely agreeing with what you said, Elaine. We were trying to build a marketing mix model here internally for one of our brands, and we absolutely had that problem: now, all of a sudden, we have COVID-affected data in our test set, so it's hard to verify how good a model is, because all of those time series are just messed up.

Yeah, I think it's a good opportunity to be creative in thinking about other ways to tackle the task at hand, but if you need a very specific forecast, that's a painful situation. I don't know if anyone else with more expertise in this area has good suggestions.

Yes, splines can do the magic. For example, unemployment went up really fast; if you add a spline and then make an assumption about the decay, since this is a point where you don't know where things will land, you can try different decay rates as what-if scenarios: a fast recovery, a slow recovery. Then you can use that for similar future events; we had the first wave, and if we're going to have a second wave, what will it look like? I found that really useful.

Yes, preparing for a range of possibilities instead of trying to pinpoint exactly what to expect is a better approach.
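To make the what-if decay idea concrete, here is a tiny sketch in base R: a shock is projected forward under several assumed decay rates, giving a range of recovery paths rather than a single point forecast. All of the numbers are invented for illustration.

```r
# Project a COVID-style shock forward under assumed decay rates
# (all numbers invented; the point is the range of scenarios).
shock   <- 40                                 # initial jump above baseline
horizon <- 1:26                               # weeks ahead
rates   <- c(fast = 0.20, medium = 0.10, slow = 0.05)

# One what-if recovery path per assumed decay rate
paths <- sapply(rates, function(r) shock * exp(-r * horizon))

round(paths[c(1, 13, 26), ], 1)               # early, middle, late snapshots
```

Planning against the spread between the `fast` and `slow` columns is the "range of possibilities" Elaine endorses above.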

Yeah, I played around briefly with adding some indicator variables for different phases of COVID and trying to get that in there, but in a lot of cases our data isn't dense enough anyway, so trying to parse out what's actually COVID and what's true seasonality can also just be a challenge.

Yeah, I imagine there are a lot of interactions that are hard to predict, too. For pets, I know tons of people got new pets during the pandemic, so even if you correct for the overall level of demand, the things people need might be different if there are more new pet owners and more puppies. And yeah, people are really more engaged with their pets too. We saw things like collars, leashes, and harder goods that pets won't chew up have one spike and then more or less come back down to normal, maybe partly because of new pets, but other things, like toys that they destroy, have had a more sustained increase. There's all kinds of stuff in there.

Yeah, that's all fascinating, unless your job is to predict what's about to happen. It's so fascinating that I'm balding at a much faster rate than I was pre-COVID.
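The indicator-variable idea mentioned earlier, encoding COVID phases as dummy variables alongside seasonality, can be sketched like this in R. The data are simulated, so this only shows the mechanics; as the speakers note, sparse real data makes separating phase effects from seasonality much harder than it looks here.

```r
# Simulated example: recover COVID-phase effects alongside seasonality
# (phase boundaries, effect sizes, and noise level all invented).
set.seed(1)
week   <- 1:104
season <- factor((week - 1) %% 52 %/% 13)            # four 13-week "quarters"
phase  <- cut(week, breaks = c(0, 60, 75, 104),
              labels = c("pre", "shock", "new_normal"))
sales  <- 100 + c(0, 5, -3, 2)[season] +             # seasonal effects
          c(0, 30, 10)[phase] +                      # phase effects
          rnorm(104, sd = 4)

fit <- lm(sales ~ season + phase)
round(coef(fit)[c("phaseshock", "phasenew_normal")], 1)  # near 30 and 10
```

With two years of weekly data the phase dummies and the seasonal factor are separable; with the short, sparse series Christian describes, they often are not, which is exactly the problem he raises.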

Getting data scientists a seat at the table

And I see that Eric asked a question around one of your previous points. Eric, do you want to weigh in and ask that live?

Yeah, sure, can you all hear me? All right, cool. Elaine, it's great to talk with you again; I know we've had some great conversations at past RStudio conferences. Every organization can treat this differently, but do you have any tips on setting up data scientists to be more involved in key decisions made at the organizational level? At least in my industry that's becoming more and more common, especially with the type of data we deal with. Any general tips for those who want a seat at the table, so to speak, in those key decisions?

Yeah, some of that is organizational: where data science sits in the organization and how high-level your representation is. But in terms of what's easier for most data scientists to control, I think the best approach is to learn a lot about the business, understand how to communicate with your audience, and do more of the translation. Not "this is what I did to get to this result, and these are all the things you need to understand about it from a data perspective," but "this is what this means for you." The more we can present things in a way that fits how people are ready to receive them, the more we build credibility. People stop feeling like the data scientist is a brilliant, magical expert who sits in the corner while other people have to take their work and figure out what to change; instead we become constructive parts of the business and the decision-making conversation, keeping the critical details of what we had to do to get there out of the way, except when we're talking with our own teams about the work. That's hard, though. Even if it's something you can do well, it's hard in any given situation to come out of putting all that work into a project and extract the details out of your head when you're talking about it.

Experimentation and A/B testing

I see that Don asked a question. Don, could you provide some context or background on the work that you do?

Yes, sure, can you hear me okay? Hi, my name is Don Bunk; I'm a data scientist at X's Digital, a digital analytics company in Burlington, Vermont. I was curious, because our fields have some overlap, Elaine. I can imagine there are a lot of confounding factors in your field when you're analyzing your data, and I was wondering whether your company is able to make use of A/B testing at all, or testing in general, or whether, just by the nature of your clients, you're limited to, for lack of a better term, retrospective analysis. As a follow-up, if you're not doing that, are your clients actually interested in doing things like A/B testing, or do you get pushback? It requires a scale-up initially: with retrospective analysis people just dump the data on your desk and say "here you go, figure it out," but A/B testing requires data scientists to have a seat at the table up front to organize the test. So I was wondering if you're making use of it, or if you feel like there's pushback from your clients.

Yeah, that's a really interesting question. We're trying to do both: design experiments in a way that lets us do better, cleaner analysis, and analyze the data we have despite confounding factors. There are a lot of different ways to approach experimentation. Sometimes we can compare sales against control stores, and that can be helpful, but it doesn't really scale in our case to our standard campaigns, so pilot experiments are more of an exception. Once we've fully rolled out, there's a really interesting challenge. People do want to do experiments, both within the company and among clients who are interested in testing things, but there's a lot of thought to put into how to do them. One factor is figuring out how to run experiments that don't carry a huge additional burden of work. For things like films, which take a lot of work to create and aren't just digitally generated, and where we have a limited number of places to test different things, you have to think about how to do this within the context of how the business operates, in a way that makes use of things we're already doing or adds an acceptable amount of work.

That's one thing, and that's where being at a small company makes it much easier to understand all the different pieces of the company and how they work together, and how this great experiment you're designing in isolation plays out in reality for all the other teams who would have to be involved. It's also just a lot easier to have those conversations, because we all know and understand each other's roles.

The other interesting thing is that if you're trying to measure the impact on sales, you almost by definition have to sacrifice some sales to run an experiment. To detect an effect, there has to be something that we think will impact sales, and then the question is: do we know how much, and how much is too much, for a campaign we're trying to make effective? We don't want to intentionally make part of the campaign less effective just to measure it, if we can avoid it. So right now our approach is to generate hypotheses from historical data, knowing that some confounding factors can't be entirely separated out, and then think about how to test some of those hypotheses with experimentation, being creative about the ways we can do that. We have multiple possibilities we're considering, and we do some experiments currently, but we'd like to build that in as a more regular piece of what we're doing.
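The "sacrifice some sales to measure" trade-off can be made concrete with a quick power calculation using base R's `power.t.test`. The effect size and noise level below are invented for illustration; in practice they would come from historical sales variability.

```r
# How many observations (e.g., store-weeks) would each arm of a
# holdout test need to detect a given lift? All numbers invented.
pwr <- power.t.test(delta     = 5,     # lift we hope to detect
                    sd        = 20,    # noise in the sales metric
                    sig.level = 0.05,  # false-positive rate we accept
                    power     = 0.8)   # 80% chance of detecting a real lift
ceiling(pwr$n)                         # required sample size per arm
```

Halving the detectable lift roughly quadruples the required sample, which is exactly the tension described here: smaller, safer experiments are also much harder to read.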

Cool, thank you very much for the info, I appreciate it. Could I add a question onto that? How do you ensure that experiments are taken all the way through? A challenge of mine is that I set up a bunch of experiments and then some drop by the wayside.

Yeah, so I guess if you have a series of experiments that you need to work through to get an answer, that follow-through can be a problem. We have fairly discrete time periods that we work with, so we can plan an experiment for a particular time period, customer, and set of stores, run it, and analyze the results. It has to be set up in advance in a way that requires buy-in from everyone, so we haven't had that problem. But generally, getting people on board with experimentation as a valuable and feasible thing to do is probably an ongoing challenge in a lot of places. When it works out well and you can say, "we think this is what's going on, and now we can do a much simpler head-to-head comparison," it's easier for people to understand the results and think them through themselves, without a complex model in the background that's a bit mysterious. So when it works, it can build a lot of credibility in what you're learning. But figuring out how to do experiments without making everything really complex and hard on people is definitely part of the challenge.

When you say "fall by the wayside," what do you actually mean, Robert? Why do your tests drop off? Can you elaborate on that?

Sure. Full disclosure, I'm not a data scientist, but we do some degree of testing on our marketing team. When I say "fall by the wayside": I don't always have an experiment plan. I'll have a way to measure two things, set an A/B test live, set a few live, and then one sort of catches my interest more than the others, and I just stop paying attention to the test. So the failure, I guess, is not having a clear objective and not having a clear time frame. And then I blame my teammates for not holding me accountable.

So we have a whole testing and experimentation team where I work, and I know they take the roadmap very seriously: any test gets scored in a way that tries to capture the effort, the impact, and all of those things. Maybe some of that would help. If you had all of your testing ideas, you could put them in a prioritized list: how much business impact would each have, will it require a lot of dev time to set up, what's the actual impact versus the ease of rolling it out? Then focus on your top ones. That might naturally help you prioritize the ones you feel are actually going to move the needle.

Do you know how that team prioritizes experiments? Are they just taking requests from stakeholders, or do they have a really clear mandate?

Yeah, so I sit in the marketing team; I lead a marketing analytics team, and that team also sits inside marketing, so most of the projects we get are on the acquisition marketing end. But we do try to democratize the incoming ideas, so if your function lives completely outside of marketing, that doesn't mean we wouldn't take those tests. And there are several frameworks you can use to prioritize tests. Internally we use the RICE model (reach, impact, confidence, effort) to try to get an idea of what the business impact would be for a project. Then it essentially gets built and put into production.
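The RICE scoring mentioned here is commonly computed as reach times impact times confidence, divided by effort. A minimal sketch in R, with entirely invented test ideas and numbers:

```r
# Toy RICE-scored experiment backlog (all ideas and numbers invented)
backlog <- data.frame(
  test       = c("new thumbnail", "checkout copy", "email subject"),
  reach      = c(5000, 12000, 30000),  # people affected per quarter
  impact     = c(2, 1, 0.5),           # expected effect, relative scale
  confidence = c(0.8, 0.5, 0.9),       # how sure we are it will work
  effort     = c(2, 1, 0.5)            # person-weeks to run
)
backlog$score <- with(backlog, reach * impact * confidence / effort)
backlog[order(-backlog$score), c("test", "score")]   # highest priority first
```

The exact scales matter less than the discipline: every idea gets scored the same way before anyone builds anything, which is the roadmap practice described above.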

Interesting that there's a whole team devoted to that. One higher-level thing related to testing that I've seen be effective in how people think is just getting people to articulate their hypotheses. Those don't have to be things that are easily measurable with data in the traditional sense, just: what do we think is going on with our industry, where we play in it, how our customers operate, things like that. Then write those down and revisit them. I think that helps people think about what we're learning and when we should change this hypothetical knowledge based on new information, whether it's data specifically or the preponderance of the sales team's experience or whatever it is. That seems like a more lightweight way to get everybody on board with thinking that way. Obviously there are some things where data and structured experiments are important to answering those questions, but even just the structured thinking is helpful for staying on top of things.

Upskilling teams in R and SQL

This may shift gears a little, but there's a question from David related more to training. David, would you be able to introduce yourself and add some context to that question as well?

Yeah, I'd be happy to. My name is David Dreyer; I'm a technical expert in product safety and data science at Syngenta Crop Protection. Within our company there's a global effort to increase digital fluency, and several folks on our team in product safety are looking to upskill in data science, specifically in R. I've been asked to coordinate a series of training sessions, so I'm curious whether others on this call have had a similar experience at their company, and whether they have any suggestions: things that have worked, things that haven't. This just came across my desk yesterday, so I thought I'd ask the crowd and learn from others' experiences.

To add to that, I'd be interested in anyone's experience with supporting people in learning R or other programming languages, versus encouraging people to learn SQL first.

This is where we failed, so I probably should not give advice on this. We have a team in finance today using Excel, and we got to the point where we all recognized they should probably start using R more, but we did not find the time or the method to really do it, so I'm here to listen.

Hi everybody, I just want to step in; I should introduce myself as well. I'm Steve Charlesworth, a data scientist at Cook Medical; we make medical devices. I've had a similar experience. We have a lot of researchers using Excel, and I've taken it upon myself to introduce people to R and assist them, to the degree I can, to get up to speed. We've definitely had success stories where the people are motivated and pretty sharp to begin with; you have some good luck there. But I feel like to really make it happen, you need some management blessing and a little more structure and formality around it: get people sent to training so they get the message that not only is R cool and you can do a lot of great, wonderful things with it, but it's also something that's expected as part of your toolkit going forward. That's just my thoughts.

I think it comes down to showing the value of something to someone, too. These top-down trainings rarely work: "okay, everyone's going to learn Python now, everyone's going to learn SQL now," whatever. But if you can show them, "hey, this Excel-based process is taking you a day to update; you can run this function in R and actually understand what's going on," I think that's the only way to really get people to adopt it. Otherwise, you can do a month-long training on R, and if you ask people six months from now, I don't know how many of them will actually have transitioned, even though you've given them the training. Really, the only way to get people to adopt it is to have them work with stuff they already do: have them take a project they're currently working on and use R with it, so they come to understand for themselves that this process can actually help them.

I'll agree with Elaine, though; I would go with SQL first, then R. You kind of have to have that as a foundation rather than jumping right in. Yeah, I'll put in my vote for SQL before R too; as they say, SQL is intergalactic data speak, so you've got to know it.

So I guess I'll weigh in here. I've tried this at two different companies and had two different results. At my previous company, I didn't try so much as I was asked to host a couple of training sessions: just show people, hey, here's how you do stuff, here's how you assign variables. It was a room of maybe 10 or 12 people, we did it maybe twice, and I don't know that anybody got anything out of it. They were all Excel users. You can't go in expecting it to be quick and easy and painless, because it's not going to be.

What I've actually started now, here at my current company, is basically full-on scorched earth: they're going to learn. I've taken advantage of the fact that I have a small team, just three people under me, and scheduled what we call Friday fun days. We pick a project, and for the first few I did the coding: here's how you read in data, and we work the project beginning to end. After a few sessions that way, we started taking turns: it's your day coding, now it's your day, now it's yours. That's actually worked to some extent; people are more comfortable. You can't just leave it there, though, because if they aren't practicing on their own as well, those skills will still atrophy.

My experience has been that even pretty motivated people, if they aren't actually using R for something like half their time in a given week, just can't get past the initial problems and really make progress. There was one time at a previous company where a couple of people were making a really concerted effort to move into the bioinformatics side, switching over from being dedicated lab biologists, and, to somebody's point about managerial support, they split time between the two teams as a training exercise for about a year. They were actually immersed, doing things side by side with people who'd been doing it for years, and that seemed to work really effectively. Other than that, I've had a lot of people who were really interested in learning R but just couldn't work it into their day-to-day jobs or find enough extra time to really get there. SQL might be a little easier in that regard.

I'll say I'm definitely not an expert on this, but I feel like this question comes up quite a bit at different industry meetups and at the Boston useR group that I run. It goes back to a lot of what people have touched on already, but the tip people really share is: just go produce something. Find something that's taking them a really long time to do, as you mentioned, the tears in Excel, and look at how they could make that more reproducible, so they start to see the value in using R right from the beginning. For me personally, that's something I've been trying to do in my own role. Even in sales or marketing, for example: after a webinar I would have this sheet of all the questions people asked, and I worked out how to put it into an R Markdown document that I could easily share with people. It was a little painful in the beginning to figure out, but now that I have it set up I can just do that after every session and it takes about a minute. So maybe help them find those areas where they could make things easier for themselves.
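The after-webinar automation described here might look something like this sketch: a reusable R Markdown template plus a one-line render call. The file names, and the assumption that the question sheet is exported as a CSV, are hypothetical:

```r
# questions.Rmd would hold the report template; this script just renders it
# against the latest export of the question sheet. All file names are
# hypothetical.
rmarkdown::render(
  "questions.Rmd",                         # the reusable template
  params = list(source = "questions.csv"), # available in the Rmd as params$source
  output_file = "webinar-questions.html"   # shareable output
)
```

Once the template exists, the per-session cost really is about a minute: re-run the render call with the new export.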

And just one other thought, and then I'll turn it over to someone else: it might also help to find others who could help you get people on board. There are probably other R users at your company even if they're not on the data science team, maybe in human resources or supply chain, and reaching out to those people and having a buddy to help you teach might be really helpful too.

Yes, I would like to add something. Hi, my name is Cristal, I work as a data analyst, and I am from Costa Rica, so I want to share a little bit about my experience with this. As Christian mentioned, some of my team work with Alteryx, which is a drag-and-drop tool, so it seems easier to them. They think it's easier because you're not coding; once you show them something that has some code, they think it's really difficult and that they don't have the time to spend on learning. What I find useful is to show the importance of using R or other tools that might be more powerful, and to work side by side with the stakeholders or internal clients. For example, someone builds a report that they have been working on for two years, it's their baby, so they don't want you to touch that baby they have built in Excel, and you have to automate that task. So I work alongside them and explain what I am doing. Just yesterday I had this experience where a colleague got lost in all of the tabs she has in Excel, and I told her, see, if we can automate this with what I am doing, then you don't have to suffer trying to find things across all the Excel tabs. Showing the benefits of using R like that makes the stakeholders and internal clients conscious of its value, and then I can share those success stories with my team, and they will find it useful and try the same with their own stakeholders. So I really don't think it's easy to make that transition from Excel or Alteryx to R or Python, but I think the key is to show the success stories, and there also needs to be commitment from the managers; they need to have the vision that this is important so we can transfer this knowledge to the whole team.
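As a sketch of the kind of Excel automation Cristal describes, here is one way in R to read every tab of a multi-tab workbook into a single data frame, so nobody has to hunt through tabs by hand. The file name is hypothetical, and this assumes the tabs share the same columns:

```r
library(readxl)  # read_excel(), excel_sheets()
library(purrr)   # set_names(), map()
library(dplyr)   # bind_rows(), %>%

path <- "monthly_report.xlsx"  # hypothetical workbook

all_tabs <- excel_sheets(path) %>%          # every tab name in the workbook
  set_names() %>%                           # name each element after its tab
  map(~ read_excel(path, sheet = .x)) %>%   # read each tab into a data frame
  bind_rows(.id = "tab")                    # stack them, keeping the tab name
```

With a `tab` column in one tidy data frame, the "which tab was that in?" question becomes an ordinary filter instead of a manual search.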

Communicating the value of data science to executives

Thank you very much, Cristal. If anyone else has anything to add to that, feel free to put more in the chat too. But one question I'd like to direct at you, Elaine, is how do you communicate the value of data science to executives, especially for people who are just trying to build out a data science team within their company?

Yeah, I think if you are the first data scientist, or working on building that early team, there are some questions to ask, which hopefully you have insight into, although not always, about why you were hired in the first place. Especially at a small company, somebody is making a big investment in hiring a data scientist, and it's good to understand the thinking behind that, which is not always entirely logical, well-informed thinking. It can be just, well, we have data, and data is really valuable, so we need a data scientist. That's challenging to come into, but it does give you the opportunity to clarify where data science can be helpful and to educate people, and it's good to know that that's what you're stepping into. Then there are other cases where there's a particular need that data scientists or analysts are being hired for, and that's good to know also. So regardless, it's good to understand why you are here, or why people think you're here, and how to work with that beginning context, and then also to look at the business and where there's value to be provided in data, and help executives understand where it makes the most sense for you to focus, which may or may not be what they imagined initially.

I think it depends somewhat on the business, the leadership, and the nature of the organizational buy-in that you have, and you have to think about that context. But more broadly, if there's something that carries over to most places in terms of data science, it's about making sure that you're making well-informed investments. If you're investing in product development teams, or in acquiring certain data, or in optimizing things with your customers, pretty much anything, it's valuable to know how much impact that is having, which things we invested in that we're not really getting a good return on, and how we reconfigure our focus toward the things that are providing the most return. I think that's the kind of framing that fits best with the way executives think about the business and where data has a legitimate role to play. Then it's about translating that into what you work on, and how quickly you can provide results that let them make those kinds of decisions, because sometimes you're walking into a situation where there's been no real data infrastructure or collection effort, so it's hard to get things up and running. But that kind of framing, one that fits with business leadership, is the best way to start connecting with executives.

How much do you push back, and how do you get comfortable with pushing back, or with positioning yourself as an expert to the rest of the business, as in, you know best what should be prioritized in data science projects?

Yeah, I don't think I would say it that way, because for one thing it's not quite true, right? I think it's a collaboration: hearing from people in the business what's most useful for them, and going back and forth between where we, the data experts, can provide value based on the data that's available, and where we think there's the most value to be found based on what we think is possible with the data, and aligning that with the problems people have. The more you can get to the underlying problems they're trying to solve, and get past the straight-up "send me this data set" or "tell me how this compares to that", the better off you are. The product management approach of asking why, and getting down to a user-story level where someone says, I need to do this because I'm trying to accomplish this, allows you some leeway to say, okay, now I understand your problems, let me help figure out how data can best address them. It's easy for us to see data problems and data solutions to things, and that may not align with the most critical priorities of the business. So there's an ongoing effort you have to make to understand what's going on with the people you're trying to provide value to, and at rapidly evolving companies that's constantly changing, so it's a non-trivial task to keep up.

Mike, I see you have your hand raised there. Sorry if I missed it from before.

No, no, that's okay. I just had a lot of thoughts on this topic of communicating to leadership what data science can and can't do. Especially once you get to the point of having a data science organization, I think a lot of data scientists will tell you that rather than modelers or analytics people, we're problem solvers at our core, and I think that's really the way we need to sell data science to the c-suite: we're a function for solving problems.

I remember I used to see this stupid analytics pyramid that was like, we're going to start out with descriptive analytics, then we're going to go to predictive analytics, then we're going to do machine learning, and then we're going to do AI at the top once we get to our most mature point as an organization. I just don't think that communicates what data science is to non-technical folks at all. A particular problem is suited to a particular solution, and half of the time the c-suite is just looking for something to be automated, which isn't a data science problem at all. Going back to what Elaine said, you need somebody who provides that translational component: come to me with your business problem and I'm going to tell you whether it's a data problem or not, and how data science can or can't be used to solve it. Is a report the best output in terms of the solution we're going to deliver to you, or do we need to build you a model and put it in an API that fits into our current business process, so that somebody's not running a model on their own laptop? So I think it really comes down to problem solving at a high level, and a lot of the sales and marketing material that's been out there for a while doesn't do justice to the data scientists who are on the ground trying to communicate, help the business move forward, and solve its problems.


Improving communication skills as a data scientist

If you wouldn't mind introducing yourself, I think it'd be helpful to maybe share your company too.

Sure, you don't want me to just rant the whole time? My name is Michael Thomas, I'm the chief data scientist at Ketchbrook Analytics. We're a data science consulting firm out of the Hartford, Connecticut area, really focusing on the whole end-to-end data science lifecycle: getting the business problem, translating it into a data problem, and then actually deploying the solution, end to end. We highly specialize in R, and I'm a certified RStudio and Shiny instructor, so I live and breathe R when I can.

It does seem like a lot of the topics end up going back to communication. Elaine, I was just curious: what are specific things you think data scientists can do today, or things that you've taught people on your team, to really get better at communicating data science?

One of the things I've told people on my team repeatedly over time, and also try to tell myself, is to go through the process of trying to look at what you're doing from your audience's or stakeholders' perspective three times, which may not even be enough. Try to strip away the data-scientist-specific piece of it and get it closer and closer to someone who's arriving at this meeting, is going to pay attention for maybe 15 minutes, and has problems they want to solve. What do they want to hear about, how can they consume it, and what do they not want to hear about? So do that three times: present to other people on your team, present to other people in the company, and get feedback. It's really hard to get, in just one step, from "I did this and now I'm going to present it" to the right level of simplicity and audience awareness. That's one thing. The other is getting exposure to other teams and other parts of the business wherever possible, which really helps build empathy and understanding for other people's perspectives. Whether that's getting out into the field or sitting in on other teams' discussions, generally trying to understand where everyone else is coming from and what kinds of pressures and challenges they're dealing with is really helpful, not just for formulating which problems we solve and having contextual awareness when building models, but for communicating with people too.

Advice for aspiring data science leaders

It feels like an hour just went by really quickly. We'll stay on if people don't have to run to other meetings, but I just wanted to make sure I didn't miss any questions. One other question I've been trying to ask everybody each week is: what advice would you have for someone who's looking to get into a data science leadership position?

Yeah, I think it's about expanding the time you spend, and the emphasis you put, on things other than the data work: understanding the business better, understanding what executives care about and what their perspective is, getting lots of practice everywhere you can communicating to different groups, and then also how you think about not just prioritizing things within your team but communicating those priorities effectively. Depending on your position you may just be observing other people do these things, but there are often opportunities to jump in, and maybe even take over tasks your current leader doesn't want to do, like updating roadmaps and communicating where things stand. That's good practice, and you get a lot of feedback on how people interpret things and what people care about. There are also almost always opportunities to just go find people who work in other parts of the company, sit down, and hear what they're working on, and get that broader perspective that's not data-specific, because a lot of the leadership stuff is not so much about data as it is about how you interact with everyone else.

I see the comment that money is what matters when it comes to executives. I really do think understanding executives is a fascinating undertaking. Just think about preparing graphs that executives are going to present or consume: someone on my team long ago actually wrote a function called executify, as in, this needs to take the graphs we have and make them look like they were drawn with a crayon, simple and big. That's the kind of thing that often goes into executive-level communications, and the question is why, because sometimes it seems like things get dumbed down the higher they go. But executives are generally very sharp people, many of whom have come from very detailed technical backgrounds, so it's not about the people, it's about the responsibilities and the perspective they have. Really trying to understand that is helpful in thinking about how to support them in running the business. And they do care about money, which can be frustrating, but it's also really important if you want to stay employed and work for a company that has money to spend on data scientists and stays in business. Somebody has to care about that, and we have to support them in some way in doing their jobs.

What data scientists get wrong when packaging insights

I know we are getting to the end of the time here, so if anyone has to drop, no worries, but I'd love to keep this conversation going if people are able to. In the middle of the talk today we were discussing how to get other people on board internally, people who are using Excel and maybe wanting to switch to R, and it felt like that was something we should maybe chat about more. Are there other topics or other things that people would like to discuss?

I'd love to know what you look for when hiring Elaine.

Because I'm usually hiring onto small teams at small companies, teams I don't expect to grow to 20 people in the next year or two, I tend to look for people who are generalists to a certain degree. I think about the whole spectrum, from data engineering all the way to data visualization or whatever the last stage of communicating things is, and all the pieces in between, analytics engineering and modeling and analysis and all of that, and look for people who have a strength in some area we don't yet have but can flex to cover almost everything else with some level of competence. It's so hard to predict exactly what we'll need, and it tends to fluctuate over time because we cover whatever the company needs that fits in our area. The other thing is communication, as you can probably guess from everything I've said to this point, so we actually have an exercise where candidates do a little analysis and then prepare a slide, and that's pretty key.

I don't know if you can think of this off the top of your head, but what are some things that folks get wrong when they are communicating insights, either in your hiring exercise or actually on the job?

I think the most common is to talk way too much about the things that were really important for how you did the work, and not enough about the result. Like, this is the type of model we used, these are the assumptions. You need to cover some of that, or at least be able to answer questions about it, but mostly nobody cares, so just don't. I mean, people within your team should care, and you should care, but that's our responsibility to care about; nobody else really wants to know, most of the time.


On that topic of hiring, I think it was last week we were talking a bit about how to stand out to hiring managers when there's so much NLP behind resume screening. What can people do to get around that?

Yeah, I may not be the best person to answer that, because smaller companies tend to be a little more hands-on, but maybe that explains some of the resumes I see that have lists of 10,000 technologies, if they're trying to get past the algorithms. I often find that people stand out pretty easily. It's quite likely nobody will read your cover letter, but you should still write one well, and just having your resume be formatted, typo-free, and targeted stands out a lot, as opposed to the list of "I can do all these things". That may be effective in other contexts, but it's not in this one.

I remember Kobe mentioned that on the webinar, that you'd get resumes with just a bunch of different technologies listed. Maybe it's almost like an SEO strategy: I need to communicate this thing, but I also need to tag it with 10,000 things that some algorithm might be looking for.

Yeah, I see Steve said: one resume for people, one for machines.

I don't see any other questions, I'm scrolling through to double-check, but I just want to say thank you so much, Elaine, for all the insights and for opening up to the group. If people have other questions for you, what's the best way to get in touch?

Probably Twitter, for people who use Twitter. I keep up with that much better than LinkedIn. It's @eamcvey on Twitter.

Perfect. Well, thank you so much. Have a great rest of the day, everyone.