Resources

Data Science Hangout | Paul Ditterline, Heaven Hill Brands | Getting Buy-in to Adopt New Tools

We were recently joined by Paul Ditterline, Director of Data Science at Heaven Hill Brands. A few snippets:

27:06 - Small wins when implementing new tools
30:23 - How to prioritize KPIs
33:57 - Communicate what you're doing and why
35:39 - Getting buy-in to adopt new tools
39:24 - How often to revise a model in production
41:56 - Tips to be a better leader
50:36 - How to kick off the conversation to get approval to use R/Python
56:37 - When and why to use code-based over non-code-based tools

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
► Add the Data Science Hangout to your calendar: https://www.addevent.com/event/Qv9211919

Follow Us Here:
Website: https://www.posit.com
LinkedIn: https://www.linkedin.com/company/posit
Twitter: https://twitter.com/posit

Nov 3, 2021
1h 3min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Welcome, everybody, to the Data Science Hangout. Welcome back to all the familiar faces. I think most people who are on now have joined before, but if this is your first one, this is an open space for current and aspiring data science leaders to just connect and learn from each other. So we don't have an agenda for the calls. It focuses on questions that are most important to you all. And just want to point out that this session will be recorded and shared up to YouTube as well. But I'm so excited to be joined by my co-host for today, Paul Ditterline, Director of Data Science at Heaven Hill Brands. And Paul, I think if it's okay with you, just turn it over to you to maybe have you introduce yourself and share a bit about your team and the work that you do.

Yeah, sure. So, hey, everybody. Like Rachel said, my name is Paul, and I'm the Director of Data Science at Heaven Hill. We make some of the best spirits in the world. So you've probably heard of things like Evan Williams and Elijah Craig, and some other cool products like Hypnotiq. We make a lot of cool products at Heaven Hill. From a data science perspective, it's really cool because we are a consumer packaged goods company, and we have production facilities, and we have things like shipping, and then we also have things like marketing and sales and infrastructure to support the end-to-end production and sales of our products. So there's a lot of cool opportunity there for data science in every aspect of that process.

And so I lead a team called Data Services under the leadership of a relatively new CIO that's really trying to tackle this head on and build a data science foundation really from the foundation level, which includes how our ERP system and data systems are organized and communicate, all the way up through how we can pull analytics out of those systems and derive business value that we can easily communicate to the business. So I think that's kind of in a nutshell what my team does.

Awesome. And as we're waiting for questions to come in from the audience, I think it'd be cool to hear from you. What's something that you're really excited about in data science right now?

Yeah, so I've kind of got a few answers for that; that's a really broad question to me. If I think about it locally, in my current work, I'm really excited about the foundational work we're doing at an IT level to make it easier to do data science, right? Like, how do we get all these disparate systems to more easily talk to each other and provide the raw material of data to our analysts, so that we can spend less time munging and more time providing insight, which I know is like the bane of every single data scientist that exists today. If I step back a second from that, I'm excited about tools like GPT-3 and how they might change the type of products that we can make. I've been on the waiting list for, I think it was like a year, and I recently got access. So I've been really excited thinking about how I can use large neural networks like that in my own work and what sort of value I can get from them.

And then I regularly just sort of geek out on upcoming packages and functionality, especially within the RStudio set of products. So there's a really cool resource called R Weekly that I check kind of religiously. Usually at the end of reading that I've got like 30 open tabs that are recent papers that people have published, or upcoming packages, or changes to packages, or blog posts about how to use some packages I'm interested in. Most recently, I've gotten into using blastula connected to R Markdown to try to move away from go-to dashboards to more automated, exception-based reporting that informs people whenever something happens. So instead of having to go dig for something, people sort of get it hand-delivered to them. And using tools like the connectapi package, which is relatively new, has been really cool to build those types of things.
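
The exception-based reporting idea he describes can be sketched in a few lines: a scheduled R Markdown/ETL job computes a metric, and an email goes out only when something needs attention. The metric, addresses, and credentials file below are all hypothetical placeholders, not Heaven Hill's setup.

```r
# Sketch of exception-based reporting with blastula. The metric value,
# addresses, and credentials file are placeholders for illustration.
library(blastula)

barrels_below_target <- 42  # pretend this came from the day's ETL run

if (barrels_below_target > 0) {
  email <- compose_email(
    body = md(sprintf(
      "**Heads up:** %d barrels came in below the yield target today.",
      barrels_below_target
    ))
  )
  # In a scheduled job you would then send it, e.g.:
  # smtp_send(email,
  #           to = "analyst@example.com", from = "reports@example.com",
  #           subject = "Yield exception report",
  #           credentials = creds_file("email_creds"))
}
```

Because the send step is conditional, quiet days generate no noise at all, which is the point of exception-based reporting.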

Data services under the CIO

I see there's a question in the Slido that speaks to what you mentioned earlier. Since data services is under the CIO, is it relatively easy for your team to get what you need from IT? Yeah, this was a really big learning for me at my previous employer. I spent seven years at Brown-Forman, starting off in R&D as an analyst, and at the end of my career there I was helping implement the global advanced analytics function. One problem that I noticed there was that traditional IT was sort of this thing, and then data science was this other thing, and sometimes those two things collided in ways that weren't the best in terms of agility and in terms of implementing data science solutions. So one of the things I pushed for at Heaven Hill when we got the new CIO was putting all those things into one vertical, and the answer is yes. Having the folks that build those data platforms and pipelines with the analytics and data science needs in mind has been incredibly helpful. And from my experience, I would highly recommend some sort of integration along those lines from a business perspective.

Paul's journey into leadership

Cool. And you just mentioned working at Brown-Forman as well, so I think it might be helpful for the audience, or some of the aspiring data science leaders, to understand where you started there at Brown-Forman, how you got into leadership, and what your journey looked like. Yeah, you know, one of the things that I like about R and RStudio is the packages and the functionality, but it's also the community and the openness, the "how do I use this." Years ago, like a decade ago now, when I was starting off there as just an analyst, a new person who really wanted to make an impact, I was able to do a lot more than I think I could have otherwise because of the tools that existed in the R community. So instead of doing a one-off analysis and providing an answer, I was able to, say, learn Shiny and spin up an application that solved that problem not just once but over time. And I was able to use those free, open source tools to come to my leadership at the time and say, hey, if we had X we could solve problems this way. And over years of building that, I was able to sort of push that strategy and push the thinking in that direction.

So really it was just using the tools that were available to show a vision with tangible assets that people could understand and then leading that into sort of what the vision for data science could be in the future and that's what kind of got me from doing it at the ground level to helping lead the implementation of that. So but without those tools I think it would have been really hard to try to do something like that.

So really it was just using the tools that were available to show a vision with tangible assets that people could understand and then leading that into sort of what the vision for data science could be in the future.

Data along the supply chain

Bruno, I see you asked a question in the chat, and I'd love to hand the mic over to you if you want to ask it live. Sure, thanks for doing this, Paul. Retail is pretty interesting. How do you tackle the data along the supply chain, up to the retail store? Do you get very fine-grained data, or can you put processes in place to gather more information? How does it work in this context?

So it's a little interesting with alcohol sales in America. We have what's called a three-tier system, where we're not allowed to sell directly to the store that sells to you. This is a post-Prohibition legal effect that's been in place since then. So we essentially have to sell to distributors, who then sell to individual stores. So that does create, you know, some difficulty in getting that data. However, we do have really good partnerships with those distributors, and we are able to get, at the store level daily, not really transaction level, what's being shipped from them into the stores. So it's kind of a headache, because you have to deal with what we're shipping to this third party and then what they're shipping to a consumer, right? So it's two different sets of metrics, but we are kind of lucky that we can get to that store-level data. We have a lot of creative assets in play to try to get that information as quickly and at the lowest level possible. It's not always easy.

Cloud vs. on-prem analytics

Chris, I see you asked a question as well, around how much of your work is being done through cloud services. Chris, do you want to add any other context there?

Yeah, again like everyone says, thanks Paul for doing this. Always appreciate having these guest data scientists coming in to do this. So I work with Air Force, mainly special operation command stuff and all that. And one of the challenges that we have within our data science community is you know the traditional things like you would talk about before where you have your siloed information and stuff like that. And now they're talking about you know the possibility of working in cloud environments and that. And I just wanted to get your take as to whether or not you and your team are doing more of your analytics on a cloud basis compared to the normal data silos that you see in a lot of companies.

Yeah, so from a data perspective, from that end we're definitely mostly on-prem. We're in the middle of a strategy. Again, I mentioned we got a new CEO, or I'm sorry, a new CIO, in January, and he's implementing a strategy that does include sort of a cloud-based future. And we're in the process of moving there from a data perspective. From an analytics perspective, we're in the process of moving from that traditional emailing-each-other-Excel-files analytics to, I kind of think of it as two different pathways. I want a pathway for someone who writes code in R or Python or whatever. I need them to be successful. I need them to connect to data sources, create products, easily post them to some web-based outcome, one source of truth, you know, get backed up, all of that. But then I need to serve what I might call an analyst professional: someone who needs a more modern tool than Excel, needs to provide that web-based experience, and isn't going to learn how to code. So, you know, that's maybe a Tableau or Power BI, right? So we're transitioning into setting up the infrastructure we need to support both of those. We've already got RStudio Team implemented on-prem for the data science side, and that's working well for us. And we're probably moving to Power BI for the analyst side, because we're also a Microsoft company. So I guess to succinctly answer that, nearly all on-prem now, but with a current strategy in progress to move to the cloud in the next, let's say, five years.

Unexpected uses of data science at Heaven Hill

There are a few anonymous questions coming in on Slido too, and one is, what is something that is impacted by data science in your industry that you might not expect?

So one thing that comes to mind is barrel yield. Some people might not know much about spirits, right? I'm a little biased because I live in Kentucky, and here everybody knows about bourbon, because there's the famous saying that there are more barrels of bourbon than there are Kentuckians. Well, when you make a whiskey or bourbon, you make a distillate, and you put it in a barrel. And if you've never seen one of these barrels, they're really big. They hold about 55 gallons of liquid, and when they're full, they weigh about 500 pounds. And they sit in a warehouse for, you know, four to 12 years, or even longer. And of course, due to evaporation and soakage and maybe issues with barrels, you're going to have some amount of liquid left in that barrel whenever you go to dump it. And as you might imagine, especially if you're in finance, a couple percent change in that amount is a big deal. I mean, that's product that you thought you would have that you don't have. So using some cool data science methods to get data from places that I wouldn't have really expected, and feeding that into models that can help predict what yields are going to be, was something I didn't think would work, but it did. And that's a really cool thing that we use in our business.
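
Paul doesn't detail the yield model itself, but the general shape of such a prediction problem can be sketched in a few lines of R. Every variable, coefficient, and data point below is a simulated stand-in, not Heaven Hill's actual data or features.

```r
# Simulated sketch of a barrel-yield prediction problem. All variables
# and coefficients are invented for illustration.
set.seed(42)
n <- 500

barrels <- data.frame(
  age_years       = runif(n, 4, 12),          # years in the warehouse
  warehouse_floor = sample(1:7, n, replace = TRUE),
  entry_proof     = rnorm(n, 125, 3)
)
# Fake "true" relationship: evaporation loss grows with age and floor height
barrels$yield_gal <- 55 - 1.8 * barrels$age_years -
  0.6 * barrels$warehouse_floor + rnorm(n, 0, 1.5)

fit <- lm(yield_gal ~ age_years + warehouse_floor + entry_proof,
          data = barrels)

# Predicted gallons remaining for a hypothetical new barrel
new_barrel <- data.frame(age_years = 8, warehouse_floor = 5,
                         entry_proof = 125)
predict(fit, new_barrel)
```

Even a simple linear model like this makes the finance point concrete: a small shift in the predicted remaining gallons, multiplied across a warehouse, is a material amount of product.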

Packages and tools

Going back to packages, there's another anonymous question. What packages do you use most for modeling and machine learning? Is it primarily tidymodels and related, or do you also use Python?

Yeah, so I like tidymodels. I like the framework there. Actually, I came up in base R, so if I'm doing a logistic regression or a linear model, I've got to be honest, oftentimes I'll knock it out quickly in base R before I move on to a tidymodels framework. For machine learning and deep learning, I use the keras and tensorflow packages. That's what we have in production now for the models that we're using. And stepping out a bit from machine learning, the tidyverse is always the first thing that I call in pretty much everything that I'm doing. I live in R Markdown. I think that unless you're doing something that's very, very small and one-off, you should almost always start your work in R Markdown, because the point of the work you're doing is probably going to be to communicate something, and otherwise you're going to end up doing double work. You're going to write a script, and then maybe you would open, God forbid, Word or something like that to make a report while you're writing your code. But if you come with that in mind, you can open R Markdown, start it off as a report, do your analytical work, and organize it. And at the end of the day, you've pushed a button and you have a beautiful HTML web page with a floating table of contents and code chunks that fold in and out. You can also use Python, or R, or SQL, or D3, or other languages within those chunks. And then if you have RStudio Connect, you can push a button to host that to Connect and shoot someone a URL. And now you have a one-source-of-truth hosted report that anyone can go to, that you can control access to, and you did all that in one R Markdown script. So R Markdown is just completely invaluable for the work that I do.
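
The one-button HTML report he describes is driven by the R Markdown YAML header. A minimal header along those lines might look like this (the title is invented; the report body, with its R, Python, or SQL chunks, would follow below it):

```yaml
---
title: "Monthly Shipments Report"
output:
  html_document:
    toc: true
    toc_float: true      # floating table of contents
    code_folding: hide   # code chunks fold in and out
---
```

Knitting that document produces the self-contained HTML page he mentions, ready to publish to RStudio Connect with the IDE's publish button.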

We're actually looking at taking some of our ETL work directly into a markdown-based, notebook-based format, just because the documentation and the scheduling all kind of come along for the ride when you do that work. I like Shiny a lot, but I'm also a big fan of flexdashboard because it's really lightweight and really fast. You can write your reactive code right there, instead of having code in the server and then referring to that code in the UI. If it's something small, something very quick, I can go from nothing to a hosted flexdashboard with reactives in like an hour and a half. So if I need something very fast, I'll use that. The other really cool thing about flexdashboard is that if you're not using reactivity, and you don't have a way to host applications to show the value of what you're doing, you can email the resulting knitted HTML of a flexdashboard to someone. When they open it, it opens like a beautiful webpage that's interactive for them. That's a really cool way to share content if you don't yet have the ability to publish and host. And I actually took advantage of that at Brown-Forman to show people what you can do with our tools.
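
For the flexdashboard pattern he contrasts with Shiny, the header again does most of the work: whether the result is a hosted reactive app or a static HTML file you can simply email comes down to one line. A minimal sketch (title invented):

```yaml
---
title: "Shipment Monitor"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
runtime: shiny   # include for reactivity; omit it to knit a static,
                 # emailable HTML file, as described above
---
```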

Writing code as a director

Yeah, I was just curious. I mean, you said you live in R Markdown. So as a director, are you writing code daily? I'm not writing code daily; I'm writing code weekly. And that's one of the issues most people probably have, and I definitely have it personally: as you move into more managerial or leadership roles, your value tends to be derived from just that, from your leadership, right? Not so much from the code. But I don't want to lose that capability. So I do try to dive in and help on projects, submit pull requests, do that sort of thing. And I have a pretty strong opinion about how to do data science. I don't want to be a micromanager type of person, but if someone's doing a bunch of filtering and they're not using dplyr, I'm probably going to make a comment on it.
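
As a concrete example of the kind of review comment he means, here is nested base R subsetting next to its dplyr equivalent. The data frame is invented for illustration.

```r
# Base R subsetting vs. a dplyr pipeline on a made-up sales table.
library(dplyr)

sales <- data.frame(
  brand = c("Evan Williams", "Elijah Craig", "Hypnotiq", "Evan Williams"),
  state = c("KY", "KY", "NY", "TN"),
  cases = c(120, 45, 30, 80)
)

# Base R: workable, but harder to scan as conditions pile up
base_result <- sales[sales$state == "KY" & sales$cases > 50,
                     c("brand", "cases")]

# dplyr: each step reads as a verb
dplyr_result <- sales %>%
  filter(state == "KY", cases > 50) %>%
  select(brand, cases)

dplyr_result
```

Both produce the same rows; the pipeline version is what tends to survive code review on a team standardized on the tidyverse.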

Internal packages and third-party data

Tony just asked a question around internal packages, Paul: how many internal packages do you host? Right. So I only have one internal package currently. I mean, I see a world where there's a handful of those. For our use case, I think that'd be about it. Although, to be fair, I know that the movement toward package-based development, especially with things like golem, is pretty huge right now, and I haven't dug into it enough. But we have found it helpful to create helper packages for commonly used functions, for the data munging or modeling that we tend to do on a regular basis.

I see Ian asked a question earlier as well, and said he's in a noisy office, so I'll read the question. But do any of your analytics and data science rely on VIP, a third-party vendor, for the beverage industry for your distributor data? He said, the reason I ask is I'd like to hear your insights on taking and consuming data from third-party vendors that you can't get an API into. So do you have the data? Yes. So we, you know, in the earlier question about sort of data at the distributor and retail level, you know, I made a joke about some creative solutions, and you do kind of have these vendors that have, I'm not sure what the right phrase is, historic methods of getting data to you, and oftentimes there's not a lot you can do about that. So we just try to automate that in the best way possible that fits our infrastructure and doesn't require a lot of human interaction to bring that data into our environment. But we do work with VIP data, yeah.

Getting buy-in to adopt new tools

Aliyah, I see you have a follow-up question on the implementation strategy. Would you want to ask that one? Yeah, thank you, Rachel. Hey, Paul, I did have a follow-up question about implementation and trying to get buy-in from teams that might be a little hesitant, teams that might be late adopters of new analytic tools and technology. So, if you have any pointers around getting over that hump.

Yeah, what I've done in the past is try to show small wins that are very directed to those individuals, right? So if you can figure out that there's a thorn in someone's side, and that's going to be a three-hour Shiny app that takes that problem away forever, even if that's not a business priority, if it doesn't connect to a KPI, just do it and give it to them. And they're going to be like, whoa, you can do that? Because I think there's hearing and then there's seeing. So I think being able to solve small personal problems for someone matters, even if it's not huge. You can do some stuff in R that's not exactly data science; it might just be automation of some manual work that takes up a huge swath of someone's time. When you show you're interested in solving their problems, they tend to come along for the rest of the strategy. That's the technique that I've used in the past.

When you show you're interested in solving their problems, they tend to come along for the rest of the strategy.

Integrating Power BI with R

Tony, I see you asked a question about Plumber. Would you want to ask that one live? Sure. It wasn't really about Plumber. What I wanted to know was, you talked about integrating Power BI into your process, and just last week someone approached me about that. I don't know much about Power BI, so I was wondering if you could talk about the integration of those two tools. I gave an example in my question of, are you going to serve data from Plumber for Power BI to consume? But that's just a guess, because I don't know anything about it.

Right. So I haven't used Plumber much; I think I've used it in one project. I do use pins a lot to write data frames back to RStudio Connect, but that mostly just serves R and Python users. One thing, I mentioned ETL and R Markdown, I mentioned living in R Markdown. One of the great things about that is I can have a script that uses SQL, queries a bunch of stuff, brings it in, uses Python and R to munge it, and then uses the odbc packages to push it back into a data science table that can then be consumed by Power BI or by R users. So that's the way that we're sharing data currently, trying to build what we're calling enterprise-ready data sources, one source of truth. If you look at your data needs, you can probably bucket them down into X buckets, and you can try to build something at the base level that works for everyone from those buckets. And then that becomes the source of truth for those data sources. And you can write them in such a way that Power BI or Shiny Dash or whatever users can access that data. Now, when I think about the future, where I have Power BI fully automated and I have Shiny applications and model outputs and stuff like that, yeah, I would probably use something like a Plumber API to make sure that everyone could grab those to integrate into their apps. That'd probably be one solution. We just haven't gotten there yet.
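
The ETL pattern he describes, pull with SQL, munge, then write a curated "one source of truth" table back for Power BI and R users alike, can be sketched with DBI. An in-memory SQLite database stands in for the real warehouse here (his team would connect with `odbc::odbc()` against their on-prem systems instead), and all table and column names are invented.

```r
# ETL sketch: SQL extract + aggregate, then write a curated table back.
# SQLite in memory stands in for a real warehouse connection.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Pretend this table already exists in the warehouse
dbWriteTable(con, "distributor_shipments", data.frame(
  distributor = c("A", "A", "B"),
  store_id    = c(1, 1, 2),
  cases       = c(10, 5, 7)
))

# Extract and aggregate with SQL (this could just as well be a SQL chunk
# inside the R Markdown document)
raw <- dbGetQuery(con, "
  SELECT distributor, store_id, SUM(cases) AS cases_total
  FROM distributor_shipments
  GROUP BY distributor, store_id
")

# Write the curated table that BI tools and R users both point at
dbWriteTable(con, "ds_store_shipments", raw, overwrite = TRUE)
```

Because the curated table lives in the database rather than in someone's inbox, Power BI, Shiny, and ad hoc R sessions all read the same numbers.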

Models in production and hosting Shiny apps

Thanks, Paul. I see a few people sent in questions and said they're in a busy office right now, so I'll read them. But one was, how often do you revisit a model that's already in production? Do you have any fixed cycle or depending on need?

So, when I build something like that, I always build some sort of flexdashboard or Shiny application that monitors performance for that model, something I can quickly go to, look at, and see what's going on. And I'm kind of that nerd that likes to go look at that a lot, so I tend to just naturally, during my morning coffee, look and see what's going on. I don't really have a schedule or anything. Of course, if I hear about a problem from a user, or feel like there's something wrong, I go look at it then. So yeah, I think that's the answer.

Thank you. And then just one clarification question. Not sure if this was already discussed, but how do you host your Shiny apps for business users? Oh yeah, we use RStudio Team, so we have RStudio Connect on-prem, and we also have RStudio Workbench. So a lot of development happens in Workbench, where we have all of our handy-dandy connections and protocols set up. You can pretty quickly bang out a Shiny app, hit the blue publish button, push it to Connect, and then you can pull it up. You can quickly set a vanity URL, set who can see it and who can't, and then we share that. We are working on what our solution is going to be; we really don't want every user to have to come land on Connect to find their app, but there are some really cool ways. You can use blogdown, for example, to build something custom. We also use Echo internally, so we've toyed with building a data science landing page for those tools. So anyway, sharing with a user is easy, you just send them a link, but then there's the question of how you let people come find things, and that we're still toying with. It's probably going to be some sort of R-developed, Connect-hosted website that lets people come and explore the data science products that are there.

Leadership principles

Are there any management or leadership principles that you could share?

Management or leadership principles. Well, you know, I mentioned earlier being people-focused. I think that's really important, especially in the current day, with the pandemic, supply chains, child care, people losing people to COVID. You really have to remember that we're all people first. We're people that are coming to work. So I try to remember that every day. I think you also have to have a culture where it's okay to fail. Actually, it's good to fail. Fail fast, fail often. Get to the thing that works. Meet people where they are. Give them the tools they need to grow. Find out what they care about and like, and try to give them more of that. Celebrate success and learn from failure.

Yeah, I was just curious if your management and leadership came naturally to you or if that was something you worked towards through mentorship or reading books?

Oh, well, I've always been sort of a people person. Of course, that doesn't mean you're a good leader or a good manager. Managing is such a different thing, and to directly answer your question, when I went from what was called an individual contributor to a people leader at my previous job, thankfully I was in an organization that recognized the difference there. Especially if you're a person who really cares about your work, you're a go-getter, you're typing code all the time, you're delivering products, you're solving problems, you're seeing the fruits of your labor directly. When you hand someone a link and they're like, oh my god, I can do this great thing now that I couldn't do before, thank you so much. When you go from that to a job where you manage these X people, provide them with success, clear the path for them, and make them rock stars, it's like being thrown into the deep end of the swimming pool, and you're no longer getting that personal feedback that you used to get from solving problems yourself.

So, I was lucky enough to have, I think, about six months of different HR-led trainings and seminars and resources given to me to understand that change, to understand that you don't just have your own lever now, you have a lever for the team. You have more impact, but it's in a different way, and the things that are important to your day-to-day completely change. I think I would have struggled with that a lot longer than I did. I mean, I did struggle with it, even with the help, but I would have struggled even longer had I not been given those resources. So that's probably an important note: if you're a data scientist and you see yourself becoming a leader one day, just keep in mind that it's going to be a change, it's going to feel weird, it's going to be a shift, and honestly, not everyone likes that. I've worked with people who got a taste of that and decided they'd rather be a technical data science leader, that that's what they wanted to do, instead of being more of a people leader. It really depends on what you like, but I do think that you can learn, with resources, of course, to be better at it, just like anything.

How to kick off the conversation to get approval to use R/Python

I've talked to a lot of people who are in the very early stages of being the R champion, maybe making a presentation for their architecture review design board or whatever internal review process they have. And I'm just wondering how you actually went about doing that, whether at your last role or now, to get RStudio approved.

Right. So I started off, like I mentioned earlier, by doing things with free open source versions locally. So first it was, hey, can I get RStudio? It's open source, here's what it does, it's an IDE. And that's usually a short conversation: you can do that. And then, can I download approved packages from CRAN? That's usually not a problem either; at least it wasn't for me. And once you have that, you basically have everything. Now you can install R Markdown and Shiny, you can pull in local data, you can show a Shiny application running locally. You can, like I mentioned earlier, make a flexdashboard and send that out as an HTML file to people so they can see what that looks like. And then you can build a strategy. I remember making slide decks where I basically said, hey, here's how you do things now, here's the current-state diagram of how you do analytics and how people consume them, and here's what you could do if we had some set of data science tools. And then I would show literal examples of that. And then I would say, and here's what it would look like to acquire that. And so that's how I convinced people to give me those tools so that I could do better work.

Yep, yeah, I did it in R, knitted it to PDF, and I would even say, you know, the slide deck itself was made in R. I remember a presentation I gave on R Markdown that just talked about the incredible Swiss-army-knife capabilities of it, what it could do, the stuff it could replace, the ways we could use it instead of other things that we use now. So communicating value, and then using, I think one time I called it, a stair-step approach, right? If you can get RStudio or R, get access to CRAN packages on your work machine, and use local data sources, well, then you can stair-step that out. You can find a business need that someone needs solved, and you can solve it better.

Right? Like, someone needs an analysis. I remember when I was at BF, someone had this data that was coming out of an HPLC, which is a chemical analysis where you put liquid in this machine and you get an output of all the chemistry that's there. And they're like, hey, we need to understand descriptive statistics for these chemical compounds. And the second time she asked me, I just built a Shiny app, and I actually gave her the code to run locally, just on her machine. And I was like, the next time you do this, press this run button, and then it looks like a webpage: upload your file, and everything you just asked me to do will just be done. And that person was like, holy crap, that's amazing. And then they talked about it to everybody, and the next thing you know, you have individuals coming to you, and that gets the attention of leadership as well. So you can work at that ground level as well and solve problems for individuals, like I mentioned earlier. Then, while you're doing that, that becomes part of the example that you lay out in your presentation: look, A, we can do this; B, I have been doing it; C, here's the feedback from that; and D, here's the solution to make this a real thing we're doing, instead of a thing that I'm covertly doing on my laptop. But you may have to break a few rules. I'm not telling you to disregard your policies or whatever, but if I'm being honest, I probably did a couple of things that certain people in IT didn't really like. But I'm the kind of person that thinks you break a few windows because you know the new house is going to be awesome. And that's just the way that it is. And so that's what I did. And it is more awesome now, so I'm okay with that decision.
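
The app he describes could have looked something like the sketch below: upload a CSV of HPLC results, get descriptive statistics back. The real app isn't described in detail, so the column handling here is generic and the labels are invented.

```r
# Minimal sketch of a "run button" Shiny app for descriptive statistics
# on an uploaded HPLC results CSV. Labels and handling are illustrative.
library(shiny)

ui <- fluidPage(
  titlePanel("HPLC descriptive statistics"),
  fileInput("file", "Upload HPLC results (.csv)"),
  verbatimTextOutput("stats")
)

server <- function(input, output) {
  output$stats <- renderPrint({
    req(input$file)                          # wait for an upload
    compounds <- read.csv(input$file$datapath)
    # Summarize only the numeric (compound concentration) columns
    summary(compounds[sapply(compounds, is.numeric)])
  })
}

app <- shinyApp(ui, server)  # running `app` is the "press this run button" step
```

Twenty-odd lines like these turn a recurring manual request into a self-serve webpage, which is exactly the "small win" strategy discussed earlier.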

When and why to use code-based over non-code-based tools

There's actually a question around Tableau as well. Someone just asked: how do you think about when and why you'd want to use something like Power BI or Tableau versus when you'd use Shiny or R Markdown?

Yeah, so honestly, you know, I'm incredibly biased, like we all are. I prefer code-based solutions for lots of reasons. I think they're much more flexible. I think you can integrate all of the awesome things that are in that ecosystem, right? So in Shiny, any R package, any R function, all the custom functions I've written in my custom packages, any new model, any new statistical method or theory that comes out is going to be in R, right? I mean, that's what the programming language sort of is for. And then I can immediately use that stuff in anything that I make. It's all code-based. I can collaborate with code. I can do things like pull requests. I can do things like have a Git repo. I can have it be transparent, have it be reproducible, have it be completely rebuildable: you can make it again with a button press, you can compile it all over again, because, again, it's code-based. I think those benefits are just hard to match with a GUI-based system, precisely because it is a GUI-based system.

Now, where I think tools like that are helpful is that not everyone's a coder, not everyone's a data scientist. There are lots of very smart, capable analysts who know their business, know what the business needs, but they don't know R, and they need something, again, to get to that modern, web-based output, single-click consumption model without needing to write code. And so that's why, in my mind, it's a parallel service diagram: two different paths, one for those types of folks and one for data science types of folks. But there are also lines going between them, right? I want those models to be used in the Tableau workbook and the Power BI workbook. There's collaboration there. There's the same sort of data infrastructure we're working off of, but there are two different parallel paths for that reason. But if I had my druthers, like if I was starting my own company, I would absolutely be biased toward let's do everything with code.

Transitioning into data science and hiring

One was recommendations for someone to transition into data science. They said they've taken R and Python training, but they don't use it in their current role. How do you get better at coding? I highly recommend literally working through R for Data Science from cover to cover. If you can work through that book and you understand everything in that book, you're ready, in my opinion, for an entry-level job in data science. Like, if a new candidate came to me and they had a GitHub repo with some projects they worked on for fun. I mean, I remember when flexdashboard first came out, like, you know, 84 years ago, and I wanted to learn it. And I knew there was a package for baby names, and I was having a child, and I was like, what if I made a Shiny app using flexdashboard to understand trends about baby names? And so that's how I learned. So I did that and I had it in GitHub, and then every project I did, I tried to keep there. So eventually you have this really cool portfolio. So even if you're not doing it in your job, you have work that you can show someone who's hiring you, that you are interested in it and that you know what you're doing. And really, I think that's all that matters at the end of the day. Can you show competency in that area? And I think that having that passion of, like, I'm going to learn this, I'm going to build something cool on the weekends or in the evenings, also shows me that you're kind of the person I'd want to hire, because I like passionate people who love what they do.
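For anyone who wants to try the same exercise: the baby names package Paul alludes to is babynames on CRAN, and a starter chunk for that kind of project might look like this sketch (placed inside an R Markdown file with `output: flexdashboard::flex_dashboard`; the name plotted is just an example):

```r
library(babynames)   # US SSA baby name data: year, sex, name, n, prop
library(dplyr)
library(ggplot2)

babynames %>%
  filter(name == "Paul") %>%
  ggplot(aes(year, prop, color = sex)) +
  geom_line() +
  labs(title = "Popularity of the name Paul over time",
       y     = "Proportion of births")
```

A small self-contained project like this, committed to GitHub, is exactly the kind of portfolio piece he describes showing a hiring manager.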

Yeah, I mean, I'm pretty open. Like at Brown-Forman, we had a Python and R mix, and we all collaborated. It's great. I'm open to that now too. But I think showing me competency in data science (again, I'm biased toward R; we use RStudio Team as our sort of data science platform), showing me that you are comfortable in R Markdown, that you know the tidyverse, showing me code, showing me projects you've done in GitHub. There's also this piece, though, that I feel can be lacking. I think that data science is so popular and has blown up so much that I have run into people who are really good at that, but they really don't have a sound statistical foundation. And I don't think there's a substitute for having just a basic understanding of statistics. Like, what is this regression model thing that you're doing? What assumptions does it have? Why does it work? What's the central limit theorem? How would you guide someone if they had a dataset and they didn't know how to analyze it? I think you have to have that other background. So I think that's something for folks to keep in mind if they don't come from a stats background, that that's something they should at least think about doing. It's not just the R and the Python and the packages; it's also knowing the stats and how to use them and how to communicate them to the business. That's also huge. Like, one day, if you use logistic regression, think about how you would explain to the VP of marketing what an odds ratio is. Honestly, think about that. It's not intuitive, right? Try to get good at doing stuff like that and you'll be very, very valuable.
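As a concrete way to practice that translation, here's a small sketch in base R using the built-in mtcars data as a stand-in for real marketing data: exponentiating the coefficients of a logistic regression turns them into odds ratios, which is the multiplicative language you'd need to explain to that VP.

```r
# Toy logistic regression: does car weight predict a manual
# transmission (am = 1)? Stand-in for a real business outcome.
model <- glm(am ~ wt, data = mtcars, family = binomial)

exp(coef(model))
# The exponentiated wt coefficient is an odds ratio: each additional
# 1,000 lbs multiplies the odds of a manual transmission by that
# factor. "Each extra unit of X multiplies the odds of Y by Z" is
# the plain-language framing a non-statistician can act on.
```

The point of the exercise is less the code than rehearsing the sentence in the final comment without the jargon.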

But one question we always like to ask at the end, Paul, is if people want to get in touch with you, what's the best way? Is it LinkedIn or Twitter? Yeah, I would say find me on LinkedIn. You can connect with me there and you can direct message me there as well.