Resources

Data Science Hangout | Paul Ditterline, Heaven Hill Brands | Getting Buy-in to Adopt New Tools

We were recently joined by Paul Ditterline, Director of Data Science at Heaven Hill Brands. A few snippets:

27:06 - Small wins when implementing new tools
30:23 - How to prioritize KPIs
33:57 - Communicate what you're doing and why
35:39 - Getting buy-in to adopt new tools
39:24 - How often to revise a model in production
41:56 - Tips to be a better leader
50:36 - How to kick off the conversation to get approval to use R/Python
56:37 - When and why to use code-based over non-code-based tools

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
► Add the Data Science Hangout to your calendar: https://www.addevent.com/event/Qv9211919

Follow Us Here:
Website: https://www.posit.com
LinkedIn: https://www.linkedin.com/company/posit
Twitter: https://twitter.com/posit

Nov 3, 2021
1h 3min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Welcome, everybody, to the Data Science Hangout. Welcome back to all the familiar faces. I think most people who are on now have joined before, but if this is your first one, this is an open space for current and aspiring data science leaders to just connect and learn from each other. So we don't have an agenda for the calls. It focuses on questions that are most important to you all. And just want to point out that this session will be recorded and shared up to YouTube as well. But I'm so excited to be joined by my co-host for today, Paul Ditterline, Director of Data Science at Heaven Hill Brands. And Paul, I think if it's okay with you, just turn it over to you to maybe have you introduce yourself and share a bit about your team and the work that you do.

Yeah, sure. So, hey, everybody. Like Rachel said, my name is Paul, and I'm the Director of Data Science at Heaven Hill. We make some of the best spirits in the world. So you've probably heard of things like Evan Williams and Elijah Craig, and some other cool products like Hypnotiq. We make a lot of cool products at Heaven Hill. From a data science perspective, it's really cool because we are a consumer packaged goods company, and we have production facilities, and we have things like shipping, and then we also have things like marketing and sales and infrastructure to support the end-to-end production and sales of our products. So there's a lot of cool opportunity there for data science in every aspect of that process.

And so I lead a team called Data Services under the leadership of a relatively new CIO that's really trying to tackle this head on and build a data science foundation really from the foundation level, which includes how our ERP system and data systems are organized and communicate, all the way up through how we can pull analytics out of those systems and derive business value that we can easily communicate to the business. So I think that's kind of in a nutshell what my team does.

Awesome. And as we're waiting for questions to come in from the audience, I think it'd be cool to hear from you. What's something that you're really excited about in data science right now?

Yeah, so I've kind of got a few answers for that; that's a really broad question to me. If I think about it locally, in my current work, I'm really excited about the foundational work we're doing at an IT level to make it easier to do data science, right? Like, how do we get all these disparate systems to more easily talk to each other and provide the raw material of data to our analysts, so that we can spend less time munging and more time providing insight, which I know is like the bane of every single data scientist that exists today. If I step back a second from that, I'm excited about tools like GPT-3 and how they might change the type of products that we can make. I've been on the waiting list for, I think it was like a year, and I recently got access. So I've been really excited thinking about how I can use large neural networks like that in my own work and what sort of value I can get from them.

And then I regularly just sort of geek out on upcoming packages and functionality, especially within the RStudio set of products. So there's a really cool resource called R Weekly that I check kind of religiously. Usually at the end of reading that I've got like 30 open tabs that are recent papers that people have published, or upcoming packages, or changes to packages, or blog posts about how to use some packages I'm interested in. Most recently, I've gotten into using blastula connected to R Markdown to try to move away from go-to dashboards to more automated, exception-based reporting that informs people whenever something happens. So instead of having to go dig for something, people sort of get it hand-delivered to them. And using tools like the connectapi package, which is relatively new, has been really cool to build those types of things.
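
The exception-based reporting idea he describes can be sketched in a few lines: a scheduled R Markdown/ETL job computes a metric, and an email goes out only when something needs attention. The metric, addresses, and credentials file below are all hypothetical placeholders, not Heaven Hill's setup.

```r
# Sketch of exception-based reporting with blastula. The metric value,
# addresses, and credentials file are placeholders for illustration.
library(blastula)

barrels_below_target <- 42  # pretend this came from the day's ETL run

if (barrels_below_target > 0) {
  email <- compose_email(
    body = md(sprintf(
      "**Heads up:** %d barrels came in below the yield target today.",
      barrels_below_target
    ))
  )
  # In a scheduled job you would then send it, e.g.:
  # smtp_send(email,
  #           to = "analyst@example.com", from = "reports@example.com",
  #           subject = "Yield exception report",
  #           credentials = creds_file("email_creds"))
}
```

Because the send step is conditional, quiet days generate no noise at all, which is the point of exception-based reporting.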

Data services under the CIO

I see there's a question in the Slido that speaks to what you mentioned earlier. Since data services is under the CIO, is it relatively easy for your team to get what you need from IT? Yeah, this was a really big learning for me at my previous employer. I spent seven years at Brown-Forman, starting off in R&D as an analyst, and at the end of my career there I was helping implement the global advanced analytics function. One problem that I noticed there was that traditional IT was sort of this thing, and then data science was this other thing, and sometimes those two things collided in ways that weren't the best in terms of agility and in terms of implementing data science solutions. So one of the things I pushed for at Heaven Hill when we got the new CIO was putting all those things into one vertical, and the answer is yes. Having the folks that build those data platforms and pipelines with the analytics and data science needs in mind has been incredibly helpful. And from my experience, I would highly recommend some sort of integration along those lines from a business perspective.

Paul's journey into leadership

Cool. And you just mentioned working at Brown-Forman as well, so I think it might be helpful for the audience, or some of the aspiring data science leaders, to understand where you started there at Brown-Forman, how you got into leadership, and what your journey looked like. Yeah, you know, one of the things that I like about R and RStudio is the packages and the functionality, but it's also the community and the openness, the "how do I use this." Years ago, like a decade ago now, when I was starting off there as just an analyst, a new person who really wanted to make an impact, I was able to do a lot more than I think I could have otherwise because of the tools that existed in the R community. So instead of doing a one-off analysis and providing an answer, I was able to, say, learn Shiny and spin up an application that solved that problem not just once but over time. And I was able to use those free, open source tools to come to my leadership at the time and say, hey, if we had X we could solve problems this way. And over years of building that, I was able to sort of push that strategy and push the thinking in that direction.

So really it was just using the tools that were available to show a vision with tangible assets that people could understand and then leading that into sort of what the vision for data science could be in the future and that's what kind of got me from doing it at the ground level to helping lead the implementation of that. So but without those tools I think it would have been really hard to try to do something like that.

So really it was just using the tools that were available to show a vision with tangible assets that people could understand and then leading that into sort of what the vision for data science could be in the future.

Data along the supply chain

Bruno, I see you asked a question in the chat, and I'd love to hand the mic over to you if you want to ask it live. Sure, thanks for doing this, Paul. Retail is pretty interesting. How do you tackle the data along the supply chain, up to the retail store? Do you get very fine-grained data, or can you put processes in place to gather more information? How does it work in this context?

So it's a little interesting with alcohol sales in America. We have what's called a three-tier system, where we're not allowed to sell directly to the store that sells to you. This is a post-Prohibition legal effect that's been in place since then. So we essentially have to sell to distributors, who then sell to individual stores. So that does create, you know, some difficulty in getting that data. However, we do have really good partnerships with those distributors, and we are able to get, at the store level daily, not really transaction level, what's being shipped from them into the stores. So it's kind of a headache, because you have to deal with what we're shipping to this third party and then what they're shipping to a consumer, right? So it's two different sets of metrics, but we are kind of lucky that we can get to that store-level data. We have a lot of creative assets in play to try to get that information as quickly and at the lowest level possible. It's not always easy.

Cloud vs. on-prem analytics

Chris, I see you asked a question as well, around how much of your work is being done through cloud services. Chris, do you want to add any other context there?

Yeah, again like everyone says, thanks Paul for doing this. Always appreciate having these guest data scientists coming in to do this. So I work with Air Force, mainly special operation command stuff and all that. And one of the challenges that we have within our data science community is you know the traditional things like you would talk about before where you have your siloed information and stuff like that. And now they're talking about you know the possibility of working in cloud environments and that. And I just wanted to get your take as to whether or not you and your team are doing more of your analytics on a cloud basis compared to the normal data silos that you see in a lot of companies.

Yeah, so from a data perspective, from that end we're definitely mostly on-prem. We're in the middle of a strategy. Again, I mentioned we got a new CEO, or I'm sorry, a new CIO, in January, and he's implementing a strategy that does include sort of a cloud-based future. And we're in the process of moving there from a data perspective. From an analytics perspective, we're in the process of moving from that traditional emailing-each-other-Excel-files analytics to, I kind of think of it as two different pathways. I want a pathway for someone who writes code in R or Python or whatever. I need them to be successful. I need them to connect to data sources, create products, easily post them to some web-based outcome, one source of truth, you know, get backed up, all of that. But then I need to serve what I might call an analyst professional: someone who needs a more modern tool than Excel, needs to provide that web-based experience, and isn't going to learn how to code. So, you know, that's maybe a Tableau or Power BI, right? So we're transitioning into setting up the infrastructure we need to support both of those. We've already got RStudio Team implemented on-prem for the data science side, and that's working well for us. And we're probably moving to Power BI for the analyst side, because we're also a Microsoft company. So I guess to succinctly answer that, nearly all on-prem now, but with a current strategy in progress to move to the cloud in the next, let's say, five years.

Unexpected uses of data science at Heaven Hill

There are a few anonymous questions coming in on Slido too, and one is, what is something that is impacted by data science in your industry that you might not expect?

So one thing that comes to mind is barrel yield. Some people might not know much about spirits, right? I'm a little biased because I live in Kentucky, and here everybody knows about bourbon, because there's the famous saying that there are more barrels of bourbon than there are Kentuckians. Well, when you make a whiskey or bourbon, you make a distillate, and you put it in a barrel. And if you've never seen one of these barrels, they're really big. They hold about 55 gallons of liquid, and when they're full, they weigh about 500 pounds. And they sit in a warehouse for, you know, four to 12 years, or even longer. And of course, due to evaporation and soakage and maybe issues with barrels, you're going to have some amount of liquid left in that barrel whenever you go to dump it. And as you might imagine, especially if you're in finance, a couple percent change in that amount is a big deal. I mean, that's product that you thought you would have that you don't have. So using some cool data science methods to get data from places that I wouldn't have really expected, and feeding that into models that can help predict what yields are going to be, was something I didn't think would work, but it did. And that's a really cool thing that we use in our business.
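
Paul doesn't detail the yield model itself, but the general shape of such a prediction problem can be sketched in a few lines of R. Every variable, coefficient, and data point below is a simulated stand-in, not Heaven Hill's actual data or features.

```r
# Simulated sketch of a barrel-yield prediction problem. All variables
# and coefficients are invented for illustration.
set.seed(42)
n <- 500

barrels <- data.frame(
  age_years       = runif(n, 4, 12),          # years in the warehouse
  warehouse_floor = sample(1:7, n, replace = TRUE),
  entry_proof     = rnorm(n, 125, 3)
)
# Fake "true" relationship: evaporation loss grows with age and floor height
barrels$yield_gal <- 55 - 1.8 * barrels$age_years -
  0.6 * barrels$warehouse_floor + rnorm(n, 0, 1.5)

fit <- lm(yield_gal ~ age_years + warehouse_floor + entry_proof,
          data = barrels)

# Predicted gallons remaining for a hypothetical new barrel
new_barrel <- data.frame(age_years = 8, warehouse_floor = 5,
                         entry_proof = 125)
predict(fit, new_barrel)
```

Even a simple linear model like this makes the finance point concrete: a small shift in the predicted remaining gallons, multiplied across a warehouse, is a material amount of product.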

Packages and tools

Going back to packages, there's another anonymous question. What packages do you use most for modeling and machine learning? Is it primarily tidymodels and related, or do you also use Python?

Yeah, so I like tidymodels. I like the framework there. Actually, I came up in base R, so if I'm doing a logistic regression or a linear model, I've got to be honest, oftentimes I'll knock it out quickly in base R before I move on to a tidymodels framework. For machine learning and deep learning, I use the keras and tensorflow packages. That's what we have in production now for the models that we're using. And stepping out a bit from machine learning, the tidyverse is always the first thing that I call in pretty much everything that I'm doing. I live in R Markdown. I think that unless you're doing something that's very, very small and one-off, you should almost always start your work in R Markdown, because the point of the work you're doing is probably going to be to communicate something, and otherwise you're going to end up doing double work. You're going to write a script, and then maybe you would open, God forbid, Word or something like that to make a report while you're writing your code. But if you come with that in mind, you can open R Markdown, start it off as a report, do your analytical work, and organize it. And at the end of the day, you've pushed a button and you have a beautiful HTML web page with a floating table of contents and code chunks that fold in and out. You can also use Python, or R, or SQL, or D3, or other languages within those chunks. And then if you have RStudio Connect, you can push a button to host that to Connect and shoot someone a URL. And now you have a one-source-of-truth hosted report that anyone can go to, that you can control access to, and you did all that in one R Markdown script. So R Markdown is just completely invaluable for the work that I do.
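
The one-button HTML report he describes is driven by the R Markdown YAML header. A minimal header along those lines might look like this (the title is invented; the report body, with its R, Python, or SQL chunks, would follow below it):

```yaml
---
title: "Monthly Shipments Report"
output:
  html_document:
    toc: true
    toc_float: true      # floating table of contents
    code_folding: hide   # code chunks fold in and out
---
```

Knitting that document produces the self-contained HTML page he mentions, ready to publish to RStudio Connect with the IDE's publish button.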

We're actually looking at taking some of our ETL work directly into a markdown-based, notebook-based format, just because the documentation and the scheduling all kind of come along for the ride when you do that work. I like Shiny a lot, but I'm also a big fan of flexdashboard because it's really lightweight and really fast. You can write your reactive code right there, instead of having code in the server and then referring to that code in the UI. If it's something small, something very quick, I can go from nothing to a hosted flexdashboard with reactives in like an hour and a half. So if I need something very fast, I'll use that. The other really cool thing about flexdashboard is that if you're not using reactivity, and you don't have a way to host applications to show the value of what you're doing, you can email the resulting knitted HTML of a flexdashboard to someone. When they open it, it opens like a beautiful webpage that's interactive for them. That's a really cool way to share content if you don't yet have the ability to publish and host. And I actually took advantage of that at Brown-Forman to show people what you can do with our tools.
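
For the flexdashboard pattern he contrasts with Shiny, the header again does most of the work: whether the result is a hosted reactive app or a static HTML file you can simply email comes down to one line. A minimal sketch (title invented):

```yaml
---
title: "Shipment Monitor"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
runtime: shiny   # include for reactivity; omit it to knit a static,
                 # emailable HTML file, as described above
---
```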

Writing code as a director

Yeah, I was just curious. I mean, you said you live in R Markdown. So as a director, are you writing code daily? I'm not writing code daily; I'm writing code weekly. And that's one of the issues most people probably have, and I definitely have it personally: as you move into more managerial or leadership roles, your value tends to be derived from just that, from your leadership, right? Not so much from the code. But I don't want to lose that capability. So I do try to dive in and help on projects, submit pull requests, do that sort of thing. And I have a pretty strong opinion about how to do data science. I don't want to be a micromanager type of person, but if someone's doing a bunch of filtering and they're not using dplyr, I'm probably going to make a comment on it.
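
As a concrete example of the kind of review comment he means, here is nested base R subsetting next to its dplyr equivalent. The data frame is invented for illustration.

```r
# Base R subsetting vs. a dplyr pipeline on a made-up sales table.
library(dplyr)

sales <- data.frame(
  brand = c("Evan Williams", "Elijah Craig", "Hypnotiq", "Evan Williams"),
  state = c("KY", "KY", "NY", "TN"),
  cases = c(120, 45, 30, 80)
)

# Base R: workable, but harder to scan as conditions pile up
base_result <- sales[sales$state == "KY" & sales$cases > 50,
                     c("brand", "cases")]

# dplyr: each step reads as a verb
dplyr_result <- sales %>%
  filter(state == "KY", cases > 50) %>%
  select(brand, cases)

dplyr_result
```

Both produce the same rows; the pipeline version is what tends to survive code review on a team standardized on the tidyverse.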

Internal packages and third-party data

Tony just asked a question around internal packages, Paul: how many internal packages do you host? Right. So I only have one internal package currently. I mean, I see a world where there's a handful of those. For our use case, I think that'd be about it. Although, to be fair, I know that the movement toward package-based development, especially with things like golem, is pretty huge right now, and I haven't dug into it enough. But we have found it helpful to create helper packages for commonly used functions, for the data munging or modeling that we tend to do on a regular basis.

I see Ian asked a question earlier as well, and said he's in a noisy office, so I'll read the question. But do any of your analytics and data science rely on VIP, a third-party vendor, for the beverage industry for your distributor data? He said, the reason I ask is I'd like to hear your insights on taking and consuming data from third-party vendors that you can't get an API into. So do you have the data? Yes. So we, you know, in the earlier question about sort of data at the distributor and retail level, you know, I made a joke about some creative solutions, and you do kind of have these vendors that have, I'm not sure what the right phrase is, historic methods of getting data to you, and oftentimes there's not a lot you can do about that. So we just try to automate that in the best way possible that fits our infrastructure and doesn't require a lot of human interaction to bring that data into our environment. But we do work with VIP data, yeah.

Getting buy-in to adopt new tools

Aliyah, I see you have a follow-up question on the implementation strategy. Would you want to ask that one? Yeah, thank you, Rachel. Hey, Paul, I did have a follow-up question about implementation and trying to get buy-in from teams that might be a little hesitant, teams that might be late adopters of new analytic tools and technology. So, if you have any pointers around getting over that hump.

Yeah, what I've done in the past is try to show small wins that are very directed to those individuals, right? So if you can figure out that there's a thorn in someone's side, and that's going to be a three-hour Shiny app that takes that problem away forever, even if that's not a business priority, if it doesn't connect to a KPI, just do it and give it to them. And they're going to be like, whoa, you can do that? Because I think there's hearing and then there's seeing. So I think being able to solve small personal problems for someone matters, even if it's not huge. You can do some stuff in R that's not exactly data science; it might just be automation of some manual work that takes up a huge swath of someone's time. When you show you're interested in solving their problems, they tend to come along for the rest of the strategy. That's the technique that I've used in the past.

When you show you're interested in solving their problems, they tend to come along for the rest of the strategy.

Integrating Power BI with R

Tony, I see you asked a question about Plumber. Would you want to ask that one live? Sure. It wasn't really about Plumber. What I wanted to know was, you talked about integrating Power BI into your process, and just last week someone approached me about that. I don't know much about Power BI, so I was wondering if you could talk about the integration of those two tools. I gave an example in my question of, are you going to serve data from Plumber for Power BI to consume? But that's just a guess, because I don't know anything about it.

Right. So I haven't used Plumber much; I think I've used it in one project. I do use pins a lot to write data frames back to RStudio Connect, but that mostly just serves R and Python users. One thing, I mentioned ETL and R Markdown, I mentioned living in R Markdown. One of the great things about that is I can have a script that uses SQL, queries a bunch of stuff, brings it in, uses Python and R to munge it, and then uses the odbc packages to push it back into a data science table that can then be consumed by Power BI or by R users. So that's the way that we're sharing data currently, trying to build what we're calling enterprise-ready data sources, one source of truth. If you look at your data needs, you can probably bucket them down into X buckets, and you can try to build something at the base level that works for everyone from those buckets. And then that becomes the source of truth for those data sources. And you can write them in such a way that Power BI or Shiny Dash or whatever users can access that data. Now, when I think about the future, where I have Power BI fully automated and I have Shiny applications and model outputs and stuff like that, yeah, I would probably use something like a Plumber API to make sure that everyone could grab those to integrate into their apps. That'd probably be one solution. We just haven't gotten there yet.
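
The ETL pattern he describes, pull with SQL, munge, then write a curated "one source of truth" table back for Power BI and R users alike, can be sketched with DBI. An in-memory SQLite database stands in for the real warehouse here (his team would connect with `odbc::odbc()` against their on-prem systems instead), and all table and column names are invented.

```r
# ETL sketch: SQL extract + aggregate, then write a curated table back.
# SQLite in memory stands in for a real warehouse connection.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Pretend this table already exists in the warehouse
dbWriteTable(con, "distributor_shipments", data.frame(
  distributor = c("A", "A", "B"),
  store_id    = c(1, 1, 2),
  cases       = c(10, 5, 7)
))

# Extract and aggregate with SQL (this could just as well be a SQL chunk
# inside the R Markdown document)
raw <- dbGetQuery(con, "
  SELECT distributor, store_id, SUM(cases) AS cases_total
  FROM distributor_shipments
  GROUP BY distributor, store_id
")

# Write the curated table that BI tools and R users both point at
dbWriteTable(con, "ds_store_shipments", raw, overwrite = TRUE)
```

Because the curated table lives in the database rather than in someone's inbox, Power BI, Shiny, and ad hoc R sessions all read the same numbers.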

Models in production and hosting Shiny apps

Thanks, Paul. I see a few people sent in questions and said they're in a busy office right now, so I'll read them. But one was, how often do you revisit a model that's already in production? Do you have any fixed cycle or depending on need?

So, when I build something like that, I always build some sort of flexdashboard or Shiny application that monitors performance for that model, something I can quickly go to, look at, and see what's going on. And I'm kind of that nerd that likes to go look at that a lot, so I tend to just naturally, during my morning coffee, look and see what's going on. I don't really have a schedule or anything. Of course, if I hear about a problem from a user, or feel like there's something wrong, I go look at it then. So yeah, I think that's the answer.

Thank you. And then just one clarification question. Not sure if this was already discussed, but how do you host your Shiny apps for business users? Oh yeah, we use RStudio Team, so we have RStudio Connect on-prem, and we also have RStudio Workbench. So a lot of development happens in Workbench, where we have all of our handy-dandy connections and protocols set up. You can pretty quickly bang out a Shiny app, hit the blue publish button, push it to Connect, and then you can pull it up. You can quickly set a vanity URL, set who can see it and who can't, and then we share that. We are working on what our solution is going to be; we really don't want every user to have to come land on Connect to find their app, but there are some really cool ways. You can use blogdown, for example, to build something custom. We also use Echo internally, so we've toyed with building a data science landing page for those tools. So anyway, sharing with a user is easy, you just send them a link, but then there's the question of how you let people come find things, and that we're still toying with. It's probably going to be some sort of R-developed, Connect-hosted website that lets people come and explore the data science products that are there.

Leadership principles

Are there any management or leadership principles that you could share?

Management or leadership principles. Well, you know, I mentioned earlier being people-focused. I think that's really important, especially in the current day, with the pandemic, supply chains, child care, people losing people to COVID. You really have to remember that we're all people first. We're people that are coming to work. So I try to remember that every day. I think you also have to have a culture where it's okay to fail. Actually, it's good to fail. Fail fast, fail often. Get to the thing that works. Meet people where they are. Give them the tools they need to grow. Find out what they care about and like, and try to give them more of that. Celebrate success and learn from failure.

Yeah, I was just curious if your management and leadership came naturally to you or if that was something you worked towards through mentorship or reading books?

Oh, well, I've always been sort of a people person. Of course, that doesn't mean you're a good leader or a good manager. Managing is such a different thing, and to directly answer your question, when I went from what was called an individual contributor to a people leader at my previous job, thankfully I was in an organization that recognized the difference there. Especially if you're a person who really cares about your work, you're a go-getter, you're typing code all the time, you're delivering products, you're solving problems, you're seeing the fruits of your labor directly. When you hand someone a link and they're like, oh my god, I can do this great thing now that I couldn't do before, thank you so much. When you go from that to a job where you manage these X people, provide them with success, clear the path for them, and make them rock stars, it's like being thrown into the deep end of the swimming pool, and you're no longer getting that personal feedback that you used to get from solving problems yourself.

So, I was lucky enough to have, I think, about six months of different HR-led trainings and seminars and resources given to me to understand that change, to understand that you don't just have your own lever now, you have a lever for the team. You have more impact, but it's in a different way, and the things that are important to your day-to-day completely change. I think I would have struggled with that a lot longer than I did. I mean, I did struggle with it, even with the help, but I would have struggled even longer had I not been given those resources. So that's probably an important note: if you're a data scientist and you see yourself becoming a leader one day, just keep in mind that it's going to be a change, it's going to feel weird, it's going to be a shift, and honestly, not everyone likes that. I've worked with people who got a taste of that and decided they'd rather be a technical data science leader, that that's what they wanted to do, instead of being more of a people leader. It really depends on what you like, but I do think that you can learn, with resources, of course, to be better at it, just like anything.

How to kick off the conversation to get approval to use R/Python

I've talked to a lot of people who are in the very early stages of being the R champion, maybe making a presentation for their architecture review design board or whatever internal review process they have. And I'm just wondering how you actually went about doing that, whether at your last role or now, to get RStudio approved.

Right. So I started off, like I mentioned earlier, by doing things with free open source versions locally. So first it was, hey, can I get RStudio? It's open source, here's what it does, it's an IDE. And that's usually a short conversation: you can do that. And then, can I download approved packages from CRAN? That's usually not a problem either; at least it wasn't for me. And once you have that, you basically have everything. Now you can install R Markdown and Shiny, you can pull in local data, you can show a Shiny application running locally. You can, like I mentioned earlier, make a flexdashboard and send that out as an HTML file to people so they can see what that looks like. And then you can build a strategy. I remember making slide decks where I basically said, hey, here's how you do things now, here's the current-state diagram of how you do analytics and how people consume them, and here's what you could do if we had some set of data science tools. And then I would show literal examples of that. And then I would say, and here's what it would look like to acquire that. And so that's how I convinced people to give me those tools so that I could do better work.

Yep, yeah, I did it in R, knitted it to PDF, and I would even say, you know, the slide deck itself was made in R. I remember a presentation I gave on R Markdown that just talked about the incredible Swiss-army-knife capabilities of it, what it could do, the stuff it could replace, the ways we could use it instead of other things that we use now. So communicating value, and then using, I think one time I called it, a stair-step approach, right? If you can get RStudio or R, get access to CRAN packages on your work machine, and use local data sources, well, then you can stair-step that out. You can find a business need that someone needs solved, and you can solve it better.

Right? Like, someone needs an analysis. I remember when I was at BF, someone had this data that was coming out of an HPLC, which is a chemical analysis where you put liquid in this machine and you get an output of all the chemistry that's there. And they're like, hey, we need to understand descriptive statistics for these chemical compounds. And the second time she asked me, I just built a Shiny app, and I actually gave her the code to run locally, just on her machine. And I was like, the next time you do this, press this run button, and then it looks like a webpage: upload your file, and everything you just asked me to do will just be done. And that person was like, holy crap, that's amazing. And then they talked about it to everybody, and the next thing you know, you have individuals coming to you, and that gets the attention of leadership as well. So you can work at that ground level as well and solve problems for individuals, like I mentioned earlier. Then, while you're doing that, that becomes part of the example that you lay out in your presentation: look, A, we can do this; B, I have been doing it; C, here's the feedback from that; and D, here's the solution to make this a real thing we're doing, instead of a thing that I'm covertly doing on my laptop. But you may have to break a few rules. I'm not telling you to disregard your policies or whatever, but if I'm being honest, I probably did a couple of things that certain people in IT didn't really like. But I'm the kind of person that thinks you break a few windows because you know the new house is going to be awesome. And that's just the way that it is. And so that's what I did. And it is more awesome now, so I'm okay with that decision.
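
The app he describes could have looked something like the sketch below: upload a CSV of HPLC results, get descriptive statistics back. The real app isn't described in detail, so the column handling here is generic and the labels are invented.

```r
# Minimal sketch of a "run button" Shiny app for descriptive statistics
# on an uploaded HPLC results CSV. Labels and handling are illustrative.
library(shiny)

ui <- fluidPage(
  titlePanel("HPLC descriptive statistics"),
  fileInput("file", "Upload HPLC results (.csv)"),
  verbatimTextOutput("stats")
)

server <- function(input, output) {
  output$stats <- renderPrint({
    req(input$file)                          # wait for an upload
    compounds <- read.csv(input$file$datapath)
    # Summarize only the numeric (compound concentration) columns
    summary(compounds[sapply(compounds, is.numeric)])
  })
}

app <- shinyApp(ui, server)  # running `app` is the "press this run button" step
```

Twenty-odd lines like these turn a recurring manual request into a self-serve webpage, which is exactly the "small win" strategy discussed earlier.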

When and why to use code-based over non-code-based tools

There's actually a question around Tableau as well. Someone just asked: how do you think about when and why you'd want to use something like Power BI or Tableau versus when you'd use Shiny or R Markdown?

Yeah, so honestly, you know, I'm incredibly biased, like we all are. I prefer code-based solutions for lots of reasons. I think they're much more flexible. I think you can integrate all of the awesome things that are in that ecosystem, right? So in Shiny, any R package, any R function, all the custom functions I've written in my custom packages, any new model, any new statistical method or theory that comes out is going to be in R, right? I mean, that's what the programming language sort of is for. And then I can immediately use that stuff in anything that I make. It's all code-based. I can collaborate with code. I can do things like pull requests. I can do things like have a Git repo. I can have it be transparent, have it be reproducible, have it be completely rebuildable: you can make it again with a button press, you can compile it all over again, because, again, it's code-based. I think those benefits are just hard to match with a GUI-based system, precisely because it is a GUI-based system.

Now, where I think tools like that are helpful is that not everyone's a coder, not everyone's a data scientist. There are lots of very smart, capable analysts who know their business, know what the business needs, but they don't know R, and they need something, again, to get to that modern, web-based output, single-click consumption model without needing to write code. And so that's why, in my mind, it's a parallel service diagram: two different paths, one for those types of folks and one for data science types of folks. But there are also lines going between them, right? I want those models to be used in the Tableau workbook and the Power BI workbook. There's collaboration there. There's the same sort of data infrastructure we're working off of, but there are two different parallel paths for that reason. But if I had my druthers, like if I was starting my own company, I would absolutely be biased toward let's do everything with code.

Transitioning into data science and hiring

One was recommendations for someone to transition into data science. They said they've taken R and Python training, but they don't use it in their current role. How do you get better at coding? I highly recommend literally working through R for Data Science from cover to cover. If you can work through that book and you understand everything in that book, you're ready, in my opinion, for an entry-level job in data science. Like, if a new candidate came to me and they had a GitHub repo with some projects they worked on for fun. I mean, I remember when flexdashboard first came out, like, you know, 84 years ago, and I wanted to learn it. And I knew there was a package for baby names, and I was having a child, and I was like, what if I made a Shiny app using flexdashboard to understand trends about baby names? And so that's how I learned. So I did that and I had it in GitHub, and then every project I did, I tried to keep there. So eventually you have this really cool portfolio. So even if you're not doing it in your job, you have work that you can show someone who's hiring you, that you are interested in it and that you know what you're doing. And really, I think that's all that matters at the end of the day. Can you show competency in that area? And I think that having that passion of, like, I'm going to learn this, I'm going to build something cool on the weekends or in the evenings, also shows me that you're kind of the person I'd want to hire, because I like passionate people who love what they do.
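For anyone who wants to try the same exercise: the baby names package Paul alludes to is babynames on CRAN, and a starter chunk for that kind of project might look like this sketch (placed inside an R Markdown file with `output: flexdashboard::flex_dashboard`; the name plotted is just an example):

```r
library(babynames)   # US SSA baby name data: year, sex, name, n, prop
library(dplyr)
library(ggplot2)

babynames %>%
  filter(name == "Paul") %>%
  ggplot(aes(year, prop, color = sex)) +
  geom_line() +
  labs(title = "Popularity of the name Paul over time",
       y     = "Proportion of births")
```

A small self-contained project like this, committed to GitHub, is exactly the kind of portfolio piece he describes showing a hiring manager.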

Yeah, I mean, I'm pretty open. Like at Brown-Forman, we had a Python and R mix, and we all collaborated. It's great. I'm open to that now too. But I think showing me competency in data science (again, I'm biased toward R; we use RStudio Team as our sort of data science platform), showing me that you are comfortable in R Markdown, that you know the tidyverse, showing me code, showing me projects you've done in GitHub. There's also this piece, though, that I feel can be lacking. I think that data science is so popular and has blown up so much that I have run into people who are really good at that, but they really don't have a sound statistical foundation. And I don't think there's a substitute for having just a basic understanding of statistics. Like, what is this regression model thing that you're doing? What assumptions does it have? Why does it work? What's the central limit theorem? How would you guide someone if they had a dataset and they didn't know how to analyze it? I think you have to have that other background. So I think that's something for folks to keep in mind if they don't come from a stats background, that that's something they should at least think about doing. It's not just the R and the Python and the packages; it's also knowing the stats and how to use them and how to communicate them to the business. That's also huge. Like, one day, if you use logistic regression, think about how you would explain to the VP of marketing what an odds ratio is. Honestly, think about that. It's not intuitive, right? Try to get good at doing stuff like that and you'll be very, very valuable.
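As a concrete way to practice that translation, here's a small sketch in base R using the built-in mtcars data as a stand-in for real marketing data: exponentiating the coefficients of a logistic regression turns them into odds ratios, which is the multiplicative language you'd need to explain to that VP.

```r
# Toy logistic regression: does car weight predict a manual
# transmission (am = 1)? Stand-in for a real business outcome.
model <- glm(am ~ wt, data = mtcars, family = binomial)

exp(coef(model))
# The exponentiated wt coefficient is an odds ratio: each additional
# 1,000 lbs multiplies the odds of a manual transmission by that
# factor. "Each extra unit of X multiplies the odds of Y by Z" is
# the plain-language framing a non-statistician can act on.
```

The point of the exercise is less the code than rehearsing the sentence in the final comment without the jargon.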

But one question we always like to ask at the end, Paul, is if people want to get in touch with you, what's the best way? Is it LinkedIn or Twitter? Yeah, I would say find me on LinkedIn. You can connect with me there and you can direct message me there as well.