Resources

Emily Riederer @ Capital One | Explicit design at the start of a project | Data Science Hangout

We were recently joined by Emily Riederer, Senior Manager - Customer Management Data Science & Analytics at Capital One. We discussed how a strong foundation in high-quality data infrastructure and reproducible tools sets the stage for innovation in modeling, causal inference, analytics, and so much more.

Diving into a question asked at (42:38): What is your thought process for solving a problem that you don't know how to solve immediately?

One thing that I think is a really undervalued part of that process is thinking about how you will know a good solution when you find one. Also, how would you know if there was a good solution staring you in the face and you already had it? The more unstructured and complicated a problem is, the more deceptive it can be about what's good, which can have one of two bad outcomes: you find a good solution, but you don't realize it's good, so you keep going; or you spend a lot of time chasing after an outcome, and only then do you realize, I solved the problem I was trying to solve, but it wasn't the problem I wanted to solve.

Something I've really been experimenting with in my own work is having a lot more of an explicit design stage at the beginning of a project and thinking, how can you do a pilot? If I'm trying to predict some target, can I take the true values of that target and plug them into the downstream problem I actually thought I was going to solve, and make sure that's actually what I want to solve? It's almost like front-loading model evaluation, where even a fake solution is the first step versus the last step.

Then I'll tack on one other point. I think the other aspect of that, going back to that level of abstraction, is figuring out how to take the context out of my problem to make it something more Googleable.
So I mean thinking, not being like, "oh, this experiment, the random seeds were wrong, so I don't have a control population, what do I do?" but backing that into more of a general question: "how do you sample a synthetic control from observational data?", which is something you can Google and then find a ton of resources about. I think it's pushing myself on what I want, and then finding the right framing at which to ask for help.

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
LinkedIn: https://www.linkedin.com/company/posit-software
Twitter: https://twitter.com/posit_pbc

To join future data science hangouts, add to your calendar here: pos.it/dsh (All are welcome! We'd love to see you!)

Apr 28, 2023
59 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Welcome to the Data Science Hangout. Hope you're all having a great week. I'm Rachel Dempsey. If we haven't had a chance to meet yet, I lead our pro community at Posit. And so the Data Science Hangout, if you've never been before, is our open space to chat about data science leadership, questions you're facing, and getting to hear about what's going on in the world of data across different industries. And so it happens every Thursday at the same exact time, same place. So if you're watching this recording sometime in the future on YouTube later, the link to add it to your calendar will be in the details below.

Together, we're all dedicated to making this Hangout a welcoming environment for everybody. And so we love hearing from everyone, no matter your level of experience or area of work. And there are always a few ways you can jump in and ask questions or provide your own perspective too. So you can jump in by raising your hand on Zoom. You can put questions in the Zoom chat. And feel free to put like a little star next to it if you want me to read it for you instead.

I will say, Emily, I have wanted you to join us here for so long. And we've also had a number of requests to have you on. Emily is Senior Manager of Customer Management, Data Science and Analytics at Capital One. Now, Emily, let's maybe get started by just having you introduce a little bit about your role and what you do and maybe something you like to do outside of work too.

Emily's background and career arc

Yeah, definitely. And thank you so much, Rachel. I've been at Capital One for about the past seven years. Over that time, I've held a number of different roles. But fortunately, both the data science toolbox and specifically R have really been, like, a through line for me throughout whatever I've been doing. I spent some time in the more strategic analytics and modeling spaces, solving kind of very specific business problems related to our card business.

I'd say overall, I think my career has pulled me kind of, like, progressively further and further up the data stack. No matter what downstream problem I'm really motivated by solving, I always realize there's some roadblock just one level higher. And I get really excited about, like, well, how do I just solve that one pain point, and that one? So after starting out more in that first-line data science space, I moved a little bit further upstream when I got really passionate about inner source tooling, and bringing the best of what I saw from the outside R community into my company.

And building internal packages to help us solve common business problems like customer lifetime value modeling, setting and forecasting KPIs, and even connecting to just different enterprise systems. And I really enjoyed kind of, like, building out both those packages, but also really figuring out that you needed to also build out an inner source community around them. Then, kind of continuing that trajectory upstream, I moved progressively into also adding in elements of analytics engineering and data engineering, to tackle kind of standardizing, harmonizing, and breaking silos in that raw data input that would then flow through the tools and the ultimate data products.

Awesome. And what's something you like to do outside of work too? Big runner and big reader.

Inner source at Capital One

I love that you just started off with talking about inner source there, and that's something that actually came up on a Hangout, maybe that was last year, with Zach Garland at MasterCard. And I'm curious for all of us here too, like, what does inner source really mean? And can you explain a bit about what that community looks like at Capital One?

Yeah, definitely. So in some ways, it's almost a funny thing to explain, because I think to the open source community, it sounds so natural. It's almost like, why does this concept require its own name? But I think sometimes, like, in large corporations or just in different environments, you don't just have that nice synergy where someone sitting halfway across the world from you has happened to already solve your problem and put the answer up on GitHub. And so there can be a lot of, I think, redundant work, both in one person's work, like moving from project to project, or different teams solving, like, very similar problems, maybe, like, sometimes reinventing the wheel on even the framework for solving those problems.

So inner source is just really the idea of kind of both building kind of tools and an internal code base that multiple teams can contribute to over time. And as I mentioned, like that only also works if you like try to build up that same sort of community and enthusiasm and inclusivity and knowledge sharing around it to support it.

Yeah, definitely. So yeah, a lot of, I think you can, like, think about the definition pretty expansively. But I think the place I've definitely focused on it most is in the R package space. So hopefully, like, one package for one thing, and then everybody across Capital One is using it for that.

Yeah, that's always been, like, the goal. And I think the other really cool thing about that is really digging into, like, the levels of abstraction. Now, you obviously never want to restrict different teams from, like, solving problems in different ways, because different parts of the business inherently should be customizing and tailoring. But it becomes a really fun problem thinking, you know, what should be standardized, like, not everyone wants to write the glue code to, like, connect to a database and get through a proxy. But where are the right degrees of freedom, where, like, one optimization method might be truly better for one type of problem than another?

And then it also kind of, I think, helps you aggregate those business use cases, and create that kind of body of knowledge of, like, these are how different teams have, like, solved these different problems, why they needed to, like, layer in different tools. And, like, it becomes kind of a knowledge store, I think, for some of the complexity of any given problem.

Yeah, no, I mean, I think definitely, as best you can. I think, like, a lot of companies do use, like, a version control system, like GitLab or GitHub. So, like, obviously, the first-step minimum bar is just, like, getting it out there. But I think there's a great little O'Reilly book on InnerSource, where one of the chapter headlines is literally, like, just because you use GitHub doesn't mean you're doing InnerSource. I think in open source, we are sometimes spoiled by the extent to which it is, if you build it, they will come. They will not always come internally.

I think in open source, we are sometimes spoiled by the extent to which it is, if you build it, they will come. They will not always come internally.

I think, like, corporate incentives are different. Sometimes the culture is people just have no expectation that if they look, they might find something. So I think definitely, there's a little bit more of shoe-leather, beat-the-pavement work in terms of talking to other teams, trying to go present at internal forums. But also, again, building up that community with things like, I leaned very heavily on, like, Slack channels to kind of, like, bootstrap an internal version of, like, Twitter, to try to get those just, like, serendipitous, async conversations going.

Current projects and data infrastructure

Yeah, I think a pretty wide range of things. Something that's really been top of mind for me lately is just, like, how you can really shape that kind of, like, raw data layer, and make it, like, the most usable for kind of, like, the last-mile analysis. So, one broad area of interest that I'll say has been really on my mind is thinking about how to structure raw data into, like, really meaningful... you could almost call it, like, a metrics mart or a feature mart, although it's not quite the same thing, but just, like, how you can embed kind of, like, more, like, context about a business problem in individual data elements, to have, like, more of those off the shelf for use across metrics tracking, and across, like, modeling and kind of, like, feature engineering.

Communicating inner source packages across a large org

Yeah, I mean, I think definitely, in an enterprise setting, I think I have also leaned more on those, like, personal relationships to get the word out than you would necessarily in the open source context. So both in terms of, I think, kind of, like, leaning on leadership to, like, help identify people that may be working on similar projects or patterns kind of throughout the company. Or ideally, I think, having a forum where you're intending to kind of, like, share updates, not on an individual project, because it's like, how do you get people to come to that forum initially? But if you can get people interested in the idea of, like, technical knowledge sharing at a higher level, aggregating kind of, like, many types of information in that one space... I mean, again, to, like, make an open source analogy back, kind of like an internal rOpenSci analog, or an internal R Weekly-type newsletter. I think it can be easier to, like, sell and socialize the broader concept than maybe an individual, like, artifact thereof.

Unexpected parts of the job

Oh, my, like, I, if anything, I feel like that's truly the story of my career arc. Went from, like, math to statistics, because I wanted to, like, do math for the real world, but then found, like, the shift from, like, deductive to inductive reasoning, like, not at all like math. Then went from stats to data science, because I wanted to, like, use the statistical toolbox to, you know, like, solve these, like, beautiful probabilistic-type problems, but then found, like, all of my challenges were more in the, like, data and tooling layer.

And so, I mean, I think it's a, this may be a kind of generic answer, but I think, like, being obsessed with, like, not only like data quality, but even upstream of that, like, how is data getting collected? How do the source systems work that are processing the customer records that are then spitting out the data? I think, like, I definitely never expected myself to, like, keep going, like, further upstream and further back into more of a like, technical stack, as opposed to, like, proving a nice little theorem, which is where I started out.

I think it's definitely something where a lot of my mindshare goes, but that I find is, like, super helpful for the last mile. Because even if you know some, like, really weird quirk in your system of, like, oh, we were never able to, like, email these customers for x reason, it's like, suddenly, then you also know, like, a population that, like, something happened to for a systemic reason. It could fuel some, like, really interesting analysis.

So, I think the, both going upstream, but then being able to, like, round trip back downstream are probably two things that I never would have anticipated.

Yeah, cool. It's funny you share that. I recently was told that data scientists were snobs, like, we're only happy with the tail end of the data stream. I quickly had to kind of, like, snap my head around and say, well, I think, if anything, we're interested in the holistic health of data, like, from start to finish, literally nurturing it from its inception all the way to the end.

Not at all. And yeah, I love that. I mean, to me, I think that's also part of what really puts the, like, science in data science: seeing the whole system. I forget, someone on the more ecology side of our stats Twitter once commented on something I wrote, like, oh, like, the data generating process, like, I have all these same concerns as you do. It's just, for me, it's when I fell out of a boat and I dropped my notepad. Very different, like, cause of missing data, but potentially similar outcome.

Contributing to the open source community

Oh, my goodness, so, like, so very many. I think, first and foremost, obviously, it's just, like, all of the amazing, like, relationships you form, people you get to meet, and being a fly on the wall of, like, conversations that, like, I'd never get to hear. I'd never get to hear, like, epi professors talk, or biostats professors talk, or econ professors talk in my day-to-day life, but just being able to continue to, like, pull, and I think this is especially unique to R, from all these, like, really rich interdisciplinary communities and steal the best ideas, and then take them back to my day job, and then I look like I, like, came up with a new idea, and I didn't. I'm just stealing like an artist.

In terms of, like, contributions, I think something that I found a very surprising benefit is the way that it's made me have to think a lot more critically and a lot more formally about what I was doing anyway and otherwise. I think a lot of the things that I found that I can, like, really lean in to talk about, it forces me to, like, think about the problems I'm facing at that higher level of abstraction and think about, like, what have I learned from this that's not just true to my last project but is true to my next project and is true to other people's projects? So I think it's definitely a fun intellectual challenge, almost like taking a final exam in school to kind of, like, force you to formalize your own knowledge.

dbt and integrating with R

Yeah, no, I'm still, and, like, maybe I'll just take a step back and, like, explain for a minute, like, kind of what dbt is for those that aren't familiar. dbt is a framework for building data pipelines in SQL, and what it kind of nets you is a lot of the same benefits that I think we all value with, like, reproducibility in R, which tend to be harder with databases. And with dbt, you can do things like dev/prod deployments, you can do better testing, it helps you modularize your code better, so you can, you know, have easier version control, easier code reviews. And with some Jinja templating, it makes SQL more like a higher-level kind of programming language with legitimate control flow. Definitely, if you work on databases, I'd recommend checking it out, because I think there's, like, a lot of spiritual similarity with just things this community, like, tends to value and get excited by.
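For readers who haven't seen it, here is a tiny standalone sketch of the Jinja-over-SQL pattern that dbt is built on. dbt itself adds much more (dependency graphs, testing, materializations); this just shows the "control flow in SQL" idea using Python's jinja2 library, with made-up table and column names:

```python
# A minimal sketch of Jinja-templated SQL, the core mechanic behind dbt.
# The table ("transactions") and metric columns are hypothetical examples.
from jinja2 import Template

sql_template = Template("""
select
    customer_id,
    {%- for m in metrics %}
    sum({{ m }}) as total_{{ m }}{{ "," if not loop.last }}
    {%- endfor %}
from transactions
group by customer_id
""")

# Looping over a list of metric names generates one aggregation per metric,
# with loop.last suppressing the trailing comma -- real control flow in SQL.
rendered = sql_template.render(metrics=["spend", "payments", "fees"])
print(rendered)
```

Templating like this is what lets one parameterized model stamp out many similar queries instead of copy-pasting near-identical SQL.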

I continue to be, like, kind of a big fan of using both tools. I'd have to say I've never really found that sweet spot of exactly how to integrate them. I think I still tend to have, like, dbt pipelines running more in the database layer, but kind of switching more into R mode for, like, kind of analysis, reporting, modeling, etc. I've always felt like there had to be some, like, linkage, especially with something like dbplyr, but have not gotten there yet, and would be, like, fascinated if anyone else has, like, played with both and has any takes there.

What has kept you at Capital One for seven years?

That's a great question. I know, especially in this day and age, I think it's, like, the less common route to stay at a, like, single company for a long time. I think there are definitely, like, pros and cons to it. I think definitely the thing that's kept me motivated is being able to, like, continue to, like, expand my scope, expand my skill set, both in terms of, in such a large company, having different areas of business to move around to, so I'm still kind of, like, thinking about different, like, strategic issues throughout my career, somewhat similar to, like, changing companies, as well as kind of growing out my technical skills in different dimensions.

So, I think I'm definitely somebody that, like, gets very edgy if I don't feel like I'm continuing to learn, but I think I've found ways that, like, I really feel like I continue to be able to do that in my career. And secondly, this is a very, like, silly reason, definitely not a career-defining one, but going back to InnerSource, I will warn you, like, packages are kind of like children, and there have been times in my career, even where I've been offered a different role internally or something, and I've been like, oh, but then... I don't know how to say goodbye.

AI and the future of data science

Yeah, no, that is such an interesting issue right now, and I feel like definitely, I know a lot of very strong opinions on both sides of the spectrum of whether it's going to change everything or nothing. I think through the type of work I do, and the type of, like, kind of values, per se, that I have about data science, I think there are, like, large parts of it that I don't expect it to change a ton, or I almost hope it doesn't change a ton.

I know, like, I think even there are a lot of companies and tools right now that are working on, like, automated natural-language query generation, which, on one hand, like, seems really cool. Yeah, I'd love to be able to let anyone ask a question and get an answer, but then there's, like, I think the data quality, like, part of my head where the fire alarm goes off. Like, there may be an AI that's, like, smart enough to, like, write a sonnet far sooner than there is one that's smart enough to think, based on the kind of conditional probability this person wanted, should I cast the nulls in this table to a zero, or should I leave them as nulls, and what should be in the denominator?

And, you know, I mean, I think, if anything, like, data quality and data bias will save us, because I think those are so hard for humans to ever be precise enough in language about what they mean. And I feel pretty confident that's going to remain a, like, solidly, like, human problem for a much longer time. And then similarly, on the last mile, like I mentioned before, I think I'm very passionate about kind of, like, thinking about feature engineering, curating, like, both automated solutions, but also, like, kind of understanding and bringing, like, context into data science, whether you think about cost-sensitive loss functions or, like, model evaluation beyond traditional metrics.
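The nulls-and-denominators dilemma Emily describes can be made concrete with a toy example (all data made up): whether a missing value means "the event truly didn't happen" or "we couldn't observe it" changes both the numerator and the denominator of a simple rate, and only business context can say which reading is right.

```python
# Hypothetical per-customer click counts; None = value not observed.
clicks = [3, 0, None, 5, None, 1]

# Interpretation 1: missing means "no clicks happened" -> cast nulls to zero.
as_zero = [c if c is not None else 0 for c in clicks]
rate_if_zero = sum(as_zero) / len(as_zero)       # 9 / 6 = 1.5

# Interpretation 2: missing means "not observed" -> drop those rows entirely.
observed = [c for c in clicks if c is not None]
rate_if_dropped = sum(observed) / len(observed)  # 9 / 4 = 2.25

# Same raw data, two defensible answers -- 50% apart.
print(rate_if_zero, rate_if_dropped)
```

Neither number is wrong in the abstract; which one answers the stakeholder's question is exactly the kind of context a query generator can't infer from the table alone.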

I don't even know if, like, data science will always be a career, so much as, like, data science is, like, kind of an expectation of, like, something professionals should be able to apply to the problems they have at hand, like manipulating an Excel spreadsheet. And similarly with AI, I think there are probably some things, like, in a legal setting, like, parsing a ton of documents, generating some hypotheses, like, helping maybe structure a preliminary screen for, like, legal discovery or something. Like, I think there are some kind of assistive things AI can help with, but I, like, I personally, I think, like, don't love the idea of a world where it's, like, AI making all the final decisions, so I think there's plenty of space at the table for both.

Solving problems you don't know how to solve

That's a really good question and a really, like, prudent question lately, I think, for a couple of different reasons. One thing that I think is a really undervalued part of that process is thinking about, like, how will you know a good solution when you find one, and how would you know if it was a good solution, like, staring you in the face and you already had it? I think the more unstructured and complicated a problem can be, the more deceptive it can almost be about what's good, which can have, like, one of two kind of, like, bad outcomes: either you find a good solution, but you don't realize it's good, so you keep going; or you spend a lot of time, like, chasing after an outcome, and then you get there, and only then do you realize, like, I solved the problem I was trying to solve, but it was not the problem I wanted to solve.

So, something I've really been experimenting with in my own work is having a lot more of an explicit, like, design stage at the beginning of a project, and thinking, like, how can you do a pilot? How can you, like, you know... if I'm trying to predict some target, can I take those, like, true values of that target and plug them into a downstream problem I, like, actually thought that I was going to solve, and make sure that's actually what I want to solve? And almost kind of, like, front-loading model evaluation, where even, like, a fake solution is the first step versus the last step.
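As a rough illustration of that "pilot with a fake solution" idea, here is a hedged sketch: the payoff numbers, the decision rule, and the threshold are all hypothetical, but the shape of the check, plugging the true target values into the downstream decision before building any model, is the point.

```python
# Oracle pilot: before modeling, pretend predictions are perfect and see
# whether the downstream decision would actually improve on a naive baseline.
# All values and the decision rule below are made up for illustration.

def downstream_decision(predicted_value, threshold=100):
    """The action a prediction would drive: contact high-value customers."""
    return predicted_value >= threshold

true_values = [250, 40, 130, 90, 300]  # hypothetical true targets
payoffs     = [50, -5, 20, -5, 80]     # payoff of contacting each customer

# Plug the TRUE targets (an oracle model) into the downstream decision.
oracle_payoff = sum(
    p for v, p in zip(true_values, payoffs) if downstream_decision(v)
)
# Naive baseline: skip modeling entirely and contact everyone.
contact_all_payoff = sum(payoffs)

print(oracle_payoff, contact_all_payoff)  # 150 vs 140
```

If even a perfect predictor barely beats "contact everyone," then predicting this target will not solve the problem you actually care about, and you've learned that before building anything.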

And then I'll tack on one other point, but I think the other aspect of that, just, again, going back to those levels of abstraction, is figuring out how to take the context out of my problem to make it something more Googleable. So, you know, I mean, thinking, like, not being, like, oh, this experiment, the random seeds were wrong, so I don't have a control population. What do I do? But then, like, backing that into more of a general question: how do you sample a synthetic control from observational data? Which is something you can Google and then find a ton of resources about. So, I think it's pushing myself on what I want, and then finding the right framing at which to ask for help.
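As one illustration of where that Googleable reframing leads, the sketch below implements the simplest answer one would find: nearest-neighbor matching on a single covariate, with made-up data. Real synthetic control methods (for example, weighted combinations of donor units) go considerably further; this is just the entry point.

```python
# Simplest form of sampling a control group from observational data:
# nearest-neighbor matching on one covariate, without replacement.
# All units and covariate values are hypothetical.

treated   = [("t1", 0.30), ("t2", 0.72)]                  # (id, covariate)
untreated = [("u1", 0.10), ("u2", 0.35), ("u3", 0.70), ("u4", 0.95)]

def match_controls(treated, pool):
    """For each treated unit, pick the unused pool unit closest in covariate."""
    remaining = list(pool)
    matches = {}
    for tid, x in treated:
        best = min(remaining, key=lambda u: abs(u[1] - x))
        matches[tid] = best[0]
        remaining.remove(best)  # match without replacement
    return matches

matches = match_controls(treated, untreated)
print(matches)  # {'t1': 'u2', 't2': 'u3'}
```

Searching the general phrasing surfaces the whole family of techniques (propensity scores, Mahalanobis matching, Abadie-style synthetic controls) that the seed-specific framing never would.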

Something I've really been experimenting with in my own work is having a lot more of an explicit, like, design stage at the beginning of a project. And almost kind of, like, front-loading model evaluation, where even, like, a fake solution is the first step versus the last step.

Communicating with non-technical stakeholders

Yeah, no, that's always an enduring hot topic, and for very, very good reasons, I think. I think, like, a couple of different approaches that I tend to use there are, A, like, leading with the solution versus kind of, like, leading with impact versus leading with that, like, kind of process. And I think as many people know, I'm a, like, huge R Markdown fan, and maybe now I should say Quarto, but I've actually, like, come to the realization that I kind of need to even change my workflow for working with a tool like that because, like, inherently, like, an R Markdown workflow is kind of very linear of, like, I cleaned my data. I did the analysis. I got the outcome. But then from a storytelling perspective, I think you can really, like, get the hook and get the buy-in to, like, keep that conversation going if you lead more with the outcome of, like, this is why what I'm about to tell you is important.

And then some people want to really dig deeper and understand the mechanics. Some people won't, and I think it's also, as a, like, data person, learning to accept that that's okay that they don't. For all of us, it can really, like, be about the journey versus the outcome, but accepting that for, like, leaders, reasonably, it is often about the outcome. But then when I want to go into those details, I try to, like, lean heavily on, like, metaphors and also, like, diagrams or one really compelling plot, just anything to make it more tangible versus, like, purely conceptual.

Yeah, no, I think that's a great point of tools versus analyses, because tools are so inherently abstract, which can make it harder. But I do think, probably, how I've translated it: leaders inherently, I think, care a lot about the people on their team, hopefully from the genuinely caring perspective, but at least from the managing capacity perspective, bare minimum. And I think kind of leading with the case, I think most tools, you can also bring a human dimension into that story, whether it's such a pain point that people are doing this manual work every single month, and that's inhibiting both them doing more interesting projects and their career development, but there's a way to automate this thing.

And there's a broader framework that I really like for thinking about this called the jobs to be done framework that kind of comes from the product management world, and essentially what that says is you can think about a tool as someone you're hiring to do a job. And I've always liked that, kind of framework for, like, thinking about the interview, thinking about, like, why should I, like, bother to, like, hire and onboard you, and is this tool more, like, almost going back to the AI discussion, is this, like, an intern that I only want to use with a very high level of supervision, or if you're thinking about CICD and process automation, it's, like, am I hiring an executive or a contractor to just go run this process for me in an abstracted way.

So I, like, definitely feel really lucky to be in an environment that's, like, pretty much always been very, very friendly to open source, and I think that is, like, a trend I've seen across the industry. I've had the opportunity also to do some consulting projects with other people, more in the biostats and pharma fields, taking that leap from proprietary options to the R world. And especially, I mean, I can only imagine, I think, the intersection of data tooling with the current, like, economic conditions, I have to imagine can only, like, kind of accelerate that trend, when it's becoming an increasingly, like, easy-to-hire skill set and a much more attractive option for the bottom line.

Yeah, no, I mean, I think there's three aspects I'd say to that. I think first, definitely, like, can be really helpful to understand, like, a company's tech stack when you're interviewing and the amount of data they have available that you don't want to be, like, hired to be a data scientist and then show up on day one and have them be, like, oh yeah, here's a Google Drive with some CSVs in it, like, go wild. You want to be sure they have, like, enough data to support analysis and that either you'll be empowered to have the tools to build out the data you need yourself, or maybe they also have, like, data engineering or other job functions.

Secondly, I think it's helpful to be crisp about what you want to learn and grow on in a role. Like, data science can be such a nebulous job title these days. You know, I mean, I think someplace, like, it can mean anything from BI, experimentation, modeling, machine learning, and I think really just clarifying your interests and then just being able to, like, articulate them. Like, companies, like, do not have any incentive to, like, hire you for a role you don't want, so I think, like, kind of being able to, like, share what you're looking for really just helps the matchmaking.

But finally, I think it's also really good to recognize, like, you can learn a ton in pretty much any role. Spending a lot of time on data processing, taking that just as an example: that is sometimes, like, the hard part, the complex part, the part that, like, still requires a lot of the data science skill set of understanding, because I do understand the stats, because I do understand the algorithm, how do I, like, structure this problem in a way that the algorithm can understand the problem? And, like, at the end of the day, I think it's a funny conceit that in school we spend most of our time learning, like, hyperparameter tuning and typing that, like, model.fit, model.predict. That often isn't the hard part or sometimes even, like, the most interesting part, so I think being open with that, like, kind of growth mindset of, like, whatever job you end up in, there's going to be, like, a ton to learn and a ton of really interesting work to be done.

I think it's a funny conceit that in school we spend most of our time learning, like, hyperparameter tuning and typing that, like, model.fit, model.predict. That often isn't the hard part or sometimes even, like, the most interesting part.

I know we just got to the end of the hour here, and I'm sorry if we didn't get to answer everybody's questions. Emily, what is the best way for people to stay in touch with you? Is it through your website or LinkedIn or GitHub? Honestly, I think wherever people are these days. I'm still, like, for now, I'm still on Twitter. I'm still spending far more time there than I should. LinkedIn, GitHub, my website has my contact information, same with my email. And, like, I have the fortune of having, like, an unusual enough last name that my handle on, like, LinkedIn, GitHub, Twitter, my Gmail, everything is Emily Riederer at wherever. So, yeah, please, like, don't hesitate to get in touch. Like, I just love getting to meet more of the community.

But thank you so much, Emily, for joining us today and for sharing your insights and experience with all of us. This was awesome. Oh, thank you. This was a lot of fun.