Resources

R & Python Interoperability in Data Science Teams | Dave Gruenewald | Data Science Hangout

ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We'd love to see you!

We were recently joined by Dave Gruenewald, Senior Director of Data Science at Centene, to chat about polyglot teams, data science best practices, right-sizing development efforts, and process automation. In this Hangout, we explore working in a polyglot team and fostering interoperability (a word that Libby loves, but struggles to pronounce out loud). Dave emphasizes that teams should use the tools they are comfortable with, whether that's R or Python. Among the cross-language collaboration strategies he suggests are tools like Quarto, which can seamlessly run R and Python code in the same report. His teams also use data science checkpoints, saving outputs as platform-agnostic file types like Parquet so that they can be accessed from any language. REST APIs allow R processes to be accessed programmatically from Python (and vice versa), which can be a real game-changer. The new nanonext 1.7.0 release was also highlighted as a promising development for improved interoperability.

Resources mentioned in the video and zoom chat:
- Posit Conf 2025 Table and Plotnine Contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/
- nanonext 1.7.0 Tidyverse Blog Post → https://www.tidyverse.org/blog/2025/09/nanonext-1-7-0/

If you didn't join live, one great discussion you missed from the zoom chat was about pivoting away from academia, including leaving PhD programs. Many attendees shared their personal experiences of making the difficult decision to drop out of a PhD program. The community suggested alternative terms like "pivot," "reallocating your resources," or being a "refugee fleeing academia" instead of "drop out." Dave shared that he himself left a PhD program but has "no regrets about that." Did you leave a PhD program? You're not alone!
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!

Timestamps:
00:00 Introduction
02:21 "What types of data do your teams use?"
06:53 "Which of the three pillars you mentioned is your personal favorite to work on?"
09:26 "How do you avoid or divert scope creep?"
11:41 "How much of the project should be "planning" before any code happens?"
13:53 "Do you feel like people are just hopping in and going, hey, LLM, make me a POC?"
14:28 "Do you give them what they say they want, or do you give them what they need?"
16:40 "I'm wondering what public data do you wish existed?"
18:48 "Why not Positron yet?"
20:43 "How do you unify as a team and make it so that I can always read everybody else's code?"
23:10 "Could you talk a little bit about how R and Python work together?"
27:28 "How to start package development with a team who are very new to package development."
33:01 "What's your greatest regret career wise?"
35:53 "What about your biggest wins, specifically in your early career?"
39:40 "How would you recommend building a data science culture and community from scratch?"
41:49 "Would you set a specific timeline for EDA, exploratory analysis, to scope the project better?"
45:15 "How do you define fun projects, and how much time do you allocate for exploration in those?"
48:21 "Does your team use DVC or something similar for data version control?"
50:00 "Can you talk a bit more about your pivot from academia into data science?"
51:31 "Any advice on where to look for opportunities in data science after getting a masters degree?"

Oct 7, 2025
54 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12 p.m. U.S. Eastern Time. If you're not joining us live, you're missing out on the amazing chat that's going on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.

I think that with that, I am ready to introduce our featured leader today. We have Dave, Dave Gruenewald at Centene. Dave, could you introduce yourself, tell us your full long title, what you do, and something you like to do for fun?

Hey, yeah, sure. It's such an honor to be joining on this channel here on the Hangout. So yeah, my name is Dave Gruenewald. I am in Oak Park, Illinois, just outside of Chicago. My title is Senior Director of Data Science at Centene. My focus is specifically on the Affordable Care Act space, or Obamacare, as others might know it. And yeah, I guess for fun, I'm always into finding kind of obscure new music or movies. I've lately been picking up the banjo, so that's been kind of a fun new hobby of mine. And yeah, really looking forward to all of your questions and just the conversation today.

Yay, and we have actually several people in the Data Science Hangout family that are from Centene. Javier is here, and his camera is on, so you might be able to see him. Javier has been here. Catherine Gerton is also at Centene, and she has been a speaker on the Hangout. I don't know if she's here today. She usually doesn't have camera on, but maybe she's here. Catherine is here, yay! Okay, Catherine. So, Centene family is here, even though they're all from different departments, I am totally aware.

All right, in order to get us started, Dave, I would love you to help us build a little context around the type of data that your teams underneath you use. So, we're all from different industries. We all work with completely different data types and data sources. Can you give us a little bit of background on that?

Yeah, I don't think we're any exception there. We have a lot of different sources. So, we are using data that's publicly available from like the government CMS. We house a lot of internal data on multiple different databases. We have both Snowflake and Databricks, and we do a lot of our kind of ingestion using kind of RStudio or VS Code, and then host a lot of that, our data science products, if you will, onto Posit Connect. So, we're kind of living and breathing in this Posit ecosystem, and I think it really does facilitate the work that we do.

Three pillars of data science work

You know, I think with any sort of data science work, I've kind of always viewed it as three different pillars, more or less, where I think the machine learning and modeling and AI is one of those pillars. But I think equally as important is going to be kind of the actual data science product, whether that's like a dashboard or a report. I'm going to call that just like the data science storytelling. And then the final component that I like to focus on is just process improvement and automation. There's so much that we can do with code that sometimes it's like, I guess I always refer to it as treadmill work, where you're kind of sprinting in place, but never really getting progress. And so, a lot of what I like to focus on is how do we automate this treadmill work where we can, so that we can kind of focus on the actual fun problems that data science can help solution for.

Yeah, the fun problems. And lately I've been sort of like offering up some topics that I know that our guests like to talk about. And I think that's something really funny from when I was prepping with you, Dave, was that there are these topics that I was like, oh, these are sort of like vague general topics. I wonder how Dave is going to talk about them or what he's going to find exciting about them. And then when I listened to you talk about them, I was so enthralled. I was like, yes, I am in such agreement with all of these things. Let's talk about this forever. Let's do a four-hour podcast.

So, I want to touch on just a few of those. One of them from Dave was data science best practices, which is a very broad phrase, right? But he let me know, like, this is about speaking the same language, the same baseline language. And you said that even if two R devs or two Python devs are working in the same language, they're going to have different things that they do, right? That like make it harder for them to work together or for the team to work together. So, you want to be able to have sort of a unification of standards and formatting and styles and stuff. There was also enterprise package development and dashboard development. Like, hey, maybe developing an internal package for your team or your org sounds like a really big lift or it sounds really nerve-wracking, but it's a really great thing. But the things that I really want to talk about, one of them was right-sizing development efforts. I am somebody who's going to over-optimize too early. It's a sickness. And I think that many people in data science want to do that as well. And Dave can talk about making the right amount of effort for the right projects.

And then the big one, which I know a lot of people want to talk about, is working in a polyglot team and really developing, managing, facilitating a polyglot team. All of the different ways that you can work together as an R plus Python or Python plus something else team. So I'm really excited to talk about all of those.

We always wait just a minute while questions roll in, but we have some questions already. And Zach had asked one that I will go ahead and ask for him because it's got an asterisk. Thanks, Zach. Which of these three pillars you mentioned is your personal favorite to work on? Ooh, that's a good one.

I think the learning is going to be more associated with the modeling and machine learning. I think that is very fun and it's really cool to make predictions. So I think there's some immediate gratification with that. The maintenance of those long-term models that are in production may be a little bit less fun, if I'm being totally honest. And so I think the process automation is probably more what I lean on because I think any time I'm in a project like that, I find that there are way more creative opportunities on how to make something work. And I think what's really neat in that space is being able to figure out what are the right tools for the job and how do you make sure that you don't over-engineer or under-engineer a solution there. I also think that being in a rather larger company, there's plenty of opportunity for this type of automation and just generalized process improvement. So I think that's actually just where I get more of my long-term satisfaction, but it's really hard to decouple all three because almost every project touches on at least two, if not all three of those pillars.

Avoiding scope creep

Yeah, it's like a stool with three legs. It can't have less than that, fewer than that. Thank you for the question, Zach. Noor, you asked a question. I feel like Noor asks really great opening questions and so I end up wanting to throw Noor into the ring early.

Hello, here I am entering the ring center stage. That's all the wrestling terminology I know. So it's more of a broad question, but I know as a data scientist in training slash data scientist, one of the things that always comes to mind is scope creep. It's like how do you avoid it or divert it because you're like, oh, if I do this, I can save myself time. Then you're like encroaching. It's like, this is not the plan at all. So how do you avoid it or divert it, given your experience, any tips or advice?

Yeah, no, that's a great question. Even the simplest of projects can have scope creep. One of the things that we've done that I think has helped with that is at the start of any bigger project, we do a lot of our planning in GitLab, but GitHub also is an option. There's a lot of project planning tools in your version control platforms. So if you're not using version control, please do it. If you are using it, please use the planning features. But one of the things that we do with that is we will actually, I have a template that we use that is basically let's state the problem. Let's state the proposed solution. Then what are the potential rabbit holes of as you're working on this project, you will likely come across this issue. Has it already been solved or is it something that we should just avoid entirely? Then finally, we'll have another section on there called no-goes. What that is is like, hey, it doesn't matter if somebody asked for this or not. This is totally out of scope of this project and we're making that known upfront. That could be like, hey, if we're doing some process automation, that's not a dashboard project. Let's make those two separate projects.
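Pulled together, the planning template Dave describes might be captured as a GitLab or GitHub issue template along these lines (a sketch; the section names follow his description, and the prompts under each heading are illustrative):

```markdown
## Problem statement
What question are we actually being asked, and by whom?

## Proposed solution
What we plan to build, and roughly how.

## Potential rabbit holes
Issues we expect to hit as we work on this. Have they already been
solved elsewhere, or should we avoid them entirely?

## No-goes
Explicitly out of scope for this project, whether anyone asks for it
or not (e.g., the dashboard is a separate project from the automation).
```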

Once we have refined what this storyboard looks like, we'll actually get it in front of our stakeholders so that they can say, yeah, actually, this seems like a really good plan. Or actually, what if we pivoted this? Maybe it's not answering the question that we actually want. With a lot of those project requests that we get, if you've ever had any experience with Stack Overflow, you read the question and then you understand what's the question that they're actually getting at. I think by doing some project planning upfront, you're able to better craft a solution, because maybe they didn't ask the question exactly how it should have been. This is a little bit of an opportunity for discovery without much overhead and also keeps you on target for what you're going for.

I love it. My follow-up question is, how much of the project should be planning before any code happens? Before any code is typed?

That's a tough question because I've also seen plenty of teams that spend too much time in the planning phase and not enough time in the doing phase. I don't really have a solid answer of, oh yeah, you should always spend 10% of your time planning because some projects that are small end up taking more planning and sometimes big projects are pretty self-explanatory. I would say never do a project in total isolation if you can avoid it. By just bouncing ideas off of a colleague or even your stakeholder, I think you're going to have a better product and a more well-defined scope than just jumping straight into coding. I'm not an artist, but I would also assume that I'm not going to just get some paint and just go at it. I would have a plan, I would do a sketch, a mock-up, and I think coding is very much the same. It's much easier to do planning work than it is to actually put it to code.

It's much easier to do planning work than it is to actually put it to code.

I'm going to say more of a skeleton because you don't want to over-engineer it and there's always going to be some hang-ups or maybe some things take shorter time or more time than expected, but you should have that roadmap carved out. Now that you have that, it's more of just putting those thoughts into code rather than coding and figuring out where you're going with it.

I think LLMs have probably changed this development cycle a little bit because the way that I have always worked and taught as well, before LLMs at least, and still now, write your script in pseudocode first. You plug your code in later, but write down your plan for what you're going to do in pseudocode, which is just plain language, comments and stuff, and go from there. But now with LLMs, do you feel like people are just hopping in and going, hey, LLM, make me a POC, and then they work from that like it's their blank canvas?

Yeah. I don't know if there's an answer to that. I think that is actually a pretty phenomenal use of AI just to ideate and help lead what direction you're going in. Yeah.

All right. Well, there's an anonymous question that was along the lines of this that we can tack on, and it is, do you give them what they say they want, or do you give them what they need? How do you approach that conversation?

That is a very good question. I think it really depends on your prior relationship with whatever stakeholder you're working with. If you have that working relationship, obviously, you should try to actually create a solution that you think is the best solution possible. I mean, really, this feels more like a question about communication. And while I think the planning phase is important, I think it's also good to just have some check-ins throughout the process. If you're going to be working for something, say, for six weeks, it seems kind of crazy to just keep your head down, work on something, and pour your heart and soul into it, and then at the end of those six weeks realize it's either not needed anymore or what you built wasn't really answering the question at hand. And so I do think that you don't need to be over-communicative, but I think just the occasional check-in on a weekly cadence or bi-weekly cadence to say, hey, this is where we are, this is where we're heading, is really going to make sure that you're answering those questions. Are we still going in the right direction? Directionality check.

Working in a polyglot team

Can we follow this line of the data science best practices? Let's say I have a team of four devs, and they're all working in R, but they all come from different backgrounds. Maybe one of them is hardcore tidyverse. Maybe one of them is hardcore base R. Maybe they're all tidyverse, but they all write in a different style. How do you unify as a team and make it so that I can always read everybody else's code, they can always read mine?

Yeah. Yeah. It's an ongoing challenge. I think the low hanging fruit there is just kind of agreeing upon a style, which seems kind of pedantic, but you'd be surprised how many hot button topics you run into when you do that. But I think the goal there is anytime that you are reading somebody else's code, you're going to have some unfamiliarity with what they're doing. Why make it more challenging with it also being an unfamiliar style or syntax or unique packages that you're not familiar with? And so, I think coming up with a familiar style guide that you all are comfortable with is a great starting point, but that's not the end of the conversation. I think it's also what are almost gold carded packages, if you will, where it's like, yes, of course, we'll use this package. What are some that you're like, actually, yes, I see the benefits, but let's try to avoid it if we can. Mainly because while the open source community is amazing at creating new content, there's also a wide array of how these packages are used and what that like format and syntax looks like. And so, I think it's a little bit of just refining what are you focusing on, but the goal here is not to like, I'm not negating the individual, but I should be able to look at any one of my team members' code and not be able to identify which person wrote it because we're all in a unified kind of format and style of how we're writing our code.

That gets more challenging when you have R and Python and maybe Julia, if you're a go-getter here. But I think that within those languages, there's still some familiarity there that like you should adhere to.

Yeah. I saw a question pop up in the chat as you were talking about that that was like, what if it's R and Python? Is there a unified style guide that's going to work for everybody in that case? I think that's a lot bigger challenge, but this does lead us into some sort of like questions about R and Python together and how that works. I know that a lot of people at Centene use R, but you do have Python users too. Could you talk a little bit about how that works together?

Yeah, absolutely. I mean, I think in this day and age, it's more about use the tools that you're comfortable and familiar with. I don't really want to prescribe, no, this has to be in Python, this has to be in R, unless there's an actual platform limitation that kind of forces our hand there. But if you're an incoming Python developer or an incoming R developer, there shouldn't be the expectation that you're also, like, an expert in the other language that you don't know. I think LLMs are actually a great example of leveling that playing field, of just using it to refactor some of your code. That's not always an option. And so I think one of the things that's worked out really well for us is the different types of, I mean, Posit has actually provided a lot of examples of this. I'm going to talk about Quarto, where you can seamlessly run R and Python code kind of in the same report, being a little bit more language agnostic. So I think that's a great example.
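A minimal sketch of the Quarto pattern Dave mentions: a single .qmd file can hold both R and Python cells, and Quarto runs each with the right engine. The document content below is illustrative:

````markdown
---
title: "One report, two languages"
format: html
---

```{r}
# An R cell, e.g. summarizing a built-in dataset with dplyr
library(dplyr)
mtcars |> summarise(mean_mpg = mean(mpg))
```

```{python}
# A Python cell in the same report
import statistics
print(statistics.mean([21.0, 22.8, 24.4]))
```
````

Rendering with `quarto render report.qmd` executes both cells; under the knitr engine, Quarto uses reticulate behind the scenes if you want to share objects (not just rendered results) between the R and Python chunks.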

I think another thing that we do a lot of is, again, I'm making up a lot of terms here, but I'm going to call it a data science checkpoint, where say I'm running a process in R, and now I've got my output. Let's save it as a Parquet file that now R or Python can use. And so, yes, I'm running my process fully in one language, but now I'm saving the output so that any other language can now use that. And you can host that, say, again, on pins in Posit Connect or an S3 object, Databricks or Snowflake. I mean, you have so many different options, but making sure that your colleagues in the other language can still access that is kind of important.
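The checkpoint pattern itself is simple to sketch. In a real pipeline the format would be Parquet (e.g. pandas' `to_parquet`/`read_parquet` backed by pyarrow in Python, or `arrow::write_parquet` in R); the sketch below substitutes JSON from the standard library purely so it runs anywhere, but the shape of the handoff is the same: one language writes the checkpoint to shared storage, and any other language reads it. The field names and values are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

def write_checkpoint(records, path):
    """Save an intermediate result in a language-agnostic format.

    In practice this would be Parquet (pandas.DataFrame.to_parquet in
    Python, arrow::write_parquet in R); JSON stands in here so the
    sketch runs with the standard library alone.
    """
    Path(path).write_text(json.dumps(records))

def read_checkpoint(path):
    """Any language with a JSON (or Parquet) reader can pick up from here."""
    return json.loads(Path(path).read_text())

# One process (imagine it was written in R) saves its output...
scores = [{"member_id": 1, "risk": 0.12}, {"member_id": 2, "risk": 0.87}]
ckpt = Path(tempfile.mkdtemp()) / "scores.json"
write_checkpoint(scores, ckpt)

# ...and a downstream process (imagine it is Python) reads the checkpoint.
restored = read_checkpoint(ckpt)
print(restored[1]["risk"])  # 0.87
```

In practice the checkpoint lives somewhere shared, like pins on Posit Connect or an S3 bucket, rather than a temp directory.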

And then finally, I think APIs are still criminally underutilized: creating your own REST API and hosting that. That way I can run Python code, even though I'm only an R developer, or I can run an R process that's hosted on, say, a Plumber API, but being able to access that through Python is kind of a game changer. And then, you know, going on down that list, there's also reticulate, which is always an option. I will be honest, that's kind of always our final option rather than, like, what we start off with. And then I actually just saw on the tidyverse blog that nanonext 1.7.0 was just released, and that seems very promising on, again, just kind of being able to interoperate with other teams and whatever language they choose.

I think APIs are still criminally underutilized: creating your own REST API and hosting that. That way I can run Python code, even though I'm only an R developer, or I can run an R process that's hosted on, say, a Plumber API, but being able to access that through Python is kind of a game changer.
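Here is a runnable sketch of that pattern using only the Python standard library: a toy HTTP "model" endpoint stands in for a Plumber (R) or FastAPI (Python) service, and the client side only needs to speak HTTP, never the server's language. The `/score` route, the request fields, and the doubling "model" are all made up for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ScoreHandler(BaseHTTPRequestHandler):
    """Stand-in for a model endpoint. In Dave's setup this would be an R
    process exposed via {plumber}; a stdlib server plays that role here."""

    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Toy "model": double the input. The real logic lives server-side,
        # so callers never need the server's language installed.
        payload = json.dumps({"score": 2 * body["x"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Start the "R service" on a free local port, in a background thread.
server = HTTPServer(("127.0.0.1", 0), ScoreHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side: any language that can make an HTTP request can call it.
url = f"http://127.0.0.1:{server.server_port}/score"
req = urllib.request.Request(url, data=json.dumps({"x": 21}).encode(),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    answer = json.loads(resp.read())

server.shutdown()
print(answer)  # {'score': 42}
```

The same call works identically from R with httr2, which is the point: the API boundary hides which language runs behind it.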

This is actually something that just came up with our Data Science Hangout crew. Last night, we were all talking about it. We were all online going, oh, my gosh, have you seen nanonext? What's happening? So I feel like it's still so new that we're all, like, reading through the examples and going, like, oh, I have questions. But let me put something in the chat so that we all know what we're talking about. I want to say that at a really high level, nanonext is going to be a way to pass information between R and Python not using reticulate and having it be, like, an open and closed connection. I have a lot of questions still, but it sounds really, really promising.

Package development for teams

There was a question about how to start package development with a team who are very new to package development. This is a fantastic question. I find package development inspiring myself. I've taken a workshop on it, but I feel like if I'm on a team of people who don't have a package already, I would be reticent to, like, raise my hand.

Yeah. I think just thinking back to whenever I first started my data science career, I kind of always viewed package development as almost these, like, coding gods that are the purveyors of how code is run and, like, who's pushing the envelope the most. And I'm not diminishing that. I mean, I still think there's just, like, phenomenal packages developed out there. But they've also almost, like, Promethean captured the flame and brought it to us. There's so many tools that make the learning curve of getting started with package development so easy. If you're in R, you've got devtools and usethis, which do a phenomenal job of kind of handholding and making sure that you are, like, creating that package skeleton correctly and how to do documentation. I think it does a really good job of rewarding you for sticking to those standard conventions without it being, like, you have to hit every single thing perfectly to have something working.

So, like, with all things, let's not start totally complex. Start with kind of, what's code that I just constantly am copying and pasting between projects, and how do I get that wrapped up into a function so that rather than copying and pasting 50 lines of code, what if I just write a single line of code with three arguments? So, I think if we're talking R, the R Packages book, which I think is on its second edition now, if you do nothing else, just read that one chapter called The Whole Game, and I think that gives you a phenomenal bird's eye view of what's going on. And then you can use the rest of the book as you kind of need to. Again, this is just, like, if you're really feeling discouraged from starting, that's, like, a really easy starting point. And then Poetry within Python, I think, is just another great framework. So, yeah. Like, there's options for both. And I think the overhead for Poetry is actually pretty low. So, I would recommend checking that out if you haven't.
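Dave's starting point, wrapping the code you keep copying between projects into a single function with a few arguments, might look like this in Python (the cleaning steps and column names are made up for illustration; a handful of such functions is the natural seed of an internal package, via usethis/devtools in R or a pyproject-based package in Python):

```python
def standardize_columns(rows, rename=None, drop_empty=True, lowercase=True):
    """The kind of cleanup snippet that gets copy-pasted between projects,
    wrapped once as a single function with three arguments."""
    rename = rename or {}
    cleaned = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            key = key.strip()          # trim stray whitespace in headers
            if lowercase:
                key = key.lower()
            key = rename.get(key, key)  # apply any project-specific renames
            new_row[key] = value
        # optionally skip rows where every value is missing
        if drop_empty and all(v in (None, "") for v in new_row.values()):
            continue
        cleaned.append(new_row)
    return cleaned

raw = [{" Member ID ": 1, "STATE": "IL"}, {" Member ID ": None, "STATE": ""}]
print(standardize_columns(raw, rename={"member id": "member_id"}))
# [{'member_id': 1, 'state': 'IL'}]
```

Once two or three teammates are importing this instead of re-copying it, you already have the core of a package; the tooling just adds structure and documentation around it.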

Thank you. That's exactly what I was going to say. Wait, it's not just R. You can do the same thing in Python. Also, I will say, if you have something that you're using over and over again and somebody else needs to use it over and over again, some, like, really tiny step towards this, like, I'm going to share my functions with other people: if you have a private repo on GitHub where you can just, like, host a .R or a .py file to share with other people, they can use the raw link to that file in GitHub to source it. So, like, you can literally type in your R code, source, and then put in the URL to that .R file in GitHub, and you will source the functions that are stored in that, right? Like, maybe that could get you over the hump of being nervous about it, being, like, are my things that I'm developing useful to other people? They probably are. And then if people use them and like them, you know, like, okay, I can go make an R package out of this, and people will find it useful.

I would actually even say a great starting point is creating a package that uses your company theming. Oh, yeah. That gives you, like, immediate buy-in from your stakeholders because it's, like, oh, this follows the same company colors. It has our logo on it. You know, it's, like, small things like that, but that instantly builds trust with your stakeholders and makes it feel like it's a more polished, finished product. And, you know, the stress of writing a theming package is pretty low. So, if you do maybe make a mistake, it's not going to be, oh, no, but, like, our company's stock is halved now. It's very low stakes, and I think it's a great exploration of how to actually do package development.

Career advice and pivoting into data science

Sure. Hi, Dave, and feel free to parlay the question as well, because it's not a fun one, but what's your greatest regret career-wise? Huge caveat, guys. I am a big fan of, like, we got to where we're supposed to be right now, despite our stumbles and our falls and our failures, so please don't take this really as a negative thing, but it is nice to look back sometimes career-wise and say, like, goddammit, if I had done this one thing, right, or if I hadn't, I don't know, flipped off my boss, whatever that thing is, so it was kind of one of those framed questions. Thanks, Dave.

Yeah, no, that's a great question. I'll give you my totally honest answer, which is, like, I still really love ecology, and I wish I was in that field, but unfortunately, like, career and the society we live in, that just wasn't in the cards for me. So I do kind of, like, still just kind of look out like a kid with his hand on the window in the rain looking at some ecology work, but I would say advice that I can give rather than, like, actually, I can't say I'm, like, swimming in regret with career, but one thing that I will say is that don't, like, do not burn bridges, because it is crazy how often that actually does come into play. So I actually was at Centene. I left for another, actually, I worked at Posit for a year, and then I ended up coming back to Centene, and so I think that's a lot of just one of many examples of how keeping doors open is probably the best thing that you can do for your career.

I love it. Ralph, when Ralph Asher was on, he talked about preserving optionality, and I think that that really is along these lines, right? Like, it's one of the reasons why, not just career-wise, but, like, I live in a very inexpensive house in a very inexpensive area, and I drive a very old car because I am preserving my optionality, because if I lived in a fancy house in a fancy area with a fancy car, I would not have as many options as I do in life for making decisions or changes.

All right. I'll also add, like, I mean, I was also in a PhD program that I ended up dropping out of. I have no regrets about that. I don't think it was a great fit for me, and I'm, like, happier with what I'm doing now, and so I know at the time, though, that was very stressful and, like, felt like a very dramatic decision to be made, and now being years away from that, it's kind of like, oh, yeah, that was actually the right call, and I'm glad I did.

Yeah. You're not alone. There's many, many people in the data science hangout space who stopped their PhD or just sort of pivoted or mastered out, like, whatever it is. I considered a PhD for a while, and I was like, I just don't think that this is for me.

Well, Arsenis had an add-on to this, which was what about your biggest wins, and he specifically said in your early career. Yeah. Are there any things that were, like, big wins that led you in the right direction?

Yeah. I think a lot of it was, I mean, I guess, again, speaking from personal experience here, coming from ecology and then pivoting into data science, knowing it and, like, knowing how I did a lot of modeling and coding in grad school, it felt like a pretty, like, within reach, but I do feel like it was a tough kind of, like, career pivot to break into, and so I think a lot of that was how do I take what I, like, professionally have on my resume and kind of shape that up to look like, yeah, I am a data scientist. I am capable of doing this work, and so it wasn't just like, all right, went straight from ecology into data science. I did work in public health for a little bit at the CDC, and then I worked as a statistician for a children's hospital, and then slowly but surely that kind of lent itself towards doing more of the work that I actually wanted to do, and so part of that was taking gigs that maybe I was doing more Excel work, but I was able to just kind of do coding off to the side for some of those more challenging opportunities, and so I think it was a little bit of a kind of a juggling act of both doing what was requested of the job and then making sure that I was doing other parts of my job in the way that I wanted to keep developing. I would say that, I know I'm rambling a little bit, but I do think that was probably the better thing that I've done in my career is actually making sure that I had, it's not so much show up for the job that you want, but, like, how do you kind of force your job to grow in what you want to grow in?

Building a data science culture and fun projects

Yeah, great question. I wish I could say that challenge goes away. It does not, but I think it's more of, I think of it more of like a stick-and-carrot approach of, okay, if you were to do this, and that kind of scratches my back, what is the benefit for you? And I think it's showing them, like, okay, normally this work takes me 80% of my day is just cleaning that data before I can even start the analysis. If you are able to organize your data, we can get a lot more productivity. I can focus more on the exciting projects of actually answering these new questions and driving the business forward or driving your research forward rather than spending your time with that data cleaning and cleanup. There'll always be some data cleanup, so let's just be honest there, but how do we minimize that effort? Because at the end of the day, that's, I'll classify that as non-value-add work because cleaning data doesn't necessarily answer any questions. It's just the necessary step before you can start answering questions. So you always want to minimize what is your non-value-add work, and I think communicating with that, but also showing them exactly what would be the benefit to them to do this is kind of your ticket.

There was a quick question that I thought we could cover rapid-fire since we only have 12 minutes left: would you set a specific timeline for EDA, exploratory data analysis, to scope the project better? That's related to something I asked earlier, which was how much should be the non-coding part, but this is actually the coding part, right, where you're exploring.

Yeah, I would almost frame that as the weird no-man's land between project planning and getting the project work done, because if you're not doing any of that quality control or data exploration, you're likely to end up with either invalid answers or conclusions you shouldn't be drawing. So yes, absolutely, for any project you do, you should be familiar with the nuances and biases of the data you're working with. How much time and effort you put in up front really depends on the severity and priority of the project, but I can't imagine doing a data science project where you're not also exploring the database for missing values, weird duplication, or what have you.
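As a minimal sketch of the kind of quick checks Dave describes (missing values and weird duplication), here is what a first-pass look might be in pandas; the column names and example data are hypothetical, invented purely for illustration:

```python
# A quick first-pass EDA helper: per-column missingness and cardinality,
# plus a duplicate-row count. Column names and data are hypothetical.
import pandas as pd

def quick_eda(df: pd.DataFrame) -> pd.DataFrame:
    """Return a per-column summary of dtype, missingness, and cardinality."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean().round(3),
        "n_unique": df.nunique(dropna=True),
    })

# Hypothetical example data with one missing value and one duplicated row.
df = pd.DataFrame({
    "member_id": [1, 2, 2, 3],
    "claim_amount": [100.0, 250.0, 250.0, None],
})

print(quick_eda(df))
print("duplicate rows:", df.duplicated().sum())
```

A few lines like this, run before any modeling, surface exactly the "weird duplication or what have you" that would otherwise bite later.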

Yeah, my two cents is the whole thing is EDA. There are things you are not going to figure out about your data until you start modeling it, and you're like, oh wow, wish I had figured this out in EDA. But it turns out modeling the data is a really fantastic way to find things out about it, right? And then you might go on exploratory journeys where you're like, oh man, I really thought that variable represented that thing, and it turns out in that database it doesn't. And there's somebody in a back office somewhere going, ha ha, yeah, don't use that variable, it doesn't do what it says it does. You're just not going to know until you get through your process. But that is your chance to document it somewhere, so that the person after you doesn't chase the same wild goose.

Thank you so much for that question, Anonymous Asker. On that note, I would add that it's probably smart to have some sort of knowledge share. We have a Quarto knowledge share where any time we find weird nuances like that, it becomes a post in our Quarto blog, to make sure the whole team is aware: hey, this database is a little odd, and this is how I would recommend handling it. Or: we have a variable named for one thing, but it doesn't actually measure that thing, or it's not at that granularity. Or: that's an old variable, and now we're using this other one that's not named as intuitively. That happens all the time.
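For a sense of what such a knowledge-share entry might look like, a Quarto blog post is just a `.qmd` file with YAML front matter; the title, field names, and details below are invented for illustration, not from Dave's actual blog:

```markdown
---
title: "Data quirk: claims_raw.status_code is not what it sounds like"
author: "Data Science Team"
date: 2025-09-01
categories: [data-quirks]
---

`status_code` in `claims_raw` is a legacy field that stopped being
populated in 2021. Use `claim_status` instead: it is still maintained
and has the granularity you actually want.
```

Dropping a short post like this into a shared Quarto site is a lightweight way to stop the next person from chasing the same wild goose.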

So at the beginning of your talk earlier today, you mentioned something in passing: fun projects. You were distinguishing between some of the different types of projects that you have to do. I'm wondering if you might define what that means for you, and what you think it means for a team in general. There are two follow-up questions with this. One is: how do you determine how much time to allocate for exploration in those fun projects? And can you share an example of a time when you went through one of those with your team?

Yeah, great question. I'll define this in terms of type one fun and type two fun: type two is maybe not super exciting in the moment, but it is fun in retrospect. I guess there's also type three fun, which was not fun in the moment and wasn't fun in retrospect either. But I would frame this in terms of the resource allocation we talked about earlier; this is a skill-set resource allocation. Among the projects you take on, I think you should always have one project that plays to your strengths, one project that's a stretch, where you're learning something completely brand new and developing a new muscle, and one project that maybe isn't tied directly to a business need but could be in the future. Sometimes those end up paying off the most. So it's less about me being prescriptive, saying this is exactly the type of project you should do, and more about having that variety of projects going on. That way, if one really kicks up, you can devote more time to it. And if one is a little dreaded to move forward on, you can have productive procrastination, if you will, by working on the other passion projects.

One thing we do on my team is that everybody should have a defined passion project, created and shaped by themselves, exploring new tools they want to learn. As one example, I have a team member who's developing an LLM application around public information from our competitors' earnings calls. That's a fun project in the sense that nobody is really asking for it. It's a great time to explore how to use these LLMs, and it could end up being something really useful to interface with, but the expectation is that it's more about learning the tool than about the end goal.

Finding opportunities and closing thoughts

Yeah, that is a great question, and I feel like the answer changes pretty regularly. LinkedIn used to be a great spot for that, and I think you can still find opportunities there, but I'll be forthright in saying that I get so many messages, even when I don't have positions open, that it is hard to stand out. Things like this Hangout are a great way to see what's out there, connect with people, and have more of a personal connection with somebody before you apply; I think that's a phenomenal start. I also think conferences are. I'll be at Posit Conf here in two weeks, and that's actually where I first got my job at Centene, just being totally forthright there. It's a good opportunity to meet people and see what they're doing, and if it's something you're excited or interested in, give them your contact info. You never know when those things are going to pan out. I met Centene at the conference, and I think it was five months later they said, hey, we actually have a position open that we think you might be interested in. Turns out I was. But again, that goes back to don't burn bridges and keep opportunities open, because it may surprise you when those come back to you.

All right. Well, it's the top of the hour. I want to thank everybody for hanging out with us and asking such great questions. We didn't get to all of them; there were some that were a little too long to answer in the time we had, but I'm so glad you all hung out with us. If you would like to save the chat, or any resources you want to keep, you can click the little three dots in the top right of the chat. And Dave, thank you so much for hanging out with us today. I hope you had a good time. It was so much fun. Thank you so much for having me. It was great to meet you and chat here. Thank you. Yeah, thank you, everybody. And next week, if you hang out with us, we have Mike Thompson, Data Science Manager at Flatiron Health. If you would like to join us on our community journey, make friends and personal connections in the data science space; just like Dave said, this is the place to do it. Go connect with your fellow attendees on LinkedIn, ask them for Zoom chats, and be buddies. All right, I'll see you in one week here on Zoom. Goodbye, everybody. Have a fantastic rest of your week and weekend.