Resources

Data Science Hangout | Marcos Huerta, CarMax | Translating Academia Experience to Data Science

We were joined at the Data Science Hangout by Marcos Huerta, Manager of Data Science at CarMax. A few snippets from the conversation with Marcos at 36:40:

What was the most effective asset that you had that led to your transition to your current position?

⬢ A willingness to just try and teach myself new things. In graduate school, I'd done a ton of data analysis, but it was all in an obsolete language that wasn't going to help me. I had to teach myself Python and R. I think the openness to trying that was key.
⬢ I gave myself a project to figure out how to use Python and understand classes and object-oriented programming, which I did not understand 10 years ago.
⬢ I do think that my work experience -- because I had done a lot of non-technical stuff -- helped as well. I had this record of professional accomplishment that maybe wasn't technical but people knew I could think and I had this track record.

How do you do the mental folding to translate the many years you spent getting a PhD in astrophysics to a different position?

⬢ The first step for me was not to data science, it was to the science policy world. I think I always had this interest in politics and the government. That first transition came because of these talks at Rice that happened once a month about non-academic careers. Someone who had done science policy and worked at the National Academies of Science and as a Congressional Science Fellow came and gave a talk about what she had done with her physics or astrophysics degree. That really fascinated me.
⬢ From science policy, data science was more practical. There's a ton of jobs at the entry level for science policy but as you start to work your way up I was running out of things to do. There become fewer and fewer job openings.
⬢ Turns out when I started itching that part of my brain again, I really enjoyed it. I enjoyed the Data Incubator. I enjoyed trying to do the swirl lessons in R, building some Shiny apps, etc. Once I got back into doing technical stuff, I found it was still very satisfying.

*non-verbatim transcription, summary of a few insights

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here:
Website: https://www.posit.co
LinkedIn: https://www.linkedin.com/company/posit-software
Twitter: https://twitter.com/posit_pbc
To join future data science hangouts, add to your calendar here: pos.it/dsh (All are welcome! We'd love to see you!)

Nov 17, 2022
1h 0min


Transcript

This transcript was generated automatically and may contain errors.

Hi, everybody, welcome back to the Data Science Hangout. So nice to see everybody. If you are joining for the very first time, what is this? So the Data Science Hangout is an open space for the whole data science community to connect and chat about leadership questions you're facing and getting to learn about what's going on in different companies and industries in the world of data science. So these sessions are recorded and they're shared to the RStudio YouTube, as well as the Data Science Hangout site, which is going to be brand new on the new Posit site pretty soon here. But you can go there to rewatch and find helpful resources.

Together, we're all dedicated to making this a welcoming environment for everybody. So we love when you all can participate and we can hear from everyone, no matter your level of experience or area of work. So you can ask questions a few different ways. If you've been here before, you know the drill: you can jump in by raising your hand on Zoom, or you can put questions into the Zoom chat. And feel free to just put a little star next to it if you prefer I read it out loud instead. Otherwise, I love to just call on you to read it and add some context too. And then lastly, we also have a Slido link so you can ask questions anonymously, and Tyler or Hannah will share that in the chat in just a second here.

You can ask questions these ways, but you can also jump in and add something to the specific topic if you have other perspective you want to share. It doesn't just have to be questions. We love to hear that too.

And then if there is something that you're really excited about and you want to keep the conversation going, I would love to have people connect in the LinkedIn group too, which I'll put in the chat again in just a second here. But I am so excited to be joined by my co-host for today, Marcos Huerta, a data science manager at CarMax. And Marcos, I'd love to turn it over to you to have you introduce yourself and share a little bit about your role and background and maybe something fun you like to do outside of work too.

Marcos's background and career journey

Sure, sure. Hi, everybody. It's fun to be the co-host this week. Thanks, Rachel, for having me on. So yeah, I have a PhD in astrophysics from Rice University. Completed that quite some time ago at this point. I actually spent about a decade working in Washington, D.C. in science policy. I worked for a member of Congress. I worked for my professional society as a science policy fellow. I was a McCall fellow. I worked at the U.S. Department of Energy Office of Science for about five years. I had a lot of titles, special advisor, senior advisor, special assistant. But at the end, I was basically the chief of staff of the Office of Science.

But when that position ended in January of 2017, I was kind of looking for other stuff, and kind of ended up working, you know, a part-time contract gig at University of Texas, Austin. But eventually, I did the Data Incubator data science boot camp in fall of '18, back in the before times when we could do things in person. And so I can talk a little bit more about how I got to the point where I could do that. I, you know, could use IDL for my science in graduate school. Has anyone ever heard of IDL? If you used IDL, IDL fans, like, put a post in the chat. But, you know, this is a proprietary C-like data language. That's what I wrote, like, my thesis in. So that was not a programming language that was going to be of any use to any data science-like job I might want. So I taught myself R. And I taught myself Python.

Actually, I found my first commit for a Python project just doing, like, non-data science in Python in, like, 2015, it looks like, when I was playing, started playing around Python. And then my wife and father-in-law actually suggested data science as a career track. And I, and I started, I did Swirl, right, to use to teach myself some R. I got RStudio. I think I originally was using R, and then I found RStudio, which was much nicer than R, so I started using RStudio. And I think, I guess I did the Hopkins Coursera courses, which are kind of, like, kind of very similar to Swirl, right? They're kind of using some of the Swirl stuff. And that's how I, because a data incubator has, like, a multi-stage application process where they give you, like, data sets and questions, and you have to, like, type in the responses. And so I used R for all of that, right? That was kind of my thing. And then I think I discovered Shiny somewhere in there and started playing around with Shiny apps just for kind of fun side projects. And then eventually got into the incubator in fall of 18, and then, well, that was mostly a Python-driven program. But I did use Shiny for my final project, at least for the web interface for my final project, my capstone, as we called it.

CarMax hired me. I got hired by CarMax on the very last day of the incubator. So we were at the little going away party, and I got a call from my current boss at CarMax making me the offer. And I've been there since. I work in the appraisal lane. CarMax will buy cars from anyone. We obviously use algorithms to help us make those decisions and those offers. And then recently, about a year and a half, two years ago, we started doing this online. You go to carmax.com and type in stuff about your car and give us your VIN, and we'll give you an offer on your car. And that really has been my main focus for the last, well, however long we've been working on that, several years: the algorithms, the systems behind it, et cetera. Yeah, so now I'm a manager, data science. I've been there, I'm getting close to four years now. It'll be four years in January at CarMax.

So happy to talk about the career transition stuff, happy to talk about the tools and the systems we use.

Yeah, absolutely. Okay, one more quick thing, what do you do for fun outside of work?

Something I do for fun. It's funny. I used to play a lot of softball and I was going to play... CarMax finally organized a game like last Friday, but I was in Texas, so I missed it. So I'm hoping that they will come back and we can play some softball in the spring. I mean, everything's going to be sore after swinging a bat for the first time in years, but that is something I used to do a lot. I used to do a lot in DC. I had a team in DC that was a lot of fun. Hopefully, I get to play that again. These days, there's a lot less time. I have a three-year-old, so I just want to spend time with her, of course, and in the small windows where there's calm, I play games. I do a lot of side projects, ironically, with coding and stuff. Some of my work colleagues have scratched their heads that I program outside of work hours, but I do that too.

Transitioning into data science

That's great. So thank you for the background too and sharing your journey. For some people who maybe are thinking about making the switch into data science, what is, I don't know, maybe some of your lessons learned in moving over?

Sure. So I mean, I had a quantitative background. That certainly helped. I think both for, I don't know how much it helped with my resume, but I'd done some quantitative stuff in graduate school. I'd analyzed a lot of data. I could put a bunch of big numbers about how many megabytes, gigabytes of data I reduced and stuff. So I think that helped. But I think to me, the best thing about the incubator bootcamp, this might be true of other programs, it reminded me that I could still learn stuff, and relatively quickly. The incubator, the way it works, it's like every week, one week is like web scraping, one week is scikit-learn, one week is PySpark. It's just like boom, boom, boom. And at the end of every week, was I a master of the new thing I'd learned? No, but I'd kind of gotten a decent handle on it, right? And just the ability, just knowing that I could learn again, knowing that in my 40s, not a state secret, hold on, that I could like, it was like 30s back then, I could still learn all of this stuff and kind of figure it out. It's kind of the anti-impostor syndrome. I went from thinking, well, I don't know what data science is, to like, hey, actually, I can kind of pick up anything in a week if I work on it full time. That was a very liberating kind of thing to figure out. But it will sometimes feel overwhelming, like all career changes do.


Starting at CarMax and overcoming imposter syndrome

So making that switch over and jumping right into the role at CarMax, how did you find that transition into a management role?

Yeah, so I started as a senior data scientist. My title is manager, but it's really more managing projects than it is people management. But at the beginning, the first few weeks of CarMax as a senior data scientist were very intimidating. It was still in person. People were nice, but I just didn't know anything. I didn't know what a unit test was. We were going through these code reviews. People were making jokes about how they were working on unit tests. And I'm like, what is a unit test? Like, I mean, I knew so little about anything, right? So it just felt very like, oh my God, what have I got myself into? Like, what am I doing? Like, why do I do this? And I moved my whole family to Richmond. You know, my wife was pregnant.

So it was just a very scary transition. But, you know, slowly but surely, I realized, okay, I kind of understand this. I'm kind of getting my head around that. I made my first little pull request where I changed one little thing and added like mean absolute error method so we could get that to pop out of something. And I was like, okay, I kind of, I'm kind of getting this. I kind of know what a unit test is now. And then slowly but surely, the imposter syndrome started to melt away and I realized that I could add a little bit here or there. So the transition was tough though. I mean, like, you know, obviously when the pandemic hit, you know, I've been insulated, not insulated, but I had been in nonprofits and in government through all previous economic troubles. So like, you know, I was not particularly worried about my job, but like, you know, pandemic comes, stores are shut down. Oh, if we can't sell cars, we don't have any money. This is like, this could be a problem. And, you know, people started getting furloughed.

So that sort of, you know, obviously it's just, that was a very, not shocking, but it was just a very visceral sort of thing. Like when we had all the economic, you know, problems with the pandemic or later shutdowns. So that was very different. But overall, like, you know, CarMax had hired some other PhDs right before me and after me. So it kind of felt in some ways, once I got into it and I got into a rhythm, the senior scientist role felt like kind of being back in graduate school. You know, I was like looking at data all the time, I was making plots. The difference was instead of, as an astrophysicist, kind of tunnel vision on my one little star that I'm looking at, that no one else knows anything about, I'm touching a part of an algorithm that the whole team is working on. Right. And there's a lot more interaction, a lot more feedback, a lot more people that can directly help me. Right. Because they've worked on something similar before. Whereas I felt like as an academic astrophysicist, I kind of was the expert on this little thing and, you know, yeah, I'd go to a group meeting and talk about it, but then everyone else went back to working on their own little things. And there wasn't really a lot of cross, you know, connections. Whereas at CarMax, like, hey, we're all trying to buy cars profitably. And there was a lot of good camaraderie, you know, especially back in the very beginning when we were all in person, with people looking at my screen and telling me and showing me the code, where the code lived. It was very collaborative and it was like a better version of being, I guess it wasn't for some, you know, purely let's figure out how the universe works sort of thing. We're trying to sell cars, but still, it was very satisfying. In the end, it felt very satisfying, like trying to iterate and solve problems.

The online appraisal algorithm

When you say like building an algorithm or sometimes I'm like, oh, the algorithm, like, I know exactly what that means, but like, what's an example of like a project that your team works on? Sure. So the online appraisal algorithm is a good example. You know, you know, someone has to go to CarMax.com and type in a bunch of information about their car and we have to figure out what offer to make on that car. Can't get into the super details, but obviously there's some machine learning aspects of that, which I worked on and built out a lot of machine learning algorithms that could, that could help generate, you know, that, what the value of that car was worth. And that was a multi-month effort to kind of come up with new machine learning. But there was also like a lot of system stuff, because we had to, we had to suddenly respond to like these real-time requests, like the way the store systems work, like, you know, I don't know if everyone's ever sold a car to CarMax the old-fashioned way, but you go in, they look at your car, you sit around for like a while, right? Like, you know, the time, the time, the time of the computer doing something is tiny compared to like the time it takes you to go and sit in the seat and wait to get an appraisal, right? Whereas if you're doing it online, you want to click submit and see what your car's worth really fast, right? So that was a whole new paradigm, like response time became a lot more important than it was. And our systems had to be rejiggered. We ended up using Azure Service Bus, ended up writing a ton of Azure Service Bus Python code. Who knew I was going to be doing that, you know, five years ago, to get the systems kind of in line to be able to handle like this new paradigm of like, hey, we're going to have to respond to these requests fast and, and send back a response fast, you know, with our partners that were obviously building the website and stuff.

So there was some system work, there was some machine learning, and there are certain things we want to look up about the car, so we have to go do these relatively fast lookups and make that work. So kind of pushing all that together, getting all this stuff into production, all these things productionalized so that we could do this live in real time. But that was, you know, a multi-year process. I mean, I'm making it sound like it's just something we did. Obviously we rolled it out in various stages, but that was a good example of what I work on. There are other teams at CarMax. So this is the appraisal lane team. We're thinking about how do we buy cars from like any people on this call. There's also a team that buys cars from auctions, like, you know, Manheim auctions and stuff like that. And then there's also a team that thinks about what we price the car at in the lot, right? Like when we put a car in the lot to sell, what is that price? So there's multiple teams that are thinking about pricing. There's obviously a lot of other data science teams at CarMax, but I am focused on the team that thinks about car prices.

Communication tools and team practices

Cool. Thank you for the context. Frank, I see you asked the question a little bit earlier. Do you want to jump in?

Yeah, I'd love to jump in. So like I mentioned, I'm still thinking about part of our conversation from last week where we started talking about communication tools. And Marcos, I'm curious when we think about communicating with your team, with your peers, with your stakeholders, everyone uses Zoom. We're on Zoom right now. Everyone uses email. There's also Slack, but I'm curious, is there anything else, any other tools that you use or that you've tried to manage your time, or manage how you talk to people, communicate with the folks that you work with?

Absolutely. That's a great question. We actually ended up using Teams, which is Microsoft's kind of Slack, Zoom thing. You know, it works. And I kind of like Slack more for some things, like Zoom more for other things, but, you know, Teams is what it is.

But, you know, I would say at the beginning of the pandemic, you know, obviously at the beginning we were all in person. So a lot of in-person meetings, a lot of in-person conversations, a lot of, you know, going to the third floor and having to find a meeting room at the beginning of the online appraisal project. Now it's mostly Teams. We did start using, and we borrowed this from our IT, our technology partners, when we were working on the online appraisal product, what's it called? LeanKit. We started using LeanKit, which is a tool to kind of track tasks and kind of have these lanes of tasks. And you can make a card and assign people to it and kind of move it from, you know, the backlog, to working now, to finished as planned or discarded or whatever. So LeanKit is a tool that my team uses a lot. We didn't when I started.

So it kind of helps us. Like, you know, if someone says, hey, I think we should, you know, redo this table in a better way or whatever, we'll make a card for it, and that kind of helps remind you to come back and look at it later. That's kind of communication. We also started doing, at least for my team, and this is something that I don't know if it was my idea or not, but it's something we did when I was at DOE. When I was at the Department of Energy, we would have a morning tag up, where the leadership would all get together every day, every morning at like 8:30, which was miserable, 8:30 every day. And I would always be a few minutes late. And everyone would talk about what they had going on that day. And I remember bringing that idea to my boss. And I don't know if it was inspired by that or just partially related to that, but we started doing that at CarMax. And I think most teams have like a daily tag up, and each one is kind of structured differently. One day we'll look at LeanKit. One day we'll talk about, you know, we'll review kind of open topics. One day we'll kind of look at this other kind of structure we use, where we're kind of talking about the week's agenda and stuff. So kind of that daily touch-up.

It's getting a little trickier as we start. I think everyone on my team is still like East Coast, but I think other teams are starting to get some West Coast folks, and 9 a.m. stand-ups don't work if you've got someone on the West Coast, right? So I think one team has moved their stand-up to like, you know, later in the day. But that seems to work. But I think it's mainly that. Ironically, we don't send a lot of emails. Like we get a lot of automated emails, like telling us like reports and like graphs and like, you know, this model retrained and stuff. But I don't get a whole ton of emails like from my team. Most of everything is done in Teams, for better or for worse. Different channels, channels for support when things are breaking, channels for systems, channels for like just my team and the R&D. That seems to be the main thing.

Unit testing practices

Hey Daniel, do you want to jump in? Sure, yeah. Hey everybody, hey Marcos, I really enjoyed that conversation so far. I'm just interested in hearing a little bit more about unit testing that you guys roll into your work, you know, I think it's one of those parts of pipelining that is very like human driven, you know, like what unit tests are you thinking about, like how are you including them to your workflows, you know, what's important for the kind of work that the organization is doing and just kind of interested to hear more about, you know, what are some standard unit tests that you're working on, your team's working on, what are some non-standard unit tests, like where you roll them in, how much time you spend on them, that kind of thing.

That's a great question. Um, so as I was arriving, the reason why my team, my bigger team, the core pricing systems team, was talking about unit tests so much when I got there is because they had kind of made a decision to make sure they had a hundred percent coverage of our entire code base. And they didn't, because I think we had just migrated to the cloud. This is another thing that happened right before I got there. Like everything had been running on-prem and then we moved everything to running in Azure. And so I think as part of that transition, I think our unit test coverage had plummeted, or maybe it was never high and it got lower. So there was a huge effort where everyone was kind of just grabbing different pieces of code and writing unit tests to cover all the lines. For people that know what coverage is, and just in case you don't know: you can basically run a coverage report. You run, like, four tests, and the tests will run this method and run that method and run this, you know, function or whatever. And then the coverage report will tell you, well, this line has never been run, right? Like you never ran this line, or this conditional has never been flipped because, you know, the price was never less than 10,000 or whatever.
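To make the coverage idea concrete, here is a minimal sketch in Python. The `price_band` function and its 10,000 threshold are hypothetical, chosen only to echo the "price was never less than 10,000" example; nothing here is CarMax code. With only the first test, the `price < 10_000` branch is never taken, and a coverage report would flag that line as unexecuted; the second test flips the conditional.

```python
import unittest


def price_band(price):
    """Return a label for an offer price (hypothetical rule, for illustration only)."""
    if price < 10_000:
        return "under_10k"
    return "10k_and_up"


class PriceBandTests(unittest.TestCase):
    def test_high_price(self):
        # Exercises only the fall-through return; on its own, this test
        # leaves the body of the `price < 10_000` branch unexecuted.
        self.assertEqual(price_band(25_000), "10k_and_up")

    def test_low_price(self):
        # Flips the conditional so the first return statement also runs.
        self.assertEqual(price_band(7_500), "under_10k")
```

Assuming the coverage.py package is installed and the code is saved in a test module, `coverage run -m unittest` followed by `coverage report -m` prints the per-file percentage and the line numbers that were never run.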

Um, so there was a huge effort to get full coverage, which I was completely clueless about because I didn't know what a unit test was. So I get in there and everyone was talking about unit tests all the time and describing how I got this thing to crash, and I got this exception to be caught, et cetera. So basically our goal right now is to kind of basically have this giant... One unique thing that I should point out, which is, I think, unique to the team I'm on at CarMax, and probably unique in general, is that we are very vertically integrated. Like the code base that I can touch can have code that basically is wrapping around machine learning, but it also has the code that is running 24/7 in the cloud that is hitting Azure Service Bus, right? So with the deployment of the algorithm, we get really close to the deployment. We don't own the cluster. We have our technology partners that own, like, you know, the actual Azure pods that are running the stuff, but the code that we run on that thing, we own, right? Which is very unusual. I think a lot of data scientists kind of write the machine learning thing and then kind of throw it over the wall to some production team. And it's not like that with us.

So because of that, we have this production code, which we call our cluster code. And we have the code that wraps our machine learning models, which we call components. We kind of have all that code, and we can basically run unit tests to try and cover every single thing. Right. We also have wrappers for Mongo tables, like a wrapper that we've written around Mongo. So for all these little pieces of code, basically the goal is to kind of just make sure that every line is covered. There's been some debate internally about whether that's the right goal. You know, we don't have a lot of tests that run through the entire pipeline. Like we don't have integration tests in the unit tests. And there's been a lot of debate. Should we add things that are actually sending the ping pong ball all the way down from, you know, the raw request all the way to going back? Or should we just kind of be testing everything in little bits? I think we're more testing everything in little bits, but, um, that is our goal.
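The distinction between testing "in little bits" and sending the ping pong ball all the way through can be sketched as follows. Every function name and the pricing rule below are hypothetical placeholders, not the team's components: the unit tests each exercise one function in isolation, while the integration test pushes a raw request through the whole path.

```python
import unittest


def parse_request(raw):
    """Pull the fields the pipeline needs out of a raw request dict."""
    return {"vin": raw["vin"].strip().upper(), "mileage": int(raw["mileage"])}


def estimate_offer(vehicle):
    """Placeholder pricing rule; a real system would call a trained model here."""
    base = 20_000
    return max(base - 0.05 * vehicle["mileage"], 500)


def format_response(vin, offer):
    """Shape the outgoing message."""
    return {"vin": vin, "offer": round(offer, 2)}


def handle_request(raw):
    """End-to-end path: raw request in, response out."""
    vehicle = parse_request(raw)
    return format_response(vehicle["vin"], estimate_offer(vehicle))


class UnitTests(unittest.TestCase):
    # Each test touches exactly one function.
    def test_parse_request_normalizes_vin(self):
        parsed = parse_request({"vin": " abc123 ", "mileage": "40000"})
        self.assertEqual(parsed, {"vin": "ABC123", "mileage": 40000})

    def test_estimate_offer_has_floor(self):
        self.assertEqual(estimate_offer({"mileage": 1_000_000}), 500)


class IntegrationTest(unittest.TestCase):
    # One test runs the raw request through every stage.
    def test_raw_request_to_response(self):
        response = handle_request({"vin": " abc123 ", "mileage": "40000"})
        self.assertEqual(response, {"vin": "ABC123", "offer": 18000.0})
```

The unit tests pinpoint which piece broke when something fails; the integration test catches mismatches between pieces that each pass on their own, which is the trade-off behind the internal debate described above.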

Bottom-up vs. top-down project decisions

Sure. Um, so as somebody who's used CarMax before, by the way, I'm always kind of, um, the light bulb goes on when I realized that some of the things that I've used every day are actually machine learning or an algorithm in the background, right? You're like, oh my God. Yeah. That's, that's what we do in play and in practice. So that's really cool to hear that verified. Right. Um, but, uh, my question was about, so with data projects in general, right. I mean, you've launched some pretty big projects. I mean, as an organization CarMax has, right. You've talked about a few of them. Um, what does the, and I'm interested from anybody on the call, by the way, if you want to throw your, your organization's kind of MO in the chat, but do you feel that, uh, a project kind of gets delved down to you from the management team or does your team or somebody maybe from one of the other data science teams kind of start to bring projects forward, they get looked at by management and then kind of get the go ahead back. Do you see what I'm saying? Is it kind of bottom up or top down?

Yeah, exactly. I think the goal, like, we want to be able to appraise cars on the website, I don't know how high up it was, but that was top down. That was something that came to us. Like, we want to figure out a way to do this. So it was a task given to us, but I think there are smaller things where it is kind of bottom up. Okay, I discovered this cool thing. This is a cool opportunity. You know, I slice the data this way and, you know, this bucket of cars is doing this interesting thing. So let's change how we do this algorithm. Let's change how we do this, whatever. I can't get too specific about it, but there's definitely things where, you know, people are doing discovery and come up with a new way to do something. Maybe it's faster, maybe it's just more efficient. And then that gets pitched up and it's like, yeah, this is better. Let's just do it. Right. So it definitely happens both ways. I think the bottom up tends to be smaller components of a thing. Whereas, like, a huge idea... though you could argue that the origin of CPS, which is like, let's try and use algorithms to help price cars, that did start bottom up.

I'm in a position where there are no projects. So I'm trying to figure out where do I start? Like, does management start brainstorming, or do I start bringing along the data science team, which is brand new, like shiny assembled, to start to propose projects to the management? Where do I start that? But I guess it's anybody's guess, right? You just kind of throw spaghetti at the wall and see what sticks.

So I want to answer, absolutely. I want to answer Libby's question, which I'm seeing, before I forget, which is about sharing practices. And we actually just started using a Stack Enterprise, whatever it was, Stack Overflow software, but private, whatever it's called. I think it's called Stack Enterprise, Libby. We had a wiki that was based on GitHub, you know, our on-prem GitHub, and that never seemed to work very well. So now we've kind of moved to this Stack Enterprise thing. I think it's used CarMax wide, and then we have one just for our CPS team. And we've been really, really pushing: every time someone asks a question in Teams, it's like, why don't you answer that on Stack? And so we've been trying to move that way so that people can share, but it tends to be more technical things like, oh, how do I do this? You know, how do I install this SQL client? But sometimes it is more like, what do these codes mean? So we're trying to do that. It's not really best practices yet, but it is there as a knowledge repository that we're trying to use.

Knowledge sharing with Databytes

I love it. That's fantastic. I've kind of done that in other jobs where I've gotten in and I realized people don't have a way to talk to each other and teach each other. So I've set up like SharePoint messaging boards, where if it's after hours and you're doing something you can ask and somebody else can answer you. It doesn't have to be me, the coach like during office hours, helping you, you can help each other. Right. I was kind of wondering as an addition to that, if you had anything that was like knowledge sharing, sharing sessions, like maybe I don't work in the same area as you, here's what I'm working on. So you can get other people's brains. We absolutely do. And that's, as soon as you started saying that, I realized I forgot another thing we do, which is this thing called Databytes, which is our little, like, it's kind of like a lunch and learn sort of thing. You know, we kind of, it was going for a while and then we kind of, I think we paused it over the summer, but it's back. And this is a chance. So we had told, I mentioned, we have these multiple teams within pricing systems, and this is a chance for us to hear what other people have been working on, like what tools that they use, what things that they discover. So like, I think I presented a history, like it wasn't really, I think I wasn't super technical, but I presented a little history of the online appraisal product that we just talked about a while ago. Other teams will talk about their algorithms, the new versions of the algorithms they're working on and the new systems they're working on. And this is a way for us, because why I mentioned the collaboration is very deep, like inside the team. And every now and then you kind of touch like the other appraisal, the other pricing system teams, like there is kind of a, huh, I know they're working on this thing, but I don't know how it works. 
And so the goal of Databytes is to, yeah, get into some of the details, like, well, this is how we set it up. This is how we're correcting it. These are the kinds of machine learning models we're using. This is the data we trained it on. Why didn't you try this? Why didn't you use this much data? The whole point is to have an open discussion about this sort of thing.

Mitigating risk in car pricing algorithms

So you said that you build algorithms and use models for car pricing. When you implement them, how do you mitigate the possible risk they bring to business decisions? Because we know that Carvana has similar models for car pricing, and it didn't work well. They mispredicted a lot. They had to lay off thousands of employees earlier. And earlier, Zillow had similar algorithms. They predicted a possible price for houses, and they bought a lot of houses, and it didn't work either. Right. That's a great question. There are some risks for business decisions. How do you mitigate these risks?

Those are great questions. How can I answer this? Because the one thing I can't talk about is the details of our algorithms. All I can say is just look at our success, look at our quarterly reports. I think we reported in our last SEC filing that we bought 300,000 cars online or something. We make money per car. I don't want to say our algorithms are better. But obviously we monitor how we're doing. We know what we bought the car for. We know what it ultimately sells for. We can cut that by a bunch of different segments. And so there's definitely a very regular process by which we're looking at the performance of the company as a whole, our algorithms as a whole, and really monitoring things very carefully. If you just look at the stock price, Wall Street is going to do what it does. But if you look at how we've done in terms of how much money we're making per car, we've been profitable throughout this entire... As far as I know, maybe I shouldn't... It's not stock advice. But if you just read our reports, then we seem to be managing that risk very well.

Obviously, it's been a crazy time. Cars have appreciated and depreciated. I mean, you've seen all the headlines about car prices. It's been a crazy three years. Basically, since we started this online appraisal product, nothing has been normal about car prices, and yet somehow we've made it work.

Balancing project management and technical work

I'm trying to keep up with the chat. I think Allen asked a question about my time, program level versus people leader. I would say, and I think I mentioned this in a previous Hangout in the chat, that I've made a point of finding time to go heads-down and actually do coding or discovery, whatever. I want to keep that time. So I started blocking out certain chunks of my calendar to make sure that nothing gets scheduled in those chunks, so that I can go heads-down and focus on doing some discovery, or maybe planning a new algorithm, or looking at trends, or whatever. So I try and do that. But obviously, we have to have meetings with my team about what we're trying to do, what our goal is. I mentioned that we have a daily tag-up. I mentioned we have what we call MLOps; I'm on the MLOps team. So there are meetings. I do do interviews. I do first-round interviews. So those get scheduled, those get dropped onto my calendar. It depends on the week, obviously, but it's probably like 50% all that service and stuff, and then 50% where I can actually focus on either writing code or trying to solve a bug or what have you. So kind of more technical stuff.

Career transition advice

Hi, thanks for the talk. It was really, really great inspiration there. I'm going to rewind a little bit to the very beginning and sort of ask, as somebody who sees themselves in a similar, keeping options open in terms of career, what was the most effective asset that you had that led to you being able to transition to your current position?

Right. That's a great question. I mean, I think probably two things. One is that interest or ability or willingness to just try and teach myself stuff. Like I said, in graduate school I'd done a ton of data analysis and stuff, but it was all in this obsolete language that wasn't going to help me. Right. So I had to teach myself Python, and then I had to kind of teach myself R, and then eventually teach myself more Python. So, I mean, I think just the openness to trying that. I gave myself a project, you know, to kind of figure out how to use Python and understand classes and object-oriented programming, which I did not understand 10 years ago. But then, two, I do think that my work experience helped, because I had done a lot of non-technical stuff. I don't really know how that was all balanced when I interviewed, but I did feel like I had this record of professional accomplishment that maybe wasn't technical. But people knew I could write, people knew I could think, people knew I could read. I had this track record. So I think success in my previous jobs, even if they weren't data science jobs, probably also helped.

Okay. Yeah. Thanks. And the second part of that is more of a, I guess, personal passion, motivation thing. Presumably you don't get a PhD in astrophysics unless you actually like stars, or doing math about stars. How did you do the mental folding to translate that from, you know, you spent many years doing this, to a different position?

Right. That's a great question. My dad would probably like to know the answer to that, because he's like, I thought you wanted to be an astronomer. What are you doing? You know, the first step was not to data science, of course, it was to the science policy world. And I think I had always had this interest in politics and the government. And so that first transition came because, when I was at Rice, they had these talks like once a month about non-academic careers, and someone who had done science policy and worked at the National Academies of Science and had worked as a Congressional Science Fellow came and gave a talk about what she had done with her physics or astrophysics degree. And that really fascinated me. This could kind of combine these two things.

And as I went through my postdoc, I just found my brain not really focused. I mean, I kind of enjoyed my research and I was enjoying wrapping up what I'd done my PhD on, but I wasn't super motivated by the new stuff I was working on as a postdoc. So even though I loved astronomy and was very happy, and I met a ton of great people in grad school who are some of my friends today, it was a pretty easy choice to try and pivot to this science policy world. And then from science policy, data science was more practical, right? Once you get to the upper echelons of the science policy world, the jobs and openings become fewer and fewer. There's a ton of jobs at the entry level for science policy. But as you start to work your way up, I was running out of things to do. I was trying to become the chief of staff at a university. There's like two openings of that a year. So it just became more practical, right? And then it turns out when I did it, I started itching that part of my brain again. I really enjoyed it. I enjoyed the Data Incubator. I enjoyed trying to do the swirl lessons in R. I enjoyed building. I built some Shiny apps and it was very satisfying. So I found that once I got back into doing technical stuff, scratching that itch was still very satisfying.

Yeah. Thanks. I really appreciate that. I'm kind of in the same position, where I'm in academic, you know, clinical research, and it's like this pyramid where you kind of get a little stuck. And it's really nice to be reminded that life is, you know, a long journey and you can always switch courses. You don't always need to be looking at other people and being like, oh, they did this by age 22, or, you know, it's like the straight trajectory to success.

I mean, you could argue that CarMax is my first job, because it didn't have an expiration date. All my previous jobs were fellowships, right? So you could definitely argue this is my first job. It doesn't have a built-in expiration date like my previous positions did.

What skills to highlight when applying for data science roles

Yeah. I mean, I was in this position where I felt like my best selling point was this. I'm trying to think back. So I did go through the Data Incubator program, and for better or for worse, when I was a fellow, this is like three years ago, and I don't know how they've changed their business model exactly, I basically didn't pay anything to do the Data Incubator, but I had to go try and get placed with their hiring partners. That was kind of the way that worked. And I think I was only supposed to interview with their hiring partners for a while. So I had this kind of subset of employers I could work with, and they'd help me with my resume and stuff. But the end result was, I felt like what made me a little different than everybody else was the fact that a lot of my colleagues were fresh out of grad school, right, and I had worked for 10 years. So I felt like that background of, how did I phrase it, working in a high-pressure, professional environment with tight deadlines, that work experience was still relevant, even if it wasn't technical, right? So I tried to sell that.

And I remember talking to the founder of TDI, who came in for lunch with us, and he told me to try and talk up that aspect of things. But I also had, you know, yes, I was new to data science. I had recently learned R and recently done this bootcamp, but I had done a very technical PhD, right? So clearly data, numbers, and statistics were not something new to me. I mean, I had a bachelor's in astronomy, I had a PhD in astronomy. I think that helped too. I did a bunch of interviews when I came to TDI. I don't really remember how they all went. They were all different. CarMax's was not super, it was more about critical thinking skills, and the kind of questions I got asked were not like, you know, code challenges or whatever. They were more like thinking through a consulting sort of thing, like a case kind of interview. But I was mainly trying to say, hey, I did a bunch of data analysis. I have all this experience from my PhD. I have all this work experience where I was working on tight deadlines, working for important people, you know, doing like instituting stuff, right. And now I've kind of refreshed my technical skills and gone through this Data Incubator and I'm ready to go, right? That's kind of how I packaged myself.

Staying current and tools for the year ahead

That's a great question. You know, it's funny, I've started following more people on Twitter. My Twitter presence is super chaotic. I have a bunch of accounts, and my main account I'm kind of anonymous on, because I don't want people to know what I say. But I have a Wordle bot, right? I made a Wordle bot a while ago that just tries to solve Wordles. And it has like 40 followers. But it links to my GitHub, so everyone knows that's me. So I follow you from that account, Rachel, and other folks. And I started trying to follow more people who tweet about data science and stuff. And that actually has been kind of interesting. I think I found people from your conference, from the RStudio conference. And, you know, I heard people talking about DuckDB, and I started playing with DuckDB and how it works with Parquet files. I'm like, oh, this is kind of neat. So lately it's been just kind of seeing people that people retweet. I started following the Quarto pub, but there's an account that retweets every tweet that mentions Quarto.