
The changing landscape of data science | Kanchana Padmanabhan | Data Science Hangout
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We'd love to see you!

We were recently joined by Kanchana Padmanabhan, Director of Data and AI at Homebase, to chat about data science team structures, the role of math in understanding LLMs, building effective hackathons, and communicating model insights to stakeholders. In this Hangout, we explore the importance of understanding the probabilistic nature of LLMs and how that understanding should influence how data scientists approach their work. We also discuss how to structure a hackathon to encourage learning, collaboration between technical teams and business stakeholders, and a focus on the customer problem.

Resources mentioned in the video and Zoom chat:
List of R Conferences for 2025 → https://rworks.dev/posts/r-conferences-2025/
Posit Conference Call for Talks → https://posit.co/blog/speak-at-posit-conf-2025/
Julia Silge's workflow demo on model cards → https://www.linkedin.com/posts/posit-software_join-us-for-a-live-workflow-demo-on-creating-activity-7287998741557522432-jseQ?utm_source=share&utm_medium=member_desktop
Shiny Assistant Gallery → https://gallery.shinyapps.io/assistant
Data Science Hangout Playlist → https://www.youtube.com/playlist?list=PL9HYL-VRX0oTu3bUoyYknD-vpR7Uq6bsR
Add Posit Team End-to-End Workflows to calendar → https://evt.to/aoimiohuw
Making of a Manager Book → https://www.amazon.com/Making-Manager-What-Everyone-Looks/dp/0735219567

If you didn't join live, one great discussion you missed from the Zoom chat was about how to gain domain knowledge for a new industry, where attendees shared their experiences and advice. Let us know below if you'd like to hear more about this topic!
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!
Transcript
This transcript was generated automatically and may contain errors.
Welcome back to Data Science Hangout, everybody. If we haven't met before, I'm Libby. I am a community manager here at Posit, helping to enrich our beautiful, wonderful Data Science Hangout community. I am also a Posit Academy mentor, where I help people learn R and Python to do better work with data in their everyday job. We are so happy to have you joining us today, and if you have not been here before and you're not familiar with our format, it's an open space to hear what's going on in the world of data across all different industries.
This is where we chat about data science leadership, we connect with other people in our spaces who are facing similar things as we are, we learn about other industries, and we get together every Thursday, almost every Thursday, same time, same place here on Zoom. So we hope that you have this added to your calendar. If you are watching this recording sometime in the future and you want to join us live, there's going to be details below in the description box on how to add this to your calendar.
Thank you so much to everybody who has made this the friendly and welcoming space that it is today and has been over the last few years. We are all dedicated to keeping it that way. If you have any feedback about your experience that you'd like to share with us anonymously, good, bad, whatever, or maybe even suggestions for topics we could cover or people we could have on, we are going to share a Google Form in the chat where you can give us your anonymous feedback. You can also find Rachel and me on LinkedIn and let us know there, or leave comments on any of our posts, of course.
We learned, Libby, that we can automatically launch a survey right at the end of the Hangout, too, and then that way we can know which specific Hangout it was associated with as well. So thank you to everybody who shared feedback in that Google Form before, but now you'll see when you exit out of the Zoom, there'll be a pop-up survey there, too.
So we really encourage you to connect with other people in the Hangout. This is a place for us all to get together and the chat is your space to have a party, have fun. We really recommend that you introduce yourself. What do you do? Where are you? What do you like to do for fun? Leave a link to your LinkedIn so other people can find you after our chat goes away.
There are three ways today to jump in and ask questions or share your experience. So we are going to be having a group discussion and that doesn't happen without everybody asking questions. So you can put your question in the chat. If you can't talk today, if you don't have a mic, or maybe you're in a very loud place, you can just put an asterisk somewhere in your question, we'll ask it for you. You can ask anonymously on Slido. You can also raise your hand here on Zoom and we will call on you to jump in, maybe if you have a follow-up or something to the current conversation.
Introducing Kanchana Padmanabhan
Kanchana Padmanabhan, Director of Data and AI at Homebase. Kanchana, could you tell us a little bit about yourself, what you do, maybe what Homebase is, and a little bit about what you like to do outside of work? Sure. Thank you, Libby and Rachel, for having me. I think the participant count hit 121 and my nervousness is slowly increasing. So hi, I'm Kanchana. As Libby said, I'm heading Data and AI at a startup called Homebase. Homebase is a company that builds software for small businesses. We build scheduling, payroll, hiring, timesheets, and a lot of other features.
My background is basically data. I've been in the data space since I graduated many years ago from NC State. And then I've been through different industries. I started in social media when Twitter was still called Twitter, when we used to ingest all the firehose and build products off of that. Then I went to retail for a bit and supply chain, then healthcare, and then I'm here.
I think I said this to Libby and Rachel when we started: they had some questions around data science and how it's evolving and changing. And I basically said that I have a belief system that seems to keep updating over the years I've been in this space. So I may share opinions now, and they really are just opinions, what I think right now, and I may get more information in three months that alters what I believe. But I think that's true for all of us. We're all still learning how the space is changing.
I absolutely love music. I learned to play the keyboard, and I also learned vocals, so I try to practice both. I also have two kids. And I teach part-time, actually at all three universities downtown: the University of Toronto, Queen's University, and TMU. I teach in the data science offerings at the business schools, Rotman and the Smith School of Business.
Data science team structure at Homebase
So we work on both what we call customer-facing, or external, ML features and internal ones. The teams I own are data engineering, data platform, data science, and ML platform. The problems we solve are both for customers and for what I like to call our internal customers. We have ML features in scheduling, for example predictive scheduling: using the data to build an optimization engine that can optimize a schedule under various constraints.
We also work on timesheets: how customers can optimize their timesheets and their hourly billing, and how that works. We also build a lot of risk models. We have a lot of financial products, so there's an entire space of risk modeling that we build. And we build for our internal customers too: recommendation engines for our marketing team so they can target appropriately, propensity models, and more recently LLMs. I guess everybody's using LLMs now. So just a mix of different types of models.
Defining data roles
Kylie, you had a question about the difference between data roles. I have two data teams, data platform and data engineering. They basically work on making sure that all of our data, within our product as well as external vendor data, is collated, brought into one system, and then organized and ready to be used for analytics or reporting.
The platform team takes care of the infrastructure, the ingestion, and also sending data externally. The data engineering team is responsible for all the transformations as the data comes in, the monitoring, the validations. As for data science, my data science team is primarily involved in, I'd like to think, building data-driven solutions. It could be as simple as figuring out a bunch of rules to solve a business problem, or go all the way to predictive models or hooking up an LLM, anything in that spectrum.
And the ML platform team are the ones who provide the tooling: making sure that real-time pipelines can be built in a repeatable way, that batch inference can happen in a repeatable way, and that data scientists can build a prototype, package it up, in a wheel or some other way, and deploy it into production easily. The goal is that the time from prototype to production keeps getting shorter, and to build all the tooling that enables that.
Between analytics and data science, I feel there's always a bit of a gray area, because in the company I'm in right now, and the one I was in previously as well, analytics was very heavily involved with the business. They did a lot of business reporting, looked at metrics with the business, and pulled data to support campaigns, targeting, or any of the business objectives.
How much math do you need for LLMs?
I think it's important. You need enough math to understand that these LLMs are basically probabilistic machines, that they're not sentient and not going to understand what you're saying. I've seen a lot of people not even get there, so if you have that level of understanding, I think it's a great start. I was in a demo just before this, a very interesting company, but they gave the LLM a name and referred to it as "she," said she has knowledge, and I just couldn't get beyond that to actually enjoy the demo.
So I think if you have even one level of understanding of how these transformers work, not all the math, not how they're trained, not even the exact algorithm, but just what they do: it's a weighted average. You're learning the weights, you're producing vectors, these vectors are thrown into an N-dimensional space, and then you're predicting based on what's close to what. With that level of understanding, you at least aren't surprised by the answers the model is producing.
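The intuition described here, attention as a weighted average and prediction as closeness in vector space, can be sketched in a few lines. Everything below (the vocabulary, the vectors, the two-dimensional space) is made up purely for illustration; real transformers use learned weights and far higher dimensions.

```python
# Toy sketch: attention = weighted average; prediction = nearest vector.
import math

def softmax(scores):
    """Turn raw similarity scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, values):
    """Weighted average of `values`, weighted by similarity(query, key)."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Tiny made-up embedding space: predict the token whose vector lies
# closest to the attended context vector.
vocab = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
context = attend(query=[1.0, 0.2],
                 keys=[[1.0, 0.0], [0.0, 1.0]],
                 values=[[0.8, 0.1], [0.1, 0.9]])
prediction = min(vocab, key=lambda w: math.dist(vocab[w], context))
print(prediction)
```

The point is only the shape of the computation: weights from similarities, an averaged context vector, and a prediction based on what's nearby in the space.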
For my own company, I've had to do two things. One is that my team is a mix: what I would call more traditional data scientists, the ones who built predictive models, understood data, did exploratory analysis, and a few people who have done research in LLMs and do understand the deep math behind them. So there's cross-pollination; together they're learning. Some of them come from research and aren't used to working in a product environment or building product features, but they understand the math and the research, and vice versa.
I've also had to set up what I'm calling a center of excellence, because with ChatGPT you technically don't even need a data scientist to run these prompts. Everybody in my company is running things with prompts and building "assistants," as they call them, for different purposes. So the center of excellence is really about teaching how these things work, even getting technical: talking about evaluations, talking about hallucinations and what they could mean. It's become a constant cycle of teaching, providing guardrails, and making sure people are thinking about the right things.
Building a learning-centered hackathon
Yes, so this happened in November. We typically have two hackathons a year, and I kind of hijacked this one for this purpose. There was a growing interest in the company in LLMs, and different people were excited to try things. We had set up an internal community called the AI Builders Community, a Slack channel where people building with this could come and ask questions and we would support them. But the one thing I realized was, as Abigail mentioned, that there was no context for how these models were working.
The other thing I was noticing, and I think it's happened to all of us many times, is that we get so enamored by the technology that we forget what we're solving. At the end of the day, we're solving user problems, for ourselves or for our customers. So you want to center on that even when you're solving with an LLM, because it doesn't matter whether it's an LLM or XGBoost or anything else, as long as it's solving a real problem.
So we did a few things. One, we said part of the hackathon would be learning. One part was obviously technical: we went through how LLMs work, how to evaluate them, and how to think about evaluations. I also had one section on security: what these APIs are, what data you should and shouldn't send in, and how that should work. Another piece was that I invited my head of design and her group to talk about customer centricity: what it means, how to do mind mapping, how to think about a customer problem and center yourselves on it.
One other thing we did was give each team an executive mentor, from VP and CTO level and up, so they could see what the teams were building and guide them on questions like: will this solution be useful? At the end of the hackathon, I also organized office hours with my principal and staff engineers so that people could come and vet their solutions. That way the engineers also get a sense of what's coming downstream, what's going to land two months from now.
LLM costs and long-term strategy
For LLMs, I think there are two parts. One is evaluations: actually putting your evaluation set together, getting your metrics in place, doing qualitative and quantitative validations, knowing where the boundaries of your failures are, where you're going to fail. You have to actually collect manually labeled data. People think labeled data has gone away. It hasn't. You still have to evaluate the model; even though you don't need millions of data points to train it, you still need some set to validate it.
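The point about still needing labeled data can be sketched as a tiny evaluation harness: a small hand-labeled set, a scoring loop, and a record of where the model fails. `run_model` here is a hypothetical stand-in for whatever endpoint or prompt chain is being validated, and the labeled questions are invented for illustration.

```python
# Minimal sketch of evaluating an LLM feature against a hand-labeled set.
def run_model(question):
    # Stand-in: a real implementation would call the model endpoint here.
    canned = {"Is overtime pay owed after 40 hours?": "yes"}
    return canned.get(question, "unknown")

def evaluate(labeled_examples):
    """Return accuracy plus the failing cases, so you see *where* you fail."""
    failures = []
    for question, expected in labeled_examples:
        answer = run_model(question)
        if answer != expected:
            failures.append((question, expected, answer))
    accuracy = 1 - len(failures) / len(labeled_examples)
    return accuracy, failures

labeled = [
    ("Is overtime pay owed after 40 hours?", "yes"),
    ("Can a schedule be published retroactively?", "no"),
]
accuracy, failures = evaluate(labeled)
print(f"accuracy={accuracy:.2f}, failures={len(failures)}")
```

Even a set this small forces the qualitative question the transcript raises: not just "what is the score," but which cases fail and why.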
From a cost perspective, I was definitely nervous when everybody got excited about LLMs. My first instinct was, oh my God, cost is going to go through the roof. And what I realized was that I was the person who needed to solve for it. Yes, these OpenAI endpoints, or Anthropic, or any other endpoints you use, can definitely get expensive as you scale. But what they help you do is evaluate your use case really fast.
And the key thing with AI, and it's always been true with ML but it's true with LLMs too, is that iterations are important: how fast can you ship a version and how fast can you get feedback on it? These LLM endpoints from these companies really enable that; they make it really easy to do.
One of the ways we've built out these kinds of endpoints is through the ML platform team. We build very common patterns; we build an API. So we encapsulate: the product doesn't hit OpenAI directly. There's an encapsulation layer, the product hits it through us, and we hit OpenAI. And we've already started doing experiments on small language models, training on our own data to get more specific models in-house.
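The encapsulation pattern described here, product code calling an internal layer that in turn calls the vendor, can be sketched roughly as below. All names are illustrative, not Homebase's actual API, and the backends are stubs rather than real vendor calls.

```python
# Sketch of an encapsulation layer: product code talks only to the
# gateway, never to a vendor SDK, so the backend can be swapped
# (vendor endpoint today, a small in-house model later) in one place.
from typing import Callable

class CompletionGateway:
    def __init__(self, backend: Callable[[str], str]):
        self._backend = backend

    def complete(self, prompt: str) -> str:
        # Central chokepoint for logging, cost tracking, and guardrails.
        if not prompt.strip():
            raise ValueError("empty prompt")
        return self._backend(prompt)

# Stub backends standing in for a vendor call and an in-house model.
def vendor_backend(prompt: str) -> str:
    return f"[vendor answer to: {prompt}]"

def in_house_backend(prompt: str) -> str:
    return f"[small-model answer to: {prompt}]"

gateway = CompletionGateway(vendor_backend)
print(gateway.complete("Summarize this timesheet."))
gateway = CompletionGateway(in_house_backend)  # the swap is one line
```

The design choice is that the swap to a simpler or in-house model, mentioned later in the conversation, touches only the gateway's constructor, not every product feature.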
And a bigger part of our discussions is always: does this even need an LLM? Can we replace it with a much simpler model? For many use cases, it could very well be the case. An LLM is the short answer, but it doesn't have to be the long-term strategy. We may not even go with an LLM, or six months down the line we may replace it with a bunch of simpler models.
Explainability versus accuracy
For the models that are more risk-based: we have a product called Cashouts, and internally we have models doing risk assessment when people are drawing cashouts. There's obviously a responsibility angle to it. Even if the business may not think about it, as data scientists we have a code of ethics. We have a page written up for data science that says do no harm: make sure you're not biased, and all of that.
So we try to make those models as explainable as possible. And one thing we do is we never reject anybody outright, if you know what I'm saying: everybody gets a minimum of something. The model will not make a decision in the negative on its own. When there's a negative decision, a minimum is still given and there is a review; something else happens beyond that. It's never just a model making the decision.
Advocating for better analytics tools
I've definitely, even this year, had to get buy-in for revamping our entire data architecture, because it was in a place where our pipelines were failing and stakeholders were not getting data on time. So the discussion became: we need to re-architect things, we need to reorganize our data, we want to move more to Databricks. A lot of it really comes down to putting it in the context of business value. I know it sounds cliched, but that's really what helps: talking about how many hours my team is spending on these types of problems.
It's really making the case around time spent: how much does a data engineer cost? How much are we spending on this? How many failures are we having? How many times have we delayed things for marketing? You put the case together with all of that and then say, by the way, here is a proposed solution, this is how easily you can onboard, and this is how long it'll take.
And one thing that's really helped me, especially with new analytics tools, because I am trying some of these new LLM-based tools, is scoping it out as a POC: we can try it, it'll just cost maybe 10,000 or 20,000, whatever it is. You scope out a POC and go from there. If it's useful, and if you have the right people testing it in that phase, that also helps you get buy-in.
Transitioning into management
I always tell my boss I'm one of those happy managers. It's a calling; I enjoy it. It's what I thought I wanted to do. Even in grad school, we were a team of 15 PhD students in my group, 15 men and me alone, and it was just naturally something I picked up: supporting people with their projects, collaborating, making sure things were on track, because research can sometimes get very lonely when you're on your own trying to figure something out.
When I went into work, I found myself in a similar space of always caring about the bigger picture, always caring about what's going on holistically on the team. I was never like, assign me a ticket, I'll do it. I was never that person. I was always like, okay, I'll do my ticket. Also, what's that person doing? What's that person doing? How are we fitting all the pieces together? What output, impact are we creating as a team?
At some point my manager had to decide, and he asked me. He said, hey, I have this position. Do you want to be a manager? Do you want to be an IC? You could be either; it's your choice. I said, yeah, management seems more my calling. Although the one thing I would say is that a lot of people think management will take you away from technical work; I never stay very far from the technical stuff in general.
Communicating model outputs to stakeholders
For me, that starts with the definition of done I have for my team, which is: unless somebody's using it, your model is not done. I think there are two things here. One is understanding that model building is an iterative process, in the sense that you slowly gather requirements. No end user is going to come and tell you, build me this thing. It's always going to be a vague problem that you're trying to distill down into something technical.
So the first step is: can you distill it down into something technical and then explain to them what outputs they're going to get? What will you get out of this model? How will you use it downstream? Then what I usually recommend to my team is, let's put a quick POC together, one iteration of this model. It can be slightly rudimentary, we can still have metrics and everything, but let's get our baseline model ready to go and get an end-to-end working.
And the caveat, the learning I've had to work through with the team, is to say: hey, these things are iterative. Yes, the first version might not look ideal, but we'll solve for a subset of use cases. This one subset will be solved; a lot of other things you asked for won't be yet, but we can get to them over time.
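The "rudimentary first iteration" idea can be sketched as a majority-class baseline: ship something end-to-end that produces outputs and a metric, then iterate against that floor. The data and label names below are invented purely for illustration.

```python
# Sketch of a first-iteration baseline: always predict the most common
# training label, then measure it, so every later model has a floor to beat.
from collections import Counter

def majority_baseline(train_labels):
    """Return a 'model' that ignores features and predicts the majority label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common

# Made-up shift-attendance data.
train_labels = ["no_show", "show", "show", "show", "no_show"]
test = [({"shift": "night"}, "show"),
        ({"shift": "day"}, "show"),
        ({"shift": "night"}, "no_show")]

predict = majority_baseline(train_labels)
correct = sum(predict(features) == label for features, label in test)
print(f"baseline accuracy: {correct / len(test):.2f}")
```

The baseline itself is almost useless as a product, but it exercises the whole pipeline, from data to metric to stakeholder-visible output, which is exactly what the first iteration is for.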
Building domain knowledge
Of course, there are resources available that you can learn from. But I think the way you learn, assuming you're a college grad wanting to start, is you go in and spend a lot of time with the experts who are there. That's what I do with my data science team. My analytics team is very embedded, they're already embedded in different domains, so they're naturally in the meetings and they learn. But even for my data scientists, I make sure they sit in all these meetings with the team, even if it's not specifically a data science problem being discussed.
So I don't know if anybody else has a good answer about learning it from university itself. One of the domains I learned was through research: I started my PhD not knowing very much about computational biology and finished knowing a lot. But yeah, once you're in, definitely be attached at the hip, as I call it, with any and all domain people you can.
Learning from good and bad managers
The story is that my advisor, like all advisors, had a certain pattern. So when I finished grad school, I was a very nervous person, if you can call it that: very nervous about work, about finishing things on time, answering emails the minute they hit my inbox with no delay, being available on the weekend even though I was not on call and didn't need to be.
And then I had my very first manager, Eddie Kim, who literally came to me one Friday and said, OK, you're going to leave your laptop at work. I said, what's going on? And he said, I've seen you replying to things, why are you doing that? You're doing two things wrong. One, why are you doing it at all? And two, you're setting a bad example for the rest of the team: why are you responding and setting the expectation that everybody else needs to, when it's not necessary?
And he made me do it. And I suddenly realized, oh my God, this is a way to live. I can do this. This is a way people can be managed, and I don't have to be on my toes the whole time. It really switched something in me. It's not that I would have gone and imposed that on someone else; I don't think I was even mentally there. He turned my life around. He flipped a switch for me, where I went, oh, I can breathe. It's okay.
And have I had bad managers after that? A hundred percent. I've had people who expected a certain type of behavior, but I have generally fought against it. At the very least, I've always tried to protect my team: make sure my team stayed sane and healthy, manage my own relationship with my manager, but keep my team protected and safe.
Tech stack
The one rule of thumb I try to follow is to keep my stack very boring and very standard, because I've been in a place earlier where it was not, and we had Hadoop failures all over the place. Right now our tech stack is Databricks for almost all things ML and data. We have Redshift for our warehousing, though we're trying to move away from it a bit. We use Airflow for all of our pipelines. We have dbt, which gives analytics a bit of self-serve, because they can build their models and their metrics more easily in SQL and don't have to do it in Python. We have Looker for dashboarding, would not recommend it. Python and SQL are the fundamentals, of course. And because of Databricks, we use Spark for a lot of our processing.
Well, thank you everybody. And as housekeeping, I will remind everybody that Rachel put in the chat, Julia Silge's workflow demo on the 29th of this month is going to be on model cards for transparent, responsible reporting. Might be a great follow-up to this conversation if anybody is interested. Thank you so much for joining us. We hope you have a wonderful day. We will see you next week, same time, same place. And thank you so much, Kanchana, for joining us. This conversation was amazing.

