Transcript

This transcript was generated automatically and may contain errors.

Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heron, and this is a recording of our weekly community call that happens every Thursday at 12pm US Eastern Time. If you are not joining us live, you miss out on the amazing chat that's going on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.

I would love to introduce our featured leader today, Brijesh Chejerla at Florida Blue. Brijesh, could you introduce yourself? Tell us a little bit about you and something you like to do for fun.

Hey, everyone. I'm Brijesh Chejerla. I am a data scientist at Blue Cross Blue Shield of Florida. I have a PhD in computer science. And by way of introduction, I'm an enthusiast in many things. Data science happens to be one of the conduits to help prosper my enthusiasm in those many things. Machine learning is at my heart, it's at my core. And I chose to be a data scientist way back in 2016, because after I graduated from my university with a PhD, I was like, what do I do with this?

And I was thinking about where do I go next? Because I was pretty particular about what I would do and which field I would be in. And then I realized being a data scientist, you are a plug and play model. You can be in different fields. You could be in sports, you can be in medicine, you could be in healthcare, you could be in construction, you could be whatever. And then they all kind of fit together. You do very similar stuff. But at the core of that, it's still the machine learning.

So that's kind of the reason why I chose to be a data scientist. And yeah, I'm here. I graduated from being a data scientist to a machine learning engineer, I would say. As for fun, I love to watch football, soccer for the American audience. And I spend a lot of time, I would say, going through the analysis and all of those things, you know. So that's kind of my fun activity for the most part. Generally, I love to listen to podcasts. Most of these podcasts these days are football related for me, because I'm in that space right now. But otherwise, so to speak, I just love to converse with people. You know, that's a fun activity for me.

Motorsports and the transition from academia

I have to ask you about your stint in motorsports, because I know that your sort of like transition into industry from academia had a pit stop. Could you just give us a little bit of info on that?

After I graduated, I was searching for jobs, obviously, and I didn't want to work just for the sake of working. After spending five, six years in academia, I really wanted to take a break. So I took my own time to find myself a job. And then this motorsports job came along as a shot in the arm. I had no intention to even look at motorsports for that matter. I was, and still am, a Formula One fan. I still watch Formula One. But NASCAR was something that was way off my radar. So I got this job through a connection: one of my friends recommended me and connected me with the people who were looking to hire a data scientist. And that's how I started working at Hendrick Motorsports. So that was my first foray into motorsports, my first foray into NASCAR. Everything was very new. I was looking at jobs in the Bay Area, and then I had to move over to Charlotte, North Carolina. And that's how things worked out, you know.

So one thing that most people don't know is that the amount of data that you see on screen, on TV, is not the same kind of data that people get to work with behind the scenes. Because NASCAR is basically stock car racing, right? So not all of the teams have the same amount of funding. So NASCAR as an organization does not allow you to, or it does not give you the data, because they know that people who have, or teams who have more funding will have some benefit over teams who do not have that kind of funding.

So you're restricted in the data that you collect. So you'll have to get very creative with what you have and solution for it. The models that we built when I was working at Hendrick Motorsports were basically, you know, to be able to predict when the driver should pit, when they should go in for a pit stop. And that is dependent on the fuel consumption, and it's dependent on the tire degradation. So we had to build models based on that.

And for motorsports, you have to be extremely accurate to the milliseconds level, right? And at that point of time, you have to think, am I overfitting this model because it's pretty accurate, or is it generalized enough? Because if it's generalized enough, then you may not be giving the race engineer or the race crew chief what he wants. But at the same time, from the data science level, you know, if you don't generalize, you're basically overfitting, which is not a good model. So you'll have to be very creative and you have to be very specific about how you want to go about those things.
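The tension between accuracy and generalization described here can be made concrete with a small sketch. This is not the model built at Hendrick Motorsports: the data is synthetic, and `fit_line`, `nearest_neighbor_predict`, and `compare_models` are hypothetical names chosen for illustration. It assumes a noisy linear relationship and compares a simple line fit against a model that memorizes the training data.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbor_predict(train_x, train_y, x):
    """1-nearest-neighbor: return the target of the closest training point."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[idx]


def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def compare_models(seed=0, n=200, noise=0.2):
    """Fit both models on one synthetic sample and score them on a second."""
    rng = random.Random(seed)

    def sample(k):
        # Assumed ground truth: y = 0.5 * x + 1 plus Gaussian noise.
        xs = [rng.uniform(0.0, 10.0) for _ in range(k)]
        ys = [0.5 * x + 1.0 + rng.gauss(0.0, noise) for x in xs]
        return xs, ys

    train_x, train_y = sample(n)
    hold_x, hold_y = sample(n)
    slope, intercept = fit_line(train_x, train_y)

    def line(x):
        return slope * x + intercept

    def memorizer(x):
        return nearest_neighbor_predict(train_x, train_y, x)

    return {
        "line_train": mse([line(x) for x in train_x], train_y),
        "line_holdout": mse([line(x) for x in hold_x], hold_y),
        "memorizer_train": mse([memorizer(x) for x in train_x], train_y),
        "memorizer_holdout": mse([memorizer(x) for x in hold_x], hold_y),
    }


if __name__ == "__main__":
    for name, value in compare_models().items():
        print(f"{name}: {value:.4f}")
```

The memorizing model reproduces every training point exactly, so its training error looks perfect, yet it carries the training noise into its holdout predictions. Comparing training error against holdout error is one common way to check whether a very accurate model is actually overfitting.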

Resources for machine learning and sports

I don't think there's anything that is resourceful enough for machine learning and football together. Because at the end of the day, you're applying a solution, a mathematical solution to an existing problem or to a real world use case, which has some data. So it's that connection, which is why I said during my introduction that I wanted to see if being a data scientist, I can plug and play. Can I go into finance? Can I go into more sports? Can I go into healthcare? Can I go into whatever? And will I be doing very similar stuff?

I always say that, you know, keep to the basics of machine learning, keep to the basics of statistics. That will serve you no matter what you do, no matter where you go. And then, you know, you can apply that to anything. You could look at different types of regression. Data scientists also do analysis. You have to have a very strong statistical background, right? So you have to develop that. You have to understand how to look at this. You have to look at design of experiments. You have to understand that you have to, you know, create hypotheses.

When it comes to the world of data, it's an elephant, and everybody is looking at the trunk of the elephant or the leg of the elephant and kind of knows that it's supposed to be an elephant. And when you draw it, it generalizes to an elephant, but it's not exactly the same elephant that everybody looks at. That's what data is. So I would just say approach it bottom up, where, you know, your foundational machine learning, your foundational mathematics, your foundational statistics are very strong so that they can be applied to any data that's given to you.

Choosing an industry and career path

So I share your perspective on the general applicability of data science and that being appealing, but my background is in mathematics itself, just a master's in pure math. And something I've struggled with has been, how do you pick an industry? In my job applications, I've worried that I've been casting too wide a net. And I almost wish I had one specific industry that was just my favorite and that I could target. So I'd just love to hear about your experience. Did you sort of just somewhat accidentally land where you are, or was it targeted, and how did you choose your targeting?

So the way I started looking at it was two things. One is how much money am I getting? And two, how much of my personal ethics and morals do I have to forego for that kind of money, right? Those were my two parameters. So like any company that you go and work for, any industry that you go and work for, you will have to compromise on certain things because that's the, you know, that's how the world works. So I say, just pick on those two things and think, you know, what your financial status is and, you know, how strongly do you feel about certain things?

The reason I picked motorsports was, one, because there were no ethical or moral dilemmas in that for me. And two, it was something that excited me, right? So that's kind of how I picked that over all of the other options that I had. Coming off a PhD, I was extremely idealistic at first. So I would say, do not be idealistic. You know, if anything, don't be that. Be more pragmatic about where to work and what to do and stuff like that.

Because the first three to six months, I struggled, because I thought, you're supposed to do it this way. And the business doesn't really care how you do it, as long as you get it done in a certain time frame, and you give them similar outputs that you're expecting, you know. I got myself into it as a data scientist. I grew into being an architect, then I was an admin, you know, I was a Posit admin for some time, for about six years.

I wanted to think, if I were my own leader... if you were your own leader, and you as a developer, would you look up to a leader who has knowledge about most of these things, but then also has a vision, because you expect leaders to have a vision? Or would you want to look at yourself as just an independent contributor, where you're very happy doing just model building? So think of those things, and then go both deep and across, you know, both vertical and across. Develop new perspectives, because once you think like an architect, once you do system design, once you get into data engineering, your perspective about data science will change. I have done that. I have consciously trained myself to think like that.

Tools, the Posit platform, and being a reluctant admin

Our tech stack for data science machine learning at Florida Blue is Posit Workbench, Posit Connect. So, we develop in Posit Workbench, we deploy in Posit Connect. I was a reluctant admin to begin with because I didn't know anything about it, honestly speaking, but then I'm glad I got into it because being an admin changes the way you look at how you want to deploy stuff, how you want to make your models available. So, that helps you architect your solutions better. That helps you become a better solutions architect for that matter.

Now, I think we had around 150 developers and about 800 to 1,000 consumers of the content. That's the license that we have for it. So about 150 active developers and increasing; 150 was the number the last time I checked. We kind of used to work in silos, and we would get emails back and forth saying, okay, this is not working, that is not working. So, what we did was we just created a Teams channel and said, if you have any questions, post them here. And if somebody else knows the answer, go ahead and answer.

We also have a set of packages that are internally developed to our own needs. And those are specifically used on the Posit platform, you know, to connect to databases, to deploy, and to do all of these things. So, that's kind of how it started. And it's at full maturity right now. Our applications are basically on-prem, and we are now looking to move into the cloud.

If I had a DevOps team, I as a developer would just want to wake up in the morning, make my coffee, sit, just turn on my computer and log in and then just get away, you know? So, when I was the admin, that's kind of what I wanted to provide to our users so that they have least amount of friction to do the stuff that they wanted to do. But at the same time, that was a lot of work for me outside of my data science day-to-day work.

So, if you ever want to get into admin, you have to learn Linux. You have to learn how that works. You have to understand how to solution some things from a security perspective, which is, again, a different kind of difficult thing depending on the field that you're in. So, in healthcare, security is paramount, right? So, you have to be extremely specific. Users, especially data scientists and analysts, aren't necessarily software developers, so they don't go through the SDLC process all the time. So, you'll have to guide them through those things. So, that got me into security. That got me into being a cyber secure software developer. And so, that's kind of how my journey grew.

Yeah, it seems like people want to write a book called The Reluctant Admin as well. But I was wondering, how did you so successfully transfer that IT admin ownership over, especially with so many people being reliant on the tools? Like, what was that transition like?

So, the transition period was not easy, honestly, if I'm being very honest. Posit admin is a very niche-specific admin role. It's not your general systems admin. It's not your general Hadoop admin. It's not your general Linux admin. You need to have the knowledge of all three of them, and you need to know how Posit Workbench works. You need to know how the config files are set up. You need to know how Posit is set up or is expected to be set up. You need to know all of those things, and then you need to know the connect side of things as well.

And being a Posit admin also comes with, hey, I have these R-related questions. Can you help me? I have these Python-related questions. Can you help me? So, I'm not an R person. I've never tried to be an R person. I would just say I'm literate in R. So, any of my R questions, I kind of diverted to my teammates who are experts in R. Any of the Python-related questions, I handle. So, it's a very niche kind of role where you have to be extremely good at many things at the same time.

So, the new admins who have come in didn't come from this kind of background. They weren't R developers. They weren't Python developers. So, it took us a while. We developed an internal mechanism that made it easier for them to know what sort of issues come up. What we did was we used Git issues and created a board, a dashboard of those issues. So basically, if you're a new admin, you can search for an issue, and that issue will have been resolved somewhere. And then, you know, if, let's say, I didn't exist in this company tomorrow, you can actually look at how that issue was resolved and then go about resolving it. That's kind of how we designed it and set it up.

AI's impact on data science roles

With AI, this question is pertinent to a lot of us. How do you think AI is going to impact data science roles? And then, you know, do you use AI as a productivity tool in your day-to-day? And if so, how?

Yeah. Very pertinent question. So, yes, AI is going to impact not just data science. AI is going to impact even software development for that matter. AI is going to impact architecture too, to a reasonable degree, in the future. Probably, you know, that's where we're going to go. I think the only safe stream right now is data engineering. So, if you are a good data engineer and, you know, you're worth your salt, then you're safe for now, is what I would say.

I'm pretty sure most of us have seen that, you know, you throw a CSV, let's say, at the model and you ask it questions, it's going to generate X, Y, and Z. If you are a Pandas user, you know that Pandas AI is a package that you can actually load to just ask it questions, you know. So, it just depends on how it's being used. California is a different bubble in itself altogether. So, outside of California, all of the other companies are still trying to get up to speed with this. In many companies like ours, there are a lot of, you know, compliance and regulatory issues that we have to overcome before we start doing all of these things because of where the data sits and where it goes and what you can ask and what is quote-unquote touched by AI.

So, for now, it's okay, but very soon there's going to come a point where many of the jobs are going to be redundant because you can then do more with less. But that doesn't necessarily mean that you're going to get more productive. It means that you're going to have to work on a lot more things, because the assumption is that, hey, you now don't have to write the code that you previously would have had to. So, those assumptions are changing. I am seeing that in multiple organizations. For instance, we have an AI coding agent, Windsurf, as part of the organization. So, many people use that. I set up a few things so that I can use it for code review, you know. It helps me review the code much faster, because with all of the other things that I do, I wouldn't have enough time to do a detailed code review.

So, I ask it to do X, Y, and Z, and it comes back with whatever. But then I still rely on my own know-how to actually go and look at the logic that's actually written for the business. It's not just about the correctness. It's also about, is the business logic correct? Are you reading the right kind of columns? Does it make… The transformation that you do, is it clinically relevant? All of those things, I don't think AI can do yet, you know.

In order for you to hedge yourself against that risk, I would still say, be fundamentally thorough in your machine learning, deep learning, statistics, mathematics, whatever have you, you know, and then coding is kind of out of our hands. If you tell people, I'm a better coder than a coding agent right now, that's a hard thing to prove these days. But you can always still say that I'm a much better data scientist, or I'm a much better machine learning engineer than a coding agent or, you know, the AI is right now. And you would still get away with that, and people will trust you.

System design and thinking like an architect

It's not just about the data science models that you build. We also build applications in our team, right? So that brings with it its own level of system design that you'll have to think about. You'll have to understand what works, what doesn't work. That's another part where AI kind of falls short, because you are the one who has to think about it. There's practically no way for the AI to be cognizant of all of the existing stuff that's going on in your organization and how things are set up and all of those things. So your system design is basically your architecture at the end of the day. Any architect's fundamental thing is system design.

When you architect your solution, it's not just about which algorithm I use or how I input the data. It's about how many users are going to use it. How am I going to connect it to different things? How am I going to fetch from this database at whatever pace? Is it going to be given to me in a stream? Is it going to be given to me in a batch? Is it going to be given to me in a JSON format? Is it just going to be a row by row fetch? All of those things you'll have to think about when you build your model, because your model outputs are going to then be dependent on the speed at which your system integrates. So system design is definitely something that's a must for most machine learning engineers.
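The delivery modes listed here (batch, row by row, JSON) can be sketched with the Python standard library. This is an illustrative assumption, not the Florida Blue setup: an in-memory SQLite database stands in for whatever database is actually used, and the `readings` table and the function names are hypothetical.

```python
import json
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO readings (id, value) VALUES (?, ?)",
        [(i, i * 1.5) for i in range(1, 8)],
    )
    conn.commit()
    return conn


def fetch_row_by_row(conn):
    """Pull one row per call; simple, but one round trip per record."""
    cur = conn.execute("SELECT id, value FROM readings ORDER BY id")
    rows = []
    while True:
        row = cur.fetchone()
        if row is None:
            break
        rows.append(row)
    return rows


def fetch_in_batches(conn, batch_size=3):
    """Yield lists of up to batch_size rows until the cursor is exhausted."""
    cur = conn.execute("SELECT id, value FROM readings ORDER BY id")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch


def rows_from_json(payload):
    """Parse a JSON array of {"id": ..., "value": ...} records into tuples."""
    return [(rec["id"], rec["value"]) for rec in json.loads(payload)]


if __name__ == "__main__":
    conn = make_demo_db()
    print(fetch_row_by_row(conn))
    for batch in fetch_in_batches(conn):
        print(batch)
```

All three routes deliver the same records; what changes is how many arrive per call and in what shape, which is the part that affects how quickly a downstream model can be fed.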

Becoming a data expert and growing your career

I'm a little young in my career as a data scientist. So I wanted to ask, it seems like you've switched a lot of careers where data is vastly different. The systems that you're using are vastly different. And as data scientists, we're expected to know, be the experts on our systems and data. How do you quickly become an expert on the, when you walk into a new job, how do you quickly become the data expert?

I don't think there's any such thing as quickly becoming a data expert. So the way I grew into my role is, if this is a funnel, here is where data scientists usually operate. And I'd say, stop just being here. This is a local minimum for you. Just don't be here. Which is why I say, if you think like an architect, system design comes into the picture. You have to know where your data exists, what your data pipeline is. So whichever company you work for, try and get to know the data pipeline.

And then you'll have to start thinking about, if I have to collect more features for whatever data science work that I'm doing, can I extract features only from this relational database or from NoSQL, or can I also go above and beyond and say, you know what, from this particular WAV file, I can actually transcribe this audio file and then do some NLP on top of that. That is kind of how you have to think, because you should never limit yourself to just this space here. You also have to think like a data engineer. You have to think about, okay, if I have to connect these two different databases, how do I go about connecting them? Am I writing the most optimal SQL?

And then most importantly, I would say you'll have to think like the business or like your end user. If you were the recipient of the analysis that you were given, what would you want, or what else would you want? How deep would you want to go? And so that's kind of how you have to switch your thinking. So your data science, with all of the machine learning and statistics background that you learned in school or are learning on the job, plus thinking like an architect or a system designer, plus thinking like a data engineer: that whole vertical is fundamental to being a good machine learning engineer.

So like I said, go and look for data that you probably don't even think you need for right now. Just look at the data, like see what's available. And then you'll know that, okay, when the time comes, oh, I have this data over here, would that be useful? Or there is data over here, which is somewhat translatable to the feature that I'm looking for. Let me see and add that. That's kind of how I would go about it.

GitHub portfolios and job seeking

I've been told that I need to have a portfolio and I need to have done a bunch of projects on GitHub to be a stronger job-seeking candidate. How true is that?

It is true to a certain extent, especially if you're in a tie-break situation. The first thing that I look for is, do they have a GitHub profile? And what kind of work have they done? I'd like to see what kind of changes they've made over time, the comments that they do and stuff like that. But at the same time, it's more relevant if you are very early in your career.

I'd just say, start on the side, building out projects that you're interested in, not to show somebody else that you can do X, Y, and Z. If it's a project that you're genuinely interested in, automatically you will go very deep into that. And when you talk about that in your job interview, you will then tell them how you thought through the problem, how you thought through the solutions, how you thought through the system design, if there is a system design to that, how you thought through the scale of the problem, how you thought through the lack of data. All of those things will automatically happen. So even if it's just one project, the depth of the project is sufficient.

The way I look at interviewees is how they are able to think, especially in this day and age when you have coding agents and the need to be an exceptionally good coder is kind of reduced. It's all about the way you think about a solution and the way you come up with the solutions. Have you been able to put out a dashboard of sorts? And all of those things do matter.

Career advice

Like I said when Logan was asking, don't put yourself in a local minimum. If that's what you want to do and you're happy in that space, that's fine. There's nothing wrong with that. But if you are a curious person, if you want to grow up the ladder, don't just put yourself in a local minimum.

I think the first thing that you need to develop is perspective as a data scientist or a machine learning engineer or whatever. It's about developing new perspectives. You will have to spend your time and actively curate what perspectives that you develop and how that translates to what you do in your job. I very consciously built myself into thinking like a data scientist. I was a researcher, so I used to think like a researcher. And then I quickly realized that just thinking like that wouldn't help because there are deadlines to reach. There are so many other things, not every job, not every problem is a research problem. So you'll start having to think about what is the simplest way I can get to the solution.

If there is a simple way, do I even need to go and think about a more complex machine learning solution? Take it to the business. If the business is happy with that, okay, so be it. But then you kind of get a buy-in because I think a lot of people's issues are we are not really given enough time to implement X, Y, and Z. Initially, you will never be given enough time, but then you get a buy-in after producing some results and then tell them, hey, usually this is not a good thing for X, Y, and Z reasons.

First and foremost, be kind. That's the first thing that you should do. Be kind to people. And as a data scientist or a machine learning engineer, you have to train yourself to be unbiased. What that means is you have to stop thinking, this works, I know, I have so much experience, blah, blah, blah. Put all of that aside when you get into a meeting with someone. Hear them out. Don't talk over them, and understand what they're trying to say.

Many times when we go and sit with the business, we try and get the requirements. The getting-the-requirements part is more like, the business says these nine things are my requirements. Am I delivering on that? No. There's a lot in between the lines. So try and understand what the business really wants. Sometimes the business doesn't really know what they want. So try and talk to them about it.

If you're interested, be a mentor. Becoming a mentor or becoming a teacher changes your perspective altogether in the way you actually go about your own job, because there's a difference between you understanding something and you explaining something. And like Feynman or Einstein, either of them, said, if you're not able to explain something adequately, you haven't understood it well enough. So being a mentor to somebody who's junior or doesn't have as much experience as you do always helps. And they bring in new perspectives that you probably never would have thought of.

And I think this is extremely important for data scientists: develop rapport with people and get them to invite you into meetings that you have no business being in. Just sit there and listen to what they are discussing, especially if it is business related, because you will then understand why business is asking for something rather than what they're specifically asking for. So just go there. I used to do this. I used to just say, hey, can I just join here? I used to say, can I be a fly on the wall? And they'd be like, okay, CC Brijesh. And I'd just go and listen. And I developed so many perspectives. That's kind of how I developed my know-how of some of the domain knowledge. Otherwise you won't get that domain knowledge. You will then hear about more problems in the business, and you think, oh, you know what? I can actually solve for this. That gives you some more acceptance and buy-in as well in the company that you are working for.
