Lessons from a Broad & Varied Data Science Career | Arcenis Rojas | Data Science Hangout

Transcript

This transcript was generated automatically and may contain errors.

Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12 p.m. U.S. Eastern Time. If you are not joining us live, you miss out on the amazing chat that's going on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.

I am so excited to introduce our featured leader today, Arcenis Rojas. He's a data scientist at Indeed, and he's also a longtime regular at the Data Science Hangout. If you've been here, you've probably seen Arcenis ask a question. Arcenis, would you like to introduce yourself for me? Tell me what you do and something you like to do for fun.

Sure. Thank you, Libby. So real quick, just off the top, I wasn't nervous about this at all until you just now introduced me, so that's kind of fun. Anyway, as you mentioned, I'm a data scientist at Indeed with Indeed's Hiring Lab. And what the Hiring Lab does is, basically, we try to provide unbiased labor market data and analysis to the public at large using the very unique data set that we have at Indeed, which represents both the supply side of labor and the demand side. Now, with that said, I am here just representing me, Arcenis, and none of the opinions or thoughts that I share are necessarily those of Indeed, whatever that fun statement is, you've all heard it.

So be curious about other people and care about them. Yeah. Just say hi and ask them whatever it is that, you know, you have on your mind about what they're doing, you know, within social norms.

So yeah, about PositConf, I actually found the community, like the people who showed up, to be some of the nicest people in general, as a group. And I was actually intimidated initially about seeing, you know, Hadley and Emil and Max and Julia, and I fanboyed all over Julia. I'm not going to lie. And she was so cool about it. She was like, hey. So we talked about some of the work that she had done. But I have found what you just said, Libby, about the data community to be true. People do tend to be very cool and very chill, and everybody does things that are so different from one another that it's kind of normal to have curiosity and to learn from one another.

Misconceptions about the job market

So I can't speak very intelligently to the ATS, because I've never been on that side of things. But it is true that some organizations do use algorithms using OCR and things of that nature to process resumes ahead of time. So that is true. That happens. And I know that it happens because OPM at the federal government does it, right? And if OPM is doing it, other organizations were doing it 10 years before. So yes, it does happen.

What I see more than anything is when I look, for example, at news stories from kind of all different sources and all that, what I see is sensationalism a lot of times. And that kind of makes it hard to see the stories that are really underlying things that are actually going on in the data. So misconceptions that I do see tend to be more about the sensationalism than about the data. When people talk about the data that actually exists, I actually don't hear a lot of things that I would characterize as misconceptions. So if you focus on the data, you're generally going to be okay.

So I guess I can talk about that a little bit. I don't think anybody in this Hangout will be surprised that if you don't dig a little more deeply into the numbers, you can very easily be fooled, right? So I'm not going to talk about the current environment. I will take it back a bunch of years to the Reagan administration, just to not make the environment too hot. Now, I was just a little kid during the Reagan administration, so I'm not speaking from experience, but I've looked at the numbers. And one thing that happened is that there were increases in the job market happening all the time. They were going up, it was a strong job market, et cetera, et cetera. When you dig into the numbers, what you see is there were two, sectors is the wrong word, but two groups that were increasing a lot while other segments of the job market were going down. The groups that were seeing increases in job openings were low-skill employment and contract positions in the government, right? So contracting got really big under Reagan. That was one of the things under that administration.

So those were the groups that increased. So if I take that nugget and bring it forward to today, what I would say is look a little more deeply into the numbers. They're out there. They're there. So if you're just looking at top-line statistics, which is what most people will do, top-line numbers, yeah, you'll be fooled. You will certainly be fooled.
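The point about top-line numbers can be illustrated with a small sketch. Every count below is invented for illustration (it is not real labor market data), and the segment names other than the two groups mentioned above are placeholders. The sketch only shows how a rising total can sit on top of segments that are shrinking.

```python
# Hypothetical job-opening counts per segment as (before, after).
# All numbers are invented for illustration; they are not real labor data.
openings = {
    "low_skill": (100, 160),
    "gov_contract": (50, 90),
    "other_segment_a": (120, 100),  # placeholder segment
    "other_segment_b": (80, 70),    # placeholder segment
}


def pct_change(before, after):
    """Percent change from the first period to the second."""
    return (after - before) / before * 100


# Top-line view: one number for the whole market.
total_before = sum(before for before, _ in openings.values())
total_after = sum(after for _, after in openings.values())
top_line = pct_change(total_before, total_after)

# Segment view: the same data, broken out.
by_segment = {name: pct_change(b, a) for name, (b, a) in openings.items()}
shrinking = sorted(name for name, change in by_segment.items() if change < 0)

print(f"Top line: {top_line:+.1f}%")
for name, change in by_segment.items():
    print(f"  {name}: {change:+.1f}%")
print("Shrinking segments:", shrinking)
```

With these made-up counts the total rises even though two of the four segments fall, which is the kind of detail a top-line statistic hides.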

The advantages of a broad career trajectory

So I'll answer both. I'll try and be brief here. Advantages: there's one big one, which is that I am now really confident that there is no data set or tech stack that I can't figure out pretty quickly. The other advantage is that it has taught me how to talk to people at different levels of organizations, as well as in different domains. So the way I can communicate about data has changed quite a bit. So I would say those are the two really big advantages.

Disadvantages, there are some big ones there as well. One, my career trajectory doesn't show progression in the way that a lot of people like to see it on a resume. Oh, you started out as an individual contributor, then you became a team lead, then you became a manager. No. I've led multiple teams, and that's on my resume and stuff. And I've done things that have had bigger and bigger impact in different situations. But my progress has been more wide than it has been kind of vertical, right?

So that can be a disadvantage depending on who you're talking to and what you're trying to accomplish. Another disadvantage is that, like you just kind of alluded to, it gets challenging talking about what it is that you do or what it is that you're good at to anybody that's not in the field. All they see is you go from this thing to that thing, to that thing, and you just seem like whatever. But in me, I actually have found a very consistent set of interests and things that I'm good at and things that I enjoy doing. And to explain them, it could get a little bit technical very quickly, right? So, like you, I stay away from getting into too much detail about what I do.

AI and the hiring sphere

So, very high level. I actually helped with a big research project called Gen AI at Work, about how we expect it to affect different things. What we did was we took a little over 5,000 skills and we looked at how well generative AI can do those skills. And then we took different job titles, you might say, and we basically said, okay, of all these job titles that we have, how many of those skills can AI replace or assist with? The main author on this, her name is Annina Hering. She actually created an index for this. And the short answer to this part of the question is, AI is going to assist a lot of people in their work in the coming 5, 10 years, but it will be able to replace only very small segments of the workforce.
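The general shape of that analysis (rate each skill, then look at what share of each job's skills AI could replace or assist with) can be sketched roughly as follows. This is not the actual index from the research: the skills, job titles, rating labels, and the `exposure` function are all hypothetical placeholders chosen for illustration.

```python
# Hypothetical ratings of how generative AI handles each skill.
# Skills and labels are placeholders, not the ratings from the research.
skill_rating = {
    "write_sql_query": "assist",
    "debug_code": "assist",
    "summarize_document": "replace",
    "draft_email": "replace",
    "draw_blood": "neither",
    "comfort_patient": "neither",
    "administer_medication": "neither",
}

# Hypothetical mapping from job title to the skills it requires.
job_skills = {
    "data_analyst": ["write_sql_query", "debug_code",
                     "summarize_document", "draft_email"],
    "nurse": ["draw_blood", "comfort_patient",
              "administer_medication", "summarize_document"],
}


def exposure(skills, ratings):
    """Return the share of a job's skills that fall in each rating category."""
    counts = {"replace": 0, "assist": 0, "neither": 0}
    for skill in skills:
        counts[ratings[skill]] += 1
    return {label: n / len(skills) for label, n in counts.items()}


shares = {job: exposure(skills, skill_rating)
          for job, skills in job_skills.items()}
for job, share in shares.items():
    print(job, share)
```

A real index would presumably weight skills and use a finer rating scale. The equal weighting here is only the simplest assumption that makes the idea concrete.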

Now, some of the things that are going to be most highly impacted are entry-level tech jobs or anything that's entry-level knowledge work. Entry-level knowledge work is what generative AI is meant to do, right? If you think about what those things are, very general questions. Okay. So now with Indeed itself, we're actually implementing it in a lot of ways internally. So we have an agentic AI job coaching experience. If you go on Indeed and you use it as a job seeker, you'll actually see on the website now that there's this agentic AI job seeker agent or something like that, that actually helps you go through the process and identify job postings that are better for you. So that's one way. I also just built a chatbot that analyzes data internally for our team, which could bring the cost of doing that down quite a bit.

And ghost jobs, I want to address that because I'm actually about to start a big research project on that. There is evidence in the data that there seem to be ghost jobs, at least in our data. I don't know about anywhere else, and I can't say for sure yet that there are ghost jobs. So a ghost job is a case where an employer or somebody puts out a job announcement, but they don't actually intend to hire for it, right? It appears that we do have some ways of finding out if that's going on in our data. I don't know yet because I haven't gotten into it, so there's not a lot I can say yet. But yeah, it seems that ghost jobs are actually happening.

Tool stack and reporting

So the core stuff that I use is R and Positron now. I think I've made the switch fully over now from RStudio. So R, Positron, AWS. On AWS, I use Athena a lot for our databases, as well as Snowflake. So that's kind of the core of everything. Yeah, there's some other things around it like Asana and all this, but that's the core data science stuff. So yeah, two things. Reporting is going to be primarily Quarto docs. We do a lot of those internally. But I also build some Shiny applications, and we have those on Posit Connect for the most part.

Knowing when to pivot

So the first thing that you have to start with is, what do you actually bring? What do you enjoy? To have a happy career, I actually see it as the triangulation of three things. Something you're good at, that you enjoy doing, that you can sell. Now with being in tech, tech is always changing, but the triangulation of those three things in yourself might not change. So you have to know yourself. What is it that you actually like doing, that you're good at, that you can bring to the labor market? That's step one. And if you find that the role around you is moving in a direction that doesn't correspond well with that, then you have to start asking questions like, do I want to stay in this?

So for me, I always start with explaining to the leadership structure around me, hey, this role is moving in a direction that doesn't correspond well with what I enjoy doing. Here are some things that I have done in the role, and I try to always have some accomplishments kind of on the list. Here are some things that I have done that correspond really well with what I enjoy doing. Look, no leader, no company, no management team ever wants to have employees that hate what they do. Nobody wants that, right? So a good manager, a good leader will take the information you're giving them on board. But remember, they also have to think about the company, the organization, and if it has to go in a direction that's different from what you like, that's just what's got to happen. And so you might work out a solution. My preference would be to stay at the organization when I can, but that's kind of how I go through the decision-making process. If it's time to look elsewhere, it's time to do that. But I just can't see making my life miserable, you know what I mean? It's something that nobody wants. If you're a miserable person, you're not giving your company good work anyway, is my argument here.

Persuading others to use R or Python

So with individuals, if the individual is somebody who is kind of open-minded and is interested in learning stuff, you can actually show them how easy it is to use R or Python by just typing one plus one in the console, right? That usually gets somebody like, oh, what else can I do? With organizations, it depends a lot on the organization and how kind of tech forward they are and all that. So at the BLS, one of the first things that I was challenged to do by the manager of one of our tech teams, he had me write a very short report on the differences between SAS, R, SPSS, and some of these things, like the negatives and positives of all of them. So that was one thing I did. She started to get convinced. At the BLS, the effort was a lot bigger because I actually had to convince many, many individuals and I had to get the people that were already on board to start talking about it more. So it was more about getting community around it. Like, yeah, we're all doing it. And once there was community around it, you know, the leadership couldn't help but pay attention. Like they kind of had to. And then when you told them, hey, it'll save you a bunch of money as well, then they were like, well, maybe we should look.

Descriptive vs. predictive: serving the right audience

Yeah. I understand your question, I think, really well. Here's how I'll answer it. I'm going to talk about my experience at Deloitte. I had two things pulling at me. I had the folks in my Deloitte structure trying to get me to use more machine learning models, but then the director of the program, the TB Depot program, he wanted just inferential statistics like regular... We were using logistic regression and all that. He was happy with that. The way that I actually ended up bridging the gap was I looked at... I just talked to the director for a while and I said, okay, what is it you're actually trying to do with this thing? What I learned was they were actually serving two audiences and he wasn't aware of it. One audience was he wanted to serve researchers who would need more inferential statistics, you want confidence intervals, you want all that, but he was also trying to serve clinicians. A clinician does not care what the confidence interval is around an estimate. They want to know, is my patient going to die? That's their concern. For that, predictive stuff was better.

What I did was I ended up using a lot of stuff like SHAP and LIME and all that to help the director go, okay, yeah, we can use ML, using explainable ML. This is 2019. I helped with that, but I ended up doing both on this project. The way that I did it was I helped the director see, hey, you actually have two audiences. First, you have to understand the problem. It's always listen first, figure out what the real question is, and that's how you can bring the two sides together.
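The two-audience distinction can be made concrete with one model read two ways. The sketch below is not code from the project described above: the intercept, coefficients, standard errors, and feature names are all invented, and it does not use SHAP or LIME. It assumes a logistic regression that has already been fitted, then reports an odds ratio with an approximate 95% Wald interval (the researcher's question) and a predicted probability for a single patient (the clinician's question).

```python
import math

# Hypothetical fitted logistic regression. Every number and feature name
# here is invented for illustration.
intercept = -2.0
coef = {"age_decades": 0.30, "drug_resistant": 1.10}
std_err = {"age_decades": 0.10, "drug_resistant": 0.40}


def odds_ratio_ci(name, z=1.96):
    """Researcher view: odds ratio with an approximate 95% Wald interval."""
    b, se = coef[name], std_err[name]
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


def predicted_risk(patient):
    """Clinician view: predicted probability of the outcome for one patient."""
    logit = intercept + sum(coef[k] * patient[k] for k in coef)
    return 1 / (1 + math.exp(-logit))


or_, lo, hi = odds_ratio_ci("drug_resistant")
risk = predicted_risk({"age_decades": 5, "drug_resistant": 1})

print(f"Odds ratio {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
print(f"Predicted risk for this patient: {risk:.0%}")
```

Both outputs come from the same coefficients. What changes is the question being answered, which is why listening for the real question comes first.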

Featured software

plotnine

Positron

Quarto

RStudio

Shiny