Lessons from a Broad & Varied Data Science Career | Arcenis Rojas | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We'd love to see you!
We were recently joined by Arcenis Rojas, Data Scientist at Indeed, to chat about econometrics, public vs. private sector data science, navigating a varied career trajectory, AI integration in the hiring sphere, and making friends at conferences.
In this Hangout, Arcenis talked about how his career journey has been wide rather than vertically narrow. He shared that this breadth of experience has given him confidence that he can quickly figure out any dataset, and that it taught him how to communicate effectively about data to people at different levels and across various domains. He also shared his tech stack at Indeed, including RStudio, Positron, AWS, Snowflake, Quarto for reporting, Shiny for apps, and Posit Connect for deploying them.
An attendee asked about the impacts of AI on the job search space, and Arcenis shared the AI at Work Report (linked below) from the Indeed Hiring Lab. Based on that research, he says, generative AI is expected to assist many people but replace only small segments of the workforce in the coming 5 to 10 years, and entry-level knowledge work is predicted to be the most highly impacted area.
Resources mentioned in the video and Zoom chat:
Indeed Hiring Lab: AI at Work Report 2025 → https://www.hiringlab.org/2025/09/23/ai-at-work-report-2025-how-genai-is-rewiring-the-dna-of-jobs/
To Explain or to Predict? (Galit Shmueli, 2010) → https://arxiv.org/abs/1101.0891
Announcing the 2025 table and plotnine contests → https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/
If you didn’t join live, one great discussion you missed from the Zoom chat was about the wide variety of data types data scientists work with. Attendees shared that their data included genomics, finance/trading, environmental/natural resources, e-commerce products, and medical/clinical data. What kind of data types do you work with?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co
Thanks for hanging out with us!
Timestamps
00:00 Introduction
06:16 "What do you like to do for fun?"
08:51 "What are the unique aspects of financial and economic data science?"
15:07 "What is econometrics?"
16:02 "Is the difference that stats in the hard sciences is trying to explain what happened, whereas econometrics might be about what might happen in the future?"
19:39 "Suggestions for making data friends and going to a conference alone."
23:26 "Do you see any misconceptions about the job market online, specifically the ATS thing?"
29:52 "How has your varied career trajectory been an advantage or a challenge in data science?"
34:08 "How is the recent hype wave of AI integration manifesting in the hiring sphere?"
40:08 "What are the tools that you use in your job for reporting?"
41:42 "How do you know when it is time to pivot and leave your role because your skills are stagnating?"
45:56 "How would you persuade leadership to use R or Python?"
49:32 "Did you find yourself always trying to use more complex models when simpler ones would serve the audience better?"
Transcript
This transcript was generated automatically and may contain errors.
Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12 p.m. U.S. Eastern Time. If you're not joining us live, you're missing out on the amazing chat that's going on. So find the link in the description where you can add our call to your calendar, and come hang out with the most supportive, friendly, and funny data community you'll ever experience.
I am so excited to introduce our featured leader today, Arcenis Rojas. He's a data scientist at Indeed, and he's also a longtime regular at the Data Science Hangout. If you've been here, you've probably seen Arcenis ask a question. Arcenis, would you like to introduce yourself for me? Tell me what you do and something you like to do for fun.
Sure. Thank you, Libby. So, real quick, off the top: I wasn't nervous about this at all until you just introduced me, so that's kind of fun. Anyway, as you mentioned, I'm a data scientist at Indeed, with Indeed's Hiring Lab. What the Hiring Lab does is try to provide unbiased labor market data and analysis to the public at large, using the very unique dataset we have at Indeed, which represents both the supply side and the demand side of labor. Now, with that said, I'm here just representing me, Arcenis, and none of the opinions or thoughts that I share are necessarily those of Indeed, whatever that fun statement is. You've all heard it.
Arcenis's career journey
There are actually two major chapters in my career story: pre-grad school and post-grad school. For about seven or eight years, I was in finance and did a bunch of other stuff; I actually worked in telecoms for a little while, too, a lot of fun stuff. After being an equities trader for a while, I realized that I liked the research side of trading equities more than the trading itself. That's what prompted me to go to grad school, for a master's in economics. While I was there, I got very deep into econometrics and became known as the data guy in my cohort. I was very good at SPSS and then R.
Now, funny story: I actually learned Python before I learned R, but I drifted away from Python for a lot of the reasons other folks have talked about, like dependency hell. Now that some of those things have been resolved, I'm kind of going, hey, Python's fun again. So anyway, that's grad school and econometrics. Right after that, I went to work for the U.S. government at the Bureau of Labor Statistics (BLS), and I very quickly became an expert on the Consumer Expenditure Surveys program's public-use microdata. I was technically an economist there, but I was doing a lot of work in R and opening up the possibility of using open source technologies at the agency. This was 2015, 2016. I actually ran the R users group for the entire agency for a couple of years, until 2019, when I left to go into data science more directly.
I did that as a contractor for the federal government and as a contractor for private industry. I did it as a consultant with Deloitte, where I was consulting for government and public service organizations. Then I did it at AWS for a little bit. And after all that, I've been at Indeed for a year and a half now, having a ball.
Through all of those things, there are two threads I've noticed in the second half of my career. One is that I've really enjoyed trying to bridge the gap between descriptive statistics, which is what I fell in love with in econometrics, and predictive work: in other words, bridging frequentist, inferential statistics and ML. In the first part of this chapter, I spent a lot of time helping people understand when each one is useful. There was actually a paper that came out in 2010 by Shmueli. I forget the first name, I'll have to look that up, but maybe somebody in the chat can find it.
Yeah. So Shmueli is S-H-M-U-E-L-I, and that paper went into the difference between those two things. I just fell in love with that and did it for a long time. The other thread is that I love learning new technologies and understanding the things that are always coming into the data science space.
What do you like to do for fun? Scuba diving is the main thing these days; when I think fun, that's what I'm thinking of. On the day after tidy dev day, which was the day after PositConf, so the Saturday after PositConf, I went diving in the Georgia Aquarium, in the exhibit I think is called Ocean Voyager. I got to dive with a whale shark, giant mantas, and all that, and it was phenomenal.
Econ vs. finance and what makes econometrics unique
So what's unique about econ and finance? I actually think of econ and finance as two very different domains. For me, the most basic question of economics is: why do people and groups make the trade-offs that they do? What are the incentives that make one person or group choose one thing over another? So it's really about making choices. Finance is more about optimizing for certain outcomes, specifically financial outcomes. And what's unique about econ is that it is, in my opinion, the most statistically rigorous of the social sciences, right? The statistical rigor in economics is really high, especially when you're talking about things like econometrics.
So what makes it unique is a deep understanding of econometrics, as well as of the models that go into economics. They're not the same as what you have in the natural or hard sciences; some folks will say econ is a soft science, whereas the natural sciences are hard sciences. It's really about understanding probability, probability distributions around things, forecasting, all that. You do have those in other domains, but econ overall is really big on forecasting and that kind of thing.
The other thing that I would say is distinct about the social sciences in general is that, while there are obviously some differences, we practitioners like flat tables. We like fairly clear questions and storylines. We try to get things down to three dimensions as much as we can, which isn't true in a lot of the other domains I've worked in. To give a little more context: some of the consulting I did was for the NIH, where I did a lot of work with clinical data and genetic data, as well as demographics, and I also worked with aviation data at the FAA.
At the NIH, for example, to contrast econ with biostats or clinical data, I found many more instances of variables, or features, being multidimensional: a single feature having more than one dimension. Think, for example, of a clinical trial for a particular drug. You might test that drug on one individual three times over the trial to look at the outcomes. To answer a particular question, that set of responses might actually be considered one categorical thing. If you limit the responses to, let's say, three classes, what you're interested in is the set of responses. You're not always going to have a numerical value that comes out as a response; what you're going to have is a class of response that the patient had to a particular drug.
That set of responses is, in and of itself, a feature that you'll see over and over. That's something you don't often get in economics; you're not going to see stuff like that a lot. It was actually an issue I had to work through myself. The data are sometimes stored differently, and they're understood differently by the practitioners. Coming from economics, I saw that and I was like, what's going on, break them up. And then the director of the TB Depot program had to explain it to me. He was like, no, no, this is one thing. And I was like, oh, that's why that's in a cell. Okay.
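A minimal Python sketch of the idea above, assuming long-format trial records: each patient's three visit responses (hypothetical class names `improved`, `stable`, `worse`) are collapsed, in visit order, into a single categorical value, so the set of responses becomes one feature instead of three separate columns. The record layout and the `response_profiles` helper are illustrative assumptions, not how TB Depot actually stores its data.

```python
from collections import Counter

# Hypothetical long-format records: (patient_id, visit_number, response_class)
records = [
    ("p1", 1, "improved"), ("p1", 2, "improved"), ("p1", 3, "stable"),
    ("p2", 2, "stable"), ("p2", 1, "worse"), ("p2", 3, "stable"),
    ("p3", 1, "improved"), ("p3", 2, "improved"), ("p3", 3, "stable"),
]


def response_profiles(rows):
    """Collapse each patient's visit responses, in visit order, into one categorical label."""
    by_patient = {}
    for patient, visit, response in rows:
        by_patient.setdefault(patient, []).append((visit, response))
    return {
        # sorted() orders by visit number, so out-of-order rows still give the same label
        patient: "-".join(response for _, response in sorted(visits))
        for patient, visits in by_patient.items()
    }


profiles = response_profiles(records)
# Each patient now has a single categorical feature instead of three columns
profile_counts = Counter(profiles.values())
print(profiles)
print(profile_counts.most_common())
```

Treating the whole sequence as one category is what lets you count or model "response profiles" directly, which is the view the program director was describing.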
What is econometrics?
Econometrics is a branch of economics focused primarily on the statistical measurement of things. If you take an econometrics class, you're basically taking advanced statistics: this is where you start learning how to do regressions and all the modeling that we do in economics. That's what I mean when I say econometrics. It's the metrics of economics.
So I wouldn't dispute that characterization, but I think the difference is much bigger. The difference I see is that in the natural sciences, you very often have very precise measurements of things: this thing did happen, so I can expect it to happen again under the same set of conditions. Given the same set of conditions, you should always get the same outcome. That's not true in economics. You may not get the same outcome, and so everything is probabilistic. That's why in economics we do a lot more things like Monte Carlo simulation.
Basically, you have what we might call states: a set of inputs that you set to a certain state to start out. A good example might be forecasting sales for a given company. The inputs to your model might be things like time of year, whether you're offering any discounts, stuff like that. Then let's say you have a certain amount of capital that you allocate through the sales team to go out and promote. What you might do is draw probability distributions around all of these inputs, then run simulations that let all of those inputs vary according to their distributions, and look at your outcomes. In the end, you get a distribution of an outcome, right?
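The simulation described above can be sketched in a few lines of Python using only the standard library. Every distribution and number here (baseline sales, the seasonal multiplier, the discount lift, the return on promotional capital) is a hypothetical placeholder; in practice each would be estimated from data or expert judgment. The point is the shape of the method: fix the chosen starting state, draw each uncertain input from its distribution, compute the outcome, repeat, and summarize the resulting distribution.

```python
import random
import statistics


def simulate_sales(n_sims=10_000, promo_capital=20_000, seed=42):
    """Monte Carlo sketch: draw every uncertain input, compute sales, repeat."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    outcomes = []
    for _ in range(n_sims):
        # Hypothetical input distributions (placeholders, not estimates)
        baseline = rng.gauss(100_000, 10_000)            # baseline sales for the period
        seasonal = rng.uniform(0.9, 1.2)                  # time-of-year multiplier
        discount_lift = rng.triangular(1.0, 1.15, 1.05)  # discount lift (low, high, mode)
        promo_return = rng.gauss(1.5, 0.5)                # sales per dollar of promotion
        # promo_capital is the fixed starting state: the amount allocated to the sales team
        sales = baseline * seasonal * discount_lift + promo_return * promo_capital
        outcomes.append(sales)
    return outcomes


outcomes = simulate_sales()
cuts = statistics.quantiles(outcomes, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentiles
print(f"median sales: {statistics.median(outcomes):,.0f}")
print(f"90% interval: {cuts[0]:,.0f} to {cuts[-1]:,.0f}")
```

The result is a distribution rather than a single number: the median and the 5th to 95th percentile range summarize plausible outcomes instead of one deterministic forecast.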
So this, I would say, is the primary difference between statistics in the social sciences and in the natural sciences. In the natural sciences, if I have the same conditions, I should get the same outcome every time, right? And you typically know a whole lot more about the conditions; you can control for them a lot better, typically, though not always. In the social sciences, there's just no way. Humans are too complex. Any system that you throw a human being into automatically becomes a complex system, and that makes it incredibly hard.
Making friends at conferences
So first, the word networking always kind of puts me on edge, because it feels like people talking about other people as nodes. It makes it transactional. I just like to go places and make friends. That's it. I just say hi: how are you doing? Hey, what are you up to? One thing that's true about me, I think, is that I've always got a question loaded. I'm always wondering, what are you saying? What do you mean? I'm always processing what people are saying. Taking that curiosity into every environment makes it pretty easy to get to know people.
And as a teenager, I was super duper shy. Super. A lot of people say they would never believe it now. But I always had this curiosity, and I realized, oh, most people do like it when you ask them about something they're doing; they like to talk about it. So be curious about other people and care about them. Just say hi and ask them whatever you have on your mind about what they're doing, within social norms, of course. Every once in a while someone will give you a really nasty, spiky response or whatever. It's not you, it's them. And that's OK. Move on.
So yeah, about PositConf: I actually found that the people who showed up were some of the nicest people in general, as a group. I was intimidated at first about seeing Hadley and Emil and Max and Julia, and I fanboyed all over Julia, I'm not going to lie. She was so cool about it. She was like, hey, and we talked about some of the work she had done. I've found what you just said, Libby, to be true of the data community: people tend to be very cool and very chill, and everybody does things that are so different from one another that it's normal to be curious and to learn from one another.
Misconceptions about the job market
I can't speak very intelligently to applicant tracking systems (ATS), because I've never been on that side of things. But it is true that some organizations use algorithms, with OCR and things of that nature, to pre-process resumes. That does happen. I know it happens because the Office of Personnel Management (OPM) in the federal government does it, right? And if OPM is doing it, other organizations were doing it 10 years before. So yes, it does happen.
What I see more than anything, when I look at news stories from all kinds of different sources, is a lot of sensationalism. That makes it hard to see the stories about what's actually going on in the data. So the misconceptions I do see tend to come from the sensationalism rather than the data. When people talk about the data that actually exists, I don't hear a lot that I would characterize as misconceptions. If you focus on the data, you're generally going to be okay.
I guess I can talk about that a little bit. I don't think anybody in this Hangout will be surprised that if you don't dig a little more deeply into the numbers, you can very easily be fooled, right? I'm not going to talk about the current environment; I'll take it back a bunch of years to the Reagan administration, just to keep things from getting too hot. I was just a little kid then, so this isn't from memory, but I've looked at the numbers. What happened is that there were increases in the job market all the time. Jobs were going up, it was a strong job market, et cetera, et cetera. When you dig into the numbers, what you see is that two groups (sectors is the wrong word) were increasing a lot while other segments of the job market were going down. The groups seeing increases in job openings were low-skill employment and government contract positions, right? Contracting got really big under Reagan; that was one of the things under that administration.
So those were the groups that increased. If I take that nugget and bring it forward to today, what I would say is: look a little more deeply into the numbers. They're out there. If you're just looking at top-line statistics, which is what most people do, you'll be fooled. You will certainly be fooled.
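To make the point about top-line numbers concrete, here is a toy Python example with made-up job counts (in thousands) for four hypothetical segments. The segment names `other_a` and `other_b` and all the figures are invented for illustration and are not historical data: the total rises, but breaking it down shows the growth is concentrated in two groups while the others shrink.

```python
# Made-up job counts (thousands) for two periods; illustrative only, not historical data
before = {"low_skill": 500, "gov_contract": 200, "other_a": 800, "other_b": 600}
after = {"low_skill": 650, "gov_contract": 320, "other_a": 760, "other_b": 580}


def change_by_segment(start, end):
    """Change in each segment between two periods."""
    return {segment: end[segment] - start[segment] for segment in start}


total_change = sum(after.values()) - sum(before.values())
changes = change_by_segment(before, after)
declining = sorted(segment for segment, delta in changes.items() if delta < 0)

print(f"Top-line change: {total_change:+}")  # the headline: looks like a strong job market
print("Segment changes:", changes)
print("Declining segments:", declining)      # the story the top line hides
```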
The advantages of a broad career trajectory
I'll answer both, and I'll try to be brief. On the advantages side, there's one big one: I'm now really confident that there's no dataset or tech stack I can't figure out pretty quickly. The other advantage is that I've learned how to talk to people at different levels of organizations, as well as in different domains, so the way I communicate about data has changed quite a bit. Those are the two really big advantages.
There are some big disadvantages as well. One is that my career trajectory doesn't show progression the way a lot of people like to see it on a resume: oh, you started out as an individual contributor, then you became a team lead, then you became a manager. No. I've led multiple teams, and that's on my resume, and I've done things with bigger and bigger impact in different situations. But my progress has been more wide than vertical, right?
That can be a disadvantage depending on who you're talking to and what you're trying to accomplish. Another disadvantage, like you just alluded to, is that it gets challenging to talk about what you do, or what you're good at, with anybody who's not in the field. All they see is you going from this thing to that thing to the next thing, and you just seem all over the place. But in my case, I've actually found a very consistent set of interests, things I'm good at, and things I enjoy doing. Explaining them could get a little technical very quickly, though, so, like you, I stay away from going into too much detail about what I do.
AI and the hiring sphere
At a very high level: I helped with a big research project, the AI at Work report, on how we expect generative AI to affect different things. We took a little over 5,000 skills and looked at how well generative AI can perform each of them. Then we took different job titles, you might say, and asked: of all the skills these jobs involve, how many can AI replace or assist with? The main author on this, Annina Hering, created an index for this. The short answer to this part of the question is that AI is going to assist a lot of people in their work in the coming 5 to 10 years, but it will be able to replace only very small segments of the workforce.
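As a rough illustration of the kind of calculation described above, the Python sketch below scores each job by the share of its skills that GenAI could replace or assist with. The skills, ratings, and job titles are all hypothetical, and this simple share is an assumption for illustration only; it is not the Hiring Lab's actual index, which is documented in the AI at Work Report linked in the description.

```python
from collections import Counter

# Hypothetical ratings of how well GenAI might handle each skill (not Indeed's data)
skill_rating = {
    "write SQL queries": "assist",
    "summarize reports": "replace",
    "stakeholder interviews": "none",
    "build dashboards": "assist",
    "negotiate contracts": "none",
}

# Hypothetical mapping from job titles to the skills they require
job_skills = {
    "junior analyst": ["write SQL queries", "summarize reports", "build dashboards"],
    "program manager": ["stakeholder interviews", "negotiate contracts", "summarize reports"],
}


def exposure(skills):
    """Share of a job's skills that GenAI could replace, assist with, or either."""
    counts = Counter(skill_rating[skill] for skill in skills)
    n = len(skills)
    return {
        "replace": counts["replace"] / n,
        "assist": counts["assist"] / n,
        "either": (counts["replace"] + counts["assist"]) / n,
    }


for job, skills in job_skills.items():
    print(job, exposure(skills))
```

Even in this toy version, a job can be highly "exposed" through assistance while having only a small share of skills that could be replaced, which echoes the assist-versus-replace distinction above.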
Now, the areas that will be most highly impacted are entry-level tech jobs, or really anything that's entry-level knowledge work. Entry-level knowledge work is what generative AI is meant to do, right? Think about what those tasks are: very general questions. As for Indeed itself, we're actually implementing AI in a lot of ways internally. We have an agentic AI job coaching experience: if you use Indeed as a job seeker, you'll see on the website now that there's an AI job seeker agent, or something like that, that helps you go through the process and identify job postings that are a better fit for you. That's one way. I also just built a chatbot that analyzes data internally for our team, which could bring the cost of doing that analysis down quite a bit.
And I want to address ghost jobs, because I'm about to start a big research project on them. A ghost job is when an employer or somebody puts out a job announcement but doesn't actually intend to hire for it, right? There is evidence that there seem to be ghost jobs, at least in our data; I don't know about anywhere else. But I can't say for sure yet. It appears we do have some ways of finding out whether that's going on in our data, but I haven't gotten into it yet, so there's not a lot I can say. Still, it does seem that ghost jobs are actually happening.
Tool stack and reporting
The core stuff that I use is R and Positron now; I think I've made the switch fully over from RStudio. So R, Positron, and AWS. On AWS, I use Athena a lot for our databases, and we also use Snowflake. That's the core of everything. There are some other things around it, like Asana, but that's the core data science stuff. For reporting, two things. Reporting is primarily Quarto docs; we do a lot of those internally. But I also build some Shiny applications, and we host those on Posit Connect for the most part.
Knowing when to pivot
The first thing you have to start with is: what do you actually bring? What do you enjoy? I see a happy career as the triangulation of three things: something you're good at, that you enjoy doing, and that you can sell. Tech is always changing, but the triangulation of those three things in yourself might not change. So you have to know yourself. What do you actually like doing, that you're good at, that you can bring to the labor market? That's step one. And if you find that the role around you is moving in a direction that doesn't correspond well with that, then you have to start asking questions like, do I want to stay in this?
For me, I always start by explaining to the leadership structure around me: hey, this role is moving in a direction that doesn't correspond well with what I enjoy doing. I try to always have some accomplishments on the list: here are some things I've done in this role that correspond really well with what I enjoy doing. Look, no leader, no company, no management team ever wants employees who hate what they do. Nobody wants that, right? So a good manager, a good leader, will take the information you're giving them on board. But remember, they also have to think about the company, the organization, and if it has to go in a direction that's different from what you like, that's just what's got to happen. You might work out a solution. My preference is to stay at the organization when I can, but that's how I go through the decision-making process. If it's time to look elsewhere, it's time to do that. I just can't see making my life miserable, you know what I mean? It's something nobody wants. And my argument is that if you're miserable, you're not giving your company good work anyway.
Persuading others to use R or Python
With individuals, if the person is open-minded and interested in learning, you can show them how easy it is to use R or Python by just typing one plus one in the console, right? That usually gets somebody going, oh, what else can I do? With organizations, it depends a lot on the organization and how tech-forward they are. At the BLS, one of the first things I was challenged to do came from the manager of one of our tech teams, who had me write a very short report on the differences between SAS, R, SPSS, and some of these other tools: the positives and negatives of all of them. That was one thing I did, and that manager started to get convinced. But the effort at the BLS was a lot bigger, because I had to convince many, many individuals, and I had to get the people who were already on board to start talking about it more. So it was more about building community around it: yeah, we're all doing it. Once there was community around it, leadership couldn't help but pay attention; they kind of had to. And then when you told them, hey, it'll save you a bunch of money as well, they were like, well, maybe we should look.
Descriptive vs. predictive: serving the right audience
Yeah. I think I understand your question really well. Here's how I'll answer it: I'm going to talk about my experience at Deloitte. I had two things pulling at me. The folks in my Deloitte structure were trying to get me to use more machine learning models, but the director of the program, the TB Depot program, wanted just regular inferential statistics. We were using logistic regression and all that, and he was happy with that. The way I ended up bridging the gap was that I talked to the director for a while and asked, okay, what is it you're actually trying to do with this thing? What I learned was that they were actually serving two audiences, and he wasn't aware of it. One audience was researchers, who need more inferential statistics: you want confidence intervals, you want all that. But he was also trying to serve clinicians. A clinician does not care what the confidence interval around an estimate is. They want to know, is my patient going to die? That's their concern. For that, predictive stuff was better.
What I ended up doing was using a lot of stuff like SHAP and LIME, explainable ML, to help the director go, okay, yeah, we can use ML. This was 2019. I ended up doing both on this project, and the way I did it was by helping the director see: hey, you actually have two audiences. First, you have to understand the problem. It's always listen first, figure out what the real question is, and that's how you can bring the two sides together.
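The two audiences described here map onto two different outputs from the same logistic regression. In the minimal Python sketch below, the intercept, coefficient, and standard error are hypothetical values, not results from the TB Depot project: researchers get an odds ratio with a Wald 95% confidence interval, while clinicians get a predicted probability for a specific patient.

```python
import math

# Hypothetical fitted logistic regression with one binary risk factor
intercept = -2.0
beta, se = 0.8, 0.2  # coefficient and its standard error (made-up values)
z = 1.96             # approximate 97.5th percentile of the standard normal

# Researcher view: odds ratio with a Wald 95% confidence interval
odds_ratio = math.exp(beta)
ci_low, ci_high = math.exp(beta - z * se), math.exp(beta + z * se)


# Clinician view: predicted probability of the outcome for one patient
def predicted_risk(has_risk_factor: bool) -> float:
    logit = intercept + beta * (1 if has_risk_factor else 0)
    return 1 / (1 + math.exp(-logit))


print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"Risk with factor: {predicted_risk(True):.1%}, without: {predicted_risk(False):.1%}")
```

Same model, two framings: the interval answers the researcher's question about the estimate, and the probability answers the clinician's question about the patient.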