Data Science Hangout | Ivonne Carrillo Domínguez, Bixal | Transitioning to data engineering
We were joined by Ivonne Carrillo Domínguez, Data Engineering Manager at Bixal. Ivonne is passionate about storytelling and empowering data professionals to jump to the cloud.

Here's a snippet (55:00) including a few thoughts that both Ivonne and Brittany shared at the hangout regarding some people being overlooked for data engineering roles based on their experience:

When I am hiring people for a data engineer position, I focus more on the programming skills. If you are a good engineer, you will learn Spark or MapReduce in time. You will end up using certain tools, but in my opinion I wouldn't say no to someone because of a specific technology. If you find a good candidate, you can teach them or mentor them in specific frameworks. When I started my first job as a data engineer, I didn't know about big data or Spark. I had a lot of experience with Java, and that's why they hired me. I think knowing how to program is more important, and that's the philosophy I use. I think what they are looking for when they ask about certain tools is whether you have experience with concurrency or parallel jobs. Sometimes you need to think a bit differently with distributed computing frameworks, but if you start playing around and reading about it, it will also help give you an opportunity to land a job.

Sometimes when you're working through a third party like a hiring recruiter, they may not know all of the things that are needed for the role. They're given the job description, and they may not know "I really need someone who has this skill, but if they don't have this particular one, I'm willing to train on that." The recruiter might not know that information, and this may even change depending on how long the role is open. What's best for the candidate in that case is to get as close to the actual hiring manager as possible.
Give them your resume and let them know your experience, because sometimes they'll look at your resume and would absolutely love to give you an interview, where the recruiter might say, "oh, you don't meet X percent of the job description, therefore I'm not going to pass it on." This is one of the disconnects of having a third party do that interfacing.

Where to find more?
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
► Data Science Hangout site: rstudio.com/data-science-hangout
► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout

Follow Us Here:
Website: https://www.rstudio.com
LinkedIn: https://www.linkedin.com/company/rstudio
Twitter: https://twitter.com/rstudio
Transcript
This transcript was generated automatically and may contain errors.
Hi everybody, welcome to the Data Science Hangout. If you're joining for the first time today, I'm Rachel, and it's great to meet you. If this is your first Hangout, this is an open space for the whole data science community to connect and chat about data science leadership, questions you're facing, and what's going on in the world of data science.
The sessions are recorded and shared to YouTube as well as the RStudio Data Science Hangout site, so you can always go back and re-watch or find helpful resources too. We also have a LinkedIn group that Tyler or Hannah will share in just a second here in the chat. If you ever want to continue a certain discussion or meet people from today as well, together we're all dedicated to creating this welcoming environment for everybody.
We love when everybody can participate in these and we can hear from everyone. So there are three ways that you can ask questions today. You can jump in by raising your hand on Zoom and I can call on you. You can put questions into the Zoom chat, and feel free to just put a little star next to your question if you want me to read it instead, if it's maybe loud where you are or something. But we also have a Slido link where you can ask questions anonymously as well.
Just to reiterate, we love to hear from everybody, no matter your level of experience or area of work. And the questions asked today are all up to you.
I'm so excited to be joined by my co-host for today, Ivonne Carrillo Domínguez, Data Engineering Manager at Bixal. Ivonne, I'd love to just start by having you introduce yourself and maybe share a little bit about your work and the company that you work at as well.
Sure. Thank you. Thank you for inviting me to hang out with you. I was feeling a little bit of imposter syndrome because I was like, I'm not a data scientist. I don't know what I can contribute, but I hope this is helpful and productive for you all.
And well, about my role, I am Data Engineering Manager, as Rachel said, and my role is similar to the manager roles that you will see in small companies, because we are a small company and the data team is still small and we are growing it little by little. What that means for me is that I am leading a team and also mentoring them, but on some contracts I still work as an individual contributor as well. So I am a data engineer on one of the contracts, while at the same time leading a team of data analysts, engineers, and scientists, so it's all together. That's, in a nutshell, or at a high level, what I do here at Bixal.
Well, about Bixal, we are a company that does consultancy work for the federal government, so most of our contracts are for the government. There are opportunities as well in the private sector, and I think that's where we are heading right now, but yeah, the kind of work that we do is digital services for learning, for marketing, communications, the technology part, which is the biggest part, where we do content systems and web applications, and then, of course, the data team. Yeah, that's what we do.
Excitement about the data space
Awesome. Thank you. And while we wait for questions to come in from the audience here, I'm curious, what's something that you're most excited about in the data space in the next year or so?
Well, one of the things that I am seeing, mostly in the federal government, where I have more experience, is that some years ago, most of the agencies were at the point where they had a lot of data, but they didn't know what to do with it, or how to ingest it in the cloud, how to analyze it. They just knew that there was some data. Now what I am seeing is that a lot of agencies already have some sort of reporting system, or a dashboard, or a data pipeline, and they are in another phase, and they want to do more things with that. They want to start, like, for real, making decisions using data, so data-driven decisions, and using it also to lead, or to make policy changes. That's something that I really like, and I'm really excited about. I'm also seeing more opportunities for machine learning and predictions. I remember, even two years ago, most of the projects were still at the point where they needed a data pipeline, so now it's like the other stage, so I'm really excited about that.
Getting to know Ivonne
That's great, thank you. I said I was going to start asking this question after last week as well, but I wanted to ask people so we can get to know them a little bit better, but what's something that you like to do in your free time outside of work?
So, what I do, I am, I'm not a super fun person, but well, I have a dog, and I like to hang out with her and walk her. But one of the things that I also like is to cook and to bake, and I think that's something that I was doing before the pandemic, but definitely since the pandemic, that grew exponentially. Another thing about me is that I am vegan, so one of the things that I like is to veganize dishes. For example, I am from Mexico, and Mexican cuisine can be very vast, but many of the dishes have meat, so trying to make them without meat is something that I really like. So that's something that I do, yep.
Balancing individual contributor and manager roles
I see, Tanya, you asked a great question in the chat. Do you want to jump in and ask that live? Yeah, sure, and I'll probably tag team with Libby's question too, because it's kind of all in the same vein. So, you mentioned that you are an individual contributor and a data engineering manager, and I'm just curious, how do you balance your time between these two different roles? I imagine it's kind of going from one mentality of doing to thinking more high level, so I'm curious how you navigate that. And then Libby asks a great question too of which do you like better?
That's interesting. It depends on the day, but yeah, I think the thing that I enjoy the most is to mentor people and to share my knowledge, and since I still need to put my hands on the code and work with a team as an individual contributor, I learn a lot there, so that's something that I try to take advantage of. I mean, in a contract, I will always meet people that know more than me and they do things differently as well, so that's something that I like about that, that I can take and then share that with the rest of the team.
I think at this point, I'm starting to enjoy the manager role more. Right now, the two of them can have stressful times, but the kinds of problems that I have when I am a mentor or a manager, I think I enjoy more; I'm more excited about that. Sometimes the work or the stress is about setting up an environment, and before, I really enjoyed that. Right now, I'm like, no, I don't know if I want to spend my time, or most of my time, on that. I still like it, but one of the things that I get to do as a manager is also design solutions, or solution architecting, and in that role, I need to investigate new technologies, a new tech stack, so I need to set up that environment, but it is a new thing and I am learning and figuring out if that's the best solution or not, so that part I like. When it's as an IC, it's different. Sometimes, if you're on a contract, it's going to be the same tech stack until the contract ends, and as I say, I still like that and I get benefit from it, because I will collaborate with people and I will learn a lot. But if I need to choose one, I will go with the managerial.
Does an engineering manager need a technical background?
I have a follow-up question, if that's okay. So, Elon Musk put out that thing that was like, you have to be an engineer in order to be an engineering manager. What do you think about that? Do you think that having the technical background is really important to your ability to lead these engineers and mentor people, or do you think that you would still be good at it if you didn't have a technical background? I think you will still be good at it.
It helps me, because I can understand some of the problems, and I can also help to be the bridge between the data analysts and the infrastructure engineers, for example, or other software engineers. I have that background and it helps, but I think the most important thing is to be empathetic with people and have listening skills. There are other things when you are doing people management that are maybe more important, and I know, especially in the IT field, there are engineers that want the engineering manager to know more than them, but that's not always going to be possible. I have that background, having spent almost all my career as a software engineer, but now that I am moving towards being a manager, organically, I won't be able to stay in contact with the code, so I won't have that deep-down knowledge. I think we shouldn't give a lot of weight to that, and if you don't know something, that's okay; you can lead that person to the person that will know, or that will mentor them in that specific skill, or get some formal educational training for them. So there are other ways that you can solve that or navigate that.
If your role as an engineering manager includes the solution architecting part, that's different, because maybe for that, yeah, you need that background, but otherwise, no. There are different types of managers; they can be very technical or not. In my case, it is more technical, but I think what makes a great manager is that they have those people management skills, for sure.
Benefits of being a data professional when talking to senior leadership
I can unmute. So I know that we had talked a little bit earlier, Ivonne, about benefits of being a data professional when talking to senior leadership, and I was curious if you could give us a little more information about what you mean by that. Yeah, so I've seen that senior leadership, so the C-level and VPs, like to talk with data professionals because they feel like we are more flexible, or we can give the information faster. I think what happens is that, when we want to explain something, we're very good at using visual information as opposed to just talking. So if I want to present a solution, I know what they want to hear: time, resources, money, and I can put that in a nice chart, because that's what we do all the time, and they can see right away all the information that they need. I think that's one of the benefits of being a data professional: if you need to share some information, or to inform or lead some people to some decision, you can do that with a dashboard or a chart. It's not that you don't need to talk. Obviously, you will need to explain things, but even then, whenever they ask questions, the information is there, so you can just point to that, like, okay, this is the money that you're going to save, or this is the time that we could save as well.
So yeah, that's something that I'm seeing, and it's not just me; I see that with other people on my team as well. Whenever they want to talk with other teams, they always prepare a dashboard very quickly, or just a workflow, for example. When we talk with infrastructure engineers as well, we have the workflow. Maybe we won't know exactly the services that we want, but we can say, I need some storage here, I need to run some R or Python script here, I need other storage here, and I need some sort of database here, and they can know or suggest, and we can work with them more easily. That kind of collaboration is always easier when you have those visuals, and I think that's something that we do very fast, since sharing information with visuals is what we do. That's something that we leverage when contributing or collaborating with other teams.
Planning the workday
I see there was a question on Slido a bit earlier that was, when you get to your desk in the morning and start your workday, what's the first thing you think about? Well, the meetings, the meetings that I have throughout the day. Sometimes I have very nice days when I just have one meeting and that's all, but yeah, that's what I do to plan my day. For the hours that I don't have any meetings, sometimes I block that time, like, this is my focus time and I need to work. So that's one of the things that I do to plan my day, meetings versus focus time. But also, if I have a pending thing with one of my team members, I know that they are working on something, maybe somebody asked them for an estimation, or they need to do something and I know that it's time sensitive, so I plan some time just to check in and see if they need some support. And if not, yeah, all day is going to be just focus time.
Navigating tool constraints in government work
Thank you. I see, Justin, you had asked a question, or it sounds like someone was about to jump in. Do you want to jump in with that question, Justin? Yeah, sure. Hi, everyone. Rachel, thanks for putting these together. These are great. This is my first time joining, so thanks again. I'm in the city of Baltimore, and my question is basically around, you know, a lot of folks who get into data science are really eager to use kind of new tools, machine learning, and I expect this happens in industry too, but particularly in government, we get kind of constrained about what we can actually use. It has to be either sanctioned by IT or at least needs to meet certain guidelines and parameters, and not everyone knows how to use those tools. So how do you manage those expectations for folks about, you know, yeah, that looks like a great tool, but these three things over here are the ones that our company uses, or that the government uses, to get our work done? Do you run into those kinds of frustrations, or yeah, how do you manage that?
Yeah, so there are different reasons why that will happen. The first one is just budget: maybe they already have a license, let's say a Tableau license, but we see that we could do it with Power BI because they use SharePoint, for example, but they don't have that license, and it's just budget. Getting approval for tools, and yeah, money, can take time, so that's something that we need to adapt to; that's the tool that they have. If we see that we can get the approval in time, we will take the risk. Otherwise, we just go with the tools that they have approved, and yeah, just work around that.
And another part is what you were mentioning, that they already have some tools that they know, and they are used to that, and it's hard to make them make the move. Maybe some people really want to change, but they know that some users will have some resistance to that change, and not everybody is on the technology side. They are not very tech savvy, so we need to understand that part as well, that they have a job that is not related to technology, so it's going to be hard to move a whole group of people to another tool. So it depends on the audience as well, on how big the audience is. One of the things that we do as well is precisely learning services, so we can propose to prepare some trainings and some documentation that they can use, so we can use another tool. But the reason is going to be, or it should always be, because it is cost-effective: we are going to do it faster, or it's going to be easier to maintain.
Sometimes, they are proposing things that will require a whole team on their side just to maintain that infrastructure or software, and we can propose something that is going to be easier to maintain, so at the end of the day, it's more cost-effective. Most people will like that; if it is going to save them time and money, they are always going to like that. But then there's the other part, that they need some training or education, and that's something that we can also mitigate by providing those services as well. And there are people that are just afraid to change, because they feel that it's going to take more time to change all this, and that, at the end of the day, is money, and maybe they don't have that. But if we really do the due diligence on that discovery and say, okay, at the beginning, yes, it's going to take more time, but then you are going to save all this, that's something that we could do. And sometimes it's just a no from their side.
Maybe they didn't get the approval, and in that case, we just need to tell them what is going to happen: okay, we are going to use your tools and your methodologies, but we are not going to be able to deliver all the things that you want, so with those restrictions, this is what we can deliver. And we explain to them why, like, we cannot optimize or modernize your system if we have those limits; we can just do these three things, for example. Most clients will understand that, because they know that there's not much they can do as well.
It's a really helpful phrase, too. With those restrictions, this is what we can deliver. It looks like in the chat there are a lot of people that are experiencing this as well and are saying they feel your pain, Justin, so I'd love to stay on this topic a little bit longer if other people have gone through this and have other things to share as well.
Not to call on anyone, but I know, David, you mentioned that you've had to do this, and Evan. Sorry, Rachel, I was trying to multitask here while you were asking that question, but yeah, definitely. For those who don't know, I work at NASA, and we've been trying to implement a lot of different tools, and it's been a long, arduous journey over the last three or four years just trying to get an infrastructure set up, so it's definitely not an easy task within the government at times, and I'm sure industry is the same way. But we've been able to do a lot of things. There's always a group that wants to use no-code, low-code type tools and just buy from a vendor, which for some of the organizations in our agency that have the funding, maybe that's easy to do, because they can spend that kind of money on a tool such as Alteryx, which costs us $5,000 a user per year. But my organization, with little funds, we need to be able to do things with coding, so it's definitely just a matter of keeping the conversation going and showing the value of open-source technologies and other resources rather than some of these no-code, low-code type tools. It's not easy, but it can be done.
I see, Jamie, you have your hand raised. You want to jump in? Sure, yeah, just a brief note, two minutes to add to that. Also coming from a public setting, I'm in Toronto with a hospital group. We've recently been exploring some more container-based tech stacks, largely for this reason, to give sort of the flexibility that some of our engineers and scientists may want, although at other expenses too; if everyone's using their own language and own frameworks, it can become unmaintainable as well. Another quick note: I've been kind of surprised, in my own hiring, by how much a tech stack and a job description can influence what kind of candidates you receive. I was trying really hard to hire a developer at one point, and just changing the tech stack to something that was more appealing to industry at the time completely changed what kind of candidates we had.
That's a very good point. Hiring IT people is hard, and when you are in the public sector, it's just harder, for sure.
Hardest recurring problems in data work
I see there are a few other questions coming through on Slido as well. And Ivonne, what are some of the hardest problems you find yourself trying to solve over and over again, where it feels like we don't have any good solutions available yet?
Oh, that's an interesting question. I don't know; something that I've seen a lot is that a lot of agencies needed that data pipeline so they could consolidate all the information in one place, and while I know that a lot of agencies have moved on to the next stage, that's something that was very repetitive. I think the challenge was exactly what we were talking about: how you can build this data pipeline with those limitations. So, for example, sometimes the client doesn't want to go to the cloud, so you need to work with that, but they want it somehow to be optimized, and maybe not automated, but the process needs to be optimized and some of the parts automated. That was very challenging, and it's not an isolated case; that happens a lot. But at the end of the day, there is a solution for that, we can just work around it.
Dealing with technical debt
Thank you very much. I see, Sam, you asked a question in the chat, I think it has a star there, so I'll read it: have you worked on projects where you inherited technical debt, and how have you tackled that to obtain a more future-proof solution?
Yeah, sometimes there are projects that are just too good to pass on, but somebody else started it, so you are inheriting it, and then when you see the code, you are going to see a lot of to-dos in there, and it's like, okay, good, it makes sense. Also, we have done work that needed to be done from scratch, and given the circumstances, there are some unknowns, so I understand that part. But yeah, I think the challenge sometimes is to know why they made certain decisions, why they built this tool, why those to-dos are there. Sometimes, they don't make sense with the final infrastructure or the whole solution, and yeah, I guess you just tackle them one by one, because there's always going to be that technical debt.
I think one of the contracts that I remember was very successful at that, because we had some overlap with the previous contractor, so they were able to tell us exactly what tech debt was there: these are the things that we didn't have time to do, or that, because of priorities, we never got to. We were able to put all of those in tickets and then prioritize them. From what they shared, this is what we need to work on right now before we jump into the actual work, and the rest is going to be just little by little. Some of those tickets are not going to apply anymore sometimes, because maybe we changed to another technology that we found was more cost-effective. Sometimes, we are inheriting projects that are maybe not super old, but yeah, maybe ten years, and ten years ago the technology was very different, so there will obviously be more, or better, tools that we can just change to, so some of the tech debt is not going to apply anymore. But I think if we have the opportunity to ask the previous contractor, we will take it and then prioritize. If not, then at the beginning, when we try to host or replicate the infrastructure, is when we are going to see the biggest things that we need to tackle. The hidden ones are those to-dos in the code, because you don't see them right away, but one thing that you could do is just try to find them. Just look at all the to-dos in the code and see if you can identify the ones that are really big or important. In some cases, maybe most of them, you're not going to be able to tackle all of them; sometimes one is really huge, but yeah, I guess just prioritize and focus on the big ones.
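One lightweight way to surface those hidden to-dos is a small script that walks the inherited codebase and collects every TODO or FIXME comment with its location. This is a sketch in Python; the file extensions and comment markers are assumptions, so adjust them to whatever the repo actually uses:

```python
import os
import re

def find_todos(root, exts=(".py", ".R", ".sql")):
    """Walk a source tree and collect TODO/FIXME comments with their locations."""
    pattern = re.compile(r"(TODO|FIXME)[:\s](.*)")
    found = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(exts):
                continue  # skip files outside the source extensions we care about
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, line in enumerate(f, start=1):
                    match = pattern.search(line)
                    if match:
                        found.append((path, lineno, match.group(2).strip()))
    return found
```

Running something like this over an inherited repo gives you a quick list to turn into tickets and prioritize, alongside the bigger debt that surfaces when you replicate the infrastructure.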
Ivonne's journey from generalist to data engineer
Thank you. I'd love to hear a little bit more about your journey as well, and your jump from generalist software engineer to specialized data engineer, and what that was like.
Yeah, so I graduated as a computer systems engineer, and at the beginning, I didn't know what to do. I didn't know what the industry was offering to begin with, so I was just saying yes to any offer, like whatever life was throwing at me, and so I had the opportunity to have very different jobs. At the beginning, I was a front-end developer, migrating an application. I will sound very old, but yeah, that was a standalone application on a mainframe, with green screens, and I was migrating that to the web. So I learned some web development that later on didn't help me a lot, because I was using technology that is not used anymore, but I was using Java, and Java really helped me out later. From that, I jumped to being a tester, doing software validation for aircraft engines, so a whole new thing, and then I jumped into the retail sector doing back-end work for an operating system. There, I was using Java as well, and some C or C++, so it was more like low-level programming, and that helped me a lot with my programming skills. Because it was a very low level, like programming the signals that the keyboard sends to the operating system, things like that, you need to use a lot of linked lists, arrays, vectors, all those data structures that I thought, I'm not going to use this when I work. Well, I was using all of those, and that helped me with my coding skills.
Then, all of that was in Mexico, and I also was a data engineer in Mexico; that was my last job there. Then I moved to the United States, and at that point, I knew I wanted to work with data, because I really liked that. It wasn't as low-level as the operating system, but I still got to use those coding skills to parse data, so I really liked that. But when I was here, I knew that I needed to take any opportunity that I had, and I learned some web development frameworks, like Ruby on Rails, Python with Django, PHP with Drupal, and Go, different stacks there. So I felt that I was moving around. I was also on the infrastructure team for a while, so I was learning a lot, and that was helping me get put onto contracts, because I was able to do a lot of stuff. Maybe I wasn't very knowledgeable in anything, but I knew a lot; it was very wide, not deep. But I really wanted to, one, go back to data, and two, start specializing in something. I felt like, okay, I enjoyed learning all those skills, but I was seeing other people that really knew their areas deeply, and I wanted to be like that. To be on data contracts, because I wanted to keep working for the federal government, you need some clearances, especially when you are seeing data, and for that, I needed to be a US resident. So, finally, I became one, and I was able to move to the data field again, and that's when I started specializing more. And yeah, I really like that, because there's just too much to learn across all these fields, like infrastructure, front-end, back-end, and it can get crazy sometimes, and just in data there's a lot to learn as well, but at least I'm more focused now on what I want to do, or where I want to go.
Roadmap to data science
I see, Victor, you had asked a question in the chat. It was, what should be the roadmap to data science based on your experience? And Victor, I'm curious what perspective that's coming from, like just starting to learn data science, or which role you're in now?
I don't know if I can really answer that, but I will tell any person that wants to go into data science: well, first, find a mentor that can really help you or guide you in that part. But as a data engineer, I will encourage you to use cloud tools to learn that part. I know that data engineering may not be the career path that you want, but you will always need some data engineering skills, especially if you want to show something or prototype something, and you can do that yourself. Just small projects; you don't have to start with big data. Obviously, if the scope of the project is huge, you will need a lot of teams, like data engineers and infrastructure engineers, but just knowing what all the teams do, I think that's going to be very helpful when you want to collaborate with others. So it will help you to collaborate, and also to do your research or analysis yourself more easily, because you won't depend on anyone. You can start using some tools; for example, if you want to use AWS, you can start using S3 for storage and maybe start playing around with Athena or with Glue for transforming the data. Then, when you want to show a proof of concept, maybe someone can take that and scale it. And yeah, I think as a data scientist, you still need to clean your data anyway, so that's a part I can talk about. I just really encourage you to grow your data engineering skills as well. That will make you a more solid data scientist. Maybe at the beginning, you need to focus on other skills, but if you want to jump into the industry, you will always need to do something at a small scale.
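As a sketch of what such a small self-contained project might look like, here is the extract-transform-load pattern in plain Python. The column names and data are invented for illustration; on AWS, the extract and load steps would roughly map to S3 objects, and the transform to something like a Glue job or an Athena query:

```python
import csv
import io

def extract(raw_csv):
    """Extract: read raw rows from a CSV source (here a string; on AWS, an S3 object)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    """Transform: clean the data -- drop incomplete rows, normalize text and types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip rows with a missing amount
        cleaned.append({"region": row["region"].strip().lower(),
                        "amount": float(row["amount"])})
    return cleaned

def load(rows):
    """Load: aggregate into the shape an analyst would query downstream."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

# Invented sample data: messy capitalization, stray whitespace, a missing value.
raw = "region,amount\nEast ,10.5\nWest,\neast,4.5\n"
print(load(transform(extract(raw))))  # {'east': 15.0}
```

The point is not the code itself but the shape: ingest, clean, aggregate. Doing this at a small scale by hand is exactly the experience that later transfers to the cloud tooling.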
Learning new programming languages
Yeah, for sure. Hi, everybody. My question is around the languages. You talked a lot about a variety of languages that you've worked with, and languages are similar and dissimilar in different ways. Do you have a framework when jumping between the different languages, and if so, what is the first thing that you look at when you're looking at a new language that you're going to use, or a new framework within AWS, for example? What are you looking for?
Yeah, that's a really good question. When I need to learn a new language, the first thing I look at is the data types. They can change slightly between languages, but that's the first thing you work with, variables. If you have text, is it called a string or something else? If you have a boolean, is it bool or boolean? Just know the data types, and then the control flow, the syntax for those flow statements: if, for, while, switch/case, and so on. Those also change between languages, with slight differences. In one it's for, then the name of your temporary variable, then a colon and the list; in another it's between parentheses, or written the other way around. Just knowing that lets you start. At the beginning my code is super basic, it has a lot of ifs and fors and maybe that's all, but little by little I learn ways of optimizing the code. That's the very, very first thing I do or try to learn. One of the other things you need to do, especially in data, is open a file: open a CSV, open an Excel file, open a JSON file. That's what I'll do next. How am I going to read a JSON file? What is the code I will use? In Python, for example, you would do it with pandas; it has a very easy function for that. And that brings me to another thing.
Sometimes the language is not specialized for handling data. Compare Python with R: R is specifically built for statistics and data, so it's very easy to use for that, but Python wasn't created for data. So that's another thing you will need to find out: what are the libraries used for data analysis? If it's Python, you'll say, okay, there's NumPy, there's pandas, and then look at the documentation and what you can do with the data. That would maybe be the last thing I do, and then it's just practice, just practice and trying to get better at it.
Tech stack and data visualization preferences
Thank you. I see there are a few Slido questions that touch upon a similar idea, and one was asking about what your tech stack looks like, and then the other is around preferences for data visualization and why.
Sure. For the tech stack, it depends on the contract I'm working on, so I don't have one specific stack. At Bixal, we try to push for AWS, and on the infrastructure side we've built an environment we can reproduce using Terraform, which is why we like it. If we know we're going to need some extra buckets, maybe a way to transform the data with AWS Glue or Lambda, and then another storage layer, we can deploy that infrastructure very quickly because we already have it in Terraform. It also meets the requirements some federal contracts have, security compliance like FedRAMP, for example. So if we're free to use any tech stack, we're going to use AWS for sure. For the ETL, the extract, transform, load process, we tend to use Python, just because it's easier to deploy to AWS and it's always easier to find people who know Python. At some point we were leaning more into R, but it was hard to find people, and especially two years ago it was harder to use in the cloud. Right now it's getting better.
For data visualization tools, well, I like Shiny. I've worked with Shiny in RStudio, and it's just very fun to work with, making your charts and visualizations prettier and interactive. I really enjoyed working with it, so I recommend it, because it's fun, because I really like it, and you can customize it as much as you need. Sometimes when you're using a tool like Power BI or Tableau and the customer says, oh, I want my branding and my theming, it's not as easy to customize it as much as the client wants. But it also depends on the time and resources you have. If you have limited time, sometimes Tableau is the easiest way to go, and it's also the tool clients know best. If we could do it our way and had unlimited time, we would have to decide between doing it with Shiny or maybe a React application; it also depends on who is going to build it. But my favorites would be Shiny and Tableau.
Sharing data internally and externally
Thank you very much. One other follow-up question someone asked was around sharing APIs inside and outside of the organization. Is there a certain tool that you use for that? I'm not sure I understand the question.
Let me double-check. AJ, I see you asked that. Do you want to jump in? Yeah, so a lot of times you have great tables, logic built into tables, and your suppliers or people cross-functionally want to get their hands on that data, but they're huge files, right? They're tables that get refreshed every so often. So is there a way you prefer sharing those big data files internally or externally, like through API tools? I don't know if AWS has that technology.
Yeah, that's also a good question. We've been on projects where it's easy to just push that data to GitHub or GitLab, but that's not really best practice. Maybe that's something you can do in the prototyping phase, but if it is private data that you cannot share, you should never put it on GitHub. We have used two options. One is to upload the data to S3 and give access to the people who need it. S3 is a good tool because it is reliable and scalable, so you can upload as much data as you want, and you can also restrict the data depending on what people can see: you assign roles to people, and they will only see what they need to see. So that's a good solution.
Another one we have used, when less technical people are going to be participating, is SharePoint, because we use a lot of Microsoft tools, Teams and Outlook, so a lot of people within Bixal are already familiar with SharePoint, and you can also restrict the data there. I think we'd prefer S3, but depending on who we're going to be collaborating with, maybe we'll use SharePoint. Then, when you're setting up your environment, you know where to go to download the data. If it's S3, you can just call it from your code: if you have access, you just need your AWS keys and you can download the data programmatically. If it's SharePoint, I think there's a way to do it too; we haven't used it, but that's something to explore. We also use a lot of web applications. We use Drupal, a PHP framework for content management, and if we need to download data from there, or use content that lives there, and the web application is within the company, so Bixal developers are maintaining it, we can ask them to create an API for us. Drupal has the ability to enable an API, and then we can use it, or we can request exactly the data we need and they will implement that endpoint for us. So those are the three options we use when we need to share information internally.
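The "just call it from your code" option for S3 can be sketched as follows. This is a hedged example, assuming the boto3 AWS SDK is installed and your AWS keys are configured locally; the bucket and key names are hypothetical:

```python
def download_shared_file(bucket: str, key: str, dest: str) -> None:
    """Download one shared object from S3 to a local path.

    Assumes AWS credentials are configured (environment variables or
    ~/.aws/credentials) and that your role grants read access to the bucket.
    """
    import boto3  # AWS SDK for Python, assumed installed

    s3 = boto3.client("s3")
    s3.download_file(bucket, key, dest)

# Hypothetical usage (needs real credentials and a real bucket):
# download_shared_file("team-data-bucket", "exports/latest.csv", "latest.csv")
```

Access restriction then lives entirely in IAM: whoever lacks the right role simply gets a permission error from the same call.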
Data engineering experience requirements when hiring
And Jorge, I see you asked a question in the Zoom chat around data engineering positions. Do you want to jump in and ask it? Yeah, sure. So the question is this: when I was transitioning from developer to working in machine learning, I came across a lot of positions in data engineering, and oftentimes, as soon as a recruiter heard that I didn't have experience with Spark or any of the other distributed computing systems, MapReduce, all of those, I had only covered them in grad school and hadn't used them in any job, they just immediately lost interest. They said, no, we need people with experience in those, and that was it, right? So, do you find experience in those frameworks necessary to be a data engineer?
So, you will learn that. When I am hiring people for a data engineer position, I focus more on the programming skills. If you are a good engineer, you will learn Spark or MapReduce in no time. Seriously, I wouldn't want to lose a good engineer because of one skill. It is necessary in the sense that you will end up using those tools, but in my opinion, I wouldn't say no to a person I know is good when I can teach them or mentor them in those frameworks. When I started my first job as a data engineer, I didn't know any of those. I didn't know what big data was, or MapReduce, or Spark. I had a lot of experience with Java, and that's why they hired me: they thought, okay, she knows Java, she knows how to program, and that's more important. I agree with that, and it's the philosophy I still follow to this day. Programming skills are the basics; you really need to know how to write code, and if it's a programming language that is more common in data, like Java or Scala, Python or R, that's going to be better. If your programming language is more like PHP or Ruby, those are good. I