Data Science Hangout | Javier Orraca-Deatcu, Centene | Excel to data science to lead ML engineer
We were joined by Javier Orraca-Deatcu, Lead Machine Learning Engineer at Centene. Among many topics covered, Javier shared how his background in finance and consulting led to his interest in data science as a way to automate some of his work, and how he helped bring other data scientists together in his organization.

(26:31) How did you organize and recruit people for the data science community group at Centene?

I sort of piggybacked off a general data science community chat that we had at the company. There were several hundred people on it, of varying backgrounds and expertise levels, so there was a lot of conversation happening. There was already a Python group that was meeting, I think every other month. So three weeks after I started, I got really excited about the possibility of creating something similar for R users.

1. It started by trying to figure out who owned that already existing data science chat and seeing if they could help support the idea of creating an R user group, something to meet once a month or once every two months. At larger companies especially, getting that type of top-level executive stamp of approval and support can go a long way, especially if that individual is part of the already existing IT or data science function.
2. At the time, I created a blogdown site. For those of you who are familiar with R Markdown, blogdown is a package that allows you to create static websites and blogs with R Markdown. Now with Quarto you can do the same thing and create websites. I love the syntax of Quarto.
3. We had partnerships with Posit, so we were able to get some people to come in and do workshops as well.
4. We also had reticulate sessions: co-branded Python and R workshops where we looked at ways teams working in different languages could communicate a lot more easily. I had a great experience with it.
Everyone was so collaborative, and it was such a great way to see the excitement around what you could do with both R and Python. What started as 13 users the first month jumped to about 100 to 125 monthly users on this monthly meetup.

(49:10) And on the journey to machine learning engineer, what was the hardest part?

Because of SQL, I had a really good understanding of at least how tabular data could be joined and the different transformations that could be done to these data objects. I think I would have really struggled without that basic understanding. But having said that, the part where I really struggled at first was function writing. Function writing was not intuitive to me. Basic function writing was, but in general I found it to be very complicated, and it took a solid three to six months of practice to feel actually comfortable with it.

Even when I started building Shiny apps: basic Shiny is quite easy, but large functions underpin the entirety of a Shiny app. Everything you do within Shiny is effectively writing functions. The process of learning Shiny and becoming more comfortable with it was very difficult and something that just took a lot of repetitions, but it all played together. While people may think of Shiny as more of a frontend system, it made me a much better programmer in the way I thought about functions and function writing.

Another thing that I found hard, and looking back I'm sort of embarrassed to say this, was reproducibility of machine learning: being able to rerun a code set and get the exact same predictions every time. I wasn't quite sure why this wasn't working, or how to fix the randomness, setting a seed or whatever you need to do to ensure that someone else downstream could replicate your study or analysis and get the exact same findings themselves.
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu Follow Us Here: Website: https://www.posit.co LinkedIn: https://www.linkedin.com/company/posit-software Twitter: https://twitter.com/posit_pbc To join future data science hangouts, add to your calendar here: pos.it/dsh (All are welcome! We'd love to see you!)
Transcript
This transcript was generated automatically and may contain errors.
Hi, everybody, welcome back to the Data Science Hangout. Hope everyone's having a great week. For anybody joining us for their very first time today, welcome. This is actually our second to last Data Science Hangout for 2022. Thank you to everybody who's been with us this whole year as well.
This is an open space for the whole data science community to connect and chat about data science leadership, questions you're facing, and getting to learn about what's going on in the world of data science, with a view into different industries and use cases. We share the recordings of these sessions to our Posit YouTube, so you can always go back and rewatch or find helpful resources. And sorry, I'm three weeks behind on those uploads, but I will catch up and update them this week.
Together, we're all dedicated to creating a welcoming environment for everybody. No matter what industry or background or experience you have, we want to hear from everybody. So there's always three ways that you can ask questions. And also to provide your own perspective, it doesn't have to be just a question. You can jump in by raising your hand on Zoom, and I'll be on the lookout there. You can put questions into the Zoom chat. And feel free to put a little star next to it if you want me to read it out loud instead. Maybe you're in a coffee shop or something. And then third, we also have a Slido link where you can ask questions anonymously.
I will mention too, if you want to connect with people after the fact, we do have our LinkedIn group for the Hangout. I know right now, not too much conversation goes on in there, but we'd love to create that space where you can easily find each other. You do have to manually turn on notifications in that group, so that might be part of it.
But I am so happy to be joined by my co-host for today and our friend from the Data Science Hangout, Javier Orraca-Deatcu, Lead Machine Learning Engineer at Centene now. Congrats on the new role. Javier, I'd love to have you reintroduce yourself, because I know you've met a lot of us from past weeks as well, and share a little bit about your role and the company now, and maybe something you like to do in your free time too.
Thanks, Rachel. Yeah, it's great to be here. I've told many people this, and I really do mean it: this has been my favorite standing meeting for the last year and a half or so. You've been doing an awesome job, Rachel, and I love this kind of platform for knowledge sharing.
But yeah, my name is Javier Orraca-Deatcu. I've been around the finance, corporate finance, and data science world for 15-plus years. I spent a lot of my career in consulting, doing different types of financial modeling around valuation work and different types of economic studies, like economic obsolescence studies, functional obsolescence studies, and some tax optimization work. And in those days, I was mostly using Excel.
We would do some SQL work, but it was pretty minimal. Whenever we were doing that, I was actually using Microsoft Access. I feel like that's not really a tool used much nowadays, but at least at the time, when we were working with hundreds of thousands or millions of records, it was much easier to do the same types of group-bys and summations that we were doing in Excel using Access for the bigger data.
So I decided to go back to grad school. I really wanted to get into data science. It was kind of a buzzword; I was reading all about predictive analytics without really knowing what it was or what it meant, but it excited me. I wanted to really get my hands on that and understand how I could take the forecasting and modeling skills I had to the next level, so that I could start automating some of my work and some of the reporting I was doing.
So I went back to grad school, and yeah, when I got out of grad school, I joined Centene. I was with them for about two years. I quit to go work for an e-commerce startup for about a year, and I just recently returned to Centene. I'm on week two in a new org, so bear with me if I can't answer all the questions about my work, but yeah, here at Centene, I'm a lead machine learning engineer. I'm part of a scrum team that is a joint effort with different data scientists and data engineers, and we're partnering with our business stakeholders and really taking these, like, high ROI, you know, predictive modeling concepts and putting them into production.
Yeah, something I like to do for fun, I love playing board games. I feel like I spend way too much of my free time reading about developments in the R world or, you know, Python developments. I mean, I enjoy it. A lot of times I'm just reading about the stuff, not really even, you know, applying it through code or anything, just trying to keep up with all the trends that are happening. And yeah, that's a little bit about me.
I have to ask, what's your favorite board game?
In terms of being able to explain it to friends or play it quickly, I love Splendor. It's a card game, two to four players. Especially if the people you're playing it with have had a few repetitions, it's a really fun, quick game.
When is machine learning appropriate?
Somebody asked anonymously, when is machine learning appropriate or necessary?
Okay, so I come from a heavy Excel background. So I will sort of caveat my response by saying, you know, my views might be a little more nuanced than someone maybe coming from like a CS background or someone, especially nowadays, I mean, I feel like people graduating from undergrad and going into master's programs are like diving straight into machine learning, which is awesome.
But I will say there is a lot of reporting, KPI creation, and KPI evaluation that doesn't require machine learning, and that is also high value, because it helps the business track whether or not some product or program is successful. Machine learning itself can be used for inference or prediction, and I feel like both sides of that equation add a lot of value for different reasons.
When it gets to prediction, I'd say this is where you really want to make an intervention of some sort, or you want to focus on a subset of your overall customers or consumers and either give value to them or help the business, maybe by mitigating the risk of churn. There are just a lot of reasons. I'm not giving a great answer here, and I'm happy to give more detail; if you're willing to submit another anonymous question or something, I'm happy to dive further into that.
Excel, VBA, and moving to open source tools
Bill asks, how big is your team at your new job and maybe break down by roles? Or do people still try to develop new or run old models using Excel?
So I asked because I work at a biotech company, and some guy was giving a presentation of this fancy model about drug pricing prediction and all this kind of stuff. And I said, oh, well, do you use SAS's add-on product that costs mega dollars, or what's this done in? And he said VBA.
So I mean, that's a really good question. And again, my Excel experience is nuanced. I would say 90% of people using Excel are using it as a big calculator: everything ad hoc, just sort of thrown together. Maybe they're using pivot tables, but there's no real modeling workflow to their Excel work.
Then you have maybe 10% of people that are actually using Excel in a more advanced way, where they have an inputs and assumptions tab, which is just straight-up text explaining the purpose of the model and what's going to happen in the subsequent tabs. And you kind of go from your records, where maybe one tab is just straight tabular data, names or identifiers on the left and a bunch of different calculations on the right, and then, going left to right, you get to summaries that are actually consumable for leaders in the business.
And then there's probably 0.1% of Excel users that are using it in a pretty advanced way, whether it's logistic regressions or different types of linear regressions. You can do a lot of Monte Carlo simulations; there's a lot you can do. When you start feeling like the GUI itself isn't sufficient, VBA, or Visual Basic for Applications, is a programming language that plays hand in hand with Excel, and you can actually do a lot with it: automating the workflow, the sequence of calculations, group-bys, whatever. I will say that is not typical of Excel users.
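For readers coming from Excel, the Monte Carlo idea mentioned above translates directly to an open source language. Here is a minimal, hypothetical sketch in Python; the revenue model, its distributions, and all the numbers are made up purely for illustration:

```python
import random
import statistics

def simulate_revenue(n_sims=10_000, seed=42):
    """Monte Carlo sketch: revenue = units * price, both uncertain.

    A seeded generator keeps the simulation reproducible.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_sims):
        units = rng.gauss(mu=1_000, sigma=150)  # uncertain demand (illustrative)
        price = rng.uniform(9.0, 11.0)          # uncertain price (illustrative)
        outcomes.append(units * price)
    outcomes.sort()
    return {
        "mean": statistics.mean(outcomes),
        "p5": outcomes[int(0.05 * n_sims)],    # 5th percentile outcome
        "p95": outcomes[int(0.95 * n_sims)],   # 95th percentile outcome
    }

result = simulate_revenue()
print(result)
```

The same kind of simulation is possible in VBA, but in Python or R the distribution functions, summary statistics, and plotting libraries come ready-made.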
It's also very limiting, because with open source programming languages, not only the languages themselves but all the libraries available for them, there's just so much development happening on a daily basis. And for what you described, while it could potentially be done in VBA, if you wanted to scale something that is consumable for the business organization, or for other data analysts or business analysts at your company, you probably want to take the results of those findings and push them into some data warehouse, something that, again, is consumable to a larger organization.
And in order to do that, you need to figure out: OK, where is this code going to live? When is it going to be refreshed? Is the model going to be retrained, and how? What systems are going to help us take this overall logic and put it into a production setting, so that a data analyst or a business leader can access a table on a data warehouse, see the results they want very quickly, and know the results are updated at any point in time?
Centene's data science culture and community
So there are 70-plus thousand people at Centene. It's a massive company; I think they're like a Fortune 26 company. You really don't know the name because, by design, compared to the other health insurance carriers, they wanted to be kind of a portfolio management company, and each state that we operate in has its own brand.
With that, the majority of our analysts, and probably our data scientists and biostatisticians, work for the brands doing plan-specific analyses. Where I'm at now is the corporate group, and we partner with stakeholders in the business at different operating units, or with idea generators that have a business use case that can be scaled for the entirety of our business, not just one plan.
From that perspective, I think there are some differences. I roll up into IT now, and I definitely sympathize with you, because when I was previously at Centene, I was actually working for HealthNet, which is one of these state plans. There's a lot that I wanted to put into production, but there's so much sensitive data in production that corporate helps manage the whole process of getting a model into an actual production setting, whether it's a model or a Shiny app or something like this.
So you can do a lot as a data scientist in a business org, in the development and test environments that are out there. But when it gets to production, there is somewhat of a handoff of ownership, or responsibility, for getting that code into a production setting. That's where there's a kind of handshake or pass-off between the business data scientists and the corporate data science team.
And so out of 70,000 people, we have an ongoing chat of about 500 data analysts and data scientists. There are also clinical informatics people and biostatisticians, and there's a lot of very interesting niche work going on. But the one thing that I really love about this organization is the knowledge sharing; a lot of that happens organically. I love that there's not a lot of shame in asking questions. It's encouraged: if you don't know something, go ahead and ask it.
Prioritizing ML projects and ROI
I see, Eric, you had asked a question a bit earlier in the chat. Do you want to jump in?
It's great to have you on the Hangout, Javier. In my organization and the group I'm in, I would say we're growing into getting AI and ML more entrenched in some of our work. But leadership will hear all the positive buzz from various tech sectors and say, we want to use ML. What I'm wanting your perspective on, if you could share any insights, is how do you approach finding the best use cases for these algorithms without going down potential dead ends, and how do you make sure that leadership understands that ML is not going to magically be the perfect solution for every use case? I don't know if that makes sense.
Yeah, for sure. I'm new to this process, because again, I was sort of a one-off data scientist in a business org prior to joining this corporate team, so I'm still learning about all our systems and everything that's available. But one aspect of this team's work, which I think could be applicable at other companies, is to have a really structured intake process for people that have new ideas. Everyone's excited about ML and AI, but there should be some sort of detailed, educated guesstimate of ROI for the projects that people want to put in. And that's really hard to do.
Even for experienced developers and data scientists. So I would say, if you don't already have some type of intake process where business stakeholders that might have ideas can come in and request some type of company-wide project, then as part of that submission, or as part of a conversation that your team has with them, really try to quantitatively determine what the benefit would be.
I think that is going to go a really long way. I'm still learning about our whole intake process, but a lot of my HealthNet peers, now that they know I'm back, are asking, hey, can we partner on this? And I'm like, we are trying to prioritize the highest-ROI projects, and there is an official intake program. So I'm pretty much just sending them information: hey, here are the forms, please fill them out. And yeah, that seems to have helped this team a lot in scaling out their models and prioritizing which models to actually put into production.
And there's actually a follow-up anonymous question to that too. The question was, what does an ROI timeline look like for most projects? Are you working on things that come to life years down the line?
I actually do not know yet, but I would say a lot of these probably come to life within the same year; I don't think the planning is so far ahead. There are larger IT goals and things in motion that could be multi-year transitions, but I just don't work on those types of projects. The focus of my work is more data science specific.
Starting the R user group at Centene
So I hope I'm not remembering incorrectly, but I'm pretty sure you had a hand in starting the R user group at Centene when you were there previously. And so I was wondering if you could talk a little bit about that, because that's a difficult task: you have to recruit people and get people to meet up and help each other and stuff like that. It's really challenging. So I was hoping you could give us a little bit of a rundown on how that happened.
Yeah, thanks Libby, and great to see you. So I sort of piggybacked off a general data science community chat that we had at the company. There were several hundred people on it, of varying backgrounds and expertise levels, so there was a lot of conversation happening. There was already a Python group that was meeting, I think every other month.
So, me coming in, like three weeks after I started, I got really excited about the possibility of creating something similar for R users. And I think it started by just trying to figure out who owned that already existing data science chat and seeing if they could help support the idea of creating an R user group, something to meet once a month or once every two months. Because I think at larger companies, getting that type of top-level executive stamp of approval and support can go a long way, especially if that individual is part of the already existing IT or data science function.
And so, yeah, at the time I created a blogdown site. For those of you who are familiar with R Markdown, blogdown is a package that allows you to create static websites and static blogs with R Markdown. distill is a very similar concept, and now with Quarto you can do the same thing and create websites. I love the syntax of Quarto. But anyway, what started was, I think, 13 users the first month.
One of them is at RStudio now. I don't know if he's on this chat. Dave Grunwald, he might be here.
Hey, how are you? So, Dave taught me Shiny. I tell him this often: I owe my current career to Dave, because Shiny itself made me such a better overall programmer in the way I think about functions and recycling code. Seriously, thank you, Dave.
But yeah, what started with about 13 users jumped within a few months to about 100 to 125 monthly users on this R-specific monthly meetup. So we had a really great time. Through the partnerships we had with Posit, we were able to get some people to come in and do workshops as well. Everyone was so collaborative, and it was such a great way to see the excitement around what you could do with all things R, and even Python: how can we tap into these robust Python libraries? We had reticulate sessions, co-branded Python and R workshops, where we were looking at ways in which teams working in different languages can actually communicate a lot more easily. And so anyway, I had a great experience with it.
Navigating stakeholder uncertainty and project valuation
I don't know about traditional engineering projects, to be honest. I don't feel like I've been surrounded by that world enough to speak to it. When it comes to the data science projects, at least at the point where we are now, a really careful, objective, unbiased review of what the ask is has been very helpful.
In whatever way you want to do this, and it doesn't need to be just a dollar value like the savings or benefits we're going to have, you can score the different potential projects coming in. Then, project to project, you can prioritize which one is going to have the most impact for the business. Impact can even be measured in different ways: which one is going to have the most short-term impact, and what's the best project for long-term impact? All of these could be different weights in your scoring of these projects, to understand what your team should focus on. But yeah, I would say trying to develop some system or framework where you can prioritize or rate the importance of these new projects is really helpful.
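The weighted-scoring idea described above can be sketched in a few lines of Python. Everything here is hypothetical: the criteria names, weights, project names, and ratings are illustrative, not an actual intake rubric:

```python
# Hypothetical weights: how much each criterion counts toward the final score.
WEIGHTS = {"estimated_roi": 0.5, "short_term_impact": 0.3, "long_term_impact": 0.2}

def score_project(ratings):
    """Weighted score from 1-5 ratings on each criterion."""
    return sum(WEIGHTS[k] * ratings[k] for k in WEIGHTS)

# Made-up project proposals with 1-5 ratings per criterion.
proposals = {
    "churn-model": {"estimated_roi": 5, "short_term_impact": 4, "long_term_impact": 3},
    "kpi-dashboard": {"estimated_roi": 3, "short_term_impact": 5, "long_term_impact": 2},
}

# Rank proposals by score, highest first.
ranked = sorted(proposals, key=lambda name: score_project(proposals[name]), reverse=True)
print(ranked)  # → ['churn-model', 'kpi-dashboard']
```

The point is not the arithmetic, which is trivial, but agreeing on the criteria and weights up front so intake decisions are comparable across proposals.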
Shiny in the workplace and the branded interview app
Did you create a branded test Shiny app in applying for your return to Centene?
No, I didn't. Luckily, I had several apps in production. We were doing some really neat things when I was at HealthNet, so we had several apps in production. It got to the point where managing the pipelines for the different apps was becoming a recurring, more time-consuming process. The more apps we had, the more we found ourselves needing to streamline data intake and transformation for these different apps. And so we created a web app updater that was itself a Shiny app, but was also capable of updating other Shiny apps, or at least the code from other Shiny apps.
That kind of stuff is not typical to see with Shiny apps, at least not the Shiny apps I've seen. So anyway, where I was going with this: the production team I am now part of, the corporate data science team, at least had some samples to chew on of the type of R work, or Shiny work, that I could do. For organizations not familiar with Plotly Dash apps, Streamlit apps, or Shiny apps, being able to show them the speed of a web application like Shiny, and how clean it can look with an advanced UI, definitely goes a long way, in my opinion, toward impressing the people you're interviewing with.
For anybody who didn't see it before and didn't know what that question was referencing: Javier has shared before that in an interview with Bloomreach, he created an interactive Shiny app that used their branding and color scheme. And so I just put that into the chat too, so everybody can see the blog post he made about it.
Yeah. Travis Gerke, I don't know if he's on here today, but he had asked if he could reference this as a cover letter accessory in one of his RConf talks. And I was like, of course, yeah, please do. But at the time, my GitHub repository for it just had a super high-level README: oh, this is a Shiny app that I styled with the Bloomreach theme. So I wrote this blog post in an effort to help people that are less familiar with Shiny, or maybe with R. I tried to write it in a way where people with a basic GitHub understanding could go in there, clone the repo, and try to tweak the app to their liking, to their company.
I was just going to mention Appsilon as well. They're a data science consulting firm, and they make some incredible Shiny apps; they have some beautiful Shiny app examples in the gallery on their public-facing website. The Shiny website itself, and I don't know what the new Posit link for it is, also has a bunch of gallery examples. You can go in there and launch each Shiny app, and you can also see the source code behind each and every one of them. So that's a good way to learn too.
Grad school, function writing, and the journey to ML engineer
So, short and sweet: just wanted to get your opinion on whether grad school was the way to go for opening up the doors to more data science and machine learning type roles, or whether having a one-off background works. For me, I come from kind of a hybrid: my background is more in public health epi, but then I do more programming and clinical-type work, and having that heavy medical background has always been an asset. So I didn't know, from your background, if you felt grad school was the way to get your feet in the door. Was it a benefit, or do you feel like you could have gotten there without it, I guess, is the short question.
Grad school allowed me the time to get where I wanted with the basics of data science and programming. Even if I hadn't had a full-time job throughout that time, I think grad school just gave me a set routine for learning this stuff. I don't actually think you need a graduate degree to get into this type of work, but I lack the discipline to learn all these topics and concepts just on my own.
And for me, I found that grad school really did help push me to learn not just Python and R specifically, but the math that underpins a lot of the data science we're doing now, and how to apply these different algorithms to business problems. Grad school sort of force-feeds that information to you; I think the alternative, learning it on your own, would just be really hard. I'm sure there are some great resources out there, I just don't know of any single resource that would give you such a robust, holistic understanding of data science.
Yeah, for me, I kind of knew the kind of data science work I wanted to get into. I wasn't calling it this, but I was doing time series forecasting; that was sort of my bread and butter in financial modeling. And I knew all these different time series techniques were possible, I just didn't know where to start. And so grad school really helped open the doors to what's possible with code for different types of time series problems.
So it is, you went from Excel to lead machine learning engineer. Can you tell us about the journey? Anything you found surprisingly hard or easy?
Because of SQL, I had a really good understanding of at least how tabular data could be joined and the different transformations that could be done to these data objects. I think I would have really struggled without that basic understanding. But having said that, I think the part where I really struggled at first was function writing. Function writing was not intuitive to me. Basic function writing was, but in general, I found it to be very complicated, and it took a solid three to six months of practice to feel actually comfortable writing functions.
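The kind of SQL foundation credited above, joins and group-bys on tabular data, can be shown with a minimal, self-contained example. The table names and values below are made up for illustration, using Python's built-in sqlite3 module:

```python
import sqlite3

# Two hypothetical tables: members of a plan, and their claims.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE members (member_id INTEGER, plan TEXT);
    CREATE TABLE claims  (member_id INTEGER, amount REAL);
    INSERT INTO members VALUES (1, 'HealthNet'), (2, 'HealthNet'), (3, 'Other');
    INSERT INTO claims  VALUES (1, 100.0), (1, 50.0), (2, 200.0);
""")

# Join claims to members, then aggregate per plan; member 3 has no
# claims, so the inner join drops that row entirely.
rows = con.execute("""
    SELECT m.plan, SUM(c.amount) AS total_claims
    FROM members m
    JOIN claims c ON c.member_id = m.member_id
    GROUP BY m.plan
    ORDER BY m.plan
""").fetchall()
print(rows)  # → [('HealthNet', 350.0)]
```

The same join-then-aggregate pattern carries over directly to dplyr in R or pandas in Python, which is why SQL fluency transfers so well into data science work.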
Even when I started building Shiny apps: basic Shiny is quite easy, but large functions underpin the entirety of a Shiny app. Everything you do within Shiny is effectively writing functions. So the process of learning Shiny and becoming more comfortable with it was very difficult and something that just took a lot of repetitions. But it all played together, because while people think of Shiny as more of a front-end system, it did make me a much better programmer in the way I thought about functions and function writing.
Other things that I found hard? Looking back, I'm sort of embarrassed to say this, but reproducibility of machine learning was not super intuitive either – being able to reproduce a code set and get the exact same predictions every time. I wasn't quite sure why it wasn't working, or how to create these fixed views: setting a seed, or whatever you need to do to ensure that someone else downstream could replicate your study or analysis and get the exact same findings themselves.
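The fix being alluded to – pinning a random seed so a pipeline gives identical results on every rerun – can be sketched like this. A hypothetical example in Python rather than R (where the equivalent starting point is `set.seed()`):

```python
import random

def reproducible_split(data, seed=42):
    """Deterministic 80/20 shuffle-and-split: the fixed seed is what lets
    someone downstream rerun the pipeline and get identical results."""
    rng = random.Random(seed)   # local RNG, independent of global state
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

train1, test1 = reproducible_split(range(100))
train2, test2 = reproducible_split(range(100))
assert (train1, test1) == (train2, test2)  # identical on every run
```

Without the fixed seed (or with any other hidden source of randomness, such as unordered data or parallel execution), two runs of the same code can produce different splits and therefore different predictions.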
IDEs, what's next, and bridging Excel and data science
And somebody asked anonymously on Slido: what IDE, if any, are you and your colleagues using for development work? RStudio, Emacs, Vim, VS Code, Notepad?
I think all of the above, honestly. I use RStudio every day, so I'm in the RStudio IDE. I also seem to be in the terminal a lot these days, in the shell writing bash commands and whatnot. But yeah, the RStudio IDE is definitely where I spend most of my time.
I know that you're only two weeks into this new role, but if you think of the next year ahead, what are you most excited about? Or what made you most excited about this role?
For me, I'm really excited about the challenge that's going to come with becoming a better overall software engineer – becoming better at programming at large, not just R specifically. I'm constantly humbled by everyone I work with and their breadth of knowledge across all these different systems. I've touched a lot of these systems for MLOps or ML engineering, but being able to really dive deeper into some of these platforms to get production jobs out in any language – I'm really excited about that challenge and the growth and learning opportunity.
I just happened to see a question: is anyone using VS Code for R and Shiny? I've tried this, and I still feel like RStudio is the gem for coding in R. But I do really like VS Code for Python.
Thanks, Javier. This is Daniel – I asked that question. I write a lot of R in my position as well, including a lot of Shiny applications, and I've started to think about VS Code because I write my Python and Postgres in VS Code and would like all of that to be together. But as you mentioned, VS Code is not ideal for R – RStudio really is the best place for writing R code these days – so I was wondering where others are on that.
And in the few minutes we have left: I know there were a lot of questions and comments at the beginning about Excel and data science. Circling back to that conversation, what have you found most effective for bridging the gap between those two sets of users? Some people are probably always going to stay in Excel, but you might need to work with them as well.
Data extraction. If you're working at a company that's large enough – it doesn't need to be a large company, but at larger companies you're tapping into databases, not just operating in a world of Excel or CSV files. Showing how you can stay within the same R notebook or framework from data collection onward – pulling the data straight into your environment, manipulating it there, and writing it out or summarizing it as an HTML file, like knitting an R Markdown document to HTML, or a flexdashboard, something that's still static but interactive – that has gone a long way. And the speed of everything – the speed of data manipulation and handling for millions or even hundreds of millions of rows – always shocks people. If you've got a lot of columns in Excel, after about three or four hundred thousand rows Excel is crawling and eating up your entire available RAM, whereas with something like Python or R that's definitely not the case.
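The pull-transform-summarize pattern described above can be sketched with Python's standard-library `sqlite3` standing in for a company database (the table and column names here are invented for illustration):

```python
import sqlite3

# Stand-in for a corporate database: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (member_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO claims VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 50.0)])

# Pull and summarize in one step, straight into the analysis
# environment, with no Excel/CSV export in between.
totals = conn.execute(
    "SELECT member_id, SUM(amount) FROM claims "
    "GROUP BY member_id ORDER BY member_id"
).fetchall()
print(totals)  # [(1, 200.0), (2, 50.0)]
```

The same end-to-end flow – connect, query, summarize, render a report – is what an R notebook gives you with packages like DBI and dplyr, and the summarization happens in the database or in memory rather than in a spreadsheet.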
Resources for staying current in data science
But, Javier, before you go: at the beginning you mentioned that one of the things you like to do for fun is keeping up to date on everything going on in the data science space. I was curious if there are resources, or people you follow, that you'd like to share with us all.
Let's see. The Tidyverse blog, for staying up on news related to tidymodels or the Tidyverse – that's one of my favorite resources. The RStudio AI blog, now the Posit AI blog, is another good one, for following the incremental developments with torch – native Torch for R – and luz, which is sort of like the Keras for torch. That stuff really excites me.
Twitter has been amazing. Keeping up with a lot of the R-related hashtags – #rstats, #shiny, and others like that – has been a great resource. Then there's R-bloggers, and R Weekly – R Weekly is a great publication; you can subscribe to their RSS feed and get a weekly download of new packages, updates to existing packages, and new tutorials. Eric Nantz is one of the co-hosts of the R Weekly podcast. There's just a plethora of resources out there.
Awesome. Thank you so much, Javier, for joining us and sharing your insights this week – and in other weeks as well, when you're on the audience side. I did want to let everybody know, since this comes up sometimes when there are great comments and resources in the chat: you can save the chat. If you press the three dots in the right-hand corner, you can save the chat. I will also try to group the resources to share with the recording when it goes up on the site.
Thanks, Rachel. Yeah, if anyone wants to contact me, feel free to reach out on LinkedIn – that's probably the best place. Twitter as well, but I don't use Twitter as much.
Awesome. I know you have to jump, so no worries if you have to leave us here. I'm going to share your LinkedIn in the chat as well. All right, great – thank you so much.
Yeah, thank you, and I hope to see everybody back next week. We will be joined by J.J. Allaire, CEO and founder of RStudio, now Posit – maybe he heard that as he's walking by – and that will be our last Hangout for 2022. It was so nice to see you all this week. Have a great rest of the day.