
Data Science Hangout | Michael Chow, Posit | Exploring Team Structure w/ Data Scientists & Engineers
We were joined by Michael Chow, Data Scientist and Software Engineer at RStudio. Michael previously led a team at the California Integrated Travel Project. On this week's hangout, Michael and the broader group shared a lot of thoughts on structuring a data science team:
⬢ Jacqueline Nolis also shared thoughts on this on a previous Data Science Hangout. She saw virtues in different structures but ended up sold on the decentralized model, where data scientists are embedded in teams: https://youtu.be/CcPE29bYGVo?t=325
⬢ Michael agreed that data scientists and analysts should sit with the teams they're pushing out reports for. Otherwise, he'd end up sending people into those teams to figure out their priorities.
⬢ A data scientist should work with a project manager, or whoever is leading the team, to push up metrics but also to help change the roadmap.
⬢ That leaves a tricky question of where data engineers should sit and how they should interact with the team. Today data engineers often focus more on tooling and empowerment, so it can be okay to keep them a bit more centralized, connected to the data scientists to enforce best practices or enable new pieces for them.
⬢ Michael suggested a model where data scientists and analysts live in the teams and data engineers are like spokes of a wheel: the data scientists connect with them and work closely together to enforce best practices and enable important new things.
⬢ Tatsu shared that when thinking about structure, it's also important to find your translators and use the power of feedback: reach out to those people and start putting that feedback into action.
⬢ George shared that insurance companies come from a very traditional landscape, with lots of actuaries working in lots of Excel spreadsheets, and there can be a lack of knowledge sharing and tool sharing. This is where the data science element comes in. In his view, the organization needs a team that acts as a mini-spoke, central to the actuarial team. If that team is too far removed and sits back with the IT team, you end up with the old problems, because the business context doesn't get communicated back. It's all about having enough skills to get things done, especially proofs of concept. After that, you can take a step back and start to look at the centralized model again.
⬢ A central team can help converge on what it sees as best practice, but if you're pushing out something new or exploring a new line of work or area, it can be important to place a data engineer there to do whatever is needed. Make sure the converging doesn't stifle creativity or prevent a team from doing the right thing.
⬢ Manny shared the perspective of data science sitting with IT. Data science is a new field for their company (in real estate), and there's a question of identity: where does data science fall? The IT team is fantastic and very structured, while data science is currently fluid, creative, and unstructured, so you have to look at where it should actually sit.
* Please note that some of the points above are summarized and are not exact quotes.
Resources shared:
⬢ Tatsu shared a few projects Michael is working on in the chat: vetiver: https://vetiver.tidymodels.org/articles/vetiver.html and siuba: https://github.com/machow/siuba
⬢ Libby shared a helpful tip: create a 2-minute YouTube video to go with your cover letter to get a hiring manager's attention
⬢ Javier shared an example Shiny app used in an interview: https://javierorraca.shinyapps.io/Bloomreach_Shiny_App/
⬢ Michael mentioned David Robinson's screencasts: https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ
⬢ Michael mentioned an article on "What data scientists really do according to 35 data scientists": https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-scientists
⬢ Rachael shared a blog post where Jacqueline Nolis also talked about team structure: https://www.rstudio.com/blog/building-effective-data-science-team-answering-your-questions/#Structure
► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
► Add the Data Science Hangout to your calendar: rstd.io/datasciencehangout
► View the Data Science Hangout site here: rstudio.com/data-science-hangout
Follow Us Here:
Website: https://www.rstudio.com
LinkedIn: https://www.linkedin.com/company/rstudio-pbc
Twitter: https://twitter.com/rstudio
Transcript
This transcript was generated automatically and may contain errors.
Hi, everybody. Welcome to the Data Science Hangout. If you're joining for the first time, it's great to meet you. I'm Rachael, the host of the hangout. If this is your first hangout, what is it? It's an open space for the whole data science community to connect and chat about some of the more human-centric questions around data science leadership, questions you're facing, and what's going on in the world of data science. We want this to be a space where everybody can participate.
You can ask questions in three different ways. You can jump in live by raising your hand on Zoom, and I'll call on you. You can put questions in the Zoom chat, and feel free to put a little star next to your question if you want me to read it out, maybe because you're in a coffee shop or your dog's barking. We also have a Slido link where you can ask questions anonymously, and Tyler will put that in the chat right now. I'd just like to reiterate that we love to hear from everybody, no matter your level of experience or your area of work.
You may notice that we had a bit of a last-minute change of plans today. I'm so excited to be joined by Michael Chow, who is stepping in as our featured leader today. Michael is a software engineer at RStudio and previously led the data science team at the California Integrated Travel Project. Michael, I'd love to turn it over to you and have you introduce yourself and share a bit about the work you do.
Sure. Yeah, thanks for having me. In a different life I did a PhD in cognitive psychology. Then I flew the coop to industry, where I worked on educational tools and tools to measure data science skill, and then I switched to building out a data science team at the California Integrated Travel Project. I will say, I love the tidyverse. I'm really interested in how people can do analytics fast and what makes really skillful data scientists and data science teams. Now I'm working at RStudio as an engineer. And as a kind of weird side thing, for the past few years I co-directed an org called Code for Philly, which tries to connect volunteers to nonprofits in Philadelphia, throwing experts or a hundred volunteers at really impactful projects in the city. So I'm happy to be here and really excited to connect with people in data science and discuss all the weird parts of data science and data science teams.
What's exciting in data science
Awesome. Thanks, Michael. While we're waiting for people to jump in with their own questions, I'd love to kick it off by asking you, what's something that you're most excited about in regards to data science and thinking about the year ahead?
Yeah, I feel like there are two things. One, and this has really been growing over time, is that tools like dbt and the rest of the modern data stack, which let you build out a really nice warehouse quickly, are just absolutely killer. The amount of ground a person can cover once they've wrangled these tools is so great. On the other hand, for the past decade I've felt so excited about tools like dplyr in R and the tidyverse, especially their ability to let people query data freakishly fast. So dbt is kind of a new thing I'm excited about, and dplyr is the thing I've somehow been excited about for the last decade. It's just stayed at the top of my list.
Awesome. So I've been hearing about dbt a bit more, but I will admit, I don't really know exactly what it means. Could you explain it a bit more?
Yeah. So, something I've seen before: imagine you've got all this data loaded into a warehouse, and it needs to be cleaned and made into a data mart so that end users can consume it. You're the person in charge of that. There was a point in time where it wasn't crazy for that to be just a folder of SQL scripts, plus some weird custom script to somehow chain them all together, and it was absolute chaos. dbt is really focused on analytics engineering, and I shouldn't put that in air quotes, because it's a legitimate thing. Basically: how can you have a sane set of SQL scripts that very cleanly orchestrate all your cleaning in the warehouse? You know, the whole thing where I create a table, another table depends on it, and another table depends on that one. That dependency graph is the high-level part. But they've also really refined the practice of warehousing, and a lot of the classic moves you'll need to make have made it into dbt, so it's easy to just stamp one down. For example, they have snapshots (I need to stop air quoting things), and that's a move you can now do with dbt. It's super handy.
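In dbt, each SQL model declares its inputs with `ref()` calls, and dbt uses those to run models in dependency order. The snippet below is a minimal Python sketch of that ordering idea only, not dbt's implementation: the table names are hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for whatever scheduler a real tool uses.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical warehouse models: each table maps to the set of tables it reads from.
# "raw_*" tables are loaded data with no upstream dependencies of their own.
MODELS = {
    "stg_trips": {"raw_trips"},
    "stg_riders": {"raw_riders"},
    "trips_enriched": {"stg_trips", "stg_riders"},
    "daily_ridership": {"trips_enriched"},
}


def build_order(deps):
    """Return every table in an order where each one comes after all of its inputs.

    Raises graphlib.CycleError if tables depend on each other in a loop.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for table in build_order(MODELS):
        # A real tool would execute the SQL that builds this table here.
        print("build", table)

    try:
        build_order({"a": {"b"}, "b": {"a"}})
    except CycleError as err:
        # The second element of err.args holds the nodes that form the cycle.
        print("cycle detected:", err.args[1])
```

Getting this ordering right is exactly the job a hand-rolled "chain the scripts together" script has to do; a topological sort also turns an accidental circular dependency into an explicit error instead of a silently wrong build.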
Code for Philly and civic tech
Well, thank you. Just before people were jumping on, we were talking a little bit more about Code for Philly, and you told me a pretty awesome story about some of the work you did with that organization. Would you be able to share that example with us?
Yeah, so Code for Philly. It's weird, because we spent a lot of time talking to nonprofits and learning about their problems, but I would say the biggest problem we have is just connecting volunteers with something impactful. A lot of volunteers who used to come in would have kind of random ideas they wanted to build, but they wouldn't be as focused on what a lot of nonprofits in Philly were hitting. But from time to time, we've been able to really connect on great partnerships. One was right at the beginning of the pandemic, around March 2020, when we were all deep in our apartments. We connected with Penn Medicine to create a COVID monitoring dashboard, actually a capacity-planning dashboard for hospitals. That was such a stressful but helpful project: connecting with this data science team that just wanted to put up a dashboard, helping them make it open source, and then wrangling all the people who came in. Linode came in and provided resources. We started getting something like 300 people a week coming into our Slack, which was like 10%; we have 5,000 people. So our Slack started growing and we had to moderate. A lot of it is just: how do you connect people with these things and help them grow? Really, it's a lot of Slack channel wrangling and doc writing. To me that was a nice Code for Philly project: we were able to connect two groups of people and then work on all the communication and untangling that comes along with it.
Defending non-traditional backgrounds in data science
Cool. Thank you. I saw Libby, you put a question into the chat if you want to jump in.
Sure. Yeah, I was wondering if you've ever had to defend your psychology background as being data related. This happens to be a conversation I was just having this past week with a lot of my friends in the data space who come from psych, sociology, and demography, like applied demography, and have had to defend their education as being quantitative. Just wondering if you've ever hit that?
Yeah, that's such a great question. I feel like psychology is the most important thing I bring, but the rest of my educational background is almost a way of getting in the door so I can bring that in. I double majored in psychology and statistics in undergrad, and then I did a pretty computational PhD in psychometrics, which is sort of the methodological side of psychology. I say this because a lot of engineers are happy with that. They're like, math, that's great. But it's hard to explain that the most important parts I got were actually the psychology parts, like research design and being able to decompose weird, hard problems and concepts. So yeah, I think it's a never-ending battle to elevate what psychologists do. I think every discussion about metrics is kind of a psychological discussion. But I don't think I've gotten a lot of traction by saying, I'm a psychologist, I'm ready to help with your metrics problem. It's more been somehow going in the back door.
I think every discussion about metrics is kind of a psychological discussion.
Yeah, one of my advisors in grad school always said that empaths, or empathetic people, make the best data scientists, because you can put on different hats, put yourself in different shoes, and analyze all of that data. So I think having psychology or a different background not only brings new questions, but also lets you relate things to different concepts, right? So it's super cool.
Yeah, I totally agree. I think the psychologist side also hits a lot of really good product-management-type skills that psychologists are very good at, like wrangling high-level problems and cohering a whole set of problems into a plan or coherent whole. There are a couple of junior data scientists I've been mentoring who, to me, have really great psychology skills; they're really good at breaking down problems. So I'm really hopeful for psychologists, not necessarily as data science managers, but as people who can do that weird data science work of laying the problems out and helping things move along beyond just coding things up.
I also think you guys get a lot more experimental design than people realize, and it's so, so important. It's something a lot of people in other disciplines don't get, so it's something you guys can bring to the team. I know that's something big my psych teammates bring to the team, and it's really wonderful. It's interesting that people come up against that kind of barrier sometimes, because there should be more visibility into it. You guys are so important.
I love this topic, too: the skills that make us very successful in what we do but might not be the skills people think, on paper, would make you successful. Andy, I see you put a comment in the chat too. Did you want to jump in?

Oh, this is just echoing really true for my grad experience. I have a master's degree in systems science, which is based on looking at things as systems of interacting parts, and how you can look at the parts and the wholes of things. That's a lot of systems thinking and systems theory, and that program actually came out of a psychology PhD program at Portland State University. So a lot of this discussion is really ringing true for me. I did applied statistics alongside it, so I got the experimental design and some of that, and that was really my entry to R and data science. I lean on that entirely for access to work on projects and such. And the systems stuff is just like, oh, you can do systems mapping, awesome. But we have mental models that are very relevant to what we're studying here, and being able to talk about those and use them is really valuable. It's funny how much I have to explain it sometimes.
Soft skills and team communication
Thanks, Andy. I see, Sam, you have your hand raised too, if you want to jump in.
Yeah, I think this is a really interesting topic, and it's something I've been dealing with at work recently (maybe "struggling with" is too strong, but "dealing with" isn't strong enough): some of these things that aren't hard data science skills, more like soft skills, are really undervalued where I am, especially on the technical teams. I work on a team of data scientists and software engineers, and I really think soft skills are valuable; maybe that's my private liberal arts undergraduate degree coming through. But how do you get buy-in from people, basically getting them to believe that soft skills are important and that it's worth spending time on them as well as on the hardcore data science and software engineering stuff?
Yeah, that's a great question. I'm curious what kind of soft skills you're thinking of, or what kind of problems you have in mind where soft skills would really come to bear?

A lot of the stuff I'm thinking about and have been struggling with is specifically record keeping: taking minutes at meetings, making agendas for meetings, commenting your code, documenting processes or workflows, things like that.

Yeah, that's a great question, because I will say, from my last job, the person I meet with regularly right now is a junior data scientist whose standout skill, among other things, was record keeping. So I think that's huge.
In my view, a lot of those things are a void: if you can wrangle them into existence, or someone eventually fills it, then once the note-taking happens, people really tend to rally behind it. But I'm really curious about your situation. Do you mean you have a team of data scientists and it's hard to get people note-taking, or something else?
Yeah, so I'm by no means a leader of a team of data scientists. I'm definitely one of the people being led. We have a hierarchy where there's a manager, then a data science manager and a software engineering manager, and then a few people floating around underneath; I'm floating around in the data science area. I've only been at this company for about a year and a half, and I'm finding it really hard to figure out what the processes are, what people are trying to do, and basically how to communicate what other people are working on and where they need help. And if somebody needs to switch to a project, which happens quite a bit, getting context on that project is really difficult.
Yeah, thanks, that's really helpful to hear. It sounds like there are a few different pieces involved (engineers, data scientists, their managers) but maybe a lot of cross-cutting concerns. One of the most impactful things I saw came from a really great consultant I worked with last year named Kane Bacigalupi, who kind of blew my mind. They introduced 15-minute daily stand-ups. Prior to that I hated stand-ups, but the one thing they said that really amazed me was: it's just stand-up, it has to end in 15 minutes, and you can't have a product manager trying to work a backlog in the meantime. Then I realized that every stand-up I didn't like did have a manager trying to update the status of different things. To me it's game-changing to just get people in a room for 15 minutes to say what they did; that's the goal. Everybody says what they're working on. That has been really helpful for me when you have this big stew of people hitting different things: if you hear what people are working on, it's often enough to figure out who you have to connect with or chat with afterwards.
I don't know if that's useful, but that really changed my work life, because it's hard to get people to take notes, or even to know where the notes will live or whether they'll be relevant. Just getting people in a room, and making it a very simple experience rather than an adverse one, can really cut down this insane issue of needing notes, when notes go out of date or often aren't at the right grain for us.
From managing a team to individual contributor
I was curious, Michael, what it's been like for you going from leading a data science team back to more of an individual contributor role?

Yeah, that's a great question. Well, in my mind the tidyverse team at RStudio is so great. I gave a talk last year at rstudio::conf on porting... sorry, I have a window over here with a fat bird outside, so it's like cat TV. But yeah, I would say it's been really nice. The challenge with managing was that I was learning a ton. I had great people at all parts of the stack, so I was learning a lot about data engineering, data science, and analytics, but I didn't have much time to develop or really take a lot of those ideas and put them together into tools. So it's been really nice to be on the tidyverse team and see people take a lot of insights and build tools out of them. I would say the thing I miss from managing is really smart people telling me how wrong I am about a lot of things. What I really like about the tidyverse team is that there are these incredible people trying to take those insights and bottle them into tools so that everyone can use them easily. I definitely won't miss all the meetings, though. Managing was a lot more meetings.
Thanks, Michael. I get to ask you all these questions if people don't have other ones, so as a reminder, you can always raise your hand and jump in, put a question in the Zoom chat, or use Slido to ask anonymously. But what you just said, about people telling you how wrong something is or giving feedback, made me think: when you were a manager, how did you handle those conversations and create an environment where people felt comfortable doing that?

Yeah, I think one really helpful framing was giving people an escape hatch for criticizing my work. I had this weird trade-off where I was one person, but my goal was to help the organization see how useful data science could be, so that we could bring you in to make it extra great. The framing was: what I did was actually pretty bad, and it was just to get the vision out that we needed you, and now you're here. I think that's helpful because people could say, hey, I'm really glad you did this work and pushed out an insight, and it all needs to be fixed now. That was especially true for the data engineering; I do more data science and analytics. So they were able to say, I'm glad this worked and something came out, and now that I'm here, we need to rewrite it to be robust, where robust is a code word for: what you did was not super great and will probably break in unforeseen ways. I found that helpful for both of us, to be able to say: hey, I did this, it's probably bad, I was really focused on pushing things out, and I'm glad you're here to make it right.
Structuring data science teams
Yeah, I was gonna ask you, Michael: you brought up something I've been talking with people about this week as well, which is the structure of data science teams. Some teams are just data scientists, with an IT team that supports them, implements models, and does the engineering side of things. Some teams have data scientists who are also data engineers. And there are teams where data engineers and data scientists work together in tandem to make sure stuff doesn't get stuck in a development pipeline and have to be refactored before deployment. What are your thoughts on structuring data science teams, including engineering or not?
Yeah, it's such a tricky problem, and there are a lot of different pitches on structures. I know Jacqueline Nolis talked a bit about it on a Data Science Hangout, and she said that she saw virtues in different ones but ended up sort of sold on the decentralized model, where data scientists are embedded in teams. I agree that data scientists and analysts should sit with the teams they're pushing out reports for. Otherwise I would be trying to send people into those teams to figure out what they want and what their priorities are. I think a good data scientist works with a PM, or whoever's leading the team, to push up metrics but also to help change the roadmap. That leaves the tricky question of where data engineers should be and how they should interact with the teams. It seems that today data engineers are doing more tooling empowerment, so it's okay to have them a bit more centralized, connecting to the data scientists to enforce best practices or enable new pieces for them. I think that's a nice model: data analysts and data scientists live in the teams, and data engineers are kind of like spokes of a wheel, where the data scientists connect with them and work closely to enforce best practice and enable important new things.
Yeah, I think largely what you're saying, Michael, is to use the tools at your disposal, right? People like Rachael and me, who aren't in a standard data science role, interface with a lot of folks in different functions in the company. We've talked about this before, but it's important to find your translators, right? And it's important to use the power of feedback. I know Rachael's a huge proponent of taking feedback and actually making it into something actionable. Really, reach out to those people. Michael and I recently worked on something together where we were trying to get some feedback on something he's working on, and I think Michael's pretty amazing at actually putting things into action. There are a lot of folks out there who are willing to help you do what you need to, and you don't need to be so focused on just what you're doing. If you can get somebody with a different perspective who can also level with you, it's a lot easier to make something happen.
Yeah, I thought it was quite cool talking about where the data scientists and data engineers go. I deal only with insurance companies, but I go from insurance company to insurance company; they're non-life companies, and I'm dealing mostly with the actuaries. The model I'm trying to get more insurance companies to take on comes from the fact that they've come from this really traditional landscape, where you've got lots of actuaries working on lots of Excel spreadsheets. It's been at the limit of that for years, and there's a real lack of knowledge sharing and tool sharing. That's where the data science element comes in. For me, within the organisation, you need a team that's kind of like a mini-spoke, if you will, because they're central to the actuarial team, which is a spoke of an insurance company. Within that team, for speed of integration, especially given where things are now in the life cycle, I think you need a data engineer and a couple of data scientists, and they need to be jacks of all trades from start to finish. But the most important thing is that they need to be away from the IT team and with the actuarial team.
I think if they're too far removed and sit back with the IT team, you end up with the old problems: they don't get the business concept, then you have a migration or something, it doesn't work, and it goes back to square one. But this time everyone's really livid about transformation and bringing in data science, because they did it once already, it didn't work, it cost x million dollars, and they don't want to go again. So for me, within my teams, it's all about getting enough skills so they can get stuff done, especially proofs of concept. Then, if you need to take a step back, that's where you might look at the centralized model again. But those are my thoughts and my experience.
I think that's spot on. It's kind of a diverge-converge thing: a central team helps practice converge on what it sees as best practice. But in your case, if you're pushing out something new, or getting something out that requires data engineering, that's diverging, exploring a new area or a new lane of work. Setting a data engineer there allows the team to actually do whatever it needs, versus doing what a big body thinks is...

Yeah, but if you want to bring it back and, you're right, converge to the company-wide approach and make sure everything's on the same platform, then it's important to bring it back again.

Yeah, I think that's spot on: just making sure that converging back doesn't stifle creativity or prevent a team from doing the right thing, even if it breaks from what's happening now.
Screening and hiring data scientists
That's a good question. I'm always really interested in whether this person can create a slideshow. Hugo Bowne-Anderson, in his article on what data scientists do according to 35 data scientists, said something like: maybe the most important skill is making a slideshow. I think what matters is whether they can communicate. Do we think they could sit with the team, actually understand its goals, and work with the PM?
The other thing is that I'm a strong believer in fast, live, real-time data analysis, and I don't think we often enough just have a data scientist try to pull our data. Data scientists and companies really vary in their ability to answer questions from data live. I think that's a really important skill for being able to drive the ship and gain the trust of a PM. A good example of this is David Robinson: he's an R user, uses the Tidyverse, and feels comfortable enough to live-analyze data for an hour a week on YouTube. I think it's so important that a person can do that fast. It's basically a communication skill. Do you have enough data science fluency to communicate with data like a conversation with a person?
And that's a lot of why I started porting tools. I maintain a port of dplyr to Python called siuba, and I started it because I was working with really good R users who could analyze data in real time, and I just couldn't do that in Python. So I think that skill is critical. How you measure it, I don't know. There are a lot of ways you could do it, with realistic scenarios and tests, but I don't think people should necessarily be subjected to that kind of pressure in an interview, with laser eyes on them while they analyze data. That would probably be a really stressful thing for a lot of people. But I think that skill somehow has to be sussed out and enter the picture. I should say, you could have a lot of good data scientists who can't do this, and that's okay too. But I think all orgs basically need these people.
I see there are a lot of great thoughts in the chat. I had a similar question to yours, Tatsu, because if something looks really nice, the same way a Shiny app or a data visualization that looks really nice might seem better than something else... Tatsu, you asked a question about appearance, if you want to jump in. I'm thinking this is a case where judging a book by its cover actually does matter, right? In this specific case, as a data scientist, you have to be able to take data and make it appear in a way that's digestible to people. So I would almost think that it's true, but I don't know what you think. Michael? Yeah, I think, like for a designer, your resume is kind of a portfolio in a way, because a data scientist won't just be in a cave; they'll actually be interfacing with people. So how they give you this initial glimpse into who they are is, in a lot of ways, a very data science-y interaction.
Hi. I just wanted to circle back on the idea that screening resumes is super challenging, because you have basically no idea why somebody applied based on a document they sent to 100 other companies. So I really like the idea of having people answer one or two screening questions that actually get at the key criteria you're looking for, or that force somebody to respond specifically to what you're looking for, so you get that customized answer. What I'm looking for is: how curious is somebody really about this position? How engaged are they? Are they really interested in working in this industry, or did they just get a LinkedIn notification saying, hey, you should apply to this job? I've talked to candidates where I get on the phone and ask, hey, why did you apply to this job? And they say, oh, it came up on my LinkedIn. And I'm like, oh my God, no, that's not what I need. So I don't know if there are better platforms or ways to get people to... Yeah, I love cover letters too. If somebody sends a cover letter, it's an automatic conversation with me. So if there are ways and platforms to get some sort of gauge on more than just a resume, that's priceless.
Yeah, I will say one thing that worries me about using the resume. One weird fact about me is that I did grad school at Princeton, but I'm a real barbarian: I actually had no idea what Princeton was before I went. My advisor basically just pitched me there. The only reason I bring this up is that I didn't know grad students went to school for free at Princeton. So first, I almost didn't apply there, because I legitimately believed they would charge me an incredible amount of money. And second, when I was there, I was so freaked out, because no one told me that it was free. I was sitting through the interviews, or the visit, with this freaky problem just hanging over me. I think with resumes and jobs, there's that whole thing: the less spooky we can be, the better. At the point that a person submits a resume, we probably haven't even cued them to what we want. That's my one hesitation about using the resume itself beyond a broad entry point: as a company, it's probably our responsibility to tell people very clearly what we're evaluating them on and help them appreciate what is actually being used in the process. I do think the resume is a reflection of data science skill to a degree, but my hesitation is that it's only once we actually bring people into the interview that we have the chance to tee them up on our expectations. So everything prior to that, just in terms of equity and where people come in from, I would be careful to be super critical of.
Portfolios, cover letters, and standing out
Yeah, sure, I can talk about that. To preface this, I'm a career switcher; I'm in my mid to late 30s, and I had to find an internship last year. Finding an internship in your 30s is weird. It's hard. People see your resume that's full of lots of experience and think, why is this person applying for an internship? So I attended a talk led by Polly Mitchell-Guthrie at Kinaxis, a Canadian supply chain analytics software company, and I noticed that she happened to be hiring an intern in strategy. She's the VP of thought leadership and industry outreach. I was like, how cool is that? That's amazing. My undergrad is very supply chain oriented. So I reached out to her, asked her a little bit about the position, let her know I was interested, and then applied. In my cover letter, where you have a few small paragraphs, I replaced one of them with something like: I feel really passionate about XYZ, and if you have the time to watch this very quick two-minute YouTube video, I think it'll explain why. And I had just a little link to my video. And I got the job, and that was definitely what sealed the deal. Everybody told me, I can't believe you made a video for your cover letter. That's amazing. We've never seen that before. I had some quick animations in there that I did on my iPad. It was nothing fancy. I actually recorded the video the day after I got my COVID vaccine, so I was really sick. So it doesn't have to be perfect. It just has to be real, authentic, and connected. I talked directly to her in my video: hi, Polly, how's it going?
I highly recommend that for anybody who wants to get their message across, but it wouldn't have worked if I hadn't had her email, if I hadn't attended her talk, if I hadn't reached out to her and been an actual person.
You can do it. I think the more stuff people can see and interact with, the better. If the code's on GitHub, it's super handy to have. I don't know if people will read through it in detail, but even its existence is a really big indicator. It means you've got a portfolio of work that you were able to put up, which is probably what you'll be doing a lot as a data scientist: pushing up projects and stuff.
Yeah, I don't think I got my current job because of this, but I was speaking to a team of people who had been surrounded by data their whole careers and weren't really that familiar with R as a programming language or with some of the capabilities of Shiny. I built a pretty lightweight Shiny app using the bslib framework, thematic, and a few other popular packages, and it kind of blew their minds. I think that's just the nature of being surrounded by Excel dashboards or Tableau. Tableau can make amazing things, but the Tableau dashboards I usually see in production just summarize data quickly; they serve a very useful function, but they're not really aesthetically pleasing, in my opinion. So sharing a real lightweight Shiny app went a long way. They told me afterwards, that app went all over the company, we're so happy that you made that. I themed it as if it was their product: I took their logo and their color scheme, and I made the whole Shiny app look as if it was a Bloomreach data product.
Yeah, I was just going to say, I think the last point you made there, Javier, about putting it into their brand and their style, and it being on topic, is the most important thing. From my experience, it's really difficult to choose a general project that shows you can do data science. Most data scientists, I'd say, can see the broader aspect of what you can do with data, but for anyone outside that, even if you've done something really impressive with numbers and produced really good results, if it's not in their domain, people are really bad, I think, at seeing the transferable skills. That's been my biggest experience. So in terms of having demos, I think they're really good. I want to get some up on my website portfolio, and I always struggle to choose the right one. Actually, I've just got to make a decision and get something up. If you're going for your dream job, you almost have to really fire it up and make it specialized to that job. Because in my experience, if you go into an insurance company or a banking company or whatever with your favorite sport, and you've taken all the data, brought it together, and put out this really kick-ass graphic, they all just look at it and go, yeah, but we do banking. And you're like, yeah, but it's just numbers. It's all just numbers. Can you not see it? I find it so frustrating that people can't see that you can do all this with numbers, and that it's transferable to every business, every company, and every hobby in the world.
That's an awesome point, George. I know an hour goes by extremely quickly. Brittany, I see you asked a great question in the chat a little earlier. Would you want to jump in?
Sure, I can jump in. I've heard this question about having some sort of portfolio a lot. And because I keep hearing it more and more often, I'd love to ask the room: do you think it's becoming more and more important, to the point that people should prioritize time to create one if they're looking in the job market? And if so, what kinds of things should we include in that portfolio? Is it just very simple data analysis? Is it more of a Shiny app? Is it an actual model and a full-blown model-building process? If someone wants to invest the time, what do you think they should do to invest it wisely?
Any thoughts, Michael or Larry? I see you have your hand raised. Oh, cool. Yes, I'm on. All right. Sorry, I'm very passionate about this one. I'm more on the consulting side, but if I hire somebody to help me on a project, I'm looking at how they go through the process. So I ask them if they have a project where they gathered data. I ask them if they have a project where they cleaned data. Then I ask about the analysis, the visualization, maybe a slide deck. If they can walk through that whole process, then I can see they can do it from beginning to end: they can go find data sets, use them, apply them, combine them, and then get insight out of them. Those are the projects I look for, but I'm more on the freelance side, so I don't know if that helps.
Yeah, I think portfolios are nice. One, it's like a victory lap: you get to show off all your hard work, so I think that's super nice. Two, when you're applying for jobs, it's so nice to find other people who are also looking for data science jobs or working as data scientists. The nice thing about a portfolio, even versus sending it to a company you're applying to, is being able to connect with people, show them your work, talk about it, and work on stuff together. I think that's also a really exciting piece.
Brittany, was part of the question about knowing what project you should work on, too? Yeah, I think Larry gave some insight, and a couple of others did too, and it sounds like it's more of what we might consider the basics, not super fancy, crazy modeling. Yeah, I would start by timeboxing: decide how long you want to spend on it, and then work backwards. The biggest risk is that the projects never happen. I know this is true for me: I always end up wanting to create the perfect project, but if I block my time to one day, then at least I can get something out. I think it also really sharpens your thinking around how much is good enough.
Working at RStudio and what's ahead
The question is just, how are you staffed? How does RStudio staff you on new projects? Are you working on multiple problems at the same time? Or is it more that you're personally very curious about some aspect of the Tidyverse that you want to dive deeper into and improve? How does that staffing model work at RStudio? Yeah. So I'm probably not the best person to ask, as I've only been here for a few months. But the Tidyverse team is just so great and self-motivated. I think everyone on the team is a fanatic in a way: they have a problem that they really want to solve. They remind me a lot of researchers. So a lot of it seems to fall out pretty naturally. It's pretty unusual for people to have this kind of focus, like Thomas on ggplot2; you're just not going to see that level of dedication a lot. So I think the hard part is kind of done with a lot of people on the team. It's people with weird obsessions. I have to say, I don't know why I've been so dedicated to porting tools to Python for the past three years. I use Python for a lot, but wish I was using the Tid

