Dr. Travis Gerke | UnicoRns are real | RStudio (2020)

Transcript

This transcript was generated automatically and may contain errors.

Hi, I'm Travis Gerke. Thanks for being here. Thanks for having me. This is tremendous. I'm excited. I don't know how else to put it. So, I lead a couple of data science teams at Moffitt Cancer Center in Tampa, Florida. You can find me at that Twitter handle, it's my name.

And I actually just tweeted out all the slides and the code that kind of goes along with making these slides, although this is not a technical talk. This is just a talk about writing job descriptions and things which are typically boring, probably, HR nuances. I'm sorry if there's any HR people in here, but we're going to talk about that.

So in particular, I'm interested in how we write job descriptions and how those end up mapping to certain salaries for data scientists, and in particular for our data scientists. I found this challenging often, and I thought I'd just share some experiences and hopefully get some good information for you all.

The unicorn job posting problem

I hope this is a message that most data science managers seek to convey, and I hope they practice it. Certainly they all want to hire the most talented data scientists, but importantly, I hope they want to compensate them accordingly. Seems pretty simple, but in practice it doesn't always pan out that way.

This is a retweet of a parody account. Associate Deans is a parody account that talks about bureaucratic silliness, mostly in academia. And here what they're talking about is soft money positions. A soft money position is a position where an investigator has brought grant money to an institution, and they hire someone to work on their grant with their grant money. And they feel empowered. At least they feel like they should have full control over how that grant money is spent. So they should be able to hire the people they want and pay them as much as they want, because they sort of earned that money.

But then HR kind of gets in the way, so to speak. So here Jason is lamenting this fact, and he's saying, well, I know who I want to hire and I want to pay them some amount, but HR is now telling me that I have to pay them less. And I jumped onto this thread and I said, yes, I feel this as well. I've experienced this myself many times, where I want to pay a data scientist who I know is worth their weight in gold some amount of money, and then HR might say, oh, no, no, there's another calculation that has to happen here, and you have to pay them less.

So we're left kind of with the question, these HR people, are they our friends? Are they our enemies?

Whichever the case, and again, I'm so sorry if there's HR people in here. I promise it gets better. Whatever the case may be, I started to think, maybe there's another way, right? Maybe we can trick them. Like, maybe if I write a job description that describes a person who does not actually exist, then they can't do a salary benchmark analysis, and they can't establish a baseline for my person who doesn't exist, and then I can set the price point how I want. Seems pretty fair. And maybe other people have had this idea, and thus, unicorns are born, right? Maybe this is how it happened.

We've all seen these kinds of postings, these data science unicorn postings out in job boards and whatever, where they ask for a really, really long list of skills and needs and technical requirements and things like that. And so we're going to talk about those kinds of people.

When I thought I would take this strategy, that I'm going to write a unicorn posting, I thought, well, I should find one to use as a template, because it's kind of a lot. I don't know a whole lot of programming languages. I know R pretty well. I know some other things, but I don't know what I'd put in those. So I started shopping around, and with some help, we identified this one, which was live just last week on Indeed, and you don't have to read all that. I'll walk you through what this person's going to do. This is not my posting. This is someone else's posting.

So this person will do some machine learning. They'll deploy some models in production, so they'll do some machine learning engineering, it looks like. They're going to build applications, and they're going to develop and implement cloud-based security solutions. Cool. And then they'll integrate data and do some decision-making, so it sounds like there's some decision science tasks that are on this person's plate. Fair enough.

So here are the technical requirements for that person. I'll mention something at the bottom that didn't make the cut: they only need zero to one years of experience, and they know all these things. But anyway, don't read all this stuff.

I did read all these things, and as I stared at them, I became more convinced that a contemporary R user who is equipped with enough R packages at their disposal, which all of you have, and the RStudio suite of tools, which makes doing a lot of these things very easy, they can do all of these things with a single language and a single toolkit.

So there's a lot of languages listed on there, and I don't really think they're all necessary. They want this person to do exploratory data analysis: okay, tidyverse. They want them to do some visualizations: okay, ggplot2. Then they want them to wrangle large, complex data, potentially, and there's all kinds of things out there for that: data.table, vroom, other resources, even dtplyr now. Dashboards, Shiny, visualizations, all this stuff. It happens. Interfacing with modern database technologies: we have packages for that now. Deep learning, machine learning, all of it in R.

So I became convinced, right, so the R unicorn is real, and I want to actually not trick HR, but write the correct job posting for the person that I want. I want a unicorn-like person who knows R, and I think we should be able to write a job description like that.

But in order to be able to do so, well, hang on, there's a problem. We've seen warnings like this out there on Twitter and from other resources, and in particular from this book. I want to point out that if you read one book this year, read this one, even if you're not seeking a job. It's called Build a Career in Data Science, from Emily Robinson and Jacqueline Nolis. I devoured it in like three days, and they very eloquently map out the challenge here. They say there are lots of listings that look like the one I just described, but it might be a red flag that the company doesn't know what they actually want from a data scientist, so you might want to beware. And yet I just claimed that these unicorns are real and, importantly, that we should pay them the right amount.

Particularly for R users, you know a lot and you can deliver a lot of value to any institution that you choose to join. Feel empowered to apply for those roles, and you should get them, because I think you're all awesome. So you don't need, like, language X. You just probably need R, at least in my opinion.

And these drivers, unfortunately, I don't know that they're totally correct, right, for data science in particular. Autonomy, fair enough. But experience, it's hard to have ten plus years experience in a technology which isn't ten or more years old. Like Shiny is a semi-arbitrary example, right? I don't really know how to navigate that and there's not really good solutions in the HR domain, but I think at least they're aware of it. So at individual organizations, you can talk to them about that sort of thing.

The salary surveys are not yet capturing specific data science roles. So again, in the Build a Career in Data Science book, they do a great job of spelling out all the data science subdomains. So like machine learning engineer, decision scientist, all kinds of business intelligence analysts, all this stuff. There's all these words. Because that hasn't been standardized for a long period of time, many companies aren't hiring into those titles, and so they're not reporting out data on those. So we do not yet have data on how much each of those sub-niche roles makes. But that will change in the coming years.

I'm not arguing for or against the current process. I'm just telling you what it is. I'm kind of powerless against the machine, just like so many of us are, right? I mean, there are federal laws that are in play here. So you really can't diverge too much from what the standards actually are. But at least by understanding the process, you may be empowered to understand if you're a job seeker, where you land in the data science one through five track. And if you're a manager, it might also empower you to help know how you want to write the job description to get the unicorn that you actually need.

A few thanks here. Don Evans, of course, my compensation consultant and HR guru; she was immensely helpful. Jordan Creed: many of you probably already saw her because she was giving out the unicorn hex stickers, which she also made. She's a great data scientist and apparently also a future PR representative. Garrick Aden-Buie: I've seen him thanked in at least half the talks here, and it's similar here, because he makes so many good tools that everybody uses. I benefit from working with him. I was able to get advance access to the xaringanExtra package, which made a lot of the fanciness that you saw in these slides happen, if you're interested in such things. The ggiraph package, I think that's how it's pronounced, is what made that interactive map happen within the slides, outside the context of Plotly. If you're interested in how those things work, that's a pretty great package. Thanks to the Tampa R Users Group. I gave a little preview of this talk a few weeks ago, and they gave some quick, great feedback. Again, Emily and Jacqueline for that book. There's a link. Please do check it out. If you're interested in any of these sorts of topics, you'll find that book fascinating. And there are the resources; that's how you find me on Twitter. Thank you again. It's so awesome to be here.