Hadley Wickham - R in Production

Transcript#

This transcript was generated automatically and may contain errors.

Our next speaker represents the year 2018. And his current favorite spirit is Rum Fire. Please welcome Hadley.

Hey, everyone. So today, I wanted to talk about putting R in production. And this is a talk about a topic I don't really know anything about. And I've never done it myself. So a lot of this, I'm not going to be showing you any code today. What I'm going to be showing you is my efforts to understand what this thing is all about, with the hope that a few months down the road, it'll start to maybe influence some open source packages or maybe some of Posit's pro tools.

And I also want to say, I'm going to be talking about R in lowercase p production, not R in uppercase p production. And I think this is an important distinction to make if you're talking to folks in your IT organization or your DevOps organization. Because when they think production, they're often going to think capital P production.

And what's the difference? The difference, I think, is basically paging. The difference is paging. So when something's in capital P production, it means it is so vital to the health of your organization, to the correct operation of your organization, that if it stops working, someone is going to tell you, regardless of whether that's 3 PM on a Monday afternoon or 3 AM on a Sunday morning. That's what capital P production means, that your code is vital to the health of your organization.

And so I think you certainly can put R in production. But you probably don't want to, not because there's anything wrong with R, but because you don't want to put data scientists in production.

but because you don't want to put data scientists in production.

Because your data scientists don't want to be woken up at 3 AM on a Sunday morning, right? That's not part of their job description. So it's totally fine to put data scientists in lowercase P production. In fact, really important, because they're still going to be your data scientists are going to be producing things that are really useful for your organization. They're not so vital that someone has to be woken up if they break. But they're producing dashboards that people are going to be looking at every day, every week. It's really important.

And you absolutely can put R into lowercase P production, because the dirty secret of most organizations is they already have Excel in production. Like the number of organizations where there is some semi-automated Excel spreadsheet that is really, really important to someone in the executive part of the company is pretty high.

And this of course, like leads to the worst possible debugging scenario, which is bugs that only occur for other people who can see data that you cannot see. And how you debug that, I have no idea, but thoughts and prayers.

Wrapping up

So today I'm going to talk about two of the three things that I think make something a production job that it's not run just once. And that causes these challenges that the schema might change, your dependencies might change, your platform might change, your universe might change, your requirements might change. And you need to think about these problems. Some of them have good solutions, some of them don't have good solutions, but at least I think acknowledging them is the first step.

And then the other problem is that you're not just running it on your computer. You've got this transition possibly from Windows to Linux, from desktop to server, from interactive to batch, and you've got a bunch of challenges related to auth.

I'll show you, I'll leave you with just like one last picture, which is my, sort of speaking to that last bullet point, that last problem, which is not just you, that now you're working with a team of data scientists and there's kind of this hierarchy of needs that ideally you want to be able to at least find your colleagues' work. Even better, you should be able to run their work. Even better, you should be able to understand their work. And optimally, you should actually be able to edit it if needed, so that if someone does leave your team, you can still go. I suspect many teams are still down here. Some teams are like striving to get higher up that pyramid, but definitely a challenge of how do you share work across people? How do you build standards in your team? Thank you.

Hadley Wickham - R in Production

Transcript#

What does "production" actually mean?

The ice cream prediction scenario

De-risking schema and dependency changes

When the universe changes

When requirements change

Summary of challenges

Running code somewhere else

Windows vs. Linux and desktop vs. server

Package installation and interactive vs. batch debugging

Authentication and access control

Wrapping up

Featured software#

dplyr

ggplot2

pkgdown

roxygen2

testthat

tidyr

tidyverse