Mark Sellors | R in production | RStudio (2019)

Transcript

This transcript was generated automatically and may contain errors.

There is no one weird trick that we use to get R into production. I'm also not going to talk specifically about any code that might help you run R in production. That's not really the area that I work in.

So where I work, a lot of the time we're helping organisations adopt the R language and trying to help businesses of varying sizes, from very small organisations up to massive government agencies to adopt the language in their businesses, which isn't always an easy conversation to have.

So this is kind of my starting point. I genuinely think that all of the technical barriers to running R in production are easy to overcome. I say easy, I mean comparatively easy, right? There are always challenges. But it's the cultural barriers that slow us down.

So there's some overlap here with what Joe Cheng was talking about this morning. I like to present this as a kind of inspirational quote, but actually it was just something I said on Twitter one time. But I do genuinely believe it.

Production is very much a team sport, which is why I talk a lot about building bridges between these different disparate groups within the organization.

As I said before, I don't want every data scientist to have to learn all of the kind of software engineering best practice, and I don't think that there are necessarily many data scientists who are interested in that sort of thing, which is why it's important to build bridges with these different teams within your organization. But obviously in those circumstances, it's about building a kind of common language and sort of trying to understand a little bit about what they do.

Production readiness checklist

So I've put together, and again, this list is by no means exhaustive, but I've put together a kind of production list or a production readiness checklist for some of this stuff. So when R goes into production, you need to think about the target environment. What is the target environment? Is it Windows? Is it Linux? Which versions and things like that?

You also need to think about the release process. Can you just take code off your desktop and shove it onto a production server? Probably not. So a lot of organizations will have a formal release process that you'll need to either get on board with or somehow persuade somebody to adapt for your specific needs.

You'll probably need a testing strategy, so it's worth writing down what that is so that when whoever it is from the QA team says, well, you haven't done any testing, you can say, well, actually, we've done a lot of testing, and our testing strategy is really good and clear. Some organizations have a change management process. So a really good example for this one is retail businesses who often don't want to change anything in their infrastructure over the Christmas period because they're extremely concerned that any changes might damage their infrastructure in some way, and so there'll be a gap, two, three months, whatever it is, where literally no changes can be pushed to production at all.

You might need a security review. Infrastructure is kind of similar to the target environment, but also covers things like, do you need access to databases and things like that? What is your deployment process? Will there be an automated process? Is it a manual process? Will you push straight from your desktop? Is it going to come out of Git or wherever? Who's going to support your application? If you're taking an R application into production, do you want the 3 a.m. support call when it falls over? Probably not. Who's going to do the support? Are the support team prepared to support this application? Do they have the information they need?

With a lot of data science projects, you do the project and then you move on to the next one, but in production, we need to think about maintenance. Who's maintaining this thing when the initial drop is done? Where's version two going to come from? Who's going to do the bug fixes? Who signs off on releases? Which version of R are you going to use? Which packages, which versions? There's a lot of things to think about. This checklist is kind of a starting point. Yours will be different. Feel free to copy it or take a picture or whatever and add your own things onto that.
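One way to keep a checklist like this alive is to store it as data, so that unanswered questions are easy to list before a release. The following is only a minimal Python sketch under that assumption: the topics come from the checklist above, while the `ChecklistItem` class, the `build_checklist` and `open_items` helpers, and the wording of each question are hypothetical and should be adapted to your own organisation.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    """One production-readiness question and, once known, its answer."""
    topic: str
    question: str
    answer: str = ""  # an empty string means "not answered yet"


def build_checklist() -> list[ChecklistItem]:
    # Topics follow the talk; the question wording is a paraphrase.
    return [
        ChecklistItem("Target environment", "Windows or Linux? Which versions?"),
        ChecklistItem("Release process", "Is there a formal release process to follow?"),
        ChecklistItem("Testing strategy", "Is the testing strategy written down?"),
        ChecklistItem("Change management", "Are there change freezes to plan around?"),
        ChecklistItem("Security review", "Is a security review required?"),
        ChecklistItem("Infrastructure", "Is access to databases or other systems needed?"),
        ChecklistItem("Deployment process", "Automated or manual? Where does the code come from?"),
        ChecklistItem("Support", "Who takes the support call, and do they have what they need?"),
        ChecklistItem("Maintenance", "Who fixes bugs and builds version two?"),
        ChecklistItem("Sign-off", "Who signs off on releases?"),
        ChecklistItem("R version", "Which version of R will be used?"),
        ChecklistItem("Packages", "Which packages, and which versions?"),
    ]


def open_items(checklist: list[ChecklistItem]) -> list[str]:
    """Return the topics that still have no recorded answer."""
    return [item.topic for item in checklist if not item.answer.strip()]


if __name__ == "__main__":
    checklist = build_checklist()
    checklist[0].answer = "Linux"  # example answer, purely illustrative
    for topic in open_items(checklist):
        print(f"Still open: {topic}")
```

Adding an organisation-specific item is a matter of appending another `ChecklistItem` to the list.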

Once you've done that, you can be happy up a mountain enjoying having R in production. The last thing I want to encourage you all to do is if you do have R in production currently or if you do manage to successfully get R into production in your organisation, I really want you to share what you've done somehow, anyway, because it helps lift the rest of the R in production community up. It's really good to get these stories out there to encourage others to do the same and so that we can learn from each other.

Last thing, that's me, that's the company's website. The last thing is the field guide to the R ecosystem. It has been quite useful for people who don't know R, so people with ops and management backgrounds and things like that. It's a very short, high-level overview of the R ecosystem. With that, I will stop talking.

Q&A

Thank you, Mark. We have time for one question. If anybody has any question that they would like to pose to Mark, we've got microphones. It looks like we have one right over here.

You mentioned building an internal CRAN for a company. Installing and running unvetted code is kind of a big concern in my organisation. Were the packages somehow vetted when you were including them in the repository, or how did that work?

They were vetted, yes. Basically, all I did was take... They gave me a list of these are the 80 packages that we use. I got those packages from CRAN. I got all the dependencies from CRAN. I ran their kind of corporate antivirus, their corporate anti-malware stuff over those packages, and then I built a small repository to host them. That was enough to get their security team on board effectively.
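The step of collecting the requested packages plus all of their dependencies amounts to walking a dependency graph until nothing new turns up. Below is a minimal Python sketch of that idea, assuming the dependency information is already available as a plain dictionary. The `dependency_closure` function and the toy package names are hypothetical, and this is not the tooling used in the engagement described above. In R itself, `tools::package_dependencies()` with `recursive = TRUE` covers similar ground.

```python
from collections import deque


def dependency_closure(wanted, depends_on):
    """Return every package needed to install `wanted`, dependencies included.

    `depends_on` maps a package name to the names it directly depends on.
    Packages missing from the map are treated as having no dependencies.
    """
    needed = set()
    queue = deque(wanted)
    while queue:
        pkg = queue.popleft()
        if pkg in needed:
            continue  # already handled; this also guards against cycles
        needed.add(pkg)
        queue.extend(depends_on.get(pkg, []))
    return sorted(needed)


if __name__ == "__main__":
    # Made-up dependency data, for illustration only.
    toy_index = {
        "pkgA": ["pkgB", "pkgC"],
        "pkgB": ["pkgC"],
        "pkgC": [],
        "pkgD": ["pkgA"],
    }
    print(dependency_closure(["pkgA"], toy_index))
```

The resulting list is what would then be downloaded, scanned, and placed in the internal repository.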
