Resources

James Blair | Democratizing R with Plumber APIs | RStudio (2019)

The Plumber package provides an approachable framework for exposing R functions as HTTP API endpoints. This allows R developers to create code that can be consumed by downstream frameworks, which may be R agnostic. In this talk, we'll take an existing Shiny application that uses an R model and turn that model into an API endpoint so it can be used in applications that don't speak R.

VIEW MATERIALS https://bit.ly/2TXfFR5

About the Author

James Blair holds a master's degree in data science from the University of the Pacific and works as a solutions engineer. He works to integrate RStudio products in enterprise environments and support the continued adoption of R in the enterprise. His past consulting work centered on helping businesses derive insight from data assets by leveraging R. Outside of R and data science, James's interests include spending time with his wife and daughters, cooking, camping, cycling, racquetball, and exquisite food. Also, he never turns down a funnel cake.


Transcript

This transcript was generated automatically and may contain errors.

Today, I'd like to talk about democratizing R using the Plumber package. And so to get started, what I want to do is kind of go through and provide an introduction to Plumber if you are perhaps unfamiliar with the package or haven't been exposed to it before. And then highlight some new features that we have that allow us to integrate more tightly with OpenAPI, things of that nature. And then finally, walk through a use case, maybe typical example use case of what Plumber could be used for. And then walk through a demonstration of what that might look like.

So if you're unfamiliar, Plumber is an R package that allows you to create API endpoints around common R functions. So you can define any sort of R function that you're interested in, and then with a few simple commands, you can turn that R function into an endpoint that can then be interacted with via any sort of API call.

An example Plumber script looks something like this. And again, if you haven't been exposed to Plumber before, there may be some pieces in here that aren't quite recognizable. But for the most part, if we abstract away the specific Plumber parts for just a moment, this starts to look like just a typical R script, right? At the top here, I load the Plumber package. I load in some saved data object that I have. And I apologize if this is a little small in the back; this is all available on GitHub, and I'll make sure that you have the link to review this later. And then finally, I define a function that interacts with that object.
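A script of the shape just described might look like the following sketch. The file name, model object, and function here are illustrative assumptions, not the exact ones from the talk's slides.

```r
# plumber.R -- a minimal sketch of the script described above.
# The model file name and function are illustrative assumptions.

library(plumber)

# Load a previously saved data object (e.g. a trained model)
model <- readRDS("model.rds")

# An ordinary R function that interacts with that object
predict_mpg <- function(new_data) {
  predict(model, newdata = new_data)
}
```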

Plumber annotations and OpenAPI

To look at the Plumber-specific pieces as we walk through this: Plumber uses specialized annotations, which are just special comments written in the regular R comment syntax with the hash symbol followed by a little asterisk. This allows us to annotate our source script in a way that Plumber can understand, so it can interpret our standard R code and turn it into these APIs. Here, these first few comments provide a name and description for the API.

The next step that I have is I have this filter that's defined, and a filter is used when I have some sort of logic that I want to take place on an incoming request before it's served to an endpoint. And so this is where I could do something like if I want to provide some sort of authentication on incoming requests, I could use a filter to do that. Here this example filter simply takes some information from the incoming request, such as where this is coming from, the user ID associated with it, a timestamp, and prints that out so that I have a nice log available to me of who's interacting with my API as well as the frequency that this API is being interacted with.
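A logging filter along these lines might be sketched as follows; the exact fields logged in the talk may differ. The uppercase fields on `req` come from the Rook interface that Plumber builds on.

```r
#* Log some information about each incoming request
#* @filter logger
function(req) {
  cat(as.character(Sys.time()), "-",
      req$REQUEST_METHOD, req$PATH_INFO, "-",
      req$REMOTE_ADDR, "\n")

  # Hand the request on to the next filter or endpoint
  plumber::forward()
}
```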

Finally, we have the function that I started with when we demonstrated this R code, and I've annotated a few things on this function. I've given it a name and a little description, and then I've defined the endpoint that this function should respond to. So again, I apologize if it's a little small, but I've defined that this function will be called when the predict endpoint is accessed with a post request.

To go a little bit further, once I have all these annotations in place and I actually run this API, this thing on the right-hand side gets generated for me. And as an R user when I was first exposed to Plumber, this was like a mind-blowing moment because I wasn't overly familiar with APIs. They seemed kind of scary to me as an R user for a little while. It's like this thing that everybody knows about, I don't know what it is, right? But I started using Plumber, and as soon as I ran my API, all of a sudden, free of charge, I have this nice little interface that would pop up and not only show me what my API looks like, but it allows me to touch and feel the API to make sure that it's operating as I would expect.

If we look at a comparison between the code that we've written and this file and user interface that's been generated, we can start to see how these two things are connected. Here I see that my title has been copied from my source code into this user interface. I can see that the description matches the description that I defined in my R script. And finally, I have this endpoint that's been defined, this predict endpoint that shows that it's going to respond to post requests, and it shows the description that I provided in my source code.

This sort of magical experience is made possible through the use of OpenAPI. OpenAPI, formerly referred to as Swagger, is a specification: a set of guidelines that defines a way to describe an entire API. The endpoints, the attributes of those endpoints, everything from the top to the bottom of your API can be described using this specification. There are a few different ways this can happen. You can write a YAML file, which is the example that I have pulled up on the screen. You can also create a JSON file that is then ingested and interpreted by OpenAPI tooling to provide things like the user interface we were just looking at.

Plumber interacts with OpenAPI in a couple of different ways. We've already been exposed to one, and that is through these specialized comments where we have the hash symbol followed by an asterisk, and then typically some sort of decorator like an at symbol. And what happens is Plumber will parse the file, identify these special comments, and then insert those into a named list that is finally processed into a JSON file that OpenAPI will read and provide us with the user interface. Now this functionality has been in Plumber since essentially day one. And one of the nice advantages here is that my annotations, my documentation, my description is all closely connected to my code. They're right next to one another.

However, there are some limitations here: the full suite of options available through the OpenAPI specification is very broad, and it's not feasible to create a specialized Plumber annotation for every single feature the specification offers. I'm happy to announce that, thanks to some recent efforts on the Plumber package, we now have the ability to update and adjust the full OpenAPI specification at runtime of the Plumber API.

What this looks like is over here on the right-hand side, I have an entry point file that Plumber will read, and in this entry point file I define, at runtime, a swagger function that takes in a few different arguments, and all it needs to return is a named list that can be parsed into JSON for OpenAPI to interpret. So I can update anything that I've written, or I can add additional attributes to my API. In fact, if I wanted to, I could overwrite everything that Plumber has already automatically done and just create my own specification for my API to be ingested and then rendered for the end user.
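An entry point file along those lines might be sketched as follows. This matches the pre-1.0 swagger-function interface demonstrated in the talk; newer Plumber releases expose `plumber::pr_set_api_spec()` for the same purpose, and the specific spec fields modified here are illustrative.

```r
# entrypoint.R -- illustrative sketch of runtime spec modification.
# Matches the pre-1.0 Plumber interface shown in the talk; newer
# releases provide plumber::pr_set_api_spec() instead.

library(plumber)

pr <- plumb("plumber.R")

pr$run(swagger = function(pr_, spec, ...) {
  # spec is a named list mirroring the OpenAPI document; whatever
  # this function returns is serialized to JSON and rendered in the UI
  spec$paths[["/predict"]]$post$requestBody <- list(
    content = list(
      "application/json" = list(
        schema = list(type = "object")
      )
    )
  )
  spec
})
```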

A typical data science use case

Given this introduction, let's consider an example use case. As a data scientist, I'm often handed a piece of data and asked to provide some insight, and I go through this process that's described in R for Data Science. I ingest the data, I do some cleaning of my data, and then finally I go through this iterative process of trying to understand the story or understand the insight that should be delivered from my data.

Being a data scientist, that means I'm often creating models, and so I'm going to dig deep into my bag of tricks, and let's do something cutting edge. Let's make a linear model out of this data, right? We've gone through, we've cleaned this up, and we've created this model. The next step now is to present the results, and as an R user, there's a variety of different ways that I can do this. I can create a PowerPoint presentation, I can create an R Markdown document, maybe it's an email that I send. My tool of choice when I'm faced with this opportunity to communicate is to build a Shiny application, and this allows me to build some sort of interactive web user interface that users can then access to touch and play with the model that I've developed, right?

So I can publish this application to something like RStudio Connect, and then I can have my end users come into the picture, and they're able to come in and interact with my model and get the insight that they need. And now I'm in this happy place where I have a Shiny application that's sitting on top of all the work that I just did, and it's servicing all these end users, whether these are fellow analysts, business users, managers within my department, whatever the case is. They're able to come in, and in a very intuitive way, they can interact with my application, they can interact with my model, via this web application that I provided.

Now I've been in this position before, and this is a great feeling to have, right? I've done all this hard work, and now people are using it, and it's getting recognition, and I'm helping people's, ideally I'm helping make people's lives better in some way. As this model starts to gain traction, and as this application starts to gain traction, something always happens, or at least in my experience, something always happens, and that is another set of users slowly start to appear, and these users are a little bit different than the traditional user base, because they don't want a web application, they want programmatic access to whatever I've done, right?

They want to be able to access the underlying model, and typically it's at this point in the process where I start to scratch my head and say, well, we've had a good run, right? Because I know this is what's coming. I'm going to go to one of these engineers, and I'm going to get something like this in return. Now, this is not because R is bad, and Go is good, or vice versa, right? This is because these are different languages designed for different purposes, and I've been in many conversations with software engineers where they're trying so hard to understand my R code, and I'm trying so hard to explain it, but that handoff always has some amount of friction associated with it. And up until recently, that in my mind has just been kind of this necessary evil, right? At some point, my little baby R project has to grow up and enter the real world of software engineering.

I propose today that there's actually a better solution, and that solution is to use Plumber as the intermediary here. Instead of going directly to building a Shiny application, I can instead expose my model or whatever it is that I've worked on as a Plumber API, and now I've given myself the flexibility to not only service end users who want programmatic access to the work that I've done, I can still continue to service my regular users, right? By using a Shiny application to access this API so that my traditional user base that are used to a web interface can continue to use the work that I've done. And at this point, now everybody's happy.

Live demo in RStudio

So what does this look like? Let's take a short demo into RStudio. And so here we've got the RStudio IDE. This is a Plumber file that I have open. This is the one that we've been kind of looking at. And so I just want to walk through a few different things that are happening here, and then we'll run this and take a look at what happens under the hood.

So I load the Plumber package, and then as I mentioned before, I have this model that I've saved out. So this is a very simplistic example, but the idea here is this could be a broad range of things, right? There was a great talk given earlier about serving Keras and TensorFlow models using Plumber, right? So it's really limitless in the scope of what you can do here. So I've loaded this model in. Again, I have this filter that logs some information about my incoming requests. And then finally, I have this endpoint that listens on this predict location in response to incoming post requests.

And what this function does, if we look through this, it's actually fairly straightforward. It anticipates that the incoming request contains a post body object, and it anticipates that that post body object is JSON. And so it will try to parse that JSON into a native R data frame. If it can't, it returns an error to the user. Otherwise, it takes this new R data frame and sends it to my model to generate new predictions. Now one of the nice things about Plumber is that by default, any response that's generated within my endpoint function is automatically serialized into JSON, which is a language that most other computer programming languages or toolkits would be able to naturally ingest and understand. So once this prediction is made, down here with this predict call, the resulting predicted values will be returned as a JSON array to whichever client made this request.
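Putting the annotations and that logic together, the endpoint might be sketched like this; the model object and the error message text are illustrative.

```r
#* Return model predictions for the observations in the request body
#* @post /predict
function(req, res) {
  # Try to parse the JSON post body into a native R data frame
  parsed <- tryCatch(
    jsonlite::fromJSON(req$postBody),
    error = function(e) NULL
  )

  # If parsing failed, return an error to the caller
  if (is.null(parsed)) {
    res$status <- 400
    return(list(error = "Request body could not be parsed as JSON"))
  }

  # The result is automatically serialized to a JSON array by default
  predict(model, newdata = parsed)
}
```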

Now typically, when I've done something like this, and I've built several different APIs in this fashion, this is where Plumber previously has been stretched to its limits in terms of what it can do from a documentation and interactivity perspective. And that is because there's no way for me, with these annotations, to specify details about the request body itself. And so when I would run this, I would get the nice interface, and I would be able to click on my predict endpoint, but I would have no way to really interact with it. I would need some other toolchain, right? I could use curl from the command line, I could use Postman or some other API client, but I needed some other tool to verify that things were working.

However, like I mentioned earlier, with the newest updates in Plumber, I can now specify anything I want about the API specification at runtime of my Plumber API. So here I'm loading the Plumber file, and then I'm using this run method and passing a function to the swagger argument, where I define what the request body to this predict endpoint should look like. And when I run this file (and I want to point out for just a moment that in the newest version of the RStudio IDE, there's some nice tooling built around Plumber, so I can run the API from within the IDE with this Run API button), when I click on this, let me expand over here, I now have this nice user interface, just like I've come to expect. But the nice thing is, as I scroll down here, I can see my post endpoint.

If I open this up, notice it says no parameters, which is fine, I didn't define any parameters, but this is traditionally where it stopped, right, and I wanted some mechanism to test this out. Now, however, I see this request body, and it shows me not only an example of what this should look like, it shows me what the schema is, and then it gives me the opportunity to try it out. Here are some values that are pre-populated by default, but I could come over, and I already have this pulled up, so I could come over and grab some JSON data. This is just a selection of the mtcars data in R, and I can come over here and edit this, paste it in, execute, and a couple of things happen.

Now we see the curl command that was generated here. We see the URL that I made this request to, so this is just running locally right now, and I've made a request to this predict endpoint, and then if I go down here, I see the response body, so this is the predicted value, the predicted miles per gallon value of this JSON data that I just passed into Plumber, and notice that I never left RStudio. I built this API, I tested this API, I verified that everything was working all from within RStudio.
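From R, an equivalent of that curl request might look like the sketch below. The port and the predictor columns are assumptions, since they depend on how the API was started and which model was trained.

```r
library(httr)
library(jsonlite)

# A few rows of mtcars as JSON, as in the demo; the columns chosen
# here are an assumption about the model's predictors
body <- toJSON(head(mtcars[, c("cyl", "hp", "wt")], 3))

# The port is whatever Plumber printed when the API started
resp <- POST("http://127.0.0.1:8000/predict",
             body = body, content_type_json())

# The predicted mpg values come back as a JSON array
fromJSON(content(resp, as = "text", encoding = "UTF-8"))
```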

Real-world validation

Now some of you at this point may be thinking, okay, that's great, right, this is all well and good, James, but is the handoff really this easy, right? Is it really as easy as you made it seem? And I've had that question myself, and so I was actually on a phone call this past weekend with a close college friend of mine, and I was going through my talk and kind of telling him about what I was going to be demonstrating and everything like that, and he was following along, he's a mobile app developer, and at the end of the talk, he said, well, send me some information, or send me the GitHub repo to your API, and now this is an API that I've published to RStudio Connect, it's open, it's available, and so I sent him the repo, and I really didn't think anything of it, and a couple of days later, I got this in a text message.

Now, my friend knows how to spell R, and that's about the extent of it, right? He has no working knowledge of R, he's never interacted with R in any sort of context, yet here he's built an entire mobile app using Swift that's interacting with R, built by somebody who doesn't know R. This is what Plumber enables us to do, right? We can now go straight from I have this idea in concept to I have this idea that anyone else can interact with, whether it's a Shiny application or a mobile application or some sort of Java process, whatever the case is. I've now opened up the work that I'm doing to a much broader audience.

If you'd like to take a look at some of the recent changes that have been made, they haven't quite made it to CRAN yet, but you can install them from GitHub in this manner. There are some additional resources available in the Plumber documentation, as well as for OpenAPI if you're curious about what that specification entails, and the GitHub repository for this talk and the materials that I've used are here as well. Thank you very much.

Q&A

Is there anything on the roadmap for testing APIs? Right now what we're doing is we're opening the API in one R session and then running a script with our test calls to the endpoints that we've written in another. Will there be a more elegant way of doing that, either in the IDE or elsewhere?

There's nothing on the roadmap right now, primarily because since we've opened this up as, for lack of a better term, a REST API endpoint, there's lots of tooling that exists around testing those types of endpoints, and so we trust that users could leverage some of the existing tooling. At this point, I don't necessarily see a need for us to recreate something, but it's worth maybe continuing to think about.

Is there, for validation of requests, something that you can kind of say, here are all the required parameters, without specifying inside the method, checking for things one by one?

That's where filters come in, and that kind of goes back to this broader idea of, well, sometimes I field questions about, is there integrated security, are there ways to verify things about the incoming requests? Right now there's not any sort of magic incantation that says, hey, verify this request or something like that. That's kind of up to me as the API developer or the R developer to say, maybe I define some filters that check for certain attributes. It might be something like a security token, or it might be something like I'm checking a timestamp to make sure that people aren't making requests beyond operating hours. So you have the flexibility to really define any criteria that you want, but we don't provide anything out of the box that provides verification or anything like that.
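A token-checking filter of the kind described might be sketched as follows; the header name and environment variable are illustrative choices, not anything Plumber mandates.

```r
#* Reject requests that do not carry the expected token
#* @filter check_token
function(req, res) {
  expected <- Sys.getenv("API_TOKEN")  # illustrative secret source
  supplied <- req$HTTP_AUTHORIZATION   # headers arrive as HTTP_<NAME>

  if (!nzchar(expected) || is.null(supplied) ||
      !identical(supplied, expected)) {
    res$status <- 401
    return(list(error = "Invalid or missing token"))
  }

  plumber::forward()
}
```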

Do you guys have any hubs of, like, lots of examples of Plumber APIs? Because I find it kind of hard sometimes to find examples and try to mimic them.

We are working on building out some resources. The Plumber documentation and Plumber examples are something that we're working on, and we'll make sure that those are widely announced and publicized as they become available. An ETA? Sometime soon. But that's definitely something I'm trying to work on, as well as building out some of these examples.

Two questions really quickly. First, can you specify a dependency, say, a proprietary dataset that you don't want to expose to the outside? And the second part is, where is this background process running, and do you have control over it?

So kind of two parts there. The first part is, can I specify something external? In the example that we just looked at, I actually had a serialized R object that I loaded from disk. So I loaded this model that I previously trained from disk, and then it was available to any process referenced within that API. So I could do that with a flat file or something like that. And then one of the nice advantages of publishing to something like RStudio Connect is those dependencies get identified and I can automatically send those files with my API source code. And then in terms of where the underlying process is running, in the example we just looked at, this was just running locally. As soon as I hit run API, it actually just spun up a small web server on my machine to surface this API. When I publish it, there's a variety of ways you can deploy this thing. My recommendation is RStudio Connect because you have the click button deployment, but the Plumber documentation runs through a variety of deployment options.

I have a deployment related question. So I saw that RS Connect deploys today to the pro server, but not to anything else like DigitalOcean. Is that in the roadmap for you guys to include to make it easier to deploy to anything beyond just the pro server?

As it stands right now, publishing from within the IDE, just as kind of a reference, I meant to showcase this. From within the IDE, we mentioned that I have access to run API up here. I also have access to this publish button where I can publish this to RStudio Connect. That's where that interaction ends. But again, in the documentation for Plumber, there's some guidelines around if you wanted to publish to maybe DigitalOcean, for example, there's some guidelines around how you might do that. But I don't anticipate integration of that type of workflow into the IDE.

Just a quick question with the new improvements. You showed two different files you were working on with the OpenAPI. Is that the same one coded differently with the new improvements with OpenAPI or are those two files, I mean, is it the same thing, I guess?

So there's two different files at play here. One is the Plumber file, and this defines all of the R logic. Plumber also has this concept of an entry point file: when I run a Plumber API, it will look for an entrypoint.R file within that same folder, within that same directory. And if it finds it, it will use that to find additional details about the API. So here, I could programmatically alter my API, I could add additional routers, but what I'm doing in this case is simply updating the specification with this named list that will then get parsed into JSON and rendered the way that we saw it.

If you can say a few words regarding scaling, I mean, how many posts can this support? Can this scale up?

Yeah, that's the classic question, right? Does R scale? And the answer is yes, absolutely. And it really depends on the way in which you choose to deploy this. If you deploy this to RStudio Connect, you can actually go in and adjust the number of processes that are dedicated to the support of that API. So the notion here, and part of the underlying question I think is, well, I know that R is single-threaded. If I'm running R in the back end of my API, I anticipate that there may be challenges with scalability. RStudio Connect provides a solution to that by allowing you to spin up concurrent R processes that support your API. So essentially, you add this notion of multi-threadedness to your R processes. And then, depending on the way in which you choose to deploy this, you could deploy it in an environment where it auto-scales in some sort of a Docker cluster or something like that. So it's really up to you in terms of how you want to deploy this thing, but it certainly is able to scale and meet a high load of demand. I know that in the talk I was in earlier, where they were discussing using this at T-Mobile, they mentioned serving a very, very large customer base and their mobile app interfaces using Plumber and accessing R APIs that way, and they've had no issues with it. Thank you all again very much.