Resources

James Blair | Democratizing R with Plumber APIs | RStudio (2019)

The Plumber package provides an approachable framework for exposing R functions as HTTP API endpoints. This allows R developers to create code that can be consumed by downstream frameworks, which may be R agnostic. In this talk, we'll take an existing Shiny application that uses an R model and turn that model into an API endpoint so it can be used in applications that don't speak R.

VIEW MATERIALS https://bit.ly/2TXfFR5

About the Author

James Blair holds a master's degree in data science from the University of the Pacific and works as a solutions engineer. He works to integrate RStudio products in enterprise environments and support the continued adoption of R in the enterprise. His past consulting work centered on helping businesses derive insight from data assets by leveraging R. Outside of R and data science, James's interests include spending time with his wife and daughters, cooking, camping, cycling, racquetball, and exquisite food. Also, he never turns down a funnel cake.


Transcript

This transcript was generated automatically and may contain errors.

Today, I'd like to talk about democratizing R using the Plumber package. And so to get started, what I want to do is kind of go through and provide an introduction to Plumber if you are perhaps unfamiliar with the package or haven't been exposed to it before. And then highlight some new features that we have that allow us to integrate more tightly with OpenAPI, things of that nature. And then finally, walk through a use case, maybe typical example use case of what Plumber could be used for. And then walk through a demonstration of what that might look like.

So if you're unfamiliar, Plumber is an R package that allows you to create API endpoints around common R functions. So you can define any sort of R function that you're interested in, and then with a few simple commands, you can turn that R function into an endpoint that can then be interacted with via any sort of API call.

An example Plumber script looks something like this. And again, if you haven't been exposed to Plumber before, there may be some pieces in here that aren't quite recognizable. But for the most part, if we abstract away the specific Plumber parts for just a moment, this starts to look like just a typical R script, right? At the top here, I load the Plumber package. I load in some saved data object that I have. And I apologize if this is a little small in the back; this is all available on GitHub, and I'll make sure that you have the link to review this later. And then finally, I define a function that interacts with that object.
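A script of the shape just described might look like the following sketch. The file name, model object, and function here are illustrative assumptions, not the exact ones from the talk's slides.

```r
# plumber.R -- a minimal sketch of the script described above.
# The model file name and function are illustrative assumptions.

library(plumber)

# Load a previously saved data object (e.g. a trained model)
model <- readRDS("model.rds")

# An ordinary R function that interacts with that object
predict_mpg <- function(new_data) {
  predict(model, newdata = new_data)
}
```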

Plumber annotations and OpenAPI

To look at the Plumber-specific pieces as we walk through this: Plumber uses specialized annotations, which are just special comments written in the regular R comment syntax with the hash symbol followed by a little asterisk. This allows us to annotate our source script in a way that Plumber can understand, so it can interpret our standard R code and turn it into these APIs. Here, these first few comments provide a name and description for the API.

The next step that I have is I have this filter that's defined, and a filter is used when I have some sort of logic that I want to take place on an incoming request before it's served to an endpoint. And so this is where I could do something like if I want to provide some sort of authentication on incoming requests, I could use a filter to do that. Here this example filter simply takes some information from the incoming request, such as where this is coming from, the user ID associated with it, a timestamp, and prints that out so that I have a nice log available to me of who's interacting with my API as well as the frequency that this API is being interacted with.
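A logging filter along these lines might be sketched as follows; the exact fields logged in the talk may differ. The uppercase fields on `req` come from the Rook interface that Plumber builds on.

```r
#* Log some information about each incoming request
#* @filter logger
function(req) {
  cat(as.character(Sys.time()), "-",
      req$REQUEST_METHOD, req$PATH_INFO, "-",
      req$REMOTE_ADDR, "\n")

  # Hand the request on to the next filter or endpoint
  plumber::forward()
}
```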

Finally, we have the function that I started with when we demonstrated this R code, and I've annotated a few things on this function. I've given it a name and a little description, and then I've defined the endpoint that this function should respond to. So again, I apologize if it's a little small, but I've defined that this function will be called when the predict endpoint is accessed with a post request.

To go a little bit further, once I have all these annotations in place and I actually run this API, this thing on the right-hand side gets generated for me. And as an R user when I was first exposed to Plumber, this was like a mind-blowing moment because I wasn't overly familiar with APIs. They seemed kind of scary to me as an R user for a little while. It's like this thing that everybody knows about, I don't know what it is, right? But I started using Plumber, and as soon as I ran my API, all of a sudden, free of charge, I have this nice little interface that would pop up and not only show me what my API looks like, but it allows me to touch and feel the API to make sure that it's operating as I would expect.

If we look at a comparison between the code that we've written and this file and user interface that's been generated, we can start to see how these two things are connected. Here I see that my title has been copied from my source code into this user interface. I can see that the description matches the description that I defined in my R script. And finally, I have this endpoint that's been defined, this predict endpoint that shows that it's going to respond to post requests, and it shows the description that I provided in my source code.

This sort of magical experience is made possible through the use of OpenAPI. OpenAPI, formerly referred to as Swagger, is a specification: a set of guidelines that defines a way to describe an entire API. The endpoints, the attributes of those endpoints, everything from the top to the bottom of your API can be described using this specification. There are a few different ways this can happen. You can write a YAML file, which is the example that I have pulled up on the screen. You can also create a JSON file that is then ingested and interpreted by OpenAPI tooling to provide things like the user interface we were just looking at.

Plumber interacts with OpenAPI in a couple of different ways. We've already been exposed to one, and that is through these specialized comments where we have the hash symbol followed by an asterisk, and then typically some sort of decorator like an at symbol. And what happens is Plumber will parse the file, identify these special comments, and then insert those into a named list that is finally processed into a JSON file that OpenAPI will read and provide us with the user interface. Now this functionality has been in Plumber since essentially day one. And one of the nice advantages here is that my annotations, my documentation, my description is all closely connected to my code. They're right next to one another.

However, there are some limitations here: the full suite of options available through the OpenAPI specification is very broad, and it's not feasible to create a specialized Plumber annotation for every single feature the specification offers. I'm happy to announce that, thanks to some recent efforts on the Plumber package, we now have the ability to update and adjust the full OpenAPI specification at runtime of the Plumber API.

What this looks like is over here on the right-hand side, I have an entry point file that Plumber will read, and in this entry point file I define, at runtime, a swagger function that takes in a few different arguments, and all it needs to return is a named list that can be parsed into JSON for OpenAPI to interpret. So I can update anything that I've written, or I can add additional attributes to my API. In fact, if I wanted to, I could overwrite everything that Plumber has already automatically done and just create my own specification for my API to be ingested and then rendered for the end user.
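An entry point file along those lines might be sketched as follows. This matches the pre-1.0 swagger-function interface demonstrated in the talk; newer Plumber releases expose `plumber::pr_set_api_spec()` for the same purpose, and the specific spec fields modified here are illustrative.

```r
# entrypoint.R -- illustrative sketch of runtime spec modification.
# Matches the pre-1.0 Plumber interface shown in the talk; newer
# releases provide plumber::pr_set_api_spec() instead.

library(plumber)

pr <- plumb("plumber.R")

pr$run(swagger = function(pr_, spec, ...) {
  # spec is a named list mirroring the OpenAPI document; whatever
  # this function returns is serialized to JSON and rendered in the UI
  spec$paths[["/predict"]]$post$requestBody <- list(
    content = list(
      "application/json" = list(
        schema = list(type = "object")
      )
    )
  )
  spec
})
```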

A typical data science use case

Given this introduction, let's consider an example use case. As a data scientist, I'm often handed a piece of data and asked to provide some insight, and I go through this process that's described in R for Data Science. I ingest the data, I do some cleaning of my data, and then finally I go through this iterative process of trying to understand the story or understand the insight that should be delivered from my data.

Being a data scientist, that means I'm often creating models, and so I'm going to dig deep into my bag of tricks, and let's do something cutting edge. Let's make a linear model out of this data, right? We've gone through, we've cleaned this up, and we've created this model. The next step now is to present the results, and as an R user, there's a variety of different ways that I can do this. I can create a PowerPoint presentation, I can create an R Markdown document, maybe it's an email that I send. My tool of choice when I'm faced with this opportunity to communicate is to build a Shiny application, and this allows me to build some sort of interactive web user interface that users can then access to touch and play with the model that I've developed, right?

So I can publish this application to something like RStudio Connect, and then I can have my end users come into the picture, and they're able to come in and interact with my model and get the insight that they need. And now I'm in this happy place where I have a Shiny application that's sitting on top of all the work that I just did, and it's servicing all these end users, whether these are fellow analysts, business users, managers within my department, whatever the case is. They're able to come in, and in a very intuitive way, they can interact with my application, they can interact with my model, via this web application that I provided.

Now I've been in this position before, and this is a great feeling to have, right? I've done all this hard work, and now people are using it, and it's getting recognition, and I'm helping people's, ideally I'm helping make people's lives better in some way. As this model starts to gain traction, and as this application starts to gain traction, something always happens, or at least in my experience, something always happens, and that is another set of users slowly start to appear, and these users are a little bit different than the traditional user base, because they don't want a web application, they want programmatic access to whatever I've done, right?

They want to be able to access the underlying model, and typically it's at this point in the process where I start to scratch my head and say, well, we've had a good run, right? Because I know this is what's coming. I'm going to go to one of these engineers, and I'm going to get something like this in return. Now, this is not because R is bad, and Go is good, or vice versa, right? This is because these are different languages designed for different purposes, and I've been in many conversations with software engineers where they're trying so hard to understand my R code, and I'm trying so hard to explain it, but that handoff always has some amount of friction associated with it. And up until recently, that in my mind has just been kind of this necessary evil, right? At some point, my little baby R project has to grow up and enter the real world of software engineering.

I propose today that there's actually a better solution, and that solution is to use Plumber as the intermediary here. Instead of going directly to building a Shiny application, I can instead expose my model or whatever it is that I've worked on as a Plumber API, and now I've given myself the flexibility to not only service end users who want programmatic access to the work that I've done, I can still continue to service my regular users, right? By using a Shiny application to access this API so that my traditional user base that are used to a web interface can continue to use the work that I've done. And at this point, now everybody's happy.

Live demo in RStudio

So what does this look like? Let's take a short demo into RStudio. And so here we've got the RStudio IDE. This is a Plumber file that I have open. This is the one that we've been kind of looking at. And so I just want to walk through a few different things that are happening here, and then we'll run this and take a look at what happens under the hood.

So I load the Plumber package, and then as I mentioned before, I have this model that I've saved out. So this is a very simplistic example, but the idea here is this could be a broad range of things, right? There was a great talk given earlier about serving Keras and TensorFlow models using Plumber, right? So it's really limitless in the scope of what you can do here. So I've loaded this model in. Again, I have this filter that logs some information about my incoming requests. And then finally, I have this endpoint that listens on this predict location in response to incoming post requests.

And what this function does, if we look through this, it's actually fairly straightforward. It anticipates that the incoming request contains a post body object, and it anticipates that that post body object is JSON. And so it will try to parse that JSON into a native R data frame. If it can't, it returns an error to the user. Otherwise, it takes this new R data frame and sends it to my model to generate new predictions. Now one of the nice things about Plumber is that by default, any response that's generated within my endpoint function is automatically serialized into JSON, which is a language that most other computer programming languages or toolkits would be able to naturally ingest and understand. So once this prediction is made, down here with this predict call, the resulting predicted values will be returned as a JSON array to whichever client made this request.
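Putting the annotations and that logic together, the endpoint might be sketched like this; the model object and the error message text are illustrative.

```r
#* Return model predictions for the observations in the request body
#* @post /predict
function(req, res) {
  # Try to parse the JSON post body into a native R data frame
  parsed <- tryCatch(
    jsonlite::fromJSON(req$postBody),
    error = function(e) NULL
  )

  # If parsing failed, return an error to the caller
  if (is.null(parsed)) {
    res$status <- 400
    return(list(error = "Request body could not be parsed as JSON"))
  }

  # The result is automatically serialized to a JSON array by default
  predict(model, newdata = parsed)
}
```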

Now typically, when I've done something like this, and I've built several different APIs in this fashion, this is where Plumber previously has been stretched to its limits in terms of what it can do from a documentation and interactivity perspective. And that is because there's no way for me, with these annotations, to specify details about the request body itself. And so when I would run this, I would get the nice interface, and I would be able to click on my predict endpoint, but I would have no way to really interact with it. I would need some other toolchain, right? I could use curl from the command line, I could use Postman or some other API client, but I needed some other tool to verify that things were working.

However, like I mentioned earlier, with the newest updates in Plumber, I can now specify anything I want about the API specification at runtime of my Plumber API. So here I'm loading the Plumber file, and then I'm using this run method and passing a function to the swagger argument, where I define what the request body to this predict endpoint should look like. And when I run this file (and I want to point out for just a moment that in the newest version of the RStudio IDE, there's some nice tooling built around Plumber, so I can run the API from within the IDE with this Run API button), when I click on this, let me expand over here, I now have this nice user interface, just like I've come to expect. But the nice thing is, as I scroll down here, I can see my post endpoint.

If I open this up, notice it says no parameters, which is fine, I didn't define any parameters, but this is traditionally where it stopped, right, and I wanted some mechanism to test this out. Now, however, I see this request body, and it shows me not only an example of what this should look like, it shows me what the schema is, and then it gives me the opportunity to try it out. Here are some values that are pre-populated by default, but I could come over, and I already have this pulled up, so I could come over and grab some JSON data. This is just a selection of the mtcars data in R, and I can come over here and edit this, paste it in, execute, and a couple of things happen.

Now we see the curl command that was generated here. We see the URL that I made this request to, so this is just running locally right now, and I've made a request to this predict endpoint, and then if I go down here, I see the response body, so this is the predicted value, the predicted miles per gallon value of this JSON data that I just passed into Plumber, and notice that I never left RStudio. I built this API, I tested this API, I verified that everything was working all from within RStudio.
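From R, an equivalent of that curl request might look like the sketch below. The port and the predictor columns are assumptions, since they depend on how the API was started and which model was trained.

```r
library(httr)
library(jsonlite)

# A few rows of mtcars as JSON, as in the demo; the columns chosen
# here are an assumption about the model's predictors
body <- toJSON(head(mtcars[, c("cyl", "hp", "wt")], 3))

# The port is whatever Plumber printed when the API started
resp <- POST("http://127.0.0.1:8000/predict",
             body = body, content_type_json())

# The predicted mpg values come back as a JSON array
fromJSON(content(resp, as = "text", encoding = "UTF-8"))
```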

Real-world validation

Now some of you at this point may be thinking, okay, that's great, right, this is all well and good, James, but is the handoff really this easy, right? Is it really as easy as you made it seem? And I've had that question myself, and so I was actually on a phone call this past weekend with a close college friend of mine, and I was going through my talk and kind of telling him about what I was going to be demonstrating and everything like that, and he was following along, he's a mobile app developer, and at the end of the talk, he said, well, send me some information, or send me the GitHub repo to your API, and now this is an API that I've published to RStudio Connect, it's open, it's available, and so I sent him the repo, and I really didn't think anything of it, and a couple of days later, I got this in a text message.

Now, my friend knows how to spell R, and that's about the extent of it, right? He has no working knowledge of R, he's never interacted with R in any sort of context, yet here he's built an entire mobile app using Swift that's interacting with R, built by somebody who doesn't know R. This is what Plumber enables us to do, right? We can now go straight from I have this idea in concept to I have this idea that anyone else can interact with, whether it's a Shiny application or a mobile application or some sort of Java process, whatever the case is. I've now opened up the work that I'm doing to a much broader audience.

If you'd like to take a look at some of the recent changes that have been made, they haven't quite made it to CRAN yet, but you can install them from GitHub in this manner. There are some additional resources available in the Plumber documentation, as well as for OpenAPI if you're curious about what that specification entails, and the GitHub repository for this talk and the materials that I've used are here as well. Thank you very much.

Q&A

Is there anything on the roadmap for testing APIs? Right now what we're doing is we're opening the API in one R session and then running a script with our test calls to the endpoints that we've written in another. Will there be a more elegant way of doing that, either in the IDE or elsewhere?

There's nothing on the roadmap right now, primarily because since we've opened this up as, for lack of a better term, a REST API endpoint, there's lots of tooling that exists around testing those types of endpoints, and so we trust that users could leverage some of the existing tooling. At this point, I don't necessarily see a need for us to recreate something, but it's worth maybe continuing to think about.

Is there, for validation of requests, something that you can kind of say, here are all the required parameters, without specifying inside the method, checking for things one by one?

That's where filters come in, and that kind of goes back to this broader idea of, well, sometimes I field questions about, is there integrated security, are there ways to verify things about the incoming requests? Right now there's not any sort of magic incantation that says, hey, verify this request or something like that. That's kind of up to me as the API developer or the R developer to say, maybe I define some filters that check for certain attributes. It might be something like a security token, or it might be something like I'm checking a timestamp to make sure that people aren't making requests beyond operating hours. So you have the flexibility to really define any criteria that you want, but we don't provide anything out of the box that provides verification or anything like that.
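A token-checking filter of the kind described might be sketched as follows; the header name and environment variable are illustrative choices, not anything Plumber mandates.

```r
#* Reject requests that do not carry the expected token
#* @filter check_token
function(req, res) {
  expected <- Sys.getenv("API_TOKEN")  # illustrative secret source
  supplied <- req$HTTP_AUTHORIZATION   # headers arrive as HTTP_<NAME>

  if (!nzchar(expected) || is.null(supplied) ||
      !identical(supplied, expected)) {
    res$status <- 401
    return(list(error = "Invalid or missing token"))
  }

  plumber::forward()
}
```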

Do you guys have any hubs of, like, lots of examples of Plumber APIs? Because I find it kind of hard sometimes to find examples and try to mimic them.

We are working on building out some resources. The Plumber documentation and Plumber examples are something that we're working on, and we'll make sure that those are widely announced and publicized as they become available. An ETA? Sometime soon. But that's definitely something I'm trying to work on, as well as building out some of these examples.

Two questions really quickly. First, can you specify a dependency, say, a proprietary dataset that you don't want to expose to the outside? And the second part is, where is this background process running, and do you have control over it?

So kind of two parts there. The first part is, can I specify something external? In the example that we just looked at, I actually had a serialized R object that I loaded from disk. So I loaded this model that I previously trained from disk, and then it was available to any process referenced within that API. So I could do that with a flat file or something like that. And then one of the nice advantages of publishing to something like RStudio Connect is those dependencies get identified and I can automatically send those files with my API source code. And then in terms of where the underlying process is running, in the example we just looked at, this was just running locally. As soon as I hit run API, it actually just spun up a small web server on my machine to surface this API. When I publish it, there's a variety of ways you can deploy this thing. My recommendation is RStudio Connect because you have the click button deployment, but the Plumber documentation runs through a variety of deployment options.

I have a deployment related question. So I saw that RS Connect deploys today to the pro server, but not to anything else like DigitalOcean. Is that in the roadmap for you guys to include to make it easier to deploy to anything beyond just the pro server?

As it stands right now, publishing from within the IDE, just as kind of a reference, I meant to showcase this. From within the IDE, we mentioned that I have access to run API up here. I also have access to this publish button where I can publish this to RStudio Connect. That's where that interaction ends. But again, in the documentation for Plumber, there's some guidelines around if you wanted to publish to maybe DigitalOcean, for example, there's some guidelines around how you might do that. But I don't anticipate integration of that type of workflow into the IDE.

Just a quick question with the new improvements. You showed two different files you were working on with the OpenAPI. Is that the same one coded differently with the new improvements with OpenAPI or are those two files, I mean, is it the same thing, I guess?

So there's two different files at play here. One is the Plumber file, and this defines all of the R logic. Plumber also has this concept of an entry point file: when I run a Plumber API, it will look for an entrypoint.R file within that same folder, within that same directory. And if it finds it, it will use that to find additional details about the API. So here, I could programmatically alter my API, I could add additional routers, but what I'm doing in this case is simply updating the specification with this named list that will then get parsed into JSON and rendered the way that we saw it.

If you can say a few words regarding scaling, I mean, how many posts can this support? Can this scale up?

Yeah, that's the classic question, right? Does R scale? And the answer is yes, absolutely. And it really depends on the way in which you choose to deploy this. If you deploy this to RStudio Connect, you can actually go in and adjust the number of processes that are dedicated to the support of that API. So the notion here, and part of the underlying question I think is, well, I know that R is single-threaded. If I'm running R in the back end of my API, I anticipate that there may be challenges with scalability. RStudio Connect provides a solution to that by allowing you to spin up concurrent R processes that support your API. So essentially, you add this notion of multi-threadedness to your R processes. And then, depending on the way in which you choose to deploy this, you could deploy it in an environment where it auto-scales in some sort of a Docker cluster or something like that. So it's really up to you in terms of how you want to deploy this thing, but it certainly is able to scale and meet a high load of demand. I know that in the talk I was in earlier, where they were discussing using this at T-Mobile, they mentioned serving a very, very large customer base and their mobile app interfaces using Plumber and accessing R APIs that way, and they've had no issues with it. Thank you all again very much.