
James Blair & Barret Schloerke | Integrating R with Plumber APIs | RStudio (2020)
Full title: Expanding R Horizons: Integrating R with Plumber APIs. In this webinar we will focus on using the Plumber package as a tool for integrating R with other frameworks and technologies. Plumber is a package that converts your existing R code to a web API using unique one-line comments. Example use cases will be used to demonstrate the power of APIs in data science and to highlight new features of the Plumber package. Finally, we will look at methods for deploying Plumber APIs to make them widely accessible. Webinar materials: https://rstudio.com/resources/webinars/expanding-r-horizons-integrating-r-with-plumber-apis/ About James: James is a Solutions Engineer at RStudio, where he focuses on helping RStudio commercial customers successfully manage RStudio products. He is passionate about connecting R to other toolchains through tools like ODBC and APIs. He has a background in statistics and data science and finds any excuse he can to write R code. About Barret: I specialize in large data visualization, where I utilize the interactivity of a web browser, the fast iterations of the R programming language, and the large data storage capacity of Hadoop.
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Thank you, everyone, who's joined in today. I really appreciate your time and hope that this can be a productive time for all of us as we spend some time talking about R and Plumber together. As Robert mentioned, my name is James Blair. I work as a solutions engineer for RStudio, and we're joined on the line today by Barret Schloerke, who also works here at RStudio and is the primary maintainer of the Plumber package and is here to help with questions and provide some support that way as well. So we're grateful for his presence as well as we get started today.
Our topic today is Expanding R Horizons, Integrating R with Plumber APIs. The idea here is I'm hopeful that this really is an opportunity for if you've never been exposed to Plumber before, I hope you find something that's useful. And if you've used Plumber in the past and have come to kind of see what's new with Plumber, I also hope that there's something useful. So the ideal state here is that everyone walks away with something that's beneficial for them and that they feel like they've learned.
Just to kind of give you an idea of what we're going to do today. First, I want to set the stage with a problem that we're going to look at to kind of frame the conversation that we have. And then we'll have a brief discussion about what an API is in case you're unfamiliar. And then we'll talk about how you get started using APIs with R and the Plumber package. And then we'll spend some time discussing some of the new features that have been recently released in the latest CRAN version of Plumber. And then finally, we'll conclude by looking at different ways that you can deploy these APIs so that they can be widely utilized, either inside of your organization or by a broader audience.
And then at the end, I'll provide some links to additional resources. And like Robert mentioned, this is being recorded. Today's October 28, 2020. So if you're watching this recording down the road, things may have changed in the Plumber package. And I'll share some links here at the end that can make sure that you stay up to date with the latest changes and things like that. But resources will be provided if you want to learn more.
The Palmer penguins dataset
Today, we're going to work with a data set. This is the Palmer penguins data set. If you're unfamiliar, this is a fairly recent data set in the R space. You can find some more information at Allison Horst's GitHub repository that I have linked here in the slides. But basically, the idea here is we have 344 different observations for three different penguin species: Chinstrap, Gentoo, and Adélie. And if you're familiar with the iris data set, the typical kind of classification data set that you often see.
This is a similar style data set, right? Fairly easy to understand. Down here in the bottom right-hand corner, you can see kind of a preview of what this data looks like. We've got the species, the island they came from, we have some different measurements from each of these different penguins: their bill length, bill depth, flipper length, body mass. And then we have the sex of the penguin and then the year that the measurement took place. So what we want to do here is just kind of frame this, and this could be any data set, right? The example that we walk through today is not specific to this data set by any stretch of the imagination. But I think, like most data science, most analytical problems or projects start with some data. And that felt like a natural place to start today.
Building a prediction model
Let's start with some data and go from there. So what we want to do now is we, you know, we'll say we've been handed this data and we're going to go ahead and we're going to do the data science or in some cases, you know, that means build a model. And we're going to take this, these 344 observations. This is not a model building or necessarily, you know, data science themed webinar. So this is not about how to build the best model possible. This is just let's work with an example that's easy for us to understand. So we'll take the data, you know, we might explore it, clean it up a little bit, and then we'll fit this model to it.
And the idea here is given some set of measurements. So given the bill length, the bill depth, the flipper length, and the body mass, we want to predict the species of that particular penguin, right? And so that's what we've done in the example that we have pulled up on screen. We fit this model using the tidymodels framework, and then we're able to see some output from that model. And now that we have this, we can generate new predictions, right? Given a set of new data, maybe we get a new set of measurements from a penguin of unknown species, and we want to predict what the species of that penguin is, we can pass those new values through to the model and generate a predicted outcome in terms of what species we think that penguin is likely to be.
Now, this is all great, but the question now is kind of, well, now what, right? We have a couple of options. Now that we've done our work, we've built this model, we have a few options ahead of us. One is we can just kind of ad hoc score new penguin predictions whenever we receive new data, right? So maybe once a month we get an email from researchers that are off researching penguins, and they send us their new set of data, and we run that set of data through the model in order to predict the species of these new measurements. That works, right? But maybe we want something that's a little bit more real time.
Maybe measurements are happening at a pretty rapid rate, and we don't want our entire day job from here on out to be predicting penguin species using the model that we built, right? But we also don't want the model that we built to just kind of wither away and die. We spend a lot of time here. This was something that we feel like contributes to the goals of our business or the goals of the project that we're working on, and we want to make sure that it continues to contribute in the best way possible. But the problem is we're kind of stuck because we either have to be the ones that generate new values and new predictions because we're the ones that built the model, and we know how to use R, or we have to find somebody else that knows R and can do that themselves, right?
So maybe we email them or we provide them with access to the model, and they're able to generate their own predictions. But that's still kind of a high bar because that means that that person needs to be able to open up an R session and load our model and load the data and run the data through the model, which if you're an R user, that's probably not that big of a deal. But if you're trying to teach one of the business users or one of the end users of your model how to run this, and you've got to teach them about R, now all of a sudden this becomes a much larger task.
The other option is we could just pass this off to somebody else. We could say, okay, we've done our job. Here's the model. Here's kind of how we set this up. Now let's let that model grow up, quote unquote, and we'll give it to an engineering team, and they'll reconstruct this model in a framework that they're using so that it can be used for real-time scoring or real-time predicting of new data. And that's a totally valid option, but it's often a little bit tricky to get that right, right? To be able to hand off the model in that way, it's time-consuming. It can be, you know, it can be a difficult process to go through when in reality we already have everything we need. What we really need is just a convenient way for other people to be able to use our model.
What we really need is just a convenient way for other people to be able to use our model.
What is an API?
And that's really where APIs come in. So let's take a step aside from the model for a moment, and let's just talk about this notion of what an API is. And in general, an API stands for Application Programming Interface. Now that can mean a lot of different things, and in different contexts the idea of an API can mean different things, which makes it a little bit difficult to understand. And so for the purpose of our conversation today, we're going to narrow in on this definition that an API is a web API communicating over HTTP. And basically what that is, is that provides us with a standardized way for different computers to communicate with one another.
And I'm not going to dive into the nitty-gritty details of what makes up an API request and what a response looks like. There's lots of different features involved here that you can look into, and that will provide some resources that provide some detail there. But the main idea is when we talk about web APIs over HTTP, which is what we're going to talk about today, this is just a standardized set way for different computers to communicate. Just like I'm communicating now through my voice, and you're able to understand, computers have their own way of communicating. HTTP web APIs is one of those mechanisms for communication.
The idea is there's a client that sends a request, that request meets the standard of what the server is expecting, and then a server receives that request and then generates some sort of a response. And that response could be something as simple as, I received the request, that's all I need, that's all I need to do. To something that's much more complex, like firing off some sort of process or updating a database or generating predictions against the model, or any number of different things can happen in response to a request that comes in. But the server is responsible for taking the request and doing something with it and then providing a response.
And even if you've never really interacted directly, knowingly with APIs in the past, it's interesting to note that anytime you use a web browser, so if you use Google Chrome or Safari or Brave or Firefox or anything like that, and you visit a website, there's a request that your web browser makes to a server that says, hey, I'd like to view this web page that's at this address, and the server responds with typically a bunch of HTML that your browser then renders for you to view. So anytime you view a website, you're really interacting with APIs, even if you don't acknowledge that fact, right? Your computer is a client and it's sending a request, and there's a server somewhere that's responding to that request with some HTML that you view as the web page.
Now, this is, again, not meant to be a comprehensive overview of what APIs are or how APIs work, but this hopefully sets the stage and gives you a little bit of background. And the next general question I think that makes sense to ask here is, okay, well, that's great, but as an R user, as a statistician, as an actuary, as a data scientist, as an analyst, why do I care, right? Why is this something that's important to me?
And I think there's kind of two reasons here that this becomes significant, and it goes back to our initial conundrum where we built this model and we want to give people access to it, but we don't want to train the whole company on how to use R, right? And the fact is APIs allow the work that you do to be used by a wide range of tools and technologies, and that's no longer limited to just the R language, okay? So, I can now essentially say, okay, I've got this model that we're going to work with, and we'll walk through an example of this in just a moment, but I have this model here, and I've got another team in my organization that's using Python, or they're using C Sharp, or they're using JavaScript, or they're using C++, or they're using any number of tools, and they want to use my model. Well, APIs allow me to do that in a very, very straightforward and easy way so that I no longer need to hand off my model to be rebuilt by another team in another language, but instead, they can just communicate directly with the work that I've done via this web API. And this, again, the idea here is this dramatically reduces the handoff between the work that you do in R and other tools that are being built within your organization by teams using other languages or other technology frameworks.
How Plumber works
Here's an example of a Plumber API over here on the left-hand side of the screen, and I'm going to walk through some of the core components here. But before I do that, if you just take a step back and ignore the comments, right, so if you just ignore the commented lines in this R code, it should look pretty straightforward if you've written R code before. We've got a couple of functions that we define, and we load the Plumber package at the top of this script. And if I look at these functions, there's nothing that's terribly, you know, interesting or different about them. If I look at the first function over here, it takes a single parameter, message, and then it returns a list that just verifies that it received the message, essentially, right? It returns a list that says, okay, the message is, and then it echoes back that message. My second function here doesn't take any arguments, and instead it just returns a histogram of some random values that I drew from the normal distribution.
So, it's pretty easy to reason about what these two functions will do, right? Function one is going to echo back whatever I give it. Function two is going to give me a random histogram of some values. Now, I think the real magic and the real power of the Plumber package is how it allows me to go from these simple R functions to a responsive web API in just a couple of comments. And that's where these comments come into play. So, if you focus on what I have highlighted here in this box, you'll notice that these comments look a little bit different than standard R comments. I've got the pound sign, the hashtag symbol, that's what is typically used to comment things in R, but then I follow that with an asterisk. And that indicates to Plumber, hey, this is a special comment that provides instructions for what Plumber needs to do.
And so, I can see, for example, if I take this first comment, I give it this API title tag. And I know that because it starts with the at symbol and apiTitle. And then I give the API some sort of a name, Plumber Example API in this case. And then the next comment block, these three comments following, describe how Plumber should use the function that comes afterwards. So, the first comment here gives a description of what this function is going to do. It's going to echo back the input. My second comment here describes the parameters of that function. So, in this case, I have a single message parameter; that parameter is the message that I'm going to echo. And then the final line here is perhaps the most important, because it defines both what types of requests this will respond to, as well as what path this particular function is listening on.
And so, essentially, what this says is, okay, if I run this as an API, and I make some sort of a request to the echo path, you see this slash echo here, then what I'm going to get back is the response of this function. And whatever this function does, that will be returned to whatever client makes a request at this echo endpoint. And to be even more specific, that client will make a get request, which we've denoted with this get tag here. So, the idea is I write standard R code as functions. I use these special comments to identify how Plumber should handle those functions. And then I run the API, or I plumb it, right. And you can see here in this little screenshot, I've got this run API button.
And in practice, what this looks like is this. I click run API. I'll get this nice little window that pops open over here that shows me the API running. And then I can try this out. I can say, here's my message: hello world. I'm going to go ahead and try that out. And if we scroll down, we can see that the output of that execution is the output of the function that I wrote in R. So again, if I look at this, try this out, write my message, hello world. And then here at the bottom, we'll see some JSON that comes back that contains the output of my function: the message is "hello world". And it really is that simple, right? Now what I've done is I've taken my functions in R and I've made them so that they're easily accessible from other tools and other frameworks.
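As a rough sketch of the plumber.R file being described (the exact title, paths, and parameter names on screen may differ slightly), the echo-and-histogram example looks something like this:

```r
# plumber.R -- sketch of the echo/plot example; names are illustrative
library(plumber)

#* @apiTitle Plumber Example API

#* Echo back the input
#* @param msg The message to echo
#* @get /echo
function(msg = "") {
  list(msg = paste0("The message is: '", msg, "'"))
}

#* Plot a histogram of random normal values
#* @serializer png
#* @get /plot
function() {
  rand <- rnorm(100)
  hist(rand)
}
```

Clicking Run API in RStudio is roughly equivalent to calling `plumber::pr("plumber.R") %>% pr_run()`, which serves these two endpoints locally.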
Building the penguin prediction API
So, let's actually take a look at this and let's consider our original kind of situation here. We've got this model that we built using this Penguin data set. And what we want to do now is we want to see, okay, is there a way for us to expose this model so that others can interact with it? Is there a way for us to essentially say, okay, what if a client makes a request with new data? Could we just respond with the predicted outcomes? Could we just respond with the predicted species given that request and do that automatically so that if somebody in another department has a Python script that wants to use our model, they can just send a request with new data and we can send them back the predicted outcome. That's what we want to accomplish here.
So, let's actually, I'm going to flip over here to RStudio now and let's go ahead and let's build this out. Okay. So, what we want to do is we want to load the plumber package. And then before I do anything else, let's just kind of make sure that we've got the pieces in place that we want to have in place. So, what I'm going to do first here is I'm going to create, I'm going to zoom in a little bit, I'm going to create just a single endpoint that all it does is just verifies that things are working. It's not going to do anything other than just return some information that says, hey, the lights are on, things seem to be working.
So, we'll call this, we'll write the function here, if I could type, and we'll say we want to return a list and we want to say, okay, the status is that all is good. And then we'll return the time just so we can verify that this is working the way that it should. Okay. And if I run this function, we could call this function status or something like that. And if I run this function in R, we can see what I get back. I get back this list and it tells me the status is all good. And it gives me this timestamp of when that function executed.
And now what I want to do is I want to turn this into an API endpoint. So, we'll give it a little description here. We'll just call this health check, right? Is the API running? Okay. So, that's our description. And then this will respond to get requests at the health check path, right? So, I've written my function, and now all I'm doing is I'm adding these special comments that we saw before that tell Plumber what to do with this function, right? What do I want to do? Well, I want to call it health check. This is the title and description I'm going to give it. And then here's the important piece that says, okay, whenever a get request comes in, and I'm not going to get into what different request types are and things like that, that's an exercise for another time. But essentially, if a request comes into this location, then execute this function and return the results.
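A minimal version of the health-check endpoint being described might look like the following sketch; the path name /health-check is an assumption based on the narration:

```r
#* Health check -- is the API running?
#* @get /health-check
function() {
  # Return a simple status list; Plumber serializes it to JSON by default
  list(
    status = "All Good",
    time   = Sys.time()
  )
}
```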
And if we save this, we can click run API here. And we'll see over here on the right hand side, we've got this little interface that shows up that allows us to see, okay, we've got this health check. Let me expand this out a little bit so we can see. All right. So, here I've got this get health check. It tells me the description of the endpoint, which is what we provided right here is the API running health check is the API running. And we can try this out. Let's come in and try this out. There's no parameters. I don't need to provide any parameters here. So, I can just execute this. And here I see my response body.
And I can see directly. In fact, let's see if we can make this a little easier to compare. If I scroll up here. Here we go. Here's the result of my function in R, right? We ran the function in R, and that's what we see. And here is the result when I make an API request to this endpoint. And hopefully you can see the connection between the two, right? Here up top, I've got an R list that was created that tells me the status and the time. And here down below, I have that same list, except now it's just in JSON format, which is kind of the industry standard for how these APIs communicate.
Now, you can adjust what type of response this API generates. And we'll look at that here in a little bit. But by default, Plumber will take the output of your function and convert it to JSON. And that JSON data will be returned to the client. So, here's what we saw returned to the client. And basically, what this says is, okay, we have this working, right? Plumber's working, we're able to see that the API is running. We can try this out again if we wanted to, right, we could execute this again. And we see the same result, but the timestamp is now updated to reflect the current time, which in my local time is 11:23 a.m.
All right, we're gonna stop the API and now we're going to keep going. So this makes sure that, okay, we've got all the pieces in place, we know what we're doing. Now let's figure out, okay, what do we really want to do? What's our goal here? Well, first of all, I need the model. So I've already saved the model, we'll just read the model in. Okay, so I've got this model file that I've already saved, we can read this in here. All right, so we've got our model. And now what I want to be able to do is I want to say, okay, what if I predict with my model, and I say new data is, and I've got some JSON data lying around.
Okay, and I'll say type equals prob. Okay. So let's see. There we go. All right. So now if I look at this, and I'll walk through what we did here. But if I look, what I really want is I want to say, okay, given my model that I already trained, I want to be able to give it some new data and return the predicted outcomes. So in this case, what I'm returning is a probability value for each of the three different species, right. So in this case, for penguin one, we're almost entirely confident in this Gentoo prediction; for observation two, we're very strongly predicting that it's Adélie, and so on and so forth. I can see the breakdown across each potential outcome, or each potential species, here in my output.
So this works. I mean, this is how I would generate new predictions in R. But what I really want is for this functionality to take place in an API. I want somebody to be able to just pass me some data, and then I can pass them back the predicted outcomes, like what I'm doing here in R. So this is our goal right here. Our goal is to have this kind of behavior in an API endpoint.
So we've got our model in place. We'll need to bring in a couple of other packages and make sure that they're available in our environment here, just for the model. So we need to bring in the parsnip package for the predict function to work the way that we want it to. And this is all because of how we trained the model. And then we want to bring in the ranger package, because we trained this model using the ranger package. So this is, again, just to remind ourselves, this is a random forest model that we built, we saved it out, and now we're going to use it in this API that we're building.
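Putting those pieces together, loading the saved model and scoring some new data interactively might look like this sketch; the file name and column names are assumptions based on the palmerpenguins data, not the exact code from the webinar:

```r
library(parsnip)  # provides the predict() method for the fitted model
library(ranger)   # engine used to train the random forest

# Assumed file name for the saved model object
model <- readRDS("penguin_model.rds")

# One hypothetical new penguin; columns follow the palmerpenguins naming
new_penguin <- data.frame(
  bill_length_mm    = 44.5,
  bill_depth_mm     = 17.1,
  flipper_length_mm = 200,
  body_mass_g       = 4100
)

# Returns a probability column for each of the three species
predict(model, new_data = new_penguin, type = "prob")
```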
Okay, I've got these pieces in place. So now, what I want is I want to say, okay, I've got a function. And what I want to be able to do is I want to say, okay, I want to predict. I've already got my model, because I've loaded that in my environment. And then I've got new data. And this is the part, okay, now I've got to figure this out, because where is this data going to come from? Right? When I run this as an API, I want this data to come from the user, right? This is data that comes from the client request. Okay.
So, okay, let's think about how that works. The way that this works in Plumber is, I'm going to write my function so that it takes a couple of arguments: a request (req) and a response (res). And if I write my function this way, it allows me to have access to the full request that's being made from within my function. Plumber will automatically pass that request object into my function, and then I have access to it. And one of the cool things with the latest release of Plumber is that you can automatically parse incoming data and make it available in the request itself. So for example, if I have a user that makes a request with some JSON data, and says, okay, here's some information about some new penguins we found, and I want to know what their species is. So they make this request, Plumber will automatically take that JSON data, convert it to a data frame, and make it available for me to use within the body of this function that I'm writing right now.
So all we need to know is, okay, where does that get stored? It gets stored as part of the request, as the body object attached to this incoming request. And in fact, we'll take a look at that in just a moment. But this is essentially what I want to be able to do. I want to say, okay, let's do this. And let's say, type is probability. Like we saw before, right, we're just taking this function, that was our goal, and we're now putting it inside this function that we're going to use within our API. So we're going to say this is going to be predict species for new penguins.
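The resulting endpoint, as described, could be sketched like this; it assumes a `model` object is already loaded in the API's environment, and relies on Plumber (version 1.0.0 and later) parsing the incoming JSON body into `req$body`:

```r
#* Predict species for new penguins
#* @post /predict
function(req, res) {
  # Plumber parses the incoming JSON request body into a data frame
  # and attaches it to the request object as req$body
  new_data <- req$body

  # Score the new measurements with the previously loaded model
  predict(model, new_data = new_data, type = "prob")
}
```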
Okay. And then let's say we want to respond to post requests at the predict endpoint. Okay, excellent. Let's go ahead and run this. I'm going to go ahead and comment this out, just because this isn't a necessary piece of what we're doing. This is just so that we know what we're trying to accomplish here. We're going to run this API. And let's pop this open over here. Okay. So we see our health check, let's just verify that that's doing what we think it should be.
And it looks like it is, which is great. That means everything's working on the Plumber side, as far as we know. And then let's take a look at our predict endpoint here. Okay, we've got this predict endpoint, we'll try it out. Okay, execute. Okay, we get some sort of error here. And if we look at this, this is just an R error that's come back to us in JSON format, right, which is again a nice feature of Plumber: if my R code throws an error, I can capture that error and gracefully return something to the user. Or if I don't capture it or do anything like that, then Plumber will automatically take that error message, convert it into JSON, and return it back to the user, so they get some idea of what's going on. And what's really happening here is, I didn't give this any data. So this is saying, look, one or more independent variables was not found.
Like, I didn't find any data. And that actually makes sense, because I didn't give it any data in here, right? There wasn't any part of this where I provided it with some sample data. And so now I've got to figure out, okay, well, how can I provide some sample data here? Right. And one of the features that's new to Plumber is the ability to modify the user interface that appears.
Customizing the open API specification
So we saw this user interface that showed up over here. In fact, let me just run this one more time so we can see it. Right. So I've got this user interface that appears over here. This is really, really nice, because it means that as an R user, when I'm building my API and I run it, I get a nice, really clean way to interact with the API directly from within RStudio, right here in RStudio. I can check to see that my API is working. And here we go, right? Things are working, I can check this endpoint. But as we just saw, if I try to do something like check this endpoint, all of a sudden I'm in a little bit of a bind, because there's nothing here that allows me to pass in data.
So what I really would like is to be able to say, okay, can I just plop some JSON in here, right? Instead of not being able to do anything, what if I could just drop some JSON into here and run it that way? And with the newest release of Plumber, that is entirely possible. So that entire user interface that we were just looking at is built around something called the OpenAPI specification. Now, the OpenAPI specification is a massive standardized format for defining API behavior, endpoints, and things like that. I am not going to spend a lot of time going into the entirety of what OpenAPI is and all that it entails, and you're certainly welcome to look more into that on your own. But what it means for us as R users, and as Plumber users, is that if we want to enhance the capability of our user interface here, we can do so by doing one of two things: we can either modify it directly in R by modifying a list, a massive list, or we can provide our own specification file and use that from within R. And so that's what we're going to do here.
And I'll show you what this looks like. So I'm going to create another little section here. And we're going to use this @plumber tag. This is another new feature of the package. And basically what this special tag does is it allows me to take my existing Plumber object and modify it however I want. Okay, so I'm no longer defining an endpoint. But rather, what I'm doing is I'm saying, okay, after I've done all this stuff, so after I've added health check, after I've added predict, I want to further modify this object that's being created. And so I'm going to say this is a function, and we'll call its argument pr, for my Plumber router. And then in here, I can modify what I want to do to this pr object. And this gives me another chance to unveil one more really great feature of the newest release of Plumber. And that is, I can now do something like this: pr, pipe, pr_set_api_spec. And I want to say, okay, I've got this yaml file that I'm going to read. And I'll show you what this file looks like in just a moment here.
Okay, before we look at this YAML file, let's look at what we're doing here. I'm saying, okay, Plumber, take this information and start building me an API router. But now that you've built these pieces, these endpoints, I want to do something else to it: I want to take that router and add this API specification file to it. And what I can do here is either provide a named list, which this will do, or modify the existing specification that Plumber is already building for me. Either one works. I kind of like this approach because it allows me to lay out the whole outline of what I want, and that can be helpful. So if we look at this OpenAPI file here, this is a YAML file that just contains information about the API.
And notice, I define the summary for my health check, and I define the summary for predict. And if I run this, and we come back over to my Plumber file and run it, you'll notice that my descriptions have now changed. My health check now says, "Determine if the API is running and listening as expected," while the description in my R file says, "Health check: is the API running?" Why are those two things different? They're different because I override the description in this YAML file. Notice in here, I say, "Determine if the API is running and listening as expected," and that is what we see listed in my interface.
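A minimal version of the kind of OpenAPI YAML file he shows might look like this (the paths, wording, and field names are assumptions based on the endpoints in the demo):

```yaml
openapi: "3.0.3"
info:
  title: Penguin Predictions
  version: "1.0.0"
paths:
  /health-check:
    get:
      summary: Determine if the API is running and listening as expected
  /predict:
    post:
      summary: Predict penguin species from body measurements
      requestBody:
        content:
          application/json:
            example:
              - bill_length_mm: 41.3
                bill_depth_mm: 18.5
```

Entries defined here override the corresponding pieces of the specification Plumber generates from the roxygen-style comments.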
Okay, so I have effectively overwritten some of what I already did by providing this new definition file. But in addition to overwriting some of that information, I've also described that I want to be able to pass JSON data to this predict endpoint. So if I come over here and open this up, it tells me the request body should be JSON, and here's an example of what that JSON data should look like. And if I try this out, I can come in here and edit this; I could change these measurement values to whatever the case is. I can now pass my own data into this endpoint and try it out. We can execute this, and here we see that we once again have an internal error.
And this is because, if we come back over to our API file for just a moment, we're passing in only a single value, and it's being parsed as a vector instead of a data frame. So if we just change this one piece and say we want this to be a data frame, and run the API one more time and try this out: here's our data that we're passing in, let's execute on that data, and here we see our response is a JSON object that contains the predicted probability for each associated species. So in this case, this particular penguin is almost a toss-up, right? We're not super sure which one it is. It may be a Chinstrap, but it's a fairly even split across all three potential outcomes here.
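The fix he describes, coercing the parsed request body into a data frame before calling predict(), might look roughly like this (the model object and column names are assumptions for illustration):

```r
#* Predict penguin species from body measurements
#* @post /predict
function(req) {
  # req$body holds the auto-parsed JSON; a single record arrives as a
  # named list, so coerce it into a one-row data frame for predict()
  new_data <- as.data.frame(req$body)
  predict(model, new_data, type = "prob")
}
```

With as.data.frame() in place, a body containing one penguin or many penguins both become data frames with one row per observation, which is what predict() expects.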
And what's great about this is that now we have exactly what we were describing originally. If I come back up here for just a moment and look at what my goal was: I want to enable an external user to give me some data, and then I want to return to them the likelihood that that particular set of data matches up with each of the three species that are potential outcomes. And that's exactly what I've done here.
Now, to recap, just to revisit this idea one more time: we have specified our own specification file, this YAML file that identifies all the details of how this user interface should be laid out. And that's what enables me to input the JSON data needed to generate a response from this endpoint. So all from within RStudio itself, I'm able to build my API, and I'm able to come over here and verify that the API is working the way that it needs to.
Okay, that's been a lot to swallow. I know that maybe we've gotten into some detail that we didn't necessarily need to get into, but I think it's useful to understand how some of these pieces work. Is it necessary to change the API specification file? No. Is it something that you have to do? No. You can operate just fine without it. Granted, that might mean you need to find some other way to make a request to your API to verify that your predict endpoint is working the way that you expect. But there are lots of tools that enable you to interact with APIs in a really nice, clean interface. Postman is one that I use regularly, and there are several others that allow you to generate requests and see responses. So do you need to make these modifications? No. But if you want to, you certainly can.
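If you skip the interactive UI entirely, you can still exercise the endpoint from R itself. A sketch using the httr package (the port, path, and field names are assumptions matching the demo):

```r
library(httr)

# POST one penguin's measurements to the locally running Plumber API
resp <- POST(
  "http://localhost:8000/predict",
  body = list(bill_length_mm = 41.3, bill_depth_mm = 18.5),
  encode = "json"   # serialize the body as JSON
)

content(resp)  # parsed prediction returned by the API
```

Tools like Postman are doing the same thing under the hood: building an HTTP request, sending it, and displaying the response.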
Customizing the UI and serializers
The default interface here is Swagger, and I see that up here in the top left corner: this is the Swagger interface. Swagger has been around for quite a while; the OpenAPI specification actually grew out of some of the effort that went into Swagger. But essentially, what we have now is that Swagger will interpret my specification and give me this interface. There are alternatives to Swagger that do the same thing, that create an interface for me to use, and with the latest release of Plumber it's entirely possible to use one of those other systems instead of Swagger itself.
So, for example, there's a viewer called RapiDoc. If I wanted to use RapiDoc, I could come in here; there's an R package called rapidoc, not released on CRAN, only on GitHub at this point, that I could install and load here. And then further down, I could say pr_set_docs("rapidoc"). We'll save these changes, and now if I come back and run my API again, you'll see that my interface looks different. I have the same functionality: I can open up this endpoint, try it out, and see the response; I can open up the other one, see the example data, try it out, and see the response. So my functionality remains the same, but maybe I have a preference for which particular UI I use when I'm experimenting with and working with my APIs. And you now have the flexibility in Plumber to define that.
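Switching the UI amounts to one extra call in the same @plumber block; a sketch, assuming the GitHub-only rapidoc package is already installed:

```r
library(rapidoc)  # provides the RapiDoc documentation assets

#* @plumber
function(pr) {
  pr %>%
    pr_set_docs("rapidoc")  # use RapiDoc instead of the default Swagger UI
}
```

The endpoints and their behavior are untouched; only the documentation front end that renders the OpenAPI specification changes.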
One more thing that I want to look at while we've got this open: we have this built-in example here that gives us the measurements for a single penguin, but I've also got this JSON data here that contains measurements for several penguins. And if we copy this in here and try it out, we'll see that our response now includes predicted output for all the penguins that I provided input for. So I now have this long JSON object that contains all the information for the penguins that I provided.
OK, let's look at one more thing. Let's come back into my Plumber file and say, OK, instead of JSON, I really want CSV data to come out of this. So we could say @serializer csv. This tells Plumber: look, when you return the output of this function, don't return it as JSON, return it as CSV data. And if I run this again and try it, bringing in all of our observations again, you can see that instead of a JSON object, I now have comma-separated values in my response. So I can adjust what specifically is returned to the client from within here.
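The serializer change is a one-line annotation on the endpoint; a sketch, with the model and inputs assumed as before:

```r
#* Predict penguin species, returning the result as CSV
#* @serializer csv
#* @post /predict
function(req) {
  new_data <- as.data.frame(req$body)
  # the returned data frame of probabilities is rendered as CSV,
  # not JSON, because of the @serializer tag above
  predict(model, new_data, type = "prob")
}
```

Only the response format changes; the handler body and the route stay exactly the same.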
New features in Plumber
OK, so just to recap and summarize what we looked at here: several new features have landed in the Plumber package with the latest release. The tidy interface, the ability to build your APIs by piping from one command to the next, is new. We barely touched on that, but it's something you can explore more if you're interested. The ability to automatically parse the incoming request body so that it's available for downstream execution is new; we saw that when we said, OK, the request body contains the parsed values that we're getting from the client. There's support for new serializers, and the way that you define serializers is greatly simplified. The OpenAPI specification we looked at, you can adjust in a couple of different ways, either by providing your own file or by modifying the existing list that R is working with. You can customize the UI: instead of using Swagger, which is the default, you can use RapiDoc, you can use ReDoc, and there's a handful of other custom UIs you can use. And last but not least, the Plumber hex logo has gotten a facelift. This was on the title slide, so you've probably already come across it, but there's a new Plumber logo that accompanies this update.
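The tidy, piped interface mentioned in the recap lets you build a router programmatically, without annotation comments at all; a minimal sketch:

```r
library(plumber)

# Build and run a router by piping from one call to the next
pr() %>%
  pr_get("/health-check", function() {
    list(status = "ok", time = Sys.time())
  }) %>%
  pr_run(port = 8000)
```

pr() creates an empty router, each pr_get()/pr_post() call attaches a handler to a path, and pr_run() starts the API, so the whole definition reads as one pipeline.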
Deploying Plumber APIs
OK, so now we've got this API. It's running, it's doing what we want. Mission accomplished, right? We're able to take in some data, generate predictions, and return that data back to the client. All is good, except for one thing: we are still the bottleneck. I'm just working off my laptop right now, so when I'm running this API, it's just running locally on my own machine. That works for development and testing purposes, but what happens when I want to say, look, Sarah in engineering is ready to make a request to this API? Well, I don't want my laptop to always be the one thing servicing those requests. I need some way to deploy this into an environment where it's always listening and available to clients that are trying to make requests.
Again, whether those clients are other parts of my organization, or maybe I've built an API that's generally publicly available and being used by who knows who. There are a couple of options for deploying Plumber APIs. RStudio Connect is a professional product developed and built by RStudio that allows for easy deployment of Plumber APIs; we'll look at an example of that in just a moment. You can wrap these APIs up inside a Docker container and then deploy them into an environment that's suitable for that type of deployment. And then there are some nice helper functions that used to exist in the plumber package and have now been moved to a new package called plumberDeploy, which allows you to easily deploy to DigitalOcean. That package may evolve to include other deployment targets as well.
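Based on the helper functions that moved into plumberDeploy, a DigitalOcean deployment might be sketched like this (droplet name, paths, and port are assumptions, and an authenticated DigitalOcean account is required):

```r
library(plumberDeploy)

# Provision a DigitalOcean droplet prepared to serve Plumber APIs
id <- do_provision(name = "penguin-api", example = FALSE)

# Push the local API directory to the droplet under the given path
do_deploy_api(
  droplet   = id,
  path      = "predict-penguins",
  localPath = "./api",
  port      = 8000
)
```

This is the same workflow the old plumber::do_* helpers offered before they were split out into their own package.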
So we've got a host of different options. I want to spend a couple of minutes talking about RStudio Connect as one potential way of deploying these Plumber APIs. RStudio Connect, like I mentioned, is a professional product that we create and develop here at RStudio. It allows for easy push-button deployment of things created in R and Python. It handles dependency management, so all of your packages and dependencies come along for the ride. It allows you to adjust how your API scales: how it responds to concurrent requests, how new processes are generated, things of that nature. It integrates with Git and GitHub so that you can automatically deploy from repositories. You can specify who has permission to access things like APIs and dashboards. And, like I said, in addition to Plumber, you can also publish R Markdown documents and reports, Shiny applications, Jupyter notebooks, Flask, Dash, and Streamlit applications, and other additional pieces of content.
And if you'd like to learn more, you can visit our website, rstudio.com/products/connect, which details what RStudio Connect is and provides a little additional context. But what I'd like to do now is take a look at RStudio Connect. So if I come here, this is RStudio Connect; let's come to the landing page. Here's what I see when I come into RStudio Connect. And let's go through the process of publishing this API that we built.
Now, I've set up this API that we built in RStudio to be publishable via Git. So if I come in here and say, OK, let's import this from a Git repository, and pull in the repository name. Here's the repository for that; I had to find it really quick. On GitHub, I've got a repository that I keep talks and things that I do in, and this is the Plumber Webinar 2020 repository. I click Next here, and this looks to see what branches are available; I've only got a single branch right now. Click Next, and this looks for deployable subdirectories. In this case, we only have one; it's looking for the manifest file that I previously created. Then we'll call this Plumber, click Deploy Content, and now this is deploying our content.
There we go. Let's open this up and see what it looks like. OK, so here we have our API running. I can open this up, and anybody can come in here and view it if they want to. Here's my health check; let's verify that it's working. Looks like it is; it gives me back the information that I expect. Let's check out our other endpoint, our predict endpoint. It looks like it's doing what we want it to do. There we go, we see our CSV output here, and all is well and good. Everything is working the way that I expect it to.
And just to illustrate this: one way to quickly and easily demonstrate how an external tool, framework, or language can make a request to this API is to open this API up here. There we go. Let's take a look at this

