How to develop and deploy a machine learning model with Posit

Transcript

This transcript was generated automatically and may contain errors.

Hey everybody, thank you so much for joining us today. It's nice to see you, even if it is just seeing your names pop up in the YouTube chat. But it was also fun to see how excited everybody was for this session and all the responses to my email letting people know about it.

So if you're wondering how you can make sure you're notified of events like this one too, you can subscribe to Posit event emails and I will include that in the YouTube details below.

Before we jump in today, I also wanted to use this opportunity to remind everybody that Posit Conf 2024 registration is now open, if you'd like to join us in Seattle in August. Julia and Isabel are actually leading one of the one-day, in-person workshops there, an intro to MLOps with vetiver.

I'm always so impressed with Julia's Tidy Tuesday YouTube videos. So I had reached out to her to see if she'd be interested in joining forces this month for our workflow demo. So I encourage you to also subscribe to Julia's channel and I'll put that in the YouTube details too.

If you have questions during the demo today, you can use the Slido link shown on the screen or ask questions in the YouTube chat. And right after the demo in about 30 minutes or so, we will jump over into a live Q&A room too. If you have specific questions about Posit Team, I welcome you to book time to chat with our team after as well. But thank you all so much for choosing to spend time with us today. I will turn it over to Julia and see you all in a bit.

Hi, my name is Julia Silge and I'm a data scientist and software engineer at Posit PBC. And today in this screencast, we are going to use tidymodels and vetiver to explore the machine learning lifecycle. You start with some data, you develop a model, and then you deploy that model.

This screencast is going to have an emphasis on using the Posit Team tools because it turns out that a lot of things that can be difficult about managing the whole lifecycle of a machine learning model can be addressed with the kind of tools you use. So we're going to be using Posit Workbench as a place for model development, Posit Connect as a place to publish and put our models into production, and then Posit Package Manager, which helps us manage reproducibility and package versions.

We're using a recent Tidy Tuesday dataset that is on educational attainment in the UK. Basically, towns in the UK, how do they differ in terms of educational attainment? How is that related to the size of the town, the income of the town, and other characteristics that we have? So let's get started.

So this can be a common failure mode in deploying models: you can lose reproducibility because your packages are not really locked down and the same in both places.

Notice that not only do I have these guarantees around reproducibility and using the right packages, because I'm in this unified environment where I have all these things together, but also, that was pretty fast. It is really deployed now. Here it is over here.

So I can do things like, let's say, uk.edu.rsa here. So I'm making a nice URL for it. Again, let me say that all users can get to it, but login is required. Let me save that. And then I can interact with this here. If I click this, it'll be in its little standalone window. And we've got all these API endpoints that were automatically created for me. We've got a ping endpoint that tells me whether the model is online or not, the URL, that same metadata that we had before. I can get it through the API in case that's a better fit.

The input data prototype is telling me what's the shape of the data that needs to go into the model. Notice that by default, we don't put possible values here, but that's actually something you can also control with an optional argument. Here's where the really big thing happens here. So this is the predict endpoint. These are all kind of supporting, and then this is the main one here.
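As a rough sketch of the endpoints just described, the snippet below builds the full URL for each one from a base address. It is written in Python rather than the R used on screen, the base URL is a placeholder, and the route names are an assumption based on the endpoints named in the demo.

```python
from urllib.parse import urljoin

# Hypothetical base URL for the deployed API; substitute your own content URL.
BASE_URL = "https://connect.example.com/uk-edu-model/"

# Route names assumed from the endpoints mentioned in the demo.
ENDPOINTS = ["ping", "metadata", "prototype", "predict"]


def endpoint_urls(base_url: str) -> dict[str, str]:
    """Return a mapping from endpoint name to its full URL."""
    # A trailing slash makes urljoin append to the path instead of
    # replacing its last segment.
    if not base_url.endswith("/"):
        base_url += "/"
    return {name: urljoin(base_url, name) for name in ENDPOINTS}


urls = endpoint_urls(BASE_URL)
for name, url in urls.items():
    print(f"{name:>10}: {url}")
```

The supporting endpoints (ping, metadata, prototype) are the ones a collaborator would typically check first, before sending data to predict.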

Okay, so let's say higher deprivation towns, small towns. Okay, so higher deprivation towns. And this one's capitalized, yes. Okay, small towns. Let's say coastal. Let's say university. And let's say northwest, like this. Okay, so if we try this, it's going to get me a prediction. So notice this is on the lower end here. If we are looking at a lower deprivation town, it's quite a bit on the higher end here.

Think of what we're looking at right now as visual documentation for your model. This is the kind of thing you can use when you're collaborating with a software engineer coworker who needs to know how to generate predictions from your model. Don't think of this as like a Shiny app, but do think of this as a way for you to be a good collaborator when it comes to the MLOps lifecycle, so that you can give your software engineer coworkers what they need to integrate predictions into the rest of your system.

So this is just a regular API. So depending on the level at which this is made available to people, you can access this from Python, from JavaScript. It is an API, but you also can access it from R. So let's talk about how you do that as kind of this last little thing.

So let me get the URL here. So I'm going to create a URL, which is just a string. So this is the main URL, and then let's say I want to get predictions from it. So I'm going to use the predict endpoint. I'm going to create an endpoint. Let me create a Vetiver endpoint object with the URL, and then we can look at what that looks like here. So this is a model API endpoint for prediction. So I have not called it yet, so let me do that right now.

So we predict using this endpoint, using the predict function, just like it was a model that was in memory. And so instead of putting the model here, we will say, let's say, slice_sample of the testing data and n equals 10, like this. If we had the model here, we could predict on it. So this is predicting with the model that's in memory in R right now that I have. I can get some answers here.

Now I can actually also predict on the endpoint, which is so convenient and nice. Now I think since I set this so that you have to be logged in, this that I'm about to do is going to fail because it's like, you're not logged in, I'm not going to give you access to this. So instead what I can do is I can use authentication here. So I'm going to create something called my connect authentication.

And I am going to use httr. I'm going to add headers that have my authentication here. So, if I remember right, this might take me a sec to do. I have an environment variable. I'm not going to show you what it is, but I have an environment variable in here for authentication in this .Renviron file. And I'm going to get it. It's called CONNECT_API_KEY. Like that. So I'm not going to print this out because I don't want to put my API key on YouTube. Talk about a security fail.
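The same pattern, reading the key from an environment variable and never printing it, can be sketched in Python. This is not the R code from the demo; the helper name `connect_auth_headers` is hypothetical, and the `Key <api-key>` header format is the one Posit Connect uses for API keys.

```python
import os


def connect_auth_headers(env_var: str = "CONNECT_API_KEY") -> dict[str, str]:
    """Build an Authorization header from an API key stored in the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # Posit Connect expects API keys in the form "Key <api-key>".
    return {"Authorization": f"Key {api_key}"}


# Only show which headers were built, never the key itself.
if os.environ.get("CONNECT_API_KEY"):
    print(sorted(connect_auth_headers()))
```

Keeping the key in the environment means the script itself can be shared or recorded without exposing the credential.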

Now with authentication, I'm going to predict on that endpoint, and it is calling the API. It's taking my data from R, converting it to JSON, sending it to the API, and then getting the predictions back. And so I have taken a different sample, so the values are different, but we are getting these predictions here as well.
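To make that round trip concrete, here is a minimal Python sketch that serializes rows to JSON and builds an authenticated POST request without sending it. The URL, the API key, and the column names are placeholders, not values from the demo, and the payload shape (a JSON array with one object per row) is an assumption about what the predict endpoint accepts.

```python
import json
import urllib.request

# Hypothetical predict URL; substitute your own deployed API address.
PREDICT_URL = "https://connect.example.com/uk-edu-model/predict"


def build_predict_request(url, rows, api_key):
    """Serialize rows to JSON and wrap them in an authenticated POST request."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Key {api_key}",
        },
    )


# Illustrative rows only; the column names are hypothetical, and real ones
# must match the model's input data prototype.
rows = [{"town_size": "Small Towns", "coastal": "Coastal"}]
req = build_predict_request(PREDICT_URL, rows, api_key="not-a-real-key")
print(req.get_method(), req.full_url)

# Sending the request would look like this (left commented out here):
# with urllib.request.urlopen(req) as resp:
#     predictions = json.load(resp)
```

Building the request separately from sending it makes the payload easy to inspect before any data leaves your machine.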

Wrapping up

So just to highlight again, I did model development in Posit Workbench. I published to Posit Connect, and if I go back to Content and look at what I have here, these are my things. I have two artifacts here: the binary model object together with its metadata, and the deployed API. And especially when it came to deployment, being able to use Posit Package Manager solved some of the reproducibility and speed challenges that we often get with maintaining correct package versions.

All right. We did it. We started with this data about educational attainment in the U.K. We spent a little bit of time exploring that data to understand it. Then we moved into model development. So these pieces of EDA and model development happened on Workbench. Using a Workbench that's easily connected to these next steps really sets us up for success. We then deployed the model. We were managing a couple of artifacts, including the model binary and the API that serves predictions using that model binary. We published those to Connect, where they were nicely authenticated to each other, so we had this smooth experience. And then throughout this process, we used Posit Package Manager, which lets us have confidence about exactly what packages are being used.

That gives us confidence about reproducibility and also makes installation really fast, which sometimes can be a struggle if you've ever spent a long time building a Docker container. Now, the steps that I showed you here use open source software. A great thing about this kind of approach is that you're not locked into one set of professional tools or one vendor, but as someone who uses our tools, I do think we solve a lot of problems that can be quite challenging. I hope this was helpful, and I'll see you next time.

Thank you so much, Julia. Now we're going to go jump over to our live Q&A, where Julia will join us in that Q&A room. YouTube should automatically push you over there, but if for any reason that doesn't happen, the link to the Q&A room will be in the YouTube details below, and we'll put it into the chat right now, too. We'll see you over there in just a second.
