
R-Ladies Rome (English) - Extending the data science workflow: {vetiver} and {pins}
In this video, Isabel Zimmerman goes through the fundamental aspects of machine learning operations (MLOps) tasks, bridging the gap between data analysis and model deployment. While data practitioners excel in data analysis and model development, there's often a significant gap in understanding tasks beyond the conventional data science workflow. You'll explore crucial MLOps concepts, such as deploying models as API endpoints and monitoring model decay, while leveraging the powerful capabilities of the vetiver and pins packages.

Material:
- Presentation: https://www.isabelizimm.me/talk-extending-ds-workflow-rladies/
- RStudioConf2022 talk: https://www.isabelizimm.me/talks/rstudioconf2022/
- Vetiver website: https://vetiver.rstudio.com/

0:00 Welcome & R-Ladies Rome Chapter Introduction
0:04:45 Slido Polls
0:10:15 Talk Intro
0:10:56 Isabel's Talk
0:47:53 Hands-on session
1:02:20 Q&A

Have a look at our website for more insights about our events: https://rladiesrome.quarto.pub/website/talks/
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Welcome everybody to this R-Ladies Rome event. We are thrilled to have Isabel Zimmerman with us tonight to talk about extending the data science workflow with vetiver and pins.
A bit of a disclaimer first, this talk is recorded and will be posted on our YouTube channel. So feel free to turn off your cameras if you do not want to be recorded. And I mean, a nice thing to keep in mind is that we prioritize creating a safe and inclusive space free from any form of harassment, fostering a respectful environment for everyone to learn and connect.
So we would like for you to keep in mind that all attendees are expected to adhere to our code of conduct. You can find it on our website, rladies.org. And you can of course use the chat to introduce yourselves and to ask questions, or just raise your hand.
So what can you expect from this talk tonight? Basically, whether you're a beginner or an experienced data practitioner, tonight, you can expect to learn about actionable strategies and tips for enhancing your data science capabilities within MLOps.
And yes, basically we would like to welcome everyone. This event is hosted by R-Ladies Rome. I am Silvana Costa. I'm from Uruguay. I work as a data scientist in tech, and I have a PhD in econometrics. And together with Federica Gazzelloni and Rafael Arribeiro Lucas, we are thrilled to have you all here tonight.
Hello everyone. My name is Federica Gazzelloni. I am one of the chapter organizers. I'm Italian, I'm from Rome. My background is in statistics and actuarial science. So excited to be part of this team.
Hello everyone. My name is Rafael, I'm from Brazil. I'm a researcher in cardiovascular disease. And I'm developing as a Data Scientist.
Maybe a little bit about what R-Ladies is. R-Ladies is a global organization with the mission of promoting the R language and empowering women at all user levels by building a collaborative global network. It is a gender-diversity-friendly community founded in 2012 by Gabriela de Queiroz in San Francisco. And R-Ladies is now a worldwide organization with more than 200 chapters in more than 60 countries, more than 4,000 events, and more than 90,000 members globally.
And R-Ladies Rome is a local chapter of R-Ladies Global. And our monthly meetings provide a platform to discuss current trends and hot topics in R. And we encourage active participation, of course, and engagement from all of our attendees.
Some of our past events have been in January, an introduction to Quarto with Torin White. In February, we had two events, building a chatbot with Shiny and R with James Wade. And the last one we had was debugging in R with Shannon Pileggi.
Well, so today in March, we have Isabel with us. And our next upcoming event is on the 16th of April, and this will be on geospatial data science and public health surveillance with Paula Moraga. Then in May, we will have an evening with Hadley Wickham. And for June, we are expecting a topic on rOpenSci with Yanina Bellini Saibene.
Introduction and welcome
As Federica said, now it's time to hand over the floor to Isabel Zimmerman. She's a software engineer at Posit, where she works primarily on tools for MLOps tasks. She also serves as editor-in-chief at pyOpenSci, where she helps facilitate open scientific software in the Python ecosystem.
So, without further ado, let's welcome her to talk to us about extending the data science workflow with vetiver and pins. And thank you very much, Isabel, for making the time to be here with us tonight.
Of course. Thank you, Federica and Silvana and Rafael for having me here.
So, thanks, everyone, for joining me on this Friday afternoon or evening or morning, depending on where you're at in the world. Um, I hope you're as excited to learn about vetiver and pins as I am to teach it today. Um, and vetiver and pins and this whole idea of extending the data science workflow is something that's super near and dear to my heart.
But while you guys are listening: there is a small code-along at the end. It's only going to be, like, 30 lines of R code, and a lot of those are empty lines. So if you're interested in joining the code-along at the end, these are the packages that you should install.
So, if you want to open up your RStudio IDE, or wherever you are writing R code, and have these packages installed, you should be ready to go.
About Isabel and the data science workflow
So, who am I? This is me. And I actually recently graduated with my master's degree. I did the whole, like, full-time work and full-time school thing while I was doing my master's degree. As I was working at Posit, you know, I was writing this vetiver framework and finishing my degree. And part of what kind of made me so excited about MLOps was because of what I had learned in my degree.
And when I was learning about data science, my degree was in data science, sort of, like, half computer science, half data science. Kind of one of those weird in-between things. But I learned about a data science workflow that looked kind of like this. You know, you start with some data. You collect it, whether that's from an API or from, like, CSVs or people are just handing you data. You know, you get this data. And then you get to use some tools to understand the data. And that's things like the tidyverse or data table.
If you are in the R world, I actually work in the Python space as well. So, you're going to see some Python references. So, I might use something like Pandas or NumPy or Siuba. If you guys are looking ever for a dplyr-like Python package, Siuba is amazing.
And so, you have this data. You've cleaned it. You've understood it. And then in this data science workflow, you train and evaluate some sort of model. That's using things like Caret or tidymodels in R or Keras or scikit-learn or PyTorch in Python. Some of those are in both languages.
And as I was learning, you know, all of these different data science things, I was learning about best practices in data science. And I will give you guys a warning. There is Python code on my slides. This is not to scare anyone off. I personally am a Python person primarily. The end will be in R. But I just want to show you guys that, like, these concepts are the same in both languages. And these frameworks actually exist in both languages. So, you're going to get a dose of both.
So, some best practices in data science are things like setting your seed for reproducibility. If you've worked in different data science worlds, this is kind of the first thing you learn to do to make sure, you know, when you load data, it's always the same. Just good things for this reproducible data science workflow. And you also learn about best practices like splitting your data into test and training data sets. So, you're not, you know, giving your model the answers while you're training it.
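To make those two practices concrete, here is a minimal base-R sketch of seeding the RNG and holding out a test set. The 0.75 split proportion and the mtcars data are just illustrative choices; in the tidymodels world, rsample's initial_split() does the same job.

```r
# Fix the random number generator so the split is reproducible:
# rerunning this script always yields the same train/test rows.
set.seed(123)

n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.75 * n))  # 75% of rows for training

train <- mtcars[train_idx, ]   # the model sees only these rows
test  <- mtcars[-train_idx, ]  # held out for honest evaluation
```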
You know, there's whole workbooks, whole textbooks on these best practices that we will not dive into today. But it's good to know, like, these are very well-known things. You know, making sure you're choosing the right feature engineering for your model. Making sure you're putting these things into a pipeline if it's feature engineering that you're training. These are things that span across languages that you kind of always have to keep in mind as you are doing data science.
The gap in the data science workflow
But my first job as a data scientist before I was building software, the problems I encountered as a data scientist were different. I knew all of these best practices, but I realized, you know, the issues I was running into is how do you share data with your teammates? Like, are you emailing everybody CSVs or Excel files? You know, how are you sharing models with each other? Is it on Git? Is it, again, emailing things around? And how do you get people who don't code as much to be able to engage with the things that you have created?
You know, oftentimes this is, you know, at the end you have a shiny application that, you know, your model is running in. And how do you put that model into a shiny application? There's a lot of different problems when you're putting this whole cycle of data science together that are just kind of beyond the best practices that I had learned in school. So I learned that these practices maybe weren't enough.
And I also learned that if you develop models, you can operationalize them. And so this is making sure that your models are in some sort of larger ecosystem. Maybe that's on Git. Maybe that's shared with your teammates in some way. Making sure that the model isn't just on your computer. So I learned that, you know, you want to bring your model outside your computer. And you want to learn kind of the best practices of how to do that as well.
What is MLOps?
And so those best practices for machine learning operations to operationalize your model are called MLOps. And so there's a lot of hype around this word that I find online. It's like, I don't even know what MLOps is. You can read all of these different, like, startups and you read their documentation and you don't really know what they're doing at the end of it. It's kind of stressful to get started. But here is a kind of one-liner. MLOps is a set of practices to deploy and maintain machine learning models in production reliably and efficiently.
So where we had those best practices like setting a seed or splitting your training and testing data to build a model, MLOps is a similar set of practices for a different stage. It's a set of practices to make sure that the model, when it leaves your computer, can still be reproducible and shared in a safe way with others.
In my first job, when I was starting to learn about MLOps and learn how to deploy models, I was struggling because there was a set of tools that were really based on Kubernetes. And it felt like I almost had to be like a cloud expert and then also like a DevOps expert and like maybe kind of a systems architect and also a data scientist. And that felt like way too much. These practices are hard. It gets models off your computer into some sort of larger system. And sometimes tools don't necessarily feel ergonomic for data scientists. It felt like these tools were not really made for people with a skill set that I had coming from a purely data science background.
And that is kind of how vetiver was created. I ended up changing career paths from doing data science to building data science tools and making these MLOps tools to feel like an extension of what people are already doing. So I do work for Posit, the company that was formerly known as RStudio. Vetiver is funded by Posit because I'm a developer for them. So it's no surprise that while I'm showing you Python code and while I primarily write in Python, there is also an R version of vetiver as well.
The code actually looks almost identical, and that was on purpose, especially realizing that people sometimes might have to work in different languages or collaborate with people who are using different languages. We wanted to make tools that feel super easy to go back and forth so it doesn't feel like you have to start from zero if you're moving between languages.
So part of the idea of vetiver is that there's this life cycle where you collect and understand and evaluate your model. But it wants to fill in the gap of how to get this model out into the real world, making predictions from real world data in a really safe, reproducible way.
So some of the tasks that vetiver helps with is to version your models, to deploy your models, and to monitor your models. And those are the three main tasks vetiver focuses on. There are many, many other best practices that MLOps encases, but we really wanted to narrow down what we thought was important for people who are just starting to deploy models, who are just trying to figure out what this world is and what it means for them.
Versioning models with pins
So the first step of this is versioning. And when people think of versioning, it's normally in the context of Git, where you have some sort of central repository of files that people can read and write to, like push and pull to, but people actually version a lot of things. And they normally version things pretty badly, actually.
I think of kind of how I was building models in my past, and I would version, you know, I would start with a model and save it probably as an RDS file or as a Joblib file. So I save my model, I name it model, or maybe something a little better. And then maybe you update some of your features and you get new data, and then you re-save it and it's model final. And then maybe you realize that it needs a little bit more tweaking, and so you have your model final final, and you guys see where this goes. You end up with all of these versions of one object, where maybe you need information from multiple of these files, but it's not really scalable and these aren't really reproducible.
Versioning really is the foundation for success in machine learning deployments. And also maybe not just for machine learning deployments, but a lot of reproducibility problems. A well-versioned system, a well-versioned file is normally going to be easier to share. It's going to be easier for you to share with yourself six months from now as well.
So when we think about our ideal versioning system, it would be nice if all of these files lived in a central location, so you don't have to search between different directories. Maybe if they were discoverable by a team, so something that's not just on your computer, but something that can be hosted in a cloud environment between people. And it's also really nice if things can be loaded right into memory, so you don't have to, like, go download the data, load it onto your local computer, open up RStudio, and load the data in there. That's a few extra steps.
And sometimes Git is difficult for people who don't have that skill set, or maybe that's not their primary job is to be working in these version control systems. So something that helps vetiver out is actually another package called pins. And we're going to do, like, a very small pins side quest. You don't actually have to install pins to use vetiver, but they play together really nicely, and they're kind of built alongside each other.
So pins is a package that publishes data, models, and other R objects, and it makes it easy to share them between projects and with your colleagues. Some good uses for pins are things like an ETL pipeline that will store a model or a summarized data set maybe once a day. Pins, though, is not meant for lots of writes going back and forth.
So what's actually super useful about pins is that it is meant to go across languages as well: if you pin something, it can be loaded in both R and Python. I collaborate with my teammates all the time. I'll pin up data in R, because it's a lot easier to clean data in R, and then I'll read that exact same data set into Python for machine learning, or something like that.
So it's good for pipelines where things are read and updated. Pins is not really meant to have multiple people writing data to it at once. You can have as many people as you want reading from it, but you can't have multiple writers. That is kind of a downside of pins. One thing that we see people try to do, and that I just kind of want to get ahead of, is don't try to make a Google form with a pin. These are just files. So if you're trying to have multiple people update the same file at once, it might get corrupted.
But if one person or, say, some sort of system is reading and writing this pin one at a time, or you want to share something for multiple people to ingest, pins is good. Please don't make a Google form of it. I can tell you right now, I've had some horror stories from that. But what works really well is people will have GitHub actions running where it will update a notebook or something like that. And pins is kind of a perfect solution for them.
And pins is built to be super easy to use. This is what it looks like in Python and R. It's essentially just model board equals board temp. So this is what you would do if you wanted to make a board that goes directly on your computer. Pins works off this mental model where, imagine you create kind of like a pin board. I kind of have one right here on my wall, where it's a collecting spot for a lot of different information. So you make a board and you pin different objects to it.
For this, it will be something that doesn't last on my computer, but it's good for demo purposes or if you're just trying stuff out. So this is the temporary board. If you want to change it to be a S3 board, it would just be board underscore S3 or board underscore Azure or whatever. So it's super easy to switch between boards or something like that.
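As a rough sketch of that mental model (the pin name "cars-data" and the mtcars data are just placeholders):

```r
library(pins)

# A temporary board: a collecting spot that lives only for this R session.
# Good for demos; swap in board_s3(), board_azure(), board_connect(), etc.
# to share with a team.
board <- board_temp()

# Pin an object to the board under a name...
pin_write(board, mtcars, "cars-data")

# ...and anyone with access to the board can read it straight into memory.
cars <- pin_read(board, "cars-data")
```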
Creating a vetiver model
And this is where vetiver comes in. So let's say we are in our data science workflow. We have created our model. It's called RF pipe. This is like a random forest pipeline that I created. And we're ready to deploy it. So we want to create this deployable model object called a vetiver model. And our inputs into this vetiver model are going to be the model itself that we've created. And then we're going to give it a name as well.
And the things that are useful about using this vetiver model object is that it holds on to a lot of information for you. Inside this vetiver model, you're going to have things like an input data prototype. So when you created your model, you, of course, fed it some input data. And vetiver actually can like slurp that up on its own. And it knows the data that it should expect as an input. So if you have five columns and you accidentally put four when you're making a prediction, vetiver will tell you, like, no, that's wrong. Your data should be five columns or if it has like a wrong date format or something like that. This becomes super useful when your model is deployed somewhere that's not on your computer.
It also will say things like what packages are needed to recreate this model and a little bit of metadata about the model itself. And then with your model board and then your vetiver model created, you can do a function that's just vetiver pin write, your model board, your vetiver model. And it will actually version your model automatically. So this will be wherever you want to place it. And your board will collect this information, collect this vetiver model.
And kind of there it is. It's meant to be only a few lines of code. There's a little bit of magic that happens under the surface, but it's supposed to be kind of like a natural extension of what people are already doing. If you wanted to see the metadata that it saves, you can see it has the title, a description, when the model was created, and a hash (a little more robust versioning system than a bunch of underscore-finals), as well as the required packages.
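Put together, the versioning step looks roughly like this in R. The lm() model and the name "cars-mpg" are stand-ins for the talk's random forest pipeline:

```r
library(pins)
library(vetiver)

# A small stand-in model for the random forest pipeline from the slides.
cars_lm <- lm(mpg ~ wt + cyl, data = mtcars)

# Wrap the model as a deployable vetiver model: the model plus a name.
# vetiver captures the input data prototype, required packages, and
# other metadata for you.
v <- vetiver_model(cars_lm, "cars-mpg")

# Write it to a versioned board; each write records a new hashed version
# instead of another "model_final_final" file.
board <- board_temp(versioned = TRUE)
vetiver_pin_write(board, v)

# List the versions recorded so far.
pin_versions(board, "cars-mpg")
```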
Model cards and documentation
There's other things that are beyond just creating these models in vetiver. MLOps is not just about creating models that are, you know, like in different locations. It's a lot about documenting and making sure you have good models that are well documented, that are reproducible and can really show what you've learned when you've created this model.
We believe that, you know, when you've created the model, this is the time where you've spent all of your research. And there's no better person to write information about this model than the person who created it. So there are a few templates that you can pull up in RStudio. One of them is a model card. Actually, when you write the model itself, you'll get a little feedback that the model card is a framework for transparent, responsible reporting and to use this Quarto framework as a place to start.
And when you run this code in Python or use the template in RStudio, it gives you something that you can point at your pin, and it will automatically document what it is able to document. And then it has fill-in-the-blank type responses for the things that a machine can't do, knowledge that only the model developer would have. So you can see it has the model details, where you'd have to add your name, but then it also fills in what it can detect, like that there are four features.
At the bottom, there are also places for you to put ethical considerations and caveats and recommendations. We believe that even if you don't think your model has any ethical considerations, you should at least put "none that we know of" rather than delete the section. If you don't have complete information, it's better to leave something showing that you've thought about it than to delete it altogether.
And so with all of that information, using pins, you can version your model. It's kind of the first step in this process of MLOps. So you think about saving your model into a location that has robust versioning. It has metadata. It's shareable. It's not just a file system where you keep overwriting your information.
Deploying models as API endpoints
So next, we'll go on to deploying your model. So you have your model kind of saved in a central location. And at this point, if people want to use it, they have to go to that location, download it, run it in their environment, and kind of hope it works. Sometimes things work better on somebody's computer. Sometimes it's difficult to reproduce exact environments. You have to make sure you have all the exact same versions of packages, versions of R.
So we think kind of the ideal scenario here is maybe your model is running somewhere that people can use it without even having to download it. And that is essentially what deploying your model means. So deploying your model, you're going to be moving your model here into someplace that is not your computer. It's super useful because others can use this model without having to load it. The way that we're going to create this is by using a REST API endpoint.
An API is an application programming interface. Essentially, it is a place where people can ingest your model's information. The really nice thing about this is that it works with JSON, not necessarily R or Python, to send requests to and from your model. So it is pretty language agnostic, as long as the input data information is the same.
So in Python and in R, it is pretty much one-ish line of code to create a local API endpoint. In Python, it's putting this vetiver model V into a vetiver API and then calling run on it. In R, it is also creating a vetiver API that you're putting a vetiver model into, and running it using Plumber, if you're familiar with Plumber for making API endpoints.
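In R, that local endpoint looks roughly like this. This is a sketch: the stand-in lm() model and the port number are arbitrary choices, and pr_run() blocks your session while the API is live, so run it interactively.

```r
library(plumber)
library(vetiver)

# A small stand-in model wrapped as a vetiver model.
v <- vetiver_model(lm(mpg ~ wt + cyl, data = mtcars), "cars-mpg")

# Build a Plumber router, attach vetiver's prediction endpoint to it,
# and serve it locally at http://127.0.0.1:8080.
pr() |>
  vetiver_api(v) |>
  pr_run(port = 8080)
```

Because the result is an ordinary Plumber router, anything you already know how to do with Plumber (extra endpoints, filters) still applies.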
So this is going to give you a local API running, but, of course, what we had wanted was a model that was not running on our computer. We want it to be shared somewhere else. So we have a one-liner for Posit Connect, which is like one of Posit's pro products, where if you give it the board and the pin name and the version, it will automatically deploy it as an API endpoint.
But there's also a way to do this with Dockerfiles. So if you're working for a company that is maybe using a different cloud platform, most of them have a bring-your-own-container ideology. As long as you have some sort of Dockerfile, the platform knows how to handle these Dockerfiles and Docker containers and will kind of automatically spin it up. So this prepare-Docker call in Python or R will create three files for you, and these three files are what you'll need to deploy your Docker image locally or on another cloud.
And that's the application file itself, so kind of a plumber.R file; the Dockerfile, to give the system information on how to build the Docker image; and a requirements.txt in Python or an renv.lock file in R. So the computer that you're running this new API on will know all the information it needs to download the right packages and make sure they're all the correct versions as well, to ensure reproducibility.
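A hedged sketch of that prepare-Docker call in R; the board and the "cars-mpg" name are placeholders, and in practice the board should be storage the container can reach (such as board_s3() or board_connect()), not a temporary board:

```r
library(pins)
library(vetiver)

# Pin a small stand-in model so there is something to package up.
board <- board_temp()
vetiver_pin_write(board, vetiver_model(lm(mpg ~ wt, data = mtcars), "cars-mpg"))

# Writes the three files into the current directory: a plumber.R app,
# a Dockerfile, and a lockfile recording package versions.
vetiver_prepare_docker(board, "cars-mpg")
```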
And with this image, this Docker image, this API endpoint up and running, you're able to just call predict. So predict with the data that you are running that your model expects. So that can actually make predictions from this model locally, even if the model itself is not running on your laptop.
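Calling the deployed model from R is then just predict() against the endpoint URL. The URL and the new data below are placeholders for wherever your API is actually running and whatever columns your model expects:

```r
library(vetiver)

# Point at a running vetiver API's /predict route.
endpoint <- vetiver_endpoint("http://127.0.0.1:8080/predict")

# predict() serializes this data frame to JSON, sends it to the API, and
# returns the predictions; the model never loads into this R session.
new_cars <- data.frame(wt = c(2.5, 3.2), cyl = c(4, 6))
predict(endpoint, new_cars)
```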
So that's what deploying a model looks like. I think kind of maybe the most famous deployed model right now would be something like ChatGPT, where you'll interact with it locally on your laptop. But the model, of course, is running somewhere else with all this information. But you don't have to be someone who knows how to run ChatGPT on your computer. All you have to do is interact with it, which I think is really showcasing the beauty of MLOps and what a deployed model should look like and kind of how that should feel.
Monitoring models over time
So the last piece of this is monitoring a model. So when you've created a model, you have your fresh data. It looks great. It's performing well. But that is at the exact point in time when the model is created. Oftentimes, data will change over time. I think of maybe my Spotify or whatever music listening platform I use. My music taste is different now than it was five years ago or 10 years ago. So models have to continually update, train. If I got the same suggestions for music now as I did earlier, I would probably not be on that platform anymore.
So the same concept applies here: models will often degrade over time. Not all the time, but often. And monitoring for MLOps might mean something different if you are a data scientist than if you are a systems admin. What the vetiver framework means by monitoring is the statistical output of the model, so it's tracking things like the accuracy or R-squared of a model. A systems admin or a DevOps person, when they think about monitoring a model, might be thinking about things like the CPU usage of a model or the run time. So if you're collaborating with other teams, these are things to keep in mind, so that you're clear in your expectations of what loaded terms like monitoring might mean.
In the vetiver framework, there are really three main functions to help people on their monitoring journey. The first is to compute the metrics: you give it the data, the column that has the date, the time period that you want to aggregate over (say, if you want to look at how your model has performed over every one week or one year), the metrics that you're trying to compute (things like mean absolute error or R-squared), and then the true value alongside the model output. You can also pin your metrics if you want to accumulate this information over time, say from a small script that checks as new information comes in. And finally, you can plot the metrics themselves.
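As a sketch, those three functions fit together like this; the data frame of dates, predictions, and true values is made up for illustration:

```r
library(vetiver)
library(yardstick)

# Hypothetical monitoring data: one row per prediction, with the date it
# was made, the model's estimate, and the eventually observed truth.
set.seed(1)
monitor_df <- data.frame(
  date  = as.Date("2024-01-01") + 0:59,
  preds = rnorm(60, mean = 10),
  truth = rnorm(60, mean = 10)
)

# 1. Compute metrics, aggregated by week.
metrics <- vetiver_compute_metrics(
  monitor_df,
  date_var   = date,
  period     = "week",
  truth      = truth,
  estimate   = preds,
  metric_set = metric_set(rmse, rsq, mae)
)

# 2. Optionally pin the metrics to a board to build history over time:
#    vetiver_pin_metrics(board, metrics, "cars-mpg-metrics")

# 3. Plot performance over time to spot decay.
vetiver_plot_metrics(metrics)
```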
There's also another, just like the model card, there's another template in RStudio to build a monitoring dashboard. If maybe your company or your team is someone who needs to monitor these models over time and wants a dashboard deployed to help look at that.
So monitoring is extra important because models don't fail loudly. So if you're thinking about building a Shiny app, when things go wrong in Shiny, you kind of get that, what is it, the skull or like the sad loading screen. Things are super broken. Things are not going well. And you get this loud error and you know things are not going well. Models don't work like that. Models will fail silently. And they can still run even if there's no error. Even if the accuracy is 0%, your model will still give you an output. This is something we also see with ChatGPT, where it'll hallucinate. You can ask it things, and it'll tell you the wrong thing very confidently. And that's where monitoring comes in, because if you aren't tracking how your model is performing, you are oblivious to whether your model is decaying.
So it's really important, whether it's in some sort of large system or something you're doing ad hoc on your own, to do some sort of model monitoring. And that will complete our cycle. You have your model versioned as you're creating different experiments, as you're updating your model. You can deploy your model. You can monitor your model. And we can think about this kind of over a long span of time.
So last year, I created a model, and I deployed it. I have my version locally. And maybe over the last year, I've been monitoring it, and I realize now I need to update my model and retrain it with the new data we've collected. Then I can create a new version of that model. I get to redeploy it and continue monitoring it so this cycle can go on and on.
So things that are interesting about vetiver that I think is something that we should be excited about is it's very composable. So internally with vetiver API and vetiver model, vetiver APIs are just Plumber APIs. So if you're someone who is a Plumber expert and wants to add in new endpoints, wants to add in, you know, whatever other infrastructure you want around this Plumber endpoint, you can add that into your vetiver API just as you'd expect. It's also something that is built to be kind of an extension of a data science workflow. So you don't have to change the types of models you're creating or anything like that. It is truly just something to add on, not to try to change what you're already doing. Finally, it works with the tools you like. This is kind of along the same lines of those ergonomics. It should be an extension of what is already happening in your data science workflow.
So here are some resources. But we have a lot of resources on the vetiver.rstudio.com website. If there are things that people are interested in learning about or want supported in vetiver, we're always taking requests. If you want to open an issue, links to the GitHub are on that site as well for both R or Python or even any feedback you have. We're very welcoming. We want anyone who has thoughts or opinions on vetiver to chime in. It's an open source project to serve everyone, to really serve this community.
Live code-along demo
So like I said, I primarily write Python. But today we're going to be doing R together. So I have my RStudio pulled up. I'm not a super fast typer, so hopefully we can all do this together at the same time. But if you're someone who would rather copy and paste and run code that way, I'll show you the file that I will be working off of.
So this is the file that we're going to be working off of. I'm going to be typing it slash copying and pasting it. The first part is we're all going to need to use this URL. So you will have to copy and paste this URL. So we're going to put on our data scientist hat. Everyone here is going to do a full data science workflow kind of end-to-end. And that always starts with, of course, the tidyverse.
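The setup described here can be sketched as follows. The dataset URL is an assumption: the description in the talk matches the 2021-03-02 TidyTuesday "youtube" dataset of Super Bowl ads, but substitute whatever URL is shared in the session's file.

```r
# Load the tidyverse and read in the Super Bowl ads data.
# NOTE: this URL is an assumption (the TidyTuesday "youtube" dataset
# matches the description in the talk); use the URL from the demo file.
library(tidyverse)

superbowl_ads <- read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-02/youtube.csv"
)
```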
So this data is about advertising at the Super Bowl, which is a very American thing. It's American football; it's like their World Cup. This is looking at all of the advertisements run during the Super Bowl, and seeing how many likes they got on YouTube as of whenever this dataset was created, I think 2021, and it covers earlier Super Bowls. It has things like: was the ad supposed to be funny? Did it show the product? Does it have a celebrity in it? Does it have an element of danger, or animals? So we're going to make a really small model based off of that.
And we're going to be using the tidymodels framework. So we're going to just select kind of these middle columns of like, was it funny? Did it show the product? Is it patriotic? Making sure we're selecting those and the like count. So our input is going to be these features of the advertisement. And we're going to try to guess what the like count will be.
Now, tidymodels is a little bit newer. But what I have really enjoyed about this framework is that it makes it really easy to swap in and out different types of models. So if we want to start with a random forest model, we use exactly those words: random forest. And we can put it into regression mode. So we're going to do just a little regression on whether or not these are good ads, I suppose. Or predicting, for a new advertisement, how many likes that one would get on YouTube.
So then we're going to give it the features. So we're going to predict like count using everything that we've selected, all of these variables here. So now we've created our model, almost. We're actually going to put this into a little workflow. So this is going to fit our model using this random forest and this formula. And now you have a model that's fit. And this is kind of the beginning stages of that data science workflow. This is the point where we think data scientists have gotten to.
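The modeling steps described above might look like this sketch, assuming the data sits in a data frame called `superbowl_ads` and the column names follow the TidyTuesday dataset (both assumptions):

```r
library(tidymodels)

# Keep the outcome (like_count) plus the ad-feature columns.
ads <- superbowl_ads %>%
  select(like_count, funny, show_product_quickly, patriotic,
         celebrity, danger, animals) %>%
  na.omit()

# A random forest in regression mode, bundled with the formula
# in a workflow, then fit to the data.
rf_spec <- rand_forest(mode = "regression")

ads_fit <- workflow(like_count ~ ., rf_spec) %>%
  fit(data = ads)
```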
And there's such great tools and, like, great support for, you know, loading your data, doing, I can't really call this a lot of feature engineering or exploration. But, you know, learning about your data, selecting the columns you want. There's really great tools for making models. But this is where we think maybe people at this point want a little bit of help. Or maybe there's less tools available.
So next we're going to load vetiver. And so the first thing that we are going to do, just like we had seen in the slides, is use this vetiver model object. We're going to put in our trained model. So that's that trained random forest model. And we're going to give it a name. And we can see here that this model has things like all of these blueprints of what of the predictors that the model expects. And we can see, like, oh, this model later on should have these features, show these predictors, all this information.
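Creating the vetiver model object can be sketched like this, assuming the fitted workflow from earlier is called `ads_fit` and picking `"superbowl_rf"` as the model name (both hypothetical names):

```r
library(vetiver)

# Wrap the trained workflow as a deployable vetiver model.
# Printing it shows the model type and the input data prototype,
# i.e. the predictors the model expects later on.
v <- vetiver_model(ads_fit, "superbowl_rf")
v
```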
All right. So we've created a vetiver model. And we're first going to pin it. So I'm going to use something that is just a local board for pins. But if you wanted to, you could look at the docs afterward and move this to GitHub; I think there's even Kaggle, if people still use Kaggle. Okay. So I'll pin this to my folder. So you can move this into a variety of locations. And we're going to set versioned equals true. So there are a few different ways to have boards. If you actually only ever want one model on the board at any given point in time, you don't have to version it. But we do for this one.
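A minimal local, versioned board might look like the following; the folder name here is arbitrary, and other backends (`board_connect()`, `board_s3()`, `board_url()`, and so on) follow the same pattern per the pins documentation:

```r
library(pins)

# A versioned board backed by a local folder. Set versioned = FALSE
# if you only ever want a single copy of the model on the board.
model_board <- board_folder("pins-r", versioned = TRUE)
```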
Then we're going to write our pin. And for that, we're going to give it the board and the vetiver model, which is v. And just like we had seen in the slides, we get this little pop-up saying, oh, we should create a model card. So if we want to create a model card, you can go to the R Markdown templates, and from a template you can see there's the model card, and here's also the model dashboard. So if you want some templates to get started, you can click OK.
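Writing the pin is then one call, assuming a board called `model_board` and a vetiver model `v` as sketched above:

```r
# Write the vetiver model to the board; the pin name comes from the
# name given in vetiver_model(). On first write, vetiver suggests
# creating a model card for documentation.
vetiver_pin_write(model_board, v)
```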
I'm not going to run this, but it is a parameterized report. So you can see here if you wanted to do your pins board folder with the pin we just wrote and the name of the pin, you could update this model card to be for the model we've just created. It uses these parameters throughout the model card itself. So if you get the model description and the prototype and all of those things.
So this isn't actually going to be too much, but I think it's important to see what you could do. If you want to read the pin back in, you'll use the board that it's saved on and then the name of the pin. So the name of the pin is our Super Bowl random forest. And we can see here we have a random forest regression model workflow with six features. If we read it in as v2, we can see v1 and v2 are going to be the exact same thing. So we've just saved it somewhere else and loaded it back in.
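Reading the model back is the mirror image, assuming the same hypothetical board and pin name:

```r
# Read the pinned model back in; v2 is the same model we wrote,
# just round-tripped through the board.
v2 <- vetiver_pin_read(model_board, "superbowl_rf")
```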
All right. So this is kind of the more exciting piece, where we're going to have things popping up. So, like I said, vetiver is built off of Plumber. So we load the plumber library and start a Plumber router, and then for our last bit, we're going to create a vetiver API holding v2. And we'll also run it. You can see, if you're working off that file, there's a port; you don't need to specify all of those things, it's mostly if you want to get different debug information. So you'll run it, and it might take a second. OK, there we go.
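Serving the model as an API is a short Plumber pipeline; the port here is arbitrary, and `v2` is the vetiver model read back from the board:

```r
library(plumber)

# Build a Plumber router, attach the vetiver prediction and metadata
# endpoints (including POST /predict and interactive docs), and run it.
pr() %>%
  vetiver_api(v2) %>%
  pr_run(port = 8080)
```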
This is an API that is running locally on your machine. This API endpoint comes with automatically generated documentation. Of course, we haven't really done anything too special or too exciting. And we get all of these different endpoints. So right now it's, of course, at http://127.0.0.1, but you can imagine it's on some other server. Without even downloading the model, you can see the metadata for the model: this is the version, these are the required packages you need to run the model locally. And if you wanted to, you could get the input data prototype. So if you want to see what the input data is, you're like, oh, this model needs things like funny, all these different columns.
But for the final piece, you can actually make predictions right from the model. So if we wanted to try this out, we could say: if there was an ad that was not funny, doesn't show the product, is not patriotic, has no celebrity, no danger, no animals, it'll maybe get eighteen hundred likes on YouTube. We can, of course, change these. So say funny is true, and you can see how that changes. Oh, if it's dangerous now, it has nine thousand likes on YouTube. So you can interact with this model here.
But if you are running these in other locations, you can just call predict. Of course, I've shut this down now, but you could run predict, put your data in here, and it would spit the predictions back out, just like when I was running it locally. Even if it was somewhere else, you would create an endpoint; let's say the URL was at xyz.com or whatever I just typed in. Then you could call predict on this endpoint with the data you have.
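Remote prediction can be sketched like this; the URL is illustrative, standing in for wherever the API is actually running, and the column names are assumptions matching the demo dataset:

```r
library(vetiver)
library(tibble)

# A new "ad" with the same columns the model was trained on.
new_ad <- tibble(
  funny = FALSE, show_product_quickly = FALSE, patriotic = FALSE,
  celebrity = FALSE, danger = FALSE, animals = FALSE
)

# Point at the running API's /predict route and predict over HTTP.
endpoint <- vetiver_endpoint("http://127.0.0.1:8080/predict")
predict(endpoint, new_ad)
```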
This is really the magic of, you know, the model is running somewhere else, but you're making predictions here. This is something that you could maybe put in a shiny application where the model is not living inside the application, but is deployed somewhere else. Maybe you're using the end point because it's a model that you want to update a little more quickly, and you don't want to change all the shiny code just to update the one model or something like that.
Wrap-up and Q&A
I think with that, I have gone through a small demo. You have learned all about MLOps: about versioning, about deploying models, about monitoring models. You were so patient and kind through my slide shuffling, so I appreciate everyone joining. Let me scroll to the chat. If you have questions, this would be a great time to ask, or I am online.
LinkedIn is just Isabel Zimmerman. This is my GitHub. I love to talk about this stuff, so please find me online. It would be super great to get any feedback that people have.
Thank you very much, Isabel. That was awesome. I so much appreciated the hands-on session; the little demo was key for me. Very useful.
Actually, there is a whole repository. If you're looking for other demos, let me share that with you. Here is a whole repository of demos if you want to look at different things you can do with endpoints. If you want to look at what happens if you break your model at an API endpoint, what do you see? There are lots of different things. These should all be in R. There's even one about how to run a Dockerfile locally and everything. So lots of places to explore.
How do you learn to build an API with Plumber? I think the Plumber documentation is pretty superb. I would recommend going there. And if you go through that demo, you have actually already built an API with Plumber. So I think it's especially about thinking through why you need an API endpoint. It's trying it out; it's learning it. The vetiver documentation or the Plumber documentation are really good places to go. It sounds super scary, but you realize it's not as scary as you think. And there are a lot of great demos and examples out there, even on those documentation sites, that I would run through.
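For comparison, a hand-written Plumber API is just an annotated R file; this is essentially the quick-start example from the Plumber documentation, saved as a hypothetical `plumber.R`:

```r
# plumber.R -- a minimal hand-written Plumber API.
# Each #* comment block turns the function below it into an endpoint.

#* Echo back the input
#* @param msg The message to echo
#* @get /echo
function(msg = "") {
  list(msg = paste0("The message is: '", msg, "'"))
}
```

You would run it with `plumber::pr("plumber.R") %>% plumber::pr_run(port = 8000)`, the same `pr()`/`pr_run()` pipeline vetiver builds on.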
And then just learn and let your creativity go wild. Like, what if I added another endpoint here? What if I wanted to get this prediction and then add 5, to scale my YouTube likes? What if I wanted to make all these predictions wrong? So just messing around and playing with it.
Thank you very much. You're very welcome. Thank you all for joining me this morning, afternoon, evening.

