Resources

Isabel Zimmerman - Practical MLOps for better models | PyData Global 2022

Machine learning operations (MLOps) is often synonymous with large and complex applications, but many MLOps practices help practitioners build better models, regardless of the size. This talk shares best practices for operationalizing a model and practical examples using the open-source MLOps framework vetiver to version, share, deploy, and monitor models. PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States.


Transcript#

This transcript was generated automatically and may contain errors.

Hello, everyone. I am Isabel Zimmerman. I work for a company called Posit. If you're wondering where you've heard that before, there was a talk from my colleague Hadley Wickham yesterday. He was one of the keynotes. We are all about R and Python and thinking about multilingual teams and how to make their lives easier. I specifically work with MLOps and I kind of started my career as a software engineer slash data scientist and I worked a lot deploying models with Kubernetes.

Of course, if you know anything about Kubernetes, it's a little frustrating, so I decided to take out some of my Kubernetes stress by teaching my dog silly tricks. The first trick was teaching him to sit. When I first taught him to sit, I'd call him to stand right in front of me, I'd tell him to sit, he'd figure it out, and I'd give him a treat. But then I started taking him on walks and bringing him out into the real world, and, you know, he's walking next to me on my side, and I would tell him to sit and he had no idea what I meant. And, of course, the data scientist in me is like, oh, I overfit my model. I overtrained my dog. Because I realized that I'd only ever trained him to sit in front of me, never off to the side.

And, you know, that's kind of a hard lesson for a data scientist to learn, but all I knew was I was training for the right outcome to sit. And in my cozy living room, it totally made sense and it worked really well. He knew my task and he would sit on command. But when we went out into the real world on our silly little walks, there was a new set of challenges and he behaved differently. I realized I needed to, you know, expand my tool set and expand my mindset for how to train my dog. But this is not a dog training conference.

The real world value of models oftentimes comes from integrating them into some larger ecosystem. So, my advice for you is to bring your model on a walk. You can learn to operationalize a model using practices called MLOps. And MLOps is a set of practices to deploy and maintain machine learning models in production reliably and efficiently. And these practices can be hard. Especially with the Kubernetes-based deployments I had started out with, I felt like with a lot of the tools I was using, I had to be kind of a cloud architect as well as a data scientist.

The real world value of models oftentimes comes from integrating them into some larger ecosystem.

And I don't think that data scientists should be oblivious to everything in the DevOps or MLOps world. But there's definitely a space for tools that help data scientists more effectively communicate with their IT or DevOps teams. And these tools can still feel ergonomic for data scientists. So, I actually changed career paths. And a package called vetiver was made. Vetiver is an MLOps tool for data scientists specifically to help them version, monitor, and deploy models in production in both R and Python.

The data science workflow gap

And we thought it was really important to build a tool like this because when you start learning about data science, you see an image that looks something like this. You collect data. You understand and clean your data. If you're using R, that's with tools like the tidyverse or data.table. If you're using Python, that's using tools like pandas or NumPy or siuba. And then you get to training and evaluating your model. Once again, in R you're using tools like tidymodels, or in Python, scikit-learn or PyTorch. And these different tools have best practices built in in a really ergonomic way.

If you're a data scientist, you write code that looks probably something like this. You're going to load in your data and, you know, select columns or whatever. But in almost every data science script, you have a line at the top that sets the random seed for reproducibility of your model. And this feels good. People know how to use it. They know it's a best practice of data science. When you get to training your model, you split it into test and training sets. So, you're not giving your model the answers to the questions you're about to ask it beforehand. And this is built into the tools we use. And it feels good and it's ergonomic and we know that this is a best practice.
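As a sketch of those two habits, assuming scikit-learn and pandas are available (the column names and numbers here are invented for illustration, not taken from the talk):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

np.random.seed(123)  # set the random seed so the whole run is reproducible

# Invented housing-style data standing in for whatever you loaded
df = pd.DataFrame({
    "sqft": np.random.randint(500, 4000, size=100),
    "beds": np.random.randint(1, 6, size=100),
    "year_built": np.random.randint(1900, 2022, size=100),
})
df["price"] = df["sqft"] * 150 + np.random.normal(0, 10_000, size=100)

# Hold out a test set so the model never sees the answers
# to the questions we're about to ask it
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="price"), df["price"], test_size=0.2, random_state=123
)
```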

And then you get into the actual, like, modeling part of your code. And you think about the right feature engineering for the job. And you think about how to train your model. And maybe putting these together in a pipeline to make sure that they're all, you know, organized and running in the right order. And this feels ergonomic and it's part of the tools you're using. But oftentimes, this is where you start. You know the data science code, but then you realize that working with a larger team, there's more to data science than training models.
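A minimal version of that pattern in scikit-learn, again on invented data, puts a feature engineering step and the model together in one Pipeline object so the steps always run in the right order:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

np.random.seed(123)
X = pd.DataFrame({
    "sqft": np.random.randint(500, 4000, size=100),
    "beds": np.random.randint(1, 6, size=100),
})
y = X["sqft"] * 150.0  # toy target for illustration only

# Feature engineering and the model live in one object, so the same
# preprocessing runs at training time and at predict time
rf_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=123)),
])
rf_pipe.fit(X, y)
preds = rf_pipe.predict(X)
```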

You have to think about how you're sharing these models with others in a way that's not emailing joblib files back and forth. You have to think about how maybe this needs to be integrated into a larger application: how are you going to get the model from your local environment to there? And how do you make sure that this model is still performing well a month from now, or even a week from now? And you realize that these ergonomic practices and the tools that you love to use aren't quite enough. And that's where machine learning operations comes in. The project I work on helps you version, deploy, and monitor models, specifically to help complete that circle.

MLOps components

So, what are some pieces used in MLOps? Most open source projects fall into one, or maybe multiple, of these pieces: orchestration, experiment tracking, versioning, serving, and monitoring. Things like orchestration are going to put your models in larger pipelines. Experiment tracking is going to make sure all the pieces that are going into the model are organized. Versioning, serving, and monitoring we'll go into a little bit deeper right now.

So, vetiver helps with versioning, serving, and monitoring. And when people think about versioning, a lot of times you're thinking about Git. But we actually version a lot of things, and mostly very badly. Think about making a model: you train it, and you save it as model. Doing a little bit more work with it, you have model final. And you feel pretty good. But maybe you get new data, or you want to try out a new algorithm and it works better, and then you realize there's actually another iteration. And you can see how this data science workflow of versioning locally might not be the right fit.

We can see that this model final, model final final naming doesn't scale for one model, let alone tens or hundreds of models. If we're thinking about what we want out of versioning, it would be really nice if these could all live in a central location that we can access, and maybe our teammates can too. And especially if they could load right into memory, and have a little bit more context, so I don't have to guess what model final really means.

Versioning with pins and vetiver

And there's actually a secret weapon that the project I work on uses. It's another open source library called pins, which is also available in Python and R, and it helps you organize and store your models. It does this by creating a board. The mental model for a board is that it's a place for your models: you can pin your models onto your board. So, it's creating a place for models to be stored. We can see here this is a temporary board, but this could also be board_s3 for Amazon S3, board_azure, board_gcs for Google Cloud Storage, or board_connect for Connect, which is Posit's homegrown pro product.

So, now that we have a space for our model to go, we can create a vetiver model object. This is a deployable model object, because there's a lot of information that you have at training time that you can store that's useful for your model later. Here, all we're going to put in is the random forest pipeline we made before, and we're going to give it a name, ads. From there, we can vetiver_pin_write to our temporary board: we're going to put our vetiver model, v, onto the temporary board.

And with that, we have some more context as well. Not only is this organized within the board for us automatically, we can see a description of the model. We can see the size of it, what kind of file it is, and some other required packages. And if we wanted to take this a step further, we could even give our model more context. So, we could save a tiny little piece of our training data to better debug when things go wrong in production.

If you think about it like a puzzle, it's a lot easier to make a puzzle when you know what the finished product is supposed to look like. And this is essentially giving that finished product to our model. And it's pretty easy to add this in. This is the same line of code as before, but we just have one more argument: ptype_data, for prototype data. And we're going to give it our X train.

Model cards

But we also want to think more holistically about tracking and versioning models. And there's the idea of something called a model card. Model cards are there to make sure you're not only making good statistical models, but good ethical models that you've thought really wholly about. Model cards are kind of like recipes. You know, you never know as much about your cake, like noticing that the batter is a little watery, as when you're making it. Model cards use that same mental model. They make sure that you have a designated place to write down all the information you know when training your model that you can look back at later. So, that's summary statistics, that's general documentation, as well as nods to fairness.

So, we saw this line of code earlier as well: vetiver_pin_write, putting our vetiver model v onto our model board. But there's actually a little informative message that pops up after. And you can read the informative message, or you can do what every software engineer does and just copy and paste the code, run it, and see what happens. That's vetiver.model_card(). And when you run vetiver.model_card(), a Quarto document is generated. Quarto is a flavor of Markdown used for documentation that you can actually run live code in. My slides were made in Quarto, as well as my blog. If you're interested in learning more, tomorrow there is a talk on Quarto by my colleague Tom Mock to explore the wonderful world of this new tool.

So, if you open up your Quarto card, you can see there's some information that's automatically generated for you. And that's things like, you know, it's a scikit-learn pipeline. It's a model using four features. Here's when the model was created. And if you scroll down, there's also some information on that input data prototype. So, this model has got four features looking at houses, as well as a quantitative analysis looking at things like mean absolute error, mean squared error, and so on. And if you scroll all the way to the end, there's also places for you to document ethical considerations that you have and caveats and recommendations that you've uncovered while training this model.

And, of course, if you don't really have information for either of these off the top of your head, it's important to keep these on the model card because imprecise or incomplete information is better than none at all. My dad had always told me growing up that if you haven't written it down, you haven't thought it out. And model cards are a great place for you to write out all of the things that you've been thinking about at training time for your model.

My dad had always told me growing up that if you haven't written it down, you haven't thought it out.

Deploying models

So, we know about the beginning of this life cycle, and we know what versioning a model is. So, what happens when we have to deploy it? Deployment means a lot of different things for a lot of different people, but the way that our team has defined it is bringing a model off a local laptop into some sort of other architecture. We do this by creating a REST API endpoint. It makes your software engineer friends happy because REST APIs are pretty robust and testable, and it makes your data scientists happy because there's a lot of great tooling to help you spin them up and maintain them.

To do this with vetiver, if you want to run an API locally, our vetiver model named V from earlier can be put into a vetiver API and you can run it. But, of course, the end goal of this is to not have a local API endpoint. So, if you were to want to deploy this onto our pro products, which is Connect, you could set up a Connect server and then just send it a model board, the name of your model, and then the specific version that you'd like to deploy. And it will do the rest of the work for you.

If you're looking to move this into maybe a Dockerfile, or anywhere that ingests Dockerfiles, such as AWS or Azure or Google Cloud Platform, you can do this in two steps. First is to write out an app.py file. As long as you give your board and pin name to this function, it will generate the app.py file for you and get you at least most of the way there. Then you can write out a Dockerfile, passing in the app.py file you just created. This Dockerfile is usable out of the box, but you can also edit it to customize your own deployment. And that's what deployment looks like using vetiver. These helper functions are really made to make it feel accessible to get this model off your laptop, or at least to have a Dockerfile in hand to pass off to somebody else.

Monitoring models

Now, once your model is off your laptop, a data scientist's work is not done. And monitoring means something unique in this context. We're not necessarily monitoring things like CPU usage or runtime. Here we're specifically looking at the statistical properties of the input data or predictions. And vetiver helps with this through a few helper functions. I won't go too deep into them right now, but essentially they will help you compute metrics over a rolling window timeframe that you specify; here it'll be one week. They'll help you pin your metrics, especially if you get into that awkward situation where some of the dates overlap: vetiver's pin metrics will sort that out for you, overwriting with the newest data if necessary. And then finally, vetiver also helps you plot your metrics, in the same format that comes out of vetiver's compute metrics, to get kind of an out-of-the-box quick peek into how your model is performing.

And this is really important to track. If you are not monitoring your model, you are oblivious to model decay. And of course, that makes sense. You need some data to make sure you're doing the right thing. But this is especially important because models break quietly. They'll continue to run and really just proudly give you very wrong answers. Even if your accuracy is like zero percent. Whereas on the other side, you know, applications will often give you big red Xs. Models will continue to give you bad answers. So, if you're not monitoring your model, you are oblivious to this decay.

Models break quietly. They'll continue to run and really just proudly give you very wrong answers. Even if your accuracy is like zero percent.

And that has completed our cycle in a very fast way. We have gone over versioning a model, deploying a model, and monitoring a model. But if we think about vetiver as a whole, why should I be excited about vetiver? Well, why am I excited about vetiver?

I think the first piece of this is composability. I've shown you some strong building blocks, vetiver API and vetiver model, that are pretty simple to use right out of the box. But they're also composable within themselves, to add new endpoints to your API or to make more complex or custom models. So, not only is it composable internally with itself, it's also composable externally with the larger ecosystem. And this is because vetiver is built on really well-tested tools from the community. It's built on things like FastAPI and Pydantic. And there's such a community around these different tools that you can leverage all of the fun and amazing other projects that people have created.

The other reason to be excited about vetiver is the ergonomics. It feels good to use. It's pretty lightweight. And it works with the tools that you like to use. It's supposed to feel like a really natural extension of the data science workflow that you're already using. So, overall, vetiver helps you version, deploy, and monitor models in a composable and ergonomic way. Thank you all for joining me here today. It has been my pleasure to present for you all.

Q&A

Thank you, Isabel, for your presentation. At this point, if anybody has any comments, please put them into your YouTube comment section and we will bring them up here.

In the meantime, I just wanted to say I think that vetiver is definitely filling a very important gap in the current data science ecosystem. I feel like this year we've seen a lot more talks compared to last year and the year before on sort of how to do the monitoring, whether it is maybe doing better testing to make sure that the model doesn't decay or whether it is a full solution like what you guys have here. So, thank you very much.

Yeah, it is my pleasure. I think we've taken a lot of knowledge from the Python and R ecosystems. We've talked to a lot of data scientists and tried to understand what they're really looking for and what they want from an MLOps solution. And a lot of that was: I like my workflow the way it is, can I just have something that adds on to it? There are a lot of different solutions where maybe you have to declare something really early on and change your workflow. So, this felt like the right way to go for us to help serve our customers and the needs that we had heard.

So, we have a couple questions. Let's start with the first one. Seems like vetiver is built on top of many dependencies. Isn't it hard to manage them all? Okay, I'll try to not get too deep into the weeds on this one. This was a really fun task for my team and me, because it's hard. You know, if you're deploying a scikit-learn model, you don't want to have to download a framework that has PyTorch installed and statsmodels installed and all of these other things. So, how do you make sure you don't bloat your own project but also serve so many different people?

And we're able to do that with something called single dispatch. So, we can actually break things down so that you only install what you need. The core dependencies that we do have to manage are things like FastAPI and Pydantic, which are manageable and pretty lightweight. Where things really do get difficult is when you're trying to manage multiple machine learning model platforms at once, so we're able to break those apart. If you ever want to chat about how weird and funky this is, I would love to hop on Discord and chat with you about that. It was so cool.

Cool. Thank you. And just as a reminder, there is on Discord a channel called Talks-Discussions, which Isabel and other speakers will be monitoring.

A question from Patrick. It looks like Vetiver is a decentralized version of some of these other platforms that require spinning up an on-prem or a cloud server to track this stuff. Is that true? So, let me make sure I answer this correctly. To have something that is shareable with your larger team, a lot of times the right tool is to be on-prem or in a cloud server that's using Docker files. So, Vetiver is able to bring your model onto those different places as well. I think one of the exciting things that we like to play around with with Vetiver is it makes it really easy to spin things up locally. So, that might be where you're feeling this decentralized bit. And that's really important because a lot of times when you're testing or development versions are very different from what your deployed versions feel like, things break very quickly. So, being able to quickly, rapidly prototype a small API and scale it safely is something that Vetiver focuses on a lot. So, yes, it can be easy to put up locally in a really decentralized way. But it also makes it easy to ship it to an on-prem or a cloud server instance.

And a question from Neil. How does Vetiver compare with MLflow? When would you prefer one over the other? Yes, this is a great question. So, there are a lot of MLOps tools out there, and MLflow is one that we've looked at a lot. What they've done really well is they have a lot of investment in experiment tracking. So, if you are looking to track all of the hyperparameters that you've used when training your model, like every single time you've trained your model, that's something that MLflow does very, very well. So, if you're looking for experiment tracking, I think that's a good place to go. If you're looking for making APIs quickly and safely, I think that's where Vetiver shines, kind of for the reasons I just mentioned. Because it is like three lines of code to spin up a very simple model within an API. So, I'd say if you're looking for experiment tracking, MLflow is a great place to start. If you're looking for deployment of APIs, Vetiver might be your better option.