
Isabel Zimmerman - Holistic MLOps for better science | PyData NYC 2022
www.pydata.org Machine learning operations (MLOps) are often synonymous with large and complex applications, but many MLOps practices help practitioners build better models, regardless of the size. This talk shares best practices for operationalizing a model and practical examples using the open-source MLOps framework vetiver to version, share, deploy, and monitor models. PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
Transcript
This transcript was generated automatically and may contain errors.
Hello, everyone. I am Isabel Zimmerman. If you're in this room, you're probably looking for a talk called Holistic MLOps for better science, hopefully. So I work for a company called Posit. You might be familiar with us as RStudio; last week, I think, we were RStudio. We are now Posit, really embracing the extension of all the beautiful things we've done for the R ecosystem into Python.
But a little bit more about me. This is me. This is my dog, Toast. I am a full-time open source software engineer. I write Python packages. Hello. But I'm also a grad student currently. So I get to, you know, spend my small amount of free time playing games like Mario Kart. If you're not quite familiar with Mario Kart, it's pretty low stress, in theory, unless you have friends like mine. And you get to drive little go-karts around. There's a famous track called Rainbow Road. So this is me. This is Toast. We're playing Mario Kart together.
But maybe more importantly for this talk, what is MLOps? MLOps is a set of practices to deploy and maintain machine learning models in production reliably and efficiently. And these practices can be hard. When I started, I was at a company where I was deploying machine learning models on Kubernetes systems, which is really like the worst of both worlds: you're trying to deploy model systems and you're working with Kubernetes. And I felt like I had data science skills, but the tools I was using weren't quite built for people like me. They were built for maybe a cloud architect. And I didn't think that I should have to have all the knowledge of a cloud architect or a systems engineer to be able to at least get my models to a point where a DevOps team could easily deploy them.
Real quick, has anybody in here like deployed a model before by a show of hands? Okay. And keep your hands up if that was just a delightful experience. Like best moment of your day. Okay. So maybe my hypothesis was correct. I ended up moving career paths to build a package called vetiver that is built for data scientists to help you guys and me, selfishly, deploy models a little bit easier. And if I think about, you know, who's making models, it's a lot of times people writing R code and people writing Python code. This is actually a package in both Python and R, so you can download it from CRAN or from PyPI.
The data science lifecycle
But when I was learning about data science, I learned about a data science life cycle that looked kind of like this. So you start by collecting data. You understand and clean the data using tools like the tidyverse or data.table if you're in R. Or if you're in Python, it's things like pandas, NumPy, siuba. From there, you're going to train and evaluate your model. Once again, in R, you're going to be using tools like tidymodels and caret. In Python, it's things like scikit-learn or PyTorch. And when you're learning about these things, you learn about all of the best practices for making machine learning models and all the best practices for doing your data analysis work.
And these tools have done such a fantastic job of having these best practices kind of baked in that you might not even think about it. So if we look at data science code, some of this is more important than others for this talk. But this line four appears in most places if you are making a model, and that's setting that random state or setting your seed. And this is to ensure reproducibility. And it's something that we've all kind of accepted. We know this is a best practice. And it's in code we see all the time.
This is a little data set looking at predicting like counts on YouTube ads: whether the ad is funny, whether it shows a product. The data looks something like this. And when we get to modeling this data, before we make our model, we know that we're going to split it into a training and a test set. We're going to make sure we're not giving our model the answers to the questions before we're training it. This is built into the code. This is a best practice. We know about this.
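The seeded train/test split described above can be sketched like this; the tiny ads data frame is a made-up stand-in for the talk's dataset, not the real thing:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the YouTube-ads data from the talk
ads = pd.DataFrame({
    "funny": [1, 0, 1, 1, 0, 0, 1, 0],
    "show_product": [0, 1, 1, 0, 1, 0, 0, 1],
    "likes": [120, 45, 200, 98, 30, 15, 170, 60],
})

X, y = ads.drop(columns="likes"), ads["likes"]

# Setting random_state makes the split reproducible: the same call
# always yields the same rows in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=500
)
```

Rerunning the split with the same `random_state` gives byte-for-byte identical partitions, which is exactly the reproducibility point the slide makes.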
And I'll show you one more. You know that you need to choose the right feature engineering for your job. So we need to like do an ordinal encoder here and use our random forest regressor. You're going to put this in a scikit-learn pipeline to kind of package up everything that's being fitted all at once. And we know things like this. We are data scientists. This is best practice.
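A minimal sketch of that kind of pipeline, assuming scikit-learn and a hypothetical toy data frame (the talk's actual feature set isn't reproduced here):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data standing in for the talk's example
X = pd.DataFrame({"ad_type": ["funny", "serious", "funny", "product"],
                  "length_s": [15, 30, 15, 60]})
y = [200, 45, 170, 60]

# Encode the categorical column, pass the numeric one through, then fit
# a random forest. Packaging it all in one Pipeline means the feature
# engineering travels with the fitted model.
pipe = Pipeline([
    ("prep", ColumnTransformer(
        [("ord", OrdinalEncoder(), ["ad_type"])],
        remainder="passthrough")),
    ("rf", RandomForestRegressor(random_state=500)),
])
pipe.fit(X, y)
```

Because the encoder is inside the pipeline, `pipe.predict(new_data)` applies the same encoding automatically; there is no separate preprocessing step to forget at prediction time.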
But then you get to your first job and you realize there's more to life than setting random seeds and splitting things into training and test sets. There's questions like, okay, so I trained my model. How am I going to hand it over to my teammates? Like, am I going to email them a joblib file? Hopefully not. Maybe you'll use GitHub. But then what about what happens when you have to put this inside of some sort of application? Are you going to copy and paste this code into your application? Again, hopefully not. And what happens when you need to make sure your model is continuing to perform well a month, a year, five years from now? The world is broader than the scope that at least I first learned in school.
And this is really important because if you develop models, you should probably operationalize them. And this means a lot of different things to a lot of different people. But in general, I think we can kind of agree that the business value of models usually occurs when it's outside your local environment. So, there's more to this cycle. And this is what vetiver aims to help people with, especially people who are learning to deploy and version and monitor a model.
Versioning models with pins
Okay. So, that was enough data science. We're back to Mario Kart now. When you log into Mario Kart or open up your little game, you realize there are different game modes. If you want to just learn the handling and drive around at a leisurely pace, you can go at 50cc. If you want to up the ante, you kind of got it figured out, you can go 100cc. If you know all of the game, like, super ready for this, blazing fast, drifting around corners, you're going 150cc. Super fun. But you enjoy it at every level, and you get all the good Mario Kart stuff at every level. And we're going to think about model ops in the same way. Some things are simpler, a little bit slower; that's your 50cc model ops. And we'll speed up as we go along.
So, we'll start with versioning. When people are thinking about versioning, normally it's kind of in the context of Git. And we actually version a lot of things very often, and mostly very badly. This is possibly a familiar scenario. You build your model and you have it saved somewhere. And it's named model, because we're creative. And then you do some more training. You get some new data. And you have your final model. And then you have, I think you guys know where this is going, a few more iterations of what this model actually ends up being. And we can see this barely works for one model, it doesn't scale beyond one model, and it lacks context between each iteration.
Versioning is helpful because it helps your models live in a central location. It helps them be discoverable by your team. Because you don't want to go, like, play hide and go seek with your models. And in a perfect world, it would be awesome if these could also load right into memory. So, we're not trying to go somewhere, download it, joblib.load it, open our model, and then go. If we could just have one line and our model is in our Jupyter notebook and we're ready to rock and roll, that would be perfect. And this is where we look at our pins package. This is also something that was developed by our team. It's also something that's available in both R and Python. And it helps you with these demands, our list of demands we have.
So, what does it look like to use pins? The core piece of pins is the idea of a model board. So, we can think of this as a place for your model to be stored. It doesn't actually just have to be a model. This can be an Arrow file. It could be JSON. It could be a CSV. And there are other data types if you're in R. And it actually can cross between languages if it's a compatible type, which is also super cool. But anyway, it's a place for things to be stored. And here we have a temporary board. But it could also be on S3 or on Azure; it would just be, like, board_gcs, or Posit's product called Connect.
So, there's a spot for your model to be stored. And what about the model part? That's going to go into something called a VetiverModel. When you first train your model, there's actually a lot of information in there that could be leveraged later to give you a more robust deployment. So, you're going to put in your pipeline that you've trained earlier, and we're going to give it a name called ads. And that's it. You can vetiver_pin_write your model to your model board. And it's versioned.
And I did promise that this looks the same in Python and R. So, on the R side, we're making our board, making our vetiver model, and writing our model to our board. On the Python side, we are making our board, making our vetiver model, and writing our vetiver model to our board. This is super useful for teams that are bilingual in, like, the R-Python sense. So, there's less, you know, cognitive load when you have to context switch between languages.
So, that's at 50cc. We have our model versioned. But what if we want to up the ante just a little bit? It's super helpful later on down the road if you know what your input data should look like. This is just saving a little piece of data to better debug later when things go wrong. It allows you to have better error messages. You can kind of peek into your vetiver model and realize, like, oh, I have extra columns here or whatever. Like, you wouldn't really make a puzzle if you don't have the image of what the puzzle looks like. That sounds like madness. And very difficult to piece together later. This is the same concept. And this is the exact same code we were looking at earlier. We have one extra argument, and that is ptype_data, or prototype data. And we're going to feed in a little bit of the X training data. It's going to be like a zero-row data frame that's going to be translated into a pydantic BaseModel, if you know what those are. And then your deployment later understands what the data should look like when it's coming into your model. Which is very useful because sometimes the real world doesn't look the same as what exists in your training set.
Model cards and ethical documentation
So, then we have our beautifully versioned model. We have some P type data saved. And then we want to think maybe a little bit more holistically about who this model is impacting. We want to make not only good models statistically, but good models ethically. Also, good documentation. Model cards were created by a team at Google. It's kind of like writing down a recipe for your model. But also giving a lot of other context. You will never know as much about something, especially in the modeling world, as when you're working on it at that moment. You can think that you're going to remember all of these silly little intricacies that you thought would be common sense. But I promise you, I do this on tests as a grad student. I'm like, oh, that makes perfect sense. I'll never forget that. And then I realize I do. Model cards give you an explicit place to write everything down that you've been thinking about.
So, we've seen this vetiver_pin_write a few times. But what I sneakily have been hiding from you all is that this gives you a little pop-up message that says model cards provide transparent, responsible reporting; use the vetiver Quarto template as a place to start. And of course, like anyone else, when you get any information, you just copy and paste the code and you run it. And this will give you a Quarto document. Quarto is an open source framework for technical publishing. My slides are written in Quarto. If you were at a talk yesterday by Daniel Chen, he also gave a talk about Quarto. It is the coolest thing ever. I could go on a whole tangent about how much I love this.
So, this creates a template. There's a little bit of parameters at the top where you can write in your pin information. And then it'll generate a document that looks something like this. And anything we can automate for you is automated. So, things like: it's a scikit-learn pipeline, it's using four features. If you have a version, it'll say, like, version X was created at this time. And you can add some information about you and your team. And if you scroll down, you'll be able to see a printout of your ptype. You know, it's looking at different houses: the type, square footage, beds, baths. And some quantitative analysis about how your model is performing. This is all actually just code, even though it looks as beautiful as it does.
You can add any custom plots you want or any custom information. And if you scroll all the way down to the bottom, this is where it gets kind of interesting, especially for me when I want to think about model fairness and how my model is affecting people. And you might think, like, oh, my model does not have any ethical challenges, it's predicting YouTube likes, okay, I'm not going to write anything down here. And for that, I would say there are kind of two things to think about. One, maybe think about asking the people your model is affecting. Maybe they don't have the same answer as you do from the developer side. And two, even if you've done your due diligence and you've asked everyone you should, I would just kind of leave this blank. Like, don't delete it. Because any incomplete information is better than none at all.
My dad has a good quote that I like to give to everybody else when they're thinking about model cards. He said, if you haven't written it down, you haven't thought it out. So even though it feels a little bit slower, and it's like, oh, what's this girl talking about for, like, five minutes about writing documentation? This is important stuff, too. So I think this is really important when you're thinking about holistic MLOps: what's getting deployed? What are the impacts on people?
Deploying models
All right. We have our little heart. We have versioned our model. And now it's time to think about moving our model out into the real world. And what is deployment? People have defined this in many different ways, but the way I think about it is: any time it's not on your laptop, it is deployed. In vetiver, we mostly do deployments as API endpoints. It's useful because you can still communicate with your model almost as if it was in memory. vetiver has some helper functions, so you can just do, like, vetiver.predict, give it the endpoint, give it your data, and it will do the JSON-to-endpoint, back-to-JSON, back-to-data-frame handoff for you. So it feels like your model is right there. It's also useful, and will make all your software engineering friends very happy, because APIs are testable. So it's quite robust.
So our model should go somewhere outside of our laptop. You can also test these locally to make sure they're working as you expect, and that's by just creating a VetiverAPI. You put your vetiver model inside of it, and my_api.run() will get you a local instance. Of course, we don't want a local instance; that's the whole point of deployment. So if we're trying to move this somewhere else, there is a one-liner if you're moving it to Posit Connect. You give it the Connect server, give it the model board, the name, and the version, if you have a version. If you don't give a version, it'll just find the latest one. Maybe less recommended; you probably want a robust version in place. But you can mess around and find out.
And if you're not using Connect, there are other ways to move your model around. It's kind of a two-step process. One, you want to write an app.py file, and vetiver.write_app will help you out with this. It'll make a super small generated script for you where it's essentially creating a vetiver model with the board, pin name, and version you're looking for, and then setting up the API. And actually, a lot of cloud services right now only need the app.py file. But other places are interested in maybe a Dockerfile, and vetiver.write_docker will write that Dockerfile. It'll get you most of the way there for most deployments. It'll read in this app.py file that was generated, and it'll peek around for requirements.txt files. And you have these things in hand to either deploy yourself. If you are on AWS, I think you can upload this directly into ECR and ECS, and it kind of does some Docker magic. And other places have a bring-your-own-Dockerfile mentality as well.
So, this is deployment. But we're going to be a little bit more sophisticated and think about where everything is living now. We have our model somewhere. We have our REST API somewhere. And we have our local laptop. And in a perfect world, our Docker container is as small and skinny as possible; it makes it faster, makes it cheaper. And our model that we want to iterate on, that we want to store all these different versions of, is going to live somewhere else. You might think it would be nice to save this in your Dockerfile, but that's not quite the case. It's going to get very bloated, especially because you're not versioning one model; you're probably versioning lots.
So, let's think about how this is going to happen. First, when your Docker container spins up, it's going to use that app.py file to load the vetiver model and start up the API. This is one of those holistic best practices that vetiver kind of bakes in for you: if you use those two lines from before, most of this is already happening. Unless you're trying really hard to stuff a local pins board of models inside the Docker container, you won't end up with a very large container; you'd have to try to make that happen. So, the Docker container is going to peek into your model board, load the model up, and then you can communicate with this Docker container just like any other API. You can post to it. You can interact with it. And it feels great.
Monitoring deployed models
But then, you know, you might have to do some analysis on your model. You might do some monitoring. You might have some weird instance or just any other ad hoc information that you need to get from your model. And now you don't have to peek into that Docker container anymore. You can just load it right from your pins board right into memory and use it as expected. You can do all your analysis on your model. And that kind of completes our cycle here.
Our model is versioned with pins. It's deployed: it's either running in Connect with a one-liner, or you've made a Dockerfile to bring it to some other public cloud. And now it's time to monitor. Because once a model is deployed, a data scientist's work is not done.
I do have to say here that monitoring in this sense is going to be a little bit different than maybe you're used to. We're not particularly interested, in this package, in looking at, like, CPU usage or runtime. Here we're looking at statistical methods. So, like, RMSE, MAE: is your model performing as well as you thought it was, or expect it to?
And vetiver has some helper functions to help you compute, pin, and plot metrics. I'm not going to go, like, too in depth on these, but just know they exist. They help you do things like store your metrics data, and they handle that awkward case where, like, I have a few days that overlap; you can choose whether it overwrites on the last export or not. And there's a one-liner that helps you plot the metrics using Plotly, to get all that lovely interactivity.
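The idea behind those helpers, slicing predictions by time period and applying standard regression metrics, can be sketched with pandas and scikit-learn alone; this is a simplified stand-in with made-up prediction logs, not vetiver's actual implementation:

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical prediction log: one row per scored observation
log = pd.DataFrame({
    "date": pd.to_datetime(["2022-11-01"] * 3 + ["2022-11-02"] * 3),
    "truth": [100, 150, 90, 80, 130, 95],
    "estimate": [110, 140, 100, 60, 150, 70],
})

def daily_metrics(df):
    # Group by day and compute each metric per slice, which is
    # roughly the shape of result the vetiver helpers produce.
    rows = []
    for day, grp in df.groupby("date"):
        rows.append({
            "date": day,
            "mae": mean_absolute_error(grp["truth"], grp["estimate"]),
            "rmse": mean_squared_error(grp["truth"], grp["estimate"]) ** 0.5,
        })
    return pd.DataFrame(rows)

metrics = daily_metrics(log)
```

A metrics table like this is also what you would pin back to a board so the history accumulates over time, per the overlap-handling point above.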
And I think this is super important to say explicitly, but you should probably be monitoring your model if it's deployed. And that's because, you know, data science is a little funky. If things go wrong, you don't necessarily get an error message. You don't get that big X, like, cannot compile, things are failing. Your model can continue to give you answers, even if it's 0% accuracy. Even if it's the worst model in existence, it will confidently give you that answer. And if you're not monitoring, you might think that answer is right. So, it's super important. Because if you're not monitoring your model, you are oblivious to decay.
And that completes our cycle. We can version, deploy, and monitor models. We're ready for that school-to-industry shift. It's scary, but it's doable. But there's a lot of MLOps tools out there. It's a happening space. And if you Google what the MLOps landscape looks like, you get an image that looks something like this. And it's terrifying.
Why vetiver?
So, what was I thinking about when I was building vetiver? Like, why is vetiver different? That's kind of a loaded question. But the first thing I was thinking about when I was building it is composability. This is important because I wanted just a few simple tools that you're able to compose to make complex objects. I really only showed you two things today: the VetiverModel and the VetiverAPI. The VetiverModel is, you know, taking your trained model, and the VetiverAPI is making the API. But if you wanted to make an API with many endpoints, that's possible. If you wanted to do all of the crazy stuff... oh my gosh, people do the wildest API gymnastics. This is built off of FastAPI, so any way you want to extend it that's compatible with FastAPI is still possible. So, it's composable with itself, and it can also leverage the entire ecosystem that's around these tools. And not only is it composable within itself, but it's also composable with, maybe, Dask, if you wanted to, you know, train with Dask. vetiver picks up just after your model is trained. It's supposed to feel like just an extension of the workflow you already have.
Which brings us to our next point. I wanted to make a project that feels good to use and works with the tools that you like. I wanted something that helps lower the barrier to entry to learning how to do these things that come after training a model, but I still wanted people to be able to use the tools that they liked, which is why it happens after the model is trained, not trying to come earlier in the workflow. Also, this might be kind of sacrilegious to say at a PyData conference, but some things are easier in R than Python. So, pins gives us a great crossroads to leverage the best of both worlds. I've had times where somebody else on my team cleans their data in R, they can pin it to a board using Arrow, and I just read it in and continue doing my modeling myself. And that's just a really easy workflow that works for us, and that I'd love to share with all of you, if you also deal with R people.
Q&A
If you have questions, I'd love to answer them. I'm at the Posit booth right across the hall from here. Or I can take questions now.
Yeah, hi. This sounds really awesome. So, I just had a few doubts on the deployment model. When you say you deploy it, I assume that you have a Docker container and it's running somewhere. So, do you deploy to EKS, ECS, or some EC2? Like, where do you exactly deploy it? And how do you, like, log that, and get, like, CloudWatch logs if the server goes down? Yeah. So, the question is, where are you bringing this model, where are you bringing the Docker container, really? And you can bring it to, like, ECS, ECR. We're also looking at, like, Lambda as well. Anywhere that has a bring-your-own-Docker-container mentality, which most public clouds do. And they'll have the logs there as well. vetiver does not do logging for things like CPU metrics or anything like that.
On the model monitoring piece, does the package have a perspective or opinion on how to create that feedback loop? So, you mentioned that there are helpers for calculating the evaluation metrics, but is there an integrated pattern for how to create that feedback loop? Yeah, that is an awesome question. The question was, how do we close that feedback loop of new data back to monitoring? As of right now, we do not have an opinion on that. I would love to talk to anyone who has strong opinions on this. I've been playing around with different DAGs to figure out how to make that easier. So, I guess, best answer: don't know right now. There are ways to use DAGs, because it is so lightweight, to build out that framework. I'm sure you could use things like Airflow; I'm saying that tentatively. If someone else wants to chat with me about this, that would be awesome.
When you say monitor models, where do the metrics go? Are they stored in some storage or in a format? Yeah, that would be kind of up to the user, depending on how you want to pin or save those different metrics. We're able to store them in different pins, and then we bring new metrics in and continue to store that on that model board.
So yeah, if you're making a prediction, it'll come back as a data frame. So wherever you're going to store that data frame, if you want to store it in a pin, that would be a pretty lightweight workflow to continue to use that ecosystem.
Are there any tools for monitoring feature drift? Yeah, you would probably leverage... there are a lot of packages out there; that's out of the scope of vetiver. We're looking more at the metrics themselves. I know Alibi has done amazing work; I've gotten to play around with that package a lot. You could plug that in and then store it in those metrics as well. At the end of compute metrics, metrics is a data frame, so you could add a column that Alibi or another feature drift detection tool could fill in. Just compose.
Yes. Very basic question, but what kind of customizability could I... what kind of metrics do I get? Are they preset, or could I... Yeah, these are scikit-learn metrics functions. Anything that takes a y_true and a y_pred will be able to be used in the vetiver compute metrics function.
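In other words, any metric with the (y_true, y_pred) call signature fits the pattern the answer describes; a tiny illustration with made-up numbers:

```python
from sklearn.metrics import mean_absolute_error, r2_score

y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]

# Any scikit-learn metric taking (y_true, y_pred) plugs into
# the same slot: swap in r2_score, mean_squared_error, etc.
mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```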
So if I understood the question correctly, how would you collaborate on a model card? Yeah. Okay. So Quarto documents are just code. It looks... Actually, if you want to see my slides, here they are. This is what a Quarto document looks like. And so it is just code. It looks kind of like a Jupyter Notebook. So if you wanted to use it on Git, if you wanted to store it in some other central repository for everyone to collaborate on, you could do that.
That is correct. That is one of my favorite things about Quarto is you don't have to deal with the JSON weirdness of a Jupyter Notebook. Even if you don't have Quarto installed, it's still super readable. And it can execute code chunks. If you want to execute code chunks, you can actually embed whole applications. If you saw the Shiny talk yesterday, you can embed Shiny apps in your slides if you wanted to or in that document with a Quarto extension called Shiny Live. I think one thing that I really have loved about kind of spanning this R Python ecosystem is everything in R works together super well. You're like, oh, I want this. And I also want to use this. And how do I put it together? That I think is super exciting to see with Quarto. Like, you can write Python code. You can write R code. You can have applications. And it can be a document. And if you change one line of code, it's also slides or it's a book or it's a website. That's my little rant on, like, interoperability is so cool. But, yes, Quarto is awesome. Check it out.
Any other questions? Awesome. Thank you all for joining. I'll be out at the Posit booth if you're interested.

