
Isabel Zimmerman - Holistic MLOps for better science | PyData NYC 2022
www.pydata.org Machine learning operations (MLOps) are often synonymous with large and complex applications, but many MLOps practices help practitioners build better models, regardless of the size. This talk shares best practices for operationalizing a model and practical examples using the open-source MLOps framework vetiver to version, share, deploy, and monitor models. PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
Transcript
This transcript was generated automatically and may contain errors.
Hello, everyone. I am Isabel Zimmerman. If you're in this room, you're probably looking for a talk called Holistic MLOps for better science, hopefully. So I work for a company called Posit. You might be familiar with us as RStudio; last week, I think, we were RStudio. We are now Posit, really embracing the extension of all the beautiful things we've done for the R ecosystem into Python.
But a little bit more about me. This is me. This is my dog, Toast. I am a full-time open source software engineer. I write Python packages. Hello. But I'm also a grad student currently. So I get to, you know, spend my small amount of free time playing games like Mario Kart. If you're not quite familiar with Mario Kart, it's pretty low stress, in theory, unless you have friends like mine. And you get to drive little go-karts around. There's a famous track called Rainbow Road. So this is me. This is Toast. We're playing Mario Kart together.
But maybe more importantly for this talk, what is MLOps? MLOps is a set of practices to deploy and maintain machine learning models in production reliably and efficiently. And these practices can be hard. When I started, I was at a company where I was deploying machine learning models on Kubernetes systems, which is really like the worst of both worlds: you're trying to deploy model systems and you're working with Kubernetes. And I felt like I had data science skills, but the tools I was using weren't quite built for people like me. They were built for maybe a cloud architect. And I didn't think that I should have to have all the knowledge of a cloud architect or a systems engineer to be able to at least get my models to a point where a DevOps team could easily deploy them.
Real quick, has anybody in here like deployed a model before by a show of hands? Okay. And keep your hands up if that was just a delightful experience. Like best moment of your day. Okay. So maybe my hypothesis was correct. I ended up moving career paths to build a package called vetiver that is built for data scientists to help you guys and me, selfishly, deploy models a little bit easier. And if I think about, you know, who's making models, it's a lot of times people writing R code and people writing Python code. This is actually a package in both Python and R, so you can download it from CRAN or from PyPI.
The data science lifecycle
But when I was learning about data science, I learned about a data science life cycle that looked kind of like this. So you start by collecting data. You understand and clean the data using tools like the tidyverse or data.table if you're in R. Or if you're in Python, it's things like pandas, NumPy, siuba. From there, you're going to train and evaluate your model. Once again, in R, you're going to be using tools like tidymodels and caret. In Python, it's things like scikit-learn or PyTorch. And when you're learning about these things, you learn about all of the best practices for making machine learning models and all the best practices for doing your data analysis work.
And these tools have done such a fantastic job of having these best practices kind of baked in that you might not even think about it. So if we look at data science code, some of this is more important than others for this talk. But this line four appears in most places if you are making a model, and that's setting that random state or setting your seed. And this is to ensure reproducibility. And it's something that we've all kind of accepted. We know this is a best practice. And it's in code we see all the time.
This is a little data set looking at predicting like counts on YouTube ads: whether the ad is funny, whether it shows a product. The data looks something like this. And when we get to modeling this data, before we make our model, we know that we're going to split it into a training and a test set. We're going to make sure we're not giving our model the answers to the questions before we're training it. This is built into the code. This is a best practice. We know about this.
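The seeded train/test split described above can be sketched like this; the tiny ads data frame is a made-up stand-in for the talk's dataset, not the real thing:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the YouTube-ads data from the talk
ads = pd.DataFrame({
    "funny": [1, 0, 1, 1, 0, 0, 1, 0],
    "show_product": [0, 1, 1, 0, 1, 0, 0, 1],
    "likes": [120, 45, 200, 98, 30, 15, 170, 60],
})

X, y = ads.drop(columns="likes"), ads["likes"]

# Setting random_state makes the split reproducible: the same call
# always yields the same rows in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=500
)
```

Rerunning the split with the same `random_state` gives byte-for-byte identical partitions, which is exactly the reproducibility point the slide makes.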
And I'll show you one more. You know that you need to choose the right feature engineering for your job. So we need to like do an ordinal encoder here and use our random forest regressor. You're going to put this in a scikit-learn pipeline to kind of package up everything that's being fitted all at once. And we know things like this. We are data scientists. This is best practice.
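A minimal sketch of that kind of pipeline, assuming scikit-learn and a hypothetical toy data frame (the talk's actual feature set isn't reproduced here):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data standing in for the talk's example
X = pd.DataFrame({"ad_type": ["funny", "serious", "funny", "product"],
                  "length_s": [15, 30, 15, 60]})
y = [200, 45, 170, 60]

# Encode the categorical column, pass the numeric one through, then fit
# a random forest. Packaging it all in one Pipeline means the feature
# engineering travels with the fitted model.
pipe = Pipeline([
    ("prep", ColumnTransformer(
        [("ord", OrdinalEncoder(), ["ad_type"])],
        remainder="passthrough")),
    ("rf", RandomForestRegressor(random_state=500)),
])
pipe.fit(X, y)
```

Because the encoder is inside the pipeline, `pipe.predict(new_data)` applies the same encoding automatically; there is no separate preprocessing step to forget at prediction time.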
But then you get to your first job and you realize there's more to life than setting random seeds and splitting things into training and test sets. There's questions like, okay, so I trained my model. How am I going to hand it over to my teammates? Like, am I going to email them a joblib file? Hopefully not. Maybe you'll use GitHub. But then what about what happens when you have to put this inside of some sort of application? Are you going to copy and paste this code into your application? Again, hopefully not. And what happens when you need to make sure your model is continuing to perform well a month, a year, five years from now? The world is broader than the scope that at least I first learned in school.
And this is really important because if you develop models, you should probably operationalize them. And this means a lot of different things to a lot of different people. But in general, I think we can kind of agree that the business value of models usually occurs when it's outside your local environment. So, there's more to this cycle. And this is what vetiver aims to help people with, especially people who are learning to deploy and version and monitor a model.
Versioning models with pins
Okay. So, that was enough data science. We're back to Mario Kart now. When you log into Mario Kart or open up your little game, you realize there are different game modes. If you want to just learn the handling and drive around at a leisurely pace, you can go at 50cc. If you want to up the ante, you kind of got it figured out, you can go 100cc. If you know all of the game, like, super ready for this, blazing fast, drifting around corners, you're going 150cc. Super fun. But you enjoy it at every level, and you get all the good Mario Kart stuff at every level. And we're going to think about model ops in the same way. Some things are simpler, a little bit slower; that's your 50cc model ops. And we'll speed up as we go along.
So, we'll start with versioning. When people are thinking about versioning, normally it's kind of in the context of Git. And we actually version a lot of things very often, and mostly very badly. This is possibly a familiar scenario. You build your model and you have it saved somewhere. And it's named model, because we're creative. And then you do some more training. You get some new data. And you have your final model. And then you have, I think you guys know where this is going, a few more iterations of what this model actually ends up being. And we can see this barely works for one model, it doesn't scale beyond one model, and it lacks context between each iteration.
Versioning is helpful because it helps your models live in a central location. It helps them be discoverable by your team. Because you don't want to go, like, play hide and go seek with your models. And in a perfect world, it would be awesome if these could also load right into memory. So, we're not trying to go somewhere, download it, joblib.load it, open our model, and then go. If we could just have one line and our model is in our Jupyter notebook and we're ready to rock and roll, that would be perfect. And this is where we look at our pins package. This is also something that was developed by our team. It's also something that's available in both R and Python. And it helps you with these demands, our list of demands we have.
So, what does it look like to use pins? The core piece of pins is the idea of a model board. So, we can think of this as a place for your model to be stored. It doesn't actually just have to be a model. This can be an Arrow file. It could be JSON. It could be a CSV. And there are other data types if you're in R. And it actually can cross between languages if it's a compatible type, which is also super cool. But anyway, it's a place for things to be stored. And here we have a temporary board. But it could also be on S3 or on Azure; it would just be, like, board_gcs, or Posit's product called Connect.
So, there's a spot for your model to be stored. And what about the model part? That's going to go into something called a VetiverModel. When you first train your model, there's actually a lot of information in there that could be leveraged later to give you a more robust deployment. So, you're going to put in your pipeline that you've trained earlier, and we're going to give it a name called ads. And that's it. You can vetiver_pin_write your model to your model board. And it's versioned.
And I did promise that this looks the same in Python and R. So, on the R side, we're making our board, making our vetiver model, and writing our model to our board. On the Python side, we are making our board, making our vetiver model, and writing our vetiver model to our board. This is super useful for teams that are bilingual in, like, the R-Python sense. So, there's less, you know, cognitive load when you have to context switch between languages.
So, that's at 50cc. We have our model versioned. But what if we want to up the ante just a little bit? It's super helpful later on down the road if you know what your input data should look like. This is just saving a little piece of data to better debug later when things go wrong. It allows you to have better error messages. You can kind of peek into your vetiver model and realize, like, oh, I have extra columns here or whatever. Like, you wouldn't really make a puzzle if you don't have the image of what the puzzle looks like. That sounds like madness. And very difficult to piece together later. This is the same concept. And this is the exact same code we were looking at earlier. We have one extra argument, and that is ptype_data, or prototype data. And we're going to feed in a little bit of the X training data. It's going to be like a zero-row data frame that's going to be translated into a pydantic BaseModel, if you know what those are. And then your deployment later understands what the data should look like when it's coming into your model. Which is very useful because sometimes the real world doesn't look the same as what exists in your training set.
Model cards and ethical documentation
So, then we have our beautifully versioned model. We have some P type data saved. And then we want to think maybe a little bit more holistically about who this model is impacting. We want to make not only good models statistically, but good models ethically. Also, good documentation. Model cards were created by a team at Google. It's kind of like writing down a recipe for your model. But also giving a lot of other context. You will never know as much about something, especially in the modeling world, as when you're working on it at that moment. You can think that you're going to remember all of these silly little intricacies that you thought would be common sense. But I promise you, I do this on tests as a grad student. I'm like, oh, that makes perfect sense. I'll never forget that. And then I realize I do. Model cards give you an explicit place to write everything down that you've been thinking about.
So, we've seen this vetiver_pin_write a few times. But what I sneakily have been hiding from you all is that this gives you a little pop-up message that says model cards provide transparent, responsible reporting; use the vetiver Quarto template as a place to start. And of course, like anyone else, when you get any information, you just copy and paste the code and you run it. And this will give you a Quarto document. Quarto is an open source framework for technical publishing. My slides are written in Quarto. If you were at a talk yesterday by Daniel Chen, he also gave a talk about Quarto. It is the coolest thing ever. I could go on a whole tangent about how much I love this.
So, this creates a template. There's a little bit of parameters at the top where you can write in your pin information. And then it'll generate a document that looks something like this. And anything we can automate for you is automated. So, things like: it's a scikit-learn pipeline, it's using four features. If you have a version, it'll say, like, version X was created at this time. And you can add some information about you and your team. And if you scroll down, you'll be able to see a printout of your ptype. You know, it's looking at different houses: the type, square footage, beds, baths. And some quantitative analysis about how your model is performing. This is all actually just code, even though it looks as beautiful as it does.
You can add any custom plots you want or any custom information. And if you scroll all the way down to the bottom, this is where it gets kind of interesting, especially for me when I want to think about model fairness and how my model is affecting people. And you might think, like, oh, my model does not have any ethical challenges, it's predicting YouTube likes, okay, I'm not going to write anything down here. And for that, I would say there are kind of two things to think about. One, maybe think about asking the people your model is affecting. Maybe they don't have the same answer as you do from the developer side. And two, even if you've done your due diligence and you've asked everyone you should, I would just kind of leave this blank. Like, don't delete it. Because any incomplete information is better than none at all.
My dad has a good quote that I like to give to everybody else when they're thinking about model cards. He said, if you haven't written it down, you haven't thought it out. So even though it feels a little bit slower, and it's like, oh, what's this girl talking about for, like, five minutes about writing documentation? This is important stuff, too. So I think this is really important when you're thinking about holistic MLOps: what's getting deployed? What are the impacts on people?
Deploying models
All right. We have our little heart. We have versioned our model. And now it's time to think about moving our model out into the real world. And what is deployment? People have defined this in many different ways, but the way I think about it is: any time it's not on your laptop, it is deployed. In vetiver, we mostly do deployments as API endpoints. It's useful because you can still communicate with your model almost as if it was in memory. vetiver has some helper functions, so you can just do, like, vetiver.predict, give it the endpoint, give it your data, and it will do the JSON-to-endpoint, back-to-JSON, back-to-data-frame handoff for you. So it feels like your model is right there. It's also useful, and will make all your software engineering friends very happy, because APIs are testable. So it's quite robust.
So our model should go somewhere outside of our laptop. You can also test these locally to make sure they're working as you expect, and that's by just creating a VetiverAPI. You put your vetiver model inside of it, and my_api.run() will get you a local instance. Of course, we don't want a local instance; that's the whole point of deployment. So if we're trying to move this somewhere else, there is a one-liner if you're moving it to Posit Connect. You give it the Connect server, give it the model board, the name, and the version, if you have a version. If you don't give a version, it'll just find the latest one. Maybe less recommended; you probably want a robust version in place. But you can mess around and find out.
And if you're not using Connect, there are other ways to move your model around. It's kind of a two-step process. One, you want to write an app.py file, and vetiver.write_app will help you out with this. It'll make a super small generated script for you where it's essentially creating a vetiver model with the board, pin name, and version you're looking for, and then setting up the API. And actually, a lot of cloud services right now only need the app.py file. But other places are interested in maybe a Dockerfile, and vetiver.write_docker will write that Dockerfile. It'll get you most of the way there for most deployments. It'll read in this app.py file that was generated, and it'll peek around for requirements.txt files. And you have these things in hand to either deploy yourself. If you are on AWS, I think you can upload this directly into ECR and ECS, and it kind of does some Docker magic. And other places have a bring-your-own-Dockerfile mentality as well.
So, this is deployment. But we're going to be a little bit more sophisticated and think about where everything is living now. We have our model somewhere. We have our REST API somewhere. And we have our local laptop. And in a perfect world, our Docker container is as small and skinny as possible; it makes it faster, makes it cheaper. And our model that we want to iterate on, that we want to store all these different versions of, is going to live somewhere else. You might think it would be nice to save this in your Dockerfile, but that's not quite the case. It's going to get very bloated, especially because you're not versioning one model; you're probably versioning lots.
So, let's think about how this is going to happen. First, when your Docker container spins up, it's going to use that app.py file to load the vetiver model and start up the API. This is one of those holistic best practices that vetiver kind of bakes in for you: if you use those two lines from before, most of this is already happening. Unless you're trying really hard to stuff a local pins board of models inside the Docker container, you won't end up with a very large container; you'd have to try to make that happen. So, the Docker container is going to peek into your model board, load the model up, and then you can communicate with this Docker container just like any other API. You can post to it. You can interact with it. And it feels great.
Monitoring deployed models
But then, you know, you might have to do some analysis on your model. You might do some monitoring. You might have some weird instance or just any other ad hoc information that you need to get from your model. And now you don't have to peek into that Docker container anymore. You can just load it right from your pins board right into memory and use it as expected. You can do all your analysis on your model. And that kind of completes our cycle here.
Our model is versioned with pins. It's deployed: it's either running in Connect with a one-liner, or you've made a Dockerfile to bring it to some other public cloud. And now it's time to monitor. Because once a model is deployed, a data scientist's work is not done.
I do have to say here that monitoring in this sense is going to be a little bit different than maybe you're used to. We're not particularly interested, in this package, in looking at, like, CPU usage or runtime. Here we're looking at statistical methods. So, like, RMSE, MAE: is your model performing as well as you thought it was, or expect it to?
And vetiver has some helper functions to help you compute, pin, and plot metrics. I'm not going to go, like, too in depth on these, but just know they exist. They help you do things like store your metrics data, and they handle that awkward case where, like, I have a few days that overlap; you can choose whether it overwrites on the last export or not. And there's a one-liner that helps you plot the metrics using Plotly, to get all that lovely interactivity.
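The idea behind those helpers, slicing predictions by time period and applying standard regression metrics, can be sketched with pandas and scikit-learn alone; this is a simplified stand-in with made-up prediction logs, not vetiver's actual implementation:

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical prediction log: one row per scored observation
log = pd.DataFrame({
    "date": pd.to_datetime(["2022-11-01"] * 3 + ["2022-11-02"] * 3),
    "truth": [100, 150, 90, 80, 130, 95],
    "estimate": [110, 140, 100, 60, 150, 70],
})

def daily_metrics(df):
    # Group by day and compute each metric per slice, which is
    # roughly the shape of result the vetiver helpers produce.
    rows = []
    for day, grp in df.groupby("date"):
        rows.append({
            "date": day,
            "mae": mean_absolute_error(grp["truth"], grp["estimate"]),
            "rmse": mean_squared_error(grp["truth"], grp["estimate"]) ** 0.5,
        })
    return pd.DataFrame(rows)

metrics = daily_metrics(log)
```

A metrics table like this is also what you would pin back to a board so the history accumulates over time, per the overlap-handling point above.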
And I think this is super important to say explicitly, but you should probably be monitoring your model if it's deployed. And that's because, you know, data science is a little funky. If things go wrong, you don't necessarily get an error message. You don't get that big X, like, cannot compile, things are failing. Your model can continue to give you answers, even if it's 0% accuracy. Even if it's the worst model in existence, it will confidently give you that answer. And if you're not monitoring, you might think that answer is right. So, it's super important. Because if you're not monitoring your model, you are oblivious to decay.
And that completes our cycle. We can version, deploy, and monitor models. We're ready for that school-to-industry shift. It's scary, but it's doable. But there's a lot of MLOps tools out there. It's a happening space. And if you Google what the MLOps landscape looks like, you get an image that looks something like this. And it's terrifying.
Why vetiver?
So, what was I thinking about when I was building vetiver? Like, why is vetiver different? That's kind of a loaded question. But the first thing I was thinking about when I was building it is composability. This is important because I wanted just a few simple tools that you're able to compose to make complex objects. I really only showed you two things today: the VetiverModel and the VetiverAPI. The VetiverModel is, you know, taking your trained model, and the VetiverAPI is making the API. But if you wanted to make an API with many endpoints, that's possible. If you wanted to do all of the crazy stuff... oh my gosh, people do the wildest API gymnastics. This is built off of FastAPI, so any way you want to extend it that's compatible with FastAPI is still possible. So, it's composable with itself, and it can also leverage the entire ecosystem that's around these tools. And not only is it composable within itself, but it's also composable with, maybe, Dask, if you wanted to, you know, train with Dask. vetiver picks up just after your model is trained. It's supposed to feel like just an extension of the workflow you already have.
Which brings us to our next point. I wanted to make a project that feels good to use and works with the tools that you like. I wanted something that helps lower the barrier to entry to learning how to do these things that come after training a model, but I still wanted people to be able to use the tools that they liked, which is why it happens after the model is trained, not trying to come earlier in the workflow. Also, this might be kind of sacrilegious to say at a PyData conference, but some things are easier in R than Python. So, pins gives us a great crossroads to leverage the best of both worlds. I've had times where somebody else on my team cleans their data in R, they can pin it to a board using Arrow, and I just read it in and continue doing my modeling myself. And that's just a really easy workflow that works for us, and that I'd love to share with all of you, if you also deal with R people.
Q&A
If you have questions, I'd love to answer them. I'm at the Posit booth right across the hall from here. Or I can take questions now.
Yeah, hi. This sounds really awesome. So, I just had a few doubts on the deployment model. When you say you deploy it, I assume that you have a Docker container and it's running somewhere. So, do you deploy to EKS, ECS, or some EC2? Like, where do you exactly deploy it? And how do you, like, log that, and get, like, CloudWatch logs if the server goes down? Yeah. So, the question is, where are you bringing this model, where are you bringing the Docker container, really? And you can bring it to, like, ECS, ECR. We're also looking at, like, Lambda as well. Anywhere that has a bring-your-own-Docker-container mentality, which most public clouds do. And they'll have the logs there as well. vetiver does not do logging for things like CPU metrics or anything like that.
On the model monitoring piece, does the package have a perspective or opinion on how to create that feedback loop? So, you mentioned that there are helpers for calculating the evaluation metrics, but is there an integrated pattern for how to create that feedback loop? Yeah, that is an awesome question. The question was, how do we close that feedback loop of new data back to monitoring? As of right now, we do not have an opinion on that. I would love to talk to anyone who has strong opinions on this. I've been playing around with different DAGs to figure out how to make that easier. So, I guess, best answer: don't know right now. There are ways to use DAGs, because it is so lightweight, to build out that framework. I'm sure you could use things like Airflow; I'm saying that tentatively. If someone else wants to chat with me about this, that would be awesome.
When you say monitor models, where do the metrics go? Are they stored in some storage or in a format? Yeah, that would be kind of up to the user, depending on how you want to pin or save those different metrics. We're able to store them in different pins, and then we bring new metrics in and continue to store that on that model board.
So yeah, if you're making a prediction, it'll come back as a data frame. So wherever you're going to store that data frame, if you want to store it in a pin, that would be a pretty lightweight workflow to continue to use that ecosystem.
Are there any tools for monitoring feature drift? Yeah, you would probably leverage... there are a lot of packages out there; that's out of the scope of vetiver. We're looking more at the metrics themselves. I know Alibi has done amazing work; I've gotten to play around with that package a lot. You could plug that in and then store it in those metrics as well. At the end of compute metrics, metrics is a data frame, so you could add a column that Alibi or another feature drift detection tool could fill in. Just compose.
Yes. Very basic question, but what kind of customizability could I... what kind of metrics do I get? Are they preset, or could I... Yeah, these are scikit-learn metrics functions. Anything that takes a y_true and a y_pred will be able to be used in the vetiver compute metrics function.
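In other words, any metric with the (y_true, y_pred) call signature fits the pattern the answer describes; a tiny illustration with made-up numbers:

```python
from sklearn.metrics import mean_absolute_error, r2_score

y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]

# Any scikit-learn metric taking (y_true, y_pred) plugs into
# the same slot: swap in r2_score, mean_squared_error, etc.
mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```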
So if I understood the question correctly, how would you collaborate on a model card? Yeah. Okay. So Quarto documents are just code. It looks... Actually, if you want to see my slides, here they are. This is what a Quarto document looks like. And so it is just code. It looks kind of like a Jupyter Notebook. So if you wanted to use it on Git, if you wanted to store it in some other central repository for everyone to collaborate on, you could do that.
That is correct. That is one of my favorite things about Quarto is you don't have to deal with the JSON weirdness of a Jupyter Notebook. Even if you don't have Quarto installed, it's still super readable. And it can execute code chunks. If you want to execute code chunks, you can actually embed whole applications. If you saw the Shiny talk yesterday, you can embed Shiny apps in your slides if you wanted to or in that document with a Quarto extension called Shiny Live. I think one thing that I really have loved about kind of spanning this R Python ecosystem is everything in R works together super well. You're like, oh, I want this. And I also want to use this. And how do I put it together? That I think is super exciting to see with Quarto. Like, you can write Python code. You can write R code. You can have applications. And it can be a document. And if you change one line of code, it's also slides or it's a book or it's a website. That's my little rant on, like, interoperability is so cool. But, yes, Quarto is awesome. Check it out.
Any other questions? Awesome. Thank you all for joining. I'll be out at the Posit booth if you're interested.

