
Model cards with vetiver for transparent, responsible reporting
Good documentation helps us make sense of software, know when and how to use it, and understand its purpose. The same can be true of documentation or reporting for a deployed model, but it can be hard to know where to start. The paper “Model Cards for Model Reporting” (Mitchell et al. 2019) provides a suggested framework for organizing and presenting the essential facts about a deployed machine learning model, and the vetiver packages for both R and Python provide templates for getting started with your own model card. Julia Silge joined us on Jan 29th to share:

1️⃣ How to get started with your first model card
2️⃣ How a model card fits in with model monitoring
3️⃣ How to use Posit Team to author and publish your model card

Link to paper: https://lnkd.in/eRbYpfEW
GitHub Repo: https://github.com/juliasilge/model-card-workflow-demo
Q&A Recording: https://youtube.com/live/tQsyImn18q4?feature=share

Want to add future workflow demos to your calendar? We host them the last Wednesday of every month. https://evt.to/aoimiohuw
Transcript
This transcript was generated automatically and may contain errors.
Hi, my name is Julia Silge and I'm a data scientist and engineering manager at Posit PBC. Today in this screencast we're going to talk about model cards. Model cards are something you may have heard about in the worlds of data science and machine learning, and today we'll cover what they are, when you make them, how to get started, and the different ways you can practically use them inside of an organization.
The main idea here is that we all know how important good documentation is. If we're going to choose to use software, how good the documentation is often makes a difference in how easy it is to get started, how easy it is to know how to use it, when it is appropriate to use, what its purpose is, and what its normal use cases are. It turns out models work this way too. When we document models, we think about how we talk to people about what a model is like, how we made it, and what use cases are appropriate for it. That can help us be more responsible in our use of machine learning and set ourselves up for success in whatever applications we're using machine learning for.
Model development and creating a vetiver model
So let's get started and dig into this. First I want to talk a little bit about the model development process, because model cards are something you build as part of that process. Let's imagine that I am a data scientist working in the HR department at a company. I am thinking about the employees at our company, and let's say I've got data on attrition: for various employees, did they leave our company or not?
I am going to go through a model development process here and split my data on employee attrition into training and testing. I'm going to save that testing data to use for my model card, and then I'm going to fit my model. For demonstration purposes, picture this as the time we spend actually doing exploratory data analysis, developing a model, and figuring out what the right kind of model to build is. Let's say I've done all that, and now it's time for me to finish and deploy my model. So I'm going to load vetiver and pins, and then I am going to connect to a Posit Connect board as a place to store my pins.
You could also use something like an S3 bucket or a network drive, but for this demonstration I'm going to use a demo Posit Connect server that we have here. I'm going to create a vetiver model, and then my model is done. I have gone through the whole model development process and found my final model, so I am going to store this final model together with its metadata as a pin on Posit Connect.
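In code, that workflow looks roughly like the following sketch. The data frame, column names, and model choice here are illustrative stand-ins rather than the exact demo code (which lives in the linked GitHub repo):

```r
library(tidymodels)
library(vetiver)
library(pins)

# Split the employee data; the held-out test set will be used
# later for the model card's quantitative analysis
set.seed(123)
employee_split <- initial_split(employees, strata = attrition)
employee_train <- training(employee_split)
employee_test  <- testing(employee_split)

# Imagine EDA and model selection happened here; fit the final model
attrition_fit <-
  workflow(
    attrition ~ job_satisfaction + monthly_income + over_time + department,
    logistic_reg()
  ) |>
  fit(data = employee_train)

# Wrap the fitted model plus its metadata, then pin it to Posit Connect;
# vetiver_pin_write() is what prints the model card reminder
v <- vetiver_model(attrition_fit, "attrition-model")
board <- board_connect()   # or board_s3(), board_folder(), ...
vetiver_pin_write(board, v)
```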
Now notice what happens here. It told me that it wrote the pin, and then it says: hey, you know what you should do? You should create a model card for this model now that you have published it. It tells me a little bit about what a model card is, a way for us to do this kind of transparent, responsible reporting, and it says that vetiver has a template as a place to start. So this is when in the process you go about thinking about documentation for your model: when you've decided you are done with model development.
The model card template
Please go along with my story here that this thing I just did in 30 seconds stands in for real model development. Once I'm done with that process, it's time for me to think about a model card, time for me to think about documenting my model. We do have a template for you to get started with, and I can start one here. Let's say I'm just going to write to a little R Markdown file and get the template. It's called "vetiver_model_card", and it is in fact in the vetiver package. And so what we see here is this whole template that popped up for me.
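Opening a fresh copy of that template can also be done straight from the console; this is the documented vetiver approach (the file name is up to you):

```r
# Create a new R Markdown file from the model card template
# that ships with the vetiver package
rmarkdown::draft(
  "model_card.Rmd",
  template = "vetiver_model_card",
  package = "vetiver"
)
```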
Incredible, right? So this is an R Markdown template that can let you get started. If you have never written a model card, this is a great place to start. You can pop open this thing and see all the headings it has in it. This idea comes from a paper published in 2019, with Margaret Mitchell as the main author, called "Model Cards for Model Reporting". The idea is that we're using machine learning models all the time, but we don't have a good framework for how to document them or how to say how to use them. This paper, which I highly recommend that you read, really outlines a framework that you can use to document any trained machine learning model.
These folks wrote this paper in the context of working on fairness, accountability, and transparency in models, but this is a concept that can be used with any kind of model, and in fact is a great way to document whatever kind of model you have.
Walking through a completed model card
So here was a very contrived little template walkthrough. I want to now pull up a model card that I have spent a little bit of time on, because in this demo I didn't want to sit here typing for you. So let's restart: let's say we're totally done with model development and now we're writing this model card. Notice I am working in Quarto now. I wrote this one as Quarto instead of R Markdown because I want to be able to show you a few different ways I might use it as a Quarto document.
So I spent a little bit of time on this model; let me pull it up and run some of this code so that I can print out the model: this classification modeling workflow that I made to predict attrition for employees. Let's walk through the things we have done to put together a model card. The first section is model details, and here we have a mix of things that I have been able to do in an automated way and things that I had to write myself. For example, we can get some automated information off of the model itself with code that I can reuse every single time I make a model card, but then there are pieces here about this particular model that I, as a human being, have to write. Like many of the kinds of reporting and data artifacts we make, part of it is code that is run and part of it is the human thought that goes into what is happening here.
So we've got some details. The next important section is the intended use of the model. Here I outline what someone in this situation might say: the primary use is identifying employees at higher risk of attrition. We also want to say who the users of the model are; in this case, I'm saying this is for us in HR to use. And then I say some things that are out of scope: in this case, we're not going to provide these scores to individual managers to say, hey, this person needs a raise because they might be at risk for attrition.
So we've got the intended use case. The next section is important factors: what are the important things to know about this model? This model has to do with human beings, so there are some very important factors here, but so many of our models are like this, right? So we highlight how this relates to demographics, and we say that when we evaluated the model, we examined aspects like how the model performs across different departments or different demographic characteristics.
Now we go into the metrics: how are we evaluating how this model is doing? I outline two metrics that I've decided to use in this model card. One of them is accuracy and one of them is mean log loss. Notice that I'm specific about how I computed them, so that when other people come to look at this documentation, they're clear on what I did. And I said I chose those metrics because the first one, accuracy, helps us understand what proportion of our predictions are correct, and the second one, mean log loss, helps us understand how close our predictions are to the true values.
Then we talk about what data we're using. We are specific about the training data set: how many examples are in it, and where we got it. We're specific about the shape of the data that goes into training this model: in this case, job satisfaction is a factor, monthly income is an integer value, and then we have more factors for whether someone works overtime and what department they're in. We're also specific about the evaluation data set: what data are we using to evaluate the model? In this case, it's a random sample held out from an original data set of employees; we split our data into training and testing.
Typically when you're writing a model card (remember, a model card is work that you do at the end of model development), the data you have at that time is the pool you divided into training and testing, and that testing data is what we use to evaluate our model.
Quantitative analysis and disaggregated metrics
So now it's time for us to do some quantitative analysis. I get that testing data back, and I am going to make predictions with it. Remember, this is not the data that the model was trained on; it was not used for model estimation, but it is data that I had at the time of model training. We often call this the holdout set or the test set. This is what I'm using for the model card. I don't want to use the training set to evaluate the model in the model card because, just like if I were using it to try to choose a model, that would give us overly optimistic answers.
One thing the model card paper really makes a point of is that it's very common to report overall model performance. So here are the metrics I'm going to use, and I'm computing them here. We see the accuracy is roughly 0.8 and the mean log loss is roughly 0.4. Accuracy is a metric that's better when it's high, so closer to one is better; log loss is a metric that's better when it's low, so closer to zero is better. It's very common to look at overall model performance, but something the paper points out that I think is super valuable is that it is also important to report metrics disaggregated by something that matters.
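Computed with yardstick, the overall metrics described here look something like this sketch; `attrition_fit`, `employee_test`, and the column names are illustrative placeholders for the objects from model development:

```r
library(tidymodels)  # loads yardstick, which provides the metrics

# Predict on the held-out test set with the fitted workflow;
# augment() adds .pred_class and class probability columns
preds <- augment(attrition_fit, employee_test)

# Accuracy (closer to 1 is better) and mean log loss (closer to 0 is better);
# check that yardstick's event_level matches your factor's level order
attrition_metrics <- metric_set(accuracy, mn_log_loss)
attrition_metrics(preds, truth = attrition, estimate = .pred_class, .pred_Yes)
```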
For example, I can look at differences in these metrics across gender categories. I have women and men here, and these look pretty close to me; they are not very different. But it's important that I look at the disaggregated metrics so that I can know whether my model performs better for one category of, in this case, person versus another. This is an important thing to have when we document our models.
So that is by gender. I can also look here by education, and here you'll notice something. Let's look at accuracy: accuracy changes across this factor, from less than college, some college, bachelor's degree, master's degree, to doctorate. We see a shift in accuracy, and we see the same thing when it comes to log loss. We're able to do a much better job predicting whether someone is on their way out of the job for people with more education, whereas for people with less education, we're doing a worse job of predicting attrition.
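Because yardstick metric sets respect `group_by()`, disaggregating those same metrics is a small change to the same sketch (names again illustrative):

```r
library(tidymodels)

# Test-set predictions from the fitted workflow, as before
preds <- augment(attrition_fit, employee_test)
attrition_metrics <- metric_set(accuracy, mn_log_loss)

# The same metrics, disaggregated by education level:
# one accuracy and one mean log loss per group
preds |>
  group_by(education) |>
  attrition_metrics(truth = attrition, estimate = .pred_class, .pred_Yes)
```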
This highlights how important it is to report model performance disaggregated by whatever we know may be important. Remember that these were not the inputs to the model; the inputs to the model were job satisfaction, monthly income, overtime, and department. And yet we see these kinds of shifts: we don't see a difference across one demographic, and we do see differences across another. When you document your model, this is exactly the kind of thing we need to be clear about so that people can know about it and use the model in the best way possible.
A model card is also a great place to visualize model performance. Here, I can make an ROC curve across the different departments in my imaginary company, where I'm imagining that I'm a data scientist. Looking at this plot, maybe we do a bit better predicting accurately for research than for sales; there's perhaps a moderate difference there. And it looks like the number of people in human resources is much smaller, there are fewer of them, so we have less confidence there as well. That's good, important information to see.
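A per-department ROC curve like the one described can be sketched with yardstick and ggplot2; `attrition_fit`, `employee_test`, and `.pred_Yes` are illustrative names:

```r
library(tidymodels)

# One ROC curve per department; .pred_Yes is the predicted
# probability of attrition on the held-out test set
augment(attrition_fit, employee_test) |>
  group_by(department) |>
  roc_curve(truth = attrition, .pred_Yes) |>
  autoplot()
```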
Ethical considerations and caveats
Okay. So that was the quantitative analysis section, which is a super important part of our documentation for our model. Then we get to these last couple of sections. First we have ethical considerations. Notice what I said here: we considered how this model may exacerbate existing inequality in our company by performing better or worse for different groups. Then I outline where I don't think there are significant differences, and where I do think there are, like the differences I see here in education. That is important for the ethical considerations. Not all models have as clear an ethical implication as this one, but I'm really thankful for tools and frameworks like this model card that help me identify when a model does, and help me do a good job of documenting it in the best way possible.
The last section that the model card framework recommends is caveats and recommendations. I imagined what I might recommend in a situation like this, along with some caveats: for example, we can't account for individual variation. I have something here saying that if leaders in HR are the users of this model, they have to know that the scores are not equally accurate for all employees. And the specific caveat I have here is that the model is less useful in predicting attrition for employees with less education than for those with more. We can't count on this model to give us the same quality of results across these different groups.
Publishing and sharing the model card
Okay. So that is the model card itself. The next little category of things I want to talk about is: what do I do now? Let's say I spent a lot of time developing my model, and then I sat down and spent some time documenting it, and I have this great document that I have made. What do I do with it? Where does it go? What are some good practices around it? I am going to walk through a couple of very concrete things that you can do.
So this is a Quarto file, and if I come here and decide to render it, I've got some options. First, I'm going to render it to HTML, because that will let me look at it here in RStudio and show you what it looks like. And what we've got here is the rendered HTML on this side. You can see we've got our nice links and our output; depending on who the audience for this document was, I might make these into some nicer tables. We can click back and forth here and see gender, education, department, and the rest. So here is some rendered HTML.
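Outside of the RStudio Render button, the same render can be done from code with the quarto R package (it requires the Quarto CLI to be installed; the file name is illustrative):

```r
# Render the Quarto model card to HTML
quarto::quarto_render("model_card.qmd", output_format = "html")
```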
Now that this is rendered, it can go pretty much anywhere HTML can go, which is a lot of places. Let me show you one place I might publish it: I might take the HTML and deploy it to Posit Connect. Let me click this; it is going to upload the bundle, and if I click view, here it is. I just published this to Posit Connect, which is a great place for me to share this with people I work with. One thing that is great about publishing to Connect is that Connect is probably also where the binary artifact for the model is, so you can use things like dashboards, tags, and the organization that exists on Connect to tie those things together.
So Posit Connect can be a great place to publish HTML if the users of the model can come to Connect and see your published work. That's option number one: we render to HTML, which is what this looks like over here, and then we publish it; we could of course use something like GitHub Pages, but maybe more likely something like Posit Connect.
Now, what other options do I have? I'm over here in the terminal. What happened before is that this thing rendered to HTML, but I can also render to lots of other kinds of output. For example, I can render to plain Markdown and then pop that open. The thing I did with the little tab set does not actually work so well in Markdown, so I would maybe want to edit this file a little if my main target were Markdown. But what is great about Markdown, and something I have done, is that I can now check it into the repo where I have my model training code. In fact, it makes a pretty good README.
So you can take that Markdown and call it the README, and if this project were a GitHub repo where I stored my information about this model, it would make a fantastic README for that repo. When would I choose one or the other? I think I would choose HTML and Posit Connect when the users of the model are business stakeholders, and I would choose treating it like a README when the main users of the model are maybe software engineer colleagues who need to know what the model is like, how to use it, and so on. One thing I have found helpful is to think about making the data artifact really match the user who needs to read it: where are they going, and what do they need to know and find?
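Rendering to GitHub-flavored Markdown for use as a README might look like this sketch (the file names are illustrative):

```r
# Render the model card to GitHub-flavored Markdown
quarto::quarto_render("model_card.qmd", output_format = "gfm")

# Optionally rename the result so the repo displays it as its README
file.rename("model_card.md", "README.md")
```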
Okay, we've got other options. If we come back here, I can render the document to a PDF, and these actually look quite nice these days. Rendering to PDF lets me do something like upload it to Google Drive or to Teams or wherever, so that I have an artifact that business stakeholders, or whoever needs to know what the model is like, can go in and get. The other thing I've actually done is a fair amount of rendering to Word and then uploading the Word document to Google Drive. That also gives me a nice workflow where I can get something into Google Drive that people can comment on, but that I myself have made with Quarto. I'll even say that we can publish straight to Confluence using Quarto, which is another really great thing to be aware of.
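The other output targets mentioned here are just different `output_format` values passed to the same render call (PDF requires a LaTeX installation such as TinyTeX; file name illustrative):

```r
# A PDF to upload to Google Drive, Teams, etc.
quarto::quarto_render("model_card.qmd", output_format = "pdf")

# A Word document that collaborators can comment on in Google Drive
quarto::quarto_render("model_card.qmd", output_format = "docx")
```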
By using this code-first approach to documentation, you get a reusable thing that you can put wherever makes the most sense for the users, the readers, the audience for this document; you can put it where they need to go.
Integrating model cards into model monitoring
The next thing I want to talk about is how this fits into model monitoring. I'm going to show you an example of a model monitoring dashboard that incorporates model card information. This is for a vetiver model that was trained with Python, which my colleague Isabel Zimmerman made. You can see we've got a front page that has model card information, and then it also has model monitoring information: how is the model performing over time? A model monitoring dashboard can also give us information that helps us identify what is most likely to be misclassified.
I want to highlight a few things. The model card typically holds information that we wrote and created at the time of model development. It's very useful to integrate it into a model monitoring dashboard, but a monitoring dashboard is typically made so that we can monitor new data coming in over time. So I think the really important thing to consider is what data is being used. If we pop back over here and look at this, the data going into the model card is data that I had available when I trained my model, and so it is available for me to evaluate the model, choose a model, and do all the other things that are part of model development.
When it comes to model monitoring, my model is in production and new data is coming in; I'm making predictions, and then I want to ask how my model is doing over time. So I need to evaluate using the new data, not the data available to me at training time. And I again want to highlight what is great about using this code-first approach to documentation: if I decided I wanted to build something like that dashboard, the first step is to change the document from something I was planning to render as an HTML report into a dashboard. Then I start doing things like laying out rows and so on to determine what my dashboard will look like, but that format change is the first thing that happens, and I'm able to incorporate the model card information I have into a dashboard that can then be presented.
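The format change described here amounts to an edit to the document's YAML front matter; a minimal sketch (the title is illustrative, and Quarto dashboards require Quarto 1.4 or later):

```yaml
---
title: "Attrition model card"
format: dashboard   # was: format: html
---
```

After that, level-two headings such as `## Row` lay out the dashboard's rows, which is the row-by-row layout work the transcript mentions.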
So model monitoring is a separate task from model documentation, but depending on who the users of this information are, you can integrate the two and present them in ways that are really holistic and fluent.
All right. We spent some time talking about model cards today. We talked about when we make model cards, which is at the end of the model development process. We talked about how to get started easily using templates, and about the different sections we would want to include if we use the framework from the 2019 paper. And then we talked about, once it's written, where it goes and who is supposed to read it. I think the big takeaway is that the person who needs to read it is the person who is going to use the model, who is going to use a prediction to make a decision or do something with it. I hope this was helpful, and I'll see you next time.

