
Model cards with vetiver for transparent, responsible reporting
Good documentation helps us make sense of software, know when and how to use it, and understand its purpose. The same can be true of documentation or reporting for a deployed model, but it can be hard to know where to start. The paper “Model Cards for Model Reporting” (Mitchell et al. 2019) provides a suggested framework for organizing and presenting the essential facts about a deployed machine learning model, and the vetiver packages for both R and Python provide templates for getting started with your own model card. Julia Silge joined us on Jan 29th to share:

1️⃣ How to get started with your first model card
2️⃣ How a model card fits in with model monitoring
3️⃣ How to use Posit Team to author and publish your model card

Link to paper: https://lnkd.in/eRbYpfEW
GitHub Repo: https://github.com/juliasilge/model-card-workflow-demo
Q&A Recording: https://youtube.com/live/tQsyImn18q4?feature=share

Want to add future workflow demos to your calendar? We host them the last Wednesday of every month. https://evt.to/aoimiohuw
Transcript
This transcript was generated automatically and may contain errors.
Hi, my name is Julia Silge and I'm a data scientist and engineering manager at Posit PBC. Today in this screencast we're going to talk about model cards. Model cards are something you may have heard about in the worlds of data science and machine learning, and today we'll cover what they are, when you make them, how to get started, and the different ways you can practically use them inside of an organization.
The main idea here is that we all know how important good documentation is. If we're going to choose to use software, how good the documentation is often makes a difference in how easy it is to get started, how easy it is to know how to use it, when it is appropriate to use, what its purpose is, and what its normal use cases are. It turns out models work this way too. When we document models, we think about how we talk to people about what a model is like, how we made it, and what use cases are appropriate for it. That can help us be more responsible in our use of machine learning and set ourselves up for success in whatever applications we're using machine learning for.
Model development and creating a vetiver model
So let's get started and dig into this. First I want to talk a little bit about the model development process, because model cards are something you build as part of that process. Let's imagine that I am a data scientist working in the HR department at a company. I am thinking about the employees at our company, and let's say I've got data on attrition: for various employees, did they leave our company or not?
I am going to go through a model development process here and split my data on employee attrition into training and testing. I'm going to save that testing data to use for my model card, and then I'm going to fit my model. For demonstration purposes, picture this as the time we spend actually doing exploratory data analysis, developing a model, and figuring out what the right kind of model to build is. Let's say I've done all that, and now it's time for me to finish and deploy my model. So I'm going to load vetiver and pins, and then I am going to connect to a Posit Connect board as a place to store my pins.
You could also use something like an S3 bucket or a network drive, but for this demonstration I'm going to use a demo Posit Connect server that we have here. I'm going to create a vetiver model, and then my model is done. I have gone through the whole model development process and found my final model, so I am going to store this final model together with its metadata as a pin on Posit Connect.
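In code, that workflow looks roughly like the following sketch. The data frame, column names, and model choice here are illustrative stand-ins rather than the exact demo code (which lives in the linked GitHub repo):

```r
library(tidymodels)
library(vetiver)
library(pins)

# Split the employee data; the held-out test set will be used
# later for the model card's quantitative analysis
set.seed(123)
employee_split <- initial_split(employees, strata = attrition)
employee_train <- training(employee_split)
employee_test  <- testing(employee_split)

# Imagine EDA and model selection happened here; fit the final model
attrition_fit <-
  workflow(
    attrition ~ job_satisfaction + monthly_income + over_time + department,
    logistic_reg()
  ) |>
  fit(data = employee_train)

# Wrap the fitted model plus its metadata, then pin it to Posit Connect;
# vetiver_pin_write() is what prints the model card reminder
v <- vetiver_model(attrition_fit, "attrition-model")
board <- board_connect()   # or board_s3(), board_folder(), ...
vetiver_pin_write(board, v)
```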
Now notice what happens here. It told me that it wrote the pin, and then it says: hey, you know what you should do? You should create a model card for this model now that you have published it. It tells me a little bit about what a model card is, a way for us to do this kind of transparent, responsible reporting, and it says that vetiver has a template as a place to start. So this is when in the process you go about thinking about documentation for your model: when you've decided you are done with model development.
The model card template
Please go along with my story here that this thing I just did in 30 seconds stands in for real model development. Once I'm done with that process, it's time for me to think about a model card, time for me to think about documenting my model. We do have a template for you to get started with, and I can start one here. Let's say I'm just going to write to a little R Markdown file and get the template. It's called "vetiver_model_card", and it is in fact in the vetiver package. And so what we see here is this whole template that popped up for me.
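Opening a fresh copy of that template can also be done straight from the console; this is the documented vetiver approach (the file name is up to you):

```r
# Create a new R Markdown file from the model card template
# that ships with the vetiver package
rmarkdown::draft(
  "model_card.Rmd",
  template = "vetiver_model_card",
  package = "vetiver"
)
```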
Incredible, right? So this is an R Markdown template that can let you get started. If you have never written a model card, this is a great place to start. You can pop open this thing and see all the headings it has in it. This idea comes from a paper published in 2019, with Margaret Mitchell as the main author, called "Model Cards for Model Reporting". The idea is that we're using machine learning models all the time, but we don't have a good framework for how to document them or how to say how to use them. This paper, which I highly recommend that you read, really outlines a framework that you can use to document any trained machine learning model.
These folks wrote this paper in the context of working on fairness, accountability, and transparency in models, but this is a concept that can be used with any kind of model, and in fact is a great way to document whatever kind of model you have.
Walking through a completed model card
So here was a very contrived little template walkthrough. I want to now pull up a model card that I have spent a little bit of time on, because in this demo I didn't want to sit here typing for you. So let's restart: let's say we're totally done with model development and now we're writing this model card. Notice I am working in Quarto now. I wrote this one as Quarto instead of R Markdown because I want to be able to show you a few different ways I might use it as a Quarto document.
So I spent a little bit of time on this model; let me pull it up and run some of this code so that I can print out the model: this classification modeling workflow that I made to predict attrition for employees. Let's walk through the things we have done to put together a model card. The first section is model details, and here we have a mix of things that I have been able to do in an automated way and things that I had to write myself. For example, we can get some automated information off of the model itself with code that I can reuse every single time I make a model card, but then there are pieces here about this particular model that I, as a human being, have to write. Like many of the kinds of reporting and data artifacts we make, part of it is code that is run and part of it is the human thought that goes into what is happening here.
So we've got some details. The next important section is the intended use of the model. Here I outline what someone in this situation might say: the primary use is identifying employees at higher risk of attrition. We also want to say who the users of the model are; in this case, I'm saying this is for us in HR to use. And then I say some things that are out of scope: in this case, we're not going to provide these scores to individual managers to say, hey, this person needs a raise because they might be at risk for attrition.
So we've got the intended use case. The next section is important factors: what are the important things to know about this model? This model has to do with human beings, so there are some very important factors here, but so many of our models are like this, right? So we highlight how this relates to demographics, and we say that when we evaluated the model, we examined aspects like how the model performs across different departments or different demographic characteristics.
Now we go into the metrics: how are we evaluating how this model is doing? I outline two metrics that I've decided to use in this model card. One of them is accuracy and one of them is mean log loss. Notice that I'm specific about how I computed them, so that when other people come to look at this documentation, they're clear on what I did. And I said I chose those metrics because the first one, accuracy, helps us understand what proportion of our predictions are correct, and the second one, mean log loss, helps us understand how close our predictions are to the true values.
Then we talk about what data we're using. We are specific about the training data set: how many examples are in it, and where we got it. We're specific about the shape of the data that goes into training this model: in this case, job satisfaction is a factor, monthly income is an integer value, and then we have more factors for whether someone works overtime and what department they're in. We're also specific about the evaluation data set: what data are we using to evaluate the model? In this case, it's a random sample held out from an original data set of employees; we split our data into training and testing.
Typically when you're writing a model card (remember, a model card is work that you do at the end of model development), the data you have at that time is the pool you divided into training and testing, and that testing data is what we use to evaluate our model.
Quantitative analysis and disaggregated metrics
So now it's time for us to do some quantitative analysis. I get that testing data back, and I am going to make predictions with it. Remember, this is not the data that the model was trained on; it was not used for model estimation, but it is data that I had at the time of model training. We often call this the holdout set or the test set. This is what I'm using for the model card. I don't want to use the training set to evaluate the model in the model card because, just like if I were using it to try to choose a model, that would give us overly optimistic answers.
One thing the model card paper really makes a point of is that it's very common to report overall model performance. So here are the metrics I'm going to use, and I'm computing them here. We see the accuracy is roughly 0.8 and the mean log loss is roughly 0.4. Accuracy is a metric that's better when it's high, so closer to one is better; log loss is a metric that's better when it's low, so closer to zero is better. It's very common to look at overall model performance, but something the paper points out that I think is super valuable is that it is also important to report metrics disaggregated by something that matters.
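Computed with yardstick, the overall metrics described here look something like this sketch; `attrition_fit`, `employee_test`, and the column names are illustrative placeholders for the objects from model development:

```r
library(tidymodels)  # loads yardstick, which provides the metrics

# Predict on the held-out test set with the fitted workflow;
# augment() adds .pred_class and class probability columns
preds <- augment(attrition_fit, employee_test)

# Accuracy (closer to 1 is better) and mean log loss (closer to 0 is better);
# check that yardstick's event_level matches your factor's level order
attrition_metrics <- metric_set(accuracy, mn_log_loss)
attrition_metrics(preds, truth = attrition, estimate = .pred_class, .pred_Yes)
```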
For example, I can look at differences in these metrics across gender categories. I have women and men here, and these look pretty close to me; they are not very different. But it's important that I look at the disaggregated metrics so that I can know whether my model performs better for one category of, in this case, person versus another. This is an important thing to have when we document our models.
So that is by gender. I can also look here by education, and here you'll notice something. Let's look at accuracy: accuracy changes across this factor, from less than college, some college, bachelor's degree, master's degree, to doctorate. We see a shift in accuracy, and we see the same thing when it comes to log loss. We're able to do a much better job predicting whether someone is on their way out of the job for people with more education, whereas for people with less education, we're doing a worse job of predicting attrition.
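Because yardstick metric sets respect `group_by()`, disaggregating those same metrics is a small change to the same sketch (names again illustrative):

```r
library(tidymodels)

# Test-set predictions from the fitted workflow, as before
preds <- augment(attrition_fit, employee_test)
attrition_metrics <- metric_set(accuracy, mn_log_loss)

# The same metrics, disaggregated by education level:
# one accuracy and one mean log loss per group
preds |>
  group_by(education) |>
  attrition_metrics(truth = attrition, estimate = .pred_class, .pred_Yes)
```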
This highlights how important it is to report model performance disaggregated by whatever we know may be important. Remember that these were not the inputs to the model; the inputs to the model were job satisfaction, monthly income, overtime, and department. And yet we see these kinds of shifts: we don't see a difference across one demographic, and we do see differences across another. When you document your model, this is exactly the kind of thing we need to be clear about so that people can know about it and use the model in the best way possible.
A model card is also a great place to visualize model performance. Here, I can make an ROC curve across the different departments in my imaginary company, where I'm imagining that I'm a data scientist. Looking at this plot, maybe we do a bit better predicting accurately for research than for sales; there's perhaps a moderate difference there. And it looks like the number of people in human resources is much smaller, there are fewer of them, so we have less confidence there as well. That's good, important information to see.
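A per-department ROC curve like the one described can be sketched with yardstick and ggplot2; `attrition_fit`, `employee_test`, and `.pred_Yes` are illustrative names:

```r
library(tidymodels)

# One ROC curve per department; .pred_Yes is the predicted
# probability of attrition on the held-out test set
augment(attrition_fit, employee_test) |>
  group_by(department) |>
  roc_curve(truth = attrition, .pred_Yes) |>
  autoplot()
```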
Ethical considerations and caveats
Okay. So that was the quantitative analysis section, which is a super important part of our documentation for our model. Then we get to these last couple of sections. First we have ethical considerations. Notice what I said here: we considered how this model may exacerbate existing inequality in our company by performing better or worse for different groups. Then I outline where I don't think there are significant differences, and where I do think there are, like the differences I see here in education. That is important for the ethical considerations. Not all models have as clear an ethical implication as this one, but I'm really thankful for tools and frameworks like this model card that help me identify when a model does, and help me do a good job of documenting it in the best way possible.
The last section that the model card framework recommends is caveats and recommendations. I imagined what I might recommend in a situation like this, along with some caveats: for example, we can't account for individual variation. I have something here saying that if leaders in HR are the users of this model, they have to know that the scores are not equally accurate for all employees. And the specific caveat I have here is that the model is less useful in predicting attrition for employees with less education than for those with more. We can't count on this model to give us the same quality of results across these different groups.
Publishing and sharing the model card
Okay. So that is the model card itself. The next little category of things I want to talk about is: what do I do now? Let's say I spent a lot of time developing my model, and then I sat down and spent some time documenting it, and I have this great document that I have made. What do I do with it? Where does it go? What are some good practices around it? I am going to walk through a couple of very concrete things that you can do.
So this is a Quarto file, and if I come here and decide to render it, I've got some options. First, I'm going to render it to HTML, because that will let me look at it here in RStudio and show you what it looks like. And what we've got here is the rendered HTML on this side. You can see we've got our nice links and our output; depending on who the audience for this document was, I might make these into some nicer tables. We can click back and forth here and see gender, education, department, and the rest. So here is some rendered HTML.
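Outside of the RStudio Render button, the same render can be done from code with the quarto R package (it requires the Quarto CLI to be installed; the file name is illustrative):

```r
# Render the Quarto model card to HTML
quarto::quarto_render("model_card.qmd", output_format = "html")
```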
Now that this is rendered, it can go pretty much anywhere HTML can go, which is a lot of places. Let me show you one place I might publish it: I might take the HTML and deploy it to Posit Connect. Let me click this; it is going to upload the bundle, and if I click view, here it is. I just published this to Posit Connect, which is a great place for me to share this with people I work with. One thing that is great about publishing to Connect is that Connect is probably also where the binary artifact for the model is, so you can use things like dashboards, tags, and the organization that exists on Connect to tie those things together.
So Posit Connect can be a great place to publish HTML if the users of the model can come to Connect and see your published work. That's option number one: we render to HTML, which is what this looks like over here, and then we publish it; we could of course use something like GitHub Pages, but maybe more likely something like Posit Connect.
Now, what other options do I have? I'm over here in the terminal. What happened before is that this thing rendered to HTML, but I can also render to lots of other kinds of output. For example, I can render to plain Markdown and then pop that open. The thing I did with the little tab set does not actually work so well in Markdown, so I would maybe want to edit this file a little if my main target were Markdown. But what is great about Markdown, and something I have done, is that I can now check it into the repo where I have my model training code. In fact, it makes a pretty good README.
So you can take that Markdown and call it the README, and if this project were a GitHub repo where I stored my information about this model, it would make a fantastic README for that repo. When would I choose one or the other? I think I would choose HTML and Posit Connect when the users of the model are business stakeholders, and I would choose treating it like a README when the main users of the model are maybe software engineer colleagues who need to know what the model is like, how to use it, and so on. One thing I have found helpful is to think about making the data artifact really match the user who needs to read it: where are they going, and what do they need to know and find?
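Rendering to GitHub-flavored Markdown for use as a README might look like this sketch (the file names are illustrative):

```r
# Render the model card to GitHub-flavored Markdown
quarto::quarto_render("model_card.qmd", output_format = "gfm")

# Optionally rename the result so the repo displays it as its README
file.rename("model_card.md", "README.md")
```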
Okay, we've got other options. If we come back here, I can render the document to a PDF, and these actually look quite nice these days. Rendering to PDF lets me do something like upload it to Google Drive or to Teams or wherever, so that I have an artifact that business stakeholders, or whoever needs to know what the model is like, can go in and get. The other thing I've actually done is a fair amount of rendering to Word and then uploading the Word document to Google Drive. That also gives me a nice workflow where I can get something into Google Drive that people can comment on, but that I myself have made with Quarto. I'll even say that we can publish straight to Confluence using Quarto, which is another really great thing to be aware of.
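The other output targets mentioned here are just different `output_format` values passed to the same render call (PDF requires a LaTeX installation such as TinyTeX; file name illustrative):

```r
# A PDF to upload to Google Drive, Teams, etc.
quarto::quarto_render("model_card.qmd", output_format = "pdf")

# A Word document that collaborators can comment on in Google Drive
quarto::quarto_render("model_card.qmd", output_format = "docx")
```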
By using this code-first approach to documentation, you get a reusable thing that you can put wherever makes the most sense for the users, the readers, the audience for this document; you can put it where they need to go.
Integrating model cards into model monitoring
The next thing I want to talk about is how this fits into model monitoring. I'm going to show you an example of a model monitoring dashboard that incorporates model card information. This is for a vetiver model that was trained with Python, which my colleague Isabel Zimmerman made. You can see we've got a front page that has model card information, and then it also has model monitoring information: how is the model performing over time? A model monitoring dashboard can also give us information that helps us identify what is most likely to be misclassified.
I want to highlight a few things. The model card typically holds information that we wrote and created at the time of model development. It's very useful to integrate it into a model monitoring dashboard, but a monitoring dashboard is typically made so that we can monitor new data coming in over time. So I think the really important thing to consider is what data is being used. If we pop back over here and look at this, the data going into the model card is data that I had available when I trained my model, and so it is available for me to evaluate the model, choose a model, and do all the other things that are part of model development.
When it comes to model monitoring, my model is in production and new data is coming in; I'm making predictions, and then I want to ask how my model is doing over time. So I need to evaluate using the new data, not the data available to me at training time. And I again want to highlight what is great about using this code-first approach to documentation: if I decided I wanted to build something like that dashboard, the first step is to change the document from something I was planning to render as an HTML report into a dashboard. Then I start doing things like laying out rows and so on to determine what my dashboard will look like, but that format change is the first thing that happens, and I'm able to incorporate the model card information I have into a dashboard that can then be presented.
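The format change described here amounts to an edit to the document's YAML front matter; a minimal sketch (the title is illustrative, and Quarto dashboards require Quarto 1.4 or later):

```yaml
---
title: "Attrition model card"
format: dashboard   # was: format: html
---
```

After that, level-two headings such as `## Row` lay out the dashboard's rows, which is the row-by-row layout work the transcript mentions.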
So model monitoring is a separate task from model documentation, but depending on who the users of this information are, you can integrate the two and present them in ways that are really holistic and fluent.
All right. We spent some time talking about model cards today. We talked about when we make model cards, which is at the end of the model development process. We talked about how to get started easily using templates, and about the different sections we would want to include if we use the framework from the 2019 paper. And then we talked about, once it's written, where it goes and who is supposed to read it. I think the big takeaway is that the person who needs to read it is the person who is going to use the model, who is going to use a prediction to make a decision or do something with it. I hope this was helpful, and I'll see you next time.

