
Isabel Zimmerman – Explaining model explainability
Machine learning doesn't have the same objectives as its users. While models look to optimize a function using the given data, humans look to gain insight into their problems. At best, these two objectives align; at worst, machine learning models make the front page of the news for unintended but astonishing bias. Model explainability algorithms allow data scientists to understand not only what the model outcome is, but why it is being made. This talk explains what model explainability is and who should care, and shows participants how and when to use multiple types of explainability algorithms. The session demonstrates the usefulness of a variety of algorithms, but also discusses their limitations. Told from a data scientist's point of view, it presents a use case exposing unintended bias in healthcare data. The audience will learn: the basics of model explainability, why it is a relevant issue, how model explainability offers insight into unintended bias, and how to deploy explainability algorithms in Python with alibi, the open-source library from Seldon. Speaker: Isabel Zimmerman – https://2021.berlinbuzzwords.de/member/isabel-zimmerman More: https://2021.berlinbuzzwords.de/session/explaining-model-explainability
Transcript
This transcript was generated automatically and may contain errors.
I am Isabel Zimmerman, and I will be teaching about model explainability today. I am a software engineer at Red Hat's Artificial Intelligence Center of Excellence, on their AIOps (Artificial Intelligence Operations) team. I'm also a very recent graduate of Florida Polytechnic University's Data Science program, and of course, the most important personal tidbit, what was my pandemic hobby? I've been spending hours as an at-home chemist trying to build the absolute perfect recipe for chocolate chip cookies.
So, in my everyday at work, I get to live the dream and play around with code and break code, and I get to spend a lot of time doing data science work as well. And I really stumbled upon this topic of model explainability actually when I was working as an intern. And it's just a topic that's stuck with me ever since, because people oftentimes think that machine learning answers all of our questions.
But in reality, machine learning doesn't even have the same objectives as its users. We'll dive into this mismatch of objectives between models and humans, and we'll talk about how explainability helps to close this gap. But I also want everyone to be able to walk away from today being able to tell their colleagues about what explainability is and why you should be implementing it. And I also want everyone to gain knowledge of just a few algorithms you can add into your own models and where you can add them into your machine learning workflow.
The explainability elevator pitch
So before we go any further, I think it's time to give you kind of the 30 second elevator pitch for explainability. We know a machine learning model is just some sort of algorithm that uses an input of data, such as a cat picture, to give an output of some insight into that data, such as identifying the photo contents. The output of the model really only answers the question of what. In this example, what is this a photo of? A model will tell you that it's a cat. The output of the model told you the contents of the photo, but not the logic behind making the prediction.
Explainability focuses on the why. Why did this model say this was a cat? Was it the eyes, the ears, the paws, the color? So models look to optimize a mathematical function using the given data, and humans look to gain insight into their problems. At best, these two objectives align, but at worst, machine learning models make the front page news for unintended but really astonishing bias. Explainable machine learning is necessary to close this gap between the machine goal of output and the human goal of understanding.
And we need it when it's not enough to just get the prediction from your model. The model must also justify how it came to this prediction.
Why explainability matters: bias and real-world impact
Admittedly, I'm probably never going to convince you that explainability is important by showing you pictures of cats. In fact, one of the most powerful reasons to use explainability is that it can help uncover bias and improve people's lives. So let's say a lender uses machine learning to determine someone's ability to pay back a loan. If the lender is using a black box model, someone who is denied a loan may not be given a reason why. This unknown reasoning is problematic because one, people will never know how they can change their habits to be approved for a loan. And two, it could be covering up this unintentional but harmful bias.
Where explainability fits in the ML workflow
So let's think about this entire workflow a little bit more carefully. Here's your vanilla data science workflow. We know that data science starts with having a well-defined problem, and then you're going to gather your relevant data. You'll do some feature engineering to make sure you know what's inside of it, and maybe put it into a workable state. You'll do some model tuning, and then we'll validate the model.
And we're going to take a pause at this model validation step. Because in your validation step, you'll be looking at some sort of metric, such as root mean squared error, or accuracy, or maybe precision or recall. And these help you know that your model is performing well. In our example, we'd be making sure that the cat model can robustly tell the difference between cats and dogs or that the people who are getting loans are able to pay them off. After tuning, you'll deploy your model and continue to look at it to ensure that it continues to perform as expected, so that there's no data drift or any weird, strange things happening in the model once it's out in production.
So this seems like a pretty well-rounded end-to-end machine learning workflow. Where does explainability fit in? And in general, explainability happens after a model has been validated. You can see there's really arrows all over. This is a process that's repeated many times. You may have to go back to feature engineering, model training, model tuning, revalidate the model. But you have to start with a model that is tuned and validated in order to have useful explanations.
I want to be very careful here, because I don't want people to think that you can add an explainability algorithm rather than making a thoughtfully engineered model. You may have heard that if you put garbage data into a good model, you just have a garbage model. And the same principles apply here. A bad model will never give actionable explanations. Model explainability does not fix poorly engineered models. It simply explains why decisions are being made.
Interpretable vs. black box models
So we're all data scientists here today, and in this talk we have our data science hat on and we're looking at our data. We can do a visual exploratory data analysis and see that the data is all trending up and to the right. So we've done our EDA, we understand our data, and we can do some pretend feature engineering looking at our data points here. And so now we're at the step where we're deciding which model to use.
And let's start off with the simplest algorithm we can. If you can recall your favorite middle school math teacher, hopefully y equals mx plus b rings a bell. It's a classic. I'm going to take a moment to give a very important reminder: machine learning is, at its core, just math. Sometimes we get so caught up with accuracy or whatever other evaluation metrics that we forget what a model is, and it's just math.
We can look at the options for fitting this model, and our thoughtfully engineered solution is the middle one. So we've trained and tested our algorithm with our data, and our algorithm is now our model, y equals 5x plus 1, and it's best suited for our data set. We know that it's best suited because, on a very basic level, it minimized the average distance between each data point and the line. It was the line of best fit; it goes right through the middle of everything. We did a good job. And this is math we can interpret. We can confidently look at our model and see that this is the best one.
And we can see why it's the best. And we understand what happens to our prediction if we have different inputs. So we know that if we test a data point that's further to the right, we're probably going to be getting a higher prediction. And this is intuitive and it makes sense, especially when we have the graph right in front of us. And it's a great example of a very interpretable model that doesn't need an explainability algorithm for you to understand why it's making decisions.
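As a quick sketch of what "interpretable by construction" means, here is the slide's toy example in scikit-learn; the data is made up to follow y = 5x + 1, and the fitted coefficients are the entire explanation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data drawn (noisily) from y = 5x + 1, like the slide example.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 5 * X[:, 0] + 1 + rng.normal(0, 0.5, size=50)

model = LinearRegression().fit(X, y)

# The fitted model is directly interpretable: the slope and intercept
# tell us exactly how the prediction responds to the input.
print(f"y = {model.coef_[0]:.2f}x + {model.intercept_:.2f}")

# A point further to the right gets a higher prediction, as expected.
assert model.predict([[8.0]]) > model.predict([[2.0]])
```

No explainability algorithm is needed here: reading the two fitted numbers answers the "why" directly.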
Until you realize that models really don't look like y equals 5x plus 1 in production. They might be part of a larger pipeline, or maybe they're just too complex to be understood at a glance. As our data sets grow in size and complexity, so does the math that makes up our model. And when this math is no longer as simple as y equals mx plus b, it can become difficult to interpret and explain how our model is working and why it's making decisions.
And there are many different types of problems and many different types of models that can really benefit from explainability. However, there is one type of model that especially benefits, and that's your black box models. In fact, this is probably the most important time to use explainability algorithms. Black box models are models where you don't explicitly see how the variables of the model are interacting with each other or being used in general. They're fairly commonplace in industry, since they're quite easy to implement and algorithms exist for nearly every type of machine learning problem that needs to be solved.
But they can also be very dangerous, and I'll repeat that again to make sure it sinks in. Black box models can be very dangerous. When you don't know exactly how the model is creating predictions and you're solely relying on evaluation metrics like accuracy to determine if your model is doing well or not, it can be easy to have unintended consequences.
Some common household names for black box models are neural networks, gradient boosting, and ensembles. These models give very high accuracy at the expense of not knowing their inner workings. Beyond your naturally occurring black box models, even some models that are technically interpretable are treated as black boxes. And that's because no data scientist is looking at, you know, a thousand different inputs into a random forest model, even though decision trees are classically labeled as interpretable models.
Our problems and our data are getting more complex, and data scientists have to rely heavily on evaluation metrics such as our accuracy for model building. And we're once again reminded that a model and a human have different goals. Models want to optimize their math problem. Humans want to understand their human problem. For some industries, and especially highly regulated industries such as healthcare or financials, it's not enough to just get the prediction. The model must also justify how it came to the prediction. Explanations can help with this. Machine learning models can only be audited when they can be interpreted. Explanations can help with this, too.
Limits of explainability
So to recap up until now: we know that machine learning models have a different goal than humans, we know that explainability comes after thoughtfully engineered models, and we know that we need explainability because of the increasing complexity of models and the increasing use of black box models. Now, I'm not an explainability salesperson, even though it might sound like it, and I also want to make sure that you all understand that you do not need these every time you create a model.
First of all, not all models need explainability. Some models are simple. We saw this a few slides back with y equals mx plus b. We don't need to add complicated computational effort if it's just not needed. If we have really robust feature engineering, or very in-depth subject matter expertise, or just a model that we can fully explain, it's not worth our time.
There's another limitation in use cases. Not all use cases need explainability. In all honesty, sometimes you just don't care. This is most noticeable in some types of forecasting or times where the output just isn't really important enough to justify the added complexity of these algorithms. Additionally, there's some use cases where the problem is very well studied and feature interactions are just overall really well understood. And so explainability isn't needed there.
Finally, you need to set very healthy boundaries with your explainability algorithms. Explanations help when you need to justify how a model came to a prediction. Explanations can expose bias. Explanations cannot fix biased models. They do not make changes to the underlying model or data. So you need to really remember what explainability can and cannot do for you.
Healthcare application: MRSA prediction
We've gone through our fine print for the limits of explainability, and it's time to look at an application. As a disclaimer, I don't work for a healthcare company, and I really don't claim to be giving sound medical advice. But in my free time, I'm really interested in the world of healthcare and the hospital ecosystem. I think they have really interesting problems that can really benefit from machine learning especially.
So I looked at this openly available sample data from MIMIC, which is de-identified patient data from a real hospital in the United States. I was particularly interested in better understanding healthcare acquired infections. And these are all infections that patients are not admitted with, but they contract from being in the healthcare environment, you know, just being in a hospital room and all the bacteria that's moving around in there. And I'm focusing in on one healthcare acquired infection called MRSA.
This data has a lot of inputs on patient demographics, such as age, gender, and ethnicity, as well as medical information, such as symptoms, diagnosis, and length of stay. I did pretty vanilla feature engineering for this, and I made my model: a support vector classifier. And even though the data set is unbalanced, it's about 90% accurate at predicting MRSA, so we can say that my model is performing fairly well.
And it could be, you know, in theory, ready to be in production ("production" in quotes, because this is a pretend in-production moment). It's not a perfect example of a very big or highly complex data set or model, but this is our pseudo black box: the inputs are very wide, and I can't see exactly how the model is making these decisions. So it's a good contender for my absolute favorite use of explainers, which is the "I'm just curious to see what's happening here" usage. Curiosity can get you quite far.
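To make the caveat about accuracy on unbalanced data concrete, here is a hedged sketch with synthetic stand-in data (the real MIMIC data isn't reproduced here): on a roughly 90/10 class split, a baseline that never predicts MRSA-positive already scores about 90% accuracy, so the headline number alone says little.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for the MRSA data: ~90% negative, ~10% positive,
# with features that carry only a weak class signal.
rng = np.random.default_rng(1)
n = 1000
y = (rng.uniform(size=n) < 0.10).astype(int)      # ~10% MRSA-positive
X = rng.normal(size=(n, 5)) + y[:, None] * 0.3    # weak signal per feature

svc = SVC().fit(X, y)
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)

# On unbalanced data, ~90% accuracy is what "always predict negative" scores,
# so accuracy alone can't tell us whether the model learned anything useful.
print("SVC accuracy:     ", accuracy_score(y, svc.predict(X)))
print("Baseline accuracy:", accuracy_score(y, baseline.predict(X)))
```

This is exactly the trap the explainers below help to catch: a "90% accurate" model can still be leaning on the wrong features.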
Global and local explainability algorithms
My next step in this problem is to look at the contending algorithms for explaining the prediction of these infections. I do have to say that all the examples I use in the next slides are fictitious and completely hypothetical. But I want to look at this model at a few different levels. The first level is at a global scope. This is kind of your 5,000 foot view of your model, understanding how the model itself is making decisions. It can answer questions such as which features are important and what the interactions between these features are.
Next we have our local scope. This level of explainability is all about understanding how the model has made a decision for a particular instance. There are many more local explainers than global explainers, partially because global explainers are computationally complex and therefore time consuming and expensive to run. Global explainers are also scarce because not all instances have the same weights for each feature, so we can't generalize individual explanations to the whole model.
So maybe a clearer way of saying this is in our healthcare frame of mind, not all patients come in with the same symptoms. You wouldn't want to generalize the symptoms of one patient to the entire hospital since you would lose your ability to have a unique and accurate diagnosis. The same idea applies here. You want to diagnose a particular instance and hope that looking at the smaller area will give better results and explanations to other similar data points that you have. Local explainers can answer questions such as, you know, which features factored most heavily into this classification? And how would the prediction change if certain features changed?
So I'll first look at a global explainability algorithm. As a reminder, this is your super high-level overview of how your model is giving predictions. And the first algorithm we're looking at is called Shapley. The goal of Shapley is to attribute the contribution of each feature to the model's prediction. In the very simplest terms, Shapley tells me what inputs are important and how important they are.
So there's some pros and cons to Shapley. The first really important pro is that it's one of the few algorithms that can be used globally. Again, not many algorithms are able to be global. There is a shortcut of sorts for tree-based models, which have fast implementations, but if you're not wandering through a random forest, it's going to take a long time to run. I could see myself using Shapley to ask questions of my model such as: what are the most important features my model is using to predict MRSA? Would it be age, location of admission, or some sort of specific diagnosis?
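For intuition on what Shapley values compute, and why they're expensive, here is a from-scratch sketch that enumerates every feature coalition. This is the bare definition, not the SHAP library's optimized implementation; replacing missing features with their background mean is one common simplifying convention, and the model and numbers are invented for illustration.

```python
from itertools import combinations
from math import factorial
import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values by enumerating every feature coalition.

    'Missing' features are replaced by their background mean (a simplifying
    convention). Cost is exponential in the number of features, which is
    exactly why general-purpose SHAP is slow.
    """
    d = len(x)
    base = background.mean(axis=0)

    def value(coalition):
        z = base.copy()
        z[list(coalition)] = x[list(coalition)]
        return predict(z.reshape(1, -1))[0]

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

# Sanity check on a linear model w.x + b, where feature i's Shapley value
# is known in closed form: w_i * (x_i - mean_i).
w = np.array([2.0, -1.0, 0.5])
predict = lambda X: X @ w + 3.0
background = np.random.default_rng(2).normal(size=(100, 3))
x = np.array([1.0, 2.0, -1.0])

phi = shapley_values(predict, x, background)
expected = w * (x - background.mean(axis=0))
print(phi, expected)  # the two should match closely
```

Doubling the number of features roughly doubles the number of coalitions per feature, which is the cost the tree-based shortcut avoids.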
So if I'm looking for a less computationally expensive explainability algorithm, I might look into more local algorithms. We'll look at a few different algorithms, but we're going to start with, surprise, Shapley values, which can also be used locally. The benefit of this approach is that local versions of this algorithm are a lot quicker. But on the negative side, this is the number one favorite explainer for our evil data scientists. Research has shown that it is possible to hide bias in Shapley with adversarial attacks. This really isn't a big concern for those who are attempting to use this algorithm truthfully, but it could cause mistrust with interpreters of SHAP explanations. You'd always have to look at a Shapley explanation with the possibility that it's fake in the back of your head.
And again, Shapley is still kind of expensive to run; it's still computationally heavy. So for those looking for a less expensive technique, local interpretable model-agnostic explanations, and that is such a mouthful, you understand why they shorten it to LIME, is an algorithm where you assume linearity around a particular instance and create a simpler model. The key intuition is that it's much easier to approximate a black box model with a simple model locally, in the neighborhood of the prediction we want to explain, than to try to approximate the entire model globally.
So it creates a local version of a linear model, because y equals mx plus b is the gift that your middle school teacher just keeps giving. In this case, it's our middle purple line. We might not have any idea of the equation for our red line, but we understand and can easily interpret y equals 5x plus 1. And this is really a great moment for local explainability to be able to interpret this instance. LIME is also model agnostic, so it doesn't matter what your underlying model is. As long as you have a prediction function, LIME can be used.
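The LIME intuition can be sketched from scratch (this is not the lime library itself, just the core idea): sample points around the instance, weight them by proximity, and fit a weighted linear surrogate whose slope is the local explanation. The "black box" function here is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# A curvy function standing in for a black box we pretend we can't inspect.
black_box = lambda X: np.sin(X[:, 0]) + 0.1 * X[:, 0] ** 2

def lime_like(x0, predict, width=0.5, n_samples=500, seed=0):
    """LIME-style sketch: sample around x0, weight samples by closeness,
    and fit a weighted linear model as the local explanation."""
    rng = np.random.default_rng(seed)
    X = x0 + rng.normal(0, width, size=(n_samples, len(x0)))
    weights = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * width ** 2))
    surrogate = LinearRegression()
    surrogate.fit(X, predict(X), sample_weight=weights)
    return surrogate

x0 = np.array([2.0])
surrogate = lime_like(x0, black_box)
# The surrogate's slope approximates the black box's local gradient
# cos(x) + 0.2x at x = 2, which is about cos(2) + 0.4.
print("local slope:", surrogate.coef_[0])
```

Notice the `width` parameter: it is exactly the "how far does this explanation hold" question raised next, since a wider neighborhood blurs the local picture.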
However, there is some danger in using LIME because you don't know how far this explanation holds. We can see that there's endpoints in this image, but the fit is not always as straightforward as this graph may lead you to believe. There is fear that you can be overconfident in your explanations in LIME, and you could have misleading conclusions for unseen but similar instances.
Anchors actually work a lot like LIME. They both proxy local behavior of the model in a linear way. But remember that LIME broke down because we didn't know to what extent the explanation held up. Anchors step up to the plate and incorporate coverage. I like to think of this coverage as going from a one-dimensional line in LIME to a two-dimensional area, which we can see since our small purple line has turned into a more robust rectangle that's implementing coverage. Anchors explain individual predictions of any black box classification model by finding a decision rule (a set of features, or range of features) that anchors the prediction sufficiently.
Anchors' main selling point is that they address the shortcomings of LIME. But the boundaries are still approximate and occasionally don't even exist, in which case anchors become no different from LIME and carry the same pros and cons. In the context of my MRSA exploration, I could see myself using anchors to see what ranges of different features factor into an MRSA-positive prediction. For example, an anchor might tell me that ages 20 to 30 are at a 20% risk of contraction.
So LIME, SHAP, and anchors all look at what features are important and look at interpretable models for a particular data point. But contrastive explanation methods kind of shift gears and focus more on cause and effect. So you can test how your model output can be changed for each instance. Contrastive explanation methods focus on explaining instances in terms of pertinent positives and pertinent negatives. Pertinent positives refer to features that should be minimally and sufficiently present to predict the same class as the original instance. And pertinent negatives identify which features should be minimally and necessarily absent from the instance in order to maintain the original prediction class.
Kind of a mouthful, but humans really think in contrastive explanation methods naturally. If I was trying to get my partner to locate me at a restaurant, I might say that I have a hat on, but I'm not wearing red. If I wanted to have an explanation for this graph, my CEM algorithm would tell me that the star is light purple because Y is greater than 10, but X is less than 12.
While CEM is useful for human understanding, they get less useful when the classes aren't similar. And this is kind of intuitive. You can distinguish clear yet subtle differences between a cat and a dog, but the differences between a cat and the flu are so numerous that it doesn't even seem logical to list them all out. If my CEM output is that all people who contract MRSA are admitted through the emergency room, but did not show signs of respiratory distress, the location would be a pertinent positive and symptoms would be a pertinent negative.
Counterfactual explanation is the minimum possible change required to generate the desired output. So think: if X had not occurred, Y would not have occurred. In our example here, if I wanted to change my light purple classification to dark purple, the star would have had to move down at least that set amount. On the positive side, counterfactuals don't require access to the data or even the model; they only need the predict function to generate outputs. But the cons for counterfactuals are similar to CEMs: oftentimes there are so many ways to change the output that it's no longer useful to list them all. A counterfactual I might uncover is that a patient who was admitted through the emergency room would not have acquired an infection if they had entered through urgent care instead.
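A minimal counterfactual search really does only need the predict function, as noted above. This sketch (with a made-up loan-style model, not any real scoring rule) nudges one feature step by step until the prediction flips, returning the first flip it finds:

```python
import numpy as np

# Hypothetical loan model: approve when income - 2*debt exceeds 10.
predict = lambda x: int(x[0] - 2 * x[1] > 10)

def counterfactual(x, predict, feature, step=0.25, max_steps=1000):
    """Counterfactual sketch: nudge one feature step by step, trying each
    direction, and stop at the first prediction flip found. Only the
    predict function is needed: no model internals, no training data."""
    original = predict(x)
    for direction in (+1, -1):
        z = x.copy()
        for _ in range(max_steps):
            z[feature] += direction * step
            if predict(z) != original:
                return z
    return None  # no flip found within the search budget

x = np.array([20.0, 6.0])       # income 20, debt 6 -> denied (20 - 12 = 8)
cf = counterfactual(x, predict, feature=0)
print("counterfactual:", cf)    # the smallest income raise that flips to approved
```

Searching one feature at a time also illustrates the con above: with many features there are many possible flips, and listing them all stops being useful.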
Demo: running explainers on the MRSA model
And that's just a few of the many possible algorithms, but I'm ready to test a few out. And I have just a small sample here of the many different open source Python libraries that are available. Not all libraries have the functions we went over, so it's good to do some research for what best suits your needs. If you see the link at the bottom, there's a really cool GitHub repo called ethical ML that has a giant list of all these different open source libraries you can use. And today in my demo, I'm going to be using Alibi Explain, since it contains a handful of algorithms that I'm interested in implementing.
So let's hop over to my notebook. To give you a little bit of background on where I am: I'm using something called Operate First, which is an open source, production-grade OpenShift environment. Anyone who wants to can type in odh.operatefirst.cloud, pull up their Jupyter notebooks, and demo everything here. You can clone my GitHub repo and play with my code yourself.
And we can see here that I start out doing my super normal data science stuff. I import my libraries, split into train and test sets, load my model, fit my model, and build predict functions. And I'm going to be doing SHAP to start out with, and it'll be the local version first. I'm going to take a pause here. First of all, it's my call to action: if you want to see whether I'm an evil data scientist or not, you can play with my code yourself. But also, look at this: I'm only using 108 data samples, and it's already warning me about slower run times. SHAP is quite slow.
So once I fit my SHAP explainer, we see it's coming from a black box model. It's going to be doing some classification, and it'll be at a local and global level. There's also some other parameters you're more than welcome to play around with if you desire. And I'm not running this live. Again, this is quite long to sit and watch load. So we're going to jump right into the visualizations.
And this is our local visualization. The way you interpret this force plot is that we start at our base value. If the final output is below the base value, the patient would be MRSA negative; if the final value is above the base value, the patient is MRSA positive. And we can see that here. Each one of these bars is a feature. You can see the features are so small on this side that it doesn't even give a name, but the larger the bar, the more important the feature was. And we see that for the classification of MRSA positive, the most important features are ethnicity, admission location, ethnicity again, and gender.
And when I first ran this model, I kind of had some data science red flags coming up. I had done pretty basic feature engineering and hadn't done anything weird with my model, so it didn't quite make sense why ethnicity and gender were three of the top four features for this model's output. I wanted to make sure this wasn't a fluke, maybe just this one patient. So I wanted to look at a global level.
And this is my SHAP output at a global level. And we can see here that each one of the points is a particular instance or a particular patient. These are all the top features. The higher up on the list, the more important the feature was for creating predictions overall for the model. And we see here that, once again, we have ethnicity, gender, and ethnicity. And this was my accidental and now on purpose call to action to kind of check out your models every now and then. Because this was not what I had expected. If I was in the healthcare world, I might pull in a subject matter expert to see if this was right. As a non-expert, I wouldn't think that ethnicity and gender are the most important factors in receiving an infection. But this is what my model's output was.
So it's a great example for why explainers can expose bias that was completely unintended. My accuracy was incredible. And it looked like this model could be ready to be put into production. But my explainers say, slow down there. Take a pause. And look at what's happening with your data.
Recap and Q&A
So to recap, models are complex. Explainable machine learning is necessary to close the gap between the machine goal of output and the human goal of understanding. And we need explainability when it's not enough to just get the prediction; the model must also justify how it came to the prediction. Explainability helps us understand the why of models.
To recap, black boxes are models where you input data and some magic occurs and you get an output. They're really great for speed running a data science workflow and making really high performing models. But they're less great when you're trying to understand what's happening in the some magic occurs portion of the black box. Explainability algorithms help crack open the black box and peer in to see how decisions are being made.
To recap, we have algorithms that focus on feature importance, such as Shapley, which gets a gold star for being able to aggregate feature importance both locally and globally. We have LIME, which creates a simple and easily understood y equals mx plus b to explain a local instance. And anchors, which do everything LIME does, but better. We also learned about algorithms that take a cause-and-effect view on a model, such as contrastive explanation methods and counterfactuals.
To recap, explainability algorithms are just, they're just really cool. And they can be very insightful when you use them carefully. I hope you're all curious enough to try an explainability algorithm or two on your models. And I'm excited to hear about your new insights and experiences. So thank you all for your time.
Yes, Isabel, thank you for this great talk. I think that was very interesting. A lot of insights, a lot to learn.
Yes, there's someone in the chat. Andrew asks: how do you evaluate an explanation? That is a very good question. There's an incredible book that I've read like three times now called Interpretable Machine Learning, and it's all online; I recommend looking into that. You can implement something called trust scores, which is kind of explainability for explainability. And you bring up a really good point that it is kind of an interesting paradox that we're using black box explainability models to look at black box regular models. There's still a lot to figure out before this is all in production; this is fairly new. So I don't have a strong "we look at accuracy" answer for you, but there are additional steps you can take to look at the robustness of your explanation.

