
Isabel Zimmerman – Explaining model explainability
Machine learning doesn't have the same objectives as its users. While models look to optimize a function using the given data, humans look to gain insight into their problems. At best, these two objectives align; at worst, machine learning models make the front page of the news for unintended but astonishing bias. Model explainability algorithms allow data scientists to understand not only what the model outcome is, but why it is being made. This talk explains what model explainability is and who should care, and shows participants how and when to use multiple types of explainability algorithms. The session demonstrates the usefulness of a variety of algorithms, but also discusses their limitations. Told from a data scientist's point of view, it presents a use case exposing unintended bias in healthcare data. The audience will learn: the basics of model explainability, why it is a relevant issue, how model explainability offers insight into unintended bias, and how to deploy explainability algorithms in Python with alibi, the open-source library from Seldon. Speaker: Isabel Zimmerman – https://2021.berlinbuzzwords.de/member/isabel-zimmerman More: https://2021.berlinbuzzwords.de/session/explaining-model-explainability
Transcript
This transcript was generated automatically and may contain errors.
I am Isabel Zimmerman, and I will be teaching about model explainability today. I am a software engineer at Red Hat's Artificial Intelligence Center of Excellence, on their AIOps (Artificial Intelligence Operations) team. I'm also a very recent graduate of Florida Polytechnic University's Data Science program, and of course, the most important personal tidbit, what was my pandemic hobby? I've been spending hours as an at-home chemist trying to build the absolute perfect recipe for chocolate chip cookies.
So, in my everyday at work, I get to live the dream and play around with code and break code, and I get to spend a lot of time doing data science work as well. And I really stumbled upon this topic of model explainability actually when I was working as an intern. And it's just a topic that's stuck with me ever since, because people oftentimes think that machine learning answers all of our questions.
But in reality, machine learning doesn't even have the same objectives as its users. We'll dive into this mismatch of objectives between models and humans, and we'll talk about how explainability helps to close this gap. But I also want everyone to be able to walk away from today being able to tell their colleagues about what explainability is and why you should be implementing it. And I also want everyone to gain knowledge of just a few algorithms you can add into your own models and where you can add them into your machine learning workflow.
The explainability elevator pitch
So before we go any further, I think it's time to give you kind of the 30 second elevator pitch for explainability. We know a machine learning model is just some sort of algorithm that uses an input of data, such as a cat picture, to give an output of some insight into that data, such as identifying the photo contents. The output of the model really only answers the question of what. In this example, what is this a photo of? A model will tell you that it's a cat. The output of the model told you the contents of the photo, but not the logic behind making the prediction.
Explainability focuses on the why. Why did this model say this was a cat? Was it the eyes, the ears, the paws, the color? So models look to optimize a mathematical function using the given data, and humans look to gain insight into their problems. At best, these two objectives align, but at worst, machine learning models make the front page news for unintended but really astonishing bias. Explainable machine learning is necessary to close this gap between the machine goal of output and the human goal of understanding.
And we need it when it's not enough to just get the prediction from your model. The model must also justify how it came to this prediction.
Why explainability matters: bias and real-world impact
Admittedly, I'm probably never going to convince you that explainability is important by showing you pictures of cats. In fact, one of the most powerful reasons to use explainability is that it can help uncover bias and improve people's lives. So let's say a lender uses machine learning to determine someone's ability to pay back a loan. If the lender is using a black box model, someone who is denied a loan may not be given a reason why. This unknown reasoning is problematic because one, people will never know how they can change their habits to be approved for a loan. And two, it could be covering up this unintentional but harmful bias.
Where explainability fits in the ML workflow
So let's think about this entire workflow a little bit more carefully. Here's your vanilla data science workflow. We know that data science starts with having a well-defined problem, and then you're going to gather your relevant data. You'll do some feature engineering to make sure you know what's inside of it, and maybe put it into a workable state. You'll do some model tuning, and then we'll validate the model.
And we're going to take a pause at this model validation step. Because in your validation step, you'll be looking at some sort of metric, such as root mean squared error, or accuracy, or maybe precision or recall. And these help you know that your model is performing well. In our example, we'd be making sure that the cat model can robustly tell the difference between cats and dogs or that the people who are getting loans are able to pay them off. After tuning, you'll deploy your model and continue to look at it to ensure that it continues to perform as expected, so that there's no data drift or any weird, strange things happening in the model once it's out in production.
So this seems like a pretty well-rounded end-to-end machine learning workflow. Where does explainability fit in? And in general, explainability happens after a model has been validated. You can see there's really arrows all over. This is a process that's repeated many times. You may have to go back to feature engineering, model training, model tuning, revalidate the model. But you have to start with a model that is tuned and validated in order to have useful explanations.
I want to be very careful here, because I don't want people to think that you can add an explainability algorithm rather than making a thoughtfully engineered model. You may have heard that if you put garbage data into a good model, you just have a garbage model. And the same principles apply here. A bad model will never give actionable explanations. Model explainability does not fix poorly engineered models. It simply explains why decisions are being made.
Interpretable vs. black box models
So we're all data scientists here today, and in this talk we have our data science hat on and we're looking at our data. We can do a visual exploratory data analysis and see that the data is all trending up and to the right. So we've done our EDA, we understand our data, and we can do some pretend feature engineering looking at our data points here. And so now we're at the step where we're deciding which model to use.
And let's start off with the simplest algorithm we can. If you can recall your favorite middle school math teacher, hopefully y equals mx plus b rings a bell. It's a classic. I'm going to take a moment to give a very important reminder: machine learning is, at its core, just math. Sometimes we get so caught up with accuracy or whatever other evaluation metrics that we forget what a model is, and it's just math.
We can look at the options for fitting this model, and our thoughtfully engineered solution is the middle one. So we've trained and tested our algorithm with our data, and our algorithm is now our model, y equals 5x plus 1, and it's best suited for our data set. We know that it's best suited because, on a very basic level, it minimized the average distance between each data point and the line. It was the line of best fit; it goes right through the middle of everything. We did a good job. And this is math we can interpret. We can confidently look at our model and see that this is the best one.
And we can see why it's the best. And we understand what happens to our prediction if we have different inputs. So we know that if we test a data point that's further to the right, we're probably going to be getting a higher prediction. And this is intuitive and it makes sense, especially when we have the graph right in front of us. And it's a great example of a very interpretable model that doesn't need an explainability algorithm for you to understand why it's making decisions.
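As a quick sketch of what "interpretable by construction" means, here is the slide's toy example in scikit-learn; the data is made up to follow y = 5x + 1, and the fitted coefficients are the entire explanation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data drawn (noisily) from y = 5x + 1, like the slide example.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 5 * X[:, 0] + 1 + rng.normal(0, 0.5, size=50)

model = LinearRegression().fit(X, y)

# The fitted model is directly interpretable: the slope and intercept
# tell us exactly how the prediction responds to the input.
print(f"y = {model.coef_[0]:.2f}x + {model.intercept_:.2f}")

# A point further to the right gets a higher prediction, as expected.
assert model.predict([[8.0]]) > model.predict([[2.0]])
```

No explainability algorithm is needed here: reading the two fitted numbers answers the "why" directly.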
Until you realize that models really don't look like y equals 5x plus 1 in production. They might be part of a larger pipeline, or maybe they're just too complex to be understood at a glance. As our data sets grow in size and complexity, so does the math that makes up our model. And when this math is no longer as simple as y equals mx plus b, it can become difficult to interpret and explain how our model is working and why it's making decisions.
And there are many different types of problems and many different types of models that can really benefit from explainability. However, there is one type of model that especially benefits, and that's your black box models. In fact, this is probably the most important time to use explainability algorithms. Black box models are models where you don't explicitly see how the variables of the model are interacting with each other or being used in general. They're fairly commonplace in industry, since they're quite easy to implement and algorithms exist for nearly every type of machine learning problem that needs to be solved.
But they can also be very dangerous, and I'll repeat that again to make sure it sinks in. Black box models can be very dangerous. When you don't know exactly how the model is creating predictions and you're solely relying on evaluation metrics like accuracy to determine if your model is doing well or not, it can be easy to have unintended consequences.
Some common household names for black box models are neural networks, gradient boosting, and ensembles. These models give very high accuracy at the expense of not knowing their inner workings. Beyond your naturally occurring black box models, even some models that are technically interpretable are treated as black boxes. And that's because no data scientist is looking at, you know, a thousand different inputs into a random forest model, even though decision trees are classically labeled as interpretable models.
Our problems and our data are getting more complex, and data scientists have to rely heavily on evaluation metrics such as our accuracy for model building. And we're once again reminded that a model and a human have different goals. Models want to optimize their math problem. Humans want to understand their human problem. For some industries, and especially highly regulated industries such as healthcare or financials, it's not enough to just get the prediction. The model must also justify how it came to the prediction. Explanations can help with this. Machine learning models can only be audited when they can be interpreted. Explanations can help with this, too.
Limits of explainability
So to recap up until now: we know that machine learning models have a different goal than humans, we know that explainability comes after thoughtfully engineered models, and we know that we need explainability because of the increasing complexity of models and the increasing use of black box models. Now, I'm not an explainability salesperson, even though it might sound like it, and I also want to make sure that you all understand that you do not need these every time you create a model.
First of all, not all models need explainability. Some models are simple. We saw this a few slides back with y equals mx plus b. We don't need to add complicated computational effort if it's just not needed. If we have really robust feature engineering, or very in-depth subject matter expertise, or just a model that we can fully explain, it's not worth our time.
There's another limitation in use cases. Not all use cases need explainability. In all honesty, sometimes you just don't care. This is most noticeable in some types of forecasting or times where the output just isn't really important enough to justify the added complexity of these algorithms. Additionally, there's some use cases where the problem is very well studied and feature interactions are just overall really well understood. And so explainability isn't needed there.
Finally, you need to set very healthy boundaries with your explainability algorithms. Explanations help when you need to justify how a model came to a prediction. Explanations can expose bias. Explanations cannot fix biased models. They do not make changes to the underlying model or data. So you need to really remember what explainability can and cannot do for you.
Healthcare application: MRSA prediction
We've gone through our fine print for the limits of explainability, and it's time to look at an application. As a disclaimer, I don't work for a healthcare company, and I really don't claim to be giving sound medical advice. But in my free time, I'm really interested in the world of healthcare and the hospital ecosystem. I think they have really interesting problems that can really benefit from machine learning especially.
So I looked at this openly available sample data from MIMIC, which is de-identified patient data from a real hospital in the United States. I was particularly interested in better understanding healthcare acquired infections. And these are all infections that patients are not admitted with, but they contract from being in the healthcare environment, you know, just being in a hospital room and all the bacteria that's moving around in there. And I'm focusing in on one healthcare acquired infection called MRSA.
This data has a lot of inputs on patient demographics, such as age, gender, and ethnicity, as well as medical information, such as symptoms, diagnosis, and length of stay. I did pretty vanilla feature engineering for this, and I made my model: a support vector classifier. And even though the data set is unbalanced, it's about 90% accurate at predicting MRSA, so we can say that my model is performing fairly well.
And it could be, you know, in theory, ready to be in production ("production" in quotes, because this is a pretend in-production moment). It's not a perfect example of a very big or highly complex data set or model, but this is our pseudo black box: the inputs are very wide, and I can't see exactly how the model is making these decisions. So it's a good contender for my absolute favorite use of explainers, which is the "I'm just curious to see what's happening here" usage. Curiosity can get you quite far.
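To make the caveat about accuracy on unbalanced data concrete, here is a hedged sketch with synthetic stand-in data (the real MIMIC data isn't reproduced here): on a roughly 90/10 class split, a baseline that never predicts MRSA-positive already scores about 90% accuracy, so the headline number alone says little.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for the MRSA data: ~90% negative, ~10% positive,
# with features that carry only a weak class signal.
rng = np.random.default_rng(1)
n = 1000
y = (rng.uniform(size=n) < 0.10).astype(int)      # ~10% MRSA-positive
X = rng.normal(size=(n, 5)) + y[:, None] * 0.3    # weak signal per feature

svc = SVC().fit(X, y)
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)

# On unbalanced data, ~90% accuracy is what "always predict negative" scores,
# so accuracy alone can't tell us whether the model learned anything useful.
print("SVC accuracy:     ", accuracy_score(y, svc.predict(X)))
print("Baseline accuracy:", accuracy_score(y, baseline.predict(X)))
```

This is exactly the trap the explainers below help to catch: a "90% accurate" model can still be leaning on the wrong features.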
Global and local explainability algorithms
My next step in this problem is to look at the contending algorithms for explaining the prediction of these infections. I do have to say that all the examples I use in the next slides are fictitious and completely hypothetical. But I want to look at this model at a few different levels. The first level is at a global scope. This is kind of your 5,000 foot view of your model, understanding how the model itself is making decisions. It can answer questions such as which features are important and what the interactions between these features are.
Next we have our local scope. This level of explainability is all about understanding how the model has made a decision for a particular instance. There are many more local explainers than global explainers, partially because global explainers are computationally complex and therefore time consuming and expensive to run. Global explainers are also scarce because not all instances have the same weights for each feature, so we can't generalize individual explanations to the whole model.
So maybe a clearer way of saying this is in our healthcare frame of mind, not all patients come in with the same symptoms. You wouldn't want to generalize the symptoms of one patient to the entire hospital since you would lose your ability to have a unique and accurate diagnosis. The same idea applies here. You want to diagnose a particular instance and hope that looking at the smaller area will give better results and explanations to other similar data points that you have. Local explainers can answer questions such as, you know, which features factored most heavily into this classification? And how would the prediction change if certain features changed?
So I'll first look at a global explainability algorithm. As a reminder, this is your super high-level overview of how your model is giving predictions. And the first algorithm we're looking at is called Shapley. The goal of Shapley is to attribute the contribution of each feature to the model's prediction. In the very simplest terms, Shapley tells me what inputs are important and how important they are.
So there's some pros and cons to Shapley. The first really important pro is that it's one of the few algorithms that can be used globally. Again, not many algorithms are able to be global. There is a shortcut of sorts for tree-based models, which have fast implementations, but if you're not wandering through a random forest, it's going to take a long time to run. I could see myself using Shapley to ask questions of my model such as: what are the most important features my model is using to predict MRSA? Would it be age, location of admission, or some sort of specific diagnosis?
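For intuition on what Shapley values compute, and why they're expensive, here is a from-scratch sketch that enumerates every feature coalition. This is the bare definition, not the SHAP library's optimized implementation; replacing missing features with their background mean is one common simplifying convention, and the model and numbers are invented for illustration.

```python
from itertools import combinations
from math import factorial
import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values by enumerating every feature coalition.

    'Missing' features are replaced by their background mean (a simplifying
    convention). Cost is exponential in the number of features, which is
    exactly why general-purpose SHAP is slow.
    """
    d = len(x)
    base = background.mean(axis=0)

    def value(coalition):
        z = base.copy()
        z[list(coalition)] = x[list(coalition)]
        return predict(z.reshape(1, -1))[0]

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

# Sanity check on a linear model w.x + b, where feature i's Shapley value
# is known in closed form: w_i * (x_i - mean_i).
w = np.array([2.0, -1.0, 0.5])
predict = lambda X: X @ w + 3.0
background = np.random.default_rng(2).normal(size=(100, 3))
x = np.array([1.0, 2.0, -1.0])

phi = shapley_values(predict, x, background)
expected = w * (x - background.mean(axis=0))
print(phi, expected)  # the two should match closely
```

Doubling the number of features roughly doubles the number of coalitions per feature, which is the cost the tree-based shortcut avoids.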
So if I'm looking for a less computationally expensive explainability algorithm, I might look into more local algorithms. We'll look at a few different algorithms, but we're going to start with, surprise, Shapley values, which can also be used locally. The benefit of this approach is that local versions of this algorithm are a lot quicker. But on the negative side, this is the number one favorite explainer for our evil data scientists. Research has shown that it is possible to hide bias in Shapley with adversarial attacks. This really isn't a big concern for those who are attempting to use this algorithm truthfully, but it could cause mistrust with interpreters of SHAP explanations. You'd always have to look at a Shapley explanation with the possibility that it's fake in the back of your head.
And again, Shapley is still kind of expensive to run; it's still computationally heavy. So for those looking for a less expensive technique, local interpretable model-agnostic explanations, and that is such a mouthful, you understand why they shorten it to LIME, is an algorithm where you assume linearity around a particular instance and create a simpler model. The key intuition is that it's much easier to approximate a black box model with a simple model locally, in the neighborhood of the prediction we want to explain, than to try to approximate the entire model globally.
So it creates a local version of a linear model, because y equals mx plus b is the gift that your middle school teacher just keeps giving. In this case, it's our middle purple line. We might not have any idea of the equation for our red line, but we understand and can easily interpret y equals 5x plus 1. And this is really a great moment for local explainability to be able to interpret this instance. LIME is also model agnostic, so it doesn't matter what your underlying model is. As long as you have a prediction function, LIME can be used.
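The LIME intuition can be sketched from scratch (this is not the lime library itself, just the core idea): sample points around the instance, weight them by proximity, and fit a weighted linear surrogate whose slope is the local explanation. The "black box" function here is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# A curvy function standing in for a black box we pretend we can't inspect.
black_box = lambda X: np.sin(X[:, 0]) + 0.1 * X[:, 0] ** 2

def lime_like(x0, predict, width=0.5, n_samples=500, seed=0):
    """LIME-style sketch: sample around x0, weight samples by closeness,
    and fit a weighted linear model as the local explanation."""
    rng = np.random.default_rng(seed)
    X = x0 + rng.normal(0, width, size=(n_samples, len(x0)))
    weights = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * width ** 2))
    surrogate = LinearRegression()
    surrogate.fit(X, predict(X), sample_weight=weights)
    return surrogate

x0 = np.array([2.0])
surrogate = lime_like(x0, black_box)
# The surrogate's slope approximates the black box's local gradient
# cos(x) + 0.2x at x = 2, which is about cos(2) + 0.4.
print("local slope:", surrogate.coef_[0])
```

Notice the `width` parameter: it is exactly the "how far does this explanation hold" question raised next, since a wider neighborhood blurs the local picture.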
However, there is some danger in using LIME because you don't know how far this explanation holds. We can see that there's endpoints in this image, but the fit is not always as straightforward as this graph may lead you to believe. There is fear that you can be overconfident in your explanations in LIME, and you could have misleading conclusions for unseen but similar instances.
Anchors actually work a lot like LIME. They both proxy local behavior of the model in a linear way. But remember that LIME broke down because we didn't know to what extent the explanation held up. Anchors step up to the plate and incorporate coverage. I like to think of this coverage as going from a one-dimensional line in LIME to a two-dimensional area, which we can see since our small purple line has turned into a more robust rectangle that's implementing coverage. Anchors explain individual predictions of any black box classification model by finding a decision rule (a set of features, or range of features) that anchors the prediction sufficiently.
Anchors' main selling point is that they address the shortcomings of LIME. But the boundaries are still approximate and occasionally don't even exist, in which case anchors become no different from LIME and carry the same pros and cons. In the context of my MRSA exploration, I could see myself using anchors to see what ranges of different features factor into an MRSA-positive prediction. For example, an anchor might tell me that ages 20 to 30 are at a 20% risk of contraction.
So LIME, SHAP, and anchors all look at what features are important and look at interpretable models for a particular data point. But contrastive explanation methods kind of shift gears and focus more on cause and effect. So you can test how your model output can be changed for each instance. Contrastive explanation methods focus on explaining instances in terms of pertinent positives and pertinent negatives. Pertinent positives refer to features that should be minimally and sufficiently present to predict the same class as the original instance. And pertinent negatives identify which features should be minimally and necessarily absent from the instance in order to maintain the original prediction class.
Kind of a mouthful, but humans really think in contrastive explanation methods naturally. If I was trying to get my partner to locate me at a restaurant, I might say that I have a hat on, but I'm not wearing red. If I wanted to have an explanation for this graph, my CEM algorithm would tell me that the star is light purple because Y is greater than 10, but X is less than 12.
While CEM is useful for human understanding, they get less useful when the classes aren't similar. And this is kind of intuitive. You can distinguish clear yet subtle differences between a cat and a dog, but the differences between a cat and the flu are so numerous that it doesn't even seem logical to list them all out. If my CEM output is that all people who contract MRSA are admitted through the emergency room, but did not show signs of respiratory distress, the location would be a pertinent positive and symptoms would be a pertinent negative.
Counterfactual explanation is the minimum possible change required to generate the desired output. So think: if X had not occurred, Y would not have occurred. In our example here, if I wanted to change my light purple classification to dark purple, the star would have had to move down at least that set amount. On the positive side, counterfactuals don't require access to the data or even the model; they only need the predict function to generate outputs. But the cons for counterfactuals are similar to CEMs: oftentimes there are so many ways to change the output that it's no longer useful to list them all. A counterfactual I might uncover is that a patient who was admitted through the emergency room would not have acquired an infection if they had entered through urgent care instead.
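A minimal counterfactual search really does only need the predict function, as noted above. This sketch (with a made-up loan-style model, not any real scoring rule) nudges one feature step by step until the prediction flips, returning the first flip it finds:

```python
import numpy as np

# Hypothetical loan model: approve when income - 2*debt exceeds 10.
predict = lambda x: int(x[0] - 2 * x[1] > 10)

def counterfactual(x, predict, feature, step=0.25, max_steps=1000):
    """Counterfactual sketch: nudge one feature step by step, trying each
    direction, and stop at the first prediction flip found. Only the
    predict function is needed: no model internals, no training data."""
    original = predict(x)
    for direction in (+1, -1):
        z = x.copy()
        for _ in range(max_steps):
            z[feature] += direction * step
            if predict(z) != original:
                return z
    return None  # no flip found within the search budget

x = np.array([20.0, 6.0])       # income 20, debt 6 -> denied (20 - 12 = 8)
cf = counterfactual(x, predict, feature=0)
print("counterfactual:", cf)    # the smallest income raise that flips to approved
```

Searching one feature at a time also illustrates the con above: with many features there are many possible flips, and listing them all stops being useful.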
Demo: running explainers on the MRSA model
And that's just a few of the many possible algorithms, but I'm ready to test a few out. And I have just a small sample here of the many different open source Python libraries that are available. Not all libraries have the functions we went over, so it's good to do some research for what best suits your needs. If you see the link at the bottom, there's a really cool GitHub repo called ethical ML that has a giant list of all these different open source libraries you can use. And today in my demo, I'm going to be using Alibi Explain, since it contains a handful of algorithms that I'm interested in implementing.
So let's hop over to my notebook. To give you a little bit of background on where I am: I'm using something called Operate First, which is an open source, production-grade OpenShift environment. Anyone who wants to can type in odh.operatefirst.cloud, pull up their Jupyter notebooks, and demo everything here. You can clone my GitHub repo and play with my code yourself.
And we can see here that I start out doing my super normal data science stuff. I import my libraries, split into train and test sets, load my model, fit my model, and build predict functions. And I'm going to be doing SHAP to start out with, and it'll be the local version first. I'm going to take a pause here. First of all, it's my call to action: if you want to see whether I'm an evil data scientist or not, you can play with my code yourself. But also, look at this: I'm only using 108 data samples, and it's already warning me about slower run times. SHAP is quite slow.
So once I fit my SHAP explainer, we see it's coming from a black box model. It's going to be doing some classification, and it'll be at a local and global level. There's also some other parameters you're more than welcome to play around with if you desire. And I'm not running this live. Again, this is quite long to sit and watch load. So we're going to jump right into the visualizations.
And this is our local visualization. The way you interpret this force plot is that we start at our base value. If the final output is below the base value, the patient would be MRSA negative; if the final value is above the base value, the patient is MRSA positive. And we can see that here. Each one of these bars is a feature. You can see the features are so small on this side that it doesn't even give a name, but the larger the bar, the more important the feature was. And we see that for the classification of MRSA positive, the most important features are ethnicity, admission location, ethnicity again, and gender.
And when I first ran this model, I kind of had some data science red flags coming up. I had done pretty basic feature engineering and hadn't done anything weird with my model, so it didn't quite make sense why ethnicity and gender were three of the top four features for this model's output. I wanted to make sure this wasn't a fluke, maybe just this one patient. So I wanted to look at a global level.
And this is my SHAP output at a global level. And we can see here that each one of the points is a particular instance or a particular patient. These are all the top features. The higher up on the list, the more important the feature was for creating predictions overall for the model. And we see here that, once again, we have ethnicity, gender, and ethnicity. And this was my accidental and now on purpose call to action to kind of check out your models every now and then. Because this was not what I had expected. If I was in the healthcare world, I might pull in a subject matter expert to see if this was right. As a non-expert, I wouldn't think that ethnicity and gender are the most important factors in receiving an infection. But this is what my model's output was.
So it's a great example for why explainers can expose bias that was completely unintended. My accuracy was incredible. And it looked like this model could be ready to be put into production. But my explainers say, slow down there. Take a pause. And look at what's happening with your data.
Recap and Q&A
So to recap, models are complex. Explainable machine learning is necessary to close the gap between the machine goal of output and the human goal of understanding. And we need explainability when it's not enough to just get the prediction; the model must also justify how it came to the prediction. Explainability helps us understand the why of models.
To recap, black boxes are models where you input data and some magic occurs and you get an output. They're really great for speed running a data science workflow and making really high performing models. But they're less great when you're trying to understand what's happening in the some magic occurs portion of the black box. Explainability algorithms help crack open the black box and peer in to see how decisions are being made.
To recap, we have algorithms that focus on feature importance, such as Shapley, which gets a gold star for being able to aggregate feature importance both locally and globally. We have LIME, which creates a simple and easily understood y equals mx plus b to explain a local instance. And anchors, which do everything LIME does, but better. We also learned about algorithms that take a cause-and-effect view on a model, such as contrastive explanation methods and counterfactuals.
To recap, explainability algorithms are just, they're just really cool. And they can be very insightful when you use them carefully. I hope you're all curious enough to try an explainability algorithm or two on your models. And I'm excited to hear about your new insights and experiences. So thank you all for your time.
Yes, Isabel, thank you for this great talk. I think that was very interesting. A lot of insights, a lot to learn.
Yes, there's someone in the chat. Andrew asks: how do you evaluate an explanation? That is a very good question. There's an incredible book that I've read like three times now called Interpretable Machine Learning, and it's all online; I recommend looking into that. You can implement something called trust scores, which is kind of explainability for explainability. And you bring up a really good point that it is kind of an interesting paradox that we're using black box explainability models to look at black box regular models. There's still a lot to figure out before this is all in production; this is fairly new. So I don't have a strong "we look at accuracy" answer for you, but there are additional steps you can take to look at the robustness of your explanation.

