Resources

Bryan Shalloway - Understanding, Generating, and Evaluating Prediction Intervals

For many problems concerning prediction, providing intervals is more useful than just offering point estimates. This talk will provide an overview of:

- How to think about uncertainty in your predictions (e.g., noise in the data vs uncertainty in estimation)
- Approaches to producing prediction intervals (e.g., parametric vs conformal)
- Measures and considerations when evaluating and training models for prediction intervals

While I will touch on some similar topics as Max Kuhn's posit::conf(2023) talk on conformal inference, my talk will cover different points and have a broader focus. I hope attendees gain an understanding of some of the key tools and concepts related to prediction intervals and that they leave inspired to learn more.

Talk by Bryan Shalloway

Slides: https://github.com/brshallo/posit-2024/blob/main/shalloway-posit-conf.pdf

GitHub Repo: https://github.com/brshallo/posit-2024

Oct 31, 2024
18 min


Transcript

This transcript was generated automatically and may contain errors.

Okay, so NPR's Planet Money did an episode about a month ago on some of the problems with how residential solar companies operate. And there's this excerpt about a couple who had signed a 20 year lease on solar panels, and it says that they had used about half as much electricity from their power company as the previous year. And at first, this sounds pretty good until you realize that the solar company had promised these new panels to replace all of their power.

So this represents clearly a poor prediction in terms of how much energy was going to be consumed or how much energy was going to be produced by the panels. But it's not just bad because it's inaccurate. It's also bad because the solar company hadn't provided any notion of the range of potential outcomes that might come from installing these panels. They provided the couple with a very narrow picture about what would happen. And then when that wasn't met, the couple was, you know, reasonably upset.

And this might seem like a somewhat contrived example, because you might just think, okay, the solar company was intentionally misleading this couple. But I imagine a lot of people in this room have been in situations where they provided predictions that maybe didn't have measures of uncertainty with them. And then the stakeholder that received them was upset or surprised when things didn't go how those predictions had suggested they would.

So today I'm going to be talking about some of those situations where it's really important to provide measures of uncertainty. And I'll also be talking through a procedure that works pretty well in regression contexts, where you're predicting some continuous variable.

When prediction intervals matter

Okay, so there are some situations where your predictions are going to lead directly to some outcome. So you might picture something like spam filters, where they will sort the junk out of your inbox without any work on your part. But there are also many, many situations where your predictions are going to go to some intermediary agent. And that intermediary agent is going to ultimately make some decision. So maybe you're doing climate forecasts that are going to be used in planning solar farms. Or maybe you are predicting transit ridership that's used by a city planner in determining bus routes. Or maybe you're like me and you do forecasts on sales figures that are then helpful in operations and planning.

In each of these cases where your model is providing information to some agent, it's often important to provide more than just a prediction, but to give some additional context on that prediction that can be helpful in the greater decision making for your stakeholders. Now, there are lots of different types of information you might provide. You might provide information about how appropriate this model is for the particular observations you're looking at. Or you might provide information that gets at what is driving the prediction you're actually providing. What I'm going to be talking about today, though, is measures of uncertainty. And I'm going to suggest that rather than providing a point estimate for your prediction, you often want to provide a range within which the outcome is going to fall.

So to illustrate this latter point, I'm going to talk through a couple of different models for pricing. So let's think about something like commodities. Commodities are relatively simple products: you've got mature markets with reasonably full information. So the price for a commodity at any given moment is typically going to fall within some relatively narrow band. If we compare this to something like used cars, where you have a really complicated product and potentially gaps in information or intense customizations associated with that product, the uncertainty in what an appropriate price is will be a lot greater. And this is part of why buying used cars is notoriously an exercise in haggling.

It's this greater uncertainty that gives greater agency to the downstream agent in coming to that final price. But we also see that as models improve, as you get more information, the role of those intermediary agents can be reduced. So we see things like Carvana or other online platforms that lean more on their prediction and reduce the role of those intermediary agents. So my point with this is just to say that measures of uncertainty are helpful both in informing stakeholders, but also can be helpful in defining the bounds within which we want those stakeholders to actually operate.

Confidence intervals vs prediction intervals

So up to now, I've just been showing prediction intervals for a single observation. But in the context of predictive modeling, we often want to show prediction intervals across a range of potential inputs. I also want to talk briefly about a confusion that often happens in the context of prediction, between confidence intervals and prediction intervals. This comes up almost every time I talk about prediction intervals, so I want to spend a little time on it: people often confuse the two and will frequently output confidence intervals when what they mean to output is prediction intervals.

So confidence intervals mostly come from uncertainty in fitting the model, and they represent intervals around the average across the entire population at some point. Confidence intervals are really useful in contexts where you're evaluating coefficients or doing other types of statistical tests. But in our context, we're more interested in prediction intervals. Prediction intervals are driven by uncertainty in the individual observations and reflect the variability in individual outcomes.

So just to hammer this message home about the types of phrasing commonly used with these two types of intervals: if we think of this as a model of, let's say, baby weight by age, and we want to describe confidence intervals, you might have a statement like, we're 90% sure that the average weight among all three-month-olds is between 12 and 12 and a half pounds. That's a confidence-interval-like statement. Whereas for prediction intervals, you'd say something like, we're 90% sure that the weight of an individual three-month-old will be between 10 and 14 pounds. The first statement makes a global claim about the world: what is the average weight across all three-month-olds? The latter represents a measure of uncertainty on an individual observation: what's an appropriate weight for some given three-month-old? And when we're outputting predictions, we generally care about these latter types of statements.
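To see why the two interval types differ in width, here's a small Python sketch that computes both from a simple linear regression. The baby-weight numbers are hypothetical, and a normal quantile stands in for the t quantile a statistics package would use:

```python
import math
import statistics

# Simple linear regression by hand on hypothetical baby-weight data:
# x = age in months, y = weight in pounds (made-up numbers).
x = [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11]
y = [9.1, 10.5, 11.8, 12.6, 13.0, 13.9, 15.2, 16.0, 16.8, 17.5, 18.1, 19.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

# Residual standard error (n - 2 degrees of freedom).
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Normal approximation to the t quantile, for a two-sided 90% interval.
z = statistics.NormalDist().inv_cdf(0.95)

def intervals(x0):
    """Return (confidence_interval, prediction_interval) at age x0."""
    fit = intercept + slope * x0
    se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)      # uncertainty in the average
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)  # adds individual-level noise
    return ((fit - z * se_mean, fit + z * se_mean),
            (fit - z * se_pred, fit + z * se_pred))

ci, pi = intervals(3)  # intervals for three-month-olds
```

The prediction interval carries the extra leading 1 under the square root, the noise of an individual observation, which is why it is always wider than the confidence interval at the same level.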


Outputting and evaluating prediction intervals

So let's talk briefly about outputting predictions. Some types of models, like linear regression, come set up to easily output prediction intervals already. In the predict function, here I have this lm_fit object, which is just the linear model I already fit. And I'm giving it some new set of observations, and I'm saying that I want to output prediction intervals, and I want them to be 90% prediction intervals. What this is going to do, then, is output lower and upper bounds for each of these observations that together constitute my prediction interval.

I just want to make a note, too, that here I'm specifying interval = "prediction". The default for a lot of packages, if you don't specify the type there, is to output confidence intervals, whereas, again, what you want in this context is often prediction intervals. So just be careful that you're actually outputting the right thing. When you're evaluating prediction intervals, it's often helpful to visualize them. So in this context, I've built a model on price, and here I have some new set of observations, which the model never saw, that I'm producing prediction intervals on. Each of those teal bands represents a prediction interval for some observation, and the red point represents the actual outcome that occurred.

The baseline way of evaluating prediction intervals is in terms of coverage. We say that an individual observation is covered if the outcome actually fell within our prediction interval. So in this case, if we're outputting 90% prediction intervals, that means that 90% of the time, our outcome should actually fall within those bands. That's the baseline measure: we want to see whether our coverage rate is actually what we expect it to be. But we don't just care about coverage; we also care about the interval width. If we have narrower intervals, that means our model is doing a better job fitting the data, and having less uncertainty in your predictions is going to be more useful for whatever stakeholders you're sharing them with. So we have this dual goal of meeting our coverage requirements while also having intervals that are as narrow as possible.
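Both evaluation measures are easy to compute directly. A sketch with made-up intervals and outcomes:

```python
# Coverage and mean width for a set of prediction intervals.
# lower/upper are the interval bounds; actual holds the observed outcomes.
# All numbers are illustrative, not from the talk.
lower  = [ 8.0, 11.0,  9.5, 14.0, 10.0]
upper  = [12.0, 15.0, 13.5, 18.0, 14.0]
actual = [10.1, 14.2, 13.9, 15.0, 11.3]

# An observation is covered if its outcome falls inside its interval.
covered = [lo <= y <= hi for lo, hi, y in zip(lower, upper, actual)]
coverage = sum(covered) / len(covered)

# Mean interval width: the second thing we want to keep small.
mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)
```

For 90% intervals we would want coverage near 0.90 while keeping mean_width as small as possible.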

Weaknesses of simple linear regression for prediction intervals

But the prediction intervals that get outputted from simple linear regression have a number of weaknesses. One is that there's no guarantee of coverage on out-of-sample data: it's not necessarily the case that if we give a new set of observations to our model, those observations will fall within our bands at the rate we expect. Another weakness is that as we try to fit other types of models to improve our fit, lots of those model types don't actually output prediction intervals by default. So we're somewhat constrained in the types of fits we can build.

And then also, these prediction intervals from simple linear regression come with a lot of assumptions. A big one is that they assume constant variance across observations. But if we think back to the example with used cars, you might expect that a car with 10,000 miles on it is going to have a lot less uncertainty in its potential price range than a car with 100,000 miles on it, which may have a lot more uncertainty in what its eventual price will be.

Conformal prediction and conformalized quantile regression

So what we want, then, is some type of model and procedure that gives us some guarantees around coverage, that's relatively model agnostic so we can use different types of fitting procedures, and that's assumption-free, or at least mostly assumption-free. Those are the things we want in our procedure for producing prediction intervals. To achieve them, we're going to turn to this field called conformal prediction, a field of study committed to producing good measures of uncertainty with minimal assumptions. Conformal prediction is very popular right now. There are lots of different procedures you can use, and it can also be used in classification contexts and other types of problem spaces. But today, I'm going to be talking about it for regression contexts, where we're predicting some kind of continuous variable. And in this case, I'm going to be talking through a procedure called conformalized quantile regression, which is a good heuristic to use in these situations.

The two main steps of this procedure are essentially to first fit lower and upper bounds, and then to adjust those bounds using calibration data. To talk through this in a little more detail, I'm first going to describe the more traditional approach for producing prediction intervals, using the simple linear regression method I was talking about before. The classical approach starts with some model that was fit on the expected value. Then, to get your prediction intervals, you take that expected value and, for your upper bound, add some measure related to the average error; for your lower bound, you subtract that same average error measure from the expected value. So the expected value plus or minus this average error measure gives you the interval.
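That classical recipe can be sketched in a few lines. The data is illustrative; the residual standard deviation stands in for the "average error measure", and a normal quantile gives a two-sided 90% band:

```python
import statistics

# Classical approach (sketch): one model for the expected value, then a
# symmetric band built from an average error measure on the residuals.
# The numbers below are illustrative, not from the talk.
y_train     = [10.2, 11.8,  9.5, 12.4, 10.9, 11.1,  9.9, 12.0]
y_hat_train = [10.0, 11.5, 10.0, 12.0, 11.0, 11.5, 10.0, 12.0]  # a fitted model's predictions

residuals = [y - f for y, f in zip(y_train, y_hat_train)]
sigma = statistics.stdev(residuals)            # the "average error" measure
z = statistics.NormalDist().inv_cdf(0.95)      # two-sided 90% band

def classical_interval(y_hat):
    """Expected value plus or minus a constant-width error band."""
    return (y_hat - z * sigma, y_hat + z * sigma)
```

Note that the band has the same width for every prediction, which is exactly the constant-variance assumption called out earlier.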

This is different from the procedure you follow when using quantile regression. In quantile regression, you don't actually even need a model fit on the expected value. Instead, you fit models directly on those lower and upper bounds. So let's say we want 80% prediction intervals. In this case, our lower bound is a model that's fit to the 10th percentile, and our upper bound is a model that's fit to the 90th percentile. So rather than saying, okay, give me a model for the expected value, we're saying give me two models, one for my lower bound and one for my upper bound, and I'm directly predicting those bounds.
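The talk doesn't name the loss function, but quantile regression models are typically fit by minimizing the pinball (quantile) loss. A pure-Python sketch of why that targets a percentile:

```python
def pinball_loss(y_true, pred, tau):
    """Average pinball (quantile) loss for a constant prediction:
    under-predictions are penalized by tau, over-predictions by (1 - tau)."""
    total = 0.0
    for y in y_true:
        diff = y - pred
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)

# Illustrative outcomes.
y = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

# The constant minimizing the tau-pinball loss is an empirical tau-quantile,
# so fitting with tau = 0.10 and tau = 0.90 targets the lower and upper
# bounds of an 80% interval directly.
candidates = sorted(y)
best_lo = min(candidates, key=lambda c: pinball_loss(y, c, 0.10))
best_hi = min(candidates, key=lambda c: pinball_loss(y, c, 0.90))
```

The asymmetric penalty is what pulls each fitted model toward its requested percentile rather than the mean.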

This offers some advantages because it gives a little more flexibility. As you can see in this case, we're no longer constrained to constant variance across our inputs. Also, you can fit lots of different model types that are able to be trained on this quantile output, so this opens up the kinds of models you can use as well. This ability to fit the quantiles directly expands how adaptive your prediction intervals can be. So that's the first step: you train models directly on the lower bound and the upper bound.

The next step is to pass in some out-of-sample data that you'll use to calibrate these bounds. It's common that your initial bounds may be a little too narrow, so this next set of data lets you calibrate an adjustment on those bounds. Here, we take each observation in our calibration data set and score it based on whether it's covered and its distance from the nearest bound. Then, if we're fitting an 80% prediction interval, we take the score that falls at the 80th percentile of distance from the nearest bound, and we use that score as a constant to expand the upper and lower bounds. And now we have these conformalized quantiles that have our coverage guarantees. So now we have a flexible, adaptive approach that also has the coverage guarantees that we want.
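A minimal sketch of that calibration step, with illustrative numbers; the conformity score and the finite-sample quantile follow the standard conformalized quantile regression recipe:

```python
import math

def conformalize(y_cal, lo_cal, hi_cal, alpha=0.2):
    """Return the adjustment q so that [lo - q, hi + q] attains marginal
    coverage of at least 1 - alpha. y_cal are calibration outcomes;
    lo_cal/hi_cal are the raw quantile-model bounds for those observations."""
    n = len(y_cal)
    # Conformity score: signed distance to the nearest bound
    # (positive when the outcome falls outside the raw interval).
    scores = sorted(max(lo - y, y - hi)
                    for y, lo, hi in zip(y_cal, lo_cal, hi_cal))
    # Finite-sample-corrected quantile of the scores.
    k = math.ceil((n + 1) * (1 - alpha))
    return scores[min(k, n) - 1]

# Illustrative calibration data and raw 10th/90th-percentile predictions.
y_cal  = [10.0, 12.5,  9.0, 15.5, 11.0, 13.0,  8.5, 14.2, 10.8, 12.0]
lo_cal = [ 9.5, 11.0,  9.2, 13.0, 10.5, 12.5,  8.0, 13.5, 10.0, 11.5]
hi_cal = [11.5, 13.0, 11.2, 15.0, 12.5, 14.5, 10.0, 15.5, 12.0, 13.5]

q = conformalize(y_cal, lo_cal, hi_cal, alpha=0.2)

# For a new observation, widen the raw quantile bounds by q on each side.
lo_new, hi_new = 11.0, 13.0
interval = (lo_new - q, hi_new + q)
```

A positive q widens the raw bounds; if the quantile models were too conservative on the calibration data, q comes out negative and actually narrows them.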


In terms of implementations of this approach: in R, the probably package has a bunch of really useful tools for doing conformal inference. Max Kuhn gave a talk on this last year that I really recommend checking out. For this conformalized quantile regression method, there's a function, int_conformal_quantile(), that is an implementation of it with random forests. In Python, you can use the MAPIE package. And I also really recommend checking out the article, as well as the associated YouTube videos, on a general introduction to conformal prediction if you want more detail on these procedures and where you can use them in different contexts. So my hope is that you continue to develop your tool set and your ability to share measures of uncertainty when sharing predictions with stakeholders. Thank you.

Q&A

Thank you very much. So again, questions on Slido, please. We're going to have you unplug so the next person can plug in, and then we're going to move over for our Q&A.

So how would conformal prediction work or would it work well on count data and other distributions with strict boundaries?

So I haven't used conformal prediction a lot in the context of count data. I will say that the big assumption conformal prediction still has is this exchangeability assumption; essentially, it's similar to saying your data is IID. That's the biggest assumption you have to look out for. But I haven't used it for count data, so I don't want to comment on exactly where it fits in there.

Okay, you just mentioned data; we also have a question on this. What are you using as the calibration data set? Are you splitting the data and using it like train and test data, or how does that work?

Basically, in this context, for the example I gave, you just have another holdout data set that you use for calibration. So picture your training data, your calibration data, and then your test data; it's just an additional calibration data set. There's another approach to conformal prediction where you don't need this separate calibration data set, but that procedure takes a lot longer. If you want to look into it, it's called full conformal inference, or sometimes transductive conformal inference, so you can search those terms for that different procedure. But here I'm using the split conformal procedure, which is the most common: it's the fastest and the easiest to set up.
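A sketch of that three-way split (the sizes and the shuffling seed are arbitrary):

```python
import random

# Split conformal setup (sketch): partition the data into three
# disjoint sets with distinct roles.
random.seed(42)
indices = list(range(100))   # stand-in for 100 observations
random.shuffle(indices)

train_idx = indices[:60]     # fit the lower/upper quantile models here
cal_idx = indices[60:80]     # score observations and pick the adjustment here
test_idx = indices[80:]      # check coverage on data neither step has seen
```

Keeping the calibration set disjoint from training is what makes the coverage adjustment honest; reusing training data would understate the conformity scores.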

What kind of guarantee is the coverage guarantee?

The coverage guarantee is essentially just that if you ask for, say, 90% prediction intervals, they actually have coverage rates of 90%. I think the question might be getting at the notion of marginal coverage versus conditional coverage. Marginal coverage basically means the 90% holds across your whole data set, but it's not necessarily local to each region: in some areas you might have 95% coverage, in other places 85%, and it's going to vary. Conditional coverage means you have that coverage rate more consistently across your data set. The coverage guarantee from conformalized quantile regression is just a marginal guarantee, but it tends to have decent conditional coverage, because the quantile regression procedure has that flexibility built in. So the guarantee is only for marginal coverage, but it usually works pretty well in getting decent conditional coverage as well.
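To make the distinction concrete, here's a toy Python check with made-up coverage flags for two regions of the input space:

```python
# Marginal vs conditional coverage (illustrative data).
# Each entry: (region, covered?) -- whether that observation's outcome
# fell inside its prediction interval.
results = [
    ("low_mileage", True), ("low_mileage", True), ("low_mileage", True),
    ("low_mileage", True), ("low_mileage", True),
    ("high_mileage", True), ("high_mileage", True), ("high_mileage", True),
    ("high_mileage", False), ("high_mileage", False),
]

# Marginal coverage: one rate pooled over the whole data set.
marginal = sum(c for _, c in results) / len(results)

# Conditional coverage: the rate within each region.
by_group = {}
for g, c in results:
    by_group.setdefault(g, []).append(c)
conditional = {g: sum(cs) / len(cs) for g, cs in by_group.items()}
```

Here the marginal rate looks fine at 0.8 while the per-region rates are 1.0 and 0.6, which is exactly the gap between the two notions of coverage.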

Thank you. We're basically at time, so please give a hand for Bryan.