Resources

Bryan Shalloway - Understanding, Generating, and Evaluating Prediction Intervals

For many problems concerning prediction, providing intervals is more useful than just offering point estimates. This talk will provide an overview of:

- How to think about uncertainty in your predictions (e.g., noise in the data vs uncertainty in estimation)
- Approaches to producing prediction intervals (e.g., parametric vs conformal)
- Measures and considerations when evaluating and training models for prediction intervals

While I will touch on some similar topics as Max Kuhn's posit::conf(2023) talk on conformal inference, my talk will cover different points and have a broader focus. I hope attendees gain an understanding of some of the key tools and concepts related to prediction intervals and that they leave inspired to learn more.

Talk by Bryan Shalloway

Slides: https://github.com/brshallo/posit-2024/blob/main/shalloway-posit-conf.pdf

GitHub Repo: https://github.com/brshallo/posit-2024

Oct 31, 2024
18 min


Transcript

This transcript was generated automatically and may contain errors.

Okay, so NPR's Planet Money did an episode about a month ago on some of the problems with how residential solar companies operate. And there's this excerpt about a couple who had signed a 20 year lease on solar panels, and it says that they had used about half as much electricity from their power company as the previous year. And at first, this sounds pretty good until you realize that the solar company had promised these new panels to replace all of their power.

So this represents clearly a poor prediction in terms of how much energy was going to be consumed or how much energy was going to be produced by the panels. But it's not just bad because it's inaccurate. It's also bad because the solar company hadn't provided any notion of the range of potential outcomes that might come from installing these panels. They provided the couple with a very narrow picture about what would happen. And then when that wasn't met, the couple was, you know, reasonably upset.

And this might seem like a somewhat contrived example, because you might just think, okay, the solar company was intentionally misleading this couple. But I imagine a lot of people in this room have been in situations where they provided predictions that maybe didn't have measures of uncertainty with them. And then the stakeholder that received them was upset or surprised when things didn't go how those predictions had suggested they would.

So today I'm going to be talking about some of those situations where it's really important to provide measures of uncertainty. And I'll also be talking through a procedure that works pretty well in regression contexts, where you're predicting some continuous variable.

When prediction intervals matter

Okay, so there are some situations where your predictions are going to lead directly to some outcome. So you might picture something like spam filters, where they will sort the junk out of your inbox without any work on your part. But there are also many, many situations where your predictions are going to go to some intermediary agent. And that intermediary agent is going to ultimately make some decision. So maybe you're doing climate forecasts that are going to be used in planning solar farms. Or maybe you are predicting transit ridership that's used by a city planner in determining bus routes. Or maybe you're like me and you do forecasts on sales figures that are then helpful in operations and planning.

In each of these cases where your model is providing information to some agent, it's often important to provide more than just a prediction, but to give some additional context on that prediction that can be helpful in the greater decision making for your stakeholders. Now, there are lots of different types of information you might provide. You might provide information about how appropriate this model is for the particular observations you're looking at. Or you might provide information that gets at what is driving the prediction you're actually providing. What I'm going to be talking about today, though, is measures of uncertainty. And I'm going to suggest that rather than providing a point estimate for your prediction, you often want to provide a range within which the outcome is going to fall.

So to illustrate this latter point, I'm going to talk through a couple of different models for pricing. So let's think about something like commodities. Commodities are relatively simple products: you've got mature markets with reasonably full information. So the price for a commodity at any given moment is typically going to fall within some relatively narrow band. If we compare this to something like used cars, where you have a really complicated product and potentially gaps in information or intense customizations associated with that product, the uncertainty in what an appropriate price is will be a lot greater. And this is part of why buying used cars is notoriously an exercise in haggling.

It's this greater uncertainty that gives greater agency to the downstream agent in coming to that final price. But we also see that as models improve, as you get more information, the role of those intermediary agents can be reduced. So we see things like Carvana or other online platforms that lean more on their prediction and reduce the role of those intermediary agents. So my point with this is just to say that measures of uncertainty are helpful both in informing stakeholders, but also can be helpful in defining the bounds within which we want those stakeholders to actually operate.

Confidence intervals vs prediction intervals

So up to now, I've just been showing prediction intervals for a single observation. But in the context of predictive modeling, we often want to show prediction intervals across a range of potential inputs. I also want to talk briefly about a confusion that often happens in the context of prediction, between confidence intervals and prediction intervals. This comes up almost every time I talk about prediction intervals, so I want to spend a little time on it: people often confuse the two and will frequently output confidence intervals when what they mean to output is prediction intervals.

So confidence intervals mostly come from uncertainty in fitting the model, and they represent intervals around the average across the entire population at some point. Confidence intervals are really useful in contexts where you're evaluating coefficients or doing other types of statistical tests. But in our context, we're more interested in prediction intervals. Prediction intervals are driven by uncertainty in the individual observations and reflect the variability in individual outcomes.

So just to hammer this message home about the types of phrasing commonly used with these two types of intervals: if we think of this as a model of, let's say, baby weight by age, and we want to describe confidence intervals, you might have a statement like, we're 90% sure that the average weight among all three-month-olds is between 12 and 12 and a half pounds. That's a confidence-interval-like statement. Whereas for prediction intervals, you'd say something like, we're 90% sure that the weight of an individual three-month-old will be between 10 and 14 pounds. The first statement makes a global claim about the world: what is the average weight across all three-month-olds? The latter represents a measure of uncertainty on an individual observation: what's an appropriate weight for some given three-month-old? And when we're outputting predictions, we generally care about these latter types of statements.
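To see why the two interval types differ in width, here's a small Python sketch that computes both from a simple linear regression. The baby-weight numbers are hypothetical, and a normal quantile stands in for the t quantile a statistics package would use:

```python
import math
import statistics

# Simple linear regression by hand on hypothetical baby-weight data:
# x = age in months, y = weight in pounds (made-up numbers).
x = [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11]
y = [9.1, 10.5, 11.8, 12.6, 13.0, 13.9, 15.2, 16.0, 16.8, 17.5, 18.1, 19.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

# Residual standard error (n - 2 degrees of freedom).
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Normal approximation to the t quantile, for a two-sided 90% interval.
z = statistics.NormalDist().inv_cdf(0.95)

def intervals(x0):
    """Return (confidence_interval, prediction_interval) at age x0."""
    fit = intercept + slope * x0
    se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)      # uncertainty in the average
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)  # adds individual-level noise
    return ((fit - z * se_mean, fit + z * se_mean),
            (fit - z * se_pred, fit + z * se_pred))

ci, pi = intervals(3)  # intervals for three-month-olds
```

The prediction interval carries the extra leading 1 under the square root, the noise of an individual observation, which is why it is always wider than the confidence interval at the same level.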


Outputting and evaluating prediction intervals

So let's talk briefly about outputting predictions. Some types of models, like linear regression, come set up to easily output prediction intervals already. In the predict function, here I have this lm_fit object, which is just the linear model I already fit. And I'm giving it some new set of observations, and I'm saying that I want to output prediction intervals, and I want them to be 90% prediction intervals. What this is going to do, then, is output lower and upper bounds for each of these observations that together constitute my prediction interval.

I just want to make a note, too, that here I'm specifying interval = "prediction". The default for a lot of packages, if you don't specify the type there, is to output confidence intervals, whereas, again, what you want in this context is often prediction intervals. So just be careful that you're actually outputting the right thing. When you're evaluating prediction intervals, it's often helpful to visualize them. So in this context, I've built a model on price, and here I have some new set of observations, which the model never saw, that I'm producing prediction intervals on. Each of those teal bands represents a prediction interval for some observation, and the red point represents the actual outcome that occurred.

The baseline way of evaluating prediction intervals is in terms of coverage. We say that an individual observation is covered if the outcome actually fell within our prediction interval. So in this case, if we're outputting 90% prediction intervals, that means that 90% of the time, our outcome should actually fall within those bands. That's the baseline measure: we want to see whether our coverage rate is actually what we expect it to be. But we don't just care about coverage; we also care about the interval width. If we have narrower intervals, that means our model is doing a better job fitting the data, and having less uncertainty in your predictions is going to be more useful for whatever stakeholders you're sharing them with. So we have this dual goal of meeting our coverage requirements while also having intervals that are as narrow as possible.
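Both evaluation measures are easy to compute directly. A sketch with made-up intervals and outcomes:

```python
# Coverage and mean width for a set of prediction intervals.
# lower/upper are the interval bounds; actual holds the observed outcomes.
# All numbers are illustrative, not from the talk.
lower  = [ 8.0, 11.0,  9.5, 14.0, 10.0]
upper  = [12.0, 15.0, 13.5, 18.0, 14.0]
actual = [10.1, 14.2, 13.9, 15.0, 11.3]

# An observation is covered if its outcome falls inside its interval.
covered = [lo <= y <= hi for lo, hi, y in zip(lower, upper, actual)]
coverage = sum(covered) / len(covered)

# Mean interval width: the second thing we want to keep small.
mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)
```

For 90% intervals we would want coverage near 0.90 while keeping mean_width as small as possible.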

Weaknesses of simple linear regression for prediction intervals

But the prediction intervals that get outputted from simple linear regression have a number of weaknesses. One is that there's no guarantee of coverage on out-of-sample data: it's not necessarily the case that if we give a new set of observations to our model, those observations will fall within our bands at the rate we expect. Another weakness is that as we try to fit other types of models to improve our fit, lots of those model types don't actually output prediction intervals by default. So we're somewhat constrained in the types of fits we can build.

And then also, these prediction intervals from simple linear regression come with a lot of assumptions. A big one is that they assume constant variance across observations. But if we think back to the example with used cars, you might expect that a car with 10,000 miles on it is going to have a lot less uncertainty in its potential price range than a car with 100,000 miles on it, which may have a lot more uncertainty in what its eventual price will be.

Conformal prediction and conformalized quantile regression

So what we want, then, is some type of model and procedure that gives us some guarantees around coverage, that's relatively model agnostic so we can use different types of fitting procedures, and that's assumption-free, or at least mostly assumption-free. Those are the things we want in our procedure for producing prediction intervals. To achieve them, we're going to turn to this field called conformal prediction, a field of study committed to producing good measures of uncertainty with minimal assumptions. Conformal prediction is very popular right now. There are lots of different procedures you can use, and it can also be used in classification contexts and other types of problem spaces. But today, I'm going to be talking about it for regression contexts, where we're predicting some kind of continuous variable. And in this case, I'm going to be talking through a procedure called conformalized quantile regression, which is a good heuristic to use in these situations.

The two main steps of this procedure are essentially to first fit lower and upper bounds, and then to adjust those bounds using calibration data. To talk through this in a little more detail, I'm first going to describe the more traditional approach for producing prediction intervals, using the simple linear regression method I was talking about before. The classical approach starts with some model that was fit on the expected value. Then, to get your prediction intervals, you take that expected value and, for your upper bound, add some measure related to the average error; for your lower bound, you subtract that same average error measure from the expected value. So the expected value plus or minus this average error measure gives you the interval.
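That classical recipe can be sketched in a few lines. The data is illustrative; the residual standard deviation stands in for the "average error measure", and a normal quantile gives a two-sided 90% band:

```python
import statistics

# Classical approach (sketch): one model for the expected value, then a
# symmetric band built from an average error measure on the residuals.
# The numbers below are illustrative, not from the talk.
y_train     = [10.2, 11.8,  9.5, 12.4, 10.9, 11.1,  9.9, 12.0]
y_hat_train = [10.0, 11.5, 10.0, 12.0, 11.0, 11.5, 10.0, 12.0]  # a fitted model's predictions

residuals = [y - f for y, f in zip(y_train, y_hat_train)]
sigma = statistics.stdev(residuals)            # the "average error" measure
z = statistics.NormalDist().inv_cdf(0.95)      # two-sided 90% band

def classical_interval(y_hat):
    """Expected value plus or minus a constant-width error band."""
    return (y_hat - z * sigma, y_hat + z * sigma)
```

Note that the band has the same width for every prediction, which is exactly the constant-variance assumption called out earlier.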

This is different from the procedure you follow when using quantile regression. In quantile regression, you don't actually even need a model fit on the expected value. Instead, you fit models directly on those lower and upper bounds. So let's say we want 80% prediction intervals. In this case, our lower bound is a model that's fit to the 10th percentile, and our upper bound is a model that's fit to the 90th percentile. So rather than saying, okay, give me a model for the expected value, we're saying give me two models, one for my lower bound and one for my upper bound, and I'm directly predicting those bounds.
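The talk doesn't name the loss function, but quantile regression models are typically fit by minimizing the pinball (quantile) loss. A pure-Python sketch of why that targets a percentile:

```python
def pinball_loss(y_true, pred, tau):
    """Average pinball (quantile) loss for a constant prediction:
    under-predictions are penalized by tau, over-predictions by (1 - tau)."""
    total = 0.0
    for y in y_true:
        diff = y - pred
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)

# Illustrative outcomes.
y = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

# The constant minimizing the tau-pinball loss is an empirical tau-quantile,
# so fitting with tau = 0.10 and tau = 0.90 targets the lower and upper
# bounds of an 80% interval directly.
candidates = sorted(y)
best_lo = min(candidates, key=lambda c: pinball_loss(y, c, 0.10))
best_hi = min(candidates, key=lambda c: pinball_loss(y, c, 0.90))
```

The asymmetric penalty is what pulls each fitted model toward its requested percentile rather than the mean.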

This offers some advantages because it gives a little more flexibility. As you can see in this case, we're no longer constrained to constant variance across our inputs. Also, you can fit lots of different model types that are able to be trained on this quantile output, so this opens up the kinds of models you can use as well. This ability to fit the quantiles directly expands how adaptive your prediction intervals can be. So that's the first step: you train models directly on the lower bound and the upper bound.

The next step is to pass in some out-of-sample data that you'll use to calibrate these bounds. It's common that your initial bounds may be a little too narrow, so this next set of data lets you calibrate an adjustment on those bounds. Here, we take each observation in our calibration data set and score it based on whether it's covered and its distance from the nearest bound. Then, if we're fitting an 80% prediction interval, we take the score that falls at the 80th percentile of distance from the nearest bound, and we use that score as a constant to expand the upper and lower bounds. And now we have these conformalized quantiles that have our coverage guarantees. So now we have a flexible, adaptive approach that also has the coverage guarantees that we want.
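A minimal sketch of that calibration step, with illustrative numbers; the conformity score and the finite-sample quantile follow the standard conformalized quantile regression recipe:

```python
import math

def conformalize(y_cal, lo_cal, hi_cal, alpha=0.2):
    """Return the adjustment q so that [lo - q, hi + q] attains marginal
    coverage of at least 1 - alpha. y_cal are calibration outcomes;
    lo_cal/hi_cal are the raw quantile-model bounds for those observations."""
    n = len(y_cal)
    # Conformity score: signed distance to the nearest bound
    # (positive when the outcome falls outside the raw interval).
    scores = sorted(max(lo - y, y - hi)
                    for y, lo, hi in zip(y_cal, lo_cal, hi_cal))
    # Finite-sample-corrected quantile of the scores.
    k = math.ceil((n + 1) * (1 - alpha))
    return scores[min(k, n) - 1]

# Illustrative calibration data and raw 10th/90th-percentile predictions.
y_cal  = [10.0, 12.5,  9.0, 15.5, 11.0, 13.0,  8.5, 14.2, 10.8, 12.0]
lo_cal = [ 9.5, 11.0,  9.2, 13.0, 10.5, 12.5,  8.0, 13.5, 10.0, 11.5]
hi_cal = [11.5, 13.0, 11.2, 15.0, 12.5, 14.5, 10.0, 15.5, 12.0, 13.5]

q = conformalize(y_cal, lo_cal, hi_cal, alpha=0.2)

# For a new observation, widen the raw quantile bounds by q on each side.
lo_new, hi_new = 11.0, 13.0
interval = (lo_new - q, hi_new + q)
```

A positive q widens the raw bounds; if the quantile models were too conservative on the calibration data, q comes out negative and actually narrows them.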


In terms of implementations of this approach: in R, the probably package has a bunch of really useful tools for doing conformal inference. Max Kuhn gave a talk on this last year that I really recommend checking out. For this conformalized quantile regression method, there's a function, int_conformal_quantile(), that is an implementation of it with random forests. In Python, you can use the MAPIE package. And I also really recommend checking out the article, as well as the associated YouTube videos, on a general introduction to conformal prediction if you want more detail on these procedures and where you can use them in different contexts. So my hope is that you continue to develop your tool set and your ability to share measures of uncertainty when sharing predictions with stakeholders. Thank you.

Q&A

Thank you very much. So again, questions on Slido, please. We're going to have you unplug so the next person can plug in, and then we're going to move over for our Q&A.

So how would conformal prediction work or would it work well on count data and other distributions with strict boundaries?

So I haven't used conformal prediction a lot in the context of count data. I will say that the big assumption conformal prediction still has is this exchangeability assumption; essentially, it's similar to saying your data is IID. That's the biggest assumption you have to look out for. But I haven't used it for count data, so I don't want to comment on exactly where it fits in there.

Okay, you just mentioned data; we also have a question on this. What are you using as the calibration data set? Are you splitting the data and using it like train and test data, or how does that work?

Basically, in this context, for the example I gave, you just have another holdout data set that you use for calibration. So picture your training data, your calibration data, and then your test data; it's just an additional calibration data set. There's another approach to conformal prediction where you don't need this separate calibration data set, but that procedure takes a lot longer. If you want to look into it, it's called full conformal inference, or sometimes transductive conformal inference, so you can search those terms for that different procedure. But here I'm using the split conformal procedure, which is the most common: it's the fastest and the easiest to set up.
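A sketch of that three-way split (the sizes and the shuffling seed are arbitrary):

```python
import random

# Split conformal setup (sketch): partition the data into three
# disjoint sets with distinct roles.
random.seed(42)
indices = list(range(100))   # stand-in for 100 observations
random.shuffle(indices)

train_idx = indices[:60]     # fit the lower/upper quantile models here
cal_idx = indices[60:80]     # score observations and pick the adjustment here
test_idx = indices[80:]      # check coverage on data neither step has seen
```

Keeping the calibration set disjoint from training is what makes the coverage adjustment honest; reusing training data would understate the conformity scores.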

What kind of guarantee is the coverage guarantee?

The coverage guarantee is essentially just that if you ask for, say, 90% prediction intervals, they actually have coverage rates of 90%. I think the question might be getting at the notion of marginal coverage versus conditional coverage. Marginal coverage basically means the 90% holds across your whole data set, but it's not necessarily local to each region: in some areas you might have 95% coverage, in other places 85%, and it's going to vary. Conditional coverage means you have that coverage rate more consistently across your data set. The coverage guarantee from conformalized quantile regression is just a marginal guarantee, but it tends to have decent conditional coverage, because the quantile regression procedure has that flexibility built in. So the guarantee is only for marginal coverage, but it usually works pretty well in getting decent conditional coverage as well.
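To make the distinction concrete, here's a toy Python check with made-up coverage flags for two regions of the input space:

```python
# Marginal vs conditional coverage (illustrative data).
# Each entry: (region, covered?) -- whether that observation's outcome
# fell inside its prediction interval.
results = [
    ("low_mileage", True), ("low_mileage", True), ("low_mileage", True),
    ("low_mileage", True), ("low_mileage", True),
    ("high_mileage", True), ("high_mileage", True), ("high_mileage", True),
    ("high_mileage", False), ("high_mileage", False),
]

# Marginal coverage: one rate pooled over the whole data set.
marginal = sum(c for _, c in results) / len(results)

# Conditional coverage: the rate within each region.
by_group = {}
for g, c in results:
    by_group.setdefault(g, []).append(c)
conditional = {g: sum(cs) / len(cs) for g, cs in by_group.items()}
```

Here the marginal rate looks fine at 0.8 while the per-region rates are 1.0 and 0.6, which is exactly the gap between the two notions of coverage.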

Thank you. We're basically at time, so please give a hand for Bryan.