Claus Wilke | Visualizing uncertainty with hypothetical outcomes plots

Transcript

This transcript was generated automatically and may contain errors.

I'm going to talk about visualizing uncertainty with hypothetical outcome plots. I just tweeted my slides, so you can find me on Twitter at @ClausWilke, very easy to remember, just my first name and my last name. And I'm going to talk about a kind of experimental package, which you can find here at this location, and again, if you find the slides on Twitter, that will be the easiest way to find all of that.

Okay, so to motivate that, let's look at this extremely original plot. It's mtcars, you may have heard of it. We did not coordinate this. So it's fuel efficiency, miles per gallon in this case, plotted versus the displacement of the engine, and you see it goes down. I fitted a nonlinear model here, and I'm sure you have seen plots like that. So you have this line that was fitted, and then there's this gray band, and that tells us something about the uncertainty of that fit, right? Every time we model something there's uncertainty, and we are used to displaying it like this, but what does that actually mean, right? The truth is most people don't know. Like, I don't really know, nobody knows, but it has to be there, because if it's not there you get complaints, right? "What's your uncertainty?" "Oh, there's a band, okay, now I know."

Okay, what that really means is that this line is just one of multiple ways that we could have fitted it, and so there's really an ensemble of different possible lines. We could plot fitted draws, and that's not bad. The fundamental problem, though, is if we have the fitted draws, okay, there's the line that goes here, and then there's a little gap here, so does that mean the line would never go here, right? Can the line ever go here? So you don't know, right? And the problem fundamentally with uncertainty is that you don't know, but when you see a plot it looks certain, right? It's really difficult to plot uncertainty because the plot is kind of certain, but it's not certain.
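To make the idea of an ensemble of possible lines concrete, here is a minimal sketch. It is not code from the talk: it is written in Python rather than R, uses made-up data instead of the car data, fits a straight line rather than a nonlinear model, and assumes bootstrap resampling as the way to obtain alternative fits. The names `fit_line` and `bootstrap_fits` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_fits(xs, ys, n_draws, rng):
    """Refit the line on n_draws resamples (rows drawn with replacement)."""
    n = len(xs)
    fits = []
    for _ in range(n_draws):
        rows = [rng.randrange(n) for _ in range(n)]
        fits.append(fit_line([xs[i] for i in rows], [ys[i] for i in rows]))
    return fits


# Made-up data: a noisy downward trend (NOT the car data from the talk).
rng = random.Random(2019)
xs = [float(x) for x in range(100, 400, 10)]
ys = [35.0 - 0.05 * x + rng.gauss(0.0, 2.0) for x in xs]

point_fit = fit_line(xs, ys)                           # the single "best" line
draws = bootstrap_fits(xs, ys, n_draws=20, rng=rng)    # the ensemble

# Each draw is one alternative line the data could plausibly have produced.
for intercept, slope in draws[:3]:
    print(f"y = {intercept:.2f} + {slope:.4f} * x")
```

Drawing all 20 lines in one static plot gives the "fitted draws" picture, including the gaps between lines that a viewer can misread as places the line can never go.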

And so there was this idea that was popularized very recently, the hypothetical outcome plot. There was this paper in PLOS ONE in 2015 by Jessica Hullman. The idea is to actually animate, to cycle through different potential fits, and that to some extent reduces this problem: because it's not static, it's more clear that these are alternatives, and you don't really exactly know what it would be, but it's something like that, right? So it's a much more intuitive way of thinking or of experiencing the uncertainty.

Okay, so last August Hadley tweeted, "Hypothetical outcome plots are a great way of communicating uncertainty to non-experts," and then he put forth this challenge: "Someone should make an rstats package to make these easier to create. Would be easy on top of gganimate." Famous three words. Okay, so I was thinking, I've done a lot of work with ggplot2 lately and I really like gganimate, and I thought, well, maybe I can do that, let's see. And so what I'm talking about today is really my discovery of what are the useful things that maybe a package could contribute in this world.

Okay, so there's really three questions that Hadley's challenge poses. So the first one is, how do we generate the outcomes? There are many different ways that they could be generated, so it's not trivial. Once we have them, how do we get them into ggplot2? I'm just assuming we do this with ggplot2 and gganimate, right? Though of course you could use some other platform also if you wanted to. And the last one is, is there anything to be done? And you'll see in a second why that is actually a meaningful question.

So is there anything to be done? Immediately after Hadley tweeted this there was the following response: "You can make them easily with tidybayes and gganimate. The HOP examples in this talk were done that way. I'm adding examples to the tidybayes vignettes when gganimate hits CRAN." Okay, done. Problem solved. Well, okay, so this goes to how do we generate outcomes, right? So we can do Bayesian MCMC sampling. That's what tidybayes does, and that's great, but not everybody is a Bayesian, right? Or maybe in some models it just takes too long to do MCMC sampling, or whatever the reason is, you may want to do other things. So maybe you want to bootstrap, right? Resample the input data. Or maybe you want to fit a regular regression model and then just sample from the normal approximation to the uncertainty distributions that you get, right?
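The last option, fitting once and sampling from the normal approximation, can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: it is written in Python rather than R, uses made-up data, and fits a straight line with the predictor centered so that the intercept and slope estimates are uncorrelated and can be sampled independently. The names `fit_centered_line` and `normal_draws` are hypothetical.

```python
import math
import random
import statistics


def fit_centered_line(xs, ys):
    """OLS fit of y = a + b * (x - mean_x), with standard errors.

    Centering x makes the estimates of a and b uncorrelated, so they can
    be sampled independently in normal_draws().
    """
    n = len(xs)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    residuals = [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    return {
        "mean_x": mean_x,
        "a": mean_y, "se_a": math.sqrt(sigma2 / n),
        "b": slope, "se_b": math.sqrt(sigma2 / sxx),
    }


def normal_draws(fit, n_draws, rng):
    """Sample (a, b) pairs from the normal approximation around the fit."""
    return [
        (rng.gauss(fit["a"], fit["se_a"]), rng.gauss(fit["b"], fit["se_b"]))
        for _ in range(n_draws)
    ]


# Made-up data: a noisy downward trend (NOT the car data from the talk).
rng = random.Random(2019)
xs = [float(x) for x in range(100, 400, 10)]
ys = [35.0 - 0.05 * x + rng.gauss(0.0, 2.0) for x in xs]

fit = fit_centered_line(xs, ys)                 # the model is fitted once
draws = normal_draws(fit, n_draws=20, rng=rng)  # outcomes are cheap samples

# Predicted y at x = 250 under each hypothetical outcome.
predictions = [a + b * (250.0 - fit["mean_x"]) for a, b in draws]
print(min(predictions), max(predictions))
```

Unlike bootstrapping, this approach never refits the model, which is why it can be attractive when fitting is slow. The price is that it relies on the sampling distribution of the estimates being close to normal.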

So yeah, there's two research papers as far as I'm aware, so it's really active research. In the two papers that have been published, in direct comparisons where people had to judge how uncertain something is from a HOP versus some other visualization of uncertainty, generally people get a better sense of the uncertainty from the HOP, and people are better at actually judging accurately what the chance, for example, is.

Can you have confidence bands with animations, the standard error around the line? So for instance, in the top right graph, you had all the possible lines. Can you also add the confidence bands to that?

Yeah, I mean, it's just a plot, right? So you just layer them on top of each other. With the confidence band there's a little subtle problem, in the sense that if you use geom_smooth to draw the confidence band, it uses slightly different math from my smoothing stat, so you have to do a little more work. But it's like three additional lines, and it's shown in the vignettes that I wrote for the package. I mean, yeah, you just layer plot layers on top of each other and you show whatever you want to show.

Thanks, really cool talk. Just wondering, has anybody asked you about maybe incorporating p-values into this, you know, bootstrapping visualization? And if somebody were to ask you about it, how would you respond?

So in my day job I'm a scientist, I'm a biologist. I've seen lots of figures in my life. I've never seen a p-value visualization that I thought was useful or credible, so I really don't know. I mean, that doesn't mean it's not possible, it's just I've never seen it. I don't know how to visualize p-values in a way that is useful.

Claus Wilke | Visualizing uncertainty with hypothetical outcomes plots | RStudio (2019)
