Insights in 5-D! (Using magic small-multiples layouts) - posit::conf(2023)

Presented by Matt Dzugan

Using Small-Multiples (faceted graphs) is an effective way to compare patterns across many dimensions. In this talk, I'll walk you through some ways to lay out your individual facets according to the underlying data. For example, maybe each facet represents a city or point on a 2D plane - we'll explore ways to organize facets in a grid that mimics the data itself - unlocking your ability to explore patterns in 4+ dimensions. Other solutions to this problem rely on manually-curated lists that map common layouts to a grid, but in this talk, we'll explore solutions that work on EVERYTHING. I'll show you how to incorporate this technique into your viz and how I built the libraries since there are some interesting data science concepts at play.

Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.

Talk Track: Lightning talks. Session Code: TALK-1174

Transcript

This transcript was generated automatically and may contain errors.

Cool. Thank you, everybody. I am here to present about Insights in 5-D. We have a little bit of a wizard and magic theme, just like the magic that got the slides onto the screen.

So I'm going to talk through small-multiple layouts. My name is Matt Dzugan. I've never worn one of those hats, but I did on this slide. I am the director of data at a PR software company called Muck Rack, and I live not far from right here.

Introducing the layout problem

Before we get started, or kind of as an intro, we'll start off with a question. I want you guys to look at this visualization. I did not create it. This is a great visualization from Twitter. I want you to tell me what you learn from this graphic. I realize it's a little small, but those are each from the early days of COVID. They're different U.S. states, and you can see how they're progressing, I think, in the first three or four months of the pandemic. So you can think to yourself, what sort of things do you learn from this plot? Look at the different states.

But then if I apply some of my wizard magic and just rearrange where the states are, you might learn something different, or you might find it a little easier to walk away with some insights from this. And let's quickly get to the clickbait title, 5-D. We have time on one axis. We have number of cases on another axis. Then latitude and longitude are sort of at play in here as another couple of dimensions. There's also color. So I promise we are in 5-D here a little bit.

Now, this is actually using a package called geofacet, which some of you may be familiar with. But I want you to walk away with one thing. The layout of those little facets, it matters. It helps make it easier for your audience to walk away with some insight.

So we're going to be talking about these layouts. But before we get into it, I want to do a quick recap on facets and small multiples in R and how we interact with these. So let's just revisit our favorite iris data set. Here's a simple scatterplot of sepal length versus sepal width. I don't know what those are, but we all know the data well. I can use what's called facet_wrap, which will give me one little panel, one facet, one little graph for each of the iris species. Those are also three words that I've never said out loud in my life, and today will not be the first day that I do that.

Then there's also facet_grid, which is a little different. It relies on categorical data. In this particular case, I added a column called petal, what do I call it here? Like the petal width class, and it splits the data into cases where the petal is wide or thin. In order to use facet_grid, you have to have categorical features like this.
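As a rough illustration of why a grid of facets needs categorical columns, here is a minimal Python sketch (not the R code from the slides). It turns a numeric petal width into a two-level class and groups the records into one cell per (class, species) pair. The median split, the `grid_cells` name, and the sample values are all assumptions made for the example; the talk does not say how the wide/thin class was derived.

```python
from collections import defaultdict
from statistics import median


def grid_cells(records, cut=None):
    """Group (species, petal_width) records into grid cells.

    Returns {(width_class, species): [widths]}. The cut point defaults to
    the median width, which is an assumption made for this sketch.
    """
    if cut is None:
        cut = median(width for _, width in records)
    cells = defaultdict(list)
    for species, width in records:
        width_class = "wide" if width > cut else "thin"
        cells[(width_class, species)].append(width)
    return dict(cells)


# Hypothetical sample values, not the real iris measurements.
records = [
    ("setosa", 0.2), ("setosa", 0.4),
    ("versicolor", 1.3),
    ("virginica", 1.8), ("virginica", 2.3),
]
cells = grid_cells(records)
```

Each key of `cells` corresponds to one panel: the first element picks the row and the second picks the column.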

Now, what I want to do is use a little bit more of an exciting data set, something a little bit geographical, like the US COVID case we were looking at earlier. Let's look at some US election data. So what I've done is I've taken all of the counties in the state of California, and you don't need to read this whole thing, it's not a course on how to make ggplots, but what you should see is that we're doing facet_wrap on the county name, and just like with the iris data set, each county is arranged alphabetically. Now, it might be a little tricky to take away any insights. The alphabetical sorting of the counties is not really related to anything we know of in real life. We don't see the counties in alphabetical order.
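The alphabetical arrangement described here can be sketched in a few lines of Python. This is only an illustration of the idea (sort the panel names, then fill the grid row by row), not ggplot2's actual code; the `wrap_layout` helper and the fixed `ncol` are assumptions for the example.

```python
def wrap_layout(names, ncol):
    """Return {name: (row, col)} with panels sorted alphabetically
    and placed left to right, top to bottom."""
    ordered = sorted(names)
    return {name: divmod(i, ncol) for i, name in enumerate(ordered)}


# A handful of California counties, given in no particular order.
counties = ["Sierra", "Alameda", "San Francisco", "Kern", "Butte"]
layout = wrap_layout(counties, ncol=3)
```

The grid position depends only on the spelling of the name, which is why neighboring panels have no geographic relationship to each other.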

So this is where the magic of facet warp comes in. If I want to warp them, I add these two parameters, the macro X and the macro Y, to sort them in the big picture by longitude and latitude. What you can see now is that the little facets are arranged like a little mini map of California. If you really squint, you can sort of see the shape of California. It's also no coincidence to the political enthusiasts that the blue ones are along the West Coast, and some of the more red ones are a little bit further inland. These are just the geodemographic patterns of California.

Now, unlike the geofacet package, I didn't have a pre-canned layout of these counties. This was computed magically on the fly, given longitude and latitude. So since these are just numerical columns, I can actually do this with other data. What if I wanted to do it by population density or median age? Think of plotting each county in California by its population density and its median age. I can take that layout and snap it to the grid. See how San Francisco is on the right and Sierra is up at the top left? That's exactly what this is doing.
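One plausible way to "snap" arbitrary numeric coordinates to a grid is to treat it as an assignment problem: rescale the points to grid units, then give each point its own cell so that the total squared distance moved is as small as possible. The Python sketch below does this with a textbook Hungarian method. It is an assumption-laden illustration, not the package's implementation; the function names, the default grid shape, and the choice of solver are all invented for the example.

```python
from math import ceil, sqrt


def solve_assignment(cost):
    """Hungarian method for a cost matrix with rows <= columns.
    Returns a list where entry i is the column assigned to row i."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j]: 1-based row matched to column j (0 = free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            result[p[j] - 1] = j - 1
    return result


def snap_to_grid(points, ncol=None):
    """points: {name: (x, y)}. Returns {name: (row, col)}, row 0 at the top,
    so larger y values land nearer the top, as on a map."""
    names = list(points)
    n = len(names)
    ncol = ncol or ceil(sqrt(n))
    nrow = ceil(n / ncol)
    xs = [points[k][0] for k in names]
    ys = [points[k][1] for k in names]

    def scale(value, lo, hi, steps):
        # Map [lo, hi] linearly onto [0, steps - 1].
        return 0.0 if hi == lo else (value - lo) / (hi - lo) * (steps - 1)

    cells = [(r, c) for r in range(nrow) for c in range(ncol)]
    cost = []
    for k in names:
        gx = scale(points[k][0], min(xs), max(xs), ncol)
        gy = (nrow - 1) - scale(points[k][1], min(ys), max(ys), nrow)
        cost.append([(gx - c) ** 2 + (gy - r) ** 2 for r, c in cells])
    chosen = solve_assignment(cost)
    return {names[i]: cells[j] for i, j in enumerate(chosen)}


# Toy coordinates: four corners of a unit square.
corners = {"NW": (0, 1), "NE": (1, 1), "SW": (0, 0), "SE": (1, 0)}
grid = snap_to_grid(corners, ncol=2)
```

Because every point must get a distinct cell, points that sit close together are pushed into neighboring cells instead of overlapping, while the overall left/right and top/bottom ordering is preserved as far as possible.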

So this can be a fun way to lay things out. It's maybe not always great for publications because some of the text can be a little small, but I like to use it for exploring data, to help me learn about what's going on in the data. Here's another giant one of all the 'L' stops here in Chicago. I wanted to do a thematic one. For those who do know the 'L' layout, you would remember that the O'Hare stop is kind of way far away from all the other stops. You can see these patterns spatially, which is pretty cool.

There's a paper here that explains that algorithm, how the snapping happens. I won't go into the details, but it's from 1987. This is all culminating in the fact that I actually just released this package, and it's called Facet Warp. So you guys can all look at it. It's a pun on facet wrap, but in this case, facet warp. Thank you.