John Burn-Murdoch | Reporting on and visualising the pandemic

Transcript

This transcript was generated automatically and may contain errors.

Hello everyone, thanks for having me today. It's a real honour to be able to speak to you all. My name's John Burn-Murdoch. I'm going to be talking for the next 40 minutes or so about the lessons that I've learned from visualising the COVID-19 pandemic over, well, I guess it's now about 12 months.

So to set the scene, what I want to do is start off by talking about the perspective I'm coming from in this talk. So data visualisation is obviously an enormously broad church. There's all sorts of different uses for dataviz in different settings. But what I'm talking about today specifically is data visualisation for a mass audience. Now, as a visual journalist, I'm always dealing with a large audience to some extent. The Financial Times is behind a paywall, so it's perhaps not the largest audience of all. But what's really changed dramatically over the last year working on the pandemic is that that audience has become enormous. It's become truly global.

It was quite obvious from as early as the first couple of weeks of March 2020 that my charts were suddenly being seen by an audience of millions across the world, instead of just the tens of thousands or hundreds of thousands who would typically see them on the Financial Times website.

So the principles I'm going to be talking about today and the best practices are all geared towards communicating to an audience that is hugely diverse in terms of its familiarity with charts. I'm not saying that what I recommend today is going to apply if you're producing a visual installation for an art gallery or a sort of coffee table style piece of work. I'm really just thinking about data visualisation as fundamentally an act of communication, and particularly of mass communication.

Understanding how people consume charts

So to start with something that I suspect reads as incredibly obvious, but I think is actually relatively little discussed in the dataviz world: I'm going to talk about the fact that to create really effective, really powerful charts, you really need to understand how people consume charts. Now to talk a little bit about why that's not as obvious as you might think, if you look online at the number of tutorials there are for making graphics, making dataviz, whether it's in R, Python, D3, you name it, generally they're focused on the technical element of constructing a chart. What chart type to choose, how to get your geometries, your shape, space and colour exactly right, how to do things more efficiently, optimising interactivity, that kind of thing. There's relatively little on thinking about what more qualitative elements you can optimise to ensure that people understand, enjoy and remember your charts.

This is where I perhaps get almost a bit too corporate or business focused in my view of this, but when you're making charts for a mass audience, it serves you well I think to think about them almost as if you're creating an advert. There is decades of research from people who create billboard ads for example on what you should do in terms of messaging, in terms of how you use text, in terms of how you use colour and all of that to elicit a specific response in your audience. I think it's that communication way of thinking that is relatively underrepresented at the moment in best practice guidelines for charts.

So as I say, when I've been creating charts that I'm hoping and latterly expecting are going to be seen by an audience of millions, I've really had to focus on first of all how do I ensure they're clear enough to reach and be easily understood by millions of people and then how do I make sure that they stick in their memory and they're not just post 79 out of 100 that ping on their Twitter feed every day.

The Borkin research: where eyes go on a chart

So without any further ado and I think quite fittingly given that I'm talking about lessons I learned from the pandemic, it's time to look at the science. So I'm going to show you here some work that is not my own work, but it's probably the single most influential data viz research paper that I've ever come across and it's one that I and our wider team at the FT use very frequently and constantly have in our mind when we're thinking about how to optimise our work for understanding by a mass audience. And that is this paper from a team led by Michelle Borkin in the US on what it is, what are the characteristics of visualisations that lead people to recognise the story of that graphic and then to recall it later on.

So the slide I'm going to show you in a second is looking at an eye tracking experiment from this paper. I'm going to show you a GIF, so it's going to loop. If you don't see it first time, that's fine, it'll come again. What it's showing is where people's eyes actually focus when they're given a chart for a relatively short period of time, 10 seconds, 30 seconds or so, and asked what do you see here, what can you make out, what's the main takeaway. And so what you're going to see is a sort of simple illustration of a chart and then a blue dot on that chart showing where people look when they're presented with that chart.

And what you see is that blue dot moves straight up to the title of the chart, has a little explore of the title, then comes down and reads off the axis label to see what units the chart is showing and then reads across the chart content itself. The key point here is that that initial focus goes straight to the title and the researchers did additional follow-up analysis of the same group of participants and found that when people later on were asked to remember any information about a chart and just write a sort of free-form description of what they'd seen, words featured very prominently. The title, any annotations, labelling were all mentioned far more often when people remember the chart than the actual visual content of the plot itself.

And that is something we really sort of centre on in our work at the FT. Our view is very much that you can make a stunning chart, you can have all your geometry right, it can be technically executed perfectly but if the messaging, if the text is absent in the worst case scenario or too sterile, too dry, you're going to lose a hell of a lot of people.

And the way I like to think about that is that your text can help your chart be understandable, enjoyable and memorable for someone who is quote-unquote not a chart person. Those of us watching or participating in the conference today I'm sure consider ourselves chart people or data people to a greater or lesser extent and that means when we see a chart it's already a thing that we're familiar with and that we like and that we have a sort of way of understanding, whereas there are millions and millions and millions of people out there, a lot more non-chart people than chart people as it were, who need that helping hand.

The coronavirus trajectory tracker: origin story

So to give you an example from the pandemic of this way of thinking in action, I'm going to talk through the origin story, as it were, of our coronavirus trajectory tracker, probably the most high profile chart that we've put out over the last 12 months. And I'm going to start by showing how this came about, by showing you an email I got from one of my colleagues on, I think, March the 10th or 11th. She's one of our general news reporters who was covering the nascent pandemic back then, or certainly the nascent epidemic as far as the UK and Europe were concerned. And she was wondering if I had any data on daily cases in Italy, which of course at the time was the focus of all the world's attention. There were grim pictures coming out of there in the news reports of bodies in morgues and that kind of thing. And she's asking me, okay, do we have any data on daily cases for Italy, and can we compare that to the UK, perhaps other countries as well.

She hadn't managed to find that data anywhere else. And so I rustled up that same afternoon or evening or whatever it was, a quick version, sort of version 0.0 as it were of what became our high profile trajectory trackers. So you can see for those of you who remember the final version of this chart or who've encountered the final version of this chart, the geometry, what is shown, what is plotted here is essentially unchanged from the final version. We've got the number of days since a country first reached its 100th confirmed case going horizontally, and then you've got a log scale, a logarithmic y-axis of the total number, cumulative number of confirmed cases up to that point.
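As an editorial illustration of the horizontal axis described above, here is a minimal sketch of re-indexing a country's cumulative case series so that day 0 is the first day it reached 100 confirmed cases. The function name `align_to_threshold` and the sample figures are assumptions made up for this example. They are not the FT's code or real case counts.

```python
from datetime import date


def align_to_threshold(series, threshold=100):
    """Return (days_since_threshold, cumulative_cases) pairs.

    `series` is a list of (date, cumulative_cases) tuples sorted by date.
    Days before the threshold is first reached are dropped.
    """
    # First date on which the cumulative count reaches the threshold.
    start = next((d for d, cases in series if cases >= threshold), None)
    if start is None:
        return []  # the threshold has not been reached yet
    return [((d - start).days, cases) for d, cases in series if d >= start]


# Made-up numbers for illustration only, not real case counts.
example = [
    (date(2020, 3, 1), 40),
    (date(2020, 3, 2), 75),
    (date(2020, 3, 3), 120),
    (date(2020, 3, 4), 190),
    (date(2020, 3, 5), 310),
]
aligned = align_to_threshold(example)
```

Once each country's series is aligned this way, the countries share a common x-axis and can be plotted against a logarithmic y-axis for comparison.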

But obviously, you know, if I published that, I doubt it would have cut through to a huge audience the way it ultimately did. There's nothing here really telling you what's going on. There's a lot of assumed knowledge. Obviously, I sent that to my colleague in the context of this email chain. We both knew what we were talking about and looking at. But if I just put this, if we publish this on the FT, on social media, you know, a few people, again, chart people or COVID data people would have thought, oh, right, interesting. But it wouldn't have had any sort of mass resonance.

So here's the next version of this when it's been styled as we style all our charts on the Financial Times. So we've got that trademark sort of salmon pink background. I'm now using the colours from our main colour palette, again, doing some sort of semantic use of colour there. There's a bit more text on the chart now. We've got both axes labelled. I've got that source and footnote at the bottom as well. But again, sure, this, in terms of aesthetics, in terms of design, this is now neater. This is a better fit for what we would publish on the FT. But it's still missing those crucial ingredients.

So here's the final version that was published. I think this was on March the 12th, so still about two weeks before the UK, for example, actually went into lockdown. And now we've got all of that additional labelling added. Again, going back to that research, these are the key bits of information that are going to be the first thing that someone encountering this chart focuses on. And they're going to set the scene for everything else that they take away from it. So we've got a title, a very active title, descriptive narrative title, that most Western countries were on the same coronavirus trajectories. Hong Kong and Singapore, by contrast, have managed to slow the spread. So straight away, you've got a message which grabs you, which brings you in, and which tells you something that is happening in the world.

The fact that there's a chart here as well is almost incidental. But the point is, whether you are someone who builds and works with dataviz multiple times a day, or someone who last really spent any time with charts when you were aged 16 and doing some compulsory maths or science education, whichever of those groups you fall into, the words here tell you what's going on. And so if you are someone who doesn't consider themselves a chart person, you've got your entry point. You can now understand the geometry of the chart. You can work out what's going on with the lines, with the shape, with the space, with the colour, because you know what message it's trying to tell.

The other thing we've obviously got here is these annotations towards the right-hand side, actually explaining some of the stuff you can see on the chart. So this is more geared towards someone who has not been following the pandemic and COVID data in a huge amount of detail. They immediately have the queries that they would have had of this chart answered. So that text, when we looked at what people were talking about around the context of this graphic on social media and on the Financial Times website, that text was the key part that really seemed to cut through and cause people to keep coming back to this chart and indeed to share this chart with other people.

So the first lesson of the pandemic for me, I think, was that using text and other annotation is really critical for making sure a chart goes from being a chart for dataviz people, shall we say, to a chart that is truly accessible to a really mass audience of people who haven't been, they're not involved in charts and data every day. And suddenly, this is the sort of democratization of charts by using text to make them more accessible.

Smart vs. clear: the log scale debate

The next lesson, and this is for me one of the most fundamental lessons anyone can learn in the process of building dataviz, is that there's a big difference between making the smartest version of a chart you possibly can and the clearest version of a chart you possibly can. As with everything, there are obviously exceptions where you can do both of these things, but during the pandemic, I think there have been several examples of how these really are in tension with one another a lot of the time.

So sometimes, and this was more the case, I think, in the earlier part of my career, but it's still a recurring theme, and I think this is something that a lot of people here today will recognize as well. Sometimes we can get into the habit of thinking that effective data visualization is about optimizing for precision and objectivity. There's a sort of list of technical tasks, and if you nail every one of them, you'll make the perfect chart. You're using maths and geometry to arrive at the best solution. It's a very sort of technical and structured process. But then I actually publish a chart, and it suddenly becomes very clear that that is simply not the case. That is a convenient but very rarely true description of what we should be aiming for when we make a data visualization.

So the first difference, and I'm just going to flick back between these two charts. This is the one that I've shown you already. This was the data plotted on a logarithmic y-axis. The point of this was to say, well, we know that the pandemic, especially in the early stage, spreads exponentially. So the shape of the curves on this chart, if we plotted it on a linear axis, on a linear scale, they would all be arcing upwards from gentle slopes to steep slopes. And to me, you're using a lot of visual bandwidth there just to show something that we assume is the default, that all of these curves are going to be getting steeper and steeper in these early weeks.

So by going for a log scale, you sort of free up more bandwidth, as it were, in the reader's head, and you allow them to focus on what matters, which is you can now do the key thing that readers at this stage wanted to do, which is compare the trajectories of different countries on a straight line. You can look at this and say, OK, the virus seems to be spreading more quickly in Spain than in France, and France, in turn, more quickly than the UK. You can also, because it's a straight line, you can forecast out ahead and work out where the rate of cases in your country is going to be a few days or a week further down the line. You can answer that critical question that we set out to answer here, if we refer back to my colleague's email, which is how are countries doing compared to Italy?
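To make the straight-line point concrete, here is a minimal sketch assuming cases grow at a constant exponential rate. In that case log10(cases) is a linear function of the day number, so a straight-line fit gives a slope that can be read as a doubling time and extended forward. The helper names and the sample series are made up for illustration. This is not the method behind the FT chart, and a naive extrapolation like this is not a real forecast.

```python
import math


def fit_log_line(days, cases):
    """Least-squares fit of log10(cases) = intercept + slope * day."""
    logs = [math.log10(c) for c in cases]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def doubling_time(slope):
    """Days needed for cases to double, given the slope in log10 units per day."""
    return math.log10(2) / slope


def project(slope, intercept, day):
    """Extend the fitted straight line to a later day and undo the log."""
    return 10 ** (intercept + slope * day)


# Made-up series that starts at 100 cases and doubles every 3 days.
days = list(range(10))
cases = [100 * 2 ** (d / 3) for d in days]
slope, intercept = fit_log_line(days, cases)
```

With this constructed series, `doubling_time(slope)` comes out at about 3 days, and `project(slope, intercept, 12)` extends the line to roughly 1,600 cases. Two countries with different slopes on the log chart are doubling at different speeds, which is the comparison the straight lines make easy to read.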

So my thinking here was all framed around this idea that when someone's looking at this chart, the framework they're operating in, the thing that determines their takeaway message, is about questions like: are these two or three countries on the same course, or how many days until the country I'm interested in is at a certain level of cases that I know means something? And therefore they're not thinking, well, how many pixels represent 100 cases? They're not worried about what's going on on the y-axis. They're looking for more of a sort of overarching message.

However, let's have a look at some of the feedback that started pouring in on that. And I'll say just as a quick aside here, for anyone who isn't in the habit of publishing their work on social media, and I know it's not for everyone, and there are certainly some sub-optimal features of Twitter, shall we say, but it's an incredible way to work out what does and doesn't work in a chart, and to gauge the reaction of both chart people and non-chart people very quickly. So I was inundated with messages responding to these charts for week after week after week, and continue to be this year.

But this was a snapshot of some of the initial feedback to that chart. Now, obviously, loads of people understood the log axis fine, but any critical feedback is always worth paying attention to. And the fact that the majority of people got it doesn't negate the fact that plenty of people didn't. So lots of people here, very confused by what's going on in the y-axis. Some of them perhaps know what a log scale is, but just thought it wasn't clearly explained. Others clearly are just baffled, and that's not a reaction you want with your chart.

However, my response to that was not to just think, oh, right, okay, back to the drawing board, start again, it's got to be linear, people aren't getting this. My response was to think, okay, I've got a, I'm in an advantageous position here that I'm on Twitter where I'm presenting these charts, and this is a space where I can continue these conversations. So what I did was I immediately posted a long explanation, or not too long, but a relatively thorough explanation of exactly why logarithmic y-axes were well suited to visualizing data that grows exponentially. That got a very good response, and we followed up on that at the FT by actually making a whole video explainer which talked about all of the decisions we were making when we made these charts and how to understand them. And this seemed to work really, really well in terms of softening some of that initial confusion that some people had to the chart.

And my favorite thing about this is that what you get when you do this kind of work, what you'll see when you explain things clearly and have them in the public domain, whether it's Twitter, Facebook, whatever, is your audience will start helping you explain this stuff to other people.

So what we started to see quite soon was these are other people jumping into the replies to me explaining to the people who've been confused, look, this is a log scale, this is what they do, don't worry about it, and actually pointing people to the video where I'd explain this. So instead of thinking, well, you know, I've just published the chart, I'm done, I'm going to walk away, by sort of staying in that conversation around the chart, I was able to ensure that as few people as possible actually remained confused, and to really try and improve understanding of log scales full stop.

Just as a quick aside, something I find quite amusing when I look back at these messages is that you'll see on the left-hand side there, two of the people who'd been confused by the log scale, well, one of them, their account is now suspended, and the other one, their account no longer exists. So I like to think of that as being a little example that people who don't understand log scales have other failings in life, shall we say, that lead them to be kicked off Twitter. Anyway, back to the important stuff.

The use of animation was enormously pivotal to that in terms of how it grabs attention more than a static chart, in terms of how it builds suspense, and in terms of how you get that shock, that surprise factor at the end when you compare the two.

So the key point there being, where it's suitable, animation is an incredibly powerful technique to use to convey a message. Now, I think it should be used sparingly. If you suddenly make everything animated, it's like the old days of 90s internet where you've got pop-ups everywhere. Everything's too noisy, it's hard to actually get across your key point and everything gets drowned out. But using animation sparingly, I think, is an incredibly powerful technique.