Resources

Matt Thomas & Mike Page | How the Tidyverse helped the British Red Cross respond to COVID | RStudio

Full title: Cognitive speed: How the Tidyverse helped the British Red Cross respond quickly to COVID-19

We will discuss the importance of cognitive speed, defined here as the rate at which an idea can be translated into code, and why the Tidyverse excels in this domain. We will demonstrate this idea in relation to a suite of tools we were required to rapidly develop at the British Red Cross in order to respond effectively to the COVID-19 pandemic. To do this, we will exhibit how elements of the unifying design principles outlined in the ‘tidyverse design guide - Tidyverse team’ relate to the notion of cognitive speed, giving specific examples for various design considerations. We believe this talk will encourage reflection on better design practices for future R developers, using the design principles of the tidyverse as the guiding beacon.

About Matt: Dr. Matt Thomas is Head of Strategic Insight and Foresight at the British Red Cross. Matt's team aims to help the Red Cross become more anticipatory and proactive by producing insights and tools including the Vulnerability Index (https://britishredcrosssociety.github.io/covid-19-vulnerability/) and Resilience Index (https://britishredcross.shinyapps.io/resilience-index/). He holds a PhD in Evolutionary Anthropology and, prior to joining the British Red Cross, was researching topics including reindeer herders in the Arctic, hunter-gatherers in the Philippines, and witches in China. Outside of work, Matt writes a column for an anthropology magazine (https://www.sapiens.org/column/machinations/) as well as fiction.

About Mike: Mike Page is a data scientist on the Strategic Insight and Foresight team at the British Red Cross. Here, he helps to develop a suite of open source tools and dashboards including the Vulnerability Index (https://britishredcrosssociety.github.io/covid-19-vulnerability/) and Resilience Index (https://britishredcross.shinyapps.io/resilience-index/). Mike is also the author of several R packages including mortyr and newsrivr. In his spare time you can find him rock climbing around the Alps.

Transcript

This transcript was generated automatically and may contain errors.

Hi, my name's Matt and I'm Head of Foresight and Insight at the British Red Cross.

Cast your mind back to March of 2020, what now feels like several lifetimes ago. The COVID-19 pandemic was starting to grip the world and the UK, where I am, was entering its first national lockdown. The British Red Cross was in the midst of planning our emergency response. We were gearing up to deliver food parcels and deliver medicines, support people who were lonely or socially isolated, and help people who had been moved into emergency accommodation.

But how did we know who needed support and where we might find them?

The day before lockdown began, I made our first Git commit to what would become known as the Vulnerability Index. My colleague Elle and I were rapidly designing a map to model vulnerability to COVID-19 in neighbourhoods across the UK. This map would be used to plan not only the British Red Cross's pandemic response, but would also inform other voluntary and community sector organisations, as well as government and public health bodies. It needed to be expansive, rigorous, and accurate, and we needed to build it quickly.

700 commits, more than 100 data indicators, and several new grey hairs later, this is where we are now. My team and I built a set of open source tools that measured the vulnerability of different geographical areas of the UK and their ability to cope. Today we have a suite of dashboards, maps, and datasets that anyone can use for free.

How did we get there? We had to figure out what makes people vulnerable to the disease. COVID-19 isn't just a public health emergency, it's also a social and economic emergency, and we needed to account for vulnerability in a multidimensional way. We gathered data on over 100 indicators related to people's clinical vulnerability and socioeconomic vulnerability, as well as their wider health and wellbeing needs, but we had to find some way to merge everything into a single vulnerability score, and we needed to do it fast.

So I picked up the phone and I called Mike.

Mike joins the project

Hey, I'm Mike, a data scientist at the British Red Cross. After agreeing to redeploy into Matt's team, I took a quick peek at the repositories he had created, excited to see what I might be working on. My reaction was, of course, panic.

What I thought was going to be a project in its infancy had already morphed into thousands of lines of code, split across hundreds of files, datasets, and repositories. Add to this the pressure of a pandemic and the need to develop rapidly, and I wondered what I had got myself into. Thankfully, I was soon to learn that these feelings of panic were unfounded. The project had a silent hero working in the background that allowed me to quickly get up to speed. This hero was the Tidyverse.

Why the Tidyverse?

So why the Tidyverse? Every data science problem presents a unique landscape which must be navigated. Here at the British Red Cross, we like to think of the Tidyverse as the cognitive superhighway through many of these landscapes. The signs are clear and easy to follow, and it's easy to show colleagues and your future self how to navigate back through the terrain.

But this is not to say the Tidyverse is always the correct path to take for every problem. Depending upon the task at hand, other tools and paradigms may be better suited, such as pandas and data.table. However, at the British Red Cross, we often have a requirement to develop and collaborate quickly, and this is where we believe the Tidyverse shines most brightly. It allows us to quickly translate our ideas into code.

This is because the unifying design principles of the Tidyverse are centred first and foremost around cognitive performance, not computational performance.

To demonstrate why this is important, I'm going to show you some example code snippets from the vulnerability index Matt introduced earlier. For each snippet, I'll demonstrate how the unifying principles of the Tidyverse make development quicker and easier, allowing us to respond rapidly to the emerging pandemic.

Human-centred and inclusive principles

First, let's take a look at the human-centred and inclusive principles. Together, they acknowledge that data scientists come in all shapes and sizes. This means the Tidyverse APIs and documentation are designed to be read and written by everyone, irrespective of their background. This means it's not just for computer science graduates whose idea of a fun weekend is arguing about whether arrays should be 0- or 1-indexed.

This quote from the Tidyverse design principles book sums up the human and inclusive principles nicely.

One particularly powerful idea is that of affordance: the exterior of a tool should suggest how to use it.

For example, take a look at this code snippet written by Matt that calculates a mean total response time from the fire and rescue authorities using a classic dplyr-style pipeline.

Without ever having seen the code, and with only a little bit of prior Tidyverse knowledge, it's quick to read and intuitive to understand what is going on. Take the fire and rescue authority stats, filter only for fires in English dwellings and create a new variable called year, select only the variables of interest, and finally, group by a geographical variable to find the average 3-year fire and rescue response time in a given area.

The code almost reads like English prose, listing off a set of clear instructions. There is no requirement to reason about things that computers are optimized to do, such as indexing and subsetting. The result is code that is quicker to write, read, and collaborate on.
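The snippet shown on screen is not reproduced in this transcript. As a rough illustration only, a pipeline with the shape described above might look like the following sketch. The data frame `fire_stats`, its column names, and all of its values are hypothetical stand-ins, not the real fire and rescue statistics or the code from the Vulnerability Index.

```r
library(dplyr)

# Hypothetical stand-in data: column names and values are invented for
# illustration and do not come from the real fire and rescue statistics.
fire_stats <- tibble::tibble(
  fra_code            = c("FRA_A", "FRA_A", "FRA_A", "FRA_A", "FRA_B", "FRA_B", "FRA_B", "FRA_C"),
  country             = c("England", "England", "England", "England", "England", "England", "England", "Wales"),
  incident_type       = c("Dwellings", "Dwellings", "Dwellings", "Other buildings", "Dwellings", "Dwellings", "Dwellings", "Dwellings"),
  financial_year      = c("2016/17", "2017/18", "2018/19", "2018/19", "2016/17", "2017/18", "2018/19", "2018/19"),
  total_response_time = c(7, 8, 9, 20, 6, 6.5, 7, 10)
)

mean_response <- fire_stats %>%
  # Keep only fires in English dwellings
  filter(country == "England", incident_type == "Dwellings") %>%
  # Create a numeric year from the start of the financial year
  mutate(year = as.integer(substr(financial_year, 1, 4))) %>%
  # Keep only the variables of interest
  select(fra_code, year, total_response_time) %>%
  # Average the response time over the three years for each area
  group_by(fra_code) %>%
  summarise(mean_response_time = mean(total_response_time))

print(mean_response)
```

Each verb maps onto one clause of the spoken description, which is the affordance the talk is pointing at: the function names themselves suggest what each step does.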

Consistency

Another important design principle of the Tidyverse is that of consistency. Principally, this idea means finding the smallest possible set of key ideas and using them again and again. This manifests itself in two ways in the Tidyverse, in data structures and in function APIs.

The code snippet I'm about to show demonstrates the power of this idea effectively. This code generates a data frame of weights across four domains of vulnerability where only two domains should be weighted at any given time, and the sum of the weights must equal one. The main power of this snippet lies in the first function, crossing, which generates all combinations of variables found in the dataset while de-duplicating and sorting the inputs.
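The snippet is not included in the transcript, so the following is only a sketch of how `crossing()` could be used for a task like this. It assumes that "only two domains should be weighted" means exactly two non-zero weights per row. The domain names (`domain_a` to `domain_d`) and the grid of candidate weights are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical grid of candidate weights; the real values are not given in the talk.
candidate_weights <- c(0, 0.25, 0.5, 0.75, 1)

weights <- crossing(
  # crossing() de-duplicates and sorts its inputs, then returns every combination
  domain_a = candidate_weights,
  domain_b = candidate_weights,
  domain_c = candidate_weights,
  domain_d = candidate_weights
) %>%
  mutate(
    total      = domain_a + domain_b + domain_c + domain_d,
    n_weighted = (domain_a > 0) + (domain_b > 0) + (domain_c > 0) + (domain_d > 0)
  ) %>%
  # Keep rows where exactly two domains carry weight and the weights sum to one
  filter(near(total, 1), n_weighted == 2) %>%
  select(-total, -n_weighted)

print(weights)
```

Every step takes a data frame and returns a data frame, so the `tidyr` call feeds straight into the `dplyr` verbs without any conversion in between.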

Here, the yellow arrows demonstrate the consistent tidy data frame structures used throughout this example. Using one common data structure over and over again reduced the cognitive load required to reason about the problem we were challenged with. One reason for this is that the consistent data structures mean the APIs of the Tidyverse communicate seamlessly.

For example, we can see here how the functions from tidyr and dplyr fit together like Lego blocks. This mostly removes the need to reason about the input and output structure of the data being passed throughout the Tidyverse, making it quick to learn new tidy functions and packages.

Consistency also extends to the naming of functions throughout the Tidyverse. In general, Tidyverse function names favour imperative verbs over nouns, long names over short names, and thematic unity between similar groups of functions. The end result is functions that are easier to remember and implement. For this particular code snippet, I recall a deep sense of joy as I intuitively knew the names of the functions I needed, despite having used several of them only a handful of times.

Composability

The final design principle of the Tidyverse is that it should be composable. This means that complex problems should be solved by combining many smaller pieces, with each smaller piece easy to reason about. This is realised in the Tidyverse through a predominantly functional programming paradigm that uses immutable objects. This means problems are solved by combining functions that transform data, rather than by creating objects whose state changes over time.

We saw the power of this idea previously, where a consistent immutable tidy data frame object was used throughout. This long code block demonstrates the idea of composability nicely. Try not to be drawn into understanding it immediately.

Instead, notice how many smaller pieces of the Tidyverse are used to solve an overall more complex problem. In some instances, some functions such as mutate are used over and over again. The end result is we are able to solve complex problems more quickly. In this particular case, it allowed us to assess the vulnerability of different regions of the UK, split across different vulnerabilities of the vulnerability index.
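The long code block is not reproduced in the transcript. A much smaller sketch of the same idea, combining small functions that each return a new data frame rather than modifying one in place, could look like this. The `indicators` data, the helper function names, and the scoring logic are all hypothetical and are not taken from the Vulnerability Index.

```r
library(dplyr)

# Hypothetical indicator scores for four areas in two regions.
indicators <- tibble::tibble(
  area   = c("A", "B", "C", "D"),
  region = c("North", "North", "South", "South"),
  score  = c(10, 20, 30, 40)
)

# Each helper is one small, easy-to-reason-about step:
# a data frame goes in and a new data frame comes out.
add_score_rank <- function(df) {
  mutate(df, score_rank = min_rank(desc(score)))
}

flag_most_vulnerable <- function(df, n_top = 1) {
  mutate(df, most_vulnerable = score_rank <= n_top)
}

summarise_by_region <- function(df) {
  df %>%
    group_by(region) %>%
    summarise(n_flagged = sum(most_vulnerable), mean_score = mean(score))
}

# The pieces compose into a pipeline; `indicators` itself is never modified.
result <- indicators %>%
  add_score_rank() %>%
  flag_most_vulnerable(n_top = 2) %>%
  summarise_by_region()

print(result)
```

Because no step changes `indicators`, any piece can be rerun, reordered, or tested on its own without worrying about hidden state.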

So let's turn back to Matt and find out how these design principles have impacted our response to the COVID-19 pandemic.

Real-world impact

Have you noticed how all the principles are linked together? The Tidyverse principles are rooted in cognitive theories of mind, and they make it easy for humans like me and Mike to pull insights out of datasets and make new tools for people.

One example of how we've used our vulnerability index to design better services for people is our COVID-19 Hardship Fund. Aviva, an insurance company, donated £5 million to us to give to people who are facing economic hardship during the pandemic. We used our vulnerability index, alongside other research, to figure out that the people most likely to need financial support included those without access to state welfare, older isolated people, people facing homelessness or living in temporary accommodation, and survivors of gender-based violence.

In one region of England, this mapping, combined with local intelligence from our frontline teams, led to more than half of referrals to our Hardship Fund coming from the most vulnerable areas. As of December 2020, we've helped more than 7,000 people with financial assistance.

So following the Tidyverse design principles has made it faster for Mike and me and others to get our ideas out of our heads and translated into something useful in the real world. This cognitive speed is really key for us, working as a humanitarian organisation. To put a spin on the well-known Agile phrase, we need to move fast and save things.

And we're not just working with one another: it's often said that your most important collaborator is yourself, six months down the line.

Even if you don't use the Tidyverse itself, we suggest using all four design principles as inspiration for future package and code design. When writing code and packages in the future, think about cognitive performance, not only computational performance. For our work on the Vulnerability Index, this meant that there hasn't been a single time that Mike or I had to explain our code to one another, which is remarkable. We should often be writing code that is human-centric and designed around how people actually think and work. The key is to build tools and analyses faster for real-world impact. Thank you for listening.