
Roche & Novartis: Effective Visualizations for Data Driven Decisions || Posit (2020)
Effective visual communication is a core task for all data scientists, including statisticians, epidemiologists, machine learning experts, bioinformaticians, and others. By using the right graphical principles, we can better understand data, highlight core insights and influence decisions toward appropriate actions. Without it, we can fool ourselves and others and pave the way to wrong conclusions and actions. While numerous solutions exist to analyze data, these often require many manual steps to convert them into visually convincing and meaningful reports. How do we put this into practice in an accurate, transparent and reproducible way? In this webinar we introduce an open collaborative effort, currently undertaken by Roche and Novartis, to develop solutions for effective visual communication with a focus on reporting medical and clinical data. The aim of the collaboration is to develop a user-friendly, fit-for-purpose, open source package that simplifies the use of good graphical principles for effective visual communication of typical analyses of interventional and observational data encountered in clinical drug development. We will introduce the initial visR package design, which integrates easily into a typical tidyverse workflow. The package provides guidance and meaningful default parameters covering all aspects from the design and implementation to the review of statistical graphics.

Webinar materials: https://posit.co/resources/videos/effective-visualizations-for-data-driven-decisions/

About Charlotta: Charlotta is a computational biologist by training and works as a data scientist in the Personalized Healthcare department at Roche, where she uses R to tap the wealth of information coming from healthcare data collected in real-world settings to support the development of new medicines.

About Diego: Diego is a data scientist specializing in applied machine learning at Roche Personalized Healthcare since March 2019. He has developed models to perform various tasks and analyze diverse data sources. Currently, his main applications of interest are in oncology and clinico-genomics.

About Mark: Mark is a methodologist supporting the clinical development and analytics department at Novartis. He focuses on data visualization, working on a number of internal and external initiatives to improve the reporting of clinical trials and observational studies.

About Marc: Marc is a biostatistics group head at Novartis. He is interested in advancing the methods and practice of clinical development, for instance through effective use of graphics.

https://graphicsprinciples.github.io/
Transcript
This transcript was generated automatically and may contain errors.
My name is Marc Vandemeulebroecke. I work at Novartis, and I just want to say a few motivating words on why we're doing what we're doing and what you can expect to hear today.
We start from the insight that graphics and visuals are such an important component of the work we do as quantitative scientists. They're so important for ourselves to gain insights into data, as well as for communicating results and conclusions to our stakeholders.
And if you've been following all those coronavirus charts out there for the past few weeks, like I did, you will agree that graphics are key to understanding what's going on.
Unfortunately, however, we are not always good at creating effective graphs. How often do we see or even create ineffective visualizations? And do we even know which choices or features make a graph effective for a particular purpose or audience?
This is our first focus, to convey this understanding. And this point is actually part of a wider initiative beyond today's webinar, and you will find a corresponding link at the end of these slides.
Secondly, once we do know what makes an effective graph, how easily can we actually implement it? That is, create it quickly and with minimal effort. This is our second focus, the development of an R package that has good graphical principles sort of built in, building upon many of the great packages out there and minimizing the code required for the user.
We quickly realized that these themes are of common interest across Roche, Novartis and probably many other companies and institutions. So we've started to team up and tackle them together. And one of the goals today is also to call for additional interested contributors.
Which leads me to today's agenda. After this introduction, my colleague Mark Baillie from Novartis will speak about the principles of effective visual communication generally. He will then hand over to Charlotta Fruchtenicht and Diego Saldana from Roche, who will focus more specifically, and a bit more technically, on the R package we've started to develop. The goal of this package, as I said earlier, is to make it really easy to implement those good principles in practice, specifically in a health care or drug development context, which is where we work.
All this is work in progress. And at the end of the webinar, we will call for further contributors and we will hopefully also have a few minutes for Q&A. So with that, Mark, do you want to start?
Principles of effective visual communication
Thank you, Marc. So effective data visualisation is a key skill for all quantitative scientists, including statisticians, epidemiologists or data scientists, whatever your current title is. Traditionally, this skill has not been a primary focus in the training and development of quantitative scientists. In graduate degrees, the focus is often on analytical and technical competencies. The skill of visual communication is often self-taught or developed later on in the job.
So effective visualisation is extremely important. It is a key skill that allows us to convey complex concepts and information to others. If we get this right, it helps support appropriate decisions or actions. If we get this wrong, it can lead to misinformation, confusion or even harm, especially in clinical and medical research.
The visualisation on the right is a good recent example of this covering the current coronavirus pandemic. The visualisation was recently published in The New York Times. It displays the concept of social distancing and why it is important. It is important because it helps break the transmission chain. The graphic also illustrates the concept of exponential growth. This is something we've heard a lot of recently. Here we see an infected individual who goes on to infect two others. They in turn infect two others until we begin to see many individuals infected with the virus, all stemming from a single person. By practising social distancing, we may break this chain, therefore reducing the number of subsequent infections, which could be substantial, as displayed in the bottom graphic.
We are not always good at it. On the right is an example from a clinical study report. Waterfall plots are commonly used to report treatment response in oncology clinical trials. Aesthetically, this plot doesn't look good, from the choice of patterns to the general look and feel. For me, it feels like it was produced by a typewriter rather than modern software. There's a lot not to like about this plot.
But as I said earlier, if we get visual communication wrong, it can lead to the wrong conclusions, and in medical and clinical research this can often result in harm. So apart from the aesthetic limitations, the real failing of this graph is not the look and feel, but that it expects the reader to make calculations mentally to compare treatment effects across multiple doses. Waterfall plots are often used to support decision making, especially in critical indications such as those found in oncology. So asking the reader to make those calculations mentally isn't an effective way to communicate.
So what do we mean by effective? We don't necessarily mean beautiful.
So to illustrate, here is an example of a visually appealing graph from David McCandless. He is a data artist and journalist from the UK. The visualisation is called Mountains out of Molehills, a timeline of global media scare stories. The visualisation attempts to investigate the number of news story reports and whether there is a sense of overreporting that could cause unnecessary panic. It covers earlier pandemics such as SARS, bird flu and swine flu, but also video games and killer wasps. Visually, it is appealing and draws you in, but I would argue it's not effective, as key information is hidden away in the legend.
So if you look at the legend, there are numbers in brackets that provide the deaths associated with each story. It's very difficult to see, I can imagine. This information is important to determine if there really is overreporting, but it is not factored into the main body of the graph. Also, the subtitle uses language such as "scare stories". There are comparisons to annual seasonal flu deaths and also to topics such as video gaming. All of these design choices indirectly influence the reader, which can then influence decision making.
So beautiful and effective, that's what we really want. Here's an example of a recent effective visualization published by The Economist. The visualization captures a phrase that we hear a lot of now: flatten the curve. The graph summarizes this concept really well. We have two distributions over time which represent the number of infections in the population. Each distribution represents a strategy: strategy one for not taking preventative measures such as social distancing, and strategy two that does. You can see that strategy two reduces the peak of the distribution through social distancing. This strategy extends the epidemic over a longer period of time, but reduces the peak number of infections. This visualization is especially effective in introducing the concept of flattening the curve.
But can we be even more effective? So here's another version of the same graph. Aesthetically, it's not as appealing as the previous one. But by adding a dotted line and a simple annotation, "health care system capacity", this visualization now highlights the why. Why do we want to flatten the infection curve and spread an epidemic over a longer period of time? The purpose of flattening the curve is not to overwhelm the health care system. So the dotted line illustrates that the first strategy would result in more infections than the health care system could cope with. This, sadly, can have terrible consequences, as we are currently witnessing.
So both graphs are very effective ways to convey complex concepts to the population and get messages out there. And I just want to highlight that both of these slides were influenced by a tweet from Carl Bergstrom. The link is there if you want to follow up on it. So effective visualization is effective visual communication. Effective graphs are visually appealing, intuitive and legible. They use the correct graph type and axis scales. They use proximity and alignment to facilitate comparisons. They use labels and annotations to add clarity to the message, as I illustrated before. And more importantly, effective visualizations enable clear and impactful communication. They elevate our influence with our stakeholders and facilitate informed decision making.
The three laws of effective visual communication
This is one of the main motivations behind the collaboration that we are presenting today and the initial design of the visR package. But how do we become more effective? Myself and colleagues at Novartis have been thinking about this topic for a number of years. We've tried a number of approaches to address this question, from reading all the literature available, defining best practices, developing galleries of code and newsletters to highlight the importance of the topic, to handing out cheat sheets. And we've had various levels of success. But to make this concept stick and easier to remember, we recently defined what we call the three laws of effective visual communication: have a clear purpose, show the data clearly and make the message obvious.
Let me expand on each of these three laws. Have a clear purpose: this is really about why we need this graph. We should really know the purpose of creating the graph, and this also relates to the data analysis itself. We should identify the quantitative evidence to support the purpose, and also identify the audience and focus the design to support their needs. Show the data clearly: this is about avoiding misrepresentation, choosing the appropriate graph type to display the data, and maximizing the data-to-ink ratio, which is really just a complex way of saying keep it simple. Finally, make the message obvious: use proximity and alignment to aid comparisons and minimize mental arithmetic. This goes back to the waterfall plot, where you really don't want the reader doing summations in their head. And use colors and annotations to highlight important details.
So I'll now go through each of these three laws in more detail. Law one: have a clear purpose. This first law is really advanced common sense. It follows typical planning questions such as why, what, who and where. Why do I need a graph and what is its purpose? Is it to explore data or to deliver a message? What evidence is available to support this purpose? This could be data sets, new experiments or simulations, for example. Who is my intended audience? Is it a mix of specialists and non-specialists, or both? You really need to think about who this audience is and support their needs. And the where: where will I publish, and what are the constraints there?
So for the where, we may identify certain formats such as a Word document, a PowerPoint presentation, a Markdown notebook or a Shiny app. The design, the purpose and the principles we have discussed are important for all of these different formats. But depending on the format, some of these principles may apply differently and the design choices may vary.
Another aspect of this law is to identify the underlying questions you're trying to address. Here I've highlighted an article by Roger Peng that discusses his thoughts on John Tukey, exploratory data analysis and design thinking. He argues that better analyses are related to the quality of the question, so more time spent on defining better questions is time well spent. This relates to the famous Tukey quote: an approximate answer to the right question is worth a great deal more than a precise answer to the wrong question. So the message here is to spend more time thinking about your question prior to an analysis or designing a visualisation.
Another aspect of this law is why you are creating the visualisation. Thinking about this can help identify the audience and their needs, which in turn helps with the design. For example, if you want to dig into data, get familiar with the data or find the story in your data, then the audience is normally you. If you want to communicate the results or tell the story behind the data, then the audience is someone else. So being clear on the purpose is important here. For example, if you're designing a visualisation to be presented to a group of stakeholders where you have three minutes to get your recommendations across, do you really want the audience to play Where's Wally? Or, in America, Where's Waldo?
Law two: show the data clearly. The key part of this law is not to lie or misrepresent the data. Try to be open and transparent with your analysis and the reporting of it. Here we have an example from YouGov on an important topic: pizza toppings. The visualisation displays a pizza pie chart. It is confusing, as the numbers add up to way beyond 100%. Another confusing thing is that mushroom is the UK's most liked pizza topping; as a Scotsman, I really find this hard to believe. Another aspect of this law is to be transparent. Here I've displayed a response from YouGov to clarify the purpose of their chart. The intent was not to display a pie chart, but to present percentages from a survey. Here we see that being clear and transparent builds trust, and it helps with communication.
Another aspect of this apology is that it highlights the need to think carefully about your design and the graph type. Choosing the correct graph type aids interpretation, and thinking about the purpose of the graph can help identify appropriate graph types. Often we don't need to reinvent the wheel here. For example, if we want to show a deviation, a correlation or a ranking, then there are common graph types that we can select from.
Another aspect to think about in terms of graph type is what information you want to convey. So here is an example of the same data displayed six different ways. The pie and donut charts in panels A and B make it difficult to see the order of magnitude of some of the segments: you need to compare areas or angles, and these attributes are not easy to compare. The donut chart even omits the angles and bends the areas, making these comparisons more challenging. The mosaic plot in panel C relies only on areas; again, it's hard to compare across categories. It is better to use lengths with a common baseline or positions on a common scale, such as a bar chart or dot plot. The bar chart in panel D, though, introduces a fake third dimension, which is unnecessary and makes it hard to read numerical values from the height of the bars. Panels E and F are simple and show the data clearly. They also order the data by magnitude to aid comparison even further, so you're really not making the reader work or do mental arithmetic in their head. The dot plot in panel F uses minimal ink and draws the eye to the position of the dots, so it's probably the most effective way of displaying the data.
Choose the right scale for your data. So try to avoid plotting log-normally distributed variables on a linear scale. This can lead to misleading interpretation, especially when you introduce lines to encode uncertainty. So in this example, the uncertainty appears to increase above 1, which is unity, but this is an artifact of the choice of the scale. Using a log scale resolves this issue.
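To make this concrete, here is a minimal sketch in ggplot2 (the ratio data below are invented for illustration) showing how intervals that look lopsided on a linear scale become symmetric on a log scale:

```r
# Invented example: ratio estimates with confidence intervals.
library(ggplot2)

dat <- data.frame(
  study = paste("Study", 1:4),
  est   = c(0.5, 1.0, 2.0, 4.0),
  lower = c(0.25, 0.5, 1.0, 2.0),
  upper = c(1.0, 2.0, 4.0, 8.0)
)

p <- ggplot(dat, aes(x = est, y = study)) +
  geom_point() +
  geom_errorbarh(aes(xmin = lower, xmax = upper), height = 0.2) +
  geom_vline(xintercept = 1, linetype = "dotted")

p                    # linear scale: uncertainty seems to grow above unity
p + scale_x_log10()  # log scale: intervals become symmetric and comparable
```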
Another important aspect related to this is time and how it is displayed. Space measurements proportionally to the time at which they were taken. What do we mean by that? Measurements that are displayed close together are perceived to be closer in time. Not reflecting this in your visualization could introduce a visual bias; for example, slopes or trajectories may appear steeper or shallower than they actually are.
Law three: make the message obvious. A key aspect of this law is not to assume that the reader will understand what messages or information you're trying to convey; you have to really work at getting them to understand. A simple illustration of this is to try not to set text at an angle, and to think of alternatives such as transposing the graph. In the left-hand plot, we are essentially asking the reader to tilt their head to read the text. Do we really want to give readers neck strain? Simply transposing the graph resolves this issue and also produces a more effective visualization.
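As a small illustration of the transposing idea (invented data; in ggplot2 this is a one-line change):

```r
# Invented example: long category labels that would otherwise be angled.
library(ggplot2)

dat <- data.frame(
  category = c("Very long category label A",
               "Even longer category label B",
               "Another lengthy category label C"),
  value = c(10, 25, 17)
)

ggplot(dat, aes(x = category, y = value)) +
  geom_col() +
  coord_flip()  # horizontal bars: labels stay horizontal, no neck strain
```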
Also, try to avoid unnecessary color. Avoid using color to differentiate between categories of the same variable where it isn't needed; this can introduce confusion in a similar way to the previous pizza example from YouGov. Introducing unnecessary color also limits our options when we really do need color later on. Instead, use color when it adds value: use a bold, saturated or contrasting color to emphasize important details. In the left-hand plot, our eyes are naturally drawn to the top of the bars. In the right-hand plot, we introduce a contrasting color, and now our eyes are drawn to the bottom of the bars. This is an important design choice that allows us to highlight key information and important messages.
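Here is a minimal sketch of that design choice in ggplot2 (invented data; the highlighted category is arbitrary):

```r
# Invented example: grey for context, one contrasting colour for emphasis.
library(ggplot2)

dat <- data.frame(group = LETTERS[1:5], value = c(12, 8, 15, 6, 10))
dat$highlight <- dat$group == "D"  # the detail we want the eye drawn to

ggplot(dat, aes(x = group, y = value, fill = highlight)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = c("FALSE" = "grey80", "TRUE" = "#D55E00"))
```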
And here's an example of using annotations and labels to help with messages. Use labels and annotations to tell stories about the data and models. On the left-hand side, we have a typical dose response plot. By adding annotations such as "active control" and "target dose" on the right-hand side, we start to provide more contextual information for the reader, and we can start to get to the why of the plot. Think back to the flatten-the-curve plot, where we added the dotted line to indicate the health care system capacity.
So finally, here's an example we created of a visualization that tries to follow all three laws. The analysis is a subgroup comparison of a genetic marker, using a repeated measures model over time. We use the title to state a conclusion or recommendation from the analysis: genetic marker positive is not predictive of treatment response. We use annotations to indicate the direction of the treatment benefit, and we highlight the main time point, the primary analysis at week 12, with a dotted circle. Instead of legends, we use annotations to indicate each subgroup, and we also use color to differentiate between subgroups.
So if you're interested in learning more about the three laws and some of the principles I've introduced today, I'd recommend that you visit the website below, graphicsprinciples.github.io. We have a cheat sheet there that summarizes a lot of these principles, and it was developed using R. There are also other great resources out there that you can use. For example, the book by Jean-luc Doumont is a real exemplar of scientific communication that everyone should read. If you're really interested in data visualization, I would recommend the other two books: Data Visualization by Kieran Healy, and the book by Claus Wilke, which focus not only on the principles of data visualization, but also on how to do this in R.
So effective data visualization is effective visual communication. Effective visualizations enable clear and impactful communication. They elevate our influence with stakeholders and they facilitate informed decision making. To help design effective visualizations, remember the three laws: purpose, clarity and message.
Introducing the visR package
Thank you, Mark. So what you've seen in this first section is how important it is to implement graphics principles for effective visual communication. But the reason we are giving this webinar is that you often don't see that in typical data science analyses, because implementing those visual principles can be quite tedious and time consuming, adding quite a bit of extra overhead on the coding side.
So especially early on in projects, data scientists tend to skip a lot of those principles to just get on with their project. For example, on the right you can see that creating this relatively simple dose response curve, which doesn't even adhere to all the principles Mark just described, requires as much extra code to style the diagram and add annotations as it does to plot the actual data. Because of this extra work, we often skip it.
But then downstream, for example when communicating to external stakeholders, we have to catch up on that work and actually implement new versions of the plots, figures and tables that we produced early on in exploratory projects. So we thought this is something we probably have to change if we want to be more effective in the way we communicate our results.
And as Mark mentioned, there are different stakeholders or audiences for your reporting outcomes. Early on in a project, this might be just you yourself: you might remember all the work you've done and all the data manipulations, and you might not need a lot of additional annotation. But usually you're working in a team, at least in clinical development, and so you have additional stakeholders who might look at the outputs that you create, maybe figures or tables. Those stakeholders might take your outputs and use them further to communicate their insights in their own teams.
So it's actually quite important that whatever you create in your analysis cannot be taken out of context, that at any time it is possible to reproduce the outputs you created, tables and figures, and to trace where the data came from. On the right, you can see a typical table shell for baseline demographics in a cohort. We're deliberately showing a table here, because Mark showed a lot of figures and we want to make sure you're aware that outputs can also be tables. If you look at the table, you can see that there is a lot of metadata annotated in addition to the actual demographics.
So, for example, every table and figure should have a title, and it should name the data source and the version the data comes from, so that if that plot or table finds its way into a PowerPoint presentation, it's easy to trace back where it came from and at what time point the analysis was done. As part of the rendered object, we should also always have information about the abbreviations and table columns being used, the statistical tests employed, and basic information about the cohort, such as sample size. Generally, we would also like the reports we create to be harmonized across the different visualizations, plots and figures, so that, for example, the same treatment arm is always coded with the same color in different plots.
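As a hedged sketch of carrying such metadata on the rendered object itself (the data set, source name and snapshot date below are invented), in ggplot2 this can be as simple as:

```r
# Invented example: title and data provenance travel with the figure.
library(ggplot2)

dat <- data.frame(arm = c("Placebo", "Treatment"), n = c(120, 118))

ggplot(dat, aes(x = arm, y = n)) +
  geom_col() +
  labs(
    title   = "Cohort size by treatment arm",
    caption = "Source: demo_adsl v2.1, snapshot 2020-03-01 (hypothetical)"
  )
```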
So there is this second layer, in addition to the visual communication and graphics principles that Mark just introduced, that we often have to take into consideration in our data science projects. And again, looking at the metadata, you can imagine that annotating all of this information can create quite a bit of overhead on the coding side and is often skipped early on in projects.
So we realized this is something we have to change. And as Marc said in the beginning, it's probably an issue that not only data science teams at Roche and Novartis have, but also a lot of other people doing analytics work across the industry and beyond. So we thought we wanted to implement something like a toolbox that helps us integrate graphical principles into our data science and analytics projects.
We didn't want to do that in a vacuum, though, so we reached out not only to other data scientists in our own teams, but also to important stakeholders like therapeutic area scientists or clinical scientists, to better understand what they would like to see in the typical reports we create and what the crucial aspects are that should never be missing. With that, we came up with a set of development considerations that the toolbox we wanted to develop should always adhere to.
And the first one is really not creating extra work. The toolbox should integrate seamlessly into our analytics and reporting workflows, and it should work well with the tools we are already using. Often the analyses are done in R, so it should play well with the tidyverse; plotting should probably be done using ggplot; and it should render into different output formats, like R Markdown with HTML output as well as Word and LaTeX, or interactively using Shiny. And the package or toolbox should be suitable for our special use cases in clinical development, but might also reach beyond them.
It should also be flexible enough that expert users find the tools useful and like to use them, to ease some of the coding burden and create, for example, those additional metadata objects around the visual outputs they are already creating. As Mark said, the audience for your outputs might change as a project evolves, so the toolbox should allow us to easily adapt to these different target audiences without having to repeat the actual analysis.
That could be, for example, by exploring different visualizations of the same data set to see which one helps us adhere to the three laws that Mark introduced, especially which one helps us convey the message clearly and helps in decision making. And as I said, the outputs might make their way through different stakeholders as well, so it should be possible to export them. This was something very critical for our teammates on the therapeutic area side: that they can easily take an output and present it in their own presentations.
This quickly led us to the idea of developing an R package, because R is increasingly popular and widely used in our teams, and it already comes with a bunch of really excellent packages that solve parts of the problems we just stated. There are already packages that provide very flexible plotting capabilities, for example ggplot, and we thought that building on those packages, standing on the shoulders of giants, would make the work less hard for us. The other nice part about using R is that it is flexible towards multiple analysis questions and stages of the workflow, and if we developed a package with different functions, we could build in the flexibility that our users requested. We obviously get documentation and testing, and given that it's open source, it's really easy to collaborate across companies and even industries in the future and contribute content.
Architecture and design of visR
So that's why we are now presenting the visR package in its current form, which is actually still in a very early, experimental phase; you could call it a prototype. We developed it and would like to hear your feedback later on as well. There are a few architecture considerations we put in place in order to implement the considerations and requirements we got in our early stakeholder interviews.
So the first one was really about seamless integration into the tidyverse and our analysis workflow. We thought that whatever we do, it should interact with dplyr and the modeling packages, and plotting should be built on ggplot. We also want to keep full transparency on data modifications, first of all for the reproducible reporting aspect, but also to be able to scale projects across different stages. That led us to the decoupled design you can see on the right, where we take our input data, say a one-row-per-patient data set, and then have dedicated functions that take care of the data wrangling and the modeling before another set of functions takes care of the visualization. That visualization ideally adheres to the graphical principles Mark described before. And last but not least, we have a separate set of functions again that takes care of the styling of the outputs. So there shouldn't be any hard-coded theming in the plotting function, for example; that is taken care of separately, which allows us to adapt the visualization to corporate designs, and also to keep the visualizations consistent across a report, where, for example, the treatments are always colored the same way.
We mentioned before that, in addition to being flexible, adhering to those graphical principles shouldn't add a lot of overhead to the coding work you're doing as a data scientist anyway. So we thought it would be great to have, in addition to those individual decoupled functions, wrapper functions that allow you to explore different visualizations for typical analysis questions. One typical question in clinical development, when working with a patient cohort at the beginning, is how many of the patients in the data set adhere to a set of inclusion and exclusion criteria, and you might want to show the cohort attrition. There are different ways of doing that, and Diego will talk about it a bit later. Having a wrapper function that allows you to explore these different visualizations can help you quickly find out which one might best convey your message clearly to the target audience you're currently designing your report for.
This was all very theoretical so far, and it's going to stay a bit theoretical for the next two slides before I hand over to Diego. But what I want to show here is how this would look if you implemented it for another typical question, around time to event analysis. With decoupled functions, if we had our initial input data model for patient data, with, for example, the intervention (like treatment), the time of follow-up or time to event, and the status at the end, then we wouldn't go directly from this input data model to the plot; we would go through three individual steps. The first is taking this input data and having an estimator function that computes the estimate, in this case the survival model. After cleaning that with broom, we get a tidy table as an interim data model, which we can then build our visualizations on. In this case it would contain, for example, the time, the stratum, the estimates at different time points and the confidence intervals. These can then easily be passed on to ggplot for visualization.
As I said before, we didn't want to reinvent the wheel where it wasn't necessary, so we are building a lot of those functions on existing packages from the tidyverse and beyond, such as broom or, in this case, also survival. The idea of this multi-tier approach is that you can define a lot of custom features that might be required, such as custom time windows or adding hypothesis tests. So after you have the interim data model, you have the visualization function, and then the styling at the end, which helps you adhere to the graphical principles we spoke about before.
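A minimal sketch of this decoupled pipeline, using the real survival, broom and ggplot2 packages (the visR function names were still evolving at the time, so this shows the underlying idea rather than the package's API; survival::veteran stands in for the patient-level input):

```r
library(survival)
library(broom)
library(ggplot2)

# 1. Estimation: one-row-per-patient input -> survival model
fit <- survfit(Surv(time, status) ~ trt, data = survival::veteran)

# 2. Interim data model: tidy table with time, stratum, estimate, CIs
km_tbl <- tidy(fit)

# 3. Visualization: build the plot from the tidy table
ggplot(km_tbl, aes(x = time, y = estimate, colour = strata)) +
  geom_step() +
  geom_step(aes(y = conf.low), linetype = "dashed") +
  geom_step(aes(y = conf.high), linetype = "dashed") +
  labs(x = "Time (days)", y = "Survival probability")

# 4. Styling: applied last as a separate layer, e.g. a corporate theme
# last_plot() + theme_minimal()
```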
So these were a lot of design considerations, and they are definitely not set in stone, so if people have ideas and thoughts about them, we would love to hear them. But Diego will show you now, in the next few slides, how we implemented the prototype and what the outputs look like for now. And with that, I would like to hand over to you, Diego. Thanks, Charlotta.
Practical walkthrough: time-to-event analysis
Yeah, so now that Charlotta has introduced the main architecture considerations and the main concepts we used to design the package, let's look at this in the context of a practical application: a typical time to event analysis workflow. When we're carrying out these types of analyses, there are a number of steps that will be there virtually every time, all of which require some level of communicating results or interim results, which can be communicated in different ways depending on the audience and the circumstances, as Mark also mentioned before.
And so, for example, you might start with something like building your analysis cohort. You might want to understand how many patients are kept after applying certain inclusion and exclusion criteria, and you might want to show that with either a table or a flow chart. Then, once you have your analysis cohort, you might want to understand the baseline characteristics of the population you're analysing, and you might do that using some kind of table one, a very popular table that's used in virtually every paper on these types of studies. And then you might be interested in doing some kind of survival analysis, for which you might plot something like a Kaplan-Meier plot, or show tables with the median survival. So there are different things that you might want to show as interim results, all of which require a certain amount of code, programming effort and time. And we as data scientists spend a lot of time redoing these very often.
So now let's look at these one by one. The first one was building an analysis cohort. The philosophy that we have tried to follow with visR was, as Charlotta mentioned, not to create more work, but also to implement out-of-the-box best practices and to reuse the knowledge we have acquired through our own experience and through the interviews we carried out. The way this works, as you can see, is that you can create a list of filters as well as descriptions. So it takes very little code to create your cohort attrition using visR, and then really creating a table that shows the attrition just takes basically five lines of code using this function called vr_render_table. You can then output this nicely formatted table that shows you the relevant information, including the criteria being applied, the condition, how many patients are being kept and excluded, as well as percentages for these. And of course, it will show nicely formatted titles as well as metadata that might be relevant, such as data sources. So you don't have to spend a lot of time thinking about what you should show in this output in order to follow best practices.
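The exact signature of vr_render_table isn't shown in the talk, so here is a hypothetical sketch of the underlying attrition bookkeeping in plain dplyr (the patients data frame and its columns are assumed):

```r
library(dplyr)
library(tibble)

# Assumed one-row-per-patient data frame with columns age and prior_trt.
criteria <- list(
  "Age >= 18"          = function(d) filter(d, age >= 18),
  "No prior treatment" = function(d) filter(d, prior_trt == "No")
)

cohort    <- patients  # hypothetical input cohort
attrition <- tibble(criterion = "All patients", remaining = nrow(cohort))

for (nm in names(criteria)) {
  cohort    <- criteria[[nm]](cohort)
  attrition <- add_row(attrition, criterion = nm, remaining = nrow(cohort))
}

attrition  # patients kept after each inclusion/exclusion step
```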
You might also want to show this in the form of a flowchart, which is a very typical visualization of the attrition diagram. Again, this takes very little code: you can create a vector of complementary descriptions and reuse the outputs from the previous chunk of code, and using this vr_attrition function, you can plot a nicely formatted attrition diagram which shows, at each step, how many patients have been kept and excluded. This is, again, a very popular figure that shows up in many real world data analysis papers. And I should highlight that this type of package can help us better comply with reporting guidelines like CONSORT and STROBE.
Once we have selected our baseline population and our analysis cohort, we might be interested in showing and understanding the baseline characteristics. This is typically done using the very popular table one. We looked at a number of implementations of it and couldn't find one that really satisfied all our requirements, so we created our own. The nice thing about it is that it's very flexible. It comes with very good built-in summary functions that help you summarize your categorical as well as numerical variables out of the box, but it also lets you input your own custom summary functions if those don't satisfy your specific requirements. It also outputs the tables in different formats, including kable, RStudio's gt, as well as DT HTML tables. And importantly, it lets you download the table one as Excel or CSV, which was one of the requests that came up most often when we interviewed our internal scientists. Again, it takes very little code, about four lines, and it follows best practices and reuses the knowledge we have acquired.
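Since the prototype's table one interface isn't shown here, a rough sketch of what such a summary computes, in plain tidyverse code with the survival::veteran example data:

```r
library(dplyr)
library(survival)
library(knitr)

veteran %>%
  group_by(trt) %>%
  summarise(
    n        = n(),
    age_mean = mean(age),
    age_sd   = sd(age),
    .groups  = "drop"
  ) %>%
  kable(digits = 1, caption = "Baseline characteristics by treatment arm")
```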
Then we might be interested in doing a survival analysis. This is typically done using a Kaplan-Meier curve together with a risk table, and usually there are a number of metadata elements that are important here. I should highlight that we have based the design of our Kaplan-Meier visualization on the findings of a paper by Morris and collaborators, a survey conducted among a large number of researchers about what a good Kaplan-Meier plot should look like. Our Kaplan-Meier plot shows relevant information out of the box, including things like the number of patients, nicely formatted axis labels including units where needed, as well as the data source from which the plot was made. It also shows a risk table including the number of patients at risk, the number of events and the number of censoring events, grouped by stratum at regular time points. This format follows the recommendations by Morris and collaborators.
So how do we do this using visR? It's very simple; we have two options. One is to plot the Kaplan-Meier curve alone, using the first chunk here, and then the risk table alone, using the second chunk in the middle. Both of these show our design philosophy of separating the estimation from the plotting and styling. But we also have the option of plotting all of this in one go, with one function call: the chunk at the bottom. This one is just one function call and gives you that nice plot. Again, very little code, and it reuses best practices and knowledge.
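As a hedged stand-in for that one-call wrapper (the prototype's own function names aren't given here), the existing survminer package illustrates the same idea: one call producing the curve plus the risk table:

```r
library(survival)
library(survminer)

fit <- survfit(Surv(time, status) ~ trt, data = survival::veteran)

ggsurvplot(
  fit,
  data       = survival::veteran,
  risk.table = TRUE,   # numbers at risk per stratum at regular time points
  conf.int   = TRUE,
  xlab       = "Time (days)",
  ylab       = "Survival probability"
)
```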
So finally, we have also implemented a set of convenient estimation functions, including median survival times by stratum, as well as multiple methods for testing equality between strata. And again, you can do this very easily with only a few lines of code. And it shows you very nicely formatted tables that follow best practices.
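The equivalent estimates are available from the real survival package, which the prototype builds on; a minimal sketch:

```r
library(survival)

fit <- survfit(Surv(time, status) ~ trt, data = survival::veteran)
summary(fit)$table  # includes median survival and CIs per stratum

# Log-rank test for equality between strata
survdiff(Surv(time, status) ~ trt, data = survival::veteran)
```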
Roadmap and call for contributors
So now, going into our package roadmap, I would like to highlight that this is still a prototype. We are basically reaching the stage where our prototype is ready, and now we want to launch this initiative out there and identify collaborators to kickstart the next step of our project.
And so with that, I'd like to announce that we're looking for contributors to join the visR team. Again, visR is still in its experimental stages; we welcome feedback and ideas for features, using GitHub issues, for example. There are two ways in which you can contribute. The first is what we call the open source way: an informal collaboration where you might pick up an issue and work on it proactively. The second, more formal way is where you reach out to us, and then we can see together whether it makes sense for you to join our core team. What kind of contributions are we looking for? We're open to hearing your ideas and asks around design choices, project governance and hands-on engineering, as well as how to maintain an actively used package. If you want to get in touch, please reach out to Mark Baillie and James Black, whose emails are listed here.
Finally, I'd like to acknowledge that many other people have been involved in the development of this package, and we would like to thank them. We have also made heavy use of the open source packages listed here, without which this prototype would not have been possible. And with that, I'd like to hand over to Marc.
Q&A
Yeah, so thanks, Diego, and thanks, Mark and Charlotta. That's the end of the presentation, and I guess we can start the Q&A session. Let me start with one of the questions that came in during the presentations. Someone has asked: are you going to open collaborations with federal agencies? We are working with various other groups, but more specifically, maybe, Mark, do you want to take this question?
Yeah, thanks, Marc. We would be open to collaborating with anyone. This is part of a wider set of collaborations. For example, PSI, Statisticians in the Pharmaceutical Industry, has also started a special interest group on data visualisation, of which a number of us are currently members. And there's also the STRATOS initiative, which is related to this and is looking at methodology as a whole for observational data. But yes, we would be open to collaborations from any avenue.
Thank you, Mark. Another question that came in earlier was: when will this package be on CRAN? So as I said, currently it's in the prototype stages. I think we would still have to go further into development before putting it onto CRAN, but it's definitely on our radar.
Thank you. Another question was: what do you mean by join the core team? Charlotta? Thank you, Marc. Joining the core team, I guess, means that at the moment we're a relatively small group of people who have regular meetings around aspects like the package architecture, for example. If you join the core team, you would be part of driving the general direction of the package, making design choices and things like that, whereas if you just picked an issue on GitHub and contributed to it, there would be a little less involvement in the overall direction, and you might not want to commit to those alignment meetings and other aspects.
Thank you, Charlotta. There are two questions which are relatively similar. One was: is this focusing on clinical data only? And another: do you have a sub-team focused on reporting for real world data? So I guess, overall, the question is: what's the focus of this effort? Mark, do you want to take that one? I would say the general focus at the moment is exploratory analysis of clinical and medical data. Diego and Charlotta could probably back me up here, but they work primarily on real world data. So it is a mixture of clinical and real world data. Thank you, Mark.
A quick one was, can you share the link again to those graphical principles? I will take this one. You can see it on this last slide.
Do you have any recommendations for developing visualisations for different kinds of users, for example users with different fluency in data? Diego, what are your thoughts on this one? Yeah, I think in general, as Mark mentioned before, it's really about knowing your audience, so I really couldn't give you one rule of thumb for everything. But as for how we are handling that with the package, that's really part of why we have decided to separate the estimation from the visualisations and to have multiple ways of presenting a message. Because, as Mark said, you should really know your audience, and that translates into being able to communicate your results in different ways. That flexibility is something that we want to embed in our project as well.
Thank you, Diego. I'll move on to a question regarding compatibility. Two people have asked about intentions to combine this with LaTeX, or somehow create a link to LaTeX, and on the other hand, whether we are planning to create templates for reporting with R Markdown. Both of these questions can maybe go to Mark. Yeah, at the moment, I guess the table one is an example where the rendering engine can change, but the underlying data, or the reporting of the data, should not change. So I do believe that LaTeX is an option, but it hasn't been implemented. I possibly need to hand over to Charlotta to confirm, but we would be looking at different engines for reporting.
Yeah, just to add to that: I saw there was another question in the chat about gtsummary. The reason we built our own table rendering was to support different engines, namely kable, gt and DT, and that is specifically because some of these already come with LaTeX support, which is implemented already, at least in a rudimentary form. Regarding templates, I think that's a great idea. We haven't really reached that stage yet, but we try to create the functions in a flexible way that allows easy integration into different Markdown output formats as well as Shiny.
Thank you, Mark and Charlotta. Here's another question: is it planned to have only survival analysis, or also other epidemiological models or analyses? I think the answer is clearly yes, we want to expand to many different types of analysis, common ones first and gradually perhaps some less common ones as well.
One question I think is important here, just coming in: what level of proficiency are you looking for in contributors? Maybe Charlotta, do you want to answer this one? Sure. I guess we are open to all levels of proficiency. As we said, this package is at a really early stage, so we'd love to hear from and see contributions from people who have a lot of experience developing R packages, from whom we could probably learn quite a bit, being more on the data science and domain expert side at the moment. But also people who would be target users, or who want to write documentation or vignettes, are very welcome. So all levels of proficiency are welcome to join the team.
Thank you very much. And maybe a last question, because we only have one minute left. Someone has asked about the possibility to incorporate corporate branding, colour schemes and the like on top of the good graphical principles. The answer is clearly yes; that's the way the package is designed. It should be possible to plug your personal or your company's corporate branding in at the very last step, on top of the visualisations.
