Resources

History repeats itself: What the Du Bois Challenge taught me about reproducing design (Simi Ndaba)

History repeats itself: What the Du Bois Challenge taught me about reproducing visualisations

Speaker(s): Simisani Ndaba

Abstract: In 2024, I participated in the Du Bois challenge to recreate W.E.B. Du Bois’s iconic 1900s charts on African American sociology. By reproducing the old graphs and their annotations, themes, accessibility and visible contrasts, I levelled up my visualisation skills. I learnt about the complexity of design, colour palettes, fonts, styling, and of course appropriate R packages. This experience empowered me beyond the Du Bois challenge, and I used what I learnt to take part in the 30-day chart challenge, Genuary and Tidy Tuesday. In this talk, I will share what I learnt so that you too can more easily become familiar with unfamiliar charts and craft your own visualizations to tell stories.

posit::conf(2025)

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/


Transcript

This transcript was generated automatically and may contain errors.

Hi everyone. I have a question for you. Have you ever seen a graph and thought to yourself, how am I going to recreate this? Right? That's what I thought the first time I saw W. E. B. Du Bois's 1900 Paris Exposition charts. They're intricate, they're bold, they're colorful, and they tell more than just a story. They tell not just the numbers, but the people behind the numbers.

But historical visualizations are like that. They teach clarity over constraint. That is, they show you how to focus, and they foster creativity and efficiency. They narrow down efforts and decision making.

Historical visualizations as inspiration

Take Charles Minard's 1869 cartographic depiction of Napoleon's disastrous march into Russia, which compresses multiple dimensions, the impact of the winter, dates, longitude, latitude, and the troops, all into one graphic without losing its clarity.

Or Florence Nightingale's coxcomb diagram, which shows not only how the soldiers really died during the Crimean War in the 1850s, but the reality that they died not from bullet wounds, but from preventable diseases brought on by the squalid conditions. Policy changed because of this chart, and it helped create the nursing profession.

Or William Playfair's 1821 plot depicting the price of a quarter of wheat as well as the wages of labor. Not only was his work a foundation for statistical graphics, but it also showed national debt as well as trade balances.

Now, recreating Du Bois's revolutionary charts showed that you can actually create a checklist of reproducible design. And when you follow that checklist in your own visualizations, it goes to show that when you plan where and how you plot your visual elements, you can actually create your own creative visuals, which makes it easier for people to see. And when it's easier for people to see, they read faster, they understand what you're trying to say, and it also makes you a better data visualist.


And if you follow that checklist again and again and again, you can get better feedback, and you can also get people understanding you better.

Who was W.E.B. Du Bois?

So we've probably seen this man's image before, but we have absolutely no idea who he is. Can we tell who he is? Have we seen him before? William Edward Burghardt Du Bois, or Du Bois as he liked to be called, was born in Massachusetts in 1868. He lived and worked as a scholar, a civil rights activist, and a historian, and he actually worked at Clark Atlanta University, just southwest of downtown.

His life spanned from the American Reconstruction to the Civil Rights Movement when he died in Ghana in 1963 at the age of 95.

So the original data visualizations and captions use the terms of his time to refer to a certain demographic. So I hope no one will be offended. It's just for our edification.

What made the charts unique

So what made the charts unique? Actually, during his tenure at Clark Atlanta University, he and his students handcrafted 60 charts. Some of them are on the screen there. What they did was show Black American progress from enslavement onward: where people moved from and where they were staying, the property they owned, the livestock, the land. And not only were they unique in that way, but they were actually written in English and French by Du Bois, because he was bilingual, to appeal to the French audience.

What was also unique about them was the layout. Some of the graphs, as we can see there, were designed in different shapes and styles, and they were put together in the same space. The narrative was put on the graph itself to give some context to the data, to explain to people what exactly was going on. And not only did he use narrative, he also used different fonts for the typography to guide the reader to what was going on.

There was also the use of bold colors. He liked using what they call the pan-African colors, black, red, white, green, yellow, and they were bold and vivid and also drew people to what he wanted to make the point on.

So I took these elements, the narrative, the layout, the typography, and the color, and used them to recreate the designs. Putting them all together actually revived the old visualizations as part of the reproducible design.

Recreating the designs in R

And when I talk about reproducible design, I'm talking about taking a graph and breaking it into its visual elements, like we were talking about, the fonts, the color, the layout, and putting them together. Step by step, you have your reproducible design.

So looking at the layout, to recreate the stacked area chart, which shows the share of people who were enslaved from 1790 to 1870, I used the original CSV file, but the original file was laid out in a wide format. And plotting software doesn't usually want a wide format. What it usually wants is a long format, which means that it needs a category column as well as a value column.

So I went ahead and changed the wide format into a long format, using `pivot_longer()` from tidyr. What this means is that it kept the year as it is, and the function created two columns: status for the category column, and count for the value column. This makes it easier for multiple dimensions to show. And I also used Air to make the indentation and the structure nice.
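The reshaping step described above can be sketched as follows. This is a minimal illustration, not the speaker's script: the `free` and `enslaved` column names and all of the values are made up, and only the `year`, `status`, and `count` names come from the talk.

```r
library(tidyr)

# Hypothetical wide-format data: one row per year, one column per status.
# The numbers are placeholders, not Du Bois's figures.
wide <- data.frame(
  year = c(1790, 1830, 1870),
  free = c(10, 15, 100),
  enslaved = c(90, 85, 0)
)

# Keep `year` as it is and gather every other column into a category
# column (`status`) and a value column (`count`).
long <- pivot_longer(
  wide,
  cols = -year,
  names_to = "status",
  values_to = "count"
)

print(long)
```

Each year now appears once per status, which is the shape ggplot2 expects when a variable is mapped to `fill`.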

So now that we had the long format, I could go ahead and create the stacked area chart. Using ggplot, I used the year for the x-axis, the count for the y-axis, and the status for the fill color. To hard-code the colors, I used `scale_fill_manual()` with the black and green that he used in the original plot.
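A minimal sketch of that mapping, assuming placeholder data and approximate color names (`"black"` and `"darkgreen"` stand in for the original palette, which is not given in the talk):

```r
library(ggplot2)

# Hypothetical long-format data; the values are illustrative only.
long <- data.frame(
  year = rep(c(1790, 1830, 1870), each = 2),
  status = rep(c("free", "enslaved"), times = 3),
  count = c(10, 90, 15, 85, 100, 0)
)

# year on x, count on y, status as the fill; geom_area() stacks by default.
area_chart <- ggplot(long, aes(x = year, y = count, fill = status)) +
  geom_area() +
  # Hard-code the fill colors instead of relying on the default palette.
  scale_fill_manual(values = c(enslaved = "black", free = "darkgreen"))

area_chart
```

Naming the elements of `values` ties each color to a specific status, so the colors do not shift if the factor order changes.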

Continuing with the layout, I used a somewhat complicated plot of his, which has the US states, the annotations, the narrative, the legends, and the pie graph. So this had to be broken down into one, two, three, four, yeah, four pieces: the US states, the pie graph, the legends, and the annotations.

So with the USA states, there was a shapefile available, but I felt like I wanted to create my own data CSV to manually pick and choose the colors each state represented.

With the pie chart, there was an original CSV file available which showed the percentage of the professions. So what I did was use `coord_polar()` to create the pie chart from the bar chart, and I also had to color-code the professions.
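The bar-to-pie step can be sketched like this. The profession names, percentages, and colors below are hypothetical placeholders rather than the contents of the original CSV.

```r
library(ggplot2)

# Hypothetical percentages per profession (placeholders that sum to 100).
professions <- data.frame(
  profession = c("Teachers", "Ministers", "Physicians"),
  percent = c(50, 30, 20)
)

# A single stacked bar...
pie_chart <- ggplot(professions, aes(x = "", y = percent, fill = profession)) +
  geom_col(width = 1) +
  # ...wrapped around a circle by mapping y to the angle.
  coord_polar(theta = "y") +
  # Color-code each profession by hand.
  scale_fill_manual(
    values = c(Teachers = "firebrick", Ministers = "gold", Physicians = "tan")
  ) +
  theme_void()

pie_chart
```

ggplot2 has no dedicated pie geom, so a stacked bar in polar coordinates is the usual way to get one.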

Using the ggforce package, I used `geom_circle()` to create the circles that correspond to the professions. I used ggforce because I didn't want to use only the same ggplot2 package. Not that there's anything wrong with ggplot2, but I just wanted to try something different.
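One way the legend circles might be drawn is sketched below. The positions, radius, and labels are assumptions made for illustration, since the talk does not show the actual coordinates.

```r
library(ggplot2)
library(ggforce)

# Hypothetical legend keys: one circle per profession, stacked vertically.
legend_keys <- data.frame(
  x0 = c(0, 0, 0),
  y0 = c(3, 2, 1),
  r = 0.3,
  profession = c("Teachers", "Ministers", "Physicians")
)

legend_circles <- ggplot(legend_keys) +
  # geom_circle() takes a centre (x0, y0) and a radius r in data units.
  geom_circle(aes(x0 = x0, y0 = y0, r = r, fill = profession)) +
  # Put the label to the right of each circle.
  geom_text(aes(x = x0 + 0.6, y = y0, label = profession), hjust = 0) +
  # A fixed aspect ratio keeps the circles round.
  coord_fixed(xlim = c(-0.5, 3)) +
  theme_void() +
  theme(legend.position = "none")

legend_circles
```

Because the radius is expressed in data units, `coord_fixed()` matters here: without it the circles would be drawn as ellipses whenever the panel is not square.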

And with the `annotate()` function, I added the narratives, which come under the theme element. To put all of these together, the US states, the annotations, the legends, and the pie chart, I used cowplot. I found cowplot a lot easier to use. There's nothing wrong with patchwork; I just thought that it was easier for everything to come out okay.

This was arranged using the `plot_grid()` function. Moving on to the color, I mostly used `scale_fill_manual()`, as we saw earlier, to try and match the colors used in the original plot, which I think worked out very well.
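A rough sketch of composing the four pieces with cowplot follows. The panels are text-only stand-ins, the `placeholder_panel()` helper is hypothetical, and the grid arrangement and relative heights are assumptions, not the layout used in the talk.

```r
library(ggplot2)
library(cowplot)

# Hypothetical helper: an empty panel with a centred text label, standing
# in for the real map, pie chart, legend, and narrative plots.
placeholder_panel <- function(label) {
  ggplot() +
    annotate("text", x = 0, y = 0, label = label) +
    theme_void()
}

map_panel <- placeholder_panel("US states map")
pie_panel <- placeholder_panel("pie chart")
legend_panel <- placeholder_panel("legend circles")
narrative_panel <- placeholder_panel("narrative text")

# Arrange the three smaller pieces in one row...
bottom_row <- plot_grid(pie_panel, legend_panel, narrative_panel, nrow = 1)

# ...and stack the map above them, giving the map twice the height.
combined <- plot_grid(map_panel, bottom_row, ncol = 1, rel_heights = c(2, 1))

combined
```

Nesting one `plot_grid()` inside another is a simple way to get rows with different numbers of panels.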

As for the typography, that was in the narrative, which is also an essential part of the design. When I talk about typography, I'm talking about the styling and the way the text appears in the plot. I used showtext so that the fonts could appear on the plot, and tried out a number of fonts from Google Fonts so that the theme could show in the plot.
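A minimal sketch of loading a Google font with showtext is shown below. The choice of "Public Sans" and the `"public-sans"` family label are assumptions for illustration, and because the font is downloaded at run time, the example falls back to the default `"sans"` family when no connection is available.

```r
library(showtext)  # also attaches sysfonts, which provides font_add_google()
library(ggplot2)

# Try to download the font from Google Fonts; fall back to the default
# sans family if the download fails (for example, when offline).
font_family <- tryCatch(
  {
    font_add_google("Public Sans", family = "public-sans")
    "public-sans"
  },
  error = function(e) "sans"
)

# Tell graphics devices to render text through showtext from now on.
showtext_auto()

titled_plot <- ggplot() +
  annotate("text", x = 0, y = 0, label = "Narrative text goes here",
           family = font_family) +
  labs(title = "A TITLE IN THE CHOSEN TYPEFACE") +
  theme_void() +
  theme(plot.title = element_text(family = font_family, face = "bold"))

titled_plot
```

Text drawn by `annotate()` takes its font from the `family` argument, while titles take theirs from the theme, so both need to be set.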

Reproducible visual design principles

After recreating the designs, I found that there are a couple of reproducible visual design principles that I could follow to create a brand new visual. First is the intent. To follow the intent, I looked at the purpose and tried to understand the original choices, or what I wanted to do. Then there are the inputs, which means the visual elements, the data, the font, the color, and the narrative, that are used to shape the design.

Then the structure, the layering of all the elements that make up a graphic, as we saw with the states, the pie chart, and the annotations. Then the coding choices that you can use in any tool, as in the `coord_polar()` that I used to create the pie chart. You could also use... What did I use? There was... Oh, ggforce, the `geom_circle()`, as well as... What was the other one? I'd have to go back. Those are the coding choices I got to use.

Of course, the fixed outputs that come from the coding choices. Of course, the parameters can be changed so that the different frames can come out differently. Of course, the shareability, because they're all coded, they're easy to share, and they should be interoperable.

Applying the principles to a new challenge

Following these principles and the elements, I thought maybe I could give it a crack in another data visualization challenge. I chose the Genuary challenge. Has anyone heard of Genuary? It's a challenge that comes out every January. I started at the end of January. This challenge lets you use algorithms to create artwork.

When I started, the daily theme was brutalism. Brutalism is an architectural style built from raw materials, like concrete, and it's not exactly ornamental. It's just very structural. Since it's not a very ornamental design, I had to make it interactive so that people would want to look at it. That's why I used the principles.

For the intent, I thought, okay, maybe I can make it into an animation. Instead of making it hard to look at, I can make it interesting to look at. With the input, I thought of using blocks and shades of gray to keep to the original styling and the theme of the style.

For the structure, I had to make sure that the placement of the blocks changed depending on whether they hit each other or not. What also mattered was the size. The size of the blocks really does matter. Putting it all in one plot, I had to be careful about the clarity and the proportion, as well as the color. The coding choices had to do with the transition, the transitioning of the blocks. Unfortunately, we won't get to see the code, but at the end, I'll show you a link to where you can see it.

As for the fixed outputs, the length, the square frame, and the size can also be changed with the parameters. And for the shareability, there are the common tools and the scripts provided.

Reflections and what I'd do differently

Then I thought, well, even though everything was okay and working out all right with the recreation, there are some things that I could have done better. One was the color. As you saw earlier, in my recreation, I used a light yellow color when I could have used the tan. The tan would have looked a lot more like the original plots.

As for the typography, instead of trying to work out how it could look just like the original, I could have just used the typography from the Du Bois style guide and picked Public Sans and Charter.

From there, I don't think I would have changed anything on the map, but I do want to thank my friends Gita, Billy, Miley, Gwen, Liz, AJ Starks, who's the creator of the challenge, Lydia Gibson, who couldn't make it today, Nick Crane, and John Hammond. Thank you so much.

Would you like to ask any questions?

Q&A

One question is, does Du Bois's data exist electronically, like in a package or a data set, or do you have to create the data set manually?

Oh, no. The data sets already exist, and they're actually the original ones, but some of them do come in the wide format, so some of them have to be changed, and some are in shapefiles. They all come from institutions, from universities, so they already exist. They're all there.

Oh, actually, for the challenge, they are collected in a single place. It's just that AJ Starks had to collect everything, so they originally come from museums and institutions, but you can find them all in the Du Bois GitHub repository.

Did you end up having to use additional software, for example, Illustrator or Inkscape, to polish up the designs after creating the ggplot2 object?

No, not at all. I used everything in R. I hardcoded it. I didn't use any large language models because this was done last year, and because I'm from Africa, we don't usually use large language models yet, so everything was hardcoded. I googled everything. Yeah, so, no, I didn't use any of it. I just used R.

Do you think that an LLM would be effective at writing code to replicate these visualizations?

Using an LLM to recreate this? I think it would be, but you're going to have to really guide it because the layout is so different. It's unusual, so you're going to have to tell it, move this chart here, move that map there, move it up and down. It's possible, but I think it would be easier to do it yourself.