Visualization Design Principles for the Pandemic
I am a researcher in information visualization. I’ve focused on lots of things in the past, including uncertainty visualization, statistical graphics, and bioinformatics. There are lots of current pandemic data visualization challenges that feel directly connected to my past or current areas of research. How do we translate the argots of virology and epidemiology into contexts that are useful for mass audiences? How do we communicate exponential growth, confidence intervals, log scales, model errors, and simulation results to audiences who may not have seen those sorts of things before?
Back in those halcyon days of “mid to late January” I was collecting all of the visualizations of COVID-19 spread and mortality I could find. I had big ideas about making more user-friendly Kaplan-Meier curves, risk theatres, forecast needles, and surprise maps. You might have noticed very little from me on that front, despite the wide availability of COVID-19 data. It’s not just that it’s been difficult for me to be productive under the current circumstances (although let’s not discount that). It’s that there are real costs to getting these things wrong. I want to be very, very certain that I’m doing something right, contributing to signal rather than noise.
Well, now that we are in the midst of a global crisis where we are exposed to many different competing visions of how to present pandemic data to wider audiences, I’d like to throw my own hat into the ring and present what I think are some simple guidelines for how we in visualization or data science can help.
I’ve tried to synthesize my expertise in visualization, virology, and volatility into a single concrete design principle for your COVID-19 visualizations:
1. Consider Shutting the Fuck Up
Amanda Makulec was slightly more polite in her list of considerations for making charts about COVID-19, but I feel like the subtext is still there: maybe, just maybe, in a situation where misinformation, uncertainty, and anxiety are rampant, the world does not need your particular expertise of “making a nice chart.” William Chase was similarly nice enough to throw in a “probably” in his admonition that we should not be making COVID-19 charts. So I’ll try to follow their good examples and soften this just a little: maybe, hear me out here, your time would be better spent amplifying experts, lending them your labor and support if they ask for it, but otherwise just getting out of their way.
By all means code or draw or chart on your own time if it helps you build your skillset or soothes your anxiety or gives you something to do to pass the time, but think, deeply think, about whether your chart would be helpful to share with the world. Then, once you’ve thought about that, think even more deeply about whether you have the expertise, foresight, and humility to have given a good answer to the previous question. Maybe this is not the time to build up your portfolio, evangelize your plotting technique or library, or goose up your existing projects with a dataset that you know will attract more eyeballs than usual.
There are many dangers in getting these questions wrong, but I’d like to focus on three Shutting-the-Fuck-Up-Motivations that are driving me in this particular historical moment:
1a. Our expertise is not universal
Here’s an interview with a lawyer. He is not an epidemiologist, and appears to not have really talked to any virologists or epidemiologists before deciding, based on “a skill of cross-examination,” that the virus was not much to worry about. His perspective appears to have been enormously influential with the US federal government despite these flaws.
Here’s a Medium post by someone with a background in “artificial intelligence and cognitive architectures.” It looks pretty innocuous now, with a big postscript at the end and a bunch of caveats about not trusting the numbers, but the piece used to be titled “Flattening the Curve is a Deadly Delusion” and had a central thesis based on a misunderstanding of containment versus mitigation.
There are other examples that I could point to, from the scale of a tweet all the way up to opinion columns in newspapers of record, but I’m already starting to bump up against my usual “be specific in praise but general in blame” rule. The general point I’m trying to make is that we are seeing a lot of people assuming that their ability to write code or run a model or read articles makes their opinion worth just as much as (or even more than) an epidemiologist’s. There are lots of cooperating culprits here: an ongoing campaign towards the “death of expertise,” a pernicious culture of “tech solutionism,” a dysfunctional tech culture that is dominated by charlatans, and of course the good old-fashioned Dunning-Kruger effect that Silicon Valley seems to reward when harvesting its periodic crop of tech bros.
My intent here is not to enumerate all the causes for why so many people have unearned senses of generalized authority, but to point out that we as visualization designers are especially vulnerable to it. I’ve spoken elsewhere of how visualization folks often fall victim to the “Batman metaphor” of work: assuming that we can swoop in, abstract the data problem into terms we’re familiar with, and then just disappear into the night, job done. And this particular crisis, with the easy availability of (certain kinds of) data, and the large numbers of examples to crib from, makes everything look so easy, from a charting perspective. This illusory ease is generating too many Batmans. If we aren’t working with experts, we are likely walking into trouble.
1b. It is very easy to be deceptive and very hard to be helpful
Here’s a twitter thread of a few dozen of the things that are keeping Evan Peck up at night when it comes to visualizing COVID-19 data. I’m sure he’s thought of plenty more since then (that was March 9th, after all; an eternity in pandemic-time). Maggie Koerth, Laura Bronner, and Jasmine Mithani wrote a whole dang article about how difficult and uncertain it can be just coming up with individual numbers about the pandemic, let alone a predictive model based on those numbers. Big organizations with lots of resources and smart thinkers can, have, and will get some of these charting questions wrong, let alone the data modeling questions that underpin those charts. We who are closer to the citizen science side will get it wrong too.
The WHO has used the term “infodemic” to describe the deluge of false and misleading information that is spreading in parallel with (or ahead of!) the pandemic. Without closely working with real people with real problems, there is a very good chance that we are making the problem worse, not better. There are really tough data problems here! The data we have is noisy, largely incomparable across regions, and incomplete exactly where we need it most. The models we are using to account for these issues are complex (but maybe not complex enough), based on lots of assumptions, and changing every day. If you do manage to produce a graphic that you think will change the way people think for the better, you’ve taken on the burden of keeping that model as up to date and responsive to critique as possible. You’ve signed up for a long-haul software engineering project. Stale data is costing lives.
1c. This is not our time to show off
Kenneth Field recently wrote a whole article about responsible mapping of the pandemic. In general it’s got some good advice: total case numbers may not be informative, be careful about your projections, be wary of sensationalism, make sure the map is telling the story you want it to tell, etc. A month later he produced this map, where total case numbers dominate and the nature of the stacked radial bars in the coxcombs means that comparing trends and trajectories is difficult. I know exactly why he did this seemingly hypocritical thing: it’s because it’s a cool map, especially if you are in the visualization + GIS + cartography in-crowd. It references Nightingale coxcombs, one of the most famous visualizations of all time (and relevant for the current crisis as well, since its rhetorical purpose was to advocate for more health resources). It’s very data-dense, smoothly interactive, and rewarding to play with.
There are lots of other pandemic data visualizations that seem like they would have been very cool to work on. Harry Stevens could (and, after the dust has settled, probably should) write a whole paper about how he put together the combination of animation, visualization, and simulation that went into the “simulitis” piece. As technical artifacts to engineer, as objects of technical culture to reflect on, as ways of pushing the envelope of what kind of data we can present to mass audiences, there are so many opportunities here.
And yet, here we are. I chose the examples above because they are from people I generally trust, but there were much less laudatory examples I could have chosen. How fun or elegant or interesting something is to design or engineer or analyze is probably not particularly well correlated with how helpful something is to make and distribute. We are long past the point, if it ever existed, where we can fall back on the old canard, “I’m just an engineer.” I am worried that we are so infatuated with the intellectual exercise of making interesting COVID-19 charts (or reflecting on what makes existing charts “interesting”) that we are losing sight of what problems we should be using these data to solve. We’ve already got plenty of authoritative maps to look at and fret about, so adding yet another one to the pile seems to be of limited utility.
To go even further, it feels at best crass, and at worst dangerous, in the absence of a specific unmet data visualization need, and in an era where there are many people who desperately need their stories communicated, to build something flashy just because you can, or to show how good you are.
Arguments against shutting the fuck up:
Sure, you might say, “but…”
…look at how much impact we could have!
I think Siouxsie Wiles and Toby Morris’s “flatten the curve” graphic is going to be discussed and debated long after this crisis is over. Harry Stevens’ “simulitis” explainer, which I mentioned above, is already one of the most-viewed Washington Post articles during a period filled with lots of other critically important articles. That kind of fame and impact is pretty tempting. And if our graphic is the rhetorical push that keeps just one person inside instead of spreading the disease, think of the lives we could save!
To that impulse, I would first caution you that not all impact is good impact. And if this is an academic exercise or hobbyist project, without a strong connection to real people with real problems, you’re unlikely to be able to distinguish the good from the bad. What do you want people to do, when they have seen your chart? And how will you know they are doing it?
…everybody else is doing it wrong!
I am worried that we are at peak-Yeats-quoting, as a society, but it does seem like we are smack dab in the middle of an era where “the best lack all conviction, while the worst are full of passionate intensity.” You don’t have to look very far to find examples of governments, let alone individual people, getting important data science questions wrong. So it’s natural to want to get in, roll up one’s sleeves, and set things right.
I have a similar response to this one as to the prior one. How do you know? And how would you be able to tell if you went right, where others went wrong? Especially in visualization, where a lot of our design guidelines are largely evidence-free affairs cobbled together from half-remembered Tufte passages, we should be very skeptical that our design will be strictly superior to what’s already out there.
… consider the marketplace of ideas!
There are big chunks of my argument here that smack of naïve credentialism: the notion that we should trust everything that experts say just because they are experts. Many of the most insightful graphics have come from people we wouldn’t normally count among our ranks. And experts in a domain must often rely on intermediaries to translate their work into something palatable for mass audiences (or at the very least to overcome the curse of knowledge). There’s clearly a market for COVID-19 information, and presumably there are some incentive structures to make the cream rise to the top.
To this point I mainly claim that the incentives here are pretty damn perverse. Perhaps the situation is different if you are an established news agency or trusted organization, but the things that go viral (do we have a more period-appropriate word for this yet?) in most information environments are not the most sober, reflective, or accurate bits of information.
…I’m directly working with the CDC/WHO/etc. to make graphs! It’s my job!
Okay, you got me on this one. Great! Keep up the good work. You’re not really the target audience here. Presumably one person saying a rude word in a Medium article is not enough incentive to cause you to halt your important efforts. For the rest of us, however, consider how this particular hypothetical scenario, with specific stakeholders, specific project goals, and direct communication with experts, differs from our own.
Conclusion
Look, I can’t stop you from making charts. And in this historical moment, when so much feels out of our control, there is something mentally very soothing about applying our specific hard-won skills to tackle an immense problem. We all want to feel like we are helping. But think of ways of helping that aren’t just downloading a bunch of csv files of case counts, looking up some d3 tutorials, and getting to work.
For this problem, we (for what I think is a pretty expansive definition of “we”) don’t have the expertise to deal with the data quality issues, modeling assumptions, and domain knowledge that go into the large COVID-19 datasets that are lying around. Even if we did, we further lack the expertise to deal with the communication challenges involved in presenting the complexity and uncertainty in that data to mass audiences. Even if we did solve those, we are likely unequipped to deal with the ethical and rhetorical responsibilities involved in designing and sharing our visualizations to be used responsibly.
If you really can’t resist the urge to chart, try telling smaller stories. How is your neighborhood affected, or your city? What personal data can you collect about your own experiences? There are lots of places where we can have an impact, even a data visualization impact, without immediately jumping to yet another visualization of country-level caseloads and exponential curves.
Just because you have a data science hammer doesn’t mean you have to go looking for data science nails. The money or resources you donate to people in need will count for just as much no matter how many Kaggle competitions you’ve won. The groceries you offer to buy for your at-risk neighbors so they don’t have to brave the crowds won’t be used any differently based on how many stars your GitHub repo has. The policies you advocate for, the picket lines you choose not to cross, the burdens you ease through caring for others around you: all of these are concrete steps that you can take without making a chart. Consider taking a back seat for a little bit.