This blog post accompanies a paper to be presented at EuroVis 2023, “Teru Teru Bōzu: Defensive Raincloud Plots.” For more details, read the paper.
It was a dark and stormy night.
Visualization people always stress the importance of looking at your data but, I’ll be honest, we would, right? There’s some motivated reasoning there. Like “huh, the people who look at charts for a living want you to look at charts, big surprise.” The usual proof that visually inspecting your data is a good idea hinges on things like Anscombe’s Quartet:
If you just looked at the summary statistics, we’d argue, then all four of those datasets would seem to be the same. But if you plot them, then you can immediately see important differences. Chalk up another win for charts.
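To make the "same summary statistics" claim concrete, here is a minimal sketch in plain Python that computes the means and the least-squares line for each of the four datasets. The values are the standard published Anscombe (1973) quartet; the helper names (`mean`, `fit_line`, `summaries`) are my own and just for illustration.

```python
# Anscombe's quartet: four small datasets with near-identical summary statistics.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


summaries = {}
for name, (xs, ys) in quartet.items():
    slope, intercept = fit_line(xs, ys)
    summaries[name] = (mean(xs), mean(ys), slope, intercept)
    print(f"{name:>3}: mean x={mean(xs):.2f}  mean y={mean(ys):.2f}  "
          f"slope={slope:.3f}  intercept={intercept:.2f}")
```

All four rows should come out nearly the same (mean x of 9, mean y of about 7.5, a slope of about 0.5), which is exactly why the table alone can't tell the datasets apart.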
That’s great and all, but I feel like we often stop there, and I do not think Anscombe’s quartet is the slam dunk case everybody thinks it is. I think if we sat with things a little longer you’d start developing some unease. Are summary statistics really so bad? For instance, without that plotted trend line, would you be able to do “regression by eye” and figure out that the trends were (nearly) identical in all four charts? Would you be able to visually estimate the means and medians with any precision, or would you be using easily manipulated “perceptual proxies”? What happens if you don’t have a dozen or so points to look at, but thousands or millions or billions? So, to me, it’s not just an issue of a fight between summary statistics and visualization, but what kinds of summary statistics, and what kinds of visualization. Really, there is a whole gamut of designs that exist between “a couple of summary statistics printed in a table” on one end, and “a totally unannotated plot of all of the raw data values” on the other. That design space for distributions might look something like this:
- Plots of raw data to help you do “sanity checking” and identify things like missing data or outliers, or cast doubt over the appropriateness of particular models.
- Plots of density to help you assess the overall shape of your data, especially when scale or overplotting make the plots of raw data break down.
- Plots of aggregate or inferential statistics for things you couldn’t reasonably be expected to visually estimate from the other plots, or estimate with enough precision for the tasks at hand.
This is where Allen et al.’s idea of a “raincloud plot” kicks in, as a way of ticking as many of these boxes as possible. There have been other terms for these kinds of charts, from “ensemble charts” (my boring suggestion from a few years back) to “pirate plots” (a personal fan favorite), to the somewhat sinister-sounding “v-plots”. But Allen et al. called them “raincloud plots” and that, to me, is perfect. They nailed it, there. I mean, just look at them:
These plots have big puffy “clouds” (density plots) and little drops of “rain” (jittered dot plots of individual values). It’s a nice, memorable name that affords fun tricks like making rainclouds of rain data, in the same way that you can make pie charts out of pie. And they seem to be functional, too: all together, you get tons of information about the distribution in ways that the individual components in isolation could not give you.
The first part of the paper is to look at the design space of “stuff that looks kind of like clouds and presents the same univariate distributional data in multiple ways.” This is, as you might expect, a pretty big space. I came up with 216 potential designs without really breaking much of a sweat, just going off of the raincloud designs I observed in the wild, plus univariate visualization techniques proposed in the literature.
I structured the design space by leaning heavily (perhaps too heavily) on the whole “raincloud” metaphor. The “rain” is the chart component that visualizes raw values (stuff like strip plots or dot plots). The “cloud” is the chart component that visualizes the overall distributional shape (stuff like histograms or kernel density estimates), and then I included a sort of grab-bag category I called “lightning” for visualizations of the derived statistics that would be difficult or annoying to estimate from the rain and clouds (stuff like visualizations of mean and error, quartiles or statistical moments). I, at least, think it’s fun to play around in this design space: I made an “I’m feeling lucky” button in the Observable notebook associated with this paper for precisely that reason. There are some really wacky variations!
But questions naturally arise. The most pressing, to me, is whether these things are overkill, underkill, or just enough kill. Raincloud plots are very expensive, in a way: you’ve now got two or three chart types to simultaneously visualize and explain, with all of the design decisions and hyperparameters involved in each. All just to describe a single univariate distribution! What do your viewers get for that expenditure? In the paper, I attempted to formalize the costs and benefits of rainclouds in terms of “defensiveness”: a good raincloud plot should make it hard to be tricked, whether that means seeing things in a distribution that aren’t there or missing important attributes of the distribution that aren’t reliably surfaced. Through this lens, it’s possible to run simulations to identify failures: things that got past us even with the mutual coverage of the raincloud sub-components.
What I found was, at least for a number of examples, that these raincloud plots really do work as advertised: the sub-components provide mutual coverage of distributional features. Got missing data that doesn’t show up in the histogram or box plot? That’s fine, the rain component will cover it. Got way too many points in your strip plot, so overplotting makes it impossible to compare regions? That’s fine, all you’re missing is density information, and a density plot or histogram will give you that directly.
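Here is a rough sketch of that overplotting scenario, to show why the cloud can cover for the rain. It is not the simulation code from the paper, just an illustration under my own assumptions: a strip plot 300 pixels wide, 100,000 normally distributed points, and hypothetical helpers `occupied_pixels` and `histogram`.

```python
import random


def occupied_pixels(values, lo, hi, width_px):
    """Count distinct pixel columns that receive at least one mark in a 1D strip plot."""
    columns = set()
    for v in values:
        col = int((v - lo) / (hi - lo) * width_px)
        columns.add(min(max(col, 0), width_px - 1))  # clamp out-of-range values to the edges
    return len(columns)


def histogram(values, lo, hi, bins):
    """Equal-width bin counts; out-of-range values are clamped into the edge bins."""
    counts = [0] * bins
    for v in values:
        b = int((v - lo) / (hi - lo) * bins)
        counts[min(max(b, 0), bins - 1)] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is repeatable
values = [rng.gauss(0, 1) for _ in range(100_000)]

lo, hi, width = -3.0, 3.0, 300
filled = occupied_pixels(values, lo, hi, width)
counts = histogram(values, lo, hi, bins=10)

print(f"strip plot: {filled} of {width} pixel columns contain a mark")
print(f"histogram counts: {counts}")
```

With this many points, nearly every pixel column in the strip has a mark, so the rain looks like a solid bar and the center is indistinguishable from the tails. The bin counts still differ by more than an order of magnitude between the middle and the edges, which is the density information the cloud gives back.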
But not always. Certain raincloud sub-components introduce artifacts. For instance, a lot of potential designs for rain either pack or discretize the marks of raw values in order to avoid overplotting. That’s great if you’ve got lots of points, sure, but if you need to distinguish between, say, integer and float data, these layout strategies can destroy the visual cues you’d need to determine that something is amiss. And jittered dot plots (the most common, and arguably canonical, form of rain component) have a totally meaningless y-axis, which makes it hard to tell, for instance, if two rainclouds are even representing the same distribution or not.
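Both of those artifacts are easy to reproduce in a toy setting. The sketch below is my own illustration rather than anything from the paper: `binned_dot_layout` is a hypothetical stand-in for a discretizing dot plot (marks snapped to bin centers and stacked), and `jittered_layout` is a hypothetical stand-in for a jittered strip plot with uniform random y offsets.

```python
import random


def binned_dot_layout(values, bin_width=1.0):
    """Snap each value to its bin center and stack marks: returns (x, stack_index) pairs."""
    stack_heights = {}
    layout = []
    for v in sorted(values):
        b = int(v // bin_width)
        idx = stack_heights.get(b, 0)
        stack_heights[b] = idx + 1
        layout.append(((b + 0.5) * bin_width, idx))
    return layout


def jittered_layout(values, seed):
    """Keep x as the data value; y is a uniform random offset carrying no information."""
    rng = random.Random(seed)
    return [(v, rng.uniform(0, 1)) for v in values]


integers = [1, 2, 2, 3, 3, 3]
floats = [1.3, 2.1, 2.7, 3.2, 3.5, 3.9]

# Discretization: two different datasets collapse to the same picture.
same_after_binning = binned_dot_layout(integers) == binned_dot_layout(floats)

# Jitter: the same dataset yields two different pictures.
a = jittered_layout(floats, seed=1)
b = jittered_layout(floats, seed=2)
same_x = [p[0] for p in a] == [p[0] for p in b]
same_y = [p[1] for p in a] == [p[1] for p in b]

print("binned layouts identical for integer and float data:", same_after_binning)
print("jittered x positions identical across seeds:", same_x)
print("jittered y positions identical across seeds:", same_y)
```

The binned layout can't tell the integer data from the float data, and the jittered layout draws one dataset two different ways depending on the random seed. Either way, what you see on the page stops being a reliable guide to what is in the data.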
I discuss more failures and successes in the paper, but for me the real opportunity is less to dogmatically declare what the “best” raincloud plot is, and more just to be present at the birth of a new chart paradigm, a new tool in our visualization arsenal. Before the raincloud forever solidifies into its final design, we have an opportunity to make sure we’re giving people the tools they need, and to tweak this still-nascent visual form into something we can present with confidence.