Bad Ideas In Visualization
It’s been a while since I’ve really been able to troll to my heart’s content, so here are some things in information visualization that I suspect are either a) bad, b) useless, or, at the very least, c) not nearly as useful as people tout them to be. I will be limiting myself to two sentences of justification per bad idea, but additional sentences can be generated upon demand. The idea with this constraint is to provide just enough detail that you see what I’m getting at and can get mad at me, but not enough nuance for you to think I am in any way being constructive, let alone correct. These ideas are roughly sorted in increasing order of controversy, so feel free to bail out when you get tired of this particular shtick.
Parallel Coordinate Plots
The idea: Show relationships in high-dimensional data by setting up a bunch of parallel slope charts between different pairs of dimensions.
Why it’s bad: These things only become legible (let alone useful) with extensive interaction and with very lucky choices of axis orderings, and yet academics have spent who knows how many person-hours and brain cells trying to “fix” them with animation or textures or 3D or complexity metrics or what have you. Total sunk cost fallacy.
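To make the axis-ordering complaint concrete, here is a minimal Python sketch of what a parallel coordinate plot computes under the hood. It assumes the common convention of rescaling each dimension to [0, 1], and the function names are hypothetical. Each record becomes a polyline, but only axes that happen to sit next to each other get a readable slope chart between them.

```python
from itertools import combinations
from math import factorial


def normalize_columns(rows):
    """Rescale every column to [0, 1], the usual convention for parallel coordinate axes."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread; park it in the middle of its axis.
        scaled_cols.append([(v - lo) / span if span else 0.5 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def polylines(rows, order):
    """Turn each record into a polyline of (axis position, rescaled value) points."""
    return [[(pos, row[dim]) for pos, dim in enumerate(order)]
            for row in normalize_columns(rows)]


def adjacent_pairs(order):
    """The dimension pairs you can actually read: only neighboring axes."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


dims = 6
all_pairs = len(list(combinations(range(dims), 2)))  # every pair of dimensions
shown = len(adjacent_pairs(list(range(dims))))       # pairs visible in one ordering
orderings = factorial(dims) // 2                     # a reversed ordering counts as the same plot
print(all_pairs, shown, orderings)
```

With six dimensions, any single ordering shows 5 of the 15 pairwise relationships, and there are 360 distinct orderings to pick from.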
Radar Charts
The idea: A parallel coordinate plot but the axes are all in a circle for some reason. You can also fill in the area and it makes a pretty shape.
Why it’s bad: Everything that’s wrong with parallel coordinate plots, except even harder to read, and perhaps even more misleading, because we can’t help but read them as a big shape even though the shape we actually see is highly dependent on the axis ordering and thus pretty much arbitrary.
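How arbitrary is “pretty much arbitrary”? This quick Python sketch (function names hypothetical) computes the area of the filled shape, assuming evenly spaced axes with each value drawn as a radius along its axis. Shuffling the axes of the exact same data changes the area.

```python
import math
from itertools import permutations


def radar_area(values):
    """Area of a filled radar chart polygon.

    Axes are spaced evenly (2*pi/n apart) and each value is a radius along its
    axis, so the polygon is n triangles with area r_i * r_{i+1} * sin(2*pi/n) / 2.
    """
    n = len(values)
    if n < 3:
        raise ValueError("a radar chart needs at least three axes")
    half_sin = math.sin(2 * math.pi / n) / 2
    return half_sin * sum(values[i] * values[(i + 1) % n] for i in range(n))


def area_range(values):
    """Smallest and largest polygon area over every possible axis ordering."""
    areas = [radar_area(list(p)) for p in permutations(values)]
    return min(areas), max(areas)


same_data = [4, 4, 1, 1]
print(radar_area([4, 4, 1, 1]))  # big values on neighboring axes
print(radar_area([4, 1, 4, 1]))  # same numbers, alternating axes
print(area_range(same_data))
```

The same four numbers produce a shape with area 12.5 or 8, depending solely on which axes end up as neighbors.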
The “Data-Ink Ratio”
The idea: From Tufte, this is a measure of the relative amount of “ink” on a page spent directly encoding data versus the ink spent doing things that are not encoding data. If you have lots of ink spent on non-data elements, that’s likely “chart junk” and should be removed.
Why it’s bad: If interpreted literally, the notion of “chart junk” is simply false (serifs on fonts, tick marks, and filled areas rather than lines all have occasional empirical or aesthetic benefits for legibility, yet all spend ink that doesn’t directly encode data). If interpreted figuratively (as an argument for minimalism), it creates a lot of boring charts that I largely don’t care about.
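For reference, here is the ratio as Tufte defines it, along with his equivalent “what could you erase” restatement. Under a literal reading, every serif and tick mark adds to the denominator without ever reaching the numerator.

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```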
Glyphs
The idea: Show high-dimensional data by making a little icon (say, a flower) for each data point and having each dimension be encoded by a different part of the icon (say, your age is the height of the stem, your weight is the radius of the petals, your number of siblings is the number of leaves at the bottom).
Why it’s bad: Other than the occasional artistic project where, you know, actually being able to read things is of secondary concern, glyphs are so bad at communication and so reliant on poring over the legend that they are mostly used by designers to show off rather than to solve any real problems.
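To show how much legend-reading is baked in, here is a minimal Python sketch of the flower encoding described above. The field names, data ranges, and pixel ranges are all hypothetical. The point is that every visual property is a separate scale the reader has to reverse-engineer.

```python
from dataclasses import dataclass


@dataclass
class FlowerGlyph:
    stem_height: float   # encodes age
    petal_radius: float  # encodes weight
    leaf_count: int      # encodes number of siblings


def linear_scale(value, domain, output):
    """Map value from the data interval `domain` onto the visual interval `output`."""
    (d0, d1), (r0, r1) = domain, output
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


def flower_for(person):
    """One glyph per record; each dimension drives a different part of the flower."""
    return FlowerGlyph(
        stem_height=linear_scale(person["age"], (0, 100), (10, 60)),         # pixels (assumed)
        petal_radius=linear_scale(person["weight_kg"], (30, 150), (2, 12)),  # pixels (assumed)
        leaf_count=person["siblings"],
    )


print(flower_for({"age": 50, "weight_kg": 90, "siblings": 2}))
```

Decoding means running each of those three scales backwards in your head, for every flower, with the legend open.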
Two Dimensional Projected Scatterplots of High-Dimensional Data
The idea: Use a dimensionality reduction or projection technique (say, PCA or t-SNE) to condense high-dimensional data down to two dimensions and then visualize the resulting projection as an approximation of the high-dimensional space.
Why it’s bad: Nobody with any actual connection to your data knows what the axes of the resulting chart mean beyond a very basic folk-algorithmic “these points are close together so I guess they are similar,” and you could have communicated that information directly with hierarchical clustering or an adjacency matrix or something other than whatever unique set of linear algebra or machine learning dark magics you had to do to make your weird, uninterpretable scatterplot.
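For the linear case, here is a dependency-free Python sketch of PCA. It uses power iteration with deflation, chosen for brevity rather than numerical robustness, and the function name is hypothetical. It makes the axis problem visible: each new axis is a weighted blend of the original columns. t-SNE doesn't even give you that much, since it has no comparable linear mapping to inspect.

```python
import math


def pca_2d(rows, iterations=500):
    """Project rows onto their top two principal components.

    Uses power iteration on the covariance matrix plus deflation, which is fine for
    a toy example but is not a substitute for a proper SVD. Returns
    (projected_points, components), where each component is a unit vector of
    weights over the original columns.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(2):
        v = [1.0 / (j + 1) for j in range(d)]  # arbitrary fixed starting vector
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the component just found so the next pass finds the runner-up.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [tuple(sum(c[j] * comp[j] for j in range(d)) for comp in components)
                 for c in centered]
    return projected, components


# Three original columns: the first two move together, the third wobbles on its own.
rows = [[-3, -3, 1], [-1, -1, -1], [1, 1, -1], [3, 3, 1]]
points, (pc1, pc2) = pca_2d(rows)
print([round(w, 3) for w in pc1])  # weights of columns 1-3 in the new x axis
print([round(w, 3) for w in pc2])  # weights of columns 1-3 in the new y axis
```

Here the new x axis is roughly “0.707 of column one plus 0.707 of column two,” which is about as interpretable as it gets. With real data and dozens of columns, every axis mixes everything.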
Node-Link “Hairball” Diagrams
The idea: Visualize a complex graph or network as a node-link diagram, and use some sort of algorithm (such as a force-directed layout) to try to space out the nodes, but despite all that effort you usually just get a “hairball” of hopelessly tangled or illegible lines and dots.
Why it’s bad: Any node-link diagram that is simple/small enough to be legible is likely also simple/small enough that you could get whatever information you needed with pretty much any old layout. And, similar to parallel coordinates above, the illegible ones can’t really be saved by any amount of interaction or computationally intensive layouts no matter how many researcher person-hours are spent trying to show you otherwise.
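For a sense of where the computation goes, here is a bare-bones Python sketch of a force-directed layout in the Fruchterman-Reingold style. The function name, cooling schedule, and constants are assumptions for illustration, not any particular library's. Every node repels every other node, so each iteration costs O(n²) before anything gets untangled.

```python
import math
import random


def force_directed_layout(nodes, edges, size=1.0, iterations=100, seed=0):
    """Fruchterman-Reingold-style layout of `nodes` (a list) inside a size x size square.

    Every pair of nodes pushes apart; every edge pulls its two endpoints together.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # "ideal" spacing between nodes
    temperature = size / 10           # max distance a node may move in one step
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: O(n^2) work per iteration.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[u][0] += dx / dist * push
                disp[u][1] += dy / dist * push
                disp[v][0] -= dx / dist * push
                disp[v][1] -= dy / dist * push
        # Attraction, but only along edges.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[u][0] -= dx / dist * pull
            disp[u][1] -= dy / dist * pull
            disp[v][0] += dx / dist * pull
            disp[v][1] += dy / dist * pull
        # Move each node (capped by the temperature) and keep it inside the square.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[n][0] = min(size, max(0.0, pos[n][0] + dx / length * step))
                pos[n][1] = min(size, max(0.0, pos[n][1] + dy / length * step))
        temperature *= 0.95  # cool down so the layout settles
    return {n: (x, y) for n, (x, y) in pos.items()}


path = list("abcdef")
layout = force_directed_layout(path, list(zip(path, path[1:])))
for node, (x, y) in layout.items():
    print(node, round(x, 2), round(y, 2))
```

Library implementations (for example, NetworkX's `spring_layout`) use tuned variants of this, but the basic tug-of-war between repulsion and edge attraction is the same.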
QQ Plots
The idea: Assess whether two distributions are similar (e.g., to test whether an empirical distribution is “close enough” to a normal distribution) by plotting the quantiles of your observations against the quantiles of the reference distribution. If the resulting points are all close to a 45° diagonal line, then the two distributions are close enough for government work.
Why it’s bad: “Similar enough” is a fuzzy and idiosyncratic concept that is usually compressed down to a binary decision (like, “yes, this statistical test is appropriate because the data are normal enough”), which means I’m never actually sure how you would use a QQ plot as a decision criterion in a way that wouldn’t be captured by a binary statistical test. Maybe there’s some Anscombe’s Quartet equivalent showing a fatal distributional feature that shows up in a QQ plot that wouldn’t be caught by an actual repeatable deterministic process; if so, I haven’t seen it yet.
Tree Maps
The idea: Show hierarchical information as proportionally sized boxes that are nested inside other boxes, etc. etc. If you want to get fancy, you use various computational wizardry to try to optimize the aspect ratio of the various boxes so they are as square-like as possible.
Why it’s bad: Everybody seems to have given up on using these for actually showing trees of any complexity and instead just uses them as a replacement for pie charts because they heard that pie charts are bad and they want to look like one of the cool kids. We’re pretty bad at precisely estimating the relative area of squares with differing aspect ratios and orientations, so much so that I wonder if tree maps are really any better than the pie charts they replace.
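For the curious, here is a compact Python sketch of the classic “squarified” treemap layout, one well-known version of that aspect-ratio wizardry, for a single level of the hierarchy. The `Rect` type and function names are hypothetical. It greedily adds boxes to a strip along the shorter side for as long as doing so doesn't make the worst aspect ratio in the strip any worse.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float


def worst_ratio(row, side):
    """Worst aspect ratio among boxes with areas `row` stacked along a strip of length `side`."""
    s = sum(row)
    return max(max(side * side * a / (s * s), (s * s) / (side * side * a)) for a in row)


def lay_strip(row, rect):
    """Place the areas in `row` as one strip along the shorter side of `rect`.

    Returns the placed boxes and the leftover rectangle.
    """
    s = sum(row)
    boxes = []
    if rect.w >= rect.h:  # wider than tall: stack the strip against the x = rect.x edge
        strip_w = s / rect.h
        y = rect.y
        for a in row:
            boxes.append(Rect(rect.x, y, strip_w, a / strip_w))
            y += a / strip_w
        leftover = Rect(rect.x + strip_w, rect.y, rect.w - strip_w, rect.h)
    else:  # taller than wide: lay the strip against the y = rect.y edge
        strip_h = s / rect.w
        x = rect.x
        for a in row:
            boxes.append(Rect(x, rect.y, a / strip_h, strip_h))
            x += a / strip_h
        leftover = Rect(rect.x, rect.y + strip_h, rect.w, rect.h - strip_h)
    return boxes, leftover


def squarify(values, rect):
    """Squarified layout; returns boxes in descending order of value (not input order)."""
    total = sum(values)
    areas = sorted((v * rect.w * rect.h / total for v in values), reverse=True)
    boxes, row = [], []
    for a in areas:
        side = min(rect.w, rect.h)
        # Close the current strip if adding this box would worsen its worst aspect ratio.
        if row and worst_ratio(row + [a], side) > worst_ratio(row, side):
            placed, rect = lay_strip(row, rect)
            boxes.extend(placed)
            row = []
        row.append(a)
    if row:
        boxes.extend(lay_strip(row, rect)[0])
    return boxes


boxes = squarify([6, 6, 4, 3, 2, 2, 1], Rect(0, 0, 6, 4))
for b in boxes:
    print(f"{b.w:.2f} x {b.h:.2f}  (area {b.w * b.h:.1f})")
```

Even with optimized aspect ratios, a reader still has to compare a 3.00 × 2.00 box against a 1.71 × 2.33 box to see that 6 beats 4.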
Stacked Area Charts and Theme Rivers
The idea: Divide your bar charts or area charts into differently sized colored chunks, and ask people to make decisions based on how much these chunks vary in size over time or across categories. Maybe try to make it look pretty by making everything into smooth curved shapes.
Why it’s bad: You know how I said “we’re pretty bad at precisely estimating the relative area of squares with differing aspect ratios” all of three or four sentences ago? Still true with these kinds of charts: because the alignment of the bars or areas is just sort of arbitrary (and dependent on other data values), it’s really, really hard to make comparative judgments between time points or categories, which is supposed to be the whole point of these kinds of charts anyway.
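A tiny Python sketch of the stacking arithmetic shows where the arbitrary alignment comes from (function name hypothetical). Each layer's bottom edge is the running total of every layer beneath it. The “symmetric” option centers the whole stack on zero, roughly the ThemeRiver-style look.

```python
def stack_layers(series, baseline="zero"):
    """Compute (bottom, top) edges for each layer of a stacked area chart.

    `series` is a list of layers, each a list of values at the same time points.
    baseline="zero" stacks up from 0; baseline="symmetric" centers the stack on 0.
    """
    n_points = len(series[0])
    totals = [sum(layer[t] for layer in series) for t in range(n_points)]
    if baseline == "zero":
        current = [0.0] * n_points
    elif baseline == "symmetric":
        current = [-total / 2 for total in totals]
    else:
        raise ValueError(f"unknown baseline: {baseline!r}")
    edges = []
    for layer in series:
        top = [bottom + value for bottom, value in zip(current, layer)]
        edges.append((current, top))
        current = top
    return edges


volatile = [1, 5, 2]
steady = [3, 3, 3]  # this layer never changes...
edges = stack_layers([volatile, steady])
steady_bottom, steady_top = edges[1]
print(steady_bottom, steady_top)  # ...but both of its edges jump around
```

The steady layer is always exactly 3 thick, but you only see that by mentally subtracting one wiggly edge from another.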
Bespoke Visual Analytics Tools
The idea: Spend several months of programmer and/or grad student time and effort building a one-off dashboard or visualization system for your collaborators in another domain.
Why it’s bad: Did your friends really have data or analytical needs so weird that you had to make an entire thing from scratch? From a cost-benefit standpoint they probably could have stuck with Excel or something.
Data
The idea: Observe the myriad qualia from our brief time on this planet and put a subset of them in a metaphorical killing jar until they are simple and uniform enough that you can type them into a spreadsheet.
Why it’s bad: The process of creating data results in something that is, just sort of by construction, at least one of boring, useless, or destined to be used by horrible people for horrible ends. People are mad at “anecdotes” and “anecdata,” but anecdote is what kept our ancestors from eating the poisonous berries again after the first couple of times they tried them, and that should be good enough for us.
Computers
The idea: Run electricity through melted sand in a very precise way so that people can put numbers into the melted sand and the melted sand spits out those numbers, or some new numbers based on those original numbers, on request.
Why it’s bad: I used to be able to read books in one sitting and consistently get eight hours of sleep, and if a stranger wanted to yell at me, they would have to look me up in the phone book or something first. I guess there are some good videos of cute animals forging unlikely friendships or whatever, but it’s hard to say that is a fair trade.
Humans
The idea: Give dominance over the planet and its resources to some apes that periodically try to exterminate either other species or (just as frequently) themselves.
Why it’s bad: Trilobites were all over the place for hundreds of millions of years and didn’t detonate any nuclear weapons at all during that time period. It’s been less than a third of a million years for us and we’ve detonated over two thousand nuclear weapons, which is not a record that generates confidence in our long-term prospects.
Provocations
The idea: Make people revisit or re-analyze their preconceptions by taking an intentionally absurd set of positions in the hope that people’s defense against them will cause either novel ideation or the creation of stronger theoretical bases for belief.
Why it’s bad: Socrates was the only person who was good at this, and he was still so annoying that the government killed him. Also, people have enough to be mad about these days without somebody going out of their way to make them madder: provocations seem to be more about generating ironic or critical distance from problems so you can avoid the risk or vulnerability that comes from actually trying to solve them.