Looks Good To Me: Visualizations As Sanity Checks

20 density plots and 20 dot plots of the same distributions. One distribution has a lot of repeated values.
Screenshot of Tableau Prep showing histograms of different data fields.
Tableau Prep is a data prep tool that uses histograms to show what the distribution of values looks like. But what data quality issues do histograms make it easy to see, and which do they hide?
A normal dataset and a “flawed” dataset. Our tool chooses parameters to automatically hide the flaw.
An example of our adversarial visualization attack. We chose an initial sample of 100 “clean” points and then added a spike of 20 points in the middle to generate a “dirty” dataset. The first two columns of visualizations show big visual differences between the clean (left) and dirty (right) data, but by smoothing the data, reducing the number of bins, and removing transparency, as in the last two columns, we can make the differences all but disappear.
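The bin-count part of this effect can be reproduced with a rough sketch. The code below is an illustration under stated assumptions, not the tool described above: the clean sample is assumed to be standard normal (the original distribution is not specified here), the spike is placed at the sample median, and the "visual difference" is approximated by the largest gap in bar height between the two normalized histograms. The helper names `bin_proportions` and `max_bar_difference` are hypothetical, and smoothing and transparency are not modeled.

```python
import random
import statistics


def bin_proportions(values, n_bins, lo, hi):
    """Share of values falling in each of n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp the index so that v == hi lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def max_bar_difference(clean, dirty, n_bins):
    """Largest difference in normalized bar height between the two histograms."""
    # Both histograms share the clean data's range; the spike lies inside it.
    lo, hi = min(clean), max(clean)
    a = bin_proportions(clean, n_bins, lo, hi)
    b = bin_proportions(dirty, n_bins, lo, hi)
    return max(abs(x - y) for x, y in zip(a, b))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumption: the 100 "clean" points are drawn from a standard normal.
clean = [rng.gauss(0, 1) for _ in range(100)]

# The "dirty" data adds a spike of 20 identical points in the middle.
spike_value = statistics.median(clean)
dirty = clean + [spike_value] * 20

fine = max_bar_difference(clean, dirty, n_bins=50)
coarse = max_bar_difference(clean, dirty, n_bins=3)

print(f"max bar difference with 50 bins: {fine:.3f}")
print(f"max bar difference with 3 bins:  {coarse:.3f}")
```

With many narrow bins the spike dominates a single bar, while with a few wide bins it is absorbed into a bar that was already tall, so the two histograms look much more alike.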
20 dot plots of distributions, but one distribution has many missing values.
1 guilty distribution with missing values, 19 innocent ones. Can you find it? Solution at the bottom of the page.



Michael Correll

