“We have visualization everywhere… we should be proud!”: A Trip Report from EuroVis 2023
This post is my (perhaps vain) attempt to summarize what I got up to at the EuroVis 2023 conference in Leipzig, Germany. This conference was a weird one for me. It was my first pandemic-era conference without any sort of hybrid participation option at all (or, really, much recognition that COVID was a thing at all), my first (and, hopefully, only) self-funded conference trip (a result of being laid off early in the year, at just the wrong time window between having a paper submitted and having it accepted), and my first time back at EuroVis since 2011 in Bergen, Norway, which some will note is twelve entire years.
That particular 2011 conference was where I gave my first conference talk ever. I went by myself (we didn’t have “send the whole lab to Norway” money lying around, after all), presented, in retrospect, one of the ugliest pre-loaded powerpoint templates ever created, and read paperbacks in my hotel room while I tried to adjust to jet lag. This year I was even less able to adjust to jet lag, but at least my powerpoint didn’t have any drop shadows or animated transitions. But I still felt weird, a superposition of lucky and unlucky, senior and junior, peripheral and central.
In my notes I wrote that I had the feeling the whole week that my “conference muscles had atrophied”: my abilities to do in-person schmoozing, to pay undivided attention to multiple hours of talks in a row, to connect people together, or even just to tailor my dosages of sleep versus caffeine, were all groaning with complaint after three years of under-use. And, I reasoned in my self-centered way, if I was this rusty, how were the next generation(s) doing, some of whom were having their very own first conferences here? Something to worry about, I guess! I increasingly view my role as an academic as finding ways to get out of the way so all those fresher and smarter and more enthusiastic people can step up, and I’m a bit worried that the pandemic has created a long tail of top-heavy stagnation through the combination of tightening budgets, limited opportunities for cohort-building, and general institutional conservatism.
But if there are places to get those muscles back in fighting trim, EuroVis is not such a bad start. The conference is much smaller than other places I go to present my work, like CHI or VIS, and so you end up on at least nodding terms with large portions of the conference-goers, if through nothing else than by standing in line during coffee breaks, shared lunches, or the communal parties. And the material is presented in far fewer parallel tracks (it used to be just one, but now it’s back up to two simultaneous sets of talks), so you are forced a little out of your comfort zone and into a wider range of material. To mix at least two or three metaphors, I think one of the central goals of conferences is to function as a way for ideas to engage in Brownian motion in a confined space, colliding or otherwise making sparks, and that kind of serendipity is hard to come by if you just stick with the “comfort food” of papers you would be likely to read anyway, conference or not.
The standard boilerplate for the rest of this post applies: inclusion or exclusion from the list of papers I will discuss is more tightly connected to me trying to build a narrative and keep to space limits than any objective measure of paper quality — I saw, heard, and liked more papers than are included here. Likewise, I tend towards the information visualization side of things as opposed to other visualization areas like rendering or volumetric data or things in that vein, so that’s another bias.
A third bias this year is that I am increasingly sick to death of talking about ML and AI: I feel like I’m often just repeating myself, unhelpfully critical, and not changing any minds about it, so even though there was quite a bit of ML/AI/XAI/HILML stuff this year, it’s conspicuously absent from my list below, except where I couldn’t help myself. As an example, I apparently scribbled in the margins of my notes “I don’t trust the visualization community to ‘solve’ explainable AI for the same reason I don’t trust my dog to balance my checkbook” during the panel I’ll discuss later, and it’s like, c’mon buddy, get over yourself and lighten the hell up. I will note, though, that if you are interested, I was one of the participants in this year’s Schetinger et al. “Doom or Deliciousness: Challenges and Opportunities for Visualization in the Age of Generative Models,” so a fun exercise there might be to try to figure out which of the curmudgeonly participants yelling about AI was me, specifically (although reading the paper I found that I was less curmudgeonly than I assumed).
But with that out of the way, to the papers. I’ve grouped them all by roles that I feel are under-represented in visualization, stuff that people do that we aren’t capturing, or stuff that we should be thinking more about than we are. As alluded to before, the word “we” is doing a lot of heavy lifting here, and may bear little or no resemblance to any field of academic study beyond the contents of my head.
The Auditors
I grouped these two papers together because they both tackle a subject that is, for some reason, simultaneously banal and impolite to bring up, which is that data pretty much always sucks. It just does! And most of the actually influential and important work of data science is done in an (often vain) attempt to make data not suck: munging and cleaning and preparing and checking the stuff. But that kind of tidying or verifying often doesn’t make for fun or exciting papers or algorithms, especially when novel technical solutions to these kinds of problems are often blown out of the water by the less technical solution of “let us look at what we’re doing and use the tool we’re already used to.” So it’s always refreshing to see people thinking and designing in this space, instead of the flashier (but ultimately, I think, over-saturated and less interesting) space of “making dashboards to show people things they already know using data that’s already been hammered into submission by somebody else.”
Tasks and Visualizations used for Data Profiling: A Survey and Interview Study
Roy A. Ruddle, James Cheshire, Sara Johansson Fernstad
Okay, so if the people doing this data checking work aren’t using the shiny tools we make for them, what are they doing? Well, as per this paper, lots of stuff that they (rarely) write down or otherwise document. This interview study focuses on “profiling,” which I sometimes call “sanity checking,” and which I think is a crucial step of analysis: seeing what kind of data you actually have, especially if it’s your first time exploring a dataset. If your first instinct upon seeing a .csv or whatever of new, hitherto unexplored data is to immediately start fitting predictive models to it, know that you are my enemy. But for how vital (and ubiquitous!) this process is, this work showed that we’re still very much in what one of my former colleagues used to call “the cowboy days.” There are a few “workhorses” of visualization or analytical techniques (comparatively simple things like scatterplots and histograms), but each analyst in this study had different sets of tasks, goals, and tools that they employed.
The authors suggest one way of adding some structure (and so potential standardization or automation) is to create more standard “rulebooks” with tips and examples, but I would also suggest that perhaps we could explore other metaphors as well, both more and less prescriptive than suggested by a rulebook. Checklists, sure, but also “recipes”: if you want to bake, say, a causal analysis cake, what ingredients should you have on hand, and what substitutions are permissible? Tips and instructions, sure, but also “preconditions”: what steps should reviewers take if they want to check that a paper’s analysis and conclusions follow from the data they collected?
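To make the recipe/checklist idea a bit more concrete, here is a minimal sketch of the kind of first-pass profiling the interviewees describe, written in Python with pandas (the file name and its contents are hypothetical stand-ins, and the checklist itself is my improvisation, not a procedure from the paper):

```python
import matplotlib.pyplot as plt
import pandas as pd

# First-pass profiling of a brand-new tabular dataset, before any
# predictive model gets anywhere near it.
df = pd.read_csv("new_unexplored_data.csv")  # hypothetical file

print(df.shape)                    # how much data is there, actually?
print(df.dtypes)                   # did that "date" column parse as a string?
print(df.describe(include="all"))  # ranges: negative ages? dates in 1970?
print(df.isna().mean().sort_values(ascending=False))   # missingness per column
print("exact duplicate rows:", df.duplicated().sum())  # often a join gone wrong

# And the visual workhorses from the interviews: a histogram per numeric column.
df.hist(figsize=(12, 8))
plt.show()
```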
Ferret: Reviewing Tabular Datasets for Manipulation
Devin Lange, Shaurya Sahai, Jeff M. Phillips, Alexander Lex
If profiling is the step you take before you start a data analysis session, there’s also a verification step that should happen at the end of analysis. The talk for this paper used the falsification scandal around spider researcher Jonathan Pruitt to motivate the use case, but since the paper was submitted there have been more, equally prominent examples of researchers faking or modifying their data to drum up evidence for their own conclusions. The issue is that detecting this kind of fraud looks like it’s a huge pain in the ass, especially since these kinds of manipulation can be both subtle and wide-ranging: look at the crime-scene-style forensics people are doing to uncover the most recent crop of suspected scientific fraud(s) in psychology, for instance.
The tool in this paper is an attempt to automate away some of this work, looking at things like mismatched fonts and numerical precision to automatically detect anomalies. I’d be really curious to sort of combine the two papers in this section together: to see what these open science sleuths are really doing as part of their investigative work, and to see how much of that work we can usefully automate or support with analytical tools rather than ad hoc scripts or manual inspection.
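For flavor, here is a toy sketch of one check in that spirit: my own loose improvisation in Python/pandas, emphatically not Ferret’s actual algorithm (which inspects spreadsheets directly, including formatting like fonts that a plain CSV read would discard). The idea is to flag values whose decimal precision is rare for their column, since hand-edited numbers often carry a different precision than instrument-generated ones.

```python
import pandas as pd

def decimal_places(x: float) -> int:
    """Count decimal places as Python prints them, e.g. 3.14 -> 2."""
    s = repr(x)
    return len(s.split(".")[1]) if "." in s else 0

def flag_precision_outliers(values: pd.Series, min_share: float = 0.2) -> pd.Series:
    """Return entries whose decimal precision is unusually rare in the column.

    Caveat: parsing a CSV into floats drops trailing zeros ("12.50"
    becomes 12.5), so a serious version would inspect the raw text.
    """
    clean = values.dropna()
    places = clean.map(decimal_places)
    shares = places.value_counts(normalize=True)
    rare = shares[shares < min_share].index
    return clean[places.isin(rare)]

# Hypothetical usage: one suspiciously precise value among two-decimal readings.
readings = pd.Series([12.47, 11.92, 13.01, 12.88, 12.31, 14.123456])
print(flag_precision_outliers(readings))
```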
Alvitta Ottley (whose best paper this year, led by Saugat Pandey, “Mini-VLAT: A Short and Effective Measure of Visualization Literacy,” was one of those “couldn’t find a way to sort it into my themes but is definitely worth your time” papers) beat me to the buzzer in asking the sneaky question I wanted to ask, which was on the order of: okay, if you build a system that makes it very easy to detect cheaters and identify anomalies, couldn’t the fraudsters use the same tool to get more sophisticated in their fraud? That is, to keep manipulating until they “pass” all the checks in place? Although, at some point, if it’s that much work to do the fraud, it might be way easier to just collect the real data. But, if not, are we due for a fraudster evolutionary arms race?
The Storytellers
The papers in this section are likewise grouped together because they deal with another one of my hobbyhorses, the role of visualization as a rhetorical and narrative tool. There’s all sorts of stuff about accuracy and precision and what have you, but charts are more about persuading people (which can be a long, involved, and fuzzy process) than extracting individual numbers and putting them into people’s heads. Stories are very important! Or, to put things another way, if your chart is not about persuading someone to do something, why on earth did you make it, and why should I care that you did?
But one of the emerging suspicions I’ve had as the academic visualization community has started exploring the space of narrative and rhetoric in visualization more deeply is that our existing constructs and heuristics just don’t seem up to the task (granted, I am over a decade late to this particular revelation, but still). I also am starting to worry that the academic community is more of a follower than a leader in this space: many (most? all?) of the really prominent, really effective data stories in the news media or in citizen science don’t seem to be taking much from our community, but doing their own thing and leaving us there to document what they did so we can marvel at it later. Maybe that’s okay, but it is something I wish we’d think about more, as a field (that is: should the field be more prescriptive or descriptive, more designers or curators, more predictive or reflective?).
The last thing the papers in this section have in common is that they left me with a sort of ambivalence about the use/foregrounding of people in visualization. Data visualization, “the unempathetic art,” has often been dinged (by me and others) for alienating/abstracting/otherwise downplaying the suffering of people (it’s hard to get motivated by a bar chart counting victims, and easier to get motivated by personal stories, say). But the solution to this “problem” really doesn’t seem to be “well, just add the people back in.” Or, if it is, we often lack the right constructs and measures to verify that this solution is working as it is supposed to.
Data Stories of Water: Studying the Communicative Role of Data Visualizations within Long-form Journalism
M. Garreton, F. Morini, D. Paz Moyano, G.-C. Grün, D. Parra, M. Dörk
This paper presents a survey of different data stories dealing with water, most often water issues as reflected in bigger issues around changing climate and shrinking resources, as we enter what some climate justice folks are calling “the century of hell.” There’s a lot of gnarly data that has to be communicated to wide audiences as part of this stuff: risks and uncertainty, disasters and impact areas. Here the data stories are decomposed not just by their topics but also by their framing and rhetorical goals.
But what I found interesting here, in a Holmesian “curious incident of the dog in the night-time” way, were the absences in this corpus of stories. The first was, as alluded to in the introduction of this section, the rarity of people. You’d think that if you wanted to communicate, say, that climate change will increase the risk of flooding in a region, you’d want to show how individuals are (or are not) impacted, to personalize or otherwise tug at heartstrings not swayed by line graphs or thematic maps. But the authors suggest that this insertion of people as victims might be counterproductive. As famous data visualization researcher Karl Marx said, “the philosophers have only interpreted the world, in various ways. The point, however, is to change it.” If you, the reader of the data story, are only a passive victim of forces beyond your control, the message that direct action needs to be taken could be lost.
Relatedly, the other conspicuous rarity in the corpus was using data stories to propose solutions. There’s an assumption, perhaps, that data visualization is only good for showing, well, the existing data, and less suited to moving from that data to a concrete call to action. I wonder if there’s some sort of naïve positivism at play there, an assumption that once you finish up “just showing the data” or “letting the data speak for themselves” your job is done. I don’t think that’s the case, especially for stuff like climate change where your job isn’t done just by finding the right dataset. Presenting background data is just the first step in a rhetorical project, not the last step, and certainly not the only step.
Do Disease Stories Need a Hero? Effects of Human Protagonists on a Narrative Visualization about Cerebral Small Vessel Disease
S. Mittenentzwei, V. Weiß, S. Schreiber, L. A. Garrison, S. Bruckner, M. Pfister, B. Preim, M. Meuschke
I got several hundred words into a postmortem of Fivey Fox once. I ultimately ditched the piece because a) it was a bit too mean-spirited even for my tastes and b) it seemed wrong to kick FiveThirtyEight when it was down. For those of you who missed out, Fivey was a sort of mascot/commentator for FiveThirtyEight who would jump in on the margins of their pieces predicting upcoming elections and cheerily explain that, hey, just because things were predicted to happen one way, that doesn’t mean they will happen that way, and other truisms. I intensely disliked him, as a harbinger of a trend of mascotification that condescends to users and disparages their own agency and expertise, but I bring him up here to further complicate the “hey, just add people back in” idea for increasing empathy/engagement/connection with data. For one thing, the question is, well, okay, what kinds of people?
This paper attempts to put together a data story that’s more like, you know, an actual story, with a protagonist at the center rather than little people-like things on the margins, cribbing off of standard narrative forms like Campbell’s hero’s journey. The experiment at the core of this paper is about different protagonists (a doctor versus a patient as the “hero”), but the use of the word “personas” in the Q&A piqued my interest. In much of modern software development and UX design, there’s a focus on “personas” and “user stories” (“I am an X and I need to do Y,” say). But visualization (for, admittedly, strong historical and empirical reasons) anchors a lot on “tasks” as the atomic unit of analysis (and so the resulting task analyses, horse-race experiments comparing time and error on atomic tasks, etc.). This focus on tasks never sat well with me: real analytical goals seem to require context and agency and perspective that I’m not sure are best captured by tasks alone. So I liked that there was an acknowledgement here that different personas could result in different designs, even if the tasks and goals are similar.
The Prognosticators
This last section will also function as my wrap-up, since I’m already way, way above my usual word budget for these sorts of things. The general topic here is thinking about the future and trajectory of the field, which in turn means I will be focusing more on keynotes and panels, since I feel one always has more license for that kind of grandiose prediction in those sorts of fora than in the traditional conference paper. I (for admittedly self-centered and perhaps idiosyncratic reasons) think this kind of reflection is valuable, especially for a field like visualization that seems to average about one or two identity crises every few years. I think we’re about due for another, if I’ve timed things right.
Capstone: Seeing is learning in high dimensions
Alexandru C. Telea
Despite having occasionally designed systems that use projections of high-dimensional data down to two dimensions, I still think they are usually a bad idea: the axes are often uninterpretable (especially for the audiences that are meant to interpret them), and the actual layout of points is highly contingent on hyper-parameters or stochastic elements or otherwise general linear-algebra-related sorcery, to the extent that it’s hard to know what’s important to take away from such visualizations, or how I would describe anything that I did learn from them.
This talk provided an interesting path forward for some of this stuff, starting first off with the metaphors we use. A projection as a map, sure, but also as something like a cockpit for navigating the space, or a stress test for identifying problematic or difficult regions of the data. A quote I liked enough to write down verbatim was “dots do not explain data… they are disconnected samples, and I want to explain phenomena.” And, later on, that if I want to successfully explain these phenomena (especially to the people signing the checks), I need to use “normal language.”
Also of interest to me was a “bake off” of different proposed projection techniques showing that, despite the interminable number of figures where they unroll Swiss rolls or classify handwritten digits or any of the other mind-numbingly repetitive projection examples I always see, in general, with a few exceptions, there’s not really a dominant projection technique that, for reasonably selected hyper-parameters, gives you the best results on new high-dimensional data. As with almost everything important in computer science, it’s about tradeoffs. To me that puts increased pressure on coming up with better things to optimize than esoteric measures of stress or coherence: quality metrics that focus on the explainability or otherwise human-centered utility of the projections.
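Since the talk’s bake-off framing is easy to replicate in miniature, here is a hedged sketch of my own (not the talk’s actual benchmark, and yes, using the handwritten digits I just complained about): project the same data with a few scikit-learn techniques and score each embedding with trustworthiness, one stand-in quality metric among many.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE, trustworthiness

# Project the same dataset with several techniques, then score every
# embedding with the same neighborhood-preservation metric.
X, _ = load_digits(return_X_y=True)
X = X[:500]  # subsample so the quadratic-cost methods finish quickly

projections = {
    "PCA": PCA(n_components=2),
    "MDS": MDS(n_components=2, random_state=0),
    "t-SNE": TSNE(n_components=2, random_state=0),
}

for name, proj in projections.items():
    embedding = proj.fit_transform(X)
    # Trustworthiness: roughly, how many of each point's high-dimensional
    # neighbors survive as neighbors in the 2D layout (1.0 is best).
    score = trustworthiness(X, embedding, n_neighbors=10)
    print(f"{name}: trustworthiness = {score:.3f}")
```

Which technique “wins” depends entirely on which metric you choose to optimize, which is rather the point.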
Panel: The Future of Interactive Data Analysis and Visualization
Johanna Schmidt, Timo Ropinski, Alvitta Ottley, Michael Sedlmair, and Marc Streit
The title of this blog post comes from a point Jean-Daniel Fekete raised during the Q&A, offering a counterpoint to the usual doom and gloom (to which I will readily admit to contributing my fair share) that accompanies panels asking us to introspect about the future of the field. It’s easy to point to bad examples in visualization, of problems like misinformation or complexity or lack of agency in the face of automation that seem to be getting worse, but it’s also easy to point to examples of bad writing, art, music, or any other field of human endeavor. Riffing on Sturgeon’s Law, Fekete claimed that “80% [of visualization] will be crap, 20% will be good, and if they want to improve, they will know where to go.” In other words, while, sure, academic visualization has a ways to go, visualization itself has risen to ubiquity, and there are acknowledged experts in our field who are able to communicate with practitioners, to provide useful guidance and oversight if needed, and, hey, that’s not nothing. Plenty of other fields of human endeavor have worse records.
But the rest of the panel was interesting as well, positive or not, starting off with hot takes from each of the panelists (paraphrased and re-snarkified by yours truly):
Ottley: We need to know when visualization is not the answer to our collaborators’ problems, and propose something more useful.
Ropinski: We need to prepare for AI rather than visualization to be the main workhorse of analysis.
Schmidt: We’re being left behind by not integrating with data science.
Sedlmair: We need to increase the diversity of the work (and papers) that appear in our conferences.
Streit: We need to stop making so many one-off, self-contained tools and lean more on established frameworks (like, say, PowerBI or Tableau).
I agree with almost all of these positions to some extent (I will leave it as an exercise to the reader to determine the parts I disagree with vehemently), but I do want to point out that the calls for integration with other fields have started ringing a bit hollow for me, if for no other reason than that every single academic field in existence seems to be calling for more interdisciplinary work (often in the same sort of way that people admonish themselves with “I should really exercise more” or “I should really eat better”), and yet, here we are.
Sure, moments like the panelists asking “who’s a psychologist in this room?” and receiving crickets in response are good motivators, as is reading yet another op-ed by some prominent AI dude who should really know better saying “we should really be studying both humans and computers” so that HCI professors get to be snooty at him for a few days. But, ultimately, it’s the job of a field, especially a field like visualization that purports to be applied, to package things up in useful ways and otherwise show our worth. Presenting data to people in useful ways is not just a temporary embarrassment until the algorithms get good enough to automate away analytics, or a sideshow for hoi polloi who haven’t taken enough statistics classes and need pretty colors: visualization has real value now, and will have real value for the foreseeable future, and we shouldn’t be coy or shy about that.