Methodological Counter-Terrorism: Making Open Science Boring
In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”
—G.K. Chesterton, The Thing
People seem to like heroes and villains, but to me the existence of heroism or villainy is usually a sign that some process, somewhere, is broken. American newspapers still periodically share those stories that go something like “small town pitched in and raised money to pay for this kid’s life-saving surgery!” and my reaction is usually not “aww, what a heart-warming and inspiring story” but “why on earth do we live in a country where a small town needs to come together and move heaven and earth so that somebody doesn’t die from a treatable condition?” Likewise, I will hear about people abusing their power and, to me, the fact that we have designed structures of power where such abuse is possible is a problem that exists in parallel with individual bad actors. I appreciate stories about heroes and villains, but I would much rather live in a world where I don’t have to cross paths with very many of either. Instead of saving somebody from a burning building, maybe next time spend a few bucks changing the batteries in the smoke alarms.
This is all prelude for me to lay out a vision statement in advance of my appointment as an Open Practices Co-Chair at the premier visualization conference, IEEE VIS. This is an appointment that fills me with trepidation, mostly because it is a (relatively recent) position that has been filled with crusaders who have articulated strong visions for open science futures, butted their heads against entrenched bureaucracy, and often lost. Steve Haroz, one of the first Open Practices Chairs, and somebody who has been speaking and writing about open practices for years, ultimately resigned from the same position for reasons he has articulated both on his blog and elsewhere. I am not a crusader. Nor do I have any special expertise around experimental methods, publication processes, or science communication. So what can I offer? I have decided that my vision for the future of open practices is that I want to live in a world where open science is boring. Other folks have made arguments for open science on claims of virtue (science that is more open is more equitable and inclusive than closed science) or on a desire to seek the truth (open science is more likely to be true than closed science), but those are not my goals in this document. Rather, I am going to argue for boringness as an ultimate goal of all of these efforts around replicability or open access or what have you.
“Ah,” you might say, you sassy hypothetical reader you, “open science is already boring.” Well, for one, as I allude to in the title of my piece, it’s a field where careers are on the line, lawsuits are drawn up, and accusations of “methodological terrorism” are bandied about and refuted, which doesn’t sound particularly boring to me. And I think this excitement happens for reasons beyond the usual Sayre’s Law issues that come from academics being petty, grudge-holding weirdos. But my second refutation is that I want to change what kind of boring open science is. Open source software folks have long used the canard “free-as-in-speech versus free-as-in-beer,” and I want to draw a similar comparison between “boring-like-doing-your-taxes” versus “boring-like-brushing-your-teeth.” Hear me out:
Doing your taxes in the U.S. sucks. Your options as a regular human being are either to pay somebody to do it for you, rely on sketchy and expensive software, or just grit your teeth and go through it yourself. All with the distinct possibility that if you screw up the process, you could ruin your life. You don’t feel better after doing it, just relieved that the process is done. It can be time-intensive and frustrating and involves being drowned in details. Brushing your teeth, by contrast, is a couple minutes of your life every day, produces benefits both short term (fresh breath!) and long term (no cavities!), and you can do it while thinking about other things. Boring-as-in-taxes is periodic, arduous, and tedious work. Boring-as-in-brushing-your-teeth is recurring, easy, but habitual effort.
I want open science practices to be brushing-your-teeth boring. You barely have to think about it. Maybe your parents had to scold you into doing it some while you were growing up, and you have to do some scolding yourself with your kids, and occasionally have to remember to put floss on the grocery list, but as an adult it is (hopefully) such an ingrained, everyday habit that it may as well not exist as a topic of discussion. I guess the parents in this case are our academic mentors or advisors, and the scolding is rejection by reviewers, and the “running the toothbrush under the faucet for a couple seconds so it’s damp as an attempt to trick your parents into thinking you brushed your teeth” is research misconduct? I don’t know, I’m not a metaphor doctor. But the general idea is that open science should be this kind of everyday dull.
Here’s an example of what I mean by boring open science. I had a paper that had some studies in it. I learned a few years ago that somebody had replicated it (I think as a class project). I learned this not because they had emailed me desperately begging for my data, or because I had to work closely with them to make sense of my code and make sure my protocol was being followed correctly, but because I was looking for my paper on Google and their replication showed up. They didn’t have to bother me to do their thing. Being bothered (and having to dig up old code or data from some archive somewhere, or answer back-and-forth emails with subject lines like “issues regarding your study”) is potentially exciting, and potentially the bad kind of exciting. So boring, here, is that a replication is just something that can happen without it causing a stir. This type of boringness was only possible because I uploaded my study materials in a usable format to a website that persisted long after I left the institution where I originally wrote the paper.
There’s a related kind of boring at work where if their replication had found something that failed to match up with my paper (I think they did find a significant difference in a particular set of means where we failed to encounter one, for instance), it wouldn’t cause a big fuss because nobody would be fool enough to base core knowledge in the field on the results of a single study. You’d say, “Oh, interesting. Let’s run some more studies to look at these discrepancies” rather than immediately retreating to “Oh crap, is everything we know about the field wrong?”. So the process of replicating an experiment is boring (you just download some stuff from a website somewhere and run the thing) and the outcome of a replication is also boring (because an individual result is grist for a future meta-analysis or follow-on study rather than something that can upend an entire field’s self-conception).
There are several problems with this anecdote above as something to universalize across the IEEE VIS community. The first is that, to get to this future where replication and re-analysis are boring, we have to do a bunch of extremely non-boring work that involves upending existing structures of recognition, communication, and reward. The second is that it involves making paper authors and reviewers do a bunch of boring-as-in-taxes work that they are currently not used to doing (not just prepping materials and documenting code but currently uncommon stuff for the field like preregistrations and registered reports). The last is that, while I want open science to be boring, I don’t want the visualization field as a whole to be boring, and much of the most interesting work in visualization (to me, at least) has very little to do with things like quantitative experiments that you’d want to replicate.
Okay, so how do we get to the right kind of boring? Ideally in a way that doesn’t leave a bunch of people in the lurch or destroy the diversity of the field, but also doesn’t result in a bunch of lukewarm half-measures that end up satisfying nobody? Those are all very good questions. I don’t know. Here are some bundles of thoughts about it all, though.
Making Transparency Boring
Making research outputs and data available is not boring. For papers, it mostly seems to rely on a particular Kazakhstani woman getting up to various international heroics, or individuals paying eye-wateringly large fees to journals with profit margins that make drug dealers envious. So long as research is done within the constraints of the for-profit academic publishing racket, there’s only so much that one can do (especially in official roles) to support it. I don’t know, hopefully Plan S will help with that; we’ll have to wait and see. I also hope that the culture of putting stuff up on arxiv and OSF generally continues to catch on in the visualization community, which should also help things along.
It’s here around the notion of transparency and sharing where I think a position as an Open Practices Chair can have the most impact. I still know people who don’t know that they are allowed to upload preprints, or how to do things like get an approval to upload to arxiv. I still see people who redact important information they could have easily shared because they don’t know that OSF lets you create view-only links for anonymization, or that you can use services like Anonymous GitHub to share source code without worrying about breaking anonymity while under review. A tutorial page and a line or two in a submission or reviewing form would do a lot of potential good here: “learning to navigate an unfamiliar open science website’s UX” I think is a prime target for toothbrush-style boringness.
Where we get into trickier waters is around what should be required or suggested beyond “share your preprints, please” and into the realm of sharing data, code, and analyses. For one, there’s a bit of conflation of suggestions versus requirements and carrots versus sticks once you are talking about conference organization: it’s hard to go “this is just our opinion” when authors are already doing all sorts of gamesmanship around author anonymization and paper types and so on to increase their chances of paper acceptance, and so would be tempted to read the tea leaves in even the most non-dogmatic suggestions. There’s also this weird prisoner’s dilemma issue where research transparency tends to actually work, in that it makes people more likely to scrutinize your paper and identify errors, but that increased scrutiny and more identified errors are also strong reasons for papers to be rejected. There are longitudinal benefits, sure (I am much more likely to cite your work or compare against it if I can actually, you know, look at it), but if we fall at the first hurdle, then that’s it. So there needs to be some adjustment of the gamesmanship here in any event, and unilateral action (especially unilateral punishment) is not going to be the thing that does it.
The other issue is that it’s all well and good to say “share your data, please,” but it’s hard to discuss what that actually means for many VIS papers. Too specific of a definition, and we exclude huge swathes of the community. Too broad, and we produce sort of lukewarm anodyne “gee, it sure would be nice to do something about all this totally inaccessible, bit-rotted, and unevaluated science someday” recommendations that don’t actually recommend anything.
There was a recent panel on pre-registration at VIS this year that discussed these issues (and a town hall talk by the current and continuing Open Practices chairs, which I can’t find the recording of). There was a lot of nuance in those discussions, especially in comparison to the usual framing when these potential mismatches are brought up to the open science community. The typical open science goals aren’t always a great fit for the type of research that gets done in our field (especially around things like design studies that are longitudinal, highly qualitative, and organized around goals of transferability rather than generalizability). The usual responses to this problem are either to point to examples that sort of fit with our work, or to reverse the burden of proof and go “well, why don’t you help us write better guidelines, then?”, neither of which is totally satisfying as a response, especially with a goal of (eventual) total boredom in the result.
My personal opinion on this is interesting only in the sense that it satisfies literally nobody in this debate, and that’s that I think we should a) take a frog-boiling approach where we make gradual but steady changes until one day we wake up and it’s the transparent future we want and b) we should be overly generous with praise and prizes until we get there. b) in particular seems to rub people the wrong way. For some, having accessible data is the bare minimum needed to evaluate scientific work, and so we shouldn’t need to reward this low bar. For others, ribbons and prizes are weird, slightly patronizing, and unevenly awarded nonsense that doesn’t seem to have much impact on actual decision-making (the people who care about research transparency are still going to care, and those who don’t won’t be swayed by a stochastically awarded little .png of a blue ribbon somewhere). And a last group would reiterate my earlier concern that there’s lots of work that doesn’t fall neatly into the usual reward systems (ask me about trying to get a TVCG Replicability Stamp some time) and so a reward scheme ends up being a punishment scheme for this type of work. I get it, I do. But, to me, the systemic deck is stacked against transparency, and so it needs to be tilted slightly in its favor, and the carrot is an empirically better option than the stick, here (a little more on this in the section below).
Making Replicability Boring
Whenever I would mention replicability as a thing the field should do more of, I would do two things: mention the RepliCHI effort at ACM CHI, and mention papers like this one and this one that re-analyze and (attempt to) replicate prior work, respectively. That won’t cut it, in the regime of boringness. The RepliCHI effort has by now petered out because (and this is my impression, so I could be wrong here) it’s heroic, the opposite of boring. Lots and lots of effort by a select group of roving experimenters for very little reward. Building a boring field norm around replicability is just not going to involve good Samaritans roving from paper to paper, helping those in need, and disappearing into the night with a wink and a thumbs up. As for the papers, I often bring them up because they are so interesting as examples, and interestingness thrives where boredom fears to tread.
In keeping with the discussion above about how many of our papers don’t seem to fit the usual open material models, I wonder how many visualization papers are really “replication ready”: this is a higher bar than just having data tables and stimuli available, but requires work with interpretable conclusions that are connected to testable theoretical constructs. That is, after we replicate something we should have learned something useful from the replication. I think there are fewer candidates for that kind of work than people think. This is not always a negative: if the way you make the insights from your work transferable, generalizable, or credible is not through replication, then you shouldn’t really care about it. But I do want to raise the issue as evidence that just doing a bunch of replications of random studies in random papers is not going to “solve” this problem, such as it exists.
I’m sure I sound like a broken record on this subject, but to me the solution to making replicability boring is to do more theorizing and self-organization as a field. If we have strong theoretical underpinnings of our work, our natural curiosity (and perhaps natural contrariness) as scholars will lead us to want to test these theories, to examine where they break down, and to confirm for ourselves what our academic ancestors have asserted without proof. Replication just for the sake of replication I don’t think will get us anywhere: it should be purposeful and intentional.
Making Critique Boring
Ideally open reviewing and post-paper critiques and even retraction are all boring (but occasionally disappointing) events. Just during the particular morning when I composed this sentence I: cut myself shaving, smashed my thumb in a drawer during the process of making a sandwich, over-steeped my tea, and made who knows how many typos and other errors, among a host of other mistakes or omissions. It is reasonable to expect similar error rates in my research life as well, where I generally attempt to do things that are more complex than making tea or sandwiches.
And yet, from experience, I can tell you that it is not boring to have your work challenged, to have to frantically dig up data and code from years ago in the faint hope that you can remember what you were thinking or doing back then, to swing metronome-like from instinctual anger at “how dare this person say I screwed up” to the cold gnawing pit of self-doubt that is “oh shit, did I screw up?” Somebody could express these doubts as gently as they like, they could give me a glass of warm milk and a plate of cookies and whisper “hey, do you think we could revisit that paper from a decade ago?” while tucking me into bed, and I would still hate it, and still be anxious about it happening. So when it’s public and not wrapped in as many niceties, it sucks all the more.
I’m not sure how to square the circle here, between critique and reconsideration being needed to move the field forward (at a pace faster than one funeral at a time) while being careful to shepherd and support the actual human beings who make up the field, especially when critique crosses the line into harassment. I do, however, think I disagree with this blog post from Jessica Hullman that this sort of anxiety has led to a culture of toxic positivity in VIS/HCI (even if I agreed with it, I’m not sure I mind too much, since positive reinforcement is, in operant conditioning land, one of the more effective tools we have). I think there are lots of places where we are highly critical and occasionally outright mean as a field: it just doesn’t seem to be scientific critique that is allocated in useful quantities in useful places and for the most useful ends.
In any event, boring critique seems a long ways off for us. But the glimpses I catch of it include a healthy set of periodic meta-analyses and theoretical re-imaginings and reconfigurations, where old thinking and stale ideas can be killed off, but where junior and developing scholars are safe to experiment, develop, and speak up without professional reprisals but while still receiving constructive guidance. Building a new normal I think is the key, here: we are used to frequent bugfixes, patches, and updates in our software, so why is correction in academia treated as “something exceptional or immoderate”?
In practical terms, all of this hemming and hawing leads me to a similarly wishy-washy conclusion: we should be extremely hesitant to create new coercive requirements, but we should also be extremely willing to produce a panoply of model papers, tutorials, repeatable workflows, and positive anecdotes about transparent work. Make it “soft” and easy to be a transparent and open researcher, like sliding slowly into an easy chair after a long and hard day. If there’s some open science equivalent of the Sleepytime Bear, that’s who I want to be.
Thanks to Arvind Satyanarayan and Lonni Besançon for feedback on this post.