I am trying to space this particular collection of pet peeves out far enough away from any of my usual paper deadlines so I can make it as clear as possible that I am not subtweeting (subblogging?) anybody. The title gives it away, I suppose, but this post springs from a question I have had many times as I engage in all sides of the academic paper reviewing process (as author, reviewer, and coordinator):
How can we make people (authors, reviewers, readers) spend less time on work they don’t like, and more time on work they do like?
I ask this because I am very very tired of the following inter-connected failure modes when it comes to academic papers. In line with “questionable research practices,” to some extent these are “questionable paper practices,” although I’m more hesitant to assign judgments about epistemic or scientific reliability to these practices in the same way that I might with QRPs. Namely:
A paper is submitted somewhere. It is rejected. The authors resubmit the paper, perhaps unchanged, to the same venue (or one very similar to it), and, eventually, through either luck or reviewer fatigue, the paper gets in. Titled “roulette” because you are mostly just spinning the wheel until you get lucky. For grad students you only get a few spins of the wheel before your job search, for early career faculty only a few more spins before your tenure case, etc.
A paper is submitted to a journal or conference. It is rejected, and so the authors resubmit the paper, perhaps unchanged, to a slightly less prestigious venue, repeating the process until they find one with sufficiently forgiving standards to accept the paper. Titled “price is right” because, just like the game show, you are trying to get as close as possible to the (metaphorical) “actual retail price” without going over (if you shoot too low then your work doesn’t get the accolades it deserves because it is “hidden” in a low-tier venue; if you shoot too high then you get rejected and have to resubmit elsewhere).
A paper is submitted to a journal or conference. It is rejected, but with borderline scores. The reviewers point out some flaws in the paper, but they seem addressable (say, the reviewers wanted the authors to run different analyses, or engage with some new sources). So the authors address the concerns, send it out again, where an entirely different group of flaws are revealed by new reviews (some of which might even contradict the first set of flaws), and the paper is again rejected. If and when it is finally accepted after a few rounds of this, it is a Frankenstein-like paper full of counter-arguments and defensive citations and other bolted-on monstrosities. Titled “whack-a-mole” because you have to reactively address quasi-random stimuli.
These three scenarios, in addition to being not mutually exclusive (you can make whack-a-mole style changes in service of retargeting a different venue for price-is-right reasons, for instance) are to some extent negative externalities of systems with good intentions. For instance, there is going to be noise in the reviewing process, so authors should not interpret a particular set of reviewers’ words as the gospel truth. Sometimes work is not the right fit for a particular community, and would have a better home (and more utility and response) elsewhere. And, of course, revising your paper to make it stronger based on external input is just sort of how the system is supposed to function. But I still think the above three scenarios are failures. They:
- Waste a lot of time and effort. The reviewers might have to review a substantially similar piece of work multiple times in a row, writing reviews that stand a good chance of being partially or entirely ignored. The authors have to deal with the time and hassle of repeated submissions and low level edits and formatting changes for each round of revisions and then fret about things for a while.
- …on less important things. If it’s a really good or really bad paper then you (probably) don’t have to play these games. It’s only the papers in the “messy middle” where these failure modes really make themselves felt. This middle is (generously) where the vast majority of all the papers I have ever written live, so it’s not meant to be a disparagement, just a reflection that the more ambiguous a paper’s quality is, the more time and effort is likely to be spent on it in the current system.
- …that don’t even produce better outcomes. For two of the failure modes, the paper could be substantially the same throughout the process. In the price-is-right scenario I guess there’s the benefit that the high profile venues get to maintain a reputation for exclusivity by rejecting the work, but I don’t see how that helps anybody I actually care about (the marketing goals of academic publishers do not really tug at my heartstrings). The whack-a-mole scenario is supposed to produce better work but can sometimes produce a worse final result, since the authors end up with a weirdly paranoid piece of scholarship full of various bits of rhetorical scar tissue as an attempt to deal with a diverse (occasionally contradictory) set of recommendations.
- …and make everybody upset. If I were a reviewer and I wrote a careful and thoughtful review, and the paper authors just ignored me and resubmitted the work unchanged, I’d be pretty peeved (especially, as has happened to me, if I encounter the work again, word for word, in a different venue). Likewise, if I were an author, I would feel pretty peeved about having to spend years of my limited time on this precious planet getting a single, relatively minor, paper out of resubmission limbo, long after I am sick to death of it and have moved on to other projects (this has also happened to me).
Note that these failure modes are somewhat unconnected to the projects of “moving the field ahead” or “doing good science.” I think solving some of these issues might have knock-on effects for those things (e.g. if people are wasting less time doing silly things then maybe they will use some of those savings to do useful things), but I am not particularly focused on those heady epistemological issues here. I’m more looking at personal incentives: as much as people talk about a “file drawer problem” in research, we have sort of an opposite effect here where the perception is that the only acceptable outcome for research work is for it to be published somewhere, eventually, so that the work “counts” and the effort involved is not wasted. And, in keeping with personal incentives here, I am focusing on what would make my life easier as an author, reviewer, editor or coordinator, given that I am stuck in a system where academics are expected to periodically review and publish papers.
All that being said, there have already been a few proposals to address these sorts of issues:
Accept everything and let Science sort it out.
Also known as the “run it up the flagpole and see who salutes” strategy. The idea here is that, even without a review process or formal journal structure, the cream will rise to the top through empirical utility or public interest or citations or what have you. Heesen and Bright’s takedown of peer review settled on a solution of roughly this form, and the Machine Learning community is sort of inadvertently in the midst of an experiment of this sort where it’s very common for papers to end up on the preprint servers long before they are formally reviewed or accepted, and so are widely distributed ages before the “official” pipeline gets through with them.
There have been several issues with this approach. The first is that, if you have a bunch of unvetted stuff that operates the same as thoroughly vetted stuff, you run into problems once you have to actually use the work (either to present it to others, to use as a basis for new research, or to evaluate a claim). This is already a bit of a problem in ML: I’ve heard stories of people getting their papers rejected because they didn’t compare their method to some baseline in another paper, only to find that the baseline paper in question has been stuck in arxiv limbo for years because it has major flaws or even outright errors. And it is far more than “a bit of a problem” in medicine and epidemiology, where lots of low-quality COVID-related preprints have, at best, muddied the waters about matters of critical concern in the pandemic, and at worst gotten people killed in their tens of thousands.
The rejoinders to the issues raised above are a) the stakes for most work are not nearly so high: e.g. maybe we as a society would want to vet your policy-forming epidemiology paper to some high standard, but maybe we would care less about, say, a tongue in cheek paper reliant on your audience watching U.S. public television in the 80s and 90s. Although note the blatant intellectual snobbery involved in making decisions about which areas of human knowledge “deserve” this hypothetical lower level of scrutiny, and also the pragmatic fact that acceptance/rejection rates are just bafflingly set at the per-field area through arbitrary in-group pressure (e.g. even middle-of-the-road philosophy journals have acceptance rates on par with the most selective of STEM journals like Nature or Science), and b) it’s not like standard peer review is doing such a great job either. For every story of a dodgy preprint circulating where it wasn’t wanted, I’m sure I could dig up just as many about cronyism, corruption, and all sorts of other underhanded peer-review shenanigans, and that’s long before we get to the issues of low-quality or noisy reviews drowning out signal.
So I think there’s something here, in that I think we could stand to accept more papers more often. How much nicer I think we should be depends on my mood and whether I slept well the night before. On my more radical days I think that even a paper that is totally falsified and/or plagiarized or even Sokal hoax-level word salad is at least interesting as an artifact, in that it proves that the research area is interesting enough, or the rewards for publishing about it high enough, that people would resort to such efforts. It is a rare experiment that tells us absolutely nothing about anything, even with confounds and low power and all sorts of other potential deficiencies. So why not accept it and put it somewhere? On my grumpier days my stance is closer to “it’s already hard enough to find interesting new papers, and we already accept a lot of garbage, so now we should make the problem even worse?” It is worth noting that neither of these two stances is correct.
Punish people for being profligate with their papers.
A natural counterpoint for “we should publish more papers” is “we should submit fewer papers.” I’ve heard a few varieties of this one: either everybody only gets to publish X total papers in their career, and so has to think carefully about which ones to publish, or only X total papers “count” on your CV, or if you get a paper desk rejected you are barred from submitting a paper for some time-out period, etc. etc. Some sort of punishment scheme to stop people from flooding people with low-quality work or engaging in “salami publishing.” The only one of these schemes that seems to have stuck is that a few places make you agree to do X reviews for a conference for each submission you make, and even that is more about moderating reviewer workload than punishing over-submitters.
There’s sort of a prisoner’s dilemma with this solution, which is that it would certainly help me do a lit review or a job search if all of the people in my field only wrote a handful of papers, but then I would be in deep trouble if I ever decided to have more than a handful of discrete ideas, or whenever I needed to lean on my publication record for professional reasons. There’s another asymmetry in that the benefits for writing a paper, even a bad one, are often short term or immediate (you get a talk and a CV line!) but the costs for writing a bad paper may not be fully incurred until somebody discovers that your paper is bad (which may take years or be reliant on serendipity) and even then may not be incurred by the right people (the “replication crisis” impacts an entire field, so responsibility can be dissipated amongst students or supervisors or what have you).
While there seem to be (slowly) changing attitudes around things like open science and replicability, as far as I know there is not really any coherent movement towards people publishing less (other than the occasional tweet thread proposing a scheme or two), and plenty of pressures in the opposite direction (you apparently need a bunch of stuff on your C.V. just to get an internship these days; there’s absolutely just dire gross nonsense these days on the job market). These schemes for publishing fewer things all seem to come from comfortable senior people; I’ve rarely heard the more junior folks (who are already struggling to be heard in their communities) call for schemes like these, with the exception of whenever I as a grad student had to investigate the related works in a new area and felt inundated with papers.
So there’s an is-ought gap here where I think it’s all well and good to say that we should be publishing fewer (but higher quality) papers, but I’m not sure what the right socio-political levers are: we’d need to independently fix the hiring market, the grant system, the for-profit journal system, etc. etc. before this seems like a just or equitable outcome. Right now what people seem to be doing to enforce these sorts of things is unilaterally setting very high bars for acceptance in the papers they review, which (since these are often individual crusaders with their own personal “bars”) mostly succeeds in adding another round of reviewer roulette and/or pissing people off.
Make reviewing more like a contract.
Many of the ills mentioned above occur because the process of reviewing often devolves into haggling— the authors have to work out some mutual arrangement with the reviewers in a quasi-adversarial setting. And, like real-world haggling, there is a desire to avoid being seen as a sucker: reviewers want to avoid looking stupid or non-savvy for not seeing the flaws in a particular paper that they would otherwise let through, and the authors don’t want to commit to doing a ton of extra work revising or resubmitting their work. It’s this adversarial setting where a lot of dumb annoying frictions and inefficiencies happen. So let’s make it less adversarial.
One way of doing so is through mechanisms like registered reports. Similar to preregistration, the researchers put together a specific plan for how they are going to conduct an experiment, and then follow the plan. But a difference is that reviewers review this plan before it is enacted, and, like a contract, say that they will fulfill their end of the bargain (accept the paper) if you fulfill yours (execute the experiment as designed). So presumably you can’t run an experiment, write up the paper, then, to adapt the Fisher quote, “call in reviewers to learn what the experiment died of” and have to toss out all or some of that work based on reviewer comments and start all over again.
There is a categorical difference (in STEM fields at least) between a paper where the experimental design is so bad that you need to redo it, and a paper where the experiment is fine but the analysis, argumentation, or discussion needs to be improved. Reviewers in my field are for some reason very coy about this distinction, and will refuse to say if they think an experiment is “salvageable” or not. This wastes a lot of time where authors think that they just need to “reframe” their experiment rather than start from scratch, because the reviewers are too polite (or maybe too scared of being wrong?) to say that no amount of reframing will address their qualms or fix the methodological issues they have found. Registered reports would force people to get their quibbles out of the way ahead of time.
The main issue is that “we’ll do this thing, just trust us” is a mental block that reviewers occasionally have a hard time overcoming (it’s why so many papers with totally fixable issues seem to get bumped up to major revisions or even rejections; you have to some extent trust that somebody will do the right thing without necessarily being able to verify that they did it correctly). There’s also a bit of an issue that the registered report model doesn’t exactly match up with all the other sorts of papers that aren’t running one (or a very small number of) experiments whose hypotheses, analysis plan, and likely impacts can be detailed in a useful way. I will need to do some self-interrogation to figure out what these registration-proof scientific works look like, but at the very least it’s not a priori clear to me what the registered report version of, say, “going digging into some archives with a general idea in mind and a critical lens or two to apply” would look like, and whether people would be able to extend sufficient trust to accept the resulting scholastic artifact sight unseen (although note that this model is pretty similar to the current model of grants or book proposals, with the caveat that the “trust” there to some extent comes from pre-existing political or social capital).
There are “contractual”/non-adversarial steps we can take even without significant systemic reform or new genres of submission types. For instance, I sometimes participate in a conference where papers with concerns that remain after reviewing are “shepherded” by a committee member who works with the authors to correct the issue(s) before final acceptance; you could imagine greatly expanding the role of a “shepherd” in contrast with a “reviewer.” And a number of institutions have started holding “paper swaps” or “paper workshops” where people working towards the same conference deadline will pass papers around to get criticism from a presumably friendlier audience. It is at the very least sort of weird that we are supposed to be collegial and cooperative and building towards shared knowledge except for when it’s time to determine the table of contents of a journal, and then all of a sudden the teeth come out and work is guilty until proven innocent.
Better divide the labor of publishing.
Academic reviewing combines checks for correctness, copyediting, and checks for “contribution” or “novelty” or whatever other values the venue purports to have. This is pretty silly, as these are very different types of labor and require very different skillsets. A reviewer approving of a paper based on rigor or conceptual correctness may lack the time or expertise to do a copy-editing pass as well, especially since nobody is getting paid (except for the journals). Newspapers are famously running out of money but still employ editors and fact checkers and whoever else, so don’t tell me your journal (with profit margins that make drug dealers envious) can’t spare some money for that. “Reviewing” as this monolithic thing rather than a combination of (occasionally interconnected) facets results in a number of icky results. For one thing it makes reviewing “well” either very rare or very time-consuming. For another it conflates “passed peer review” with “is statistically rigorous” or (even more dangerously) “is correct.” It also produces a sort of black box system where it’s hard to see precisely what went wrong as the author of a rejected paper, since all sorts of issues of varying levels of severity might be wrapped up in that single “reject” decision.
It would be pretty easy to just adopt the newspaper model here. Journals could employ targeted “fact checkers” for areas such as quantitative methods, open data standards, typographical errors, venue relevance, or all sorts of micro-tasks, rather than asking random people to take on all of that work periodically. This would also let people (for better or worse) adapt to the “house style” of a particular venue with some confidence that they won’t have their choice of methods or models discarded out of hand, or evaluated by somebody totally unfamiliar with them. There are of course dangers that new methods or techniques would not be supported, and that changes in the field would happen faster than changes in the fact-checking personnel, but that’s already sort of what happens with reviewing anyway, just less transparently. Another downside is that hiring these people would take a lot of time and money that would be hard to do in an ad hoc way, and so it would be (even more) difficult for us to totally break away from a for-profit publishing model.
If this shifting of reviewing labor happens at the per-institution rather than per-journal level then there’s a different sort of issue where the rich will get richer (since they can afford more checkers, and checked papers presumably would have a tactical advantage). But also it would give institutions oversight and control over papers that they didn’t have before. E.g., the current inciting incident that led to Google firing Timnit Gebru is that she allegedly didn’t follow their internal paper auditing process (even though pretty much nobody was apparently following that process, because it was silly), and now there are reports that internal editors are asking researchers to “strike a positive tone” about how Google is referred to in research work. Yikes.
Fight the system to accelerate its demise.
A common thread in the categories of solutions above is that they are reformist efforts that leave many of the issues of the academic review and publication model intact. Or, if they do seem to promise revolutionary change, they are up against perverse incentives or entrenched interests. An increasingly popular option in similar socio-political quandaries is to “heighten the contradictions” of the system until it loses all legitimacy.
Most of the projects in this space are on the reviewer side, just because that seems to be where most of the power is located— lots of people need specific papers to get accepted for career-related reasons; it’s unclear that I would ever need to review for a particular venue, and so my ability to pick and choose or just say “no” is increased. For instance, there is the 450 Movement, with the perfectly rational idea that a review of a paper is an expert consultation, and expert consultations in every other context cost money, so pay me $450 every time you want me to review. The goal here is not necessarily to make money, but to heighten the contradictions around why reviewers aren’t paid for their labor. It’s not a particular pledge I would sign on to (mostly because I am still very guilt-driven as a person, and so long as I actively submit papers to conferences and journals I feel an obligation to offset the amount of reviewing work I create with the amount of reviewing work I provide), but I certainly get the idea. And of course there are similar pledges around open data or statistical rigor or any other sets of values you care to have.
I have also learned that you really don’t need as much institutional power as I thought you did before you can play these sorts of games as a paper writer as well without doing that much damage to your career (I say more as a perhaps naïve hope, and with the understanding that you still need some power or security for these to work well). It really is that easy to take your ball and go home, so to speak. You can throw up the work on arxiv or as a piece of science communication for a wider audience and let the actual publication happen at your leisure (since all the people you want to see it have already seen it). You can organize your own workshops or venues for the kind of work that you do and let the bigger venues muddle along as needed. You can build followings on Twitter or TikTok and get orders of magnitude more eyeballs on your ideas than if you put out a journal paper. We have lots of options as researchers for communicating and vetting our ideas that don’t involve repeatedly asking different groups of random strangers if they liked our document enough for us to be able to finally stop editing it.