A Semi-Ordered List of Things That Annoy Me In Research Papers
Academic writing is tough. You are building a case for the soundness and reliability of your work (one that is strong enough to weather the storms of obstinate reviewers). You are also trying to convince others of your work’s importance and relevance. For such a high-stakes and important task, academic writing can just plain suck. It can be dry or formulaic or imprecise or careless. This is not always the fault of the writers! In STEM especially we have to learn from examples, many of which are poorly written themselves. We are frequently working in languages or argots that are foreign to us, and with very little in our environment incentivizing us to do better. And we have to do all of that under very stressful or time-constrained conditions. I don’t know how to fix all of those problems. I do, however, know how to collect things that annoy me and then urge you not to do those specific things if you can help it.
This document is not meant to call out specific papers. I’ve certainly made my fair share of the mistakes on this list. Nor is it meant to be complete: there’s plenty of badness to go around. Nor is it meant to be authoritative: I’m just one person. What this is supposed to be is a repository for pet peeves in papers. If you’d like to add to the list, feel free. I’m focused here mostly on STEM research papers, because those are the ones I read the most. If you’re mad about other types of academic papers, you’re on your own.
This is also a list of annoying things, rather than a list of constructive feedback, or even advice on how to write a better paper. Positive advice on writing well is really hard, and requires that you know what you’re talking about. Talking about things that annoy you is very easy and requires no expertise whatsoever, so that’s the route I chose.
Introduction Issues
Teasing
The abstract and introduction are supposed to do many things, but one of their main purposes is to get the reader hooked. I should want to know more about the subject and the authors’ findings. Therefore, the introduction should whet my appetite, and later sections are supposed to get into the nitty-gritty. Some people have interpreted this goal to mean that they should be coquettish in the introduction about what their actual results were. For instance, a teasing introduction might conclude with: “We ran an experiment on X, and found some interesting results that suggest new ways of thinking about Y.” Those sorts of statements are supposed to get me motivated (“ooh, I wonder what they found out about X!”) but instead they just make it sound like the authors’ results were so bad, ambiguous, or complex that they feel ashamed of them. A paper is not a murder mystery; you don’t need to hide whodunit until the final page. Summarize the main experimental results up front; I won’t be mad.
“Since The Dawn Of Time”
In middle/high school you probably got made fun of for starting essays with “ever since the dawn of history, we have wondered about X” or “X is an important topic,” or “Webster’s Dictionary defines ‘X’ as…” The reason you got made fun of is not just that it’s a cliché, but also that it’s spectacularly uninformative. It doesn’t tell me what you think, what you’re going to do, or even why a problem is interesting (people have been worried about lots of things). There are ways of motivating a problem other than saying we’ve been worried about it for a long time. The sentence “X is a long-studied problem in Y (see paper Z)” is just a fancier way of writing “ever since the dawn of history…” It’s a “since the dawn of time” that went to grad school. If you think I need background, then give me some background. If you think the problem needs motivating to show that it’s actually more interesting than I think it is (or isn’t solved despite people thinking that it’s solved), then by all means tell me so. But don’t give me two paragraphs of vague quasi-historical generalities.
This goes for “we are the first paper to study X,” as well. If you’re wrong, then you might have sunk the paper: “after all,” the reviewer might say, “if the authors can’t even get that elementary fact about their own novelty right, what are the chances they did everything else right too?” Even if you’re right, you still haven’t shown me why I should care. The existence or non-existence of prior work is neither an indicator of importance nor utility.
Contributions as “What I Did On My Summer Vacation”
In some fields, such as HCI, one of the norms is including a “contributions” paragraph or subsection in the introduction. The reasoning is that the reviewers ought to judge the paper on the size and quality of its contribution to the field. This contribution is supposed to be different from the summary of the work in the abstract. The abstract tells you what the authors did. The contributions tell you why you should care. Therefore, the following is emphatically not a good sentence for a contributions paragraph:
The contributions of this paper are 1) an experiment looking at X, 2) a tool looking at Y, and 3) an evaluation showing that people can use our tool to complete task Z.
An experiment is not a contribution. A tool is not a contribution. An evaluation is not a contribution. A contribution to the field is new, generalizable knowledge. I should be able to do something differently now that I’ve seen what you’ve done. I should know something new and interesting that I didn’t know before. What new facts about X did you learn? What lessons did you learn when you designed for Y? Do the results from Z indicate that we should rethink how we build or evaluate tools in the future? That’s all information that I can use. Just knowing that you did something might be interesting, but it’s not a research contribution unless the field gets something out of it. You don’t always need explicit implications for design (you may not have learned something that I can immediately use), but you should have contributed something. Doing work, even good work, is not inherently contributing to the field (see Tamara Munzner’s “summer vacation” gripes for a similar but subtly different paper issue).
The Table of Contents Paragraph
The old saw about presentations is “tell ’em what you’re gonna tell ’em, tell ’em, then tell ’em what you told ’em.” This remains true for research papers. Without structure and signposting, a research paper can read like an unordered list of facts. But (but!): if I see one more paragraph like this at the end of an introduction, I will lose it:
We begin by discussing the related work in Section 2. Section 3 covers our experimental methods. Section 4 presents the results of our experiment, which are discussed in greater detail in Section 5. We conclude in Section 6 with a summary of our findings, and directions for future work.
If it were an actual table of contents (with visual structure and nesting), then sure, fine. I could scan it like any other table and it would tell me where to go to find the information. If your paper format is strange or non-traditional enough that you feel the need to justify why it’s organized this way, or tell me what parts of the standard model I should be looking for, then fine. But in paragraph form, and largely repeating the standard form of a STEM research paper, it tells me information that I know already, in a form that’s hard for me to use if I didn’t already know it. Instead of doing this, try just having an actual transition sentence at the end of each section. If you do it right, it will almost make the paper seem like a coherent single document that flows together, instead of a grab bag of random sections. I’m not the only person who’s mad about this.
Related Works Regrets
The way we write related works sections in STEM is just sort of fundamentally broken. The purposes they are supposed to serve are:
1) Situate the work in the context of a larger community of scholarship. What’s new about this work? What inspired it?
2) Provide credit and attribution for methods and ideas that are important to the work, but are not original to the paper.
3) Provide concrete support for empirical claims made by the paper.
What they actually do is:
1) Provide proof that the authors know what they are talking about, because they’ve clearly read a lot of papers.
2) Provide proof that the work “belongs” in your journal or conference, because they cite the bigwigs in your field.
3) Provide proof that the authors weren’t the first people to make an otherwise unsubstantiated claim.
4) Provide lists of people who do related things, so they can be used as reviewers.
5) Avoid pissing off the reviewers who do related things, since they might be upset if they weren’t cited.
You’ll notice that the list of things that a related works section is supposed to do sounds cool, and like a story. It can tell me how the authors came across their idea, the perspective they took when solving the problem, and the things they tried that didn’t work (and why). The thing that the related works section actually does sounds, uh… boring. It sounds like a largely unstructured list of references. If you worked in a narrow enough field, you could imagine writing the exact same related works section for every single one of your papers and it being equally successful at making potential reviewers feel nice that you cited them, providing enough authoritative ammunition to show that you read your sources, etc. etc. That feels wrong to me.
Related works sections should be fun to read: they should give me a nice history lesson about an area of the field, show me where the current gaps are, and hint at what the solutions might look like. Related works sections, in practice, are almost never fun to read.
Defensive Citations
One of the purposes of a reviewer is to identify existing scholarship that authors might have missed. The scientific literature is so big that nobody knows it all, and there is nothing new under the sun. Pointing out existing work that the authors might have missed can serve several purposes. It can force the authors to drastically alter the claims they make in the paper (they can no longer claim “we are the first to study X”, for instance). It can force the authors to consider alternatives, ideally collecting more data and performing new analyses. Pointing out missed references should improve the paper by making the claims of the paper more accurate, or the novelty of the new work more readily apparent. A missed reference should expose a weakness in the current paper, one the authors then address in order to make the paper stronger. Just adding it to the bibliography is not “addressing” it.
Yet, because of the weird ways that references and citations work and are rewarded, this meaningful engagement with missed references is not reflected in practice. Authors are asked to “contrast their work with X et al. and Y et al.,” for instance. This looks like a fundamental challenge to the paper’s validity: you’re claiming to make a contribution, but X and Y did similar things. Is your contribution still valid or novel? In practice, though, it almost never gets treated as one. In most cases, the reviewers are satisfied if you cite X and then have a brief sentence about why your paper is different.
These sentences, being the result of backroom-style negotiations with anonymous reviewers, and being inorganic to the paper’s process (after all, you were unaware of or unfamiliar with the references when you did the main work), stick out like a sore thumb. It’s jarring to see a sentence like “While X et al. also did Y, they did not include Z” in an otherwise narrative trip through the works that inspired your paper. And, since missing more than a few papers of this type might be enough for a reviewer to drop you down from “minor revision” to “major revision”, some authors preemptively litter their related works sections with these sentences like a spiky palisade against anybody trying to sneak through and take away their claim of novelty.
Methods Maliciousness
OOA (Overuse Of Acronyms)
People in Computer Science especially love their TLAs (Three-Letter Acronyms). It makes things sound serious and structured. I didn’t give people a widget, I deployed the WDS (Widget Distribution System). It also makes you sound like you’re part of the in-crowd. Oh, you don’t know how PCA relates to MDS? For shame. If you have to subject your system to torture to get it to fit into an acronym, then you’re doing a disservice to your readers. Just say what you did and call it “the method” or “the prototype” or what have you. That being said, systems papers with catchy names are much “stickier” than those without, so feel free to give it an arbitrary noun as a name. I’ll take “Foo: A System for Distributing Widgets” over “FoO: Functionality Of widget-distribution Optimization” any day.
There is one aspect, however, where acronyms just make me irrationally angry, and that’s when they are used to label the conditions of an experiment. For instance, you might be testing three types of widgets: a baseline control widget, an experimental high-performance widget, and a mid-range widget. I’ve seen lots of papers where they will name these something like “Baseline Widget (BW), High Performance Widget (HP), and Middle-Range Widget (MR),” and then their results sections, figures, and tables are all “BW performed significantly worse than HP (p < 0.05)” or “Our results indicate that BW is better for summative judgments than MR but worse than HP” for sentence after sentence. If you’ve only got a few conditions, then the acronyms don’t really buy you much space. If you’ve got lots of conditions, there’s just no way I’m going to remember what acronym corresponds to what condition without flipping back and forth in the paper a few times. And if I’m skimming the paper before going into the thing in detail, your figures might as well be in a cypher. I’m fine with “Our Method/Treatment” vs. “Control” if you have to. Little graphical icons that are representative of the different conditions are great too, if your study is amenable to that. But don’t make me have to remember that the WDS is your method, but the TDS is the baseline method, and the SDS is the method from prior work that you begrudgingly tested because a reviewer told you to.
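If you must keep short internal codes in your analysis pipeline, one low-effort habit is to store the spelled-out condition names in a single place and use them everywhere a reader will look. Here is a minimal sketch in Python; all of the codes, names, and numbers below are hypothetical, not from any particular paper.

```python
# A hypothetical sketch: keep descriptive condition labels in one place and
# reuse them in every table, legend, and reported sentence, so the reader
# never has to decode acronyms like "BW", "HP", or "MR".
CONDITION_LABELS = {
    "bw": "Baseline widget",
    "mr": "Mid-range widget",
    "hp": "High-performance widget",
}

# Hypothetical mean task-completion times, keyed by internal condition code.
mean_seconds = {"bw": 12.4, "mr": 9.8, "hp": 7.1}

# Reporting always goes through the label table, never the raw codes.
for code, seconds in mean_seconds.items():
    print(f"{CONDITION_LABELS[code]}: mean completion time = {seconds:.1f} s")
```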
Person-hours != Sentences in the Paper
Papers, especially STEM papers, often have the flavor of a lab notebook. The authors write down everything they did, in the order they did it, and what happened at each step. That’s fine, and makes it easier to re-implement or replicate the study or technique in the paper. Some of this stuff I think would go better in supplemental material, but that’s dependent on the norms of the field and other pragmatic concerns. Also, things don’t always have to be chronological, especially if you want to avoid teasing me about your results, as mentioned above. But I can usually tell when there’s a part describing something a grad student banged their head against for a couple weeks before giving up. Just because you spent a lot of time doing something doesn’t mean I’m obligated to spend a lot of time reading about it. Spend the most time on the parts that are the most important and the most interesting, and less time on the things that aren’t.
All According to Plan
This is not to say that you shouldn’t write about failures. In fact, I think we should be writing much more about them. I am very interested in knowing what didn’t work, and why. From a quantitative research standpoint, I also want to know that you aren’t just p-hacking to victory, only presenting me the stuff that made it out of the file drawer, or running the experiment first and coming up with the hypotheses later (HARKing). It’s sort of suspicious if you ran a study about something novel and everything worked out exactly how you expected.
This advice applies even for papers where you didn’t run a quantitative experiment. I’d much rather read a paper about a system that didn’t work, and why it didn’t work, than read a paper about a system that worked fine but from which we learned nothing. Brian Whitworth’s takedown of Microsoft’s Clippy in “Polite Computing” comes to mind as a useful post mortem. Don’t exclude what went wrong just because you want a cleaner narrative. Stuff that didn’t pan out is often the most important part of the work, especially for system building. If you built a system using best practices and it did well, then all I’ve learned is that your instincts were good and the best practices seemed to work in that scenario.
Bonus word to avoid: “Clearly.” If something really is obvious from what you’ve told me, then the “clearly” is just redundant. If it’s not obvious (and it frequently isn’t), then that “clearly” is either presumptuous or insulting.
Results Ruiners
The way that we report results is fundamentally broken. I don’t know how to fix it without building up strong norms about open data and preregistration and moving away from dichotomous statistical tests, so good luck with all that. Some things that annoy me, however:
Numbered Hypotheses
You know how I mentioned several paragraphs up that if you keep overusing acronyms I’ll have no idea what you’re talking about? Okay, at least with acronyms there’s a prayer that I’ll remember what they are referring to (maybe I remember “Our New Method” is “ONM,” say). But I pretty much have no chance with sentences like “our results support our third hypothesis” and “our results support H3.”
As far as I can tell the whole “H1, H2, H3,…” thing is just a weird idiosyncrasy of the HCI community, but relying on readers to remember what your hypotheses were after several pages of methods and related work seems to be pretty common. Remind me what the hypothesis was before you talk about your results! Don’t rely on my memory of a numbered list from a few pages ago!
Tiny, Busy Charts With No Annotations
Look, I understand the impulse. The results from our experiments were just so nuanced, so interesting, so detailed, that I don’t want to sum them up with a single bar chart with a couple of bars in it, or just a couple of confidence intervals. I want to divide things up and really let people dig deep into the data. And we’ve got so much content to get to, and so little space to do it, that it’s very tempting to just reduce that figure size a little bit and buy us the space we need. But these two impulses often result in figures that are extremely small but have a lot of things going on. If you must have figures that try to summarize lots of data, then you’re going to have to walk me through it, either in text, visually, or both. This is especially true if the message I’m supposed to get from a particular chart is subtle or requires me to attend to tiny changes in values in different parts of the chart.
Ideally, just by scanning the abstract, figures, and figure captions, I should have a reasonably high-level view of what the paper is about, what you did, and why I should care. Treat your figures as part of the story you are telling, and not just dense repositories for random data values that you sprinkle in to try to convince reviewers that you did a lot of very smart, science-y work.
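To make that concrete, here is a minimal sketch (Python with matplotlib; every condition name and number is invented for illustration) of a small results chart that annotates the takeaway instead of leaving the reader to eyeball bar heights.

```python
# A hypothetical example: a compact results figure that points the reader at
# the comparison the caption cares about. Names and values are made up.
import matplotlib.pyplot as plt

labels = ["Baseline widget", "Mid-range widget", "High-performance widget"]
means = [12.4, 9.8, 7.1]   # hypothetical mean task-completion times (seconds)
cis = [1.2, 1.0, 0.9]      # hypothetical 95% CI half-widths

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(range(len(labels)), means, yerr=cis, capsize=4)
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=15, ha="right")
ax.set_ylabel("Task completion time (s)")
ax.set_ylim(0, 16)

# Annotate the key result so a skimming reader gets the message from the
# figure alone, without hunting through the results text.
ax.annotate("~40% faster than baseline",
            xy=(2, means[2] + cis[2]), xytext=(0.8, 14),
            arrowprops=dict(arrowstyle="->"))

fig.tight_layout()
fig.savefig("widget_results.png", dpi=300)
```

The specifics don’t matter; the point is that the figure and its caption should carry the message on their own.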
Bonus words to avoid: “trending towards significance,” “marginally significant,” or “nearly significant.” If you want to use dichotomous statistical tests that’s your decision, but you can’t have your cake and eat it too. I have no idea if a p-value is “trending” anywhere from just one sample point.
Discussion Disasters
Overreach
Building empirical knowledge is incremental, minor, and evolutionary rather than revolutionary. The process of reviewing, accepting, and citing papers is unfortunately heavily skewed towards the flashy and exciting. To produce papers that have both of those contradictory sets of properties, ordinarily very sane and measured people will take advantage of the discussion section to just go totally nuts about what the implications of their work might be. For instance, “while our work focuses on widget distribution, we believe that these principles for distribution apply more broadly. As food scarcity is a distribution problem, our work has concrete implications for solving world hunger.” Or to take a less extreme example, maybe you just add in an untested hypothesis at the end, à la “while we did not explicitly test users’ preferences for different kinds of widgets, we believe that customers are willing to wait for longer periods of time in order to receive their favorite widgets.” Ideally a discussion section should build on and summarize your results; it shouldn’t be the section where you speculate wildly.
Future Work Wishlists
As with related work, I think there’s a mismatch between what a future work section is supposed to do and its actual utility in the paper. I think that it’s supposed to plant your flag in the area and indicate where you are going to be putting your effort in the near future. In this way it solidifies the contribution of the paper by indicating how your work connects to broader patterns of scholarship and allows people to build on your effort (now that we know this, we can do that). What I think it ends up being in practice is a way for people to desperately try to wallpaper the holes in the paper in an attempt to insulate it from reviewer criticism. E.g., “a more thorough evaluation of X remains future work” or “we intend to deploy to group Y, by which time we will be able to make stronger claims about Z.” Similar to related works, there is then a strong incentive to promise the moon in future work sections because there’s no accountability to see if you actually end up doing any of the things you promise, and it’s a way to prove to reviewers that you were mindful of potential gaps or weaknesses in the work.
Conclusion Catastrophes
I don’t think I’ve ever written a good conclusion. They are supposed to be largely redundant with the introduction, but concise, and with a punch, and with a call to action of what we should do now that we know the things in the paper. That turns out to be really hard. I usually just restate a sentence or two from the abstract and call it a day. Anyway, people have been wondering about how to write better papers since the dawn of time.
Thanks to Alper Sarikaya for comments on this post.