The Clippy-ization of Human-Computer Design

Michael Correll
12 min read · Dec 22, 2021


A grid of dorsal views of different porcelain crab species (which are not “actually” crabs, but from another family of decapods) juxtaposed with a bunch of copies of Clippy the Microsoft Office Assistant asking you if you are writing a letter. You are not, at this moment: you are reading alt text.
Image adapted from “South eastern Pacific species of Petrolisthes, Allopetrolisthes, and Liopetrolisthes (Porcellanidae)” by J. Antonio Baeza, CC BY 4.0, via Wikimedia Commons

There is a concept in evolutionary biology called carcinization, where groups of totally unrelated decapods will, through the process of convergent evolution, end up looking like crabs. My understanding of the process (and I should note that “my understanding” of evolutionary biology does not carry me very far in this situation, so much of this paragraph is highly likely to be total horseshit): you have things like king crabs that are squat decapods but are not actually “true” crabs. Instead, they (depending on whom you ask) evolved from a hermit crab-like (also not a true crab!) ancestor, gradually lost the reliance on the shell that they carried around, and got squatter and more crab-like. A crab is a useful shape to be, if you’re a decapod: you can hide under stuff pretty easily, and there’s lots of flexibility in how you grow (much easier to molt your exoskeleton than to try to find the perfect new shell). If it ain’t broke, don’t fix it. Plus, many of the resulting species are tasty.

In this piece, I am writing about another piece of convergent evolution, what I am calling “Clippyization.” I swear I am not the first to come up with this particular insight, even with roughly this same formalization, but I have had the hardest time tracking down the originator, so for now I will resist the temptation to name this particular law after myself and just define it thusly:

Clippyization: the nearly inescapable tendency for a project in Human-Computer Interaction, given enough time and scope, to turn into Clippy

Just as a refresher, Clippy is the common name for the intelligent assistant that shipped with Office and would pop up to ask if you were writing a letter and wanted help. Wikipedia insists that the correct name for the character is “Clippit” but I assume that this is a bit of trivia in the same sense that Encyclopedia Brown’s real first name is “Leroy” or whatever; I have never referred to the paperclip as “Clippit” and I can’t imagine why I ever would.

The rationale behind Clippy, I think, is sound: software environments are getting increasingly complex, and with this complexity comes a desire to be guided or advised on what to do next. It makes total sense that you could use user data to recommend actions, point users to relevant places to find help, or even have the system take proactive action to get things back on track.

And yet, people hated Clippy. Some of this hate was probably just part of the inchoate rage amplification structures that were just getting spun up in the early 2000s internet (on par with all the people writing parody songs or making Flash videos where they murdered Barney the dinosaur or whatever), but even if you remove the try-hard faddish over-exaggeration, I still think there is a core of undeniable dysfunction and poor design at the heart of Clippy. And yet, I see signs of this same dysfunction creeping back into technology after a couple of decades of dormancy. I therefore argue that Clippyization is an active design and user experience menace to be fought where possible.

I have structured this essay around three aspects of Clippy that have been proposed as the root cause of the feature’s failure, and the resulting Clippy-related menace that lives with us today. I should note that, in the sources I consulted, everybody was extremely sure that their particular failure was the root cause of Clippy’s failure, so I would ask you to forgive a little bit of hindsight bias here in the service of the broader cause of thinking about where Clippyization shows up in the modern era. I would also ask you to note that these failures are to some extent interconnected, which I think is interesting, and suggests that there’s some sort of gestalt “Clippyness” that brings all sorts of baggage on board.

Goofy, Pointless Anthropomorphism

Photoshopped Google error message reading “Uwu We made a fucky wucky!! A wittle fucko boingo! The code monkeys at our headquarters are working VEWY HAWD to fix this!” Please don’t make me type that sentence again.
The faux-conversational “gee whiz shucks I guess all of your data just got stolen” tone of software communication has not noticeably improved since this image’s genesis in 2018. I suppose it is an improvement over “PC LOAD LETTER”-style totally obscure user-unfriendly error reporting, but not by that much.

Alan Cooper, when asked about the hatred for Clippy, pins Microsoft’s “great mistake” on a “tragic misunderstanding” of HCI research:

…if people react to computers as though they’re people, then we have to put the faces of people on computers, which in my opinion is exactly the incorrect reaction. If people are going to react to computers as if they’re human, the one thing you don’t have to do is anthropomorphize them, since they are already using that part of the brain.

In other words, we already treat our devices (and the software that runs on them) like people in many key ways. People were already yelling at or congratulating their machines long before there were virtual agents popping up on various screens and trying to chat with them. And so adding another layer of anthropomorphism is gilding the lily, at best, and a very precise mix of confusing and patronizing at worst. And Clippy wasn’t doing anything that a system couldn’t do without an agent (it’s important to note that a lot of the auto-formatting and so on that Clippy would try to do still happens in Office; it just happens without a virtual agent telling you about it).

To me, the root of this issue is not the existence of an agent per se, but that Clippy was more of a mascot than a partner. Now, I am not anti-mascot. One of my favorite Twitter accounts presents a visual encyclopedia of the various mascots Japan has come up with, from characters associated with specific towns or companies to ones advertising more general concepts of public safety. And of course I would gladly give my life for Blåhaj, the IKEA shark. But mascots also have their place. I don’t need a mascot popping up to cheer me on while I’m writing an obituary, or a Dear John letter, or, really, most of the time I am trying to do real work.

And yet software companies are doubling down on making mascots show up in their software, and communicating with their users in chummy mascot-y ways even when it is unneeded or inappropriate. As hinted at by my choice of image for this section, this mismatch is most obvious and enraging in error reporting. If I have just had a bunch of my work erased, or I am trying desperately to get something to work under pressure, or am otherwise fighting with the software system, then having a robot or a bee or an octopus or what-have-you pop up and tell me about a “wittle mistake” or whatever is condescending and enraging.

So why does software keep anthropomorphizing stuff it doesn’t have to, and talking to its users like they are children? I have a few suspicions. The first is that anthropomorphic stuff can demo really well in the short term, and only starts to get really irritating over the long haul. James Fallows suspects these apparent short-term gains are part of why Clippy made it out the door:

Clippy suffered the dreaded “optimization for first time use” problem. That is, the very first time you were composing a letter with Word, you might possibly be grateful for advice about how to use various letter-formatting features. The next billion times you typed “Dear …” and saw Clippy pop up, you wanted to scream.

The other reason is that the people who think your character is a bad idea don’t get a voice in your decision-making process. If you’re an everyday user, you might not like a character popping in to interrupt you all the time when you’re trying to get your work done. But, if you’re an executive, then you’re used to treating people around you as assistants to whom you delegate lots of crap, so what’s one more assistant? Plus, think of the branding opportunities! And if you’re an engineer who suspects that your users are mostly idiots anyway, then maybe their complaints that you are patronizing them or making them feel uncomfortable fall on deaf ears. For Clippy specifically, Roz Ho even reports that the overwhelmingly negative focus group feedback on Clippy (especially from women) was tossed aside:

We did a bunch of focus-group testing, and the results came back kind of negative. Most of the women thought the characters were too male and that they were leering at them. So we’re sitting in a conference room. There’s me and I think, like, eleven or twelve guys, and we’re going through the results, and they said, “I don’t see it. I just don’t know what they’re talking about.”

Your characters or communicative tone can be creepy, annoying, and insulting, and your users might even tell you so in no uncertain terms, but those people are probably just being sourpusses, right? Who needs them. And so the march of unnecessary mascots moves on.

Impoliteness

I will sometimes say “Jesus Christ not now Pastabot” whenever software interrupts me at inappropriate times (for instance, to ask me to rate a mobile app or watch an unskippable ad). But not all the time, because then I would be yelling about pasta at all hours, and that’s not the life I want to live.

Brian Whitworth in “Polite Computing” suggests that, if we ascribe agency to software (as we seem to do, see above), then it is therefore possible for software to be “impolite”: to be rude by removing choice and agency, and by failing to respect the wishes of the user. E.g., the pop-up ad:

Suppose one is browsing the Internet and a pop-up window suddenly appears. You were looking at one thing, then found yourself forced to look at another. Your cursor, or point of focus, was ‘‘hijacked’’. Your choice to look at what you want to look at was taken away. This is not illegal, but it is impolite. Users don’t like their screen real estate being commandeered by pop-up windows (or pop-under windows that must later be closed). Apologists suggest pop-up ads are the computer equivalent of TV commercials, but TV ads leave the viewer channel under viewer control. Pop-up ads initiate a new communication channel (a new window), and bypass user control in doing so. TV viewers expect commercials, but pop-up ads are unexpected. They are the equivalent of a TV commercial changing your channel for you. Browsers that repress popup ads are popular because pop-up ads are impolite.

Clippy (well, Whitworth calls the character “Mr. Clippy,” which I somehow find only slightly less objectionable than “Clippit”) is singled out as a particularly egregious example of impoliteness (although n.b. that in this particular 2005 piece he singles out Google and Amazon as examples of software companies that are polite in how they surface ads and collect data, which is darkly ironic in hindsight). Every time you would open Word, Clippy would pop up. Even if you repeatedly dismissed Clippy’s advice at an individual action or per-session level, Clippy would just bounce back over and over again. Clippy couldn’t take a hint that if you kept hitting that “close” or “hide” button, then maybe, just maybe, you didn’t want any advice for a while. If I am frustrated with a task, then having a peppy cartoon character pop up and provide, all smiles, unsolicited (and very often irrelevant) advice is potentially enraging: think of someone sarcastically backseat driving for you while you are lost. Per Whitworth, “Clippy ignored user disinterest, non-use and repeated help request denial.” If you treat Clippy as a friend, then that friend is pretty rude!

Clippy was built on thousands of hours of interactions in Office (someone once told me that it was the largest single stash of desktop user data at the time), and was powered by all of the Bayesian statistical intelligence that Microsoft could throw at the problem. But there was a relative lack of information or modeling about interactions with the agent itself.
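To make that gap concrete, here is a minimal sketch (emphatically not Clippy’s actual model, and with every name, prior, and threshold invented for illustration) of what “modeling interactions with the agent itself” could look like: a toy Beta-Bernoulli estimate of whether tips are welcome at all, updated every time the user accepts or dismisses one.

```python
# A toy sketch, not Clippy's actual model: treat the user's reactions to
# the agent itself as data about whether the agent should keep showing up.

class PoliteAssistant:
    def __init__(self):
        # Mildly optimistic Beta(3, 1) prior: assume tips are probably welcome.
        self.accepted = 3
        self.dismissed = 1

    def record_reaction(self, accepted: bool) -> None:
        """Update the tally each time a shown tip is accepted or dismissed."""
        if accepted:
            self.accepted += 1
        else:
            self.dismissed += 1

    def should_offer_tip(self, relevance: float) -> bool:
        """Offer a tip only if (relevance x posterior mean of "tips welcome")
        clears an arbitrary threshold. Repeated dismissals drive the estimate
        down, so the agent eventually goes quiet instead of bouncing back."""
        p_welcome = self.accepted / (self.accepted + self.dismissed)
        return relevance * p_welcome > 0.5


assistant = PoliteAssistant()
print(assistant.should_offer_tip(0.9))  # True: the agent starts out helpful
for _ in range(5):                      # ...but the user closes the tip five times
    assistant.record_reaction(accepted=False)
print(assistant.should_offer_tip(0.9))  # False: the agent takes the hint
```

The point is not the specific math; it is that the user’s “no” is itself a signal the system is obligated to learn from, which is exactly the signal Clippy (and, as we will see, its descendants) left on the table.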

Human-computer interactions have gotten more sophisticated these days, but this sophistication doesn’t always result in better, more polite outcomes. Popups and notifications continually interrupt us and demand our attention unless we spend considerable effort to set up our preferences perfectly or are willing to shut off the entire firehose and lose the good with the bad. We receive unsolicited, in many cases irrelevant, recommendations or suggestions that we have to dispense with at the individual level. We’ve replaced a paperclip popping in during the middle of important work to tell us “looks like you’re writing a letter!” with a patchwork of recommendation engines that do essentially the same thing but don’t even have the courtesy to offer a big button that lets you tell them to go away; at best, this “go away” button is hidden within settings submenus or configuration files, or constructed ad hoc by hitting “thumbs down” on dozens of irrelevant suggestions.

Consent Theater

Two (photoshopped) phone notifications: one from Duolingo in the persona of Duo the owl saying “Looks like you forgot your Spanish lessons again. You know what happens now!” and one from home security company ADT saying “Intruder Alert (Back Door). Proceed with caution.”
Duo the Duolingo owl is a pest. If you don’t shut off all the notifications from the app, he will pop up on your phone multiple times a day and bug you in increasingly annoying ways. This badgering does not make him look like a good friend, and makes it very easy to read him as a sinister presence, as in this viral image of unclear provenance: Duo will stop at nothing, including murder and kidnapping, to get you to learn Spanish.

The crux of Clippy’s impoliteness above seems to be that the character doesn’t go away in spite of a strong signal (say, repeatedly clicking that the advice was not needed) that this dismissal is what the user wants. The user thinks they are communicating this choice to the agent, but they are not, really.

There are all sorts of other examples where software purports to be giving you choices and agency, but is not. For instance, many software updating dialogues don’t give you the option to refrain from updating: you get a “yes” or a “remind me later.” They do not accept “no” for an answer. And the post-GDPR landscape of websites asking you for permission to use cookies is similar: it’s just one click to accept being tracked, but sometimes multiple clicks, sometimes having to turn down each individual surveillance setting, to opt out (and apparently lots of websites will track you anyway). You can say no, but they try to make it as difficult for you as possible, and in many cases all of that work ends up being ultimately irrelevant to the goal you have (say, to not be tracked without your consent). Fassl et al. call this kind of design “consent theater,” and it is getting more and more common as what the designers want users to do (click on ads, pay money, “engage”) and what goals the users actually have (to do work, to be entertained, to learn something) diverge dramatically.

I debated putting this example in the “impolite” part above, but it is so personally annoying to me and so indicative of a fault in how we think about design that I had to give it pride of place here. You see, Twitter originally was a reverse chronological list of all of the things that the people you followed posted. However, with this reverse chronological format, Twitter is unable to suggest or recommend other content that you “might have missed,” or filter the “viral” stuff to the top. And so they introduced a “Home” view, which is mostly a reverse chronological list of stuff, but injected with or re-sorted via recommendations from trending topics, or things the people you followed liked, or whatever the Sacred Algorithm might want to throw at you. You can switch back to something that is close to the original “Latest” view, but Twitter does not respect this choice. At some point in the future, it will silently switch back to the Home view, because that’s where it wants you to be for the highest amount of engagement. The ability to toggle between timelines looks like a setting over which you have control, but it’s more of a suggestion. Providing guardrails for user actions is one thing (“are you sure you want to delete this?” has saved me on many occasions), but having the user’s input entirely ignored because the designers think they know better is another.

There are large parts of our computing environment where the user cannot meaningfully say “no,” or at the very least has to do a bunch of work to get that “no” to stick. That’s an awful way to treat people in the real world, so I don’t understand why we find it acceptable in the digital one.

Wrap Up

A screenshot of a news article from the Guardian opened up in an incognito tab on my computer. It’s got a whole line of ads, a pop up begging you for money, and a pop-under asking you to set your cookie strategies. Hey, if you’re the kind of person who reads alt text, can you tell me if this situation is as bad as I think it is for screen readers? It has to be, right?
What happens if you try to read an article on The Guardian’s webpage in an incognito window. I was trying to find a particularly embarrassing article to test this on so we could all have a laugh, but it turns out not to matter, because you can’t even read the headline under all the focus-stealing garbage.

In the 1966 film “What on Earth!”, the conceit (one that has since been repeated in other media) is that, if an alien race came to Earth seeking to know about the dominant lifeform on the planet, it might reasonably conclude that automobiles, not humans, are running the show. We carve out the heart of our cities, sacrifice our time, health, and wealth, and passively accept the constant threat of death or dismemberment to make sure our cars stay happy. Every time someone wants to build (let alone guarantee) affordable housing for people there is immense pushback, but even in the densest American cities we always make sure to build plenty of places for cars to “live”: there are eight parking spaces for every car in America. You can see how one might get the impression that the “wellbeing” of cars has greater societal and political importance than the wellbeing of people.

I think that a similarly dispassionate observer of our software, especially software for the internet and mobile devices, would look at the world we have built, with unskippable ads and constant surveillance and a dominance of algorithmic feeds instead of direct user curation, and would similarly conclude that the primary users of software are not people but Clippy, or beings very much like Clippy. Clippy would have a great time on the modern internet. Terabytes upon terabytes of user data collected (with dubious consent) that make those few thousand hours of Office telemetry laughably sparse. Clippy-style “it looks like you’re writing a letter…” recommendation rationales that control what everybody looks at or watches or listens to down to the second. And if any of the upstart people try to hit that “don’t show me this tip again” button, well, you can always choose to ignore them. As far as I’m concerned, Clippy won the design war: all software is Clippy now. Or if it’s not Clippy now, it will be soon.
