The Pattern and the Person
What Grammarly got wrong about writing and writers
Somewhere in the long chain of meetings between “what if we built this” and “why are you suing us,” surely someone at Grammarly must have thought about whether the writers whose names and reputations and decades of accumulated professional judgment they were planning to bottle and sell as a subscription feature might want to be consulted about that. It’s likely that someone somewhere said something at some point. Maybe they sent an email that got buried. Maybe the question came up in a sprint review and was noted and deprioritized because the launch timeline was fixed and the legal team said grey area, phew.
This is speculation, but it’s informed speculation, because this is how it goes. The machine moves fast and the uncomfortable question gets answered by not being answered, and seven months later Casey Newton finds out he’s been moonlighting as a Grammarly editor without his knowledge or consent, and writes, with admirable restraint, “I’ve long assumed that before too long, AI might take my job. I just assumed that someone would tell me when it happened.”
The feature has been pulled, a class action lawsuit has been filed, and the CEO of Superhuman, which owns Grammarly, has issued the requisite LinkedIn apology, careful to acknowledge the feelings without quite acknowledging the wrong. And soon, as is tradition in these moments, the news cycle will move on. Let’s slow this down for a second.
Copyright shmopyright
The feature in question is Expert Review, whose premise was that users could get their writing evaluated through the lens of real, named professionals whose published work Grammarly had scraped, analyzed, and used to train a model that would then produce feedback in their style. The named experts included, among many others, Nilay Patel, Timnit Gebru, Stephen King, and a Cambridge historian who had been dead for several weeks by the time Grammarly put him to work critiquing undergrad essays. None of them were asked, in life or in death.
To build something like this, you have to believe that a writer’s voice is an extractable property of their text, that if you measure enough things about how someone writes, you can surface the pattern that makes them them and reproduce it on demand. This is a belief with a long history in computational linguistics, and it’s not entirely wrong. You can measure a lot of things. Sentence length, clause density, how often a writer hedges versus how often she just says the thing, vocabulary range, paragraph length variance, whether the important word tends to land at the beginning of a sentence or the end. These observations are real and interesting. Paragraph length variance, for instance, is a surprisingly good proxy for whether someone writes to a rhythm or writes to fill space. Taken together they give you something like a stylometric fingerprint, a set of tendencies that show up consistently enough to be recognized.
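To make the measurement concrete, here is a minimal sketch of what computing a stylometric fingerprint might look like, in Python. Everything in it is illustrative: the feature set, the hedge-word list, and the regex tokenization are crude stand-ins for the much richer machinery a real stylometry system would use.

```python
import re
import statistics

# A toy hedge-word list; real systems use curated lexicons.
HEDGES = {"maybe", "perhaps", "probably", "somewhat", "arguably",
          "likely", "seems", "might", "could", "possibly"}

def stylometric_fingerprint(text: str) -> dict:
    """Compute a few surface features of the kind stylometry relies on.

    Illustrative only: assumes non-trivial English prose with
    paragraphs separated by blank lines.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences or not paragraphs:
        raise ValueError("need at least one paragraph of text")

    sentence_lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    paragraph_lengths = [len(re.findall(r"[A-Za-z']+", p)) for p in paragraphs]

    return {
        # Average sentence length, in words.
        "mean_sentence_len": statistics.mean(sentence_lengths),
        # Paragraph length variance: rhythm versus filling space.
        "paragraph_len_var": statistics.pvariance(paragraph_lengths),
        # Vocabulary range, as a type/token ratio.
        "type_token_ratio": len(set(words)) / len(words),
        # How often the writer hedges, per 1,000 words.
        "hedges_per_1k": 1000 * sum(w in HEDGES for w in words) / len(words),
    }
```

Run over a large enough body of work, even features this crude start to separate one writer from another, which is exactly why the fingerprint is so seductive as a proxy for the person.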
But these descriptive tools fall short as soon as we get into the messy business of being human. The statisticians can tell you which sentences are possible; they cannot tell you which ones are good. The poet needs the word that could only come here, in this poem, after this line, and most of the time that word is nowhere near the most probable one. In fact, if you handed a great poem to an automated writing evaluator, a lot of what makes it work would register as errors: syntax bent out of shape in ways that shouldn't cohere and somehow do. The eval would want to clean it up, and that would ruin it. What the eval sees as a mistake is often the exact mechanism by which the poem does what it does, which is to disorient you slightly and make you feel something you hadn't felt a moment before. Turning that into an optimization target is a high-order ask and, so far, not one that's been met.
The deeper problem is that the measurable tendencies in writing are produced by sensibility and judgment, and metrics capture the output of that judgment without capturing the judgment itself. A model trained on those metrics will produce text that looks right without having access to the reasoning that generated the original. So what Grammarly built was not a reproduction of these writers' voices but a reproduction of their fingerprints, which look like the person from a distance and have nothing of them up close. The feedback it generated was plausible in the way a very good forgery is plausible, and a good reader will always know, and eventually find out.
The More You Write
There's a grim irony in who ends up in a dataset like this. The writers most useful to Expert Review were the ones with the largest, most coherent, most recognizable bodies of work, people who spent years publishing distinctly and prolifically enough that their names mean something to a reader. And those are exactly the people whose voices are most obviously not reproducible, because the more data you have on how someone writes, the clearer it becomes that the model has no idea why. The stylometric signal gets stronger and the impersonation gets hollower in equal measure. The prolific career turns out to be the most thorough record of its own irreproducibility, which is a strange thing to have to say but here we are.
The people who caught this either already suspected something was wrong, or they knew the writing well enough to feel that something was off. Probably both. The impersonation failed loudest for the writers with the most established voices, which also means the silent version of this failure, among writers with smaller audiences, earlier careers, and less critical mass to generate outrage, has almost certainly been happening without anyone noticing. The damage distributes itself to the people least positioned to contest it, as it tends to.
Grammarly could probably have run this longer, maybe indefinitely, if they hadn't (brazenly, ridiculously) attached real names to the product. Every major language model will approximate a specific writer's style if you ask it to, and that capability goes largely uncontested because the simulation stays one step removed from an explicit claim, in a grey middle ground that feels vaguely public domain. The decision to curate a list of real people and surface their names as a premium feature was the decision to make the implicit explicit, and in doing so to make the claim legible enough to contest. Someone in that product meeting thought naming the authors was the transparent thing to do, possibly even the respectful thing. What it actually did was turn a stylistic approximation into an identity claim, and the writers whose identities were being claimed eventually noticed, because of course they did. You know when someone is pretending to be you, because it's icky.
The assumption underneath Expert Review, that a voice is a pattern, that a pattern is data, and that data is fair game, didn't originate at Grammarly and it won't die with this lawsuit. It's the water these products swim in, the logic that gets applied every time a model is trained on human creative work without asking, every time a product is designed around approximating a specific person's judgment without compensating or even notifying them. Most of the time it happens at a scale and a remove that make it hard to see and harder to contest. This time it's unusually visible, because the writers it affected had the platform and the profile to make noise about it.
Pay attention when writers who have spent careers building something irreplaceable stand up to say: that's mine, and you can't have it. The noise doesn't last and the industry doesn't stop, but the signal is real, and it's telling us something true about what's being taken and how quietly the taking usually goes.
That they named Timnit Gebru feels a lot like they knew what they were doing. What a weirdly specific way to troll.