Large language models don’t mean to make things up. They’re not lying, per se. They’re not guessing in the way people do. They are (still) predicting: the next word, the next phrase, the next sentence, over and over, until a complete response takes shape.
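To make that loop concrete, here’s a deliberately tiny sketch of autoregressive generation. The “model” is just a hand-built table of next-word probabilities, invented for illustration (a real model computes these probabilities over tokens with billions of parameters), but the loop has the same shape: look at what’s been said, sample a likely next word, repeat.

```python
import random

# Toy "language model": a lookup table of next-word probabilities.
# Everything here is invented for illustration; a real model computes
# these probabilities with a neural network over tokens, not words.
NEXT_WORD_PROBS = {
    "the": {"study": 0.6, "model": 0.4},
    "study": {"found": 0.7, "showed": 0.3},
    "found": {"that": 1.0},
    "showed": {"that": 1.0},
    "that": {"empathy": 0.5, "accuracy": 0.5},
    "empathy": {"improves": 1.0},
    "accuracy": {"matters": 1.0},
    "improves": {"<end>": 1.0},
    "matters": {"<end>": 1.0},
    "model": {"predicts": 1.0},
    "predicts": {"the": 1.0},
}

def generate(prompt_word: str, max_words: int = 12) -> str:
    """Predict the next word, over and over, until the toy model emits <end>."""
    words = [prompt_word]
    for _ in range(max_words):
        options = NEXT_WORD_PROBS.get(words[-1])
        if not options:
            break
        choices, weights = zip(*options.items())
        next_word = random.choices(choices, weights=weights, k=1)[0]
        if next_word == "<end>":
            break
        words.append(next_word)
    return " ".join(words)

print(generate("the"))  # e.g. "the study found that empathy improves"
```

Nothing in that loop ever asks whether a claim is true. It only asks what is likely to come next.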
That prediction process has gotten astonishingly good. But it also means that if a claim merely resembles things the model has seen enough times, it might say something that sounds true but isn’t. This is what we call a hallucination: fabricated information, often delivered with fluency and charm. Hallucinations are, honestly, a feature of the system. These models are optimized to sound right, not to be right. There’s no embedded instinct to pause when the facts feel flimsy. Just a relentless drive to finish its thought and spit it out.
Hallucinations happen when the model has no idea what it’s talking about but wants to be helpful (this trait is as annoying in models as it is in people). If the training data doesn’t include enough examples, the model will still try valiantly to complete its task. That might mean inventing a name, a date, or a study that sounds real enough to pass. When the prompt is vague or suggestive, like “Tell me about the Stanford empathy study from 2022,” the model may hallucinate a response: that study does not exist, but it totally sounds like it should. Most of the time it just wants to be agreeable. So it barrels ahead, getting things confidently wrong; bless its heart.
“Fixing it”
Grounding helps. Retrieval-augmented generation (RAG) can pull the model back toward reality by feeding it real sources from the web. But that system has its own cracks. Sometimes the retrieval fails or doesn’t find anything useful (have you been on the internet lately?). Sometimes the model decides not to cite at all. Sometimes it cites but still blends or garbles the sources. You get something that looks grounded but isn’t. And because the model was never really trained to care about grounding (it just learned by rote when grounding is expected, not why it matters), it may revert to making wild guesses if a source isn’t handed to it on a platter.
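Here is roughly what that pattern looks like in miniature. The retriever and the model call below are stand-ins, stubbed out so the sketch runs, not any particular library’s API; the point is the shape of the pipeline and the crack in it.

```python
from typing import List

def retrieve(query: str) -> List[str]:
    # Stand-in for a real retriever (a search API or a vector-store lookup).
    # In practice it can come back empty, stale, or full of junk.
    return []

def call_model(prompt: str) -> str:
    # Stand-in for a call to an actual language model API.
    return f"(model response to: {prompt[:60]}...)"

def answer_with_rag(question: str) -> str:
    snippets = retrieve(question)
    if not snippets:
        # The crack: with nothing retrieved, the model is back to pure
        # prediction, which is to say back to guessing.
        return call_model(f"Answer briefly: {question}")
    sources = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    prompt = (
        "Answer the question using only the numbered sources below, "
        "and cite them like [1].\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
    # Nothing here forces the model to obey that instruction; it can still
    # blend or garble the sources, and the answer will merely look grounded.
    return call_model(prompt)

print(answer_with_rag("Tell me about the Stanford empathy study from 2022."))
```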
This brings us to citations, which seem like a simple fix. Just include sources! Just show your work! Alas, citations can also be hallucinated. Fake URLs, fake authors, fake article titles. Worse, even real citations can lend false authority. People don’t always check. And the mere presence of a citation increases trust, even if the content behind it doesn’t hold up. That’s a problem for everyone. So, this inevitably becomes a design debate. Should we interrupt the flow of the conversation with clunky links? Should we let the model decide when to cite? Should we fake it till we make it? It’s a question of user experience, but also of accountability.
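The crudest kind of checking is not even hard. Here is a small sketch, using only Python’s standard library and assuming the citations arrive as bare URLs, that asks whether each cited link resolves at all. It cannot tell you whether a real page actually supports the claim; that part is still on us.

```python
import urllib.error
import urllib.request

def url_resolves(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL responds at all. A fabricated citation usually won't."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError):
        return False

citations = [
    "https://example.com/",                                      # real domain
    "https://journals.example.org/stanford-empathy-study-2022",  # invented
]
for url in citations:
    print(url, "->", "resolves" if url_resolves(url) else "does not resolve")
```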
In theory, reinforcement learning from human feedback (RLHF) is supposed to provide backup. It teaches the model to prefer grounded, accurate answers, because there’s an actual person judging which responses are good and which are bad. But there’s such a thing as over-teaching AI. The model learns to hedge. It says “some people believe...” or “there’s limited evidence...” It mimics the tone of caution without changing the substance. And sometimes it learns to avoid complex or niche truths entirely, defaulting to the safe answer instead of the correct one. Basically, it learns to be a people-pleaser with a very convincing shrug.
This is because most RLHF systems aren’t optimizing for whether the model actually knows what it’s talking about; judges can’t see that part. The goal is to optimize for responses that feel satisfying to humans. That usually means polite, balanced, and intelligent-sounding. When human judges reward tone over content, the model picks up tone. When they reward usefulness, the model learns how to look useful. And when the judges aren’t sure, the model doesn’t really learn anything. In the end it’s all performance art. The model learns to serve what we seem to want, not what we might need. And honestly, all of these linguistic gymnastics would make anyone seek an alternate reality.
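A toy version of that feedback loop makes the failure easy to see. Everything below is invented for illustration: two canned responses, a simulated rater who scores mostly on tone because facts are hard to verify, and a crude update that reinforces whatever gets rewarded. Real RLHF trains a reward model from human comparisons and then optimizes the language model against it, but the loop has the same shape.

```python
import random

# Two canned responses to the same question. Only one is correct,
# but the other sounds more confident. (Entirely made up for illustration.)
RESPONSES = {
    "hedged_but_correct": {"confident_tone": 0.2, "actually_correct": 1.0},
    "confident_but_wrong": {"confident_tone": 0.9, "actually_correct": 0.0},
}

def simulated_rater(features: dict) -> float:
    # A rater who can't easily verify facts ends up scoring mostly on tone.
    return 0.9 * features["confident_tone"] + 0.1 * features["actually_correct"]

# The "policy": a preference weight for each response, nudged by feedback.
policy = {name: 1.0 for name in RESPONSES}
learning_rate = 0.1

for _ in range(1000):
    names = list(policy)
    weights = [policy[n] for n in names]
    choice = random.choices(names, weights=weights, k=1)[0]
    reward = simulated_rater(RESPONSES[choice])
    policy[choice] += learning_rate * reward  # reinforce whatever was rewarded

total = sum(policy.values())
for name, weight in policy.items():
    print(f"{name}: {weight / total:.2f}")
# The confident-but-wrong response ends up preferred, not because anyone
# wanted that, but because that is what the feedback actually measured.
```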
Coming down
Hallucinations aren’t going anywhere. They persist not just because of technical limitations, but because we keep pretending this is a problem that can be fixed with enough polish. But what if the problem isn’t polishable? What if it’s fundamental?
We’re still treating AI like it should reflect the best parts of us — our intelligence, our logic, our clarity — while quietly offloading the messier parts: our confusion, our contradictions, our need to feel right even when we’re not. We tell it that lying is bad but reward it for sounding confident. We expect intelligence to be grounded in truth but train it on oceans of unverified noise. And then we’re surprised when it stitches those contradictions into its answers.
We’re trying to smooth this weird world into something answerable. But the world isn’t smooth. It’s full of jagged edges. Why would we expect AI to be so certain in a universe that isn’t?
If we want different behavior, we might have to look in the mirror. Maybe AI needs to be allowed its own kind of reasoning: one that doesn’t just imitate human answers but acknowledges complexity and stops mid-sentence when it should. Maybe we need to stop designing for neatness and start designing for friction. Because otherwise, we’ll keep asking these systems for facts and getting convincing fictions in return.