Hallucinations: when the model is confidently wrong.
A language model's job is to predict plausible next tokens — not to tell the truth. Most of the time those two goals overlap. When they don't, the model produces a fluent, grammatical, well-cited sentence that simply isn't true. The fluency is the trap.
A hallucinated paragraph can read beautifully while getting the year wrong, getting the field wrong, and placing a husband at a ceremony three years after his death. None of those errors breaks the prose. That is the whole problem with hallucinations: they are smooth on the surface and broken underneath, and the model is built to produce the smooth surface no matter what.
You'll often hear hallucinations described as "the model lying" or "making things up." Both framings are misleading. The model has no concept of truth, no internal flag for "I don't know." It has a probability distribution over next tokens, learned from a frozen snapshot of text. Sometimes the highest-probability tokens line up with reality. Sometimes they don't. What we call a hallucination is the second case, dressed in the same fluency as the first.
Hallucinations are not a bug in language models.
They are a feature of how language models work.
Five flavors, one mechanism
"Hallucination" is a single word for several different failure modes. They share a cause, but they look different in practice and need different defenses.
Of the five most common types, the first three are about content: the model says false things. The last two are about contract: the model violates an explicit instruction. These are different problems, and they are often confused.
How the model gets there
To see a hallucination form in slow motion, follow the model as it answers one token at a time. At every step it picks a probable, fluent continuation. At no step does it consult a database, ask a friend, or check a fact.
The fluent sentence is the result of a chain of locally plausible picks. A single bad sample at the year can force a choice on the field two steps later. That choice is consistent with the wrong year, and it is equally wrong.
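Here is a minimal sketch of that chain in Python. It assumes a hypothetical toy "model" that looks only at the previous token; a real LLM conditions on the whole prefix and a huge vocabulary. The tokens and probabilities are invented. In this made-up world the true answer is "1903" and "physics", and "1909" is a plausible wrong year.

```python
import random

# Hypothetical toy model: each entry maps the previous token to a
# distribution over next tokens. In this invented world the truth is
# "1903" + "physics"; "1909" is a plausible-looking wrong year.
TOY_MODEL: dict[str, dict[str, float]] = {
    "<start>": {"In": 1.0},
    "In": {"1903": 0.6, "1909": 0.4},
    # Each year pulls the field toward whatever is consistent with it.
    "1903": {"physics": 0.9, "chemistry": 0.1},
    "1909": {"chemistry": 0.9, "physics": 0.1},
    "physics": {"<end>": 1.0},
    "chemistry": {"<end>": 1.0},
}


def generate(model: dict[str, dict[str, float]], rng: random.Random) -> list[str]:
    """Sample one token at a time; no step ever checks a fact."""
    tokens: list[str] = []
    prev = "<start>"
    while True:
        dist = model[prev]
        # Pick the next token in proportion to its probability.
        prev = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        if prev == "<end>":
            return tokens
        tokens.append(prev)


def sequence_probability(model: dict[str, dict[str, float]], tokens: list[str]) -> float:
    """Probability of a full sequence: the product of each step's probability."""
    prob, prev = 1.0, "<start>"
    for tok in tokens + ["<end>"]:
        prob *= model.get(prev, {}).get(tok, 0.0)
        prev = tok
    return prob


if __name__ == "__main__":
    rng = random.Random(0)
    print(" ".join(generate(TOY_MODEL, rng)))
    # Once "1909" is sampled, the consistent-but-wrong field dominates.
    print(sequence_probability(TOY_MODEL, ["In", "1909", "chemistry"]))  # ≈ 0.36
    print(sequence_probability(TOY_MODEL, ["In", "1909", "physics"]))    # ≈ 0.04
```

Overall, the wrong sentence is less likely than the right one (0.36 vs. 0.54). But once "1909" is sampled, the field that agrees with it is nine times likelier than the correct one. Every step is locally sensible, yet the finished sentence is still wrong, and the sampler never notices.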
This is the structural reason hallucinations don't go away with bigger models. A bigger model has better priors and gets the right answer more often. But it answers the same way: by picking probable tokens. When a fact is rare in training data, or contested, or absent, the model still produces a sentence — because that's what it does. Refusing to answer is a learned behavior layered on top, not a default.
Confidence is not correctness
A natural intuition: if the model sounds sure, it's probably right. This is the most dangerous misconception in the field. The model's expressed confidence — and even its internal probability — is only loosely tied to whether the answer is true.
Picture the four combinations of confident or hesitant, right or wrong. Confident and wrong is where most real damage gets done. A user can defend against a hesitant wrong answer; a confident one slides past.
This is called calibration failure. A well-calibrated model that says "I'm 80% sure" should be right 80% of the time. Modern LLMs are often over-confident, especially on the niche, specific questions where they are most likely to be wrong.
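The "80% sure should mean 80% right" definition can be turned into a single number. The sketch below uses expected calibration error. The text does not name a metric, so treat this choice as an assumption. The method has two steps:

- Bucket answers by their stated confidence.
- Average the gap between each bucket's confidence and its accuracy, weighted by bucket size.

It also assumes you already have a confidence score in [0, 1] and a right/wrong label for every answer.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Weighted average gap between stated confidence and actual accuracy."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally long, non-empty inputs")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence is in [0, 1]; exactly 1.0 goes into the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        # Each bucket contributes its gap, weighted by its share of answers.
        ece += len(bucket) / total * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Says 80% and is right 8 of 10 times: well calibrated (ECE ≈ 0).
    print(expected_calibration_error([0.8] * 10, [True] * 8 + [False] * 2))
    # Says 90% but is right only half the time: over-confident (ECE ≈ 0.4).
    print(expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5))
```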
What actually helps
You cannot remove hallucinations from a language model. You can make them rarer, easier to catch, or move responsibility for the answer somewhere else. Four techniques carry most of the weight.
Most production stacks combine two or three. Retrieval-augmented generation (RAG) plus low temperature is the modern default. Verifier models get added when the cost of being wrong is high enough to justify the extra latency.
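Low temperature is the easiest of these to show in a few lines. Temperature divides the model's raw scores (logits) before the softmax. Values below 1 sharpen the distribution toward the top-ranked token, and values above 1 flatten it. Here is a minimal Python sketch with made-up logits for three candidate tokens. It illustrates the formula; it is not any particular library's sampler.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
    for t in (1.0, 0.3):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

Lowering the temperature cuts down on unlucky low-probability picks, like a single bad sample at the year. It cannot help when the top-ranked token is itself wrong: the model then produces the wrong answer more consistently. That is one reason low temperature is paired with retrieval rather than used alone.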
How to catch them
Three patterns do most of the heavy lifting in detection. None of them requires access to the model's internals — they all work from the outside, by re-asking, checking, or comparing.
Self-consistency: ask N times, look for agreement
Sample the same prompt at moderate temperature several times. If the model returns the same answer in 9 of 10 runs, it has probably learned the fact. If every run gives a different specific number, it is probably guessing. This is cheap, model-agnostic, and surprisingly effective on factual questions.
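Here is a minimal Python sketch of the vote. `sample` is a hypothetical callable standing in for one call to your model at moderate temperature. The light string normalization and the 0.7 agreement threshold are illustrative assumptions, not recommended values. Free-form answers would need a looser notion of "same answer" than exact string match.

```python
from collections import Counter
from typing import Callable


def self_consistency_check(
    sample: Callable[[str], str], prompt: str, n: int = 10, threshold: float = 0.7
) -> tuple[str, float, bool]:
    """Ask the same prompt n times; report the top answer and how often it recurs."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Light normalization so "1903", " 1903" and "1903." count as one answer.
    answers = [sample(prompt).strip().rstrip(".").lower() for _ in range(n)]
    top, count = Counter(answers).most_common(1)[0]
    agreement = count / n
    return top, agreement, agreement >= threshold


if __name__ == "__main__":
    from itertools import cycle

    # Stand-in for a model: nine runs say "1903", one says "1911".
    fake_runs = cycle(["1903"] * 9 + ["1911"])
    print(self_consistency_check(lambda _p: next(fake_runs), "What year?"))
    # → ('1903', 0.9, True)
```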
Citation grounding: every claim must point somewhere
Force the model to cite a source for every factual claim, then verify each citation programmatically — does the URL exist, does the quoted passage actually appear in the document, does the date match? Citations are the part of the answer easiest to falsify, and falsifying them catches a huge fraction of hallucinations.
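Here is a minimal Python sketch of the quoted-passage check only. It makes three assumptions: a hypothetical `Citation` record for each claim, documents already fetched into a dict keyed by source ID, and that "appears in the document" means a case- and whitespace-insensitive substring match. Checking that a URL resolves, or that a date matches, is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Citation:
    claim: str      # the factual claim the model made
    source_id: str  # which document the model says supports it
    quote: str      # the passage the model says it is quoting


def _normalize(text: str) -> str:
    # Collapse whitespace and case so line breaks don't cause false alarms.
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_citations(
    citations: list[Citation], documents: dict[str, str]
) -> list[tuple[Citation, str]]:
    """Return every citation that fails a check, with the reason."""
    failures: list[tuple[Citation, str]] = []
    for cite in citations:
        doc = documents.get(cite.source_id)
        if doc is None:
            failures.append((cite, "source not found"))
        elif not cite.quote.strip():
            failures.append((cite, "empty quote"))
        elif _normalize(cite.quote) not in _normalize(doc):
            failures.append((cite, "quote not in source"))
    return failures


if __name__ == "__main__":
    docs = {"report-1": "Revenue grew 4% in 2022.\nCosts were flat."}
    cites = [
        Citation("Revenue rose 4%", "report-1", "revenue grew 4% in 2022."),
        Citation("Costs fell", "report-1", "Costs fell sharply."),
        Citation("Profit doubled", "report-9", "Profit doubled."),
    ]
    for cite, reason in verify_citations(cites, docs):
        print(f"{cite.claim!r}: {reason}")
```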
Cross-check: pit two answers against each other
Get the same question answered by two systems — two models, two retrieval sources, the model with and without context — and surface the disagreements. The agreements are probably true; the disagreements are exactly where to look harder.
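Here is a minimal Python sketch. It assumes each system's answers have already been collected into a dict keyed by question, which is a hypothetical format. Real answers would usually need claim extraction or semantic matching, not the near-exact string comparison used here.

```python
def _norm(answer: str) -> str:
    # Collapse whitespace, drop a trailing period, ignore case.
    return " ".join(answer.split()).rstrip(".").lower()


def cross_check(
    answers_a: dict[str, str], answers_b: dict[str, str]
) -> tuple[list[str], list[tuple[str, str, str]]]:
    """Split the questions both systems answered into agreements and disagreements."""
    agreements: list[str] = []
    disagreements: list[tuple[str, str, str]] = []
    for question in sorted(answers_a.keys() & answers_b.keys()):
        a, b = answers_a[question], answers_b[question]
        if _norm(a) == _norm(b):
            agreements.append(question)
        else:
            # Keep both answers so a human or a verifier can look harder.
            disagreements.append((question, a, b))
    return agreements, disagreements


if __name__ == "__main__":
    with_context = {"release year": "2019", "author": "J. Smith"}
    without_context = {"release year": "2021", "author": "j. smith"}
    agree, disagree = cross_check(with_context, without_context)
    print(agree)     # → ['author']
    print(disagree)  # → [('release year', '2019', '2021')]
```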
What hallucinations are not
The word "hallucination" gets stretched to cover anything the model gets wrong. Most of these are different problems with different fixes.
Not lying.
Lying requires knowing the truth and saying otherwise. The model has no internal representation of truth — only of what is statistically likely to be said. A confident wrong answer is not malice; it is the same machinery that produces confident right answers, applied to a question it can't actually answer.
Not stale knowledge.
If the model says Joe Biden is the U.S. president after his term has ended, that's a knowledge cutoff problem, not a hallucination. The model is correct as of its training data. The fix is updating context (RAG, system prompt), not retraining.
Not user disagreement.
If the model gives a tonally awkward email or a structurally different essay than you wanted, that's a preference miss, not a hallucination. Hallucinations are about truth, not taste.
Not solvable by scaling alone.
Bigger models hallucinate less often, but not never, and their wrong answers come out more confidently. Frontier models in 2025 still produce fabricated citations, made-up court cases, and confident wrong dates. Scale lowers the rate; it does not remove the mechanism.
Not always bad.
The same machinery that fabricates a citation also writes a poem, names a startup, or invents a metaphor. Generation that goes beyond the training data is the point of the technology. The trick is steering it: invention when you ask for invention, fidelity when you ask for fact.
Fluency is not truth.
Language models are professional sentence-finishers. Most of the time the most fluent sentence is also the true one. When it isn't, you get a hallucination — and you only catch it by checking, retrieving, or asking again.
Sources & further reading
- Ji et al., Survey of Hallucination in Natural Language Generation — the canonical taxonomy paper.
- Lin, Hilton & Evans, TruthfulQA — the benchmark that quantified the problem.
- Lewis et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — the original RAG paper.
- Wang et al., Self-Consistency Improves Chain of Thought Reasoning — the basis for the agreement-vote pattern.
- Kadavath et al., Language Models (Mostly) Know What They Know — calibration in modern LLMs.
- Mata v. Avianca, S.D.N.Y. 2023 — the lawyer who filed fabricated case citations. Read it.