Essay·AI & epistemology

One echo is not two witnesses: the three states of a fact when you reason with an AI

Every claim a language model hands you is unread, weight-pulled, or read, and only the last is grounds for belief.

June 9, 2026

In short

When you reason with a language model, every "fact" it hands you sits in one of three states, not two. It is unread (you have no grounds yet), weight-pulled (the model generated it from training, it sounds like a fact, and it may be a confident fabrication), or read (you opened the source and checked). Only the third one is grounds for belief. The dangerous middle state is the one most people treat as if it were the third. This piece is about telling them apart, and why asking the same model twice does not fix the problem. Philosophers have a 1963 name for the trap.

I learned this the slow way, by building a small tool whose only job was to fact-check citations that an AI kept inventing. Every draft looked impeccable. Page numbers, author names, journal volumes, all in the right shape. A large share of them evaporated the moment I fetched the actual source. The text was fluent and wrong, and fluent-and-wrong is a category we do not have good instincts for.

Why are there three statuses and not two?

The old picture is binary: you either know something or you do not. Working with an AI breaks that cleanly, because the machine produces a third thing that mimics knowledge without carrying its credentials.

Status one is unread. The model has not been asked, or you have not looked. There are no grounds. Nobody is fooled by this state, because it announces itself as empty.

Status two is weight-pulled. You asked, and the model answered from its trained parameters. The output is a statistical reconstruction of how such a claim tends to be phrased. It can be exactly right. It can also be a fabrication delivered in the same confident cadence as the truth. The surface gives you almost no signal about which one you got.

Status three is read. You opened the cited paper, the docs, the primary record, and confirmed the claim against it. Now you have grounds.

The characteristic mistake is collapsing status two into status three. A weight-pulled claim feels read because it arrives finished, sourced, and self-assured. It is not. It is a guess wearing the costume of a citation.
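One way to keep the three statuses from blurring is to make them explicit in whatever notes or tooling you use. What follows is a minimal sketch, not a description of any existing tool: the names `Status`, `Claim`, `from_model`, and `mark_read` are all invented for illustration. The only point it encodes is that confidence and fluency are not inputs to the belief check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    UNREAD = "unread"                # no grounds yet
    WEIGHT_PULLED = "weight-pulled"  # generated from trained weights, unchecked
    READ = "read"                    # confirmed against a source you opened


@dataclass(frozen=True)
class Claim:
    text: str
    status: Status = Status.UNREAD
    source: Optional[str] = None     # what was actually opened, if anything

    def is_grounds_for_belief(self) -> bool:
        # Only a read claim counts; how sure the text sounds is never consulted.
        return self.status is Status.READ


def from_model(text: str) -> Claim:
    """Anything a model asserts enters as weight-pulled, however finished it looks."""
    return Claim(text=text, status=Status.WEIGHT_PULLED)


def mark_read(claim: Claim, source: str) -> Claim:
    """Promote a claim only when the source that was opened is named."""
    if not source.strip():
        raise ValueError("promotion to READ requires a named source")
    return Claim(text=claim.text, status=Status.READ, source=source)
```

There is deliberately no path from `WEIGHT_PULLED` to `READ` that skips naming a source, which is the collapse described above made impossible by construction.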

Where does the confident-but-wrong middle come from?

This is not a quirk of one bad model. It is structural, and the clearest recent account is the 2025 paper "Why Language Models Hallucinate" by Adam Tauman Kalai, Ofir Nachum, Santosh Vempala, and Edwin Zhang.

Their argument is blunt. As they put it, "language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty." The analogy they open with is the exam room: "like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty." A student who writes "I don't know" scores zero. A student who guesses might score. So the trained instinct is to answer, always, in full voice.
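The exam-room incentive can be written as a short expected-score comparison. The symbols here are introduced only for illustration and are not taken from the paper: $p$ is the chance a guess is right, a correct answer scores 1, and both a wrong answer and "I don't know" score 0.

```latex
\begin{align*}
\mathbb{E}[\text{score} \mid \text{guess}] &= p \cdot 1 + (1 - p) \cdot 0 = p, \\
\mathbb{E}[\text{score} \mid \text{abstain}] &= 0 .
\end{align*}
% Guessing is never worse than abstaining, since p >= 0.
% If a wrong answer instead costs a penalty \lambda > 0:
\begin{align*}
\mathbb{E}[\text{score} \mid \text{guess}] &= p - (1 - p)\,\lambda, \\
p - (1 - p)\,\lambda > 0 \;&\iff\; p > \frac{\lambda}{1 + \lambda} .
\end{align*}
```

Under plain right-or-wrong grading, any nonzero chance of being right makes guessing the better policy. Only when wrong answers carry a cost does abstaining become rational below some confidence threshold.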

They trace the root deeper still, to classification. Hallucinations, they write, "originate simply as errors in binary classification." If a system cannot reliably separate a true statement from a false one that looks identical, false statements will leak out under ordinary statistical pressure. The model is not lying. It has no concept of lying. It is completing a pattern, and sometimes the most fluent completion is false.

That is exactly what makes weight-pulled output its own status. The wrongness is not noisy and obvious. It is smooth, confident, and shaped like a fact.

Does asking the same model twice make a weight-pulled fact more true?

One echo, not two witnesses. That is the whole answer, and it is the part people get wrong most often.

Run the same prompt through two instances of one model. They agree. It is tempting to read that as two independent confirmations. It is not. Both answers are drawn from the same trained weights, so a matching claim is one echo, not two witnesses. Agreement among copies of a single source does not multiply the evidence. It re-reads the same page in two voices.
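The difference between an echo and a witness can be made concrete with a small Bayesian calculation. This is a sketch under loud assumptions: the prior of 0.5 and the likelihood ratio of 3 are made-up numbers, and treating two copies of one model as perfectly correlated is an idealization (sampled outputs can differ, but they still share one set of weights).

```python
def posterior(prior: float, likelihood_ratio: float, independent_reports: int) -> float:
    """Posterior probability of a claim after n independent reports,
    each with the same likelihood ratio in favor of the claim."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio ** independent_reports
    return post_odds / (1.0 + post_odds)


PRIOR = 0.5  # assumed: no opinion before asking
LR = 3.0     # assumed: a report is 3x likelier if the claim is true

# Two genuinely independent sources: the evidence multiplies.
two_witnesses = posterior(PRIOR, LR, 2)  # 0.9

# Two copies of one model, idealized as perfectly correlated:
# the second answer adds nothing, so it counts as one report.
one_echo = posterior(PRIOR, LR, 1)       # 0.75
```

The second copy agreeing does not move the number at all in this idealized case. Whatever the true correlation is, the evidence from copies of one source sits much closer to the one-report figure than to the two-witness figure.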

The confidence literature underlines why this trap is so easy to fall into. Miao Xiong and colleagues, in "Can LLMs Express Their Uncertainty?", report that models "when verbalizing their confidence, tend to be overconfident, potentially imitating human patterns of expressing confidence." The stated certainty rides high whether the answer is right or wrong. So two overconfident, correlated copies will often agree and sound sure, which is the worst combination for a human trying to judge truth from tone.

There is a sharper tell buried in that same work. The authors propose reducing overconfidence with "consistency among multiple responses" and "better aggregation strategies." Read carefully: cross-checking multiple samples is something researchers had to engineer, with care, as a mitigation. Naive agreement is not it. Raw "they both said the same thing" is the echo. Disciplined aggregation across deliberately varied conditions is a different, harder maneuver, and even then it shifts probabilities rather than supplying grounds.

And the confident packaging is real. Ziwei Ji and colleagues, in "Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations," note that LLMs "often adopt an assertive language style also when making false claims," producing what they call "overconfident hallucinations." Their inference-time fix reduced confident hallucinations on short answers, with "an average relative reduction of ~30%." Useful, and also a quiet admission of the baseline: a meaningful slice of assertive answers is assertively false.

Why do AI models invent fake citations, and what does one look like?

Here is the concrete case that taught me to build the checker.

A draft argument leaned on a specific empirical paper. The model gave me a title, two plausible authors, a venue, and a year. Every field was the right kind of thing. The claim it attached to that citation was reasonable and matched the surrounding argument. By status-two standards it was flawless.

I fetched it. The paper with that exact title and author pairing did not resolve. The real source making the closest claim had a different author list and a different framing, and it qualified the result more narrowly than the draft implied. Nothing was malicious. The model had assembled a citation-shaped object out of fragments that usually travel together, and the seams only showed under a fetch.

This kept happening often enough that hand-checking was not viable, so the tool's job became simple and strict: never trust a citation in status two, fetch every one, and promote it to status three only on a confirmed match. The fluency of the drafts never changed. What changed was that I stopped letting fluency stand in for grounds.
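The rule of fetching every citation and promoting only on a confirmed match can be sketched in a few lines. This is an illustration of the rule, not the tool itself. Everything named here (`Citation`, `Fetcher`, `confirmed_match`, `check`) is hypothetical, and the fetcher is passed in as a plain function so the sketch makes no claim about any particular lookup service.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Citation:
    title: str
    authors: Sequence[str]
    year: int


# A fetcher takes the claimed citation and returns the record that actually
# resolves, or None if nothing does. Injected, so no network is assumed here.
Fetcher = Callable[[Citation], Optional[Citation]]


def _norm(s: str) -> str:
    # Case- and whitespace-insensitive comparison; nothing fuzzier than that.
    return " ".join(s.lower().split())


def confirmed_match(claimed: Citation, found: Citation) -> bool:
    """Strict on purpose: title, full author list, and year must all agree."""
    return (
        _norm(claimed.title) == _norm(found.title)
        and [_norm(a) for a in claimed.authors] == [_norm(a) for a in found.authors]
        and claimed.year == found.year
    )


def check(claimed: Citation, fetch: Fetcher) -> str:
    """Return 'read' only on a confirmed match; otherwise stay 'weight-pulled'."""
    found = fetch(claimed)
    if found is not None and confirmed_match(claimed, found):
        return "read"
    return "weight-pulled"
```

A match here only confirms that the reference resolves as described. Whether the paper supports the claim attached to it, and with what qualifications, still takes reading the paper.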

Is this just an old philosophy problem in new clothes?

It is, and the parallel is worth stating because it sharpens the rule.

In 1963, in a paper titled "Is Justified True Belief Knowledge?", Edmund Gettier broke the comfortable idea that knowledge is simply justified true belief. His cases show a belief that is justified, and true, yet not knowledge, because its truth rides on luck the believer cannot see. The justification and the truth meet by accident, not by connection.

A weight-pulled fact that happens to be correct is a Gettier case waiting to happen. The model's claim might be true. You might even have a reason to think so. But if the truth is not connected to your grounds, you got lucky; you did not know. Fetching the source is what supplies the missing connection. It is the move from a fortunate guess to an anchored one.

The epistemology of testimony names the same fault line. Philosophers ask whether we may simply trust what we are told. Jennifer Lackey argues for a middle path: a hearer needs at least some positive, non-testimonial reason to think the speaker is reliable before the speaker's word confers justification, precisely so as not to lapse into gullibility. C. A. J. Coady, in his 1992 study, treats testimony as a basic and serious source of knowledge rather than something to wave away. Take both seriously and the verdict on AI output is clear. A language model is testimony with no track record you can vouch for. It is the kind of speaker Lackey says you have no business trusting on its say-so alone. Treating its confident output as known, with no positive reason of your own, is exactly the irresponsibility she warns against.

So what is the working rule?

Tag every claim. Unread is honest emptiness. Weight-pulled is a lead, never a verdict. Read is the only status that earns belief, and you reach it by opening the source yourself. The model is a fast, fluent generator of leads, and that is genuinely valuable. It is not a witness, and the danger is that it sounds exactly like one.

A prediction, dated and falsifiable. Written June 9, 2026: querying N instances of the same model will not raise factual accuracy above a single instance unless an external source is actually read. Naive agreement across copies of one model buys you nothing on accuracy; the gain requires either a genuinely independent source or a fetch.

One caveat, since it is the natural objection: self-consistency (majority voting over many reasoning chains, in the sense of Wang and colleagues) does raise accuracy on problems the model can reason through, such as multi-step math. That is a different regime from the one here, which is facts the model cannot derive and can only recall.

This is measurable. Take a benchmark of checkable factual claims. Compare one instance against a majority vote of N copies of the same model with no retrieval. Then compare both against the same setup with a source-fetch step. My bet: the no-retrieval majority vote sits at or near the single-instance accuracy, and only the version that reads a source moves the number. If a plain N-copy vote, with no source touched, reliably beats the single instance on hard factual accuracy, I am wrong, and I would like to see that result.
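The three-way comparison could be wired up roughly as follows. This is a harness sketch only. The function names are invented for illustration, the model and the fetcher are passed in as plain callables, and nothing here reports or implies a result. The bet can only be settled by running it against a real model and a real benchmark.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Model = Callable[[str], str]              # question -> answer
Fetch = Callable[[str], Optional[str]]    # question -> answer found in a source, or None


def majority_vote(answers: List[str]) -> str:
    """Most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def accuracy(predict: Model, benchmark: Dict[str, str]) -> float:
    """Fraction of benchmark questions answered exactly as the gold label."""
    correct = sum(predict(q) == gold for q, gold in benchmark.items())
    return correct / len(benchmark)


def n_copy_vote(model: Model, n: int) -> Model:
    """Answer each question by a majority vote over n calls to the same model."""
    return lambda q: majority_vote([model(q) for _ in range(n)])


def with_fetch(model: Model, fetch: Fetch) -> Model:
    """Prefer what a source says; fall back to the model only if nothing resolves."""
    def predict(q: str) -> str:
        found = fetch(q)
        return found if found is not None else model(q)
    return predict


def compare(model: Model, fetch: Fetch, benchmark: Dict[str, str], n: int = 5) -> Dict[str, float]:
    """Accuracy of the three setups on the same benchmark."""
    return {
        "single": accuracy(model, benchmark),
        "n_copy_vote": accuracy(n_copy_vote(model, n), benchmark),
        "with_fetch": accuracy(with_fetch(model, fetch), benchmark),
    }
```

For a fair test the model callable should sample with whatever randomness the real system has, so that the N calls can actually disagree. A deterministic stub makes the vote identical to the single instance by construction and shows nothing either way.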

The three statuses are not a mood. They are a discipline. One echo is not two witnesses, and no number of echoes becomes a witness. Apply honesty to the evidence, never to the claim you wish were true, and let only what you have read count as known.

Sources

Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, Edwin Zhang. "Why Language Models Hallucinate," arXiv:2509.04664 (4 September 2025).
Miao Xiong, Zhiyuan Hu, Xinyang Lu, Yifei Li, Jie Fu, Junxian He, Bryan Hooi. "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs," arXiv:2306.13063.
Ziwei Ji et al. "Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations," arXiv:2503.14477.
"Gettier Problems," Internet Encyclopedia of Philosophy (on Edmund Gettier, "Is Justified True Belief Knowledge?", 1963).
"Epistemology of Testimony," Internet Encyclopedia of Philosophy (on Jennifer Lackey's positive-reasons requirement and C. A. J. Coady, Testimony: A Philosophical Study, 1992).
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou. "Self-Consistency Improves Chain of Thought Reasoning in Language Models," ICLR 2023, arXiv:2203.11171.