Lesson 8.1 — Why AI Makes Things Up


AI hallucination is not a bug in the sense of a coding error that will be fixed in the next update. It is a fundamental feature of how large language models work. Understanding why AI makes things up — and developing reliable habits for catching it when it matters — is one of the most important skills this course can give you.

What Hallucination Actually Is

The term "hallucination" is borrowed from psychology, but it is not a perfect analogy. AI models do not have experiences or perceptions. A better metaphor is confabulation — a neurological phenomenon where a person with memory damage fills in gaps with plausible-sounding fabrications, completely sincerely, with no intention to deceive.

AI models work by predicting what text should come next, based on statistical patterns learned from vast quantities of training data. They are extraordinarily good at producing coherent, confident-sounding text. But they have no access to a ground truth database of facts. They have patterns.

When a model is asked about something outside its training data, or something highly specific (a date, a statistic, a citation), its core operation has no built-in step for noticing "I don't have that information". It simply generates whatever continuation is statistically most plausible. That generated answer is often wrong.
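
To make the prediction idea concrete, here is a deliberately tiny sketch of a next-word predictor. It is not how a production language model is built (those use neural networks trained on vastly more text); the corpus, the function names, and the word-pair approach are all assumptions chosen for illustration. What it shares with real models is the key property: it picks a likely next word and never looks anything up.

```python
import random
from collections import defaultdict

# Tiny made-up "training data" (hypothetical, for illustration only).
CORPUS = (
    "the court cited the case . "
    "the court dismissed the case . "
    "the lawyer cited the court ."
)

def build_bigram_table(text):
    """Map each word to the list of words that followed it in the text."""
    table = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(table, start, max_words=8, seed=0):
    """Repeatedly pick a next word that was seen after the current one."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(max_words - 1):
        candidates = table.get(words[-1])
        if not candidates:
            break  # no observed continuation for this word
        words.append(rng.choice(candidates))
    return " ".join(words)

table = build_bigram_table(CORPUS)
sentence = generate(table, "the")
print(sentence)
```

Every pair of adjacent words in the output was seen in the corpus, so the text reads fluently, yet the generator can assemble whole sentences that never appeared in its training text. Nothing in the code checks whether a generated statement is true. Changing `seed` gives different, equally confident output.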

Key takeaway: AI hallucination happens because language models generate plausible text, not verified facts. The better a model is at sounding confident, the more dangerous its hallucinations can be.

Five Real-World Examples

1. The Schwartz Legal Case (2023)

New York lawyer Steven Schwartz submitted a legal brief citing six case precedents — all of which were generated by ChatGPT and did not exist. Cases like Varghese v. China Southern Airlines and Martinez v. Delta Air Lines were fabricated. The lawyer had used ChatGPT to research case law and asked it to confirm the cases were real — it said yes. The court sanctioned both the lawyer and his firm. This case is now widely cited in discussions of AI liability.

What went wrong: The lawyer trusted AI output without independent verification, then asked the same AI to verify it, which is circular. A model asked to confirm its own hallucination will often do so.

2. Air Canada Chatbot (2024)

Air Canada's AI chatbot told a passenger that they could buy a ticket at the regular price and apply for the airline's bereavement discount retroactively. Air Canada did offer bereavement fares, but its actual policy did not allow retroactive claims. When the passenger sought the refund, Air Canada argued it was not responsible for the chatbot's statements. In 2024 a Canadian tribunal ruled against the airline: Air Canada was bound by what its chatbot said.

What went wrong: The chatbot misstated the airline's policy, describing a retroactive discount that did not exist. The company's argument that it wasn't responsible for its own AI's statements failed legally.

3. CNET AI Articles (2023)

CNET published a series of personal finance articles written by AI without clear disclosure. When the articles were later reviewed, multiple errors turned up, including incorrect interest calculations and other wrong figures. The articles had appeared authoritative and well-structured.

What went wrong: AI-generated content on numerical topics (finance, medicine, statistics) is particularly prone to errors because models do not calculate — they pattern-match.
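
One way to see the difference between calculating and pattern-matching is to redo a figure yourself. The sketch below uses a hypothetical deposit of $10,000 at 3% annual interest (illustrative numbers, not taken from any specific article) and the standard compound interest formula. The function name and parameters are assumptions made for this example.

```python
def compound_interest_earned(principal, annual_rate, years, periods_per_year=1):
    """Return the interest earned (not the final balance) under periodic compounding."""
    growth = (1 + annual_rate / periods_per_year) ** (periods_per_year * years)
    balance = principal * growth
    return balance - principal

# Hypothetical example: $10,000 at 3% for one year, compounded annually.
earned = compound_interest_earned(10_000, 0.03, 1)
print(f"Interest earned: ${earned:.2f}")

# Same deposit with monthly compounding, for comparison.
earned_monthly = compound_interest_earned(10_000, 0.03, 1, periods_per_year=12)
print(f"Interest earned (monthly compounding): ${earned_monthly:.2f}")
```

The interest earned with annual compounding is about $300, while the final balance is about $10,300. A text that confuses the two reads smoothly and looks plausible, which is why numerical claims deserve an independent calculation.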

4. Google Bard Launch Demonstration (2023)

In Google's launch demonstration for Bard (now Gemini), the AI stated that the James Webb Space Telescope "took the very first pictures of a planet outside of our own solar system." This was incorrect: the first exoplanet images were taken years earlier by different telescopes. Shares in Google's parent company, Alphabet, dropped approximately 8% following this highly visible error.

What went wrong: Even in a prepared demonstration with high stakes, AI hallucinated a factual error that went unchecked before publication.

5. Fabricated Academic Citations

Multiple researchers have documented AI models generating plausible-sounding academic citations — author names, journal titles, volume numbers, page numbers — that do not exist. The citations follow the correct format, use real journal names, and have plausible author names, but the actual papers have never been published. This has caused significant problems for researchers who cited AI-generated references without checking.

Risk Table: When Hallucination Matters Most

Different types of claims carry different levels of hallucination risk. Use this table to calibrate your verification effort:

Claim type	Hallucination risk	Verification approach
Recent events (last 6–12 months)	High	Check a current news source
Specific statistics and numbers	High	Find the primary source
Legal or regulatory details	High	Check official or primary source
Academic citations and references	High	Verify each citation exists
Medical information	High	Cross-reference clinical sources
Named quotes from real people	High	Check the original source
Technical documentation details	Medium	Verify in official documentation
Historical events (well-documented)	Medium	Spot-check key facts
General concepts and explanations	Low-Medium	Read critically; check if confused
Creative content, opinions, ideas	Low	Content is generated, not factual
Common knowledge	Low	Occasional errors; worth skimming
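
If you want to keep the table at hand while reviewing AI output, it can be written down as a simple lookup. This is only a sketch: the shortened keys, the function names, and the choice to treat unknown claim types as high risk are assumptions of this example, not part of the table.

```python
# Risk level and verification approach for each claim type, copied from the table.
RISK_TABLE = {
    "recent events": ("High", "Check a current news source"),
    "statistics and numbers": ("High", "Find the primary source"),
    "legal or regulatory details": ("High", "Check official or primary source"),
    "academic citations": ("High", "Verify each citation exists"),
    "medical information": ("High", "Cross-reference clinical sources"),
    "named quotes": ("High", "Check the original source"),
    "technical documentation": ("Medium", "Verify in official documentation"),
    "historical events": ("Medium", "Spot-check key facts"),
    "general concepts": ("Low-Medium", "Read critically; check if confused"),
    "creative content": ("Low", "Content is generated, not factual"),
    "common knowledge": ("Low", "Occasional errors; worth skimming"),
}

def lookup(claim_type):
    """Return (risk, approach). Unknown types default to High (an assumption)."""
    key = claim_type.strip().lower()
    return RISK_TABLE.get(key, ("High", "Find the primary source"))

def high_risk_claims(claim_types):
    """Keep only the claim types the table rates as High."""
    return [c for c in claim_types if lookup(c)[0] == "High"]

risk, approach = lookup("Academic citations")
print(risk, "-", approach)
```

Defaulting unknown claim types to high risk is a cautious design choice: when you cannot classify a claim, it is safer to check it than to assume it is fine.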

Verification Strategies

Strategy 1: Source-test the claim. Ask the AI: "What is your source for [specific claim]?" If it cites a source, go and check that source directly, including whether it exists at all. Do not ask the AI to confirm; confirm it yourself.

Strategy 2: Cross-tool verification. Search the same claim in a second tool (Perplexity, Google, a specialist database). Agreement between independent sources increases confidence.

Strategy 3: Ask the AI for its uncertainty. "How confident are you in that figure? Is this something you might be uncertain about?" AI models, when prompted directly, will often acknowledge uncertainty they didn't volunteer. Treat the answer as a pointer to what needs checking, not as verification in itself.

Strategy 4: Check numbers independently. For any statistic that matters, find the primary source (the original study, the official report, the government data), not another article citing the same claim.

Strategy 5: The expert test. If you have access to someone expert in the subject area, ask them to review the key claims. This is not always possible, but it is the gold standard.

Strategy 6: Never use AI to verify AI. This bears repeating. Asking ChatGPT to confirm a fact that ChatGPT just told you provides zero additional verification. The model is not checking against external reality; it is generating consistent-sounding text.

Building Verification into Your Workflow

The goal is not to verify everything AI tells you — that would be slower than not using AI at all. The goal is to verify the things that matter, proportionate to the stakes.

A practical rule: The higher the consequence of being wrong, the more you verify.

For a casual conversation or a rough first draft where you will do further research: low verification.

For a document that will be published, presented, or acted upon: verify every specific factual claim.

For medical, legal, or financial decisions: use AI to understand the landscape, then seek professional advice for the decision itself.
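
The three levels above can be sketched as a small decision rule. The names `Stakes`, `claims_to_verify`, and `needs_professional_advice` are hypothetical, and mapping "low verification" to "check nothing now" is a simplifying assumption of this sketch, not a recommendation.

```python
from enum import Enum

class Stakes(Enum):
    CASUAL = "casual conversation or rough first draft"
    PUBLISHED = "published, presented, or acted upon"
    DECISION = "medical, legal, or financial decision"

def claims_to_verify(claims, stakes):
    """claims: list of (text, is_specific_fact) pairs. Return the texts to check."""
    if stakes is Stakes.CASUAL:
        # Simplification: later research is expected to catch errors.
        return []
    # For anything published or acted upon, check every specific factual claim.
    return [text for text, is_specific_fact in claims if is_specific_fact]

def needs_professional_advice(stakes):
    """Medical, legal, or financial decisions also call for a professional."""
    return stakes is Stakes.DECISION

# Hypothetical draft with one specific factual claim and one general remark.
draft = [
    ("Hypothetical claim: the policy changed in 2021", True),
    ("AI tools can speed up drafting", False),
]
to_check = claims_to_verify(draft, Stakes.PUBLISHED)
print(to_check)
```

The point of the sketch is that verification effort depends on two inputs: how specific the claim is and what happens if it turns out to be wrong.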

Practice Task

Take any AI-generated response you have already received, whether something from earlier in this course or a conversation from the past week. Identify the three most specific factual claims in it (statistics, dates, citations, named examples). Verify each one against an independent source. Note how many are accurate, how many are slightly wrong, and how many are significantly wrong. The exercise calibrates your own sense of AI reliability in real conditions.
