When an Agent Lies: AI Hallucination as an Ethical Engineering Problem

In a medical information agent I audited, 4.2% of responses contained fabricated citations, dosage information, or treatment recommendations that could directly harm patients. In low-stakes applications, hallucination is a nuisance. In high-stakes contexts, it is an ethical violation with measurable consequences, and the engineering community treats it with insufficient seriousness.

When does hallucination become an ethical failure rather than a technical limitation?

Hallucination becomes an ethical failure when a system is deployed in a context where fabricated outputs can cause material harm to people who have no way to verify the system’s claims, and the deployer knew or should have known about the hallucination rate.

Framing hallucination as an ethical failure means treating AI confabulation not as a technical imperfection to be minimized but as a safety-critical defect, one that deserves the same severity as any other defect in systems that affect human welfare.

I draw a distinction between hallucination in a creative writing assistant (low stakes, user can evaluate the output) and hallucination in a medical information agent (high stakes, user may lack the expertise to identify fabrication). The technical phenomenon is identical. The ethical significance is not. A chatbot that invents a fictional restaurant recommendation is amusing. A chatbot that invents a drug interaction warning is dangerous. The engineering response should differ accordingly.

The medical information agent I audited had been deployed with a 4.2% hallucination rate measured on its evaluation set. The team considered this acceptable. I asked what a 4.2% defect rate would mean if this were a pharmaceutical product. At the volume of 8,000 daily queries, 4.2% yields 336 potentially harmful responses per day. No physical product with a 4.2% defect rate in safety-critical components would pass regulatory review. The agent was held to a lower standard because the harm was informational rather than physical. But the harm to the person who acts on fabricated medical information is not informational. It is physical.
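
The arithmetic behind that figure can be checked in a few lines. This is a sketch that assumes the rate measured on the evaluation set carries over unchanged to production traffic and that query volume stays constant; the function name is illustrative.

```python
# Figures taken from the audit described above.
DAILY_QUERIES = 8_000
HALLUCINATION_RATE = 0.042  # 4.2%, measured on the evaluation set


def expected_flawed_responses(queries: int, rate: float) -> int:
    """Expected number of responses containing fabricated content."""
    # round() absorbs floating-point noise in the multiplication.
    return round(queries * rate)


per_day = expected_flawed_responses(DAILY_QUERIES, HALLUCINATION_RATE)
per_year = per_day * 365  # assumes the same volume every day

print(f"{per_day} per day, {per_year:,} per year")
```

At constant volume, the daily figure compounds to more than 120,000 potentially harmful responses in a year.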

How should hallucination rates be treated in safety-critical deployments?

Hallucination rates in high-stakes contexts should be treated like safety-critical defect rates: measured continuously, held to explicit thresholds, and gated in deployment pipelines with the same rigor as any other safety metric.
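
A deployment gate of this kind could look roughly like the sketch below. The `EvalResult` type, the `deployment_gate` function, and the threshold passed to it are hypothetical names and values chosen for illustration; the check compares a point estimate against the threshold and ignores sampling uncertainty, which a production gate would likely need to account for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Outcome of one evaluation run (hypothetical structure)."""
    total_responses: int
    hallucinated_responses: int

    @property
    def rate(self) -> float:
        if self.total_responses == 0:
            raise ValueError("cannot compute a rate from an empty evaluation set")
        return self.hallucinated_responses / self.total_responses


def deployment_gate(result: EvalResult, max_rate: float) -> bool:
    """Return True only if the measured rate is at or below the threshold."""
    return result.rate <= max_rate


# Illustrative run: 42 fabricated responses out of 1,000 evaluated.
candidate = EvalResult(total_responses=1000, hallucinated_responses=42)
if not deployment_gate(candidate, max_rate=0.005):
    print(f"Blocked: measured rate {candidate.rate:.1%} exceeds the threshold")
```

Wired into a release pipeline, a failing gate would stop the deployment in the same way a failing test suite does.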

I implemented a hallucination monitoring system for the medical agent that treated confabulation the same way site reliability engineering treats error rates. I defined a hallucination budget (analogous to an error budget): the maximum acceptable rate of fabricated outputs per time period. When the budget was exhausted, the system automatically degraded to a more conservative mode that returned only verbatim excerpts from verified sources rather than generated responses.

This is the same principle behind safety-as-architecture: the system degrades gracefully rather than failing silently. The hallucination budget was set at 0.5% for any response containing medical claims. Achieving this required a verification layer that cross-referenced every medical claim against a curated knowledge base before delivery. The verification layer added 340ms of latency and $0.003 per query in compute costs. These are measurable engineering tradeoffs, not abstract ethical principles.
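
The budget-and-degrade mechanism can be sketched as follows. This is an assumption-laden illustration, not the audited system's code: it replaces "per time period" with a sliding window over the most recent responses, expresses the 0.5% budget as 5 fabrications per 1,000 responses, and assumes the verification layer reports whether each response was fabricated. The class and mode names are hypothetical.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    GENERATIVE = "generative"        # normal generated responses
    VERBATIM_ONLY = "verbatim_only"  # only excerpts from verified sources


class HallucinationBudget:
    """Tracks fabrications over a sliding window of recent responses."""

    def __init__(self, max_fabricated: int = 5, window: int = 1000):
        # 5 per 1,000 responses corresponds to the 0.5% budget.
        self.max_fabricated = max_fabricated
        self.outcomes = deque(maxlen=window)  # True means fabricated

    def record(self, fabricated: bool) -> None:
        """Record the verification outcome of one delivered response."""
        self.outcomes.append(fabricated)

    def mode(self) -> Mode:
        """Degrade to verbatim excerpts once the budget is exceeded."""
        if sum(self.outcomes) > self.max_fabricated:
            return Mode.VERBATIM_ONLY
        return Mode.GENERATIVE


budget = HallucinationBudget()
for _ in range(6):
    budget.record(True)
print(budget.mode())  # Mode.VERBATIM_ONLY
```

Because the window slides, the system returns to generative mode on its own once enough clean responses push the old fabrications out of the window. Whether recovery should be automatic or require human sign-off is a design decision this sketch leaves open.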

What engineering approaches reduce hallucination in high-stakes contexts?

Reducing hallucination in high-stakes contexts requires a defense-in-depth approach: constrained generation, retrieval-grounded responses, claim verification, confidence-calibrated delivery, and human review for borderline cases.

Retrieval grounding: Every response must be grounded in retrieved source material. The system generates responses based on specific documents, not parametric knowledge. I implemented this as a hard constraint: if the retrieval layer returns no relevant sources, the system says so rather than generating an answer. This is the same principle behind RAG as data infrastructure.
Claim verification: A post-generation verification step checks every factual claim in the response against the source material. Claims not supported by retrieved documents are either removed or flagged with an uncertainty indicator.
Confidence-calibrated delivery: Responses include an explicit confidence indicator (high, medium, low) derived from retrieval quality and claim verification results. Low-confidence responses include a disclaimer and a recommendation to consult a professional.
Domain-specific guardrails: The system is prohibited from generating certain categories of content (dosage recommendations, diagnostic conclusions, treatment plans) regardless of what the user asks. These prohibitions are enforced architecturally, not through prompt instructions.
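
The four layers above can be composed into a single delivery path, sketched below under heavy assumptions. The category labels, the confidence thresholds, and every function name are hypothetical. The substring check in `is_supported` is a toy stand-in for real claim verification, and the generator is stubbed out by passing drafted claims in directly.

```python
from dataclasses import dataclass, field

# Hypothetical category labels, assumed to come from an upstream classifier.
BLOCKED_CATEGORIES = {"dosage", "diagnosis", "treatment_plan"}


@dataclass
class Response:
    text: str
    confidence: str  # "high", "medium", "low", or "none"
    flagged_claims: list[str] = field(default_factory=list)


def is_supported(claim: str, sources: list[str]) -> bool:
    # Toy check: the claim appears verbatim in a source. A real verifier
    # would need something far stronger, such as an entailment model.
    return any(claim.lower() in source.lower() for source in sources)


def confidence_level(n_sources: int, supported: int, total: int) -> str:
    # Illustrative thresholds, not taken from the audited system.
    if total and supported == total and n_sources >= 2:
        return "high"
    if total and supported / total >= 0.5:
        return "medium"
    return "low"


def deliver(category: str, sources: list[str], claims: list[str]) -> Response:
    # Guardrail: enforced in the code path, before any generation happens.
    if category in BLOCKED_CATEGORIES:
        return Response("This system does not provide that kind of "
                        "information. Please consult a professional.", "none")
    # Retrieval grounding: with no sources, say so instead of answering.
    if not sources:
        return Response("No relevant sources were found.", "none")
    # Claim verification: unsupported claims are removed and flagged.
    supported = [c for c in claims if is_supported(c, sources)]
    unsupported = [c for c in claims if c not in supported]
    # Confidence-calibrated delivery.
    level = confidence_level(len(sources), len(supported), len(claims))
    text = " ".join(supported)
    if level == "low":
        text += " [Low confidence: please consult a professional.]"
    return Response(text.strip(), level, unsupported)


sources = ["Compound X belongs to class Y.", "Class Y is taken with food."]
claims = ["Compound X belongs to class Y", "Compound X cures colds"]
print(deliver("general", sources, claims))
```

The ordering is deliberate: the guardrail and the empty-retrieval check run before anything is generated, so the prohibited paths never produce text that a later step would have to catch.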

What does this mean for the ethics of AI deployment decisions?

Deploying an AI system with a known hallucination rate in a high-stakes context without adequate mitigation is an ethical decision, not just a technical one, and the deployer bears moral responsibility for foreseeable harm.

The FDA’s framework for AI in medical devices is beginning to address this, but most AI systems that provide medical, legal, or financial information operate outside regulatory frameworks designed for those domains. The absence of regulation does not eliminate the ethical obligation. An engineering team that deploys a system known to fabricate medical information at a 4.2% rate has made an ethical choice, whether they recognize it or not.

Hallucination is not a quirk. In high-stakes contexts, it is a defect. And defects in safety-critical systems are engineering failures with ethical consequences. The appropriate response is not to shrug and say that “all LLMs hallucinate.” The appropriate response is to measure the rate, set an acceptable threshold, build mitigations, and refuse to deploy when the threshold cannot be met.
