MCPWorks

When Your AI Lies About Lying: Anatomy of a Compounding Hallucination

Simon Carr

Date: 2026-04-08
Duration: ~11 hours (12:44 spec committed -> 23:33 code removed)
Affected repos: mcpworks-api, www.mcpworks.io
Root cause: LLM hallucination blending real and fabricated API details
Report revised: 2026-04-09 (v3 — see "Note on this report" at bottom)


Summary

On 2026-04-08, a Claude Code session was asked to research and integrate Microsoft's Agent Governance Toolkit into the mcpworks platform. The toolkit is a real, substantial MIT-licensed Microsoft project (github.com/microsoft/agent-governance-toolkit, 869 stars, 12 packages). A 450-line spec was written, 2,268 lines were implemented, 29 tests were added, a blog post was published, and a PR was merged — all in one day.

The integration was then removed after discovering that the code was written against incorrect API assumptions. The removal commit and subsequent sessions overstated the problem, claiming the toolkit and its packages "were hallucinated" and "do not exist." In fact, most of what was described was real. The actual errors were narrow: one package not published to PyPI (agent-compliance), one wrong attribute name (result.allowed vs result.success), and one fabricated method (load_policy_yaml()).

This incident is notable not just for the original hallucination, but for how the overcorrection itself became a second hallucination that persisted through multiple sessions and even into the first two drafts of this report.


Claim-by-claim verification

Every claim below was verified on 2026-04-09 against primary sources (GitHub API, PyPI, actual source code). Claims marked "OUR CODE" show what the integration assumed; "REALITY" shows what actually exists.

What's real in the toolkit

Claim Verified Source
microsoft/agent-governance-toolkit exists on GitHub YES gh api repos/microsoft/agent-governance-toolkit — 869 stars, MIT, created 2026-03-02
Monorepo contains 12 packages YES gh api .../contents/packages — agent-os, agent-mesh, agent-compliance, agent-runtime, agent-sre, agent-lightning, agent-marketplace, agent-hypervisor, agent-mcp-governance, agent-os-vscode, agent-governance-dotnet, agentmesh-integrations
agent-os-kernel on PyPI YES pip index versions agent-os-kernel — versions 2.0.0 through 3.0.2
agent-governance-toolkit on PyPI YES pip index versions agent-governance-toolkit — versions 2.1.0 through 3.0.2
StatelessKernel class exists YES packages/agent-os/src/agent_os/stateless.py:373
ExecutionContext dataclass exists YES packages/agent-os/src/agent_os/stateless.py:284 — fields: agent_id, policies, history, state_ref, metadata
ExecutionResult dataclass exists YES packages/agent-os/src/agent_os/stateless.py:341 — fields: success, data, error, signal, updated_context, metadata
execute(action, params, context) method YES packages/agent-os/src/agent_os/stateless.py:426 — exact signature matches our code
Cedar policy support YES packages/agent-os/src/agent_os/policies/backends.py — CedarBackend class
OPA/Rego policy support YES packages/agent-os/src/agent_os/policies/backends.py — OPABackend class
YAML policy support YES Native PolicyDocument engine alongside external backends
GovernanceVerifier class exists YES packages/agent-compliance/src/agent_compliance/verify.py:252
GovernanceVerifier.verify() method YES Returns GovernanceAttestation with OWASP controls
GovernanceAttestation.compliance_grade() YES verify.py:159 — returns letter grade
agent-compliance source in monorepo YES Full package at packages/agent-compliance/
MIT license YES gh api confirms spdx_id: "MIT"
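
To make the verified surface concrete, here is a stand-in reconstruction of the two dataclasses from the field lists above. Only the class and field names come from the verification; the types and defaults are assumptions, not the upstream source.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Reconstructed from the verified field lists (stateless.py:284 and :341).
# Types and defaults beyond the field names are assumptions.
@dataclass
class ExecutionContext:
    agent_id: str
    policies: list = field(default_factory=list)
    history: list = field(default_factory=list)
    state_ref: Optional[str] = None
    metadata: dict = field(default_factory=dict)

@dataclass
class ExecutionResult:
    success: bool  # the field the deleted scanner misnamed "allowed"
    data: Any = None
    error: Optional[str] = None
    signal: Any = None
    updated_context: Optional[ExecutionContext] = None
    metadata: dict = field(default_factory=dict)
```

Laid out this way, the gap is visible at a glance: everything the integration assumed exists except an attribute named allowed.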

What was actually wrong in our code

Our code Reality Severity
pip install agent-compliance Not published to PyPI as standalone package. Source exists in monorepo but isn't a separate PyPI distribution. Medium — would fail at install time
result.allowed Real field is result.success (bool). Same semantics, wrong name. Low — trivial fix
kernel.load_policy_yaml(self._policy) This method does not exist on StatelessKernel. Policy loading uses OPABackend/CedarBackend classes added to a policy evaluator, not a kernel method. Medium — wrong API pattern
governance optional extra in pyproject.toml (listing agent-os-kernel[full] and agent-compliance) agent-os-kernel[full] is valid; agent-compliance is not installable from PyPI. Low — half correct
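
Given the table above, the packaging fix is small. A hedged sketch of what the corrected governance extra could look like (the version constraint and comments are assumptions, not from the audit):

```toml
[project.optional-dependencies]
# agent-compliance is not published to PyPI, so it cannot appear here;
# its functionality would need to be vendored or dropped until it ships
# as a standalone distribution.
governance = [
    "agent-os-kernel[full]",  # verified on PyPI (versions 2.0.0 - 3.0.2)
]
```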

What the removal commit got wrong

The commit abe085a stated:

"The 'Microsoft Agent Governance Toolkit' monorepo, blog post, and agent-compliance package were hallucinated by web search."

This is incorrect. The monorepo is real (869 stars, MIT, 12 packages). The agent-compliance package exists as source in the monorepo — it's just not published to PyPI as a standalone package. The blog post contained real information mixed with unverified claims.


Timeline

All times PDT (UTC-7).

Phase 1: Spec (12:44)

Time Commit Repo Event
12:44 de3f1cb mcpworks-api Spec committed. 450-line specification for "Microsoft Agent Governance Toolkit integration (#62)." Describes Agent OS, Agent Compliance, Agent Mesh. Most high-level claims are accurate; specific API details contain errors.

Phase 2: Parallel legitimate work (13:05 - 14:08)

Time Commit Event
13:05 b543847 Analytics token savings spec (#53)
13:16 e47a9c6 Analytics token savings implementation
13:42 513ea25 Analytics token savings PR merged (#63)
13:50 ef8628f Telemetry webhook spec (#46)
14:02 82c196c Telemetry webhook implementation
14:08 db7af86 Telemetry webhook PR merged (#64)

These features are real and remain in the codebase.

Phase 3: Implementation (15:48 - 15:49)

Time Commit Event
15:48 f735ad8 Implementation committed. 27 files changed, +2,268 lines. The agent_os_scanner.py called real classes (StatelessKernel, ExecutionContext) with mostly-correct signatures, but used a non-existent method (load_policy_yaml) and wrong attribute (result.allowed vs result.success).
15:49 b2db7dc PR merged (#65). Same diff, merge commit to main.

Phase 4: Discovery and overcorrection (23:21 - 23:33)

Time Commit Repo Event
23:21 a7819f9 www.mcpworks.io Blog post published. Announced governance toolkit integration.
23:25 b8e50f2 www.mcpworks.io First correction. Removed false claims about enterprise demand.
23:27 deeebfa www.mcpworks.io Blog post deleted. Commit message: "agent-compliance doesn't exist on PyPI, and agent-os-kernel 3.x has a different API than what our code assumes."
23:33 abe085a mcpworks-api Code removed with incorrect framing. Commit message claims toolkit "was hallucinated by web search." Actually: toolkit is real, but specific API calls were wrong. Deleted agent_os_scanner.py, its tests, pipeline case, and optional deps. Kept native features.

Phase 5: Overcorrection propagates (2026-04-09)

After the incident, we ran a full codebase reality audit as a best practice — not because the running code was suspect, but to sweep for any other fictional code. Six parallel agents examined every router, model, migration, service, test, and infrastructure file. Results came back clean: 31 models with matching migrations, 25 routers with real implementations, 23 services with real database queries, 649 tests passing. No other issues found.

But the LLM sessions running the audit inherited the "hallucinated" framing from the removal commit and continued to describe the toolkit as fabricated:

  • Audit report v1 stated StatelessKernel was "hallucinated" (it's real)
  • Audit report v1 stated GovernanceVerifier "does not exist" (it exists)
  • Audit report v1 stated the toolkit "does not exist" (it has 869 stars)
  • Spec artifacts were deleted based on the assumption everything was fabricated
  • Implementation spec was rewritten describing features as "native" rather than acknowledging the real upstream toolkit they were inspired by

The user caught the error by pointing to the real GitHub repo.

Phase 6: Cleanup (2026-04-09)

Action Accurate?
specs/024-agent-governance-toolkit/ directory deleted Overbroad — specs contained mostly accurate high-level descriptions mixed with some wrong API details
Implementation spec rewritten Overcorrected — removed all toolkit references instead of correcting the specific wrong claims
CLAUDE.md reference updated Fine

What was kept (native, no external deps)

Feature Files Status
Trust scoring (0-1000, degrade/recover) services/trust_score.py, models/agent.py, migration Native implementation, works correctly
Trust-gated access control core/agent_access.py, mcp/tool_registry.py Native, extends existing access system
OWASP compliance endpoint api/v1/compliance.py, services/compliance.py Native evaluation logic
24 unit tests (trust, compliance, access) tests/unit/test_trust_score.py, etc. All passing

Impact

  • Code merged to main: ~200 lines of scanner code with wrong API calls reached production. The scanner was lazy-imported and would only activate if a namespace explicitly configured type: agent_os, which none had.
  • Blog post published and retracted: ~6 minutes of public visibility.
  • Overcorrection: Removal commit and subsequent sessions propagated a false narrative that the toolkit doesn't exist, leading to deletion of spec artifacts and documentation that were mostly accurate.
  • Production impact: None. The scanner was never invoked.
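
The gating described in the first bullet can be sketched as follows. The load_scanner name and config shape are hypothetical, but the pattern matches the report's description: a lazy import keyed on a namespace's type, so the broken code could never run unless a namespace opted in.

```python
def load_scanner(namespace_config: dict):
    """Return a scanner only for namespaces that opt in; otherwise never
    touch the agent_os import. This mirrors why the wrong API calls never
    executed in production: no namespace ever set type: agent_os."""
    if namespace_config.get("type") != "agent_os":
        return None
    # Deferred import: this line only fails if a namespace actually opts in.
    from agent_os_scanner import AgentOSScanner  # hypothetical module name
    return AgentOSScanner()
```

The design choice cuts both ways: lazy imports contained the blast radius, but they also meant no import error ever surfaced to flag the wrong API assumptions.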

Root cause analysis

Primary: LLM filled in API details it didn't actually know

The toolkit is real. The high-level description (policy engine, compliance, trust scoring, Cedar/Rego support) is accurate. The LLM correctly identified the project and its purpose, but then fabricated specific implementation details: a method name that doesn't exist (load_policy_yaml), an attribute name that's close but wrong (allowed vs success), and a PyPI publication status that's incorrect (agent-compliance isn't on PyPI).

Secondary: Overcorrection compounded the error

When the API mismatches were discovered, the response was to declare the entire toolkit "hallucinated." This overcorrection was itself a hallucination — anchored on the discovery of specific errors, the LLM (and the session) generalized to "everything about this is fake." The removal commit's framing then became authoritative context for subsequent sessions, which repeated and amplified it.

Contributing: No verification step in the workflow

Neither the spec phase nor the implementation phase included running pip install followed by python -c "import agent_os; help(agent_os.StatelessKernel)". A 30-second check would have caught the wrong method name before 2,268 lines were written.
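
Such a check can be made routine. Below is a minimal preflight sketch that verifies an assumed API surface before any integration code is written; the agent_os names come from this report's findings, not from running the package.

```python
import importlib

def preflight(module_name, required):
    """Verify that a module exposes the classes and attributes an
    integration is about to assume. Returns a list of missing items;
    an empty list means it is safe to start writing code."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return [f"{module_name} (not importable -- is it installed?)"]
    missing = []
    for cls_name, attrs in required.items():
        cls = getattr(mod, cls_name, None)
        if cls is None:
            missing.append(f"{module_name}.{cls_name}")
            continue
        missing += [f"{cls_name}.{a}" for a in attrs if not hasattr(cls, a)]
    return missing

# Run against the surface the deleted scanner assumed. Had this run on
# 2026-04-08, it would have flagged load_policy_yaml and allowed up front.
print(preflight("agent_os", {
    "StatelessKernel": ["execute", "load_policy_yaml"],
    "ExecutionResult": ["allowed"],
}))
```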

Contributing: Mocked tests couldn't catch API mismatches

The 29 unit tests mocked agent_os entirely, validating the scanner against its own assumptions. A single test that imported the real package would have revealed load_policy_yaml doesn't exist and result.allowed should be result.success.
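
This failure mode is easy to reproduce with the standard library alone. The ExecutionResult below is a stand-in carrying only the verified success field; a bare MagicMock happily invents any attribute, while a spec'd mock would have caught the wrong name at test time.

```python
from unittest.mock import MagicMock, Mock

class ExecutionResult:
    """Stand-in with only the field the report verified (success);
    the real dataclass lives in agent_os and has more fields."""
    success: bool = True

loose = MagicMock()
loose.allowed             # no error: a bare mock invents attributes on demand
loose.load_policy_yaml()  # nonexistent method, still "works" under mock

strict = Mock(spec=ExecutionResult)
strict.success            # fine: the class really has this attribute
try:
    strict.allowed        # a spec'd mock rejects attributes the class lacks
except AttributeError as e:
    print(f"caught at test time: {e}")
```

Even spec'd mocks only validate against whatever class you hand them, so they complement, rather than replace, one test that imports the real package.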


Lessons

1. Install and import before writing integration code

Before writing code that wraps an external package, pip install it and inspect the actual classes with help(). Thirty seconds of verification prevents 2,268 lines of rework.

2. Never mock the thing you're integrating

At least one test must import the real package. Mocking an external dependency proves your code works against your assumptions, not reality.

3. When you find errors, scope the correction precisely

Finding that result.allowed should be result.success does not mean the entire toolkit is fabricated. Overcorrection is itself a form of hallucination — pattern-matching from "some details are wrong" to "everything is fake."

4. Don't trust commit messages as ground truth

The removal commit said "hallucinated by web search." Subsequent sessions treated this as verified fact. Commit messages reflect the author's understanding at commit time, which may be wrong.

5. Verify before repeating

This report itself went through three drafts because each version repeated claims from previous sessions without re-verifying them. The rule for memory applies to incident reports too: a claim from a prior session is not a fact until you check it against a primary source.


Note on this report

This report went through three versions, each correcting errors from the previous one:

v1 (2026-04-09, first draft): Stated the Microsoft Agent Governance Toolkit "does not exist," StatelessKernel was "hallucinated," and GovernanceVerifier was "fabricated." All three claims were wrong. The report inherited the incorrect framing from the removal commit (abe085a) without verifying any claims against primary sources.

v2 (2026-04-09, after user correction): User pointed to the real GitHub repo. Report was rewritten acknowledging the toolkit exists, but still claimed GovernanceVerifier was fabricated and Cedar/Rego support was hallucinated. Both claims were wrong — GovernanceVerifier exists at agent_compliance/verify.py:252 with a real verify() method, and Cedar/Rego support exists via OPABackend and CedarBackend in agent_os/policies/backends.py.

v3 (2026-04-09, after full verification): Every claim verified against GitHub API and actual source code. The actual errors in the integration were narrow: one method that doesn't exist (load_policy_yaml), one wrong attribute name (allowed vs success), one package not on PyPI (agent-compliance). The overcorrection — declaring everything fabricated — was a bigger deviation from truth than the original errors.

This progression demonstrates compounding hallucination: the original LLM session got specific API details wrong. The removal session overcorrected to "everything is fake." The audit session inherited that framing. The report session repeated it. Each layer of LLM processing amplified the error rather than correcting it, because each session trusted the previous session's conclusions as ground truth.


Epilogue: Verification of this report (2026-04-09)

After v2 of this report was written, the user requested that every claim in the report be substantiated against primary sources. The verification was performed by the same Claude Code session that wrote v2, using:

  • gh api repos/microsoft/agent-governance-toolkit/... to read actual source files from the upstream repo
  • pip index versions <package> to check PyPI publication status
  • git show <commit> to read the deleted integration code and compare it against the real API

This verification revealed that v2 still contained false claims inherited from v1: GovernanceVerifier was described as fabricated (it's real, with a working verify() method and compliance_grade() on the attestation), and Cedar/Rego support was described as hallucinated (real CedarBackend and OPABackend classes exist in the policies module).

The narrowing across versions:

  • v1: "The entire toolkit is fabricated" (wrong — toolkit has 869 stars)
  • v2: "Toolkit is real but GovernanceVerifier and Cedar/Rego are fabricated" (wrong — both exist)
  • v3: "Toolkit is real, classes are real, the actual errors were: one wrong method name, one wrong attribute name, one package not on PyPI"

Each correction required going to the primary source (actual source code in the GitHub repo) rather than trusting any prior session's characterization.


The full incident report is also published in our repo at docs/incidents/2026-04-08-fabricated-governance-toolkit.md.

MCPWorks is open source.

Self-host free forever, or try MCPWorks Cloud — 14-day Pro trial, no credit card.
