MCPWorks

When Your AI Lies About Lying: Anatomy of a Compounding Hallucination

Simon Carr

Date: 2026-04-08
Duration: ~11 hours (12:44 spec committed -> 23:33 code removed)
Affected repos: mcpworks-api, www.mcpworks.io
Root cause: LLM hallucination blending real and fabricated API details
Report revised: 2026-04-09 (v3 — see "Note on this report" at bottom)


Summary

On 2026-04-08, a Claude Code session was asked to research and integrate Microsoft's Agent Governance Toolkit into the mcpworks platform. The toolkit is a real, substantial MIT-licensed Microsoft project (github.com/microsoft/agent-governance-toolkit, 869 stars, 12 packages). A 450-line spec was written, 2,268 lines were implemented, 29 tests were added, a blog post was published, and a PR was merged — all in one day.

The integration was then removed after discovering that the code was written against incorrect API assumptions. The removal commit and subsequent sessions overstated the problem, claiming the toolkit and its packages "were hallucinated" and "do not exist." In fact, most of what was described was real. The actual errors were narrow: one package not published to PyPI (agent-compliance), one wrong attribute name (result.allowed vs result.success), and one fabricated method (load_policy_yaml()).

This incident is notable not just for the original hallucination, but for how the overcorrection itself became a second hallucination that persisted through multiple sessions and even into the first two drafts of this report.


Claim-by-claim verification

Every claim below was verified on 2026-04-09 against primary sources (GitHub API, PyPI, actual source code). Claims marked "OUR CODE" show what the integration assumed; "REALITY" shows what actually exists.

What's real in the toolkit

Claim Verified Source
microsoft/agent-governance-toolkit exists on GitHub YES gh api repos/microsoft/agent-governance-toolkit — 869 stars, MIT, created 2026-03-02
Monorepo contains 12 packages YES gh api .../contents/packages — agent-os, agent-mesh, agent-compliance, agent-runtime, agent-sre, agent-lightning, agent-marketplace, agent-hypervisor, agent-mcp-governance, agent-os-vscode, agent-governance-dotnet, agentmesh-integrations
agent-os-kernel on PyPI YES pip index versions agent-os-kernel — versions 2.0.0 through 3.0.2
agent-governance-toolkit on PyPI YES pip index versions agent-governance-toolkit — versions 2.1.0 through 3.0.2
StatelessKernel class exists YES packages/agent-os/src/agent_os/stateless.py:373
ExecutionContext dataclass exists YES packages/agent-os/src/agent_os/stateless.py:284 — fields: agent_id, policies, history, state_ref, metadata
ExecutionResult dataclass exists YES packages/agent-os/src/agent_os/stateless.py:341 — fields: success, data, error, signal, updated_context, metadata
execute(action, params, context) method YES packages/agent-os/src/agent_os/stateless.py:426 — exact signature matches our code
Cedar policy support YES packages/agent-os/src/agent_os/policies/backends.py — CedarBackend class
OPA/Rego policy support YES packages/agent-os/src/agent_os/policies/backends.py — OPABackend class
YAML policy support YES Native PolicyDocument engine alongside external backends
GovernanceVerifier class exists YES packages/agent-compliance/src/agent_compliance/verify.py:252
GovernanceVerifier.verify() method YES Returns GovernanceAttestation with OWASP controls
GovernanceAttestation.compliance_grade() YES verify.py:159 — returns letter grade
agent-compliance source in monorepo YES Full package at packages/agent-compliance/
MIT license YES gh api confirms spdx_id: "MIT"
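
To make the verified surface concrete, here is a stand-in reconstruction of the two dataclasses from the field lists above. Only the class and field names come from the verification; the types and defaults are assumptions, not the upstream source.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Reconstructed from the verified field lists (stateless.py:284 and :341).
# Types and defaults beyond the field names are assumptions.
@dataclass
class ExecutionContext:
    agent_id: str
    policies: list = field(default_factory=list)
    history: list = field(default_factory=list)
    state_ref: Optional[str] = None
    metadata: dict = field(default_factory=dict)

@dataclass
class ExecutionResult:
    success: bool  # the field the deleted scanner misnamed "allowed"
    data: Any = None
    error: Optional[str] = None
    signal: Any = None
    updated_context: Optional[ExecutionContext] = None
    metadata: dict = field(default_factory=dict)
```

Laid out this way, the gap is visible at a glance: everything the integration assumed exists except an attribute named allowed.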

What was actually wrong in our code

Our code Reality Severity
pip install agent-compliance Not published to PyPI as standalone package. Source exists in monorepo but isn't a separate PyPI distribution. Medium — would fail at install time
result.allowed Real field is result.success (bool). Same semantics, wrong name. Low — trivial fix
kernel.load_policy_yaml(self._policy) This method does not exist on StatelessKernel. Policy loading uses OPABackend/CedarBackend classes added to a policy evaluator, not a kernel method. Medium — wrong API pattern
governance optional extra in pyproject.toml (listing agent-os-kernel[full] and agent-compliance) agent-os-kernel[full] is valid; agent-compliance is not installable from PyPI. Low — half correct
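
Given the table above, the packaging fix is small. A hedged sketch of what the corrected governance extra could look like (the version constraint and comments are assumptions, not from the audit):

```toml
[project.optional-dependencies]
# agent-compliance is not published to PyPI, so it cannot appear here;
# its functionality would need to be vendored or dropped until it ships
# as a standalone distribution.
governance = [
    "agent-os-kernel[full]",  # verified on PyPI (versions 2.0.0 - 3.0.2)
]
```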

What the removal commit got wrong

The commit abe085a stated:

"The 'Microsoft Agent Governance Toolkit' monorepo, blog post, and agent-compliance package were hallucinated by web search."

This is incorrect. The monorepo is real (869 stars, MIT, 12 packages). The agent-compliance package exists as source in the monorepo — it's just not published to PyPI as a standalone package. The blog post contained real information mixed with unverified claims.


Timeline

All times PDT (UTC-7).

Phase 1: Spec (12:44)

Time Commit Repo Event
12:44 de3f1cb mcpworks-api Spec committed. 450-line specification for "Microsoft Agent Governance Toolkit integration (#62)." Describes Agent OS, Agent Compliance, Agent Mesh. Most high-level claims are accurate; specific API details contain errors.

Phase 2: Parallel legitimate work (13:05 - 14:08)

Time Commit Event
13:05 b543847 Analytics token savings spec (#53)
13:16 e47a9c6 Analytics token savings implementation
13:42 513ea25 Analytics token savings PR merged (#63)
13:50 ef8628f Telemetry webhook spec (#46)
14:02 82c196c Telemetry webhook implementation
14:08 db7af86 Telemetry webhook PR merged (#64)

These features are real and remain in the codebase.

Phase 3: Implementation (15:48 - 15:49)

Time Commit Event
15:48 f735ad8 Implementation committed. 27 files changed, +2,268 lines. The agent_os_scanner.py called real classes (StatelessKernel, ExecutionContext) with mostly-correct signatures, but used a non-existent method (load_policy_yaml) and wrong attribute (result.allowed vs result.success).
15:49 b2db7dc PR merged (#65). Same diff, merge commit to main.

Phase 4: Discovery and overcorrection (23:21 - 23:33)

Time Commit Repo Event
23:21 a7819f9 www.mcpworks.io Blog post published. Announced governance toolkit integration.
23:25 b8e50f2 www.mcpworks.io First correction. Removed false claims about enterprise demand.
23:27 deeebfa www.mcpworks.io Blog post deleted. Commit message: "agent-compliance doesn't exist on PyPI, and agent-os-kernel 3.x has a different API than what our code assumes."
23:33 abe085a mcpworks-api Code removed with incorrect framing. Commit message claims toolkit "was hallucinated by web search." Actually: toolkit is real, but specific API calls were wrong. Deleted agent_os_scanner.py, its tests, pipeline case, and optional deps. Kept native features.

Phase 5: Overcorrection propagates (2026-04-09)

After the incident, we ran a full codebase reality audit as a best practice — not because the running code was suspect, but to sweep for any other fictional code. Six parallel agents examined every router, model, migration, service, test, and infrastructure file. Results came back clean: 31 models with matching migrations, 25 routers with real implementations, 23 services with real database queries, 649 tests passing. No other issues found.

But the LLM sessions running the audit inherited the "hallucinated" framing from the removal commit and continued to describe the toolkit as fabricated:

  • Audit report v1 stated StatelessKernel was "hallucinated" (it's real)
  • Audit report v1 stated GovernanceVerifier "does not exist" (it exists)
  • Audit report v1 stated the toolkit "does not exist" (it has 869 stars)
  • Spec artifacts were deleted based on the assumption everything was fabricated
  • Implementation spec was rewritten describing features as "native" rather than acknowledging the real upstream toolkit they were inspired by

The user caught the error by pointing to the real GitHub repo.

Phase 6: Cleanup (2026-04-09)

Action Accurate?
specs/024-agent-governance-toolkit/ directory deleted Overbroad — specs contained mostly accurate high-level descriptions mixed with some wrong API details
Implementation spec rewritten Overcorrected — removed all toolkit references instead of correcting the specific wrong claims
CLAUDE.md reference updated Fine

What was kept (native, no external deps)

Feature Files Status
Trust scoring (0-1000, degrade/recover) services/trust_score.py, models/agent.py, migration Native implementation, works correctly
Trust-gated access control core/agent_access.py, mcp/tool_registry.py Native, extends existing access system
OWASP compliance endpoint api/v1/compliance.py, services/compliance.py Native evaluation logic
24 unit tests (trust, compliance, access) tests/unit/test_trust_score.py, etc. All passing

Impact

  • Code merged to main: ~200 lines of scanner code with wrong API calls reached production. The scanner was lazy-imported and would only activate if a namespace explicitly configured type: agent_os, which none had.
  • Blog post published and retracted: ~6 minutes of public visibility.
  • Overcorrection: Removal commit and subsequent sessions propagated a false narrative that the toolkit doesn't exist, leading to deletion of spec artifacts and documentation that were mostly accurate.
  • Production impact: None. The scanner was never invoked.
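
The gating described in the first bullet can be sketched as follows. The load_scanner name and config shape are hypothetical, but the pattern matches the report's description: a lazy import keyed on a namespace's type, so the broken code could never run unless a namespace opted in.

```python
def load_scanner(namespace_config: dict):
    """Return a scanner only for namespaces that opt in; otherwise never
    touch the agent_os import. This mirrors why the wrong API calls never
    executed in production: no namespace ever set type: agent_os."""
    if namespace_config.get("type") != "agent_os":
        return None
    # Deferred import: this line only fails if a namespace actually opts in.
    from agent_os_scanner import AgentOSScanner  # hypothetical module name
    return AgentOSScanner()
```

The design choice cuts both ways: lazy imports contained the blast radius, but they also meant no import error ever surfaced to flag the wrong API assumptions.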

Root cause analysis

Primary: LLM filled in API details it didn't actually know

The toolkit is real. The high-level description (policy engine, compliance, trust scoring, Cedar/Rego support) is accurate. The LLM correctly identified the project and its purpose, but then fabricated specific implementation details: a method name that doesn't exist (load_policy_yaml), an attribute name that's close but wrong (allowed vs success), and a PyPI publication status that's incorrect (agent-compliance isn't on PyPI).

Secondary: Overcorrection compounded the error

When the API mismatches were discovered, the response was to declare the entire toolkit "hallucinated." This overcorrection was itself a hallucination — anchored on the discovery of specific errors, the LLM (and the session) generalized to "everything about this is fake." The removal commit's framing then became authoritative context for subsequent sessions, which repeated and amplified it.

Contributing: No verification step in the workflow

Neither the spec phase nor the implementation phase included running pip install followed by python -c "import agent_os; help(agent_os.StatelessKernel)". A 30-second check would have caught the wrong method name before 2,268 lines were written.
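
Such a check can be made routine. Below is a minimal preflight sketch that verifies an assumed API surface before any integration code is written; the agent_os names come from this report's findings, not from running the package.

```python
import importlib

def preflight(module_name, required):
    """Verify that a module exposes the classes and attributes an
    integration is about to assume. Returns a list of missing items;
    an empty list means it is safe to start writing code."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return [f"{module_name} (not importable -- is it installed?)"]
    missing = []
    for cls_name, attrs in required.items():
        cls = getattr(mod, cls_name, None)
        if cls is None:
            missing.append(f"{module_name}.{cls_name}")
            continue
        missing += [f"{cls_name}.{a}" for a in attrs if not hasattr(cls, a)]
    return missing

# Run against the surface the deleted scanner assumed. Had this run on
# 2026-04-08, it would have flagged load_policy_yaml and allowed up front.
print(preflight("agent_os", {
    "StatelessKernel": ["execute", "load_policy_yaml"],
    "ExecutionResult": ["allowed"],
}))
```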

Contributing: Mocked tests couldn't catch API mismatches

The 29 unit tests mocked agent_os entirely, validating the scanner against its own assumptions. A single test that imported the real package would have revealed load_policy_yaml doesn't exist and result.allowed should be result.success.
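
This failure mode is easy to reproduce with the standard library alone. The ExecutionResult below is a stand-in carrying only the verified success field; a bare MagicMock happily invents any attribute, while a spec'd mock would have caught the wrong name at test time.

```python
from unittest.mock import MagicMock, Mock

class ExecutionResult:
    """Stand-in with only the field the report verified (success);
    the real dataclass lives in agent_os and has more fields."""
    success: bool = True

loose = MagicMock()
loose.allowed             # no error: a bare mock invents attributes on demand
loose.load_policy_yaml()  # nonexistent method, still "works" under mock

strict = Mock(spec=ExecutionResult)
strict.success            # fine: the class really has this attribute
try:
    strict.allowed        # a spec'd mock rejects attributes the class lacks
except AttributeError as e:
    print(f"caught at test time: {e}")
```

Even spec'd mocks only validate against whatever class you hand them, so they complement, rather than replace, one test that imports the real package.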


Lessons

1. Install and import before writing integration code

Before writing code that wraps an external package, pip install it and inspect the actual classes with help(). Thirty seconds of verification prevents 2,268 lines of rework.

2. Never mock the thing you're integrating

At least one test must import the real package. Mocking an external dependency proves your code works against your assumptions, not reality.

3. When you find errors, scope the correction precisely

Finding that result.allowed should be result.success does not mean the entire toolkit is fabricated. Overcorrection is itself a form of hallucination — pattern-matching from "some details are wrong" to "everything is fake."

4. Don't trust commit messages as ground truth

The removal commit said "hallucinated by web search." Subsequent sessions treated this as verified fact. Commit messages reflect the author's understanding at commit time, which may be wrong.

5. Verify before repeating

This report itself went through three drafts because each version repeated claims from previous sessions without re-verifying them. The rule for memory applies to incident reports too: a claim from a prior session is not a fact until you check it against a primary source.


Note on this report

This report went through three versions, each correcting errors from the previous one:

v1 (2026-04-09, first draft): Stated the Microsoft Agent Governance Toolkit "does not exist," StatelessKernel was "hallucinated," and GovernanceVerifier was "fabricated." All three claims were wrong. The report inherited the incorrect framing from the removal commit (abe085a) without verifying any claims against primary sources.

v2 (2026-04-09, after user correction): User pointed to the real GitHub repo. Report was rewritten acknowledging the toolkit exists, but still claimed GovernanceVerifier was fabricated and Cedar/Rego support was hallucinated. Both claims were wrong — GovernanceVerifier exists at agent_compliance/verify.py:252 with a real verify() method, and Cedar/Rego support exists via OPABackend and CedarBackend in agent_os/policies/backends.py.

v3 (2026-04-09, after full verification): Every claim verified against GitHub API and actual source code. The actual errors in the integration were narrow: one method that doesn't exist (load_policy_yaml), one wrong attribute name (allowed vs success), one package not on PyPI (agent-compliance). The overcorrection — declaring everything fabricated — was a bigger deviation from truth than the original errors.

This progression demonstrates compounding hallucination: the original LLM session got specific API details wrong. The removal session overcorrected to "everything is fake." The audit session inherited that framing. The report session repeated it. Each layer of LLM processing amplified the error rather than correcting it, because each session trusted the previous session's conclusions as ground truth.


Epilogue: Verification of this report (2026-04-09)

After v2 of this report was written, the user requested that every claim in the report be substantiated against primary sources. The verification was performed by the same Claude Code session that wrote v2, using:

  • gh api repos/microsoft/agent-governance-toolkit/... to read actual source files from the upstream repo
  • pip index versions <package> to check PyPI publication status
  • git show <commit> to read the deleted integration code and compare it against the real API

This verification revealed that v2 still contained false claims inherited from v1: GovernanceVerifier was described as fabricated (it's real, with a working verify() method and compliance_grade() on the attestation), and Cedar/Rego support was described as hallucinated (real CedarBackend and OPABackend classes exist in the policies module).

The narrowing across versions:

  • v1: "The entire toolkit is fabricated" (wrong — toolkit has 869 stars)
  • v2: "Toolkit is real but GovernanceVerifier and Cedar/Rego are fabricated" (wrong — both exist)
  • v3: "Toolkit is real, classes are real, the actual errors were: one wrong method name, one wrong attribute name, one package not on PyPI"

Each correction required going to the primary source (actual source code in the GitHub repo) rather than trusting any prior session's characterization.


The full incident report is also published in our repo at docs/incidents/2026-04-08-fabricated-governance-toolkit.md.

MCPWorks is open source.

Self-host free forever, or try MCPWorks Cloud — 14-day Pro trial, no credit card.
