MCPWorks

Why Prompt Injection Can't Steal Your API Keys on MCPWorks

Simon Carr

A recent discussion around a 9.3 CVSS deserialization flaw in LangChain raised a point worth addressing: in multi-agent setups where context passes between agents, a single compromised link can poison the entire chain. API key exfiltration from LLM context is how prompt injection becomes data theft.

This is a real attack vector. MCPWorks prevents it at the architecture level through four layers of defense.

Layer 1: Keys never enter the AI context

In most agent frameworks, API keys live in the process environment or get passed through the agent chain. The LLM can see them. MCPWorks takes a different path:

Client .mcp.json header (X-MCPWorks-Env)
    → API server decodes base64
    → Writes to tmpfs file inside nsjail sandbox
    → File self-destructs before user code runs
    → Code reads from os.environ, executes, exits
    → Sandbox destroyed

The LLM writes code that reads os.environ["OPENAI_API_KEY"]. It never sees the value. There is nothing to exfiltrate from the context window because the key was never there.

Agent AI keys are encrypted at rest with AES-256-GCM envelope encryption and decrypted only at the moment they are injected into the agent's container environment. The orchestration layer reads the key from the container env, not from the LLM context.
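The staging flow above can be sketched in a few lines. This is a simplified illustration, not the production code: the helper names (stage_env, load_and_self_destruct) are hypothetical, the header is assumed to be base64-encoded JSON, and a plain directory stands in for the tmpfs mount inside nsjail.

```python
import base64
import json
import os


def stage_env(header_value: str, sandbox_dir: str) -> str:
    """Decode the base64 X-MCPWorks-Env header and stage it as a file
    inside the sandbox (tmpfs in the real system; a plain dir here)."""
    env_map = json.loads(base64.b64decode(header_value))
    path = os.path.join(sandbox_dir, ".mcpworks_env.json")
    with open(path, "w") as f:
        json.dump(env_map, f)
    return path


def load_and_self_destruct(path: str) -> None:
    """Export the staged vars into os.environ, then delete the file
    so it is already gone before any user code starts running."""
    with open(path) as f:
        for key, value in json.load(f).items():
            os.environ[key] = value
    os.remove(path)  # self-destruct: user code only ever sees os.environ
```

The point of the sketch is the ordering: by the time user code executes, the only remaining copy of the key lives in the process environment of a sandbox that is destroyed on exit.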

Layer 2: Agents cannot author functions

Even though keys don't enter the AI context, a function could theoretically read an env var and return it as output. If an agent's AI could write functions, a prompt injection could trick it into creating one that exfiltrates credentials.

MCPWorks closes this vector entirely. During agent AI orchestration — schedules, webhooks, heartbeats, chat — the following tools are blocked:

  • make_function, update_function, delete_function
  • make_service, delete_service
  • lock_function, unlock_function

These tools are not hidden or filtered on the response side. They are excluded from the agent's tool set before the AI sees them. The AI cannot call what it doesn't know exists.

Only users with access to the create MCP endpoint (Claude Code, Cursor, Copilot, etc.) can author functions. The agent runs them. It cannot write them.

If an agent's AI somehow attempts to call a restricted tool anyway, a defense-in-depth check rejects the call and fires a restricted_tool_attempt security event.
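Both halves of this layer, exclusion before the model sees the tool list and rejection as a backstop, can be sketched as follows. The function names and event shape are illustrative assumptions, not the platform's actual API:

```python
# Tools blocked during agent orchestration, per the list above.
RESTRICTED_TOOLS = {
    "make_function", "update_function", "delete_function",
    "make_service", "delete_service",
    "lock_function", "unlock_function",
}


def orchestration_toolset(all_tools: list[dict]) -> list[dict]:
    """Strip authoring tools before the model receives the tool list:
    the AI cannot call what it never knows exists."""
    return [t for t in all_tools if t["name"] not in RESTRICTED_TOOLS]


def dispatch(tool_name: str, agent_id: str, security_log: list) -> None:
    """Defense-in-depth: reject restricted calls that slip through and
    record a restricted_tool_attempt security event."""
    if tool_name in RESTRICTED_TOOLS:
        security_log.append({"event": "restricted_tool_attempt",
                             "agent": agent_id, "tool": tool_name})
        raise PermissionError(f"tool {tool_name!r} is blocked during orchestration")
    # ... normal tool execution would happen here
```

Filtering at list-construction time, rather than on the response side, is what makes the first check architectural rather than cosmetic.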

Layer 3: Output secret scanner

Even with function authoring blocked, existing functions could inadvertently return credential values. The output secret scanner catches this before the AI ever sees the output.

Every function result is scanned for:

Known credential patterns — Stripe keys (sk_live_, sk_test_, pk_live_, rk_live_, whsec_), OpenAI keys (sk-), Slack tokens (xoxb-, xoxp-, xoxa-), AWS keys (AKIA), GitHub tokens (ghp_, gho_), GitLab tokens (glpat-), JWTs, database connection URIs, and private keys.

Env var value matching — the actual values passed via the X-MCPWorks-Env header are checked against the output. If a function returns the exact value of an API key (8+ characters), it gets redacted. This catches cases where no known prefix pattern exists.

Detected secrets are replaced with [REDACTED_STRIPE_KEY], [REDACTED_API_KEY], [REDACTED:secret_detected], etc. A security event is logged with the function name and detection type. The actual secret value never appears in the log.

The scanner runs on all execution paths: run endpoint responses, scheduled executions, webhook handlers, heartbeat ticks, and chat_with_agent tool calls. It operates on the serialized output string, catching secrets embedded at any depth in nested JSON structures.
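A minimal version of the two-pass scan, a few of the listed prefix patterns plus exact env-value matching, might look like this. The regexes and redaction labels are illustrative, not the scanner's actual rule set:

```python
import re

# A small subset of the known credential prefixes listed above.
PATTERNS = {
    "STRIPE_KEY": re.compile(r"\b(?:sk|pk|rk)_(?:live|test)_[A-Za-z0-9]{8,}"),
    "AWS_KEY": re.compile(r"\bAKIA[A-Z0-9]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bgh[po]_[A-Za-z0-9]{20,}"),
}


def scan_output(serialized: str, env_values: list[str]) -> str:
    """Redact known credential patterns, then redact exact env-var
    values of 8+ characters that have no recognizable prefix.
    Operating on the serialized string catches secrets at any depth
    in nested JSON."""
    for label, pattern in PATTERNS.items():
        serialized = pattern.sub(f"[REDACTED_{label}]", serialized)
    for value in env_values:
        if len(value) >= 8:
            serialized = serialized.replace(value, "[REDACTED:secret_detected]")
    return serialized
```

The second pass is the safety net for credentials with no distinctive prefix: since the scanner already holds the injected values, it can match them byte-for-byte.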

Layer 4: Trust boundaries and injection defense

Every function declares an output_trust level:

  • prompt — trusted computed output. Passed to the AI as-is.
  • data — untrusted external content. Wrapped with markers:
[UNTRUSTED_OUTPUT function="news.fetch-rss" trust="data"]
{"articles": [{"title": "...", "body": "ignore previous instructions..."}]}
[/UNTRUSTED_OUTPUT]

The AI sees the markers and knows not to execute instructions found inside them. An injection scanner normalizes text before pattern matching — decoding base64, collapsing Unicode homoglyphs, stripping zero-width characters — to defeat obfuscation.
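Both mechanisms can be sketched briefly. The helper names are hypothetical, and NFKC normalization is only an approximation of homoglyph collapsing: it folds compatibility characters such as fullwidth letters toward ASCII, while a production scanner would also apply a confusables table (e.g. Unicode TR39) for cross-script lookalikes.

```python
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def wrap_untrusted(function_name: str, payload: str) -> str:
    """Wrap data-trust output in markers so the model treats the
    contents as content, never as instructions."""
    return (f'[UNTRUSTED_OUTPUT function="{function_name}" trust="data"]\n'
            f"{payload}\n[/UNTRUSTED_OUTPUT]")


def normalize_for_scan(text: str) -> str:
    """Strip zero-width characters and apply NFKC so compatibility
    variants (fullwidth forms, ligatures) fold toward ASCII before
    the injection patterns run."""
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return unicodedata.normalize("NFKC", text)
```

Normalizing before matching is what defeats the cheap obfuscations: "ig&#8203;nore previous instructions" with a zero-width space collapses back to the string the patterns actually look for.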

No deserialization, no chain

MCPWorks does not use pickle, YAML load, or eval on any data path. Function I/O is JSON. The sandbox runs inside nsjail with Linux namespaces, cgroups v2, seccomp-bpf, and hollowed-out escape modules.
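The JSON-only I/O rule amounts to a strict parse step at the boundary. A sketch, using a hypothetical helper name:

```python
import json


def parse_function_output(raw: bytes) -> dict:
    """Function I/O is plain JSON: no pickle, no yaml.load, no eval.
    json.loads can only produce data, never objects with code attached,
    so there is no deserialization gadget chain to exploit."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("function output must be a JSON object")
    return obj
```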

And there is no multi-agent chain to poison. Each agent is a separate Docker container with its own namespace, its own AI engine, and its own process space. Agents communicate through an encrypted K/V store, not by passing raw output into each other's prompts. Even with agent clusters — where multiple replicas share a config — each replica maintains independent AI conversations.

The three conditions for this attack

The attack described in the CVSS advisory requires:

  1. Secrets accessible from the LLM context — eliminated. Keys travel header → sandbox → self-destruct. Never in context.
  2. Untrusted data in the same context as secrets — mitigated. Trust boundaries mark untrusted output. Output scanner redacts leaked values.
  3. A mechanism to exfiltrate — constrained. Agents can't author functions. The sandbox has network restrictions and syscall filtering. The output scanner catches secrets before they reach the AI.

No single layer is perfect. The injection scanner is pattern-based. The trust boundaries depend on correct output_trust declarations. But the layers compound: even if one is bypassed, the others hold.

The fundamental principle: architecture-level isolation beats prompt-level guardrails. If the secrets aren't there, they can't be stolen. If the agent can't write code, it can't create an exfiltration function. If the output is scanned, leaked values get caught.


MCPWorks is open source under BSL 1.1. The security architecture described here is in the codebase — nsjail configs, seccomp policies, output sanitizer, agent tool restrictions, trust boundary implementation, and injection scanner are all auditable.

GitHub: MCPWorks-Technologies-Inc/mcpworks-api

Self-host free forever, or try MCPWorks Cloud — 14-day Pro trial, no credit card.
