We Built a Pluggable Prompt Injection Defense — Not a Scanner, a Framework
We shipped three major features tonight: a pluggable security scanner pipeline, queryable execution debugging, and a batch of reliability fixes. Here's what matters and why.
The prompt injection problem nobody has solved
Every prompt injection defense has known bypasses. Regex catches 60-70% of naive attacks. ML classifiers catch 80-90%, but drop to roughly 70% on out-of-domain data. LLM-as-judge catches novel attacks, but costs 200-2000ms per call and is itself vulnerable to the same injection techniques it is judging.
The industry consensus — from Google DeepMind, Microsoft, Anthropic, and OWASP — is that no single technique is sufficient. Defense-in-depth is mandatory.
So we didn't build a scanner. We built a framework for composing your own defense.
Three scanner types, one pipeline
Every function execution in MCPWorks now runs through a configurable scanner pipeline. Three scanner types:
Built-in — ships with MCPWorks, zero dependencies. Pattern-based injection detection, secret redaction, trust boundary wrapping. Works out of the box on every deployment.
Webhook — POST to any external HTTP service. Run Lakera Guard, a custom classifier, an LLM-as-judge — anything that speaks HTTP. Register it in one MCP tool call:
add_security_scanner(
type="webhook",
name="lakera-guard",
direction="output",
config={"url": "https://guard.internal/scan", "timeout_ms": 2000}
)
Python callable — import any Python module. Self-hosters can run LLM Guard's DeBERTa classifier or custom models in-process, no network hop:
add_security_scanner(
type="python",
name="llm-guard",
direction="output",
config={"module": "llm_guard_adapter", "function": "scan"}
)
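To make the Python callable interface concrete, here is a sketch of what an adapter module like llm_guard_adapter might look like. The exact function signature and verdict fields are assumptions, not the documented MCPWorks contract, and a simple regex stands in for LLM Guard's DeBERTa classifier so the sketch runs without the model:

```python
import re

# Hypothetical adapter module ("llm_guard_adapter"). The interface is assumed
# to be a single function taking the content and returning a verdict dict.
# A real adapter would invoke LLM Guard's PromptInjection scanner; a regex
# stands in here so this sketch is self-contained.
SUSPECT = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)

def scan(content: str, direction: str = "output") -> dict:
    """Return a verdict: action, severity, confidence, and what matched."""
    match = SUSPECT.search(content)
    if match:
        return {"action": "block", "severity": "high",
                "confidence": 0.9, "matched": match.group(0)}
    return {"action": "allow", "severity": "none", "confidence": 0.0}
```

Because the adapter runs in-process, there is no network hop and no serialization cost on the hot path.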
The pipeline evaluates scanners in order. Block short-circuits — if any scanner says block, we stop. Highest severity wins across all verdicts. Fail-open by default (configurable to fail-closed per namespace).
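The merge rules above can be sketched in a few lines. Scanner callables and verdict fields are illustrative assumptions, but the logic mirrors the stated semantics: in-order evaluation, block short-circuits, highest severity wins, fail-open by default:

```python
SEVERITY_RANK = {"none": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}

def run_pipeline(scanners, content, fail_open=True):
    """Evaluate scanners in order; block short-circuits; highest severity wins."""
    verdict = {"action": "allow", "severity": "none"}
    for scanner in scanners:
        try:
            result = scanner(content)
        except Exception:
            if fail_open:
                continue  # fail-open: a broken scanner is skipped
            return {"action": "block", "severity": "critical"}  # fail-closed
        # Highest severity seen so far wins across all verdicts.
        if SEVERITY_RANK[result["severity"]] > SEVERITY_RANK[verdict["severity"]]:
            verdict["severity"] = result["severity"]
        if result["action"] == "block":
            verdict["action"] = "block"
            return verdict  # block short-circuits the remaining scanners
    return verdict
```

Note the asymmetry: an allow verdict never overrides an earlier scanner's severity, but a single block ends evaluation immediately.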
Every scan decision is observable
This pairs with our new execution debugging system. Every function execution now creates a persistent, queryable record — inputs, outputs, errors, stdout/stderr, and scan results. Query it via REST API or MCP tools:
list_executions(service="social", function="post-to-bluesky", status="failed")
describe_execution(execution_id="...")
When a scanner flags content, you can see exactly which scanner flagged it, what confidence score it assigned, and what pattern matched. No black boxes.
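As an illustration only, a flagged scan result attached to an execution record might carry fields like these. The names are hypothetical, not the actual describe_execution schema:

```python
# Hypothetical shape of one scan result on an execution record.
# Field names are illustrative; consult the real describe_execution output.
scan_result = {
    "scanner": "lakera-guard",   # which scanner in the pipeline flagged it
    "direction": "output",
    "action": "block",
    "severity": "high",
    "confidence": 0.93,          # score assigned by the scanner
    "matched_pattern": "ignore previous instructions",
}
```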
What else shipped
Procedure retry intelligence. When a procedure step fails, the LLM now sees the actual error message on retry — not a generic "previous attempt failed." This fixes the case where a Bluesky post failed with "313 graphemes, max 300" and the LLM retried three times with identical input because it didn't know what went wrong.
Scheduled function cross-calls. Direct-mode scheduled functions that use from functions import ... now get the functions package generated. Previously this only worked in code-mode, causing 73+ consecutive failures for a lead generation dashboard.
Sandbox seccomp fix. The lstat syscall was missing from the nsjail seccomp allowlist, causing SIGSYS (exit code 159) on any sandbox code using os.lstat() or pathlib.
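For context on the fix, nsjail seccomp policies are written in the kafel language; the repair amounts to adding lstat to the ALLOW set. The sketch below is illustrative, not MCPWorks' actual allowlist (which is far longer), and on newer kernels Python may reach the same paths via newfstatat:

```shell
# Illustrative nsjail invocation with an inline kafel seccomp policy.
# Syscall list is a sketch; the point is that lstat must be present,
# or any os.lstat()/pathlib call dies with SIGSYS (exit code 159).
nsjail --mode o --chroot /sandbox \
  --seccomp_string 'POLICY sandbox {
      ALLOW { read, write, open, close, stat, fstat, lstat,
              mmap, munmap, brk, exit_group }
  } USE sandbox DEFAULT KILL' \
  -- /usr/bin/python3 script.py
```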
The philosophy
MCPWorks is a platform, not a product. We don't tell you which prompt injection scanner to use — we give you the pipeline to compose your own defense stack. Ship with sensible defaults. Let you plug in what works for your threat model.
The webhook scanner protocol is intentionally simple. Build a compatible endpoint in 10 lines of Python. The Python scanner interface is a single function signature. The barrier to writing your own scanner is as low as we could make it.
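Here is roughly what such an endpoint could look like using only the standard library. The request and response field names ({"content": ...} in, {"action", "severity", "confidence"} out) are assumptions for illustration; check the actual MCPWorks webhook protocol for the real schema:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class ScanHandler(BaseHTTPRequestHandler):
    """Minimal webhook scanner: POST JSON in, JSON verdict out."""

    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Toy detection logic; swap in a real classifier here.
        flagged = "ignore previous instructions" in body.get("content", "").lower()
        verdict = {"action": "block" if flagged else "allow",
                   "severity": "high" if flagged else "none",
                   "confidence": 0.9 if flagged else 0.0}
        payload = json.dumps(verdict).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep request logging quiet

def serve(port: int = 8080):
    HTTPServer(("127.0.0.1", port), ScanHandler).serve_forever()
```

Point a webhook scanner's config url at this service and it slots into the pipeline like any other scanner.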
Every decision is logged. Every scan result is queryable. Bring your own defense, observe everything.
MCPWorks is source-available under the BSL 1.1. Self-host free forever with docker compose up, or try MCPWorks Cloud: 14-day Pro trial, no credit card required.