Procedures: Auditable Execution Pipelines That Eliminate Agent Hallucination
We run a social media agent on MCPWorks that posts platform updates to Bluesky. During testing, we told the agent: "Post this announcement to Bluesky." The agent responded with a Bluesky post URI, confirmed the task was complete, and moved on to the next item.
The post did not exist. The agent hallucinated the entire execution — it generated a plausible-looking at:// URI, reported success, and never called the posting function. From the AI's perspective, it had done the work. From every other perspective, nothing happened.
This is not a rare edge case. When an AI agent has access to tools but is not structurally required to use them, it will sometimes generate a text-only response that mimics a successful tool call. The completion looks correct. The function never ran.
The Problem: Text Is Not Execution
Standard agent orchestration works like this: give the AI a goal, give it tools, let it decide what to call and in what order. This is flexible. It is also unverifiable at the orchestration level. If the AI says "I called post-to-bluesky and got back URI at://did:plc:abc/app.bsky.feed.post/xyz" — how do you know it actually did?
You can check logs after the fact. You can build monitoring. But the orchestrator itself has no enforcement mechanism. It accepts whatever the AI returns, text or tool call alike.
For tasks where correctness matters — financial transactions, social media posts, API integrations, anything with external side effects — this gap is a liability.
Procedures
A procedure defines an ordered sequence of steps. Each step names a specific function that must be called. The orchestrator enforces the sequence: it only advances to the next step when the actual function backend returns a result. Text-only responses are rejected. Calling the wrong function is rejected.
Here is the Bluesky posting procedure:
make_procedure(
    service="social",
    name="post-to-bluesky",
    steps=[
        {
            "name": "authenticate",
            "function_ref": "social.bluesky-auth",
            "instructions": "Authenticate with Bluesky",
            "failure_policy": "required"
        },
        {
            "name": "create-post",
            "function_ref": "social.post-to-bluesky",
            "instructions": "Post the message text",
            "failure_policy": "required"
        },
        {
            "name": "verify",
            "function_ref": "social.get-post",
            "instructions": "Verify the post exists using the URI from step 2",
            "failure_policy": "allowed"
        }
    ]
)
Three steps, each bound to a real function. The orchestrator walks through them in order. Step 1 authenticates. Step 2 posts. Step 3 verifies the post exists. The AI cannot skip step 2 and claim it worked. It cannot hallucinate a URI because the orchestrator only advances when social.post-to-bluesky actually returns one.
Step Anatomy
Each step in a procedure has:
- name — a human-readable identifier for logging and audit trails
- function_ref — the function that must be called, in service.function format
- instructions — natural language guidance for the AI on what to do with this step's function
- failure_policy — what happens when the step fails
- max_retries — how many times to retry on failure (default: 0)
- validation — optional rules for checking the function's output before advancing
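Putting the full field set together, a single step might look like the sketch below. The validation rule shape shown is an assumption for illustration, not the documented schema:

```python
# A step using every available field. The "validation" rule shape here is
# an illustrative assumption; consult the platform docs for the exact schema.
step = {
    "name": "create-post",
    "function_ref": "social.post-to-bluesky",  # enforced: only this call advances
    "instructions": "Post the message text",
    "failure_policy": "required",              # halt the procedure on failure
    "max_retries": 2,                          # retry twice before applying the policy
    "validation": {"field": "uri", "matches": "^at://"},  # assumed rule shape
}
```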
The function_ref is the enforcement mechanism. The orchestrator matches the AI's tool call against the expected function. If the AI calls social.delete-post when the step expects social.post-to-bluesky, the call is rejected and the AI is told to call the correct function.
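The enforcement check can be sketched in a few lines. This is a simplified illustration of the idea, not MCPWorks internals:

```python
# Minimal sketch of function_ref enforcement (simplified, not MCPWorks code).
# The orchestrator only advances when the AI's tool call names the expected
# function; text-only replies and wrong functions are both rejected.
def check_step(expected_ref: str, ai_response: dict) -> str:
    if "tool_call" not in ai_response:
        return "rejected: text-only response, no function was called"
    if ai_response["tool_call"] != expected_ref:
        return f"rejected: expected {expected_ref}, got {ai_response['tool_call']}"
    return "advance"

check_step("social.post-to-bluesky", {"text": "Done! Posted it."})        # rejected
check_step("social.post-to-bluesky", {"tool_call": "social.delete-post"})  # rejected
check_step("social.post-to-bluesky", {"tool_call": "social.post-to-bluesky"})  # "advance"
```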
Failure Policies
Three policies control what happens when a step fails or exhausts its retries:
required — the procedure halts. No subsequent steps execute. This is the right default for steps where failure means the rest of the pipeline is meaningless. If authentication fails, there is no point attempting to post.
allowed — the procedure continues with a data gap. The step's result is recorded as failed, subsequent steps receive that context, and the pipeline completes. The verification step in the Bluesky example uses this — if we cannot verify the post, the post itself still happened. The audit trail shows the gap.
skip — the step is not retried. Failure is recorded and the procedure moves on immediately. Useful for optional enrichment steps where even a single retry is not worth the latency.
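In control-flow terms, the three policies could dispatch like this (a simplified sketch of the semantics described above):

```python
# Sketch of how the three failure policies drive control flow (simplified).
def on_step_failure(policy: str) -> str:
    if policy == "required":
        return "halt"               # stop the procedure; later steps never run
    if policy == "allowed":
        return "continue"           # record the data gap, pass context onward
    if policy == "skip":
        return "continue-no-retry"  # no retries at all, move on immediately
    raise ValueError(f"unknown policy: {policy}")
```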
Data Forwarding
Each step receives the accumulated context from all prior steps. When the verification step runs, it has access to the authentication result from step 1 and the post URI from step 2. The AI uses this context — plus the step's instructions — to construct the correct function call.
This is sequential by design. Step 3 cannot run before step 2 because it needs step 2's output. The orchestrator enforces this ordering, and the accumulated context makes each step aware of everything that happened before it.
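The accumulation can be sketched as a fold over the steps, with a fake backend standing in for the real functions (illustrative only; the actual result shapes are assumptions):

```python
# Sketch of sequential context accumulation (simplified, not MCPWorks internals).
def run_steps(steps, call_fn):
    context = {}
    for step in steps:
        # Each step sees everything earlier steps produced.
        context[step["name"]] = call_fn(step["function_ref"], context)
    return context

def fake_backend(ref, context):
    # Stand-in results; real output shapes will differ.
    if ref == "social.bluesky-auth":
        return {"session": "token-123"}
    if ref == "social.post-to-bluesky":
        return {"uri": "at://did:plc:abc/app.bsky.feed.post/xyz"}
    if ref == "social.get-post":
        # Verification reads the URI produced by the previous step.
        return {"exists": True, "uri": context["create-post"]["uri"]}

steps = [
    {"name": "authenticate", "function_ref": "social.bluesky-auth"},
    {"name": "create-post", "function_ref": "social.post-to-bluesky"},
    {"name": "verify", "function_ref": "social.get-post"},
]
ctx = run_steps(steps, fake_backend)
```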
Audit Trail
Every procedure execution produces a complete record:
- Procedure name and version
- Each step's result: success, failure, or skipped
- The actual function output from each step
- Timestamps for step start and completion
- Retry counts per step
- The accumulated context at each stage
This is not log aggregation. It is structured execution history, queryable and exportable. When someone asks "did the Bluesky post go out on Tuesday?" the answer is in the procedure execution record — along with the exact function outputs, the authentication response, the post URI, and whether verification passed.
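As an illustration, a record with the fields listed above might look like this, and the "did it go out?" question becomes a lookup rather than a log grep. The exact field names are assumptions; get_procedure_execution returns the real schema:

```python
# Illustrative shape of an execution record (field names are assumptions;
# see get_procedure_execution for the real schema).
execution = {
    "procedure": "social.post-to-bluesky",
    "version": 3,
    "steps": [
        {"name": "authenticate", "status": "success", "retries": 0},
        {"name": "create-post", "status": "success", "retries": 0,
         "output": {"uri": "at://did:plc:abc/app.bsky.feed.post/xyz"}},
        {"name": "verify", "status": "failed", "retries": 0},
    ],
}

# "Did the Bluesky post go out?" -- answered from structured history.
posted = any(s["name"] == "create-post" and s["status"] == "success"
             for s in execution["steps"])
```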
Immutable Versioning
When you update a procedure, the platform creates a new version. The old version is preserved. Existing scheduled executions continue running the version they were created with. New executions use the latest version.
This matters for audit. If you change the posting procedure to add a fourth step (say, cross-posting to Mastodon), the audit trail for last week's executions still references the three-step version that was active at the time. Versions are never mutated or deleted.
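The pinning behavior can be modeled with an append-only store (a toy sketch of the semantics, not the platform's storage layer):

```python
# Toy model of immutable versioning (not the platform's storage layer).
class ProcedureStore:
    def __init__(self):
        self.versions = []            # append-only; never mutated or deleted

    def publish(self, steps):
        self.versions.append(list(steps))
        return len(self.versions)     # 1-based version number

    def get(self, version):
        return self.versions[version - 1]

store = ProcedureStore()
v1 = store.publish(["authenticate", "create-post", "verify"])
pinned = v1  # existing schedules keep the version they were created with
v2 = store.publish(["authenticate", "create-post", "verify", "cross-post-mastodon"])

# Last week's runs still resolve the three-step definition;
# new executions pick up the four-step latest version.
old_steps = store.get(pinned)
new_steps = store.get(v2)
```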
MCP Tools
Eight tools manage the full procedure lifecycle:
- make_procedure — create a new procedure with ordered steps
- update_procedure — modify steps, creating a new version
- delete_procedure — remove a procedure
- describe_procedure — view procedure definition and version history
- list_procedures — list procedures in a service
- run_procedure — execute a procedure with input parameters
- get_procedure_execution — retrieve execution audit trail
- list_procedure_executions — list execution history with filtering
Trigger Integration
Procedures work with existing trigger infrastructure. Schedules and webhooks can target a procedure instead of a single function:
add_schedule(
    agent_name="social-agent",
    procedure_name="social.post-to-bluesky",
    cron_expression="0 14 * * 1-5",
    parameters={"message": "Daily platform update"}
)
The schedule fires at 2pm on weekdays, and the orchestrator walks through all three steps. If step 1 fails, it halts. The execution record shows exactly where and why.
Security: Why Agents Cannot Author Procedures
Procedure management tools — make_procedure, update_procedure, delete_procedure — are not exposed to the agent AI, consistent with the security hardening already in place for function authoring. An agent can run a procedure via run_procedure, but it cannot create or modify one.
This is the same principle as function locking: the entity that defines what code runs should not be the same entity that processes untrusted external data. A compromised or hallucinating AI should never be able to rewrite its own execution pipeline.
Procedures are authored by namespace owners through the management endpoint. The agent executes them as defined.
When to Use Procedures
Multi-step integrations with external side effects. Posting to social media, sending emails, making payments, updating CRM records. Any sequence where a hallucinated "success" has real consequences.
Compliance-sensitive workflows. Financial reporting, regulatory submissions, audit-required processes. The immutable execution record provides the trail.
Complex orchestration with dependencies. When step N depends on step N-1's output and the ordering cannot be left to AI discretion.
If your agent runs a single function with no dependencies and hallucination risk is low, you do not need a procedure. A procedure adds structure where structure prevents failure.
Try It
If you are self-hosting:
git pull origin main
docker compose -f docker-compose.self-hosted.yml build api
docker compose -f docker-compose.self-hosted.yml up -d api
If you are on MCPWorks Cloud, Procedures are live now.
GitHub: MCPWorks-Technologies-Inc/mcpworks-api | MCPWorks Cloud | Bluesky: @mcpworks.io
MCPWorks is open source.
Self-host free forever, or try MCPWorks Cloud — 14-day Pro trial, no credit card.