Agent Clusters: Scale Any Agent to N Replicas with One Tool Call
MCPWorks Agents already run autonomously with persistent state, BYOAI, and multi-trigger orchestration. But until now, every agent was a single instance. If you needed five agents scraping five data sources on the same schedule with the same config, you created five agents manually.
Today we shipped Agent Clusters. One tool call scales any agent to N replicas.
scale_agent(name="social-scraper", replicas=5)
That is the entire API surface. One tool, two parameters.
What Happens When You Scale
Each replica inherits the parent agent's full configuration — functions, AI engine, schedules, webhooks, channels, MCP server connections, orchestration limits. Replicas share the config but maintain independent state and execution. When the cluster scales up, each new replica gets a unique adjective-animal name generated by the platform:
{
  "agent": "social-scraper",
  "replicas": 5,
  "members": [
    "daring-duck",
    "swift-falcon",
    "calm-otter",
    "bright-heron",
    "keen-wolf"
  ]
}
The names are deterministic within a cluster but unique across your namespace. They show up in logs, execution history, and chat responses so you always know which replica did what.
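One way a deterministic naming scheme like this could work is to hash the cluster name and a counter into fixed word lists. This is a hypothetical sketch — the word lists and hashing strategy are assumptions, not the platform's actual generator:

```python
import hashlib

# Hypothetical word lists -- the platform's real vocabulary is unknown.
ADJECTIVES = ["daring", "swift", "calm", "bright", "keen", "bold"]
ANIMALS = ["duck", "falcon", "otter", "heron", "wolf", "lynx"]

def cluster_names(cluster: str, count: int) -> list[str]:
    """Derive stable adjective-animal names from the cluster name, so the
    same cluster always yields the same member names, with no duplicates."""
    names, salt = [], 0
    while len(names) < count:
        digest = hashlib.sha256(f"{cluster}:{salt}".encode()).digest()
        name = f"{ADJECTIVES[digest[0] % len(ADJECTIVES)]}-{ANIMALS[digest[1] % len(ANIMALS)]}"
        if name not in names:  # skip within-cluster collisions
            names.append(name)
        salt += 1
    return names

names = cluster_names("social-scraper", 5)
```

Seeding from the cluster name gives determinism; the skip-on-collision loop gives uniqueness within the cluster.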
Each replica counts as one agent slot against your subscription tier. A Pro account with 5 agent slots can run one cluster of 5, five single agents, or any combination that adds up. Each replica gets the full RAM and CPU allocation for its tier — there is no resource splitting or overcommit.
Schedule Coordination
This is where clusters get interesting. Two modes control how scheduled triggers execute across replicas.
Single Mode (Default)
Exactly-once execution. When a schedule fires, one replica claims the job and the rest skip it. This uses PostgreSQL's SELECT ... FOR UPDATE SKIP LOCKED — the same row-locking primitive that powers reliable job queues in production systems everywhere.
add_schedule(
    agent_name="social-scraper",
    function_name="reporting.daily-summary",
    cron_expression="0 12 * * *",
    schedule_mode="single"
)
One replica posts the daily summary at noon. Not five identical summaries. Not a race condition. One.
Single mode is the default because it is what you want most of the time. Aggregation jobs, report generation, notifications, cleanup tasks — anything where duplicate execution is a bug.
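The claim step can be pictured with a tiny in-memory analogue. The SQL in the comment reflects the general SKIP LOCKED pattern; the table and column names are assumptions, not the platform's actual schema:

```python
import threading

# In PostgreSQL, each replica would run something like:
#   SELECT id FROM scheduled_jobs
#   WHERE fire_time = $1 AND claimed_by IS NULL
#   FOR UPDATE SKIP LOCKED LIMIT 1;
# The first replica to lock the row wins; the rest see no rows and skip.

claims: dict[str, str] = {}  # job id -> winning replica
lock = threading.Lock()

def try_claim(job_id: str, replica: str) -> bool:
    """Return True only for the single replica that claims the job."""
    with lock:
        if job_id in claims:
            return False  # another replica already claimed it: skip
        claims[job_id] = replica
        return True

replicas = ["daring-duck", "swift-falcon", "calm-otter", "bright-heron", "keen-wolf"]
winners = [r for r in replicas if try_claim("daily-summary@noon", r)]
```

However many replicas race, exactly one ends up in `winners` — the same guarantee the row lock provides at the database level.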
Cluster Mode
All replicas execute independently when the schedule fires. Every replica runs the function on every tick.
add_schedule(
    agent_name="social-scraper",
    function_name="scraping.fetch-mentions",
    cron_expression="*/15 * * * *",
    schedule_mode="cluster"
)
Five replicas each scrape every 15 minutes. If each replica is configured to watch a different data source via its own state, you now have parallel coverage that scales linearly with replicas. Add a sixth source, scale to 6, set the new replica's state.
The same agent can mix both modes across different schedules. The social-scraper cluster above runs fetch-mentions in cluster mode (all five scrape) and daily-summary in single mode (one posts the rollup). This is the pattern for any collect-then-aggregate workload.
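The collect-then-aggregate shape can be sketched end to end. This is a simulation only — the function bodies and per-replica source assignments are stand-ins for real scraping, but the split between cluster-mode collection and single-mode rollup mirrors the pattern above:

```python
# Each replica's own state pins it to one source (cluster-mode schedule).
replica_state = {
    "daring-duck": "hackernews",
    "swift-falcon": "reddit",
    "calm-otter": "bluesky",
}

def fetch_mentions(replica: str) -> list[str]:
    """Cluster mode: every replica runs this on every tick, each
    scraping only the source held in its own state."""
    source = replica_state[replica]
    return [f"{source}-mention-{i}" for i in range(2)]  # stand-in for a scrape

def daily_summary(all_mentions: list[str]) -> str:
    """Single mode: exactly one replica runs the rollup."""
    return f"{len(all_mentions)} mentions across {len(replica_state)} sources"

collected = [m for r in replica_state for m in fetch_mentions(r)]
summary = daily_summary(collected)  # -> "6 mentions across 3 sources"
```

Adding coverage is then just another entry in the state map plus one more replica.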
Chat Session Affinity
When you chat with a clustered agent, the platform routes your first message to any available replica. The response tells you which one handled it:
{
  "reply": "I found 3 new mentions of MCPWorks on Hacker News...",
  "replica": "daring-duck"
}
For follow-up messages in the same conversation, pass the replica name to maintain context:
chat_with_agent(
    name="social-scraper",
    message="Show me the full text of the top-voted one",
    replica="daring-duck"
)
The replica parameter is optional. If you omit it, the platform routes to any replica. This is fine for stateless questions. For multi-turn conversations where context matters, pin to a replica.
This is conversational continuity without a session management layer. The replica's persistent state and conversation history handle the rest.
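In client code, pinning amounts to capturing the replica from the first reply and passing it back on follow-ups. A sketch — the `chat_with_agent` stub below stands in for the real tool call and its routing:

```python
def chat_with_agent(name, message, replica=None) -> dict:
    """Stand-in for the real tool call: routes to any available replica
    when `replica` is omitted, otherwise pins to the named one."""
    chosen = replica or "daring-duck"  # the platform would pick any replica
    return {"reply": f"handled: {message}", "replica": chosen}

# First message: let the platform route, then remember who answered.
first = chat_with_agent("social-scraper", "Any new mentions?")
pin = first["replica"]

# Follow-ups: pin to the same replica so its conversation history applies.
follow = chat_with_agent("social-scraper", "Show me the top one", replica=pin)
```

The pin is just the replica name from the previous response — no session token or extra state on the client side.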
No New Infrastructure
Agent Clusters run on the same PostgreSQL and Redis that already power the platform. Schedule coordination uses SELECT ... FOR UPDATE SKIP LOCKED in PostgreSQL — a standard row-locking query, not a distributed consensus protocol. Replica metadata lives in Redis alongside existing session and rate-limiting data.
There is no RabbitMQ. No Kafka. No ZooKeeper. No external queue service. The infrastructure you already operate (or that MCPWorks Cloud already runs for you) handles clusters without modification.
For self-hosted deployments, this means git pull && docker compose up and clusters work. No new containers, no new ports, no new configuration.
Scale-Down Behavior
When you reduce replicas, the platform removes the newest ones first — LIFO ordering. If you scale from 5 to 3, keen-wolf and bright-heron are removed (the last two created). daring-duck, swift-falcon, and calm-otter continue running.
scale_agent(name="social-scraper", replicas=3)
LIFO preserves the replicas most likely to have active chat sessions and accumulated state. The oldest replicas have been running longest, have the richest state history, and are most likely mid-conversation with a user.
Removed replicas have their state archived, not deleted. Scale back up and the platform creates fresh replicas with new names — it does not resurrect old ones. Clean starts prevent stale state from causing subtle bugs.
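The scale-down rule can be sketched as trimming the member list from the tail. An illustration of the LIFO ordering only, not platform code — archiving is represented here by a second list:

```python
# Members in creation order: index 0 is the oldest replica.
members = ["daring-duck", "swift-falcon", "calm-otter", "bright-heron", "keen-wolf"]
archived: list[str] = []

def scale_to(replicas: int) -> None:
    """Remove the newest members first (LIFO); archive rather than delete."""
    while len(members) > replicas:
        archived.append(members.pop())  # pop() removes from the end = newest

scale_to(3)
# members  -> ["daring-duck", "swift-falcon", "calm-otter"]
# archived -> ["keen-wolf", "bright-heron"]
```

The oldest replicas, with the richest state, survive every scale-down.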
Backward Compatibility
Every existing agent is already a cluster of 1. The make_agent tool creates a single-replica cluster. All existing schedules default to single mode. All existing chat_with_agent calls continue to work without the replica parameter.
Nothing changes for agents you never scale. The cluster machinery is invisible until you call scale_agent with a number greater than 1.
When to Use Clusters
Parallel data collection. Five replicas watching five subreddits, five RSS feeds, five competitor websites. Each replica's state holds its assigned source. Cluster-mode schedules run them all in parallel.
High-availability agents. Scale a critical agent to 3 replicas. Single-mode schedules guarantee exactly-once execution with automatic failover — if one replica is mid-execution when a schedule fires, another claims it.
Load distribution for webhooks. A cluster receiving high-volume webhooks distributes incoming requests across replicas. Each replica handles a subset of the traffic with full tier resources.
A/B testing agent configurations. Scale to 2, give each replica different state (different system prompts, different thresholds, different data sources), and compare outputs.
Try It
If you are self-hosting:
git pull origin main
docker compose -f docker-compose.self-hosted.yml build api
docker compose -f docker-compose.self-hosted.yml up -d api
If you are on MCPWorks Cloud, Agent Clusters are live now.
scale_agent(name="my-agent", replicas=3)
That is it. Three replicas, running, coordinated, no YAML required.
GitHub: MCPWorks-Technologies-Inc/mcpworks-api | MCPWorks Cloud | Bluesky: @mcpworks.io
MCPWorks is open source.
Self-host free forever, or try MCPWorks Cloud — 14-day Pro trial, no credit card.