MCPWorks

How Code-Mode Works: Zero-Context Tool Discovery for AI Agents

Simon Carr

MCPWorks code-mode is the package generation system that lets sandboxed AI agents discover and call functions without loading tool definitions into the LLM context window. Instead of stuffing hundreds of tool schemas into every prompt, code-mode generates a functions/ package that the agent imports and calls like native code. The result: agents get full access to your namespace's functions and remote MCP server tools with minimal token overhead for tool discovery.

This post walks through the architecture, from package generation to the internal MCP bridge that connects sandboxed code to external services.

What problem does code-mode solve?

Standard MCP tool loading requires the LLM to receive every tool's schema in its context window before it can decide which tools to use. For an agent with access to 50 tools across multiple services, that can mean thousands of tokens consumed before the agent writes a single line of code. Anthropic documented this problem in their research on code execution with MCP — when tool counts grow, context pollution degrades reasoning quality and increases cost.

Code-mode takes a different approach. Instead of describing tools to the LLM, MCPWorks generates a Python (or TypeScript) package with typed function wrappers. The agent receives a brief catalog in the module docstring — function names, signatures, and one-line descriptions — then imports and calls them like regular code. The wrappers handle execution, type mapping, cross-language bridging, and remote MCP tool calls transparently.

The token savings are significant. A namespace with 30 functions and 4 connected MCP servers might require 8,000-12,000 tokens in traditional tool-schema format. In code-mode, the same information fits in a module docstring under 800 tokens — a 90%+ reduction.

How does MCPWorks generate the functions package?

When an agent execution starts, MCPWorks queries the namespace's functions and connected MCP servers from the database, then calls generate_functions_package() to produce a dictionary of file paths mapped to Python source code. This package is written into the sandbox before the agent's code runs.

The generated package has this structure:

functions/
  __init__.py          # Catalog docstring + re-exports
  _registry.py         # Call tracking for billing
  service_name.py      # Wrappers for each service's functions
  _code/               # Actual function source code
    service__func.py
  _mcp_bridge.py       # HTTP bridge to external MCP servers
  _mcp/                # Wrappers for remote MCP tools
    server_name.py
  _ts_bridge.py        # Cross-language bridge (if TypeScript functions exist)
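In outline, the generator can be pictured as a function that maps relative file paths to source strings, which the runtime then writes into the sandbox. The sketch below is hypothetical (the field names on the function records and the catalog format are assumptions, not the actual MCPWorks schema):

```python
def generate_functions_package(functions):
    """Sketch: map relative file paths to Python source for the sandbox.

    `functions` is assumed to be a list of dicts with service, name,
    signature, description, and source keys.
    """
    # One catalog line per function: name, signature, one-line description
    catalog = "\n".join(
        f"{f['name']}{f['signature']} -- {f['description']}" for f in functions
    )
    files = {"functions/__init__.py": f'"""Available functions:\n{catalog}\n"""\n'}
    # Each function's real source lives under _code/, keyed as service__name.py
    for f in functions:
        files[f"functions/_code/{f['service']}__{f['name']}.py"] = f["source"]
    return files
```

The runtime would iterate this dictionary and write each entry under the sandbox root before handing control to the agent.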

The __init__.py file is the key to token efficiency. It contains a docstring that catalogs every available function with its signature and description, plus re-exports so the agent can write from functions import my_function directly. The LLM sees this catalog as part of the sandbox environment and can call any function without the platform needing to describe each tool in the system prompt.

How do function wrappers execute code?

Each function wrapper follows a consistent pattern. When the agent calls a generated function, the wrapper:

  1. Tracks the call — logs the function name to /sandbox/.call_log for billing and analytics
  2. Builds the input — maps function arguments to an input_data dictionary
  3. Loads context — reads /sandbox/context.json if it exists (namespace metadata, execution context)
  4. Executes the code — runs the function's source code via exec() in an isolated namespace
  5. Returns the result — checks for result, output, handler(), or main() in priority order

Here is a simplified view of a generated Python wrapper:

from pathlib import Path

def process_data(text, format="json"):
    from functions._registry import _track_call
    _track_call("my-service.process-data")
    input_data = {"text": text, "format": format}
    # Load and execute the actual function code in an isolated namespace
    code_path = Path(__file__).parent / "_code" / "my_service__process_data.py"
    isolated_globals = {"input_data": input_data}
    exec(code_path.read_text(), isolated_globals)
    return isolated_globals.get("result")

This design means function authors write straightforward Python or TypeScript without worrying about the runtime plumbing. They define a handler(input_data, context) or set a result variable, and the wrapper handles the rest.
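The author's side of that contract can be as small as a few lines. This is a hypothetical example function (the handler signature follows the convention described above):

```python
# The wrapper supplies input_data and context; the author only writes logic.
def handler(input_data, context):
    text = input_data["text"]
    fmt = input_data.get("format", "json")
    return {"formatted": text.strip(), "format": fmt}

# Alternatively, skip the handler and set a `result` variable directly:
# result = {"formatted": input_data["text"].strip()}
```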

For TypeScript functions running in a Python sandbox (or vice versa), a cross-language bridge handles the translation. The Python _ts_bridge.py makes an HTTP call to the TypeScript runtime, passes the MCP-formatted request, parses the SSE response, and returns the result as a native Python object. The TypeScript _py_bridge.js does the same in reverse, using a lightweight HTTP client with no native dependencies.
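The SSE-parsing half of that bridge can be sketched as follows (a simplified, hypothetical version; the real bridge also handles error frames and streamed chunks):

```python
import json

def parse_sse_result(sse_body: str):
    """Extract the JSON-RPC result from an MCP SSE response body."""
    for line in sse_body.splitlines():
        if line.startswith("data:"):
            # Each data: line carries one JSON-RPC message
            message = json.loads(line[len("data:"):].strip())
            if "result" in message:
                return message["result"]
    return None
```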

How do agents call remote MCP server tools?

This is where code-mode gets interesting. When a namespace has connected MCP servers — say, a Google Workspace server and a Slack server — code-mode generates wrappers for every tool those servers expose. From the agent's perspective, calling a remote MCP tool looks identical to calling a local function:

from functions import mcp__google_workspace__search_drive_files

results = mcp__google_workspace__search_drive_files(
    query="Q1 report",
    user_google_email="[email protected]"
)

Behind the scenes, that wrapper calls _call_mcp_tool() from the generated _mcp_bridge.py module. The bridge makes an HTTP POST to MCPWorks' internal endpoint with the server name, tool name, and arguments. The request carries a bridge key — a short-lived token that ties the sandbox execution to its namespace and credentials.

The internal endpoint then:

  1. Resolves the execution context — maps the bridge key to the namespace, verifying the sandbox is authorized
  2. Looks up the server — finds the MCP server configuration in the namespace's database record
  3. Handles authentication — decrypts stored credentials (API keys, OAuth tokens) server-side, so the sandbox never sees raw secrets
  4. Evaluates request rules — checks namespace-defined policies before the call proceeds (e.g., block certain tool names, enforce argument constraints)
  5. Calls the MCP tool — opens or reuses an MCP session (SSE or streamable HTTP transport) and invokes session.call_tool()
  6. Processes the response — truncates oversized responses to prevent sandbox memory exhaustion, evaluates response rules, and parses the result
  7. Records analytics — logs latency, response size, and status asynchronously for billing and monitoring
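Step 4 above, request-rule evaluation, might look like this in isolation (the rule shape shown here is hypothetical; actual rule schemas are defined per namespace):

```python
class RuleBlockedError(Exception):
    pass

def evaluate_request_rules(rules, tool, arguments):
    """Block the call if any namespace rule matches the tool or its arguments."""
    for rule in rules:
        # Policy: block specific tool names outright
        if rule.get("block_tool") == tool:
            raise RuleBlockedError(f"tool '{tool}' blocked by rule '{rule['name']}'")
        # Policy: forbid specific argument keys
        forbidden = rule.get("forbid_argument")
        if forbidden and forbidden in arguments:
            raise RuleBlockedError(
                f"argument '{forbidden}' forbidden by rule '{rule['name']}'"
            )
```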

How does MCPWorks keep credentials safe?

Credential isolation is a first-class concern in this architecture. The sandbox environment never receives API keys, OAuth tokens, or any authentication material. Instead, the only secret the sandbox holds is a short-lived bridge key (injected as __MCPWORKS_BRIDGE_KEY__) that maps to an execution context on the server side.

When the bridge receives a tool call request, it resolves the bridge key to a namespace and looks up that namespace's MCP server configuration. Credentials are stored encrypted at rest using envelope encryption — the server's headers are encrypted with a per-record data encryption key (DEK), and the DEK itself is encrypted with a master key. Decryption happens in-memory, on-demand, for the duration of the MCP call only.
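The envelope structure can be illustrated like this. Note this is a toy sketch using a stand-in XOR cipher purely to show the two-layer key wrapping; production code would use an authenticated cipher such as AES-GCM:

```python
import os

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # Stand-in for a real authenticated cipher (illustration only)
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt_headers(master_key: bytes, headers: bytes) -> dict:
    dek = os.urandom(32)                                # per-record data encryption key
    return {
        "encrypted_dek": xor_cipher(master_key, dek),   # DEK wrapped by master key
        "encrypted_headers": xor_cipher(dek, headers),  # headers wrapped by DEK
    }

def decrypt_headers(master_key: bytes, record: dict) -> bytes:
    # Unwrap the DEK in memory, then decrypt headers for this one call
    dek = xor_cipher(master_key, record["encrypted_dek"])
    return xor_cipher(dek, record["encrypted_headers"])
```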

For OAuth2-authenticated servers, the system manages the full token lifecycle. If a token is expired or not yet configured, the bridge initiates the appropriate flow (device code or authorization code) and returns an AuthRequired response to the caller. A background task polls for device code completion so the agent can retry after authorization completes.

What happens when an MCP tool call fails?

The bridge implements configurable retry logic with exponential backoff. Each MCP server in a namespace has settings for timeout (default 30 seconds), retry count (default 2 retries), and maximum response size (default 1 MB).

When a call fails:

  • Timeout: Recorded immediately, no retry. The agent receives a TimeoutError with the configured threshold.
  • Connection or protocol error: The bridge waits with exponential backoff (0.5s, 1s, 2s...), evicts the broken MCP session from the connection pool, establishes a new connection, and retries.
  • Rule violation: Request rules can block calls before they reach the MCP server. The agent receives a RuleBlockedError explaining which rule triggered.

All outcomes — success, timeout, error, blocked — are recorded asynchronously to the analytics pipeline with full metadata: namespace, server, tool, latency, response size, and whether the response was truncated.
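The backoff behaviour described above can be sketched as follows (hypothetical; the real bridge also evicts and rebuilds pooled MCP sessions between attempts):

```python
import time

def call_with_retries(call, retries=2, base_delay=0.5, sleep=time.sleep):
    """Retry a failing call with exponential backoff: 0.5s, 1s, 2s..."""
    for attempt in range(retries + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == retries:
                raise  # retries exhausted; surface the error to the agent
            sleep(base_delay * (2 ** attempt))  # back off, then reconnect
```

Injecting `sleep` keeps the helper testable; the defaults match the document's stated retry count and backoff schedule.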

Why does this architecture matter for token efficiency?

The combination of code-mode and the MCP bridge creates a system where AI agents can access dozens of tools and external services while keeping context window usage minimal. The agent's LLM sees a concise function catalog (names, signatures, descriptions) instead of full JSON Schema definitions for every tool.

This matters at scale. An enterprise namespace with 5 services, 40 functions, and 6 connected MCP servers exposing 120 tools would traditionally require the LLM to process tens of thousands of tokens of tool definitions on every turn. With code-mode, the agent reads a module docstring with the same information in a fraction of the space, then calls functions with standard import syntax.

The runtime handles everything else — execution, credential management, cross-language bridging, retries, response limiting, analytics — without the LLM needing to know any of it exists.


Frequently Asked Questions

Does code-mode work with both Python and TypeScript?

Yes. MCPWorks generates native packages for both languages. Python agents get a functions/ package with .py modules. TypeScript agents get equivalent .js and .ts modules. Cross-language calls are handled transparently through HTTP bridges — a Python agent can call a TypeScript function and vice versa, with no configuration required from the function author.

Can agents call any MCP server, or only pre-configured ones?

Only MCP servers explicitly connected to the namespace are available. Namespace administrators add servers through the MCPWorks management API, configure authentication, and optionally define request and response rules. The agent can only call tools from servers that are enabled in its namespace — there is no way to reach arbitrary external endpoints from the sandbox.

What prevents a sandboxed agent from accessing credentials?

The sandbox never receives raw credentials. It only holds a short-lived bridge key that the MCPWorks server uses to look up the correct namespace and decrypt credentials on-demand. Credentials are stored with envelope encryption (encrypted DEK + encrypted headers) and only decrypted in server memory for the duration of an MCP call. OAuth tokens are managed server-side with automatic refresh.

How does call tracking work for billing?

Every function call and MCP tool invocation is logged to /sandbox/.call_log inside the sandbox via the _track_call() function in _registry.py. MCP bridge calls are additionally recorded server-side with full metadata — latency, response size, truncation status, and outcome. This dual-layer tracking ensures accurate billing whether calls succeed, fail, or time out.

Is the generated code visible to the LLM?

The LLM sees the functions/__init__.py docstring, which contains the function catalog — names, signatures, and descriptions. It does not see the wrapper implementation, bridge code, or execution machinery. This is by design: the catalog provides enough information for the LLM to write correct function calls, while the runtime handles all the complexity of execution, authentication, and error handling transparently.

MCPWorks is open source.

Self-host free forever, or try MCPWorks Cloud — 14-day Pro trial, no credit card.

View on GitHub
Cloud Trial — Coming Soon