Replies: 17 comments 15 replies
-
While I appreciate the attempt to address the context clogging caused by always putting the full tool response into the LLM context window, the code execution post seems to deal primarily with tools that are essentially "API wrappers" producing structured data, not with tools that also try to guide the MCP client with additional instructions and metadata. The "layering" blog post has been quite influential among MCP server developers: many now expect that they can "guide" the MCP client with next steps, query refinements, error responses, and so on. In other words, the response payload is a channel for both "instructions" and "data". By bypassing the LLM context window entirely, MCP servers become no more than APIs. I think this would be a step backwards; there are MCP servers that, with "guiding instructions", produce a "magical" experience. What I would like to propose is a pattern that gives MCP tools two output channels: an "instructions" channel that is fed into the LLM context window, and a "data" channel that the client reads programmatically.
I think the MCP protocol today already contains the ingredients for MCP servers to implement this pattern, for example by having a tool result point to a resource that holds the bulk structured data.
But what is still missing is a way for the MCP client to know that the resource can be read programmatically rather than fed into the LLM context window.
Edit: Perhaps there can be a short list of
Edit 2: Added a link to the "layering" blog post.
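As a rough sketch of what such a dual-channel tool result could look like (hand-rolled types, not the official SDK definitions; the `programmatic` annotation is a hypothetical hint, not part of the spec):

```typescript
// Illustrative types only -- not the official MCP SDK definitions.
interface TextContent {
  type: "text";
  text: string;
}

interface ResourceLinkContent {
  type: "resource_link";
  uri: string;
  mimeType: string;
  // Hypothetical hint telling the client this payload is meant to be
  // consumed by code, not pasted into the LLM context window.
  annotations?: { programmatic?: boolean };
}

type ToolResult = { content: Array<TextContent | ResourceLinkContent> };

// Channel 1: short guidance for the model. Channel 2: bulk data as a resource.
const result: ToolResult = {
  content: [
    {
      type: "text",
      text: "Found 312 matching documents. Read the linked resource for the full list; refine with the `owner` filter to narrow results.",
    },
    {
      type: "resource_link",
      uri: "resource://search/results/abc123",
      mimeType: "application/json",
      annotations: { programmatic: true },
    },
  ],
};
```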
-
From the blog post
What's missing from a folder of
-
I'm a developer deeply involved in building agents with MCP. One recurring challenge I've faced is that we often need to feed the entire MCP tool response back into the prompt, which isn't always optimal and can be wasteful in terms of tokens. That's why the recent "Code Execution with MCP" post caught my attention; it seems like a promising direction for addressing this inefficiency. After reading it, I had some thoughts and questions about the core idea. The post suggests letting the agent write and execute code to interact with MCP servers, which effectively adds a post-processing step to transform or filter the MCP tool results before passing them back to the LLM. In my own work, I've approached a similar problem a bit differently:
With this setup I can achieve the same goal, efficient post-processing of MCP responses, but without spinning up a separate virtual environment or executing LLM-generated code, which can add unnecessary overhead. I wanted to share this approach with the community to get feedback.
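Since the details of the setup were trimmed above, here is only a generic sketch of one way client-side post-processing can work without executing model-generated code: the client applies a developer-written, deterministic transform to the tool result before anything reaches the context window (tool and function names below are made up):

```typescript
// Hypothetical client-side transform registry: deterministic functions,
// written by the developer, applied to tool results before they reach the LLM.
type Transform = (raw: unknown) => string;

const transforms: Record<string, Transform> = {
  // Keep only the fields the model actually needs from a large search result.
  "github.search_issues": (raw) => {
    const issues = (raw as { items: Array<{ number: number; title: string }> }).items;
    return issues
      .slice(0, 20)
      .map((i) => `#${i.number} ${i.title}`)
      .join("\n");
  },
};

function postProcess(toolName: string, rawResult: unknown): string {
  const transform = transforms[toolName];
  // Fall back to the full payload only when no transform is registered.
  return transform ? transform(rawResult) : JSON.stringify(rawResult);
}
```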
-
I'm reading the article about code execution. The article describes an agent using Slack and GSuite APIs. How are you supposed to give the agent access to those services? Would you set up your environment like any other server, with Slack and GSuite keys, provide an example usage, and then tell the agent to follow the precedent set by your example? Secondly, it says that the MCP client intercepts PII data. These clients are things like Cursor and Claude Code, right? How can they (1) be validated to implement this correctly, and (2) accurately and deterministically identify which data is PII in the first place?
-
I got confused when reading this article and would like to have some clarification:
What exactly are the points this article is trying to make?
-
Tbh, I'm not really sure where to start or how to implement it in my own projects. I'd appreciate it if someone could point to any additional resources (if and when they're available).
-
After reading the blog, I have some follow-up questions.
-
I just incorporated this model into an AI agent I'm building and training. I have built 7 tools, and it's working well for any GET-type request, but when I try to get it to POST, the LLM refuses to call the tool and lies about it. I watched my coding AI (Sonnet 4.5) and my agent (using GPT-4 mini) arguing over it; it got to the point where my coding AI wrote a new system prompt for the bot in all caps! Eventually I had to step in and break it up. Anyway, if anyone can explain why the LLM refuses a direct instruction to trigger a POST using an MCP 2.0 tool, please tell.
-
Hey guys! I just wrote code_mode support for MCP in our client, https://github.com/mcp-use/mcp-use, if you want to try it out and check out the implementation.
-
I also built a working implementation of this pattern (using the Claude Agent SDK) and ran a few experiments comparing code execution vs. direct MCP on real-world GitHub issue analysis tasks: https://github.com/olaservo/code-execution-with-mcp. The README includes a full breakdown of my first experiments. TL;DR version:
Large dataset (5,205 issues):
Small dataset (45 issues):
Re: @scottyak-datadog's concern about losing "guiding instructions": this seems to be a valid point. This approach seems best suited to data-heavy operations where structured data processing matters more than iterative guidance.
-
Critique: Code Execution vs. Native Introspection & Schema Standards
I read the Code Execution with MCP article and wanted to share some thoughts. While it tries to address the very real problem of context window bloat, the proposed solution ("code execution" via virtual filesystems) feels somewhat hand-wavy and introduces unnecessary complexity for problems that could be solved more elegantly within the protocol itself. Here are a few specific critiques and alternative proposals:
The article proposes mapping MCP servers to a local file system so agents can "discover" tools by listing directories and reading files. This feels like a workaround. An intelligent agent can already dynamically determine which servers to use based on high-level server descriptions (i.e., ask the LLM which servers are relevant to the task at hand, and only discover tools on those servers). We shouldn't need a code execution sandbox just to perform progressive disclosure of tools.
Instead of forcing the LLM to write code to discover tools from each server, a better protocol-level solution would be a standardized, layered introspection flow (similar to GraphQL):
- Level 1 (Discovery): the client requests a lightweight list of tools containing only names and descriptions. This is cheap and fits in context.
- Level 2 (Definition): the agent identifies relevant tools and requests the specific input/output schemas and annotations for only those tools.
- Level 3 (Execution): the agent constructs the plan or call.
This achieves the same token savings as the article's method but keeps the architecture deterministic and cleaner.
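A sketch of how that layered flow might look from the client side; none of these methods exist in MCP today, they simply illustrate the proposed three levels:

```typescript
// Hypothetical layered-introspection client. None of these methods exist in
// MCP today; they illustrate the proposed Level 1 -> Level 2 -> Level 3 flow.
interface ToolSummary { name: string; description: string }
interface ToolDefinition extends ToolSummary { inputSchema: object; outputSchema?: object }

interface IntrospectionClient {
  listToolSummaries(): Promise<ToolSummary[]>;                    // Level 1: names + descriptions only
  getToolDefinitions(names: string[]): Promise<ToolDefinition[]>; // Level 2: full schemas, only for selected tools
  callTool(name: string, args: object): Promise<unknown>;         // Level 3: execution
}

async function planAndCall(client: IntrospectionClient, relevant: string[]) {
  const summaries = await client.listToolSummaries();
  const selected = summaries.filter((t) => relevant.includes(t.name));
  const definitions = await client.getToolDefinitions(selected.map((t) => t.name));
  // The agent now holds schemas for only the tools it intends to use.
  return definitions;
}
```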
The article's code examples rely on the agent knowing the shape of the return data (e.g., accessing .content or .rows on a result):
`const transcript = (await gdrive.getDocument({ documentId: 'abc123' })).content;`
However, output schemas are not yet a standard part of MCP. Without a strict, enforceable contract on what a tool returns, asking an LLM to write code against that return value is incredibly brittle; the model is essentially guessing property names. Whether we use "Code Mode" or standard tool calling, we first need standardized output schemas to make reliable agentic workflows possible.
[Edit] Adding to this: for instance, in our MCP server for a SaaS product, we list tools dynamically based on the user's role and access ...
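For illustration, a declared output schema for the getDocument example above could look like the following (a sketch of the idea, not an existing field on any particular server):

```typescript
// Sketch: a tool declaring the shape of its result up front, so generated
// code like `result.content` is checked against a contract instead of guessed.
const getDocumentTool = {
  name: "getDocument",
  description: "Fetch a Google Drive document by ID",
  inputSchema: {
    type: "object",
    properties: { documentId: { type: "string" } },
    required: ["documentId"],
  },
  // Hypothetical output schema: the agent (or a codegen step) can rely on
  // `content` and `title` existing, rather than inferring them from examples.
  outputSchema: {
    type: "object",
    properties: {
      title: { type: "string" },
      content: { type: "string" },
    },
    required: ["content"],
  },
};
```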
-
I've been following the discussion and agree with most of the points made here. In case anyone wants to try it, I developed a basic implementation of the blog post here: https://www.npmjs.com/package/@abmalk/mcpcode. I'm looking for suggestions on improving it, adding functionality, and ways to improve the idea. It discovers servers, collects tools/schemas, and generates input and output TypeScript interfaces.
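For context, here is a hypothetical example of the kind of interfaces such a generator might emit for a `searchIssues` tool (illustrative only, not necessarily what @abmalk/mcpcode produces):

```typescript
// Hypothetical generated interfaces for a `searchIssues` tool, derived from
// its input and output JSON Schemas.
export interface SearchIssuesInput {
  query: string;
  state?: "open" | "closed" | "all";
}

export interface SearchIssuesOutput {
  totalCount: number;
  items: Array<{ number: number; title: string; url: string }>;
}

// Typed wrapper the agent can call and compose in generated code.
export declare function searchIssues(input: SearchIssuesInput): Promise<SearchIssuesOutput>;
```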
-
Code execution through MCP feels somehow strange to me; it seems like an even bigger nightmare to develop remote code execution. For me there are also two very different use cases for MCP. One very important case is connecting applications like Jira, Google Suite, and so on through OAuth to a web client, and I'm not sure how that is expected to work here. But going by how we humans work, the way we keep our context clean is through "partial discovery": when using a CLI tool I don't know, I run "cmd help" to get the basics with all possible subcommands, and then "cmd <tool> help" returns the details for a given tool.
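That CLI-style partial discovery could be modeled with two cheap meta-tools, sketched here with made-up names:

```typescript
// Hypothetical meta-tools mirroring `cmd help` / `cmd <tool> help`.
interface ToolCatalog {
  // Level 1: one line per tool, cheap enough to always keep in context.
  help(): Promise<Array<{ name: string; oneLiner: string }>>;
  // Level 2: full description and parameter schema for a single tool, on demand.
  toolHelp(name: string): Promise<{ description: string; inputSchema: object }>;
}

async function discover(catalog: ToolCatalog, task: string) {
  const overview = await catalog.help();
  // The agent (or a heuristic) picks plausible tools for `task`,
  // then pulls detailed help only for those.
  const candidates = overview.filter((t) => task.toLowerCase().includes(t.name.toLowerCase()));
  return Promise.all(candidates.map((t) => catalog.toolHelp(t.name)));
}
```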
-
I found this blog interesting. I wanted to validate getting this working locally (without MCP...) and run it on tiny local models, so I thought I'd share my notes for whatever they may be worth to someone. What I found really interesting is that tool definitions and ergonomics seemed to matter a lot, and the best way to make them effective was to make them obvious. This felt like the only way to scale the tools so that you could run progressive discovery. Example of "obvious" tools: when plan() returns list[str], the model knows it needs to iterate; when combine() takes list[str], the model knows to collect results. The generated code became correct without explicit patterns. This made code composition much more reliable for small models, and to me this was the most interesting aspect of the original blog from Anthropic: that you can chain tools together, and the orchestrator model never sees the content of each tool call (unless needed).
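Translated into TypeScript signatures (the comment above uses Python-style list[str]), the "obvious" shapes look roughly like this; the bodies are placeholders:

```typescript
// Signatures whose types make the composition pattern self-evident:
// plan() returns a list, so the model iterates; combine() takes a list,
// so the model collects intermediate results. Bodies are stubs.
async function plan(goal: string): Promise<string[]> {
  return [`research ${goal}`, "summarize findings"]; // placeholder steps
}

async function executeStep(step: string): Promise<string> {
  return `result of: ${step}`; // placeholder result
}

async function combine(results: string[]): Promise<string> {
  return results.join("\n"); // placeholder aggregation
}

// The orchestration a small model tends to write once the types are obvious:
async function run(goal: string): Promise<string> {
  const steps = await plan(goal);
  const results = await Promise.all(steps.map((s) => executeStep(s)));
  return combine(results);
}
```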
-
Hey all, a few days ago I launched
Among other things, it supports a JSON output mode (
Thanks to schema validation (
We will add generation of TypeScript server stubs soon to enable code mode in TS as well. I think you might find it interesting...
-
It's interesting to see the different responses and solutions around Code Mode. My take was that I saw the problem in my use cases and I liked the idea, but I didn't like the "code" part, so I built an MCP-based solution for agents to build directed graphs to compose and orchestrate tool calls: https://github.com/TeamSparkAI/mcpGraph. I refer to it as "No Code Code Mode".
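As a generic illustration of the idea (not mcpGraph's actual format), a declarative tool-call graph could be expressed as nodes plus data-flow edges:

```typescript
// Generic sketch of a directed tool-call graph: each node is a tool call,
// each edge maps one node's output field into another node's input.
interface GraphNode { id: string; tool: string; args: Record<string, unknown> }
interface GraphEdge { from: string; outputField: string; to: string; inputField: string }

const graph: { nodes: GraphNode[]; edges: GraphEdge[] } = {
  nodes: [
    { id: "fetch", tool: "gdrive.getDocument", args: { documentId: "abc123" } },
    { id: "update", tool: "salesforce.updateRecord", args: { objectType: "SalesMeeting" } },
  ],
  edges: [
    // Pipe the document body straight into the CRM update,
    // without the orchestrating model ever seeing the full transcript.
    { from: "fetch", outputField: "content", to: "update", inputField: "notes" },
  ],
};
```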
-
## Semantic Routing as an Alternative to Code Execution for MCP Efficiency

Really interesting thread. We've been working on this exact problem from a different angle with [OneConnecter](https://oneconnecter.io). Rather than having agents write code to post-process MCP results, we route queries semantically before they hit any MCP server. The agent sends a natural language query to a single endpoint, and we:

### Real-world token savings

We tested the same query (

That's a 95% token reduction: not from post-processing, but from never loading the irrelevant context in the first place.

### How it relates to this discussion

The dynamic tool loading point raised earlier in this thread resonates strongly:

This is exactly the approach we took. Instead of exposing 100+ tool definitions across 30+ service agents and letting the LLM figure it out (burning context on every definition), we use semantic vector matching to surface only what's needed per query. The complexity concern is also valid:

With semantic routing, there's no sandbox, no code generation, no execution environment. One MCP endpoint, one

### Architecture

vs the current pattern:

We're running this across 30+ service agents (Notion, GitHub, Airbnb, financial data, sports, jobs, etc.) through a single MCP endpoint at

Happy to share more detail on the routing architecture if anyone's interested.

[OneConnecter](https://oneconnecter.io): semantic routing layer for MCP. One endpoint, structured data, 95% fewer tokens.
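For anyone curious, the matching step itself is a standard technique; a bare-bones sketch (generic, not OneConnecter's implementation) embeds each tool description once, embeds the incoming query, and keeps only the top-scoring tools:

```typescript
// Generic semantic routing sketch: cosine similarity between a query embedding
// and precomputed tool-description embeddings; only the top-k tools are loaded.
interface IndexedTool { name: string; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function routeQuery(queryEmbedding: number[], tools: IndexedTool[], k = 3): string[] {
  return tools
    .map((t) => ({ name: t.name, score: cosine(queryEmbedding, t.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((t) => t.name);
}
```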
-
Pre-submission Checklist
Discussion Topic
Direct tool calls consume context for each definition and result. Agents scale better by writing code to call tools instead. Here's how it works with MCP.
This is the topic of a recent blog post from Anthropic. Please share your thoughts and experiences here!
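For readers who have not seen the post yet, the core pattern looks roughly like this; the `gdrive`/`salesforce` wrappers are stand-ins for generated MCP tool bindings, and the IDs are placeholders:

```typescript
// Sketch of the code-execution pattern: the model writes a script like this,
// a sandbox runs it, and only the final console output re-enters the context.
// Stubs stand in for generated MCP tool wrappers.
declare const gdrive: {
  getDocument(args: { documentId: string }): Promise<{ content: string }>;
};
declare const salesforce: {
  updateRecord(args: { objectType: string; recordId: string; data: Record<string, string> }): Promise<void>;
};

async function main() {
  // The full transcript stays inside the sandbox; the model never sees it.
  const transcript = (await gdrive.getDocument({ documentId: "abc123" })).content;

  await salesforce.updateRecord({
    objectType: "SalesMeeting",
    recordId: "placeholder-id", // hypothetical record ID
    data: { notes: transcript },
  });

  // Only this short confirmation is returned to the model's context window.
  console.log(`Copied ${transcript.length} characters of meeting notes into Salesforce.`);
}

main();
```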