Conversation

@patwhite
Contributor

@patwhite patwhite commented Apr 11, 2025

EDITED May 12, 2025

Motivation and Context

This proposal, along with #334, is an effort to make the MCP protocol slightly more agnostic about whether MCP servers should be small, single-use, domain-constrained servers or larger, multi-purpose servers. At a high level, the various list methods suffer from two issues: a) they may be unbounded in length, forcing clients to decide how many tools to list, and b) even shorter tool lists (in the range of 100) can produce token counts in the 50-100k range. That increases cost and potentially decreases the accuracy of the LLM in selecting a tool when it is presented with a large list.

This is a proposal for an OPTIONAL capability for larger servers to expose search endpoints on tools, resources, resource templates, prompts, and potentially other capabilities coming down the pike. It proposes a recommended flow for clients to prefer search over listing tools and to present LLMs with an initial tool set that starts with search and escalates to tool calls.

There are native ways to support search using progressive tool disclosure. This is an effort to standardize search and create a situation where any generic, well-implemented client can connect to any generic, well-implemented server and achieve 100% success executing a search. Please see the notes below for other options and why this is being proposed as a good solution.
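To make the shape concrete, here is a rough sketch (not the final schema) of what a search request/result pair could look like; the method name, params, and score field are all hypothetical placeholders for discussion:

// Hypothetical sketch only: method name, params, and score field are
// illustrative for this proposal, not a finalized schema.
interface ToolsSearchRequest {
  method: "tools/search";
  params: {
    query: string;     // free-text query from the client or model
    cursor?: string;   // same pagination convention as tools/list
    limit?: number;    // optional cap on the number of matches
  };
}

interface ToolsSearchResult {
  tools: Array<{
    name: string;
    description?: string;
    inputSchema: object;
    score?: number;    // assumed relevance score, not in the current spec
  }>;
  nextCursor?: string;
}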

How Has This Been Tested?

Presently built into ScaledMCP in experimental mode.

Breaking Changes

This is an opt-in capability, and is thus fully backwards compatible.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

The other method of implementing search is to expose a tool such as "search_tools" and, upon completion of a call to that tool, expand the tool list and send a tools_changed notification to the client. There are several issues with this, first and foremost that it requires a client / prompt that primes the model for a multi-turn process. This proposal also includes an update to the prompt; however, it has a single implementation rather than an open-ended, bespoke solution. So, given the goal of a well-built client being able to negotiate a search with a server with 100% success, a specialized search call seemed more likely to succeed over time than a dance. This proposal COULD be matured to define a search dance, but there are a number of complex server-side issues with progressive tool disclosure (for instance, requiring servers to be stateful, which makes serverless implementations much harder to build). So, this is a good compromise.
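For comparison, a minimal sketch of that tool-based "dance" in raw JSON-RPC messages; the search_tools tool name is the hypothetical one from the paragraph above, while tools/call and notifications/tools/list_changed are the existing spec methods:

// The model calls the server's own search tool...
const searchCall = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "search_tools", arguments: { query: "create calendar event" } },
};

// ...the server expands its exposed tool list and asks the client to re-list.
const listChanged = {
  jsonrpc: "2.0",
  method: "notifications/tools/list_changed",
};

// The client must then call tools/list again and re-prime the model -- this is
// the multi-turn "dance" described above.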

@patwhite patwhite marked this pull request as draft April 11, 2025 14:55
@patwhite
Contributor Author

@justinwilaby first draft

@patwhite
Contributor Author

@LucaButBoring amazing comments, thanks for taking the time bud, just about to push up tweaks

@patwhite patwhite marked this pull request as ready for review April 14, 2025 21:27
@patwhite
Contributor Author

@LucaButBoring updates have been pushed, any other thoughts?

- Resources **SHOULD** be deeply indexed (i.e. the content itself is indexed, not just the name and description)

2. Clients **SHOULD**:
- Use batching if it's not clear whether a tool, resource, or prompt is appropriate for the situation
Contributor

@LucaButBoring LucaButBoring Apr 15, 2025

It'd be interesting if this could somehow be leveraged on the server side to do a single large search rather than three separate ones; that might be a tricky optimization, though. I suppose the server could hypothetically have a reactive stream consuming and batching client search requests to handle in a single query, maybe 🤔

Not a criticism and I don't have concrete suggestions about it, it's just something that stood out to me when I re-read this line, since the complexity trade-off is questionable. Realistically with a "sufficiently-fast" index it shouldn't make much of a difference anyways, since this is an edge case already.
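For reference, since this is plain JSON-RPC, the three searches could in principle go out as one batch, which would let a server answer them from a single underlying query if it wanted to (the method names here are the proposal's hypothetical ones, not the current spec):

const query = "deploy a new service";
const batchedSearch = [
  { jsonrpc: "2.0", id: 1, method: "tools/search", params: { query } },
  { jsonrpc: "2.0", id: 2, method: "resources/search", params: { query } },
  { jsonrpc: "2.0", id: 3, method: "prompts/search", params: { query } },
];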

Contributor Author

Ya, I actually originally started going in the direction of a single search/query endpoint - the BIGGEST issue is you end up in a situation where implementors have to make a lot more implementation decisions about how to interleave results, and the consuming LLM would, I think, get more confused. So, the options are:

  1. Single return with sections:
{
  toolResults: {
    items: [],
    nextCursor: string
  },
  promptResults: {
    ...
  },
  resourceResults: {
    ...
  }
}

That strikes me as too confusing for the LLM to really process today; maybe that gets revisited a few years from now.
  2. A single result list, with items intermingled (roughly the shape sketched below). This is somewhat doable with good vector lookups, but say the LLM knows it wants a tool: it has to craft an input that includes a filter, and suddenly we're back to quite a bit of client complexity.
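A rough sketch of what that intermingled shape would look like; purely illustrative, and the type discriminator and filter field are assumptions:

// Hypothetical intermingled shape; the "type" discriminator and filter are assumptions.
interface MixedSearchResult {
  items: Array<{
    type: "tool" | "prompt" | "resource";
    name: string;
    description?: string;
  }>;
  nextCursor?: string;
}

// To get only tools back, the caller has to craft a filter, e.g.:
const mixedRequest = {
  method: "search",                                  // hypothetical unified endpoint
  params: { query: "send email", filter: { types: ["tool"] } },
};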

I'm actually open to all three options here, but when I really thought through all this, it just seemed like separate endpoints were the best fit for the LLM, easiest for the implementors, and most understandable all around.

But, great feedback

@LucaButBoring
Contributor

LGTM, left some thoughts on the batching note but beyond that I think this is more than ready for other eyes on it 👀

Member

@tadasant tadasant left a comment

Have you considered where completions fit in related to this?

If we were to decide the concept of "search" for tools is needed, my default expectation would be that "completions" get extended to support Tools as well (they already support Resources and Prompts).

I do worry that the notion of "search" on tools is a major paradigm shift. Because tool invocations are model controlled, when should the model decide to search on tools as opposed to just consider using an available tool directly?

A server could theoretically solve this problem by exposing its own search_for_tools tool, and then dynamically update its exposed tools as a result; no dedicated "tool search" concept needed if we think of "search" as just another specific type of Tool.

@patwhite
Contributor Author

I think this is quite different than completion, mostly because of who is consuming it. The result of completions is, to my understanding, meant to be consumed by the user; the result of a list call is 100% meant to be consumed by the model.

Regarding client interaction, I would think that most clients would change their implementation so that IF there's a search capability available, they prefer search over a full listing of tools purely for performance reasons; alternately, if the client starts listing tools and gets into the 100s or 1000s, or experiences a timeout, or can detect a context overflow, it would move to search.
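In rough client pseudocode, that preference might look something like this (sketch only; the capability flag, helper names, and thresholds are assumptions, not spec):

interface ToolSummary { name: string; description?: string }

// Sketch of the client-side preference: search when advertised, list otherwise.
async function loadTools(
  server: {
    capabilities: { tools?: { search?: boolean } };                    // assumed capability flag
    searchTools: (query: string) => Promise<ToolSummary[]>;            // assumed wrapper for a tools/search call
    listTools: (cursor?: string) => Promise<{ tools: ToolSummary[]; nextCursor?: string }>;
  },
  userQuery: string,
): Promise<ToolSummary[]> {
  if (server.capabilities.tools?.search) {
    // Skip the full crawl and fetch only what this turn needs.
    return server.searchTools(userQuery);
  }
  // Otherwise crawl the paginated list, bailing out once it gets large.
  const tools: ToolSummary[] = [];
  let cursor: string | undefined;
  do {
    const page = await server.listTools(cursor);
    tools.push(...page.tools);
    cursor = page.nextCursor;
  } while (cursor && tools.length < 500);                              // arbitrary safety cap
  return tools;
}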

@patwhite
Contributor Author

Basically, the way the listing feature is designed today actually causes a lot of issues for clients on startup time, on caching requirements, on context, etc. Search is a way to short circuit that.

@LucaButBoring
Contributor

Because tool invocations are model controlled, when should the model decide to search on tools as opposed to just consider using an available tool directly?

The way I've been thinking about this at least (following discussion with @patwhite and others in #204) is that the model would always search if search is supported by both the client and the server. Given that search would be most relevant to servers with many tools, this shouldn't lead to a meaningful fan-out problem unless small servers started implementing search unnecessarily.

However, it would require client applications to implement considerable changes to tool loading to support this. I reasoned through an example flow here: #204 (comment)

@tadasant
Member

Ah sorry I hadn't yet read #204.

I think this is quite different than completion, mostly because of who is consuming it. The result of completions is, to my understanding, meant to be consumed by the user; the result of a list call is 100% meant to be consumed by the model.

If completions is too narrowly defined, my thought would be that the way to go here would be to refactor Completions to instead be Search. I don't really see a reason to have a notion of Completions for Prompts or Resources if we also have Search.

However, it would require client applications to implement considerable changes to tool loading to support this. I reasoned through an example flow here: #204 (comment)

Thanks for sharing - very helpful. I suppose it would be reasonable to let client hosts just make their own decisions here, e.g. if the tools list is very long, engage the search-then-choose flow you laid out; or if they know their users are using some model with a massive context window, don't bother with search, etc. Similar to how hosts today are potentially showing the full list of Resources in some in-app contexts and in other contexts they lean on Completions to surface relevant Resources.

All to say, if we were to merge the notion of Completions and Search (and I think Search would be a better name for it), I think that makes this a significantly more minor change that'd be easier to push forward.

@patwhite
Contributor Author

All to say, if we were to merge the notion of Completions and Search (and I think Search would be a better name for it), I think that makes this a significantly more minor change that'd be easier to push forward.

Love it - so, do you think it would make sense to drop some notion of completions deeper into the call stack (i.e. tools/completion)? Or possibly structure this as completions/tools instead of tools/search?

@evalstate
Member

The current specification supports this workflow:

  • A "Search" Tool offers search results to the Human, with Prompts/Resources offering completions for UX
  • A "Use" Tool from provides a ToolListChangedNotification
  • Host exposes the new Tool to the LLM for usage.

On that basis I am uncertain what this PR adds.

@justinwilaby

justinwilaby commented Apr 16, 2025

The current specification supports this workflow:

  • A "Search" Tool offers search results to the Human, with Prompts/Resources offering completions for UX
  • A "Use" Tool from provides a ToolListChangedNotification
  • Host exposes the new Tool to the LLM for usage.

On that basis I am uncertain what this PR adds.

@evalstate - This is in fact the current workaround and was a topic of discussion when Pat and I collaborated on this.

IMO - There is a distinct benefit to codifying tool search versus relying on idiom or convention.

  1. UX - Take tools, for example: as you already know, clients require explicit permission to call tools. Search could be a candidate for exclusion, thus requiring a single action by the user to call the tool versus 2 actions - one to call the "Search" tool and another to call the "Use" tool. Imagine having multiple MCP servers, each with a search tool that must be allowed by the user each time.
  2. Scalability - The corpus used for search can be updated with new objects seamlessly. Discovery is intrinsic and the client does not need to be notified of a new toolset. The act of searching is all that is needed to yield the new additions on the next iteration. Think of a plugin architecture for MCP - plugins are dynamically added to an MCP server, the corpus is updated, and the new functionality is available. Smithery toolbox, for example.
  3. Top-k Aggregate Query - A client may choose to query n MCP servers, aggregate the lists, compute a combined score for each unique item, and select the top-k items based on the aggregated score. This further refines results and presents the user with only the most relevant.
  4. Or similarly, a sort of Self-RAG can be used on sophisticated clients to determine relevance in an aggregate using an LLM agent.

As you can tell, relying on convention makes the above difficult. Codifying this in the spec opens up a greater degree of flexibility.

@patwhite
Contributor Author

@evalstate to add to what @justinwilaby said - there's a bootstrapping problem. The use case here is an MCP server with 1,000 tools (or heck, 100 tools but with a model with limited context). In order to find the tool you mention, you'd have to crawl all tools. Having a standard capability for search fundamentally changes the initialization protocol - it basically means connecting clients no longer have to do a full list of all tools; they can just connect, search for the tool they need, then start using it.

I also imagine this more for enterprise use cases where you really only connect to a single proxy rather than adding 100's of servers to connect to, then that proxy keeps some sort of index of upstream servers.

And finally, even in that use case where you have 100 MCP servers you're connected to, you still have that crawling problem: the LLM has to crawl and keep up to date with 100s of servers, rather than just running a fan-out search when it needs a tool.

But, great question on this!

@evalstate
Member

OK - one more question please... from your description this interaction describes direct Host->Client->Server communication with no LLM intermediary? The issue here is that there are an overwhelming number of tools, and we want the Human to be able to find and select them for presentation to the LLM. Is that it?

@qdrddr

qdrddr commented Jul 30, 2025

I'd like to add a few cents to this discussion.

  1. Fully agree, we need to find a way to limit the list of tools added to the LLM's context window. Currently we simply add all available tools. When too many tools are listed, the LLM gets confused (known as Context Rot) and at the same time we pay more tokens/money for tools that are actually not needed. That situation needs an improvement on the MCP Client side, and the best way to do so, I believe, is to introduce standardization that aligns with this PR.

  2. While the idea described here is great, I want to emphasize its limits. The proposed search_tools implies that the LLM must first decide if some tools are actually needed (or wait for an explicit user instruction to use a tool). If no user instruction is explicitly added, then for the LLM to decide whether a tool_search is needed it basically has to guess if a tool is needed; then there's a second guess: it needs to know the list of available tools BEFORE search_tools, which is problematic since the name of the tool might not be what the LLM invented/hallucinated. All this adds multiple steps and likely additional retries (wasted user time and money).

  3. I propose implementing, on the MCP Client's end, a semantic search BEFORE (or instead of) search_tools: simply run a semantic search for each user request and dynamically add the list of semantically similar tools (or no tools if nothing is close enough). This reduces the LLM's decision to whether or not to use a short list of tools provided upfront (across multiple servers). That limits the number of tools (saves money and removes LLM confusion), removes multi-step flows (no need for an extra search_tools step), and removes retries (improving quality and decreasing confusion).

#845

@isbee

isbee commented Jul 31, 2025

I'd also like to chime in here.

The 10k tools problem is a context engineering problem. I think this is better handled as implementation details rather than changes to the MCP spec itself. @SecretiveShell explained this really well. From what I understand, MCP is designed to focus on “seamless integration” per the spec, with engineering optimizations being something each implementation handles on their own.

Like @tadasant pointed out, we can already tackle this with search_for_tools using the current spec. If we can solve this pretty easily with existing methods, then this draft doesn't add much value.

  • That said, @LucaButBoring's minor change suggestion would be nice to see in the protocol.
  • btw, @thisisfixer brought up an interesting point about how search_for_tools ends up being controlled through chat context instead of the tools parameter. I'm skeptical about performance drops without solid evaluation data - especially since there's a report of think tool actually outperforming native thinking. But I do think not being able to control via tools might prevent us from fully leveraging API Provider features (e.g. constrained decoding). So we might need some SDK updates.

Also, this draft doesn't address what @briancripe mentioned about tool search indexes across multiple MCP servers on the host side. It only tackles the one giant MCP server problem.

@LucaButBoring
Contributor

Like @tadasant pointed out, we can already tackle this with search_for_tools using the current spec.

@qdrddr's comment captures the problems with that pretty well, I think:

While the idea described here is great, I want to emphasize its limits. The proposed search_tools implies that the LLM must first decide if some tools are actually needed (or wait for an explicit user instruction to use a tool). If no user instruction is explicitly added, then for the LLM to decide whether a tool_search is needed it basically has to guess if a tool is needed; then there's a second guess: it needs to know the list of available tools BEFORE search_tools, which is problematic since the name of the tool might not be what the LLM invented/hallucinated. All this adds multiple steps and likely additional retries (wasted user time and money).

I also want to emphasize that if you want to search every turn and avoid LLM-related errors, you need to build an extension point into the client-side conversation management execution flow, like I do in my client implementation example. Note that instead of deterministically listing tools at the start of the session, I deterministically search on each turn.


Also, this draft doesn't address what @briancripe mentioned about tool search indexes across multiple MCP servers on the host side. It only tackles the one giant MCP server problem.

This is a valid point, and comes down to tradeoffs between implementing search indexes on the client versus the server. My position on this has always been that you would search each server separately, and only call ListTools on servers that don't support search. You would then concatenate those lists (possibly with a client-side reranking step?) to build the tool list that you send to the model.
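Concretely, that per-server fan-out could look roughly like this (a sketch only; the capability check and the rerank step are assumptions):

interface ToolSummary { name: string; description?: string }

// Fan out: search servers that support it, fall back to listing the rest,
// then concatenate (and optionally rerank) before building the model's tool list.
async function gatherTools(
  servers: Array<{
    supportsSearch: boolean;                           // assumed capability check
    search: (query: string) => Promise<ToolSummary[]>;
    listAll: () => Promise<ToolSummary[]>;
  }>,
  query: string,
): Promise<ToolSummary[]> {
  const perServer = await Promise.all(
    servers.map((s) => (s.supportsSearch ? s.search(query) : s.listAll())),
  );
  // An optional client-side reranking / top-k step could go here.
  return perServer.flat();
}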

If you do host-side indexing, you need to bootstrap a tool index, potentially with embeddings, metadata, etc. at the beginning of a session. That likely requires embedding all tools and indexing them, which can take a few minutes or more, depending on how many tools you have. That's a valid option, but I also think it involves its own tradeoffs in terms of UX. Due to the flexibility of the protocol, there is no one approach that will work for every application, in general.


Finally, coming back to the most important point:

If we can solve this pretty easily with existing methods, then this draft doesn't add much value.

Taken together, we have a couple of meaningful tradeoffs to navigate in implementing search today:

  • Is search exposed as a tool to the model, or is it bolted into a conversation manager? Do we give the model discretion as to if search is needed, or do we automatically search because we don't trust the model to always make the right decisions here?
    • If search is exposed as a tool to the model, we have the problems @qdrddr and others have noted with regards to reliability. It might work most of the time, but there will be many cases where it does not (either the model fails to generate the right tool parameters, chooses not to call the tool when it should, or incorrectly extrapolates from the response).
    • If search is bolted into a conversation manager, you solve those problems, but then need some other way of searching tools that works better than using the LLM (one part of what this proposal addresses).
  • Is search done in host applications, or is it done on servers? Do we want to take on the overhead of building a search index on every client device, or do we want to offload it to servers to optimize the client-side user experience?
    • If search is done in host applications, every host app developer needs to set up indexing and embeddings on session initialization, and make a decision about if that overhead makes sense for their target users or not (and if it doesn't, they might just be out of luck).
    • If search is offloaded to servers, you solve those problems, but are limited to only searching one server at a time.

This proposal fills the gap in the status quo for host applications that cannot feasibly set up an indexing pipeline on every single client, and want to bolt search into the conversation manager for reliability. As far as I'm aware, there is no option for doing both of those things simultaneously today.

@LucaButBoring
Contributor

Bringing this up as other background conversations have been going on - namespaces (formerly #334, now #993) are looking like they'll evolve to support filtering in tools/list, and I see #1269 is proposing extending tools/list for prefix filtering as well. It might make sense to put search into that filtering parameter instead of making it a separate RPC operation, since there are use cases for combining search with other forms of filtering.
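Under that alternative, the request might just look like an ordinary tools/list call with an extended params object; the filter field below is hypothetical and not in the current spec:

const filteredList = {
  jsonrpc: "2.0",
  id: 7,
  method: "tools/list",
  params: {
    filter: {                              // hypothetical filtering parameter
      namespace: "github",                 // namespace-style filtering (in the spirit of #993)
      prefix: "issues_",                   // prefix filtering (in the spirit of #1269)
      query: "close stale issues",         // free-text search folded into the same object
    },
  },
};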

@qdrddr

qdrddr commented Jul 31, 2025

✍️ Proposal: Delegated/Federated Advanced Tool Search

I like the current direction and would like to build on it by proposing a Delegated Advanced Tool Search mechanism.

  • Delegated means the MCP Server performs the tool search on behalf of the MCP Client BEFORE the LLM is invoked by the MCP Client.
  • Advanced Tool Search means the MCP server returns a pre-filtered list of relevant tools rather than listing all available ones.

Motivation, Pitch

  • Title: Delegated Advanced Tool Search
  • Goal: Improve LLM output quality and efficiency through tool filtering
  • Why now: Rising complexity in tool ecosystems makes flat long lists of tools and their descriptions inefficient and costly

🤔 Problem Overview: Context Rot

The core issue we’re addressing is commonly referred to as Context Rot/Bloat.
When all available MCP tools—potentially hundreds—are injected into the prompt with full descriptions, it leads to:

  • LLMs getting overwhelmed and confused by the "haystack" of available tools
  • Reduced response quality and tool selection accuracy
  • Increased token consumption and operational cost

🎯 Goal of Delegated Advanced Tool Filtering

The goal of filtering should be framed as a strategy to:

  • Enhance the quality of LLM responses using the provided pre-filtered tools
  • Reduce context size and token usage by filtering the list of available tools
  • Lower costs associated with inference

This approach ensures that each user request always returns either:

  • An empty list (if no relevant tools are found), or
  • A small set of pre-filtered tools, based on the original user query

The LLM then decides whether to use any of these pre-filtered tools, without needing to guess what is available.


📌 Proposed Behavior

  1. MCP Server can indicate if it supports Delegated Advanced Tool Search.
  2. If supported, the tool search occurs BEFORE invoking the LLM.
  3. The original user query is passed from the MCP Client to the MCP Server.
  4. The MCP Server filters tools using any available techniques (such as semantic similarity or any other implementation) and returns a short list of only relevant tools (see the sketch below).
  5. The MCP Client proceeds as usual using this filtered list, without injecting unnecessary tools into the LLM context.
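A minimal sketch of that exchange; the method name, fields, and example tool names are placeholders, not spec:

// 3) The client forwards the original user query to the server...
const delegatedSearchRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/search",                                    // placeholder method name
  params: { query: "book a meeting room for Friday" },       // the raw user prompt
};

// 4) ...and the server answers with a short, pre-filtered tool list (possibly empty).
const delegatedSearchResponse = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    tools: [
      { name: "calendar_create_event", description: "Create a calendar event" },
      { name: "rooms_find_available", description: "Find free meeting rooms" },
    ],
  },
};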

🧩 Example: User Stories & Acceptance Criteria

🙋‍♂️ User Story 1

As a: end user using an MCP Client
I want: the MCP Server to pre-filter tools and the MCP Client to inject only these pre-filtered relevant tools before invoking the LLM
So that: the LLM isn't overloaded by irrelevant tools and performs more accurate reasoning

✅ Acceptance Criteria

Scenario: Pre-filter tools based on user prompt before invoking LLM
  Given a user prompt is received
  When delegated search is applied against tool descriptions by the MCP Server
  Then only the top-k or similarity-threshold matching tools are provided to the MCP Client

Scenario: Avoid irrelevant tools
  Given an unrelated tool in the registry
  When the prompt has no connection to it
  Then that tool is excluded from the filtered list
flowchart TD

   A[Prompt from End User] --> B[MCP Client: Check if MCP Server Supports Delegated Tool Search] 
   B --> C{Delegated Search Supported?}
   C -- Yes --> D[MCP Client Sends Prompt to MCP Server]

   subgraph MCP Server
       E[Filter Tools] --> F[Return Filtered Tool List]
   end
   D --> E
   C -- No --> G[MCP Client Uses Full Tool List or Local Filtering]
   F --> H[MCP Client Receives List of Tools]
   G --> H
   H --> I[MCP Client Invokes LLM with Tools as usual - filtered or not]

Intent

My intention is not to prescribe or dictate a solution. Rather, it’s to:

  • Guide and recommend an approach
  • Provide the means to standardize and improve existing workflows
  • Encourage better LLM behavior and quality while optimizing resource usage

@qdrddr

qdrddr commented Jul 31, 2025

Have you considered where completions fit in related to this?

If we were to decide the concept of "search" for tools is needed, my default expectation would be that "completions" get extended to support Tools as well (they already support Resources and Prompts).

I do worry that the notion of "search" on tools is a major paradigm shift. Because tool invocations are model controlled, when should the model decide to search on tools as opposed to just consider using an available tool directly?

A server could theoretically solve this problem by exposing its own search_for_tools tool, and then dynamically update its exposed tools as a result; no dedicated "tool search" concept needed if we think of "search" as just another specific type of Tool.

I’d like to offer a perspective on the idea of using an MCP Server with a search_for_tool tool without implementing search capability into MCP Protocol.

The challenge here is that this adds extra layers of decision-making and retries for the LLM. It has to:

  1. Guess whether it needs a tool
  2. Guess whether it should search for one
  3. Perform the search
  4. Then choose a tool to invoke—assuming one is even found

All of this happens without the model knowing what tools are actually available ahead of time. This opens the door to hallucinations, unnecessary retries, and ultimately wasted tokens and higher costs.

It also creates a deeper issue: a chicken-and-egg problem.
How can the model decide to search for a tool if it doesn’t know what tools exist in the first place?

While exposing search_for_tools as just another tool might sound elegant, in practice it puts the burden on the model to reason in a blind spot—and that's both unreliable and inefficient.

@qdrddr

qdrddr commented Aug 1, 2025

Let me restate my point, since the idea of a search_for_tool tool invoked by the LLM (basically an MCP Server that acts as a Proxy/Gateway, even with built-in semantic search, in front of other MCP Servers with an index of all the tools) still seems to be under discussion, and it is still problematic:

If we give the LLM a tool to search for other tools, we run into a few key issues:

  1. The model first has to decide whether it even needs to search for tools.
  2. Then, if it decides to search, it somehow needs to know what it’s looking for—before it has seen the available tools.

Now ask yourself:
How can the model answer either of those questions without already having the full list of tools?
It’s a classic chicken-and-egg problem.

The only way this would work reliably is if the entire tool list was already baked into the model’s dataset during training—which doesn’t help if:

  • The list changes over time (and it will)
  • A tool it needs simply isn’t available to use (and we can't control that)

Here is a simplified example: you ask "What's 1+1?".

  1. The LLM could have produced an answer without invoking a calculator tool, but with search_for_tool staring it in the eye, it will always tend to try to search for a calc tool.
  2. Okay, now it decides to search for a tool, and may even guess to search for a tool named "calc", but the server simply doesn't have it.
    Now what was the point in doing all of that?

Sure, in some cases it still might be worth trying a search_for_tools step “just in case.”
But in most scenarios, it's more reliable and efficient to pre-filter the list of available tools BEFORE the LLM is invoked (with MCP RAG or potentially some other searching mechanism). And I believe RAG is a commodity now: you can have a small implementation for 1,000 tools running on your computer locally (assuming a serverless embedding model and an internet connection), so this can be feasible for personal use and for the enterprise.

The key trick here is to invoke this search BEFORE LLM.

This way, the LLM makes informed decisions based on a relevant, narrowed-down list—without guessing, wasting tokens or extra steps.
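As a back-of-the-envelope sketch of that pre-LLM filtering step (embed() here is an assumed embedding call, not a specific API, and the threshold/top-k values are arbitrary):

// Pre-filter tools by semantic similarity to the user prompt BEFORE the LLM sees them.
async function preFilterTools(
  embed: (text: string) => Promise<number[]>,                 // assumed embedding function
  tools: Array<{ name: string; description?: string }>,
  userPrompt: string,
  threshold = 0.3,
  topK = 10,
) {
  const queryVec = await embed(userPrompt);
  const scored = await Promise.all(
    tools.map(async (t) => ({
      tool: t,
      score: cosine(queryVec, await embed(`${t.name}: ${t.description ?? ""}`)),
    })),
  );
  return scored
    .filter((s) => s.score >= threshold)                      // may legitimately be empty
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.tool);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}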

But let me be clear and keep this separate: I do support the search mechanism described in this PR being built into the MCP Protocol Spec. And on top of that, I like the idea of a search-delegation capability.

@qdrddr

qdrddr commented Aug 1, 2025

Before anyone raises (again) the (valid) point that we shouldn't dictate a specific implementation—and that developers should be free to build their own MCP Clients however they see fit—I want to clarify that I agree with that principle. Flexibility and creativity can bring valuable and welcome invention.

That said, as several folks have pointed out here, we now have many different MCP implementations that are not interoperable with each other's MCP marketplace implementations. So clearly, there's room for lightweight standardization—something that enables collaboration without limiting innovation.

One concrete area where an MCP spec improvement could help is in tool discovery and filtering.
I personally really like the delegated search approach—where an MCP client can offload tool searching to an MCP server (or some smart Proxy/Gateway) or even run the same logic air-gapped locally if needed (embedding models are less resource-hungry than LLMs, after all, and if a person is able to run LLMs locally, they are certainly capable of running an embedding model too). It offers a good balance between control and flexibility: small and large, local and internet-connected, for personal and enterprise use cases.

But for delegation to work in practice, the Client-Server communication needs a way to perform the search—and that’s where MCP specification, not implementation, comes in.

Here’s what the MCP specification could define, without enforcing how it’s built:

  1. A way for an MCP Client to initiate a tool search as part of the MCP Protocol Specification (not yet another MCP Server with a search_for_tools tool)
  2. An optional capability for an MCP Server (or proxy/gateway) to support delegated search and advertise this ability to the Client
  3. A way for the MCP Client to detect if search delegation is supported and use it when available
  4. And, as an additional extension on top of that, the user of the MCP Client (potentially via a UI or config setting) may even go further and elect to allow performing such (advanced) delegated tool search BEFORE each LLM invocation (possibly with RAG/semantic search on the server side, but not limited to that), improving both performance and quality. The MCP Client developer may even decide to build this into their implementation, combining the capabilities of an MCP Client and a smart MCP Proxy/Gateway.

This isn't about locking anyone into a specific design, and there's going to be plenty of room for interpretation, implementation, and innovation — it's about establishing optional but clearly defined behaviors that let different MCP Clients and servers work together more easily and benefit from improved quality and decreased token/money spending.

Let me know your thoughts. Open to improving this idea with community input.

Something like this:

flowchart TD

    A[Developer Builds Custom MCP Client] --> B[Client Checks Server Capabilities]
    B --> C{Delegated Search Supported?}
    C -- No --> D[Client Uses Local or Static Tool List]
    C -- Yes --> J

    subgraph Smart Proxy / Gateway
        G[Perform Semantic / RAG Tool Search]
        H[Return Filtered Tool List to Client]
    end

    G --> H

    H --> I[MCP Client Injects List of Tools to LLM]
    I --> N[LLM]
    D --> I

    subgraph Optional Enhancements
        F[Advertise Support for Delegated Search]
        J[MCP Client Initiates Tool Search Request]
        J --> K{Is User Enabled Pre-LLM Tool Search on the Client via UI or Config?}
        K -- YES --> M[Search Before LLM Using User Prompt]
        M --> G
        K -- NO --> L
    end

    L[LLM decides to search] --> N

@isbee

isbee commented Aug 1, 2025

  • Is search done in host applications, or is it done on servers? Do we want to take on the overhead of building a search index on every client device, or do we want to offload it to servers to optimize the client-side user experience?
    • If search is done in host applications, every host app developer needs to set up indexing and embeddings on session initialization, and make a decision about if that overhead makes sense for their target users or not (and if it doesn't, they might just be out of luck).
    • If search is offloaded to servers, you solve those problems, but are limited to only searching one server at a time.

The 10k tools problem is not a problem that everyone faces. It's a problem faced by some users and enterprises with sufficient scale. For them, implementing indexing on the host side seems to make sense.

I agree that what we're discussing is a trade-off. I think people will have different preferences. That's why I'm not 100% convinced about including this RFC in the spec. The area this proposal solves is narrow, and it's not impossible to address with the existing spec.

If you think search_for_tool is not great, how about agent as a tool? (Beyond tool search: an actual high-level task.) This is also possible with the current spec and can hide 10k tools (e.g. expose only the agent tool). As we all know, many people are already trying multi-agent architectures. Each agent performs tasks with independent context (or with some shared context), and if there's another agent suitable for handling a specific task, they delegate to it. Agent as a tool takes natural language (a query) as input, like tools/search, but its output might be preferable in the long run (a sketch follows the bullets below).

  • If agent as a tool works well, I see the possibility that A can delegate task T to B and perform task T' asynchronously. (This might be tricky for tools/search or search_for_tool, since A needs to actively handle the search-observe-plan cycle.)
  • Someone might think agent as a tool doesn't perform well and is expensive, and that might actually be the case. Conversely, someone might not prefer RAG (tools/search) either. Again, this is a trade-off. (Then wouldn't it be better to go with approaches that work with the existing spec? 🤔)
  • Trade-offs aside, MCP servers with agent as a tool might be difficult for regular users to self-host (e.g. lack of VRAM, need to use commercial APIs, etc.). But the 10k tools problem mainly occurs at enterprises, and I hope they would be okay with choosing this approach.
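For concreteness, "agent as a tool" here would just be a single exposed tool whose input is a high-level task in natural language; a possible tool definition (illustrative names only):

// One coarse-grained "agent" tool hides the 10k underlying tools behind it.
const agentTool = {
  name: "delegate_task",                       // illustrative name only
  description:
    "Delegate a high-level task to the server-side agent, which selects and " +
    "orchestrates its own tools and returns the outcome.",
  inputSchema: {
    type: "object",
    properties: {
      task: { type: "string", description: "Natural-language description of the task" },
    },
    required: ["task"],
  },
};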

I'd like to hear your opinions!

@qdrddr

qdrddr commented Aug 1, 2025

@isbee Just trying to better understand your thinking—how do you envision the agent-as-a-tool approach working?

  1. Are you imagining that the agent receives the original user query and is responsible for handling the entire task end-to-end?
  2. Or does the agent receive the original user query and get used only for searching and returning tools, with the agent-as-a-tool invoked before the main LLM in the MCP Client?
  3. Or is the agent intended to be used only for searching and returning tools, and invoked by the LLM in the MCP Client?

If it's the first or the second case, I can definitely see the value—especially since things like RAG or tool discovery could be handled internally by an agent as part of a larger reasoning flow. In my mind this would be very similar to the search mechanism proposed in this PR, just with a fancier name.

But if it's the last case—where the agent's only role is to help search for tools and the decision to run agent-as-a-tool is made by the LLM—then it seems functionally very similar to the search_for_tool tool. How is it different from the search_for_tool tool? In that case, I think we'd still face exactly the same core issues of the chicken-and-egg problem I mentioned earlier: extra guessing steps, potential hallucinations, and inefficient retries before you even get to the agent-as-a-tool. This makes me think that a dedicated search mechanism is still necessary and should be able to run before the MCP Client calls the LLM. Whether we call it just search, delegated search, or agent-as-a-tool, it is still fundamentally a search and is still needed.

The key difference here is when the agent-as-a-tool (or search) is triggered: before the LLM, or with the LLM?

Happy to be wrong though—just want to make sure we're aligning on use case and expectations.

10k tools problem is not a problem that everyone faces. It's a problem faced by some users and enterprises with sufficient scale. For them, implementing indexing on the host side seems to make sense.

I’d like to gently push back on the idea that the “10K tools” problem isn’t relevant for personal use cases.

While most individuals won't hit 10,000 tools, the issue actually surfaces much earlier. For example, when working in environments like VS Code with Copilot (or any other, really), I regularly run into the 128-tool limit issue. It forces me to manually toggle servers and tools depending on the task at hand. So even in solo workflows, context bloat and degraded performance (also known as Context Rot) become real pain points pretty quickly.

It's not just about scale—it's about how too many tools clutter the context window, confuse the model, and increase costs unnecessarily. And currently the bar to start seeing the problem is quite low. In my experience, even with SotA models, 50 tools is already too many; assuming each MCP Server has on average 10 tools, this equates to 5 MCP Servers. I need much more than that for personal use, and I believe the same applies to every single one of us.

A search mechanism that runs before the LLM invocation on the MCP Client side—even a lightweight local or remote RAG (or something similar)—could address this cleanly for both personal and enterprise setups. What we're missing isn't an implementation, it's a common spec with a search-before-LLM mechanism.

@qdrddr

qdrddr commented Aug 6, 2025

Need to better understand how this aligns with #142

@qdrddr

qdrddr commented Aug 19, 2025

The MCP-use lib has implemented a semantic tool search mechanism in its MCP Client:
https://github.com/mcp-use/mcp-use

@dsp-ant dsp-ant requested a review from a team September 23, 2025 21:10
@dsp-ant
Member

dsp-ant commented Nov 20, 2025

I think this discussion has been very valuable. Between Search and Filtering, there is a general concern about if and how we would handle server-side selection. Given how big of a change this is, I think it might be best suited as an extension for now, so we can see how and whether people would use it. For the core protocol it's too much of a change, while the need for server-side selection is not fully clear to me.

I understand this is frustrating, particularly since a few people here have done a lot of work in this space.

For now I'm closing this, and we'll wait for either an extension proposal or a formal SEP.

@dsp-ant dsp-ant closed this Nov 20, 2025
@thisisfixer

Curious how this will impact the potential tool search feature on the MCP side:
https://www.anthropic.com/engineering/advanced-tool-use

@maxious

maxious commented Nov 24, 2025

Yeah, it seems a bit odd to be saying "the need for server side selection is not fully clear" when the docs outline the case for server selection including specifically for MCP https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool#mcp-integration

@qdrddr

qdrddr commented Nov 25, 2025

I think there should be at least some feedback collected from the community before simply closing this proposal.

Federated/remote search is one of a few things that could've kept MCP afloat in 2026. This is simply a must-have feature for MCP enterprise adoption; without it, enterprise context management (an MCP proxy) is a joke.
