[RFC] Search #322
Conversation
@justinwilaby first draft

@LucaButBoring amazing comments, thanks for taking the time bud, just about to push up tweaks
… cleared up caching client expectations, added resource indexing implementation directive
…fication into search
@LucaButBoring updates have been pushed, any other thoughts?

- Resources **SHOULD** be deeply indexed (i.e. the content itself is indexed, not just the name and description)

2. Clients **SHOULD**:
- Use batching if it's not clear whether a tool, resource, or prompt is appropriate for the situation
It'd be interesting if this could somehow be leveraged on the server side to do a single large search rather than three separate ones, though that might be a tricky optimization. I suppose the server could hypothetically have a reactive stream consuming and batching client search requests to handle in a single query, maybe 🤔
Not a criticism, and I don't have concrete suggestions about it; it's just something that stood out to me when I re-read this line, since the complexity trade-off is questionable. Realistically, with a "sufficiently-fast" index it shouldn't make much of a difference anyway, since this is an edge case already.
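For what it's worth, a server could approximate that micro-batching idea with something like the sketch below; the `runCombinedQuery` helper and the timing window are assumptions for illustration, not anything in the proposal:

```typescript
// Hypothetical micro-batcher: collect search requests arriving within a
// short window and serve them all from one combined index query.
type SearchRequest = { kind: "tool" | "resource" | "prompt"; query: string };
type SearchResult = { items: unknown[] };

class SearchBatcher {
  private pending: Array<{ req: SearchRequest; resolve: (r: SearchResult) => void }> = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    // Assumed to run one large query covering every batched request.
    private runCombinedQuery: (reqs: SearchRequest[]) => Promise<SearchResult[]>,
    private windowMs = 10,
  ) {}

  search(req: SearchRequest): Promise<SearchResult> {
    return new Promise((resolve) => {
      this.pending.push({ req, resolve });
      // Start the batching window on the first request in a burst.
      this.timer ??= setTimeout(() => this.flush(), this.windowMs);
    });
  }

  private async flush(): Promise<void> {
    const batch = this.pending;
    this.pending = [];
    this.timer = null;
    const results = await this.runCombinedQuery(batch.map((b) => b.req));
    batch.forEach((b, i) => b.resolve(results[i]));
  }
}
```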
Ya, I actually originally started going the direction of a single search/query endpoint - the BIGGEST issue is you end up in a situation where implementors have to make a lot more implementation decisions about how you interleave results, and the consuming LLM would, I think, get more confused. So, the options are:
1. Single return with sections:

```
{
  toolResults: {
    items: [],
    nextCursor: string
  },
  promptResults: {
    ...
  },
  resourceResults: {
    ...
  }
}
```
That strikes me as too confusing for the LLM to really process today; maybe that gets revisited a few years from now.
2. A single result list, with items intermingled. This is somewhat doable with good vector lookups, but let's say the LLM knows it wants a tool: it has to craft an input that includes a filter, and suddenly we're back to quite a bit of client complexity.
I'm actually open to all three options here, but when I really thought through all this, it just seemed like separate endpoints were the best for the LLM, easiest for the implementors, and most understandable all around.
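For illustration, here's a minimal sketch of the separate-endpoint shape being described; the method name and fields are assumptions based on this thread, not a final schema:

```typescript
// Hypothetical request/result shapes for a dedicated tools/search endpoint;
// prompts/search and resources/search would mirror the same shape.
interface ToolsSearchRequest {
  method: "tools/search";
  params: {
    query: string;   // free-text query, typically crafted by the model
    cursor?: string; // pagination cursor, as in tools/list
  };
}

interface ToolsSearchResult {
  tools: { name: string; description?: string; inputSchema: object }[];
  nextCursor?: string; // present when more results are available
}
```

Keeping each capability on its own endpoint avoids the interleaving and filtering decisions described above.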
But, great feedback
LGTM, left some thoughts on the batching note but beyond that I think this is more than ready for other eyes on it 👀
tadasant left a comment
Have you considered where completions fit in related to this?
If we were to decide the concept of "search" for tools is needed, my default expectation would be that "completions" get extended to support Tools as well (they already support Resources and Prompts).
I do worry that the notion of "search" on tools is a major paradigm shift. Because tool invocations are model controlled, when should the model decide to search on tools as opposed to just considering an available tool directly?
A server could theoretically solve this problem by exposing its own search_for_tools tool, and then dynamically update its exposed tools as a result; no dedicated "tool search" concept needed if we think of "search" as just another specific type of Tool.
I think this is quite different than completions, mostly because of who is consuming it. The result of completions is, to my understanding, meant to be consumed by the user; the result of a list call is 100% meant to be consumed by the model. Regarding client interaction, I would think that most clients would actually change their implementation so that IF there's a search capability available, they prefer search over a full listing of tools, purely for performance reasons. Alternately, if the client starts listing tools and gets into the 100s or 1000s, or experiences a timeout, or can detect a context overflow, it would move to search.

Basically, the way the listing feature is designed today actually causes a lot of issues for clients on startup time, on caching requirements, on context, etc. Search is a way to short circuit that.
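A rough sketch of that client-side preference, assuming a hypothetical search capability flag and helper functions (all names illustrative, not from the spec):

```typescript
interface Tool { name: string; description?: string }

// Assumed shape of the server's advertised capabilities.
interface ServerCapabilities {
  tools?: { search?: boolean }; // hypothetical capability flag
}

// Prefer search when the server advertises it; otherwise crawl the full list.
async function getCandidateTools(
  caps: ServerCapabilities,
  query: string,
  searchTools: (q: string) => Promise<Tool[]>, // assumed helper wrapping tools/search
  listAllTools: () => Promise<Tool[]>,         // paginates through tools/list
): Promise<Tool[]> {
  if (caps.tools?.search) {
    // Search capability present: skip the full listing entirely.
    return searchTools(query);
  }
  // Legacy path: full crawl, with whatever caching the client already does.
  return listAllTools();
}
```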
The way I've been thinking about this at least (following discussion with @patwhite and others in #204) is that the model would always search if search is supported by both the client and the server. Given that search would be most relevant to servers with many tools, this shouldn't lead to a meaningful fan-out problem unless small servers started implementing search unnecessarily. However, it would require client applications to implement considerable changes to tool loading to support this. I reasoned through an example flow here: #204 (comment)
Ah sorry I hadn't yet read #204.
If completions is too narrowly defined, my thought would be that the way to go here would be to refactor Completions to instead be Search. I don't really see a reason to have a notion of Completions for Prompts or Resources if we also have Search.
Thanks for sharing - very helpful. I suppose it would be reasonable to let client hosts make their own decisions here, e.g. if the tools list is very long, engage the search-then-choose flow you laid out; or if they know their users are using some model with a massive context window, don't bother with search, etc. Similar to how hosts today are potentially showing the full list of Resources in some in-app contexts, and in other contexts they lean on Completions to surface relevant Resources. All to say, if we were to merge the notion of Completions and Search (and I think Search would be a better name for it), I think that makes this a significantly more minor change that'd be easier to push forward.

Love it - so, think it would make sense to drop some notion of completions deeper into the call stack (i.e. tools/completion)? Or, possibly structure this as completions/tools instead of tools/search?
The current specification supports this workflow:
On that basis I am uncertain what this PR adds.
@evalstate - This is in fact the current workaround, and was a topic of discussion when Pat and I collaborated on this. IMO - there is a distinct benefit to codifying tool search versus relying on idiom or convention.
As you can tell, relying on convention makes the above difficult. Codifying this in the spec opens up a greater degree of flexibility.
@evalstate to add to what @justinwilaby said - there's a bootstrapping problem. The use case here is an MCP server with 1,000 tools (or heck, 100 tools but with a model with limited context). In order to find the tool you mention, you'd have to crawl all tools. Having a standard capability for search fundamentally changes the initialization protocol - it basically means connecting clients no longer have to do a full list of all tools; they can just connect, search for the tool they need, then start using it. I also imagine this more for enterprise use cases where you really only connect to a single proxy rather than adding 100s of servers, and that proxy keeps some sort of index of upstream servers. And finally, even in the use case where you have 100 MCP servers you're connected to, you still have that crawling problem: the LLM has to crawl and keep up to date with 100s of servers, rather than just running a fan-out search when it needs a tool. But, great question on this!
OK - one more question please... from your description, this interaction describes direct Host->Client->Server communication with no LLM intermediary? The issue here is that there are an overwhelming number of tools, and we want the Human to be able to find and select them for presentation to the LLM. Is that it?
I'd like to add a few cents on this discussion.
I'd also like to chime in here. The 10k tools problem is a context engineering problem. I think this is better handled as implementation details rather than changes to the MCP spec itself. @SecretiveShell explained this really well. From what I understand, MCP is designed to focus on "seamless integration" per the spec, with engineering optimizations being something each implementation handles on its own. Like @tadasant pointed out, we can already tackle this with
Also, this draft doesn't address what @briancripe mentioned about tool search indexes across multiple MCP servers on the host side. It only tackles the one giant MCP server problem.
@qdrddr's comment captures the problems with that pretty well, I think:
I also want to emphasize that if you want to search every turn and avoid LLM-related errors, you need to build an extension point into the client-side conversation management execution flow, like I do in my client implementation example. Note that instead of deterministically listing tools at the start of the session, I deterministically search on each turn.
This is a valid point, and comes down to tradeoffs between implementing search indexes on the client versus the server. My position on this has always been that you would search each server separately, and only call ListTools on servers that don't support search. You would then concatenate those lists (possibly with a client-side reranking step?) to build the tool list that you send to the model; see the sketch after this comment. If you do host-side indexing, you need to bootstrap a tool index, potentially with embeddings, metadata, etc. at the beginning of a session. That likely requires embedding all tools and indexing them, which can take a few minutes or more, depending on how many tools you have. That's a valid option, but I also think it involves its own tradeoffs in terms of UX. Due to the flexibility of the protocol, there is no one approach that will work for every application in general. Finally, coming back to the most important point:
Taken together, we have a couple of meaningful tradeoffs to navigate in implementing search today:
This proposal fills the gap in the status quo for host applications that cannot feasibly set up an indexing pipeline on every single client, and want to bolt search into the conversation manager for reliability. As far as I'm aware, there is no option for doing both of those things simultaneously today.
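A rough sketch of the per-server flow described above, with assumed `supportsSearch`, `search`, and `listTools` methods standing in for the real client plumbing:

```typescript
interface Tool { name: string; description?: string }

// Assumed per-server handle; method names are illustrative.
interface ServerHandle {
  supportsSearch(): boolean;              // did the server advertise search?
  search(query: string): Promise<Tool[]>;
  listTools(): Promise<Tool[]>;           // full (paginated) listing fallback
}

// Each turn: search servers that support it, list the rest, then
// concatenate (optionally reranking client-side) before calling the model.
async function toolsForTurn(servers: ServerHandle[], query: string): Promise<Tool[]> {
  const perServer = await Promise.all(
    servers.map((s) => (s.supportsSearch() ? s.search(query) : s.listTools())),
  );
  return rerank(perServer.flat(), query);
}

// Placeholder reranker: a real one might use embeddings or lexical scores.
function rerank(tools: Tool[], _query: string): Tool[] {
  return tools;
}
```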
Bringing this up as other background conversations have been going on - namespaces (formerly #334, now #993) are looking like they'll evolve to support filtering in |
✍️ Proposal: Delegated/Federated Advanced Tool Search
I like the current direction and would like to build on it by proposing a Delegated Advanced Tool Search mechanism.
Motivation, Pitch
🤔 Problem Overview: Context Rot
The core issue we're addressing is commonly referred to as Context Rot/Bloat.
🎯 Goal of Delegated Advanced Tool Filtering
The goal of filtering should be framed as a strategy to:
This approach ensures that each user request always returns either:
The LLM then decides whether to use any of these pre-filtered tools, without needing to guess what is available.
📌 Proposed Behavior
🧩 Example: User Stories & Acceptance Criteria

🙋‍♂️ User Story 1
As a: end user using an MCP Client

✅ Acceptance Criteria

```gherkin
Scenario: Pre-filter tools based on user prompt before invoking LLM
Given a user prompt is received
When delegated search is applied against tool descriptions by the MCP Server
Then only the top-k or similarity-threshold matching tools are provided to the MCP Client

Scenario: Avoid irrelevant tools
Given an unrelated tool in the registry
When the prompt has no connection to it
Then that tool is excluded from the filtered list
```

```mermaid
flowchart TD
    A[Prompt from End User] --> B["MCP Client: Check if MCP Server Supports Delegated Tool Search"]
    B --> C{Delegated Search Supported?}
    C -- Yes --> D[MCP Client Sends Prompt to MCP Server]
    subgraph "MCP Server"
        E[Filter Tools] --> F[Return Filtered Tool List]
    end
    D --> E
    C -- No --> G[MCP Client Uses Full Tool List or Local Filtering]
    F --> H[MCP Client Receives List of Tools]
    G --> H
    H --> I["MCP Client Invokes LLM with Tools as usual - filtered or not"]
```
Intent
My intention is not to prescribe or dictate a solution. Rather, it's to:
I'd like to offer a perspective on the idea of using an MCP Server with a search_for_tools tool. The challenge here is that this adds extra layers of decision-making and retries for the LLM. It has to:

All of this happens without the model knowing what tools are actually available ahead of time. This opens the door to hallucinations, unnecessary retries, and ultimately wasted tokens and higher costs. It also creates a deeper issue: a chicken-and-egg problem. While exposing
Let me restate my point, since it seems like the idea of a search_for_tools tool keeps coming up. If we give the LLM a tool to search for other tools, we run into a few key issues:

Now ask yourself: The only way this would work reliably is if the entire tool list was already baked into the model's dataset during training—which doesn't help if:

Here is a simplified example: you asked "What's 1+1?".
Sure, in some cases it still might be worth trying a search_for_tools step "just in case." The key trick here is to invoke this search BEFORE the LLM. This way, the LLM makes informed decisions based on a relevant, narrowed-down list—without guessing, wasted tokens, or extra steps. But let me be clear and keep it separate: I do support the search mechanism described in this PR being built into the MCP Protocol Spec. And on top of that, I like the idea of a capability for search delegation.
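A minimal sketch of that "search before the LLM" ordering inside a client's conversation manager; the searchTools and callModel helpers are hypothetical stand-ins:

```typescript
interface Tool { name: string; description?: string }

// Hypothetical turn handler: narrow the tool list from the user prompt
// first, then invoke the model with only the pre-filtered tools.
async function handleTurn(
  userPrompt: string,
  searchTools: (query: string) => Promise<Tool[]>,               // delegated server-side search
  callModel: (prompt: string, tools: Tool[]) => Promise<string>, // LLM invocation
): Promise<string> {
  // 1. Search runs deterministically, before the model sees anything.
  const relevantTools = await searchTools(userPrompt);
  // 2. The model only ever sees the narrowed-down list, so it never has
  //    to guess at what might be available.
  return callModel(userPrompt, relevantTools);
}
```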
Before anyone raises (again) the (valid) point that we shouldn't dictate a specific implementation—and that developers should be free to build their own MCP Clients however they see fit—I want to clarify that I agree with that principle. Flexibility and creativity can bring valuable and welcome invention.

That said, as several folks have pointed out here, we now have many different MCP implementations that are not interoperable with each other's MCP marketplace implementations. So clearly, there's room for lightweight standardization—something that enables collaboration without limiting innovation.

One concrete area where an MCP spec improvement could help is in tool discovery and filtering. But for delegation to work in practice, the Client-Server communication needs a way to perform the search—and that's where the MCP specification, not implementation, comes in. Here's what the MCP specification could define, without enforcing how it's built:
This isn't about locking anyone into a specific design, and there's going to be plenty of room for interpretation, implementation, and innovation—it's about establishing optional but clearly defined behaviors that let different MCP Clients and servers work together more easily and benefit from improved quality and decreased token/money spending. Let me know your thoughts. Open to improving this idea with community input. Something like this:

```mermaid
flowchart TD
    A[Developer Builds Custom MCP Client] --> B[Client Checks Server Capabilities]
    B --> C{Delegated Search Supported?}
    C -- No --> D[Client Uses Local or Static Tool List]
    C -- Yes --> J
    subgraph "Smart Proxy / Gateway"
        G["Perform Semantic / RAG Tool Search"]
        H[Return Filtered Tool List to Client]
    end
    G --> H
    H --> I[MCP Client Injects List of Tools to LLM]
    I --> N[LLM]
    D --> I
    subgraph "Optional Enhancements"
        F[Advertise Support for Delegated Search]
        J[MCP Client Initiates Tool Search Request]
        J --> K{Has the User Enabled Pre-LLM Tool Search on the Client via UI or Config?}
        K -- YES --> M[Search Before LLM Using User Prompt]
        M --> G
        K -- NO --> L
    end
    L[LLM decides to search] --> N
```
The 10k tools problem is not a problem that everyone faces. It's a problem faced by some users and enterprises with sufficient scale. For them, implementing indexing on the host side seems to make sense. I agree that what we're discussing is a trade-off, and I think people will have different preferences. That's why I'm not 100% convinced about including this RFC in the spec. The area this proposal solves is narrow, and it's not impossible to address with the existing spec. If you guys think

I'd like to hear your opinions!
@isbee Just trying to better understand your thinking—how do you envision the agent-as-a-tool approach working?
If it's the first or the second case, I can definitely see the value—especially since things like RAG or tool discovery could be handled internally by an agent as part of a larger reasoning flow. In my mind this would be very similar to the search mechanism proposed in this PR, just with a fancy name. But if it's the last case—where the agent's only role is to help search for tools and the decision to run agent-as-a-tool is made by the LLM—then it seems functionally very similar to the search_for_tools approach discussed above. The key difference here is when the agent-as-a-tool (or search) is triggered: before the LLM, or with the LLM? Happy to be wrong though—just want to make sure we're aligning on use case and expectations.

I'd like to gently push back on the idea that the "10K tools" problem isn't relevant for personal use cases. While most individuals won't hit 10,000 tools, the issue actually surfaces much earlier. For example, when working in environments like VS Code with Copilot (or any other, really), I regularly run into the 128-tool limit issue. It forces me to manually toggle servers and tools depending on the task at hand. So even in solo workflows, context bloat and degraded performance (also known as Context Rot) become real pain points pretty quickly. It's not just about scale—it's about how too many tools clutter the context window, confuse the model, and increase costs unnecessarily. And currently the bar is quite low to start seeing the problem. In my experience, even with SotA models, 50 tools is already too many; assuming each MCP Server has on average 10 tools, this equates to 5 MCP Servers. I need much more than that for personal use, and I believe the same applies to every single one of us. A search mechanism that runs before the LLM invocation on the MCP Client side—even a lightweight local or remote RAG (or something similar)—could address this cleanly for both personal and enterprise setups. What we're missing isn't an implementation, it's a common spec with a search-before-LLM mechanism.
Need to better understand how this aligns with #142
The MCP-use lib implemented a tool semantic search mechanism in its MCP Client
I think this discussion has been very valuable. Between Search and Filtering, there is a general concern about whether and how we would handle server-side selection. Given how big of a change this is, I think it might be best suited for an extension for now, so we can see how and whether people would use it. For the core protocol it's too much of a change, while the need for server-side selection is not fully clear to me. I understand this is frustrating, particularly since a few people here have done a lot of work in this space. For now I'll close this, and we'll wait for either an extension proposal or a formal SEP.
Curious how this will impact the potential tool search feature on the MCP side.
Yeah, it seems a bit odd to be saying "the need for server side selection is not fully clear" when the docs outline the case for server selection, including specifically for MCP: https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool#mcp-integration
I think there should be at least some feedback collected from the community before simply closing this proposal. Federated/remote search is one of a few things that could've kept MCP afloat in 2026. This is simply a must-have feature for MCP enterprise adoption; without it, enterprise context management (MCP proxy) is a joke.
Motivation and Context
This proposal, along with #334, is an effort to make the MCP protocol slightly more agnostic towards whether MCP servers should be small, single-use, domain-constrained servers, or larger, multi-purpose servers. At a high level, the various list methods suffer from the issue that a) they may be unbounded in terms of length, leading to clients making decisions about how many tools to list, and b) even shorter tool lists (in the range of 100) can produce token counts in the 50-100k range. That increases cost as well as potentially decreases the accuracy of the LLM in selecting a tool (when presented with a large list).
This is a proposal for an OPTIONAL capability for larger servers to expose search endpoints on tools, resources, resource templates, prompts, and potentially other capabilities coming down the pike. It proposes a recommended flow for clients to prefer search over listing tools, and to present LLMs with an initial tool set that starts with search and escalates to tool calls.
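For illustration, a hypothetical tools/search exchange under this capability might look like the sketch below; the field names are assumptions for the example, not a normative schema:

```typescript
// Hypothetical tools/search request and result, shown as object literals
// mirroring the JSON-RPC wire format.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/search",
  params: {
    query: "create a calendar event", // free-text query from the model
  },
};

const result = {
  tools: [
    {
      name: "calendar_create_event", // illustrative tool, not from the RFC
      description: "Create an event on the user's calendar",
      inputSchema: { type: "object" },
    },
  ],
  nextCursor: undefined, // set when more results are available, as in tools/list
};
```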
There are native ways to support search using progressive tool disclosure. This is an effort to standardize general efforts around search and create a situation where any generic, well-implemented client can connect to any generic, well-implemented server and achieve 100% success executing a search. Please see the notes below for other options and why this is being proposed as a good solution.
How Has This Been Tested?
Presently built into ScaledMCP in experimental mode.
Breaking Changes
This is an opt-in capability, and is thus fully backwards compatible.
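As a sketch of what opting in might look like during initialization; the search flags below are assumptions for illustration, not the proposed schema:

```typescript
// Hypothetical server capability advertisement in the initialize result.
// A client that doesn't recognize the flag simply ignores it, which is
// what keeps the capability fully backwards compatible.
const initializeResult = {
  capabilities: {
    tools: {
      listChanged: true, // existing capability
      search: true,      // assumed opt-in flag for tools/search
    },
    resources: { search: true }, // assumed, for resources/search
    prompts: { search: true },   // assumed, for prompts/search
  },
};
```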
Types of changes
Checklist
Additional context
The other method of implementing search is to expose a tool "search_tools" and, upon completion of a call to that tool, expand the tool list and send a tools_changed notification to the client. There are several issues with this, first and foremost that it requires a client / prompt that primes the model for a multi-turn process. This proposal also includes an update to the prompt; however, this has a single implementation rather than an open-ended, bespoke solution. So, given the goal of a well-built client being able to negotiate a search with 100% success with a server, a specialized tool call seemed more likely to succeed over time than a dance.

This proposal COULD be matured to define a search dance; however, there are a number of complex server-side issues with progressive tool disclosure (for instance, requiring servers to be stateful, which means serverless implementations are much harder to build). So, this is a good compromise.