[feat] Introduce elicitation as new client capability #382
Conversation
I'd very much want to use this feature to request sensitive information from the user, so a "sensitive" boolean on the request would be necessary. Would there be any issue with that?
ihrpr left a comment
Thank you for working on this!
The current ElicitationRequest using just a text message is quite limited. We need to support structured input schemas so tools can request multiple pieces of information in a form-like manner.
We need to support common use cases like:
- Requesting multiple fields
- Type-specific inputs (numbers, dates, booleans)
- Validation requirements
- Optional vs required fields
Suggested changes
Add a requestedSchema field:
```typescript
export interface ElicitRequest extends Request {
  method: "elicitation/create";
  params: {
    message: string;
    requestedSchema?: JSONSchema; // Describes expected response structure
  };
}

export interface ElicitResult extends Result {
  content: unknown; // Validated against requestedSchema
}
```
Benefits
- Better UX - Clients can generate proper form UIs
- Type safety - Responses validated against schema
- Backward compatible - Works without schema for simple text
Example
```json
{
  "method": "elicitation/create",
  "params": {
    "message": "Some additional details needed to configure a reminder",
    "requestedSchema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" },
        "date": { "type": "string", "format": "date" },
        "time": { "type": "string", "format": "time" },
        "priority": {
          "type": "string",
          "enum": ["high", "medium", "low"]
        }
      },
      "required": ["title", "date"]
    }
  }
}
```
If the user cancels, the client should return a standard JSON-RPC error response instead of a successful result with a cancelled flag.
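For illustration only, a minimal sketch of what such a cancellation error might look like on the wire; the error code is an arbitrary placeholder, not a value defined by the spec or proposed in this thread.

```typescript
// Hypothetical wire-level sketch of the suggested cancellation behavior:
// the client answers the elicitation request with a JSON-RPC error rather
// than a successful result carrying a "cancelled" flag.
const cancelledResponse = {
  jsonrpc: "2.0",
  id: 1,
  error: {
    code: -32800, // placeholder "user cancelled" code (assumption, not normative)
    message: "User cancelled the elicitation request",
  },
};
```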
Could you please make the changes on top of the draft version of the spec instead of 2025-03-06?
Thanks @ihrpr,
@ibuildthecloud, could you please describe the sensitive boolean use case in more detail: how does behavior change between sensitive and non-sensitive? The client can likely decide how it wants to render the elicitation request to users.
@siwachabhi, thank you. Sure, let's work on the outputSchema discussion and include it in these spec changes.
After reviewing #356, I believe it's a very tool-specific discussion. Since we're not constrained by backward compatibility issues in elicitation, we have the flexibility to adopt the approach that we find most appropriate, clean, and easy to follow. Therefore, I recommend adding the
Apart from @ihrpr's remarks about requestedSchema, this feature looks great. It fills an actual void in the protocol without heaping on that much complexity. LGTM! 👍
I've been thinking about agentic workflows and asynchronous workflows in general, and I think it might be beneficial to model elicitation as a tool result rather than something a tool does. In an agentic workflow, there might be an agent several layers deep in the flow that needs to elicit input. Perhaps that elicitation needs to bubble up to the user, or perhaps it could be fulfilled by an intermediary agent.

Even in a non-agentic workflow, the user may want to suspend their session at the point of elicitation. For example, a tool elicits confirmation ("Are you sure?"), but the user doesn't want to answer yet, so they shut down the client / server until they've made a decision. When they restart the client / server, it should be possible to resume the session and respond to the elicitation.

Notably, the web has a way to handle the latter use case: forms. So, in the same way that getting a form is separate from submitting a form, what if elicitation was separate from fulfillment? Specifically, what if tools could return a prompt and a pointer to another tool? The prompt would describe the desired input, and the client would be responsible for rendering an interface that maps to the other tool's input schema.
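As a purely hypothetical sketch of the idea above, a tool result of this kind might look something like the following; the type and field names are invented for illustration and are not part of any proposal.

```typescript
// Hypothetical "elicitation as a tool result": the tool returns a prompt plus
// a pointer to a second tool whose input schema doubles as the form definition.
interface ElicitationToolResult {
  type: "elicitation";
  prompt: string;          // describes the desired input to the user
  fulfillmentTool: string; // tool to call later with the user's answers
}

const example: ElicitationToolResult = {
  type: "elicitation",
  prompt: "Are you sure you want to delete this repository?",
  fulfillmentTool: "confirm_repository_deletion",
};
```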
Regardless of how many "layers deep" you are in an agentic workflow, there's one human. The host app that sets the workflow in motion should be able to present any elicitation request directly to them.
Resumability of the streamable-http transport should support this out of the box.
I'm not sure making the operation of tools more complicated is the answer. We have a feature that lets the server send a direct request to 'sample' an LLM. This is the same, just
+1 @cliffhall. I had a similar thought to yours, @jonathanhefner, here: #314, but after discussing with other folks in the community and trying a few things out, I settled on introducing a new server request, which aligns more with the MCP mental model. The response @cliffhall gave above is essentially my view as well. I was also about to add that there is value in keeping tools simpler, and this is still a two-way door / forward compatible in the future if some workflow semantics have to be added to tools.
Thanks @ihrpr, that confirms it. And to get your opinion on content in the result: it could be an open-ended field. Like you have described, I agree it's simpler; only consistency was a concern, so I'm thinking through that a little more, but these don't have to be the same thing.
Yeah, same as parameters for tools, they are
Consider: an MCP client calls a tool which requires some secret from the user. This could be credentials, payment information, etc. We don't want the secret to be a tool call parameter, because that is saved in the chat context as a tool call. If my reading of this spec is right, elicitations in their current state don't have any restrictions on whether the client could save the information, include the elicitation in LLM context, etc.

If we include an "is sensitive" boolean, this indicates to the client and server that they MUST NOT save/log the information (or, maybe less strictly, that they must save the information securely, in a password manager for example).

In Cloudflare's MCP servers we've run into this issue in a few places. For example, this tool, which we're removing temporarily, requires database credentials: https://github.com/cloudflare/mcp-server-cloudflare/blob/599bfcf51e64faad9f43f6ad28fa05e8cbd93684/packages/mcp-common/src/tools/hyperdrive.ts#L83

Elicitations with a sensitive flag would solve our issues here.
Yes, this. It should also indicate that it MUST NOT include the information in LLM context.
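As a rough sketch of what the suggested flag could look like, assuming it extends the ElicitRequest params shown earlier; the sensitive field and its exact semantics are hypothetical.

```typescript
// Hypothetical extension of the ElicitRequest params with a "sensitive" hint.
// The flag would be advisory: clients/hosts are asked not to log the response
// or add it to LLM context, but nothing in the protocol can enforce that.
export interface SensitiveElicitRequestParams {
  message: string;
  requestedSchema?: object; // JSON Schema describing the expected response
  sensitive?: boolean;      // hypothetical: response MUST NOT be logged or sent to the model
}
```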
Hey @siwachabhi, this PR looks awesome. I have been thinking about some similar flows, and there are definitely cases where I want the data I collect from the user not to pass through the MCP client. Either because it's sensitive data, or I want to control the UX of the data collection rather than leave it up to the client. Have you thought about those types of flows as well? Also, this seems to assume support for Server-Sent Events, but I think that's optional in the spec generally. I've been thinking of ways to support this type of thing without needing SSE as well. @nbarbettini and I are working on a PR that will cover that, and I'm happy to incorporate this elicitation pattern into it!
Like sampling requests, if the tool needs to elicit input from the user it MUST use SSE. It has to be able to send the client a request rather than the other way around. For instance, this would not be possible on the STDIO transport. Some other features like resource subscriptions also require it.
I am concerned that a sensitive boolean will not provide any security guarantee, so at best it can be a hint. Additionally, for the use case described here of collecting credentials, I find it premature to bake this into the protocol payload; ideally it should work out of band (for example, Auth0 Token Vault: https://auth0.com/blog/mcp-and-auth0-an-agentic-match-made-in-heaven/). I see this discussion as similar: #234, and there are clearly two sides. My target with this PR is to enable the base use case of eliciting information from the client where payloads don't contain any sensitive data.
I think that would essentially become supporting workflow tools, which would require tool functionality to be extended; there is a similar discussion above.
The protocol is transport-agnostic, and so are all of its features. Any transport layer should support bidirectional message exchange. SSE is part of the transport implementation; we use it in the SSE and Streamable HTTP transports, but all the features, like sampling, are supported over stdio as well. Elicitation is a feature of the protocol and is transport-agnostic.
If an intermediary agent knows enough about a user, I think it would be reasonable to allow it to fulfill (or reject) elicitation requests on the user's behalf. But, regardless, my point was about the elicitation bubbling up through layers of agents. How would that work with the sampling-derived proposal in this PR? Also, how would the sampling-derived proposal handle elicitation in long-running asynchronous workflows when the user / client is offline? Essentially, I am proposing we model elicitation as message passing in order to address these issues. Messages can be forwarded and routed and queued as necessary. A tool could return an elicitation (a message) which includes instructions about how to respond, and the subsequent response would cause the work to continue. A key point is that the tool returns the elicitation, and then stops running (it does not wait for a response).
Could you elaborate? I'm having trouble envisioning how this would work. If the server has been shut down and restarted, how would the tool continue from the point of elicitation?
Which part do you feel makes the operation of tools more complicated?
Actually, I think that sampling could also benefit from being modeled as message passing. If I understand correctly, currently, sampling during a tool call requires sticky sessions, which makes it very unfriendly to certain kinds of deployments. But that's a separate discussion! 😄 I would just like to make sure elicitation doesn't suffer from that limitation. A few things I didn't mention in my original comment:
Great questions, my mental model here is:
```mermaid
sequenceDiagram
    participant User
    participant Client
    participant Server
    participant Tool
    Note over Client,Server: Session Establishment
    Client->>+Server: POST InitializeRequest
    Server->>Server: Generate Session ID
    Server-->>Client: InitializeResult<br>Mcp-Session-Id: abc123
    Note over Client,Server: Session ID stored by client
    Note over Client,Server: Tool Execution Phase
    User->>Client: Start workflow
    Client->>+Server: ToolCallRequest<br>Mcp-Session-Id: abc123<br>RequestId: req123
    Server->>+Tool: Execute
    Tool->>Server: Needs user input
    Server-->>Client: Create SSE stream with event ID<br>Content-Type: text/event-stream
    Server->>-Client: SSE event: id=event789<br>ElicitRequest<br>Mcp-Session-Id: abc123<br>RequestId: req456
    Server->>StateStore: Store execution state<br>Mcp-Session-Id: abc123<br>RequestIds: req123, req456
    Note over Tool,Server: Tool execution can be halted
    Note over User,Client: Disconnection Scenario
    Note over Client: Client stores last processed event-id
    Client-xServer: Connection dropped
    Note over User,Client: Later Reconnection
    User->>Client: Reconnect & provide input
    Client->>+Server: GET with Session ID and Last-Event-ID<br>Mcp-Session-Id: abc123<br>Last-Event-ID: event789
    Server->>Server: Resume session state
    Server-->>Client: 200 OK<br>Content-Type: text/event-stream
    Note right of Server: Stream resumed
    Note over Client,Server: Completing Elicitation
    Client->>+Server: POST ElicitResult with Session ID<br>Mcp-Session-Id: abc123<br>RequestId: req456
    Server->>StateStore: Retrieve execution state
    StateStore-->>Server: Return state data
    Server->>+Tool: Resume execution
    Tool->>-Server: Complete task
    Server-->>-Client: Return ToolCallResult<br>Mcp-Session-Id: abc123<br>RequestId: req123
    Client->>User: Show result to user
```
A gap that still remains after this is if a client wants to GET the state of a tool call; progress notifications partially fill that gap, but there could also be a scenario where the client needs to GET progress directly.
Regarding this case in particular, elicitation can be used within a tool call just like sampling can, but that's probably not documented clearly - also just like sampling 😔
Ya, I read through the full thread here and I see the long discussion with @jonathanhefner - I 100% think this was pulled too soon. None of Jonathan's issues were properly addressed, and this is a nightmare to implement for both clients and servers. For instance, there is NOWHERE that specifies clients are expected to process a separate event stream while they're waiting on a tool response - every client I know of today waits for a tool call to complete. This is a major rework for clients to support.

Then, the statefulness: I don't believe Jonathan's stateless model is supported by the spec today, so that's not a usable model.

And then, as to the sensitive data issue, the way OAuth works is a perfect example of correctly gathering sensitive data: you send the user to an external URL to actually gather the data, so there's no chance it's cached along the way. So, that plus OAuth seem like critical use cases that are ignored as far as I can tell.
@patwhite I'm addressing this (with @wdawson) in #475. You are correct that elicitation is defined for a different use case than OAuth/server-initiated auth escalation. That's why the Security Considerations section says |
Ya, I saw your comment, and I'm essentially saying these should not be separate; this PR should better contemplate tool-call elicitation and auth escalation / sensitive data. As it's written, it is really hard for me to imagine this getting any more support than sampling, since those are really the two primary use cases for this feature.
I'll add - there are 3 primary use cases I see for elicitation, please correct me if I'm wrong: 1. Server first touch - that's the GitHub server gathering your GitHub username when you connect; this PR has that covered. This PR handles one of the three, so are we saying we have to build two other async messaging protocols here?
I'm curious about this - can you elaborate, @patwhite?
Sure, this is the use case @jonathanhefner brought up - a tool call needs additional information, so it elicits that from the user, pausing the initial tool response. I'd add, I could also see elicitation being used for long-running tool calls: you ask the MCP server to render a video, it returns an OK, then at some point in the future it sends you an elicitation with a URL, asking if you have any changes.
Another great use case: think about an MCP server that does your taxes - it would heavily use this feature, but I couldn't figure out from the spec here how that should be built.
One other quick addition after reading this - MCP is already a VERY hard protocol to scale. This model potentially requires two SSE connections, and I believe it requires a new session management model that spans SSE sessions (something explicitly not included in the spec) plus full statefulness of the backend. This will just make a hard-to-scale protocol even harder, and ultimately it is going to hinder adoption of this as much as sampling is hindered.

And finally - I just don't see a world where you can enforce the sensitive data constraint. The only possible way is through some sort of semantic filtering, but I wouldn't be surprised if, once this comes out, every stdio server that relies on API keys uses this to get them; it's just such a nicer user experience. So, in the absence of a real solution for sensitive data gathering, this is what folks will use. Also, "sensitive" is not a universally agreed-upon term; your GitHub username can be considered sensitive if other user data is co-mingled (de-anonymizing your name, for instance). So, there's a level of subjectivity in that statement, which is never good in a MUST spec clause.
Hi @patwhite, @nbarbettini and team are already driving the out-of-band auth discussion; the prescriptive guidance identified by the auth working group is pretty useful. Additionally, they have a viable point of exploration around URL specification / user agent, but the exact outcome might be just elicitation being extended; I would wait for a consensus from the steering committee. Best to discuss over #475.
I didn't follow how we got to this conclusion; it will be a single SSE stream from server to client. The gap is the spec behavior for long-running tools (also being explored in multiple discussions), especially if the caller or tool doesn't know beforehand that it will be long-running. The spec recommends the server
How do we model whether a tool can continue to work while it has requested elicitation? As an alternative to Streamable HTTP, one could very well implement the existing transport spec as an HTTP/1.1 transport where the HTTP response/request body contains a JSON-RPC response/request/notification; the whole protocol doesn't need to change for that, and the same goes for a WebSocket transport. So we essentially need to iterate a bit on transport, which was out of scope of this PR.
This was from the discussion you had with @jonathanhefner that just got dropped, but honestly it is the #1 use case here - unless I misread something, the elicitation after a tool call will trigger a second SSE session getting created. That might have just been for the more stateless model, but a >12-step process to handle post-tool-call elicitation is way, way too complex. Jonathan's proposal to include a tool response type of an elicitation request would deal with this very nicely, hence why I keep saying this was included too early and without fully thinking through backend implementation details.
Yes, there's a huge gap in the spec here; this solves PART of it, but solving one small part in a vacuum doesn't make sense - this should be thought of holistically. What we're basically saying is that there's a multi-turn tool calling lifecycle that might involve pushing notifications, might involve elicitation, etc. We should solve that problem rather than taking a one-off piece that makes elicitation very complex.
Elicitation as a tool response deals with this
Again, I'll go back to the concerns that @jonathanhefner brought up and that were not addressed - there are protocol-level issues with the most basic use case here: a tool call eliciting more information. If we want this feature adopted, you can't have the feature be only ambiguously supported by the underlying protocol itself.
With elicitation in a tool response, it can be session-aware but not necessarily stateful. Just so we're on the same page: as SOON as you escalate to an SSE connection, you have created a stateful requirement that there be a stateful singleton running on the server and, if you've scaled out to multiple nodes, a message broker. There should be a model by which you do not need to upgrade to an SSE connection to make elicitation work. That's different from session-aware, and it's something that really gets glossed over in all these MCP discussions - session-aware != stateful. SSE sessions require a long-lived, open connection that can be discovered by other hosts in the system when a message POST comes in (in order to deliver the response). That is different from session-aware, where you record in Redis that the last request from this session was this tool call, so when you get an elicitation response you can continue it. This is a VERY important distinction.
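A small sketch of the session-aware-but-not-stateful pattern described above, using an abstract key-value store (Redis would be one possible backing); all names are illustrative.

```typescript
// Sketch: the pending tool call is recorded in shared storage keyed by
// session, so any node can resume it when the elicitation response arrives.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

interface PendingToolCall {
  toolName: string;
  toolArgs: Record<string, unknown>;
  elicitationRequestId: string;
}

async function recordPendingCall(
  store: KeyValueStore,
  sessionId: string,
  call: PendingToolCall,
): Promise<void> {
  await store.set(`session:${sessionId}:pending`, JSON.stringify(call));
}

async function loadPendingCall(
  store: KeyValueStore,
  sessionId: string,
): Promise<PendingToolCall | null> {
  const raw = await store.get(`session:${sessionId}:pending`);
  return raw ? (JSON.parse(raw) as PendingToolCall) : null;
}
```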
I don't believe this is necessarily true? Elicitation implies having some sort of server->client connection to send the creation request on, but it shouldn't need to be a different one from the one used for tool calls (but it could be). As a client, you can receive a server request on the same stream that's waiting for a tool response.
This is arguably a flaw of Streamable HTTP, not of elicitation. It's something that impacts all server->client requests and notifications. It could be fixed with polling to set up a makeshift server->client stream, but I'd honestly consider that something the transport layer itself should handle, not the protocol interaction on top of it.
Let's align on requirements:
If we align on this, then the answer is that MCP needs a resumable bi-directional transport; if the current resumable HTTP transport version doesn't work, then it needs to be improved. Note: https://html.spec.whatwg.org/multipage/server-sent-events.html - SSE doesn't require a long-lived connection, that's why
No one is debating stateful storage - in the context of backend development, that's just storage, that's not "statefulness". Statefulness in this context refers to the long-lived code that has to be running on the node that the SSE session was initially connected to and that keeps the SSE session alive. While it may be technically possible to treat SSE as a polling protocol like you're describing, I don't know of a single client or server library that operates like that. SSE is always implemented as a long-lived connection. That's why it was created; if you want to return single items, you just use HTTP.
SSE is a long-lived connection; the second the server escalates a tool response to an SSE session in order to send a tool elicitation, you have made this long-lived. The server could kill it immediately - that would trigger an immediate request from the client to re-establish, which you could then 204 - but you wouldn't do that. You're going to have a tool response to send back as soon as the answer to the elicitation comes back in, so why would you close the session? Just to put a finer point on it, here's the SSE spec guidance in MCP:
So, now, imagine you're in a scaled-out scenario. Let's walk through how that works for elicitation on a tool request, with two nodes behind a load balancer.
So, it is against the protocol to do what you're describing, where you forcibly close the stream and then deliver the message on a different SSE channel. You can check out scaled-mcp for how we handle this, but with all the motions we've made towards statelessness in the 2025 spec, why regress here?

So, that brings us to the final point: what would be better? In order to build this statelessly, it's a pretty small tweak: we allow tools to return an elicitation request as a tool response, and we allow an elicitation response request to carry a tool response. That's just one idea - I'm not even sure it's the best, and I could imagine a bunch of other ways to do this. But overall, I would much, much prefer that instead of pushing this through in a way that really threatens elicitation's adoption because of the statefulness, we have the bigger discussion about multi-turn tool calls (and auth escalation), because they're fundamentally all part of the same pattern: you make a tool call, you need the user to do something, the user does it, then you answer the response. That's the meta flow we need to solve, and solving it 3 different times (once for this, once for tool calls, and once for auth) just seems wasteful.
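A hypothetical sketch of that stateless round trip; every type and field name here is invented for illustration only.

```typescript
// The first tool call may return an elicitation request instead of a final
// result; a follow-up request carrying the elicitation response receives the
// real tool result. No SSE stream needs to stay open in between.
type ToolCallOutcome =
  | { kind: "result"; content: unknown }
  | { kind: "elicitation"; elicitationId: string; message: string };

interface ElicitationReply {
  elicitationId: string; // correlates with the earlier elicitation outcome
  content: unknown;      // the user's answer(s)
}
```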
My issue with this proposal takes no opinion on what the protocol should do in the future; it's about how it works today.
Let's raise a PR for this? Both of us are talking about the same paper cuts in transport, so why not improve it at the core?
There is still Streamable HTTP, which supports a resumable bi-directional connection. How is this a regression? It's making a paper cut in the transport obvious.
It's not; it's a one-way-door decision. A more two-way-door decision is to follow the current protocol and improve the transport spec; if that doesn't work, then we have to change MCP philosophy. I am not tied to being right, but I don't see any other way to make progress on this. Also, related to returning elicitation in a tool result: the answer is not to chain one concept into another; we end up in dependency hell with that one. If we see concepts as independently composable, that seems to be the philosophy of MCP, and the protocol maintainers confirmed it above (folks, you could have just asked to go the other route, and we would have a different outcome). Again, I am not taking to heart what the globally optimal right answer is; we won't know that in the short term in the LLM space. But features should align with the protocol philosophy; only then will we get to the true limits of the protocol, else it's a spaghetti of everything. If human-in-the-loop is such an important feature and doesn't get adoption (again, it's not the same as sampling; I don't see a clear use case for sampling irrespective of how one implements it), then it will clearly show a pretty big gap in the protocol, which will warrant either a transport spec improvement or a more fundamental improvement.
It will be a single concept in my opinion, but it can't be a big-bang single change; we will get there incrementally.
That will be a transport spec update PR.
I'm open to doing some work on this, but it's a pretty fundamental change, and I'm imagining @dsp-ant had a reason not to support cross-SSE messaging when they built that out; but if that's the answer from the steering committee, I'm happy to help out. This PR is missing other components - guidance to clients that they should be watching for elicitation requests while waiting for tool responses. I believe almost all clients today are more or less entirely paused while waiting for tool responses and will need to explicitly support this sort of interrupt pattern.
It's a regression of the move toward statelessness. If you look at the VAST majority of remote MCP servers, they are not implementing SSE; they are implementing just HTTP responses to tool calls. It's a regression against the progress of making the protocol less stateful and supporting most modern deployment practices (in particular, serverless deployments). Right now, deploying an MCP server that supports SSE and that scales out requires k8s or bare metal; Lambdas or Cloud Run present very tricky challenges.
One-way door isn't the right way to talk about this; every change to a spec is a one-way door that you'll have to support for at least several versions. I'd offer that a better way to think about this is in terms of protocol bloat - a new response type is a MUCH smaller change than a full new bidirectional messaging exchange. But the WORST protocol bloat is yet to come - since this proposal doesn't adequately handle the auth use case, there will be another bidirectional messaging protocol to do essentially the same thing in the next version. And, since this doesn't work with stateless tool calls, that will end up being included at some point. So, in terms of bloat, including this without dealing with the other two use cases will lead to the greatest amount of bloat, and again, because this requires statefulness, from all the evidence we've seen it's going to have trouble getting adoption.
This isn't a dependency hell situation with variable multi hop dependencies etc - there are 3 very clear use cases for this type of communication that I can come up with, maybe there are a handful of others, let's gather those use cases then design something that works for the majority of them. I'm proposing we approach this like I would approach any sort of engineering effort.
I mean, this is my fundamental issue here - you're asserting this will get adoption, I'm asserting it will have challenges. But, if we're being objective, NONE of the server to client messaging models have gotten wide adoption. What's been widely adopted is client to server requests. So, given that's our only real data point, why are we introducing yet another speculative server to client model when we could design this better?
I'm referring to my comments, not the overall PR. I'm saying this proposal has issues today, regardless of what's coming down the pike.
I'm still not convinced that the SSE part of this is much of an issue — it follows nearly the same interaction flow as sampling, and I know for a fact that:
An SSE stream isn't a blocking channel; it's multiplexed unless explicitly stated otherwise in the spec. That's what things like JSON-RPC message IDs help handle, mapping a response back to its request when we're running many overlapping requests at once on a given stream or streams. It might be worth clarifying that in the transport specification, but I don't think it's the default assumption by any means.

On auth, I think compartmentalizing that was probably the right move for now, because MITM is an inherent issue in multi-layered server setups, and that's an issue that needs to be solved in more than just this. It's not something any single feature can just give a solution for; it's something that needs to be addressed across the protocol in general. It affects regular tool calls, sampling, and elicitation, because in multi-layer setups all of those are things that can be introspected by intermediate servers. The out-of-band communication proposal comes closer to handling that within a standalone feature, but it still relies on having partial trust of the intermediate servers as of now.
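A small sketch of what that multiplexing looks like in practice, with illustrative message IDs: a server-initiated elicitation request and the eventual tool-call result can interleave on the same stream because each carries its own JSON-RPC id.

```typescript
// Sketch of messages interleaved on one server->client stream while a tool
// call (id 10) is still pending; ids here are arbitrary examples.
const streamEvents = [
  {
    jsonrpc: "2.0",
    id: 11, // server -> client request, arrives before the tool result
    method: "elicitation/create",
    params: { message: "Please provide your GitHub username" },
  },
  {
    jsonrpc: "2.0",
    id: 10, // result for the still-pending tool call request
    result: { content: [{ type: "text", text: "done" }] },
  },
];
```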
This is a great example - no one is arguing this is different than sampling. The issue is that both this and sampling require stateful servers (precluding easy deployment in serverless environments), which then leads to trouble scaling. It's also worth pointing out that sampling has essentially zero adoption, so that begs the question: should we really be using it as a model for new features? The number one feature for adoption, by like 99%, is tool calls, so how you elicit during tool calls should be a well-thought-out, easy-to-implement feature. The issue with multiple SSE connections came up because Jonathan was trying to figure out if it's possible to do this statelessly, and it kinda is, but it's quite complex and breaks the protocol.
With the auth escalation we'll now have a second model for server-initiated messages asking for user input. So, I mean, it's fine, but it's just silly to approach them separately, and it just bloats the protocol. The auth escalation and sensitive data acquisition issues are 100% the same thing and should be dealt with holistically.

I'll add one final thought, then I guess this is all done. I do actually think this will get adoption, but I think the primary use case will be server-connect API key elicitation. I know that's explicitly against the spec, but for someone building a server that connects to an upstream service, that will be the best user experience to gather those keys, so everyone will do it. User experience will trump spec every day of the week. It's just another reason I think sensitive data and auth should be thought of holistically here.
Got it, this is fair, actually. I think we should look at #543 in more detail for this - I believe that statefulness isn't actually a fundamental limitation of this interaction, but rather is a limitation of how SDKs represent it. We've discussed this exact issue with respect to sampling, and the same discussions and solutions should apply here, too.
Pretty sure we're in agreement here - for the record, I didn't mean to bring up out-of-band as a real solution, but rather as an example of yet another proposal that's trying to address it for one specific interaction pattern while making no attempt to address it for the rest of the protocol. (Also under no illusions about how people will re-appropriate this.)
PR Description: Draft for Elicitation Feature
This PR implements the elicitation feature for MCP, enabling servers to request additional information from users through the client. This feature was identified as a key improvement for MCP-based agents in #111 and #314.
Motivation and Context
Many interactive workflows require servers to dynamically request additional information from users during execution. Examples include:
Until now, MCP lacked a standardized way for servers to request this information, requiring developers to implement custom solutions or multi-step tool calls. This feature provides a clean, consistent protocol for these interactions, completing the bidirectional communication path between servers and users.
How Has This Been Tested?
The implementation has been validated through documentation review and schema consistency checks. The design follows the established patterns of MCP, particularly mirroring the array-based response approach used in tool calls. The implementation deliberately keeps the feature simple while providing extensibility for future enhancements.
Breaking Changes
None. This is a new feature that adds capabilities without changing existing functionality.
Types of changes
[x] New feature (non-breaking change which adds functionality)
[ ] Bug fix (non-breaking change which fixes an issue)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[x] Documentation update
Checklist
[x] I have read the MCP Documentation
[x] My code follows the repository's style guidelines
[x] New and existing tests pass locally
[x] I have added appropriate error handling
[x] I have added or updated documentation as needed
Additional context
Implementation Details
The implementation follows a minimalist approach with a simple request/response pattern:
The array-based response design provides flexibility for clients to return multiple content items of different types (text, image, audio), similar to tool call results. This allows for rich responses like text with accompanying images.
Protocol
To request information from a user, servers send an elicitation/create request:

Request:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "elicitation/create",
  "params": {
    "message": "Please provide your GitHub username"
  }
}
```

Response:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      { "type": "text", "text": "octocat" }
    ]
  }
}
```

Request with multiple responses:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "elicitation/create",
  "params": {
    "message": "What is your favorite color?"
  }
}
```

Response:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      { "type": "text", "text": "Blue" },
      { "type": "image", "data": "base64-encoded-image-data", "mimeType": "image/jpeg" }
    ]
  }
}
```

Message Flow

```mermaid
sequenceDiagram
    participant Server
    participant Client
    participant User
    Note over Server,Client: Server initiates elicitation
    Server->>Client: elicitation/create
    Note over Client,User: Human interaction
    Client->>User: Present elicitation UI
    User-->>Client: Provide requested information
    Note over Server,Client: Complete request
    Client-->>Server: Return user response
    Note over Server: Continue processing with new information
```

Future Considerations
While the current implementation is intentionally simple, future improvements could include: