SEP-975: Transport-agnostic resumable requests #925
Conversation
@connor4312 @jonathanhefner This should be a SEP and should have an associated issue. Since you are maintainers, you are free to set yourself as sponsors (by assigning one of you the issue). Once you have an associated issue, I'll give you a SEP number.
> After the resume policy is sent, both the client and the server **MAY** disconnect at will. This allows servers to handle long-running requests without maintaining a constant connection.
>
> After a disconnection, clients can resume the request by sending a [`requests/resume`][] request with the **same ID** as the original request, plus the server-issued token as a parameter. If the ID and token are valid per the resume policy, the server **SHOULD** reset policy-related timers, send any pending messages (e.g., progress notifications), and then continue as if it were handling the original request.
This is a kind of creative flow. I wonder if we could just have this as a notification that contains a `result: ServerResult` rather than resuming on the exact same ID. That seems like it might involve less special-casing for clients (e.g. no need to 'reserve' event IDs for requests that might get resumed later)
I'm not sure I follow. Are you saying have `requests/resume` as a notification? Where does `result: ServerResult` fit in?
I mean having the response to the resumed request come in the form of a notification. `requests/resume` would still respond with a success or error, and then the result would later come via a notification like `{ method: 'notifications/requests/resumedCompleted', params: { resumeToken: 'foo', response: { /* ServerResult */ } } }`, rather than being a 'normal' reply reusing the old event ID.
Oh, I see. In my opinion, that goes in the wrong direction. I would like to make resumable requests be as congruent to normal requests as possible. That way servers can emit messages without being concerned about the request "mode".
For example, a tool may emit a normal response message, but then the delivery fails. When the client resumes the request, the server should be able to just replay the message without rewriting it. Or, for example, servers in a distributed architecture can emit response messages without knowledge of whether the front-end server has disconnected.
Joffref left a comment
Overall, this looks good — I prefer this proposal over the previous one.
Small side question: how should the server react if a request is resumed twice? Do we drop everything on the first resume, or do we keep it?
I know it’s a bit of a silly question, since it would imply two clients trying to get the same response for a tool call that was only initiated once — but it’s still interesting to define the boundary just in case.
> `"failed"` indicates that the server has a final response, but the response is an error.
>
> `status: "processing" | "completed" | "failed";`
Don't we want a 'pending' status as well? For example, if my request depends on another event, like a pending validation?
I know it's already covered by `hasPendingMessage` and `hasInputRequest`, but I'm sure there will be cases where this is done out-of-band.
I think if we add "pending" there may be an expectation that the server accurately reports whether it is "processing" or "pending". (Hypothetically, a server could be "processing" even when `hasInputRequest` is true.)
The problem with that expectation is that the state of the server is not observable from just the JSON-RPC messages it emits. For example, if you have a back-end server that is emitting messages, and a front-end server that is answering `requests/getStatus`, the front-end server wouldn't know whether the back-end server is "processing" or "pending" without some other communication channel (beside the JSON-RPC message queue).
It's doable, but I'm not sure if we want to bake that kind of expectation into the protocol.
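For reference, here is a minimal sketch of the status shape discussed in this thread, assembled from the quoted schema excerpt and the fields mentioned above. The interface name and the descriptions of `"processing"` and `"completed"` are assumptions, not the proposal's exact schema.

```typescript
// Sketch only: a GetRequestStatusResult-like shape assembled from this thread.
interface GetRequestStatusResultSketch {
  // "processing": the server is still working on the request (assumed meaning).
  // "completed": the server has a final, successful response (assumed meaning).
  // "failed": the server has a final response, but the response is an error
  //           (wording taken from the quoted schema excerpt).
  status: "processing" | "completed" | "failed";
  // True if the server has queued messages the client has not yet received (assumed field).
  hasPendingMessage?: boolean;
  // True if the server is waiting on a client-bound request, e.g. sampling (assumed field).
  hasInputRequest?: boolean;
}
```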
> After a disconnection, clients can resume the request by sending a [`requests/resume`][] request with the **same ID** as the original request, plus the server-issued token as a parameter. If the ID and token are valid per the resume policy, the server **SHOULD** reset policy-related timers, send any pending messages (e.g., progress notifications), and then continue as if it were handling the original request.
Why do we need to maintain two IDs here? I believe only the `requestId` is needed.
We talked about this during the meeting earlier, but for posterity: Instead of a `requestId` param, the `requests/resume` request has the same JSON-RPC message ID as the original request, i.e. the same value for the `id` property.
I've changed the language to "same message ID as the original request", and I've changed the notation in the diagram to better distinguish between message IDs and params.
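To illustrate the point, here is a sketch of the two messages based on the sequence diagram in this SEP (the tool name and argument values are examples only, not part of the proposal):

```typescript
// Original request: a tool call with JSON-RPC message id 123.
const originalRequest = {
  jsonrpc: "2.0",
  id: 123,
  method: "tools/call",
  params: { name: "example_tool", arguments: {} }, // example values
};

// After a disconnection, the resume request reuses the SAME message id (123)
// and carries only the server-issued token as a parameter.
const resumeRequest = {
  jsonrpc: "2.0",
  id: 123,
  method: "requests/resume",
  params: { resumeToken: "abc" },
};
```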
@jonathanhefner is there a prototype PR in any of the SDKs? (ideally Python or TypeScript)
How will this work for stdio?
This adds a transport-agnostic mechanism for resuming requests after
disconnections. Using this mechanism:
- Clients and servers can disconnect and reconnect without losing
progress.
- Servers can communicate expire-after-disconnect timeouts and reclaim
resources thereafter.
- Clients can check request status after disconnect without having to
fetch undelivered messages.
- All of the above works regardless of transport (HTTP, WebSocket,
stdio, etc.).
__Motivation__
- **Addressing limitations of resumability when using the Streamable
HTTP transport.**
- SSE-based resume requires the server to send at least one event in
order for the client to obtain a `Last-Event-ID`. If a connection
is lost before an event is sent, there is no way for the client to
resume the SSE stream. This is especially problematic because the
spec currently says that disconnection should not be interpreted as
the client cancelling its request.
- The spec does not indicate whether a server can delete previously
missed SSE events once they have been confirmed delivered by a
resume. The spec could explicitly allow this, but resuming is done
via HTTP GET, and HTTP GET requests should be read-only.
- There is no mechanism for a server to communicate that it will
expire a request after a certain duration of client inactivity.
- **Extending resumability to other transports.**
- Because resumability is defined by the transport layer, the burden
of creating new or custom transports is higher.
- If each transport defines its own version of resumability, it is
more difficult to develop MCP features without accounting for (or
relying on) the nuances of a particular transport.
- **Enabling robust handling of long-running requests such as tool
calls.**
- The spec does not allow servers to close a connection while
computing a result. In other words, servers must maintain
potentially long-running connections.
- There is no mechanism for a client to check the status of a request
after disconnection without having to fetch undelivered messages.
This commit addresses the above issues in the following ways:
- The server sends a `notifications/requests/resumePolicy` notification as
soon as possible after determining a request should be resumable.
This causes the Streamable HTTP transport to send a usable
`Last-Event-ID` to the client.
- Because a client resumes using a request ID instead of solely an event
ID, there is no expectation for servers to retain messages that have
been confirmed delivered. Furthermore, for the Streamable HTTP
transport, `requests/resume` is sent via POST, not GET, allowing
servers to delete delivered messages as part of the resume request.
- The `notifications/requests/resumePolicy` notification includes an
optional `maxWait` parameter, informing the client of the maximum
number of seconds it may wait after a disconnection before resuming
the request or checking its status. After this time has elapsed, the
server MAY cancel the request and free all associated resources.
- Because resumability is handled at the application layer via
`notifications/requests/resumePolicy` and `requests/resume`, it works
the same for all transports.
- After sending a `notifications/requests/resumePolicy` notification,
the server is allowed to disconnect at will. Thus the server is not
required to maintain a long-running connection.
- The client can use `requests/getStatus` to check the status of a
request after disconnection without having to fetch undelivered
messages.
Co-authored-by: Connor Peet <connor@peet.io>
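For concreteness, a hedged sketch of the message shapes implied by the description above. Parameter names follow the SEP's sequence diagram (`requestId`, `resumeToken`); the placement of `maxWait` inside the notification params and the concrete values are assumptions.

```typescript
// Server -> client: resume policy for the request with message id 123 (sketch).
const resumePolicyNotification = {
  jsonrpc: "2.0",
  method: "notifications/requests/resumePolicy",
  params: {
    requestId: 123,
    resumeToken: "abc",
    // Optional: maximum number of seconds the client may wait after a
    // disconnection before resuming or checking status (assumed placement).
    maxWait: 300,
  },
};

// Client -> server, after a disconnection: check status without fetching
// undelivered messages. Uses a fresh message id; only requests/resume reuses
// the original id (assumption based on the diagram, which passes requestId here).
const getStatusRequest = {
  jsonrpc: "2.0",
  id: 456,
  method: "requests/getStatus",
  params: { requestId: 123, resumeToken: "abc" },
};
```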
There is currently no prototype. @connor4312 (or @almaleksia) had a prototype for #899, but this SEP evolved out of that one.
I think that depends on whether the server is actually disconnected or not. For example, if a server is in a container, and the container is suspended, then there isn't really a disconnection. Or, likewise, if the server is a background process that is writing to a named pipe, and the client intermittently stops reading from the pipe, then there isn't really a disconnection. If there is no disconnection, then there is no need to resume a request (via `requests/resume`).

However, in the case where a stdio server is actually shut down, then I would expect (1) the server to resume processing on boot, and (2) the client to eventually resume the request and receive any queued messages. (By the way, I don't think this flow is specific to stdio. HTTP-based servers may also be subject to reboots. For example, when deploying a new version of the server.)
There was an excellent discussion about this in the working group meeting (see notes here). I'd like to better understand which use cases wouldn't be addressed if we instead:
Could you help identify specific scenarios where this approach might fall short? Understanding these edge cases would help us determine whether we truly need a new abstraction or if we can achieve the same goals by fixing what already exists.
Sure! With regard to sessions, I agree we should lift session ID into the protocol layer (while also keeping the HTTP header for routing purposes). Around that topic, the Transports WG is working on some specific proposals that clarify the usage of sessions, enable sessions regardless of transport, and better support sessionless / stateless servers. (There has been a strong push from both Google and Microsoft to better support stateless servers.) We can go into details in #984. With regard to improving the Streamable HTTP transport, we would need to:
After those changes, the remaining shortcomings I can think of are:
This SEP was declined at the core maintainers meeting, but we will try to address some of the relevant concerns in future proposals.
Preamble
Transport-agnostic Resumable Requests
Authors: Jonathan Hefner (jonathan@hefner.pro), Connor Peet (connor@peet.io)
Abstract
This proposal describes a transport-agnostic mechanism for resuming requests after disconnections. Using this mechanism:

- Clients and servers can disconnect and reconnect without losing progress.
- Servers can communicate expire-after-disconnect timeouts and reclaim resources thereafter.
- Clients can check request status after disconnect without having to fetch undelivered messages.
- All of the above works regardless of transport (HTTP, WebSocket, stdio, etc.).
Motivation
- **Addressing limitations of resumability when using the Streamable HTTP transport.**
  - SSE-based resume requires the server to send at least one event in order for the client to obtain a `Last-Event-ID`. If a connection is lost before an event is sent, there is no way for the client to resume the SSE stream. This is especially problematic because the spec currently says that disconnection should not be interpreted as the client cancelling its request.
  - The spec does not indicate whether a server can delete previously missed SSE events once they have been confirmed delivered by a resume. The spec could explicitly allow this, but resuming is done via HTTP GET, and HTTP GET requests should be read-only.
  - There is no mechanism for a server to communicate that it will expire a request after a certain duration of client inactivity.
- **Extending resumability to other transports.**
  - Because resumability is defined by the transport layer, the burden of creating new or custom transports is higher.
  - If each transport defines its own version of resumability, it is more difficult to develop MCP features without accounting for (or relying on) the nuances of a particular transport.
- **Enabling robust handling of long-running requests such as tool calls.**
  - The spec does not allow servers to close a connection while computing a result. In other words, servers must maintain potentially long-running connections.
  - There is no mechanism for a client to check the status of a request after disconnection without having to fetch undelivered messages.

Specification
- If the client advertises the `resumableRequests` capability, a server MAY send a `notifications/requests/resumePolicy` notification when responding to a request. The notification will specify the resume policy for the request in the event of disconnection, and will include a token that the client can use to resume the request.
- After the resume policy is sent, both the client and the server MAY disconnect at will. This allows servers to handle long-running requests without maintaining a constant connection.
- After a disconnection, clients can send a `requests/getStatus` request to get the status of the original request without fetching pending messages. If the parameters of the `requests/getStatus` request are valid per the request policy, the server SHOULD reset policy-related timers and then return the status of the original request.
- After a disconnection, clients can resume the request by sending a `requests/resume` request with the same message ID as the original request, plus the server-issued token as a parameter. If the ID and token are valid per the resume policy, the server SHOULD reset policy-related timers, send any pending messages (e.g., progress notifications), and then continue as if it were handling the original request.

```mermaid
sequenceDiagram
    participant Client
    participant Server
    Client->>+Server: Request (e.g., tools/call)<br>{ id: 123, params: { ... } }
    Server-->>Client: notifications/requests/resumePolicy<br>{ params: { requestId: 123, resumeToken: "abc" } }
    loop
        Server-->>Client: Messages (e.g., notifications/progress)
    end
    Server--x-Client: Disconnection occurs
    Note over Client: Client checks request status (optional)
    opt
        Client->>+Server: requests/getStatus<br>{ params: { requestId: 123, resumeToken: "abc" } }
        Server-->>-Client: GetRequestStatusResult
    end
    Note over Client: Client decides to resume
    Client->>+Server: requests/resume<br>{ id: 123, params: { resumeToken: "abc" } }<br>[Same `id` as original request]
    Server-->>Client: Undelivered messages
    loop
        Server-->>Client: Messages (e.g., notifications/progress)
    end
    Server-->>-Client: CallToolResult<br>{ id: 123, result: { ... } }
```

Rationale
The above specification addresses the issues outlined in the Motivation in the following ways:
- The server sends a `notifications/requests/resumePolicy` notification as soon as possible after determining a request should be resumable. This causes the Streamable HTTP transport to send a usable `Last-Event-ID` to the client.
- Because a client resumes using a request ID instead of solely an event ID, there is no expectation for servers to retain messages that have been confirmed delivered. Furthermore, for the Streamable HTTP transport, `requests/resume` is sent via POST, not GET, allowing servers to delete delivered messages as part of the resume request.
- The `notifications/requests/resumePolicy` notification includes an optional `maxWait` parameter, informing the client of the maximum number of seconds it may wait after a disconnection before resuming the request or checking its status. After this time has elapsed, the server MAY cancel the request and free all associated resources.
- Because resumability is handled at the application layer via `notifications/requests/resumePolicy` and `requests/resume`, it works the same for all transports.
- After sending a `notifications/requests/resumePolicy` notification, the server is allowed to disconnect at will. Thus the server is not required to maintain a long-running connection.
- The client can use `requests/getStatus` to check the status of a request after disconnection without having to fetch undelivered messages.

Future Work
- A webhook could be specified via the `_meta` parameter for the request. Upon completion of the request, if the client is disconnected, the server could send the request ID to the webhook. The webhook host could then send a notification (e.g. push notification) to the client, and the client could resume the request to receive the result.
- Resumable resource subscriptions, e.g. via a `resources/subscribe/resumable` method. See Proposal: Transport-agnostic resumable streams #543 (comment) for a proximal discussion.
- Resuming multiple requests at once, e.g. via `requests/resume/all` and `requests/getStatus/all`, or maybe something more closely integrated with sessions (e.g. a `sessions/resume` method).

Alternatives
#899: Transport-agnostic resumable streams
This proposal is a simplified version of #899. This proposal focuses on making JSON-RPC requests resumable in a transport-agnostic way, whereas #899 proposes a more general transport-agnostic mechanism (streams).
In terms of functionality, the two are mostly equivalent, but for this proposal, resumability is bounded by the JSON-RPC request message and response message. Thus, with this proposal, resumability cannot begin with a JSON-RPC notification, nor can it extend beyond a JSON-RPC response (whereas both of those things are possible with #899).
Resource-based approaches
Resource-based approaches propose assigning a resource URL to a tool call result so that the client may read it at a later time. This requires modifying the definition of resources to accommodate the `CallToolResult` type, which does not have a 1-to-1 mapping with the `TextResourceContents` / `BlobResourceContents` types. It also requires modifying the definition of resources such that resources may be "not ready", which in turn impacts all existing clients and servers that use resources.

More critically, though, resource-based approaches require distinct handling mechanisms for each message type other than `CallToolResult`. Fundamentally, the output of a request, such as a tool call, is a sequence of messages, even if the cardinality is 1 in many cases. If we try to represent the output as a resource, then we must define ways to handle messages that do not fit in a resource, such as progress notifications and sampling requests. Each message type that we introduce would need consideration about how it would work with "resource-ended" requests versus "normal" requests.

A resource-based approach would increase the number of provisions the spec must make, increase the number of code paths required for implementation, and increase the potential for incompatibilities when extending the spec.
#650: `tools/async/call` vs `tools/call`

#650 proposes adding a new type of tool call, `tools/async/call`. When a client calls a tool via `tools/async/call`, the server returns a `CallToolAsyncResult` response which includes a token. The client can then use the token to check the status of the tool call via `tools/async/status`, and to fetch the tool call result via `tools/async/result`.

There is some overlap between #650 and this proposal, such as using tokens and having a dedicated polling method, but there are some important differences:
- With #650, the client drives the decision of whether the tool call is async. This means the server cannot make the decision based on input arguments or session state.
- #650 requires the server to implement an additional form of persistence for tool call results, separate from the message queue it must already implement for resumability.
- Because `tools/async/result` only captures the tool call result, #650 effectively requires the client to stream from the GET `/mcp` endpoint. Otherwise, the client may miss server-sent requests (e.g. sampling requests) that would block tool call progress.

Thus, #650 is still affected by the same problems listed in this proposal's "Motivation" section. For example, if a disconnection occurs before the client receives an event ID on the GET `/mcp` endpoint, and the server sends a sampling request, then the tool call would be blocked until it expires because the client would have no way to get the sampling request.

Furthermore, it begs the question: if the client must stream from that (or any other) endpoint, why not also send the tool call result on that stream? (If the answer is to make the result fetchable separately from the stream, that can be achieved with resource links instead.)
#1003: Resume tokens for long-running operations
Essentially, #1003 is cursor-based pagination of results. In order to benefit from the proposal, a method must divide its result into chunks. Calls to retrieve each chunk are affected by the same problems listed in this proposal's "Motivation" section. If a result is divided enough, the problems could be mitigated; however, each chunk will require an additional round trip. Also, #1003 does not apply when a result is indivisible, such as for a long-running computation that computes a singular value.
Other differences:
- #1003 assumes client support; it does not define additional client capabilities nor consider them. If a client does not support the proposal, it will only receive the first chunk of the result. If the proposal were to define an additional client capability, it is not clear how result chunks could be automatically combined to support clients without the capability.

  Note: if we decide we want to assume client support, this proposal (#925) can drop the `resumableRequests` client capability. Everything else will work as expected.

- With #1003, the only way for a client to check the status of a request is to resume the request. If the server does not return an error, then the request is still ongoing.

  Note: if we decide we don't want to support a dedicated polling mechanism, this proposal (#925) can drop the `requests/getStatus` method. Everything else will work as expected.

Backwards Compatibility
This feature is backward compatible because clients must opt in by advertising the `resumableRequests` capability, and servers have no obligation to send a `notifications/requests/resumePolicy` notification.
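As an illustration, a client opting in might declare the capability during initialization roughly as follows. This is a sketch only: the exact placement of `resumableRequests` within the capabilities object and the surrounding field values are assumptions.

```typescript
// Hypothetical initialize params advertising the resumableRequests capability.
const initializeParams = {
  protocolVersion: "2025-06-18", // example version
  capabilities: {
    resumableRequests: {}, // placement within capabilities is an assumption
  },
  clientInfo: { name: "example-client", version: "1.0.0" },
};
```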
Security Implications

The `resumeToken` that the server issues as part of the `notifications/requests/resumePolicy` notification should be treated as sensitive information because it can be used to access messages related to the request.
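A minimal illustration of the kind of server-side handling this implies, assuming an in-memory token store keyed by request ID; the store shape and function name are not part of the SEP. `timingSafeEqual` is Node's constant-time comparison.

```typescript
import { timingSafeEqual } from "node:crypto";

// Sketch: validate a resume token before replaying any queued messages.
// The map and helper below are illustrative assumptions, not the SEP's API.
const resumeTokens = new Map<number | string, string>();

function isValidResumeToken(requestId: number | string, token: string): boolean {
  const expected = resumeTokens.get(requestId);
  if (expected === undefined) return false;
  const a = Buffer.from(expected);
  const b = Buffer.from(token);
  // Constant-time comparison to avoid leaking token contents via timing.
  return a.length === b.length && timingSafeEqual(a, b);
}
```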