Resume tokens for long-running operations #1003
Conversation
Are there benefits of your proposal that you identify over #925?
@connor4312 the approaches are similar, but I would say the primary benefit would be simplicity and clarity of the direction of the relationship (server -> client).

In both cases, there's an assumption of client awareness of which operations are resumable. Additional information about expected completion windows and whether an operation is resumable may be useful hints to provide. I like that both proposals have similar ideas and a desire to interoperate well with progress tokens, so I think that's a positive sign that a feature like this one will exist in the near future.
Sorry, I don't follow. For #925, resumability is also server-determined. And servers may close the connection immediately, if desired. Are you referring to the fact that clients are also allowed to close the connection while waiting for a result?

Not sure what you are referring to here. Are you referring to the …

Also not sure what you are referring to here. For #925, all requests happen over a typical connection (e.g. an HTTP POST to …

Could you elaborate on this? I don't see anything about requesting a health state in the above proposal.
Yes, I was referring to the requirement that the client advertise support for resumable operations. The …

The health state is synonymous with the JSON-RPC state, so there's no need for a separate status check, just a re-issuing of the previous call with the `resumeToken`.
To clarify, #925 adds the …

How does your proposal handle clients that don't support resumability? Also, what does the resume process look like if a disconnection occurs before the first response? Assuming no additional work has been completed yet, how can a client verify that a task is still ongoing?
A further question is whether the tool would be exposed to a client that can't resume requests. Does a tool have value even if it can't be resumed? Does a resumable tool have value if it's not resumed? Often resumable tools still have value, even if only the first call / result completes. Resuming improves the experience, and a lack of resuming degrades it, but it's still useful.
This looks similar regardless of the proposal. A dropped client connection on send means that it's uncertain the server received the request. In #925 this drop could be either a failure to send the request or to receive a notification. Generally, clients handle intermittent failures with retries.
Resend the request with the last received `nextResumeToken`.
In #925, yes, the tool is always exposed to the client. If the client does not support resumable requests, the server will treat the tool call normally (i.e. not send a …). In this proposal, are you saying that the tool is always exposed to the client, but if the client does not support resumable requests, it will only get the first chunk of a result without knowing the result is incomplete?

In #925, the server sends a …

If the first result has not been returned, and thus there is no …

What did you mean by "hold open a progress notification channel"?
This proposed specification change introduces a backwards-compatible, transport-agnostic, server-managed mechanism for resuming long-running operations over multiple requests between client and server. The technique is simple and requires no additional communication or notifications beyond standard tool-calling. By enabling a server to provide an opaque token which a client can use to continue a request, this technique helps better support MCP within distributed systems and gives agents the option to decide whether resuming a long-running operation aligns with their objectives.
If the caller is not aware of the final state, they may need to resort to other methods to determine whether forward progress is possible. Sometimes this means calling another tool or set of tools to inspect state, introducing a custom set of behaviors for a specific tool which are functional but not standard, or living with the degraded experience.

For example, Kubernetes API calls are eventually consistent. The server ACKs the modification right away; however, the rollout takes time. If you want to model this as a resumable call, the call itself could say "yes, I've accepted the request and here's the initial state" and subsequent calls could say "yes, the requested state is still rolling out", etc. Most eventually consistent APIs will follow a pattern of comparing a desired state with an actual one to indicate progress; however, as APIs, the information is split across two methods which appear unrelated. Modeling these methods as a single 'resumable' call simplifies the agent's task, as sketched below. Let's look again at K8s, though, because the conventional wisdom is that if a rollout is 'stuck' it could just be retried. The concept of 'stuck' is often a matter of how much time has passed. Hence, the agent can decide whether to try again, wait, or abandon the task entirely.

Queries could likewise paginate results. The first page of results might be enough to make progress even though more records are available. The database could have a bespoke tool for pagination or page offsets, but ideally it wouldn't need one. The problem isn't knowing whether the operation is complete, so much as it is about standardizing how calls should represent such operations so that an agent can reason about them consistently while assessing whether the progress thus far aligns with its goals.
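To make the rollout example concrete, here is a minimal TypeScript sketch of how a server-side tool handler might model an eventually consistent rollout under this proposal. The handler shape, the `checkRollout` helper, and the token encoding are illustrative assumptions, not part of the SEP:

```typescript
// Sketch: an eventually consistent rollout modeled as a resumable tool call.
type ToolResult = {
  content: { type: "text"; text: string }[];
  nextResumeToken?: string; // proposed optional field
};

// Stand-in for comparing desired vs. actual state via the underlying API.
async function checkRollout(deployment: string): Promise<{ ready: number; desired: number }> {
  return { ready: 2, desired: 3 }; // replace with a real API call
}

async function applyDeployment(params: { deployment: string; resumeToken?: string }): Promise<ToolResult> {
  if (!params.resumeToken) {
    // First call: the server ACKs right away and hands back a token, so the
    // caller needs no bespoke "status" tool to follow the rollout.
    return {
      content: [{ type: "text", text: "Rollout accepted; in progress." }],
      nextResumeToken: Buffer.from(params.deployment).toString("base64url"),
    };
  }
  // Resumed call: report progress toward the desired state.
  const deployment = Buffer.from(params.resumeToken, "base64url").toString();
  const { ready, desired } = await checkRollout(deployment);
  if (ready < desired) {
    return {
      content: [{ type: "text", text: `Still rolling out: ${ready}/${desired} ready.` }],
      nextResumeToken: params.resumeToken, // not done; caller may resume again
    };
  }
  return { content: [{ type: "text", text: "Rollout complete." }] }; // no token: done
}
```

The key property is that the final response simply omits `nextResumeToken`, so the agent can reason about completion without a separate status tool.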
It's never safe to assume that the operation is running if the connection has been terminated before the client receives a response. This is a pretty classic retry case for most clients of remote systems.
This assumes that the resume token isn't specific to the operation being performed. In many cases different calls will have different tokens representing signed, server-specific state that a distributed system can use to understand how to proceed when a task other than the original one is handling the request.
That is equivalent to saying that the client must either support resumable requests or support a workaround for resumable requests.
I think this conflates the concept of resuming with the concept of advancing a state machine. For MCP, I see resuming as purely client-driven behavior, whereas advancing a state machine would be model-driven behavior. I think it's reasonable to use a token param for cases like that, but I think it should be part of the particular tool's definition, not part of the protocol. Consider cases where new (generated) input is required in order to advance to the next state.
When using the Streamable HTTP transport, if the server starts an SSE stream, then it is safe to assume that the operation has started (even if the server does not send an actual SSE event). Are you saying that the client should just retry in this case?
I think this goes back to my point about resuming vs advancing a state machine.
Yes, but this would be consistent for all proposals listed in #982 where, for practical purposes, the connection cannot be held open for the duration required to complete the task.
I'm open to moving the notion of a …
Regardless of transport, if a client request may not have reached the server, it should be retried. Resume tokens are not a replacement for retries.
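A rough sketch of this retry discipline, assuming a hypothetical `send` transport helper: a drop before any response is handled by plain retry of the same payload, while a previously received token turns the retry into a resume.

```typescript
// Retries cover "the request may never have reached the server"; resume
// tokens cover "work started, continue it". `send` is a hypothetical
// transport helper, not part of any SDK.
async function sendWithRetry(
  send: (p: Record<string, unknown>) => Promise<{ nextResumeToken?: string }>,
  params: Record<string, unknown>,
  lastToken?: string, // last nextResumeToken received, if any
  maxRetries = 3,
): Promise<{ nextResumeToken?: string }> {
  // With a token, the reissued request resumes; without one, it is a fresh send.
  const payload = lastToken ? { ...params, resumeToken: lastToken } : params;
  for (let attempt = 0; ; attempt++) {
    try {
      return await send(payload);
    } catch (err) {
      // A drop before a response means the server may never have seen the
      // request (or the response was lost): the classic retry case.
      if (attempt >= maxRetries) throw err;
    }
  }
}
```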
One argument for using resume tokens would be persistence across disconnects. There was a discussion in the Go SDK regarding support for persisted resumability: "Session recovery / shared session storage for distributed MCP env".
Another issue would be wanting to use HTTP POST for transport over a load balancer:

```mermaid
sequenceDiagram
    participant Client
    participant LoadBalancer
    participant ServerA
    participant ServerB
    Note over Client: Initial tool call
    Client->>LoadBalancer: HTTP POST /tool_call (no resumeToken)
    LoadBalancer->>ServerA: Forward request
    ServerA->>ServerA: Start operation<br/>Store state in memory
    ServerA-->>LoadBalancer: Response with nextResumeToken = "abc123"
    LoadBalancer-->>Client: Response with nextResumeToken = "abc123"
    Note over Client: Second tool call (attempt to resume)
    Client->>LoadBalancer: HTTP POST /tool_call (resumeToken = "abc123")
    LoadBalancer->>ServerB: Forward request
    ServerB->>ServerB: No knowledge of resumeToken<br/>State not found
    ServerB-->>LoadBalancer: Error or re-initiate operation
    LoadBalancer-->>Client: Error / inconsistent result
```
With persisted session state:

```mermaid
sequenceDiagram
    participant Client
    participant LoadBalancer
    participant ServerA
    participant ServerB
    participant SharedStore
    Note over Client: Initial tool call
    Client->>LoadBalancer: HTTP POST /tool_call (no resumeToken)
    LoadBalancer->>ServerA: Forward request
    ServerA->>ServerA: Start operation
    ServerA->>SharedStore: Save state with resumeToken "abc123"
    ServerA-->>LoadBalancer: Response with nextResumeToken = "abc123"
    LoadBalancer-->>Client: Response with nextResumeToken = "abc123"
    Note over Client: Later second tool call (attempt to resume)
    Client->>LoadBalancer: HTTP POST /tool_call (resumeToken = "abc123")
    LoadBalancer->>ServerB: Forward request
    ServerB->>SharedStore: Fetch state using resumeToken "abc123"
    SharedStore-->>ServerB: Return saved state
    ServerB->>ServerB: Resume operation from saved state
    ServerB-->>LoadBalancer: Response (maybe with new nextResumeToken)
    LoadBalancer-->>Client: Resumed response
```
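A minimal sketch of the shared-store pattern from the second diagram. The in-memory `Map` stands in for Redis or another store reachable by every instance behind the load balancer; all names here are illustrative:

```typescript
import { randomUUID } from "node:crypto";

// The Map stands in for Redis or another shared store reachable by every
// server instance behind the load balancer.
interface OperationState { startedAt: number; progress: number }
const sharedStore = new Map<string, OperationState>();

async function handleToolCall(params: { resumeToken?: string }) {
  let token = params.resumeToken;
  let state = token ? sharedStore.get(token) : undefined;
  if (token && !state) {
    // A token the receiving instance can't find: without a shared store,
    // this is the error path in the first diagram.
    throw new Error("Invalid or expired resumeToken");
  }
  if (!state) {
    state = { startedAt: Date.now(), progress: 0 };
    token = randomUUID(); // opaque to the client
  }
  state.progress += 1; // stand-in for a unit of real work
  sharedStore.set(token!, state);
  const done = state.progress >= 3;
  if (done) sharedStore.delete(token!);
  return done
    ? { content: [{ type: "text", text: "Operation complete." }] }
    : { content: [{ type: "text", text: `Progress: ${state.progress}/3` }], nextResumeToken: token };
}
```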
Completely agree on both counts. Load-balanced traffic, and resuming through opaque context passed from server to client in the simplest manner possible, was a key consideration. You articulated it very well, @MathiasChristiansen. The tokens are very tolerant of disconnections. There are startup edge cases that can be addressed through retry, but once some signal of state is communicated from server to client, the task should be immune to transient network issues.
Speaking for #925, if the client does not support resumable requests, then the server will behave exactly as it does now. Which is to say, there are three possible outcomes:
Those three outcomes are different from the client receiving only the first page of a result and not being aware of that fact.
I would like to focus more on this point. I'm not opposed to adding an official mechanism to support state machines (if it is superior to simple tool use), but I don't think we should call it "resume". I also think it would be useful to expand support to additional tools and inputs. As a rough sketch, instead of a bare `nextResumeToken`, the result could carry a `next` field:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    /* ... */
    "next": {
      "name": "step_1_tool",
      "arguments": {
        "cursor": "adef50"
      }
    }
  }
}
```

But it could also specify a different tool and override values for the arguments:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    /* ... */
    "next": {
      "name": "step_2_tool",
      "arguments": {
        "some_arg": "state generated by step_1_tool"
      }
    }
  }
}
```

And the client could generate values for non-overridden arguments using the model.
The situation where the server "behaves as normal" is simply not practical or feasible in many cases, which is what this proposal aims to address. If a tool can behave as normal, it shouldn't use a resume token scheme; however, if disconnection is a necessity for practical purposes (load balancing, duration of task, quantity of data), then you'll need to provide some opaque token back to the caller. If it's not clear work has started (no first response), then the caller should retry.

I mean, this happens now, and the initial response should provide a reasonable amount of information to the caller which can be used for further processing, even if that response is just 'work started'. At least with this proposal there's a chance to become aware of it in a standardized and easy-to-adopt manner.

I don't think servers should get involved in an agent's task orchestration, as it implies they know more than the callers. However, if you wanted to pack opaque state which the server can interpret as another tool call local to itself, that's something it can do without further client involvement or awareness. If agents are communicating incremental progress with each other, the MCP tool may represent a goal and the opaque state the next thing the receiving agent should do if the calling agent requests the goal be resumed. The …
I agree with @TristonianJones. MCP is a protocol, so it shouldn't dictate the functionality of the client and server, or exactly how the agent should behave. Resume tokens would provide a way for client and server to help recover, either if something disconnects or if transport happens over a more distributed system, e.g. load-balanced MCP servers. The client and server can then handle how they recover state themselves. People might want to create frameworks on top of the official SDKs that would provide more granular state recovery.
FWIW I agree with this; however, I've been doing some further experiments, and I think all the primitives that are needed for resume and async are already built into the protocol: modelcontextprotocol/python-sdk#1209 demonstrates how, by externalising the request state logic (e.g. managing request ids and resume tokens associated with a request), the existing Python client can call a tool and later join that call to retrieve the result.

I have tested this with a Redis-backed request state manager where the client is able to issue requests and attempt to retrieve the result via join. If this returns quickly, the client proceeds as normal; if it doesn't, it parks the call and gets on with other tasks. It can then rejoin later (even over process restarts) and attempt to get any missed notifications, or the result if it is complete. I think there may be at least three separate use cases being addressed by these various proposals:

For my use case this solves problems 1 & 2 and makes it simpler for the client to manage resumes; it does not address 3, for which resumeToken logic would still be required.
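For illustration, here is a rough sketch of the park/rejoin pattern described above. The `issueCall`/`join` names and the request-state store are assumptions for this sketch, not the actual python-sdk#1209 API:

```typescript
// Sketch of call-then-join with externalized request state. In practice the
// store would be durable (e.g. Redis) so a join can happen even after a
// process restart.
interface RequestState { result?: unknown }
const requestStore = new Map<string, RequestState>(); // stand-in for a durable store

function issueCall(requestId: string, start: () => Promise<unknown>): void {
  const state: RequestState = {};
  requestStore.set(requestId, state);
  // The call completes independently of any one connection.
  start().then((result) => { state.result = result; });
}

// Try to join; if no result arrives within timeoutMs, park the call and get
// on with other tasks, then rejoin later with the same requestId.
async function join(requestId: string, timeoutMs: number): Promise<unknown | undefined> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = requestStore.get(requestId)?.result;
    if (result !== undefined) return result;
    await new Promise((r) => setTimeout(r, 100)); // poll the store
  }
  return undefined; // parked: caller proceeds and rejoins later
}
```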
As mentioned on Discord, I think for this to work, the result should explicitly not be included. Consider: … which then gets this response: …

If this tool response is returned to the LLM, it will not be able to see the token (the client host has to manage that). And there is no mechanism by which it can repeat the tool call. If the client host repeats the tool call until it returns without a token, there are two problems:

So the only way this can work is if we do not return to the model until the final tool result happens. And if that's the case, then we should not include that part of the payload.

This does add some complexity to implementation in the SDK, as the tool call response needs to be checked for a resumeToken, and if it is present, the rest of the body ignored. There also needs to be a new callback mechanism to the client host so it is aware that the tool call is pending with a resumeToken. And it needs to be configurable how the subsequent calls happen, to check for eventual completion.

In the diagram above the second tool call is shown, but there is a difference from the first: it will be initiated by the SDK or client host, not the model. This is a significant difference and creates a lot of complexity.

My apologies if I have misunderstood parts of the proposal, but there seems to be a slight disconnect with how tool calls wire up to the LLM APIs, which makes this quite complex to orchestrate on the client host side. I still think it's a valid proposal, as client host complexity is considered acceptable if it makes server author complexity go down. But I think some thought needs to go into how SDKs are supposed to configure this retry/resume logic, and I also think it should be explicitly stated that the tool payload fields MUST NOT be present in a message containing a resume token (whether call or result).
Hi @PederHP,

My thought was that MCP clients would likely hand off the control flow to agent code (perhaps codified into A2A) which would continue to issue requests (or not) based on the presence of the resume token. I wouldn't expect the LLM to be involved in tool call param updates in the resume flow. There could be cases where you could indicate a preference for completion prior to agent analysis of the incremental results, but that feels like something best configured in LangChain/LangGraph or similar. Does that make sense? I view this proposal as a way for MCP to enable control flows which are often also necessary for server load-balancing, or large result analysis, or both in the case of paginated results. It also happens to have some other nice properties which make long-running calls more reliable/resilient.
I usually work at the level of MCP wired directly to API calls, sometimes with agentic orchestration, but never really these larger frameworks like LangChain/LangGraph. I also did some implementation work on the C# SDK. I have written client hosts and servers in Python, TypeScript, and C#. I think it's important that the protocol does not make any assumptions about the framework the server and client host authors are using or not using, and the semantics of the core building blocks, of which tools is perhaps the most significant, should be very clear. If I am finding the semantics unclear, I think a lot of developers will too when reading the spec (if the change is added as-is).

So let me rephrase my question: how is a client host that receives a resume token supposed to act? The change is only explicit about:

How should clients interpret the presence of a `nextResumeToken`? It essentially gives the client two options: consider this a failed call, or switch to polling for the result. I think the spec should be more explicit about the expectations for the client. In the diagrams by @MathiasChristiansen it becomes very clear that the only way to actually get the result once a resume token has been returned is essentially by polling (even if these can very often be hanging polls, so the client only has to poll once, assuming no disconnects).
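As a sketch of the polling interpretation, a client host could simply re-issue the call with the latest token until the server omits it. The `callTool` signature and the fixed interval here are assumptions:

```typescript
// Sketch: treat nextResumeToken as "not yet complete" and keep re-issuing
// the call until the server omits it. callTool is a hypothetical helper.
async function callToolToCompletion(
  callTool: (args: Record<string, unknown>) => Promise<{ content: unknown; nextResumeToken?: string }>,
  args: Record<string, unknown>,
  pollIntervalMs = 1_000,
) {
  let result = await callTool(args);
  while (result.nextResumeToken) {
    // Park, then resume with the same args plus the token.
    await new Promise((r) => setTimeout(r, pollIntervalMs));
    result = await callTool({ ...args, resumeToken: result.nextResumeToken });
  }
  return result; // no token: the operation is complete
}
```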
Example:
Edit: Nah, this is too awkward. But I think the SHOULD part is still relevant (even if as a MAY), but maybe add a server capability flag? My point is that client hosts that don't want to handle this shouldn't have to, and servers should probably signal the capability. Ideally a client could choose during negotiation whether it wants resume tokens or not. This also adds flexibility in terms of using this for control flow in whatever manner the client host wants. And if I am a server author, it makes sense that I can offer clients both options: 1) resume tokens, and 2) synchronous / stalling (with progress notifications).
Often providing different behaviors based on client capability is neither possible nor practical for a single tool which would benefit from resume tokens. Instead, I think this points to a need for better tool listing where capabilities are expressed as filters. Servers could tag tools for added robustness, but I expect such tags would be used for filtering anyway -- so kind of two sides of the same coin.
The diagram in the proposal intentionally mirrors cursors. I'm happy to make the diagram clearer.
It can be treated as "not yet complete", though it's really tool-specific. In many cases, eventually consistent operations will always 'ack' the request even though the target state hasn't been reached. So, in that sense, the inclusion of a resume token in such an operation would be more accurate even if there's no observed functional change for the client.
But how is the client supposed to know this? Tools are generally considered, and treated as, "plug and play" primitives, which in many ways is what led to the exploding popularity of MCP, I would claim. ListTools, hand off to the LLM, and everything else is already part of LLM APIs. This feature is different. LLMs cannot handle it directly. That's not to say that this fact should disqualify it - ToolAnnotations are also not handled at the model level. But I do think it means that the spec should be somewhat clear on how to handle these tokens.

If it's tool-specific, then the client (host) author has to either also be the server author, or they have to read the documentation for the tool (server) and write code accordingly. That seems wrong to me. If I am writing an IDE extension or conversational chat client, I should be able to know how to interpret the presence of a resume token without knowing anything about the tool or server.

I know I sound rather negative, but I actually think the problem you're solving is legitimate, and I am also not entirely opposed to the solution. I just think there are aspects of how it is integrated into the tool capability which will make it useful only if one is writing both client and server. That's actually the most common scenario for my org these days, so this feature would slot perfectly into most of my daily work. But the non-agentic stuff, the situations where I need to create servers for arbitrary client hosts - they should have guidance on how to interpret the token.

After writing my earlier response, it occurred to me I would actually prefer the token in the … I think that's a better approach than putting it in the requests. Primarily because LLMs are inherently synchronous, and tools are a model-level capability. Adding an asynchronous mechanism directly in the request/response exchange makes life too complicated for implementers, I think, and would make this a mostly extension-like facility, which would be a shame, because resuming, eventual consistency, better robustness/resilience for long-running operations - all the stuff you mention - is something we really need more of for MCP to be more suitable for enterprise and agentic systems.

Thank you for your patience in addressing my questions/concerns. I agree on the need for this, even if I might not be entirely in agreement on all of the particulars.
Asynchronous support is what this proposal aims to address. There are many tools which cannot run in a synchronous manner, and MCP needs to evolve to enable such calls. In some cases you can do what you need to accomplish synchronously, and MCP will pretty much work as-is in those cases. Changes could make the resume step / status more ergonomic in the synchronous case, but they don't address the asynchronous need present for agent-to-agent or long-running operations that involve human actions. With the feature proposed, you could introduce some built-in support to clients (though I think client capabilities and tool filtering need an overhaul), but there needs to be a pressure relief valve to enable new design patterns that compose with MCP.

Hopefully that makes sense, @PederHP, and I appreciate your comments and questions.
@TristonianJones I know the SEP states …, and that the SEP guidelines state that there doesn't necessarily need to be a reference implementation before being accepted, just before being marked final. That said, I really think this SEP would benefit from a reference impl now. Questions like the one @PederHP raised above point to the kind of issues that it would not be great to discover after the proposal is accepted. I built a reference for a similarly complex SEP in a day. The process was helpful to me, in that I could see what the impact to the codebase would be and how easy it would be to code against it. Considering the fact that there are at least two SEPs jockeying for position here, a reference would go a long way toward illustrating how various scenarios would be handled and make choosing this proposal an easier decision.
Not sure I agree with this - the LLM isn't waiting for a Call Tool Result. There's some overlap with the Transport Working Group discussion on sessions here. It's possible, of course, for an MCP Server to offer "batch" style tools that provide short synchronous calls to dispatch, list, and collect results from long-running jobs. The challenge is that we want the Host application to know when to assemble the next inference based on that batch completing. If I've read this correctly, the Host application is responsible for maintaining the set of unfulfilled tokens, and polling the CallToolRequest method with valid tokens to determine whether Tool Call(s) are complete?
That's correct, it's a host application decision to continue the call. You could use this in combination with progress notifications or bidirectional streaming to improve the freshness of results, but polling works just fine. Often the poll interval is either determined by the host or hinted at by the server in the tool metadata.
Maybe I am misunderstanding you, but the LLM does require a tool call to always be followed by a tool result.

```bash
curl https://api.anthropic.com/v1/messages \
--header "x-api-key: $ANTHROPIC_API_KEY" \
--header "anthropic-version: 2023-06-01" \
--header "content-type: application/json" \
--data \
'{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "What is the weather in Paris?"
},
{
"role": "assistant",
"content": [
{"type": "text", "text": "I will check the weather in Paris for you."},
{
"type": "tool_use",
"id": "tool_123",
"name": "get_weather",
"input": {"location": "Paris"}
}
]
},
{ "role": "user", "content": "Never mind, tell me a joke instead" } ]
}'
```

This fails with:

```json
{"type":"error","error":{"type":"invalid_request_error","message":"messages.2: `tool_use` ids were found without `tool_result` blocks immediately after: tool_123. Each `tool_use` block must have a corresponding `tool_result` block in the next message."}}
```

Whereas if you do the expected:

```bash
curl https://api.anthropic.com/v1/messages \
--header "x-api-key: $ANTHROPIC_API_KEY" \
--header "anthropic-version: 2023-06-01" \
--header "content-type: application/json" \
--data \
'{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "What is the weather in Paris?"
},
{
"role": "assistant",
"content": [
{"type": "text", "text": "I will check the weather in Paris for you."},
{
"type": "tool_use",
"id": "tool_123",
"name": "get_weather",
"input": {"location": "Paris"}
}
]
},
{ "role": "user","content": [ { "type": "tool_result","tool_use_id": "tool_123","content": "Temperature: 22 C, Sunny"} ]} ]
}'
```

It does work. (Sorry for the janky formatting.)

I am sorry if I was unclear in the point about LLMs and synchronousness - what I am trying to say is that LLM APIs do not allow any messages other than a `tool_result` immediately after a tool call.

This is why the semantics of multiple Tool Call results are not great. If it means to just wait for the final one, then that should be clearer from the proposal (i.e. the payload is considered cumulative). If the payload holds different messages per response (including status messages), then that is not actually compatible with LLM APIs, and implementing this capability will be difficult to impossible.

Hence why I think the SEP which is notification-based is better - it makes it unambiguous when the tool call is to be considered resolved towards the LLM, and it allows for full host flexibility in terms of handling the notifications.

@TristonianJones I hope this also shows why I have trouble with the multiple results. The inherent constraints at the LLM API level make it difficult to know how to implement this, and it's not possible to append subsequent `tool_result` blocks.
@PederHP - the inference engine has quiesced at the point of issuing the tool call. That's why it makes sense, if you want to continue the conversation, to use Tools with batch semantics that return immediately with a tracking identifier. @TristonianJones I don't know if you have a sample implementation? - this is something I'd be quite happy to pick up as a demo in …
Thanks @evalstate, I just started looking at a sample implementation, but I don't know that I'll have bandwidth until tomorrow. The behavior should be almost exactly as you describe. I think the request was to have the …

Let me know what you need and/or what bandwidth you have and we'll go from there!
Just chiming in, as I've recently done some work that is loosely coupled to some of these concepts. In my own fork, I've already added client->server metadata (_meta) per request: evalstate/fast-agent#340. While researching, I came across many discussions on the use of _meta: … The ability to pass bi-directional metadata would allow this to be implemented outside of FastAgent, or as a third-party library, fairly easily without adding complexity to FastAgent (or internal use of the _meta functionality within FA would greatly reduce the complexity of implementing it within FA).
I didn't mean the inference engine. There's no stipulation that one even has to use the same one. I am observing this from the perspective of the LLM thread, and the fact that dangling tool calls are not allowed, as there seemed to be an expectation among some that this was allowed. That was all.
@pwwpche passed along this example of the interaction model for resume tokens: |
Another backwards-compatibility concern: some MCP proxies might assume that a request with a "resumeToken" is entirely new, since it looks so similar to a normal request. I'm not aware of anything specific that would break, though, and it would probably be fixed in time if the feature became widely used. Another option might be to use an entirely new method name like "operation/resume". It's not clear to me there's much value in repeating the whole request if the server is expected to use the resumeToken to track operation state already.
Yes, that will be a possible outcome. If a proxy returns a tool that contains a resume token, I think the expectation should be that the proxy would pass through the call without modifying the content, or perhaps only modify very specific fields while preserving the fields it doesn't recognize (a best practice, IMO).

I'm flexible on this one, but some applications create HMAC signatures over requests and include the signature in the resume token to validate that the context provided in the resumed request corresponds to a prior request. I believe the concern is related to token hijacking. You could pack the entire request into the resume token, but that would be potentially expensive as well.
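As an illustration of the HMAC idea (the token layout and helper names are assumptions, not part of the proposal), a server could sign its internal state together with the original params and reject a resumed request whose params don't match:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const SECRET = process.env.RESUME_TOKEN_SECRET ?? "dev-only-secret";

// Sign the server state together with the original params, so a resumed
// request can be checked against the context it claims to continue.
// Note: strip resumeToken itself from params before hashing on resume.
function issueToken(params: object, state: string): string {
  const mac = createHmac("sha256", SECRET)
    .update(JSON.stringify(params))
    .update(state)
    .digest("base64url");
  return `${Buffer.from(state).toString("base64url")}.${mac}`;
}

// Recompute the MAC over the resumed request's params; a mismatch means the
// token was issued for a different request (possible hijacking attempt).
function verifyToken(params: object, token: string): string {
  const [stateB64 = "", mac = ""] = token.split(".");
  const state = Buffer.from(stateB64, "base64url").toString();
  const expected = createHmac("sha256", SECRET)
    .update(JSON.stringify(params))
    .update(state)
    .digest("base64url");
  if (mac.length !== expected.length || !timingSafeEqual(Buffer.from(mac), Buffer.from(expected))) {
    throw new Error("resumeToken does not correspond to this request");
  }
  return state; // recovered server-side state, now trusted
}
```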
See #1391 for a continued discussion around long-running operations.
Preamble
Resumable tool calls with stateful server-side tokens
Abstract
This proposal introduces a standardized mechanism for handling long-running and resumable operations within the Model Context Protocol (MCP). The introduction of two new optional fields, `resumeToken` and `nextResumeToken`, to the MCP tool call params and result payloads, respectively, allows an MCP tool to indicate that an operation is stateful and can be continued across multiple interactions without changing any other call parameters.

This enhancement can be used to support use cases like database query pagination, monitoring eventually-consistent systems, and other asynchronous workflows where the caller may need to resume an operation after some analysis of its current state, or as a result of interruption, or both.
Motivation
The current MCP specification includes a `progressToken` mechanism for receiving progress updates and handling cancellations. However, this is designed for ephemeral, side-channel notifications within a single client connection. It does not address the need for a durable, stateful mechanism to resume an operation across different connections, in a highly distributed serving environment, or after a significant delay.

Many critical operations are long-running by nature, for example waiting for resources to reach a `ready` state.

In these scenarios, agents need both the pause to determine what to do and a way to continue the operation from where it left off. This proposal provides a simple, general-purpose technique directly within the tool call/result specification to model these interactions, giving the agent the control to decide whether to resume or abandon the operation.
Rationale
The proposed design places the `resumeToken` and `nextResumeToken` fields within the RPC call params and result. This was a deliberate choice to separate the continuation logic from the core tool parameters (params) and to avoid depending on notifications, which imply a consistent connection between a caller and receiver that may not hold true in a distributed system.

Note: the `resumeToken` and `nextResumeToken` could be moved into the `_meta` field of the call and result, respectively, and the proposal remains unchanged. Placing the token in the metadata would make it easier for servers to validate that `params` have not been modified between invocations; however, by placing the resume tokens in the core payloads, the contract between client and server is more self-evident from the tool schema.

This approach is complementary to a `progressToken`, but distinct in its function and purpose.

An alternative of wrapping the tool call in a higher-order object was considered but rejected. This would add unnecessary complexity to the call structure and shift the responsibility of managing the task away from the agent and into the MCP client, which is undesirable. The proposed design keeps the agent in full control of its decision-making process and the tools focused on delivering information.
Specification
This SEP introduces two new optional fields within the MCP tool call params and result payload.
A `nextResumeToken` is a string-typed opaque token which a server may provide in the result payload. The caller may reissue the same request and resume the operation if it specifies the server-provided value in the `resumeToken` field within the tool call parameters.

The lifecycle and validity of the `resumeToken` are at the discretion of the MCP tool provider, and the token may be rejected if it is expired or otherwise invalid.

Example Flow
When a server wants to communicate to a client that a call may be resumed, it includes a `nextResumeToken` in the result payload. If the caller chooses to resume the call, it provides this value in the `resumeToken` field in addition to the original call params.

The caller initiates a new long-running operation as usual:

```json
{ "jsonrpc": "2.0", "id": 1, "method": "long_running_method", "params": { "param_str": "value", "param_num": 2 } }
```

The server returns a result to the caller and indicates that the request may be resumed. The request may terminate or remain open, as this call flow is transport and streaming agnostic.

```json
{ "jsonrpc": "2.0", "id": 1, "result": { ..., "nextResumeToken": "adef50" } }
```

The opaque `nextResumeToken` communicates to the caller that the request may be resumed by providing the `resumeToken` in the call params:

```json
{ "jsonrpc": "2.0", "id": 2, "method": "long_running_method", "params": { "param_str": "value", "param_num": 2, "resumeToken": "adef50" } }
```

The operation is considered complete when the server provides a response which does not contain a `nextResumeToken`.

```mermaid
sequenceDiagram
    participant Client
    participant Server
    Client->>Server: Request (no resume token)
    loop Resume Loop
        Server-->>Client: Result + nextResumeToken
        Client->>Server: Request (with resumeToken)
    end
```
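For illustration, a minimal sketch of a tool implementing this flow for the pagination case; the data source and offset-in-token encoding are assumptions, and the token remains opaque to the caller:

```typescript
// Sketch: a paginated query implementing the resume-token flow above.
const ROWS = Array.from({ length: 10 }, (_, i) => `row-${i}`);
const PAGE_SIZE = 4;

function longRunningMethod(params: { param_str: string; param_num: number; resumeToken?: string }) {
  // Recover the offset from the token; 0 when starting fresh.
  const offset = params.resumeToken
    ? Number(Buffer.from(params.resumeToken, "base64url").toString())
    : 0;
  const page = ROWS.slice(offset, offset + PAGE_SIZE);
  const next = offset + PAGE_SIZE;
  return {
    content: [{ type: "text", text: page.join("\n") }],
    // Omitting nextResumeToken signals that the operation is complete.
    ...(next < ROWS.length
      ? { nextResumeToken: Buffer.from(String(next)).toString("base64url") }
      : {}),
  };
}
```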
Backwards Compatibility

Adding `resumeToken` and `nextResumeToken` as optional fields with semantic meaning could impact some existing tools which already specify fields with these names. However, the majority of tools will be unaffected, and tools can opt into this functionality in a manner which is transparent to the caller.

Note: if the `resumeToken` and `nextResumeToken` were placed into the `_meta` field, the change would be completely backward compatible; however, the call contract and capabilities of the tools would be less clear.

This feature is transport agnostic and works well with streaming from either callers or receivers.
Reference Implementation
A reference implementation demonstrating the use of `resumeToken` and `nextResumeToken` in a sample tool will be provided once this SEP is accepted.

Security Implications
Service providers implementing this SEP must ensure resume tokens are handled securely. Treat the `resumeToken` as untrusted input and validate it before use to reduce man-in-the-middle attacks.