
Conversation

@TristonianJones commented Jul 18, 2025

Preamble

Resumable tool calls with stateful server-side tokens

  • Author(s): Tristan Swadell, Philip Stephens, Todd Segal
  • Status: proposal
  • Type: Standards Track

Abstract

This proposal introduces a standardized mechanism for handling long-running and resumable operations within the Model Context Protocol (MCP). Two new optional fields, resumeToken and nextResumeToken, added to the MCP tool call params and result payloads respectively, allow an MCP tool to indicate that an operation is stateful and can be continued across multiple interactions without changing any other call parameters.

This enhancement supports use cases such as database query pagination, monitoring eventually-consistent systems, and other asynchronous workflows where the caller may need to resume an operation after analyzing its current state, after an interruption, or both.

Motivation

The current MCP specification includes a progressToken mechanism for receiving progress updates and handling cancellations. However, this is designed for ephemeral, side-channel notifications within a single client connection. It does not address the need for a durable, stateful mechanism to resume an operation across different connections, in a highly distributed serving environment, or after a significant delay.

Many critical operations are long-running by nature. For example:

  • A database query might return a vast number of records that are best consumed in pages.
  • An API call to a system like Kubernetes might initiate a change that takes time to reach a consistent ready state.
  • A human-in-the-loop workflow may be paused indefinitely pending user input.

In these scenarios, agents need both a pause to determine what to do and a way to continue the operation from where it left off. This proposal provides a simple, general-purpose technique directly within the tool call/result specification to model these interactions, giving the agent control over whether to resume or abandon the operation.

Rationale

The proposed design places the resumeToken and nextResumeToken fields within the RPC call params and result payloads. This was a deliberate choice: it separates the continuation logic from the core tool parameters and avoids depending on notifications, which imply a persistent connection between caller and receiver that may not hold in a distributed system.

Note

The resumeToken and nextResumeToken could be moved into the _meta field of the call and result, respectively, and the proposal would remain otherwise unchanged. Placing the token in the metadata would make it easier for servers to validate that params have not been modified between invocations; however, placing the resume tokens in the core payloads makes the contract between client and server more self-evident from the tool schema.

This approach is complementary to a progressToken, but distinct in its function and purpose:

| Feature | Progress Token | Resume Token |
|---|---|---|
| Purpose | Caller-provided token to receive progress updates | Receiver-provided token to indicate a stateful/resumable operation |
| Interaction | Notifications indicate progress for cancellation or awaiting a result | Multiple call/result pairs indicate stages of completion |
| Channel | Additional structured notification message via a side-channel | No additional message structure; contained in the core params and result payloads |
| Durability | Ephemeral; alive only during a single client connection | Durable across connections |

An alternative of wrapping the tool call in a higher-order object was considered but rejected. It would add unnecessary complexity to the call structure and shift the responsibility of managing the task away from the agent and into the MCP client, which is undesirable. The proposed design keeps the agent in full control of its decision-making process and keeps the tools focused on delivering information.

Specification

This SEP introduces two new optional fields within the MCP tool call params and result payloads.

A nextResumeToken is a string-typed opaque token which a server may provide in the result payload. The caller may reissue the same request and resume the operation by echoing the server-provided value in the resumeToken field of the tool call parameters.

The lifecycle and validity of the resumeToken are at the discretion of the MCP tool provider, and a token may be rejected if it is expired or otherwise invalid.
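
As a concrete (non-normative) illustration, here is a minimal Python sketch of a server-side tool handler for a paginated database query. Everything here (run_query, format_rows, PAGE_SIZE, and the base64 token encoding) is an illustrative assumption, not part of the proposal.

import base64
import json

PAGE_SIZE = 100

def run_query(query: str, offset: int, limit: int) -> list:
    # Stand-in for a real database call; always returns a full page here.
    return [f"{query}-row-{i}" for i in range(offset, offset + limit)]

def format_rows(rows: list) -> str:
    return "\n".join(rows)

def encode_token(state: dict) -> str:
    # Opaque to the client; a real server would sign or encrypt this
    # (see Security Implications).
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()

def decode_token(token: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(token.encode()))

def handle_tool_call(params: dict) -> dict:
    # Resume from the prior offset if the caller echoed a token back.
    offset = decode_token(params["resumeToken"])["offset"] if "resumeToken" in params else 0
    rows = run_query(params["param_str"], offset=offset, limit=PAGE_SIZE)
    result = {"content": [{"type": "text", "text": format_rows(rows)}]}
    if len(rows) == PAGE_SIZE:
        # More records may remain: invite the caller to resume.
        result["nextResumeToken"] = encode_token({"offset": offset + PAGE_SIZE})
    return result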

Example Flow

When a server wants to communicate to a client that a call may be resumed, it includes a nextResumeToken in the result payload. If the caller chooses to resume the call, it provides this value in the resumeToken field alongside the original call params.

The caller initiates a new long-running operation as usual:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "long_running_method",
  "params": {
    "param_str": "value",
    "param_num": 2
  }
}

The server returns a result to the caller and indicates that the request may be resumed. The request may terminate or remain open as this call flow is transport and streaming agnostic.

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    ...,
    "nextResumeToken": "adef50"
  }
}

The opaque nextResumeToken communicates to the caller that the request may be resumed by providing the resumeToken in the call params.

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "long_running_method",
  "params": {
    "param_str": "value",
    "param_num": 2,
    "resumeToken": "adef50"
  }
}

The operation is considered complete when the server provides a response which does not contain a nextResumeToken.
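
From the caller's side the loop is symmetric: reissue the identical request, adding the server-provided token, until the response carries no nextResumeToken. A minimal sketch, where send_request stands in for whatever transport the client uses:

def call_until_complete(send_request, method: str, params: dict) -> list:
    results = []
    token = None
    while True:
        call_params = dict(params)  # the original params are never modified
        if token is not None:
            call_params["resumeToken"] = token
        result = send_request(method, call_params)
        results.append(result)
        token = result.get("nextResumeToken")
        if token is None:  # absence of the token marks completion
            return results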

sequenceDiagram
    participant Client
    participant Server

    Client->>Server: Request (no resume token)
    loop Resume Loop
      Server-->>Client: Result + nextResumeToken
      Client->>Server: Request (with resumeToken)
    end

Backwards Compatibility

Adding resumeToken and nextResumeToken as optional fields with semantic meaning could impact some existing tools which already specify fields with these names. However, the majority of tools will be unaffected, and tools can opt into this functionality in a manner which is transparent to the caller.

Note

If the resumeToken and nextResumeToken were placed into the _meta field, the change would be completely backward compatible; however, the call contract and capabilities of the tools would be less clear.

This feature is transport agnostic and works well with streaming from either callers or receivers.

Reference Implementation

A reference implementation demonstrating the use of resumeToken and nextResumeToken in a sample tool will be provided once this SEP is accepted.

Security Implications

Service providers implementing this SEP must ensure resume tokens are handled securely; a sketch illustrating several of the points below follows the list.

  • Opacity: Tokens should not expose any internal or sensitive information.
  • Integrity: Tokens may be signed or encrypted to prevent tampering by the client.
  • Lifecycle: Tokens should have a defined lifecycle (e.g., expiration time) to mitigate the risk of replay attacks and prevent indefinite resource consumption on the server.
  • Validation: The server should treat every received resumeToken as untrusted input and validate it before use to reduce the risk of man-in-the-middle attacks.
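
As one way to combine the integrity and lifecycle points, a server could wrap its state in an HMAC-signed, expiring envelope. This is a sketch of one possible scheme, not a normative token format:

import base64
import hashlib
import hmac
import json
import time

SECRET = b"server-side-secret"  # never shared with clients
TTL_SECONDS = 3600

def issue_token(state: dict) -> str:
    body = json.dumps({"state": state, "exp": time.time() + TTL_SECONDS}).encode()
    sig = hmac.new(SECRET, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(sig + body).decode()

def verify_token(token: str) -> dict:
    raw = base64.urlsafe_b64decode(token.encode())
    sig, body = raw[:32], raw[32:]  # SHA-256 digests are 32 bytes
    if not hmac.compare_digest(sig, hmac.new(SECRET, body, hashlib.sha256).digest()):
        raise ValueError("tampered resume token")
    payload = json.loads(body)
    if payload["exp"] < time.time():
        raise ValueError("expired resume token")
    return payload["state"]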

@TristonianJones requested review from a team July 18, 2025 21:24
@connor4312 (Contributor)

Are there benefits of your proposal that you identify over #925?

@TristonianJones (Author)

@connor4312 the approaches are similar, but I would say the primary benefit would be simplicity and clarity of the direction of the relationship (server -> client)

  • Resumability is completely server-determined which works very well with highly distributed systems and eventually consistent APIs where connections are closed immediately even though the desired state is not yet reached.
  • No configuration from the client is needed
  • No additional reserved routes or communication channels
  • Status is communicated through existing means by issuing subsequent calls with a resumeToken, either in the form of request health state (success, error) or completeness with the optional nextResumeToken.

In both cases, there's an assumption of client awareness of which operations are resumable. Additional information about expected completion windows and whether an operation is resumable may be useful hints provided in tools/list metadata which agents could use to determine whether to resume and on what time horizon the goal should be revised.

I like that both proposals have similar ideas and a desire to interoperate well with progress tokens, so I think that's a positive sign that a feature like this will exist in the near future.

@jonathanhefner (Member)

but I would say the primary benefit would be simplicity and clarity of the direction of the relationship (server -> client)

  • Resumability is completely server-determined which works very well with highly distributed systems and eventually consistent APIs where connections are closed immediately even though the desired state is not yet reached.

Sorry, I don't follow. For #925, resumability is also server-determined. And servers may close the connection immediately, if desired.

Are you referring to the fact that clients are also allowed to close the connection while waiting for a result?

  • No configuration from the client is needed

Not sure what you are referring to here. Are you referring to the resumableRequests client capability?

  • No additional reserved routes or communication channels

Also not sure what you are referring to here. For #925, all requests happen over a typical connection (e.g. an HTTP POST to /mcp if using the Streamable HTTP transport). Could you clarify?

  • Status is communicated through existing means by issuing subsequent calls with a resumeToken, either in the form of request health state (success, error) or completeness with the optional nextResumeToken.

Could you elaborate on this? I don't see anything about requesting a health state in the above proposal.

@TristonianJones (Author)

Sorry, I don't follow. For #925, resumability is also server-determined. And servers may close the connection immediately, if desired.

Are you referring to the fact that clients are also allowed to close the connection while waiting for a result?

Not sure what you are referring to here. Are you referring to the resumableRequests client capability?

Yes, I was referring to the requirement that the client advertise support for resumable operations. The nextResumeToken from server to client also allows the client or server to disconnect after the first response and resume the operation later.

Could you elaborate on this? I don't see anything about requesting a health state in the above proposal.

The health state is synonymous with the JSON-RPC state, so there's no need for a separate status check, just a re-issuing of the previous call with the resumeToken in order to ascertain whether additional work has been completed or the call has potentially failed.

@jonathanhefner (Member)

Yes, I was referring to the requirement that the client advertise support for resumable operations. The nextResumeToken from server to client also allows the client or server to disconnect after the first response and resume the operation later.

To clarify, #925 adds the resumableRequests client capability so that a server knows whether it may send a notifications/requests/resumePolicy notification. It is still the server's choice (i.e. server-determined). After the notification is sent, either server or the client may disconnect.

How does your proposal handle clients that don't support resumability?

Also, what does the resume process look like if a disconnection occurs before the first response?

The health state is synonymous with the JSON-RPC state, so there's no need for a separate status check, just a re-issuing of the previous call with the resumeToken in order to ascertain whether additional work has been completed or the call has potentially failed.

Assuming no additional work has been completed yet, how can a client verify that a task is still ongoing?

@TristonianJones (Author)

How does your proposal handle clients that don't support resumability?

A further question is: would the tool be exposed to a client that can't resume requests? Does a tool have value even if it can't be resumed? Does a resumable tool have value if it's not resumed?

Often resumable tools still have value, even if only the first call / result completes. Resuming improves the experience, a lack of resuming degrades it, but it's still useful.

Also, what does the resume process look like if a disconnection occurs before the first response?

This looks similar regardless of the proposal. A dropped client connection on send means that it's uncertain the server received the request. In #925 this drop could be either a failure to send the request or to receive a notification. Generally, clients handle intermittent failures with retries.

Assuming no additional work has been completed yet, how can a client verify that a task is still ongoing?

Resend the request with the last received resumeToken. Or hold open a progress notification channel. Either works fine.

@jonathanhefner (Member)

A further question is: would the tool be exposed to a client that can't resume requests? Does a tool have value even if it can't be resumed? Does a resumable tool have value if it's not resumed?

Often resumable tools still have value, even if only the first call / result completes. Resuming improves the experience, a lack of resuming degrades it, but it's still useful.

In #925, yes, the tool is always exposed to the client. If the client does not support resumable requests, the server will treat the tool call normally (i.e. not send a notifications/requests/resumePolicy notification, but send all other messages normally).

In this proposal, are you saying that the tool is always exposed to the client, but if the client does not support resumable requests, it will only get the first chunk of a result without knowing the result is incomplete?

Also, what does the resume process look like if a disconnection occurs before the first response?

This looks similar regardless of the proposal. A dropped client connection on send means that it's uncertain the server received the request. In #925 this drop could be either a failure to send the request or to receive a notification. Generally, clients handle intermittent failures with retries.

In #925, the server sends a notifications/requests/resumePolicy notification immediately, so the client has the resumeToken as soon as possible. There are still cases where the client must retry, but once the client receives the notifications/requests/resumePolicy notification, it can resume instead of retry. This also allows the server to disconnect immediately after sending the notifications/requests/resumePolicy notification.

Assuming no additional work has been completed yet, how can a client verify that a task is still ongoing?

Resend the request with the last received resumeToken. Or hold open a progress notification channel. Either works fine.

If the first result has not been returned, and thus there is no resumeToken, how can the client verify that the task is still ongoing?

What did you mean by "hold open a progress notification channel"?

This proposed specification change introduces a backwards compatible, transport-agnostic, server-managed mechanism for resuming long-running operations over multiple requests between client and server.

The technique is simple and requires no additional communication or notifications beyond standard tool-calling. By enabling a server to provide an opaque token to a client in order to continue receiving requests, this technique helps better support MCP within distributed systems and provides agents the option to decide whether resuming a long-running operation aligns with its objectives.
@TristonianJones (Author) commented Jul 23, 2025

In this proposal, are you saying that the tool is always exposed to the client, but if the client does not support resumable requests, it will only get the first chunk of a result without knowing the result is incomplete?

If the caller is not aware of the final state, they may need to resort to other methods to determine whether forward progress is possible. Sometimes this means calling another tool or set of tools to inspect state, introducing a custom set of behaviors for a specific tool which are functional but not standard, or living with the degraded experience.

For example, Kubernetes API calls are eventually consistent. The server ACKs the modification right away; however, rollout takes time. If you want to model this as a resumable call, the call itself could say "yes, I've accepted the request and here's the initial state" and subsequent calls could say, "yes, the requested state is still rolling out", etc. etc. Most eventually consistent APIs will follow a pattern of comparing a desired state with an actual one to indicate progress; however, as APIs, the information is split across two methods which appear unrelated. Modeling these methods as a single 'resumable' call simplifies the agent's task.

Let's look again at K8s though, because the conventional wisdom is that if a rollout is 'stuck' it could just be retried. The concept of 'stuck' is often a matter of how much time has passed. Hence, the agent can decide whether to try again, wait, or abandon the task entirely.

Queries could likewise paginate results. The first page of results might be enough to make progress even though more records are available. The database could have a bespoke tool for pagination or page offsets, but ideally it wouldn't need one.

The problem isn't knowing whether the operation is complete, so much as it is about standardizing how calls should represent such operations so that an agent can reason about them consistently while assessing whether the progress thus far aligns with its goals.
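
To make the Kubernetes example concrete, here is a hedged sketch of such a tool handler; apply_manifest and get_rollout_status are hypothetical stand-ins for real Kubernetes API calls, and encode_token/decode_token are the illustrative helpers from the sketch in the Specification section above:

def handle_apply_deployment(params: dict) -> dict:
    if "resumeToken" not in params:
        name = apply_manifest(params["manifest"])  # the API server ACKs right away
        return {
            "content": [{"type": "text", "text": f"rollout of {name} started"}],
            "nextResumeToken": encode_token({"deployment": name}),
        }
    state = decode_token(params["resumeToken"])
    desired, ready = get_rollout_status(state["deployment"])
    if ready == desired:
        # Desired state reached: omitting the token signals completion.
        return {"content": [{"type": "text", "text": "rollout complete"}]}
    return {
        "content": [{"type": "text", "text": f"{ready}/{desired} replicas ready"}],
        "nextResumeToken": params["resumeToken"],
    }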

If the first result has not been returned, and thus there is no resumeToken, how can the client verify that the task is still ongoing?

It's never safe to assume that the operation is running if the connection has been terminated before the client receives a response. This is a pretty classic retry case for most clients of remote systems.

In #925, the server sends a notifications/requests/resumePolicy notification immediately, so the client has the resumeToken as soon as possible. There are still cases where the client must retry, but once the client receives the notifications/requests/resumePolicy notification, it can resume instead of retry. This also allows the server to disconnect immediately after sending the notifications/requests/resumePolicy notification.

This assumes that the resume token isn't specific to the operation being performed. In many cases different calls will have different tokens to represent signed server-specific state that a distributed system can use to understand how to proceed when another task in the system is handling the request other than the original one.

@jonathanhefner (Member)

In this proposal, are you saying that the tool is always exposed to client, but if the client does not support resumable requests, it will only get the first chunk of a result without knowing the result is incomplete?

If the caller is not aware of the final state, they may need to resort to other methods to determine whether forward progress is possible. Sometimes this means calling another tool or set of tools to inspect state, introducing a custom set of behaviors for a specific tool which are functional but not standard, or living with the degraded experience.

That is equivalent to saying that the client must either support resumable requests or support a workaround for resumable requests.

For example, Kubernetes API calls are eventually consistent. The server ACKs the modification right away; however, rollout takes time. If you want to model this as a resumable call, the call itself could say "yes, I've accepted the request and here's the initial state" and subsequent calls could say, "yes, the requested state is still rolling out", etc. etc. Most eventually consistent APIs will follow a pattern of comparing a desired state with an actual one to indicate progress; however, as APIs, the information is split across two methods which appear unrelated. Modeling these methods as a single 'resumable' call simplifies the agent's task. Let's look again at K8s though, because the conventional wisdom is that if a rollout is 'stuck' it could just be retried. The concept of 'stuck' is often a matter of how much time has passed. Hence, the agent can decide whether to try again, wait, or abandon the task entirely.
...
The problem isn't knowing whether the operation is complete, so much as it is about standardizing how calls should represent such operations so that an agent can reason about them consistently while assessing whether the progress thus far aligns with its goals.

I think this conflates the concept of resuming with the concept of advancing a state machine. For MCP, I see resuming as purely client-driven behavior, whereas advancing a state machine would be model-driven behavior.

I think it's reasonable to use a token param for cases like that, but I think it should be part of the particular tool's definition, not part of the protocol. Consider cases where new (generated) input is required in order to advance to the next state.

If the first result has not been returned, and thus there is no resumeToken, how can the client verify that the task is still ongoing?

It's never safe to assume that the operation is running if the connection has been terminated before the client receives a response. This is a pretty classic retry case for most clients of remote systems.

When using the Streamable HTTP transport, if the server starts an SSE stream, then it is safe to assume that the operation has started (even if the server does not send an actual SSE event). Are you saying that the client should just retry in this case?

This assumes that the resume token isn't specific to the operation being performed. In many cases different calls will have different tokens to represent signed server-specific state that a distributed system can use to understand how to proceed when another task in the system is handling the request other than the original one.

I think this goes back to my point about resuming vs advancing a state machine.

@TristonianJones (Author)

That is equivalent to saying that the client must either support resumable requests or support a workaround for resumable requests.

Yes, but this would be consistent for all proposals listed in #982 where, for practical purposes, the connection cannot be held open for the duration required to complete the task.

I think this conflates the concept of resuming with the concept of advancing a state machine. For MCP, I see resuming as purely client-driven behavior, whereas advancing a state machine would be model-driven behavior.

I think it's reasonable to use a token param for cases like that, but I think it should be part of the particular tool's definition, not part of the protocol. Consider cases where new (generated) input is required in order to advance to the next state

I'm open to moving the notion of a resumeToken into the tool params, and have updated the proposal to reflect this. There are some minor tradeoffs, one being that the use of a resumeToken and nextResumeToken would shift to being a best practice rather than a core tenet of the standard. Over time best practices can become de facto standards, and documentation in the protocol itself to recommend consistency in naming would be quite useful to this effect.

When using the Streamable HTTP transport, if the server starts an SSE stream, then it is safe to assume that the operation has started (even if the server does not send an actual SSE event). Are you saying that the client should just retry in this case?

Regardless of transport, if a client request may not have reached the server, it should be retried. Resume tokens are not a replacement for Last-Event-Id or the session management supported by MCP, even though resume tokens are robust to many of those failure domains; that redundancy can further strengthen the reliability posture of the overall spec.

@MathiasChristiansen commented Jul 25, 2025

One argument for using resume tokens would be persistence across disconnects. There was a discussion in the Go SDK regarding support for persisted resumability: "Session recovery / shared session storage for distributed MCP env".
The idea is that if the MCP server is behind a load balancer in a distributed system, we could more reliably persist and resume progress. Currently, at least in the Go SDK, session state is stored in memory.
Introducing an abstraction layer, or the ability to create drivers that would allow MCP servers to store resume tokens in memory, on disk, or in databases, could enable more stateless MCP servers, leading to more reliable systems.
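
A rough sketch of that driver abstraction, with hypothetical interface names: the server resolves each resume token against a pluggable store, so any replica behind the load balancer can pick the operation up.

from abc import ABC, abstractmethod

class ResumeStateStore(ABC):
    """Driver interface: resolve an opaque resume token to saved state."""

    @abstractmethod
    def save(self, token: str, state: dict, ttl_seconds: int) -> None: ...

    @abstractmethod
    def load(self, token: str) -> dict | None: ...

class InMemoryStore(ResumeStateStore):
    # Mirrors the in-memory behavior described above: state is lost on restart
    # or when a different replica receives the resumed request.
    def __init__(self) -> None:
        self._states: dict[str, dict] = {}

    def save(self, token: str, state: dict, ttl_seconds: int) -> None:
        self._states[token] = state  # a real driver would honor the TTL

    def load(self, token: str) -> dict | None:
        return self._states.get(token)

A Redis- or database-backed implementation of the same interface is what enables the second diagram below.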

@MathiasChristiansen commented Jul 25, 2025

Another issue would be wanting to use HTTP POST for transport over a load balancer.

sequenceDiagram
    participant Client
    participant LoadBalancer
    participant ServerA
    participant ServerB

    Note over Client: Initial tool call
    Client->>LoadBalancer: HTTP POST /tool_call (no resumeToken)
    LoadBalancer->>ServerA: Forward request
    ServerA->>ServerA: Start operation<br/>Store state in memory
    ServerA-->>LoadBalancer: Response with nextResumeToken = "abc123"
    LoadBalancer-->>Client: Response with nextResumeToken = "abc123"

    Note over Client: Second tool call (attempt to resume)
    Client->>LoadBalancer: HTTP POST /tool_call (resumeToken = "abc123")
    LoadBalancer->>ServerB: Forward request
    ServerB->>ServerB: No knowledge of resumeToken<br/>State not found
    ServerB-->>LoadBalancer: Error or re-initiate operation
    LoadBalancer-->>Client: Error / inconsistent result


With persisted session:

sequenceDiagram
    participant Client
    participant LoadBalancer
    participant ServerA
    participant ServerB
    participant SharedStore

    Note over Client: Initial tool call
    Client->>LoadBalancer: HTTP POST /tool_call (no resumeToken)
    LoadBalancer->>ServerA: Forward request
    ServerA->>ServerA: Start operation
    ServerA->>SharedStore: Save state with resumeToken "abc123"
    ServerA-->>LoadBalancer: Response with nextResumeToken = "abc123"
    LoadBalancer-->>Client: Response with nextResumeToken = "abc123"

    Note over Client: Later resume attempt

    Note over Client: Second tool call (attempt to resume)
    Client->>LoadBalancer: HTTP POST /tool_call (resumeToken = "abc123")
    LoadBalancer->>ServerB: Forward request
    ServerB->>SharedStore: Fetch state using resumeToken "abc123"
    SharedStore-->>ServerB: Return saved state
    ServerB->>ServerB: Resume operation from saved state
    ServerB-->>LoadBalancer: Response (maybe with new nextResumeToken)
    LoadBalancer-->>Client: Resumed response


@TristonianJones (Author) commented Jul 25, 2025

Completely agree on both counts.

Supporting load-balanced traffic, with resumption via opaque context passed from server to client in the simplest manner possible, was a key consideration. You articulated it very well, @MathiasChristiansen.

The tokens are very tolerant of disconnections. There are startup edge cases that can be addressed through retry, but once some signal of state is communicated from server to client the task should be immune to transient network issues.

@jonathanhefner (Member) commented Jul 25, 2025

That is equivalent to saying that the client must either support resumable requests or support a workaround for resumable requests.

Yes, but this would be consistent for all proposals listed in #982 where, for practical purposes, the connection cannot be held open for the duration required to complete the task.

Speaking for #925, if the client does not support resumable requests, then the server will behave exactly as it does now. Which is to say, there are three possible outcomes:

  1. The request completes without disconnection.
  2. A disconnection occurs, but the client received an ID for Last-Event-Id, so it is able to get the result.
  3. A disconnection occurs, and the client is not able to resume, so it does not get the result.

Those three outcomes are different from the client receiving only the first page of a result and not being aware of that fact.

I think this conflates the concept of resuming with the concept of advancing a state machine. For MCP, I see resuming as purely client-driven behavior, whereas advancing a state machine would be model-driven behavior.
I think it's reasonable to use a token param for cases like that, but I think it should be part of the particular tool's definition, not part of the protocol. Consider cases where new (generated) input is required in order to advance to the next state

I'm open to moving the notion of a resumeToken into the tool params, and have updated the proposal to reflect this. There are some minor tradeoffs, one being that the use of a resumeToken and nextResumeToken would shift to being a best practice rather than a core tenet of the standard. Over time best practices can become de facto standards, and documentation in the protocol itself to recommend consistency in naming would be quite useful to this effect.

I would like to focus more on this point.

I'm not opposed to adding an official mechanism to support state machines (if it is superior to simple tool use), but I don't think we should call it "resume".

I also think it would be useful to expand support to additional tools and inputs. As a rough sketch, instead of nextResumeToken, what if tools could return a next object that specifies which tool to call next plus some arguments? The next object could specify the same tool plus a cursor, similar to your proposal:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    /* ... */
    "next": {
      "name": "step_1_tool",
      "arguments": {
        "cursor": "adef50"
      }
    }
  }
}

But it could also specify a different tool and override values for the arguments:

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    /* ... */
    "next": {
      "name": "step_2_tool",
      "arguments": {
        "some_arg": "state generated by step_1_tool"
      }
    }
  }
}

And the client could generate values for non-overridden arguments using the model.

@TristonianJones (Author) commented Jul 25, 2025

Speaking for #925, if the client does not support resumable requests, then the server will behave exactly as it does now. Which is to say, there are three possible outcomes:

The situation where the server "behaves as normal" is simply not practical or feasible in many cases which is what this proposal aims to address. If a tool can behave as normal, it shouldn't use a resume token scheme; however, if disconnection is a necessity for practical purposes (load balancing, duration of task, quantity of data), then you'll need to provide some opaque token back to the caller. If it's not clear work has started (no first response), then the caller should retry.

Those three outcomes are different from the client receiving only the first page of a result and not being aware of that fact.

I mean, this happens now, and the initial response should provide a reasonable amount of information to the caller which can be used for further processing, even if that response is just 'work started'. At least with this proposal there's a chance to become aware in a standardized and easy-to-adopt manner.

I also think it would be useful to expand support to additional tools and inputs. As a rough sketch, instead of nextResumeToken, what if tools could return a next object that specifies which tool to call next plus some arguments? The next object could specify the same tool plus a cursor, similar to your proposal:

I don't think servers should get involved in an agent's task orchestration as it implies they know more than the callers.

However, if you wanted to pack opaque state which the server can interpret as another tool call local to itself, that's something it can do without further client involvement or awareness. If agents are communicating incremental progress to each other, the MCP tool may represent a goal, and the opaque state may represent the next thing the receiving agent should do if the calling agent requests that the goal be resumed. The resumeToken works for state transitions and state machines, but this is purely an implementation detail of whether a specific agent-behind-a-tool has completed its goal.

@MathiasChristiansen commented Jul 26, 2025

I agree with @TristonianJones. MCP is a protocol, so it shouldn't dictate the functionality of the client and server, or exactly how the agent should behave. Resume tokens would provide a way for client and server to help recover, either if something disconnects or if transport happens over a more distributed system, e.g. load-balanced MCP servers.

The client and server can then handle how they recover state themselves. People might want to create frameworks on top of the official SDKs that would provide more granular state recovery.

@davemssavage

I agree with @TristonianJones. MCP is a protocol, so it shouldn't dictate the functionality of the client and server, or exactly how the agent should behave. Resume tokens would provide a way for client and server to help recover, either if something disconnects or if transport happens over a more distributed system, e.g. load-balanced MCP servers.

The client and server can then handle how they recover state themselves. People might want to create frameworks on top of the official SDKs that would provide more granular state recovery.

FWIW I agree with this. However, I've been doing some further experiments, and I think all the primitives needed for resume and async are already built into the protocol: modelcontextprotocol/python-sdk#1209 demonstrates how, by externalising the request state logic (e.g. managing request ids and resume tokens associated with a request), the existing Python client can call a tool and later join that call to retrieve the result.

I have tested this with a Redis-backed request state manager where the client is able to issue requests and attempt to retrieve the result via join. If the join returns quickly, the client proceeds as normal; if it doesn't, it parks the call and gets on with other tasks. It can then rejoin later (even over process restarts) and attempt to get any missed notifications, or the result if it is complete.

I think there may be at least three separate use cases being addressed by these various proposals:

  1. Async or non-blocking tool calls - the protocol is inherently non-blocking as a series of messages, however this is not currently supported in any SDK
  2. Resumption of tool calls over client disconnects (already built into the Streamable HTTP transport, though fiddly to implement)
  3. Pagination of long-running tool call results - no current support in the protocol or SDKs

For my use case this solves problems 1 and 2 and makes it simpler for the client to manage resumes; it does not address 3, for which resumeToken logic would still be required.

@PederHP (Member) commented Aug 9, 2025

As mentioned on Discord, I think for this to work, the result should explicitly not be included. Consider:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "the_slowest_db_query",
    "arguments": {
      "query_arg": "text"
    }
  }
}

Which then gets this response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "query in progress"
      }
    ],
    "isError": false
    "nextResumeToken": "adef50"
  }
}

If this tool response is returned to the LLM, it will not be able to see the token (the client host has to manage that). And there is no mechanism by which it can repeat the tool call. If the client host repeats the tool call until it returns without a token:

{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "the_slowest_db_query",
    "arguments": {
      "query_arg": "text"
    },
    "resumeToken": "adef50"
  }
}

There are two problems:

  1. The client host can send different params for the tool call. We could just consider this an error, but it's really a redundant part of the payload if it must always match the original.

  2. When the call completes, how is this sent to the model? If we've already returned the original tool call - which would be the assumption given it had a content field, then the model isn't waiting for a tool call. And most model APIs will fail if there isn't a 1-to-1 between tool_use and tool_result pairs (by id).

So the only way this can work is if we do not return to the model until the final tool result happens. And if that's the case, then we should not include that part of the payload.

This does add some complexity to implementation in the SDK, as the tool call response needs to be checked for a resumeToken, and if it is present, the rest of the body ignored. There also needs to be a new callback mechanism to the client host so it is aware that the tool call is pending with a resumeToken. And it needs to be configurable how the subsequent calls happen, to check for eventual completion.
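
A rough sketch of that SDK-side behavior, with hypothetical hooks (send_call, on_pending, retry_delay); nothing is surfaced to the model until a result arrives without a token:

import time

def call_tool_blocking(send_call, name, arguments, on_pending, retry_delay=2.0):
    # send_call stands in for the SDK's raw tools/call request.
    params = {"name": name, "arguments": arguments}
    while True:
        result = send_call(params)
        token = result.get("nextResumeToken")
        if token is None:
            return result  # final result; safe to hand to the model as tool_result
        on_pending(name, token)        # host callback: call still pending, body ignored
        params["resumeToken"] = token  # reissue with the server-provided token
        time.sleep(retry_delay)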

In the diagram above the second tool call is shown, but there is a difference from the first - it will be initiated by the SDK or client host, not the model. This is a significant difference and creates a lot of complexity.

My apologies if I have misunderstood parts of the proposal, but there seems to be a slight disconnect with how tool calls wire up to the LLM APIs that makes this quite complex to orchestrate on the client host side. I still think it's a valid proposal, as client host complexity is considered acceptable if it makes server author complexity go down. But I think some thought needs to go into how SDKs are supposed to configure this retry/resume logic, and I also think it should be explicitly stated that the tool payload fields MUST NOT be present in a message containing a resume token (whether call or result).

@TristonianJones (Author)

Hi @PederHP

My apologies if I have misunderstood parts of the proposal, but there seems to be a slight disconnect with how tool calls wire up to the LLM APIs that makes this quite complex to orchestrate on the client host side. I still think it's a valid proposal, as client host complexity is considered acceptable if it makes server author complexity go down. But I think some thought needs to go into how SDKs

My thought was that MCP clients would likely hand off the control flow to agent code (perhaps codified into A2A) which would continue to issue requests (or not) based on presence of the resume token. I wouldn't expect the LLM to be involved in tool call param updates in the resume flow. There could be cases where you could indicate a preference for completion prior to agent analysis of the incremental results, but that feels like something best configured in LangChain/LangGraph or similar. Does that make sense?

I view this proposal as a way for MCP to enable control flows which are also often necessary for server load-balancing or large result analysis or both in the case of paginated results. It also happens to have some other nice properties which make long-running calls more reliable/resilient

@PederHP (Member) commented Aug 9, 2025

Hi @PederHP

My thought was that MCP clients would likely hand off the control flow to agent code (perhaps codified into A2A) which would continue to issue requests (or not) based on presence of the resume token. I wouldn't expect the LLM to be involved in tool call param updates in the resume flow. There could be cases where you could indicate a preference for completion prior to agent analysis of the incremental results, but that feels like something best configured in LangChain/LangGraph or similar. Does that make sense?

I view this proposal as a way for MCP to enable control flows which are also often necessary for server load-balancing or large result analysis or both in the case of paginated results. It also happens to have some other nice properties which make long-running calls more reliable/resilient

I usually work at the level of MCP wired directly to API calls, sometimes with agentic orchestration, but never really with these larger frameworks like LangChain/LangGraph. I also did some implementation work on the C# SDK, and I have written client hosts and servers in Python, TypeScript, and C#. I think it's important that the protocol does not make any assumptions about the framework the server and client host authors are using or not using, and the semantics of the core building blocks, of which tools is perhaps the most significant, should be very clear. If I am finding the semantics unclear, I think a lot of developers will too when reading the spec (if the change is added as-is).

So let me rephrase my question:

How is a client host that receives a resume token supposed to act? The change is only explicit about:

(clients)
SHOULD treat an absent nextResumeToken as the completion of the operation.

How should clients interpret the presence of a nextResumeToken token?

It essentially gives the client two options: consider this a failed call or switch to polling for the result. I think the spec should be more explicit about the expectations for the client. In the diagrams by @MathiasChristiansen it becomes very clear that the only way to actually get the result once a resume token has been returned is essentially by polling (even if these can very often be hanging polls, so the client only has to poll once, assuming no disconnects).

I would suggest:

  • Adding a capability flag for this on the client side. A client negotiates whether it supports resume (or some other name). If it doesn't, the server isn't allowed to send resume tokens.
  • Clients that set this flag SHOULD follow up a response with a resume token by making calls with it until the server responds with a completion of the operation.

Example:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "resumeToken": {
        "tools": true
      },
      "sampling": {},
      "elicitation": {}
    },
    "clientInfo": {
      "name": "ExampleClient",
      "title": "Example Client Display Name",
      "version": "1.0.0"
    }
  }
}

It allows SDKs to take on the brunt of the mechanical complexity on the client host side. It makes the expectations much clearer and someone who is making a simple client doesn't need to worry about this feature, and the minimalist samples stay minimalist.

Edit: Nah, this is too awkward. But I think the SHOULD part is still relevant (even if as a MAY), but maybe add a server capability flag? My point is that client hosts that don't want to handle this shouldn't have to, and servers should probably signal the capability.

Ideally a client could during negotiation choose whether it wants resumeTokens or not. This also adds flexibility in terms of using this for control flow in whatever manner the client host wants to.

And if I am server author it makes sense that I can provide both the options between 1) resume tokens and 2) synchronous / stalling (with progress notifications) to clients.

@TristonianJones (Author)

And if I am server author it makes sense that I can provide both the options between 1) resume tokens and 2) synchronous / stalling (with progress notifications) to clients.

Often providing different behaviors based on client capability is either not possible or not practical for a single tool which would benefit from resume tokens. Instead, I think this points to a need for better tool listing where capabilities are expressed as filters. Servers could tag tools for added robustness, but I expect such tags would be used for filtering anyway -- so kind of two sides of the same coin.

It essentially gives the client two options: consider this a failed call or switch to polling for the result. I think the spec should be more explicit about the expectations for the client. In the diagrams by @MathiasChristiansen it becomes very clear that the only way to actually get the result once a resume token has been returned is essentially by polling.

The diagram in the proposal intentionally mirrors cursors. I'm happy to make the diagram clearer.

How should clients interpret the presence of a nextResumeToken token?

It can be treated as "not yet complete", though it's really tool specific. In many cases, eventually consistent operations will always 'ack' the request even though the target state hasn't been reached. So, in that sense the inclusion of a resume token in such an operation would be more accurate even if there's no observed functional change to the client.

@PederHP (Member) commented Aug 10, 2025

How should clients interpret the presence of a nextResumeToken token?

It can be treated as "not yet complete", though it's really tool specific. In many cases, eventually consistent operations will always 'ack' the request even though the target state hasn't been reached. So, in that sense the inclusion of a resume token in such an operation would be more accurate even if there's no observed functional change to the client.

But how is the client supposed to know this? Tools are generally considered, and treated as, "plug and play" primitives. Which in many ways is what led to the exploding popularity of MCP I would claim. ListTools, hand off to the LLM, and everything else is already part of LLM APIs.

This feature is different. LLMs cannot handle it directly. That's not to say that this fact should disqualify it - ToolAnnotations are also not handled at the model level. But I do think it means that the spec should be somewhat clear on how to handle these tokens.

If it's tool specific, then the client (host) author has to either also be the server author, or they have to read the documentation for the tool (server) and write code accordingly. That seems wrong to me. If I am writing an IDE extension or conversation chat client, I should be able to know how to interpret the presence of a resume token without knowing anything about the tool or server.

I know I sound rather negative, but I actually think the problem you're solving is legitimate and I am also not entirely opposed to the solution. I just think there are aspects of how it is integrated into the tool capability which will make it useful only if one is writing both client and server. That's actually the most common scenario for my org these days, so this feature would slot perfectly into most of my daily work. But the non-agentic stuff, the situations where I need to create servers for arbitrary client hosts. They should have guidance on how to interpret the token.

After writing my earlier response, it occurred to me I would actually prefer the token in the notifications/progress message (using the message id to link it to the tool) as that would be entirely unambiguous and it would slot seamlessly into client hosts that are not able to resume (while still enabling those who are to do so). It also works across capabilities - including future primitives. But I then learned about this: #925 - which is along the same lines but using a specific notification.

I think that's a better approach than putting it in the requests, primarily because LLMs are inherently synchronous and tools are a model-level capability. Adding an asynchronous mechanism directly into the request/response exchange makes life too complicated for implementers, I think, and would make this a mostly extension-like facility, which would be a shame, because resuming, eventual consistency, better robustness/resilience for long operations - all the stuff you mention - is something we really need more of for MCP to be more suitable for enterprise and agentic systems.

Thank you for your patience in addressing my questions/concerns. I agree on the need for this, even if I might not be entirely in agreement on all of the particulars.

@TristonianJones (Author) commented Aug 10, 2025

Primarily because LLMs are inherently synchronous

Asynchronous support is what this proposal aims to address. There are many tools which cannot run in a synchronous manner, and MCP needs to evolve to enable such calls.

In some cases you can do what you need to accomplish synchronously, and MCP will pretty much work as-is in these cases. Changes could make the resume step / status more ergonomic in the synchronous case, but they don't address the asynchronous need present for agent-to-agent or long-running operations that involve human actions.

With the feature proposed, you could introduce some built-in support to clients (though I think client capabilities and tool filtering need an overhaul), but there needs to be a pressure relief valve to enable new design patterns that compose with MCP.

Hopefully that makes sense, @PederHP, and I appreciate your comments and questions

@cliffhall (Member) commented Aug 11, 2025

@TristonianJones I know the SEP states

A reference implementation demonstrating the use of resumeToken and nextResumeToken in a sample tool will be provided once this SEP is accepted.

And that the SEP guidelines state that there doesn't necessarily need to be a reference implementation before being accepted, just before being marked final.

That said, I really think this SEP would benefit from a reference impl now. Questions like the one @PederHP raised above point to the kind of issues that would not be great to discover after the proposal is accepted. I built a reference for a similarly complex SEP in a day. The process was helpful to me, in that I could see what the impact to the codebase would be and how easy it would be to code against it.

Considering the fact that there are at least two SEPs jockeying for position here, a reference would go a long way toward illustrating how various scenarios would be handled and make choosing this proposal an easier decision.

@evalstate (Member)

Primarily because LLMs are inherently synchronous

Not sure I agree with this - the LLM isn't waiting for a Call Tool Result.

There's some overlap with the Transport Working Group discussion on sessions here. It's possible of course for an MCP Server to offer "batch" style tools that provide short synchronous calls to dispatch, list and collect results from long running jobs.

The challenge is that we want the Host application to know when to assemble the next inference based on that batch completing.

If I've read this correctly, the Host application is responsible for maintaining the set of unfulfilled tokens, and for polling the CallToolRequest method with valid tokens to determine whether Tool Call(s) are complete?

@TristonianJones (Author) commented Aug 13, 2025

If I've read this correctly, the Host application is responsible and maintaining the set of unfulfilled tokens, and polling the CallToolRequest method with valid tokens to determine whether Tool Call(s) are complete?

That's correct; it's a host application decision to continue the call. You could use this in combination with progress notifications or bidirectional streaming to improve the freshness of results, but polling works just fine. Often the poll interval is either determined by the host or hinted at by the server in the tool metadata.

@PederHP (Member) commented Aug 14, 2025

Primarily because LLMs are inherently synchronous

Not sure I agree with this - the LLM isn't waiting for a Call Tool Result.

Maybe I am misunderstanding you, but the LLM does require a tool call to always be followed by a tool result.

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in Paris?"
      },
      {
        "role": "assistant",
        "content": [
          {"type": "text", "text": "I will check the weather in Paris for you."},
          {
            "type": "tool_use",
            "id": "tool_123",
            "name": "get_weather",
            "input": {"location": "Paris"}
          }
        ]
      },
      { "role": "user", "content": "Never mind, tell me a joke instead" } ]
}'

This fails with:

{"type":"error","error":{"type":"invalid_request_error","message":"messages.2: `tool_use` ids were found without `tool_result` blocks immediately after: tool_123. Each `tool_use` block must have a corresponding `tool_result` block in the next message."}}

Whereas if you do the expected:

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in Paris?"
      },
      {
        "role": "assistant",
        "content": [
          {"type": "text", "text": "I will check the weather in Paris for you."},
          {
            "type": "tool_use",
            "id": "tool_123",
            "name": "get_weather",
            "input": {"location": "Paris"}
          }
        ]
      },
      { "role": "user","content": [ { "type": "tool_result","tool_use_id": "tool_123","content": "Temperature: 22 C, Sunny"} ]} ]
}'

It does work. (Sorry for the janky formatting.)

I am sorry if I was unclear in the point about LLMs and synchronousness - what I am trying to say is that LLM APIs do not allow any messages other than a tool_result after the tool_use. So while the host waits for the tool call to complete it cannot continue on the conversational/agentic thread that issued it. It is blocked until one and only one result for that call can be added.

This is why the semantics of multiple Tool Call results are not great. If it means to just wait for the final one - then that should be more clear from the proposal (i.e. the payload is considered cumulative). If the payload holds different messages per response (including status messages), then that is not actually compatible with LLM APIs and implementing this capability will be difficult to impossible. Hence why I think the SEP which is notification-based is better - it makes it unambiguous when the tool call is to be considered resolved towards the LLM, and it allows for full host flexibility in terms of handling the notifications.

@TristonianJones I hope this also shows why I have trouble with the multiple results. The inherent constraints at the LLM API level make it difficult to know how to implement this, and it's not possible to append subsequent tool_result messages to a tool_use, as that will also generate an error response. This proposal could be changed to be notification-based as well, I believe, which would make it compatible with LLM APIs.

@evalstate (Member) commented Aug 14, 2025

@PederHP - the inference engine has quiesced at the point of issuing the stop_reason, and typically doesn't maintain state (perhaps short-term cache for efficiency). The Context Window is re-assembled, and re-sent by the client application when ready (if ever). So there is no synchronous "waiting".

That's why it makes sense if you want to continue the conversation to use Tools with batch semantics, that return immediately with a tracking identifier.

@TristonianJones I don't know if you have a sample implementation? This is something I'd be quite happy to pick up as a demo in fast-agent to integrate with. I'm pretty sure this can be done purely with meta tokens and the Host application understanding the intent.

@TristonianJones (Author)

Thanks @evalstate, I just started looking at a sample implementation, but I don't know that I'll have bandwidth until tomorrow. The behavior should be almost exactly as you describe. I think the request was to have the resumeToken in the call parameters and the nextResumeToken in the tool response, but you can achieve the same effect by placing those values in _meta.

Let me know what you need and/or what bandwidth you have and we'll go from there!

@jdecker76

Just chiming in, as I've recently done some work that is loosely coupled to some of these concepts.

In my own fork, I've already added client->server metadata (_meta) per request: evalstate/fast-agent#340

While researching, I came across many discussions on the use of _meta :
jlowin/fastmcp#1315
jlowin/fastmcp#1294
modelcontextprotocol/python-sdk#1231

The ability to pass bi-directional metadata would allow this to be implemented outside of FastAgent, or as a third-party library, fairly easily without adding complexity to FastAgent (or internal use of the _meta functionality within FA would greatly reduce the complexity of implementing it within FA).

@PederHP (Member) commented Aug 14, 2025

@PederHP - the inference engine has quiesced at the point of issuing the stop_reason, and typically doesn't maintain state (perhaps short-term cache for efficiency). The Context Window is re-assembled, and re-sent by the client application when ready (if ever). So there is no synchronous "waiting".

I didn't mean the inference engine. There's no stipulation that one even has to use the same one. I am observing this from the perspective of the LLM thread and the fact that dangling tool calls are not allowed, as there seemed to be an expectation among some that this was allowed. That was all.

@TristonianJones (Author)

@pwwpche passed along this example of the interaction model for resume tokens:
TristonianJones/mcp-python-sdk#1

@pwwpche added the proposal (SEP proposal without a sponsor) and SEP labels Aug 22, 2025
@evalstate self-assigned this Aug 25, 2025
@halter73 (Contributor) commented Aug 27, 2025

Another backwards compatibility concern is that some MCP proxies might assume that a request with a resumeToken is entirely new, since it looks so similar to a normal request. I'm not aware of anything specific that would break, though, and it would probably be fixed in time if the feature became widely used. Another option might be to use an entirely new method name like "operation/resume". It's not clear to me there's much value in repeating the whole request if the server is expected to use the resumeToken to track operation state already.

@TristonianJones (Author)

Another backwards compatibility concern is that some MCP proxies might assume that a request with a resumeToken is entirely new, since it looks so similar to a normal request.

Yes, that is a possible outcome. If a proxy returns a tool that contains a resume token, I think the expectation should be that the proxy passes the call through without modifying the content, or perhaps only modifies very specific fields while preserving the fields it doesn't recognize (a best practice, IMO).

It's not clear to me there's much value in repeating the whole request if the server is expected to use the resumeToken to track operation state already.

I'm flexible on this one, but some applications create HMAC signatures over requests and include the signature in the resume token to validate that the context provided in the resumed request corresponds to a prior request. I believe the concern is related to token hijacking. You could pack the entire request into the resume token, but that would be potentially expensive as well.

@TristonianJones (Author)

See #1391 for a continued discussion around long-running operations.
