
@matoushavlena

This PR introduces support for partial results, primarily intended to enable streaming responses from long-running tools and agent interactions. Returning partial output improves responsiveness, interactivity, and user experience. The feature was identified as a key improvement in #111, #117, and #484.

Motivation and Context

Partial result streaming improves responsiveness and flexibility for both long-running tools and MCP-based agents, enabling:

  • Better UX – Clients can start processing or displaying output immediately (e.g. chatbots, code tools).
  • Concurrent processing – Downstream agents/tools can act on partial results as they arrive.
  • Interruptibility – Partial delivery allows early cancellation, saving compute on long responses.
  • Support for real-time applications – Such as live summarization, transcription, or chat.

We are proposing this change because our own use case involves using MCP for agent communication, where partial results are essential for enabling responsive, real-time, multi-agent workflows.

Related Work

Related Issues

How Has This Been Tested?

Yes. Tool output streaming has been tested in real applications using the modified Python SDK.

Breaking Changes

None. Clients must explicitly opt in via allowPartial. Servers are free to ignore this flag.

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional Context

This feature is designed to be minimal and backward-compatible, and to avoid unnecessary data duplication.

This implementation raises two open design questions:

  1. Structured data merging
    While unstructured content (e.g. content in CallToolResult) can be concatenated incrementally, merging partial structured data (objects, arrays, etc.) is not currently defined. This PR deliberately avoids specifying a merging or delta application algorithm, leaving it open for future discussion if the protocol chooses to express an opinion on it.

  2. Use of id for multiple responses
    This implementation leverages a flexibility in JSON-RPC to send multiple responses with the same id. While practical and effective, it stretches the spec's intent, which assumes a one-to-one mapping between request and response. Using id in this way introduces a form of implicit multiplexing, which may conflict with traditional request–response matching logic in some JSON-RPC clients.

Example

Below is an example of a request with allowPartial set to true, followed by multiple partial responses:

// Request
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "generate_story",
    "arguments": {
      "topic": "fox"
    },
    "_meta": {
      "allowPartial": true  // Client allows partial (incremental) results
    }
  }
}

// First partial response (intermediate chunk)
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "Once upon a time, in a land far away,"
      }
    ],
    "_meta": {
      "hasMore": true  // Indicates more chunks are coming
    }
  }
}

// Final response (completes the result)
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": " there lived a small and curious fox."
      }
    ],
    // _meta.hasMore omitted — this is the final chunk (alternatively, set hasMore: false explicitly)
  }
}

Notes for Reviewers

  • allowPartial is a client hint — servers are not required to support or act on it.
  • hasMore: true appears only in intermediate responses; the final response omits it.
  • Final responses are structurally identical to standard JSON-RPC responses, ensuring full backward compatibility.
  • Clients that do not support streaming will simply treat the last message as the complete result.

Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com>
Co-authored-by: Matous Havlena <havlenma@gmail.com>
@pilartomas force-pushed the feat/partial-results branch from 386692d to 5e81901 on June 18, 2025 at 08:09
@jonathanhefner
Member

One limitation discussed in this comment (which we independently arrived at as well) is that notifications may be ignored by clients, requiring a full result to be sent at the end regardless.

I've seen this repeated, but I am still a bit unclear. JSON-RPC notifications are simply messages without an id. They are sent without expectation of a reply, but why would they be ignored?

Though I agree we should include the original request id somewhere in the partial result so that clients can associate the result with the request.

Clients must explicitly opt in via allowPartial. Servers are free to ignore this flag.

When would clients set this flag versus not? It seems like this might be more suitable as a client capability rather than a per-request setting.

While unstructured content (e.g. content in CallToolResult) can be concatenated incrementally, merging partial structured data (objects, arrays, etc.) is not currently defined. This PR deliberately avoids specifying a merging or delta application algorithm, leaving it open for future discussion if the protocol chooses to express an opinion on it.

I think it would be good to specify this behavior up front because it would allow a better API for tool authors. Instead of a tool having multiple code paths based on an allowPartial flag, a tool could always emit partial results via an SDK API, and the SDK could transparently merge the partial results if the client has not set the flag.

One approach would be to include something like _meta.mergeStrategy in each partial result. Supported values could be "append" and "replace", with the possibility to define more in the future. ("append" could apply to both arrays and objects, modeled like JavaScript's spread operator, e.g. [...result1, ...result2] and { ...result1, ...result2 }. Though we could also define a separate enum setting for objects if that is easier to understand.)

Alternatively, we could just say that partial results are not (currently) supported for structured content.
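To make the idea concrete, here is a minimal sketch of how an SDK might apply such strategies transparently. This is hypothetical: `mergeStrategy` is only the `_meta` field proposed above, not an existing MCP API, and `merge_partial` is an illustrative helper name.

```python
def merge_partial(accumulated, partial, strategy="append"):
    """Merge one partial result into the accumulated result.

    "append" mirrors JavaScript spread semantics:
    [...a, ...b] for lists, {**a, **b} for dicts.
    "replace" discards everything accumulated so far.
    """
    if strategy == "replace" or accumulated is None:
        return partial
    if strategy == "append":
        if isinstance(accumulated, list) and isinstance(partial, list):
            return [*accumulated, *partial]
        if isinstance(accumulated, dict) and isinstance(partial, dict):
            return {**accumulated, **partial}
        raise TypeError("append requires matching list or dict types")
    raise ValueError(f"unknown merge strategy: {strategy}")
```

With this shape, a tool could always emit partials and the SDK would fold them down for clients that did not opt in.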

This implementation leverages a flexibility in JSON-RPC to send multiple responses with the same id.

To be clear, this "flexibility" is not officially supported by the JSON-RPC spec, correct? Are there examples in the wild where this flexibility is being leveraged?

  • Final responses are structurally identical to standard JSON-RPC responses, ensuring full backward compatibility.
  • Clients that do not support streaming will simply treat the last message as the complete result.

The final response will be structurally identical, but the data will be incomplete, correct? In which case, I don't think the above is quite accurate. But it also shouldn't be a problem since partial results would be opt-in.

@pilartomas
Contributor

pilartomas commented Jun 19, 2025

I think it would be good to specify this behavior up front because it would allow a better API for tool authors. ...

The idea was to let the output schema define an arbitrary strategy.

However, I now see that may be unnecessary. Having a merging strategy (or pre-defined set of strategies) for structured data would allow SDKs to handle this logic. 👍

@matoushavlena
Author

matoushavlena commented Jun 19, 2025

I've seen this repeated, but I am still a bit unclear. JSON-RPC notifications are simply messages without an id. They are sent without expectation of a reply, but why would they be ignored?

Valid question. I think the perception is that RPC clients would treat notifications as "additional" information rather than as a "response" to a "request". But maybe none of this is relevant; the JSON-RPC spec is very open to interpretation. This PR is meant to propose an alternative approach, and it's OK if this discussion steers us towards notifications.

Though I agree we should include the original request id somewhere in the partial result so that clients can associate the result with the request.

Wouldn't the progressToken be sufficient? I know, it's not the most elegant solution.

When would clients set this flag versus not? It seems like this might be more suitable as a client capability rather than a per-request setting.

This gives the client more control, for example if the host app is going from foreground to background. Or parts of the app are user facing tasks and part are background tasks.

To be clear, this "flexibility" is not officially supported by the JSON-RPC spec, correct? Are there examples in the wild where this flexibility is being leveraged?

Correct, but also not forbidden. I don't think there are any examples in the wild. All real examples I could find eventually use notifications or more complex approaches (not suitable for MCP, imho).

Edited: A2A protocol is one of the examples: https://a2a-protocol.org/latest/specification/#93-streaming-task-execution-sse

@jonathanhefner
Member

Wouldn't the progressToken be sufficient? I know, it's not the most elegant solution.

In my opinion, progress notifications and partial results are orthogonal features. Also, progress notifications are optional, so the client may not specify a progressToken.

This gives the client more control, for example if the host app is going from foreground to background. Or parts of the app are user facing tasks and part are background tasks.

But would the client know whether it will go from foreground to background before sending the tool call? Or, if it's already in the background, would it know whether it will be brought to foreground in the middle of the tool call?

It seems like it would be better (and simpler) for the client to always accept partial results, and silently merge them if it's running in the background.

There's also an argument to be made for partial results being non-optional (though contingent upon negotiated protocol version), and always handling / merging them client-side instead of server-side. If a server wants to send a large number of partial results without buffering them all in memory, then, arguably, a client shouldn't be able to prevent that.

@siwachabhi
Contributor

siwachabhi commented Jun 20, 2025

+1 to the proposal here. After implementing more such servers, I am coming around to the feedback in #383: it's better to support this by letting the tool result itself be partial.

Some examples and a sample implementation will really help! Another thing to consider is what structured content will look like, and, when there is a delta update, which index of the unstructured content it is updating.

@ibuildthecloud

How would a response with two content blocks work? If my tool result was

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "content 1."
      },
      {
        "type": "text",
        "text": "content 2."
      }
    ]
  }
}

If I send

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "content 2."
      }
    ]
  }
}

How does the client know whether to append to the previous content element or create a new one? Other streaming protocols do this with an index variable or an id in the content object. For example (using index):

First

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "index": 0,
        "type": "text",
        "text": "con"
      }
    ]
  }
}

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "index": 0,
        "type": "text",
        "text": "tent 1."
      }
    ]
  }
}

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "index": 1,
        "type": "text",
        "text": "cont"
      }
    ]
  }
}

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "index": 1,
        "type": "text",
        "text": "ent 2."
      }
    ]
  }
}
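A client-side assembler for this index-based scheme could look like the following sketch (hypothetical code, not part of this PR; it assumes each chunk carries an `index` into the logical final list and that text chunks sharing an index are concatenated in arrival order):

```python
def assemble(chunks):
    """Fold a stream of indexed content chunks into a final content list.

    Each chunk is a content object with an "index" into the logical
    final list; text for the same index is concatenated as it arrives.
    """
    blocks = {}
    for chunk in chunks:
        idx = chunk["index"]
        if idx in blocks:
            blocks[idx]["text"] += chunk["text"]
        else:
            blocks[idx] = {"type": chunk["type"], "text": chunk["text"]}
    return [blocks[i] for i in sorted(blocks)]
```

Feeding it the four chunks above yields the two-element content list from the original tool result.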

@jonathanhefner
Member

jonathanhefner commented Jun 21, 2025

How does the client know whether to append to the previous content element or create a new one?

From #776 (comment): "One approach would be to include something like _meta.mergeStrategy in each partial result. Supported values could be "append" and "replace", with the possibility to define more in the future. ("append" could apply to both arrays and objects, modeled like JavaScript's spread operator, e.g. [...result1, ...result2] and { ...result1, ...result2 }. Though we could also define a separate enum setting for objects if that is easier to understand.)"

Other streaming protocols do this with an index variable or an id in the content object.

How would index-based replacement represent total replacement? For example, if a tool has emitted two partial results and wants to replace them both with a single final result, how would that work?

Also, how would indexes work with structured content?

I also wonder if index-based replacement would be more challenging for some clients to support in terms of rendering. For example, could a stream of "random" indexes cause jankiness in a UI?

That said, we could support index-based replacement as an option in the _meta.mergeStrategy enum (possibly limited to unstructured content).

@ibuildthecloud

A merge strategy would really be too much of a burden on the client. I've yet to see any need beyond append. I would propose that a streaming response is only allowed to have one content element, and that the content element's index in the logical final list should be in the _meta field next to "hasMore".

Just for reference, the index approach is what OpenAI responses API uses https://platform.openai.com/docs/api-reference/responses-streaming/response/content_part/added#responses-streaming/response/content_part/added-output_index

@victordibia

victordibia commented Jun 25, 2025

Hi @matoushavlena , @siwachabhi , @jonathanhefner , all

Thanks for the discussion and proposal here. I can certainly see how this improves over earlier efforts (e.g., using progress notifications or resource notifications). I work on AutoGen and we have been experimenting with enabling multi-agent workflows on the MCP protocol, so this proposal is great and super relevant! (we'd explored a few implementations based on progress and resource notifications, similar to #383 for streaming updates).

I've taken a first pass at implementing the spec above based on the python SDK, mostly to better understand the pros and cons: The general dev experience looks like:

https://github.com/victordibia/python-sdk/blob/main/examples/fastmcp/streaming_example.py

mcp_776_streaming_1080.mp4 (demo video)

## server
@mcp.tool(description="A tool that streams partial results")
async def streaming_counter(count: int, delay: float, ctx: Context) -> str:
    for i in range(1, count + 1):
        await ctx.stream_partial([TextContent(type="text", text=f"Processing item {i}/{count}")])
        await asyncio.sleep(delay)
    return f"Processed {count} items"

## client
tool_call_stream = session.stream_tool("streaming_counter", {"count": 5, "delay": 1})
async for partial_result in tool_call_stream:
    ...  # do something, e.g. update the UI

What is implemented:

  • Added a stream_tool method in the client that handles the logic for adding allowPartial in the request, reading subsequent responses with _meta.hasMore, and yielding results via an async generator
  • Extended call_tool to use stream_tool underneath and return only the final result
  • Updated the type definitions to include allowPartial in request _meta and hasMore in result _meta
  • Added comprehensive tests covering both streaming and non-streaming scenarios
  • Ensured proper resource cleanup and protocol compliance

Some open questions/TODOs based on my implementation:

Edit: See this doc with comments (thanks @matoushavlena ) comparing #776 and #383 approaches

  • Testing behavior in combination with elicitation and sampling workflows (with UI integration)
  • Adding index information to streaming results for ordering information, double check resumability behaviors (what happens on client disconnect etc)
  • Potential tool annotation to advertise that the tool supports streaming?
  • Exploring optional linking of streams to resources to enable task state polling (related to fix: add status field in Resource class and Resource as a return type in CallToolResult #549) e.g., if a resource content is returned in the stream, it can be polled to fetch task state (a bit messy)

I'll report back here as I make progress and would be excited to help push this forward.
I'd also be happy to join future conversations on similar agent-related capabilities in MCP.

@pilartomas
Contributor

pilartomas commented Jun 26, 2025

How does the client know whether to append to the previous content element or create a new one?

In the original proposal, the content list shall always be extended by the content list arriving in the next partial result, incrementally building the full list. The content blocks (items) are never modified.

The following two partial results:

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "content 1."
      }
    ],
    "_meta": {
      "hasMore": true
     }
  }
}

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "content 2."
      }
    ],
    "_meta": {
      "hasMore": false
     }
  }
}

Would become the following result:

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "content 1."
      },
      {
        "type": "text",
        "text": "content 2."
      }
    ]
  }
}

We considered index-based replacement unnecessary given the specific semantics and structure of the content field. Index-based replacement is certainly more useful for structured content and general use cases.
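Under these append-only semantics, the receiving client reduces to a simple accumulator. A sketch (hypothetical helper; `hasMore` is the `_meta` flag proposed in this PR):

```python
def accumulate(responses):
    """Concatenate content lists from partial results into the full result.

    Consumes responses until one arrives whose _meta.hasMore is
    absent or false, which marks the final chunk.
    """
    content = []
    for response in responses:
        result = response["result"]
        content.extend(result.get("content", []))
        if not result.get("_meta", {}).get("hasMore", False):
            break
    return {"content": content}
```

Applied to the two partial results above, this produces exactly the merged result shown.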

@darrelmiller

it stretches the spec's intent, which assumes a one-to-one mapping between request and response

I would argue that this design does more than stretch the JSON-RPC spec; I think it breaks it.

When a rpc call is made, the Server MUST reply with a Response, except for in the case of Notifications. The Response is expressed as a single JSON Object,

https://www.jsonrpc.org/specification#response_object

It is hard to tell what the consequences of this spec violation are, and I don't know whether it would be better to just create a new derived protocol based on JSON-RPC or just suck it up and say "it's sorta JSON-RPC".

@mariusdv

mariusdv commented Jun 26, 2025

I would argue that this design does more than stretch the JSON-RPC spec; I think it breaks it.

I agree with @darrelmiller above

I absolutely believe in the need for status updates and response streaming, but believe that something based on #383 is a cleaner implementation.

Clients that do not support streaming will simply treat the last message as the complete result.

To me, this is a signal to follow the JSON-RPC notification pattern, where notifications can be used for result chunking, while ensuring the final result message contains the full result.

This prevents clients that don't have the capability or need for streaming from having to deal with these streams, avoids data loss, and means SDKs aren't required to collect and aggregate response streams.
It also keeps streaming consistent with the existing pattern of notification and progress messages, in addition to chunked results.

@matoushavlena
Author

There's also an argument to be made for partial results being non-optional (though contingent upon negotiated protocol version), and always handling / merging them client-side instead of server-side. If a server wants to send a large number of partial results without buffering them all in memory, then, arguably, a client shouldn't be able to prevent that.

@jonathanhefner I think you want to allow thin clients (ideally like curl). If it's up to the server, then clients need to have more logic to be able to consume all possible responses from the server.

@matoushavlena
Author

but to ensure the result message contains the full result

I think it's impractical to require the server to accumulate the partials for the response object. It's not scalable, especially for content-heavy long-running tasks. This is also aligned with the LSP spec:

The final response has to be empty in terms of result values. This avoids confusion about how the final result should be interpreted, e.g. as another partial result or as a replacing result.

@matoushavlena
Author

It is hard to tell what the consequences of this spec violation are, and I don't know whether it would be better to just create a new derived protocol based on JSON-RPC or just suck it up and say "it's sorta JSON-RPC".

@darrelmiller I don't want to derail the original topic, but since you brought it up and this is likely one of the few real stretches of JSON-RPC in MCP, I will share my take.

JSON-RPC was great when MCP servers were mostly run locally with stdio transport. But with the shift to remote servers and possible expansion to some of the agentic use cases (long-running processes with state sharing), JSON-RPC starts to show its limitations.

I think a RESTful approach using SSE for streaming and async result pickup patterns would deliver a better ease of integration and unblock MCP to do more with less hassle. The REST ecosystem has already solved many of these problems, and MCP could benefit from building on those foundations rather than retrofitting them into JSON-RPC.

@madevoge

@matoushavlena

I think it's impractical to require the server to accumulate the partials for the response object. It's not scalable, especially for content-heavy long-running tasks. This is also aligned with the LSP spec:

It makes sense not to want to emit the same data twice.
I am concerned that this proposal moves partial streaming from an optional addition to the spec to a requirement that all clients must reason over and understand, including the risk of out-of-order message delivery without final reconciliation.

To enable durability, servers can follow the SSE standard to ensure they can persist and redeliver messages:
https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#resumability-and-redelivery
This implies persisting previously emitted messages, which feels especially applicable to long-running tasks, where a client may disconnect for some time.

@pilartomas
Contributor

Reflecting on my previous comment:

We considered index-based replacement unnecessary given the specific semantics and structure of the content field. Index-based replacement is certainly more useful for structured content and general use cases.

We realised a limitation of the proposed simple block-concatenation merging strategy: the client may not be able to clearly identify content boundaries, i.e. which blocks belong together. This would make block "compression" problematic and certain operations, like saving content into a file, hard to do.

Index-based replacement would address this issue, I believe, as the server would not create additional blocks for the same content.

@ibuildthecloud @jonathanhefner @victordibia @matoushavlena

@darrelmiller

@matoushavlena I largely agree. JSON-RPC provides an envelope format for transports that don't have them. HTTP already has URIs, headers, status codes, so JSON-RPC syntax becomes redundant waste. Putting the JSON-RPC parameters/result in the HTTP body and leveraging the URL for the method is a more natural way to support both HTTP and stdio.

With a mapping between JSON-RPC and HTTP equivalents, it would be possible to do chunked encoding of the result object without touching any of the other parts of the protocol.

Here's a pure thought experiment, off the top of my head, that tries to demonstrate what I mean.

POST /tool/call/get_weather HTTP/1.1
host: mcp.example.org
Content-Type: application/json
Content-Length: 31
json-rpc-id: 2

{ 
  "location": "New York"
}

200 OK
Content-Type: text/event-stream
Transfer-encoding: chunked
json-rpc-id: 2

52
data: { "content": [{"type": "text", "text": "Current weather in New York"}] }
31
data: { "content": [{"type": "text", "text": "\nTemperature 72F"}]}
44
data: { "content": [{ "type": "text", "text": "\nConditions: Partly cloudy"}] }
0

...which is the HTTP equivalent of the following single JSON-RPC response

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "Current weather in New York:"
      },
      {
        "type": "text",
        "text": "\nTemperature: 72°F"
      },
      {
        "type": "text",
        "text": "\nConditions: Partly cloudy"
      }
    ],
    "isError": false
  }
}

@mikekistler
Contributor

@darrelmiller I like this idea, but there's something troubling me. In the current protocol:

The server MAY send JSON-RPC requests and notifications before sending the JSON-RPC response.

And I think a request (e.g. elicitation) would have its own request id -- not the same as the original request. How would something like this be accomplished? Could the request id be moved into the body of the response rather than set in a header?

@darrelmiller

@mikekistler Yeah, you are correct. The fact that MCP attempts to be a bidirectional protocol that interleaves S->C requests within C->S requests, makes the approach I suggested problematic. If we start putting request Ids in the events then we are going to quickly revert back to tunnelling JSON-RPC over event-streams.
All this feels like re-inventing HTTP/2/3 frames and streams.

@mikekistler
Contributor

I wonder if creating a separate notification type for partial results could help here. This would avoid sending multiple responses for a request (with a partial_result boolean) and overloading the progress notifications with partial results where they might be ignored. We could make partial results work similar to progress notifications in that they'd be enabled only when requested on tool calls -- maybe with a resultToken (analogous to progressToken). So if a tool call has partial results enabled, it could stream results back, and otherwise would accumulate the response and provide it in the response message. The response message could be empty if partial results is enabled, avoiding redundant data transfer.

In the event that it's impractical for the server to accumulate the response, it could simply reject the tool call.
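A sketch of what such messages might look like, expressed as Python dicts. Everything here is hypothetical: the method name `notifications/partialResult` and the `resultToken` field are illustrations of this comment's proposal, not anything defined by the current spec.

```python
# Client opts in per call, analogous to progressToken:
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "generate_story",
        "arguments": {"topic": "fox"},
        "_meta": {"resultToken": "r-1"},
    },
}

# Server streams chunks as notifications (no id, so no reply is expected
# and ordinary request/response matching is untouched):
partial = {
    "jsonrpc": "2.0",
    "method": "notifications/partialResult",
    "params": {
        "resultToken": "r-1",
        "content": [{"type": "text", "text": "Once upon a time,"}],
    },
}

# The final response can then be empty of result values, as in LSP:
final = {"jsonrpc": "2.0", "id": 2, "result": {"content": []}}
```

The `resultToken` lets the client associate each notification with the originating request without reusing the JSON-RPC id.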

@matoushavlena
Author

Just wanted to add that the A2A protocol from Google also streams multiple JSON-RPC result responses with the same id over SSE. Check 9.3. Streaming Task Execution (SSE) from their spec. Short example:

data: {
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "kind": "artifact-update",
    "append": true,
    "artifact": { "artifactId": "abc", "parts": [...] }
  }
}

data: {
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "kind": "status-update",
    "status": { "state": "completed" },
    "final": true
  }
}

I'm increasingly convinced that multiple result responses are a better fit than notifications, especially if streaming were ever formally added to JSON-RPC. It seems far more natural to extend the request-response model than to rely on notifications.

@jonathanhefner
Member

@mpcm Sorry for the ping, but since you weighed in on #414, I was wondering if you could weigh in here also. 🙏

For JSON-RPC, when sending partial results to the client, which of the following would be more appropriate?

  • Sending the partial results as JSON-RPC notifications (including a property that indicates the request ID)
  • Sending the partial results as multiple JSON-RPC responses that all have the same id

Speculatively, if JSON-RPC were to add official support for streaming partial results, which approach do you think it would favor?
