feat: add support for partial results and streaming responses #776
base: main
Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com> Co-authored-by: Matous Havlena <havlenma@gmail.com>
386692d to
5e81901
I've seen this repeated, but I am still a bit unclear. JSON-RPC notifications are simply messages without an id. Though I agree we should include the original request
When would clients set this flag versus not? It seems like this might be more suitable as a client capability rather than a per-request setting.
I think it would be good to specify this behavior up front because it would allow a better API for tool authors. Instead of a tool having multiple code paths based on an One approach would be to include something like Alternatively, we could just say that partial results are not (currently) supported for structured content.
To be clear, this "flexibility" is not officially supported by the JSON-RPC spec, correct? Are there examples in the wild where this flexibility is being leveraged?
The final response will be structurally identical, but the data will be incomplete, correct? In which case, I don't think the above is quite accurate. But it also shouldn't be a problem since partial results would be opt-in.
The idea was to let the output schema define an arbitrary strategy. However, I now see that may be unnecessary. Having a merging strategy (or a pre-defined set of strategies) for structured data would allow SDKs to handle this logic. 👍
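To make the idea of a pre-defined set of strategies concrete, here is a minimal sketch of how an SDK might fold structured partial results with one of them. The strategy names (append, replace, merge) and the way a strategy would be selected are hypothetical; nothing in this PR defines them.

```python
from typing import Any, Callable

# Hypothetical strategy names; the PR does not define any of these.
Strategy = Callable[[Any, Any], Any]


def _append(acc: Any, part: Any) -> Any:
    # Arrays grow by concatenation; earlier items are never modified.
    return (acc or []) + part


def _replace(acc: Any, part: Any) -> Any:
    # Each partial supersedes everything received so far.
    return part


def _merge(acc: Any, part: Any) -> Any:
    # Shallow object merge: keys in later partials overwrite earlier ones.
    return {**(acc or {}), **part}


STRATEGIES: dict[str, Strategy] = {"append": _append, "replace": _replace, "merge": _merge}


def fold_structured(partials: list[Any], strategy: str) -> Any:
    """Fold a sequence of structured partial results into one value."""
    apply = STRATEGIES[strategy]
    acc: Any = None
    for part in partials:
        acc = apply(acc, part)
    return acc
```

A small fixed set like this lets each SDK implement merging once, instead of every output schema describing its own algorithm.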
Valid question. I think the perception is that RPC clients would treat notifications as "additional" information, not a "response" to a "request". But maybe all this is not relevant; the JSON-RPC spec is very open in how it can be interpreted. I think this PR is meant to propose an alternative approach, and it's OK if this discussion steers us towards notifications.
Wouldn't the
This gives the client more control, for example if the host app is going from foreground to background. Or parts of the app are user facing tasks and part are background tasks.
Edited: The A2A protocol is one example: https://a2a-protocol.org/latest/specification/#93-streaming-task-execution-sse
In my opinion, progress notifications and partial results are orthogonal features. Also, progress notifications are optional, so the client may not specify a progressToken.
But would the client know whether it will go from foreground to background before sending the tool call? Or, if it's already in the background, would it know whether it will be brought to foreground in the middle of the tool call? It seems like it would be better (and simpler) for the client to always accept partial results, and silently merge them if it's running in the background. There's also an argument to be made for partial results being non-optional (though contingent upon negotiated protocol version), and always handling / merging them client-side instead of server-side. If a server wants to send a large number of partial results without buffering them all in memory, then, arguably, a client shouldn't be able to prevent that.
+1 to the proposal here. After implementing more such servers, I am coming around to the feedback here (see #383): it's better to support this by letting the tool result be partial. Some examples and a sample implementation would really help! Another thing to consider is what structured content will look like and, when there is a delta update, which index of the unstructured content it updates.
How would a response with two content blocks work? If my tool result was
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "content 1."
      },
      {
        "type": "text",
        "text": "content 2."
      }
    ]
  }
}
If I send
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "content 2."
      }
    ]
  }
}
how does the client know whether to append to the previous content or create a new element? Other streaming protocols do this with an index variable or an id in the content object. For example, using an index:
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "index": 0,
        "type": "text",
        "text": "con"
      }
    ]
  }
}
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "index": 0,
        "type": "text",
        "text": "tent 1."
      }
    ]
  }
}
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "index": 1,
        "type": "text",
        "text": "cont"
      }
    ]
  }
}
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "index": 1,
        "type": "text",
        "text": "ent 2."
      }
    ]
  }
}
From #776 (comment): "One approach would be to include something like
How would index-based replacement represent total replacement? For example, if a tool has emitted two partial results and wants to replace them both with a single final result, how would that work? Also, how would indexes work with structured content? I also wonder if index-based replacement would be more challenging for some clients to support in terms of rendering. For example, could a stream of "random" indexes cause jankiness in a UI? That said, we could support index-based replacement as an option in the
A merge strategy would place too much burden on the client, and I've yet to see any need beyond append. I would propose that, when streaming, each partial response be allowed to contain only one content element, and that element's index in the logical final list go in the _meta field next to "hasMore". Just for reference, the index approach is what the OpenAI Responses API uses: https://platform.openai.com/docs/api-reference/responses-streaming/response/content_part/added#responses-streaming/response/content_part/added-output_index
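A minimal sketch of the client-side merge this proposal implies, assuming each partial result carries exactly one content element and that its position in the final list is stored in _meta under a key called index (the key name is an assumption; the comment only says it sits next to hasMore). Text arriving for an existing index is appended to that element, matching the append-only semantics described above.

```python
from typing import Any


def apply_indexed_partial(blocks: list[dict[str, Any]], result: dict[str, Any]) -> bool:
    """Merge one partial `result` into `blocks` in place.

    Returns True while more partials are expected (hasMore is true).
    Assumes a hypothetical layout: exactly one content element per partial,
    with its final-list position in result["_meta"]["index"].
    """
    (element,) = result["content"]  # proposal: one element per partial
    meta = result.get("_meta", {})
    index = meta["index"]
    if index == len(blocks):
        # First chunk for this position: start a new element (copied to avoid aliasing).
        blocks.append(dict(element))
    elif index < len(blocks) and element.get("type") == "text":
        # Later chunk for an existing position: append-only text concatenation.
        blocks[index]["text"] += element["text"]
    else:
        raise ValueError(f"cannot merge element at index {index} into {len(blocks)} blocks")
    return bool(meta.get("hasMore", False))


# Usage: the four chunks from the index example earlier in the thread.
blocks: list[dict[str, Any]] = []
for text, idx, more in [("con", 0, True), ("tent 1.", 0, True), ("cont", 1, True), ("ent 2.", 1, False)]:
    partial = {"content": [{"type": "text", "text": text}], "_meta": {"index": idx, "hasMore": more}}
    apply_indexed_partial(blocks, partial)
# blocks is now [{"type": "text", "text": "content 1."}, {"type": "text", "text": "content 2."}]
```

Keeping one element per partial means the client never has to interpret a list of mixed new and continued blocks; the index alone decides between "new element" and "append".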
Hi @matoushavlena, @siwachabhi, @jonathanhefner, all, thanks for the discussion and proposal here. I can certainly see how this improves over earlier efforts (e.g., using progress notifications or resource notifications). I work on AutoGen, and we have been experimenting with enabling multi-agent workflows on the MCP protocol, so this proposal is great and super relevant! (We'd explored a few implementations based on progress and resource notifications, similar to #383, for streaming updates.) I've taken a first pass at implementing the spec above based on the Python SDK, mostly to better understand the pros and cons. The general dev experience looks like this (https://github.com/victordibia/python-sdk/blob/main/examples/fastmcp/streaming_example.py):
## server
import asyncio
from mcp.server.fastmcp import Context, FastMCP
from mcp.types import TextContent
mcp = FastMCP("streaming-example")
@mcp.tool(description="A tool that streams partial results")
async def streaming_counter(count: int, delay: float, ctx: Context) -> str:
    for i in range(1, count + 1):
        # stream_partial is added by the fork linked above; it is not in the released SDK
        await ctx.stream_partial([TextContent(type="text", text=f"Processing item {i}/{count}")])
        await asyncio.sleep(delay)
    return f"Processed {count} items"
## client (session: an initialized ClientSession from the same fork)
async def consume(session):
    tool_call_stream = session.stream_tool("streaming_counter", {"count": 5, "delay": 1})
    async for partial_result in tool_call_stream:
        print(partial_result)  # do something, e.g., update the UI
What is implemented:
Some open questions/TODOs based on my implementation:
Edit: See this doc with comments (thanks @matoushavlena) comparing the #776 and #383 approaches.
I'll report back here as I make progress on this and would be excited to help push this forward.
In the original proposal, the content list is always extended by the content list arriving in the next partial result, incrementally building the full list. The content blocks (items) are never modified. The following two partial results:
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "content 1."
      }
    ],
    "_meta": {
      "hasMore": true
    }
  }
}
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "content 2."
      }
    ],
    "_meta": {
      "hasMore": false
    }
  }
}
would become the following result:
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "content 1."
      },
      {
        "type": "text",
        "text": "content 2."
      }
    ]
  }
}
We considered index-based replacement unnecessary due to the specific semantics and structure of the content field. Index-based replacement is certainly more useful when working with structured content and general use cases.
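The append-only rule above can be sketched as a small fold over the partial result objects. This is an illustrative sketch, not SDK code; it treats a missing hasMore the same as false, since the PR's reviewer notes say the final response omits the flag while the example here sets it explicitly.

```python
from typing import Any


def merge_partial_results(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Concatenate the content lists of partial results into one final result.

    Content blocks are never modified; each partial only extends the list.
    """
    content: list[dict[str, Any]] = []
    for i, result in enumerate(results):
        content.extend(result.get("content", []))
        has_more = result.get("_meta", {}).get("hasMore", False)
        if not has_more and i != len(results) - 1:
            raise ValueError("received a partial result after the final one")
    if results and results[-1].get("_meta", {}).get("hasMore", False):
        raise ValueError("stream ended while hasMore was still true")
    return {"content": content}
```

Because blocks are only ever appended, the merged list keeps every boundary the server emitted; the client cannot tell from the list alone which blocks belong to the same logical piece of content.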
I would argue that this design does more than stretch the JSON-RPC spec; I think it breaks it.
https://www.jsonrpc.org/specification#response_object It is hard to tell what the consequences of this spec violation are, and I don't know whether it would be better to just create a new derived protocol based on JSON-RPC or just suck it up and say "it's sorta JSON-RPC".
I agree with @darrelmiller above. I absolutely believe in the need for status updates and response streaming, but I believe that something based on #383 is a cleaner implementation.
To me, this is a signal to follow the JSON-RPC notification pattern, where notifications can be used for result chunking but the result message still contains the full result. That way, clients that don't have the capability or need for streaming can ignore these streams without data loss, and SDKs are not required to collect and aggregate response streams.
@jonathanhefner I think you want to allow thin clients (ideally like
I think it's impractical to require the server to accumulate the partials for the response object. It's not scalable, especially for content-heavy long-running tasks. This is also aligned with the LSP spec:
@darrelmiller I don't want to derail the original topic, but since you brought it up and this is likely one of the few real stretches of JSON-RPC in MCP, I will share my take. JSON-RPC was great when MCP servers were mostly run locally with the stdio transport. But with the shift to remote servers and a possible expansion into agentic use cases (long-running processes with state sharing), JSON-RPC starts to show its limitations. I think a RESTful approach using SSE for streaming and async result pickup patterns would make integration easier and unblock MCP to do more with less hassle. The REST ecosystem has already solved many of these problems, and MCP could benefit from building on those foundations rather than retrofitting them into JSON-RPC.
It makes sense to not want to emit the same data twice. To enable durability, servers can follow the SSE standard to ensure they can persist and redeliver messages.
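One way to read "follow the SSE standard" here: the event-stream format lets each event carry an id: field, and a reconnecting client sends the last id it saw in the Last-Event-ID header, so the server can resend only what was missed. A minimal in-memory sketch, assuming integer ids, single-line data payloads (compact JSON has no newlines), and a server that keeps the whole event log per stream; a real server would persist it and eventually expire it.

```python
from typing import Optional


class ReplayableStream:
    """In-memory log of SSE events for one response stream (illustrative only)."""

    def __init__(self) -> None:
        self._events: list[tuple[int, str]] = []

    def publish(self, data: str) -> str:
        """Record an event and return it in SSE wire format."""
        event_id = len(self._events) + 1
        self._events.append((event_id, data))
        return self._format(event_id, data)

    def replay(self, last_event_id: Optional[str]) -> list[str]:
        """Return every event after `last_event_id` (all events if None)."""
        start = int(last_event_id) if last_event_id else 0
        return [self._format(i, d) for i, d in self._events if i > start]

    @staticmethod
    def _format(event_id: int, data: str) -> str:
        # One event: an id line, a single data line, and a blank-line terminator.
        return f"id: {event_id}\ndata: {data}\n\n"
```

Redelivery only repeats what the client has not acknowledged by id, so nothing already received is sent twice.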
Reflecting back on my previous comment
We realised a limitation of the proposed simple block-concatenation merging strategy: the client may not be able to clearly identify content boundaries, i.e., which blocks belong together. This would make block "compression" problematic and certain operations, like saving content into a file, hard to do. Index-based replacement would address this issue, I believe, as the server would not create additional blocks for the same content.
@matoushavlena I largely agree. JSON-RPC provides an envelope format for transports that don't have one. HTTP already has URIs, headers, and status codes, so the JSON-RPC envelope becomes redundant. Putting the JSON-RPC parameters/result in the HTTP body and using the URL for the method is a more natural way to support both HTTP and stdio. With a mapping between JSON-RPC and its HTTP equivalents, it would be possible to use chunked encoding for the result object without touching any other part of the protocol. Here's a pure thought experiment, off the top of my head, that tries to demonstrate what I mean:
POST /tool/call/get_weather HTTP/1.1
Host: mcp.example.org
Content-Type: application/json
Content-Length: 31
json-rpc-id: 2

{
  "location": "New York"
}

HTTP/1.1 200 OK
Content-Type: text/event-stream
Transfer-Encoding: chunked
json-rpc-id: 2

52
data: { "content": [{"type": "text", "text": "Current weather in New York"}] }
31
data: { "content": [{"type": "text", "text": "\nTemperature 72F"}]}
44
data: { "content": [{ "type": "text", "text": "\nConditions: Partly cloudy"}] }
0

...which is the HTTP equivalent of the following single JSON-RPC response:
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "Current weather in New York"
      },
      {
        "type": "text",
        "text": "\nTemperature 72F"
      },
      {
        "type": "text",
        "text": "\nConditions: Partly cloudy"
      }
    ],
    "isError": false
  }
}
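To make the claimed equivalence concrete, here is a minimal sketch of the client-side mapping: read the data: lines of the event stream, concatenate their content arrays, and rebuild the single JSON-RPC response using the id carried in the hypothetical json-rpc-id header. It assumes the HTTP library has already removed the chunked-transfer framing, and that a 200 response means isError is false.

```python
import json
from typing import Any


def sse_to_jsonrpc(body: str, request_id: int) -> dict[str, Any]:
    """Rebuild one JSON-RPC response from a decoded text/event-stream body."""
    content: list[dict[str, Any]] = []
    for line in body.splitlines():
        # Only data: lines carry payload in this sketch; other SSE fields are ignored.
        if line.startswith("data:"):
            event = json.loads(line[len("data:"):])
            content.extend(event.get("content", []))
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": content, "isError": False},
    }
```

The request id travels once in a header rather than in every event, which is what breaks down once server-to-client requests such as elicitation need their own ids inside the same stream.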
@darrelmiller I like this idea, but there's something troubling me. In the current protocol:
And I think a request (e.g., elicitation) would have its own request id, not the same as the original request's. How would something like this be accomplished? Could the request id be moved into the body of the response rather than set in a header?
@mikekistler Yeah, you are correct. The fact that MCP attempts to be a bidirectional protocol that interleaves S->C requests within C->S requests makes the approach I suggested problematic. If we start putting request ids in the events, then we are going to quickly revert to tunnelling JSON-RPC over event streams.
I wonder if creating a separate notification type for partial results could help here. This would avoid both sending multiple responses for a request (with a partial_result boolean) and overloading progress notifications with partial results, where they might be ignored. We could make partial results work similarly to progress notifications, in that they'd be enabled only when requested on tool calls, maybe with a resultToken (analogous to progressToken). So if a tool call has partial results enabled, it could stream results back; otherwise it would accumulate the response and provide it in the response message. The response message could be empty if partial results are enabled, avoiding redundant data transfer. In the event that it's impractical for the server to accumulate the response, it could simply reject the tool call.
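A minimal server-side sketch of this idea, with every name hypothetical: the resultToken parameter, the notifications/partial_result method, and the send_notification callback are illustrations only, not part of MCP. If the caller supplied a token, chunks go out as notifications and the final response is empty; otherwise the server accumulates everything into the response.

```python
from typing import Any, Callable, Iterable, Optional

Notify = Callable[[dict[str, Any]], None]


def run_tool(
    chunks: Iterable[dict[str, Any]],
    send_notification: Notify,
    result_token: Optional[str] = None,
) -> dict[str, Any]:
    """Return the tool-call result, streaming chunks only when a token is given."""
    if result_token is None:
        # No token: partial results are disabled, so accumulate into the response.
        return {"content": list(chunks)}
    for chunk in chunks:
        # Token present: each chunk becomes a notification (no "id"), tied to the call by the token.
        send_notification({
            "jsonrpc": "2.0",
            "method": "notifications/partial_result",
            "params": {"resultToken": result_token, "content": [chunk]},
        })
    # Everything was already delivered, so the response carries no duplicate data.
    return {"content": []}
```

Because each chunk is a real JSON-RPC notification, every request still gets exactly one response; the token, not the request id, links the stream to the call.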
Just wanted to add that the A2A protocol from Google also streams multiple JSON-RPC result responses with the same id. I'm increasingly convinced that multiple result responses are a better fit than notifications, especially if streaming were ever formally added to JSON-RPC. It seems far more natural to extend the request-response model than to rely on notifications.
@mpcm Sorry for the ping, but since you weighed in on #414, I was wondering if you could weigh in here too. 🙏 For JSON-RPC, when sending partial results to the client, which of the following would be more appropriate?
Speculatively, if JSON-RPC were to add official support for streaming partial results, which approach do you think it would favor?
This PR introduces support for partial results, primarily intended to enable streaming responses from long-running tools and agent interactions. Returning partial output improves responsiveness, interactivity, and user experience. The feature was identified as a key improvement in #111, #117, and #484.
Motivation and Context
Partial result streaming improves responsiveness and flexibility for both long-running tools and MCP-based agents, enabling:
We are proposing this change because our own use case involves using MCP for agent communication, where partial results are essential for enabling responsive, real-time, multi-agent workflows.
Related Work
[feat] Introduce partial results as part of progress notifications #383
Proposes streaming via progress notifications, which is a promising direction. One limitation discussed in this comment (which we independently arrived at as well) is that notifications may be ignored by clients, requiring a full result to be sent at the end regardless.
[Proposal] Task semantics and multi-turn interactions with tools #314
Explores related concepts for streaming and multi-turn interactions. Since elicitation was introduced in the latest draft, both progressToken and state appear unnecessary under the updated semantics. This PR builds on those ideas and reflects recent protocol evolution.
Related Issues
How Has This Been Tested?
Tool output streaming was tested in real applications using the modified Python SDK.
Breaking Changes
None. Clients must explicitly opt in via allowPartial. Servers are free to ignore this flag.
Types of Changes
Checklist
Additional Context
This feature is designed to be minimal, backward-compatible, and avoid unnecessary data duplication.
This implementation raises two open design questions:
Structured data merging
While unstructured content (e.g. content in CallToolResult) can be concatenated incrementally, merging partial structured data (objects, arrays, etc.) is not currently defined. This PR deliberately avoids specifying a merging or delta application algorithm, leaving it open for future discussion if the protocol chooses to express an opinion on it.
Use of id for multiple responses
This implementation leverages a flexibility in JSON-RPC to send multiple responses with the same id. While practical and effective, it stretches the spec's intent, which assumes a one-to-one mapping between request and response. Using id in this way introduces a form of implicit multiplexing, which may conflict with traditional request–response matching logic in some JSON-RPC clients.
Example
Below is an example of a request with allowPartial set to true, followed by multiple partial responses:
Notes for Reviewers
allowPartial is a client hint; servers are not required to support or act on it.
hasMore: true appears only in intermediate responses; the final response omits it.
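The multiplexing concern and the hasMore rule can be illustrated with a client-side dispatcher. A conventional JSON-RPC client removes a pending request as soon as the first response with that id arrives; a client supporting this proposal has to keep the entry open until a response arrives whose _meta.hasMore is absent or false. A minimal sketch, assuming hasMore sits in result._meta as in the examples in this thread and that content is merged append-only; error responses are left out for brevity.

```python
from typing import Any, Callable, Optional

OnPartial = Callable[[list[dict[str, Any]]], None]


class PendingRequests:
    """Routes JSON-RPC responses by id, allowing several per request (sketch)."""

    def __init__(self) -> None:
        self._pending: dict[int, tuple[list[dict[str, Any]], Optional[OnPartial]]] = {}
        self.completed: dict[int, dict[str, Any]] = {}

    def register(self, request_id: int, on_partial: Optional[OnPartial] = None) -> None:
        self._pending[request_id] = ([], on_partial)

    def handle(self, message: dict[str, Any]) -> None:
        request_id = message["id"]
        if request_id not in self._pending:
            # A strict one-response-per-request client would land here on the second partial.
            raise KeyError(f"no pending request with id {request_id}")
        content, on_partial = self._pending[request_id]
        result = message["result"]  # error responses are not handled in this sketch
        new_blocks = result.get("content", [])
        content.extend(new_blocks)
        if on_partial is not None:
            on_partial(new_blocks)
        if not result.get("_meta", {}).get("hasMore", False):
            # Final response: only now is the request considered finished.
            del self._pending[request_id]
            self.completed[request_id] = {"content": content}
```

The only change from ordinary request matching is when the pending entry is removed, which is why clients that remove it on the first response would drop later partials.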