Allow Prompt/Sampling Messages to contain multiple content blocks. #198

evalstate · 2025-03-13T17:01:49Z

Tool Call Results allow the return of an array of Text, Image and EmbeddedResources. This is typically consistent with Messaging APIs (e.g. OpenAI, Anthropic) which allow separation of content blocks within a "User" or "Assistant" message.

The current API treats Prompt and Sampling messages as singular, i.e. each can contain only one content block. This means that client code for message handling needs to "special case" building multi-part messages by recognizing related messages and concatenating them. This also potentially loses the semantics of the "Message" container.

Motivation and Context

Consistency across schema: Currently CallToolResultSchema uses an array of content items, while PromptMessageSchema and SamplingMessageSchema use a single content item. This inconsistency creates implementation complexity.
Alignment with LLM provider APIs: Modern LLM APIs like OpenAI's Chat Completions and Anthropic's Messages API support multiple content blocks per message:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        # One message carrying two content blocks: a text part and an image part.
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            },
        ],
    }],
)

Improved expressiveness: Allows for natural combinations like:

Text with supporting images in the same message
Text with embedded code snippets as separate blocks
Multiple resource references within a logical message unit

Simplified client implementations: Eliminates the need for clients to split/join content across multiple messages to represent what is logically a single message with multiple parts.

How Has This Been Tested?

Breaking Changes

This breaking change can be mitigated with a Protocol Version check to convert from a single element to an Array.
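
A minimal sketch of what such a version check might look like inside an SDK. The type names, the `ARRAY_CONTENT_VERSION` cutoff and the `upgradeMessage` helper are all hypothetical and simplified for illustration; the only thing taken from the protocol is that versions are date strings (e.g. 2025-03-26), which order correctly under plain string comparison.

```typescript
// Simplified stand-ins for the schema types; only the fields needed here.
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextContent | ImageContent;
type Role = "user" | "assistant";

type LegacyMessage = { role: Role; content: ContentBlock };
type ArrayMessage = { role: Role; content: ContentBlock[] };

// Hypothetical cutoff: the first protocol version whose messages carry arrays.
const ARRAY_CONTENT_VERSION = "2099-01-01";

// Protocol versions are date strings, so string comparison orders them.
function usesArrayContent(negotiatedVersion: string): boolean {
  return negotiatedVersion >= ARRAY_CONTENT_VERSION;
}

// Accept a message from a peer on either side of the cutoff and
// always hand the array shape to the rest of the SDK.
function upgradeMessage(
  message: LegacyMessage | ArrayMessage,
  negotiatedVersion: string,
): ArrayMessage {
  if (usesArrayContent(negotiatedVersion)) {
    if (!Array.isArray(message.content)) {
      throw new Error("expected array content for this protocol version");
    }
    return { role: message.role, content: message.content };
  }
  if (Array.isArray(message.content)) {
    throw new Error("expected a single content block for this protocol version");
  }
  return { role: message.role, content: [message.content] };
}
```

With this in place, SDK internals could work with arrays only, and the single-element form would appear only at the wire boundary for older peers.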

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have read the MCP Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have added or updated documentation as needed

Additional context

The User Guide will need updating on publication.

PederHP · 2025-03-14T13:43:49Z

An alternative to the breaking change could be to use a new name for the field or to add the array of content as a new type of content. Not saying either is better than a breaking change. Just worth considering, as in practice many clients/servers will likely not support multiple protocol versions, which means that non-backwards compatible schema changes will break compatibility. Maybe that's ok, but I thought I'd mention this anyway.

evalstate · 2025-03-14T14:48:42Z

I did think about this one quite hard, but I think the mitigating factors are:

Relatively low take-up of Prompts/Sampling Features reduces the risk. Those using the features will likely be in a position to adapt. It would be nice to know if others had similar feedback.
Conversion for the general case is quite straightforward
A new name/field would introduce duplication and tech debt - it might make sense as a migration path, but internally I'm now coding to the assumption that Messages have multiple content blocks.

PederHP · 2025-03-14T15:17:41Z

I agree, but I think it makes sense to have articulated and considered the alternatives.

evalstate · 2025-03-14T16:47:05Z

Well, it's put here as a draft to provoke the conversation - and get input from the Maintainers. I'm happy to put the work in to a solution of any type (compatibility preserving etc.) if we agree this is something worth doing - but there will be a lot of documentation etc. to write if we proceed with any option. Thank you.

dsp-ant · 2025-03-20T15:46:24Z

Curious what @jspahrsummers and @jerome3o-anthropic have to say, but I think this approach makes sense. It'll be a bit painful for clients to update, but I think that's probably okay. Luckily the protocol is versioned and so we can deal with different result types.

evalstate · 2025-03-20T16:37:19Z

On this one, I am planning on writing a discussion thread showing examples of this, and potential workarounds with sample code.

jspahrsummers · 2025-03-24T11:49:47Z

Yep, no objections from me.

cliffhall

LGTM! 👍

cliffhall · 2025-04-10T16:24:12Z

> Well, it's put here as a draft to provoke the conversation - and get input from the Maintainers.

Here's a possible alternative: The content field could be one of the types OR an array of them.

"content": {
    "anyOf": [
        {
            "$ref": "#/definitions/TextContent"
            ...
        },
        {
            "type": "array",
            "items": {
                "anyOf": [
                    {
                        "$ref": "#/definitions/TextContent"
                        ...
                    }
                ]
            }
        }
    ]
}

Probably not the right solution, but I thought I'd throw it out there.

Makes consuming the content more complex since you have to account for the either/or. And devs who are already using sampling would still need to update their code.

Realizing it's essentially @PederHP's suggestion:

> add the array of content as a new type of content.
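
To make the either/or cost concrete, here is a small sketch of what a consumer might do with a "one or many" content field. The types are simplified stand-ins and `toBlocks` / `collectText` are hypothetical helpers, not part of any SDK.

```typescript
// Simplified stand-ins for the schema types.
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextContent | ImageContent;

// The "single item OR array of items" shape from the anyOf schema.
type FlexibleContent = ContentBlock | ContentBlock[];

// Normalize once at the boundary so everything downstream sees an array.
function toBlocks(content: FlexibleContent): ContentBlock[] {
  return Array.isArray(content) ? content : [content];
}

// Example consumer: collect the text parts, whichever shape arrived.
function collectText(content: FlexibleContent): string[] {
  const texts: string[] = [];
  for (const block of toBlocks(content)) {
    if (block.type === "text") texts.push(block.text);
  }
  return texts;
}
```

Every consumer needs a normalization step like `toBlocks` somewhere, which is the extra complexity mentioned above.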

theobjectivedad · 2025-05-11T16:06:53Z

I appreciate this discussion and just wanted to weigh in regarding use cases. I can think of two scenarios where this would be useful. As previously mentioned, (a) a sampling request asking for n>1 responses is certainly a valid case. I also wanted to add a (perhaps?) more common scenario where (b) MCP servers need to run multiple sampling requests in parallel.

I ran into this yesterday when I wanted to run parallel summary requests at the MCP server level on a list of search results. For this specific situation, I can certainly summarize at the client level; however, I feel strongly that enough new scenarios will continue to arise for (a) and (b) over time to justify a protocol change.

ktwillcode

👍

cliffhall

Just a few comments.

cliffhall · 2025-06-02T15:01:33Z

schema/2025-03-26/schema.json

We shouldn't be modifying this past schema version.

cliffhall · 2025-06-02T15:01:50Z

schema/2025-03-26/schema.ts

We shouldn't be modifying this past schema version.

cliffhall · 2025-06-02T15:05:42Z

schema/draft/schema.ts

 export interface SamplingMessage {
  role: Role;
-  content: TextContent | ImageContent | AudioContent;
+  content: (TextContent | ImageContent | AudioContent)[];


One last-ditch ask for a backward compatible way to handle this. Defining an ArrayContent type which can contain any of the existing types, and then this could become:

Suggested change

-  content: (TextContent | ImageContent | AudioContent)[];
+  content: (TextContent | ImageContent | AudioContent | ArrayContent);

Not certain if there's a reason why it wouldn't work, but thought I'd put it out there.

evalstate · 2025-06-02T15:44:06Z

schema/draft/schema.ts

 export interface SamplingMessage {
  role: Role;
-  content: TextContent | ImageContent | AudioContent;
+  content: (TextContent | ImageContent | AudioContent)[];


I understand the concern, however the original proposal is still my preference. My reasoning is:

Changing content to an Array makes it the same as content in CallToolResult.

SDK compatibility for both Client and Server is quite straightforward. E.g. converting from content to [content] or from [content,content] to [Message,Message]. This should mean the rollout at both the SDK and protocol level can be managed. There is example code in fast-agent that does this conversion (as it uses this type internally).

It is more directly expressive. For example mcp-webcam development version has image prompts for ICL. These have to be [Message TextContent],[Message ImageContent] rather than the actual LLM API Shape of Message [TextContent,ImageContent].

Adoption of Sampling and Prompts containing embedded content is still relatively small in comparison to the broader MCP system, so this will be nowhere near as impactful as a breaking change on CallToolResult.

So the trade-off is between a new type introduced for backwards compatibility, or expressing the Message content semantically. I think because it can be mitigated at the SDK with low effort I fall towards the latter.
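
As a rough illustration of the [content,content] to [Message,Message] direction (this is not the fast-agent code, and the type and function names are hypothetical), an SDK talking to a peer on the single-content schema might split messages like this:

```typescript
// Simplified stand-ins for the schema types.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };
type Role = "user" | "assistant";

type SingleMessage = { role: Role; content: ContentBlock };
type MultiMessage = { role: Role; content: ContentBlock[] };

// One outgoing message per content block, each keeping the original role.
function splitMessage(message: MultiMessage): SingleMessage[] {
  const out: SingleMessage[] = [];
  for (const block of message.content) {
    out.push({ role: message.role, content: block });
  }
  return out;
}

// Downgrade a whole conversation, preserving block order.
function downgradeAll(messages: MultiMessage[]): SingleMessage[] {
  const out: SingleMessage[] = [];
  for (const message of messages) {
    for (const single of splitMessage(message)) out.push(single);
  }
  return out;
}
```

Note that in this sketch a message with an empty content array produces no output messages; a real SDK would need to decide how to treat that case.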

evalstate · 2025-06-02T22:13:46Z

Yes, I understand it is a breaking change, and was proposed as such. Given the changes for StructuredOutput for the next protocol revision I'm not sure that this is worse (as there is a non-breaking SDK interface path to introduce it).

dsp-ant · 2025-06-06T14:19:15Z

Claude suggested that several documentation files in docs/ need updates to reflect the breaking changes in this PR:

Files needing updates:

docs/docs/concepts/sampling.mdx - Message format examples show content as single object instead of array
docs/tutorials/building-a-client-node.mdx - Client tutorial examples use old single content format
docs/sdk/java/mcp-server.mdx - Java SDK sampling examples need to use content arrays

Why: Since this changes message content from content: {...} to content: [{...}], the documentation examples will mislead developers and cause implementation errors.

dsp-ant

I think I am okay with this change.

Please update the documentation and changelog. Please run this past SDK maintainers ASAP to understand any concerns before we land the revision.

Ping me when you need final approval.

dsp-ant · 2025-06-06T14:13:48Z

docs/specification/draft/changelog.mdx


 ## Other schema changes

+- PromptMessage and SamplingMessage now contain Arrays of content.


This should be a major change with a note that it's breaking

dsp-ant · 2025-06-10T10:04:25Z

Okay, coming back to this. While we are all happy with the change, on the SDK side this is a true test of how we handle version negotiation, and it revealed that we need much more work and coordination there. While it is quite annoying for everyone involved, I believe it's best if we do not include it in this revision and give SDK developers a chance to figure out how best to deal with different versions of an interface in their SDKs.

ochafik · 2025-09-26T11:26:56Z

I'm slightly worried about allowing message content arrays without requiring strict message role alternation.
And very worried about the breaking change.

Most inference APIs (OpenAI's chat completions, Claude's, but also OSS in HF transformers and llama.cpp) require or assume strict assistant / user alternation in messages, with message content being a single string or an array of typed parts.

The current sampling API amounts to a flattened version of this and allows consecutive repeated roles, but it is currently trivial and unambiguous to unflatten, just by grouping by role:

// Sampling messages

[
  {"role": "user", "content": {"type": "text", "text": "Describe and enhance this pic:"}},
  {"role": "user", "content": {"type": "image", "mimeType": "image/png", "data": "base64..."}},
  {"role": "assistant", "content": {"type": "text", "text": "It's dull. I've spiced it up"}},
  {"role": "assistant", "content": {"type": "image", "mimeType": "image/png", "data": "base64..."}},
  {"role": "user", "content": {"type": "text", "text": "And then?"}}
]

Converted to OpenAI / HF-style format (content: string | ({type: "text", text: string} | ...)[]):

// OpenAI- / HF-style messages

[
  {"role": "user", "content": [
    {"type": "text", "text": "Describe and enhance this pic:"},
    {"type": "image", "mimeType": "image/png", "data": "base64..."}
  ]},
  {"role": "assistant", "content": [
    {"type": "text", "text": "It's dull. I've spiced it up"},
    {"type": "image", "mimeType": "image/png", "data": "base64..."}
  ]},
  {"role": "user", "content": [{"type": "text", "text": "And then?"}]}
]
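
The grouping step could be sketched roughly as follows. The types are simplified stand-ins and `groupByRole` is a hypothetical helper; it merges each run of consecutive same-role messages into one message with an array of parts.

```typescript
// Simplified stand-ins for the schema types.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };
type Role = "user" | "assistant";

type FlatMessage = { role: Role; content: ContentBlock };
type GroupedMessage = { role: Role; content: ContentBlock[] };

// Merge each run of consecutive same-role messages into a single message.
function groupByRole(messages: FlatMessage[]): GroupedMessage[] {
  const grouped: GroupedMessage[] = [];
  for (const message of messages) {
    const last = grouped[grouped.length - 1];
    if (last !== undefined && last.role === message.role) {
      last.content.push(message.content);
    } else {
      grouped.push({ role: message.role, content: [message.content] });
    }
  }
  return grouped;
}
```

Applied to the five sampling messages above, this yields three messages with roles user, assistant, user, in that order.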

Now if we allow this:

[
  {"role": "user", "content": [{"type": "text", "text": "content1.1"}, {"type": "text", "text": "content1.2"}]},
  {"role": "user", "content": [{"type": "text", "text": "content2"}]}
]

The only way to implement it with actual inference APIs will be to coalesce these, losing the kinda-implied semantic grouping of content1.1 and content1.2:

[
  {"role": "user", "content": [
    {"type": "text", "text": "content1.1"},
    {"type": "text", "text": "content1.2"},
    {"type": "text", "text": "content2"}
  ]}
]

My take is we should:

Have content accept a single MessageContent or an array of it, to avoid backwards-incompatibility:

type MessageContent = TextContent | ImageContent | AudioContent | EmbeddedResource;
export interface PromptMessage {
  role: Role;
  content: MessageContent | MessageContent[];
}

Introduce backward-compatible message role alternation, maybe something like:

Consecutive sub-sequences of messages with the same role MUST either all have a content with a single MessageContent, or be of length 1.
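
A rough sketch of how that rule could be checked. The function name is hypothetical and the types are simplified. It also assumes that "a single MessageContent" means the non-array form, so a one-element array counts as an array here.

```typescript
// Simplified stand-ins for the schema types.
type MessageContent =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };
type Role = "user" | "assistant";
type PromptMessage = { role: Role; content: MessageContent | MessageContent[] };

// True when every maximal run of same-role messages either has length 1
// or consists only of messages with a single (non-array) content.
function satisfiesAlternationRule(messages: PromptMessage[]): boolean {
  let start = 0;
  while (start < messages.length) {
    // Find the end of the run of messages sharing messages[start].role.
    let end = start + 1;
    while (end < messages.length && messages[end].role === messages[start].role) {
      end++;
    }
    const run = messages.slice(start, end);
    const allSingle = run.every((m) => !Array.isArray(m.content));
    if (run.length > 1 && !allSingle) return false;
    start = end;
  }
  return true;
}
```

Under this reading, existing flattened conversations stay valid, array content is valid when roles alternate, and the two consecutive user messages with array content shown earlier would be rejected.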

PederHP · 2025-09-26T13:19:49Z

> Most inference APIs (OpenAI's chat completions, Claude's, but also OSS in HF transformers and llama.cpp) require or assume strict assistant / user alternation in messages, with message content being a single string or an array of typed parts.

This is no longer the case. OpenAI and Claude both allow arbitrary ordering, and I think Gemini does too.

If a client needs strict turn ordering, it can insert a dummy message or merge consecutive user / assistant messages. This is a relatively trivial change to make in the few hosts that need it (probably only open source inference), and it avoids a lot of complexity in the protocol.

evalstate · 2025-09-26T13:23:37Z

> This is no longer the case. OpenAI and Claude both allow arbitrary ordering, and I think Gemini does too.

Came here to say the same (I explicitly test fast-agent's handling of that case as well). I previously had to divide document blocks out for Anthropic, but that's been fixed too: https://github.com/anthropics/anthropic-sdk-python/releases/tag/v0.66.0 :)

I agree with @PederHP that concerns about presentation to the inference API should be handled by the Host, and that the intermediate MCP format should allow maximum expressiveness.

evalstate · 2025-11-20T16:54:55Z

#1577 partially solves this; a new SEP is to be considered for PromptMessage types.

switched prompt/sampling messages to array content blocks

8baf498

jspahrsummers added this to Standards Track Mar 14, 2025

jspahrsummers moved this to Draft in Standards Track Mar 14, 2025

cliffhall mentioned this pull request Apr 2, 2025

Create sampling response form modelcontextprotocol/inspector#246

Merged

9 tasks

cliffhall previously approved these changes Apr 2, 2025

Merge branch 'main' into feat/message-content-arrays

eb4ad9a

evalstate dismissed cliffhall’s stale review via eb4ad9a April 10, 2025 07:06

Merge branch 'main' into feat/message-content-arrays

34c215a

evalstate mentioned this pull request May 14, 2025

Enforce parity between supported message types in tool results and sampling #522

Closed

9 tasks

LucaButBoring mentioned this pull request May 15, 2025

Revise sampling specification to define all valid request and response fields #531

Merged

9 tasks

evalstate added 6 commits May 25, 2025 13:02

Merge remote-tracking branch 'upstream/main' into feat/message-content-arrays

41bfd83

move changes to draft schema

32fa465

remove ER

c13b1d1

Generate JSON Schema

9661bf6

Update changelog

36634e5

update draft docs to reflect changes in schema

0e7dbe1

evalstate marked this pull request as ready for review May 25, 2025 20:25

evalstate requested a review from jerome3o-anthropic May 25, 2025 20:25

ktwillcode reviewed Jun 2, 2025

cliffhall reviewed Jun 2, 2025

evalstate commented Jun 2, 2025

evalstate added 2 commits June 4, 2025 09:28

Merge branch 'feat/message-content-arrays' of https://github.com/evalstate/specification into feat/message-content-arrays

4d34819

Merge remote-tracking branch 'upstream/main' into feat/message-content-arrays

d1b2387

evalstate requested review from dsp-ant and pcarleton June 4, 2025 08:30

evalstate mentioned this pull request Jun 4, 2025

add ResourceLink to CallToolResult #603

Merged

8 tasks

dsp-ant moved this from Draft to Consulting in Standards Track Jun 6, 2025

dsp-ant requested changes Jun 6, 2025

github-project-automation bot moved this from Consulting to In Review in Standards Track Jun 6, 2025

dsp-ant added the awaiting-sdk-change label Jun 6, 2025

dsp-ant added this to the DRAFT 2025-06-XX milestone Jun 6, 2025

dsp-ant modified the milestones: DRAFT 2025-06-XX, DRAFT-XX-XX Jun 10, 2025

evalstate mentioned this pull request Jun 10, 2025

EmbeddedResource returns a single content block, ReadResourceResult returns an array #699

Open

dsp-ant moved this from In Review to Consulting in Standards Track Jun 10, 2025

LucaButBoring mentioned this pull request Jul 17, 2025

SEP-958: Sample calls establish new sessions to enable zero knowledge MCP nesting or chaining #958

Closed

evalstate mentioned this pull request Aug 7, 2025

SamplingMessage.content should accept a list #1282

Open

eiriktsarpalis mentioned this pull request Aug 26, 2025

Overhaul design to handle changes between spec versions modelcontextprotocol/csharp-sdk#569

Closed

dsp-ant requested a review from a team September 23, 2025 21:10

evalstate mentioned this pull request Sep 25, 2025

Added Tool Call and Tool Result to GetPrompt for in-context learning … #188

Closed

9 tasks

This was referenced Sep 30, 2025

SEP-1577 - Sampling With Tools #1577

Closed

prototype: SEP-1577 - Sampling With Tools modelcontextprotocol/typescript-sdk#991

Closed

jerome3o-anthropic removed their request for review November 10, 2025 17:04

evalstate closed this Nov 20, 2025
