Conversation

@lukaswelinder commented Apr 16, 2025

Motivation and Context

Only allowing unstructured text content in tool call results limits the extensibility of MCP. Adding support for structured output lets us provide better context to the LLMs calling the tools, and opens MCP tools up to uses beyond LLM consumption.

See #97 for discussion.

Changes Made

  1. Add optional outputSchema object on the tool definition.
  2. Add a content type for JSON:
    {
        "description": "A JSON encodable object to or from an LLM.",
        "properties": {
            "data": {
                "description": "The result object.",
                "type": "object"
            },
            "schema": {
                "description": "The schema $ref or a schema object.",
                "type": ["string", "object"]
            }
        },
        "required": ["data"]
    }

This allows for pretty flexible usage:

  • If schema is not defined, the outputSchema is implied
  • If schema is a string, it is interpreted as a URI for the schema (either a ref in outputSchema or a fully qualified URL)
  • If schema is an object, that object is the schema (for supporting dynamic content schemas)
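
For example, three hypothetical content blocks illustrating these modes (a "json" type discriminator is assumed here for illustration; it is not spelled out in the snippet above, and the values are invented):

// Implied: data is validated against the tool's outputSchema.
const implied = { type: "json", data: { temperature: 21, conditions: "cloudy" } };

// URI: schema names a $ref in outputSchema (or a fully qualified URL).
const byRef = { type: "json", schema: "#/$defs/WeatherReport", data: { temperature: 21 } };

// Inline: schema is itself a JSON Schema object, for dynamic content.
const inline = { type: "json", schema: { type: "object" }, data: { rows: [] } };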

How Has This Been Tested?

We have been using this approach with an internally built MCP server and agent workflow. We have only tested it for listing tools and tool calls; we have not tested sampling or prompt messages.

Breaking Changes

The changes here shouldn't break existing implementations.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

@Ejb503 (Contributor) commented Apr 17, 2025

I think this goes in the right direction, but can't help feeling that JSON Schema is the correct standard for the schema property.

data MUST be a valid JSON object.
schema (optional) should be a valid JSON Schema description of the object.

A proper implementation then allows runtime validation with common libraries and a standard implementation.

Just need the APIs to support full JSON Schema next, but one battle at a time.
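
For instance, a minimal runtime-validation sketch along those lines, using Ajv (the content shape is illustrative, not part of the PR):

import Ajv from "ajv";

const ajv = new Ajv();

// A hypothetical structured content block: data plus its JSON Schema.
const content = {
  data: { location: "London", temperature: 12 },
  schema: {
    type: "object",
    properties: { location: { type: "string" }, temperature: { type: "number" } },
    required: ["location"],
  },
};

// Compile the schema carried by the block and validate the data against it.
const validateFn = ajv.compile(content.schema);
if (!validateFn(content.data)) {
  throw new Error(`Invalid tool result: ${ajv.errorsText(validateFn.errors)}`);
}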

@He-Pin (Contributor) commented Apr 17, 2025

With this, we can help the LLM better understand the result.

@lukaswelinder BTW, I think the MCP server should provide a JSONSchema query tool too.

@briancripe

I think this goes in the right direction, but can't help feeling that JSON Schema is the correct standard for the schema property.

data MUST be a valid JSON object. schema (optional) should be a valid JSON Schema description of the object.

Agree with @Ejb503 here but want to raise some additional considerations based on the current response structure:

  • As opposed to the args and their associated inputSchema being a single required object, the return content is allowed to be 0..N mixed content types, some of which could be non-objects
  • This means that if outputSchema is on the root of the tool definition you have to infer a few other considerations
  • Does the outputSchema apply to every Content item, only 'object' content items, or the entire response content as a whole?
  • If there is some use case to return additional content types alongside the object content, is there anything in the tool definition (besides potentially the description) that hints to expect this?

Let's assume that it's not desirable to overhaul the CallToolResult type, so we're likely adding some sort of ObjectContent type that should have a JSON Schema somewhere we can use to validate it and provide hints to the LLM of what to expect. The next layer we might want to provide is hints as to which content types are expected (between 0, 1, and N for each content type).

An off-the-cuff proposal for metadata could be:

expectedOutput:
  textContent:
    quantity: "none"
  objectContent:
    quantity: "multiple"
    objectSchema: [a valid Draft-7 (?) JSON schema]
  imageContent:
    quantity: "single"
  audioContent:
    quantity: "none"
  embeddedResource:
    quantity: "none"

NOTE: Omission of any content key could either imply an expected quantity of "none" or "multiple" (the default expectation today).

@lukaswelinder (Author) commented Apr 17, 2025

@briancripe I think that the design I've put forward largely addresses the points you made.

With regards to my notes on the JsonContent response behavior:

  • If schema is not defined, the entire outputSchema is implied to represent the content of data
  • If schema is a string, it is interpreted as a URI for the schema (either a $ref in outputSchema or a fully qualified URL)
  • If schema is an object, that object is the schema

In all cases, whatever the schema is, it just applies to the content.data property (not the entire tool result). I don't think it makes sense for tool output schemas to drive the internal implementation of MCP.

If there is some use case to return additional content types alongside the object content, is there anything in the tool definition (besides potentially the description) that hints to expect this?

Yes, a few things come to mind:
- Natural language description/summary/usage instructions/etc
- For things like image/document processing, including a transformed file alongside structured content

Edit: I misread. Currently no. I don't think it's really something worth worrying about. Between this design and using annotations, the interpretation of tool results should be largely up to the client & LLM implementation.

@He-Pin Can you elaborate on what you mean by providing a JSON Schema query tool? I feel like that isn't really in scope of MCP. I suppose that with the ability to use $ref, the MCP client should provide a way to extract that from the tool's outputSchema, but I'm not sure if that falls within the specification's responsibility to dictate.

@He-Pin (Contributor) commented Apr 18, 2025

That is out of the scope of this PR. I mean that an MCP server should provide a JSON Schema query tool out of the box, but anyway, the client will get all the schemas when listing tools with this outputSchema change.

Thanks for this.

@siwachabhi (Contributor) commented Apr 18, 2025

PR looks great! outputSchema for sure being a replica of inputSchema makes sense.

I am contemplating JsonContent; JSON is also text. Wondering if the original TextContent object should be extended: https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/schema/2025-03-26/schema.ts#L949. We could introduce an optional mimeType and schema in there. There is value in having an object structure instead of a string.

Should the JsonContent object be named DataContent? JsonContent seems too specific. This one is probably only a naming difference.

As a reference: https://google.github.io/A2A/#/documentation?id=structured-output, https://github.com/google/A2A/blob/main/specification/json/a2a.json#L505

@lukaswelinder (Author)

Should the JsonContent object be named DataContent? JsonContent seems too specific. This one is probably only a naming difference.

@siwachabhi I actually raised that point in #97, and I think you're right, renaming it to DataContent makes more sense. I'll make this change later today.

@lukaswelinder lukaswelinder changed the title Add tool outputSchema and JsonContent type to support structured content Add tool outputSchema and DataContent type to support structured content Apr 19, 2025
@lukaswelinder lukaswelinder changed the title Add tool outputSchema and DataContent type to support structured content RFC: add tool outputSchema and DataContent type to support structured content Apr 19, 2025
@marianogonzalez (Contributor)

I proposed something very similar to this in this discussion, but I agree with some other comments in this thread:

  • This should not be JSON-specific. Other types of structured data should be supported as well (XML, YAML, etc.)
  • It should not be limited to text responses. I think the ability to say a tool generates PNG images or CSV resources would be just as useful

This thread also brings up a question about which I find myself having mixed feelings: Should structured data responses like JSON and XML be treated as Text responses? Or should we use Text responses for natural language and add a brand new type of response for structured machine language?

@lukaswelinder (Author)

@marianogonzalez I think there is some overlap there with the discussion in #223 regarding content types.

  • This should not be JSON-specific. Other types of structured data should be supported as well (XML, YAML, etc.)

I'm not sure how I feel about this. I built out this implementation in a way that tried to minimize scope and impact, but I am inclined to agree with your first point.

I think the alternative approach would be to extend the TextContent type to support a mimeType and allow an optional schema field for the following content types:

  • application/json
  • application/yaml
  • application/xml

The other aspects of my proposal would remain the same.

That said, it feels a bit weird to be using JSON-RPC and having structured JSON encoded as a string text field. Potential performance/bandwidth impact aside, it means that clients that intend to use the content would need to deserialize it first.

  • It should not be limited to text responses. I think the ability to say a tool generates PNG images or CSV resources would be just as useful

As for your second point, I feel like that is kind of a separate problem that I'd rather not concern the scope of this RFC with. If I'm understanding correctly, you want to have a way to explicitly define the whole of the CallToolResult, which feels like a significant departure from the current design where the tool returns an arbitrary number of content entries of various types and it is left to the client to consume them.

@bhosmer-ant (Contributor)

@lukaswelinder hey, so here's my attempt to summarize the differences between this proposal and #371. TLDR I think they've got two different categories of use case in mind, but AFAICT there's only a couple of spots where they actually clash, and I think we have a couple of options for serving both sets of goals in a single proposal.

The main priorities that drove the design in #371 were:

Strictness: a client should be able to examine the output schema a tool declares ahead of time, and the schema should be binding (non-overridable). This is important for both interacting with tools from untrusted servers (both at tool selection and result validation time), and for composing tool calls in code.

Simplicity: for tools that essentially wrap function calls, schematizing the tool's "return type" should ideally be easy to derive mechanically from the implementing function's return type on the server side (a la FastMCP's current use of argument types for inputSchema), easy to add to existing tool declarations (ideally without implementation changes to the tool itself) on the server side, and easy to generate validation and deserialization code from mechanically on the client side. (Aside: my sense is that these "function wrappers" are the vast majority of existing tools, though of course we can't privilege this category at the expense of others.)

In contrast, I read #356 as prioritizing expressiveness: being able to make use of schematized results in the full space of possible CallToolResult shapes. Whereas #371 is very modal (your tool result either mimics a function's return value, or you can't schematize it at all), #356 lets you "mix in" schema-validated DataContent blocks into any result. Which is cool! All things being equal it's definitely nicer to bring schematic validation to this larger space of use cases.

So it's a matter of figuring out how to serve both use cases with one setup. I think the pain points for function-wrapper use cases with #356 in its current form are as follows:

Strictness - this is the only genuine conflict, I think. In #356 currently (IIUC)

  • there's no way to constrain a tool to return a single DataContent block, since outputSchema applies per-block rather than per-result.
  • there's no way to prevent a result from containing non-schematized blocks along with schematized ones.
  • the tool's outputSchema is non-binding, since each DataContent block can override it with one of its own.

Off the top of my head I don't see a way for Tool.outputSchema to serve both roles here (i.e., both a binding schema for the whole result and a default schema for DataContent blocks). One alternative might be to define Tool.outputSchema as in #371, and also define DataContent as here, but remove the implicit defaulting behavior of DataContent.schema - i.e., it's now either required or its absence means "structured but unconstrained", rather than implicitly falling back to Tool.outputSchema. Interested in your thoughts on this - it's not the only solution, maybe there are others that are better.

(Oh btw, I think DataContent.schema is actually mandatory in the current PR?)

Simplicity - there are a couple minor things here but they don't seem difficult to resolve, one way or another.

  • The main one is that the current outputSchema is constrained to be an object type here - this means simple (non-object) return types would need to be wrapped, complicating the process of automatically deriving outputSchemas from function signatures (as in e.g. FastMCP) and of consuming the schematized values.
  • There's also the issue of what kind of content block to return in the strict case. If DataContent retains its ability to carry its own schema, then it's not the right vehicle for the strictly-validated result, absent some sort of modality.

Anyway, definitely interested in how you see the choice space here. (Apologies for the length of this, felt like it was worth backfilling the context though.)

@lukaswelinder (Author) commented Apr 22, 2025

@bhosmer-ant Thanks for the detailed writeup, you make some great points.

I agree that there are merits to both approaches. I took the route of a more expressive solution to allow for greater flexibility, but with all abstractions come drawbacks. From what I gather, to bridge the gap to the design decisions you put forward with #371 we want to:

  • Support non-object output schemas (this seems simple enough, I'll just go ahead and update my implementation)
  • Support unconstrained ObjectType responses
  • Be more explicit with behavior of outputSchema and DataContent.schema

(Oh btw, I think DataContent.schema is actually mandatory in the current PR?)

Yeah, my mistake. I'll correct that.

  • there's no way to constrain a tool to return a single DataContent block, since outputSchema applies per-block rather than per-result.

I sort of touched on this in my last paragraph here. I think that's sensible and aligned with the current CallToolResult design pattern. It's also similar to LLM APIs; for example, if I were to make a request to OpenAI's o4-mini with a structured response, I would receive two content blocks, one for reasoning and the other with the assistant's structured response.

Enforcing a single block response feels like an arbitrary restriction without much gain. Having the outputSchema and ability to reference definitions within it using schema provides enough for the client to interpret its result.

  • there's no way to prevent a result from containing non-schematized blocks along with schematized ones.

Not sure I follow here. I'm assuming it's along the same lines as the first point (as in you could have TextContent or others alongside DataContent entries), in which case I stand by the above.

  • the tool's outputSchema is non-binding, since each DataContent block can override it with one of its own.

I would regard this as an advanced use case that most will not use, but there are a few examples where it would be useful:

  • Tool that queries the DB, and interprets the result columns to generate a schema
  • Tool that pulls data from external APIs that own their own schemas

Off the top of my head I don't see a way for Tool.outputSchema to serve both roles here (i.e., both a binding schema for the whole result and a default schema for DataContent blocks).

I see your point about it being non-binding from the perspective of not explicitly knowing the whole results of a tool before receiving them. I think this is a separate concern though. Having outputSchema and DataContent provides the tools to exchange structured data without imposing arbitrary restrictions on the broader design of CallToolResults content.

One alternative might be to define Tool.outputSchema as in #371, and also define DataContent as here, but remove the implicit defaulting behavior of DataContent.schema - i.e., it's now either required or its absence means "structured but unconstrained", rather than implicitly falling back to Tool.outputSchema. Interested in your thoughts on this - it's not the only solution, maybe there are others that are better.

Not sure I follow exactly; do you mean have outputSchema and enforce a single content response, but also add DataContent that can contain its own schema object? I think the problem with that approach (aside from what I've laid out above) is that no aspect of DataContent can be inferred from the list tools result.

I think I have an approach that may satisfy both.


Proposal

  • Change outputSchema to support non-object schemas
  • Change behavior of DataContent.schema:
    • When Tool.outputSchema is defined and:
      • When DataContent.schema is not defined, DataContent.data must be valid according to Tool.outputSchema
      • When DataContent.schema is a string, it must be a valid JSON Path in Tool.outputSchema.$defs, and DataContent.data must be valid according to the evaluated schema
      • When DataContent.schema is an object, it must be a valid JSON Schema that may reference entries in Tool.outputSchema.$defs using the $ref keyword, and DataContent.data must be valid according to the evaluated schema
      • When DataContent.schema is false, then DataContent.data is considered unstructured
    • When outputSchema is not defined and:
      • When DataContent.schema is a string, it should result in an error
      • When DataContent.schema is not defined, it is considered unstructured data
      • When DataContent.schema is an object, it must be a valid JSON Schema, and DataContent.data must be valid according to DataContent.schema

I believe that this approach maintains expressive flexibility while also supporting unstructured data as needed. Thoughts?
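
As a rough sketch of how a client might apply these resolution rules (using Ajv for validation; the function and variable names here are illustrative, not part of the proposal, and the string case is simplified to a direct $defs key lookup rather than full JSON Path evaluation):

import Ajv from "ajv";

const ajv = new Ajv();

// `outputSchema` comes from the tool definition; `schema` is DataContent.schema.
// Returns the effective schema, or null for unstructured data.
function resolveSchema(outputSchema: any, schema: any): any | null {
  if (outputSchema !== undefined) {
    if (schema === undefined) return outputSchema; // data must match outputSchema
    if (schema === false) return null;             // explicitly unstructured
    if (typeof schema === "string") {
      const resolved = outputSchema.$defs?.[schema];
      if (resolved === undefined) throw new Error(`Unknown schema ref: ${schema}`);
      return resolved;
    }
    return schema; // inline schema (may $ref entries in outputSchema.$defs)
  }
  if (typeof schema === "string") throw new Error("schema ref requires outputSchema");
  return typeof schema === "object" ? schema : null; // inline schema or unstructured
}

function validateContent(outputSchema: any, content: { data: unknown; schema?: any }): void {
  const schema = resolveSchema(outputSchema, content.schema);
  if (schema === null) return; // unstructured: nothing to validate
  if (!ajv.validate(schema, content.data)) {
    throw new Error(ajv.errorsText(ajv.errors));
  }
}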

@bhosmer-ant (Contributor) commented Apr 22, 2025

  • there's no way to constrain a tool to return a single DataContent block, since outputSchema applies per-block rather than per-result.

I sort of touched on this in my last paragraph here. I think that's sensible and aligned with the current CallToolResult design pattern. It's also similar to LLM APIs; for example, if I were to make a request to OpenAI's o4-mini with a structured response, I would receive two content blocks, one for reasoning and the other with the assistant's structured response.

Enforcing a single block response feels like an arbitrary restriction without much gain. Having the outputSchema and ability to reference definitions within it using schema provides enough for the client to interpret its result.

  • there's no way to prevent a result from containing non-schematized blocks along with schematized ones.

Not sure I follow here. I'm assuming it's along the same lines as the first point (as in you could have TextContent or others alongside DataContent entries), in which case I stand by the above.

  • the tool's outputSchema is non-binding, since each DataContent block can override it with one of its own.

I would regard this as an advanced use case that most will not use, but there are a few examples where it would be useful:

Sure, these are all well-motivated features in service of expressiveness. The issue is that "function call" tools need strictness: an output schema that describes the exact, complete structure of the tool call result (i.e. analogous to a function's return type).

So, single-block enforcement isn't a goal per se, it's just the only (non-baroque) way to apply an outputSchema to an entire result, in a predictable way. Same for schema overrides in DataContent - not an issue in and of themselves, only that they provide an escape hatch that would let results violate the strictness requirement.

So yeah, strictness is in direct tension with expressiveness, and both are well-motivated, so we need to find a way to express both. My "outputSchema is strict, DataContent.schema is (disjoint and) expressive" idea was just the simplest/least intrusive way I could think of. But like you say, it has the drawback that the tool description now gives no information about DataContent in the expressive case.

Other obvious options: two output schema properties, one for each case, along with the meta-constraint that they be mutually exclusive - outputSchema and dataContentSchema, say. Or (I don't really like this) some kind of is_strict flag that changes the meaning of outputSchema.

But before going further down the path of possible solutions, I want to make sure we're walking around on the same landscape - it makes sense why we'll need strict (rather than expressive) validation for function-call-like tool results, right?

@lukaswelinder (Author)

But before going further down the path of possible solutions, I want to make sure we're walking around on the same landscape - it makes sense why we'll need strict (rather than expressive) validation for function-call-like tool results, right?

@bhosmer-ant Yeah, I get that. I was aiming for an approach that didn't operate in conflict of or put special case limitations on the multipart content pattern, but I do agree that having a strict/explicit way to dictate expected output is valuable.

I was kind of hoping not to have to get into the weeds of dictating the overall ToolResult, but I'm inclined to agree that having a way to enforce and communicate strictness is probably worth addressing with this initiative.

Other obvious options: two output schema properties, one for each case, along with the meta-constraint that they be mutually exclusive - outputSchema and dataContentSchema, say. Or (I don't really like this) some kind of is_strict flag that changes the meaning of outputSchema.

I don't think either of these are bad ideas. I would probably be more in favor of just having a flag instead of potentially confusing mutually exclusive properties.

I don't recall where it was brought up, but we may consider a broader approach to dictating the tool's output. Something like this:

interface Tool {
  /* other tool fields */
  output: {
    content: {
      data: {
        // Whether the tool returns a single or multiple `content` entries of this type
        count?: 'single' | 'multiple';
        description: string;
        // Whether the tool's output must match the schema exactly, or if it can be overridden by `DataContent.schema`
        strict: boolean;
        schema: {
          type: 'object';
          properties: {
            my_property: { type: 'string' };
          };
        };
      };
      /* other content types */
    };
  };
}

I'll need to think on this a bit.

@Ejb503 (Contributor) commented Apr 23, 2025

I've been following the suggestions in some detail; perhaps I'm missing some nuance here, but the simplest solution to me seems to be:

1/ Add outputSchema to Tool; this is optional and can be any of the current content types (text, image, audio, etc.).
2/ Add a new content type (data, json, whatever), which has an optional schema field (similar to how audio has mimeType, etc.)

This keeps everything fundamentally the same and allows for discovery of schemas as well as runtime validation (and backwards compatibility)

@lukaswelinder (Author)

@Ejb503 I'm not sure I follow what you mean.

Having the outputSchema represent the whole CallToolResult doesn't make sense because it would require every schema to redundantly document that aspect of the spec up to the point of DataContent.data.

Or if you mean to only have a deep schema on the new data content type, that doesn't solve the issue of knowing output structure ahead of time.

@Ejb503 (Contributor) commented Apr 24, 2025

@Ejb503 I'm not sure I follow what you mean.

Having the outputSchema represent the whole CallToolResult doesn't make sense because it would require every schema to redundantly document that aspect of the spec up to the point of DataContent.data.

Or if you mean to only have a deep schema on the new data content type, that doesn't solve the issue of knowing output structure ahead of time.

Yeah, like mimeType is to audio data, schema is to JSON data. It would live inside the 'jsonData' object alongside a 'data' key. Can come up with a PR if you're interested.

Every content type essentially contains metadata that self documents.

And every CallToolResult has to return a content type anyway, right? So outputSchema in the tool (which would be optional) just provides the expected existing content types along with metadata within the expected type.

@evalstate (Member) commented Apr 24, 2025

I've been following the suggestions in some detail; perhaps I'm missing some nuance here, but the simplest solution to me seems to be:

1/ Add outputSchema to Tool; this is optional and can be any of the current content types (text, image, audio, etc.). 2/ Add a new content type (data, json, whatever), which has an optional schema field (similar to how audio has mimeType, etc.)

This keeps everything fundamentally the same and allows for discovery of schemas as well as runtime validation (and backwards compatibility)

I think some of the nuance is the difference between defining the structure of a CallToolResult or ResourceReadResult and defining the structure of payload contents.

Both CallToolResult and ResourceReadResult can emit multiple content parts, and it is NOT known ahead of time (other than via description/documentation) what the result may contain.

Note that ReadResourceResult returns an array. This means that we already have the capability for a Resource or Resource Template (let's say my-server://some-data/customer-{1234}.json) to produce:

  • A TextResourceContents containing a mimeType of application/json for the payload
  • A TextResourceContents containing a mimeType of application/schema+json for the associated schema.

A similar capability exists for CallToolResult, where both parts could be emitted using a known URI scheme (alongside any other content appropriate for the request).
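
Concretely, such a two-part result might look like the following (URIs and payloads are invented for illustration):

// Hypothetical CallToolResult pairing a JSON payload with its schema,
// each carried as an embedded resource per the pattern described above.
const result = {
  content: [
    {
      type: "resource",
      resource: {
        uri: "my-server://some-data/customer-1234.json",
        mimeType: "application/json",
        text: '{"id": 1234, "name": "Ada"}',
      },
    },
    {
      type: "resource",
      resource: {
        uri: "my-server://schema/customer.json",
        mimeType: "application/schema+json",
        text: '{"type": "object", "required": ["id", "name"]}',
      },
    },
  ],
};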

My understanding is this proposal is not aiming to add metadata to enforce this structure, but to define the content of JSON (or other) payloads before calling them.

Annotating a tool with a set of Resource URIs that advertise the location of the schema for its results would be useful. I do wonder how much of this is already achievable using primitives already within the specification.

It's also my understanding that this structure is typically useful for Application Developers to transform structured data prior to presentation to the LLM - if that's the case I think we should optimise the experience for that case. I would also like to clearly distinguish between defining the structure of Result Content and Payload Content as they are related, but different things.

@Ejb503 (Contributor) commented Apr 24, 2025

Addressing JSON Handling in Tool Results

I think some of the nuance is the difference between defining the structure of a CallToolResult or ResourceReadResult and defining the structure of payload contents.

Leaving aside ReadResource/Resource discovery and content types for now (as I don't use resources deeply in practice), let's focus on Tools.

Problem -> Using TextContent for JSON

The core problem I'm addressing is this:

Using TextContent for JSON is a common hack (employed by myself and many others). This approach misuses what should be a simple text conversation result. It leverages the fact that TextContent returns a string, and then relies on custom schemas (often passed via metadata) to achieve reliable, structured tool results. There are numerous documented instances of this practice.

Need -> Explicit JSON Content Type

We require a way to express the following in code:

callTool(params) => returns JsonContent<mySchema>

This would enable subsequent operations like:

executeCode(JsonContent)

This pattern is fundamental for my MCP clients and servers.

Current Protocol Context

In the current protocol, we primarily interact with:

  • Tool[] (the result of listTools)
  • CallToolResult<type>()

We can assume the implicit output schema for a Tool is an array of content types. The CallToolResult must return a Result containing (TextContent | ImageContent | AudioContent | EmbeddedResource)[]:

export interface Tool {
  /**
   * The name of the tool.
   */
  name: string;

  /**
   * A human-readable description of the tool.
   *
   * This can be used by clients to improve the LLM's understanding of available tools.
   * It can be thought of like a "hint" to the model.
   */
  description?: string;

  /**
   * A JSON Schema object defining the expected parameters for the tool.
   */
  inputSchema: {
    type: "object";
    properties?: { [key: string]: object };
    required?: string[];
  };

  // Implicit Output Schema in existing spec:
  // outputSchema: `TextContent | ImageContent | AudioContent | EmbeddedResource`[];

  // Explicit Output Schema (Proposed):
  // Allows defining specific content types, including JsonContent and stricter definitions for existing types.
  outputSchema?: Array<Pick<TextContent | ImageContent | AudioContent | EmbeddedResource | JsonContent, "type">>; // Example: Enforce specific mimeTypes or JSON structures

  /**
   * Optional additional tool information.
   */
  annotations?: ToolAnnotations;
}

Result Interfaces

export interface CallToolResult extends Result {
  content: (TextContent | ImageContent | AudioContent | EmbeddedResource | JsonContent)[]; // Include proposed JsonContent

  /**
   * Whether the tool call ended in an error.
   *
   * If not set, this is assumed to be false (the call was successful).
   */
  isError?: boolean;
}

Proposed Solution: JsonContent Type

My suggestion is simply to add an explicit JsonContent type. This type would include optional, self-documenting schema information, fitting seamlessly within the existing protocol and resolving the core use case (for tools).

Proposed JsonContent Interface

export interface JsonContent {
  /**
   * Identifies the content type as JSON.
   */
  type: "json";

  /**
   * The JSON payload as a JavaScript object (or string representation).
   * Consider if this should be `unknown` or `any` for flexibility, or strictly `string` requiring parsing.
   */
  data: string; 

  /**
   * Optional JSON Schema describing the structure of the 'data' property.
   */
  schema?: JsonSchema7; // Assuming standard JSON Schema
}
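
For instance, a result carrying this proposed type might look like this (payload invented for illustration):

// Hypothetical CallToolResult using the proposed JsonContent type.
const result = {
  content: [
    {
      type: "json",
      // `data` is a string per the interface above, so the payload is serialized.
      data: '{"location": "London", "temperature": 12}',
      schema: {
        type: "object",
        properties: {
          location: { type: "string" },
          temperature: { type: "number" },
        },
      },
    },
  ],
  isError: false,
};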

Benefits

This change provides a clear, structured method for returning JSON data from tool calls. It eliminates the need for TextContent workarounds and leverages explicit schema definitions directly within the content object itself.

I don't immediately see why this would interfere with other proposed changes to the protocol or the use of resources.

@evalstate (Member) commented Apr 24, 2025

Using TextContent for JSON is a common hack (employed by myself and many others)

The appropriate type to use for JSON is TextResourceContents with a mimeType of application/json, but it doesn't sound like we are in agreement on that?

I also personally think it's fine to use TextContent for JSON, as it is commonly used as a quick interchange format or presented directly to the LLM - but I would suggest using the appropriate EmbeddedResource type where guarantees of structure are required or expected. This would apply equally to application/xml or application/protobuf.

I think your proposal above differs from the content of the PR?

To check my understanding your proposal is:

  • Add a JsonContent type with an optional JSON schema
  • Add an "outputSchema" to the Tool Definition which constrains the Resource Types returned by a Tool (but this outputSchema is NOT referring to a specific JSON schema).

@Ejb503 (Contributor) commented Apr 24, 2025

Using TextContent for JSON is a common hack (employed by myself and many others)

The appropriate type to use for JSON is TextResourceContents with a mimeType of application/json, but it doesn't sound like we are in agreement on that?

There is no TextResourceContents with mimeType available for a Tool result unless I'm misreading this: https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/schema/2025-03-26/schema.ts#L698

export interface CallToolResult extends Result {
  content: (TextContent | ImageContent | AudioContent | EmbeddedResource)[];

  /**
   * Whether the tool call ended in an error.
   *
   * If not set, this is assumed to be false (the call was successful).
   */
  isError?: boolean;
}

I'm specifically and only referring to Tools. As was mentioned in the threads, I think the suggestions/discussions are starting to encapsulate too many concerns.

You 'could' create a resource as a side effect of a tool execution and then return it through the spec, but that is convoluted in my opinion and certainly not how I implement tool execution flows. And this would also make the result of the CallTool redundant, which seems strange.

To check my understanding your proposal is:

Add a JsonContent type with an optional JSON schema
Add an "outputSchema" to the Tool Definition which constrains the Resource Types returned by a Tool (but this outputSchema is NOT referring to a specific JSON schema).

Correct, although it doesn't have to constrain; it can simply describe. It is implied by the spec already; we just make it explicit, and this serves as discovery.

@evalstate (Member)

@evalstate I saw your "ToolAnnotations are an appropriate place" comment but I'm not sure I understand it. Were you just saying that given an output schema (somewhere) in the tool definition,

I think it's part of my earlier question:

Another area I'd like to see considered is what the cascading effect to the MCP SDKs for developers. It feels that this could change Tool definition and consumption APIs significantly, placing extra burden upon them.

I'm missing the "and then what?" bit here. Is the aim that the Client SDK throws an exception if it gets content that mismatches the schema? Or is the main goal here to open up a side-channel between Client and Server that bypasses the LLM context?

@bhosmer-ant (Contributor)

@evalstate I saw your "ToolAnnotations are an appropriate place" comment but I'm not sure I understand it. Were you just saying that given an output schema (somewhere) in the tool definition,

I think it's part of my earlier question:

Another area I'd like to see considered is what the cascading effect to the MCP SDKs for developers. It feels that this could change Tool definition and consumption APIs significantly, placing extra burden upon them.

I'm missing the "and then what?" bit here. Is the aim that the Client SDK throws an exception if it gets content that mismatches the schema? Or is the main goal here to open up a side-channel between Client and Server that bypasses the LLM context?

The former. But after our discussion on #415, I think I've come around to the feeling that EmbeddedResource isn't the right concept for simple, ephemeral structured results that require no additional descriptive information. They're not really Resources in the sense that MCP defines them, regardless of whether we make it clear that in some contexts persistence is not required. I think DataContent is a better fit for this kind of result, and sufficiently different from the notion of a Resource that it doesn't feel redundant to me.

@evalstate (Member) commented Apr 30, 2025

Here's a Client/Server pair based on the MCP specification and SDK that uses TextResourceContents for schema specification and tool call result validation before coercion to a Pydantic model ready for presentation to an LLM:

Server:

from string import Template

from mcp.server.fastmcp import FastMCP
from mcp.types import EmbeddedResource

# WEATHER_TEMPLATE and WEATHER_SCHEMA are defined in the full gist linked below.
app = FastMCP(name="structured weather server")

@app.tool(name="check_weather", description="Gets the weather as JSON.")
def check_weather(location: str) -> list[EmbeddedResource]:
    result: str = Template(WEATHER_TEMPLATE).substitute(location=location)
    return [
        EmbeddedResource(
            type="resource",
            resource={
                "mimeType": "application/json",
                "uri": f"my-mcp://check_weather/{location}",
                "text": result,
            },
        )
    ]

@app.resource("my-mcp://schema/tools/check_weather/schema.json",
              mime_type="application/schema+json")
def get_schema() -> str:
    return WEATHER_SCHEMA

if __name__ == "__main__":
    app.run(transport="stdio")

Client:

import json

from jsonschema import validate
from mcp import ClientSession
from mcp.types import CallToolResult, ReadResourceResult

# WeatherResult is the Pydantic model defined in the full gist linked below.
async with ClientSession(read_stream, write_stream) as session:
    # Initialize the session
    await session.initialize()

    schema_resource: ReadResourceResult = await session.read_resource("my-mcp://schema/tools/check_weather/schema.json")
    schema = json.loads(schema_resource.contents[0].text)

    ### LLM (Assistant) has stopped for tool_use
    tool_result: CallToolResult = await session.call_tool("check_weather", {"location": "London"})
    payload = tool_result.content[0].resource.text
    validate(instance=json.loads(payload), schema=schema)

    structured = WeatherResult.model_validate_json(payload)
    ### Send new User message to LLM (Tool Result)
    print(f"Weather in {structured.location}: {structured.conditions}")

Full code with schema template here: https://gist.github.com/evalstate/e49cb163297c1ab940fb8a98e31947ed

Could the Resource usage be better? Certainly - it would be improved if I maintained a template resource that responded with the current weather, the uri scheme should probably include a time component. But this is 100% in spec, today.

I think we need to concentrate on the developer experience between the "tool_use" stop and the subsequent presentation of content to the LLM.

If I control the Host/Client/Server triple, I already have/control everything I need, although there might be some "nice to haves" like having the schema resource reference attached to a Tool Annotation. But in that case, I don't think there's anything necessary in this PR for me.

What's interesting is if I am a Host/Client developer where people can add unknown MCP Servers - which I think this is motivating the PR... In that situation, the things I really care about are:

  • uri. Is this content recognisable as part of some scheme? MCP has guidance over https://, file:// and git:// already, but there may be other schemes I recognise and can do something useful with.
  • mimeType. Is this content tokenizable for my LLM? If it's a text type I'll probably hand it straight over - otherwise check whether my LLM can handle say application/pdf and construct a message accordingly. Failing that I could perhaps spool it to disk and replace it with a message for the LLM saying "we got content but couldn't handle it, saved to a file 'foo.unknown'".

The structured-ness will come from the mimeType (e.g. application/json is by definition structured). The payload schema I can derive either from knowing the URI scheme in advance, or from [some other mechanism described in this PR].

My opinion is that having all the Client SDKs do automatic validation against schemas will add a large burden to interoperability testing, as my experience is that validation libraries can be sensitive to small variances.

I think it would be helpful to base this discussion on how e.g. this code would be enhanced by the proposed changes to make sure we get the ergonomics right.

(update) I do also think we need to encourage the use of these MCP features rather than make people fearful of them - I genuinely think people are avoiding them because of a lack of playbook examples like the one above.

@patwhite (Contributor) commented May 1, 2025

Great example here, this definitely helps me see how you're approaching this.

If I control the Host/Client/Server triple, I already have/control everything I need, although there might be some "nice to haves" like having the schema resource reference attached to a Tool Annotation. But in that case, I don't think there's anything necessary in this PR for me.

This, to me, is sort of the crux of this issue, as well as the long long discussion we've been having around the tools/search capability. If you control the triple, you don't need MCP - you can have the LLM format tool calls in a way your app can consume; MCP is completely superfluous. You can expose multi-turn search protocols because, well, you just hard-code the logic in.

What's interesting is if I am a Host/Client developer where people can add unknown MCP Servers - which I think this is motivating the PR... In that situation, the things I really care about are:

uri. Is this content recognisable as part of some scheme? MCP has guidance over https://, file:// and git:// already, but there may be other schemes I recognise and can do something useful with.

I could be wrong here, but all three schemes you mention here offer no insight into the content or how to interpret it; these are all just protocols. They might tell you how to get the data, but a git repo can have anything checked into it.

mimeType. Is this content tokenizable for my LLM? If it's a text type I'll probably hand it straight over - otherwise check whether my LLM can handle say application/pdf and construct a message accordingly. Failing that I could perhaps spool it to disk and replace it with a message for the LLM saying "we got content but couldn't handle it, saved to a file 'foo.unknown'".

Tokenizable by the LLM is important, but it's not the whole story when it comes to data returns. WHAT the fields in the structured data mean is what's important. So I don't think the assertion that uri and mimeType are all you care about as the client / host is quite accurate.

When I think about MCP, I think of it as a protocol for backend developers to expose robust capabilities to hosts / clients when they have ZERO control over them, and when the clients / hosts in turn have zero a priori knowledge of the server (besides what they can consume in a reasonable time and with a reasonable context limit upon connection). So in your example, if you control the entire stack, you wouldn't pull a schema from a resource, you'd embed it in the host / client or just have it at a url you can call.

I didn't work on this PR at all, but as I internalized this proposal, this is fundamentally a mechanism to explain structured data to the LLM in a situation where you have zero control over the host / client. It has an added benefit of being great for server developers to type and structure check the output as it leaves the server, and potential client validation, but the much more important part is the context being added to the data for the benefit of the model. The context being added by the uri or the mime type field just isn't super helpful when it's arbitrary json data that a server is returning as part of a tool call.

To put it another way, If you don't control the host / client, there is NO mechanism in the protocol to tell the client to do what you're proposing up above (grab the schema from resources and hand the schema plus tool return to the model). This is an exact echo of the multi-turn search idea that got floated in the other thread - if you don't control the full stack, the spec as it exists today has no mechanism to explain to the client or host that it needs to chain two calls together to achieve a goal. There are a bunch of ways you COULD do it, you can hope that the client / host understands it should re-pull tools if a tool notification comes right after a tool call, but none where you have guarantees as the backend that the client / host will actually do what you are hoping. So, your comment above focuses on the validation aspect, but (in my opinion) much more important are the descriptions being provided to the model to explain the return.

So, I don't think this can be condensed to an ergonomics discussion - if we just take the server side of your code and expose it to 10 different client / host / model combinations in a world where you don't control the full stack, you're going to get 10 different treatments of the data, where I'd say 9 of the 10 wouldn't pull the resource to then augment the return data when handed to the model, and there is NOTHING we could do to change that. There are no hints, the tool annotations don't cover this type of thing, etc., and that's where I see this proposal fitting in.

BTW - Your aside about how you could use a resource template to achieve the same result here is SUPER interesting!! I never would have thought of doing that. So, I actually coded up a server to try this out, and exposed it to claude and....claude doesn't seem to crawl resource templates 🤦‍♂️. Haha, I don't know whose point that really proves. You're right that there's hit-or-miss adoption of these various components, but then I'm right in that, as a backend dev, the only thing I can pretty much guarantee a host / client will implement these days is simple tool calls - so it feels like we should really be blowing out the tool capabilities with features like this proposal and the tool/search, since that's the main feature area the client / host devs seem to be implementing. Really fun little experiment, thanks for the idea there!

@evalstate (Member)

MCP aims to solve (or at least improve) the MxN problem. If you have dozens of unknown Servers, and dozens of unknown Clients the best way to coordinate structure is with a shared uri scheme - hence the reason why Resources are designed and documented as they are.

This gives the flexibility for general clients to do generally useful things, and more specific clients to use structured data as they see fit to add value.

The tokenization aspect is important as that is the entire reason for using MCP. OpenAPI etc. have better answers to these problems if your target isn't specifically an LLM.

Changes to the Protocol that force SDK implementors and Host/Client developers to do things based on individual use-cases should generally be avoided, as MCP is intended to be used in incredibly diverse scenarios.

@lukaswelinder (Author)

I haven't had time to put towards this since my last update to the PR. I'll be reviewing the discussion and following up with my thoughts over the weekend.

@sambhav (Member) commented May 5, 2025

@sambhav here 👋 I have been very interested in this subject since the inception of MCP and it is great to see so much enthusiasm on this topic.

Hi all,

This RFC (#356) is a valuable step towards standardizing structured data handling in MCP tool results.

Reflecting on the proposal and discussion, it appears tool outputs generally fall into three distinct primary use cases:

  1. Predetermined Structured JSON Output: Tools acting like functions that return a single JSON structure whose schema is fixed and known statically (e.g., returning a UserProfile). This mirrors the structured inputSchema.
  2. Statically Constrained Content Types: Tools returning other specific content types (like images, audio, or specific resource types) where the server wants to statically define constraints about what can be returned (e.g., "this tool only returns PNG or JPEG images").
  3. Dynamic or Flexible Output: Tools where the output isn't a single static structure. This includes multi-part responses, files, binary data, simple text, or structured data (JSON/XML) where the schema is dynamic or only determined at runtime (e.g., database query results).

Critique of the RFC's output.content-based Approach

The RFC's proposal (using the detailed Tool.output.content block and DataContent) attempts to address all three use cases within a single, unified structure. While comprehensive, this approach introduces challenges:

  • Complexity & Asymmetry (Impacts Use Case 1): For the common case of returning simple, static JSON (Use Case 1), the configuration becomes significantly more complex than the direct inputSchema, feeling asymmetric given MCP's JSON-RPC foundation and structured JSON inputs.
  • Conflated Concerns: It bundles the distinct needs of static JSON definition (Use Case 1), static constraints for other content types (Use Case 2), and handling dynamic outputs (Use Case 3) into one mechanism.
  • Imposing Static Structure on Flexible Content (Impacts Use Case 2 & 3): The output.content block attempts to define static constraints (like lists of allowed mimeTypes) even for content types that are often dynamic or where runtime type information (via EmbeddedResource.mimeType) might be sufficient. This adds complexity where flexibility is often needed.

Proposal: Dedicated Mechanisms per Use Case (Backward Compatible)

I propose a simpler, backward-compatible approach that provides distinct mechanisms tailored to these use cases, focusing initially on solving Use Case 1 cleanly while providing a path for Use Case 3 and explicitly deferring Use Case 2.

  • For Use Case 1 (Static JSON): Introduce mode: 'value' with a static valueSchema. This directly addresses the JSON symmetry issue.
  • For Use Case 3 (Dynamic/Flexible): Use mode: 'content', relying on the existing CallToolResult.content types (TextContent, ImageContent, AudioContent, ResourceContent). Use EmbeddedResource.annotations within ResourceContent to provide optional hints about dynamic schemas.
  • For Use Case 2 (Static Constraints on Other Content): Acknowledge this need but defer its solution to a separate, subsequent proposal (see Section 3).

1. Mechanism for Use Case 1: mode: 'value' (Static JSON Output)

Addresses the JSON symmetry requirement directly.

  • Tool.output changes:
    • Add mode: 'value' | 'content' flag.
    • Add valueSchema: object | string (Required when mode === 'value'). This defines the static schema.
    • (No complex content block needed for this mode).
  • CallToolResult changes:
    • Add mutually exclusive value: any field. Used only when mode === 'value'. Contains the object matching valueSchema.
// Tool Definition Snippet for UC1
output?: {
  mode?: 'value' | 'content';
  valueSchema?: object | string; // Static JSON schema definition
};

// CallToolResult Snippet showing separation
export interface CallToolResult extends Result {
  value?: any; // Used if mode=='value' (static JSON)
  content?: (TextContent | ImageContent | AudioContent | ResourceContent)[]; // Used if mode=='content' (flexible)
  isError?: boolean;
}
  • Example (get_user_profile): Tool defines mode: 'value', valueSchema. Result has value: { /* UserProfile object */ }.
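
Spelled out (all names invented for illustration), that example might look like:

// Hypothetical get_user_profile tool using mode: 'value'.
const getUserProfile = {
  name: "get_user_profile",
  inputSchema: { type: "object", properties: { userId: { type: "string" } } },
  output: {
    mode: "value",
    valueSchema: {
      type: "object",
      properties: { id: { type: "string" }, email: { type: "string" } },
      required: ["id"],
    },
  },
};

// Corresponding result: `value` carries the object matching valueSchema.
const result = { value: { id: "u-123", email: "ada@example.com" } };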

2. Mechanism for Use Case 3: mode: 'content' (Dynamic/Flexible Output)

Leverages existing types and optional hints for flexibility.

  • Tool.output changes:
    • Set mode: 'content' (or default).
    • (No valueSchema, no complex output.content block needed). The Tool.description should explain the general nature of the output parts.
  • CallToolResult changes:
    • Uses the mutually exclusive content: (TextContent | ImageContent | AudioContent | ResourceContent)[] field. (Type remains fully compatible).
  • ResourceContent / EmbeddedResource Role: Handles files, binaries, multi-part, and dynamically schematized structured data (JSON/XML). EmbeddedResource.annotations provide optional runtime hints about schema/type.
// EmbeddedResource Reminder
export interface EmbeddedResource {
    uri: string;
    mimeType?: string;
    text?: string;
    data?: string;
    description?: string;
    annotations?: Annotations; // Optional hints like mcp:schemaLocation, mcp:schemaType
}
  • Example (query_database with dynamic schema + hints): Tool defines mode: 'content'. Result has content: [ { type: "resource", resource: { ..., mimeType: "application/json", text: "[{...dynamic...}]", annotations: { "mcp:schemaType": "DynamicResult" } } } ].
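
Expanded, that query_database result might look like this (the URI and hint keys are illustrative):

const queryResult = {
  content: [
    {
      type: "resource",
      resource: {
        uri: "my-server://query/results/42",
        mimeType: "application/json",
        text: '[{"col_a": 1, "col_b": "x"}]',
        annotations: {
          "mcp:schemaType": "DynamicResult",                           // runtime hint
          "mcp:schemaLocation": "my-server://query/results/42/schema", // optional pointer to a schema resource
        },
      },
    },
  ],
};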

3. Addressing Use Case 2 (Static Constraints for Other Content) - Deferred

This proposal deliberately separates and defers the solution for Use Case 2.

Defining static constraints for non-JSON content types returned via mode: 'content' (e.g., "this tool only returns PNG images via ResourceContent") is recognized as a potentially valuable feature.

However, from an implementation perspective, this is a separate problem from achieving static JSON I/O symmetry (Use Case 1). Many current MCP servers (https://github.com/modelcontextprotocol/servers) wrap APIs where solving JSON symmetry provides immediate, high value.

Therefore, this proposal focuses on solving Use Case 1 (value/valueSchema) now and providing a flexible mechanism for Use Case 3 (content/ResourceContent). We propose tackling Use Case 2 (static constraints for non-JSON content) in a separate, subsequent discussion or RFC enhancement. This could involve enhancing ResourceContent definitions or selectively re-introducing parts of the RFC's output.content logic focused only on constraining Image/Audio/ResourceContent types, without conflating it with the primary value/valueSchema mechanism for static JSON. This phased approach addresses the most immediate need first.

4. Backward Compatibility

This proposal remains backward compatible (optional fields, existing types in content array unchanged).

5. Example Usage

(Examples for mode: 'value' [static JSON] and mode: 'content' [flexible, potentially with dynamic JSON + hints in annotations] remain the same as presented earlier).

Benefits & Tradeoffs Summary:

  • Benefit - Solves Use Case 1 Cleanly: Provides symmetry and static typing for predictable JSON returns.
  • Benefit - Provides Flexible Path for Use Case 3: Uses existing types (ResourceContent) with optional hints (annotations) for dynamic outputs.
  • Benefit - Simplifies Definitions: Avoids the RFC's complex output.content block for Use Case 1 and acknowledges the dynamic nature of Use Case 3.
  • Benefit - Clear Separation of Concerns: Explicitly addresses Use Case 1 (static JSON) now and defers Use Case 2 (static constraints for others).
  • Benefit - Backward Compatible.
  • Tradeoff - Use Case 2 Deferred: Intentionally does not provide a mechanism yet for statically defining constraints (like allowed mimeTypes) for Image/Audio/ResourceContent when returned in content mode.

While I've proposed a specific shape (mode, value, valueSchema), I'm not strictly tied to this exact implementation.

My main goal is to encourage acknowledgement of the different use cases for tool outputs. It's crucial that we optimize for the prevalent scenario (Use Case 1) where MCP servers wrap pre-deterministic request/response APIs that inherently have structure, as seen in many existing servers (https://github.com/modelcontextprotocol/servers).

Providing a clear, symmetric way to define these structured (especially JSON) outputs is important, while also ensuring the chosen mechanism effectively communicates the expected output structure to the LLM for better understanding and utilization.

Addressing this primary use case first, while allowing flexibility for dynamic outputs and deferring static constraints for other content types, seems like a pragmatic path forward.

Looking forward to feedback on this perspective.

@sambhav (Member) commented May 5, 2025

Note: having a static output schema (similar to input schema) with descriptions at a tool/server level would also help with features like #469

@bhosmer-ant (Contributor)

Hi all - thoughts after reviewing the discussion so far:

  • first off, it's worth calling out the effort and thought that the community is putting into thinking through this functionality - it's really amazing. A huge thanks to everyone who's taking the time to participate in the discussion, and to @lukaswelinder for the original proposal.
  • it seems pretty clear that full support of the use cases being considered here (i.e., additions that can describe the full space of CallToolResult.content arrays) brings in a significant amount of complexity, along with many design considerations and subtleties that are disjoint from the use case that RFC: add Tool.outputSchema and CallToolResult.structuredContent #371 originally aimed to address: e.g. this discussion has ongoing threads concerning cardinality and sequencing of content blocks; whether to add a new DataContent type for structured data, or use EmbeddedResource (see also Update guidance for use of Embedded Resources in CallToolResult #415), among other things.
  • on the other hand, RFC: add Tool.outputSchema and CallToolResult.structuredContent #371's target use cases (structured result data that can be described completely with a single JSON schema) can be supported much more simply and disjointly. The entanglement of the two sets of use cases is not fundamental - it's due solely to the choice of using the CallToolResult.content array in both cases.

At this point I think it's clear that disentangling the two use cases makes sense, since it allows the support for the #371 use case to be truly lightweight, and lets it proceed quickly, without constraining the support of the use cases being discussed here or rushing its design.

To that end I've updated #371 to provide support for the structured data use case in a disjoint way - details over there. This decouples the two, and should also simplify the eligible designs here, by eliminating the need to support strictness (unless it's desirable for other reasons).
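For readers following along, #371's direction (per its title) pairs an outputSchema on the tool definition with a structuredContent field on the result. A rough sketch under that reading, not the authoritative spec text, using an invented weather tool:

// Tool definition: outputSchema mirrors the existing inputSchema.
const tool = {
  name: "get_weather",
  inputSchema: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
  outputSchema: {
    type: "object",
    properties: {
      temperature: { type: "number", description: "Degrees Celsius" },
    },
    required: ["temperature"],
  },
};

// Call result: the structured payload lives outside the content array,
// which is what keeps the two sets of use cases disjoint.
const result = {
  structuredContent: { temperature: 21.5 },
};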

@000-000-000-000-000
Copy link
Contributor

This PR is cool -- I was wondering, though: do we have any ideas of how users will actually pass the output schema to models? None of the model APIs I've seen take an output schema in their tool definitions... examples:

  1. https://platform.openai.com/docs/guides/function-calling?api-mode=responses#defining-functions
  2. https://docs.anthropic.com/en/api/messages#body-tools
  3. https://ai.google.dev/gemini-api/docs/function-calling?example=meeting

@patwhite
Copy link
Contributor

patwhite commented May 9, 2025

This PR is cool -- I was wondering, though: do we have any ideas of how users will actually pass the output schema to models? None of the model APIs I've seen take an output schema in their tool definitions... examples:

  1. https://platform.openai.com/docs/guides/function-calling?api-mode=responses#defining-functions
  2. https://docs.anthropic.com/en/api/messages#body-tools
  3. https://ai.google.dev/gemini-api/docs/function-calling?example=meeting

When the client gets a response annotated like this, you can hand the raw JSON and the output schema to the model to add context to the JSON, and you can optionally do client-level validation. But mostly it's for getting field-level metadata to the model.
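To make that concrete, here is a rough sketch of the client-side plumbing (the message shape and function name are illustrative, not any particular provider's API):

// Illustrative only: fold the tool result and its schema into the context
// so the model sees field-level metadata alongside the raw JSON.
function toolResultMessage(
  toolName: string,
  rawResult: unknown,
  outputSchema?: object,
): { role: "user"; content: string } {
  const parts = [`Result of ${toolName}:`, JSON.stringify(rawResult, null, 2)];
  if (outputSchema) {
    parts.push("The result conforms to this JSON Schema:");
    parts.push(JSON.stringify(outputSchema, null, 2));
  }
  return { role: "user", content: parts.join("\n") };
}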

@000-000-000-000-000
Copy link
Contributor

you can hand the raw JSON and the output schema to the model to add context to the JSON

True, but if you are using APIs with built-in function calling like the ones above, it might be odd to put the output schema directly in the prompt while the input schema and other elements go through function calling. I'm not sure (1) how this would look in the end when the model sees all the tokens, or (2) how accuracy would be affected.

@evalstate
Copy link
Member

you can hand the raw JSON and the output schema to the model to add context to the JSON

True, but if you are using APIs with built-in function calling like the ones above, it might be odd to put the output schema directly in the prompt while the input schema and other elements go through function calling. I'm not sure (1) how this would look in the end when the model sees all the tokens, or (2) how accuracy would be affected.

I can think of a couple of scenarios where this might make sense, but they are few and far between and would probably be better solved either through more direct prompting or programmatically. The schema doesn't have special meaning for the model any more than any other tokens do.

These outputs make the most sense when the Host application developer knows the shape of the content, allowing them to transform it. In that case the schema is useful for validation, and for sharing earlier with an identifier so the data can be coerced into the right shape (e.g. a pydantic model).
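In TypeScript terms (with Ajv standing in here for pydantic, as one common validator choice), that validate-then-coerce step on the Host side might look like this sketch:

import Ajv, { type AnySchema } from "ajv";

const ajv = new Ajv();

// Validate a tool result against the schema the server advertised, then
// hand a typed object to the rest of the application.
function coerceResult<T>(data: unknown, schema: AnySchema): T {
  const validate = ajv.compile<T>(schema);
  if (!validate(data)) {
    throw new Error(`Tool result failed validation: ${ajv.errorsText(validate.errors)}`);
  }
  return data; // the type guard has narrowed data to T
}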

@Ejb503
Copy link
Contributor

Ejb503 commented May 9, 2025

This PR is cool -- I was wondering, though: do we have any ideas of how users will actually pass the output schema to models? None of the model APIs I've seen take an output schema in their tool definitions... examples:

  1. https://platform.openai.com/docs/guides/function-calling?api-mode=responses#defining-functions
  2. https://docs.anthropic.com/en/api/messages#body-tools
  3. https://ai.google.dev/gemini-api/docs/function-calling?example=meeting

When the client gets a response annotated like this, you can hand the raw JSON and the output schema to the model to add context to the JSON, and you can optionally do client-level validation. But mostly it's for getting field-level metadata to the model.

The latest version of the Google APIs does include this (not released yet).

/** Structured representation of a function declaration as defined by the [OpenAPI 3.0 specification](https://spec.openapis.org/oas/v3.0.3). Included in this declaration are the function name, description, parameters and response type. This FunctionDeclaration is a representation of a block of code that can be used as a `Tool` by the model and executed by the client. */
export declare interface FunctionDeclaration {
    /** Optional. Description and purpose of the function. Model uses it to decide how and whether to call the function. */
    description?: string;
    /** Required. The name of the function to call. Must start with a letter or an underscore. Must be a-z, A-Z, 0-9, or contain underscores, dots and dashes, with a maximum length of 64. */
    name?: string;
    /** Optional. Describes the parameters to this function in JSON Schema Object format. Reflects the Open API 3.03 Parameter Object. string Key: the name of the parameter. Parameter names are case sensitive. Schema Value: the Schema defining the type used for the parameter. For function with no parameters, this can be left unset. Parameter names must start with a letter or an underscore and must only contain chars a-z, A-Z, 0-9, or underscores with a maximum length of 64. Example with 1 required and 1 optional parameter: type: OBJECT properties: param1: type: STRING param2: type: INTEGER required: - param1 */
    parameters?: Schema;
    /** Optional. Describes the output from this function in JSON Schema format. Reflects the Open API 3.03 Response Object. The Schema defines the type used for the response value of the function. */
    response?: Schema;
}

/** A function response. */
export declare class FunctionResponse {
    /** The id of the function call this response is for. Populated by the client
     to match the corresponding function call `id`. */
    id?: string;
    /** Required. The name of the function to call. Matches [FunctionDeclaration.name] and [FunctionCall.name]. */
    name?: string;
    /** Required. The function response in JSON object format. Use "output" key to specify function output and "error" key to specify error details (if any). If "output" and "error" keys are not specified, then whole "response" is treated as function output. */
    response?: Record<string, unknown>;
}

The latest version of the Google GenAI SDK does include output schemas embedded into function declarations. I expect all major providers to follow suit, and standardization is probably coming; MCP should definitely lead the charge here.
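If an MCP tool did carry the proposed outputSchema, mapping it onto the declaration above would be mechanical. A sketch, assuming the FunctionDeclaration and Schema types from the snippet above, plus a minimal assumed MCP tool shape (the outputSchema field is the one proposed in this thread, not yet part of the spec):

// Minimal assumed MCP tool shape, for illustration only.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
  outputSchema?: Record<string, unknown>; // proposed field, not yet in spec
}

function toFunctionDeclaration(tool: McpTool): FunctionDeclaration {
  return {
    name: tool.name,
    description: tool.description,
    parameters: tool.inputSchema as Schema,
    response: tool.outputSchema as Schema, // maps straight onto Google's new field
  };
}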

@evalstate
Copy link
Member

evalstate commented May 9, 2025

It's a really cool feature and actually looks very similar to Anthropic's "JSON Mode", which uses tool-calling coercion to achieve the same: https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview#json-mode.

I think this discussion was about CallToolResults - and normally we'd assume they are deterministic as they are coming from an MCP Server to the LLM.

I agree MCP should absolutely lead the charge - which is why I think we need to let people know that they can achieve a mime-type-safe, validated solution today with a couple of lines of code :) Being able to specify an outputSchema URI on a Tool definition makes it better.
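For anyone wanting the "available today" version of that, a sketch of those couple of lines on the host side: filter the result for JSON-typed embedded resources and parse them (validation against a known schema can then be layered on top, e.g. with the Ajv sketch above):

// Pull structured payloads out of a CallToolResult by MIME type.
function extractJsonPayloads(result: { content: Array<any> }): unknown[] {
  return result.content
    .filter((c) => c.type === "resource" && c.resource?.mimeType === "application/json")
    .map((c) => JSON.parse(c.resource.text));
}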

@patwhite
Copy link
Contributor

patwhite commented May 9, 2025

I can think of a couple of scenarios where this might make sense, but they are few and far between and would probably be better solved either through more direct prompting or programmatically. The schema doesn't have special meaning for the model any more than any other tokens do.

The schema itself doesn't, but the natural-language components of the schema do. This isn't a weird one-off use case; any JSON return that doesn't have highly readable property names will benefit from this. Hahaha, like basically every single one of our discussions, this comes back to enterprise use cases - if enterprises directly expose existing JSON returns (which can be horrific), this is a mechanism to provide a natural-language explanation of the output to a model. And yeah, with the Google work, it just makes more sense. If there's a need to describe the input schema in natural language to the model, there's a need to describe the output schema.
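A hypothetical illustration of that (field names invented): an opaque enterprise payload made legible to the model by schema descriptions.

// Raw return with cryptic property names...
const raw = { cust_nm: "Ada Lovelace", ord_amt_usd_c: 1299 };

// ...and the output schema that gives the model the natural-language context:
const outputSchema = {
  type: "object",
  properties: {
    cust_nm: { type: "string", description: "Customer's full legal name" },
    ord_amt_usd_c: {
      type: "integer",
      description: "Order total in US cents (divide by 100 for dollars)",
    },
  },
};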

I agree MCP should absolutely lead the charge - which is why I think we need to let people know that they can achieve a mime-type-safe, validated solution today with a couple of lines of code

I still haven't seen how this clicks together - protocols don't describe content, and MIME doesn't have a mechanism to describe arbitrarily shaped data. Those are the two mechanisms in a resource. You might be able to do something bespoke with an output schema URI, but that dance (if you see URI X, download the schema and hand it to the model) needs to get defined at the protocol level.

@evalstate
Copy link
Member

Yep - I don't think we're disagreeing. MCP is "different" because as a Host application you have the "predictable" MCP Server side, and then the "fuzzy" LLM side. Throwing a schema at an LLM is absolutely context, and it will make inferences given a schema... but it doesn't "understand" the schema in the same way a traditional program does :) An MCP Server author can return pretty much anything textual and hand it over to the LLM... and stuff happens!

Yep - there are a couple of small bits missing from the picture. The MIME type does tell you the shape of the data, though - if you get application/json or its schema type, then you know what to expect (non-JSON content in that case would be an error). I think you made the point earlier in this thread that the application developer needs to know the shape of the content to do something useful - that's the problem to solve. Anyway, I suspect this is probably better shifted to Discord where we can natter :)

@davemssavage
Copy link

Hi there, enthusiastic newbie here. I have been playing around with MCP, and this feature seems extremely useful for the use cases I'm considering, so I took a pass at implementing it in the Python SDK; see modelcontextprotocol/python-sdk#685

I'm experimenting with this patch to see if it does work as I expect, so far so good.

I note #356 (comment) refers to a set of related RFCs, so if some or all of this merge request can be directed towards that, let me know and I'll happily take a pass at it.

@bhosmer-ant
Copy link
Contributor

bhosmer-ant commented May 15, 2025

Thanks again @lukaswelinder for working on this proposal, and to everyone who participated in the discussion. However, the ROI relative to its complexity cost and new content type (see discussion here and upthread) makes it difficult to justify adding to the protocol at present, while sticking to the goal of keeping MCP as simple as possible but no simpler.

If sufficient real-world need arises that isn’t satisfied by either simple structured output (for function-call-like tools) or EmbeddedResources (for richer results that include structured content), we can revisit, but closing for now.

@davemssavage
Copy link

davemssavage commented May 16, 2025

@bhosmer-ant is there appetite for retaining the outputSchema part of this RFC as a separate new proposal? That's the part of this RFC that seems most useful to me. I get the rationale for not extending the number of content types; in fact, when implementing the patch this was the part that would have introduced a breaking change, so I think it's a good decision to just rely on primitive return types.

However, being able to check the output schema before calling a tool is useful for reasoning, beyond the input schema and the description, about what the tool is likely to return, so I expect this to lead to better tool selection by agents. I'm happy to maintain my Python branch along these lines so this hypothesis can be tested at scale, though at the moment I don't have the resources to run an exhaustive test of the theory, so it would probably require some community effort.

Perhaps if the outputSchema behaviour were opt-in (from the client) and optional for the tool provider, that would allow client code that wants to use the output schema to do so, without forcing tool vendors to support it or introducing breaking changes.

@bhosmer-ant
Copy link
Contributor

@bhosmer-ant is there appetite for retaining the outputSchema part of this RFC as a separate new proposal?

@davemssavage it sounds like #371 may be what you're looking for - it supports output schemas more or less as you're describing, but only in the context of results that are single JSON objects, rather than components of the content array.

@davemssavage
Copy link

davemssavage commented May 16, 2025

OK, great - yes, that's what I'm looking for. If helpful, I'm happy to rework my merge request for the Python SDK along these lines.

This brings to mind a question on process, as I've not been involved in this community before. I infer that an RFC being merged means it's an accepted part of the protocol. Is that a correct assumption? If it is, should there be some tasks to track documentation or SDK updates which I could associate the merge request with?
