RFC: add tool outputSchema and DataContent type to support structured content
#356
Conversation
I think this goes in the right direction; I can't help but feel that JSON Schema is the correct standard for the schema property. Data MUST be a valid JSON object. Proper implementation then allows runtime validation with common libraries and a standard implementation. Just need the APIs to support full JSON Schema next, but one battle at a time.
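To make the runtime-validation point concrete, here is a minimal sketch (not part of the proposal) using Ajv in TypeScript; the schema and field names are invented for illustration:

```typescript
import Ajv from "ajv";

const ajv = new Ajv();

// Hypothetical output schema a tool might declare ahead of time.
const outputSchema = {
  type: "object",
  properties: {
    temperature: { type: "number" },
    conditions: { type: "string" },
  },
  required: ["temperature", "conditions"],
};

// Compile once, then validate each structured tool result at runtime.
const validateResult = ajv.compile(outputSchema);

const result: unknown = { temperature: 18, conditions: "cloudy" };
if (!validateResult(result)) {
  // validateResult.errors lists each mismatch between data and schema.
  console.error(validateResult.errors);
}
```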
With this, we can help the LLM better understand the result. @lukaswelinder BTW, I think the MCP server should provide a JSON Schema query tool too.
Agree with @Ejb503 here but want to raise some additional considerations based on the current response structure:
Let's assume that it's not desirable to overhaul the current response structure. An off-the-cuff proposal for metadata could be:
NOTE: Omission of any content key could either imply an expected quantity of "none" or "multiple" (the default expectation today).
@briancripe I think that the design I've put forward largely addresses the points you made. With regards to my notes on the
In all cases, whatever the schema is, it just applies to the
Edit: I misread. Currently no. I don't think it's really something worth worrying about. Between this design and using
@He-Pin Can you elaborate on what you mean by providing a JSON Schema query tool? I feel like that isn't really in scope of MCP. I suppose that with the ability to use
That is out of the scope of this PR. I mean, an MCP server should provide a JSON Schema query tool out of the box, but anyway, the client will get all the schemas when listing tools with this change. Thanks for this.
PR looks great!
Should JsonContent be named DataContent instead? JsonContent seems too specific. This one is probably only a naming difference. As a reference: https://google.github.io/A2A/#/documentation?id=structured-output, https://github.com/google/A2A/blob/main/specification/json/a2a.json#L505
@siwachabhi I actually raised that point in #97, and I think you're right; renaming it to DataContent makes sense.
(Title changed: RFC: add tool outputSchema and JsonContent type to support structured content → RFC: add tool outputSchema and DataContent type to support structured content)
I proposed something very similar to this in this discussion, but I agree with some other comments in this thread:
This thread also brings up a question about which I find myself having mixed feelings: Should structured data responses like JSON and XML be treated as Text responses? Or should we use Text responses for natural language and add a brand new type of response for structured data?
@marianogonzalez I think there is some overlap there with the discussion in #223 regarding content types.
I'm not sure how I feel about this. I built out this implementation in a way that tried to minimize scope and impact, but I am inclined to agree with your first point. I think the alternative approach would be to extend the
The other aspects of my proposal would remain the same. That said, it feels a bit weird to be using JSON-RPC and having structured JSON encoded as a string in a text field. Potential performance/bandwidth impact aside, it means that clients that intend to use the content would need to deserialize it first.
As for your second point, I feel like that is kind of a separate problem that I'd rather not pull into the scope of this RFC. If I'm understanding correctly, you want to have a way to explicitly define the whole of the
@lukaswelinder hey, so here's my attempt to summarize the differences between this proposal and #371. TLDR: I think they've got two different categories of use case in mind, but AFAICT there are only a couple of spots where they actually clash, and I think we have a couple of options for serving both sets of goals in a single proposal.

The main priorities that drove the design in #371 were:

Strictness: a client should be able to examine the output schema a tool declares ahead of time, and the schema should be binding (non-overridable). This is important both for interacting with tools from untrusted servers (at tool selection and result validation time) and for composing tool calls in code.

Simplicity: for tools that essentially wrap function calls, schematizing the tool's "return type" should ideally be

In contrast, I read #356 as prioritizing expressiveness: being able to make use of schematized results in the full space of possible CallToolResult contents. So it's a matter of figuring out how to serve both use cases with one setup. I think the pain points for function-wrapper use cases with #356 in its current form are as follows:

Strictness - this is the only genuine conflict, I think. In #356 currently (IIUC)
Off the top of my head I don't see a way for
(Oh btw, I think
Simplicity - there are a couple of minor things here, but they don't seem difficult to resolve, one way or another.
Anyway, definitely interested in how you see the choice space here. (Apologies for the length of this, felt like it was worth backfilling the context though.)
@bhosmer-ant Thanks for the detailed writeup, you make some great points. I agree that there are merits to both approaches. I took the route of a more expressive solution to allow for greater flexibility, but with all abstractions come drawbacks. From what I gather, to bridge the gap to the design decisions you put forward with #371 we want to:
Yeah, my mistake. I'll correct that.
I sort of touched on this in my last paragraph here. I think that's sensible and aligned with the current
Enforcing a single-block response feels like an arbitrary restriction without much gain. Having the
Not sure I follow here. I'm assuming it's along the same lines as the first point (as in you could have
I would regard this as an advanced use case that most will not use, but there are a few examples where it would be useful:
I see your point about it being non-binding from the perspective of not explicitly knowing the whole results of a tool before receiving them. I think this is a separate concern though. Having
Not sure I follow exactly; do you mean have
I think I have an approach that may satisfy both.
Proposal
I believe that this approach maintains expressive flexibility while also supporting unstructured data as needed. Thoughts?
Sure, these are all well-motivated features in service of expressiveness. The issue is that "function call" tools need strictness: an output schema that describes the exact, complete structure of the tool call result (i.e. analogous to a function's return type). So, single-block enforcement isn't a goal per se, it's just the only (non-baroque) way to apply an outputSchema to an entire result, in a predictable way. Same for schema overrides in DataContent - not an issue in and of themselves, only that they provide an escape hatch that would let results violate the strictness requirement.

So yeah, strictness is in direct tension with expressiveness, and both are well-motivated, so we need to find a way to express both. My "outputSchema is strict, DataContent.schema is (disjoint and) expressive" idea was just the simplest/least intrusive way I could think of. But like you say, it has the drawback that the tool description now gives no information about DataContent in the expressive case. Other obvious options: two output schema properties, one for each case, along with the meta-constraint that they be mutually exclusive -

But before going further down the path of possible solutions, I want to make sure we're walking around on the same landscape - it makes sense why we'll need strict (rather than expressive) validation for function-call-like tool results, right?
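For concreteness, a rough sketch of the "two mutually exclusive schema properties" option mentioned above - the property names here are hypothetical, not from either proposal:

```typescript
// Hypothetical Tool shape carrying one of two mutually exclusive output
// schema properties (names invented for illustration).
interface Tool {
  name: string;
  description?: string;
  inputSchema: object;
  // Strict case: binding schema for the entire result, function-call style.
  outputSchema?: object;
  // Expressive case: describes DataContent blocks, overridable per block.
  contentSchema?: object;
  // Meta-constraint (not expressible in the type system alone):
  // at most one of outputSchema / contentSchema may be set.
}
```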
@bhosmer-ant Yeah, I get that. I was aiming for an approach that didn't operate in conflict with or put special-case limitations on the multipart content pattern, but I do agree that having a strict/explicit way to dictate expected output is valuable. I was kind of hoping not to have to get into the weeds of dictating the overall
I don't think either of these is a bad idea. I would probably be more in favor of just having a flag instead of potentially confusing mutually exclusive properties. I don't recall where it was brought up, but we may consider a broader approach to dictating the tool's output. Something like this:

```typescript
interface Tool {
/* other tool fields */
output: {
content: {
data: {
// Whether the tool returns a single or multiple `content` entries of this type
count?: 'single' | 'multiple';
description: string;
// Whether the tool's output must match the schema exactly, or if it can be overridden by `DataContent.schema`
strict: boolean;
schema: {
type: 'object';
properties: {
my_property: { type: 'string' };
};
};
/* other content types */
};
};
};
}
```

I'll need to think on this a bit.
I've been following the suggestions in some detail; perhaps I'm missing some nuance here, but the simplest solution to me seems to be (sketched below):
1/ Add outputSchema to Tool; this is optional and can be any of the current content types (text, image, audio etc). This keeps everything fundamentally the same and allows for discovery of schemas as well as runtime validation (and backwards compatibility)
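A minimal sketch of that suggestion, assuming a hypothetical optional field that mirrors inputSchema (these are not the spec's actual types):

```typescript
// Hypothetical: an optional outputSchema mirroring inputSchema, enabling
// schema discovery at tools/list time; old clients can simply ignore it.
interface Tool {
  name: string;
  description?: string;
  inputSchema: object;   // JSON Schema for the tool's arguments
  outputSchema?: object; // JSON Schema describing the expected result content
}
```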
@Ejb503 I'm not sure I follow what you mean. Having the
Or if you mean to only have a deep schema on the new data content type, that doesn't solve the issue of knowing output structure ahead of time.
Yeah - as mimeType is to audio data, schema is to JSON data. It would live inside the 'jsonData' object alongside a 'data' key (see the sketch below). Can come up with a PR if you're interested. Every content type essentially contains metadata that self-documents. And every CallToolResult has to return a content type anyway, right? So outputSchema in the tool (which would be optional) is just providing the expected existing content types along with metadata within the expected type.
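A hedged sketch of that analogy, with invented values and the hypothetical 'jsonData' type from the suggestion: just as mimeType rides alongside audio data, the schema would ride alongside the JSON data.

```typescript
// Existing pattern: AudioContent self-documents its payload via mimeType.
const audioBlock = {
  type: "audio",
  data: "<base64-encoded audio>",
  mimeType: "audio/wav",
};

// Suggested analogue: a JSON content block self-documents via an embedded schema.
const jsonBlock = {
  type: "jsonData", // hypothetical content type
  data: { temperature: 18, conditions: "cloudy" },
  schema: {
    type: "object",
    properties: {
      temperature: { type: "number", description: "Temperature in Celsius" },
      conditions: { type: "string" },
    },
  },
};
```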
I think some of the nuance is the difference between defining the structure of a CallToolResult or ReadResourceResult and defining the structure of payload contents. Both
Note that
A similar capability exists for CallToolResult, where both parts could be emitted using a known URI scheme (alongside any other content appropriate for the request). My understanding is this proposal is not aiming to add metadata to enforce this structure, but to define the content of JSON (or other) payloads before calling them. Annotating a tool with a set of Resource URIs that advertise the location of the schema for its results would be useful. I do wonder how much of this is already achievable using primitives already within the specification? It's also my understanding that this structure is typically useful for Application Developers to transform structured data prior to presentation to the LLM - if that's the case I think we should optimise the experience for that case. I would also like to clearly distinguish between defining the structure of Result Content and Payload Content, as they are related but different things.
Addressing JSON Handling in Tool Results
Leaving aside
Problem -> Using
The appropriate type to use for JSON is EmbeddedResource with mimeType application/json. I also personally think it's fine to use TextContent for JSON, as it is commonly used as a quick interchange format or presented directly to the LLM - but I would suggest using the appropriate EmbeddedResource type where guarantees of structure are required or expected. This would apply equally to
I think your proposal above differs from the content of the PR? To check my understanding, your proposal is:
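For illustration, a tool result content block following that suggestion could look like this sketch (using the spec's EmbeddedResource shape; the URI and payload are invented):

```typescript
// An EmbeddedResource content block carrying structured JSON.
const jsonResult = {
  type: "resource",
  resource: {
    uri: "my-app://reports/123",  // hypothetical URI
    mimeType: "application/json", // signals structured content
    text: JSON.stringify({ id: 123, status: "complete" }),
  },
};
```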
There is no TextResourceContent with mimeType available for a Tool result unless I'm mis-reading this: https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/schema/2025-03-26/schema.ts#L698

```typescript
export interface CallToolResult extends Result {
content: (TextContent | ImageContent | AudioContent | EmbeddedResource)[];
/**
* Whether the tool call ended in an error.
*
* If not set, this is assumed to be false (the call was successful).
*/
isError?: boolean;
}
```

I'm specifically and only referring to Tools. As was mentioned in the threads, I think the suggestions/discussions are starting to encapsulate too many concerns. You 'could' create a resource as a side effect of a tool execution and then return it through the spec, but that is convoluted in my opinion and certainly not how I implement tool execution flows. And this would also make the result of the CallTool redundant, which seems strange.
Correct, although it doesn't have to constrain; it can simply describe. It is implied by the spec already; we just make it explicit, and this serves as discovery.
I think it's part of my earlier question:
I'm missing the "and then what?" bit here. Is the aim that the Client SDK throws an exception if it gets content that mismatches the schema? Or is the main goal here to open up an LLM-context-bypassing side channel between Client and Server?
The former. But after our discussion on #415, I think I've come around to the feeling that EmbeddedResource isn't the right concept for simple, ephemeral structured results that require no additional descriptive information. They're not really Resources in the sense that MCP defines them, regardless of whether we make it clear that in some contexts persistence is not required. I think
Here's a Client/Server pair based on the MCP specification and SDK that uses TextResourceContents for schema specification and Tool Call result validation before coercion to a Pydantic model ready for presentation to an LLM:

Server:

```python
# WEATHER_TEMPLATE and WEATHER_SCHEMA (plus any remaining setup) are in the full gist linked below.
from string import Template

from mcp.server.fastmcp import FastMCP
from mcp.types import EmbeddedResource

app = FastMCP(name="structured weather server")

@app.tool(name="check_weather", description="Gets the weather as JSON.")
def check_weather(location: str) -> list[EmbeddedResource]:
    result: str = Template(WEATHER_TEMPLATE).substitute(location=location)
    return [EmbeddedResource(
        type="resource",
        resource={"mimeType": "application/json",
                  "uri": f"my-mcp://check_weather/{location}",
                  "text": result}
    )]

@app.resource("my-mcp://schema/tools/check_weather/schema.json",
              mime_type="application/schema+json")
def get_schema() -> str:
    return WEATHER_SCHEMA

if __name__ == "__main__":
    app.run(transport="stdio")
```

Client:

```python
# json, jsonschema.validate, ClientSession, and the WeatherResult Pydantic model come from the full gist.
async with ClientSession(read_stream, write_stream) as session:
    # Initialize the session
    await session.initialize()

    schema_resource: ReadResourceResult = await session.read_resource("my-mcp://schema/tools/check_weather/schema.json")
    schema = json.loads(schema_resource.contents[0].text)

    ### LLM (Assistant) has stopped for tool_use
    tool_result: CallToolResult = await session.call_tool("check_weather", {"location": "London"})
    payload = tool_result.content[0].resource.text
    validate(instance=json.loads(payload), schema=schema)
    structured = WeatherResult.model_validate_json(payload)

    ### Send new User message to LLM (Tool Result)
    print(f"Weather in {structured.location}: {structured.conditions}")
```

Full code with schema template here: https://gist.github.com/evalstate/e49cb163297c1ab940fb8a98e31947ed

Could the Resource usage be better? Certainly - it would be improved if I maintained a template resource that responded with the current weather; the URI scheme should probably include a time component. But this is 100% in spec, today. I think we need to concentrate on developer experience between the "tool_use" stop and the subsequent presentation of content to the LLM. If I control the Host/Client/Server triple, I already have/control everything I need, although there might be some "nice to haves" like having the schema resource reference attached to a Tool Annotation. But in that case, I don't think there's anything necessary in this PR for me. What's interesting is if I am a Host/Client developer where people can add unknown MCP Servers - which I think is motivating the PR... In that situation, the things I really care about are:
The structured-ness will come from the mimeType (e.g. application/json is by definition structured). The payload schema I can derive either from knowing the URI scheme in advance, or [some other mechanism described in this PR]. My opinion is that having all the Client SDKs do automatic validation against schemas will add a large burden to interoperability testing, as my experience is that validation libraries can be sensitive to small variances. I think it would be helpful to base this discussion on how e.g. this code would be enhanced by the proposed changes to make sure we get the ergonomics right. (update) I do also think we need to encourage the use of these MCP features rather than make people fearful of them - I genuinely think people are avoiding them because of a lack of playbook examples like the one above.
Great example here, this definitely helps me see how you're approaching this.
This, to me, is sort of the crux of this issue, as well as the long long discussion we've been having around the tools/search capability. If you control the triple, you don't need MCP - you can have the LLM format tool calls in a way your app can consume, MCP is completely superfluous. You can expose multi-turn search protocols because, well, you just hard code the logic in.
I could be wrong here, but all three schemas you mention here offer no insight into the content or how to interpret it; these are all just protocols. They might tell you how to get the data, but a git repo can have anything checked into it.
Tokenizable by the LLM is important, but it's not the whole story when it comes to data returns. WHAT the fields in the structured data mean is what's important. So, I don't think the assertion that URI and mimeType are all you care about as the client/host is quite accurate.

When I think about MCP, I think of it as a protocol for backend developers to expose robust capabilities to hosts/clients when they have ZERO control over them, and when the clients/hosts in turn have zero a priori knowledge of the server (besides what they can consume in a reasonable time and with a reasonable context limit upon connection). So in your example, if you control the entire stack, you wouldn't pull a schema from a resource; you'd embed it in the host/client or just have it at a URL you can call.

I didn't work on this PR at all, but as I internalized this proposal, this is fundamentally a mechanism to explain structured data to the LLM in a situation where you have zero control over the host/client. It has an added benefit of being great for server developers to type- and structure-check the output as it leaves the server, plus potential client validation, but the much more important part is the context being added to the data for the benefit of the model. The context added by the URI or the mimeType field just isn't super helpful when it's arbitrary JSON data that a server is returning as part of a tool call.

To put it another way: if you don't control the host/client, there is NO mechanism in the protocol to tell the client to do what you're proposing above (grab the schema from resources and hand the schema plus tool return to the model). This is an exact echo of the multi-turn search idea that got floated in the other thread - if you don't control the full stack, the spec as it exists today has no mechanism to explain to the client or host that it needs to chain two calls together to achieve a goal. There are a bunch of ways you COULD do it - you can hope that the client/host understands it should re-pull tools if a tool notification comes right after a tool call - but none where you have guarantees as the backend that the client/host will actually do what you are hoping.

So, your comment above focuses on the validation aspect, but (in my opinion) much more important are the descriptions being provided to the model to explain the return. So, I don't think this can be condensed to an ergonomics discussion - if we just take the server side of your code and expose it to 10 different client/host/model combinations in a world where you don't control the full stack, you're going to get 10 different treatments of the data, where I'd say 9 of the 10 wouldn't pull the resource to augment the return data when handed to the model, and there is NOTHING we could do to change that. There are no hints, the tool annotations don't cover this type of thing, etc., and that's where I see this proposal fitting in.

BTW - your aside about how you could use a resource template to achieve the same result here is SUPER interesting!! I never would have thought of doing that. So, I actually coded up a server to try this out, and exposed it to Claude and... Claude doesn't seem to crawl resource templates 🤦‍♂️.
Haha, I don't know whose point that really proves. You're right that there's hit-or-miss adoption of these various components, but then I'm right in that, as a backend dev, the only thing I can pretty much guarantee a host/client will implement these days is simple tool calls. So it feels like we should really be blowing out the tool capabilities with features like this proposal and tools/search, since that's the main feature area the client/host devs seem to be implementing. Really fun little experiment, thanks for the idea there!
MCP aims to solve (or at least improve) the MxN problem. If you have dozens of unknown Servers and dozens of unknown Clients, the best way to coordinate structure is with a shared URI scheme - hence the reason why Resources are designed and documented as they are. This gives the flexibility for general clients to do generally useful things, and more specific clients to use structured data as they see fit to add value. The tokenization aspect is important, as that is the entire reason for using MCP. OpenAPI etc. have better answers to these problems if your target isn't specifically an LLM. Changes to the Protocol that force SDK implementors and Host/Client developers to do things based on individual use cases should generally be avoided, as MCP is intended to be used in incredibly diverse scenarios.
I haven't had time to put towards this since my last update to the PR. I'll be reviewing the discussion and following up with my thoughts over the weekend.
@sambhav here 👋 I have been very interested in this subject since the inception of MCP and it is great to see so much enthusiasm on this topic.
Hi all,
This RFC (#356) is a valuable step towards standardizing structured data handling in MCP tool results. Reflecting on the proposal and discussion, it appears tool outputs generally fall into three distinct primary use cases:
Critique of the RFC's
Note: having a static output schema (similar to input schema) with descriptions at a tool/server level would also help with features like #469
Hi all - thoughts after reviewing the discussion so far:
At this point I think it's clear that disentangling the two use cases makes sense, since it allows the support for the #371 use case to be truly lightweight, and lets it proceed quickly, without constraining the support of the use cases being discussed here or rushing its design. To that end I've updated #371 to provide support for the structured data use case in a disjoint way - details over there. This decouples the two, and should also simplify the eligible designs here by eliminating the need to support strictness (unless it's desirable for other reasons).
This PR is cool -- I was wondering, though: do we have any ideas of how users will actually pass the output schema to models? All of the model APIs I've seen do not take an output schema in tools... examples:
When the client gets a response that's been annotated like this, you can just hand the raw JSON and the output schema to the model to add additional context to the JSON, and you can optionally do client-level validation. But mostly it's for getting field-level metadata to the model.
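As a hedged sketch of what "handing both to the model" might look like on the client side (the function and message format are illustrative, not from any SDK):

```typescript
// Illustrative only: fold the structured result and its declared schema
// into one plain-text context block for the model.
function renderToolResult(data: unknown, outputSchema: object): string {
  return [
    "Tool result (JSON):",
    JSON.stringify(data, null, 2),
    "Schema describing the fields above:",
    JSON.stringify(outputSchema, null, 2),
  ].join("\n");
}
```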
True, but if you are using APIs with built-in function calling like the ones above, it might be odd to put the output schema directly in the prompt while the input schema and other elements go in via function calling. Not sure 1/ how this would look in the end when the model sees all the tokens, 2/ how accuracy would be affected.
I can think of a couple of scenarios where this might make sense, but they are few and far between and would probably be better solved either through more direct prompting, or programmatically. The schema doesn't have special meaning for the model any more than any other tokens. These outputs make the most sense when the Host application developer knows the shape of the content, allowing them to transform it. In which case the schema is useful to validate, and to share earlier with an identifier so the data can be coerced into the right shape (f.x. a Pydantic model).
The latest version of the Google APIs does include this (not released yet).

```typescript
/** Structured representation of a function declaration as defined by the [OpenAPI 3.0 specification](https://spec.openapis.org/oas/v3.0.3). Included in this declaration are the function name, description, parameters and response type. This FunctionDeclaration is a representation of a block of code that can be used as a `Tool` by the model and executed by the client. */
export declare interface FunctionDeclaration {
/** Optional. Description and purpose of the function. Model uses it to decide how and whether to call the function. */
description?: string;
/** Required. The name of the function to call. Must start with a letter or an underscore. Must be a-z, A-Z, 0-9, or contain underscores, dots and dashes, with a maximum length of 64. */
name?: string;
/** Optional. Describes the parameters to this function in JSON Schema Object format. Reflects the Open API 3.03 Parameter Object. string Key: the name of the parameter. Parameter names are case sensitive. Schema Value: the Schema defining the type used for the parameter. For function with no parameters, this can be left unset. Parameter names must start with a letter or an underscore and must only contain chars a-z, A-Z, 0-9, or underscores with a maximum length of 64. Example with 1 required and 1 optional parameter: type: OBJECT properties: param1: type: STRING param2: type: INTEGER required: - param1 */
parameters?: Schema;
/** Optional. Describes the output from this function in JSON Schema format. Reflects the Open API 3.03 Response Object. The Schema defines the type used for the response value of the function. */
response?: Schema;
}
/** A function response. */
export declare class FunctionResponse {
/** The id of the function call this response is for. Populated by the client
to match the corresponding function call `id`. */
id?: string;
/** Required. The name of the function to call. Matches [FunctionDeclaration.name] and [FunctionCall.name]. */
name?: string;
/** Required. The function response in JSON object format. Use "output" key to specify function output and "error" key to specify error details (if any). If "output" and "error" keys are not specified, then whole "response" is treated as function output. */
response?: Record<string, unknown>;
}
```

The latest version of the Google GenAI SDK does include output schemas embedded into function declarations. I expect all major providers to follow suit, and standardization is probable; MCP should definitely lead the charge here.
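A hedged usage sketch of the declaration quoted above, with invented values (real code would use the SDK's Type enum rather than raw strings, hence the casts):

```typescript
// Hypothetical weather function whose response schema travels alongside
// its parameter schema in the same declaration.
const checkWeather: FunctionDeclaration = {
  name: "check_weather",
  description: "Gets the current weather for a location.",
  parameters: {
    type: "OBJECT",
    properties: { location: { type: "STRING" } },
    required: ["location"],
  } as unknown as Schema,
  response: {
    type: "OBJECT",
    properties: {
      temperature: { type: "NUMBER" },
      conditions: { type: "STRING" },
    },
  } as unknown as Schema,
};
```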
It's a really cool feature and actually looks very similar to Anthropic's "JSON Mode", which uses Tool Calling coercion to achieve the same: https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview#json-mode. I think this discussion was about CallToolResults - and normally we'd assume they are deterministic as they are coming from an MCP Server to the LLM. I agree MCP should absolutely lead the charge - which is why I think we need to let people know that they can achieve a mime-type-safe, validated solution today with a couple of lines of code :) Being able to specify an outputSchema URI on a Tool definition makes it better.
The schema itself doesn't; the natural language components of the schema do. This isn't a weird one-off use case: any JSON return that doesn't have highly readable property names will benefit from this. Hahaha, like basically every single one of our discussions, this comes back to enterprise use cases - if enterprises directly expose existing JSON returns (which can be horrific), this is a mechanism to provide a natural language explanation of the output to a model. And yeah, with the Google work, it just makes more sense. If there's a need to describe the input schema in natural language to the model, there's a need to describe the output schema.
I still haven't seen how this clicks together - protocols don't describe content, and MIME doesn't have a mechanism to describe arbitrarily shaped data. Those are the two mechanisms in a resource. You might be able to do something bespoke with an output schema URI, but that dance (if you see URI X, download the schema and hand it to the model) needs to get defined at the protocol level.
Yep - don't think we're disagreeing. MCP is "different" because as a Host application you have the "predictable" MCP Server side, and then the "fuzzy" LLM side. Throwing a schema at an LLM is absolutely context, and it will make inferences given a schema... but it doesn't "understand" the schema in the same way a traditional program does :) An MCP Server author can return pretty much anything textual and hand it over to the LLM... and stuff happens! Yep - there's a couple of small bits missing from the picture. The mime type does tell you the shape of the data though - if you get
Hi there, enthusiastic newbie here. I have been playing around with MCP and this feature seems extremely useful for the use cases I'm considering, hence I took a pass at implementing it on the Python SDK; see modelcontextprotocol/python-sdk#685. I'm experimenting with this patch to see if it does work as I expect; so far so good. I note #356 (comment) refers to a set of related RFCs, so if some/part of this merge request can be directed towards that, let me know and I'll happily take a pass at it.
Thanks again @lukaswelinder for working on this proposal, and to everyone who participated in the discussion. However, the ROI of its complexity cost + new content type (see discussion here and upthread) makes it difficult to justify adding to the protocol at present, while sticking to the goal of keeping MCP as simple as possible but no simpler. If sufficient real-world need arises that isn't satisfied by either simple structured output (for function-call-like tools) or EmbeddedResources (for richer results that include structured content), we can revisit, but closing for now.
@bhosmer-ant is there appetite for retaining the outputSchema part of this RFC as a separate new proposal? That's the particular part of this RFC that seems most useful to me. I get the rationalisation for not extending the number of content types, and in fact when implementing the patch this was the part that would have introduced a breaking change, so I think it's a good decision to just rely on primitive return types. However, being able to check the output schema before calling the tools is useful to rationalise, beyond the input schema and the description, what the tool is likely to return, so I expect this to lead to better tool selection by agents. I'm happy to maintain my Python branch along these lines so this hypothesis can be tested at scale. Though at the moment I don't have the resources to run an exhaustive test of this theory, so it would probably require some community effort to test this hypothesis. Perhaps if the outputSchema behaviour was opt-in (from the client) and optional for the tool provider, that would allow client code that wanted to use the output schema to make a decision to do so, and also not force tool vendors to support it by introducing breaking changes.
@davemssavage it sounds like #371 may be what you're looking for - it supports output schemas more or less as you're describing, but only in the context of results that are single JSON objects, rather than components of the content array.
Ok great, yes, that's what I'm looking for. If helpful, I'm happy to rework my merge request for the Python SDK along these lines. This brings a question on process to my mind, as I've not been involved in this community before: I infer that an RFC being merged means it's an accepted part of the protocol. Is that a correct assumption? If it is, should there be some tasks to track documentation updates or SDK updates which I could associate the merge request with?
Motivation and Context
Only allowing for unstructured text content in tool call results puts limits on the extensibility of MCP. Adding support for structured output allows us to provide better context to the LLMs calling the tools, and allows for extensibility of MCP tools beyond just LLM usage.
See #97 for discussion.
Changes Made
Adds an `outputSchema` object on the tool definition.
Adds a `DataContent` type:

```json
{
  "description": "A JSON encodable object to or from an LLM.",
  "properties": {
    "data": { "description": "The result object.", "type": "object" },
    "schema": { "description": "The schema $ref or a schema object.", "type": ["string", "object"] }
  },
  "required": ["data"]
}
```

This allows for pretty flexible usage:
- If `schema` is not defined, the `outputSchema` is implied
- If `schema` is a string, it is interpreted as a URI for the schema (either a ref in `outputSchema` or a fully qualified URL)
- If `schema` is an object, that object is the schema (for supporting dynamic content schemas)
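To illustrate, here is a hedged sketch of those three modes as DataContent blocks; the `type` discriminator value and all field contents are illustrative assumptions:

```typescript
// 1. No `schema`: the tool's declared outputSchema is implied.
const implied = {
  type: "data", // assumed discriminator value
  data: { temperature: 18, conditions: "cloudy" },
};

// 2. `schema` as a string: a $ref into outputSchema or a fully qualified URL.
const byReference = {
  type: "data",
  data: { temperature: 18, conditions: "cloudy" },
  schema: "https://example.com/schemas/weather.json", // hypothetical URL
};

// 3. `schema` as an object: an inline schema for dynamic content.
const inlineSchema = {
  type: "data",
  data: { temperature: 18, conditions: "cloudy" },
  schema: {
    type: "object",
    properties: {
      temperature: { type: "number" },
      conditions: { type: "string" },
    },
  },
};
```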
How Has This Been Tested?
We have been using this approach with an internally built MCP server and agent workflow. We have only tested it for listing tools and tool calls. We have not tested sampling or prompt messages.
Breaking Changes
The changes here shouldn't break existing implementations.
Types of changes
Checklist
Additional context