SEP-1865: MCP Apps - Interactive User Interfaces for MCP #1865
Added documentation for optional extensions to the Model Context Protocol.
This is exciting to see! I see that, as with the current MCP-UI and Apps SDK specs, this covers allowing UI to request tool calls and rerender in a way that keeps the agent in the loop. But does this proposal intentionally not address mechanisms to close the loop in the other direction, to flow related tool call data from subsequent conversational turns back into an already rendered widget? Or is the ability to read (and subscribe to?) resources directly from within a widget intended for this purpose? Suppose I have a getItemDetails tool that renders a Book widget, and then in a subsequent turn a user utterance triggers a setItemStatus tool which mutates a status field. How should the change be communicated to the widget so it can rerender?
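One host-side pattern that could serve this "close the loop" direction (a sketch, not something the SEP specifies): the host keeps a registry of live widgets keyed by their UI resource URI, and forwards any later tool result that carries a matching key in its metadata. The `ui/resourceUri` meta key, the `WidgetRegistry` class, and the result shape below are all hypothetical.

```typescript
// Hypothetical shapes -- not part of the SEP; illustrates one way a host
// could route later tool results back to an already-rendered widget.
interface ToolResult {
  content: string;
  _meta?: { "ui/resourceUri"?: string }; // hypothetical meta key
}

type WidgetUpdateHandler = (result: ToolResult) => void;

class WidgetRegistry {
  private widgets = new Map<string, WidgetUpdateHandler>();

  register(resourceUri: string, onUpdate: WidgetUpdateHandler): void {
    this.widgets.set(resourceUri, onUpdate);
  }

  // Called by the host after every tool call; if the result carries the
  // same resource URI as a live widget, forward it so the widget rerenders.
  route(result: ToolResult): boolean {
    const uri = result._meta?.["ui/resourceUri"];
    const handler = uri ? this.widgets.get(uri) : undefined;
    if (!handler) return false;
    handler(result);
    return true;
  }
}
```

In the getItemDetails/setItemStatus scenario, the Book widget would register under its resource URI at render time, and the later setItemStatus result (tagged with the same URI) would be forwarded to it instead of, or in addition to, the model.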
That’s not an MCP limitation, is it? (Roll your own MCP and chat interface and there’s no issue rendering video or HTML or whatever.) That is a limitation of chat-bot UIs. You don’t NEED to give the model back text data over MCP - a tool can be triggered and vend auth’d data of any kind to the interface. What am I missing? Why concretize a general communication/auth protocol spec within one specific use case?
Fantastic work on this PR, really sharp update. Introducing the resource declarations and a bi-directional UI communication model feels like a big step toward unlocking richer and more interactive MCP clients. One question: how do you envision capability negotiation evolving for UI-enabled resources once multiple client types adopt this pattern? I'm curious whether you see a standardized handshake emerging or if it stays client-specific for now.
In "base" MCP there is no way to distinguish between what is for the model and what is for the application. The OpenAI Apps SDK worked around this by putting the UI in structured output and the for-model result in regular unstructured/text tool output. But that's actually not standards-compliant (the structured and unstructured output are supposed to be the same as of the current spec), and it prevents the use of structured output for other things. By using metadata it becomes much more explicit what is the tool result and what is the tool app/UI/visualization component.

It is important to distinguish between context for the model and context for the application. You don't want to send context for the application to the model, as the model doesn't have direct access to the UI (at least not without calling another tool, and that'd be a very roundabout way to accomplish the same thing).

Also, this is an extension, so it is purely additive. It's a great way to let something that is bound to evolve and need continuous adjustments not get bogged down by only being allowed to change with spec versions.

You're right that if you're rolling your own MCP server and client host, then you can already do this using whatever scheme you want - but the beauty of a standardized extension is that we have less risk of ending up with a unique UI/Apps contract per client host. Ideally, as a server author, you'd want your MCP server to be able to render UI on all chat platforms without having to use a different communication convention for each of them. And simply returning the UI resource to the model is not a solution. The model is not an active participant in rendering the UI - that's an application-level concern.

Finally, the whole in-frame messaging part of this extension is non-trivial to design and engineer, so having a standardized way to do that is highly valuable. See: https://github.com/modelcontextprotocol/ext-apps/blob/main/specification/draft/apps.mdx#transport-layer
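To make the model/application split concrete, here is a hedged sketch of what a metadata-carrying tool result could look like. The `mcp/ui` meta key and field names are illustrative stand-ins, not identifiers taken from the spec.

```typescript
// Illustrative only: the _meta key and shape are hypothetical, showing the
// split described above between model-facing and application-facing data.
const toolResult = {
  // Unstructured text content: this is what the model sees.
  content: [{ type: "text", text: "Found 3 books matching 'dune'." }],
  // Metadata: consumed by the host application, never sent to the model.
  _meta: {
    "mcp/ui": { resourceUri: "ui://books/search-results" },
  },
};

// The host strips _meta before handing the result to the model.
function contextForModel(result: { content: { type: string; text: string }[] }): string {
  return result.content.map((c) => c.text).join("\n");
}
```

The point of the split is visible in `contextForModel`: the UI pointer never enters the model's context window, while the host is free to act on it.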
Many questions:

1) Why bring this application implementation into the protocol itself?

This, along with OpenAI's implementation of the Apps SDK, is not bounded by the MCP specification. @PederHP makes a good point:
Then I would say this should be rejected as a SEP (since it's not a part of the specification) and instead documented as a "best practice" or "official extension". The docs site already has precedent for this in the roadmap: https://modelcontextprotocol.io/development/roadmap#official-extensions
I missed this https://github.com/modelcontextprotocol/ext-apps 🤦🏼 - this getting a SEP in

2) Spec bloat

One of my main worries with bringing this into the main spec is that we continue to overcomplicate an already bloated and bifurcated specification. I know of no other MCP servers that implement extensions, and encouraging official support for this will make server implementors' lives harder (vs. encouraging this as a best practice). I would point to the following discussions on how fractured the community is and the difficulty server builders/maintainers have had over the last year:
3) JSON RPC
Regarding
To me this seems like something that SHOULD require user approval. Giving the server the ability to inject arbitrary messages into the conversation without user approval seems like a problematic pattern. I know there is an increased reliance on trusting the server, but there may also be non-malicious cases where the user simply does not want the injected message for whatever reason.
@adamesque Great use case! Currently, the SEP doesn't explicitly address that flow, but it does support patterns that enable it. For example -
It's likely a common use case that requires clarification and guidance. However, I'm not sure that the MVP should enforce specific behavior at this point. We can definitely discuss it.
@adriannoes Host<>UI capability negotiation is implemented in the SDK and mentioned in the spec as part of the

We'd love to hear your feedback! I'll note that we need to review the internal structure and ensure it includes the fields we need for the MVP.
@PederHP It came up, and it's definitely worth further discussion. I think it warrants a thread in #ui-cwg. |
@idosal I noticed in the proposal the use of the media type

Media type suffixes (+json, etc.) are intended to communicate an underlying format that the new media type is based on, e.g. image/svg+xml indicates that content can be processed as XML. If in the future there was a decision to try and register this media type, it is highly unlikely that the registration would be allowed. I say this as one of IANA's media type reviewers.

However, I think I have a proposal that would address your needs and only slightly bend the rules. This specification https://datatracker.ietf.org/doc/html/rfc6906#section-3.1 (which is just Informational, so doesn't carry any IETF approval) proposes the use of the profile parameter.

My suggestion is to use this:
Media type parameters are a commonly used construct. text/html technically only allows the charset parameter, but with some creative license, adding the profile parameter is not likely to cause any problems. From RFC 6906:
This is ideal because it allows the content to still be treated like text/html, but you have a clear indicator that the HTML is intended to be processed using MCP semantics.
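For illustration, a minimal (deliberately non-RFC-complete) parser showing how a host could honor this scheme: treat the content as text/html, and additionally apply MCP Apps semantics when the profile parameter is present. This is a sketch, not code from the SEP or any SDK.

```typescript
// Naive media-type parser: splits off parameters, lowercases names,
// and strips optional quotes from values. Enough to detect the
// text/html;profile=... form suggested above.
function parseMediaType(value: string): { type: string; params: Record<string, string> } {
  const [type, ...rest] = value.split(";").map((s) => s.trim());
  const params: Record<string, string> = {};
  for (const part of rest) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    // Parameter names are case-insensitive; values may be quoted.
    params[part.slice(0, eq).trim().toLowerCase()] =
      part.slice(eq + 1).trim().replace(/^"|"$/g, "");
  }
  return { type: type.toLowerCase(), params };
}

const mt = parseMediaType('text/html; profile="mcp-app"');
// mt.type is "text/html", so render as ordinary HTML;
// mt.params.profile being present signals MCP Apps processing.
```

A real implementation would use a proper media-type parsing library, but the two-step decision (base type first, profile second) is the point being illustrated.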
Should the profile maybe be
Agent-Driven UI Navigation Question: Would it make sense to document "UI control" patterns alongside MCP Apps? Many complex applications (network visualization, CAD tools, enterprise dashboards, ...) might benefit from agent-guided navigation beyond embedded widgets. I believe the two patterns are complementary. Background:
Potential synergy:
This combines MCP Apps' inline interactivity with full application depth. Widgets act as gateways to rich, stateful exploration.
@glen-84 I don't have much visibility into what the range of values could be for the profile. I think mcp-app is fine too. |
Thanks for the thoughtful feedback, @darrelmiller! Really appreciate the insight, especially given your experience with IANA. I agree that

The

Does that tradeoff make sense given the use case, or do you think the profile approach is still worth it?
It would be very useful to have a way to push context to the host application without necessarily triggering a user message. Something like

Consider an MCP App that tracks some activity, like a build/release dashboard. We want the app to be able to push user interactions (like when the user triggers a new build or deployment via the UI) or state changes (a deployment changed from in-progress to done). Doing this with

By adding a sort of context buffer for the server/UI to push to, we leave it up to the host how to handle these concerns, which I think is a good pattern, and allows this extension to be useful in a variety of contexts, from autonomous agents to conversational AI on a variety of device form factors. I'll open a thread on Discord for this as well.
@antonpk1 I am contractually bound to say you should not use a media type that is not registered and is structurally invalid. :-) However, I do understand your concern over adding unnecessary complexity. There are two mitigations: One is that there are lots of parsing libraries that make some of that normalization go away, especially now that Structured Fields are a thing: https://www.rfc-editor.org/rfc/rfc9651.html. The other is that you are free to mandate that people use the exact string

You do what you think is right for your community, but I will say that from experience, complying with existing standards does generally provide long-term benefits, especially when the cost to do so is low.
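The "mandate the exact string" mitigation really can be this small. The canonical constant below is only a stand-in for whichever serialization the spec would actually fix; the point of the sketch is the tradeoff: no parsing, but semantically equivalent spellings are rejected.

```typescript
// Sketch of the exact-string mitigation: no media-type parsing at all.
// CANONICAL is a placeholder for whatever serialization the spec mandates.
const CANONICAL = "text/html;profile=mcp-app";

function isMcpAppStrict(mimeType: string): boolean {
  return mimeType === CANONICAL;
}

// Spellings that are equivalent under media-type rules fail on purpose:
isMcpAppStrict("text/html;profile=mcp-app");    // accepted
isMcpAppStrict('text/html; profile="mcp-app"'); // rejected despite being equivalent
```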
@idosal I think it's worth a discussion — in my mind, without explicitly addressing this, we're not able to "close the agentic loop" for UI, where agents that render UI can not only see the information presented but collaborate on / assist with it. We've seen the need for more formal patterns around this at Indeed.
Agree that more structure would be helpful b/c as currently written I don't believe this spec is clear enough around subsequent host-initiated update mechanisms — if it's permissible for the host to supply tool-result updates not requested by the UI during the interactive phase, it would be good to include it there. It's possible some sort of widget or resource key should be returned in tool result meta if tool call data is intended to be merged into an existing widget.
I think unless the intent is to poll (or subscribe), the spec doesn't describe the Host -> Guest events that should trigger these updates to ensure the UI stays in sync with model data as conversation-initiated tool calls occur. Generally I would prefer other mechanisms than a "please refetch" message. One other piece that has come up in internal discussions at Indeed is that this spec doesn't provide a mechanism similar to Apps SDK widget state. Without this, it's unclear how a guest UI can communicate:
The first could probably be achieved via a tool call and might only need a recommendation, but the second seems fairly important, since the spec does provide for interactive-phase UI-initiated tool calls that can result in UI updates. Currently a widget would have to wait for both initialization and, at a minimum, ui/tool-input, and then make a request to its own backend to get a last-saved state snapshot (if one exists). Specifying such a backend is well outside the scope of a spec like this, but it feels like some part should discuss the reload flow. Otherwise I think it's likely unsophisticated implementors will build out-of-the-box broken experiences. Finally, the spec includes displayMode in the HostContext interface but doesn't define a guest -> host message to request a different displayMode. Is that an intentional omission? Thanks!
I think it'd be valuable to merge this to maintain the context gathered here. It's also the PR everyone links to, so marking it as "Closed" might cause confusion. Regarding the change itself, the idea was to add MCP Apps to the website as described in SEP-1724, but I can easily change it to whatever makes sense now. |
MCP servers provide gated access to a potentially encapsulated dataset. The encapsulation here means that the user might not have any obvious other ways to interact with it. When a human is using an AI assistant, it can be useful to provide a way to visualize or interact with that data. This UI is not linked to any single tool call but is associated with the MCP server as a whole and the dataset it represents.

As an example, I have a file server that exposes all the usual file-handling tools to an MCP agent. These tools work as normal, and the LLM can interact with them as expected. I want to provide the user with an interactive file browser that reuses a combination of the existing MCP tools in a more human-friendly experience.

I suggest we consider categorizing a UI resource as a standalone entry point, an application on its own, which is not tied to a particular tool. For example:

```json
{
  "uri": "ui://application/file-browser",
  "type": "application",
  "mime": "text/html+mcp"
}
```

Because it is entirely optional and only aids a human user, it can be safely ignored by hosts that do not support these UIs or are fully automated anyway.
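A hedged sketch of how such a standalone entry point could coexist with ordinary resources in a server's resource list. The `type: "application"` marker and field names follow the comment's example and are not part of the current SEP.

```typescript
// Field names mirror the comment's example; none of this is normative.
const resources = [
  { uri: "file:///docs/readme.md", mime: "text/markdown" },
  {
    uri: "ui://application/file-browser",
    type: "application", // hypothetical marker for a standalone UI entry point
    mime: "text/html+mcp",
  },
];

// A host that does not support UI resources can simply filter them out,
// which is what makes the category safely ignorable for automated clients.
const nonUi = resources.filter((r) => !r.uri.startsWith("ui://"));
```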
SEP-1865: MCP Apps - Interactive User Interfaces for MCP
Track: Extensions
Authors: Ido Salomon, Liad Yosef, Olivier Chafik, Jerome Swannack, Jonathan Hefner, Anton Pidkuiko, Nick Cooper, Bryan Ashley, Alexi Christakis
Status: Final
Created: 2025-11-21
Please review the full SEP at modelcontextprotocol/ext-apps. This PR provides a summary of the proposal and wires it into the main spec.
Abstract
This SEP proposes an extension to MCP (per SEP-1724) that enables servers to deliver interactive user interfaces to hosts. MCP Apps introduces a standardized pattern for declaring UI resources via the ui:// URI scheme, associating them with tools through metadata, and facilitating bi-directional communication between the UI and the host using MCP's JSON-RPC base protocol. This extension addresses the growing community need for rich, interactive experiences in MCP-enabled applications, maintaining security, auditability, and alignment with MCP's core architecture. The initial specification focuses on HTML resources (text/html;profile=mcp-app) with a clear path for future extensions.

Motivation
MCP lacks a standardized way for servers to deliver rich, interactive user interfaces to hosts. This gap blocks many use cases that require visual presentation and interactivity that go beyond plain text or structured data. As more hosts adopt this capability, the risk of fragmentation and interoperability challenges grows.
MCP-UI has demonstrated the viability and value of MCP apps built on UI resources and serves as a community playground for the UI spec and SDK. Fueled by a dedicated community, it developed the bi-directional communication model and the HTML, external URL, and remote DOM content types. MCP-UI's adopters, including hosts and providers such as Postman, HuggingFace, Shopify, Goose, and ElevenLabs, have provided critical insights and contributions to the community.
OpenAI's Apps SDK, launched in November 2025, further validated the demand for rich UI experiences within conversational AI interfaces. The Apps SDK enables developers to build rich, interactive applications inside ChatGPT using MCP as its backbone.
The architecture of both the Apps SDK and MCP-UI has significantly informed the design of this specification.
However, without formal standardization:
This SEP addresses the current limitations through an optional, backwards-compatible extension that unifies the approaches pioneered by MCP-UI and the Apps SDK into a single, open standard.
Specification (high level)
The full specification can be found at modelcontextprotocol/ext-apps.
At a high level, MCP Apps extends the Model Context Protocol to enable servers to deliver interactive user interfaces to hosts. This extension introduces:
- UI resource declarations via the ui:// URI scheme

This specification focuses on HTML content (text/html;profile=mcp-app) as the initial content type, with extensibility for future formats.

As an extension, MCP Apps is optional and must be explicitly negotiated between clients and servers through the extension capabilities mechanism (see the Capability Negotiation section).
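A rough sketch of what initialize-time negotiation could look like. The `extensions` capability bag and the `"mcp-apps"` key are assumptions standing in for whatever identifiers the extension mechanism actually specifies.

```typescript
// Assumed shapes: "extensions" and "mcp-apps" are placeholders,
// not normative identifiers from the spec.
interface Capabilities {
  extensions?: Record<string, unknown>;
}

const clientCapabilities: Capabilities = {
  extensions: { "mcp-apps": { version: "draft" } },
};

// UI rendering is enabled only when both sides advertised the extension;
// otherwise the server's tools behave as plain, text-only MCP tools.
function appsEnabled(client: Capabilities, server: Capabilities): boolean {
  return Boolean(client.extensions?.["mcp-apps"]) && Boolean(server.extensions?.["mcp-apps"]);
}
```

This mirrors the backward-compatibility claim below: a host or server that never mentions the extension simply falls through to existing behavior.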
Rationale
Key design choices:
ui://), referenced by tools via metadata.

Alternatives considered:

externalIframes capability.

Backward Compatibility
The proposal is an optional extension to the core protocol. Existing implementations continue working without changes.
Reference Implementation
The MCP-UI client and server SDKs support the patterns proposed in this spec.
Olivier Chafik has developed a prototype in the ext-apps repository.

Security Implications
Hosting interactive UI content from potentially untrusted MCP servers requires careful security consideration.
Based on the threat model, MCP Apps proposes the following mitigations:
You can review the threat model analysis and mitigations in the full spec.
Related
New Content Type for "UI" (#1146) by @kentcdodds
This is a long-awaited addition to the spec, the result of months of work by the MCP community and early adopters. We encourage you to: