feat: Add User Interaction (out-of-band interactions) as a client capability #475
|
I'm not sure I understand how this works. Could you go through a step by step example of the requests and responses that we would see with this? |
|
Doesn't the new authorization work that was proposed handle this with headers? I might have misunderstood the original proposal, but I thought you could 401 with a header that pointed you to an auth server to auth with, but I might be mis-remembering |
@patwhite that is for when authorization is needed for the client to communicate (at all) to the server. This proposal is for any situation when a server would want to interact with the user. For example:
|
@aaronpk the Flow Diagram sections of the spec show the MCP protocol level requests at a high level. You'll notice the "Human interaction" note, which could be anything, similar to the "user authenticates" part of the OAuth authorization code flow. In this case, the "human interaction" might be an OAuth flow that redirects to the MCP server, granting it a token to call some OAuth protected API downstream. Or it could present a form asking for an API key. Or it could present a stripe payment portal requiring a subscription upgrade. The main point here is that the MCP spec doesn't need to care what the interaction is. Does that help? Are you looking for a flow diagram for a specific use-case to ground understanding with an example? |
OK, got it - I think this is a cool idea, and it solves some issues in the protocol in general. But I do worry about conflating generalized interactions with security-specific interactions. For generalized interactions, I think it would be great to flesh this out even further and add some additional details around multi-turn interactions etc. But for a security demand in particular, this PR could literally just be a single new error type (a 401 equivalent) that includes enough information to craft an OAuth redirect. That could be a lot easier to get through the review process here, and I think it would also be a lot easier to get clients to implement it, because it would be using mechanisms which already exist, and you could basically make it more or less required for the vNext spec. |
This is a good point @patwhite. That is exactly what we've defined under "Requiring Interaction as an Error Response". |
|
FYI, I slimmed down this PR (thanks for the nudge from @patwhite) to focus just on the necessary interaction to unblock authorization. The language is written with an eye to extending the user interaction concept in the future, if desired. I also added more flow diagram examples, let me know if it feels clearer now @aaronpk! |
|
Thanks for drafting this proposal. I have two follow-up questions:
|
|
@adranwit thanks for taking a look! Answers inline:
The MCP client doesn't need to relay anything back to the server. The client's only responsibility is defined in "Interaction Types > User Agent Interaction":
That's it! The rest of the interaction is controlled by the MCP server, and purposefully kept out of view of the MCP client. That maintains the correct security boundary around the MCP server. Of course, in an interaction like a downstream OAuth authorization, the MCP server will indeed host a callback to complete the authorization flow. But again, that's not in view of the MCP client.
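To make the client's narrow responsibility concrete, here is a minimal sketch of a client-side handler for a type="ua" interaction. The function name, the returned status values, and the https-only check are assumptions for illustration, not part of the proposal; the consent prompt and the URL opener are injected so the sketch stays independent of any particular host UI.

```python
from typing import Callable
from urllib.parse import urlparse


def handle_ua_interaction(
    url: str,
    ask_consent: Callable[[str], bool],
    open_url: Callable[[str], None],
) -> dict:
    """Handle a hypothetical type="ua" interaction request on the client."""
    parsed = urlparse(url)
    # Assumption: only plain https URLs are acceptable to this client.
    if parsed.scheme != "https" or not parsed.netloc:
        return {"status": "rejected", "reason": "unsupported URL"}
    # The user must agree before the client opens anything.
    if not ask_consent(url):
        return {"status": "declined"}
    # Hand the URL to the user agent; the client does not see what happens next.
    open_url(url)
    return {"status": "ack"}
```

A desktop host could pass something like `webbrowser.open` as `open_url` and its own dialog as `ask_consent`; everything after the browser opens stays between the user and the MCP server.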
I think a progress mechanism that gives the client updates on the status of the interaction would be a good addition. I've left it out of this proposal (for now) to keep this focused specifically on the absolute must-haves to unblock use cases like downstream tool authorization. Progress could be added as a fast-followup, or now if the community feels it is a must-have.
The end-user cancelling or denying the interaction at the outset is addressed in "Client-Initiated Cancellation". But you raise a different, interesting scenario: the user proceeds partway through an authorization flow (for example), and then rejects it in the downstream system. This is still out of view of the MCP client by definition, unless the MCP server chooses to indicate to the client that the entire interaction should be considered canceled. In that case, the cancellation notification described in "Server-Initiated Cancellation" is appropriate. |
|
Added a concrete example (plus flow diagram) above of the tool authorization scenario, by far the most requested version of this idea! |
|
Thanks for adding the concrete example and flow diagram @nbarbettini —super helpful! The sequence reads like a classic variation of Backend-for-Frontend (BFF) OAuth flow:
All that said, applicability is far beyond OAuth flows, especially when the MCP client is effectively out-of-band for the secure exchange. Idea: instead of layering another “correlationId”-style value on top of
Would love the team’s thoughts on whether a POST-back (or similar out-of-band signal) callback could simplify the flow. Thanks again—this example really clarifies the proposal! |
Absolutely fine to move that conversation elsewhere, and happy to refocus this conversation on the User Interaction primitives. I only brought it up because we started talking about in-band E2EE, impersonation, MITM, etc. Can you suggest a good venue for secure chained agent/service identity mechanisms? I see it best fitting in with MCP Initialize, I suppose? |
I did a quick search in the Discussions in this repository and found this one that might be a better fit from @ggoodman , but you could also start a new Discussion if you'd like. |
Definitely fair. I think we'll need to disambiguate "impersonation" and whether that's the user level or MCP client level. Thanks for your comments here. We already had on our list to clean up that section of the spec and this will help I think!
Definitely will make this clearer! The intention for interaction URLs is to provide a mechanism by which the MCP server can communicate directly with the user. That URL will be visible to the MCP client and therefore MUST NOT contain any sensitive or secure information directly. Again, we don't want to be prescriptive about the interaction itself here, but you could imagine that once the user agent loads that URL, the server could reveal the pre-signed URL to that user. There might be some level of authentication (e.g., checking a session) before that happens if desired as well.
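One way to keep sensitive values out of the interaction URL is to put only an opaque identifier in it and keep the real target on the server. The sketch below assumes an in-memory store, a `/interactions/<id>` path, and a session check based on a user identifier; all of these names are hypothetical and only illustrate the idea described above.

```python
import secrets
from typing import Optional

# In-memory store mapping opaque interaction IDs to server-side state.
# Assumption: a real server would use persistent storage with expiry.
_PENDING: dict = {}


def create_interaction_url(base_url: str, user_sub: str, sensitive_target: str) -> str:
    """Return a URL that is safe to show to the MCP client."""
    interaction_id = secrets.token_urlsafe(32)
    # The sensitive value (for example a pre-signed URL) never leaves the server here.
    _PENDING[interaction_id] = {"sub": user_sub, "target": sensitive_target}
    return f"{base_url}/interactions/{interaction_id}"


def resolve_interaction(interaction_id: str, session_sub: str) -> Optional[str]:
    """Reveal the sensitive target only to the user the interaction was created for."""
    state = _PENDING.get(interaction_id)
    if state is None or state["sub"] != session_sub:
        return None
    return state["target"]
```

The MCP client only ever sees the opaque URL; the pre-signed URL is revealed after the user agent loads the page and the server has checked the session.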
Yeah, we'll need to detail this a bit more as well. Thanks for the nudge here! |
|
@nbarbettini - thank you for putting this proposal together. I have no role in the review process, but would like to share my thoughts as someone who has been handling auth for the past 15 years.
I admit that in this MCP world I'm quite a beginner and probably haven't thought through all the use cases (although downstream auth is definitely interesting to me). This way, downstream auth is made simple in the eyes of the MCP server: you either send me a token or get a 401 asking you to issue one, then come back. @aaronpk - would love to hear what you think as well. |
|
@shlomiken Thanks for taking a look!
Resource server is one of the roles that an MCP server plays, but not the only role. This proposal enables the MCP server to take the role of OAuth Client for any downstream resource servers it needs to access.
MCP servers that act as OAuth clients to downstream services need to be stateful. It's implied but not stated explicitly in the text of this proposal - I can call that out more clearly.
In the language of OAuth 2.1, Claude Desktop is a public (not confidential) client. SPA code that runs in a browser is public as well. If the MCP client code does indeed run entirely on a server-side/backend process, then it qualifies as confidential. The distinction between public and confidential is not about whether I trust Claude Desktop or not. It's whether I can decompile or right-click->View Source. If I can, then a malicious actor can too, which means we need to treat the client differently from a security perspective. For example, we can't keep OAuth client secrets safe anywhere in a public client.
This is tempting, and it's where I started first. I landed on the current proposal after conversations with @wdawson, @aaronpk, @mcguinness and others. The reasons the MCP client can't also be the client to the downstream OAuth service are:
|
|
Thanks for this proposal. How would the following work with this solution? I have an MCP server that can perform actions at various levels of sensitivity. Let's use the GitHub MCP server as an example. It can make commits on a personal and a corporate repo. The MCP server has permission to carry out all of these actions because the user has that level of permission. I want the user to be prompted by default above a certain level of sensitivity, but to give them a way to say "next time I do this, I don't want to be prompted". For example, I want them to be prompted any time they perform mutation actions against the corporate repo, unless they have decided that it's fine for repo X. However, I want them to always be prompted if they're changing security settings for any repo. |
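The prompting rules described in this question can be written down as a small decision function. This is only a sketch of the policy as stated; the action kinds, the parameter names, and the idea of a per-user set of trusted repos are assumptions, and nothing here is defined by the proposal.

```python
def needs_prompt(action_kind: str, repo: str, is_corporate: bool, trusted_repos: set) -> bool:
    """Decide whether the user should be sent through an interaction first."""
    # Security settings changes are always prompted, with no opt-out.
    if action_kind == "security_settings":
        return True
    # Mutations on a corporate repo are prompted unless the user opted out for that repo.
    if action_kind == "mutation" and is_corporate:
        return repo not in trusted_repos
    # Everything else runs without a prompt.
    return False
```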
@RichardoC, I think there are tools emerging that help address your use-case. For a naive consent prompt, perhaps Elicitation will suffice. However, if you want stronger signals and better 'receipts', you might reach for this proposal. Your sensitive tool calls could respond with errors and requests for interaction. The user would need to navigate to the linked page and confirm their consent there. It would be up to you to correlate the interaction request from the MCP Server with the user confirmation on the web page. A JWT query param could be a good way to correlate these two interactions in a tamper-proof and stateless way. |
Yeah, after thinking about it more, it makes sense, although the spec says the MCP client acts as an OAuth 2.1 client, which I think is confusing. Another thing I don't fully understand: if the auth loop is closed at the MCP server, how is the client notified so that it can re-issue its tool request? Also, what in the protocol can be used as a session (since we don't have cookies here)? One more question for the OAuth folks: doesn't that approach make DCR redundant? An MCP server only knows a finite number of resource servers it will talk to, so why do we need dynamic registration? Regarding confidential clients: my bad, I updated my comment, and you probably saw it later. I then read that even if you use OS-native secure storage, IdPs cannot rely on an app to do so, so all installed apps are public. |
This is the "Transactional" aspect I've been pushing for. I believe that this User Interaction capability would really benefit from a robust Server -> Client tracking / notification upon landing and not as a follow-on. |
This is correct for the authorization between the MCP client and MCP server. The type of authorization we're talking about here is when an MCP server needs authorization to a downstream API. For example, a Personal Assistant MCP server needing authorization to Google's APIs for a particular user. It is a bit confusing, but the MCP server can play both the OAuth resource server and client roles. Remember, these are better considered roles that can be played instead of fixed capabilities.
We have added a section back into the spec for how the MCP client can monitor the progress of the interaction from the MCP server. We did this to leverage existing capabilities in the MCP spec. This can accomplish the arrow number 6 in your diagram. As @ggoodman points out, there is appetite in the community for a better pattern when dealing with asynchronous operations. We don't want to introduce a new pattern for that in this PR, but are open to changes in the future.
The MCP server receives an access token from the client. Depending on the implementation, the server can leverage the subject identifier as a way to track which user to link the downstream credentials to. This can be used to deliver arrow number 7 in your diagram. Ultimately, this will be up to the exact implementation of the MCP server and its authorization server for how to do this.
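A minimal sketch of the linking described above: downstream credentials are stored under the subject identifier from the already-validated MCP access token. The class and method names are hypothetical, the claims are assumed to be a plain dict with a `sub` entry, and an in-memory dict stands in for real storage.

```python
from typing import Dict, Optional, Tuple


class DownstreamCredentials:
    """Keeps third-party tokens keyed by (subject, provider)."""

    def __init__(self) -> None:
        self._by_user: Dict[Tuple[str, str], str] = {}

    def link(self, claims: dict, provider: str, downstream_token: str) -> None:
        # Assumption: `claims` comes from an access token the server has already validated.
        self._by_user[(claims["sub"], provider)] = downstream_token

    def find(self, claims: dict, provider: str) -> Optional[str]:
        # Returns None when this user has not yet authorized the provider.
        return self._by_user.get((claims["sub"], provider))
```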
I think this question is confusing the roles again. DCR is required for MCP servers that will not necessarily know their clients ahead of time. Think my particular instance of Claude Desktop I have installed wanting to access your MCP server. The MCP server, however, may act as an OAuth client to the Google API. In that case, it's a different OAuth flow entirely and DCR may not be required. |
|
On the |
|
Hi @nbarbettini, given that Elicitation is now part of the MCP spec, I could imagine the following elicitation request schema:
```json
"requestedSchema": {
  "type": "object",
  "properties": {
    "secureFlow": {
      "type": "string",
      "title": "Open Link To Provide Secrets",
      "description": "Start Secure ....",
      "default": "https://somehost/secure/flow",
      "format": "uri"
    }
  },
  "required": ["secureFlow"]
}
```
Where do we stand with User Interaction? |
@adranwit
In your example, does |
|
@nbarbettini instead of adding this functionality as a client capability, why not as a MCP Proxy Server capability? Why change the target? B/c the MCP Proxy Server definition
seems to more accurately describe the role of "client" you are describing. |
|
Is there still interest in pursuing this proposal / addition? It seems to have stalled, curious why that is. Perhaps it is being discussed elsewhere or there is another proposal? |
|
@dmiyamasu thanks for the feedback. In this scenario, elicitation could simply result in a URL being rendered or opened by the MCP host. At this stage, there's nothing inherently secure about that interaction: the MCP client is just triggering a link. Whatever happens after that (i.e., within the URL target) is entirely out of scope for the MCP protocol and the elicitation process itself. So, even though "secureFlow" is a required field, it's just a URI. No sensitive data is being directly requested or transmitted via elicitation. The actual handling of secrets or sensitive input, if any, would happen within the hosted flow pointed to by the URL, which is beyond the control or visibility of the MCP client. In addition, I'd assume any secure flow would generate a URL that can be redeemed only once. Therefore, in my opinion this doesn't violate the draft spec's guidance that “Servers MUST NOT use elicitation to request sensitive information.” |
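The "redeemable only once" assumption could look roughly like the sketch below. The class name and the in-memory set are hypothetical; a real server would also want expiry and shared storage.

```python
import secrets


class OneTimeLinks:
    """Issues URLs whose token can be redeemed a single time."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url.rstrip("/")
        self._unused = set()

    def issue(self) -> str:
        token = secrets.token_urlsafe(32)
        self._unused.add(token)
        return f"{self._base_url}/{token}"

    def redeem(self, token: str) -> bool:
        # The token is removed on first use, so a replayed link is refused.
        if token in self._unused:
            self._unused.remove(token)
            return True
        return False
```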
|
Hey folks! Not stalled. We're working through all the things in order to rework this on top of the most recently released version that includes elicitation. I hope to have something ready this week! Edit: I, unfortunately, did not quite make it over the line this week. Will definitely be early next week! Thanks for your patience, folks! |
|
Thanks for putting this proposal together @nbarbettini. After going through the draft and all the discussions, I realised it would be much cleaner if authorization were handled between the third-party API and its AS, and I would like your review on whether this is a good design or has any holes in it. For example: a user calls a tool "approve_pull_request" to approve a PR on a GitHub MCP server. This requires checking whether the user is authorized to approve PRs in that particular git repo. Our very initial thought was to handle these checks when the MCP tool is called, on the MCP server or with a custom AS. But with this proposal I feel it's simplified, as an MCP server can only do what the user has permission to do on GitHub. It removes any scope for privilege escalation or access misconfiguration (as in the case where we were handling checks at tool calls). |
|
@Xerenz Yes, you understood the proposal correctly! In cases like yours where an MCP server is calling a downstream API (such as GitHub) on behalf of the end-user, the best place to check the user's authorization is in the downstream API itself (or its authorization server). Otherwise, MCP servers must re-build a ton of authorization logic. However, the MCP server may need to pass an authorization challenge back to the end-user ("click here to authorize access to GitHub"). Some MCP servers are nested within other MCP servers, so the "pass back" mechanism needs to be robust and something that can be programmatically forwarded to upstream servers. That's the foundational principle of this proposal. |
|
Hey everyone! Thanks for all the great feedback here. To keep things clean we've created a new PR for the updated version of this proposal: #887 |


Introduces a new client capability that servers can use to trigger an interaction with the end-user.
There are some important use cases that require the MCP server to interact with the end-user in a secure way:
These interactions are highly sensitive in nature, and we can take inspiration from how OAuth/OIDC solved these problems on the web. The interaction type proposed here is type="ua", which requires the MCP client to obtain consent from the user and navigate to a URL in a user-agent (aka browser), where the sensitive operation can occur securely.
Motivation and Context
One of the hot topics discussed but not addressed in #284 was the idea of fine-grained authorization for specific tools. @wdawson and I previously discussed this idea in #234 as well.
For example, the community identified scenarios involving "downstream" tools and resources that need authorization, but MCP authorization is about MCP client->MCP server authorization, not how "downstream" authorization would be handled in an MCP server that talks to a third-party API or resource server.
Kudos to @siwachabhi for the idea to model this as a client capability. This proposal is distinct from and complementary to the elicitation proposal (#382): it describes an interaction that takes place outside of the MCP client, whereas elicitation describes an interaction that takes place inside the MCP client.
Examples
Downstream authorization for tools
A big question in #284 was downstream authorization, or "tool authorization", meaning this scenario:
- Consider a Nate's Awesome Google Tools server which exposes tools that interact with Google's REST API. Let's also say the server has these tools: search_email, send_email, trash_email
- Being authorized to use Nate's Awesome Google Tools doesn't mean that the user has also authorized scopes for Google's API.
- When the user calls a tool such as search_email, it would be desirable to be able to ask the user "just in time" to approve a scope from Google (https://www.googleapis.com/auth/gmail.readonly)
Here is how this scenario would work with this proposal:
```mermaid
sequenceDiagram
    participant U as End-User
    participant B as User-Agent (Browser)
    participant AS as 3rd-party Authorization Server
    participant C as MCP Client
    participant S as MCP Server
    C->>S: Call tool send_email
    Note over S: Server determines user is not yet authorized
    S->>C: interaction/create type=ua<br>with OAuth 2.1 authorize URL<br>scope=https://www.googleapis...
    C-->>U: Present consent to open URL
    U-->>C: Provide consent
    C-->>B: Open URL
    B-->>AS: Navigate to URL
    C->>S: Send response ("Ack")
    Note over U,AS: Perform authorization<br>(out of band)
    AS-->>S: OAuth callback
    S->>AS: OAuth 2.1 token exchange
    Note over S: Server now has a valid 3rd-party token
    B-->>S: Perform interaction
    Note over S: Continue operations
```
By adding a path for the server to instruct the MCP client to send the end-user to a server-defined URI, we gain the ability to perform this "downstream" authorization. If we zoom out a bit, this doesn't just apply to servers acting as OAuth clients - directing the end-user to a URL also unlocks the ability to gather any kind of sensitive data that shouldn't pass through the MCP client (or an intermediary).
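The server side of this flow can be sketched as a toy state machine. This is an illustration under several assumptions: the class and method names are hypothetical, the layout of the `params` dict is invented, the server-to-client interaction/create request is collapsed into a return value, and the OAuth 2.1 token exchange is replaced by a stand-in string.

```python
class SketchServer:
    """Toy MCP server that needs a downstream token before running a tool."""

    def __init__(self, authorize_url: str) -> None:
        self.authorize_url = authorize_url
        self.downstream_tokens = {}  # user id -> third-party access token

    def call_tool(self, user: str, tool: str) -> dict:
        if user not in self.downstream_tokens:
            # Corresponds to the "interaction/create type=ua" arrow in the diagram.
            return {
                "method": "interaction/create",
                "params": {"type": "ua", "url": self.authorize_url},
            }
        return {"result": f"{tool} ran with downstream token for {user}"}

    def oauth_callback(self, user: str, code: str) -> None:
        # Stand-in for the token exchange; a real server would call the
        # authorization server's token endpoint with the code here.
        self.downstream_tokens[user] = f"token-for-{code}"
```

The first call for a user yields the interaction request; once the out-of-band callback has stored a token, the same call proceeds, and the MCP client never handles the code or the token.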
FAQ
Redirecting to a URL is great for doing OAuth, but what about servers that need the user to provide API keys, connection strings, etc?
These should not be exposed to the MCP client either! Redirecting to a URL (the type="ua" pattern proposed here) also works for gathering any sensitive information. For example, the MCP server itself can host a public page containing an HTML form that asks the user to enter a required API key -- all under the control of the MCP server, and without that sensitive information passing through the client.
Why is it such a big deal if sensitive information passes through the client?
For the same reason that you shouldn't give Yelp your Gmail password (to use a classic example). The MCP client's job is to communicate with the MCP server only, even though the MCP server might also be communicating with other APIs, resource servers, etc. on behalf of the user. There are important security boundaries at play:
How Has This Been Tested?
TODO: Sample app showing both the client and server side of user interactions.
Coming soon -- wanted to get the doc up to start the discussion, and will follow up shortly with code.
Breaking Changes
None
Types of changes
Checklist
Additional context
TODO:
schema.ts