Asynchronous Approval for Tool Calls (Extensions Track SEP) #2848
mcguinness wants to merge 1 commit into
This is a good shape for async approval because the MCP-visible part stays narrow: the client gets a task handle, while the approval system stays server-side. One invariant I would make explicit is that the task handle is not authority by itself. It should be bound to the exact original call envelope and to the policy evaluation that produced the requestable denial. The server-side record should preserve at least:
On approval, the server should re-evaluate policy against the same envelope before execution. If the tool, arguments, principal, resource, policy version, or freshness window no longer match, the task should resolve as stale/denied with no side effect. Acceptance tests I would want:
That keeps async approval durable without turning
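A minimal sketch of the binding check proposed in this comment, assuming the server stores a digest of the original call envelope next to the task. The names `BindingRecord`, `envelope_digest`, and `check_binding`, and the `stale:*` reason strings, are hypothetical and not taken from the SEP.

```python
import hashlib
import json
from dataclasses import dataclass


def envelope_digest(tool: str, arguments: dict, principal: str, resource: str) -> str:
    """Stable digest over the original call envelope (canonical JSON, sorted keys)."""
    canonical = json.dumps(
        {"tool": tool, "arguments": arguments, "principal": principal, "resource": resource},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class BindingRecord:
    # Hypothetical server-side record; the taskId alone carries no authority.
    task_id: str
    digest: str
    policy_version: str
    expires_at: float  # end of the freshness window, epoch seconds


def check_binding(record, *, tool, arguments, principal, resource, policy_version, now):
    """Return "ok" or a stale reason; the caller must not execute unless "ok"."""
    if now > record.expires_at:
        return "stale:expired"
    if envelope_digest(tool, arguments, principal, resource) != record.digest:
        return "stale:envelope-mismatch"
    if policy_version != record.policy_version:
        return "stale:policy-changed"
    return "ok"
```

Any result other than "ok" would resolve the task as stale/denied with no side effect. Policy re-evaluation is a separate step that runs only after this check passes.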
Draft Extensions Track SEP (experimental extension net.openid.authzen/tool-approval, per SEP-2133) that lets an MCP server gate a tool call on out-of-session approval, resolved out of band by a human, a supervising agent, a policy or risk engine, or an external system.

Layered on the tasks extension (SEP-2663): a requestable denial returns a CreateTaskResult (resultType "task") in place of the tool result; the server brokers the access request server-side (OpenID AuthZEN Access Request and Approval Profile), re-evaluates against the policy decision point on approval, and executes the tool exactly once. Submission input flows through the task input channel (SEP-2322 MRTR), answerable by a human or an autonomous agent. The client carries only a server-generated taskId; all authorization artifacts stay server-side.

Key semantics:
- The denied-vs-executed distinction is carried in the CallToolResult body (not only _meta), with a net.openid.authzen/disposition companion, so a tasks-only client can act on it safely.
- A non-tasks client may receive a degraded actionable requestable CallToolResult in addition to the -32003 path; includes a client-maturity note.
- At-most-once execution; best-effort dedup scoped to originating auth context, tool, and arguments; a re-submission after a terminal denial starts a new request.
- tasks/cancel before execution is honored (skip-and-deny); once execution begins the cancel is ineffective, never both executed and cancelled.
- COAZ -> Access Request -> re-eval composition stated explicitly; COAZ flagged as a draft dependency.
- ttlMs covers and is extended to the learned approval window; in-task input requires a connected client (collect inputs pre-submission for long gaps).
- Limitations cover cross-principal and lost-taskId resumption; approval-amplification rate-limiting is MUST with an over-limit non-requestable denial; backward compatibility is additive for new tools but a behavior change for upgraded ones.
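The cancel and at-most-once semantics above can be sketched as a small state machine. This is an illustration under stated assumptions, not the SEP's implementation: the `ApprovalTask` class, the internal `EXECUTING` state, and the `policy_allows` callback are hypothetical names, and a real server would persist the state instead of holding it in memory.

```python
import threading
from enum import Enum


class State(Enum):
    WORKING = "working"        # waiting for the out-of-band approval
    EXECUTING = "executing"    # internal only: the tool has started
    COMPLETED = "completed"
    DENIED = "denied"
    CANCELLED = "cancelled"


class ApprovalTask:
    def __init__(self, run_tool):
        self._run_tool = run_tool
        self._lock = threading.Lock()
        self.state = State.WORKING
        self.result = None

    def cancel(self) -> bool:
        """Honor the cancel only while the tool has not started."""
        with self._lock:
            if self.state is State.WORKING:
                self.state = State.CANCELLED
                return True
            return False  # executing or terminal: the cancel is ineffective

    def approve(self, policy_allows) -> State:
        """Re-evaluate policy, then claim the single execution slot."""
        with self._lock:
            if self.state is not State.WORKING:
                return self.state  # never run the tool a second time
            if not policy_allows():
                self.state = State.DENIED
                return self.state
            self.state = State.EXECUTING
        self.result = self._run_tool()  # side effect happens outside the lock
        with self._lock:
            self.state = State.COMPLETED
        return self.state
```

Because both transitions out of `WORKING` happen under the same lock, a task can end up cancelled or executed, but not both.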
Strongly agree, and this is the invariant worth stating in normative text rather than leaving implied. The handle is a reference to a pending decision, never a grant. The SEP leaned this way in a few scattered places; per your comment I have consolidated it into one explicit rule plus a required binding record. Applied in Where it already lived:
What I added (new subsection "The task handle is not authority", plus edits to Completion and Security):
One refinement I flagged rather than adopting verbatim: policy id/version. I record it (audit, drift detection), but did not make a version change auto-resolve to stale. The SEP re-evaluates against current policy, so a policy change should produce a fresh allow-or-deny, not an automatic denial; pinning to the exact version that produced the requestable denial would also reject approvals still valid after an unrelated policy edit.

So in the text: envelope fields (tool/args/principal/resource) are an exact-match binding; policy version is recorded and re-evaluation runs against current policy. If you specifically want version pinning for a class of high-assurance tools, that reads as a deployment policy on top rather than the default. Did you mean strict pinning or drift-detection?

Your acceptance tests are exactly the conformance scenarios this needs, and most are MCP-observable, so I folded them into the conformance clause (now a checklist):
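To make the pinning versus drift-detection distinction concrete, here is a rough sketch assuming an exact-match envelope and a recorded policy version. `Envelope`, `resolve_on_approval`, the `pin_version` flag, and the `evaluate` callback are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    tool: str
    arguments: dict
    principal: str
    resource: str


def resolve_on_approval(bound, bound_version, current, current_version,
                        evaluate, pin_version=False):
    """Exact-match the envelope, record drift, re-evaluate against current policy."""
    if current != bound:
        # Envelope fields are an exact-match binding: any difference is stale.
        return {"outcome": "stale", "policy_drift": False}
    drift = current_version != bound_version
    if pin_version and drift:
        # Optional deployment policy layered on top, e.g. for high-assurance tools.
        return {"outcome": "stale", "policy_drift": True}
    # Default: a policy change yields a fresh allow-or-deny, not an automatic denial.
    outcome = "allow" if evaluate(current) else "deny"
    return {"outcome": outcome, "policy_drift": drift}
```

With `pin_version=False`, an approval that is still valid after an unrelated policy edit goes through, and the drift flag remains available for audit.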
Thanks, this tightens the "durable handle, not bearer token" line that is the whole point of keeping the approval system server-side.
On the Limitations: you give two safety conditions for a side-effecting tool: a durable consumer, or that the effect is "independently auditable." The durable-consumer half has a protocol shape (poll the task). The independently-auditable half doesn't; it's a requirement with nothing in the task model to satisfy it against. Is that intentional (left fully to deployments), or is leaving room for it in the task model in scope for this SEP? The two conditions read as parallel, but only one is actionable at the protocol level.
I would split the answer in two. The audit system itself can stay deployment-owned. I do not think this SEP needs to standardize the audit record schema, storage backend, retention policy, or verifier format. But I would avoid leaving "independently auditable" as pure prose, because then the two safety conditions are not really parallel. The durable-consumer path has a protocol object to come back to. The independently-auditable path should at least have a task-visible hook that lets the final task outcome reconcile to some server-side evidence. The smallest shape I would leave room for is:
That keeps the audit evidence out of MCP core while making the requirement testable. A later verifier does not need the whole approval backend on the MCP wire, but it should be able to answer:
So my read is: leave the evidence format to deployments, but leave an explicit task-model attachment point for an opaque outcome/audit reference. Otherwise "independently auditable" is true operationally, but not actionable for implementers reading the SEP.
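One possible shape for such an attachment point, as a sketch only: the final task outcome carries an opaque reference that a later verifier can reconcile against deployment-owned evidence. The `auditRef` field name, the digest-based reference, and all function names here are assumptions; the SEP does not define them.

```python
import hashlib
import json


def audit_ref(evidence: dict) -> str:
    """Opaque reference: a digest of the server-side evidence record."""
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def final_outcome(task_id: str, disposition: str, evidence: dict) -> dict:
    """Task-visible outcome; the evidence itself never appears on the MCP wire."""
    return {
        "taskId": task_id,
        "disposition": disposition,
        "auditRef": audit_ref(evidence),  # hypothetical field name
    }


def reconciles(outcome: dict, evidence: dict) -> bool:
    """A verifier holding the evidence checks it against the task outcome."""
    return outcome["auditRef"] == audit_ref(evidence)
```

The reference stays opaque to the client, so the evidence schema, storage backend, and retention policy remain deployment-owned.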
Summary
Draft Extensions Track SEP introducing an experimental extension,
net.openid.authzen/tool-approval, that lets an MCP server gate a tool call on
out-of-session approval without holding the session open. When a server's
authorization decision is a denial that is requestable (something can still
approve it), the server returns a task handle (the io.modelcontextprotocol/tasks
extension, SEP-2663) in place of the tool result instead of failing the call. The
task stays working while the approval is resolved out of band, and on approval the
server re-evaluates policy and executes the tool; the result is retrieved via
tasks/get.

What approves is up to the deployment: a human reviewer, a supervising agent, an
automated policy or risk engine, an external ITSM/IGA system, or a combination. The
approval protocol runs entirely on the server's backend (referencing the OpenID
AuthZEN Access Request and Approval Profile as the reference backend) and never
appears on the MCP wire. The client carries only a server-generated taskId.

What it builds on
(tasks/get / tasks/update / tasks/cancel, CreateTaskResult).
as the reference approval backend (swappable; the MCP-observable behavior does not
depend on AuthZEN internals).
Why this is worth a SEP
Tasks gives durable async execution but says nothing about why a call is pending or
how a policy decision resolves it. No existing in-session primitive (elicitation,
sampling) survives a disconnect or models a decision made by a different party on a
different timeline. This SEP defines the narrow, MCP-observable binding: returning a
task on a requestable denial, never surfacing authorization artifacts, a
net.openid.authzen/disposition signal that distinguishes a denial (no side effect)
from an execution error, at-most-once execution, and submission input over the task
input channel.
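From the client side, the binding described above might look like the following sketch. The wire shapes are assumptions: the flat dictionaries, the `FakeServer` stand-in, the terminal status names, and the disposition value "denied" are illustrative, while `resultType`, `taskId`, the `working` status, and the `net.openid.authzen/disposition` key come from the description above.

```python
DISPOSITION_KEY = "net.openid.authzen/disposition"


class FakeServer:
    """In-memory stand-in for an MCP server; not a real client library."""

    def __init__(self, polls_until_done: int, final: dict):
        self._remaining = polls_until_done
        self._final = final

    def call_tool(self, name: str, arguments: dict) -> dict:
        # Requestable denial: a task handle comes back in place of the tool result.
        return {"resultType": "task", "taskId": "task-1", "status": "working"}

    def tasks_get(self, task_id: str) -> dict:
        if self._remaining > 0:
            self._remaining -= 1
            return {"taskId": task_id, "status": "working"}
        return {"taskId": task_id, **self._final}


def call_with_approval(server, name: str, arguments: dict, max_polls: int = 10) -> dict:
    """Call a tool; if a task handle is returned, poll until it leaves "working"."""
    first = server.call_tool(name, arguments)
    if first.get("resultType") != "task":
        return first  # ordinary tool result, no approval involved
    for _ in range(max_polls):
        task = server.tasks_get(first["taskId"])
        if task["status"] != "working":
            return task
    raise TimeoutError("approval still pending")


def is_denial(result: dict) -> bool:
    """True when the disposition says the call was denied with no side effect."""
    return result.get(DISPOSITION_KEY) == "denied"
```

A real client would sleep between polls and persist the taskId so it can resume after a disconnect; both are omitted here for brevity.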
Open questions for reviewers (in the SEP's Limitations section)
tasks/list, sessions removed); same-principal resume across restart works, but
portable handoff to a different principal is not solved at the MCP layer.
is out of scope; deployments must ensure a durable consumer or an auditable effect.
justified by the disposition semantics plus discovery, but this could instead be
Informational guidance. Explicitly posed for sponsor/Core Maintainer input.
Status
draft, seeking a sponsor. Given this builds directly on the Tasks extension, the
Agents Working Group is the most relevant home. A prototype (an MCP server fronting an
AuthZEN PDP + access-request service, plus a tasks-only client) and a conformance
scenario are required before Final and are not yet built.
References