Asynchronous Approval for Tool Calls (Extensions Track SEP)#2848

Open
mcguinness wants to merge 1 commit into modelcontextprotocol:main from mcguinness:sep-async-approval-tool-calls

Conversation

@mcguinness mcguinness commented Jun 3, 2026

Summary

Draft Extensions Track SEP introducing an experimental extension,
net.openid.authzen/tool-approval, that lets an MCP server gate a tool call on
out-of-session approval without holding the session open. When a server's
authorization decision is a requestable denial (one that something can still
approve), the server does not fail the call. It returns a task handle (from the
io.modelcontextprotocol/tasks extension, SEP-2663) in place of the tool result. The
task stays in the working state while the approval is resolved out of band. On
approval the server re-evaluates policy and executes the tool, and the client
retrieves the result via tasks/get.

What approves is up to the deployment: a human reviewer, a supervising agent, an
automated policy or risk engine, an external ITSM/IGA system, or a combination. The
approval protocol runs entirely on the server's backend (referencing the OpenID
AuthZEN Access Request and Approval Profile as the reference backend) and never
appears on the MCP wire. The client carries only a server-generated taskId.
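
A minimal sketch of the server-side branch described above, under stated assumptions: `Decision`, `PENDING`, `handle_tool_call`, and the `run_tool` callback are hypothetical names, and the exact task-handle shape is defined by SEP-2663, not by this example. Only `resultType "task"` and `taskId` come from the text of this PR.

```python
import uuid
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"                          # final: nothing can approve it
    DENY_REQUESTABLE = "deny-requestable"  # an approver can still grant it


# Server-side only: pending approvals keyed by the server-generated taskId.
# Nothing stored here is ever sent to the client.
PENDING: dict[str, dict] = {}


def handle_tool_call(decision, tool, arguments, principal, run_tool):
    """Return a tool result, return a task handle, or raise on a final denial."""
    if decision is Decision.ALLOW:
        text = run_tool(arguments)
        return {"content": [{"type": "text", "text": text}], "isError": False}
    if decision is Decision.DENY:
        raise PermissionError(f"{tool}: denied and not requestable")
    # Requestable denial: do not execute. Park the call and hand back a handle.
    task_id = str(uuid.uuid4())
    PENDING[task_id] = {"tool": tool, "arguments": arguments, "principal": principal}
    return {"resultType": "task", "task": {"taskId": task_id, "status": "working"}}
```

The client sees only the opaque `taskId`; the parked call and any approval-backend identifiers stay in `PENDING`.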

What it builds on

  • SEP-2663 (Tasks extension) for the durable, server-directed task primitive
    (tasks/get / tasks/update / tasks/cancel, CreateTaskResult).
  • SEP-2322 (MRTR) for submission-time input from the user or an autonomous agent.
  • SEP-2133 (Extensions) for the extension and capability framing.
  • The OpenID AuthZEN Access Request and Approval Profile
    as the reference approval backend (swappable; the MCP-observable behavior does not
    depend on AuthZEN internals).

Why this is worth a SEP

Tasks gives durable async execution but says nothing about why a call is pending or
how a policy decision resolves it. No existing in-session primitive (elicitation,
sampling) survives a disconnect or models a decision made by a different party on a
different timeline. This SEP defines the narrow, MCP-observable binding: returning a
task on a requestable denial, never surfacing authorization artifacts, a
net.openid.authzen/disposition signal that distinguishes a denial (no side effect)
from an execution error, at-most-once execution, and submission input over the task
input channel.
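
To illustrate why the disposition signal matters to a client, here is a hedged sketch of how a client might branch on a terminal result. The placement of the value under `_meta` and the string `denied-not-executed` are assumptions for illustration; the SEP carries the distinction in the CallToolResult body as well, and `next_step` is a hypothetical helper.

```python
DISPOSITION_KEY = "net.openid.authzen/disposition"


def next_step(result: dict) -> str:
    """Decide what a client may safely do with a terminal tool result."""
    disposition = result.get("_meta", {}).get(DISPOSITION_KEY)
    if disposition == "denied-not-executed":
        # The tool never ran, so there is no side effect to reconcile.
        return "request-again-or-stop"
    if result.get("isError", False):
        # The tool ran and failed; side effects may be partial.
        return "investigate-before-retry"
    return "use-result"
```

Without the disposition, a denial and a failed execution would both look like an error, and a client could not tell whether a retry risks a duplicate side effect.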

Open questions for reviewers (in the SEP's Limitations section)

  • Cross-principal resumption is constrained by MCP's task-access model (no
    tasks/list, sessions removed); same-principal resume across restart works, but
    portable handoff to a different principal is not solved at the MCP layer.
  • Result-consumption orchestration (who comes back to consume a completed task)
    is out of scope; deployments must ensure a durable consumer or an auditable effect.
  • Extension vs Informational: much can be built on tasks alone; the capability is
    justified by the disposition semantics plus discovery, but this could instead be
    Informational guidance. Explicitly posed for sponsor/Core Maintainer input.

Status

Draft, seeking a sponsor. Because this builds directly on the Tasks extension, the
Agents Working Group is the most relevant home. A prototype (an MCP server fronting an
AuthZEN PDP + access-request service, plus a tasks-only client) and a conformance
scenario are required before Final; neither has been built yet.

References

@mcguinness mcguinness force-pushed the sep-async-approval-tool-calls branch 3 times, most recently from f5101c1 to 8cca561 Compare June 3, 2026 00:21
@mcguinness mcguinness requested review from a team as code owners June 3, 2026 00:21
@mcguinness mcguinness force-pushed the sep-async-approval-tool-calls branch 2 times, most recently from 9d4fde8 to cd425b4 Compare June 3, 2026 05:18
@localden localden added the SEP label Jun 8, 2026
@localden localden added the proposal label (SEP proposal without a sponsor) Jun 8, 2026
rpelevin commented Jun 8, 2026

This is a good shape for async approval because the MCP-visible part stays narrow: the client gets a task handle, while the approval system stays server-side.

One invariant I would make explicit is that the task handle is not authority by itself. It should be bound to the exact original call envelope and to the policy evaluation that produced the requestable denial.

The server-side record should preserve at least:

  • task id;
  • tool name;
  • canonical arguments digest;
  • requester principal/session;
  • subject or resource being acted on;
  • policy id/version;
  • approval request id;
  • disposition;
  • expiry/freshness window.

On approval, the server should re-evaluate policy against the same envelope before execution. If the tool, arguments, principal, resource, policy version, or freshness window no longer match, the task should resolve as stale/denied with no side effect.
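
A minimal sketch of such a binding record and the pre-execution match, under stated assumptions: `BindingRecord`, `arguments_digest`, and `envelope_matches` are hypothetical names, and sorted-key JSON hashed with SHA-256 stands in for whatever canonicalization a deployment actually chooses.

```python
import hashlib
import json
from dataclasses import dataclass


def arguments_digest(arguments: dict) -> str:
    """Digest of the arguments that does not depend on key order."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class BindingRecord:
    task_id: str
    tool_name: str
    args_digest: str
    principal: str
    resource: str
    policy_version: str
    approval_request_id: str
    expires_at: float          # epoch seconds; end of the freshness window
    disposition: str = "pending"


def envelope_matches(record, tool_name, arguments, principal, resource, now) -> bool:
    """True only if the call about to execute is the call that was bound."""
    return (
        now < record.expires_at
        and record.tool_name == tool_name
        and record.args_digest == arguments_digest(arguments)
        and record.principal == principal
        and record.resource == resource
    )
```

If `envelope_matches` returns False, the task would resolve as stale or denied without running the tool. Policy version is stored on the record but left out of the match here; whether it should gate execution is a separate design choice.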

Acceptance tests I would want:

  • requestable denial returns a task without executing the tool;
  • approval resumes the call at most once;
  • denial produces a terminal disposition, not an execution error;
  • cancelled or expired task cannot later execute;
  • changed arguments or principal require a new approval;
  • result retrieval does not expose backend approval artifacts.

That keeps async approval durable without turning taskId into a bearer approval token.

Draft Extensions Track SEP (experimental extension net.openid.authzen/tool-approval,
per SEP-2133) that lets an MCP server gate a tool call on out-of-session approval,
resolved out of band by a human, a supervising agent, a policy or risk engine, or an
external system.

Layered on the tasks extension (SEP-2663): a requestable denial returns a
CreateTaskResult (resultType "task") in place of the tool result; the server brokers
the access request server-side (OpenID AuthZEN Access Request and Approval Profile),
re-evaluates against the policy decision point on approval, and executes the tool
at most once. Submission input flows through the task input channel (SEP-2322 MRTR),
answerable by a human or an autonomous agent. The client carries only a
server-generated taskId; all authorization artifacts stay server-side.

Key semantics:
- The denied-vs-executed distinction is carried in the CallToolResult body (not only
  _meta), with a net.openid.authzen/disposition companion, so a tasks-only client can
  act on it safely.
- A non-tasks client may receive a degraded actionable requestable CallToolResult in
  addition to the -32003 path; includes a client-maturity note.
- At-most-once execution; best-effort dedup scoped to originating auth context, tool,
  and arguments; a re-submission after a terminal denial starts a new request.
- tasks/cancel before execution is honored (skip-and-deny); once execution begins the
  cancel is ineffective, never both executed and cancelled.
- COAZ -> Access Request -> re-eval composition stated explicitly; COAZ flagged as a
  draft dependency.
- ttlMs covers and is extended to the learned approval window; in-task input requires
  a connected client (collect inputs pre-submission for long gaps).
- Limitations cover cross-principal and lost-taskId resumption; approval-amplification
  rate-limiting is MUST with an over-limit non-requestable denial; backward
  compatibility is additive for new tools but a behavior change for upgraded ones.
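
A minimal sketch of the cancel-versus-execute boundary and at-most-once execution listed above, assuming an in-process server; `ApprovalTask` and its state names are hypothetical and not taken from the SEP. A single lock-guarded transition out of `working` decides the race.

```python
import threading


class ApprovalTask:
    """working -> cancelled, or working -> executing -> completed | failed."""

    def __init__(self, run_tool):
        self._lock = threading.Lock()
        self._run_tool = run_tool
        self.state = "working"
        self.result = None

    def cancel(self) -> bool:
        """Honored only before execution begins."""
        with self._lock:
            if self.state == "working":
                self.state = "cancelled"
                return True
            return False

    def on_approved(self) -> bool:
        """Run the tool if, and only if, this call wins the transition."""
        with self._lock:
            if self.state != "working":
                return False          # cancelled, or already claimed
            self.state = "executing"
        try:
            self.result = self._run_tool()
            final = "completed"
        except Exception as exc:      # a failed run is never retried here
            self.result = exc
            final = "failed"
        with self._lock:
            self.state = final
        return True
```

Because both `cancel` and `on_approved` must leave `working` under the same lock, a task cannot end up both executed and cancelled, and a duplicate approval callback cannot run the tool a second time.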
@mcguinness mcguinness force-pushed the sep-async-approval-tool-calls branch from cd425b4 to 8d12c29 Compare June 9, 2026 03:19
@mcguinness

Author

Strongly agree, and this is the invariant worth stating in normative text rather than leaving implied. The handle is a reference to a pending decision, never a grant. The SEP leaned this way in a few scattered places; per your comment I have consolidated it into one explicit rule plus a required binding record. Applied in 8d12c295.

Where it already lived:

  • Completion requires a fresh re-evaluation on approval (approval is an input to a new decision, not a standing grant; PDP authoritative at enforcement), plus an execution-time freshness/credential check.
  • Security / Confused deputy binds the task to the originally evaluated subject, resource, action, and context and forbids steering execution to a different operation.
  • At-most-once + best-effort dedup, and authorization artifacts stay server-side (taskId is not a bearer token; binding material never crosses the wire).

What I added (new subsection "The task handle is not authority", plus edits to Completion and Security):

  1. An explicit invariant: the taskId is not authority. Authority is the PDP decision, re-derived at execution against the original call envelope.
  2. A normative binding record the server MUST retain and MUST check before executing: task id, tool name, canonical arguments digest, requester principal/session, subject/resource, approval request id, disposition, and expiry/freshness window (your list, adopted close to verbatim).
  3. An explicit envelope-match-or-new-approval rule: if tool, arguments, principal, or subject/resource differ from the bound envelope at execution time, the task resolves denied-not-executed with no side effect and a changed call requires a new access request. This sharpens the dedup section, which only spoke to retries.

One refinement I flagged rather than adopting verbatim: policy id/version. I record it (audit, drift detection), but did not make a version change auto-resolve to stale. The SEP re-evaluates against current policy, so a policy change should produce a fresh allow-or-deny, not an automatic denial; pinning to the exact version that produced the requestable denial would also reject approvals still valid after an unrelated policy edit. So in the text: envelope fields (tool/args/principal/resource) are an exact-match binding; policy version is recorded and re-evaluation runs against current policy. If you specifically want version pinning for a class of high-assurance tools, that reads as a deployment policy on top rather than the default. Did you mean strict pinning or drift-detection?
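
A hedged sketch of the two readings, to make the question concrete. `resolve_on_approval`, its parameters, and the `pin_version` switch are hypothetical; the default path mirrors the drift-detection behavior described above, and the pinned path is the stricter deployment policy.

```python
def resolve_on_approval(recorded_version, current_version, evaluate, pin_version=False):
    """Return (outcome, drift) where outcome is 'execute' or 'deny'.

    evaluate: zero-argument callable that asks the PDP for a fresh decision
    against the policy in force now, and returns True for allow.
    """
    drift = recorded_version != current_version
    if pin_version and drift:
        # Strict pinning: any policy change invalidates the approval
        # without consulting the PDP again.
        return ("deny", True)
    # Default: approval is only an input; the current policy decides.
    outcome = "execute" if evaluate() else "deny"
    return (outcome, drift)
```

With the default, an unrelated policy edit yields `("execute", True)`: the call proceeds and the drift is available for audit. With `pin_version=True` the same edit resolves the task as denied with no side effect.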

Your acceptance tests are exactly the conformance scenarios this needs, and most are MCP-observable, so I folded them into the conformance clause (now a checklist):

  • requestable denial returns a task without executing — observable
  • approval resumes at most once — observable
  • denial is a terminal disposition, not an execution error — observable (the denied-not-executed vs execution-error distinction, carried in the CallToolResult body)
  • cancelled or expired task cannot later execute — observable (ties to the cancel-before-execution boundary rule)
  • changed arguments or principal require a new approval — observable (the envelope-match test above)
  • result retrieval does not expose backend approval artifacts — observable

Thanks, this tightens the "durable handle, not bearer token" line that is the whole point of keeping the approval system server-side.

@AgentGymLeader

On the Limitations: you give two safety conditions for a side-effecting tool — a durable consumer, or that the effect is "independently auditable." The durable-consumer half has a protocol shape (poll the task). The independently-auditable half doesn't — it's a requirement with nothing in the task model to satisfy it against.

Is that intentional (left fully to deployments), or is leaving room for it in the task model in scope for this SEP? The two conditions read as parallel, but only one is actionable at the protocol level.

rpelevin commented Jun 9, 2026

I would split the answer in two.

The audit system itself can stay deployment-owned. I do not think this SEP needs to standardize the audit record schema, storage backend, retention policy, or verifier format.

But I would avoid leaving "independently auditable" as pure prose, because then the two safety conditions are not really parallel. The durable-consumer path has a protocol object to come back to. The independently-auditable path should at least have a task-visible hook that lets the final task outcome reconcile to some server-side evidence.

The smallest shape I would leave room for is:

  • the task is still bound to the original call envelope: task id, tool, arguments digest, requester/subject/resource, policy context, and approval request;
  • the terminal task result still distinguishes approved-executed, denied-not-executed, execution-error, and cancelled;
  • if the side effect can complete without a live consumer, the server records an outcome entry keyed by the task id / decision id / original call digest;
  • tasks/get can surface an opaque outcome or audit reference when the deployment exposes one;
  • a deployment with neither a durable consumer nor a later-verifiable outcome record should be called out as unsupported for side-effecting tools, not merely discouraged.

That keeps the audit evidence out of MCP core while making the requirement testable. A later verifier does not need the whole approval backend on the MCP wire, but it should be able to answer:

  1. did the tool execute or not;
  2. which bound call envelope did that outcome apply to;
  3. was the approval/denial/cancel terminal before execution;
  4. can a retry or lost consumer cause a second mutation.

So my read is: leave the evidence format to deployments, but leave an explicit task-model attachment point for an opaque outcome/audit reference. Otherwise "independently auditable" is true operationally, but not actionable for implementers reading the SEP.
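
A minimal sketch of that attachment point, under loud assumptions: `OutcomeLedger`, `task_view`, and the `example.invalid/outcome-ref` key are all hypothetical, and nothing like them is defined by the SEP or by the tasks extension. The point is only that the task-visible part can be a single opaque reference.

```python
import uuid

# Hypothetical _meta key; not defined by any SEP.
OUTCOME_REF_KEY = "example.invalid/outcome-ref"


class OutcomeLedger:
    """Server-side outcome entries, one per task, first terminal outcome wins."""

    def __init__(self):
        self._by_task = {}
        self._by_ref = {}

    def record(self, task_id: str, call_digest: str, disposition: str) -> str:
        if task_id in self._by_task:
            return self._by_task[task_id]       # idempotent: never overwrite
        ref = uuid.uuid4().hex                  # opaque; reveals nothing
        self._by_task[task_id] = ref
        self._by_ref[ref] = {
            "task_id": task_id,
            "call_digest": call_digest,
            "disposition": disposition,
        }
        return ref

    def ref_for(self, task_id: str):
        return self._by_task.get(task_id)

    def lookup(self, ref: str) -> dict:
        """Used by a later verifier with server-side access, not by MCP clients."""
        return self._by_ref[ref]


def task_view(task_id: str, status: str, ledger: OutcomeLedger) -> dict:
    """What a tasks/get response might expose: status plus an opaque reference."""
    view = {"taskId": task_id, "status": status}
    ref = ledger.ref_for(task_id)
    if ref is not None:
        view["_meta"] = {OUTCOME_REF_KEY: ref}
    return view
```

A verifier holding the reference can answer whether the tool executed and which bound call the outcome applies to, while the call digest and approval details never appear in the task view.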
