SEP-2787: Tool call attestation by soup-oss · Pull Request #2787 · modelcontextprotocol/modelcontextprotocol

soup-oss · 2026-05-25T17:37:07Z

Extensions Track SEP proposing signed tool call attestation envelopes for MCP — binding intent, agent identity, tool name, and arguments into a verifiable audit trail. Targets EU AI Act Article 12 compliance.

Motivation and Context

MCP has no standard mechanism to cryptographically prove which agent called which tool, with what arguments, and for what purpose. Regulated deployments (EU AI Act, AI Liability Directive) need this for audit trails. This SEP fills that gap with a minimal envelope carried in _meta, requiring no protocol changes.

How Has This Been Tested?

Example implementation was added in https://github.com/soup-oss/sep-tool-call-attestation/tree/master/example

@vaaraio has put together a full suite of conformance test vectors and validation criteria over in PR #2789

Breaking Changes

None

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have read the MCP Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have added or updated documentation as needed

Additional context

vaaraio · 2026-05-26T06:07:43Z

Hi @soup-oss, thanks for filing this. The regulatory gap is real and the envelope shape lands close to work already in production.

Two pointers in case they are useful as prior art:

OVERT 1.0 (Glacis Technologies, published 2026-03-25, https://overt.is) defines a related primitive: signed, schema-closed envelopes a relying party can verify offline. Apache-2.0 open standard with a royalty-free patent covenant for conformant implementations. The shape rhymes with SEP-2787 but the design choices differ:

Encoding: canonical CBOR per RFC 8949 rather than JSON. Smaller wire size and stricter canonicalisation for signature stability across implementations.
Crypto: Ed25519 signatures over HMAC-SHA256 content commitments. The content commitment lets the request payload stay local while only the HMAC crosses the trust boundary, which matters for privacy and for the EU AI Act Article 12 evidence chain when arguments contain PII or trade secrets.
Schema: closed 9-field shape with IEEE-754 float rejection. Helps interoperability across multiple emitters.
Counter: monotonic counter across the emitter process so gaps are detectable on the verifier side.
Phase 3: the IAP role notary-signs the Provisional Receipt and anchors it in a transparency log, which complements rather than replaces an inline ack callback.
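For readers unfamiliar with the content-commitment idea in the Crypto bullet above, here is a minimal sketch using only the Python standard library. It is an illustration, not OVERT's wire format: the sorted-key JSON encoding, the key handling, and the names `commit` and `check` are assumptions made for this example.

```python
import hashlib
import hmac
import json


def commit(key: bytes, arguments: dict) -> str:
    """Return a hex HMAC-SHA256 commitment over a deterministic encoding."""
    # Assumption: sorted-key compact JSON stands in for the real canonical form.
    payload = json.dumps(arguments, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def check(key: bytes, arguments: dict, commitment: str) -> bool:
    """Recompute the commitment locally and compare in constant time."""
    return hmac.compare_digest(commit(key, arguments), commitment)


demo_key = bytes(32)  # all-zero demo key, never use in practice
local_args = {"customer_id": "c-123", "amount": 42}
commitment = commit(demo_key, local_args)

# Only `commitment` would cross the trust boundary; `local_args` stays local.
same_args_ok = check(demo_key, local_args, commitment)
changed_args_ok = check(demo_key, {**local_args, "amount": 43}, commitment)
```

A party holding the key and the original arguments can later show that they match the commitment, while anyone who sees only the commitment learns nothing directly about the arguments.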

Vaara (https://github.com/vaaraio/vaara, Apache 2.0) ships an MCP proxy that emits OVERT 1.0 Base Envelopes per tools/call, resources/read, and prompts/get since v0.24.0 (released 2026-05-20). Working examples with real upstream MCP servers in examples/github-mcp-proxy-demo/, examples/sap-mcp-proxy-demo/, and examples/goose-mcp-proxy-demo/. The proxy is transparent to both the MCP client and the upstream server, so it works with any stdio MCP server without protocol changes (matches your design constraint).

For the SEP's open design questions:

Emitter location. SEP-2787 implies a client-side emitter. The proxy pattern puts the emitter between client and server, which has the benefit that no MCP client needs to change to gain attestation. The trade-off is that the proxy must be deployed. Both shapes are valid and probably both belong in the spec landscape.
Argument handling. Your inline-or-resource-URL design is one way. The HMAC-commitment route in OVERT keeps the payload entirely local. The verifier never sees the args, only the commitment they were bound to. For regulated deployments where arguments contain personal data this side-steps a category of disclosure risk.
Cryptographic algorithm choice. JWT-family (HS256/ES256/RS256) optimises for ecosystem familiarity. Ed25519 (used in OVERT) gives smaller signatures, no ASN.1/DER awareness needed, and constant-time implementations are simpler to audit. Worth weighing against the JWT-tooling familiarity benefit.

Glacis also ships a Python SDK at https://github.com/Glacis-io/glacis-python (Apache 2.0) as the reference implementation by the standard's authors.

Reference verifier CLI is vaara overt verify RECEIPT.cbor --pubkey-file PUB.bin. The verifier reads only the wire format and takes no dependency on Vaara's emitter, so any OVERT-conformant implementation can route its conformance check through it. Test cases are available if useful for the SEP.

Apache 2.0 throughout, no commercial product.

Rul1an · 2026-05-26T10:45:27Z

This is a useful direction. I like the core instinct here: MCP probably needs a standard way to make a tool call reviewable after the fact, without every client, gateway, or regulated deployment inventing its own envelope.

One boundary I would keep very sharp is the difference between a pre-execution attestation and post-execution evidence.

The current envelope works well as a signed pre-execution statement. It says who is asking, which tool/server is targeted, which arguments or argument digest are bound, what intent was declared, when it was issued, and which key/version signed it.

That proves an intent-bound request. It does not by itself prove that the tool actually executed, what the application-level outcome was, or whether a downstream system accepted the result. The optional ack starts to close that loop, but it feels like a different layer from the core attestation. My bias would be to keep this SEP focused on the request attestation, and treat acknowledgement/outcome receipts as either a separate phase or a follow-up extension.

The other thing I would make explicit is the source of each fact. In audit and replay systems it matters whether a field came from the client/agent planner, the attestation issuer, the MCP server verifier, the tool/application result, a policy engine, or a payload-derived projection/digest. Those are different trust surfaces. Keeping them apart prevents the envelope from claiming more than the layer can actually prove.

I would also be cautious with inline arguments. Signed args are useful in some deployments, but the privacy-friendly default should probably be digest, reference, or redacted projection rather than payload storage. The audit invariant is usually “this call was bound to this exact argument set” or “this reviewed projection was bound,” not “all arguments are now stored in a long-lived compliance artifact.” A more explicit args_digest / args_ref / args_projection shape may be easier to interop than overloading args: string with both inline JSON and resource references.

Canonicalization may need tightening too. “Sorted keys, no whitespace” is a good start, but different language stacks will still disagree on numbers, Unicode escaping, duplicate object keys, floats, NaN, and parser behavior. A JSON Schema plus JCS/RFC 8785, or a small restricted JSON profile, would make conformance testing less surprising.
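To make the canonicalization concern concrete, the sketch below applies a naive "sorted keys, no whitespace" rule with Python's standard `json` module and shows where equal-looking inputs yield different bytes. The helper name `naive_canonical` is hypothetical, and other language stacks will differ in their own ways.

```python
import json


def naive_canonical(obj, ensure_ascii=True) -> bytes:
    """Sorted keys, no whitespace: the loose rule under discussion."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=ensure_ascii)
    return text.encode("utf-8")


# The same numeric value serialises differently as int and as float.
int_form = naive_canonical({"n": 1})
float_form = naive_canonical({"n": 1.0})

# The same string serialises differently depending on the escaping policy.
escaped_form = naive_canonical({"s": "é"})
raw_form = naive_canonical({"s": "é"}, ensure_ascii=False)

# Duplicate keys are accepted silently; Python keeps the last value.
duplicate_parsed = json.loads('{"a": 1, "a": 2}')

# NaN is emitted as a bare token that is not valid JSON.
nan_form = naive_canonical({"x": float("nan")})
```

Each of these is a place where two honest implementations could compute different signature inputs for what their authors consider the same envelope.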

One smaller concern: the multi-server toolCalls array is powerful, but I would clarify whether it is just a signed plan bundle or whether it is meant to carry stronger workflow semantics. If every server maintains its own nonce cache, a shared nonce helps with replay at each verifier, but it does not fully define what happens when only part of the multi-server plan executes.
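The replay point can be sketched as follows, assuming each server keeps an independent in-memory nonce cache. The class name `NonceCache` and the 300-second window are hypothetical choices for illustration, not values from the SEP.

```python
import time
from typing import Dict, Optional


class NonceCache:
    """Per-verifier replay cache: a nonce is accepted once within its window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        self._seen: Dict[str, float] = {}

    def accept(self, nonce: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget nonces whose replay window has passed.
        self._seen = {n: t for n, t in self._seen.items() if now - t < self.window_seconds}
        if nonce in self._seen:
            return False  # replay at this verifier
        self._seen[nonce] = now
        return True


# Two servers named in the same multi-server plan, each with its own cache.
server_a, server_b = NonceCache(), NonceCache()
first_at_a = server_a.accept("plan-nonce-1", now=0.0)
replay_at_a = server_a.accept("plan-nonce-1", now=1.0)
first_at_b = server_b.accept("plan-nonce-1", now=2.0)
```

Each cache rejects a replay locally, but neither server learns whether the other part of the plan ran, which is the partial-execution gap described above.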

So the shape I would find easiest to adopt is:

request attestation: signed before execution, binds issuer/subject/tool/server/arguments-or-digest/intent/time/nonce/key version
verification result: server-side allow/reject/error reason
execution receipt or ack: optional later layer, binding server identity and observed outcome or outcome digest

Overall, strong proposal. I think the most valuable thing this SEP can standardize early is not the whole compliance story, but the stable evidence boundary. Once MCP has a common way to bind a tool call to identity, target, argument digest/projection, intent, nonce, and key version, downstream audit, replay, policy, and receipt systems can compose around it much more cleanly.

vaaraio · 2026-05-26T12:08:12Z

Thanks for the careful read.

The pre-execution attestation vs post-execution evidence split is exactly the boundary worth making explicit. OVERT 1.0 separates these via the Phase 3 IAP layer, where the relying party notary-signs a Provisional Receipt and anchors it in a transparency log, distinct from the inline ack pattern in SEP-2787. That feels like the cleaner place to land: pre-execution envelope is one primitive, execution receipt is a different primitive composed on top.

On argument handling, OVERT keeps the payload entirely local via an HMAC-SHA256 content commitment. The verifier never sees the args, only the commitment they were bound to. For regulated deployments with PII or trade-secret arguments this side-steps a whole disclosure category. The args_digest / args_ref / args_projection shape maps to the same intent. The digest case is essentially what OVERT does.

Canonicalization: OVERT uses canonical CBOR per RFC 8949 with IEEE-754 float rejection. Stricter than JCS / RFC 8785, but the underlying point is the same: pin the bytes so signatures verify identically across implementations. Whichever the SEP lands on, a normative schema plus explicit canonicalization rules will save reviewers from parser folklore.

On toolCalls as a signed plan bundle vs multi-server workflow: real ambiguity worth resolving in the SEP. OVERT envelopes are per-interaction, which sidesteps the question, but the SEP's bundle shape probably needs an explicit statement on partial-execution semantics and per-verifier replay windows.

The framing of a "stable evidence boundary" lands well. That's the right altitude: bind a tool call to identity, target, args commitment, intent, nonce, key version, and stop there. Outcome receipts and policy decisions compose on top, in separate layers.

vaaraio · 2026-05-26T12:26:03Z

Follow-up after sitting with the review longer. Four concrete proposals, bundled so the envelope shape settles in one pass rather than drifting across threads.

On the source of each fact: the cleanest path is to annotate every envelope field with its trust surface. Issuer-asserted fields are set by the attestation issuer (subject, intent, time, nonce, key_version, args commitment). Verifier-asserted fields are set by the MCP server verifier (allow/reject/error reason, observed nonce). Payload-derived fields come deterministically from the request payload (args_digest, args_projection). Planner-declared fields are set by the client or agent upstream of the issuer (declared_purpose, requested_capability). The schema can either tag fields with a source annotation or group them under named blocks. The invariant is that no field gets sourced from "envelope" as an undifferentiated whole.
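The "named blocks" option described above could look roughly like this. It is a sketch under assumptions: the class names, the `issued_at` field standing in for "time", and the choice of a single `args_digest` field are illustrative, and the verifier-asserted block is left out because it is produced later by the server.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PlannerDeclared:
    declared_purpose: str
    requested_capability: str


@dataclass(frozen=True)
class IssuerAsserted:
    subject: str
    intent: str
    issued_at: int
    nonce: str
    key_version: str


@dataclass(frozen=True)
class PayloadDerived:
    args_digest: str


@dataclass(frozen=True)
class Envelope:
    planner_declared: PlannerDeclared
    issuer_asserted: IssuerAsserted
    payload_derived: PayloadDerived


def field_sources(envelope: Envelope) -> dict:
    """Map each leaf field name to the trust-surface block that owns it."""
    return {
        field: block
        for block, fields in asdict(envelope).items()
        for field in fields
    }


envelope = Envelope(
    PlannerDeclared("refund a duplicate charge", "payments.refund"),
    IssuerAsserted("agent-7", "refund", 1_780_000_000, "n-1", "k1"),
    PayloadDerived("sha256:" + "0" * 64),
)
sources = field_sources(envelope)
```

With this grouping, an auditor can answer "who asserted this field?" from the structure alone, without reading prose in the spec.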

On argument handling: the current args: string field is overloaded with both inline JSON and resource references. An explicit three-way shape reads cleaner. args_digest is a hash commitment over canonical bytes, privacy-friendly default, payload stays local. args_ref is a content-addressed reference, digest plus retrieval URI. args_projection is a redacted or transformed projection of the args, with its own digest. Implementations pick one per call. The audit invariant becomes "this call was bound to this exact commitment" and inline payload storage is opt-in. OVERT 1.0 does this via its HMAC content commitment.
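A minimal sketch of the three-way shape, assuming SHA-256 over sorted-key compact JSON as a stand-in for the canonical bytes. The type names follow those used in the reference implementation's commit message; the field names and helper functions are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Union


def canonical_bytes(value) -> bytes:
    # Assumption: sorted-key compact JSON stands in for the SEP's canonical form.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sha256_tag(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ArgsDigest:
    digest: str  # commitment only, payload stays local


@dataclass(frozen=True)
class ArgsRef:
    digest: str  # digest of the referenced payload
    uri: str     # where a verifier may retrieve it


@dataclass(frozen=True)
class ArgsProjection:
    projection: dict        # redacted or transformed view of the args
    projection_digest: str  # digest over the projection, not the full args


ArgsCommitment = Union[ArgsDigest, ArgsRef, ArgsProjection]


def digest_commitment(args: dict) -> ArgsDigest:
    return ArgsDigest(sha256_tag(canonical_bytes(args)))


def ref_commitment(args: dict, uri: str) -> ArgsRef:
    return ArgsRef(sha256_tag(canonical_bytes(args)), uri)


def projection_commitment(args: dict, keep: set) -> ArgsProjection:
    projection = {k: v for k, v in args.items() if k in keep}
    return ArgsProjection(projection, sha256_tag(canonical_bytes(projection)))


call_args = {"account": "DE89 3704", "amount": 100, "currency": "EUR"}
redacted = projection_commitment(call_args, keep={"amount", "currency"})
```

Here `redacted` binds only the amount and currency, so the account number never enters the long-lived artifact.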

On canonicalization: the current "sorted keys, no whitespace" rule is too loose for cross-stack conformance. A normative reference to RFC 8785 (JSON Canonicalization Scheme) pins behaviour for numbers, Unicode escaping, duplicate keys, floats, and NaN, the places where language stacks silently disagree. A small JSON Schema accompanying the canonical form makes conformance testing tractable. CBOR per RFC 8949 with IEEE-754 float rejection is the stricter alternative. JCS keeps the JSON ecosystem familiarity at some cost. Either is defensible. The SEP needs to pick one and reference it normatively.
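The "small restricted JSON profile" alternative can be sketched with the standard library alone. This is not an RFC 8785 implementation: JCS serialises floats per ECMAScript rules and sorts keys by UTF-16 code units, whereas this sketch simply rejects floats, NaN, and duplicate keys, and relies on Python's code-point key ordering. All function names are hypothetical.

```python
import json


def _reject_constant(name: str):
    raise ValueError(f"non-finite number not allowed: {name}")


def _no_duplicate_keys(pairs):
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate object key")
    return dict(pairs)


def _reject_floats(value, path="$"):
    if isinstance(value, float):
        raise ValueError(f"float not allowed at {path}")
    if isinstance(value, dict):
        for key, item in value.items():
            _reject_floats(item, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            _reject_floats(item, f"{path}[{index}]")


def parse_strict(text: str):
    """Parse JSON, refusing duplicate keys, NaN/Infinity, and any float."""
    value = json.loads(
        text,
        object_pairs_hook=_no_duplicate_keys,
        parse_constant=_reject_constant,
    )
    _reject_floats(value)
    return value


def restricted_canonical(value) -> bytes:
    """Emit sorted-key, compact, unescaped UTF-8 for float-free input."""
    _reject_floats(value)
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


canonical = restricted_canonical({"b": 1, "a": "é"})
```

Rejecting the ambiguous inputs outright is what lets two implementations agree on bytes without sharing a number formatter.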

On scope: the optional ack field probably belongs in a follow-up extension rather than this SEP. The stable evidence boundary worth standardizing here is binding a tool call to identity, target, args commitment, intent, nonce, key version. Execution receipts and policy decisions compose on top in separate layers. This keeps the surface tight and lets downstream audit, replay, and receipt systems compose around a clear primitive.

…ape) (#139)

* feat(attestation): add SEP-2787 reference implementation, proposed shape

Adds vaara.attestation.sep2787 implementing the SEP-2787 Tool Call Attestation envelope (modelcontextprotocol/modelcontextprotocol#2787) with the four schema changes Vaara raised in the v1 draft thread: fact-source labels (three trust-surface blocks), three-way args shape (ArgsDigest / ArgsRef / ArgsProjection), RFC 8785 (JCS) canonicalization with IEEE-754 float rejection, and request-attestation-only scope (the v1 optional ack field is excluded and belongs in a separate extension).

Supports HS256, ES256, RS256 signing per the v1 draft. Coexists with the existing OVERT 1.0 implementation. See docs/sep2787-overt-mapping.md for the field-level mapping between the two envelopes.

16 unit tests covering all three signing algorithms, all three args-commitment shapes, tampering rejection, canonicalization invariants, and TTL handling. Ruff-clean.

* chore(attestation): address SEP-2787 PR review feedback

- Rename the test helper _emit_hs256 to _emit_attestation. It builds envelopes for all three signing algorithms, not only HS256, so the HS256-only name is misleading.
- Add test_ttl_clock_skew_tolerance_window covering the verifier's default 30-second skew window: a 60-second TTL with iat + 75 still verifies, iat + 91 does not.
- Switch the optional-dependency probe from a try/import block to importlib.util.find_spec. Eliminates the CodeQL unused-import finding on rfc8785 without changing skip semantics.
- Convert the docs/sep2787-overt-mapping.md reference to COMPLIANCE.md into a relative markdown link.

---------

Co-authored-by: vaaraio <267591518+vaaraio@users.noreply.github.com>
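The clock-skew arithmetic in the commit message works out as follows: with a 60-second TTL and a 30-second skew window, the latest accepted time is around iat + 90, so iat + 75 passes and iat + 91 fails. A minimal sketch, assuming an inclusive upper bound (the thread does not say how exactly iat + 90 is treated) and a hypothetical function name:

```python
def ttl_valid(iat: int, exp_seconds: int, now: int, skew_seconds: int = 30) -> bool:
    """Accept while `now` is no later than iat + exp_seconds + skew_seconds."""
    # Assumption: the bound is inclusive; the source only pins iat+75 and iat+91.
    return now <= iat + exp_seconds + skew_seconds


issued_at = 1_780_000_000
within_window = ttl_valid(issued_at, exp_seconds=60, now=issued_at + 75)
past_window = ttl_valid(issued_at, exp_seconds=60, now=issued_at + 91)
```

This sketch checks only the expiry side; whether a verifier should also reject envelopes issued in the future is a separate policy question.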

Rul1an · 2026-05-26T14:14:58Z

This is useful prior art, and I think it also helps clarify the SEP boundary.

The thing I would keep strict is implementation neutrality. OVERT/Vaara can be one concrete receipt family, but I do not think MCP should standardize any one receipt family in this SEP.

The MCP-level primitive feels smaller: a request attestation that binds issuer, subject, server/tool target, argument commitment or projection, declared intent, nonce, time, and key version. A verifier result can be a separate server-side fact. Execution receipts, policy decisions, transparency logs, and signed receipt chains can then compose on top.

A good test might be: can two independent implementations agree on the request-attestation semantics without agreeing on the later receipt system? That probably means test vectors and conformance cases should be the normative artifact, not any single implementation.

If yes, the SEP has found the right layer. If no, it may be pulling too much of a downstream audit model into the MCP primitive.

soup-oss · 2026-05-26T14:15:25Z

Hi @vaaraio, thank you for this incredible, high-signal feedback.

The work you’ve done with OVERT 1.0 and Vaara is amazing prior art. The privacy-first approach of using HMAC content commitments is a game-changer for handling PII under frameworks like the EU AI Act.

My primary goal with this SEP is ensuring MCP gets a native cryptographic attestation layer so developers can build secure infrastructure around agent intents. Whether the spec leans toward JWT for ecosystem familiarity or adopts a harder-nosed CBOR/commitment model like OVERT, getting this boundary into the protocol is what matters.

I see you’ve opened a related PR. I'll keep an eye on the discussion and help align these two shapes where I can.

vaaraio · 2026-05-26T14:29:25Z

On test vectors: that's the right call. tests/test_attestation_sep2787.py in the reference impl covers the conformance surfaces (signature verification across HS256/ES256/RS256, all three args-commitment shapes, JCS canonicalization invariants, envelope tampering rejection, TTL handling with clock-skew tolerance). Apache 2.0, ready to lift into a standalone normative test-vector artifact.

On implementation neutrality: vaara.attestation.sep2787 is the SEP-2787 envelope only. It does not embed, reference, or imply any downstream receipt system, transparency log, or signed-chain model. A second independent implementation of the same schema interops with this verifier without taking any position on what happens post-execution. The same package ships vaara.attestation.overt as a separate module for the CBOR-based OVERT 1.0 shape, and the two are wire-independent. The mapping in docs/sep2787-overt-mapping.md is a translation table, not a runtime dependency. OVERT 1.0 by Glacis Technologies (Glacis-io/glacis-python) is the empirical parallel: different org, different wire format, same logical layer.

Reference implementation: vaaraio/vaara#139.

Rul1an · 2026-05-26T14:38:12Z

That sounds like a good seed.

The bit I would keep separate is authorship of the first implementation versus ownership of the conformance surface. Vaara’s tests may be a very useful starting point, but for the SEP I would expect the normative vectors to live with the proposal itself and be validated by at least one second implementation.

The thing to standardize is observable interop: same canonical bytes, same signature input, same verification result, same rejection cases. That keeps Vaara/OVERT as strong prior art and implementation input, while keeping the MCP primitive implementation-neutral.

vaaraio · 2026-05-26T14:40:58Z

@Rul1an agreed. Authorship of the first implementation and ownership of the conformance surface belong in different places. The vectors should live with the SEP. A second implementation validating them is the right gate.

The four conformance dimensions you named (canonical bytes, signature input, verification result, rejection cases) are precisely what tests/test_attestation_sep2787.py exercises today. These can be extracted into a vector set and filed against this PR or a sibling location the SEP wants to maintain. Vaara's repo keeps the implementation, the SEP repo owns the normative artifact.

What format would work best on your side?

Rul1an · 2026-05-26T14:44:36Z

@vaaraio Agreed, that split sounds right.

I cannot speak for the SEP maintainers, but the shape I would find easiest to review is a small fixture set that does not import Vaara, OVERT, or helper code from any implementation.

Maybe test-vectors/sep-2787/v0/, with plain files for the unsigned envelope, expected canonical bytes, signature input bytes, test keys, expected signed envelope, expected verification result, and a few negative cases for tampering, expired TTL, unsupported alg, bad canonicalization, and invalid args commitment shape.

The useful test is whether a second implementation can read those files directly and produce the same bytes and the same pass/fail result. That keeps the vectors boring, portable, and owned by the SEP instead of by the first implementation.

vaaraio · 2026-05-26T15:07:20Z

@Rul1an v0 fixture set ready against your layout. 40 KB zipped.

Six positive cases: HS256, ES256, RS256 round-trips, plus one fixture each for the digest, ref, and projection args commitment shapes. Each carries unsigned_envelope.json, canonical_signing_input.bin, canonical_signing_input.hex, signed_envelope.json, and expected.json.

Seven negative cases: tampered planner_declared block, tampered issuer_asserted block, expired TTL (signature valid, clock past iat + exp + skew), unsupported alg (HS512), IEEE-754 float in canonical input, invalid args commitment kind, HS256 envelope against an ES256 verifier.

Keys are pinned. hs256_secret.bin is 32 raw bytes. ES256 and RS256 keys are PKCS8 / SPKI PEM. ES256 signatures are raw r||s (64 bytes), not ASN.1 DER. HS256 and RS256 are deterministic so a second implementation re-signing reproduces the stored signature_hex exactly. ES256 signing is randomised, so the ES256 case verifies the stored signature against the pinned public key rather than bit-for-bit reproduction.
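Because HS256 is deterministic, a second implementation can check a fixture by re-signing the stored signing input and comparing against the stored signature. The sketch below shows that check against a throwaway fixture it writes itself. The file names come from the description above; storing the signature under a `signature_hex` key in expected.json, and the functions `check_hs256_case` and `write_demo_case`, are assumptions for illustration and may not match the actual bundle.

```python
import hashlib
import hmac
import json
import tempfile
from pathlib import Path


def check_hs256_case(case_dir: Path, secret: bytes) -> bool:
    """Re-sign the stored signing input and compare with the stored signature."""
    signing_input = (case_dir / "canonical_signing_input.bin").read_bytes()
    hex_copy = (case_dir / "canonical_signing_input.hex").read_text().strip()
    if signing_input.hex() != hex_copy:
        return False  # the .bin and .hex views of the signing input disagree
    expected = json.loads((case_dir / "expected.json").read_text())
    recomputed = hmac.new(secret, signing_input, hashlib.sha256).hexdigest()
    # Assumption: expected.json stores the signature under "signature_hex".
    return hmac.compare_digest(recomputed, expected["signature_hex"])


def write_demo_case(case_dir: Path, secret: bytes) -> None:
    """Create a throwaway fixture so the sketch runs without the real bundle."""
    signing_input = b'{"nonce":"n-1","tool":"demo"}'
    (case_dir / "canonical_signing_input.bin").write_bytes(signing_input)
    (case_dir / "canonical_signing_input.hex").write_text(signing_input.hex() + "\n")
    signature = hmac.new(secret, signing_input, hashlib.sha256).hexdigest()
    (case_dir / "expected.json").write_text(json.dumps({"signature_hex": signature}))


with tempfile.TemporaryDirectory() as tmp:
    demo_secret = bytes(range(32))  # 32 raw bytes, like hs256_secret.bin
    case = Path(tmp)
    write_demo_case(case, demo_secret)
    demo_ok = check_hs256_case(case, demo_secret)
    wrong_key_ok = check_hs256_case(case, bytes(32))
```

The same re-signing approach does not apply to ES256, where signing is randomised and the check has to verify the stored signature against the pinned public key instead.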

The bundle includes _check_independent.py, a verifier that imports only the stdlib plus cryptography and rfc8785. It reads the fixtures from disk and walks the four conformance dimensions with no reference to the reference implementation. Output is six positive OKs, two negative OKs on the pure signature cases, and five SKIPs on verifier-policy cases that depend on the verifier's own clock or schema validator.

Origin and license are in README.md and MANIFEST.json. Apache-2.0, derived from tests/test_attestation_sep2787.py at commit 3d7af54 of vaaraio/vaara. SEP maintainers own the final normative artifact location.

sep-2787-vectors-v0.zip

Rul1an · 2026-05-26T16:00:35Z

@vaaraio Nice turnaround. This looks like useful seed material.

I would leave acceptance and final layout to the SEP maintainers, but one distinction seems worth keeping in the bundle itself: byte/signature conformance cases versus verifier-policy cases. The former can be normative immediately. Things like TTL clock choice, unsupported alg handling, and schema rejection may need an explicit validator policy before they become pass/fail requirements.

I would also expect the final artifact to be committed as plain fixture files in the SEP repo rather than kept as an attached zip, so a second implementation can consume it in CI without depending on Vaara or on the comment thread.

From my side, the important gate is still the same: one independent implementation reads the SEP-owned fixtures and gets the same canonical bytes and verification results.

vaaraio · 2026-05-26T19:21:02Z

@Rul1an Filed as #2789, layout under test-vectors/sep-2787/v0/.

normative/ covers signed-envelope round-trips across HS256/ES256/RS256, the three args-commitment shapes (digest, ref, projection), tampering rejection on the planner_declared and issuer_asserted blocks, and IEEE-754 float rejection at the canonicalisation boundary. Nine cases, pass/fail against the SEP-2787 wire format today.

verifier-policy/ covers TTL expiry past iat + exp + skew, unsupported-alg rejection (HS512), schema rejection of unknown args-commitment kinds, and HS256-against-ES256-verifier alg-mismatch. Four cases that depend on an explicit validator-policy paragraph in the SEP before they become normative.

_check_independent.py reads the fixtures from disk and walks the conformance dimensions with no reference to any implementation. Apache-2.0 throughout. SEP maintainers own the final layout, including whether the artifact lives at test-vectors/sep-2787/v0/, a sibling path, or in a separate repo. The gate stays the one you named: one independent implementation reads the SEP-owned fixtures and produces the same canonical bytes and signature verification results.

…extprotocol into tool-call-attestation

Rul1an · 2026-05-26T22:23:19Z

Nice update. This is moving in the right direction: deferring ack, switching canonicalization to RFC 8785, and making the argument surface explicit all make the core primitive much easier to reason about.

One verification detail seems worth tightening before this becomes the shape implementers follow. The verification rules currently match the receiving server and tool name, but they do not explicitly require the verifier to bind the actual tools/call.params.arguments to the attested argument commitment.

For args_ref, that probably means resolving the referenced payload, checking the digest, and confirming that the tool arguments being executed are the same payload or the same canonical bytes.

For args_projection, it may need one sentence saying what the verifier can and cannot prove. If the projection is redacted or summarized, the verifier can prove only that the projection was signed, not that it is a complete representation of the runtime arguments. If it is an identity projection, the verifier can compare it directly to the canonicalized runtime arguments.

That keeps the request-attestation boundary tight: the SEP proves identity, target, intent, nonce, time, and an explicit argument commitment. Execution receipts and downstream outcome evidence can still stay deferred.

vaaraio · 2026-05-26T23:12:07Z

@Rul1an On the argument-commitment binding: the reference impl now wires this as Step 5 in v0.37.1, released a few minutes ago. verify_args_commitment covers the three commitment shapes against the spec text.

For args_ref: resolve via a caller-supplied resolver (the verifier does no network IO), hash the content, match both the stored digest and the canonicalized runtime arguments.

For args_projection: recompute the projection digest, then report projection_match as a tri-state. True for identity projections, False for redacted or summarized projections, where the verifier accepts the signed projection but makes no completeness claim, per your reading.

For args_digest (Vaara's commitment-only shape): recompute the JCS-canonical hash of the runtime arguments and compare to the bound commitment.

Returns ArgsCommitmentResult(ok, reason, projection_match) with reason set to args_commitment_mismatch on failure, matching the spec's error-reason enum.
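The binding step described in this comment can be sketched as below. This is not Vaara's code: only the result fields (ok, reason, projection_match) and the `args_commitment_mismatch` reason come from the comment. The dict-based commitment layout with a `kind` key, the function name `sketch_verify_args_commitment`, the use of `None` as the third projection_match state, and the sorted-key JSON stand-in for JCS are all assumptions.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Optional

MISMATCH = "args_commitment_mismatch"


@dataclass(frozen=True)
class ArgsCommitmentResult:
    ok: bool
    reason: Optional[str]
    projection_match: Optional[bool]  # None when the shape is not a projection


def canonical(value) -> bytes:
    # Assumption: stand-in for JCS; fine for the integer/string demo data here.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest_of(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def _result(ok: bool, projection_match: Optional[bool] = None) -> ArgsCommitmentResult:
    return ArgsCommitmentResult(ok, None if ok else MISMATCH, projection_match)


def sketch_verify_args_commitment(
    commitment: dict,
    runtime_args: dict,
    resolver: Optional[Callable[[str], bytes]] = None,
) -> ArgsCommitmentResult:
    runtime_bytes = canonical(runtime_args)
    kind = commitment["kind"]
    if kind == "digest":
        return _result(commitment["digest"] == digest_of(runtime_bytes))
    if kind == "ref":
        if resolver is None:
            raise ValueError("args_ref needs a caller-supplied resolver")
        content = resolver(commitment["uri"])  # the verifier itself does no network IO
        return _result(digest_of(content) == commitment["digest"] and content == runtime_bytes)
    if kind == "projection":
        projection_bytes = canonical(commitment["projection"])
        if digest_of(projection_bytes) != commitment["projection_digest"]:
            return _result(False)
        # Signed projection accepted; completeness is claimed only for identity.
        return _result(True, projection_match=(projection_bytes == runtime_bytes))
    raise ValueError(f"unknown commitment kind: {kind}")


runtime = {"amount": 100, "currency": "EUR"}
digest_result = sketch_verify_args_commitment(
    {"kind": "digest", "digest": digest_of(canonical(runtime))}, runtime
)
```

Note how a redacted projection yields `ok=True` with `projection_match=False`: the signature over the projection holds, but nothing is claimed about the arguments that were left out.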

Composed after the existing signature and TTL checks once the tools/call arguments are in hand. Files: src/vaara/attestation/_sep2787_verifier.py plus 11 tests in tests/test_attestation_sep2787.py.

…#150)

The SEP-2787 draft envelope adopted MCP camelCase convention in soup-oss/modelcontextprotocol@48c739b1. Vaara's proposed-shape reference implementation now emits camelCase JSON keys on the serialisation boundary while keeping Python dataclass attributes in snake_case, so user code is unchanged.

`Attestation.to_dict()` and the JCS-canonical signing payload emit `plannerDeclared`, `issuerAsserted`, `payloadDerived`, `toolCalls`, `serverFingerprint`, `secretVersion`, `expSeconds`, `requestedCapability`, `projectionDigest`. New `issuer_to_dict` helper replaces the prior `asdict()` call so the issuer block sorts and renames deterministically without leaking Python-internal names.

`docs/sep2787-overt-mapping.md` updated. CHANGELOG entry under 0.39.1. pyproject.toml, src/vaara/__init__.py, and clients/ts/package.json all bumped. 28 attestation tests pass; ruff clean.

The v0 test vector PR (vaaraio/modelcontextprotocol#2789, head 2a9360f, cited in modelcontextprotocol/modelcontextprotocol#2787) was regenerated with the same renames separately on 2026-05-27.

Co-authored-by: vaaraio <267591518+vaaraio@users.noreply.github.com>
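The rename-at-the-boundary pattern described in this commit can be sketched as follows. The helper names `to_camel` and `block_to_dict`, the `DemoBlock` class, and its grouping of three fields are hypothetical; only the camelCase key names are taken from the commit message.

```python
from dataclasses import dataclass, fields


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. exp_seconds -> expSeconds."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


@dataclass(frozen=True)
class DemoBlock:
    # Attributes stay snake_case for Python callers.
    secret_version: str
    exp_seconds: int
    server_fingerprint: str


def block_to_dict(block) -> dict:
    """Rename keys at the serialisation boundary and return them in sorted order."""
    renamed = {to_camel(f.name): getattr(block, f.name) for f in fields(block)}
    return dict(sorted(renamed.items()))


wire = block_to_dict(DemoBlock("k1", 60, "sha256:abc"))
```

Walking `fields()` explicitly, rather than dumping the object wholesale, keeps any internal attribute names out of the signed payload.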

vaaraio · 2026-05-27T17:36:07Z

@soup-oss Trust-surface grouping landing in the envelope is the right shape. Vaara's reference impl will follow it through the remaining mechanical diffs in the next release: move toolCalls under payloadDerived, swap argsProjection to the JSON-stringified encoding, and drop Vaara's kind-discriminated argsDigest extension. Commitment-only audit composes cleanly on top of argsProjection as an identity projection of a hash-only object, no third kind needed in the spec. A v1-current sibling vector set against the merged shape is on offer when useful.

…it-event schema 1.0, Qi survey mapping (#151)

The four mechanical alignments Vaara committed to in modelcontextprotocol/modelcontextprotocol#2787 after the trust-surface grouping was incorporated into the SEP draft on soup-oss commit dd030d5b ship as the v2 envelope shape:

1. toolCalls lives under payloadDerived, not plannerDeclared. Tool bindings (name, server fingerprint, args commitment) are facts derived from the request payload, not planner declarations.
2. argsProjection serialises with a JSON-stringified projection field carrying the JCS-canonical encoding of the projection object. The digest is taken over those bytes.
3. The v1 kind-discriminated union is dropped. ArgsRef and ArgsProjection self-discriminate by which fields are present.
4. Commitment-only audit composes on ArgsProjection as a hash-only-identity projection of the form {"digest": "sha256:..."}. No separate ArgsDigest type ships in the spec.

parse_attestation(d) is the new wire-decode entrypoint: inverse of Attestation.to_dict(). 13 new tests cover emit -> JCS bytes -> parse -> verify across HS256, ES256, RS256 for both ArgsRef and ArgsProjection, plus parse rejection on missing-field and unsupported-alg inputs and a byte-identical re-emit check.

Two doc artefacts ship in the same release:

- docs/audit_event_schema.md: AUDIT-EVENT-SCHEMA-1.0, versioned wire/storage contract for the audit events Vaara emits. Independent of code version so third-party consumers can pin without coupling to a Python runtime version.
- docs/qi_survey_mapping.md: Vaara surface coverage against the taxonomy in Qi et al., Towards Trustworthy Agentic AI (arXiv:2605.23989, 2026-05-17). Direct, partial, and out-of-scope rows by Perceive / Plan / Act / Reflect / Learn / Multi-agent / Long-horizon stage under both top-level dimensions.

SEP-2787 reference implementation tag sep2787-ref-v2 lands on this release commit alongside v0.39.2 for cross-repo provenance.

The v0.40 slot stays reserved for the deployment-shape scope (HTTP transport, multi-tenancy schema, hot-reload extended, fan-out) per project_v040_roadmap_opa_frame_20260527.md.

Co-authored-by: vaaraio <267591518+vaaraio@users.noreply.github.com>
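Items 2 and 4 of this commit message can be read together as: hash the runtime arguments, wrap that hash in a small object, stringify the object canonically, and take the projection digest over the string's bytes. The sketch below follows that reading. The key names `projection` and `projectionDigest` appear in the thread, but the exact wire layout, the function names, and the sorted-key JSON stand-in for JCS are assumptions.

```python
import hashlib
import json


def canonical_text(value) -> str:
    # Assumption: stand-in for JCS; adequate for the integer/string demo data.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def sha256_tag(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def commitment_only_projection(runtime_args: dict) -> dict:
    """Build a projection whose only content is a hash of the arguments."""
    inner = {"digest": sha256_tag(canonical_text(runtime_args).encode("utf-8"))}
    projection_text = canonical_text(inner)  # JSON-stringified projection
    return {
        "projection": projection_text,
        "projectionDigest": sha256_tag(projection_text.encode("utf-8")),
    }


def matches_runtime(args_projection: dict, runtime_args: dict) -> bool:
    """Check the projection digest, then the inner hash against the live args."""
    text = args_projection["projection"]
    if sha256_tag(text.encode("utf-8")) != args_projection["projectionDigest"]:
        return False
    inner = json.loads(text)
    return inner.get("digest") == sha256_tag(canonical_text(runtime_args).encode("utf-8"))


runtime_args = {"amount": 100, "currency": "EUR"}
args_projection = commitment_only_projection(runtime_args)
```

The arguments themselves never appear in `args_projection`, which is how the commitment-only case fits inside the projection shape without a third type.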

Rul1an · 2026-05-27T20:59:15Z

This is converging nicely. One wording nit before this gets reviewed as the stable boundary: a few places still sound like the attestation proves execution, while the body later correctly defers execution acknowledgement/receipts.

In particular, the PR summary mentions "execution proof" / "whether it executed", and the Authorization section says attestation proves "that they called it." I think the tighter wording is that the attestation binds an observed tools/call request to issuer, subject, target, intent, nonce, time, and argument commitment/projection. Whether the tool executed, and what outcome occurred, stays in the deferred execution acknowledgement/receipt layer.

That keeps the current SEP crisp as pre-execution request attestation without weakening the future receipt story.

One Vaara process now serves a fleet of upstream MCP servers over Streamable HTTP, with multi-tenant policy, audit chain, and OVERT attestation on the same substrate. v0.39 ran one Vaara process per upstream; v0.40 collapses that into a single multi-tenant deployment.

Streamable HTTP transport on the proxy. `vaara-mcp-proxy --transport http --http-host H --http-port P` runs POST /mcp via FastAPI / uvicorn. The endpoint reads `X-Vaara-Tenant` and `X-Vaara-Upstream` per request, pushes them into ContextVars, and dispatches into the existing `_handle_request` path so policy, perimeter, OVERT, and progress-notification handling all light up unchanged. Notifications return 202. Bodies above 1 MiB return 413. Unknown upstream returns 404.

Fan-out via repeatable `--upstream NAME=CMD`. One Vaara process holds N UpstreamMCPClient instances in a name -> client map. Bare `--upstream CMD` keeps the v0.39 single-upstream contract (lands in the "default" slot). When more than one upstream is configured, a request with no `X-Vaara-Upstream` header returns 400 with the list of valid slots in the error envelope. Single-upstream deployments keep the silent-default contract.

tenant_id end-to-end. ScoreRequest, AuditEventRequest, PolicyReloadRequest accept a `tenant_id` body field, with `X-Vaara-Tenant` as the HTTP-header alternative (body wins over header). AuditRecord gains a `tenant_id` field, excluded from `compute_hash()` so pre-v0.40 chains still re-verify. AuditTrail keeps an `action_id -> tenant_id` map seeded by `record_action_requested`, soft-capped at 50k entries. SQLiteAuditBackend.write_record prefers per-record tenant. OVERT envelopes carry `tenant_id` as a `non_content_metadata` claim.

Per-tenant policy plane. `vaara.policy.registry.PolicyRegistry` holds one PolicyController per tenant with the empty string slot reserved as the default fallback. `vaara serve --policy-dir DIR` loads one YAML/JSON policy per file (filename stem = tenant_id). `POST /v1/policy/reload` routes per tenant via body field or header.

Installs `vaara-mcp-proxy` as a top-level console script so the proxy CLI matches what every v0.39+ docs surface advertises. Earlier releases only shipped the proxy as `python -m vaara.integrations.mcp_proxy`; v0.40 closes that gap. v0.41 will fold the proxy into the main `vaara` verb tree (`vaara mcp-proxy ...`) and keep `vaara-mcp-proxy` as a thin alias for one release cycle.

Per-tenant threshold dispatch at evaluate-time. `AdaptiveScorer.evaluate` consults the registry on every call. A new `policy_lookup` constructor arg (and `set_policy_lookup` for late binding from ServerState) lets the scorer ask which tenant policy applies right now and use its allow/deny thresholds for THIS evaluation. Unknown tenant or no lookup configured falls back to the scorer-bound defaults that the default-slot listener keeps fresh on reload. The backend decision dict surfaces the applied threshold_allow and threshold_deny so operators can confirm which tenant's policy ran. MWU expert state, the conformal calibrator, agent profiles, and sequence patterns stay shared across tenants; only threshold application is per-tenant in v0.40.

Scope notes. HTTP transport is POST-only (GET-SSE is v0.41). Per-tenant policy reload is hot; classifier hot-reload still restart-only. Cancellation routing across fan-out is v0.41 hardening. Fan-out latency bench is v0.40.1 measurement.

862 passed, 12 skipped. 45 new tests across tests/test_v040_tenant.py, tests/test_v040_policy_registry.py, tests/test_v040_mcp_http_transport.py, tests/test_v040_per_tenant_threshold.py.

References modelcontextprotocol/modelcontextprotocol#2787 for the SEP-2787 envelope shape v0.40 builds on top of.
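The routing rules in these release notes (upstream slot selection, tenant precedence, body size limit) can be sketched as three small pure functions. This is one reading of the notes, not Vaara's code: `RouteError`, `resolve_upstream`, `resolve_tenant`, and `check_body_size` are hypothetical names, and only the status codes and precedence rules are taken from the text.

```python
from typing import Dict, List, Optional

MAX_BODY_BYTES = 1 << 20  # 1 MiB limit quoted in the release notes


class RouteError(Exception):
    """Hypothetical error type carrying the HTTP status to return."""

    def __init__(self, status: int, detail: str,
                 valid_slots: Optional[List[str]] = None):
        super().__init__(detail)
        self.status = status
        self.valid_slots = valid_slots or []


def check_body_size(size: int) -> None:
    # Bodies above 1 MiB are rejected with 413.
    if size > MAX_BODY_BYTES:
        raise RouteError(413, "request body too large")


def resolve_upstream(header_value: Optional[str],
                     upstreams: Dict[str, object]) -> str:
    """Pick the upstream slot for one request (X-Vaara-Upstream header)."""
    if header_value is not None:
        if header_value not in upstreams:
            raise RouteError(404, f"unknown upstream: {header_value}")
        return header_value
    if len(upstreams) == 1:
        # Single-upstream deployments keep the silent default.
        return next(iter(upstreams))
    # Several upstreams and no header: 400 plus the list of valid slots.
    raise RouteError(400, "X-Vaara-Upstream header required", sorted(upstreams))


def resolve_tenant(body_tenant: Optional[str],
                   header_tenant: Optional[str]) -> str:
    """Body field wins over the X-Vaara-Tenant header; "" is the default slot."""
    if body_tenant is not None:
        return body_tenant
    if header_tenant is not None:
        return header_tenant
    return ""
```

Keeping these decisions free of framework types makes them easy to test without starting an HTTP server; how the real proxy structures this is not stated in the notes.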

vaaraio · 2026-05-29T11:51:48Z

@Rul1an the request/execution boundary you're drawing is right, and it's why Vaara v0.42.0 ships the complement alongside the attestation impl.

The execution receipt takes the attestation wire bytes as its backLink input (via attestationDigest over the full wire bytes + nonce) and records what actually executed. Envelope is three blocks: backLink (binds the receipt to the specific attestation it follows), receiptAsserted (issuer block, same signing surface as the attestation), outcomeDerived (status executed/refused/errored + completedAt + optional result commitment). No TTL: durable record, not a capability.

Reference implementation at vaaraio/vaara@4608c36, docs at docs/execution-receipts.md, v0 conformance vectors in tests/vectors/execution_receipt_v0/ with a stdlib-only independent verifier. Reuses RFC 8785 JCS + HS256/ES256/RS256 from SEP-2787 unchanged; a 2787 verifier needs no new crypto to verify a receipt.
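To make the three-block shape concrete, here is a minimal sketch of building and checking such a receipt with HS256. It is not the Vaara wire format: the inner field names (`iss`, `alg`, `signature`), the use of SHA-256 over `wire + nonce` for `attestationDigest`, and the `canonical` helper are assumptions for illustration. `canonical` only approximates RFC 8785 JCS and agrees with it for simple objects of ASCII strings and integers.

```python
import hashlib
import hmac
import json

ALLOWED_STATUS = {"executed", "refused", "errored"}


def canonical(obj) -> bytes:
    # Approximation of RFC 8785 JCS: sorted keys, no insignificant whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def attestation_digest(wire: bytes, nonce: bytes) -> str:
    # Assumption: SHA-256 over the attestation wire bytes followed by the nonce.
    return hashlib.sha256(wire + nonce).hexdigest()


def build_receipt(wire: bytes, nonce: bytes, issuer: str, status: str,
                  completed_at: str, key: bytes) -> dict:
    if status not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {status}")
    body = {
        # Binds the receipt to the specific attestation it follows.
        "backLink": {"attestationDigest": attestation_digest(wire, nonce)},
        # Issuer block (hypothetical field names).
        "receiptAsserted": {"iss": issuer, "alg": "HS256"},
        # What actually happened; no TTL, since this is a durable record.
        "outcomeDerived": {"status": status, "completedAt": completed_at},
    }
    signature = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return {**body, "signature": signature}


def verify_receipt(receipt: dict, wire: bytes, nonce: bytes, key: bytes) -> bool:
    body = {k: v for k, v in receipt.items() if k != "signature"}
    expected = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, str(receipt.get("signature", ""))):
        return False
    # The signature is valid; now confirm the back-link matches this attestation.
    return hmac.compare_digest(body["backLink"]["attestationDigest"],
                               attestation_digest(wire, nonce))
```

The verifier uses the same canonicalization and MAC primitives an attestation verifier would already have, which is the point made above about needing no new crypto.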

Rul1an · 2026-05-30T09:34:24Z

Helpful prior-art and benchmark context from both sides.

The main thing worth preserving is that SEP-2787 v1 stays narrowly about request attestation.

For me, that means:

field-level trust surface first: what is being attested, by whom, and with what canonicalization/binding guarantees
execution/outcome evidence stays out of scope for v1: no ack, no post-execution rejection, no runtime effects, no broader execution-context drift in the same shape

Bench data can help prioritize. Prior-art can help expose tradeoffs. But neither should quietly widen the primitive itself.

A narrow v1 does not need to solve every threat class. It just needs to be explicit about what it proves, and equally explicit about what it does not.

soup-oss · 2026-05-30T12:29:49Z

We strongly align with @Rul1an's framing that v1 must stay thin and foundational. Rather than expanding the envelope scope now, we'd like to flag three design considerations that we believe the ACK extension (when it comes) should take into account:

Pre-flight readiness: The client needs a way to confirm the ACK endpoint is reachable, trusted, and accepting receipts before the tool executes. The exact mechanism (capability ping, session-level negotiation, per-call probe) should be left to implementation; the important thing is that the protocol acknowledges this as a requirement, not that it prescribes the answer.
Verifier key binding: The ACK payload should be verifiable by the entity controlling ackEndpoint without requiring the original signing key. How a deployment achieves this (shared key, derived encryption, asymmetric wrapping) is out-of-scope, but the ACK field should define a wrappedKey or encryptedPayload slot so that the receipt can be bound to a key the endpoint controls, keeping the attestation's intent and arguments opaque to the ACK infrastructure.
Infrastructure discovery: If ackEndpoint is in the envelope plaintext, intermediaries learn the receipt topology. A nonce-derived capability URL or encrypted endpoint address prevents infrastructure discovery without adding per-call key exchange.
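As one possible reading of the "nonce-derived capability URL" idea in the last point, the sketch below derives the receipt URL from a per-call nonce and a secret shared out of band, so the envelope itself would only need to carry the nonce. Everything here is hypothetical: `ack_capability_url`, the `ack-endpoint|` domain-separation prefix, and the assumption that both sides already know `base_url` and `shared_secret`. Nothing in the SEP defines this.

```python
import hashlib
import hmac


def ack_capability_url(base_url: str, shared_secret: bytes, nonce: bytes) -> str:
    """Derive an unguessable per-call receipt URL from the envelope nonce."""
    # HMAC keeps the token unpredictable to anyone who sees only the nonce.
    token = hmac.new(shared_secret, b"ack-endpoint|" + nonce,
                     hashlib.sha256).hexdigest()
    return f"{base_url.rstrip('/')}/{token}"
```

Both the client and the ACK service can recompute the same URL from the nonce, with no per-call key exchange; an intermediary that sees the nonce alone cannot derive it.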

vaaraio · 2026-05-30T12:46:03Z

Agree v1 should stay thin and request-scoped. The current envelope already carries what a verifier needs, and widening it now would slow the part that's ready to land.

On the ACK considerations, there's a shipped reference worth pulling from when that work opens. Vaara emits an execution receipt as the post-execution sibling of the 2787 attestation, signed over the same RFC 8785 JCS canonical bytes (ES256/RS256/HS256), with an independent verifier and published test vectors in the repo. On point 2 specifically: asymmetric receipts are already verifiable by whoever holds the public key, without the original signing key, so a wrappedKey/encryptedPayload slot is only needed for the confidentiality case, not for verification itself. I can bring the concrete format and verifier when the ACK extension is taken up.

vaaraio · 2026-05-30T14:01:06Z

On the conformance-vector thread: the receipt-verification failure modes coming up here are mostly already normative cases, and they sit in the SEP's own test-vectors tree where a verifier can run them with nothing external in the loop.

In #2789 the SEP-2787 v0 vectors carry the negatives directly: 07-tampered-planner-declared and 08-tampered-issuer-asserted for signature and field tamper, and 09-ieee754-float-in-canonical-input for a non-canonical JCS payload, alongside the HS256/ES256/RS256 positives and a stdlib-only _check_independent.py walker. The receipt layer carries its own negatives in the reference vectors: a broken back-link and a result-commitment mismatch, checked by the same independent walker.

Two of the modes raised here aren't cased yet: an unknown alg identifier, and a replay that substitutes a payload field rather than the verifier time. Both are small. I'll add the alg case to the #2789 set and the replay-substitution case to the receipt vectors, so the negative coverage for this surface is complete and lives in one place implementers already pull from.

Keeping the conformance artifact in the test-vectors tree is the same instinct as keeping v1 thin: an implementer should be able to check an envelope against fixtures in the repo, offline, without taking a dependency on any running service.
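The negative cases discussed in this comment (tampered field, float in canonical input, unknown alg) can be illustrated with a small offline checker. This is a sketch under stated assumptions, not the `_check_independent.py` walker from #2789: the flat `{"alg", "payload", "signature"}` layout is hypothetical and is not the SEP-2787 wire shape, only HS256 is actually verified, refusing every float is a simplification of the canonicalization rule, and `canonical` merely approximates RFC 8785 JCS for simple ASCII data.

```python
import hashlib
import hmac
import json

SUPPORTED_ALGS = {"HS256", "ES256", "RS256"}


def canonical(obj) -> bytes:
    # Approximation of RFC 8785 JCS for simple objects.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def contains_float(value) -> bool:
    # Recursively look for IEEE 754 floats anywhere in the payload.
    if isinstance(value, float):
        return True
    if isinstance(value, dict):
        return any(contains_float(v) for v in value.values())
    if isinstance(value, list):
        return any(contains_float(v) for v in value)
    return False


def check_envelope(envelope: dict, key: bytes) -> str:
    """Return "ok" or a short failure label for one fixture."""
    alg = envelope.get("alg")
    if alg not in SUPPORTED_ALGS:
        return "unknown-alg"
    if alg != "HS256":
        return "unsupported-in-sketch"  # asymmetric algs need a crypto library
    payload = envelope.get("payload")
    if contains_float(payload):
        return "non-canonical-input"
    expected = hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, str(envelope.get("signature", ""))):
        return "bad-signature"
    return "ok"
```

Because the checker depends only on the standard library and local fixtures, it can run offline, in line with keeping the conformance artifact free of any running service.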

heysoup added 4 commits May 25, 2026 14:28

Draft

29230eb

Rename SEP

435e9fb

Fix markdown

73287f2

Pipeline

6236fb7

soup-oss requested review from a team as code owners May 25, 2026 17:56

soup-oss changed the title ~~Draft - Tool call attestation~~ SEP-2787: Tool call attestation May 25, 2026

vaaraio mentioned this pull request May 26, 2026

feat(attestation): add SEP-2787 reference implementation (proposed shape) vaaraio/vaara#139

Merged

5 tasks

vaaraio mentioned this pull request May 26, 2026

SEP-2787 proposed-shape test vectors (v0) #2789

Open

heysoup and others added 4 commits May 26, 2026 19:16

Update SEP with reviewers feedback

7db8788

Merge branch 'modelcontextprotocol:main' into tool-call-attestation

d89604f

Markdown issues

0cd9ebe

Merge branch 'tool-call-attestation' of github.com:soup-oss/modelcontextprotocol into tool-call-attestation

b999756

Tool call argument verification rule

dfd309c

Markdown issue

0974352

soup-oss added 2 commits May 27, 2026 17:38

Update definitions after community feedback

c759c35

Run missing operation

43d2079

soup-oss added 2 commits May 27, 2026 18:00

Remove duplicated text

15bf8bd

Remove previous references to post-execution

ca6f3f6

vaaraio mentioned this pull request May 27, 2026

release(v0.40.0): streamable HTTP, fan-out, multi-tenant policy vaaraio/vaara#154

Merged

5 tasks

AI Disclosure

ad6c08f

This comment was marked as spam.

localden added the SEP and draft (SEP proposal with a sponsor) labels Jun 8, 2026

github-project-automation Bot added this to SEP Review Pipeline Jun 8, 2026

localden added the proposal (SEP proposal without a sponsor) label and removed the draft (SEP proposal with a sponsor) label Jun 8, 2026

vaaraio mentioned this pull request Jun 8, 2026

v0.62.0: normalize adjacent MCP records into the SEP-2828 evidence model vaaraio/vaara#221

Merged

Rul1an mentioned this pull request Jun 9, 2026

Example pattern for scoped execution receipts on high-risk provider tools #2852

Open
