SEP-2787: Tool call attestation #2787
|
Hi @soup-oss, thanks for filing this. The regulatory gap is real and the envelope shape lands close to work already in production. Two pointers in case they are useful as prior art: OVERT 1.0 (Glacis Technologies, published 2026-03-25, https://overt.is) defines a related primitive: signed, schema-closed envelopes a relying party can verify offline. Apache-2.0 open standard with a royalty-free patent covenant for conformant implementations. The shape rhymes with SEP-2787 but the design choices differ:
Vaara (https://github.com/vaaraio/vaara, Apache 2.0) ships an MCP proxy that emits OVERT 1.0 Base Envelopes per For the SEP's open design questions:
Glacis also ships a Python SDK at https://github.com/Glacis-io/glacis-python (Apache 2.0) as the reference implementation by the standard's authors. Reference verifier CLI is Apache 2.0 throughout, no commercial product. |
|
This is a useful direction. I like the core instinct here: MCP probably needs a standard way to make a tool call reviewable after the fact, without every client, gateway, or regulated deployment inventing its own envelope. One boundary I would keep very sharp is the difference between a pre-execution attestation and post-execution evidence. The current envelope works well as a signed pre-execution statement. It says who is asking, which tool/server is targeted, which arguments or argument digest are bound, what intent was declared, when it was issued, and which key/version signed it. That proves an intent-bound request. It does not by itself prove that the tool actually executed, what the application-level outcome was, or whether a downstream system accepted the result. The optional The other thing I would make explicit is the source of each fact. In audit and replay systems it matters whether a field came from the client/agent planner, the attestation issuer, the MCP server verifier, the tool/application result, a policy engine, or a payload-derived projection/digest. Those are different trust surfaces. Keeping them apart prevents the envelope from claiming more than the layer can actually prove. I would also be cautious with inline arguments. Signed args are useful in some deployments, but the privacy-friendly default should probably be digest, reference, or redacted projection rather than payload storage. The audit invariant is usually “this call was bound to this exact argument set” or “this reviewed projection was bound,” not “all arguments are now stored in a long-lived compliance artifact.” A more explicit Canonicalization may need tightening too. “Sorted keys, no whitespace” is a good start, but different language stacks will still disagree on numbers, Unicode escaping, duplicate object keys, floats, NaN, and parser behavior. A JSON Schema plus JCS/RFC 8785, or a small restricted JSON profile, would make conformance testing less surprising. 
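To make the canonicalization concern concrete, here is a minimal sketch using only Python's standard library. It implements the loose "sorted keys, no whitespace" rule and shows four places where it does not pin the bytes. The helper name `naive_canonical` and the sample payloads are illustrative assumptions, not part of the SEP.

```python
import json


def naive_canonical(obj, ensure_ascii=True) -> bytes:
    """The loose rule: sorted keys, no whitespace, stdlib only (hypothetical helper)."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=ensure_ascii)
    return text.encode("utf-8")


# 1. Numbers: an integer and an equal-valued float serialize differently.
as_int = naive_canonical({"amount": 10})
as_float = naive_canonical({"amount": 10.0})

# 2. Unicode escaping: two valid JSON encodings of the same string.
escaped = naive_canonical({"note": "café"})                      # \u00e9 escape
literal = naive_canonical({"note": "café"}, ensure_ascii=False)  # raw UTF-8

# 3. Duplicate keys: Python's parser silently keeps the last value.
duplicate = json.loads('{"tool": "read", "tool": "delete"}')

# 4. NaN: emitted as a bare token that is not valid JSON unless refused.
nan_text = json.dumps({"x": float("nan")})


def rejects_nan() -> bool:
    """allow_nan=False turns the silent NaN token into an error."""
    try:
        json.dumps({"x": float("nan")}, allow_nan=False)
    except ValueError:
        return True
    return False


print(as_int != as_float, escaped != literal, duplicate, nan_text, rejects_nan())
```

Each divergence changes the bytes that get signed or hashed, so two honest implementations can disagree on a signature. A pinned profile such as JCS/RFC 8785, or a restricted JSON subset, removes those degrees of freedom.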
One smaller concern: the multi-server So the shape I would find easiest to adopt is:
Overall, strong proposal. I think the most valuable thing this SEP can standardize early is not the whole compliance story, but the stable evidence boundary. Once MCP has a common way to bind a tool call to identity, target, argument digest/projection, intent, nonce, and key version, downstream audit, replay, policy, and receipt systems can compose around it much more cleanly. |
|
Thanks for the careful read. The pre-execution attestation vs post-execution evidence split is exactly the boundary worth making explicit. OVERT 1.0 separates these via the Phase 3 IAP layer, where the relying party notary-signs a Provisional Receipt and anchors it in a transparency log, distinct from the inline On argument handling, OVERT keeps the payload entirely local via an HMAC-SHA256 content commitment. The verifier never sees the args, only the commitment they were bound to. For regulated deployments with PII or trade-secret arguments this side-steps a whole disclosure category. The args_digest / args_ref / args_projection shape maps to the same intent. The digest case is essentially what OVERT does. Canonicalization: OVERT uses canonical CBOR per RFC 8949 with IEEE-754 float rejection. Stricter than JCS / RFC 8785, but the underlying point is the same: pin the bytes so signatures verify identically across implementations. Whichever the SEP lands on, a normative schema plus explicit canonicalization rules will save reviewers from parser folklore. On toolCalls as a signed plan bundle vs multi-server workflow: real ambiguity worth resolving in the SEP. OVERT envelopes are per-interaction, which sidesteps the question, but the SEP's bundle shape probably needs an explicit statement on partial-execution semantics and per-verifier replay windows. The framing of a "stable evidence boundary" lands well. That's the right altitude: bind a tool call to identity, target, args commitment, intent, nonce, key version, and stop there. Outcome receipts and policy decisions compose on top, in separate layers. |
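As a rough illustration of the commitment idea described above, the sketch below binds an argument set to an HMAC-SHA256 value without exposing the arguments. It is not OVERT's wire format: OVERT is described as using canonical CBOR, whereas this sketch substitutes sorted-key JSON as a stand-in, and the key handling and the function names `commit` and `matches` are assumptions for illustration.

```python
import hashlib
import hmac
import json


def commit(args: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 commitment over a stand-in canonical form."""
    # Assumption: sorted-key JSON replaces the canonical CBOR a real profile would pin.
    payload = json.dumps(args, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def matches(args: dict, key: bytes, commitment: str) -> bool:
    """Check a candidate argument set against a commitment in constant time."""
    return hmac.compare_digest(commit(args, key), commitment)


key = b"local-demo-key"  # hypothetical; stays with the party holding the payload
original = {"path": "/records/42", "mode": "read"}
commitment = commit(original, key)

print(matches(original, key, commitment))                        # same args
print(matches({**original, "mode": "write"}, key, commitment))   # tampered args
```

Whoever holds only `commitment` learns nothing usable about the arguments without the key, which is the property that keeps PII and trade-secret arguments out of the audit artifact.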
|
Follow-up after sitting with the review longer. Four concrete proposals, bundled so the envelope shape settles in one pass rather than drifting across threads. On the source of each fact: the cleanest path is to annotate every envelope field with its trust surface. Issuer-asserted fields are set by the attestation issuer (subject, intent, time, nonce, key_version, args commitment). Verifier-asserted fields are set by the MCP server verifier (allow/reject/error reason, observed nonce). Payload-derived fields come deterministically from the request payload (args_digest, args_projection). Planner-declared fields are set by the client or agent upstream of the issuer (declared_purpose, requested_capability). The schema can either tag fields with a source annotation or group them under named blocks. The invariant is that no field gets sourced from "envelope" as an undifferentiated whole. On argument handling: the current On canonicalization: the current "sorted keys, no whitespace" rule is too loose for cross-stack conformance. A normative reference to RFC 8785 (JSON Canonicalization Scheme) pins behaviour for numbers, Unicode escaping, duplicate keys, floats, and NaN, the places where language stacks silently disagree. A small JSON Schema accompanying the canonical form makes conformance testing tractable. CBOR per RFC 8949 with IEEE-754 float rejection is the stricter alternative. JCS keeps the JSON ecosystem familiarity at some cost. Either is defensible. The SEP needs to pick one and reference it normatively. On scope: the optional |
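One way to picture the "named blocks" option is the sketch below, which groups fields by trust surface so that every field has exactly one source. The class names and the exact field placement are assumptions drawn from the categories listed above, not the SEP's schema. Verifier-asserted facts are left out because they are produced server-side after the envelope is issued.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PlannerDeclared:
    declared_purpose: str
    requested_capability: str


@dataclass(frozen=True)
class IssuerAsserted:
    subject: str
    intent: str
    time: int
    nonce: str
    key_version: str


@dataclass(frozen=True)
class PayloadDerived:
    args_digest: str


@dataclass(frozen=True)
class Envelope:
    planner_declared: PlannerDeclared
    issuer_asserted: IssuerAsserted
    payload_derived: PayloadDerived


def trust_surface(envelope: Envelope, field_name: str) -> str:
    """Return the name of the block that owns a field, or raise KeyError."""
    for block in fields(envelope):
        inner = getattr(envelope, block.name)
        if any(f.name == field_name for f in fields(inner)):
            return block.name
    raise KeyError(field_name)


envelope = Envelope(
    PlannerDeclared("summarize quarterly report", "files.read"),
    IssuerAsserted("agent-7", "read", 1_700_000_000, "n-001", "k1"),
    PayloadDerived("sha256:placeholder"),
)
print(trust_surface(envelope, "nonce"))
print(trust_surface(envelope, "args_digest"))
```

With this layout an auditor can ask who asserted any given fact and get one answer, which is the invariant that no field is sourced from the envelope as an undifferentiated whole.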
…ape) (#139)

* feat(attestation): add SEP-2787 reference implementation, proposed shape

  Adds vaara.attestation.sep2787 implementing the SEP-2787 Tool Call Attestation envelope (modelcontextprotocol/modelcontextprotocol#2787) with the four schema changes Vaara raised in the v1 draft thread: fact-source labels (three trust-surface blocks), three-way args shape (ArgsDigest / ArgsRef / ArgsProjection), RFC 8785 (JCS) canonicalization with IEEE-754 float rejection, and request-attestation-only scope (the v1 optional ack field is excluded and belongs in a separate extension).

  Supports HS256, ES256, RS256 signing per the v1 draft.

  Coexists with the existing OVERT 1.0 implementation. See docs/sep2787-overt-mapping.md for the field-level mapping between the two envelopes.

  16 unit tests covering all three signing algorithms, all three args-commitment shapes, tampering rejection, canonicalization invariants, and TTL handling. Ruff-clean.

* chore(attestation): address SEP-2787 PR review feedback

  - Rename the test helper _emit_hs256 to _emit_attestation. It builds envelopes for all three signing algorithms, not only HS256, so the HS256-only name is misleading.
  - Add test_ttl_clock_skew_tolerance_window covering the verifier's default 30-second skew window: a 60-second TTL with iat + 75 still verifies, iat + 91 does not.
  - Switch the optional-dependency probe from a try/import block to importlib.util.find_spec. Eliminates the CodeQL unused-import finding on rfc8785 without changing skip semantics.
  - Convert the docs/sep2787-overt-mapping.md reference to COMPLIANCE.md into a relative markdown link.

---------

Co-authored-by: vaaraio <267591518+vaaraio@users.noreply.github.com>
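The skew-window test named in the commit message reduces to simple arithmetic: with a 60-second TTL and a 30-second tolerance, the envelope is accepted until `iat + 90`. The sketch below is an assumed reconstruction of that check, not Vaara's code. The function name `within_ttl` is hypothetical, and treating the exact boundary `iat + 90` as valid is an assumption the source does not settle.

```python
def within_ttl(iat: int, now: int, ttl_seconds: int, skew_seconds: int = 30) -> bool:
    """Accept the attestation while now <= iat + ttl + skew (boundary inclusive by assumption)."""
    return now <= iat + ttl_seconds + skew_seconds


iat = 1_700_000_000  # arbitrary issue time for the example

print(within_ttl(iat, iat + 75, ttl_seconds=60))  # 75 <= 90, still verifies
print(within_ttl(iat, iat + 91, ttl_seconds=60))  # 91 > 90, rejected
```

This sketch only covers the expiry side. A verifier would probably also want a rule for attestations whose `iat` lies in the future, which is one of the policy choices the thread suggests stating explicitly.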
|
This is useful prior art, and I think it also helps clarify the SEP boundary. The thing I would keep strict is implementation neutrality. OVERT/Vaara can be one concrete receipt family, but I do not think MCP should standardize any one receipt family in this SEP. The MCP-level primitive feels smaller: a request attestation that binds issuer, subject, server/tool target, argument commitment or projection, declared intent, nonce, time, and key version. A verifier result can be a separate server-side fact. Execution receipts, policy decisions, transparency logs, and signed receipt chains can then compose on top. A good test might be: can two independent implementations agree on the request-attestation semantics without agreeing on the later receipt system? That probably means test vectors and conformance cases should be the normative artifact, not any single implementation. If yes, the SEP has found the right layer. If no, it may be pulling too much of a downstream audit model into the MCP primitive. |
|
Hi @vaaraio, thank you for this incredible, high-signal feedback. The work you’ve done with OVERT 1.0 and Vaara is amazing prior art. The privacy-first approach of using HMAC content commitments is a game-changer for handling PII under frameworks like the EU AI Act. My primary goal with this SEP is ensuring MCP gets a native cryptographic attestation layer so developers can build secure infrastructure around agent intents. Whether the spec leans toward JWT for ecosystem familiarity or adopts a harder-nosed CBOR/commitment model like OVERT, getting this boundary into the protocol is what matters. I see you’ve opened a related PR. I'll keep an eye on the discussion and help where I can to align these two shapes. |
|
On test vectors: that's the right call. On implementation neutrality: Reference implementation: vaaraio/vaara#139. |
|
That sounds like a good seed. The bit I would keep separate is authorship of the first implementation versus ownership of the conformance surface. Vaara’s tests may be a very useful starting point, but for the SEP I would expect the normative vectors to live with the proposal itself and be validated by at least one second implementation. The thing to standardize is observable interop: same canonical bytes, same signature input, same verification result, same rejection cases. That keeps Vaara/OVERT as strong prior art and implementation input, while keeping the MCP primitive implementation-neutral. |
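A conformance check along those four dimensions could look roughly like the sketch below. The vector fields (`canonical_hex`, `signature`, `expect_valid`), the stand-in canonicalizer, and the HS256-style HMAC signing are all assumptions for illustration. Real vectors would be literal values committed with the SEP rather than generated in-process as they are here.

```python
import hashlib
import hmac
import json


def canonical(obj) -> bytes:
    # Stand-in for a pinned profile such as RFC 8785; fine for string-only payloads.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(payload: dict, key: bytes) -> str:
    return hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()


KEY = b"pinned-demo-key"  # hypothetical pinned fixture key
signed = {"tool": "read_file", "nonce": "n-1"}

VECTORS = [
    {"name": "hs256-roundtrip", "payload": signed,
     "canonical_hex": canonical(signed).hex(),
     "signature": sign(signed, KEY), "expect_valid": True},
    # Rejection case: payload altered after signing, signature left unchanged.
    {"name": "tampered-tool", "payload": {**signed, "tool": "delete_file"},
     "canonical_hex": None,
     "signature": sign(signed, KEY), "expect_valid": False},
]


def check(vector: dict, key: bytes) -> bool:
    """True when this implementation agrees with the vector on bytes and result."""
    got_bytes = canonical(vector["payload"])
    expected_hex = vector["canonical_hex"]
    if expected_hex is not None and got_bytes.hex() != expected_hex:
        return False  # canonical bytes / signature input disagree
    got_sig = hmac.new(key, got_bytes, hashlib.sha256).hexdigest()
    verified = hmac.compare_digest(got_sig, vector["signature"])
    return verified == vector["expect_valid"]


results = {v["name"]: check(v, KEY) for v in VECTORS}
print(results)
```

A second implementation passes when every entry in `results` is true, including the rejection case, without importing anything from the first implementation.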
|
@Rul1an agreed. Authorship of the first implementation and ownership of the conformance surface belong in different places. The vectors should live with the SEP. A second implementation validating them is the right gate. The four conformance dimensions you named (canonical bytes, signature input, verification result, rejection cases) are precisely what What format would work best on your side? |
|
@vaaraio Agreed, that split sounds right. I cannot speak for the SEP maintainers, but the shape I would find easiest to review is a small fixture set that does not import Vaara, OVERT, or helper code from any implementation. Maybe The useful test is whether a second implementation can read those files directly and produce the same bytes and the same pass/fail result. That keeps the vectors boring, portable, and owned by the SEP instead of by the first implementation. |
|
@Rul1an v0 fixture set ready against your layout. 40 KB zipped. Six positive cases: HS256, ES256, RS256 round-trips, plus one fixture each for the digest, ref, and projection args commitment shapes. Each carries Seven negative cases: tampered Keys are pinned. The bundle includes Origin and license are in |
|
@vaaraio Nice turnaround. This looks like useful seed material. I would leave acceptance and final layout to the SEP maintainers, but one distinction seems worth keeping in the bundle itself: byte/signature conformance cases versus verifier-policy cases. The former can be normative immediately. Things like TTL clock choice, unsupported alg handling, and schema rejection may need an explicit validator policy before they become pass/fail requirements. I would also expect the final artifact to be committed as plain fixture files in the SEP repo rather than kept as an attached zip, so a second implementation can consume it in CI without depending on Vaara or on the comment thread. From my side, the important gate is still the same: one independent implementation reads the SEP-owned fixtures and gets the same canonical bytes and verification results. |
|
@Rul1an Filed as #2789, layout under
|
|
Nice update. This is moving in the right direction: deferring One verification detail seems worth tightening before this becomes the shape implementers follow. The verification rules currently match the receiving server and tool name, but they do not explicitly require the verifier to bind the actual For For That keeps the request-attestation boundary tight: the SEP proves identity, target, intent, nonce, time, and an explicit argument commitment. Execution receipts and downstream outcome evidence can still stay deferred. |
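The binding step being asked for can be sketched as follows: the verifier recomputes a digest from the arguments it actually received and compares it with the commitment inside the signed attestation. The function name `args_bound`, the `sha256:` prefix convention, and the stand-in canonicalizer are assumptions for illustration. The SEP's real field names are not reproduced here.

```python
import hashlib
import hmac
import json


def canonical(obj) -> bytes:
    # Stand-in for the canonicalization the SEP ends up pinning.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest_of(arguments: dict) -> str:
    return "sha256:" + hashlib.sha256(canonical(arguments)).hexdigest()


def args_bound(committed_digest: str, received_arguments: dict) -> bool:
    """True only if the call's actual arguments match the signed commitment."""
    return hmac.compare_digest(digest_of(received_arguments), committed_digest)


attested = {"path": "/reports/q3.pdf"}
committed = digest_of(attested)  # what the issuer would have signed

print(args_bound(committed, {"path": "/reports/q3.pdf"}))  # same call
print(args_bound(committed, {"path": "/etc/passwd"}))      # swapped arguments
```

Without this comparison, a valid signature over one argument set could accompany a request carrying different arguments, and the server and tool-name checks alone would not notice.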
|
@Rul1an On the argument-commitment binding: the reference impl now wires this as Step 5 in v0.37.1, released a few minutes ago. For For For Returns Composed after the existing signature and TTL checks once the |
…#150) The SEP-2787 draft envelope adopted MCP camelCase convention in soup-oss/modelcontextprotocol@48c739b1. Vaara's proposed-shape reference implementation now emits camelCase JSON keys on the serialisation boundary while keeping Python dataclass attributes in snake_case, so user code is unchanged. `Attestation.to_dict()` and the JCS-canonical signing payload emit `plannerDeclared`, `issuerAsserted`, `payloadDerived`, `toolCalls`, `serverFingerprint`, `secretVersion`, `expSeconds`, `requestedCapability`, `projectionDigest`. New `issuer_to_dict` helper replaces the prior `asdict()` call so the issuer block sorts and renames deterministically without leaking Python-internal names. `docs/sep2787-overt-mapping.md` updated. CHANGELOG entry under 0.39.1. pyproject.toml, src/vaara/__init__.py, and clients/ts/package.json all bumped. 28 attestation tests pass; ruff clean. The v0 test vector PR (vaaraio/modelcontextprotocol#2789, head 2a9360f, cited in modelcontextprotocol/modelcontextprotocol#2787) was regenerated with the same renames separately on 2026-05-27. Co-authored-by: vaaraio <267591518+vaaraio@users.noreply.github.com>
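The serialization-boundary idea in this commit, snake_case attributes in Python and camelCase keys on the wire, can be illustrated with the sketch below. It is not Vaara's `issuer_to_dict`. The helpers `to_camel` and `to_wire` and the `ToolCall` grouping are hypothetical, with only the key names taken from the commit message.

```python
from dataclasses import dataclass, fields


def to_camel(name: str) -> str:
    """server_fingerprint -> serverFingerprint."""
    head, *rest = name.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


@dataclass(frozen=True)
class ToolCall:
    # Hypothetical grouping; attribute names stay snake_case for user code.
    name: str
    server_fingerprint: str
    exp_seconds: int


def to_wire(obj) -> dict:
    """Rename declared fields explicitly and sort keys, leaking no internal names."""
    pairs = ((to_camel(f.name), getattr(obj, f.name)) for f in fields(obj))
    return dict(sorted(pairs))


call = ToolCall("read_file", "sha256:abc", 60)
print(to_wire(call))
```

Iterating over declared fields rather than dumping the object wholesale keeps the rename deterministic, which matters because the same dictionary feeds the canonical signing payload.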
|
@soup-oss Trust-surface grouping landing in the envelope is the right shape. Vaara's reference impl will follow it through the remaining mechanical diffs in the next release: move toolCalls under payloadDerived, swap argsProjection to the JSON-stringified encoding, and drop Vaara's kind-discriminated argsDigest extension. Commitment-only audit composes cleanly on top of argsProjection as an identity projection of a hash-only object, no third kind needed in the spec. A v1-current sibling vector set against the merged shape is on offer when useful. |
…it-event schema 1.0, Qi survey mapping (#151)

The four mechanical alignments Vaara committed to in modelcontextprotocol/modelcontextprotocol#2787 after the trust-surface grouping was incorporated into the SEP draft on soup-oss commit dd030d5b ship as the v2 envelope shape:

1. toolCalls lives under payloadDerived, not plannerDeclared. Tool bindings (name, server fingerprint, args commitment) are facts derived from the request payload, not planner declarations.
2. argsProjection serialises with a JSON-stringified projection field carrying the JCS-canonical encoding of the projection object. The digest is taken over those bytes.
3. The v1 kind-discriminated union is dropped. ArgsRef and ArgsProjection self-discriminate by which fields are present.
4. Commitment-only audit composes on ArgsProjection as a hash-only-identity projection of the form {"digest": "sha256:..."}. No separate ArgsDigest type ships in the spec.

parse_attestation(d) is the new wire-decode entrypoint: inverse of Attestation.to_dict(). 13 new tests cover emit -> JCS bytes -> parse -> verify across HS256, ES256, RS256 for both ArgsRef and ArgsProjection, plus parse rejection on missing-field and unsupported-alg inputs and a byte-identical re-emit check.

Two doc artefacts ship in the same release:

- docs/audit_event_schema.md: AUDIT-EVENT-SCHEMA-1.0, versioned wire/storage contract for the audit events Vaara emits. Independent of code version so third-party consumers can pin without coupling to a Python runtime version.
- docs/qi_survey_mapping.md: Vaara surface coverage against the taxonomy in Qi et al., Towards Trustworthy Agentic AI (arXiv:2605.23989, 2026-05-17). Direct, partial, and out-of-scope rows by Perceive / Plan / Act / Reflect / Learn / Multi-agent / Long-horizon stage under both top-level dimensions.

SEP-2787 reference implementation tag sep2787-ref-v2 lands on this release commit alongside v0.39.2 for cross-repo provenance.

The v0.40 slot stays reserved for the deployment-shape scope (HTTP transport, multi-tenancy schema, hot-reload extended, fan-out) per project_v040_roadmap_opa_frame_20260527.md.

Co-authored-by: vaaraio <267591518+vaaraio@users.noreply.github.com>
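Points 2 and 4 of the commit message can be read as the sketch below: the projection object is encoded once, carried as a string, and hashed over exactly those bytes, and a commitment-only audit is simply a projection whose only member is a digest. The function names, the `projection` key, and the sorted-key JSON standing in for JCS are assumptions. `projectionDigest` is the key name mentioned earlier in the thread.

```python
import hashlib
import json


def canonical(obj) -> bytes:
    # Stand-in for JCS (RFC 8785); adequate for the string-only objects used here.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sha256_label(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def args_projection(projection: dict) -> dict:
    """Carry the canonical encoding as a string and hash those exact bytes."""
    encoded = canonical(projection)
    return {"projection": encoded.decode("utf-8"),
            "projectionDigest": sha256_label(encoded)}


def projection_verifies(entry: dict) -> bool:
    """Re-hash the carried string; no re-canonicalization is needed on this side."""
    return sha256_label(entry["projection"].encode("utf-8")) == entry["projectionDigest"]


# Reviewed projection: only the fields an auditor is allowed to see.
reviewed = args_projection({"account": "****1234", "action": "transfer"})

# Commitment-only audit: the projection is just a digest of the full arguments.
full_args = {"account": "9876541234", "action": "transfer", "memo": "rent"}
hash_only = args_projection({"digest": sha256_label(canonical(full_args))})

print(projection_verifies(reviewed), projection_verifies(hash_only))
```

Because the verifier hashes the carried string rather than re-encoding a parsed object, it avoids a second canonicalization step where implementations could drift, and the hash-only case needs no extra type.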
|
This is converging nicely. One wording nit before this gets reviewed as the stable boundary: a few places still sound like the attestation proves execution, while the body later correctly defers execution acknowledgement/receipts. In particular, the PR summary mentions "execution proof" / "whether it executed", and the Authorization section says attestation proves "that they called it." I think the tighter wording is that the attestation binds an observed That keeps the current SEP crisp as pre-execution request attestation without weakening the future receipt story. |
Co-authored-by: vaaraio <267591518+vaaraio@users.noreply.github.com>
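The upstream-routing rules in the commit message above can be sketched as a small pure function. This is an illustration only, not Vaara's code: `route_request`, `Route`, and the order in which the checks run are assumptions, and only the header name and the status codes (202, 400, 404, 413) come from the release notes.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence, Tuple

MAX_BODY_BYTES = 1024 * 1024  # "Bodies above 1 MiB return 413"


@dataclass(frozen=True)
class Route:
    """Hypothetical result type: the status the proxy would answer with."""
    status: int
    upstream: Optional[str] = None     # resolved slot when routing succeeds
    valid_slots: Tuple[str, ...] = ()  # listed in the error envelope on 400


def route_request(headers: Mapping[str, str], body: bytes,
                  upstreams: Sequence[str],
                  is_notification: bool = False) -> Route:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Assumed ordering: reject oversized bodies before looking at routing.
    if len(body) > MAX_BODY_BYTES:
        return Route(413)

    requested = lowered.get("x-vaara-upstream")
    if requested is None:
        if len(upstreams) > 1:
            # Several upstreams and no header: 400 plus the valid slots.
            return Route(400, valid_slots=tuple(sorted(upstreams)))
        # Single-upstream deployments keep the silent default.
        requested = upstreams[0]
    elif requested not in upstreams:
        return Route(404)  # unknown upstream

    # Notifications are acknowledged with 202; 200 for ordinary requests
    # is an assumption of this sketch.
    return Route(202 if is_notification else 200, upstream=requested)
```

For example, `route_request({}, b"{}", ["billing", "search"])` yields a 400 whose `valid_slots` names both upstreams, while the same call against `["default"]` resolves silently to the single slot.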
|
@Rul1an the request/execution boundary you're drawing is right, and it's why Vaara v0.42.0 ships the complement alongside the attestation impl. The execution receipt takes the attestation wire bytes as its Reference implementation at |
|
Helpful prior-art and benchmark context from both sides. The main thing worth preserving is that SEP-2787 v1 stays narrowly about request attestation. For me, that means:
Bench data can help prioritize. Prior-art can help expose tradeoffs. But neither should quietly widen the primitive itself. A narrow v1 does not need to solve every threat class. It just needs to be explicit about what it proves, and equally explicit about what it does not. |
|
We strongly align with @Rul1an's framing that v1 must stay thin and foundational. Rather than expanding the envelope scope now, we'd like to flag three design considerations that we believe the ACK extension (when it comes) should take into account:
|
|
Agree v1 should stay thin and request-scoped. The current envelope already carries what a verifier needs, and widening it now would slow the part that's ready to land. On the ACK considerations, there's a shipped reference worth pulling from when that work opens. Vaara emits an execution receipt as the post-execution sibling of the 2787 attestation, signed over the same RFC 8785 JCS canonical bytes (ES256/RS256/HS256), with an independent verifier and published test vectors in the repo. On point 2 specifically: asymmetric receipts are already verifiable by whoever holds the public key, without the original signing key, so a wrappedKey/encryptedPayload slot is only needed for the confidentiality case, not for verification itself. I can bring the concrete format and verifier when the ACK extension is taken up. |
|
On the conformance-vector thread: the receipt-verification failure modes coming up here are mostly already normative cases, and they sit in the SEP's own test-vectors tree where a verifier can run them with nothing external in the loop. In #2789 the SEP-2787 v0 vectors carry the negatives directly: Two of the modes raised here aren't cased yet: an unknown Keeping the conformance artifact in the test-vectors tree is the same instinct as keeping v1 thin: an implementer should be able to check an envelope against fixtures in the repo, offline, without taking a dependency on any running service. |
Extensions Track SEP proposing signed tool call attestation envelopes for MCP — binding intent, agent identity, tool name, and arguments into a verifiable audit trail. Targets EU AI Act Article 12 compliance.
Motivation and Context
MCP has no standard mechanism to cryptographically prove which agent called which tool, with what arguments, and for what purpose. Regulated deployments (EU AI Act, AI Liability Directive) need this for audit trails. This SEP fills that gap with a minimal envelope carried in _meta, requiring no protocol changes.
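As a rough illustration of carrying such an envelope in `_meta`, the sketch below signs a `tools/call` request and verifies it on the other side. Everything specific here is an assumption rather than the SEP's wire format: the `_meta` key name, the envelope field names, and the use of HMAC-SHA256 as a stand-in for whatever signature algorithm the SEP settles on. The canonicalization is only an approximation of RFC 8785 that holds for simple payloads (ASCII keys, strings, small integers).

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical _meta key; the SEP defines the real one.
META_KEY = "example.org/tool-call-attestation"


def canonical_bytes(obj) -> bytes:
    # Sorted keys, no whitespace. NOT full RFC 8785 (JCS): floats and
    # non-ASCII key ordering can differ from a conformant implementation.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def args_digest(params: dict) -> str:
    return hashlib.sha256(canonical_bytes(params.get("arguments", {}))).hexdigest()


def attest_tool_call(request: dict, agent_id: str, intent: str, key: bytes,
                     key_id: str, issued_at: Optional[int] = None) -> dict:
    """Return a copy of a tools/call request with a signed envelope in _meta."""
    params = request["params"]
    claims = {  # field names are illustrative assumptions
        "agent_id": agent_id,
        "tool": params["name"],
        "args_sha256": args_digest(params),  # digest, not inline arguments
        "intent": intent,
        "issued_at": int(time.time()) if issued_at is None else issued_at,
        "key_id": key_id,
    }
    sig = hmac.new(key, canonical_bytes(claims), hashlib.sha256).hexdigest()
    signed = json.loads(json.dumps(request))  # deep copy; input is untouched
    signed["params"].setdefault("_meta", {})[META_KEY] = {**claims, "sig": sig}
    return signed


def verify_tool_call(request: dict, key: bytes) -> bool:
    """Check the signature and that the envelope binds this tool and arguments."""
    params = request["params"]
    envelope = dict(params.get("_meta", {}).get(META_KEY, {}))
    sig = envelope.pop("sig", "")
    if not isinstance(sig, str):
        return False
    expected = hmac.new(key, canonical_bytes(envelope), hashlib.sha256).hexdigest()
    return (hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
            and envelope.get("tool") == params["name"]
            and envelope.get("args_sha256") == args_digest(params))
```

Because the envelope binds a digest of the arguments rather than the arguments themselves, changing any argument after signing makes `verify_tool_call` return `False`. With a symmetric key the verifier must share the secret; an asymmetric signature would remove that requirement.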
How Has This Been Tested?
Example implementation was added in https://github.com/soup-oss/sep-tool-call-attestation/tree/master/example
@vaaraio has put together a full suite of conformance test vectors and validation criteria over in PR #2789
Breaking Changes
None
Types of changes
Checklist
Additional context