SEP: Server-Side Signed Execution Record for MCP Tool Calls #2828

Open
vaaraio wants to merge 8 commits into modelcontextprotocol:main from vaaraio:sep-server-side-signed-execution-record

@vaaraio

vaaraio commented May 31, 2026

This PR adds a Standards Track SEP (status: Draft, seeking a sponsor): a server-authoritative signed record of what a governing server or proxy decided about a tool call and what the call actually did.

It is the follow-up that SEP-2817 and Discussion #2704 defer to. SEP-2817 standardizes client-asserted input-audit context and states that server-side decision records are left to a later SEP; @hangum asked for this to be opened separately so SEP-2817 stays scoped. This is that SEP.

What it defines

Two records, signed by the enforcement point:

  • Decision record, emitted before the side effect: the allow / block / escalate verdict and the risk basis behind it.
  • Outcome record, emitted after execution: the executed / refused / errored status and a commitment over the result.

They are paired by backLink and bound to the originating SEP-2787 attestation instance, so a verifier can reconstruct what the agent was permitted to do, why, and what it actually did. Both use RFC 8785 (JCS) canonical JSON and the same detached-signature stack as SEP-2787, so a 2787 verifier needs no new cryptographic code. The trust surface is server-signed (issuerAsserted / receiptAsserted), kept distinct from 2787's call attestation and 2817's client-asserted input context.

Why server-authoritative

A client can claim its intent and its arguments. It cannot credibly attest that a call was allowed, why it was allowed or blocked, or what the tool returned, because it does not own that logic and is not a neutral observer of its own behaviour. Article 12 logging for a deployment where the server enforces policy needs a record signed by the enforcement point, paired to the request it answers, that survives the client.

Reference implementation

The wire shape already ships in the Vaara MCP proxy. The outcome record is vaara.attestation.receipt.ExecutionReceipt (shipping since v0.42); the decision record is vaara.attestation.decision in the reference repo with round-trip, tamper, and pairing tests. The instance-binding join is a SHA-256 over the full SEP-2787 attestation wire bytes with the signature included, so a record cannot be replayed against a different instance of a byte-identical call. Verification is offline and standard-library only; JCS conformance vectors for the shared SEP-2787 surface are in #2789.
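
To make the replay point concrete, here is a minimal sketch, not the reference implementation. The attestation layout (`call`, `signature`) and the helper names are hypothetical, and `canonical` only approximates RFC 8785 for the simple values used here. It contrasts a digest over the call content with a digest over the full attestation wire bytes, signature included.

```python
import hashlib
import json


def canonical(obj) -> bytes:
    # Stand-in for RFC 8785 (JCS): adequate only for ASCII keys and
    # string/integer values, which is all this sketch uses.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def content_digest(call: dict) -> str:
    # Content binding: a hash of the call description alone.
    return hashlib.sha256(canonical(call)).hexdigest()


def instance_digest(attestation_wire: bytes) -> str:
    # Instance binding: a hash over the full attestation wire bytes,
    # signature included.
    return hashlib.sha256(attestation_wire).hexdigest()


call = {"name": "delete_file", "arguments": {"path": "/tmp/report.txt"}}

# Two attestations of the same call; the signature values are placeholders.
first = canonical({"call": call, "signature": "sig-of-first-attempt"})
retry = canonical({"call": call, "signature": "sig-of-retry"})

same_content = content_digest(call) == content_digest(dict(call))
same_instance = instance_digest(first) == instance_digest(retry)
print(same_content, same_instance)
```

The content digests match while the instance digests differ, which is the property that lets a record name one specific attested call, not any call with the same arguments.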

Prior art reconciled in the SEP

SEP-2787 (tool call attestation), SEP-2817 (AI invocation audit context), SEP-414 (request _meta), and the decision/outcome split in agent-guard (the instance-binding point was settled in discussion on 2026-05-30).

AI assistance disclosure

This SEP was prepared with AI assistance for structure and wording. The proposal direction, the implementation behind it, and the final technical judgment were reviewed and edited by the author.

vaaraio and others added 3 commits May 31, 2026 14:01
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vaaraio vaaraio requested review from a team as code owners May 31, 2026 11:03
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@chopmob-cloud

This comment was marked as spam.

@rpelevin

This split looks like the right place for the server-authoritative half.

SEP-2817 should stay limited to client-asserted input-audit context: why the model/client invoked a tool, which model emitted the call, optional user intent, and the user-turn correlation. This draft picks up the other half: what the enforcement point actually decided before side effect, and what happened afterward.

The load-bearing parts for me are:

  • a decision record before the side effect, not only an outcome log after it;
  • an outcome record after execution, paired back to the same attested invocation;
  • backLink binding to the attestation instance, so the verifier is checking this exact call, not just a recomputable action description;
  • JCS/canonical signing so the record can be verified offline;
  • a clean distinction between allow/block/escalate decisions and executed/refused/errored outcomes.

That keeps human review out of the outcome enum. Escalate/refer is a decision with no outcome yet; execution either never happens or gets recorded later against the same attested call.

The shape also preserves the boundary from SEP-2817: client-asserted intent helps explain the invocation, but the server/enforcement-point record proves what was allowed and what actually ran.

@hangum

hangum commented May 31, 2026

Thanks for opening this separately.

This split looks right to me:

  • SEP-2817 stays focused on client-asserted input-audit context: why the AI invoked the tool, which model produced the invocation, optional user intent, and user-turn correlation.
  • This SEP can focus on the server/proxy-authoritative layer: what decision was made before execution and what outcome was observed after execution.

The decision/outcome split also maps well to operational audit flows such as:

user turn -> AI/MCP request -> server policy or approval decision -> execution/refusal/error -> audit/event record

The main thing I would watch is keeping this SEP generic enough for MCP implementations that do not use SEP-2787 or a specific proxy implementation. Binding to a signed attestation instance is useful where available, but the fallback binding for deployments without SEP-2787 should remain clear.

I'll review this with that boundary in mind.

@chopmob-cloud

This comment was marked as spam.

@vaaraio

vaaraio commented May 31, 2026

Author

Thanks both. @hangum @rpelevin the scope split matches what I intended: 2817 keeps the client-asserted input context, this SEP carries the server-authoritative decision and outcome.

@hangum on keeping it usable without SEP-2787: the backLink section already has the fallback. When no 2787 attestation exists, the record sets attestationDigest to a SHA-256 over the JCS-canonical request envelope the server observed (the tools/call params plus _meta), with a server-chosen per-call nonce, so the binding is still to the request instance and not a recomputable description. I'll lift that out of the field table into its own short "deployments without SEP-2787" paragraph so it isn't buried, and change the front-matter from "Requires: SEP-2787" to "Related", since 2787 strengthens the binding but isn't load-bearing.
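
A rough sketch of that fallback, under stated assumptions: the thread does not say how the nonce is combined with the envelope, so placing it inside the hashed object is a guess, and the function name and envelope keys are hypothetical. `canonical` is a simplified stand-in for RFC 8785.

```python
import hashlib
import json
import secrets


def canonical(obj) -> bytes:
    # Stand-in for RFC 8785 (JCS); sufficient for the plain values used here.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def fallback_attestation_digest(params: dict, meta: dict, nonce: str) -> str:
    # Assumption: the per-call nonce is hashed together with the observed
    # tools/call params and _meta as one canonical object.
    envelope = {"params": params, "_meta": meta, "nonce": nonce}
    return hashlib.sha256(canonical(envelope)).hexdigest()


params = {"name": "send_email", "arguments": {"to": "ops@example.com"}}
meta = {"progressToken": "t-1"}

# The server picks a fresh nonce for every call it observes.
nonce_1 = secrets.token_hex(16)
nonce_2 = secrets.token_hex(16)

digest_1 = fallback_attestation_digest(params, meta, nonce_1)
digest_2 = fallback_attestation_digest(params, meta, nonce_2)

# Byte-identical requests still bind to different instances.
print(digest_1 != digest_2)
# A verifier holding the envelope and the nonce recomputes the same value.
print(fallback_attestation_digest(params, meta, nonce_1) == digest_1)
```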

@rpelevin your read is the intended model. An escalate is a decision with no outcome record yet. If a human later resolves it, that is a new decision record carrying the same backLink and a later decidedAt, and the record with the latest decidedAt wins (Pairing). Human review never enters the outcome enum, which stays executed/refused/errored.

On whether the decision enum should grow to carry deferral sub-states: I'd keep it at allow/block/escalate. A deployment with a real deferral lifecycle models that in its own policy layer and supersedes the escalate with a later decision record, rather than the core enum taking on domain-specific verdict classes. A small enum is what lets a generic verifier check any implementation.

@vaaraio

vaaraio commented May 31, 2026

Author

@XuebinMa the follow-up we discussed on 2817 is open here. The normative pairing rule is the one your decision/outcome framing shaped, and the instance-binding join is the version we settled on. If you still want to run the agent-guard implementation pass against it, this is the place. An independent implementation would help confirm the record shapes aren't tied to one proxy.

@XuebinMa

XuebinMa commented Jun 1, 2026

@vaaraio happy to — here's the agent-guard implementation pass. It's a standalone Rust runtime (no proxy, no SEP-2787 dependency), so it's a useful test of whether these shapes hold outside a proxy deployment. The short version: the two-record split and the signed-approval lifecycle already exist independently; the instance-binding join and JCS canonicalization are where agent-guard would have to move to match this SEP. I'll be explicit about both.

What already matches, arrived at independently

  • Decision/outcome split. Pre-execution decision is GuardDecision { Allow, Deny, AskUser }, written before the side effect; the post-execution outcome (exit code, sandbox type) is written after. AskUser is your escalate.
  • Signed record covering the approval. ExecutionReceipt { receipt_version, agent_id, tool, policy_version, sandbox_type, decision, command_hash, timestamp, approval: Option<ApprovalProof>, signature }, Ed25519. When a human approves an AskUser, with_approval() attaches ApprovalProof { request_id, decided_by, decided_at } and re-signs so the signature covers the approval — the approval isn't a separate unsigned log line.
  • Escalate has no sibling outcome, and that's terminal-or-resumable, not an error. Confirms @rpelevin's and your reading from a second implementation. agent-guard keeps the deferral lifecycle off both enums, in a separate append-only ledger: ApprovalStatus { Pending, Approved, Denied, Expired }. So "decision written, outcome absent" is observable as a Pending ledger entry with no receipt yet; an Expired is the timed-out escalate that never produced an outcome. That's a concrete data point for the @chopmob-cloud rule_class question: the deferral sub-state lived cleanly outside the decision enum here, which argues for keeping the core enum at allow/block/escalate and modeling deferral lifecycle in the policy/ledger layer.

Where agent-guard would have to move to match this SEP (honest gaps, not claims)

  • Binding. Today ExecutionReceipt binds command_hash (SHA-256 of the request payload) plus an internal request_id. That's content binding — recomputable, can't distinguish retries — exactly the property you and @chopmob-cloud flagged. To pair against this exact call I'd adopt your backLink instance digest. And since agent-guard runs standalone, your no-SEP-2787 fallback (SHA-256 over the JCS-canonical observed request envelope + server nonce) is the binding it would actually use — so I'd second @hangum's point that the fallback needs to be first-class, not a footnote. It's the only binding a non-2787 deployment has.
  • Canonicalization. agent-guard's current signing payload is a delimiter-joined field concatenation, not RFC 8785. For a generic verifier to check records across implementations, JCS has to be the normative wire form; a bespoke per-implementation payload defeats the "2787 verifier needs no new code" goal. This is the change with the most reach, and I think it's the right call.
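
One way to see why a delimiter-joined payload is a weak signing input is the sketch below. agent-guard's actual payload format is not shown in the thread, so the `|` delimiter and the field order are purely illustrative, and `canonical` is a simplified stand-in for RFC 8785.

```python
import json


def delimiter_payload(fields: list[str]) -> bytes:
    # Hypothetical bespoke signing payload: fields joined by a delimiter.
    return "|".join(fields).encode("utf-8")


def canonical(obj) -> bytes:
    # Stand-in for RFC 8785 (JCS); sufficient for the plain values used here.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


record_a = {"tool": "run|ls", "agent_id": "a1"}
record_b = {"tool": "run", "agent_id": "ls|a1"}

joined_a = delimiter_payload([record_a["tool"], record_a["agent_id"]])
joined_b = delimiter_payload([record_b["tool"], record_b["agent_id"]])

# Different records, identical bytes to sign: one signature covers both.
print(joined_a == joined_b)
# Canonical JSON keeps field boundaries, so the payloads stay distinct.
print(canonical(record_a) == canonical(record_b))
```

A bespoke format can avoid this collision with escaping or length prefixes, but every verifier would then have to reimplement that format, which is the portability cost the bullet describes.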

Net: the decision/outcome + signed-approval shape reproduces in an independent, non-proxy runtime, which is some evidence it isn't tied to one implementation. The instance-binding join and JCS are the two things that make the records cross-verifiable, and both are SEP-level decisions agent-guard should follow rather than re-invent. Glad to review the decision-record field table directly if you want eyes on specific field names.

@Rul1an

Rul1an commented Jun 1, 2026

This is the right layer for the server-authoritative half. SEP-2817 can stay with client-asserted input context, SEP-2787 can stay with request attestation, and this SEP can carry what the enforcement point decided and what happened afterward.

The bit I would make especially crisp before implementers copy the shape is the verifier contract, not only the emitter shape. A proxy can emit a decision/outcome pair, but the durable value comes from an independent verifier being able to reconstruct the same answer from committed records alone.

From that side, I would want the SEP to make a few cases boring and testable: which decision record applies to this call instance; whether an outcome is required, absent, or intentionally absent; how superseded decisions are ordered; how conflicting decisions are rejected; and whether the fallback binding without SEP-2787 is still instance-bound enough to catch replay or substitution.

The latest decidedAt wins rule is useful. It probably needs deterministic tie-breaking, though, so two implementations do not disagree under equal timestamps or duplicated records. Same for escalate: a decision with no outcome can be a valid state, but the verifier needs to know whether the SEP treats that as terminal, pending, or policy-layer state outside the core record. A little sharpness there will save a lot of later folklore.

The conformance artifact I would find most useful as a downstream evidence consumer is a plain SEP-owned fixture set: observed request or attestation input, decision record, optional outcome record, expected pairing result, and negative cases for substituted backlink, duplicate decision, stale superseded decision, missing required outcome, and fallback-binding replay.

That keeps Vaara or any other proxy as strong implementation input, while making the standard testable by independent consumers that only read the wire records and do not share the emitter's code path.

@vaaraio

vaaraio commented Jun 1, 2026

Author

@Rul1an agreed, the verifier contract is the part worth pinning before implementers copy the shape. The emitter side is easy to get superficially right and still produce records nobody can independently check.

Given a decision record and its paired outcome record, a conformant verifier MUST:

  1. Recompute the JCS canonicalization and verify the signature against the declared issuer key, rejecting any record whose canonical form does not round-trip.
  2. Verify the backLink join: the outcome record's backLink resolves to the decision record's content digest, and both bind to the same attestation instance (the SEP-2787 attestation digest, or the request-envelope SHA-256 fallback when no SEP-2787 attestation exists).
  3. Reject an inconsistent pair: a terminal deny decision with an executed outcome, or a refer decision with no later record resolving it.
  4. Treat the records as data, not as authority. The verifier trusts the issuer key and the chain, not the proxy that emitted them.
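
Rule 1 could look roughly like the sketch below. It is an illustration only: HMAC-SHA256 from the standard library stands in for the real detached-signature stack, `canonical` approximates RFC 8785 for simple values, and the record fields and function names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(obj) -> bytes:
    # Stand-in for RFC 8785 (JCS); sufficient for the plain values used here.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sign(body: dict, key: bytes) -> str:
    # Stand-in detached signature: HMAC over the canonical bytes.
    return hmac.new(key, canonical(body), hashlib.sha256).hexdigest()


def verify_record(wire: bytes, signature: str, key: bytes) -> bool:
    try:
        body = json.loads(wire)
    except ValueError:
        return False
    # Round-trip: the bytes received must already be the canonical form.
    if canonical(body) != wire:
        return False
    expected = hmac.new(key, wire, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


key = b"demo-key"
body = {"decision": "allow", "backLink": "abc"}
wire = canonical(body)
sig = sign(body, key)

tampered = wire.replace(b"allow", b"block")
pretty = json.dumps(body, indent=2).encode("utf-8")

print(verify_record(wire, sig, key))      # canonical bytes, valid signature
print(verify_record(tampered, sig, key))  # content changed after signing
print(verify_record(pretty, sig, key))    # same data, non-canonical bytes
```

Only the first call is accepted. The third is rejected before the signature is even checked, because its bytes do not round-trip through canonicalization.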

I'll fold an explicit "Verifier" subsection into the next revision stating these as MUSTs, with one test vector per rule.

@XuebinMa the agent-guard pass is the useful check here precisely because it verifies without being the emitter. If the instance-binding join holds outside a proxy, the contract is implementer-portable rather than tied to one deployment.

On vocabulary: keeping the record server-authoritative and decoupled from any settlement or payment semantics is deliberate. The decision record states what the enforcement point decided before a side effect, and the outcome record states what was observed after. It does not assume the side effect is a payment or carry settlement state, which keeps it usable as regulatory evidence without importing a transaction model.

@Rul1an

Rul1an commented Jun 1, 2026

This verifier-first direction is the right one.

One downstream data point: Assay has now landed a small independent consumer verifier for SEP-2787 + server execution-record fixtures: Rul1an/assay#1462, with edge-case coverage in #1463. It does not emit records, proxy MCP, or establish issuer trust; it only reads committed fixtures and reports whether the attestation, decision, and optional outcome pair up. That is the verifier role I meant in my earlier comment.

That exercise makes one contract detail worth making boring before the next revision lands: the join model. The draft mostly reads as decision and outcome sharing the same request/attestation backLink; your verifier note says the outcome record's backLink resolves to the decision record's content digest, while both bind to the same attestation/request instance. Both shapes can work, but they are different verifier contracts.

My preference would be to spell this as machine-testable fields rather than prose: request/attestation instance binding, decision-to-outcome pairing, supersession ordering and tie-breaks, and absent-outcome semantics for escalate. Small vocabulary point too: the normative decision enum is escalate, while the verifier note says refer; if refer is just prior-art vocabulary, mapping it explicitly will prevent implementers from treating it as a fourth wire value.

Assay can consume SEP-owned fixtures as an independent downstream verifier once they exist. That gives the SEP a second implementation path that only reads the wire records and does not share Vaara's emitter code.

@vaaraio

vaaraio commented Jun 1, 2026

Author

Status update: the Markdown Format Check that was failing is resolved. The cause was small: table cell padding drifted after an author-line edit, so prettier flagged it. format and validate are both green now, and render-seps passes, so the proposal is ready for review with no outstanding CI.

For anyone evaluating the record format itself: there are now two independent implementations, Rul1an/assay and XuebinMa/agent-guard, so the spec can be checked against more than the reference. I can add SEP-owned test fixtures next if that would help a reviewer confirm interop.

@Rul1an

Rul1an commented Jun 1, 2026

That fixture set would be useful.

Small precision on the implementation wording: Assay is an independent consumer/verifier for fixture pairing, not a full SEP-2828 issuer or trust implementation. It verifies the request/decision/outcome linkage from committed records and deliberately does not emit records, proxy MCP, establish issuer trust, or claim runtime truth.

That is still the useful second path for this SEP: a downstream consumer that only reads the wire artifacts. Agent-guard looks like a strong independent implementation data point for the runtime shape, while Assay is the narrower verifier-consumer path.

From Assay's side, the most useful first SEP-owned fixture set would be:

  • valid attestation + decision + outcome pairing
  • decision-only escalate
  • substituted request/attestation backLink
  • substituted decision/outcome pairing link, if that becomes a separate field
  • equal-decidedAt supersession tie
  • fallback request-envelope binding replay/substitution

@XuebinMa

XuebinMa commented Jun 1, 2026

@vaaraio @Rul1an the join model is the right thing to pin, and an honest note from the agent-guard side since it sits as a third data point.

agent-guard today does not emit two separately-signed records joined by a backLink. It emits a single signed ExecutionReceipt after execution that carries the decision verdict and the outcome context together (decision field + command_hash + sandbox/exit context), with the human-approval sub-record (ApprovalProof) folded in and covered by the same signature. The approval-to-call join is by an internal request_id (a locally-generated UUID), which is content-independent but not instance-bound to any attestation — so it's neither of the two joins on the table: not "share the same attestation backLink," not "outcome.backLink resolves to decision content digest."

That's useful precisely as a contrast: a single-record implementation is a real point in the design space, and it tells you which of your two join models is more portable. The outcome-backLink-resolves-to-decision-digest model is the one that survives a runtime like agent-guard splitting its single receipt into two records later, because the decision digest is computable from the decision record alone — whereas "both share the same attestation backLink" silently assumes both records always exist and both always carry the attestation binding. For the SEP I'd make the outcome→decision-digest pairing the normative join, with the shared-attestation binding as the instance anchor underneath it. Two distinct fields, two distinct checks, as @Rul1an said.

On vocabulary, to close that loop concretely: agent-guard's pre-execution verdict is AskUser, which maps to the normative escalate. refer is only discussion-vintage vocabulary (it came up via the AskUser/REFER analogy earlier in the 2817 thread) and should not be a wire value — please map it to escalate explicitly in the enum note so implementers don't add a fourth.

And yes to SEP-owned fixtures. Once they exist agent-guard can run them from the emitter side — produce records and check they verify — which complements @Rul1an/Assay's consumer-side reading of committed fixtures. Between the two you get emit-and-verify coverage from independent code paths. The single-vs-two-record point above is the one place agent-guard's emitter output won't line up byte-for-byte yet, so it's worth a fixture that exercises a record pair explicitly.

@vaaraio

vaaraio commented Jun 1, 2026

Author

@XuebinMa @Rul1an the single-record contrast settles it. I'm pinning the join as two distinct checks and adopting your outcome-to-decision-digest as the normative pairing.

Check A, instance binding (the anchor). Decision and outcome each bind the same call instance: the SEP-2787 attestation digest, or, with no 2787 attestation, a SHA-256 over the JCS-canonical request envelope plus a server nonce.

Check B, pairing (normative). The outcome resolves to the decision record's content digest. Your portability argument is the deciding one: the decision digest is computable from the decision record alone, so a runtime that emits one record today and splits it into two later still pairs. "Both carry the same attestation backLink" assumes both records always exist and both always carry the binding, which agent-guard's single-receipt shape shows is not safe to assume.

Honest status on the reference impl: it pairs on shared instance binding (Check A) today. records_paired compares the two records' attestation backLinks, not a decision content digest. The explicit decision-content-digest field on the outcome is Check B, and it lands in the next revision. I did not want to backfill the thread with a field the code does not carry yet.

refer is not a wire value. The enum stays allow/block/escalate, and the note will map both refer and agent-guard's AskUser to escalate.

Fixtures are up: vaaraio/vaara#185, six cases under tests/vectors/decision_pairing_v0 with a stdlib-only walker (no Vaara imports) that Assay can read on the consumer side and agent-guard can run on the emitter side. They encode Check A pairing as shipped. I will add the Check B explicit-digest case in the same revision that lands the field, plus the supersession tie-break, which is the one case I left marked open since equal decidedAt has no deterministic order in the impl yet. Proposed tie-break: lexicographic on the record nonce, lowest wins, so two verifiers never disagree.

@Rul1an

Rul1an commented Jun 1, 2026

Thanks for landing these. I pointed Assay's current consumer verifier at the committed Check-A vectors to see where it actually lands.

The two positive shapes come through clean: the valid allow plus executed pairing, and the decision-only escalate. Both substitution cases Assay can evaluate today fail exactly where they should, the swapped attestation backLink and the swapped pairing nonce, so the instance binding is doing its job.

Two of them Assay can't judge yet, and I'd rather be honest about why than paper over it. The fallback request-envelope binding needs an attestation input that Assay v0 still assumes is there, and the equal-decidedAt supersession case needs multi-decision ordering that Assay doesn't model yet. That's a real piece of work on our side, not just a re-run.

So it lands right on the boundary we talked about. Assay can consume the shared-instance-binding fixtures today as an independent reader, and once the explicit outcome-to-decision-digest Check B lands I'll re-run against it. The supersession side follows when Assay actually models ordering, rather than pretending this is already just another fixture run.

Bring the pairing rule in line with the implemented and conformance-tested
behaviour:

- outcomeDerived.decisionDigest: sha256 over the JCS-canonical full signed
  decision-record wire bytes the outcome was produced under. A conforming
  emitter MUST set it; pairing fails without it.
- Pairing now states both checks. Check A (instance anchor) is the shared
  backLink; Check B (the normative pairing) is the outcome's decisionDigest
  equalling this decision's digest. Check A alone admits a different decision
  taken under the same attestation (an escalate and the verdict that
  supersedes it share the attestation); Check B pins which decision's content
  the outcome answers.
- Supersession: when two decision records for one backLink carry the same
  decidedAt, the tie breaks on the lexicographically lowest issuerAsserted.nonce,
  so every verifier selects the same effective decision with no clock authority.
- Test Vectors: the decision-and-outcome pairing suite is now published
  (tests/vectors/decision_pairing_v0/), driven by a standard-library-only
  walker with a per-case expected verdict, so an independent emitter or
  consumer can run it against its own implementation.
@vaaraio

vaaraio commented Jun 2, 2026

Author

Check B is in the PR now. The pairing rule now rests on two checks: Check A pins the call instance through the shared backLink, and Check B, the normative one, pins content through outcomeDerived.decisionDigest, a sha256 over the full signed decision-record bytes the outcome was produced under. Check A alone admits a different decision taken under the same attestation, so an escalate and the human verdict that supersedes it both pass Check A; Check B is what says which decision the outcome actually answers. A receipt without decisionDigest does not pair.
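
The two checks can be sketched as follows. The field names `backLink`, `outcomeDerived.decisionDigest`, and `issuerAsserted.nonce` come from this thread; the rest of the record layout, the placeholder signatures, and the function names are assumptions, and `canonical` is a simplified stand-in for RFC 8785.

```python
import hashlib
import json


def canonical(obj) -> bytes:
    # Stand-in for RFC 8785 (JCS); sufficient for the plain values used here.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def decision_digest(decision: dict) -> str:
    # Assumption: the digest covers the whole signed record, signature included.
    return hashlib.sha256(canonical(decision)).hexdigest()


def check_a(decision: dict, outcome: dict) -> bool:
    # Instance anchor: both records bind the same call instance.
    return decision["backLink"] == outcome["backLink"]


def check_b(decision: dict, outcome: dict) -> bool:
    # Normative pairing: the outcome names this exact decision record.
    claimed = outcome.get("outcomeDerived", {}).get("decisionDigest")
    return claimed is not None and claimed == decision_digest(decision)


def paired(decision: dict, outcome: dict) -> bool:
    return check_a(decision, outcome) and check_b(decision, outcome)


back_link = {"attestationDigest": "ab" * 32}
escalate = {"backLink": back_link, "decision": "escalate",
            "issuerAsserted": {"decidedAt": "2026-06-02T10:00:00Z", "nonce": "01"},
            "signature": "placeholder-1"}
allow = {"backLink": back_link, "decision": "allow",
         "issuerAsserted": {"decidedAt": "2026-06-02T10:05:00Z", "nonce": "02"},
         "signature": "placeholder-2"}
outcome = {"backLink": back_link, "status": "executed",
           "outcomeDerived": {"decisionDigest": decision_digest(allow)}}
no_digest = {"backLink": back_link, "status": "executed", "outcomeDerived": {}}

print(check_a(escalate, outcome), check_a(allow, outcome))  # both pass Check A
print(paired(escalate, outcome), paired(allow, outcome))    # only allow pairs
print(paired(allow, no_digest))                             # no digest, no pair
```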

Supersession is resolved in the spec too: when two decision records share a backLink and carry the same decidedAt, the effective one is the lowest issuerAsserted.nonce, so verifiers agree with no clock authority.
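
A possible reading of that rule in code, as a sketch under assumptions: `decidedAt` is taken to be an RFC 3339 UTC string stored under `issuerAsserted`, which the thread does not confirm, and the caller is assumed to have already grouped the records by backLink.

```python
from datetime import datetime


def decided_at(record: dict) -> datetime:
    # Assumption: RFC 3339 timestamp with a trailing "Z" for UTC.
    raw = record["issuerAsserted"]["decidedAt"]
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def effective_decision(records: list[dict]) -> dict:
    # Latest decidedAt wins; equal timestamps break on the
    # lexicographically lowest issuerAsserted.nonce.
    latest = max(decided_at(r) for r in records)
    tied = [r for r in records if decided_at(r) == latest]
    return min(tied, key=lambda r: r["issuerAsserted"]["nonce"])


records = [
    {"decision": "escalate",
     "issuerAsserted": {"decidedAt": "2026-06-02T10:00:00Z", "nonce": "c3"}},
    {"decision": "allow",
     "issuerAsserted": {"decidedAt": "2026-06-02T10:05:00Z", "nonce": "b2"}},
    {"decision": "block",
     "issuerAsserted": {"decidedAt": "2026-06-02T10:05:00Z", "nonce": "a1"}},
]

chosen = effective_decision(records)
chosen_reversed = effective_decision(list(reversed(records)))
print(chosen["decision"], chosen_reversed["decision"])
```

Both calls select the block record, because the two later decisions tie on `decidedAt` and `a1` sorts below `b2`. Input order does not affect the result, which is what lets independent verifiers agree.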

The pairing conformance suite is published at tests/vectors/decision_pairing_v0/. Seven cases, each carrying its expected verdict, driven by a standard-library-only walker that takes no Vaara import: a valid pair, two Check A substitution negatives, a Check B negative (a substituted decision under a shared attestation), the no-attestation fallback with a replay rejection, a decision-only escalate, and the supersession tie.

@Rul1an, this is the Check B you were waiting on for Assay. The two cases your consumer could not judge before, the fallback and the supersession, now have fixtures. @XuebinMa, the digest binding is the content-versus-instance point from our thread, in wire form. Both of you can run the suite against your own side, since the walker carries the expected verdicts and takes no dependency on us.

@Rul1an

Rul1an commented Jun 2, 2026

Thanks for landing Check B. This is the shape I was hoping for: Check A anchors the call instance, and Check B answers the separate question of which decision the outcome actually resolves to.

I pulled the current fixture suite and ran Assay over it after adding the Check B consumer check on our side. The two positive cases still come through clean, and all three single-decision negatives Assay can evaluate today fail exactly where they should, the substituted attestation backLink, the substituted pairing nonce, and the new substituted decision under a shared attestation, which our outcome_decision_digest_match now catches.

Two are still genuine work on our side rather than a re-run, and I'd rather say so plainly. The fallback request-envelope binding needs a no-attestation input path Assay doesn't have yet, and supersession needs multi-decision ordering we don't model. The upside is your fixtures now give both of those a concrete target to build against.

One small thing from a clean checkout, in case it bites the next person running these independently. _check_independent.py looks for tests/vectors/decision_pairing_v0/keys/*, but that directory is gitignored and never committed, so the checker bails before any case runs. It also pulls in rfc8785 and cryptography, which is no problem, just a little different from the stdlib-only description. Both easy fixes, and worth it since these vectors are most useful when someone other than the emitter can run them clean.

@vaaraio

vaaraio commented Jun 2, 2026

Author

You're right, and thanks for running it from a clean checkout. That's exactly the case the vectors exist to serve. The keys directory was caught by a broad ignore rule, so the public key and the HMAC secret never got committed, and the checker bailed before the first case. Fixed: tests/vectors/*/keys/ now ships, with the private signing key still held back. I also corrected the description. The walker is the standard library plus cryptography and rfc8785, not stdlib-only, as you saw.

From a fresh clone the checker now runs 7/7 with no Vaara import, so the substituted-decision Check B negative that your outcome_decision_digest_match caught is reproducible on your side from the committed material alone.

On the two Assay can't judge yet, the fallback envelope binding and the supersession ordering: agreed, those are real work rather than a re-run, and they sit on your side, not in the fixtures. The fixtures carry both as concrete targets for when you get to them. Nothing owed back from me there.

@Rul1an

Rul1an commented Jun 2, 2026

Appreciate the quick turnaround. 7/7 from a clean clone with no Vaara import is exactly the property these vectors should have, and reproducing the Check B negative from the committed material alone is the whole point of an independent consumer, so that's a real step up.

Agreed on the split too. The fallback envelope binding and supersession ordering are work on our side, not anything I'm asking from the fixtures right now. We've scoped the no-attestation input path and will pick both up when we get to them. Thanks for making the suite cleanly runnable.

@vaaraio

vaaraio commented Jun 7, 2026

Author

Quick status, and a process question.

The anti-backdating mechanism raised earlier in this thread (a trusted timestamp over the chain head, so a later signing-key compromise cannot produce a backdated alternate chain) is implemented in the reference. It uses an RFC 3161 token over the record-chain head, pinned to an eIDAS-qualified TSA so it is recognised EU-wide, and it verifies offline with the standard library plus the TSA certificate. v0.59.0 also exposes it in the one-command Article 12 regulator export, so the property is reachable end to end.

The SEP scope is unchanged: server-authoritative decision and outcome records paired by backLink, reusing the SEP-2787 canonicalisation and signing stack, so a 2787 verifier needs no new cryptographic code. Could a maintainer point me at the sponsorship path from here, or note what would need adjusting before it can move to review?

@Rul1an

Rul1an commented Jun 7, 2026

The v0.60 verify-record addition is useful in exactly the way this SEP needs: it separates wire-format conformance from the emitter. A record Vaara did not produce can still be checked for schema shape, digest formats, status values, projection digest consistency, and optional attestation backlink. That is a much stronger review artifact than “the reference implementation accepts its own records.”

I would keep the proof boundary explicit, though. A keyless conformance check is not issuer trust, not signature verification, not time-anchor verification, and not runtime truth. It answers “is this a well-formed SEP-2828-shaped record, and do the recomputable bindings match?” The signer-key path and the time-anchor path remain separate checks.

For sponsorship/review, the clean route from my side would be:

  • SEP text owns the normative field contract and verifier obligations.
  • SEP-owned fixtures cover conformance, negative cases, backlink pairing, decision-digest pairing, fallback binding, and supersession.
  • Vaara’s verify-record is treated as reference/prior art for those vectors, not as the normative authority.
  • Independent consumers like Assay can run the same fixtures without sharing Vaara’s emitter or trust path.

That keeps the proposal implementation-backed without making the standard Vaara-backed. It also gives maintainers a concrete thing to review: the verifier contract and fixture surface, rather than a broad compliance claim.

@vaaraio

vaaraio commented Jun 7, 2026

Author

v0.60.0 ships a conformance check for the record format in this SEP. vaara verify-record takes any JSON that claims to be a server-side execution record and checks it against the wire schema plus the one binding a record proves about itself: the result commitment's projectionDigest is the SHA-256 of the projection bytes beside it, recomputable from the record alone with nothing but a hash function.

It is keyless. A party that holds neither the signing key nor the request attestation can still tell whether a record is well formed. Pass the attestation and it also checks the back-link, still without a key. The signature check stays where it needs the key.

For this SEP, the point is that the format is checkable by someone who runs none of the producer's software. The conformance vectors ship with a standalone checker that imports no Vaara code, so a second implementation reproduces every verdict offline. That is what lets the format, not any one implementation, be the thing a verifier trusts.
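
A minimal sketch of the keyless projectionDigest check described above, for readers who want to see how little it needs. The field layout is an assumption: a hypothetical `projection` field holding base64 bytes next to a lowercase-hex `projectionDigest`. The real field names and digest encoding are whatever the SEP's wire schema defines, and this is not Vaara's code.

```python
import base64
import hashlib


def projection_digest_matches(result_commitment: dict) -> bool:
    """Recompute SHA-256 over the projection bytes and compare it to the stated digest.

    Assumed (hypothetical) layout:
        {"projection": "<base64 bytes>", "projectionDigest": "<hex sha-256>"}
    No key and no attestation are needed; only a hash function.
    """
    try:
        projection = base64.b64decode(result_commitment["projection"], validate=True)
        stated = result_commitment["projectionDigest"]
    except (KeyError, TypeError, ValueError):
        return False  # missing field or malformed base64: not well formed
    if not isinstance(stated, str):
        return False
    recomputed = hashlib.sha256(projection).hexdigest()
    return recomputed == stated.lower()
```

A record whose projection bytes were substituted after the digest was written fails this check, which is the kind of negative case the conformance vectors are meant to cover. Passing it says nothing about who signed the record.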

@Rul1an

Rul1an commented Jun 7, 2026

This is the right direction, but I would keep the trust boundary tight. vaara verify-record is useful because it shows the record format can be checked independently of the emitter; it should not become the authority for what the format means. The SEP should own that contract.

For review, I would draw the line this way:

  • Vaara is reference implementation evidence and a useful vector producer, not the normative authority.
  • The SEP owns the wire contract, verifier obligations, and conformance fixtures.
  • Keyless conformance covers schema shape, enum values, digest formats, recomputable projectionDigest, optional attestation/backLink binding, and negative substitution/malformed-digest cases.
  • The fixture set should prove the layer split: format-conformant is not issuer-trusted, signature-valid is not time-anchored, time-anchored is not runtime truth, and Vaara-produced records get no special status over records from another implementation.
  • Signer-key verification, time-anchor verification, and any runtime-truth claim stay separate checks.
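
One way to picture the layer split in the list above is a report type where each layer is its own verdict and nothing is inferred across layers. This is only an illustrative sketch: `VerificationReport` and its field names are hypothetical and do not come from the SEP or from Vaara.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerificationReport:
    """Independent verdicts per layer; None means the layer was not checked."""

    format_conformant: bool
    signature_valid: Optional[bool] = None  # requires the signer key path
    time_anchored: Optional[bool] = None    # requires the time-anchor path

    def claims(self) -> list[str]:
        """Return only what was actually established.

        There is deliberately no "runtime truth" claim: no check in this
        sketch can establish it.
        """
        established = []
        if self.format_conformant:
            established.append("format-conformant")
        if self.signature_valid:
            established.append("signature-valid")
        if self.time_anchored:
            established.append("time-anchored")
        return established
```

A keyless run would produce `VerificationReport(format_conformant=True)`, whose `claims()` lists only `format-conformant`; the signature and time-anchor verdicts stay unset until their own checks run.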

@AgentGymLeader

+1 to keeping the normative authority in the SEP rather than in any single reference implementation.

A reference implementation like vaara verify-record is valuable as evidence that the record format can be checked independently of the emitter — but the contract itself (the wire format, verifier obligations, and conformance fixtures) should live in the SEP, so that no one implementation's behavior becomes the de facto definition. That separation is also what lets independent implementations interoperate on equal footing, rather than conforming to a single producer.

@vaaraio

vaaraio commented Jun 8, 2026

Author

Strong +1, this is exactly the intent. The normative contract belongs in the SEP: the wire schema, the verifier obligations, the projectionDigest = sha256(projection) binding, and the conformance fixtures. vaara verify-record is deliberately keyless and producer-agnostic. It checks any record that claims the format, including records Vaara never emitted, so the spec stays the definition and the tool is just one checker of it.

The conformance vectors ship as fixtures anyone can run from a clean checkout. An independent developer reproduced the full suite from scratch with no shared code, which is the equal-footing interop you're describing: implementations conform to the SEP, not to each other.

@AgentGymLeader

Thanks, that’s exactly the boundary I was hoping to preserve.

vaara verify-record is useful as implementation evidence, but the SEP should stay the source of truth for the wire schema, verifier obligations, and conformance fixtures. That keeps the format producer-agnostic and gives independent implementations a shared target without turning any one checker into the definition.

@localden added the SEP and draft (SEP proposal with a sponsor) labels on Jun 8, 2026
@localden added the proposal (SEP proposal without a sponsor) label and removed the draft (SEP proposal with a sponsor) label on Jun 8, 2026
@vaaraio

vaaraio commented Jun 8, 2026

Author

Implementation note: the reference implementation now reads the records from the adjacent proposals into this SEP's evidence model. vaara normalize takes a SEP-2643 authorization denial, a SEP-2787 tool-call attestation, or a SEP-2817 invocation audit context and reports which part of an execution record each one establishes and what is left unproven: a denial is a refused outcome, an attestation fixes the back-link a receipt pins, and invocation context is advisory input that is never treated as authorization evidence. Conformance vectors built from each proposal's own examples and a dependency-free checker that reproduces the mapping are in the repo for anyone who wants to check it (vaaraio/vaara#221).
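
The mapping described above can be sketched as a small lookup. The enum, the dictionary, and the function names here are hypothetical illustrations of the three cases in the comment; they are not the interface of `vaara normalize`.

```python
from enum import Enum


class Source(Enum):
    # Input kinds named in the comment above; the string values are made up.
    SEP_2643_DENIAL = "sep-2643-authorization-denial"
    SEP_2787_ATTESTATION = "sep-2787-tool-call-attestation"
    SEP_2817_CONTEXT = "sep-2817-invocation-audit-context"


# What each input establishes about an execution record, paraphrasing the comment.
ESTABLISHES = {
    Source.SEP_2643_DENIAL: "refused outcome",
    Source.SEP_2787_ATTESTATION: "back-link that a receipt pins",
    Source.SEP_2817_CONTEXT: "advisory input",
}


def establishes(source: Source) -> str:
    """Report which part of an execution record this input establishes."""
    return ESTABLISHES[source]


def is_advisory_only(source: Source) -> bool:
    """Invocation context is advisory and never counts as authorization evidence."""
    return source is Source.SEP_2817_CONTEXT
```

Keeping the advisory flag separate from the lookup means a consumer cannot accidentally promote client-asserted context into evidence by reading the mapping alone.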

Labels

proposal SEP proposal without a sponsor. SEP

8 participants