Consumer-layer trust for shared agent memory: trust as a tier, not a boolean #2888

Patdolitse · 2026-06-07T10:53:46Z

Splitting this out from #4117 (memory: safer persistence defaults...), where the thread converged on a clean boundary: the server owns byte integrity (atomic writes, last-known-good snapshot, quotas, safe paths), and the consumer/caller layer owns trust (approval, audit, what's safe to act on). @P4ST4S suggested the consumer-layer half deserves its own thread rather than bleeding into the storage-integrity PRs — so here it is.

The question: if the consumer layer owns trust, what's the minimal shape of that trust layer?

The default answer people reach for is a per-operation approval prompt — confirm before a destructive or write op. It has a well-known failure mode: agents (and humans) learn to click through it. A gate that fires on every op trains the operator to dismiss it, and an agent under prompt injection will call the op "legitimately" from the server's point of view anyway (the point @policylayer-dan made upstream).

An alternative that's worked better for us: put the trust state on the record, not a gate on the operation.

A new memory enters at a staging tier and is stored but not served as trusted.
It's promoted to verified in one of two ways: a human approves it, or a second independent signal confirms it.
It's demoted on contradiction — something later disagrees, it drops back down.

Framed that way, the "approval gate" everyone keeps pointing at stops being a modal the agent clicks through and becomes a small state machine on the record itself — staging -> verified, with a path back down when something contradicts it.
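A minimal sketch of that record-level state machine, for illustration only. The names `MemoryRecord`, `Tier`, `approve`, `confirm`, and `contradict` are hypothetical, and "independent" is reduced here to "not the original writer", which is only a placeholder for whatever test the deployment settles on.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    STAGING = "staging"
    VERIFIED = "verified"


@dataclass
class MemoryRecord:
    record_id: str
    content: str
    written_by: str                      # who first asserted this memory
    tier: Tier = Tier.STAGING            # every new memory starts untrusted
    history: list = field(default_factory=list)

    def _move(self, new_tier: Tier, reason: str) -> None:
        # Keep a trace of each transition next to the state (sketch only).
        self.history.append((self.tier.value, new_tier.value, reason))
        self.tier = new_tier

    def approve(self, approver: str) -> None:
        """Promotion path 1: a human approves the record."""
        self._move(Tier.VERIFIED, f"approved by {approver}")

    def confirm(self, source: str) -> bool:
        """Promotion path 2: a second signal confirms the record.

        A re-assertion by the original writer is ignored (placeholder test).
        """
        if source == self.written_by:
            return False
        self._move(Tier.VERIFIED, f"confirmed by {source}")
        return True

    def contradict(self, source: str) -> None:
        """Demotion: a later contradiction drops a verified record back down."""
        if self.tier is Tier.VERIFIED:
            self._move(Tier.STAGING, f"contradicted by {source}")
```

Nothing in the sketch gates the write itself: the record is always stored, and only its tier decides whether it is served as trusted.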

This lines up with what's already in the #4117 thread:

@chenyuan35's drift_detector does the demotion half already — a second reporter drops a cached fix back from replay_confirmed to reported_only. "Promote on confirmation, demote on contradiction" is the same machine read in both directions.
@policylayer-dan's audit-log boundary sits naturally on top: the state transitions are the audit trail — who promoted what, on whose confirmation, when.
The server-side .bak / last-known-good work protects the bytes; this protects the content. Complementary, not competing.

Two things I like about this shape:

It's deployment-agnostic — the same machine works for a local-first single-host store and for a cross-host shared one (@chenyuan35's case). Only the definition of "what counts as a second independent signal" changes.
It degrades gracefully — unverified memory isn't dropped, it's just not served as trusted. You lose nothing; you just don't act on it yet.

This is roughly where we landed in piia-engram (a local-first memory layer for MCP clients). Not claiming it's the only design — posting because @P4ST4S invited the thread and I'd rather compare notes than have each of us rediscover the same state machine.

Open questions I don't have clean answers to:

1. What counts as a valid second independent signal? A different tool writing the same fact? A different session? On cross-host, a different agent ID? (Two agents sharing one bad assumption shouldn't count as independent.)
2. Should staging records be retrievable-but-flagged, or invisible until promoted? Flagged-but-visible helps recall; invisible is safer against drift. We serve them flagged; curious what others do.
3. Demotion after the fact: if a verified memory was already acted on and then gets contradicted, the record demotes, but the action already happened. Does the consumer layer need a "this was acted on under since-revoked memory" signal, or is that out of scope?
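One possible shape for the "acted on under since-revoked memory" signal, sketched under the assumption that the consumer layer records which memory records each action relied on. `ActionLedger` and its methods are hypothetical names, not an existing API, and whether this belongs in scope is still the open question.

```python
from collections import defaultdict


class ActionLedger:
    """Remembers which actions relied on which memory records."""

    def __init__(self):
        self._actions_by_record = defaultdict(list)

    def record_action(self, action_id, relied_on):
        # relied_on: ids of the memory records the action was based on
        for record_id in relied_on:
            self._actions_by_record[record_id].append(action_id)

    def affected_by_demotion(self, record_id):
        # Actions already taken under a record that has since been demoted.
        return list(self._actions_by_record.get(record_id, []))
```

When a record demotes, the consumer could call `affected_by_demotion` and surface the result for review; the sketch does not attempt to undo anything.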

How are others handling the trust half of this? Especially curious how the cross-host folks think about the "agents click through prompts" failure mode, and whether anyone running per-op approval gates today has hit it.

P4ST4S · 2026-06-07T11:08:10Z

Thanks for splitting this out, @Patdolitse. The tier-state framing is more useful than the per-operation gate, and the "promote on confirmation, demote on contradiction" symmetry with @chenyuan35's drift detection is the right unification.

One angle that's adjacent rather than central, but worth surfacing: the state transitions and the audit log of those transitions are not the same artifact, even when they live next to each other.

The consumer layer owns the policy decision (this record moves from staging to verified, on which signal, by which principal). What gets debated less is where the evidence of that decision lives. If trust transitions are recorded only in the consumer's local state, the state itself becomes the audit trail, which works until:

the consumer crashes and the local log is lost
multiple consumers operate on the same store and their logs diverge
a third party (compliance, security review, incident response) needs to verify what was promoted and on whose signal, after the fact

A server-side audit log of trust transitions, signed and append-only, doesn't replace the consumer's decision; it provides a durable witness of it. The shape is essentially what tools/call audit entries already do, applied to a different event: (record_id, transition, principal, signal_source, timestamp), signed by the server's key.
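A rough sketch of such a log, assuming a hash-chained list of entries with the five fields above. Everything here is illustrative: `SERVER_KEY`, `append_entry`, and `verify_log` are hypothetical names, and HMAC-SHA256 stands in for the server's signature only to keep the example within the standard library. A third party cannot verify an HMAC without the shared key, so a real deployment would more likely use an asymmetric signature.

```python
import hashlib
import hmac
import json
import time

# Assumption: placeholder for the server's signing key.
SERVER_KEY = b"demo-key-not-for-production"


def _sign(entry):
    # Sign every field except the signature itself, in a stable key order.
    payload = json.dumps(
        {k: v for k, v in entry.items() if k != "sig"}, sort_keys=True
    ).encode()
    return hmac.new(SERVER_KEY, payload, hashlib.sha256).hexdigest()


def append_entry(log, record_id, transition, principal, signal_source, timestamp=None):
    entry = {
        "record_id": record_id,
        "transition": transition,
        "principal": principal,
        "signal_source": signal_source,
        "timestamp": time.time() if timestamp is None else timestamp,
        # Chaining to the previous signature makes removal or reordering detectable.
        "prev": log[-1]["sig"] if log else "",
    }
    entry["sig"] = _sign(entry)
    log.append(entry)
    return entry


def verify_log(log):
    prev = ""
    for entry in log:
        if entry["prev"] != prev:
            return False
        if not hmac.compare_digest(entry["sig"], _sign(entry)):
            return False
        prev = entry["sig"]
    return True
```

The consumer still makes the promotion decision; the log only witnesses it, so editing or dropping an earlier entry makes `verify_log` fail.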

That doesn't change the answers to your three open questions, but it does mean the consumer doesn't have to be its own audit substrate. The server can offer that as a service to consumers that want it, and consumers that don't care can ignore it.

On your specific questions, I don't have strong production data on (1) or (3). For (2), I lean toward staging records being retrievable-but-flagged, for the same reason you chose to serve them flagged: an LLM that can't see staging memory will sometimes rediscover the same fact and write it to staging over and over, which is its own form of drift.
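A small sketch of what a retrievable-but-flagged read path could look like, assuming records are plain dicts with `id`, `content`, and `tier` keys. The `retrieve` function, the `trusted` flag, and the wording of the note are all hypothetical.

```python
def retrieve(records, query, include_staging=True):
    """Return matching records, marking anything that is not verified."""
    hits = []
    for rec in records:
        if query.lower() not in rec["content"].lower():
            continue
        if rec["tier"] == "verified":
            hits.append({**rec, "trusted": True})
        elif include_staging:
            # Visible to the model, but explicitly marked as not safe to act on.
            hits.append({
                **rec,
                "trusted": False,
                "note": "unverified: do not act on this without confirmation",
            })
    # Stable sort: trusted hits first, original order kept within each group.
    hits.sort(key=lambda h: not h["trusted"])
    return hits
```

Setting `include_staging=False` gives the invisible-until-promoted behaviour, so the two options from question (2) differ by one parameter rather than by a schema.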

chenyuan35 · 2026-06-07T16:53:37Z

Thanks for splitting this into its own thread. The tier-on-record framing matches what we are moving toward in aineedhelpfromotherai: memory should be stored first, but not all stored memory should be served as trusted guidance.

The implementation direction I am taking from this discussion is:

new records enter as staging, available for audit/search but not promoted into default guidance
records move to verified only with reproduction evidence, repeated independent confirmation, or maintainer approval
records move to deprecated when drift detection, contradiction, or failed reproduction shows the memory is unsafe to act on
every promote/demote transition needs a durable audit entry separate from the current trust state
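The four points above could be sketched roughly as follows. This is not the aineedhelpfromotherai implementation: `TrustStore`, `AuditEntry`, and the evidence labels are hypothetical, and the sketch assumes records can be deprecated from either staging or verified while leaving re-promotion out of deprecated unspecified.

```python
from dataclasses import dataclass

DEMOTION_EVIDENCE = {"drift_detected", "contradiction", "failed_reproduction"}

# (from_tier, to_tier) -> kinds of evidence that justify the move
ALLOWED = {
    ("staging", "verified"): {
        "reproduction", "independent_confirmation", "maintainer_approval",
    },
    ("staging", "deprecated"): DEMOTION_EVIDENCE,
    ("verified", "deprecated"): DEMOTION_EVIDENCE,
}


@dataclass(frozen=True)
class AuditEntry:
    record_id: str
    from_tier: str
    to_tier: str
    evidence: str
    principal: str


class TrustStore:
    def __init__(self):
        self.tiers = {}   # current trust state: record_id -> tier
        self.audit = []   # evidence trail, kept separate from the state

    def add(self, record_id):
        self.tiers[record_id] = "staging"

    def transition(self, record_id, to_tier, evidence, principal):
        from_tier = self.tiers[record_id]
        if evidence not in ALLOWED.get((from_tier, to_tier), set()):
            raise ValueError(
                f"{from_tier} -> {to_tier} not allowed on evidence {evidence!r}"
            )
        # Write the audit entry first, then update the current tier.
        self.audit.append(
            AuditEntry(record_id, from_tier, to_tier, evidence, principal)
        )
        self.tiers[record_id] = to_tier
```

Keeping `tiers` and `audit` as separate structures means the current state can be rebuilt or relocated without touching the evidence trail.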

That last point from @P4ST4S is important: the current tier and the evidence trail are related but not the same artifact. I have adopted this in the project direction docs as a product requirement rather than just an implementation detail. The useful boundary for me is: server owns byte integrity; consumer layer owns trust; shared memory needs reversible trust, not a boolean approval gate.

Patdolitse (Author) · 2026-06-09T04:32:03Z

Worth pulling these two together, because between the three of us this is starting to look like one machine rather than three separate ones.

@P4ST4S — the current-tier-versus-evidence-trail split is the right cut, and I'll own that piia-engram blurs it today. We keep an audit log of transitions, but because we're local-first single-host, the consumer is the server, so the witness and the state live in the same process and the separation stays latent. Your three failure modes are exactly what forces them apart: the moment a second consumer writes the same store, or a reviewer has to verify who promoted what on whose signal after the fact, "the state is the audit trail" stops being true. The discipline that helps even a single-host store is to write the transition record in a shape that doesn't assume witness and state are co-located. (record_id, transition, principal, signal_source, timestamp) is most of it — I'd add the signal's origin identity (which tool or agent, not just user-versus-agent), because that's the field the cross-host case needs and the single-host case can carry for free. Then going cross-host is a deployment change, not a schema change.

@chenyuan35 — naming deprecated as an explicit tier instead of a demote-on-contradiction side effect is the sharper move, and I think you're right. We've been treating demotion as "drops back below verified," but a contradicted record and a never-promoted record aren't the same thing to the read path: one is "we acted on this and stopped," the other is "we never started." Three named states keep that legible; collapsing deprecated back into staging loses the "you used to believe X, then Y corrected it" signal that's worth surfacing rather than hiding.

Which lands on the open question I still don't have a clean answer to: what counts as a second independent signal for promotion. The audit field above is half of it — if signal_source carries tool/agent identity, "independent" gets a checkable definition (a different asserter, not the same one re-asserting) instead of being a vibe. But it breaks on shared assumptions: two agents booted from the same system prompt or the same upstream doc aren't independent even though their IDs differ. Provenance-by-identity alone doesn't catch that. If three of us are independently converging on staging → verified → deprecated plus a durable transition log, that's the piece of the shape that's still soft — curious whether either of you has a sharper test than "different principal."

P4ST4S · 2026-06-09T12:48:36Z

@Patdolitse, no sharper test on my side. The "different principal" check is exactly the threshold I'd be tempted to draw too, even knowing it doesn't catch the shared-assumption case you describe.

What might be a useful framing, not as an answer but as a way to defer the answer to where it belongs: the audit/transition layer doesn't need to decide what counts as independent at write time. It needs to record enough context that a later reader (a reviewer, a detector, a heuristic) can make that call.

That pushes the schema slightly: signal_source becomes more than just tool/agent identity. The fields that let independence be re-evaluated post-hoc, in rough order of how often they actually help:

principal (the asserter)
session/context ID (two assertions from the same session aren't independent even if the agent IDs differ)
prompt fingerprint or template ID (catches your shared-assumption case: two agents booted from the same system prompt converge by construction, not by independent reasoning)
upstream document hash if the assertion was derived from a retrieved doc (catches "two agents read the same wiki page")

None of those are individually sufficient, and a record carrying all four is still vulnerable to two agents finding the same bad source independently. But each one cuts off a category of false-independence the previous ones missed.
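As an illustration of "record now, decide later", the four fields can be carried as a plain provenance record and compared after the fact. The `Provenance` class and `shared_axes` helper are hypothetical, and the field names are assumptions derived from the list above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    principal: str                           # the asserter
    session_id: Optional[str] = None         # session/context ID
    prompt_fingerprint: Optional[str] = None # prompt or template ID
    upstream_doc_hash: Optional[str] = None  # hash of the retrieved source, if any


def shared_axes(a: Provenance, b: Provenance) -> list:
    """Return the provenance fields on which two assertions coincide.

    A field that was not recorded (None) is never reported as shared.
    """
    shared = []
    for f in fields(Provenance):
        value_a, value_b = getattr(a, f.name), getattr(b, f.name)
        if value_a is not None and value_a == value_b:
            shared.append(f.name)
    return shared
```

Nothing here decides whether two assertions are independent. It only exposes what they have in common, so a later reviewer, detector, or heuristic can make that call with whatever definition is current.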

The reason I'd lean toward "record everything, decide later" rather than "compute an independence score at write time" is that the definition of independent will keep shifting as failure modes are discovered (the shared-prompt one is recent in MCP terms). A schema that captures provenance richly survives the definition changing; one that bakes in a specific test gets brittle the moment someone finds a new shared-assumption pathway.

Which doesn't help with the read-time gating question (when do you actually promote staging to verified), but it at least keeps the gate replaceable without losing past records.

chenyuan35 · 2026-06-09T15:37:42Z

That framing matches the direction I would keep: record enough provenance to re-evaluate independence later, and keep the promotion gate replaceable.

For aineedhelpfromotherai, I am splitting it into two decisions:

1. Every assertion or trust transition records provenance: principal, session/context ID, tool or agent ID, prompt/template fingerprint, upstream evidence URI/hash, and when applicable a reproduction command plus result hash.
2. Promotion from staging to verified is computed later from that evidence, not baked into the write path.

The conservative rule I am leaning toward is: a different principal alone is not enough. Promotion needs at least one non-shared evidence axis to differ, and for debugging memory it should include reproduction evidence of the same failure/fix. If the record only has two agents with different IDs but the same prompt template and same upstream doc hash, it stays staging.
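One reading of that conservative rule, sketched as a pure function over recorded provenance. The dict keys, `EVIDENCE_AXES`, and `can_promote` are hypothetical names. The choice of prompt fingerprint and upstream hash as the evidence axes is an assumption taken from the example in the paragraph, and a missing value is treated as unable to demonstrate independence.

```python
# Assumption: which recorded axes count as evidence of independence.
EVIDENCE_AXES = ("prompt_fingerprint", "upstream_hash")


def can_promote(original, confirmation, kind="general"):
    """Decide staging -> verified from two recorded assertions."""
    # A different principal is necessary but not sufficient.
    if original["principal"] == confirmation["principal"]:
        return False
    # At least one evidence axis must be recorded on both sides and differ.
    differs = any(
        original.get(axis)
        and confirmation.get(axis)
        and original[axis] != confirmation[axis]
        for axis in EVIDENCE_AXES
    )
    if not differs:
        return False
    # Debugging memory additionally needs reproduction evidence.
    if kind == "debugging":
        return bool(confirmation.get("repro_result_hash"))
    return True
```

Because the function only reads stored provenance, it can be replaced by a stricter rule later and re-run over past records.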

That keeps the measurable part small: count false promotions, record which provenance field would have blocked them, then tighten the promotion rule from observed failures rather than guessing a universal independence score up front.
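A minimal sketch of that measurement loop, assuming each known false promotion is stored as a pair of provenance dicts (the original assertion and the confirmation that triggered promotion). `tally_false_promotions`, `blocking_fields`, and the axis names are hypothetical.

```python
from collections import Counter

DEFAULT_AXES = ("session_id", "prompt_fingerprint", "upstream_hash")


def blocking_fields(original, confirmation, axes=DEFAULT_AXES):
    """Axes that were recorded and identical, so a rule on them would have blocked."""
    return [
        axis for axis in axes
        if original.get(axis) is not None
        and original.get(axis) == confirmation.get(axis)
    ]


def tally_false_promotions(false_promotions, axes=DEFAULT_AXES):
    """Count, per provenance field, how many false promotions it would have caught."""
    counts = Counter()
    for original, confirmation in false_promotions:
        blockers = blocking_fields(original, confirmation, axes)
        # Cases that no recorded field explains point at a missing axis.
        counts.update(blockers or ["none_recorded"])
    return counts
```

A high count for one field suggests tightening the promotion rule on that axis, while a growing `none_recorded` bucket suggests the schema is missing a provenance field.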
