Consumer-layer trust for shared agent memory: trust as a tier, not a boolean #2888
Replies: 5 comments
Thanks for splitting this out @Patdolitse, the tier-state framing is more useful than the per-operation gate, and the "promote on confirmation, demote on contradiction" symmetry with @chenyuan35's drift detection is the right unification. One angle that's adjacent rather than central, but worth surfacing: the state transitions and the audit log of those transitions are not the same artifact, even when they live next to each other. The consumer layer owns the policy decision (this record moves from
A server-side audit log of trust transitions, signed and append-only, doesn't replace the consumer's decision; it provides a durable witness of it. The shape is essentially what
That doesn't change the answers to your three open questions, but it does mean the consumer doesn't have to be its own audit substrate. The server can offer that as a service to consumers that want it, and consumers that don't care can ignore it.
On your specific questions, I don't have strong production data on (1) or (3). For (2), I lean toward staging records being retrievable-but-flagged for the same reason your
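A minimal sketch of what a "signed and append-only" witness log could look like, not a description of any existing server. Everything here is an assumption made for illustration: the `TransitionLog` class, its field names, and the use of an HMAC as a stand-in for a real signature (a cross-host deployment would more likely want asymmetric signatures so that readers cannot forge entries).

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


class TransitionLog:
    """Hypothetical append-only log of trust transitions, hash-chained and MAC'd."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._entries = []

    def _mac(self, body: dict) -> str:
        # Canonical JSON so the same body always produces the same MAC.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def append(self, record_id: str, from_tier: str, to_tier: str,
               actor: str, reason: str) -> dict:
        # Each entry commits to the previous entry's MAC, forming a chain.
        prev = self._entries[-1]["mac"] if self._entries else GENESIS
        body = {
            "record_id": record_id,
            "from": from_tier,
            "to": to_tier,
            "actor": actor,
            "reason": reason,
            "prev": prev,
        }
        entry = {**body, "mac": self._mac(body)}
        self._entries.append(entry)
        return dict(entry)  # hand back a copy; the log keeps its own

    def verify(self) -> bool:
        # Recompute every MAC and check that the chain links are intact.
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "mac"}
            if body["prev"] != prev:
                return False
            if not hmac.compare_digest(entry["mac"], self._mac(body)):
                return False
            prev = entry["mac"]
        return True


log = TransitionLog(key=b"demo-key-not-for-production")
log.append("r1", "staging", "verified", "consumer-a", "human_approval")
log.append("r1", "verified", "staging", "consumer-b", "contradiction")
```

The log records who moved which record and why, but never decides anything itself, which keeps the policy decision with the consumer and leaves the server as a witness only.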
Thanks for splitting this into its own thread. The tier-on-record framing matches what we are moving toward in aineedhelpfromotherai: memory should be stored first, but not all stored memory should be served as trusted guidance. The implementation direction I am taking from this discussion is:
That last point from @P4ST4S is important: the current tier and the evidence trail are related but not the same artifact. I have adopted this in the project direction docs as a product requirement rather than just an implementation detail. The useful boundary for me is: server owns byte integrity; consumer layer owns trust; shared memory needs reversible trust, not a boolean approval gate.
Worth pulling these two together, because between the three of us this is starting to look like one machine rather than three separate ones.
@P4ST4S: the current-tier-versus-evidence-trail split is the right cut, and I'll own that piia-engram blurs it today. We keep an audit log of transitions, but because we're local-first single-host, the consumer is the server, so the witness and the state live in the same process and the separation stays latent. Your three failure modes are exactly what forces them apart: the moment a second consumer writes the same store, or a reviewer has to verify who promoted what on whose signal after the fact, "the state is the audit trail" stops being true. The discipline that helps even a single-host store is to write the transition record in a shape that doesn't assume witness and state are co-located.
@chenyuan35: naming
Which lands on the open question I still don't have a clean answer to: what counts as a second independent signal for promotion. The audit field above is half of it: if
@Patdolitse no sharper test on my side, and the "different principal" check is exactly the threshold I'd be tempted to draw too, knowing it doesn't catch the shared-assumption case you describe. What might be a useful framing, not as an answer but as a way to defer the answer to where it belongs: the audit/transition layer doesn't need to decide what counts as independent at write time. It needs to record enough context that a later reader (a reviewer, a detector, a heuristic) can make that call. That pushes the schema slightly:
None of those are individually sufficient, and a record carrying all four is still vulnerable to two agents finding the same bad source independently. But each one cuts off a category of false-independence the previous ones missed.
The reason I'd lean toward "record everything, decide later" rather than "compute an independence score at write time" is that the definition of independent will keep shifting as failure modes are discovered (the shared-prompt one is recent in MCP terms). A schema that captures provenance richly survives the definition changing; one that bakes in a specific test gets brittle the moment someone finds a new shared-assumption pathway.
Which doesn't help with the read-time gating question (when do you actually promote
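To make "record everything, decide later" concrete, here is a small sketch under stated assumptions. The `Signal` fields are hypothetical stand-ins (the actual schema proposed in this comment is not reproduced here), and both independence tests are illustrative. The point is only that the stored context stays fixed while the test applied to it can be swapped.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Signal:
    """Context captured at write time; no independence verdict is stored."""
    principal: str                               # who reported or confirmed
    prompt_template_hash: Optional[str] = None   # hypothetical field
    upstream_doc_hash: Optional[str] = None      # hypothetical field


# An independence test is just a function over two stored signals.
IndependenceTest = Callable[[Signal, Signal], bool]


def different_principal(a: Signal, b: Signal) -> bool:
    # The naive threshold: two distinct identities.
    return a.principal != b.principal


def different_principal_and_source(a: Signal, b: Signal) -> bool:
    # A later, stricter definition; unknown sources never count as different.
    known = None not in (a.upstream_doc_hash, b.upstream_doc_hash)
    return (
        a.principal != b.principal
        and known
        and a.upstream_doc_hash != b.upstream_doc_hash
    )


# Two agents with different IDs that read the same upstream document.
stored = (
    Signal("agent-a", "tmpl-1", "doc-9"),
    Signal("agent-b", "tmpl-1", "doc-9"),
)

naive_verdict = different_principal(*stored)
strict_verdict = different_principal_and_source(*stored)
```

The same stored pair passes the naive test and fails the stricter one, without any change to what was written. A schema that had stored only a computed score could not be re-evaluated this way.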
That framing matches the direction I would keep: record enough provenance to re-evaluate independence later, and keep the promotion gate replaceable. For aineedhelpfromotherai, I am splitting it into two decisions:
The conservative rule I am leaning toward is: a different principal alone is not enough. Promotion needs at least one non-shared evidence axis to differ, and for debugging memory it should include reproduction evidence of the same failure/fix. If the record only has two agents with different IDs but the same prompt template and same upstream doc hash, it stays
That keeps the measurable part small: count false promotions, record which provenance field would have blocked them, then tighten the promotion rule from observed failures rather than guessing a universal independence score up front.
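The rule above can be sketched roughly as follows. This is one reading of it, not the project's implementation: the axis names in `EVIDENCE_AXES`, the `repro_evidence` key, the `"debugging"` kind label, and both function names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical provenance axes; "principal" is deliberately not one of them,
# because a different principal alone is not enough.
EVIDENCE_AXES = ("prompt_template", "upstream_doc_hash")


def may_promote(first: dict, second: dict, kind: str = "general") -> bool:
    """Return True if the second signal is allowed to promote the record."""
    if first.get("principal") == second.get("principal"):
        return False
    # At least one evidence axis must be known on both sides and differ.
    differs = any(
        first.get(axis) is not None
        and second.get(axis) is not None
        and first[axis] != second[axis]
        for axis in EVIDENCE_AXES
    )
    if not differs:
        return False
    # Debugging memory additionally needs reproduction evidence.
    if kind == "debugging" and not second.get("repro_evidence"):
        return False
    return True


def blocking_axes(first: dict, second: dict) -> list:
    """For a promotion later found to be false: which axes were shared?"""
    return [
        axis for axis in EVIDENCE_AXES
        if first.get(axis) is not None and first.get(axis) == second.get(axis)
    ]


a = {"principal": "agent-a", "prompt_template": "t1", "upstream_doc_hash": "d1"}
b = {"principal": "agent-b", "prompt_template": "t1", "upstream_doc_hash": "d1"}
c = {"principal": "agent-c", "prompt_template": "t2", "upstream_doc_hash": "d1",
     "repro_evidence": "run-41"}

same_everything = may_promote(a, b)            # different IDs only
with_repro = may_promote(a, c, "debugging")    # one axis differs, repro present

# Suppose the a/c promotion later turns out to be false: tally the shared axes.
false_promotion_blockers = Counter()
false_promotion_blockers.update(blocking_axes(a, c))
```

The tally is the measurable part: each false promotion adds to the count for whichever axis was shared, and the axis with the highest count is the next candidate to require in `may_promote`.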
Splitting this out from #4117 (memory: safer persistence defaults...), where the thread converged on a clean boundary: the server owns byte integrity (atomic writes, last-known-good snapshot, quotas, safe paths), and the consumer/caller layer owns trust (approval, audit, what's safe to act on). @P4ST4S suggested the consumer-layer half deserves its own thread rather than bleeding into the storage-integrity PRs — so here it is.
The question: if the consumer layer owns trust, what's the minimal shape of that trust layer?
The default answer people reach for is a per-operation approval prompt — confirm before a destructive or write op. It has a well-known failure mode: agents (and humans) learn to click through it. A gate that fires on every op trains the operator to dismiss it, and an agent under prompt injection will call the op "legitimately" from the server's point of view anyway (the point @policylayer-dan made upstream).
An alternative that's worked better for us: put the trust state on the record, not a gate on the operation.
- `staging` tier and is stored but not served as trusted.
- `verified` two ways: a human approves it, or a second independent signal confirms it.
Framed that way, the "approval gate" everyone keeps pointing at stops being a modal the agent clicks through and becomes a small state machine on the record itself: `staging -> verified`, with a path back down when something contradicts it.
This lines up with what's already in the #4117 thread:
- `drift_detector` does the demotion half already: a second reporter drops a cached fix back from `replay_confirmed` to `reported_only`. "Promote on confirmation, demote on contradiction" is the same machine read in both directions.
- `.bak` / last-known-good work protects the bytes; this protects the content. Complementary, not competing.
Two things I like about this shape:
This is roughly where we landed in piia-engram (a local-first memory layer for MCP clients). Not claiming it's the only design — posting because @P4ST4S invited the thread and I'd rather compare notes than have each of us rediscover the same state machine.
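For readers comparing notes, a minimal sketch of the record-level state machine described in this post. It is not piia-engram's code; the `Tier` enum, the `MemoryRecord` class, and the reason strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    STAGING = "staging"    # stored, but not served as trusted
    VERIFIED = "verified"  # served as trusted guidance


@dataclass
class MemoryRecord:
    record_id: str
    content: str
    tier: Tier = Tier.STAGING                    # every new record starts untrusted
    history: list = field(default_factory=list)  # (from, to, reason) tuples

    def _move(self, to: Tier, reason: str) -> None:
        self.history.append((self.tier.value, to.value, reason))
        self.tier = to

    def promote(self, reason: str) -> None:
        """staging -> verified: human approval or a second independent signal."""
        if self.tier is Tier.STAGING:
            self._move(Tier.VERIFIED, reason)

    def demote(self, reason: str) -> None:
        """verified -> staging: something contradicted the record."""
        if self.tier is Tier.VERIFIED:
            self._move(Tier.STAGING, reason)


rec = MemoryRecord("r1", "restart the worker after rotating the key")
rec.promote("human_approval")
rec.demote("contradicted_by_second_reporter")
```

Trust lives on the record rather than on the operation, so there is no per-call prompt to click through, and the path back down uses the same mechanism as the path up.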
Open questions I don't have clean answers to:
- Should `staging` records be retrievable-but-flagged, or invisible until promoted? Flagged-but-visible helps recall; invisible is safer against drift. We serve them flagged; curious what others do.
- If `verified` memory was already acted on and then gets contradicted, the record demotes, but the action already happened. Does the consumer layer need a "this was acted on under since-revoked memory" signal, or is that out of scope?
How are others handling the trust half of this? Especially curious how the cross-host folks think about the "agents click through prompts" failure mode, and whether anyone running per-op approval gates today has hit it.
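To ground the two questions about `staging` visibility and already-acted-on memory, here is one possible shape, offered as a sketch rather than an answer. The `Store` class, its method names, and the tuple layout of the action ledger are all hypothetical. Whether the "acted on under since-revoked memory" signal belongs in the consumer layer at all remains the open question.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    tiers: dict = field(default_factory=dict)    # record_id -> "staging" | "verified"
    actions: list = field(default_factory=list)  # (action, record_id, tier_at_action)

    def retrieve(self, record_id: str) -> dict:
        """Retrievable-but-flagged: staging records are returned, marked untrusted."""
        tier = self.tiers[record_id]
        return {"record_id": record_id, "tier": tier, "trusted": tier == "verified"}

    def note_action(self, action: str, record_id: str) -> None:
        # Remember which tier the record had at the moment it was acted on.
        self.actions.append((action, record_id, self.tiers[record_id]))

    def demote(self, record_id: str) -> list:
        """Demote and return actions taken while the record was verified."""
        self.tiers[record_id] = "staging"
        return [a for a in self.actions if a[1] == record_id and a[2] == "verified"]


store = Store(tiers={"r1": "verified", "r2": "staging"})
flagged = store.retrieve("r2")          # visible, but flagged as untrusted
store.note_action("applied_fix", "r1")  # acted on while r1 was verified
affected = store.demote("r1")           # r1 is later contradicted
```

Demotion cannot undo the action, but returning the affected list at least gives the consumer something to surface or review.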