
Add Memento provider#43

Open
veerps57 wants to merge 1 commit into supermemoryai:main from veerps57:add-memento-provider

Conversation

@veerps57

Summary

Adds Memento — a local-first, MCP-native memory layer published on the MCP registry — as a fifth Provider option in memorybench, alongside Supermemory, Mem0, Zep, and the filesystem baseline. Memento runs as a stdio subprocess against a single SQLite database; this integration exercises it through that same MCP transport, so the bench measures the realistic shape (process spawn, JSON-RPC over stdio, async write + auto-embed, hybrid FTS+vector retrieval).

bun run src/index.ts run -p memento -b locomo

No API key required for Memento itself. The provider performs a per-session distillation step via an LLM call before handing candidates to Memento's extract_memory tool; the model is configured via the MEMENTO_DISTILL_MODEL env var and falls back to memorybench's DEFAULT_ANSWERING_MODEL constant (gpt-4o) when unset. So an OpenAI key is needed for the default distill model; set MEMENTO_DISTILL_MODEL to use a different family.
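
Resolution of the distill model is a one-liner; as an illustrative sketch (the env-var name and the gpt-4o fallback are from this description, the helper function itself is hypothetical):

```typescript
// Illustrative sketch of the distill-model resolution described above.
// DEFAULT_ANSWERING_MODEL and MEMENTO_DISTILL_MODEL come from this PR text;
// the helper function itself is hypothetical.
const DEFAULT_ANSWERING_MODEL = "gpt-4o";

function resolveDistillModel(env: Record<string, string | undefined>): string {
  const configured = env.MEMENTO_DISTILL_MODEL?.trim();
  return configured && configured.length > 0 ? configured : DEFAULT_ANSWERING_MODEL;
}
```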

Motivation

Memento was published to the MCP registry yesterday — a local-first, MCP-native memory layer for AI assistants that need durable, structured memory. Publishing a memory project on the registry without numbers next to the established ones felt incomplete; this PR proposes the integration so Memento can be measured against Supermemory, Mem0, Zep, and the filesystem baseline on the same harness, datasets, and judges. The honest way to answer "how does it compare?" is to run it through your bench and let the numbers speak.

We're not asking for any reorientation of the bench. Memento slots into the existing Provider interface unchanged. The harness, datasets, judges, and other providers are not touched.

How the provider works

MementoProvider implements the five-method Provider interface:

  1. initialize — spawns memento serve --db <tmp> via @modelcontextprotocol/sdk's StdioClientTransport and asserts the required MCP tools (extract_memory, search_memory, forget_many_memories) are present on tools/list. It then runs one warmup write+forget pair so the embedder model is loaded before the first benchmark question, and calls .unref() on the spawned child process and its stdio pipes so the Node event loop doesn't keep waiting on the subprocess after the bench's main work completes (without this, the bench hangs at exit on Run complete!).

  2. ingest — for each session in the UnifiedSession, calls an LLM to distill the transcript into structured {kind, content, summary?} candidates, then hands the batch to Memento's extract_memory MCP tool. Memento embeds, scrubs, dedups, and persists. The distillation model is read from MEMENTO_DISTILL_MODEL and falls back to memorybench's DEFAULT_ANSWERING_MODEL constant when unset. Memories land under scope = {type: 'workspace', path: '/memorybench/<containerTag>'} (Memento's session scope requires a ULID, while memorybench's containerTag is an arbitrary string — workspace scope is the right isolation primitive). Each candidate carries benchmark:memorybench, session:<id>, and (when present) session-date:<iso> tags. A per-run distillation cache keyed by session.sessionId deduplicates LLM calls when questions in the same conversation share sessions; the cached output is whatever the first distill produced for that session.

  3. awaitIndexing — polls search_memory on the question's scope until every result has embeddingStatus !== 'pending' (or MEMENTO_AWAIT_INDEXING_MS, default 180s, elapses). Memento's auto-embed runs fire-and-forget after each write; this is the bridge.

  4. search — runs search_memory with the question's scope filter, projection: 'full', and limit: this.searchLimit (default 30, overridable via MEMENTO_BENCH_SEARCH_LIMIT). The 30 default is the same number Supermemory and Mem0 use — every distill-style provider that I read in this repo overrides the orchestrator's limit: 10 to give the answering model a richer haystack.

  5. clear — calls forget_many_memories with the per-question scope as the filter. The orchestrator doesn't call this in normal runs, but it's there for partial-rerun recovery.
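
The per-run distillation cache described in step 2 can be sketched as follows (names and shapes here are assumptions from this description, not the actual implementation); caching the in-flight promise, rather than the resolved value, means concurrent questions that share a session still trigger only one LLM call:

```typescript
// Illustrative sketch of a per-run distillation cache keyed by session id.
// Candidate shape and names are assumptions based on this PR text.
interface Candidate {
  kind: string;
  content: string;
  summary?: string;
}

type DistillFn = (sessionId: string, transcript: string) => Promise<Candidate[]>;

class DistillationCache {
  private cache = new Map<string, Promise<Candidate[]>>();
  private distill: DistillFn;

  constructor(distill: DistillFn) {
    this.distill = distill;
  }

  // Returns the cached result when the session was already distilled this run.
  // Caching the promise (not the value) also dedups concurrent callers.
  get(sessionId: string, transcript: string): Promise<Candidate[]> {
    let pending = this.cache.get(sessionId);
    if (!pending) {
      pending = this.distill(sessionId, transcript);
      this.cache.set(sessionId, pending);
    }
    return pending;
  }
}
```

This matches the "cache hits return exactly what the first distill produced" behaviour described above, since every caller for a session awaits the same promise.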

The provider supplies a custom answerPrompt (src/providers/memento/prompts.ts) that presents each retrieved memory with its score, kind, and session date — the latter being the temporal anchor Memento captures during distillation. Format follows the same shape as filesystem/prompts.ts and supermemory/prompts.ts: structured retrieved-context block + numbered "How to Answer" steps + "I don't know" refusal clause + Reasoning/Answer output template.
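
The awaitIndexing bridge (step 3 above) reduces to a poll-until-settled loop. A minimal sketch, with the search call and result shape assumed from this description rather than taken from the implementation:

```typescript
// Illustrative poll-until-embedded loop for awaitIndexing.
// The searchFn signature and embeddingStatus field are assumptions from the PR text.
interface SearchHit {
  embeddingStatus: "pending" | "ready" | "failed";
}

async function awaitIndexing(
  searchFn: () => Promise<SearchHit[]>,
  deadlineMs: number,
  pollIntervalMs = 250
): Promise<boolean> {
  const deadline = Date.now() + deadlineMs;
  for (;;) {
    const hits = await searchFn();
    if (hits.every((h) => h.embeddingStatus !== "pending")) return true;
    if (Date.now() >= deadline) return false; // give up; search proceeds anyway
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}
```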

What's in the PR

New files (all under src/providers/memento/):

  • index.ts — provider class (MementoProvider), session ingest with the distillation cache, per-question scope mapping, MCP client lifecycle. ~420 lines.
  • distill.ts — the per-session LLM distillation step. Reads MEMENTO_DISTILL_MODEL (falling back to DEFAULT_ANSWERING_MODEL), builds a transcript prompt, parses the JSON response into typed candidates. The prompt codifies six craft rules (preserve specific terms; capture facts about every named participant; emit a candidate for every dated event; capture precursor actions alongside outcomes; don't squash enumerations; bias toward inclusion). ~235 lines.
  • prompts.ts — custom ProviderPrompts with the Memento answerPrompt. ~70 lines.
  • mcp-helpers.ts — parseSearchPage and parseToolResultJson helpers for the MCP-result envelope. ~85 lines.

Edits to existing files:

  • src/providers/index.ts — registers memento: MementoProvider in the providers map.
  • src/types/provider.ts — adds "memento" to the ProviderName union.
  • src/utils/config.ts — getProviderConfig('memento') returns minimal config (Memento has no API key; configuration is via the MEMENTO_* env vars documented in the README).
  • src/cli/index.ts — extends the help providers printer.
  • src/utils/models.ts — adds two new Anthropic aliases (sonnet-4.6 → claude-sonnet-4-6, opus-4.6 → claude-opus-4-6). These are catalog additions, useful to all providers, not Memento-specific — happy to split into a separate PR if you'd prefer; local validation used them as Memento's default answering-model choice, which is why they ride along here.
  • README.md — adds memento to the -p flag's provider list and documents the MEMENTO_* env vars alongside the existing provider config block.
  • package.json — adds @modelcontextprotocol/sdk: ^1.29.0 (the canonical MCP client library; pin matches what Memento itself ships).

Design choices worth flagging

A few choices that look like they could be tuning if you don't read the implementation — happy to revisit any of them:

  • Search returns 30 not 10. The orchestrator passes limit: 10 and every distill-style provider in this repo (supermemory, mem0's fallback) overrides it the same way. The 10 default is used for Hit@K computation; the answering model still benefits from a richer haystack. We follow the cluster norm.
  • Per-session distillation cache. Within a single bench run, the same session's distilled candidates are reused across questions that share it. The cache is in-memory, scoped to one provider instance, and keyed by session.sessionId. Cache hits return exactly what the first distill produced for that session — LLM output isn't strictly deterministic even at temperature=0, so the cache also has the side effect of making within-run distillation reproducible across questions that share sessions.
  • MEMENTO_AWAIT_INDEXING_MS defaults to 180s. Memento's auto-embed is fire-and-forget per write; awaitIndexing polls until every memory in scope is embedded, and the bench proceeds to search once the deadline elapses even if some embeddings are still pending. For runs where many questions × many memories per scope outpace the local CPU embedder (notably the larger LongMemEval questions, which have ~50 sessions × ~25 candidates each), operators can bump this knob so the embedding queue drains before search. Documented in the README.
  • Workspace scope per question. Memento's session scope requires a ULID; memorybench passes arbitrary containerTag strings. Using {type: 'workspace', path: '/memorybench/<containerTag>'} gives us a stable, immutable scope per question that respects Memento's scope-is-immutable rule. The reverse-mapping is the natural fit; happy to discuss alternatives.
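
The containerTag → workspace-scope mapping in the last bullet is a pure function. As a sketch (the scope shape is quoted from this description; the helper name is illustrative):

```typescript
// Illustrative mapping from memorybench's containerTag to a Memento workspace scope.
// Workspace scope is used because Memento's session scope requires a ULID,
// while containerTag is an arbitrary string.
type WorkspaceScope = { type: "workspace"; path: string };

function scopeForContainerTag(containerTag: string): WorkspaceScope {
  return { type: "workspace", path: `/memorybench/${containerTag}` };
}
```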

Testing

Manually tested end-to-end against LoCoMo and LongMemEval (limit and random-sample modes), including a --resume flow after an Anthropic Overloaded 529 mid-run. No new unit tests added — the provider package mirrors the structure of the other four providers, none of which carry their own test files in this repo. Happy to write some if you'd prefer a different bar than the existing precedent.

Verified runs were on:

  • bun --version 1.3.13
  • node --version v22.19.0 (the workspace Node)
  • sonnet-4.6 across answering / distillation / judge; opus-4.6 separately exercised as the distillation model with sonnet-4.6 for answering + judge

Backwards compatibility

Strictly additive. The new provider is opt-in via -p memento. The orchestrator, the dataset loaders, the judges, and the four existing providers are not touched. The new dependency (@modelcontextprotocol/sdk) is a small library that adds ~1 MB to node_modules; it loads with the providers registry like any other static dep.

The two new model aliases are also additive — they don't change resolution for any existing alias.

Future work (out of scope for this PR)

  • If maintainers want per-provider unit tests as a new bar across the repo, the natural path for Memento is to mock @modelcontextprotocol/sdk so the provider can be tested without spawning a real memento serve. Left out here to match the current precedent (no per-provider unit tests in src/providers/).
  • ConvoMem hasn't been exercised end-to-end in this PR (only LoCoMo and LongMemEval). The provider is generic over UnifiedSession, so it should work without changes, but I haven't run it.

License + provenance

Memento is Apache-2.0 (same as memorybench). The provider code in this PR is original and is licensed under the same terms.

Adds the `memento` memory provider that exercises Memento
(https://github.com/veerps57/memento) — a local-first, MCP-native
memory layer for AI assistants. The provider spawns `memento serve`
as a stdio MCP subprocess and routes ingest, search, and clear
through MCP tool calls.

Memento is designed to store **distilled assertions, not transcripts**.
In production the calling AI assistant uses its own LLM to decide
what's worth remembering, then hands those candidates to Memento's
`extract_memory` MCP tool, which embeds, scrubs, dedups, and persists.
To faithfully represent that flow inside the bench (which only hands
the provider raw `UnifiedSession` transcripts), this provider performs
the same distillation step itself — calling the configured LLM per
session and passing the resulting candidates to `extract_memory`.

Per-question isolation uses Memento's `workspace` scope keyed by the
benchmark's `containerTag` (Memento's `session.id` requires a 26-char
ULID, while `containerTag` is an arbitrary string). One DB, one server,
many scopes.

Provider config (env):
  MEMENTO_BIN                  shell-like command for `memento serve`
                               (default: "npx -y @psraghuveer/memento")
  MEMENTO_BENCH_DB             SQLite path (default: /tmp/memento-bench-<ts>.db)
  MEMENTO_DISTILL_MODEL        LLM alias for distillation
                               (defaults to memorybench's answering model)
  MEMENTO_BENCH_SEARCH_LIMIT   top-K returned by search_memory (default: 30)
  MEMENTO_AWAIT_INDEXING_MS    per-question polling deadline (default: 180000)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@socket-security

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Added | @modelcontextprotocol/sdk@1.29.0 | 99 | 100 | 100 | 95 | 100 |

View full report

veerps57 added a commit to veerps57/memento that referenced this pull request May 15, 2026
## Problem

Two needs motivated this work; running against the bench surfaced two
more engine gaps, whose fixes ship in the same PR.

1. **Memento has no published end-to-end benchmark.** The MCP-registry
launch built credibility on architectural commitments and a clean API;
the next step toward trust is "here are the numbers, here is how to
reproduce them." We need a reusable harness against the de-facto
industry datasets (LoCoMo, LongMemEval) so a sceptical engineer can
re-run a baseline on their laptop and verify.
2. **Memento's `extract_memory` contract had several first-try-wrong
footguns for AI assistants doing distillation.** While building the
benchmark provider, every distillation attempt failed silently because
the candidate shape, the topic-line requirement, and the async-mode
receipt semantics weren't surfaced in the MCP tool description (some
were in the skill but invisible to a tool consumer reading only
`tools/list`). The tool's own example was itself non-validating.
Tag-regex failures returned a bare `(root): Invalid`. These are the kind
of friction that turns "give Memento a try" into "give Memento up."
3. **FTS5 recall was stem-blind for prose queries.** The bench's first
low-score question (LongMemEval `d24813b1`, "tips on what to bake for
colleagues") missed the gold-truth memory because the memory said
"colleague's going-away party" and the query said "colleagues" — the
default `unicode61` FTS5 tokenizer treats them as different tokens. The
`retrieval.fts.tokenizer` config key advertised `porter` as a tunable
alternative, but no migration or runtime code ever read it: dead-code
tunability. Vector embedding rescued some morphological misses, but not
enough, and the failure mode is exactly the one a durable memory layer
needs to handle well (the speaker's wording and the future question's
wording rarely match in surface form).
4. **`embedder-local.embedBatch` was sequential under the hood.** The
implementation looped `extractor(text, ...)` per row with a comment
pointing at the transformers.js v2 limitation. transformers.js v3
(already pinned via `^3.0.0`) accepts an array input and runs one
forward pass for the whole batch — verified row-by-row numerically
identical to the single-call form.

## Change

Four coordinated workstreams ship together. The bench surfaced (3) and
(4); the fixes that close them improve Memento for every assistant doing
memory work, not just for the bench's score.

- **Bench driver — `scripts/bench.mjs` + `docs/guides/benchmark.md`.** A
vanilla-Node ESM driver that builds Memento, stages a memorybench fork
at a pinned ref (or a local checkout via `--memorybench-dir`), spawns
one `bun run src/index.ts run -p memento -b <bench>` per requested
benchmark, and writes a single summary markdown to `bench/<ts>.md` (the
`bench/` directory is git-ignored). Defaults to LoCoMo + LongMemEval
with `sonnet-4.6` pinned for judge + answering + distillation — the
model class that actually shows up on the conversation side in real
Memento usage (Claude Code, Cursor, and Claude Desktop are the
MCP-using-client majority, and `extract_memory` distillation happens in
*that* same assistant). Sonnet 4.6 supports `temperature=0`
(deterministic at the model layer) and the alias is registered in the
fork's `MODEL_CONFIGS`. Top-K=30, 180s indexing deadline. Per-phase
concurrency flags pass through to memorybench so a slow embedder or a
throttled Anthropic endpoint can be tamed (`--concurrency-ingest=1` is
the safe knob for `sonnet-4.6` under bursty rate-limit pressure, and it
also lets the provider's per-session distillation cache hit when
questions share sessions). The driver spawns the locally built CLI via
`process.execPath` and asserts `better-sqlite3` loads under that exact
Node before doing any expensive work, so a `nvm + homebrew` PATH
cocktail can't crash the run with a confusing "MCP error -32000:
Connection closed". A `--resume=<runId>` flag picks up at the failed
phase of the failed question for a crashed run (memorybench's
orchestrator checkpoints after every phase boundary); the runId is
logged on a dedicated line in the bench log and reprinted as a
copy-pasteable command on any non-zero exit. `--out` anchors to the
Memento repo root regardless of `cwd` so running from inside the fork
checkout doesn't leak the output directory into the fork worktree. The
provider implementation itself lives in a fork of
`supermemoryai/memorybench`
([`veerps57/memorybench@add-memento-provider`](https://github.com/veerps57/memorybench/tree/add-memento-provider)).
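
The `better-sqlite3` preflight mentioned above boils down to attempting a require under the current binary before doing any expensive work. A minimal sketch (the helper is illustrative, not the driver's actual code):

```typescript
// Illustrative preflight: fail fast if a module can't load under the exact
// Node binary that will run the server. The helper name is hypothetical.
import { createRequire } from "node:module";

function assertModuleLoads(moduleName: string): void {
  // createRequire only needs a resolution base path; the file need not exist.
  const req = createRequire(process.cwd() + "/preflight.js");
  try {
    req(moduleName);
  } catch (err) {
    throw new Error(
      `${moduleName} failed to load under ${process.execPath}: ` +
        (err as Error).message
    );
  }
}
```

Running this before spawning anything turns the confusing mid-run "MCP error -32000: Connection closed" into an immediate, named failure.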

- **`extract_memory` tool surface + distillation craft.** The MCP tool
description on `extract_memory` states the candidate-shape difference
from `write_memory` (flat `kind` enum, top-level
`rationale`/`language`), the `topic: value\n\nprose` requirement for
`preference`/`decision` kinds, the `storedConfidence: 0.8`
async-default, and the receipt-not-failure semantics of `mode: "async"`.
The inline example exercises four kinds with the correct field placement
— including a `preference` candidate that opens with the required
topic-line and a `decision` candidate with top-level `rationale`.
`TagSchema` carries a custom error message listing the allowed charset
so `April 15, 2026` produces an actionable diagnostic instead of a bare
"Invalid". The skill (`skills/memento/SKILL.md`), persona-snippet guide
(`docs/guides/teach-your-assistant.md`), and the landing-page
persona-snippet mirror (`packages/landing/src/App.tsx`) carry a
"Distillation craft" section that frames the task as **retrieval
indexing for unknown future queries** (not summarisation for a reader)
and codifies six rules in priority order: (1) preserve specific terms —
proper nouns, identity qualifiers, named entities, places, and the
specific object of every action; (2) capture facts about every named
participant, not only the user — a friend the user mentions ("my friend
Alex is moving to Berlin for a SAP job") or a co-speaker in a
multi-party transcript both deserve candidates attributed to the right
named person, not collapsed onto the user; (3) emit a candidate for
every dated event with the date resolved against the session anchor,
never collapsing it to an untimed habit; (4) capture precursor actions
alongside outcomes — "researched X then chose Y" emits two candidates,
since future questions can target either step; (5) don't squash
enumerations into category labels; (6) bias toward inclusion — the
server dedups via embedding similarity, so over-including is cheap and
under-including is permanent. A pre-emit self-check ("did every date,
named entity, and verb-with-specific-object map to a candidate?") sits
alongside the rules in each surface.
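
The `topic: value\n\nprose` requirement for `preference`/`decision` kinds described above can be checked mechanically. A hedged sketch (the rule and the two kinds are quoted from this text; the validator itself is illustrative and the exact rule Memento enforces may differ):

```typescript
// Illustrative check for the `topic: value\n\nprose` content shape on
// preference/decision candidates. The exact validation Memento applies may differ.
const TOPIC_LINE_KINDS = new Set(["preference", "decision"]);

function violatesTopicLineRule(kind: string, content: string): boolean {
  if (!TOPIC_LINE_KINDS.has(kind)) return false;
  // Expect a "topic: value" first line, a blank line, then prose.
  return !/^[^\n:]+: .+\n\n[\s\S]+/.test(content);
}
```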

- **Porter stemming for FTS5 — migration 0008 + honoured config +
default flip.** `memories_fts` is now built with `tokenize='porter
unicode61'` instead of the default `unicode61`. The chain runs
right-to-left: unicode61 splits + diacritic-folds first (so non-ASCII
content still tokenises correctly — German umlauts, French diacritics,
Japanese katakana all survive intact, covered by tests), then porter
stems the resulting ASCII tokens. "colleague", "colleagues", and
"colleague's" share a stem and match each other; "bake" matches "baking"
/ "baked" / "bakes"; "research" matches "researched" / "researches";
"agency" matches "agencies". The `retrieval.fts.tokenizer` config key
now defaults to `porter` and is documented as honoured by the FTS index
(it was previously declared but ignored by the migration — dead-code
tunability that this change makes real). Migration 0008 drops and
rebuilds `memories_fts` with the new tokenizer, preserving stable rowids
via the `memories_fts_map` table; the runner applies it on first server
start after upgrade, so no operator action is required. Six new unit
tests cover stem-variant matching (plural/singular, verb-form pairs),
pre-migration re-indexing, the insert/update/delete triggers carrying
the new tokenizer through write-path operations, and non-ASCII
preservation. The trade-off accepted is porter's known over-stems
(organize/organic, universe/university); for Memento's dominant query
distribution — assistants asking about durable user state in natural
language — recall on stem variants is worth more than precision on these
edge cases. Operators who need the older behaviour can author a
follow-up migration; the config key documents the option.

- **Embedder perf — real batched feature-extraction in
`@psraghuveer/memento-embedder-local`.** `embedBatch` now uses
transformers.js v3's array-input pipeline, which runs one forward pass
for the whole batch instead of looping per text. Numerically identical
to the single-call form (verified row-by-row against the same input).
Measured ~1.8× speedup on a 3-input batch with `bge-base-en-v1.5` on
CPU; the speedup grows with batch size because tokenisation and pipeline
setup amortise across the batch. The loader contract now returns `{
embed, embedBatch? }` instead of a bare `embed` function; loaders that
omit `embedBatch` fall back to the previous sequential behaviour, so
test fixtures and bespoke implementations keep working unchanged. Seven
new unit tests cover the fast path, the sequential fallback, empty-input
short-circuit, runtime-row-count mismatch, per-row dimension validation,
batched `maxInputBytes` truncation, and whole-batch timeout. The
`EmbeddingProvider.embedBatch` surface in `@psraghuveer/memento-core` is
unchanged and remains optional; existing call sites that go through
`embedBatchFallback` (`pack.install`, `import`, `embedding.rebuild`, the
synchronous extract slow-path) automatically pick up the fast path.
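
The loader-contract change reduces to an optional fast path with a sequential fallback. A hedged sketch (types and names are illustrative, not `@psraghuveer/memento-embedder-local`'s actual surface):

```typescript
// Illustrative optional-batch embedder contract with sequential fallback.
// The real loader contract may differ in detail.
type Vector = number[];

interface EmbedRuntime {
  embed: (text: string) => Promise<Vector>;
  embedBatch?: (texts: string[]) => Promise<Vector[]>;
}

async function embedAll(runtime: EmbedRuntime, texts: string[]): Promise<Vector[]> {
  if (texts.length === 0) return []; // empty-input short-circuit
  if (runtime.embedBatch) {
    const rows = await runtime.embedBatch(texts); // one forward pass for the batch
    if (rows.length !== texts.length) {
      throw new Error(`embedBatch returned ${rows.length} rows for ${texts.length} inputs`);
    }
    return rows;
  }
  // Loaders that omit embedBatch keep the previous sequential behaviour.
  const out: Vector[] = [];
  for (const t of texts) out.push(await runtime.embed(t));
  return out;
}
```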

## Justification against the four principles

- **First principles.** The benchmark driver introduces no new
behavioural constants in Memento itself — every knob is a CLI flag or
env var declared in `DEFAULTS` at the top of `bench.mjs`. The
tool-description changes surface constraints that already existed in
code (schema validation, the conflict-detector's topic-line parsing)
where an assistant reading `tools/list` will see them. The FTS-tokenizer
change does flip a default (`retrieval.fts.tokenizer: unicode61 →
porter`), but the migration that effects it is the canonical Memento way
of evolving stored state — and the config key that controls it has
shipped since the registry release and is now actually honoured. The
embedder change is purely a perf path; semantics are byte-identical to
the previous behaviour.
- **Modular.** `scripts/bench.mjs` is a thin driver — the provider lives
in a fork of memorybench, the harness is memorybench's, and the
judge/answering models are memorybench's. The Memento side is one script
+ one guide + a pinned fork ref. The distillation-clarity changes touch
documentation and a single Zod error message; no behavioural code paths
are added. The FTS change is a single migration file + one config-key
default flip. The embedder change extends the existing `EmbedRuntime`
shape with an optional `embedBatch` and adds a single fast-path branch
in the wrapper; the rest is sequential-fallback compatibility.
- **Extensible.** Adding a third benchmark (ConvoMem) is a one-line
change to `DEFAULTS.benchmarks`. Adding a different judge family means
pointing `--judge` at a different model alias; the script's API-key
check fans out by family. The skill's distillation-craft section is
positioned so a future contributor can extend the rules without
restructuring. The FTS migration's pattern (drop → rebuild → repopulate
→ retrigger) is the same as 0005's — future tokenizer changes follow the
same template. The loader contract's optional `embedBatch` lets bespoke
embedders opt into batching when their runtime supports it, without
forcing a contract upgrade on the others.
- **Config-driven.** Every benchmark default (model, ref, limit,
concurrency, search-K, indexing deadline) is overridable from the
command line or env. The FTS-tokenizer choice is
`retrieval.fts.tokenizer` — operators can stay on `unicode61` by setting
it before first server start and recreating the FTS table via a
follow-up migration. The embedder change adds no new config key (the
runtime contract change is internal); operators with custom loaders are
unaffected by default.

## Alternatives considered

- **Vendor memorybench inside the Memento repo.** Rejected: keeps the
harness external (so we don't own its release cadence) and lets the
provider land as a normal contribution upstream. The driver pulls a
pinned fork ref, so reproduction is exact.
- **Add LLM-driven distillation inside `extract_memory` itself.**
Rejected: Memento's architectural commitment is local-first and
LLM-agnostic. Baking in an LLM would either pull in a cloud provider
(breaking local-first) or ship a bundled local model (breaking
LLM-agnostic and adding ops complexity). Distillation belongs to the
calling AI assistant, where the conversation context lives. The bench
provider does its own distill step to mirror that flow.
- **Re-design the candidate shape so `write_memory` and `extract_memory`
accept the same payload.** Considered, rejected: the discriminated-union
shape on `write_memory` is the right design for a single-row call where
kind-specific metadata is the point; the flat shape on `extract_memory`
is the right design for a batch where the per-item type is data, not a
routing tag. Documenting the difference is correct; collapsing them
would weaken both APIs.
- **Keep `unicode61` as the FTS default and ship porter as an opt-in
only.** Rejected: the `retrieval.fts.tokenizer` config key was already
documented as the operator-tunable knob, and validation of the porter
path on a real bench question showed unicode61 missing the gold-truth
memory at the FTS layer entirely. The migration is the right place to
flip the default because anyone who actively wants unicode61 can author
a follow-up migration; the silent majority who never touched the key get
a measurable recall improvement.
- **Heavier embedder optimisations — quantisation (`dtype: 'q8'`),
worker thread, WebGPU.** Deferred: quantisation is a recall trade-off
that needs its own evaluation pass; worker threads improve event-loop
responsiveness without raising throughput on a single CPU; WebGPU only
helps browser hosts (Memento runs on Node). Real batched
feature-extraction is the largest no-trade-off win available today, so
it's the one shipped here.

## Tests

- [x] Unit — full unit suite passes on this branch, plus 13 new tests
(six for migration 0008 covering stem variants, pre-migration
re-indexing, triggers, and non-ASCII preservation; seven for the
embedder fast-path and sequential-fallback paths).
- [ ] Integration — N/A; no new integration paths added beyond the
existing extract path which is already integration-tested.
- [x] Migration — `0008_fts_porter_tokenizer` is forward-only,
idempotent on a fresh DB, and verified end-to-end against a pre-0008
install via `MIGRATIONS.slice(0, 7)` in the test suite.
- [x] End-to-end — the existing `serve` e2e passes. The bench itself is
the new end-to-end exercise but is not part of `pnpm verify` for the
reasons documented in `docs/guides/benchmark.md` (it needs network,
judge API keys, and hours of wall-clock — CI must pass offline). A
focused 1Q LongMemEval validation against the baking question confirmed
the porter fix lifts that question from 0 → 1 correct, with the
lemon-poppyseed memory ranking #4 in retrieval where previously it
didn't reach top-30.
- [ ] N/A — see above.

## Local verification

- [x] `pnpm verify` (<!-- verify-chain:begin -->lint → typecheck → build
→ test → test:e2e → docs:lint → docs:reflow:check → docs:links →
docs:check → format:packs:check → server-json:check<!-- verify-chain:end
-->) — all green at branch HEAD.
- [x] `pnpm docs:generate` — run;
`docs/reference/{cli,mcp-tools,config-keys}.md`, `AGENTS.md`,
`CONTRIBUTING.md`, `.github/copilot-instructions.md`, and
`.github/PULL_REQUEST_TEMPLATE.md` regenerated to pick up the new
`extract_memory` description, the `TagSchema` error message, and the
`retrieval.fts.tokenizer` default + description.

## ADR

- [ ] An ADR is required and is included in this PR.
- [ ] An ADR is required and exists already (link below).
- [x] No ADR required (explain why):

The bench driver, the tool-description changes, the embedder fast path,
and the FTS tokenizer migration are all within the ADR exemption list in
`AGENTS.md`:

- The bench driver is optional tooling — it adds a script and a guide,
doesn't change the public surface, the data model, scope semantics, or
any top-level dependency.
- The `extract_memory` tool-description and `TagSchema` error-message
changes make existing contracts more discoverable without changing them.
- The embedder fast path is a perf optimisation with byte-identical
output; no semantic change.
- The FTS tokenizer change is a forward-only migration that honours an
already-documented config key (`retrieval.fts.tokenizer`). The default
flip is operator-visible but it neither introduces a new behavioural
constant nor changes a contract — it activates a knob that already
shipped. Memento's stance on tokenizer choice was always
"operator-configurable, default may evolve as the use case sharpens"
(per the config key's description).

## AI involvement

- [ ] No AI assistance.
- [ ] AI assistance for boilerplate / drafting only.
- [x] AI authored substantial portions. I have verified every line.

The bench driver, the provider in the memorybench fork, the audit of
`extract_memory`'s distillation-friction surface, the porter migration +
tests, the embedder batching + tests, and the prose updates to the skill
/ persona guide / landing snippet were drafted with Claude. Every change
was reviewed and exercised end-to-end against LoCoMo and LongMemEval
smokes through the full pipeline (distill → write → indexing → search →
answer → judge). The Zod error-message change and the tool-description
text were verified against the actual code paths they describe. The
porter fix specifically was validated by re-running the same failed
bench question against the new code and confirming the gold-truth memory
now ranks at the top of the retrieved set with the same models, same
haystack, same scope.

## Linked issues

Corresponding memorybench PR:
supermemoryai/memorybench#43

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
