Add Memento provider #43
Open
veerps57 wants to merge 1 commit into
Adds the `memento` memory provider that exercises Memento (https://github.com/veerps57/memento) — a local-first, MCP-native memory layer for AI assistants. The provider spawns `memento serve` as a stdio MCP subprocess and routes ingest, search, and clear through MCP tool calls.

Memento is designed to store **distilled assertions, not transcripts**. In production the calling AI assistant uses its own LLM to decide what's worth remembering, then hands those candidates to Memento's `extract_memory` MCP tool, which embeds, scrubs, dedups, and persists. To faithfully represent that flow inside the bench (which only hands the provider raw `UnifiedSession` transcripts), this provider performs the same distillation step itself — calling the configured LLM per session and passing the resulting candidates to `extract_memory`.

Per-question isolation uses Memento's `workspace` scope keyed by the benchmark's `containerTag` (Memento's `session.id` requires a 26-char ULID, while `containerTag` is an arbitrary string). One DB, one server, many scopes.

Provider config (env):

- `MEMENTO_BIN` — shell-like command for `memento serve` (default: `npx -y @psraghuveer/memento`)
- `MEMENTO_BENCH_DB` — SQLite path (default: `/tmp/memento-bench-<ts>.db`)
- `MEMENTO_DISTILL_MODEL` — LLM alias for distillation (defaults to memorybench's answering model)
- `MEMENTO_BENCH_SEARCH_LIMIT` — top-K returned by `search_memory` (default: 30)
- `MEMENTO_AWAIT_INDEXING_MS` — per-question polling deadline (default: 180000)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
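For orientation, a minimal sketch of how this configuration could be resolved. The env var names and defaults come from the list above; the helper name and the config shape are illustrative assumptions, not the provider's actual code:

```ts
// Illustrative sketch only: resolves the documented MEMENTO_* env vars into a
// typed config object using the documented defaults. The helper name and the
// exact shape are assumptions; only the env vars and defaults come from above.
interface MementoBenchConfig {
  bin: string;             // command used to spawn `memento serve`
  dbPath: string;          // SQLite database path
  distillModel?: string;   // undefined means: fall back to the answering model
  searchLimit: number;     // top-K passed to search_memory
  awaitIndexingMs: number; // per-question polling deadline
}

function resolveMementoConfig(env = process.env): MementoBenchConfig {
  return {
    bin: env.MEMENTO_BIN ?? "npx -y @psraghuveer/memento",
    dbPath: env.MEMENTO_BENCH_DB ?? `/tmp/memento-bench-${Date.now()}.db`,
    distillModel: env.MEMENTO_DISTILL_MODEL,
    searchLimit: Number(env.MEMENTO_BENCH_SEARCH_LIMIT ?? 30),
    awaitIndexingMs: Number(env.MEMENTO_AWAIT_INDEXING_MS ?? 180_000),
  };
}
```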
veerps57 added a commit to veerps57/memento that referenced this pull request on May 15, 2026:
## Problem

Two needs motivated this work; running against the bench surfaced two more engine gaps whose fixes ship in the same PR.

1. **Memento has no published end-to-end benchmark.** The MCP-registry launch built credibility on architectural commitments and a clean API; the next step toward trust is "here are the numbers, here is how to reproduce them." We need a reusable harness against the de-facto industry datasets (LoCoMo, LongMemEval) so a sceptical engineer can re-run a baseline on their laptop and verify.
2. **Memento's `extract_memory` contract had several first-try-wrong footguns for AI assistants doing distillation.** While building the benchmark provider, every distillation attempt failed silently because the candidate shape, the topic-line requirement, and the async-mode receipt semantics weren't surfaced in the MCP tool description (some were in the skill but invisible to a tool consumer reading only `tools/list`). The tool's own example was itself non-validating. Tag-regex failures returned a bare `(root): Invalid`. These are the kind of friction that turns "give Memento a try" into "give Memento up."
3. **FTS5 recall was stem-blind for prose queries.** The bench's first low-score question (LongMemEval `d24813b1`, "tips on what to bake for colleagues") missed the gold-truth memory because the memory said "colleague's going-away party" and the query said "colleagues" — the default `unicode61` FTS5 tokenizer treats them as different tokens. The `retrieval.fts.tokenizer` config key advertised `porter` as a tunable alternative, but no migration or runtime code ever read it: dead-code tunability. Vector embedding rescued some morphological misses, but not enough, and the failure mode is exactly the one a durable memory layer needs to handle well (the speaker's wording and the future question's wording rarely match in surface form).
4. **`embedder-local.embedBatch` was sequential under the hood.** The implementation looped `extractor(text, ...)` per row with a comment pointing at the transformers.js v2 limitation. transformers.js v3 (already pinned via `^3.0.0`) accepts an array input and runs one forward pass for the whole batch — verified row-by-row numerically identical to the single-call form.

## Change

Four coordinated workstreams ship together. The bench surfaced (3) and (4); the fixes that close them improve Memento for every assistant doing memory work, not just for the bench's score.

- **Bench driver — `scripts/bench.mjs` + `docs/guides/benchmark.md`.** A vanilla-Node ESM driver that builds Memento, stages a memorybench fork at a pinned ref (or a local checkout via `--memorybench-dir`), spawns one `bun run src/index.ts run -p memento -b <bench>` per requested benchmark, and writes a single summary markdown to `bench/<ts>.md` (the `bench/` directory is git-ignored). Defaults to LoCoMo + LongMemEval with `sonnet-4.6` pinned for judge + answering + distillation — the model class that actually shows up on the conversation side in real Memento usage (Claude Code, Cursor, and Claude Desktop are the MCP-using-client majority, and `extract_memory` distillation happens in *that* same assistant). Sonnet 4.6 supports `temperature=0` (deterministic at the model layer) and the alias is registered in the fork's `MODEL_CONFIGS`. Top-K=30, 180s indexing deadline.
Per-phase concurrency flags pass through to memorybench so a slow embedder or a throttled Anthropic endpoint can be tamed (`--concurrency-ingest=1` is the safe knob for `sonnet-4.6` under bursty rate-limit pressure, and it also lets the provider's per-session distillation cache hit when questions share sessions). The driver spawns the locally built CLI via `process.execPath` and asserts `better-sqlite3` loads under that exact Node before doing any expensive work, so an `nvm + homebrew` PATH cocktail can't crash the run with a confusing "MCP error -32000: Connection closed". A `--resume=<runId>` flag picks up at the failed phase of the failed question for a crashed run (memorybench's orchestrator checkpoints after every phase boundary); the runId is logged on a dedicated line in the bench log and reprinted as a copy-pasteable command on any non-zero exit. `--out` anchors to the Memento repo root regardless of `cwd`, so running from inside the fork checkout doesn't leak the output directory into the fork worktree. The provider implementation itself lives in a fork of `supermemoryai/memorybench` ([`veerps57/memorybench@add-memento-provider`](https://github.com/veerps57/memorybench/tree/add-memento-provider)).

- **`extract_memory` tool surface + distillation craft.** The MCP tool description on `extract_memory` states the candidate-shape difference from `write_memory` (flat `kind` enum, top-level `rationale`/`language`), the `topic: value\n\nprose` requirement for `preference`/`decision` kinds, the `storedConfidence: 0.8` async default, and the receipt-not-failure semantics of `mode: "async"`. The inline example exercises four kinds with the correct field placement — including a `preference` candidate that opens with the required topic line and a `decision` candidate with top-level `rationale` (see the sketch after this list). `TagSchema` carries a custom error message listing the allowed charset, so `April 15, 2026` produces an actionable diagnostic instead of a bare "Invalid". The skill (`skills/memento/SKILL.md`), the persona-snippet guide (`docs/guides/teach-your-assistant.md`), and the landing-page persona-snippet mirror (`packages/landing/src/App.tsx`) carry a "Distillation craft" section that frames the task as **retrieval indexing for unknown future queries** (not summarisation for a reader) and codifies six rules in priority order:
  1. preserve specific terms — proper nouns, identity qualifiers, named entities, places, and the specific object of every action;
  2. capture facts about every named participant, not only the user — a friend the user mentions ("my friend Alex is moving to Berlin for a SAP job") or a co-speaker in a multi-party transcript both deserve candidates attributed to the right named person, not collapsed onto the user;
  3. emit a candidate for every dated event with the date resolved against the session anchor, never collapsing it to an untimed habit;
  4. capture precursor actions alongside outcomes — "researched X then chose Y" emits two candidates, since future questions can target either step;
  5. don't squash enumerations into category labels;
  6. bias toward inclusion — the server dedups via embedding similarity, so over-including is cheap and under-including is permanent.

  A pre-emit self-check ("did every date, named entity, and verb-with-specific-object map to a candidate?") sits alongside the rules in each surface.
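As a reference for the sketch promised above: a candidate batch shaped by the constraints just listed. The flat `kind`, top-level `rationale`/`language`, the topic-line requirement, and the async receipt semantics come from the description; every other field name and value is an assumption for illustration:

```ts
// Sketch shaped by the constraints above; field names beyond
// kind/content/rationale/language/mode are assumptions.
declare const client: {
  callTool(req: { name: string; arguments: unknown }): Promise<unknown>;
};

const candidates = [
  {
    kind: "preference", // flat kind enum (not write_memory's discriminated union)
    // preference/decision content must open with a `topic: value` line,
    // then a blank line, then prose:
    content: "office baking: lemon-poppyseed\n\nLikes to bake lemon-poppyseed treats for colleagues.",
    language: "en", // top-level, not nested under metadata
  },
  {
    kind: "decision",
    content: "farewell party: bake\n\nDecided to bake for a colleague's going-away party.",
    rationale: "Preferred a homemade gesture over buying a gift", // top-level rationale
  },
];

// mode: "async" returns a receipt (candidates land with storedConfidence 0.8
// by default); a receipt is not a per-candidate failure signal.
await client.callTool({
  name: "extract_memory",
  arguments: { candidates, mode: "async" },
});
```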
- **Porter stemming for FTS5 — migration 0008 + honoured config + default flip.** `memories_fts` is now built with `tokenize='porter unicode61'` instead of the default `unicode61`. The chain runs right-to-left: unicode61 splits + diacritic-folds first (so non-ASCII content still tokenises correctly — German umlauts, French diacritics, Japanese katakana all survive intact, covered by tests), then porter stems the resulting ASCII tokens. "colleague", "colleagues", and "colleague's" share a stem and match each other; "bake" matches "baking" / "baked" / "bakes"; "research" matches "researched" / "researches"; "agency" matches "agencies". The `retrieval.fts.tokenizer` config key now defaults to `porter` and is documented as honoured by the FTS index (it was previously declared but ignored by the migration — dead-code tunability that this change makes real). Migration 0008 drops and rebuilds `memories_fts` with the new tokenizer, preserving stable rowids via the `memories_fts_map` table; the runner applies it on first server start after upgrade, so no operator action is required. Six new unit tests cover stem-variant matching (plural/singular, verb-form pairs), pre-migration re-indexing, the insert/update/delete triggers carrying the new tokenizer through write-path operations, and non-ASCII preservation. The trade-off accepted is porter's known over-stems (organize/organic, universe/university); for Memento's dominant query distribution — assistants asking about durable user state in natural language — recall on stem variants is worth more than precision on these edge cases. Operators who need the older behaviour can author a follow-up migration; the config key documents the option.

- **Embedder perf — real batched feature-extraction in `@psraghuveer/memento-embedder-local`.** `embedBatch` now uses transformers.js v3's array-input pipeline, which runs one forward pass for the whole batch instead of looping per text. Numerically identical to the single-call form (verified row-by-row against the same input). Measured ~1.8× speedup on a 3-input batch with `bge-base-en-v1.5` on CPU; the speedup grows with batch size because tokenisation and pipeline setup amortise across the batch. The loader contract now returns `{ embed, embedBatch? }` instead of a bare `embed` function; loaders that omit `embedBatch` fall back to the previous sequential behaviour, so test fixtures and bespoke implementations keep working unchanged. Seven new unit tests cover the fast path, the sequential fallback, empty-input short-circuit, runtime row-count mismatch, per-row dimension validation, batched `maxInputBytes` truncation, and whole-batch timeout. The `EmbeddingProvider.embedBatch` surface in `@psraghuveer/memento-core` is unchanged and remains optional; existing call sites that go through `embedBatchFallback` (`pack.install`, `import`, `embedding.rebuild`, the synchronous extract slow-path) automatically pick up the fast path. A sketch of the batched path follows this list.
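The sketch promised above: the batched path with its guards, assuming transformers.js v3's array-input pipeline. The package import and model id are the commonly published ones, not confirmed from this repo, and the guard messages are illustrative:

```ts
import { pipeline } from "@huggingface/transformers"; // transformers.js v3

// Sketch: one forward pass for the whole batch. v3 accepts an array input and
// tokenises + runs the batch together; v2 required looping extractor(text).
// Model id is illustrative.
const extractor = await pipeline("feature-extraction", "Xenova/bge-base-en-v1.5");

async function embedBatch(texts: string[]): Promise<number[][]> {
  if (texts.length === 0) return []; // empty-input short-circuit

  const out = await extractor(texts, { pooling: "mean", normalize: true });
  const rows = out.tolist() as number[][]; // shape [batch, dim]

  // Guard the contract the wrapper validates: row count matches input count.
  if (rows.length !== texts.length) {
    throw new Error(`embedBatch: expected ${texts.length} rows, got ${rows.length}`);
  }
  return rows;
}
```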
## Justification against the four principles

- **First principles.** The benchmark driver introduces no new behavioural constants in Memento itself — every knob is a CLI flag or env var declared in `DEFAULTS` at the top of `bench.mjs`. The tool-description changes surface constraints that already existed in code (schema validation, the conflict-detector's topic-line parsing) where an assistant reading `tools/list` will see them. The FTS-tokenizer change does flip a default (`retrieval.fts.tokenizer`: `unicode61` → `porter`), but the migration that effects it is the canonical Memento way of evolving stored state — and the config key that controls it has shipped since the registry release and is now actually honoured. The embedder change is purely a perf path; semantics are byte-identical to the previous behaviour.
- **Modular.** `scripts/bench.mjs` is a thin driver — the provider lives in a fork of memorybench, the harness is memorybench's, and the judge/answering models are memorybench's. The Memento side is one script + one guide + a pinned fork ref. The distillation-clarity changes touch documentation and a single Zod error message; no behavioural code paths are added. The FTS change is a single migration file + one config-key default flip. The embedder change extends the existing `EmbedRuntime` shape with an optional `embedBatch` and adds a single fast-path branch in the wrapper; the rest is sequential-fallback compatibility.
- **Extensible.** Adding a third benchmark (ConvoMem) is a one-line change to `DEFAULTS.benchmarks`. Adding a different judge family means pointing `--judge` at a different model alias; the script's API-key check fans out by family. The skill's distillation-craft section is positioned so a future contributor can extend the rules without restructuring. The FTS migration's pattern (drop → rebuild → repopulate → retrigger) is the same as 0005's — future tokenizer changes follow the same template. The loader contract's optional `embedBatch` lets bespoke embedders opt into batching when their runtime supports it, without forcing a contract upgrade on the others.
- **Config-driven.** Every benchmark default (model, ref, limit, concurrency, search-K, indexing deadline) is overridable from the command line or env. The FTS-tokenizer choice is `retrieval.fts.tokenizer` — operators can stay on `unicode61` by setting it before first server start and recreating the FTS table via a follow-up migration. The embedder change adds no new config key (the runtime contract change is internal); operators with custom loaders are unaffected by default.

## Alternatives considered

- **Vendor memorybench inside the Memento repo.** Rejected: keeping the harness external means we don't own its release cadence, and the provider lands as a normal contribution upstream. The driver pulls a pinned fork ref, so reproduction is exact.
- **Add LLM-driven distillation inside `extract_memory` itself.** Rejected: Memento's architectural commitment is local-first and LLM-agnostic. Baking in an LLM would either pull in a cloud provider (breaking local-first) or ship a bundled local model (breaking LLM-agnostic and adding ops complexity). Distillation belongs to the calling AI assistant, where the conversation context lives. The bench provider does its own distill step to mirror that flow.
- **Re-design the candidate shape so `write_memory` and `extract_memory` accept the same payload.** Considered, rejected: the discriminated-union shape on `write_memory` is the right design for a single-row call where kind-specific metadata is the point; the flat shape on `extract_memory` is the right design for a batch where the per-item type is data, not a routing tag. Documenting the difference is correct; collapsing them would weaken both APIs.
- **Keep `unicode61` as the FTS default and ship porter as an opt-in only.** Rejected: the `retrieval.fts.tokenizer` config key was already documented as the operator-tunable knob, and validation of the porter path on a real bench question showed unicode61 missing the gold-truth memory at the FTS layer entirely.
The migration is the right place to flip the default because anyone who actively wants unicode61 can author a follow-up migration; the silent majority who never touched the key get a measurable recall improvement.

- **Heavier embedder optimisations — quantisation (`dtype: 'q8'`), worker thread, WebGPU.** Deferred: quantisation is a recall trade-off that needs its own evaluation pass; worker threads improve event-loop responsiveness without raising throughput on a single CPU; WebGPU only helps browser hosts (Memento runs on Node). Real batched feature-extraction is the largest no-trade-off win available today, so it's the one shipped here.

## Tests

- [x] Unit — full unit suite passes on this branch, plus 13 new tests (six for migration 0008 covering stem variants, pre-migration re-indexing, triggers, and non-ASCII preservation; seven for the embedder fast-path and sequential-fallback paths).
- [ ] Integration — N/A; no new integration paths added beyond the existing extract path, which is already integration-tested.
- [x] Migration — `0008_fts_porter_tokenizer` is forward-only, idempotent on a fresh DB, and verified end-to-end against a pre-0008 install via `MIGRATIONS.slice(0, 7)` in the test suite.
- [x] End-to-end — the existing `serve` e2e passes. The bench itself is the new end-to-end exercise but is not part of `pnpm verify` for the reasons documented in `docs/guides/benchmark.md` (it needs network, judge API keys, and hours of wall-clock — CI must pass offline). A focused 1Q LongMemEval validation against the baking question confirmed the porter fix lifts that question from 0 → 1 correct, with the lemon-poppyseed memory ranking #4 in retrieval where previously it didn't reach the top 30.
- [ ] N/A — see above.

## Local verification

- [x] `pnpm verify` (<!-- verify-chain:begin -->lint → typecheck → build → test → test:e2e → docs:lint → docs:reflow:check → docs:links → docs:check → format:packs:check → server-json:check<!-- verify-chain:end -->) — all green at branch HEAD.
- [x] `pnpm docs:generate` — run; `docs/reference/{cli,mcp-tools,config-keys}.md`, `AGENTS.md`, `CONTRIBUTING.md`, `.github/copilot-instructions.md`, and `.github/PULL_REQUEST_TEMPLATE.md` regenerated to pick up the new `extract_memory` description, the `TagSchema` error message, and the `retrieval.fts.tokenizer` default + description.

## ADR

- [ ] An ADR is required and is included in this PR.
- [ ] An ADR is required and exists already (link below).
- [x] No ADR required (explain why): the bench driver, the tool-description changes, the embedder fast path, and the FTS tokenizer migration are all within the ADR exemption list in `AGENTS.md`:
  - The bench driver is optional tooling — it adds a script and a guide; it doesn't change the public surface, the data model, scope semantics, or any top-level dependency.
  - The `extract_memory` tool-description and `TagSchema` error-message changes make existing contracts more discoverable without changing them.
  - The embedder fast path is a perf optimisation with byte-identical output; no semantic change.
  - The FTS tokenizer change is a forward-only migration that honours an already-documented config key (`retrieval.fts.tokenizer`). The default flip is operator-visible, but it neither introduces a new behavioural constant nor changes a contract — it activates a knob that already shipped. Memento's stance on tokenizer choice was always "operator-configurable, default may evolve as the use case sharpens" (per the config key's description).
## AI involvement

- [ ] No AI assistance.
- [ ] AI assistance for boilerplate / drafting only.
- [x] AI authored substantial portions. I have verified every line. The bench driver, the provider in the memorybench fork, the audit of `extract_memory`'s distillation-friction surface, the porter migration + tests, the embedder batching + tests, and the prose updates to the skill / persona guide / landing snippet were drafted with Claude. Every change was reviewed and exercised end-to-end against LoCoMo and LongMemEval smokes through the full pipeline (distill → write → indexing → search → answer → judge). The Zod error-message change and the tool-description text were verified against the actual code paths they describe. The porter fix specifically was validated by re-running the same failed bench question against the new code and confirming the gold-truth memory now ranks at the top of the retrieved set with the same models, same haystack, same scope.

## Linked issues

Corresponding memorybench PR: supermemoryai/memorybench#43

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
Adds Memento — a local-first, MCP-native memory layer published on the MCP registry — as a fifth `Provider` option in memorybench, alongside Supermemory, Mem0, Zep, and the filesystem baseline. Memento runs as a stdio subprocess against a single SQLite database; this integration exercises it through that same MCP transport, so the bench measures the realistic shape (process spawn, JSON-RPC over stdio, async write + auto-embed, hybrid FTS+vector retrieval).

No API key required for Memento itself. The provider performs a per-session distillation step via an LLM call before handing candidates to Memento's `extract_memory` tool; the model is configured via the `MEMENTO_DISTILL_MODEL` env var and falls back to memorybench's `DEFAULT_ANSWERING_MODEL` constant (gpt-4o) when unset. So an OpenAI key is needed for the default distill model; set `MEMENTO_DISTILL_MODEL` to use a different family.
## Motivation

Memento was published to the MCP registry yesterday — a local-first, MCP-native memory layer for AI assistants that need durable, structured memory. Publishing a memory project on the registry without numbers next to the established ones felt incomplete; this PR proposes the integration so Memento can be measured against Supermemory, Mem0, Zep, and the filesystem baseline on the same harness, datasets, and judges. The honest way to answer "how does it compare?" is to run it through your bench and let the numbers speak.
We're not asking for any reorientation of the bench. Memento slots into the existing `Provider` interface unchanged. The harness, datasets, judges, and other providers are not touched.

## How the provider works
`MementoProvider` implements the five-method `Provider` interface (a skeletal sketch follows the list):

- `initialize` — spawns `memento serve --db <tmp>` via `@modelcontextprotocol/sdk`'s `StdioClientTransport`, asserts the required MCP tools (`extract_memory`, `search_memory`, `forget_many_memories`) are present on `tools/list`, runs one warmup write+forget pair so the embedder model is loaded before the first benchmark question, and calls `.unref()` on the spawned child process and its stdio pipes so the Node event loop doesn't keep waiting on the subprocess after the bench's main work completes (without this, the bench hangs at exit on `Run complete!`).
- `ingest` — for each session in the `UnifiedSession`, calls an LLM to distill the transcript into structured `{kind, content, summary?}` candidates, then hands the batch to Memento's `extract_memory` MCP tool. Memento embeds, scrubs, dedups, and persists. The distillation model is read from `MEMENTO_DISTILL_MODEL` and falls back to memorybench's `DEFAULT_ANSWERING_MODEL` constant when unset. Memories land under `scope = {type: 'workspace', path: '/memorybench/<containerTag>'}` (Memento's `session` scope requires a ULID, while memorybench's `containerTag` is an arbitrary string — workspace scope is the right isolation primitive). Each candidate carries `benchmark:memorybench`, `session:<id>`, and (when present) `session-date:<iso>` tags. A per-run distillation cache keyed by `session.sessionId` deduplicates LLM calls when questions in the same conversation share sessions; the cached output is whatever the first distill produced for that session.
- `awaitIndexing` — polls `search_memory` on the question's scope until every result has `embeddingStatus !== 'pending'` (or `MEMENTO_AWAIT_INDEXING_MS`, default 180s, elapses). Memento's auto-embed runs fire-and-forget after each write; this is the bridge.
- `search` — runs `search_memory` with the question's scope filter, `projection: 'full'`, and `limit: this.searchLimit` (default 30, overridable via `MEMENTO_BENCH_SEARCH_LIMIT`). The 30 default is the same number Supermemory and Mem0 use — every distill-style provider that I read in this repo overrides the orchestrator's `limit: 10` to give the answering model a richer haystack.
- `clear` — calls `forget_many_memories` with the per-question scope as the filter. The orchestrator doesn't call this in normal runs, but it's there for partial-rerun recovery.
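The promised sketch of that lifecycle against the MCP SDK's client API. Warmup, `.unref()`, tagging, the distillation cache, and error handling are omitted; `distillSession` stands in for `distill.ts`, and the `awaitIndexing` polling loop is sketched later under "Design choices":

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Skeletal sketch of the provider lifecycle described above.
declare function distillSession(session: { sessionId: string }): Promise<unknown[]>;

class MementoProviderSketch {
  private client = new Client({ name: "memorybench-memento", version: "0.0.1" });

  async initialize(dbPath: string) {
    const transport = new StdioClientTransport({
      command: "npx",
      args: ["-y", "@psraghuveer/memento", "serve", "--db", dbPath],
    });
    await this.client.connect(transport);
    // Assert the required tools are actually advertised on tools/list.
    const { tools } = await this.client.listTools();
    for (const required of ["extract_memory", "search_memory", "forget_many_memories"]) {
      if (!tools.some((t) => t.name === required)) throw new Error(`missing tool: ${required}`);
    }
  }

  // workspace scope keyed by containerTag (session scope would need a ULID)
  private scope(containerTag: string) {
    return { type: "workspace", path: `/memorybench/${containerTag}` };
  }

  async ingest(session: { sessionId: string }, containerTag: string) {
    const candidates = await distillSession(session); // LLM distillation, cached per session
    await this.client.callTool({
      name: "extract_memory",
      arguments: { candidates, scope: this.scope(containerTag) },
    });
  }

  async search(query: string, containerTag: string) {
    return this.client.callTool({
      name: "search_memory",
      arguments: { query, scope: this.scope(containerTag), projection: "full", limit: 30 },
    });
  }

  async clear(containerTag: string) {
    await this.client.callTool({
      name: "forget_many_memories",
      arguments: { scope: this.scope(containerTag) },
    });
  }
}
```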
The provider supplies a custom `answerPrompt` (`src/providers/memento/prompts.ts`) that presents each retrieved memory with its score, kind, and session date — the latter being the temporal anchor Memento captures during distillation. The format follows the same shape as `filesystem/prompts.ts` and `supermemory/prompts.ts`: structured retrieved-context block + numbered "How to Answer" steps + "I don't know" refusal clause + Reasoning/Answer output template.
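A rough sketch of the shape that description implies; the exact wording, field names, and layout of the real prompt are not from this PR:

```ts
// Illustrative shape only; the real prompt lives in src/providers/memento/prompts.ts.
function answerPromptSketch(
  question: string,
  memories: Array<{ content: string; kind: string; score: number; sessionDate?: string }>,
): string {
  const context = memories
    .map((m, i) =>
      `[${i + 1}] (${m.kind}, score ${m.score.toFixed(2)}` +
      `${m.sessionDate ? `, session ${m.sessionDate}` : ""}) ${m.content}`)
    .join("\n");
  return [
    "Retrieved memories:",
    context,
    "",
    "How to Answer:",
    "1. Use only the memories above; treat session dates as temporal anchors.",
    '2. If the memories do not contain the answer, say "I don\'t know".',
    "",
    "Reasoning: <your reasoning>",
    "Answer: <your answer>",
    "",
    `Question: ${question}`,
  ].join("\n");
}
```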
## What's in the PR

New files (all under `src/providers/memento/`):

- `index.ts` — provider class (`MementoProvider`), session ingest with the distillation cache, per-question scope mapping, MCP client lifecycle. ~420 lines.
- `distill.ts` — the per-session LLM distillation step. Reads `MEMENTO_DISTILL_MODEL` (falling back to `DEFAULT_ANSWERING_MODEL`), builds a transcript prompt, parses the JSON response into typed candidates. The prompt codifies six craft rules (preserve specific terms; capture facts about every named participant; emit a candidate for every dated event; capture precursor actions alongside outcomes; don't squash enumerations; bias toward inclusion). ~235 lines. A sketch of this step follows the list.
- `prompts.ts` — custom `ProviderPrompts` with the Memento `answerPrompt`. ~70 lines.
- `mcp-helpers.ts` — `parseSearchPage` and `parseToolResultJson` helpers for the MCP-result envelope. ~85 lines.
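The sketch promised above for the distillation step. The rules are quoted from the description; `callModel` is a hypothetical stand-in for memorybench's model-calling utility, and the prompt text is compressed:

```ts
// Hypothetical sketch of distill.ts's core loop. callModel() stands in for
// memorybench's LLM utility; the candidate type mirrors the shape described above.
interface Candidate { kind: string; content: string; summary?: string }

declare function callModel(model: string, prompt: string): Promise<string>;

const DISTILL_RULES = `Distill this transcript into memory candidates.
Rules, in priority order:
1. Preserve specific terms (proper nouns, named entities, the object of every action).
2. Capture facts about every named participant, attributed to the right person.
3. Emit a candidate for every dated event, resolved against the session anchor.
4. Capture precursor actions alongside outcomes.
5. Don't squash enumerations into category labels.
6. Bias toward inclusion; the server dedups.
Return a JSON array of {kind, content, summary?}.`;

export async function distillSession(transcript: string, model: string): Promise<Candidate[]> {
  const raw = await callModel(model, `${DISTILL_RULES}\n\nTranscript:\n${transcript}`);
  // The real implementation validates each row; this sketch just parses.
  return JSON.parse(raw) as Candidate[];
}
```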
Edits to existing files:

- `src/providers/index.ts` — registers `memento: MementoProvider` in the providers map.
- `src/types/provider.ts` — adds `"memento"` to the `ProviderName` union.
- `src/utils/config.ts` — `getProviderConfig('memento')` returns minimal config (Memento has no API key; configuration is via the `MEMENTO_*` env vars documented in the README).
- `src/cli/index.ts` — extends the `help providers` printer.
- `src/utils/models.ts` — adds two new Anthropic aliases (`sonnet-4.6` → `claude-sonnet-4-6`, `opus-4.6` → `claude-opus-4-6`). These are catalog additions, useful to all providers and not Memento-specific — happy to split them into a separate PR if you'd prefer; they were a dependency of Memento's default answering-model choice during local validation.
- `README.md` — adds `memento` to the `-p` flag's provider list and documents the `MEMENTO_*` env vars alongside the existing provider config block.
- `package.json` — adds `@modelcontextprotocol/sdk: ^1.29.0` (the canonical MCP client library; the pin matches what Memento itself ships).

## Design choices worth flagging
A few choices that could look like benchmark tuning if you don't read the implementation — happy to revisit any of them:
- **Search limit 30.** The orchestrator defaults to `limit: 10`, and every distill-style provider in this repo (`supermemory`, `mem0`'s fallback) overrides it the same way. The 10 default is used for Hit@K computation; the answering model still benefits from a richer haystack. We follow the cluster norm.
- **Per-run distillation cache keyed by `session.sessionId`.** Cache hits return exactly what the first distill produced for that session — LLM output isn't strictly deterministic even at `temperature=0`, so the cache also has the side effect of making within-run distillation reproducible across questions that share sessions.
- **Indexing deadline.** `MEMENTO_AWAIT_INDEXING_MS` defaults to 180s. Memento's auto-embed is fire-and-forget per write; the bench's first read proceeds once every result is embedded or this deadline elapses, whichever comes first (sketched below). For runs where many questions × many memories per scope outpace the local CPU embedder (notably the larger LongMemEval questions, which have ~50 sessions × ~25 candidates each), operators can bump this knob so the embedding queue drains before search. Documented in the README.
- **Workspace scope, not session scope.** Memento's `session` scope requires a ULID; memorybench passes arbitrary `containerTag` strings. Using `{type: 'workspace', path: '/memorybench/<containerTag>'}` gives us a stable, immutable scope per question that respects Memento's scope-is-immutable rule. The reverse-mapping is the natural fit; happy to discuss alternatives.
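The sketch referenced in the indexing-deadline item: one plausible shape for the `awaitIndexing` bridge. It assumes each `search_memory` row exposes `embeddingStatus` (per the provider description); the wildcard query and the 1s poll interval are assumptions, and `parseSearchPage` stands in for the `mcp-helpers.ts` helper:

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Sketch of the awaitIndexing polling bridge described above.
declare function parseSearchPage(result: unknown): Array<{ embeddingStatus: string }>;

async function awaitIndexing(
  client: Client,
  scope: { type: "workspace"; path: string },
  deadlineMs = Number(process.env.MEMENTO_AWAIT_INDEXING_MS ?? 180_000),
): Promise<void> {
  const start = Date.now();
  while (Date.now() - start < deadlineMs) {
    const result = await client.callTool({
      name: "search_memory",
      arguments: { query: "*", scope, projection: "full", limit: 30 },
    });
    // Done once nothing in the scope is still waiting on auto-embed.
    if (parseSearchPage(result).every((r) => r.embeddingStatus !== "pending")) return;
    await new Promise((resolve) => setTimeout(resolve, 1_000));
  }
  // Deadline elapsed: proceed with whatever has finished embedding.
}
```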
## Testing

Manually tested end-to-end against LoCoMo and LongMemEval (limit and random-sample modes), including a `--resume` flow after an Anthropic `Overloaded` 529 mid-run. No new unit tests added — the provider package mirrors the structure of the other four providers, none of which carry their own test files in this repo. Happy to write some if you'd prefer a different bar than the existing precedent.

Verified runs were on:
- `bun --version` 1.3.13
- `node --version` v22.19.0 (the workspace Node)
- `sonnet-4.6` across answering / distillation / judge; `opus-4.6` separately exercised as the distillation model with `sonnet-4.6` for answering + judge

## Backwards compatibility
Strictly additive. The new provider is opt-in via `-p memento`. The orchestrator, the dataset loaders, the judges, and the four existing providers are not touched. The new dependency (`@modelcontextprotocol/sdk`) is a small library that adds ~1 MB to `node_modules`; it loads with the providers registry like any other static dep.

The two new model aliases are also additive — they don't change resolution for any existing alias.
## Future work (out of scope for this PR)

- Unit tests with a mocked transport from `@modelcontextprotocol/sdk`, so the provider can be tested without spawning a real `memento serve`. Left out here to match the current precedent (no per-provider unit tests in `src/providers/`).
- ConvoMem — the provider consumes the same `UnifiedSession`, so it should work without changes, but I haven't run it.

## License + provenance
Memento is Apache-2.0 (same as memorybench). The provider code in this PR is original and is licensed under the same terms.