summarization

Summarization middleware for automatic and tool-based conversation compaction.

This module provides two middleware classes and a convenience factory:

SummarizationMiddleware: automatically compacts the conversation when token usage exceeds a configurable threshold. Older messages are summarized via an LLM call and the full history is offloaded to a backend for later retrieval.

SummarizationToolMiddleware: exposes a compact_conversation tool that lets the agent (or a human-in-the-loop approval flow) trigger compaction on demand. Composes with a SummarizationMiddleware instance and reuses its summarization engine.

create_summarization_tool_middleware: convenience factory that creates both middleware layers with model-aware defaults.

Usage

from deepagents import create_deep_agent
from deepagents.middleware.summarization import (
    SummarizationMiddleware,
    SummarizationToolMiddleware,
)
from deepagents.backends import FilesystemBackend

backend = FilesystemBackend(root_dir="/data")

summ = SummarizationMiddleware(
    model="gpt-4o-mini",
    backend=backend,
    trigger=("fraction", 0.85),
    keep=("fraction", 0.10),
)
tool_mw = SummarizationToolMiddleware(summ)

agent = create_deep_agent(middleware=[summ, tool_mw])
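
The trigger and keep settings in the usage example pair a mode with a value. As a rough illustration only, the sketch below shows one way a ("fraction", x) setting could be turned into a token count, assuming the fraction is measured against the model's maximum input tokens. The helpers resolve_fraction and should_compact and the 128,000-token window are hypothetical and are not part of deepagents.

```python
def resolve_fraction(setting: tuple, max_input_tokens: int) -> int:
    """Convert a ("fraction", x) setting into an absolute token count.

    Hypothetical helper: assumes the fraction is relative to the
    model's maximum input tokens.
    """
    mode, value = setting
    if mode != "fraction":
        raise ValueError(f"this sketch only handles 'fraction', got {mode!r}")
    if not 0 < value <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(max_input_tokens * value)


def should_compact(used_tokens: int, trigger: tuple, max_input_tokens: int) -> bool:
    """Return True once usage exceeds the resolved trigger threshold."""
    return used_tokens > resolve_fraction(trigger, max_input_tokens)


# Assumed context window of 128,000 tokens, for illustration only.
print(should_compact(110_000, ("fraction", 0.85), 128_000))
print(should_compact(100_000, ("fraction", 0.85), 128_000))
```

Under these assumptions, a trigger of 0.85 fires a little below 109,000 tokens, and a keep of 0.10 would retain roughly the most recent 12,800 tokens.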

Storage

Offloaded messages are stored as markdown at /conversation_history/{thread_id}.md.

Each summarization event appends a new section to this file, creating a running log of all evicted messages.
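
The append-only behavior described above can be sketched with the standard library. This is a minimal illustration, not the middleware's actual writer: the function name append_history_section, the section heading, and the per-message layout are all assumptions; only the /conversation_history/{thread_id}.md path comes from the text above.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_history_section(root: Path, thread_id: str, messages: list) -> Path:
    """Append one section of evicted (role, text) messages to the thread's log.

    Hypothetical helper: the heading and message layout are illustrative.
    """
    path = root / "conversation_history" / f"{thread_id}.md"
    path.parent.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now(timezone.utc).isoformat()
    lines = [f"## Summarized at {stamp}", ""]
    for role, text in messages:
        lines.append(f"**{role}**: {text}")
        lines.append("")

    # Open in append mode so earlier sections are never overwritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return path
```

Calling the helper twice with the same thread_id leaves two sections in the same file, oldest first, which matches the running-log behavior described above.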

Create a SummarizationToolMiddleware with model-aware defaults.

Convenience factory that creates a SummarizationMiddleware via create_summarization_middleware and wraps it in a SummarizationToolMiddleware.

Single, unified protocol for pluggable memory backends.

Backends can store files in different locations (state, filesystem, database, etc.) and provide a uniform interface for file operations.

All file data is represented as dicts with the following structure:

{
    "content": str,  # Text content (utf-8) or base64-encoded binary
    "encoding": str,  # "utf-8" for text, "base64" for binary data
    "created_at": str,  # ISO format timestamp
    "modified_at": str,  # ISO format timestamp
}
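
As an illustration of the structure above, the following sketch builds such a dict for text and binary payloads and decodes it again. The helper names make_file_data and read_file_data are hypothetical and are not part of the backend protocol; only the four keys and the two encoding values come from the description above.

```python
from __future__ import annotations

import base64
from datetime import datetime, timezone


def make_file_data(payload: str | bytes) -> dict[str, str]:
    """Build a file-data dict in the documented shape (hypothetical helper)."""
    now = datetime.now(timezone.utc).isoformat()
    if isinstance(payload, bytes):
        # Binary data is stored as base64 text.
        content = base64.b64encode(payload).decode("ascii")
        encoding = "base64"
    else:
        content, encoding = payload, "utf-8"
    return {
        "content": content,
        "encoding": encoding,
        "created_at": now,
        "modified_at": now,
    }


def read_file_data(file_data: dict[str, str]) -> bytes:
    """Return the raw bytes for either encoding."""
    if file_data["encoding"] == "base64":
        return base64.b64decode(file_data["content"])
    return file_data["content"].encode("utf-8")
```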

Settings for truncating large tool-call arguments in older messages.

This is a lightweight, pre-summarization optimization that fires at a lower token threshold than full conversation compaction. When triggered, only the args values on AIMessage.tool_calls in messages before the keep window are shortened — recent messages are left intact.

Typical large arguments include write_file content, edit_file patches, and verbose execute outputs.
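
The idea can be sketched with plain dicts standing in for messages. This is an assumption-laden illustration rather than the middleware's implementation: the function truncate_old_tool_args, its parameters keep_last and max_length, and the truncation suffix are hypothetical, and real messages would be AIMessage objects rather than dicts.

```python
def truncate_old_tool_args(messages, keep_last, max_length, suffix="...(truncated)"):
    """Shorten long string args on tool calls that fall before the keep window.

    Hypothetical helper: messages are plain dicts; the last `keep_last`
    messages are returned unchanged.
    """
    cutoff = max(len(messages) - keep_last, 0)
    result = []
    for index, message in enumerate(messages):
        if index >= cutoff or not message.get("tool_calls"):
            # Recent messages and messages without tool calls stay intact.
            result.append(message)
            continue
        new_calls = []
        for call in message["tool_calls"]:
            new_args = {
                key: value[:max_length] + suffix
                if isinstance(value, str) and len(value) > max_length
                else value
                for key, value in call["args"].items()
            }
            new_calls.append({**call, "args": new_args})
        # Build new dicts so the caller's originals are not mutated.
        result.append({**message, "tool_calls": new_calls})
    return result
```

Only the argument values are shortened; tool names, short arguments such as file paths, and every message inside the keep window are left as they were.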

Middleware that provides a compact_conversation tool for manual compaction.

This middleware composes with a SummarizationMiddleware instance, reusing its summarization engine (model, backend, trigger thresholds) to let the agent compact its own context window.

This middleware never compacts automatically. Compaction only occurs when compact_conversation is called as a normal tool call (by the model or by an explicit user action, e.g. as implemented in the deepagents-cli).

To avoid compacting too early, execution of the compact_conversation tool is gated by _is_eligible_for_compaction, which requires reported token usage to reach about 50% of the configured auto-summarization trigger.
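
The gate described above amounts to a simple threshold comparison. The sketch below is a hypothetical stand-in, not the signature or body of _is_eligible_for_compaction: the function name, its parameters, and the exact 0.5 ratio are assumptions based on the "about 50%" wording.

```python
def eligible_for_manual_compaction(
    reported_tokens: int,
    auto_trigger_tokens: int,
    ratio: float = 0.5,
) -> bool:
    """Return True once usage reaches `ratio` of the auto-summarization trigger.

    Hypothetical helper: the real check may compute its threshold differently.
    """
    return reported_tokens >= auto_trigger_tokens * ratio


# With an assumed auto trigger of 100,000 tokens, manual compaction
# becomes available at 50,000 tokens.
print(eligible_for_manual_compaction(50_000, 100_000))
print(eligible_for_manual_compaction(49_999, 100_000))
```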

The tool and auto-summarization share the same _summarization_event state key, so they interoperate correctly.

For a simpler setup, use create_summarization_tool_middleware, which creates both middleware layers in one call.
