# MCP spec should address tool schema token overhead (~1000 tokens/tool consumed per session)

**Date:** 2026-05-28  
**Reporter:** [@luwei-will](https://github.com/luwei-will)  
**Project:** context-mode (Claude Code plugin)  
**Scope:** 11 MCP tools — token footprint per tool definition  

---

## 1. Executive Summary

MCP tool definitions with full parameter descriptions can cost roughly **10× more tokens than minimal ones**: the heaviest tools measured below take ~1,000 tokens each, the lightest ~100. In a typical Claude Code session with 20–30 registered MCP tools averaging ~500 tokens each, the tool schemas alone occupy **10,000–15,000 tokens of context window before a single user message is sent**. This is a structural inefficiency in MCP: prompt caching can amortize the billing, but the context window capacity is consumed either way.

This issue documents measured token overhead of 11 production MCP tools, identifies the root cause, and proposes three protocol-level mitigations.

---

## 2. Measured Data

| Tool | Tokens | Relative to `ctx_stats` |
|------|--------|------------------------|
| `ctx_batch_execute` | 1,024 | 9.9× |
| `ctx_execute` | 1,024 | 9.9× |
| `ctx_fetch_and_index` | 972 | 9.4× |
| `ctx_index` | 858 | 8.3× |
| `ctx_execute_file` | 822 | 8.0× |
| `ctx_search` | 785 | 7.6× |
| `ctx_purge` | 646 | 6.3× |
| `ctx_insight` | 339 | 3.3× |
| `ctx_upgrade` | 127 | 1.2× |
| `ctx_doctor` | 107 | 1.0× |
| `ctx_stats` | 103 | 1.0× (baseline) |

**Methodology:** Token counts were measured with Anthropic's token counting API (`/v1/messages/count_tokens`), with each tool definition serialized as its full JSON Schema including descriptions. Each count is the number of input tokens that one tool definition adds to a session.

**Key observation:** Heavy tools (`ctx_batch_execute`, `ctx_execute`, `ctx_fetch_and_index`) each cost **~1,000 tokens** to define. Light tools (`ctx_stats`, `ctx_doctor`) cost **~100 tokens**. The 10× delta is entirely in JSON Schema size — parameter descriptions, type definitions, and nested object structures.
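The ratios and the aggregate footprint can be recomputed from the table. The sketch below simply hard-codes the measured counts; the variable names are illustrative and not part of context-mode.

```python
# Token counts copied from the table above.
TOOL_TOKENS = {
    "ctx_batch_execute": 1024,
    "ctx_execute": 1024,
    "ctx_fetch_and_index": 972,
    "ctx_index": 858,
    "ctx_execute_file": 822,
    "ctx_search": 785,
    "ctx_purge": 646,
    "ctx_insight": 339,
    "ctx_upgrade": 127,
    "ctx_doctor": 107,
    "ctx_stats": 103,
}

baseline = TOOL_TOKENS["ctx_stats"]
ratios = {name: round(tokens / baseline, 1) for name, tokens in TOOL_TOKENS.items()}
total = sum(TOOL_TOKENS.values())
mean = total / len(TOOL_TOKENS)

print(f"total for all {len(TOOL_TOKENS)} tools: {total} tokens")
print(f"mean per tool: {mean:.0f} tokens")
for name, ratio in ratios.items():
    print(f"{name:22s} {ratio:>4}x")
```

Summed this way, the 11 definitions come to 6,807 tokens, or about 619 tokens per tool.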

---

## 3. Root Cause Analysis

### 3.1 Schema Bloat

MCP requires every tool to declare its input as a JSON Schema (`inputSchema`). For `ctx_batch_execute`, the schema includes:

- `commands`: array of objects, each with `label` (string) and `command` (string)
- `queries`: array of strings
- `concurrency`: integer with range constraints
- `timeout`: integer

The **field descriptions** (not mandated by MCP, but in practice needed for the model to use the tool correctly) add ~400 tokens. The **type definitions** add ~300. The **nested object structure** for `commands` adds ~300 more. Total: ~1,000 tokens for one tool that a human could describe in two sentences.

### 3.2 First-Turn Tax and Cache Fragility

Tool schemas are injected into the system prompt prefix. While prompt caching avoids re-billing cached prefixes on subsequent turns, the overhead is real in these scenarios:

1. **First turn of every conversation:** the full cost is paid because nothing is cached yet
2. **Cache invalidation:** adding, removing, or updating any tool changes the prefix, so the whole prefix is re-sent and re-billed cold
3. **Context window capacity:** cached or not, 10,000 tokens of schemas occupy the context window, leaving less room for the task itself
4. **Short conversations:** caching amortizes poorly when a session lasts fewer than ~5 turns, which is common for quick lookups

With 20 tools averaging 500 tokens each, **10,000 tokens of context window are consumed by schemas** regardless of billing amortization.
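Why a single tool change forces a cold re-injection can be illustrated with a toy model of prefix caching. This is only a sketch: it assumes the cache key behaves like a hash over the serialized tool list, which illustrates exact-prefix matching and is not a description of any provider's actual implementation. `prefix_key` and the tool descriptions are hypothetical.

```python
import hashlib
import json


def prefix_key(tools: list[dict]) -> str:
    """Toy cache key: a hash over the serialized tool definitions, in order."""
    blob = json.dumps(tools, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


# Placeholder definitions; the descriptions are invented for this example.
tools = [
    {"name": "ctx_stats", "description": "Show stats."},
    {"name": "ctx_doctor", "description": "Run diagnostics."},
]
key_before = prefix_key(tools)

# Editing one word in one description changes the key for the whole prefix.
edited = [dict(tools[0], description="Show session stats."), tools[1]]
key_after_edit = prefix_key(edited)

# Adding a tool changes it as well.
added = tools + [{"name": "ctx_upgrade", "description": "Upgrade."}]
key_after_add = prefix_key(added)

print("unchanged tools hit the cache:", prefix_key(list(tools)) == key_before)
print("edited description hits the cache:", key_after_edit == key_before)
print("added tool hits the cache:", key_after_add == key_before)
```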

---

## 4. Impact on Real Workloads

From production usage (context-mode v1.0.151, 22 days, 2,600 conversations):

- **Per-conversation schema cost (first turn, no cache):** ~10,000 tokens × $15/1M (Opus input) = $0.15/conversation
- **Across 2,600 conversations:** ~$390 in first-turn schema cost
- **With 75% prompt cache hit rate:** effective cost ~$0.04/conversation, or ~$100 across all 2,600 conversations (treating cache hits as free)
- **Context window opportunity cost:** 10K tokens is on the order of several thousand words of working space that is unavailable to the model in every session

The billing cost is moderate. The **reasoning capacity cost** is the primary concern: a model reasoning over a complex task with 20 MCP tools registered has 10,000 fewer tokens available for actual work.
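The billing figures above follow from a few lines of arithmetic. The sketch below reproduces them under one simplifying assumption that is not stated in the measurements: cache hits are treated as free and cache writes as costing the normal input price. The constant names are illustrative.

```python
SCHEMA_TOKENS = 10_000     # 20 tools x ~500 tokens, as estimated in Section 3.2
PRICE_PER_MTOK = 15.00     # USD per million input tokens, the figure quoted above
CONVERSATIONS = 2_600
CACHE_HIT_RATE = 0.75

# Cost of sending the schemas once with no cache.
cold_cost = SCHEMA_TOKENS * PRICE_PER_MTOK / 1_000_000
total_cold = cold_cost * CONVERSATIONS

# Assumption: a cache hit costs nothing, so only the misses are billed.
effective_cost = cold_cost * (1 - CACHE_HIT_RATE)
total_effective = effective_cost * CONVERSATIONS

print(f"cold cost per conversation:      ${cold_cost:.2f}")
print(f"cold cost, all conversations:    ${total_cold:.0f}")
print(f"effective cost per conversation: ${effective_cost:.4f}")
print(f"effective cost, all:             ${total_effective:.2f}")
```

Real cache pricing would shift the effective figures somewhat, but not the conclusion that billing is the smaller of the two costs.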

---

## 5. Proposed Mitigations

### 5.1 Tiered Schema Detail (Short-Term)

Allow MCP servers to register tools at two detail levels:

- **Discovery tier:** Name + one-line description + parameter names/types only (no field docs)
- **Invocation tier:** Full schema with descriptions, injected on-demand when the model selects the tool

The model sees the discovery tier in the system prompt. When it decides to call a tool, the runtime injects the invocation-tier schema for that specific tool into the next turn.

```json
{
  "name": "ctx_batch_execute",
  "description": "Run commands in parallel, auto-index output, return matching sections.",
  "discovery_schema": {
    "type": "object",
    "properties": {
      "commands": {"type": "array"},
      "queries": {"type": "array"},
      "concurrency": {"type": "integer"},
      "timeout": {"type": "integer"}
    }
  },
  "invocation_schema": "<full schema with descriptions>"
}
```

**Estimated saving:** 60–70% of per-session schema tokens.  
**Tradeoff:** Slightly reduced tool selection accuracy for rarely-used tools with non-obvious names.
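A discovery-tier schema could be derived mechanically from the full one instead of being maintained by hand. The sketch below assumes the discovery tier keeps only top-level parameter names and their `type`, matching the JSON example above. `to_discovery_schema` is a hypothetical helper, and the full schema is a stand-in reconstructed from the parameter list in Section 3.1, with placeholder descriptions and invented range limits.

```python
import json


def to_discovery_schema(full_schema: dict) -> dict:
    """Keep top-level parameter names and their types; drop everything else."""
    properties = full_schema.get("properties", {})
    return {
        "type": "object",
        "properties": {
            name: ({"type": spec["type"]} if "type" in spec else {})
            for name, spec in properties.items()
        },
    }


# Stand-in for the real ctx_batch_execute schema (descriptions and limits are made up).
full_schema = {
    "type": "object",
    "properties": {
        "commands": {
            "type": "array",
            "description": "placeholder",
            "items": {
                "type": "object",
                "properties": {
                    "label": {"type": "string", "description": "placeholder"},
                    "command": {"type": "string", "description": "placeholder"},
                },
            },
        },
        "queries": {"type": "array", "items": {"type": "string"}, "description": "placeholder"},
        "concurrency": {"type": "integer", "minimum": 1, "maximum": 8, "description": "placeholder"},
        "timeout": {"type": "integer", "description": "placeholder"},
    },
}

discovery = to_discovery_schema(full_schema)
print(json.dumps(discovery, indent=2))
```

Dropping `items` and range constraints is what makes the discovery tier small, and it is also why the invocation tier has to be injected before the model fills in arguments.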

### 5.2 Explicit Schema Versioning (Medium-Term)

Add a `schema_version` field to MCP tool registrations. The host runtime tracks versions and only re-sends schemas to the model when the version changes. This makes the caching contract explicit rather than relying on implicit prompt-prefix matching, and enables delta updates (add/remove a single tool without invalidating the entire prefix).

```json
{
  "name": "ctx_batch_execute",
  "schema_version": "2.1.0",
  "schema": { ... }
}
```

**Estimated saving:** Prevents full prefix invalidation on incremental tool changes.
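On the host side, version tracking could reduce to comparing two maps of tool name to `schema_version`. The following is a minimal sketch of that comparison, not a proposed wire format: `diff_registrations` is a hypothetical helper, and every version string other than `2.1.0` is invented for the example.

```python
def diff_registrations(known: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare {tool name: schema_version} maps and report what changed."""
    added = sorted(name for name in current if name not in known)
    removed = sorted(name for name in known if name not in current)
    changed = sorted(
        name for name in current if name in known and current[name] != known[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Versions the host has already sent to the model.
known = {"ctx_batch_execute": "2.1.0", "ctx_search": "1.0.0", "ctx_purge": "1.0.0"}
# Versions the server reports now.
current = {"ctx_batch_execute": "2.2.0", "ctx_search": "1.0.0", "ctx_stats": "1.0.0"}

delta = diff_registrations(known, current)
print(delta)
```

Only the tools named in `added` and `changed` would need their schemas re-sent; `ctx_search` is untouched in this example.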

### 5.3 Tool Namespacing (Long-Term)

Allow tools to be grouped under namespaces with shared schema prefixes. Instead of 20 top-level tools, expose 3–5 namespaces with 4–6 tools each. The namespace schema (shared parameter patterns, shared description vocabulary) is sent once; individual tools inherit and override.

**Estimated saving:** 30–40% of schema tokens via deduplication of repeated patterns.
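The inherit-and-override behaviour can be sketched as a dictionary merge. Everything in this example is hypothetical: the `shared_properties` key, the dotted naming, the `exec` namespace, and the parameter descriptions are assumptions made for illustration, since the proposal does not define a concrete format.

```python
def resolve_tool(namespace: dict, tool: dict) -> dict:
    """Merge a namespace's shared properties into one tool's schema.

    Tool-level properties override shared ones that have the same name.
    """
    properties = {**namespace.get("shared_properties", {}), **tool.get("properties", {})}
    return {
        "name": f'{namespace["name"]}.{tool["name"]}',
        "description": tool["description"],
        "schema": {"type": "object", "properties": properties},
    }


# Sent once per namespace.
namespace = {
    "name": "exec",
    "shared_properties": {
        "timeout": {"type": "integer", "description": "Timeout in milliseconds."},
    },
}

# Sent per tool: only what differs from the namespace.
run = {"name": "run", "description": "Run one command.",
       "properties": {"command": {"type": "string"}}}
batch = {"name": "batch", "description": "Run commands in parallel.",
         "properties": {"timeout": {"type": "integer", "description": "Per-command timeout."}}}

resolved_run = resolve_tool(namespace, run)
resolved_batch = resolve_tool(namespace, batch)
print(resolved_run["name"], sorted(resolved_run["schema"]["properties"]))
print(resolved_batch["schema"]["properties"]["timeout"]["description"])
```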

---

## 6. Reproduction

```bash
# Measure the token cost of a single tool definition.
# tool.json (a file name chosen for this example) holds one tool in the
# Messages API format: {"name": ..., "description": ..., "input_schema": {...}}
jq -n --slurpfile tool tool.json '{
    model: "claude-opus-4-5",
    tools: $tool,
    messages: [{role: "user", content: "hi"}]
  }' |
curl -s https://api.anthropic.com/v1/messages/count_tokens \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d @-

# Repeat without the "tools" key and subtract the two counts
# to isolate the schema overhead.
```
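The with/without comparison can also be scripted. The sketch below only builds the two request bodies and takes the difference; the HTTP call is left to a caller-supplied `count_tokens` function (hypothetical) that posts a body to `/v1/messages/count_tokens` and returns the `input_tokens` value from the response. It also renames MCP's `inputSchema` key to `input_schema`, which the Messages API expects for tools.

```python
def mcp_to_anthropic_tool(mcp_tool: dict) -> dict:
    """Convert an MCP tool definition to the Messages API tool format."""
    tool = {"name": mcp_tool["name"], "input_schema": mcp_tool["inputSchema"]}
    if "description" in mcp_tool:
        tool["description"] = mcp_tool["description"]
    return tool


def build_payloads(mcp_tool: dict, model: str = "claude-opus-4-5") -> tuple[dict, dict]:
    """Return (with_tool, without_tool) request bodies for count_tokens."""
    base = {"model": model, "messages": [{"role": "user", "content": "hi"}]}
    with_tool = {**base, "tools": [mcp_to_anthropic_tool(mcp_tool)]}
    return with_tool, base


def schema_overhead(count_tokens, mcp_tool: dict) -> int:
    """Token difference between a request with the tool and one without it."""
    with_tool, without_tool = build_payloads(mcp_tool)
    return count_tokens(with_tool) - count_tokens(without_tool)


# Example input; the description is a placeholder.
example_tool = {
    "name": "ctx_stats",
    "description": "placeholder",
    "inputSchema": {"type": "object", "properties": {}},
}
with_tool, without_tool = build_payloads(example_tool)
print(sorted(with_tool), sorted(without_tool))
```

Note that the difference may also include a fixed tool-use preamble that the API adds whenever any tool is present, so comparing against a request with one minimal tool gives a cleaner per-schema figure.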

---

## 7. Request

The MCP protocol is well-designed for capability discovery, but **the cost model assumes schemas are cheap**. They are not. Every token of schema is a token of context window — and context window capacity directly bounds model reasoning quality and session length.

**Requested additions to the MCP specification:**

1. A **"schema efficiency" recommendation** section advising a maximum schema size per tool (suggested: 300 tokens for discovery, 1,000 for invocation)
2. A **tiered schema detail** mechanism (discovery vs. invocation tiers)
3. A **`schema_version` field** in tool registration for explicit cache invalidation

This would benefit all MCP implementations and hosts, not just context-mode.
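As an illustration of how the recommendation in item 1 could be checked, the sketch below compares the measured counts from Section 2 against the suggested 1,000-token invocation ceiling. The constant and variable names are hypothetical.

```python
INVOCATION_BUDGET = 1_000  # suggested invocation-tier ceiling from item 1

# Measured full-schema token counts from Section 2.
measured = {
    "ctx_batch_execute": 1024,
    "ctx_execute": 1024,
    "ctx_fetch_and_index": 972,
    "ctx_index": 858,
    "ctx_execute_file": 822,
    "ctx_search": 785,
    "ctx_purge": 646,
    "ctx_insight": 339,
    "ctx_upgrade": 127,
    "ctx_doctor": 107,
    "ctx_stats": 103,
}

# Map each offending tool to the number of tokens it exceeds the budget by.
over_budget = {
    name: tokens - INVOCATION_BUDGET
    for name, tokens in measured.items()
    if tokens > INVOCATION_BUDGET
}

for name, excess in over_budget.items():
    print(f"{name}: {excess} tokens over the invocation budget")
```

With the numbers reported here, `ctx_batch_execute` and `ctx_execute` each exceed the suggested ceiling by 24 tokens; the other nine tools fit.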

---

*Data from production usage of context-mode v1.0.151 over 22 days, 2,600 conversations.*

