SEP-1913: Trust and Sensitivity Annotations by SamMorrowDrums · Pull Request #1913 · modelcontextprotocol/modelcontextprotocol

SamMorrowDrums · 2025-11-27T17:41:31Z

SEP: Trust and Sensitivity Annotations

Summary

This SEP proposes trust and sensitivity annotations for MCP requests and responses, enabling clients and servers to track, propagate, and enforce trust boundaries on data as it flows through tool invocations.

Motivation

As MCP adoption grows, data flows across tool boundaries without standardized trust metadata. This creates security gaps:

Indirect Prompt Injection: Data from untrusted sources enters context without markers
Data Exfiltration: Sensitive information can be passed to external destinations without policy enforcement
Cross-Organization Boundaries: No way to mark internal vs. external data

Key Features

Annotations

sensitiveHint: Granular sensitivity levels (low, medium, high)
privateHint: Marks internal/private data
openWorldHint: Indicates untrusted/external data sources
maliciousActivityHint: Signals detected suspicious patterns
attribution: Provenance tracking for audit trails

Propagation Rules

Sensitivity escalates (never decreases) within an agent session
Boolean hints use union semantics (once true, stays true)
Attribution accumulates across context boundaries

Integration Points

Tool Resolution (#1862): Pre-execution annotation refinement
Per-item annotations: Support for mixed results (e.g., search results with varying sensitivity)
Defense-in-depth: Complements tool-level annotations (SEP-1487: Addition of trustedHint Tool Annotation #1487, SEP-1560: Addition of secretHint Tool Annotation #1560, SEP-1561: Addition of unsafeOutputHint Tool Annotation #1561)

Related Work

Builds on discussion in [SPEC] Annotations for MCP Requests and Responses (security/privacy) #711
Informed by academic research (Design Patterns for Securing LLM Agents, FIDES, ShardGuard)
Industry input from Microsoft Azure MCP team

Open Questions

Label namespaces for organization-specific classifications
Declassification mechanisms
Cross-server annotation sharing

Closes #711

/cc @dend (sponsor)

realArcherL · 2025-12-11T11:54:51Z

seps/DRAFT-trust-annotations.md

+    Note over Web MCP: Detects prompt injection<br/>in page content
+    Web MCP-->>Client: Result (maliciousActivityHint: true,<br/>openWorldHint: true)
+
+    Client->>User: ⚠️ Warning: Potential malicious content detected


Nit: Should we call them MCP Server (FILE) and MCP Server (HTTP)

Although, it is kind of implied hence nit.

realArcherL · 2025-12-11T11:59:32Z

seps/DRAFT-trust-annotations.md

+    User->>Client: "Summarize this webpage"
+    Client->>Web MCP: tools/call (fetch URL)
+
+    Note over Web MCP: Detects prompt injection<br/>in page content


Should we also highlight that this is best opportunity for servers to apply any preventative measures against indirect prompt injection (ex: Spotlighting, Prompt Sandwich etc)?

For example: Server applies Spotlighting and marks the data along with additional instruction. reference

OR do we want clients to deal with it, since the real attack of prompt injection(s) begin with LLMs?

I haven't read it fully, but it seems like this is just a notification mechanism, and this should be, maybe, a new field inside the schema for suggestion mitigation if the server wants to do it.

SamMorrowDrums · 2026-01-22T22:03:05Z

@localden, @rreichel3 (Open AI) is seeking to co-author this SEP as they see significant value for MCP Apps, and want to ensure that it does what they need, especially with respect to consequences of tool calls (such as being irreversible), would you be happy to also take a look at Robert's PR?

Open AI are in a unique position to require adoption of certain spec features for inclusion in their app store, which I think would be a boost
I also think Robert's ideas are cool and showcase the potential for this proposal.

He's going to get Nick Cooper to take a look also.

SamMorrowDrums · 2026-01-28T00:19:13Z

@localden @nickcoai I merged @rreichel3's PR so now have co-author.

Introduces trust and sensitivity annotations for MCP requests and responses, enabling clients and servers to track, propagate, and enforce trust boundaries on data as it flows through tool invocations. Key features: - Result annotations: sensitiveHint, privateHint, openWorldHint, maliciousActivityHint, attribution - Request annotations for propagating trust context - Propagation rules ensuring sensitivity markers persist across agent sessions - Integration with Tool Resolution (modelcontextprotocol#1862) for pre-execution annotations - Per-item annotations for mixed results (e.g., search results) - Defense-in-depth approach complementing tool-level annotations Closes modelcontextprotocol#711

… type - Extend existing ToolAnnotations with trust fields (privateHint, sensitiveHint, etc.) - Leverage existing openWorldHint with refined meaning per context - Remove per-item annotations (response-level aggregation only) - Remove _meta nesting - trust annotations live in flat annotations field - Add Alternative 1 explaining why separate type was rejected - Update Tool Resolution integration to use flat annotations

Co-authored-by: Sam Morrow <sammorrowdrums@github.com>

- Rename DRAFT-trust-annotations.md to 1913-trust-and-sensitivity-annotations.md - Update header to match SEP-1850 template format (dash-prefixed list) - Add full PR URL - Move issue reference to note below header - Regenerate SEP documentation for docs site

Agent-Hellboy · 2026-02-04T11:44:56Z

docs/community/seps/1913-trust-and-sensitivity-annotations.mdx

+- **User consent** cannot be meaningfully enforced without knowing a tool's real-world impact.
+- **Distrust by default** leads to confirmation fatigue and bad user experience.
+
+Action security metadata provides a declarative contract that describes where inputs go, where outputs originate, and what outcomes the tool can cause. This complements trust annotations, which track data characteristics in transit.


Action security metadata provides a declarative contract that describes where inputs go, where outputs originate, and what outcomes the tool can cause. This complements trust annotations, which track data characteristics in transit.

Just for my understanding. Suppose my mcp is hosted inside a cluster as a pod and it needs egress to my internal service or maybe external, why do I enforce the security rule for data flow inside code running in that pod(I mean at protocol level), shouldn't I do it at infra(egress) level?

where inputs go, where outputs originate

I mean, shouldn't it be controlled at the infra level, not the protocol level? Since LLM clients are not deterministic, shouldn't we enforce security rules deterministically?

Annotations are handled by clients, not LLMs themselves, so deterministic policy enforcement is exactly the sort of thing this could enable.

Agent-Hellboy · 2026-02-04T11:45:14Z

docs/community/seps/1913-trust-and-sensitivity-annotations.mdx

+
+Indicates the origin of returned data.
+
+- **untrustedPublic** — Public but unverified sources.


are enterpise setup allowing untrustedPublic? There must have been a check at the egress controller , whatever the company is using.

connor4312 · 2026-02-04T18:59:34Z

maliciousActivityHint I have some concerns about this:
- This is returned in tools/resolve which happens, in theory, before the actual tool execution happens. If I have a fetch_webpage tool, a server won't know if the response is potentially malicious before actually doing the fetch. It could in theory pre-fetch and cache the result, but that requires statefulness and also breaks the notion that "Resolution requests should complete in milliseconds" from SEP-1862
- As a client this is not maximally useful to present warnings to users. Tool results size is unbounded. In my vision of strong injection/malicious detection in VS Code, we would use a model and highlight portions of the tool result which were flagged as concerning for potential manual review. The boolean hint just says something is wrong, without letting me give any better UX to users.
- Generally speaking from the view of a client, I'm not going to trust the implementation of malicious content detection of random MCP servers. We will, at some point, do something in-product for this in VS Code. That will be tested, benchmarked, and controlled by user preference and organization policy. I might use maliciousActivityHint as a hint to give more or less scrutiny to content an MCP server returns, but nothing more.
Same tools/resolve concern for other hints. I think these would better belong on the Annotations which are associated with each ContentBlock in the result. That would also let you naturally be able to give ranges to which given annotations apply (byte offsets or code points, depending on the content type)
InputMetadata/ReturnMetadata seem okay. I would not that unlike maliciousActivityHint, I would be able to trust these as a client. The server is an authorized entity of whatever service it's representing, e.g. emails, and so I'm okay using its categorization of sources/destinations/outcomes. I think these metadata are generally fine but I am not an expert in the regulartory/data classification area.
RequestAnnotations.attribution -- as a client I don't think I can represent this very well. It can both be too comprehensive and also incomplete:
- I don't know which resources the model synthesized into a given tool call, so I would have to present every resource/annotation I encountered in the conversation, which does not seem useful.
- I don't know every resource and data encountered in a conversation. E.g. a model can use a terminal tool my client doesn't specifically recognize and that could pull in data from any number of unknown sources. Or to give another example, a previous agent session may have generated a file as intermediate content derived from any number of sources, and a new session that pulls it in would see 'just a file.'

SamMorrowDrums mentioned this pull request Nov 27, 2025

[SPEC] Annotations for MCP Requests and Responses (security/privacy) #711

Open

dsp-ant changed the title ~~SEP: Trust and Sensitivity Annotations~~ SEP-1913: Trust and Sensitivity Annotations Dec 3, 2025

localden added security SEP labels Dec 4, 2025

localden self-assigned this Dec 4, 2025

realArcherL reviewed Dec 11, 2025

View reviewed changes

SamMorrowDrums mentioned this pull request Jan 15, 2026

SEP-2091: Server Capability Signatures #2091

Closed

localden added this to SEP Review Pipeline Jan 21, 2026

localden added the draft SEP proposal with a sponsor. label Jan 21, 2026

SamMorrowDrums marked this pull request as ready for review January 28, 2026 20:33

SamMorrowDrums force-pushed the sep-trust-annotations branch from 936c53b to d255f08 Compare January 29, 2026 13:54

SamMorrowDrums requested a review from a team as a code owner January 29, 2026 13:54

SamMorrowDrums force-pushed the sep-trust-annotations branch from d255f08 to 271fcba Compare January 29, 2026 14:32

SamMorrowDrums and others added 11 commits January 30, 2026 16:01

resolve prettier issues

ac4ab9e

Formalize SEP: add PR number and rename file

2e310c4

Delete seps/1862-dynamic-tool-annotations.md

830c185

Remove personal quote, format with prettier

f1ce2bb

MOdified trust annotations to merge in my SEP

0a56c85

Update seps/DRAFT-trust-annotations.md

0b37ff0

Co-authored-by: Sam Morrow <sammorrowdrums@github.com>

Update seps/DRAFT-trust-annotations.md

8f2d478

Co-authored-by: Sam Morrow <sammorrowdrums@github.com>

Format markdown with prettier

3ec2179

SamMorrowDrums force-pushed the sep-trust-annotations branch from 271fcba to f46d45e Compare January 30, 2026 15:01

Agent-Hellboy reviewed Feb 4, 2026

View reviewed changes

Agent-Hellboy mentioned this pull request Feb 4, 2026

SEP-1763: Interceptors for Model Context Protocol #1763

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SEP-1913: Trust and Sensitivity Annotations#1913

SEP-1913: Trust and Sensitivity Annotations#1913
SamMorrowDrums wants to merge 11 commits intomodelcontextprotocol:mainfrom
SamMorrowDrums:sep-trust-annotations

SamMorrowDrums commented Nov 27, 2025

Uh oh!

This comment was marked as resolved.

Uh oh!

realArcherL Dec 11, 2025

Uh oh!

realArcherL Dec 11, 2025 •

edited

Loading

Uh oh!

Agent-Hellboy Feb 4, 2026

Uh oh!

SamMorrowDrums commented Jan 22, 2026 •

edited

Loading

Uh oh!

SamMorrowDrums commented Jan 28, 2026

Uh oh!

Agent-Hellboy Feb 4, 2026

Uh oh!

SamMorrowDrums Feb 4, 2026

Uh oh!

Agent-Hellboy Feb 4, 2026

Uh oh!

connor4312 commented Feb 4, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants


		Indicates the origin of returned data.

		- untrustedPublic — Public but unverified sources.

Conversation

SamMorrowDrums commented Nov 27, 2025

SEP: Trust and Sensitivity Annotations

Summary

Motivation

Key Features

Annotations

Propagation Rules

Integration Points

Related Work

Open Questions

Uh oh!

This comment was marked as resolved.

Uh oh!

realArcherL Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

realArcherL Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Agent-Hellboy Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

SamMorrowDrums commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SamMorrowDrums commented Jan 28, 2026

Uh oh!

Agent-Hellboy Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

SamMorrowDrums Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Agent-Hellboy Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

connor4312 commented Feb 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

realArcherL Dec 11, 2025 •

edited

Loading

SamMorrowDrums commented Jan 22, 2026 •

edited

Loading

connor4312 commented Feb 4, 2026 •

edited

Loading