-
Notifications
You must be signed in to change notification settings - Fork 1.2k
SEP-1913: Trust and Sensitivity Annotations #1913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
SEP-1913: Trust and Sensitivity Annotations #1913
Conversation
Introduces trust and sensitivity annotations for MCP requests and responses, enabling clients and servers to track, propagate, and enforce trust boundaries on data as it flows through tool invocations. Key features: - Result annotations: sensitiveHint, privateHint, openWorldHint, maliciousActivityHint, attribution - Request annotations for propagating trust context - Propagation rules ensuring sensitivity markers persist across agent sessions - Integration with Tool Resolution (modelcontextprotocol#1862) for pre-execution annotations - Per-item annotations for mixed results (e.g., search results) - Defense-in-depth approach complementing tool-level annotations Closes modelcontextprotocol#711
… type - Extend existing ToolAnnotations with trust fields (privateHint, sensitiveHint, etc.) - Leverage existing openWorldHint with refined meaning per context - Remove per-item annotations (response-level aggregation only) - Remove _meta nesting - trust annotations live in flat annotations field - Add Alternative 1 explaining why separate type was rejected - Update Tool Resolution integration to use flat annotations
| Note over Web MCP: Detects prompt injection<br/>in page content | ||
| Web MCP-->>Client: Result (maliciousActivityHint: true,<br/>openWorldHint: true) | ||
|
|
||
| Client->>User: ⚠️ Warning: Potential malicious content detected |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should we call them MCP Server (FILE) and MCP Server (HTTP)
Although, it is kind of implied hence nit.
| User->>Client: "Summarize this webpage" | ||
| Client->>Web MCP: tools/call (fetch URL) | ||
|
|
||
| Note over Web MCP: Detects prompt injection<br/>in page content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also highlight that this is best opportunity for servers to apply any preventative measures against indirect prompt injection (ex: Spotlighting, Prompt Sandwich etc)?
For example: Server applies Spotlighting and marks the data along with additional instruction. reference
OR do we want clients to deal with it, since the real attack of prompt injection(s) begin with LLMs?
SEP: Trust and Sensitivity Annotations
Summary
This SEP proposes trust and sensitivity annotations for MCP requests and responses, enabling clients and servers to track, propagate, and enforce trust boundaries on data as it flows through tool invocations.
Motivation
As MCP adoption grows, data flows across tool boundaries without standardized trust metadata. This creates security gaps:
Key Features
Annotations
sensitiveHint: Granular sensitivity levels (low,medium,high)privateHint: Marks internal/private dataopenWorldHint: Indicates untrusted/external data sourcesmaliciousActivityHint: Signals detected suspicious patternsattribution: Provenance tracking for audit trailsPropagation Rules
Integration Points
trustedHintTool Annotation #1487, SEP-1560: Addition of secretHint Tool Annotation #1560, SEP-1561: Addition of unsafeOutputHint Tool Annotation #1561)Related Work
Open Questions
Closes #711
/cc @dend (sponsor)