
feat: add GenAI evaluation support following OpenTelemetry semantic c…#1631

Closed
anirudha wants to merge 2 commits into strands-agents:main from anirudha:feat/genai-evaluation-otel

Conversation


anirudha commented Feb 5, 2026

…onventions

  • Add EvaluationResult class for standardized evaluation data
  • Add EvaluationTracer for adding evaluation events to spans
  • Implement gen_ai.evaluation.result events following OTEL GenAI SemConv
  • Add convenience functions for common evaluations (relevance, hallucination, accuracy)
  • Include comprehensive test suite with 100% coverage
  • Add detailed documentation and usage examples
  • Support custom evaluators and metrics
  • Non-intrusive design: evaluation failures do not crash the agent
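A minimal sketch of what an evaluation result with OpenTelemetry attribute mapping, plus one convenience function, could look like. The class shape, the relevance helper, its [0, 1] score range and its 0.5 threshold are hypothetical illustrations, not the code in this PR; the gen_ai.evaluation.* attribute keys are our reading of the proposed conventions and should be checked against the specification.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Attribute keys as we understand them from the proposed OpenTelemetry GenAI
# evaluation conventions (assumption: verify against the final spec).
ATTR_NAME = "gen_ai.evaluation.name"
ATTR_SCORE_VALUE = "gen_ai.evaluation.score.value"
ATTR_SCORE_LABEL = "gen_ai.evaluation.score.label"
ATTR_EXPLANATION = "gen_ai.evaluation.explanation"
ATTR_RESPONSE_ID = "gen_ai.response.id"


@dataclass(frozen=True)
class EvaluationResult:
    """Hypothetical shape of an evaluation result; not the PR's actual class."""

    name: str                          # e.g. "relevance"
    score: Optional[float] = None      # numeric score, if the evaluator produces one
    label: Optional[str] = None        # categorical verdict, e.g. "relevant"
    explanation: Optional[str] = None  # free-text rationale from the evaluator
    response_id: Optional[str] = None  # links the result to the evaluated response

    def to_attributes(self) -> dict[str, Union[str, float]]:
        """Map the fields that are set onto OpenTelemetry attribute keys."""
        candidates = {
            ATTR_NAME: self.name,
            ATTR_SCORE_VALUE: self.score,
            ATTR_SCORE_LABEL: self.label,
            ATTR_EXPLANATION: self.explanation,
            ATTR_RESPONSE_ID: self.response_id,
        }
        # OpenTelemetry attribute values may not be None, so drop unset fields.
        return {key: value for key, value in candidates.items() if value is not None}


def relevance(score: float, explanation: Optional[str] = None) -> EvaluationResult:
    """Hypothetical convenience constructor for a relevance evaluation."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("relevance score must be in [0, 1]")
    # Assumed threshold for turning the numeric score into a label.
    label = "relevant" if score >= 0.5 else "not_relevant"
    return EvaluationResult("relevance", score=score, label=label, explanation=explanation)
```

For example, relevance(0.82).to_attributes() yields the name, score and label keys and omits the unset explanation and response ID, so the mapping never emits empty attributes.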

Follows the specification in OpenTelemetry GenAI Semantic Conventions PR #2563 exactly. Evaluation events are attached to the spans being evaluated, so results can be queried and correlated with the operations they score.
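Attaching the result to the evaluated span, rather than emitting a separate span, means the event carries that span's trace and span IDs. A minimal sketch, assuming only that the span exposes add_event(name, attributes) as OpenTelemetry's Python Span does; record_evaluation and RecordingSpan are hypothetical helpers, and the attribute keys are assumptions to verify against the conventions.

```python
from typing import Any, Mapping, Optional, Protocol

# Event name taken from the PR description; attribute keys are assumptions
# based on the proposed OpenTelemetry GenAI evaluation conventions.
EVALUATION_EVENT = "gen_ai.evaluation.result"


class EventSink(Protocol):
    """The one method needed from an OpenTelemetry span."""

    def add_event(self, name: str, attributes: Optional[Mapping[str, Any]] = None) -> None: ...


def record_evaluation(
    span: EventSink,
    name: str,
    score: Optional[float] = None,
    label: Optional[str] = None,
    explanation: Optional[str] = None,
) -> None:
    """Attach one evaluation result as an event on the span that was evaluated."""
    attributes: dict[str, Any] = {"gen_ai.evaluation.name": name}
    if score is not None:
        attributes["gen_ai.evaluation.score.value"] = score
    if label is not None:
        attributes["gen_ai.evaluation.score.label"] = label
    if explanation is not None:
        attributes["gen_ai.evaluation.explanation"] = explanation
    # The event inherits the span's trace and span IDs, which is what lets a
    # backend correlate the score with the exact call it judges.
    span.add_event(EVALUATION_EVENT, attributes=attributes)


class RecordingSpan:
    """Stand-in span for local experiments; not part of OpenTelemetry."""

    def __init__(self) -> None:
        self.events: list[tuple[str, dict[str, Any]]] = []

    def add_event(self, name: str, attributes: Optional[Mapping[str, Any]] = None) -> None:
        self.events.append((name, dict(attributes or {})))
```

With the opentelemetry-api package installed, trace.get_current_span() returns the active span, which can be passed to record_evaluation in place of RecordingSpan.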

Description

Related Issues

#1633

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions bot added size/xl and removed size/xl labels Feb 5, 2026
…tic conventions

- Implement EvaluationResult data class with OpenTelemetry attribute mapping
- Add EvaluationTracer for adding gen_ai.evaluation.result events to spans
- Follow OpenTelemetry GenAI Semantic Conventions PR #2563 exactly
- Include comprehensive test suite with 100% coverage
- Add simple usage example
- Non-intrusive design that won't crash agents on evaluation failures

This provides the foundation for GenAI evaluation in Strands with full
OpenTelemetry compliance and tool interoperability.
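The non-intrusive design described in this commit can be sketched as a wrapper that runs a custom evaluator and records either its score or the failure, without letting any exception escape into the agent. evaluate_safely, the (prompt, response) -> float evaluator signature, and the use of error.type on the event are assumptions for illustration, not the PR's API.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical custom-evaluator signature: (prompt, response) -> score.
Evaluator = Callable[[str, str], float]


def evaluate_safely(
    span: Any, name: str, evaluator: Evaluator, prompt: str, response: str
) -> Optional[float]:
    """Run a custom evaluator and record its result on span without ever raising."""
    try:
        score: Optional[float] = float(evaluator(prompt, response))
    except Exception as exc:  # a failing evaluator must never break the agent
        logger.warning("evaluator %r failed: %s", name, exc)
        score = None
        # Assumed attribute: record the failure class instead of a score.
        attributes = {"gen_ai.evaluation.name": name, "error.type": type(exc).__qualname__}
    else:
        attributes = {"gen_ai.evaluation.name": name, "gen_ai.evaluation.score.value": score}
    try:
        span.add_event("gen_ai.evaluation.result", attributes=attributes)
    except Exception as exc:  # a misbehaving span or exporter is contained too
        logger.warning("could not record evaluation %r: %s", name, exc)
    return score
```

Returning None on failure lets the caller tell "evaluated" from "could not evaluate" while the agent's own control flow is unaffected either way.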
anirudha force-pushed the feat/genai-evaluation-otel branch from 04b1494 to f459e66 February 5, 2026 08:38
github-actions bot added size/l and removed size/xl labels Feb 5, 2026
anirudha closed this Feb 5, 2026