feat: add GenAI evaluation support following OpenTelemetry semantic c… by anirudha · Pull Request #1631 · strands-agents/sdk-python

anirudha · 2026-02-05T06:58:19Z

…onventions

Add EvaluationResult class for standardized evaluation data
Add EvaluationTracer for adding evaluation events to spans
Implement gen_ai.evaluation.result events following OTEL GenAI SemConv
Add convenience functions for common evaluations (relevance, hallucination, accuracy)
Include comprehensive test suite with 100% coverage
Add detailed documentation and usage examples
Support custom evaluators and metrics
Non-intrusive design that won't crash agents on evaluation failures

Follows OpenTelemetry GenAI Semantic Conventions PR #2563 specification exactly. Events are attached to spans being evaluated for easy querying and correlation.
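The shape described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `EvaluationResult` fields, the `record_evaluation` helper, and the `RecordingSpan`/`BrokenSpan` stand-ins are hypothetical, and the `gen_ai.evaluation.*` attribute keys are assumed from the OpenTelemetry GenAI semantic conventions rather than taken from the diff. Only the event name `gen_ai.evaluation.result` comes from the description. The helper works with any object exposing `add_event(name, attributes=...)`, which is how an OpenTelemetry span is normally used.

```python
import logging
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

# Event name as stated in the PR description.
EVENT_NAME = "gen_ai.evaluation.result"


@dataclass(frozen=True)
class EvaluationResult:
    """Hypothetical result shape; the PR's actual fields are not shown."""

    name: str
    score: Optional[float] = None
    label: Optional[str] = None
    explanation: Optional[str] = None

    def to_attributes(self) -> Dict[str, Any]:
        # Attribute keys are assumed from the GenAI semantic conventions.
        # Unset optional fields are omitted rather than sent as None.
        attrs: Dict[str, Any] = {"gen_ai.evaluation.name": self.name}
        if self.score is not None:
            attrs["gen_ai.evaluation.score.value"] = float(self.score)
        if self.label is not None:
            attrs["gen_ai.evaluation.score.label"] = self.label
        if self.explanation is not None:
            attrs["gen_ai.evaluation.explanation"] = self.explanation
        return attrs


def record_evaluation(span: Any, result: EvaluationResult) -> bool:
    """Attach the result to the evaluated span; never raise to the caller."""
    try:
        span.add_event(EVENT_NAME, attributes=result.to_attributes())
        return True
    except Exception:
        # Non-intrusive: log the failure and let the agent keep running.
        logger.warning("could not record evaluation %r", result.name, exc_info=True)
        return False


class RecordingSpan:
    """In-memory stand-in for a span, used only for this demo."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Dict[str, Any]]] = []

    def add_event(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> None:
        self.events.append((name, dict(attributes or {})))


class BrokenSpan:
    """Stand-in whose add_event always fails, to exercise the error path."""

    def add_event(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> None:
        raise RuntimeError("exporter unavailable")


span = RecordingSpan()
recorded = record_evaluation(
    span, EvaluationResult("relevance", score=0.85, label="relevant")
)
failed = record_evaluation(BrokenSpan(), EvaluationResult("accuracy", score=1.0))
```

Returning a boolean instead of raising is one way to get the "won't crash agents" behaviour: callers can check the result if they care, and a telemetry failure never propagates into the agent loop.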

Description

Related Issues

#1633

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…tic conventions

- Implement EvaluationResult data class with OpenTelemetry attribute mapping
- Add EvaluationTracer for adding gen_ai.evaluation.result events to spans
- Follow OpenTelemetry GenAI Semantic Conventions PR #2563 exactly
- Include comprehensive test suite with 100% coverage
- Add simple usage example
- Non-intrusive design that won't crash agents on evaluation failures

This provides the foundation for GenAI evaluation in Strands with full OpenTelemetry compliance and tool interoperability.

github-actions bot added size/xl and removed size/xl labels Feb 5, 2026

anirudha force-pushed the feat/genai-evaluation-otel branch from 04b1494 to f459e66 Compare February 5, 2026 08:38

github-actions bot added size/l and removed size/xl labels Feb 5, 2026

anirudha closed this Feb 5, 2026

anirudha wants to merge 2 commits into strands-agents:main from anirudha:feat/genai-evaluation-otel
