[eval] UI: evaluation metric dashboard

Part of #380
Depends on: #385

## What

Dashboard showing historical evaluation runs for a given scenario — with detail view, run comparison, and regression detection.

- List of past eval runs: timestamp, version, pass/fail, key metrics summary
- Run detail view: per-metric breakdown, inputs/outputs sample
- Comparison view: select two runs side-by-side, highlight metric deltas
- Regression indicator: flag runs where metrics regressed beyond a threshold vs prior run
- Data sourced from Scouter via OpsML proxy

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[eval] UI: evaluation metric dashboard #387

What

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[eval] UI: evaluation metric dashboard #387

Description

What

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions