[eval] UI: evaluation metric dashboard #387

@thorrester

Part of #380
Depends on: #385

What

Dashboard showing historical evaluation runs for a given scenario, with a detail view, run comparison, and regression detection.

  • List of past eval runs: timestamp, version, pass/fail, key metrics summary
  • Run detail view: per-metric breakdown, inputs/outputs sample
  • Comparison view: select two runs to view side by side, with metric deltas highlighted
  • Regression indicator: flag runs where metrics regressed beyond a threshold relative to the prior run
  • Data sourced from Scouter via OpsML proxy
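
To make the comparison and regression items concrete, here is a minimal sketch of how the metric deltas and regression flags could be computed on the UI side. Everything in it is an assumption for discussion: the `EvalRun` shape, the function names `compareRuns` and `flagRegressions`, the "higher is better" metric convention, and the use of an absolute (not relative) threshold are hypothetical, and none of it reflects the actual Scouter or OpsML schema.

```typescript
// Hypothetical shape of an eval run as the dashboard might receive it.
// Field names are assumptions, not the Scouter/OpsML response schema.
interface EvalRun {
  id: string;
  timestamp: string; // assumed ISO 8601
  version: string;
  passed: boolean;
  metrics: Record<string, number>; // assumed: higher is better
}

interface MetricDelta {
  metric: string;
  baseline: number;
  candidate: number;
  delta: number; // candidate - baseline
  regressed: boolean; // true when the drop exceeds the threshold
}

// Comparison view: per-metric deltas between two selected runs.
// Only metrics present in both runs are compared.
function compareRuns(
  baseline: EvalRun,
  candidate: EvalRun,
  threshold: number
): MetricDelta[] {
  const deltas: MetricDelta[] = [];
  for (const metric of Object.keys(baseline.metrics)) {
    const base = baseline.metrics[metric];
    const cand = candidate.metrics[metric];
    if (base === undefined || cand === undefined) continue;
    const delta = cand - base;
    deltas.push({
      metric,
      baseline: base,
      candidate: cand,
      delta,
      regressed: delta < -threshold,
    });
  }
  return deltas;
}

// Regression indicator: for each run (ordered by timestamp), list the
// metrics that dropped by more than `threshold` versus the prior run.
// Runs with no regressed metrics are omitted from the result.
function flagRegressions(
  runs: EvalRun[],
  threshold: number
): Record<string, string[]> {
  const sorted = runs
    .slice()
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  const flagged: Record<string, string[]> = {};
  let prev: EvalRun | undefined;
  for (const run of sorted) {
    if (prev !== undefined) {
      const regressed = compareRuns(prev, run, threshold)
        .filter((d) => d.regressed)
        .map((d) => d.metric);
      if (regressed.length > 0) flagged[run.id] = regressed;
    }
    prev = run;
  }
  return flagged;
}
```

Keeping `compareRuns` separate from `flagRegressions` would let the comparison view and the list-level regression indicator share one delta calculation. Whether the threshold should be absolute or relative, and how lower-is-better metrics are handled, is left open here.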

    Labels

    UI (Requires UI work), enhancement (New feature or request)
