[Serve] emit replica utilization metric #60755

@abrarsheikh

Description

Measuring replica utilization is currently unreliable because GPU utilization metrics are noisy. Add a replica-level metric measuring time spent running user code, which should be a more robust signal and help isolate framework vs. user-code performance issues.

Proposed definition, over a tumbling 10-minute window: total time spent executing user code across all requests (in minutes) / (10 * max_ongoing_requests)
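
A minimal sketch of how this ratio could be computed, written as plain Python and not tied to any existing Serve internals. The class `ReplicaUtilizationTracker`, its methods, and the choice to split a request's time across the windows it overlaps are all assumptions made for illustration; times are in seconds, so the 10-minute window becomes 600 s.

```python
from collections import defaultdict


class ReplicaUtilizationTracker:
    """Hypothetical helper (not a Ray Serve API) for the ratio described above."""

    def __init__(self, max_ongoing_requests: int, window_s: float = 600.0):
        if max_ongoing_requests <= 0 or window_s <= 0:
            raise ValueError("max_ongoing_requests and window_s must be positive")
        self.max_ongoing_requests = max_ongoing_requests
        self.window_s = window_s
        # Seconds of user-code execution, keyed by tumbling window index.
        self._busy_s = defaultdict(float)

    def record(self, start_s: float, end_s: float) -> None:
        """Attribute one request's user-code time to every window it overlaps."""
        if end_s < start_s:
            raise ValueError("end_s must not be earlier than start_s")
        t = start_s
        while t < end_s:
            index = int(t // self.window_s)
            window_end = (index + 1) * self.window_s
            chunk_end = min(end_s, window_end)
            self._busy_s[index] += chunk_end - t
            t = chunk_end

    def utilization(self, index: int) -> float:
        """Busy time in the window divided by window length * max_ongoing_requests."""
        capacity_s = self.window_s * self.max_ongoing_requests
        return self._busy_s.get(index, 0.0) / capacity_s


tracker = ReplicaUtilizationTracker(max_ongoing_requests=4)
tracker.record(0.0, 300.0)    # 300 s, entirely in window 0
tracker.record(100.0, 400.0)  # 300 s, entirely in window 0
tracker.record(500.0, 700.0)  # 100 s in window 0, 100 s in window 1
```

In this example window 0 accumulates 700 s of user-code time against a capacity of 600 * 4 = 2400 s, so its utilization is 700 / 2400, roughly 0.29. A value near 1.0 would mean every concurrency slot was busy running user code for the whole window.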
