Measuring replica utilization is currently finicky because GPU utilization metrics are noisy. Add a replica-level metric that measures time spent running user code, which should be a more robust signal and help separate framework performance issues from user-code performance issues.
Over a tumbling 10-minute window: total time spent executing user code across all requests, in minutes / (10 * max_ongoing_requests)
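
As a rough illustration of the formula above, here is a minimal sketch of the computation for one window. The function name `replica_utilization`, its parameters, and the use of seconds instead of minutes are assumptions made for this example and are not part of the actual change. It also assumes each duration has already been clipped to the window boundaries.

```python
from typing import Iterable

WINDOW_S = 10 * 60  # tumbling window length in seconds (10 minutes)


def replica_utilization(
    user_code_seconds: Iterable[float],
    max_ongoing_requests: int,
    window_s: float = WINDOW_S,
) -> float:
    """Return the fraction of the window's request-slot capacity spent in user code.

    user_code_seconds: per-request time spent in user code inside this window
    (hypothetical input; assumed already clipped to the window).
    """
    if max_ongoing_requests <= 0 or window_s <= 0:
        raise ValueError("max_ongoing_requests and window_s must be positive")
    # Capacity is the window length multiplied by the number of concurrent slots.
    capacity = window_s * max_ongoing_requests
    return sum(user_code_seconds) / capacity
```

For example, a replica with `max_ongoing_requests=4` that spent 300 s, 300 s, and 600 s in user code during one window has a utilization of 1200 / 2400 = 0.5.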