
[Flow Control] Research and Implement Advanced Fairness/Scheduling Policies #1797

@LukeAVanDrie

Description


What would you like to be added: The current Flow Control layer uses basic policies like FCFS and RoundRobin. This issue proposes researching and implementing more sophisticated scheduling and fairness policies specifically suited to the unique characteristics of LLM inference workloads.

Good starting points for investigation are:

  • Intra-Flow Policy: Earliest Deadline First (EDF), with deadlines derived from TTFT or E2E latency targets, could be a powerful alternative to FCFS for ordering requests within a single flow, especially for workloads with explicit latency SLOs.
  • Inter-Flow Policy: Virtual Token Counting (VTC) or similar work-based fairness metrics could provide more equitable resource distribution between tenants than simple Round Robin, accounting for the variable cost of different requests.
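To make the intra-flow idea concrete, here is a minimal Python sketch of an EDF queue, assuming each request carries a TTFT budget. The names `EDFQueue`, `arrival_s`, and `ttft_slo_s` are hypothetical and not part of the existing Flow Control API; the sketch only illustrates the ordering rule, with ties on deadline falling back to arrival order.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _Entry:
    deadline: float                       # absolute deadline in seconds
    seq: int                              # arrival order; breaks deadline ties FCFS-style
    request_id: str = field(compare=False)


class EDFQueue:
    """Hypothetical intra-flow queue: always dispatch the earliest deadline."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def push(self, request_id, arrival_s, ttft_slo_s):
        # Assumption: deadline = arrival time + the request's TTFT budget.
        entry = _Entry(arrival_s + ttft_slo_s, next(self._seq), request_id)
        heapq.heappush(self._heap, entry)

    def pop(self):
        return heapq.heappop(self._heap).request_id

    def __len__(self):
        return len(self._heap)


q = EDFQueue()
q.push("batch-job", arrival_s=0.0, ttft_slo_s=5.0)      # arrived first, loose SLO
q.push("chat-turn", arrival_s=1.0, ttft_slo_s=0.5)
q.push("autocomplete", arrival_s=1.2, ttft_slo_s=0.2)   # arrived last, tightest deadline
order = [q.pop() for _ in range(len(q))]
# order == ["autocomplete", "chat-turn", "batch-job"]
```

Under FCFS the batch job would be dispatched first; here it goes last because its deadline is the furthest away.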

Why is this needed: Standard policies like Round Robin or FCFS are not always optimal for LLM serving, where request cost can vary dramatically based on prompt size and output length. Implementing LLM-aware policies will allow the Flow Control layer to make smarter decisions about request ordering and resource allocation, leading to better SLO attainment and higher overall system efficiency.
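As a rough illustration of work-based fairness, the sketch below picks the backlogged flow with the smallest weighted token count, in the spirit of VTC. It is a simplification rather than a faithful implementation: the class name `TokenFairPicker`, the weights (1.0 per prompt token, 2.0 per output token), and the counter-lift rule for flows that return from idle are all assumptions made for this example.

```python
from collections import deque


class TokenFairPicker:
    """Hypothetical inter-flow picker: serve the flow with the least work so far."""

    def __init__(self, input_weight=1.0, output_weight=2.0):
        self.input_weight = input_weight     # assumed cost per prompt token
        self.output_weight = output_weight   # assumed cost per generated token
        self.counters = {}                   # flow -> weighted tokens served
        self.queues = {}                     # flow -> FIFO of (request_id, prompt_tokens)

    def enqueue(self, flow, request_id, prompt_tokens):
        queue = self.queues.setdefault(flow, deque())
        if not queue:
            # Counter lift: a flow returning from idle is raised to the smallest
            # counter among backlogged flows so it cannot bank unused credit.
            others = [self.counters[f] for f, q in self.queues.items() if q and f != flow]
            current = self.counters.get(flow, 0.0)
            self.counters[flow] = max(current, min(others, default=current))
        queue.append((request_id, prompt_tokens))

    def dispatch(self):
        backlogged = [f for f, q in self.queues.items() if q]
        if not backlogged:
            return None
        flow = min(backlogged, key=lambda f: self.counters[f])
        request_id, prompt_tokens = self.queues[flow].popleft()
        self.counters[flow] += self.input_weight * prompt_tokens  # charge the prompt
        return flow, request_id

    def record_output(self, flow, output_tokens):
        # Charge generated tokens as they are produced.
        self.counters[flow] += self.output_weight * output_tokens


picker = TokenFairPicker()
for i in range(3):
    picker.enqueue("tenant-a", f"a{i}", prompt_tokens=1000)  # long prompts
for i in range(3):
    picker.enqueue("tenant-b", f"b{i}", prompt_tokens=100)   # short prompts

served = []
while (item := picker.dispatch()) is not None:
    served.append(item[1])
# served == ["a0", "b0", "b1", "b2", "a1", "a2"]
```

Round Robin would alternate a0, b0, a1, b1, and so on regardless of cost. Here tenant-b's three short requests all run before tenant-a's second long prompt, because tenant-a has already consumed far more weighted tokens.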

Metadata


Assignees

No one assigned

    Labels

    needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
