Open
Labels
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
Description
What would you like to be added: The current Flow Control layer uses basic policies like FCFS and RoundRobin. This issue proposes researching and implementing more sophisticated scheduling and fairness policies specifically suited to the unique characteristics of LLM inference workloads.
Good starting points for investigation are:
- Intra-Flow Policy: Earliest Deadline First (EDF), with deadlines derived from TTFT or E2E latency targets, could be a powerful alternative to FCFS for ordering requests within a single flow, especially for workloads with explicit latency SLOs.
- Inter-Flow Policy: Virtual Token Counting (VTC) or similar work-based fairness metrics could provide more equitable resource distribution between tenants than simple Round Robin, accounting for the variable cost of different requests.
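To make the intra-flow idea concrete, here is a minimal Python sketch of EDF ordering inside one flow. It is an illustration only, not the Flow Control layer's API: `EDFQueue`, `QueuedRequest`, and the `ttft_slo` parameter are hypothetical names, and the deadline is assumed to be the arrival time plus a per-request TTFT budget.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedRequest:
    # Absolute deadline in seconds (assumed: arrival time + TTFT budget).
    deadline: float
    # Tie-breaker so equal deadlines fall back to arrival (FCFS) order.
    seq: int
    request_id: str = field(compare=False)


class EDFQueue:
    """Hypothetical intra-flow queue that dispatches the earliest deadline first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, request_id, arrival, ttft_slo):
        item = QueuedRequest(arrival + ttft_slo, next(self._counter), request_id)
        heapq.heappush(self._heap, item)

    def pop(self):
        return heapq.heappop(self._heap).request_id

    def __len__(self):
        return len(self._heap)


queue = EDFQueue()
queue.push("batch", arrival=0.0, ttft_slo=5.0)   # deadline 5.0
queue.push("chat", arrival=1.0, ttft_slo=0.5)    # deadline 1.5
queue.push("search", arrival=0.5, ttft_slo=2.0)  # deadline 2.5
order = [queue.pop() for _ in range(len(queue))]
print(order)
```

FCFS would dispatch these three requests in arrival order (batch, search, chat). EDF instead serves the request closest to missing its latency target first, which is the behavior the intra-flow proposal is after.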
Why is this needed: Standard policies like Round Robin or FCFS are not always optimal for LLM serving, where request cost can vary dramatically based on prompt size and output length. Implementing LLM-aware policies will allow the Flow Control layer to make smarter decisions about request ordering and resource allocation, leading to better SLO attainment and higher overall system efficiency.
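The variable request cost described above is what a work-based inter-flow policy tries to account for. The following Python sketch shows a simplified virtual token counter, under stated assumptions: `VirtualTokenCounter` and its methods are hypothetical names, the input and output weights are illustrative values rather than tuned ones, and refinements such as lifting the counter of a flow that rejoins after being idle are left out.

```python
from collections import deque


class VirtualTokenCounter:
    """Simplified work-based fair selector across flows (hypothetical API)."""

    def __init__(self, input_weight=1.0, output_weight=2.0):
        # Assumed weights: output tokens are charged more than input tokens.
        self.input_weight = input_weight
        self.output_weight = output_weight
        self.counters = {}  # flow -> weighted tokens served so far
        self.queues = {}    # flow -> pending request ids, in arrival order

    def enqueue(self, flow, request_id):
        self.counters.setdefault(flow, 0.0)
        self.queues.setdefault(flow, deque()).append(request_id)

    def select(self):
        # Pick the backlogged flow that has received the least service.
        backlogged = [f for f, q in self.queues.items() if q]
        if not backlogged:
            return None
        flow = min(backlogged, key=lambda f: self.counters[f])
        return flow, self.queues[flow].popleft()

    def charge(self, flow, input_tokens, output_tokens):
        # Charge the flow for the work its request actually consumed.
        self.counters[flow] += (
            self.input_weight * input_tokens + self.output_weight * output_tokens
        )


vtc = VirtualTokenCounter()
for request_id in ("a1", "a2"):
    vtc.enqueue("tenant-a", request_id)
for request_id in ("b1", "b2"):
    vtc.enqueue("tenant-b", request_id)

served = []
# Tenant A sends expensive requests, tenant B sends cheap ones.
costs = {"tenant-a": (1000, 500), "tenant-b": (100, 50)}
while (choice := vtc.select()) is not None:
    flow, request_id = choice
    served.append(request_id)
    vtc.charge(flow, *costs[flow])
print(served)
```

Round Robin would alternate between the two tenants regardless of cost. Here, after tenant A's first expensive request is charged, tenant B's cheap requests are both served before tenant A is selected again.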