Inference Pool Level Model Name Redirect and Traffic Splitting #1811


This issue tracks the implementation of the proposal to re-introduce model name redirection and traffic splitting functionality at the inference pool level.

The original proposal doc is available [here](https://docs.google.com/document/d/12yR_nAWM-Tg2ZmgGYX1h-dlUNi0AqYoACUjNElipl0M/edit?tab=t.0#heading=h.ndjipawyw195).

### Problem 
The deprecation of `InferenceModel` has removed the ability to perform model name aliasing/versioning and granular traffic splitting within an inference pool. This functionality is crucial for use cases like gradual rollouts of new LoRA adapters without requiring client-side changes.

### Proposed Solution
The proposal suggests introducing a new Custom Resource Definition (CRD) called `InferenceModelRewrite`. This CRD will contain the configuration for model redirection and traffic splitting.

The Endpoint Picker (EPP) will be responsible for:
  * Watching `InferenceModelRewrite` resources.
  * Parsing request bodies and modifying the `model` field based on the rewrite rules.
  * Handling weight-based traffic splitting.
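
The rewrite and splitting steps above can be illustrated with a minimal sketch. It is not the proposed implementation: the `Rule` and `Target` types, their field names, and the model names are hypothetical stand-ins for whatever schema `InferenceModelRewrite` ends up with, and the sketch assumes a JSON request body with a top-level string `model` field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
)

// Target is a hypothetical weighted rewrite destination (not the real CRD schema).
type Target struct {
	Model  string // model name written into the request body
	Weight int    // relative share of traffic; non-positive weights are ignored
}

// Rule is a hypothetical rewrite rule: requests whose model equals Match
// are split across Targets in proportion to their weights.
type Rule struct {
	Match   string
	Targets []Target
}

// pickTarget selects a target with probability proportional to its weight.
// draw must return a uniform integer in [0, n); rand.Intn satisfies this.
func pickTarget(targets []Target, draw func(n int) int) (string, bool) {
	total := 0
	for _, t := range targets {
		if t.Weight > 0 {
			total += t.Weight
		}
	}
	if total == 0 {
		return "", false
	}
	r := draw(total)
	for _, t := range targets {
		if t.Weight <= 0 {
			continue
		}
		if r < t.Weight {
			return t.Model, true
		}
		r -= t.Weight
	}
	return "", false // not reached when draw honors its contract
}

// rewriteModel parses a JSON body and, if its "model" field matches a rule,
// replaces it with a weighted pick. Unmatched bodies are returned unchanged.
func rewriteModel(body []byte, rules []Rule, draw func(n int) int) ([]byte, error) {
	var req map[string]json.RawMessage
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("parse request body: %w", err)
	}
	var model string
	if raw, ok := req["model"]; !ok || json.Unmarshal(raw, &model) != nil {
		return body, nil // no string "model" field: nothing to rewrite
	}
	for _, rule := range rules {
		if rule.Match != model {
			continue
		}
		target, ok := pickTarget(rule.Targets, draw)
		if !ok {
			return body, nil
		}
		encoded, err := json.Marshal(target)
		if err != nil {
			return nil, err
		}
		req["model"] = encoded // all other fields are carried over untouched
		return json.Marshal(req)
	}
	return body, nil
}

func main() {
	// Hypothetical rollout: 90% of "my-model" traffic stays on v1, 10% goes to v2.
	rules := []Rule{{Match: "my-model", Targets: []Target{{"my-model-v1", 90}, {"my-model-v2", 10}}}}
	out, err := rewriteModel([]byte(`{"model":"my-model","prompt":"hi"}`), rules, rand.Intn)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Passing the random source as a parameter keeps the split deterministic under test. Clients keep sending the original model name, which is the property the proposal is after.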

The implementation will be done in two phases:
  * Phase 1: EPP-Driven Intra-Pool Rewrite: EPP will be enhanced to act as a read-only controller for the `InferenceModelRewrite` CRD, executing request body mutation and traffic splitting within a single `InferencePool`.
  * Phase 2 (Conditional): Promote Rewrite Logic to BBR: If necessary, the core rewrite/splitting logic can be moved into a shared library used by both Body-Based Routing (BBR) and EPP, allowing BBR to make routing decisions after the model name has been rewritten.
