Skip to content

Conversation

@rahulgurnani
Copy link
Contributor

@rahulgurnani rahulgurnani commented Oct 31, 2025

Add PrepareData and AdmitRequest plugins based on recommendations in evolving datalayer changes

Also, some refactor of director.go

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Addresses #1743

Does this PR introduce a user-facing change?:

Yes, enables writing prepare data and admit request plugins for users of IGW.


@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 31, 2025
@netlify
Copy link

netlify bot commented Oct 31, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit a7354ca
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/690a8475b3f9960008b1aabb
😎 Deploy Preview https://deploy-preview-1796--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 31, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rahulgurnani
Once this PR has been reviewed and has the lgtm label, please assign nirrozenbaum for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from ahg-g and elevran October 31, 2025 16:22
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 31, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @rahulgurnani. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 31, 2025
@kfswain
Copy link
Collaborator

kfswain commented Oct 31, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 31, 2025
loggerDebug := log.FromContext(ctx).V(logutil.DEBUG)
for _, plugin := range d.requestControlPlugins.admitRequestPlugins {
loggerDebug.Info("Running AdmitRequest plugin", "plugin", plugin.TypedName())
if !plugin.Admit(ctx, request, pods) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should allow the plugin to return a string explaining the reason for rejection. We can then just treat the empty string as the allow mechanism (less opinionated on this part tho).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

"sync"
)

// TODO(rahulgurnani): Deprecate this AttributeMap in favor of AttributeMap added in scheduling layer.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to invert this. The attribute map is a component of the data system, and so we should refactor the data layer attribute map to fit our needs. The scheduling layer is informed by the data layer, so our structures should reflect that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

// It always returns the requestContext even in the error case, as the request context is used in error handling.
func (d *Director) HandleRequest(ctx context.Context, reqCtx *handlers.RequestContext) (*handlers.RequestContext, error) {
logger := log.FromContext(ctx)
// getInferenceObjective creates inferenceObjective based on reqCtx.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update comment to reflect the fetch, and it only creates (with a default prio) should no infObjective be found

// Parse Request, Resolve Target Models, and Determine Parameters
// resolveTargetModel is a helper that resolves targetModel
// and updates the reqCtx and ctx.
func (d *Director) resolveTargetModel(reqCtx *handlers.RequestContext) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Principle of Least Surprise - Recommend returning the reqCtx here, generally we want to avoid modifying param structures without explicitly returning. And if we do need to modify the object in mem, our goDocs should specify that we do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do need to modify the reqCtx, copying it may be an overkill. Also, the comments of the method state it updates reqCtx. Please let me know if its unclear presently. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intention was NOT to copy, but to explicitly return the pointer.

Copy link
Contributor Author

@rahulgurnani rahulgurnani Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the function. Thanks!


// Attributes provides a goroutine-safe implementation of AttributeMap.
type Attributes struct {
data sync.Map // key: attribute name (string), value: attribute value (opaque, Cloneable)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically I am all for using prebuild libs to handle this type of complexity.

But since writes to specific attributes will lock the entire data object, we may have high lock contention here. Did we explore having a lock per attribute key?

That would let locks be at the granularity of a specific endpoint & a specific attribute, which should hopefully reduce lock contention & let our system be more performant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. This attribute map is a per request copy where we take a snapshot of the attributes so that we can use them in the scheduling layer. Given the per request nature of the map, I think it won't have contention because it will take like t < microsecond to update the map. I think its reasonable to use sync map here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we scale test to ensure we don't have any regression? We can consider baseline metrics what we have here: #1458

targetModelName: model,
},
{
name: "successful chat completions request with prepare data plugins",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a test cast that functionally validates the intent of the plugin?

Currently we have a thin set of tests that basically just validate that we can successfully call a noop function.

snapshotOfCandidatePods := d.toSchedulerPodMetrics(candidatePods)

// Prepare per request data
d.runPrepareDataPlugins(ctx, reqCtx.SchedulingRequest, snapshotOfCandidatePods)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design doc detailed adding timeouts and error limits to these prepare data plugins as to not become a blocker in the event that the plugins are performing poorly, allowing the system to remain resilient. Was this considered in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning to do that in a follow up PR. Keeping it simpler in the initial change to ensure the SLO aware routing work is unblocked and can use this. Also I think it would be simpler to comprehend/debug if we don't have retires/timeouts. Added TODO for now. Hope that's okay. Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping it simpler in the initial change to ensure the SLO aware routing work is unblocked and can use this.

SLO aware routing is actually the use case I am concerned about, as it has known performance issues. In an effort to defend the systems current operations we need timeouts & retires to safely implement this functionality

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Updated the methods to make prepare data execute parallely for all plugins with timeout and retries. PTAL. Thanks!

func (d *Director) HandleRequest(ctx context.Context, reqCtx *handlers.RequestContext) (*handlers.RequestContext, error) {
logger := log.FromContext(ctx)
// getInferenceObjective fetches the inferenceObjective from the datastore otherwise creates a new one based on reqCtx.
func (d *Director) getInferenceObjective(logger logr.Logger, reqCtx *handlers.RequestContext) *v1alpha2.InferenceObjective {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove logger from the arguments?
This comes back to a discussion we had on a different PR. the convention we use in the project is contextual logging, we take logger from context and use it instead of passing around logger as arg.
e.g.,

log.FromContext(ctx).V(logutil.VERBOSE).Info(...)

@rahulgurnani rahulgurnani force-pushed the preparedata-changes branch 2 times, most recently from e9f0b7f to 3e16069 Compare November 3, 2025 03:05
// AttributeMap is used to store flexible metadata or traits
// across different aspects of an inference server.
// Stored values must be Cloneable. This is a per-request snapshot of the attributes.
type AttributeMap interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we duplicate this code from datalayer?

Comment on lines 63 to 66
type PrepareData interface {
plugins.Plugin
PrepareData(ctx context.Context, request *types.LLMRequest, pods []types.Pod)
}
Copy link
Contributor

@nirrozenbaum nirrozenbaum Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm bit confused by this PR.

we've added recently two interfaces for ProducerPlugin and ConsumerPlugin.
ProducerPlugin is a plugin that produces data.
why do we add another interface for a plugin the produces data?
shouldn't PrepareData function be on the same ProducerPlugin interface and validate that the produced data types are matching the types declaration?

Copy link
Collaborator

@kfswain kfswain Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommended reading: https://docs.google.com/document/d/1EQwXL2pCuUyM1B917FUgP_8pFS3VF8F_bUfjy8IE7gM/edit?tab=t.vmaefhinvkl5#heading=h.s9mr6kynb3ls

But the TL;DR is

Consume/Produce are expressions of intent only called on startup.

PrepareData is the actual generation of request specific data

Copy link
Contributor

@nirrozenbaum nirrozenbaum Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right.
would it be correct to say that in order for a plugin to implement PrepareData it should always declare which data it would produce? meaning these two should always go together in the same interface?

if that’s not the case, can you give an example where one is used but the other isn’t?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, some data producers will still specify the data they produce, but not generate request specific data, and so do not need to implement PrepareData()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So while its true that an implementer of PrepareData() will always implement Produces(). Not all plugins that implement Produces() will implement PrepareData(). The Metrics scrapers are a great example of this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. metrics collectors only produce, and not prepare per request.

an implementer of PrepareData() will always implement Produces()

I think the above should be forced, for example through the interface:

type PrepareData interface {
	plugins.ProducerPlugin //  < THIS IS THE CHANGE
	PrepareData(ctx context.Context, request *types.LLMRequest, pods []types.Pod)
}

as a side note - I think the naming selection is critical for readability.
PrepareData is too generic and doesn't say anything about a "per request".
it might be worth calling the function - PrepareRequestData (and rename the interface, e.g., DataProducer or something like that).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That all sounds reasonable to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Thanks for the suggestions!


// AdmitRequest is called by the director after the PrepareData plugins and before scheduling.
// AdmitRequest plugin is implemented by plugins for admission control. These plugins need to implement Admit method.
type AdmitRequest interface {
Copy link
Contributor

@nirrozenbaum nirrozenbaum Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming selection should be changed IMO.
more specifically, struct/interface names should be nouns, e.g., AdmissionPlugin, while function names should be verbs based, e.g., AdmitRequest.
additionally, the documentation should explain what happens when one is using multiple admission plugins (e.g., the request should pass all admission checks).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

return reqCtx, errutil.Error{Code: errutil.ServiceUnavailable, Msg: "failed to find candidate pods for serving the request"}
}

// TODO(rahulgurnani/lukevandrie): Perhaps, refactor/implement Admit plugin for Admission control.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is important. what is the relation between admission controller Admit and the admitRequest plugins?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

This would be a break in contract of how flow control operates. This specific plugin is for request specific semantics. We have currently do not have Flow Control considering request specific semantics, and there hasnt been a proposal suggesting this change. I think we should remove this todo until we have strong reasoning to actually do this work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my apologize, but I don't understand the intention of this new Admission plugin.
I thought we want to have admission check pluggable, but it seems now that we have two types of admission checks, with two different interfaces.
this seems wrong.

Copy link
Contributor Author

@rahulgurnani rahulgurnani Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the comment. Thanks for the catch!

result, err := d.scheduler.Schedule(ctx, reqCtx.SchedulingRequest, d.toSchedulerPodMetrics(candidatePods))
// Prepare per request data
// TODO(rahulgurnani): Add retries and timeout in the preparedata step.
d.runPrepareDataPlugins(ctx, reqCtx.SchedulingRequest, snapshotOfCandidatePods)
Copy link
Contributor

@nirrozenbaum nirrozenbaum Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we create snapshot of candidate pods?
we should work with candidatePods and create a snapshot only when calling the scheduler.
this is true for both prep data and admit request.
helper functions in the director should not rely on the internal scheduler representation of the endpoints.

Copy link
Collaborator

@kfswain kfswain Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both prep data and admit request are request specific, and so if we add request specific data to the shared endpoints that could risk data corruption.

Snapshotting before these steps ensures that this data lifecycle is only in the context it is consumed in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the intention of converting the endpoints representation to scheduler internal structure was only for the purpose of sending it to the scheduler.
PodMetrics has MetricsState behind atomic pointer and reading the metrics is an atomic operation (read all metrics in one operation).
I must be missing something although I've read the proposal doc.


// Config provides a configuration for the requestcontrol plugins.
type Config struct {
admitRequestPlugins []AdmissionPlugin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
admitRequestPlugins []AdmissionPlugin
admissionPlugins []AdmissionPlugin

Comment on lines +79 to +83
// WithAdmitRequestPlugins sets the given plugins as the AdmitRequest plugins.
func (c *Config) WithAdmitRequestPlugins(plugins ...AdmissionPlugin) *Config {
c.admitRequestPlugins = plugins
return c
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// WithAdmitRequestPlugins sets the given plugins as the AdmitRequest plugins.
func (c *Config) WithAdmitRequestPlugins(plugins ...AdmissionPlugin) *Config {
c.admitRequestPlugins = plugins
return c
}
// WithAdmissionPlugins sets the given plugins as the AdmissionPlugin plugins.
func (c *Config) WithAdmissionPlugins(plugins ...AdmissionPlugin) *Config {
c.admissionPlugins = plugins
return c
}

return c
}

// WithPrepareDataPlugins sets the given plugins as the PrepareData plugins.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// WithPrepareDataPlugins sets the given plugins as the PrepareData plugins.
// WithDataProducers sets the given plugins as the DataProducer plugins.


result, err := d.scheduler.Schedule(ctx, reqCtx.SchedulingRequest, d.toSchedulerPodMetrics(candidatePods))
// Run admit request plugins
if !d.runAdmitRequestPlugins(ctx, reqCtx.SchedulingRequest, snapshotOfCandidatePods) {
Copy link
Contributor

@nirrozenbaum nirrozenbaum Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give an example of how AdmissionPlugin is using candidate pods?
IMO admission check should include only the request (including its metadata like headers) and the system state (like saturation or other system metrics).

@kfswain I really think we should think carefully on the interfaces and align on them.
it doesn't make sense to me to pass the candidate pods to admission check.

I would be happy to get an example for an admission check that is per request data and depends on the candidate pods (other than saturation that is checked separately).
I'm a bit concerned about over complication here, and unless there is a real use case for that we should probably wait with this addition (talking only about admission, data producer/consumer is clearly needed).

separately, I think there is a missing struct, like RequestState, that conceptually should be similar to PluginState we have today. RequestState should be created at the beginning of the request, and passed around for RequestDataProducers to be filled, and later on to ConsumerPlugins (probably instead of CycleState).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly following recommendation in https://docs.google.com/document/d/1EQwXL2pCuUyM1B917FUgP_8pFS3VF8F_bUfjy8IE7gM/edit?tab=t.vmaefhinvkl5#heading=h.5qzcvbilfn6d

I think we would be adding the first AdmitRequest plugin with latency predictor.

@kfswain please keep me honest here. Thanks!

@rahulgurnani rahulgurnani changed the title Add PrepareData and AdmitRequest plugins Add DataProducer (PrepareData) and Admission control plugins Nov 4, 2025
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Nov 4, 2025

@rahulgurnani: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-gateway-api-inference-extension-verify-main a7354ca link true /test pull-gateway-api-inference-extension-verify-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants