-
Notifications
You must be signed in to change notification settings - Fork 187
Add DataProducer (PrepareData) and Admission control plugins #1796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add DataProducer (PrepareData) and Admission control plugins #1796
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: rahulgurnani The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Hi @rahulgurnani. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
pkg/epp/requestcontrol/director.go
Outdated
| loggerDebug := log.FromContext(ctx).V(logutil.DEBUG) | ||
| for _, plugin := range d.requestControlPlugins.admitRequestPlugins { | ||
| loggerDebug.Info("Running AdmitRequest plugin", "plugin", plugin.TypedName()) | ||
| if !plugin.Admit(ctx, request, pods) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should allow the plugin to return a string explaining the reason for rejection. We can then just treat the empty string as the allow mechanism (less opinionated on this part tho).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
| "sync" | ||
| ) | ||
|
|
||
| // TODO(rahulgurnani): Deprecate this AttributeMap in favor of AttributeMap added in scheduling layer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may want to invert this. The attribute map is a component of the data system, and so we should refactor the data layer attribute map to fit our needs. The scheduling layer is informed by the data layer, so our structures should reflect that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
pkg/epp/requestcontrol/director.go
Outdated
| // It always returns the requestContext even in the error case, as the request context is used in error handling. | ||
| func (d *Director) HandleRequest(ctx context.Context, reqCtx *handlers.RequestContext) (*handlers.RequestContext, error) { | ||
| logger := log.FromContext(ctx) | ||
| // getInferenceObjective creates inferenceObjective based on reqCtx. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comment to reflect the fetch, and it only creates (with a default prio) should no infObjective be found
pkg/epp/requestcontrol/director.go
Outdated
| // Parse Request, Resolve Target Models, and Determine Parameters | ||
| // resolveTargetModel is a helper that resolves targetModel | ||
| // and updates the reqCtx and ctx. | ||
| func (d *Director) resolveTargetModel(reqCtx *handlers.RequestContext) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Principle of Least Surprise - Recommend returning the reqCtx here, generally we want to avoid modifying param structures without explicitly returning. And if we do need to modify the object in mem, our goDocs should specify that we do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do need to modify the reqCtx, copying it may be an overkill. Also, the comments of the method state it updates reqCtx. Please let me know if its unclear presently. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the intention was NOT to copy, but to explicitly return the pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the function. Thanks!
|
|
||
| // Attributes provides a goroutine-safe implementation of AttributeMap. | ||
| type Attributes struct { | ||
| data sync.Map // key: attribute name (string), value: attribute value (opaque, Cloneable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically I am all for using prebuild libs to handle this type of complexity.
But since writes to specific attributes will lock the entire data object, we may have high lock contention here. Did we explore having a lock per attribute key?
That would let locks be at the granularity of a specific endpoint & a specific attribute, which should hopefully reduce lock contention & let our system be more performant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. This attribute map is a per request copy where we take a snapshot of the attributes so that we can use them in the scheduling layer. Given the per request nature of the map, I think it won't have contention because it will take like t < microsecond to update the map. I think its reasonable to use sync map here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we scale test to ensure we don't have any regression? We can consider baseline metrics what we have here: #1458
| targetModelName: model, | ||
| }, | ||
| { | ||
| name: "successful chat completions request with prepare data plugins", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a test cast that functionally validates the intent of the plugin?
Currently we have a thin set of tests that basically just validate that we can successfully call a noop function.
| snapshotOfCandidatePods := d.toSchedulerPodMetrics(candidatePods) | ||
|
|
||
| // Prepare per request data | ||
| d.runPrepareDataPlugins(ctx, reqCtx.SchedulingRequest, snapshotOfCandidatePods) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design doc detailed adding timeouts and error limits to these prepare data plugins as to not become a blocker in the event that the plugins are performing poorly, allowing the system to remain resilient. Was this considered in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am planning to do that in a follow up PR. Keeping it simpler in the initial change to ensure the SLO aware routing work is unblocked and can use this. Also I think it would be simpler to comprehend/debug if we don't have retires/timeouts. Added TODO for now. Hope that's okay. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping it simpler in the initial change to ensure the SLO aware routing work is unblocked and can use this.
SLO aware routing is actually the use case I am concerned about, as it has known performance issues. In an effort to defend the systems current operations we need timeouts & retires to safely implement this functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Updated the methods to make prepare data execute parallely for all plugins with timeout and retries. PTAL. Thanks!
63208cf to
bf49fb5
Compare
pkg/epp/requestcontrol/director.go
Outdated
| func (d *Director) HandleRequest(ctx context.Context, reqCtx *handlers.RequestContext) (*handlers.RequestContext, error) { | ||
| logger := log.FromContext(ctx) | ||
| // getInferenceObjective fetches the inferenceObjective from the datastore otherwise creates a new one based on reqCtx. | ||
| func (d *Director) getInferenceObjective(logger logr.Logger, reqCtx *handlers.RequestContext) *v1alpha2.InferenceObjective { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove logger from the arguments?
This comes back to a discussion we had on a different PR. the convention we use in the project is contextual logging, we take logger from context and use it instead of passing around logger as arg.
e.g.,
log.FromContext(ctx).V(logutil.VERBOSE).Info(...)e9f0b7f to
3e16069
Compare
3e16069 to
e3021c7
Compare
| // AttributeMap is used to store flexible metadata or traits | ||
| // across different aspects of an inference server. | ||
| // Stored values must be Cloneable. This is a per-request snapshot of the attributes. | ||
| type AttributeMap interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we duplicate this code from datalayer?
pkg/epp/requestcontrol/plugins.go
Outdated
| type PrepareData interface { | ||
| plugins.Plugin | ||
| PrepareData(ctx context.Context, request *types.LLMRequest, pods []types.Pod) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm bit confused by this PR.
we've added recently two interfaces for ProducerPlugin and ConsumerPlugin.
ProducerPlugin is a plugin that produces data.
why do we add another interface for a plugin the produces data?
shouldn't PrepareData function be on the same ProducerPlugin interface and validate that the produced data types are matching the types declaration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommended reading: https://docs.google.com/document/d/1EQwXL2pCuUyM1B917FUgP_8pFS3VF8F_bUfjy8IE7gM/edit?tab=t.vmaefhinvkl5#heading=h.s9mr6kynb3ls
But the TL;DR is
Consume/Produce are expressions of intent only called on startup.
PrepareData is the actual generation of request specific data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right.
would it be correct to say that in order for a plugin to implement PrepareData it should always declare which data it would produce? meaning these two should always go together in the same interface?
if that’s not the case, can you give an example where one is used but the other isn’t?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, some data producers will still specify the data they produce, but not generate request specific data, and so do not need to implement PrepareData()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So while its true that an implementer of PrepareData() will always implement Produces(). Not all plugins that implement Produces() will implement PrepareData(). The Metrics scrapers are a great example of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. metrics collectors only produce, and not prepare per request.
an implementer of PrepareData() will always implement Produces()
I think the above should be forced, for example through the interface:
type PrepareData interface {
plugins.ProducerPlugin // < THIS IS THE CHANGE
PrepareData(ctx context.Context, request *types.LLMRequest, pods []types.Pod)
}as a side note - I think the naming selection is critical for readability.
PrepareData is too generic and doesn't say anything about a "per request".
it might be worth calling the function - PrepareRequestData (and rename the interface, e.g., DataProducer or something like that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That all sounds reasonable to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thanks for the suggestions!
pkg/epp/requestcontrol/plugins.go
Outdated
|
|
||
| // AdmitRequest is called by the director after the PrepareData plugins and before scheduling. | ||
| // AdmitRequest plugin is implemented by plugins for admission control. These plugins need to implement Admit method. | ||
| type AdmitRequest interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming selection should be changed IMO.
more specifically, struct/interface names should be nouns, e.g., AdmissionPlugin, while function names should be verbs based, e.g., AdmitRequest.
additionally, the documentation should explain what happens when one is using multiple admission plugins (e.g., the request should pass all admission checks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
pkg/epp/requestcontrol/director.go
Outdated
| return reqCtx, errutil.Error{Code: errutil.ServiceUnavailable, Msg: "failed to find candidate pods for serving the request"} | ||
| } | ||
|
|
||
| // TODO(rahulgurnani/lukevandrie): Perhaps, refactor/implement Admit plugin for Admission control. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is important. what is the relation between admission controller Admit and the admitRequest plugins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
This would be a break in contract of how flow control operates. This specific plugin is for request specific semantics. We have currently do not have Flow Control considering request specific semantics, and there hasnt been a proposal suggesting this change. I think we should remove this todo until we have strong reasoning to actually do this work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my apologize, but I don't understand the intention of this new Admission plugin.
I thought we want to have admission check pluggable, but it seems now that we have two types of admission checks, with two different interfaces.
this seems wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the comment. Thanks for the catch!
| result, err := d.scheduler.Schedule(ctx, reqCtx.SchedulingRequest, d.toSchedulerPodMetrics(candidatePods)) | ||
| // Prepare per request data | ||
| // TODO(rahulgurnani): Add retries and timeout in the preparedata step. | ||
| d.runPrepareDataPlugins(ctx, reqCtx.SchedulingRequest, snapshotOfCandidatePods) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we create snapshot of candidate pods?
we should work with candidatePods and create a snapshot only when calling the scheduler.
this is true for both prep data and admit request.
helper functions in the director should not rely on the internal scheduler representation of the endpoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both prep data and admit request are request specific, and so if we add request specific data to the shared endpoints that could risk data corruption.
Snapshotting before these steps ensures that this data lifecycle is only in the context it is consumed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the intention of converting the endpoints representation to scheduler internal structure was only for the purpose of sending it to the scheduler.
PodMetrics has MetricsState behind atomic pointer and reading the metrics is an atomic operation (read all metrics in one operation).
I must be missing something although I've read the proposal doc.
|
|
||
| // Config provides a configuration for the requestcontrol plugins. | ||
| type Config struct { | ||
| admitRequestPlugins []AdmissionPlugin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| admitRequestPlugins []AdmissionPlugin | |
| admissionPlugins []AdmissionPlugin |
| // WithAdmitRequestPlugins sets the given plugins as the AdmitRequest plugins. | ||
| func (c *Config) WithAdmitRequestPlugins(plugins ...AdmissionPlugin) *Config { | ||
| c.admitRequestPlugins = plugins | ||
| return c | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // WithAdmitRequestPlugins sets the given plugins as the AdmitRequest plugins. | |
| func (c *Config) WithAdmitRequestPlugins(plugins ...AdmissionPlugin) *Config { | |
| c.admitRequestPlugins = plugins | |
| return c | |
| } | |
| // WithAdmissionPlugins sets the given plugins as the AdmissionPlugin plugins. | |
| func (c *Config) WithAdmissionPlugins(plugins ...AdmissionPlugin) *Config { | |
| c.admissionPlugins = plugins | |
| return c | |
| } |
| return c | ||
| } | ||
|
|
||
| // WithPrepareDataPlugins sets the given plugins as the PrepareData plugins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // WithPrepareDataPlugins sets the given plugins as the PrepareData plugins. | |
| // WithDataProducers sets the given plugins as the DataProducer plugins. |
|
|
||
| result, err := d.scheduler.Schedule(ctx, reqCtx.SchedulingRequest, d.toSchedulerPodMetrics(candidatePods)) | ||
| // Run admit request plugins | ||
| if !d.runAdmitRequestPlugins(ctx, reqCtx.SchedulingRequest, snapshotOfCandidatePods) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give an example of how AdmissionPlugin is using candidate pods?
IMO admission check should include only the request (including its metadata like headers) and the system state (like saturation or other system metrics).
@kfswain I really think we should think carefully on the interfaces and align on them.
it doesn't make sense to me to pass the candidate pods to admission check.
I would be happy to get an example for an admission check that is per request data and depends on the candidate pods (other than saturation that is checked separately).
I'm a bit concerned about over complication here, and unless there is a real use case for that we should probably wait with this addition (talking only about admission, data producer/consumer is clearly needed).
separately, I think there is a missing struct, like RequestState, that conceptually should be similar to PluginState we have today. RequestState should be created at the beginning of the request, and passed around for RequestDataProducers to be filled, and later on to ConsumerPlugins (probably instead of CycleState).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mostly following recommendation in https://docs.google.com/document/d/1EQwXL2pCuUyM1B917FUgP_8pFS3VF8F_bUfjy8IE7gM/edit?tab=t.vmaefhinvkl5#heading=h.5qzcvbilfn6d
I think we would be adding the first AdmitRequest plugin with latency predictor.
@kfswain please keep me honest here. Thanks!
6b5e3c4 to
a7354ca
Compare
|
@rahulgurnani: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Add PrepareData and AdmitRequest plugins based on recommendations in evolving datalayer changes
Also, some refactor of director.go
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Addresses #1743
Does this PR introduce a user-facing change?:
Yes, enables writing prepare data and admit request plugins for users of IGW.