-
-
Notifications
You must be signed in to change notification settings - Fork 12.1k
v1: Support KV events from connectors #19737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @orozery, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the KV cache event system by extending the connector API to allow connectors to emit KV cache events. It also introduces a flexible metadata field for KVCacheEvent objects and integrates the collection of these new connector-emitted events into the scheduler's event publishing mechanism.
Highlights
- KVCacheEvent Metadata: Introduced a generic
metadata: Optional[dict[str, Any]]field to theKVCacheEventbase class, allowing custom data to be associated with KV cache events. TheKVCacheEventdataclass was also updated to make its fields keyword-only. - New KV Event API for Connectors: Added a new
take_events()method to theKVConnectorbase class. This method provides an API for connectors to expose KV cache events they collect, returning an optional list ofKVCacheEventobjects. - Scheduler Event Aggregation: Updated the scheduler's
schedulemethod to collect and publish KV cache events. It now aggregates events from both the existingkv_cache_managerand the newly introducedKVConnector.take_events()method, ensuring all relevant events are published.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This PR introduces a new scheduler-side connector API to collect KV cache events, along with a generic metadata field for KVCacheEvent. The changes look good, with a suggestion for optimizing list concatenation in the scheduler and a few minor suggestions for improved code clarity.
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
daec62d to
cdd5414
Compare
vllm/distributed/kv_events.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vMaroon adding metadata here kind of defeats the purpose of having the strongly-typed event types. Could you elaborate on what it's for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any generic use for this right now. The intention here was to add free metadata, starting with device-type. This can be explicitly added. Plus, static metadata information should be most likely discovered and not mounted per event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the generic metadata field.
Instead, added a "medium" optional field which can be used to associate the event with a specific medium (e.g. gpu/cpu/disk).
cdd5414 to
c0f66eb
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
c0f66eb to
393cff5
Compare
njhill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@orozery some test failures are due to the changes. It would probably be best to have |
njhill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@orozery I think MultiConnector should also be updated to collect events from the aggregated collectors?
1f3e942 to
ab12890
Compare
ab12890 to
36bec03
Compare
njhill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @orozery, just have a small nit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: avoid list creation
| for c in self._connectors: | |
| events += c.take_events() or [] | |
| for c in self._connectors: | |
| events.extend(c.take_events() or ()) |
36bec03 to
448212f
Compare
|
pre-commit failed for some unknown reason |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@orozery could we instead return Iterable[KVCacheEvent]? I think it may be a bit odd to have a generator function itself as part of the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
76965d6 to
39dc6c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just return ()here? rather than having the interface method itself be a generator?
39dc6c0 to
d87c649
Compare
This commit adds a new scheduler-side connector API to collect KV cache events. Additionally, we add a medium field to KV events, to allow distinguishing KV events on different mediums (e.g. blocks stored on cpu, disk, or gpu (default)). Signed-off-by: Or Ozeri <oro@il.ibm.com>
d87c649 to
ebca231
Compare
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
This PR adds a new scheduler-side connector API to collect KV cache events.
Additionally, we add a generic metadata field to KVCacheEvent to allow custom metadata on KV events.
Part of the work described in RFC #19854