Skip to content

Conversation

@orozery
Copy link
Contributor

@orozery orozery commented Jun 17, 2025

This PR adds a new scheduler-side connector API to collect KV cache events.
Additionally, we add a generic metadata field to KVCacheEvent to allow custom metadata on KV events.

Part of the work described in RFC #19854

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @orozery, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the KV cache event system by extending the connector API to allow connectors to emit KV cache events. It also introduces a flexible metadata field for KVCacheEvent objects and integrates the collection of these new connector-emitted events into the scheduler's event publishing mechanism.

Highlights

  • KVCacheEvent Metadata: Introduced a generic metadata: Optional[dict[str, Any]] field to the KVCacheEvent base class, allowing custom data to be associated with KV cache events. The KVCacheEvent dataclass was also updated to make its fields keyword-only.
  • New KV Event API for Connectors: Added a new take_events() method to the KVConnector base class. This method provides an API for connectors to expose KV cache events they collect, returning an optional list of KVCacheEvent objects.
  • Scheduler Event Aggregation: Updated the scheduler's schedule method to collect and publish KV cache events. It now aggregates events from both the existing kv_cache_manager and the newly introduced KVConnector.take_events() method, ensuring all relevant events are published.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify mergify bot added the v1 label Jun 17, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This PR introduces a new scheduler-side connector API to collect KV cache events, along with a generic metadata field for KVCacheEvent. The changes look good, with a suggestion for optimizing list concatenation in the scheduler and a few minor suggestions for improved code clarity.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vMaroon adding metadata here kind of defeats the purpose of having the strongly-typed event types. Could you elaborate on what it's for?

Copy link
Contributor

@vMaroon vMaroon Aug 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any generic use for this right now. The intention here was to add free metadata, starting with device-type. This can be explicitly added. Plus, static metadata information should be most likely discovered and not mounted per event.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the generic metadata field.
Instead, added a "medium" optional field which can be used to associate the event with a specific medium (e.g. gpu/cpu/disk).

@orozery orozery force-pushed the connector-kv-events branch from cdd5414 to c0f66eb Compare August 3, 2025 17:47
@mergify
Copy link

mergify bot commented Aug 3, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @orozery.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 3, 2025
@orozery orozery force-pushed the connector-kv-events branch from c0f66eb to 393cff5 Compare August 3, 2025 17:50
@mergify mergify bot removed the needs-rebase label Aug 3, 2025
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 4, 2025
@njhill
Copy link
Member

njhill commented Aug 5, 2025

@orozery some test failures are due to the changes. It would probably be best to have medium default to None.

Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orozery I think MultiConnector should also be updated to collect events from the aggregated collectors?

@orozery orozery force-pushed the connector-kv-events branch 2 times, most recently from 1f3e942 to ab12890 Compare August 6, 2025 17:05
@mergify mergify bot added the documentation Improvements or additions to documentation label Aug 6, 2025
@orozery orozery force-pushed the connector-kv-events branch from ab12890 to 36bec03 Compare August 6, 2025 17:13
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @orozery, just have a small nit.

Comment on lines 209 to 210
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: avoid list creation

Suggested change
for c in self._connectors:
events += c.take_events() or []
for c in self._connectors:
events.extend(c.take_events() or ())

@orozery orozery force-pushed the connector-kv-events branch from 36bec03 to 448212f Compare August 10, 2025 04:59
@orozery
Copy link
Contributor Author

orozery commented Aug 10, 2025

pre-commit failed for some unknown reason

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orozery could we instead return Iterable[KVCacheEvent]? I think it may be a bit odd to have a generator function itself as part of the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@orozery orozery force-pushed the connector-kv-events branch 5 times, most recently from 76965d6 to 39dc6c0 Compare August 11, 2025 19:12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just return ()here? rather than having the interface method itself be a generator?

@orozery orozery force-pushed the connector-kv-events branch from 39dc6c0 to d87c649 Compare August 12, 2025 17:51
This commit adds a new scheduler-side connector API
to collect KV cache events.
Additionally, we add a medium field to KV events, to allow
distinguishing KV events on different mediums
(e.g. blocks stored on cpu, disk, or gpu (default)).

Signed-off-by: Or Ozeri <oro@il.ibm.com>
@orozery orozery force-pushed the connector-kv-events branch from d87c649 to ebca231 Compare August 14, 2025 12:36
@njhill njhill enabled auto-merge (squash) August 28, 2025 23:20
@njhill njhill disabled auto-merge August 31, 2025 18:45
@njhill njhill enabled auto-merge (squash) August 31, 2025 18:45
@njhill njhill merged commit 14b4326 into vllm-project:main Sep 1, 2025
45 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
Signed-off-by: Or Ozeri <oro@il.ibm.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants