[Feature][P/D]: Optimize NIXL Connector xfer Launch #23887
Code Review
This pull request introduces a performance optimization in the NixlConnector by vectorizing the computation in the _get_block_descs_ids method. The change replaces nested Python loops with NumPy broadcasting operations, which significantly reduces the execution time for generating block descriptor IDs, as demonstrated by the performance results in the description. The implementation is correct and effectively leverages NumPy for better performance. The addition of the numpy import is necessary and appropriate for this change.
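To make the loop-to-broadcasting change concrete, here is a minimal sketch, not the code from this PR. It assumes each descriptor ID is computed as `region_id * num_blocks + block_id`; the function names `block_desc_ids_loop` and `block_desc_ids_vectorized` and their parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def block_desc_ids_loop(num_regions, num_blocks, block_ids):
    """Nested-loop version: one descriptor ID per (region, block) pair.

    Assumed formula (not confirmed by the PR text):
    region_id * num_blocks + block_id.
    """
    ids = []
    for region_id in range(num_regions):
        for block_id in block_ids:
            ids.append(region_id * num_blocks + block_id)
    return ids


def block_desc_ids_vectorized(num_regions, num_blocks, block_ids):
    """Broadcasting version: same IDs in the same order, no Python-level loop."""
    # Column vector of region IDs, shape (R, 1).
    region_ids = np.arange(num_regions, dtype=np.int64)[:, None]
    # Row vector of block IDs, shape (1, B).
    blocks = np.asarray(block_ids, dtype=np.int64)[None, :]
    # Broadcasting yields an (R, B) grid; ravel() flattens it row-major,
    # which matches the region-outer, block-inner order of the loop.
    return (region_ids * num_blocks + blocks).ravel()


if __name__ == "__main__":
    loop_ids = block_desc_ids_loop(3, 10, [0, 2, 9])
    vec_ids = block_desc_ids_vectorized(3, 10, [0, 2, 9])
    # Both versions should agree element by element.
    print(loop_ids == vec_ids.tolist())
```

The vectorized version returns a NumPy array rather than a Python list, so callers that need a list would have to convert it with `.tolist()`.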
nice job
Signed-off-by: ycyaw66 <497410282@qq.com>
590edbc to db73beb
hi @robertgshaw2-redhat, PTAL thanks.
really nice work
Signed-off-by: ycyaw66 <497410282@qq.com> Co-authored-by: ycyaw66 <497410282@qq.com>
Purpose
issue #23780
Test Plan
prefill instance:
decode instance:
proxy:
Test Result
origin:

after this pr:

_get_block_descs_ids time reduced from ~2ms to ~0.05ms