[Kernels] Overlap shared experts with send/recv #23273
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Force-pushed from d19e631 to 679ff7b
SageMoore
left a comment
Nice work, @bnellnm. I only have minor nits. Otherwise looks good.
LucasWilkinson
left a comment
LGTM! Would be good to get a trace though to show the overlap
Force-pushed from 1e81a2c to 5939843
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 17eeab8 to e2a93cc
thunk?
A parameterless function. I can change the wording if it's not clear.
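For readers unfamiliar with the term, a minimal illustrative sketch of a thunk in this sense (not code from this PR; the names are hypothetical): a parameterless function that captures its state up front and performs the deferred work only when invoked.

```python
def make_receive_thunk(payload):
    # Capture the state now; the actual work happens later,
    # whenever the caller decides to invoke the thunk.
    def thunk():
        return payload * 2
    return thunk

hook = make_receive_thunk(21)
# ...other work can happen here, overlapping with whatever
# the thunk will eventually wait on...
result = hook()  # invoking the thunk completes the deferred step
```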
Signed-off-by: Bill Nell <bnell@redhat.com>
Head branch was pushed to by a user without write access
Force-pushed from 4f87a94 to 068ad88
/ready
@SageMoore, @LucasWilkinson could you guys take a final look? Should be ready to merge. @robertgshaw2-redhat verified that it works on multi-node systems now.
This PR introduces the R1 accuracy issue #24530
Purpose
Overlap the shared experts computation with the send and receive operations in the all2all dispatcher.
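The overlap pattern can be sketched in plain Python (this is an illustrative simulation using threads, not vLLM's actual CUDA/DeepEP implementation; all names here are hypothetical): the dispatcher starts the send/recv asynchronously and returns a parameterless thunk, the shared experts MLP runs while the transfer is in flight, and calling the thunk waits for the dispatch to complete.

```python
import threading
import time

def dispatch_async(tokens):
    """Simulate an all2all dispatch; returns a thunk that waits for it."""
    result = {}
    def work():
        time.sleep(0.05)                # stand-in for send/recv latency
        result["routed"] = [t + 1 for t in tokens]
    worker = threading.Thread(target=work)
    worker.start()                      # communication begins immediately
    def receive_thunk():
        worker.join()                   # block until the transfer completes
        return result["routed"]
    return receive_thunk

def shared_experts(tokens):
    """Stand-in for the shared-experts MLP computed locally."""
    return [t * 2 for t in tokens]

tokens = [1, 2, 3]
wait = dispatch_async(tokens)           # kick off send/recv
shared_out = shared_experts(tokens)     # overlaps with the transfer
routed_out = wait()                     # invoke the thunk to finish dispatch
```

The key point is that `shared_experts` executes between starting the dispatch and invoking the thunk, so its cost is hidden behind the communication latency.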
TODO
prepare_async for the flashinfer prepare/finalize class?
Test Plan
Tried it with llama4 and deepseek_v2.
Added a unit test for shared experts in test_pplx_moe.py.
Test Result
This trace shows the MLP (nvjet...) overlapped with DeepEP dispatch send/receive (green).
Documentation Update
Updated modular kernel docs
cc @SageMoore, @LucasWilkinson