break execute_model in gpu_model_runner into sub-functions for custom scopes #24265
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI, which runs only a small and essential subset of CI tests to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
This pull request was exported from Phabricator. Differential Revision: D81009244
Code Review
This pull request refactors the execute_model function by breaking it down into smaller, more focused sub-functions, each wrapped in a profiling scope. This is a great improvement for code readability, maintainability, and performance profiling. The refactoring appears to be logically sound. I've identified a couple of issues with incorrect type hints in the new function signatures that should be addressed to ensure code correctness from a static analysis perspective.
vllm/v1/worker/gpu_model_runner.py
Outdated
The return type hint for _preprocess is incorrect. It specifies 9 elements in the tuple, but the function returns 8. Additionally, several of the types are mismatched with the actual returned values (e.g., the 3rd element is Optional[torch.Tensor] but hinted as int). This should be corrected to match the returned tuple for type consistency and to help static analysis tools.
```python
) -> tuple[
    int,
    int,
    Optional[torch.Tensor],
    Optional[torch.Tensor],
    Optional[torch.Tensor],
    torch.Tensor,
    Optional[IntermediateTensors],
    dict[str, Any],
]:
```
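To make the reviewer's point concrete, here is a minimal standalone illustration (hypothetical function names, not the actual vLLM code) of the kind of mismatch a static checker such as mypy flags when the annotated tuple does not match the returned one:

```python
from typing import Optional

import torch

def preprocess_hinted_wrong() -> tuple[int, int, int]:
    # The third element is actually Optional[torch.Tensor], not int; mypy
    # reports an "Incompatible return value type" error here.
    return 8, 16, None

def preprocess_hinted_right() -> tuple[int, int, Optional[torch.Tensor]]:
    # The annotation matches the returned values, so the checker is satisfied.
    return 8, 16, None
```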
houseroad left a comment
I like this move. It will make the profiling results easier to parse. As a by-product, it also breaks a long function into smaller pieces.
vllm/v1/worker/gpu_model_runner.py
Outdated
Later we can move this into something like _post_processing.
vllm/v1/utils.py
Outdated
Is there a reason not to turn it on by default? How much is the overhead, given it's just an NVTX context?
For extreme-performance use cases, we would like to avoid any additional overhead. That said, this should be light.
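For context, a minimal sketch of what such an opt-in scope helper could look like — the env-var name `VLLM_CUSTOM_SCOPES` and the function name `record_scope` are hypothetical, not the actual vllm/v1/utils.py API:

```python
import contextlib
import os

import torch

# Hypothetical opt-in switch; the real flag name in vLLM may differ.
_ENABLE_SCOPES = os.environ.get("VLLM_CUSTOM_SCOPES", "0") == "1"

@contextlib.contextmanager
def record_scope(name: str):
    """Wrap a region in an NVTX range only when scopes are enabled."""
    if _ENABLE_SCOPES and torch.cuda.is_available():
        torch.cuda.nvtx.range_push(name)  # shows up as a named range in nsys
        try:
            yield
        finally:
            torch.cuda.nvtx.range_pop()
    else:
        # Disabled path: a plain yield, so the overhead is negligible.
        yield
```

When the flag is off, entering the context costs only a branch and a generator step, which is why the overhead is expected to be light.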
This pull request has merge conflicts that must be resolved before it can be merged.
break execute_model in gpu_model_runner into sub-functions for custom scopes (vllm-project#24265)

Summary: break down the main execute_model() function into multiple parts so that we can have better latency breakdown in profiles:
- preprocess
- forward
- postprocess
- bookkeep (includes sync)
- draft (if spec decoding is enabled)

This is meant to be a refactor plus some new function scopes; there is no functionality difference.

Test Plan: CI; run the vLLM predictor runbook and take a trace {F1981500506}

Reviewed By: houseroad, frank-wei

Differential Revision: D81009244
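A rough outline of the resulting control flow, using the phase names from the summary above (simplified, hypothetical sub-function signatures, and the `record_scope` sketch from earlier; not the actual vLLM code):

```python
def execute_model(self, scheduler_output):
    # Hypothetical outline only; real signatures and return values differ.
    with record_scope("preprocess"):
        model_input = self._preprocess(scheduler_output)
    with record_scope("forward"):
        hidden_states = self._forward(model_input)
    with record_scope("postprocess"):
        sampler_output = self._postprocess(hidden_states)
    with record_scope("bookkeep"):
        # Bookkeeping includes the host/device sync for sampled tokens.
        output = self._bookkeep(sampler_output)
    if self.speculative_config is not None:
        with record_scope("draft"):
            # Only runs when speculative decoding is enabled.
            self._draft(output)
    return output
```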
break execute_model in gpu_model_runner into sub-functions for custom scopes (vllm-project#24265)
Co-authored-by: Bangsheng Tang <bangsheng@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Summary:
Break down the main execute_model() function into multiple parts so that we can have better latency breakdown in profiles:
- preprocess
- forward
- postprocess
- bookkeep (includes sync)
- draft (if spec decoding is enabled)

This is meant to be a refactor plus some new function scopes; there is no functionality difference.

Test Plan:
CI
Run the vLLM predictor runbook and take a trace:

![Screenshot 2025-09-04 at 10 01 59 PM](https://github.com/user-attachments/assets/6233e5d3-aeab-4d4a-a1fd-b4a7bef6c307)

Differential Revision: D81009244
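For reference, a minimal sketch of taking such a trace with torch.profiler. The workload below is a stand-in; in a real run, one engine step through execute_model would show the new preprocess/forward/postprocess/bookkeep scopes as named ranges instead of the placeholder:

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function

def fake_step():
    # Stand-in for one execute_model call; real traces would show the new
    # per-phase scopes as named ranges instead of this placeholder.
    with record_function("forward"):
        x = torch.randn(512, 512)
        _ = x @ x

with profile(activities=[ProfilerActivity.CPU]) as prof:
    fake_step()

# Open the result in chrome://tracing or https://ui.perfetto.dev.
prof.export_chrome_trace("execute_model_trace.json")
```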