Update to flashinfer-python==0.2.12 and disable AOT compile for non-release image #23129
Signed-off-by: mgoin <mgoin64@gmail.com>
Code Review
This pull request updates FlashInfer to version 0.2.12. The changes involve updating the version in setup.py and modifying the Dockerfile to support the new version's build process. My review identified a critical issue in the Dockerfile related to a syntax error that would prevent an environment variable from being set correctly, as well as a high-severity issue regarding a new build-time network dependency that could impact build robustness. I've provided a single, combined suggestion to address both problems.
Instead of disabling by default, what if we just disabled this in CI?
@ProExpertProg it is still enabled for the release image, but that is the only case. I think we are also concerned about users building the Docker image themselves, so I'm okay with disabling it by default while keeping AOT for the release image.
…elease image (vllm-project#23129) Signed-off-by: mgoin <mgoin64@gmail.com>
…elease image (vllm-project#23129) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Duncan Moss <djm.moss@gmail.com>
…elease image (vllm-project#23129) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Purpose
https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.2.12
This PR also includes a small update to download cubins ahead of time from FlashInfer. This affects Docker build time, so I changed AOT compilation to be off by default. You can enable AOT compilation by setting `--build-arg FLASHINFER_AOT_COMPILE=true` during `docker build`.
Test Plan
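For reference, the opt-in described under Purpose can be sketched as follows. This is a minimal illustration, not the Dockerfile logic from this PR: the helper name `flashinfer_build_mode` and the mode labels `aot` and `skip-aot` are hypothetical, and the build context in the commented `docker build` line is an assumption. Only the `FLASHINFER_AOT_COMPILE` build argument and its `true` value come from the description.

```shell
#!/bin/sh
# Opting in at build time (build context "." is an assumption; adjust the
# Dockerfile path and other arguments to your checkout):
#   docker build --build-arg FLASHINFER_AOT_COMPILE=true .

# Hypothetical helper showing how a build step could branch on the flag.
# Anything other than the literal string "true" is treated as "off",
# which matches the new default of not compiling ahead of time.
flashinfer_build_mode() {
    if [ "${1:-false}" = "true" ]; then
        echo "aot"
    else
        echo "skip-aot"
    fi
}

# Default to "false" when the variable is unset, mirroring the PR's default.
FLASHINFER_AOT_COMPILE="${FLASHINFER_AOT_COMPILE:-false}"
echo "FlashInfer build mode: $(flashinfer_build_mode "$FLASHINFER_AOT_COMPILE")"
```

Treating every value except `true` as off keeps the default path cheap for users building the image themselves, while a release build can still opt in explicitly.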
Test Result