Disable remote caching when calling compile_fx #16611
houseroad merged 1 commit into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small and essential subset of tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
houseroad left a comment:
Thanks for the fix, let me play around locally.
The problem is as follows:

- vLLM requires its monkeypatched functions to run (e.g. https://github.com/vllm-project/vllm/blob/7b5ecf79bd94aab0d782c70126d0dcc37c16bc60/vllm/compilation/compiler_interface.py#L251).
- These functions may not run if (1) a user has a torch.compile remote cache set up and (2) there is a remote cache hit.
- When the monkeypatched/hijacked functions fail to run, we hit assertion failures: https://github.com/vllm-project/vllm/blob/7b5ecf79bd94aab0d782c70126d0dcc37c16bc60/vllm/compilation/compiler_interface.py#L299-L302

This PR disables torch.compile remote caching for vLLM compile.

Test Plan: tested locally with

```
vllm serve "meta-llama/Llama-4-Scout-17B-16E-Instruct" -tp 8 --max_model_len 1000 --override-generation-config='{"attn_temperature_tuning": true}'
```

Signed-off-by: rzou <zou3519@gmail.com>
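To make the failure mode concrete, here is a minimal sketch (not vLLM's exact code) of the kind of guard this PR adds. On a remote cache hit, Inductor returns early and never calls the internals that vLLM hijacks to learn the cache key, so the recorded key stays `None` and the assertion linked above fires. Forcing the remote cache off around the `compile_fx` call guarantees the hijacked code paths run. `fx_graph_cache` and `fx_graph_remote_cache` are real `torch._inductor.config` settings; the wrapper name and call site here are illustrative.

```python
import torch
from torch._inductor.compile_fx import compile_fx

def compile_fx_without_remote_cache(graph_module, example_inputs, **kwargs):
    """Run Inductor's compile_fx with the remote FX-graph cache forced off.

    The local on-disk cache stays enabled, so warm-start compile times are
    preserved; only the remote lookup (which would bypass the monkeypatched
    cache internals) is disabled.
    """
    with torch._inductor.config.patch(
        fx_graph_cache=True,          # keep the local cache
        fx_graph_remote_cache=False,  # never consult the remote cache
    ):
        return compile_fx(graph_module, example_inputs, **kwargs)
```

An equivalent approach (assuming `compile_fx`'s `config_patches` keyword) is to pass the same settings through that argument; either way, the override is scoped to the vLLM compile call rather than mutating global Inductor config.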
The branch was force-pushed from fda8175 to cbc49dc.
This PR is mostly to fix a Meta-internal bug I noticed (where we do have the torch.compile remote caches on), but I think it is generally applicable to vLLM, so we should ship it.
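For reproducing the condition described above, what matters is simply whether the remote FX-graph cache is on. A small best-effort check, assuming only the public `torch._inductor.config.fx_graph_remote_cache` setting and its companion `TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE` environment variable (a tristate: "1" on, "0" off, unset defers):

```python
import os
import torch._inductor.config as inductor_config

def remote_fx_graph_cache_enabled() -> bool:
    """Best-effort check for whether Inductor's remote FX-graph cache is on.

    An explicit True/False in the config wins; None means the decision is
    deferred to the TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE environment variable.
    """
    setting = inductor_config.fx_graph_remote_cache
    if setting is not None:
        return bool(setting)
    return os.environ.get("TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE") == "1"
```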