[Bugfix] fix pp for llama4 #16746
Signed-off-by: Lu Fang <fanglu@fb.com>
Force-pushed from fcd5805 to fdc4236
```diff
         self.language_model = _initialize_model(
-            vllm_config=vllm_config.with_hf_config(config.text_config),
+            vllm_config=vllm_config.with_hf_config(config.text_config,
+                                                   ["LlamaForCausalLM"]),
```
Should we use `Llama4ForCausalLM`?
`Llama4ForCausalLM` is not a registered architecture; we should avoid using it, since that would require adding a lot of hacks, as in the initial version of this PR.
Probably a dumb question, but since `_initialize_model` is already pointed at `model_class=Llama4ForCausalLM`, why do we need to override the architectures here to `LlamaForCausalLM`?
vllm/model_executor/model_loader/loader.py, lines 114 to 123 in 8cac35b
The override matters during `__post_init__` of the config, when we call `replace` inside the `with_hf_config` function:
Lines 3781 and 3790 in a018e55
and PP-support verification is keyed on the registered architecture names:
vllm/model_executor/models/registry.py, lines 441 to 442 in a018e55
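For context, a minimal self-contained sketch (stub classes and hypothetical names, not vLLM's real ones) of why the override matters: `with_hf_config` rebuilds the config with `dataclasses.replace`, which re-runs `__post_init__`, and `__post_init__` is where PP support is verified against the architecture names:

```python
from dataclasses import dataclass, replace

# Stub registry: the composite llama4 model and plain llama are verified
# for PP; the unregistered "Llama4ForCausalLM" is not.
_PP_VERIFIED = {"Llama4ForConditionalGeneration", "LlamaForCausalLM"}

@dataclass
class HfConfigStub:
    architectures: list

@dataclass
class VllmConfigStub:
    hf_config: HfConfigStub
    pipeline_parallel_size: int = 2

    def __post_init__(self):
        # PP verification keyed on hf_config.architectures, as in the registry.
        if self.pipeline_parallel_size > 1 and not any(
                a in _PP_VERIFIED for a in self.hf_config.architectures):
            raise ValueError(f"{self.hf_config.architectures} not verified for PP")

    def with_hf_config(self, hf_config, architectures=None):
        if architectures is not None:
            hf_config.architectures = architectures
        # replace() re-invokes __init__, hence __post_init__, hence verification.
        return replace(self, hf_config=hf_config)

cfg = VllmConfigStub(HfConfigStub(["Llama4ForConditionalGeneration"]))
text_cfg = HfConfigStub(["Llama4ForCausalLM"])
# cfg.with_hf_config(text_cfg) would raise; the override lets it pass:
cfg.with_hf_config(text_cfg, ["LlamaForCausalLM"])
```

So the explicit `model_class` controls which class gets instantiated, while the architecture override controls what the verification sees.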
```diff
         # using llama4's load_weights routine.
         language_model_weights, other_weights = self.separate_weights(
-            weights, prefix="language_model.model.")
+            weights, prefix="language_model.")
```
Wondering why this issue was not triggered before?
`language_model.lm_head` can also be loaded by the parent model's weight loader when PP is not enabled. With PP, though, since we split the weights into two parts, `lm_head` is missing on the first pipeline rank (PP=0), so the parent loader raises a "weight not found" error. The model-loading path in llama4.py, by contrast, has `is_pp_missing_parameter` handling to avoid that exception.
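The guard in question, roughly (a hypothetical loop, not llama4.py's actual `load_weights`; `is_pp_missing_parameter` is passed in here as a callable rather than imported, to keep the sketch self-contained):

```python
import torch

def load_weights_with_pp_guard(model: torch.nn.Module, weights,
                               is_pp_missing_parameter) -> set:
    """Skip parameters owned by other PP ranks instead of raising (sketch)."""
    params = dict(model.named_parameters())
    loaded = set()
    for name, tensor in weights:
        if is_pp_missing_parameter(name, model):
            # e.g. lm_head on the first rank: expected to be absent under PP.
            continue
        params[name].data.copy_(tensor)
        loaded.add(name)
    return loaded
```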
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Yang Wang <elainewy@meta.com>
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
This PR fixes PP for llama4 (#16385)
It loads the language model with a registered architecture (`LlamaForCausalLM`) so that PP-support verification passes, and corrects the prefix used to separate the loading weights into the language model's weights and the rest of the model's weights, so that both PP=0 and PP=1 work.
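A hedged usage example of what the fix enables (the model id and parallel sizes are illustrative, not taken from this PR):

```python
from vllm import LLM, SamplingParams

# Illustrative settings: pipeline_parallel_size > 1 is the case this PR fixes.
llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    tensor_parallel_size=4,
    pipeline_parallel_size=2,
)
print(llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16)))
```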