fix: llama 4 + EP IndexError #12636
Conversation
Summary of Changes
Hello @b8zhong, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical bug that caused Llama 4 models to fail loading with Expert Parallelism due to an IndexError.
Code Review
This pull request addresses a critical IndexError that occurred during weight loading for Llama 4 models with Expert Parallelism (EP). The root cause was the incorrect use of a global expert ID to index a parameter array expecting a local expert ID when expert location metadata was not available. The fix correctly redirects the weight loading process to the _weight_loader_physical method, which properly handles the global-to-local expert ID mapping. This change is a direct and effective solution to the bug, and the provided benchmark results confirm its correctness. The code change is sound.
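To make the global-to-local mapping concrete, here is a minimal sketch of the translation the physical loader performs under expert parallelism. The partitioning scheme, names, and values are illustrative assumptions, not the actual SGLang implementation:

```python
from typing import Optional

# Assumed illustrative setup: 16 global experts sharded over 8 EP ranks,
# so each rank allocates only 2 local parameter rows.
NUM_EXPERTS = 16
EP_SIZE = 8
NUM_LOCAL_EXPERTS = NUM_EXPERTS // EP_SIZE  # 2

def global_to_local(global_expert_id: int, ep_rank: int) -> Optional[int]:
    """Map a global expert ID to a local row index on this EP rank, or
    return None if another rank owns the expert (contiguous partitioning
    assumed)."""
    start = ep_rank * NUM_LOCAL_EXPERTS
    if start <= global_expert_id < start + NUM_LOCAL_EXPERTS:
        return global_expert_id - start
    return None  # not owned locally; the loader should skip this weight
```

Indexing the parameter with the returned local ID keeps the index within the `num_local_experts` rows each rank actually allocates.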
Pull Request Overview
This PR fixes the weight loading logic for MoE (Mixture of Experts) layers when global expert location metadata is not available. The key change ensures that expert ID mapping from global to local is correctly handled in all cases.
- Changed the method call from `_weight_loader_impl` to `_weight_loader_physical` when global expert location metadata is absent
- Added an explanatory comment clarifying that the physical loader handles global-to-local expert ID mapping
- Ensures consistent expert ID handling regardless of whether global metadata is present (see the sketch after this list)
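The change itself is small: one call site is routed differently. The method names `_weight_loader_impl` and `_weight_loader_physical` come from the PR, but the signature, the `expert_location_metadata` attribute, and the bodies below are assumptions for illustration only:

```python
# Hedged sketch of the dispatch change, not the real SGLang code.
class FusedMoESketch:
    def __init__(self, ep_rank: int, num_local_experts: int):
        self.ep_rank = ep_rank
        self.num_local_experts = num_local_experts
        self.expert_location_metadata = None  # the case this PR fixes

    def weight_loader(self, param, loaded_weight, expert_id, shard_id):
        if self.expert_location_metadata is None:
            # Before: _weight_loader_impl was called here with the *global*
            # expert_id, which can exceed the locally sized param under EP.
            # After: route through the physical loader, which maps the
            # global ID to a local row first.
            self._weight_loader_physical(param, loaded_weight, expert_id, shard_id)
            return
        ...  # metadata-aware path, unchanged by this PR

    def _weight_loader_physical(self, param, loaded_weight, expert_id, shard_id):
        start = self.ep_rank * self.num_local_experts
        if not (start <= expert_id < start + self.num_local_experts):
            return  # expert owned by another rank; nothing to load here
        param.data[expert_id - start].copy_(loaded_weight)  # local index
```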
It seems Llama 4 EP previously did not work.
Fix #12577
My understanding of EP is naive, but here is my explanation:
When expert-location metadata isn't available, the loader was indexing `param.data[expert_id]` with a global expert ID (e.g., 2) while the param only has `num_local_experts` rows (2), causing the `IndexError`. This shouldn't break other models (e.g., DSR1); if it does, CI should be able to catch it.
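A minimal repro of that failure mode (tensor shape and values assumed for illustration):

```python
import torch

num_local_experts = 2                        # rows this EP rank allocates
param = torch.empty(num_local_experts, 128)  # hidden dim assumed

global_expert_id = 2   # valid globally, out of range for this rank
param.data[global_expert_id]
# IndexError: index 2 is out of bounds for dimension 0 with size 2
```

Mapping the global ID to a local row (or skipping experts this rank does not own) before indexing avoids the error.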
Loading is fixed + accuracy is fine