feat: RoPE-related typing improvements #41967
Conversation
LGTM but cc @zucchini-nlp @ArthurZucker!
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma3
Thanks a lot for adding a docstring, much clearer! I just left a few comments to align it with the current code.
```python
logger.warning(
    "Unable to find a rope_theta and defaults are not supported for this value. Returning the config"
    " un-modified. Verify that the model uses RoPE and that it should have been present in this config."
)
```
Maybe not raise a warning? Standardization can be called on an already-standardized config, and this warning would then be raised every time that happens.
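One way to make the warning quieter, as the comment suggests, could look like this (a hypothetical sketch with illustrative names, not the repo's actual code): only warn when the config has neither a `rope_theta` nor an already-standardized `rope_parameters` attribute.

```python
# Hypothetical sketch: skip the warning when the config was already
# standardized (i.e. it already carries a rope_parameters mapping),
# so repeated standardization calls stay silent.
def maybe_warn_missing_theta(config, logger):
    already_standardized = getattr(config, "rope_parameters", None) is not None
    rope_theta = getattr(config, "rope_theta", None)
    if rope_theta is None and not already_standardized:
        logger.warning(
            "Unable to find a rope_theta and defaults are not supported for this value. "
            "Returning the config un-modified."
        )
    return rope_theta
```

With this guard, calling standardization twice warns at most once: the second call sees `rope_parameters` and returns quietly.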
```python
rope_theta = getattr(config, "rope_theta", None)

# Case 1: one RoPE theta = one RoPE param per model without nesting
def process_rope_parameters_for_layer_type(rope_theta: dict[str, float], layer_type: str) -> RopeParameters:
```
Let's expect a single theta here, since we're already unfolding the thetas dict below.
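The suggested split of responsibilities could be sketched like this (illustrative names, assuming a simplified `RopeParameters` TypedDict; not the repo's actual code): the caller unfolds the `{layer_type: theta}` mapping, and the per-layer helper receives a single `float`.

```python
from typing import TypedDict


class RopeParameters(TypedDict, total=False):
    rope_type: str
    rope_theta: float


# Illustrative: the helper takes one theta, not the whole thetas dict.
def process_rope_parameters_for_layer_type(rope_theta: float, layer_type: str) -> RopeParameters:
    return RopeParameters(rope_type="default", rope_theta=rope_theta)


# The caller unfolds the {layer_type: theta} mapping before delegating.
def process_all_layer_types(thetas: dict[str, float]) -> dict[str, RopeParameters]:
    return {
        layer_type: process_rope_parameters_for_layer_type(theta, layer_type)
        for layer_type, theta in thetas.items()
    }
```

This keeps the helper's signature honest: its type says exactly what it consumes, and the dict-handling lives in one place.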
```python
    )
    return

config.rope_parameters = RopeParameters(rope_type=rope_type, rope_theta=rope_theta)
```
If we create a new value, we can lose scaling params that existed before, like `rope_parameters.factor`.
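One way to avoid that loss (a hypothetical sketch, not the actual implementation) is to update the existing mapping rather than rebuild it, so keys like `factor` survive standardization:

```python
# Hypothetical sketch: merge the new standardized keys into whatever
# rope_parameters already holds, instead of replacing it wholesale,
# so pre-existing scaling keys such as "factor" are preserved.
def standardize_rope_parameters(config, rope_type: str, rope_theta: float) -> None:
    existing = dict(getattr(config, "rope_parameters", None) or {})
    existing.update({"rope_type": rope_type, "rope_theta": rope_theta})
    config.rope_parameters = existing
```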
What does this PR do?
Fixes #41964
- Adds `total=False` on `RopeParameters`
- Adds `Required` to the relevant properties of `RopeParameters`

Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue? Please add a link to it if that's the case. Improve typing support and flexibility for RopeParameters #41964
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
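The typing change described above can be sketched like this (field names beyond `rope_type`/`rope_theta` are illustrative; the sketch relies only on standard `typing.TypedDict` semantics, where `total=False` makes all keys optional and `Required[...]` opts individual keys back in):

```python
# Required is in typing from Python 3.11; fall back to typing_extensions otherwise.
try:
    from typing import Required, TypedDict
except ImportError:
    from typing_extensions import Required, TypedDict


class RopeParameters(TypedDict, total=False):
    # With total=False every key is optional by default...
    rope_type: Required[str]     # ...unless explicitly marked Required.
    rope_theta: Required[float]
    factor: float                # optional scaling parameter
    original_max_position_embeddings: int
```

A type checker will now accept `{"rope_type": "default", "rope_theta": 10000.0}` as a valid `RopeParameters` while still allowing scaling keys to be omitted.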
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.