
Conversation

@RyanMullins (Contributor)

What does this PR do?

Fixes #41964: Improve typing support and flexibility for RopeParameters

  • Sets total=False on RopeParameters
  • Applies Required to the relevant properties of RopeParameters (see the sketch after this list)
  • More coming soon...
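
As a hedged sketch of the two typing changes above (the exact key set on transformers' RopeParameters is illustrative here; only rope_type and rope_theta are assumed required):

```python
from typing import Required, TypedDict  # Required needs Python 3.11+, or typing_extensions


class RopeParameters(TypedDict, total=False):
    # total=False makes every key optional by default; Required opts
    # specific keys back in as mandatory.
    rope_type: Required[str]
    rope_theta: Required[float]
    factor: float  # scaling-related keys stay optional


# A dict missing the optional keys now type-checks:
params: RopeParameters = {"rope_type": "default", "rope_theta": 10_000.0}
```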

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Rocketknight1 (Member)

LGTM but cc @zucchini-nlp @ArthurZucker!

@RyanMullins force-pushed the rope-type-hints branch 2 times, most recently from ddcd9e9 to cbb3751 on November 4, 2025 at 20:04
github-actions bot commented Nov 4, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma3

@zucchini-nlp (Member) left a comment


Thanks a lot for adding a docstring, much clearer! I just left a few comments to align it with the current code.

Comment on lines +160 to +163
```python
logger.warning(
    "Unable to find a rope_theta and defaults are not supported for this value. Returning the config"
    " un-modified. Verify that the model uses RoPE and that it should have been present in this config."
)
```

Maybe don't raise a warning here? Standardization can be called on an already-standardized config, and this warning would then be raised every time that happens.
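
One hedged way to address this, as a sketch rather than the PR's code (the `already_standardized` check and the helper name are hypothetical):

```python
import logging

logger = logging.getLogger(__name__)


def _warn_if_rope_theta_missing(config) -> None:
    # Hypothetical guard: only warn when the config has never been
    # standardized, so repeated standardization calls stay silent.
    already_standardized = isinstance(getattr(config, "rope_parameters", None), dict)
    if getattr(config, "rope_theta", None) is None and not already_standardized:
        logger.warning(
            "Unable to find a rope_theta and defaults are not supported for this value."
        )
```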

```python
rope_theta = getattr(config, "rope_theta", None)

# Case 1: one RoPE theta = one RoPE param per model without nesting
def process_rope_parameters_for_layer_type(rope_theta: dict[str, float], layer_type: str) -> RopeParameters:
```

Let's expect a single theta here, since we're already unfolding the thetas dict below.
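
A minimal sketch of that suggestion (hypothetical shape, reusing the RopeParameters sketch from the PR description above; the layer-type names are made up for illustration):

```python
def process_rope_parameters_for_layer_type(rope_theta: float, layer_type: str) -> RopeParameters:
    # Takes a single theta; the caller is responsible for unfolding the dict.
    return RopeParameters(rope_type="default", rope_theta=rope_theta)


# Caller side: unfold the per-layer-type thetas dict before dispatching.
thetas = {"full_attention": 1_000_000.0, "sliding_attention": 10_000.0}
rope_parameters = {
    layer_type: process_rope_parameters_for_layer_type(theta, layer_type)
    for layer_type, theta in thetas.items()
}
```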

```python
)
return

config.rope_parameters = RopeParameters(rope_type=rope_type, rope_theta=rope_theta)
```

If we create a new value here, we can lose scaling params that existed before, like rope_parameters.factor.
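
A hedged sketch of the fix being suggested (the merge semantics are assumed, not taken from the PR): start from whatever rope_parameters the config already holds so pre-existing scaling keys survive.

```python
def standardize_rope_parameters(config, rope_type: str, rope_theta: float) -> None:
    # Hypothetical merge: copy existing params first so keys such as
    # `factor` are preserved, then overwrite type and theta.
    existing = dict(getattr(config, "rope_parameters", None) or {})
    existing.update(rope_type=rope_type, rope_theta=rope_theta)
    config.rope_parameters = RopeParameters(**existing)
```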

