[optim] prevent problematic tensor aliasing in lr_scheduler#163098

Closed
filipviz wants to merge 4 commits into pytorch:main from filipviz:lr-scheduler-patches

Conversation

@filipviz
Contributor

Prevents edge cases in SequentialLR and ReduceLROnPlateau which could corrupt learning rates or trigger recompilation.

Supersedes #162360
Fixes #162359
Fixes #163093

While putting #162360 together, I noticed the class of issue I was fixing (i.e. unintended aliasing in lr_schedulers when using Tensor lrs) appeared in several other places. @janeyx99 suggested I put together a follow-up PR.

There are several bugs resembling the one fixed in #162360. I added a helper to fix these:

```python
def _update_param_group_val(param_group: dict[str, Any], key: str, val: float | Tensor):
    """Set param_group[key] to val without aliasing or assignment when they're both tensors.
    Raises a KeyError if param_group[key] does not exist.
    """
    if isinstance(param_group[key], Tensor):
        param_group[key].fill_(_to_scalar(val))
    else:
        param_group[key] = val
```

And applied it to fix bugs in `SequentialLR.__init__` and `LRScheduler._update_lr`. I also added it to `CyclicLR.__init__`, which was using an equivalent pattern, and `CosineAnnealingWarmRestarts.step`, which *should* have had a similar issue:

```python
for param_group, lr in zip(self.optimizer.param_groups, self.get_lr()):
    param_group["lr"] = lr
```

But did not, because `get_lr()` actually returns tensors when using a tensor lr (despite its `list[float]` return type annotation). Relying on this propagation seems fragile, so I conservatively added the helper here as well. I'll be fixing the type annotations and several related issues in follow-up PRs built off of this one.
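
For readers following along, here is a minimal sketch of the aliasing failure mode the helper guards against. It is not code from the PR: the variable names are illustrative, and it assumes a recent PyTorch build that accepts a tensor lr.

```python
import torch
from torch.optim import SGD

param = torch.nn.Parameter(torch.zeros(2))
opt = SGD([param], lr=torch.tensor(0.5))
group = opt.param_groups[0]

# Buggy pattern: rebinding group["lr"] to a scheduler-owned tensor makes both
# names refer to the same object, so a later in-place update to either one
# silently changes the other.
scheduler_lr = torch.tensor(0.05)
group["lr"] = scheduler_lr
scheduler_lr.fill_(123.0)
print(group["lr"])  # tensor(123.) -- the param group's lr was corrupted

# In-place pattern (what the helper does): write through the tensor the param
# group already holds instead of rebinding it.
group["lr"] = torch.tensor(0.5)  # reset for the demo
if isinstance(group["lr"], torch.Tensor):
    group["lr"].fill_(0.05)
else:
    group["lr"] = 0.05
print(group["lr"])  # tensor(0.0500), still the tensor object the group held
```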

@pytorch-bot

pytorch-bot bot commented Sep 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163098

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9f60631 with merge base e900a27:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@janeyx99
Contributor

@pytorchbot merge

thanks for your detailed work!

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 16, 2025
@janeyx99 janeyx99 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Sep 16, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Add a helper to address several lr_scheduler aliasing issues
Fix inaccurate type annotations throughout lr_scheduler.py

Fixes pytorch#163103
@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Sep 16, 2025
@pytorchmergebot
Collaborator

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team: raised by workflow job.

@filipviz
Contributor Author

filipviz commented Sep 16, 2025

Sorry! I accidentally pushed 39c342e (which was supposed to be a follow-up PR) to this branch. Let me know how you'd like me to handle this, @janeyx99. That commit fixes a number of type annotations and an issue where `LRScheduler._update_lr`, `ChainedScheduler.__init__`, `ChainedScheduler.step`, `ReduceLROnPlateau.__init__`, `ReduceLROnPlateau.step`, and `CosineAnnealingWarmRestarts.step` previously aliased `self._last_lr` with `group["lr"]` when `group["lr"]` was a tensor, due to patterns like this:

```python
self._last_lr: list[float] = [group["lr"] for group in self.optimizer.param_groups]
```

It adds another helper to address this:

```python
def _param_groups_val_list(optimizer: Optimizer, key: str) -> list[Any]:
    """Create a list containing group[key] for each optimizer param_group.
    Prevents aliasing when group[key] could be a Tensor.
    Raises a KeyError when group[key] does not exist.
    """
    return [
        group[key].clone() if isinstance(group[key], Tensor) else group[key]
        for group in optimizer.param_groups
    ]
```

The same helper fixes a bug where `LRScheduler.__init__` makes `group["initial_lr"]` a Tensor if `group["lr"]` is one, meaning that the entries of `self.base_lrs` alias the `group["initial_lr"]` tensors.
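
To illustrate the difference, here is a minimal sketch (again not code from the PR; it assumes a PyTorch build with tensor lr support) contrasting an aliased snapshot with the cloned snapshot the helper produces:

```python
import torch
from torch.optim import SGD

param = torch.nn.Parameter(torch.zeros(2))
opt = SGD([param], lr=torch.tensor(0.1))

# Aliased snapshot: each entry is the *same* tensor object the param group
# holds, so a later in-place lr update mutates this "record" too.
aliased = [group["lr"] for group in opt.param_groups]

# Cloned snapshot (the pattern the helper uses): independent copies.
cloned = [
    group["lr"].clone() if isinstance(group["lr"], torch.Tensor) else group["lr"]
    for group in opt.param_groups
]

opt.param_groups[0]["lr"].fill_(0.01)
print(aliased[0])  # tensor(0.0100) -- changed underneath us
print(cloned[0])   # tensor(0.1000) -- unaffected
```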

filipviz added a commit to filipviz/pytorch that referenced this pull request Sep 16, 2025
Also adds a test for unintended tensor aliasing in LR schedulers

Depends on pytorch#163098
Fixes pytorch#163105
@janeyx99
Contributor

Let's undo that change and land the original first!

I can review the followup PR separately. It's good to keep changes minimal per PR.

@filipviz
Contributor Author

Same page, @janeyx99. I reverted that commit but added `test_reduce_lr_on_plateau_preserves_lr_type` back in (since this PR fixes that bug). Let me know if you'd like any other changes.
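
For context, a hedged sketch of the kind of behavior such a test would guard; this is not the actual test code, and the hyperparameters are purely illustrative:

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau

param = torch.nn.Parameter(torch.zeros(2))
opt = SGD([param], lr=torch.tensor(0.1))
sched = ReduceLROnPlateau(opt, factor=0.5, patience=0)

# Feed a non-improving metric so the scheduler reduces the lr at least once.
for _ in range(3):
    sched.step(1.0)

lr = opt.param_groups[0]["lr"]
assert isinstance(lr, torch.Tensor)  # lr should remain a tensor after reduction
assert float(lr) < 0.1               # and it should actually have been reduced
```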

@janeyx99 janeyx99 left a comment
Contributor

This commit also fixes CosineAnnealingWarmRestarts, yes?

@janeyx99
Contributor

@pytorchbot merge

no need for other changes as long as CI is green!

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 16, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@janeyx99
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025

Labels

ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
open source
release notes: optim
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

4 participants