
Conversation

@izmailovpavel (Contributor):

This PR adds a description of torch.optim.swa_utils added in #35032 to the docs at docs/source/optim.rst. Please let me know what you think!

@vincentqb @andrewgordonwilson


dr-ci bot commented Jul 10, 2020

💊 CI failures summary and remediations

As of commit c7d5bbd (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@ailzhang requested a review from @vincentqb, July 13, 2020 15:49
@ailzhang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module), Jul 13, 2020
@vincentqb (Contributor) left a comment:

LGTM!

Let's see what this looks like here, once the PR has landed.

@facebook-github-bot (Contributor) left a comment:

@vincentqb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@facebook-github-bot (Contributor):

@vincentqb merged this pull request in 509c18a.

@vincentqb added the module: docs label (Related to our documentation, both in docs/ and docblocks), Jul 28, 2020
@vincentqb (Contributor):

cc @jlin27 to add to the release

```python
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
>>> if i > swa_start:
>>>     swa_model.update_parameters(model)
```
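
For context, a minimal sketch of the surrounding training loop, assuming the setup used in the torch.optim.swa_utils documentation example (model, loader, loss_fn, and the epoch counts below are placeholders, not the exact values from the docs):

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

model, loader, loss_fn = ...                     # placeholders for the usual setup
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
swa_model = AveragedModel(model)                 # keeps the running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=0.05)    # anneals the learning rate toward swa_lr
swa_start = 160                                  # epoch at which averaging begins

for i in range(300):
    for input, target in loader:
        optimizer.zero_grad()
        loss_fn(model(input), target).backward()
        optimizer.step()
    if i > swa_start:
        swa_model.update_parameters(model)
        swa_scheduler.step()

# Recompute batch-norm statistics for the averaged model before evaluating it.
update_bn(loader, swa_model)
```
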
@Daniil-Osokin (Contributor) left a comment:

@izmailovpavel Hi, thanks for the implementation of SWA in PyTorch! Do you find it useful to average the weights during the annealing phase? By default anneal_epochs equals 10 (and cannot be set to 0), so in this snippet the weights will be averaged during the annealing phase for the first 10 epochs after swa_start.
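
For illustration, a minimal sketch of the timing in question, assuming the defaults described above (swa_start is a placeholder value):

```python
swa_start = 160
swa_scheduler = SWALR(optimizer, swa_lr=0.05)  # anneal_epochs defaults to 10

# Epochs swa_start+1 .. swa_start+10: the learning rate is still being annealed
# toward swa_lr, yet swa_model.update_parameters(model) is already averaging weights.
# Only after epoch swa_start+10 does every averaged snapshot come from the constant swa_lr.
```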

@izmailovpavel (Contributor, Author):

Hey Daniil, thank you for looking into the implementation! Generally, it's reasonable to average the weights during the annealing phase, but I imagine there could be cases where it's not desirable, e.g. when the learning rate before the annealing is way too high.

@Daniil-Osokin (Contributor):

Thanks for the response! It is just a bit confusing. According to the paper, the purpose of SWA is to average weights during exploration of the loss surface (to find high-performing networks), but averaging during annealing means averaging under different (decreasing) learning rates, which the paper (citing the experiments in Ruppert, 1988) describes as unsuitable for improving generalization, since SGD does not behave very differently under such a schedule.

@izmailovpavel (Contributor, Author):

Whether or not averaging in the annealing phase is desirable depends on how much the learning rate changes during that phase. Generally, averaging at different learning rates can work well, see e.g. https://arxiv.org/abs/1806.05594, although it can also hurt, as you said.

@izmailovpavel (Contributor, Author):

It may be safer to fix it in the docs as you suggest.

```python
lr_anneal_epochs = 10
swa_scheduler = SWALR(optimizer, swa_lr=0.05, anneal_epochs=lr_anneal_epochs)
...

if i > swa_start + lr_anneal_epochs:
    swa_model.update_parameters(model)  # start averaging only once annealing has finished
```

It makes the example a bit more complex... @vincentqb @andrewgordonwilson what do you think?

@Daniil-Osokin (Contributor):

Is there any chance to remove the annealing phase from the scheduler? I believe this is the part that accidentally differs from the paper.

Contributor:

> It may be safer to fix it in the docs as you suggest.
>
> ```python
> lr_anneal_epochs = 10
> swa_scheduler = SWALR(optimizer, swa_lr=0.05, anneal_epochs=lr_anneal_epochs)
> ...
>
> if i > swa_start + lr_anneal_epochs:
>     swa_model.update_parameters(model)
> ```
>
> It makes the example a bit more complex... @vincentqb @andrewgordonwilson what do you think?

What would be an alternate fix that would make this simpler?

@Daniil-Osokin (Contributor):

I believe the simplest option is to allow SWALR to take anneal_epochs=0 by default, thus skipping the annealing phase. In this case the snippet will be consistent with the paper and can be left as is (but it requires a modification of the SWALR code). I can make a PR if that works.
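
For illustration, a minimal sketch of what the proposal would look like in the docs snippet. Note this is hypothetical: SWALR did not accept anneal_epochs=0 at the time, which is exactly the change being proposed here.

```python
# Hypothetical: skip the annealing phase entirely, so every averaged snapshot
# is taken at the constant swa_lr, matching the schedule described in the SWA paper.
swa_scheduler = SWALR(optimizer, swa_lr=0.05, anneal_epochs=0)  # proposed, not the API at the time

for i in range(300):
    ...  # usual inner training loop over the data loader
    if i > swa_start:
        swa_model.update_parameters(model)  # no special handling of the first epochs needed
        swa_scheduler.step()
```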

@izmailovpavel (Contributor, Author):

Sorry for not following up on this discussion for so long. @Daniil-Osokin I agree that we should allow anneal_epochs=0, and I think it would be great if you could make a pull request for that. I am not sure whether we should set the number of annealing epochs to zero by default; I am open to discussion. I think both options are reasonable, but I agree that it may be better to set anneal_epochs=0 by default.


Labels

module: docs, open source, triaged
