Documentation for torch.optim.swa_utils
#41228
Conversation
💊 CI failures summary: as of commit c7d5bbd, there are no failures yet. (Automatically generated by Dr. CI.)
LGTM!
Let's see what this looks like here, once the PR has landed.
facebook-github-bot
left a comment
@vincentqb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@vincentqb merged this pull request in 509c18a.
cc @jlin27 to add to the release
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
>>> if i > swa_start:
>>>     swa_model.update_parameters(model)
@izmailovpavel Hi, thanks for the implementation of SWA in PyTorch! Do you find it useful to average the weights during the annealing phase? By default anneal_epochs equals 10 (and cannot be set to 0), so in this snippet the weights will be averaged during the annealing phase for the first 10 epochs after swa_start.
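For reference, here is a small sketch (not part of the PR; the toy optimizer and learning-rate values are made up) showing how SWALR anneals with its default anneal_epochs=10, i.e. the 10-epoch window being discussed:

import torch
from torch.optim.swa_utils import SWALR

# Toy optimizer, used only to inspect the schedule.
opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.5)
swa_scheduler = SWALR(opt, swa_lr=0.05)  # anneal_epochs defaults to 10

for step in range(12):
    # The lr decays from the initial 0.5 toward swa_lr over roughly the first 10 steps,
    # which is the window in which update_parameters() is already being called
    # in the documented snippet.
    print(step, opt.param_groups[0]["lr"])
    swa_scheduler.step()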
Hey Daniil, thank you for looking into the implementation! Generally, it's reasonable to average the weights during the annealing phase, but I imagine there could be cases when it's not desirable, e.g. when the learning rate before the annealing is way too high.
Thanks for the response! It is just a bit confusing. According to the paper, the purpose of SWA is to average weights during exploration of the loss surface (to find high-performing networks), but averaging during annealing would mean averaging under different (decreasing) learning rates, which the paper describes as not suitable for improving generalization (pointing to the experiments in Ruppert, 1988), since SGD does not behave very differently under such a schedule.
Whether averaging in the annealing phase is desirable depends on how much the learning rate changes during that phase. Generally, averaging at different learning rates can work well, see e.g. https://arxiv.org/abs/1806.05594, although it can also be bad, as you said.
It may be safer to fix it in the docs as you suggest.
lr_anneal_epochs = 10
swa_scheduler = SWALR(optimizer, swa_lr=0.05, anneal_epochs=lr_anneal_epochs)
...
if i > swa_start + lr_anneal_epochs:
    swa_model.update_parameters(model)
It makes the example a bit more complex... @vincentqb @andrewgordonwilson what do you think?
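For context, a runnable sketch of how that suggestion could look in a full loop (the toy model, data, and epoch counts below are made up for illustration and are not part of the actual docs):

import torch
from torch import nn
from torch.optim.swa_utils import AveragedModel, SWALR

# Toy setup so the sketch runs end to end.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
swa_model = AveragedModel(model)

lr_anneal_epochs = 10
swa_scheduler = SWALR(optimizer, swa_lr=0.05, anneal_epochs=lr_anneal_epochs)

swa_start = 160
for epoch in range(200):
    # One toy optimization step stands in for a full training epoch.
    optimizer.zero_grad()
    nn.functional.mse_loss(model(torch.randn(16, 10)), torch.randn(16, 1)).backward()
    optimizer.step()

    if epoch > swa_start:
        swa_scheduler.step()
    # Only start averaging once the SWALR annealing phase has finished.
    if epoch > swa_start + lr_anneal_epochs:
        swa_model.update_parameters(model)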
Is there any chance to remove the annealing phase from the scheduler? I believe this is the part that accidentally differs from the paper.
> It may be safer to fix it in the docs as you suggest.
> lr_anneal_epochs = 10
> swa_scheduler = SWALR(optimizer, swa_lr=0.05, anneal_epochs=lr_anneal_epochs)
> ...
> if i > swa_start + lr_anneal_epochs:
>     swa_model.update_parameters(model)
> It makes the example a bit more complex... @vincentqb @andrewgordonwilson what do you think?
What would be an alternate fix that would make this simpler?
I believe the simplest option is to allow SWALR to take anneal_epochs=0 (and make it the default), thus skipping the annealing phase. In this case this snippet will be consistent with the paper and can be left as is (but it requires modifying the SWALR code). I can make a PR if that works.
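Purely as an illustration of that proposal (SWALR rejects anneal_epochs=0 at the time of this thread, so the call below does not work yet), the documented snippet could then stay essentially unchanged:

# Hypothetical: assumes a SWALR that accepts anneal_epochs=0; this is a sketch of the
# proposal above, not currently supported behaviour.
swa_model = AveragedModel(model)
swa_scheduler = SWALR(optimizer, swa_lr=0.05, anneal_epochs=0)  # proposed: skip annealing

for epoch in range(200):
    # ... one epoch of training with `optimizer` ...
    if epoch > swa_start:
        swa_scheduler.step()                # lr would sit at swa_lr immediately
        swa_model.update_parameters(model)  # every averaged snapshot taken at swa_lr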
Sorry for not following up on this discussion for so long. @Daniil-Osokin I agree that we should allow anneal_epochs=0, and I think it would be great if you could make a pull request for that. I am not sure whether the number of annealing epochs should be zero by default; I am open to discussion. I think both options are reasonable, but I agree that it may be better to set anneal_epochs=0 by default.
This PR adds a description of torch.optim.swa_utils, added in #35032, to the docs at docs/source/optim.rst. Please let me know what you think! @vincentqb @andrewgordonwilson
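For readers arriving from the issue tracker, the usage pattern being documented looks roughly like the condensed sketch below (the toy data, model, and epoch counts are placeholders, not the exact text of the new docs):

import torch
from torch import nn
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn
from torch.optim.lr_scheduler import CosineAnnealingLR

# Toy data and model so the example is self-contained.
loader = [(torch.randn(16, 10), torch.randn(16, 1)) for _ in range(4)]
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = CosineAnnealingLR(optimizer, T_max=100)
swa_model = AveragedModel(model)
swa_scheduler = SWALR(optimizer, swa_lr=0.05)
swa_start = 75

for epoch in range(100):
    for input, target in loader:
        optimizer.zero_grad()
        loss_fn(model(input), target).backward()
        optimizer.step()
    if epoch > swa_start:
        swa_model.update_parameters(model)
        swa_scheduler.step()
    else:
        scheduler.step()

# Recompute batch-norm statistics for the averaged model
# (a no-op here, since nn.Linear has no batch norm layers).
update_bn(loader, swa_model)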