
Conversation

@wanchaol
Collaborator

@wanchaol wanchaol commented Sep 15, 2020

Stack from ghstack:

We have provided a nice and intuitive optimizer API in Python. But in the context of large-scale distributed training (e.g. Distributed Model Parallel), users often want multithreaded rather than multiprocess training, as it gives better resource utilization and efficiency.

This PR introduces a functional optimizer concept (similar in spirit to `nn.functional`): we split an optimizer into two parts, 1. optimizer state management and 2. optimizer computation. The computation part is exposed as a separate functional API available to internal and OSS developers; the caller of the functional API maintains its own state and calls the functional API directly. The end-user API stays the same, while the functional API is TorchScript friendly and can be used by the distributed optimizer to speed up training without the GIL.
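To make the split concrete, here is a minimal sketch of what the computation-only half could look like for Adagrad. The function name, signature, and defaults below are illustrative assumptions rather than the exact contents of torch/optim/functional.py: the point is that the caller passes params, grads, and all optimizer state in explicitly, so the stateful optimizer (or a distributed caller) owns the state and the function performs only the math.

```python
from typing import List

import torch


# Hedged sketch of a functional Adagrad step: all state (state_sums, state_steps)
# is owned by the caller and passed in explicitly; names and defaults are illustrative.
def adagrad_step(params: List[torch.Tensor],
                 grads: List[torch.Tensor],
                 state_sums: List[torch.Tensor],
                 state_steps: List[int],
                 lr: float = 1e-2,
                 lr_decay: float = 0.0,
                 weight_decay: float = 0.0,
                 eps: float = 1e-10) -> None:
    for param, grad, state_sum, step in zip(params, grads, state_sums, state_steps):
        if weight_decay != 0:
            grad = grad.add(param, alpha=weight_decay)
        clr = lr / (1 + (step - 1) * lr_decay)      # learning rate with per-step decay
        state_sum.addcmul_(grad, grad, value=1.0)   # accumulate squared gradients in place
        std = state_sum.sqrt().add_(eps)
        param.addcdiv_(grad, std, value=-clr)       # param -= clr * grad / std
```

A stateful optimizer such as torch.optim.Adagrad can then keep its param_groups and state dict unchanged and call into a function like this from step(); since such a function takes only tensors and scalars, it can in principle be compiled with torch.jit.script and driven from the TorchScript-based distributed optimizer without the GIL, as described above.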

Differential Revision: D23935258

wanchaol added a commit that referenced this pull request Sep 15, 2020
@codecov

codecov bot commented Sep 15, 2020

Codecov Report

Merging #44715 into gh/wanchaol/130/base will increase coverage by 0.00%.
The diff coverage is 90.00%.

Impacted file tree graph

@@                  Coverage Diff                  @@
##           gh/wanchaol/130/base   #44715   +/-   ##
=====================================================
  Coverage                 68.10%   68.11%           
=====================================================
  Files                       393      394    +1     
  Lines                     50945    50957   +12     
=====================================================
+ Hits                      34698    34710   +12     
  Misses                    16247    16247           
| Impacted Files | Coverage Δ |
|---|---|
| torch/optim/functional.py | 84.61% <84.61%> (ø) |
| torch/optim/adagrad.py | 83.33% <100.00%> (+4.30%) ⬆️ |
| torch/testing/_internal/expecttest.py | 77.55% <0.00%> (-1.03%) ⬇️ |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 022ba5a...3c8b9eb.

@wanchaol wanchaol changed the title [WIP][optimizer] introduce functional API for optimizer [optimizer] introduce optimizer functional API, refactor Adagrad Sep 16, 2020
@dr-ci

dr-ci bot commented Sep 16, 2020

💊 CI failures summary and remediations

As of commit 3c8b9eb (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.


@vincentqb
Contributor

also relates to #39279

@wanchaol wanchaol requested a review from vincentqb September 23, 2020 21:53
@vincentqb
Contributor

One more observation: from this PR, the C++ implementation doesn't get a functional implementation. Is that something you planned to do, @wanchaol?

cc @anjali411

@vincentqb
Contributor

Oh right, we should add that to the documentation :)

@wanchaol
Collaborator Author

One more observation: from this PR, the C++ implementation doesn't get a functional implementation. Is that something you planned to do, @wanchaol?

@vincentqb yes, I can make follow-up PRs so that the C++ APIs also include functional APIs

Oh right, we should add that to the documentation :)

do you mean that we should add the functional APIs to the documentation like this? https://pytorch.org/docs/stable/nn.functional.html

@vincentqb
Contributor

do you mean that we should add the functional APIs to the documentation like this? https://pytorch.org/docs/stable/nn.functional.html

We could also simply add them here, though the page is getting long :)

Collaborator

@albanD albanD left a comment


LGTM

Contributor

@vincentqb vincentqb left a comment


I second @albanD, the PR LGTM. We can do C++, doc, etc in a separate one later.

@facebook-github-bot
Contributor

@wanchaol merged this pull request in 0444c37.

@facebook-github-bot facebook-github-bot deleted the gh/wanchaol/130/head branch September 29, 2020 14:23
@vincentqb
Contributor

The documentation is in #45513.

@vincentqb
Contributor

I second @albanD, the PR LGTM. We can do C++, doc, etc in a separate one later.

@wanchaol, can you clarify what other steps you had in mind so that we can move this beyond prototype?

@wanchaol
Collaborator Author

@wanchaol, can you clarify what other steps you had in mind so that we can move this beyond prototype?

@vincentqb I would say improving the coverage of functional optimizers to include most of the commonly used optimizers (which I am planning to do before the 1.8 release), and improving the test coverage; that will make it ready to move to beta.

To mark this as stable, we need to cover all existing optimizers that can be turned into functional optimizers, and possibly integrate with the foreach API.
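As a rough illustration of that last point (an assumption about possible future work, not code from this PR), the per-parameter loop in a functional optimizer could be replaced with batched torch._foreach_* operations so that a whole parameter group is updated with a handful of multi-tensor calls:

```python
from typing import List

import torch


# Hedged sketch of an Adagrad-style update written with the private foreach ops
# (lr_decay and weight_decay omitted for brevity). Only an illustration of the
# idea; wiring this into the functional optimizers is the possible future work
# mentioned above.
def adagrad_foreach(params: List[torch.Tensor],
                    grads: List[torch.Tensor],
                    state_sums: List[torch.Tensor],
                    lr: float = 1e-2,
                    eps: float = 1e-10) -> None:
    torch._foreach_addcmul_(state_sums, grads, grads, value=1.0)    # state_sum += grad * grad
    std = torch._foreach_add(torch._foreach_sqrt(state_sums), eps)  # std = sqrt(state_sum) + eps
    torch._foreach_addcdiv_(params, grads, std, value=-lr)          # param -= lr * grad / std
```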
