
Conversation

@wanchaol
Collaborator

@wanchaol wanchaol commented Sep 15, 2020

Stack from ghstack:

We have provided a nice and intuitive optimizer API in Python. But in the context of large-scale distributed training (e.g. Distributed Model Parallel), users often want multithreaded rather than multiprocess training, as it gives better resource utilization and efficiency.

This PR introduces a functional optimizer concept (similar in spirit to `nn.functional`): we split an optimizer into two parts, 1. optimizer state management and 2. optimizer computation. The computation part is exposed as a separate functional API available to internal and OSS developers; the caller of the functional API maintains its own state and calls the functional API directly. The end-user API stays the same, while the functional API is TorchScript friendly and can be used by the distributed optimizer to speed up training without the GIL.
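To make the split concrete, here is a minimal sketch of what the computation-only half could look like for Adagrad. The function name, signature, and defaults below are illustrative assumptions rather than the exact contents of torch/optim/functional.py: the point is that the caller passes params, grads, and all optimizer state in explicitly, so the stateful optimizer (or a distributed caller) owns the state and the function performs only the math.

```python
from typing import List

import torch


# Hedged sketch of a functional Adagrad step: all state (state_sums, state_steps)
# is owned by the caller and passed in explicitly; names and defaults are illustrative.
def adagrad_step(params: List[torch.Tensor],
                 grads: List[torch.Tensor],
                 state_sums: List[torch.Tensor],
                 state_steps: List[int],
                 lr: float = 1e-2,
                 lr_decay: float = 0.0,
                 weight_decay: float = 0.0,
                 eps: float = 1e-10) -> None:
    for param, grad, state_sum, step in zip(params, grads, state_sums, state_steps):
        if weight_decay != 0:
            grad = grad.add(param, alpha=weight_decay)
        clr = lr / (1 + (step - 1) * lr_decay)      # learning rate with per-step decay
        state_sum.addcmul_(grad, grad, value=1.0)   # accumulate squared gradients in place
        std = state_sum.sqrt().add_(eps)
        param.addcdiv_(grad, std, value=-clr)       # param -= clr * grad / std
```

A stateful optimizer such as torch.optim.Adagrad can then keep its param_groups and state dict unchanged and call into a function like this from step(); since such a function takes only tensors and scalars, it can in principle be compiled with torch.jit.script and driven from the TorchScript-based distributed optimizer without the GIL, as described above.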

Differential Revision: D23935258

wanchaol added a commit that referenced this pull request Sep 15, 2020
@codecov

codecov bot commented Sep 15, 2020

Codecov Report

Merging #44715 into gh/wanchaol/130/base will increase coverage by 0.00%.
The diff coverage is 90.00%.

Impacted file tree graph

@@                  Coverage Diff                  @@
##           gh/wanchaol/130/base   #44715   +/-   ##
=====================================================
  Coverage                 68.10%   68.11%           
=====================================================
  Files                       393      394    +1     
  Lines                     50945    50957   +12     
=====================================================
+ Hits                      34698    34710   +12     
  Misses                    16247    16247           
| Impacted Files | Coverage Δ |
|---|---|
| torch/optim/functional.py | 84.61% <84.61%> (ø) |
| torch/optim/adagrad.py | 83.33% <100.00%> (+4.30%) ⬆️ |
| torch/testing/_internal/expecttest.py | 77.55% <0.00%> (-1.03%) ⬇️ |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 022ba5a...3c8b9eb.

@wanchaol wanchaol changed the title [WIP][optimizer] introduce functional API for optimizer [optimizer] introduce optimizer functional API, refactor Adagrad Sep 16, 2020
@dr-ci

dr-ci bot commented Sep 16, 2020

💊 CI failures summary and remediations

As of commit 3c8b9eb (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.


@vincentqb
Contributor

also relates to #39279

@wanchaol wanchaol requested a review from vincentqb September 23, 2020 21:53
@vincentqb
Contributor

One more observation: from this PR, the C++ implementation doesn't get a functional implementation. Is that something you planned to do, @wanchaol?

cc @anjali411

@vincentqb
Contributor

Oh right, we should add that to the documentation :)

@wanchaol
Collaborator Author

One more observation: from this PR, the C++ implementation doesn't get a functional implementation. Is that something you planned to do, @wanchaol?

@vincentqb yes, I can make follow-up PRs so that the C++ APIs also include functional APIs

Oh right, we should add that to the documentation :)

do you mean that we should add the functional APIs to the documentation like this? https://pytorch.org/docs/stable/nn.functional.html

@vincentqb
Contributor

do you mean that we should add the functional APIs to the documentation like this? https://pytorch.org/docs/stable/nn.functional.html

We could also simply add them here, though the page is getting long :)

Collaborator

@albanD albanD left a comment


LGTM

Contributor

@vincentqb vincentqb left a comment


I second @albanD, the PR LGTM. We can do C++, doc, etc in a separate one later.

@facebook-github-bot
Contributor

@wanchaol merged this pull request in 0444c37.

@facebook-github-bot facebook-github-bot deleted the gh/wanchaol/130/head branch September 29, 2020 14:23
@vincentqb
Contributor

The documentation is in #45513.

@vincentqb
Contributor

I second @albanD, the PR LGTM. We can do C++, doc, etc in a separate one later.

@wanchaol, can you clarify what other steps you had in mind so that we can move this beyond prototype?

@wanchaol
Collaborator Author

@wanchaol, can you clarify what other steps you had in mind so that we can move this beyond prototype?

@vincentqb I would say improving the coverage of functional optimizers to include most of the commonly used optimizers (which I am planning to do before the 1.8 release), and improving the test coverage; that will make it ready to move to beta.

To mark this as stable, we need to cover all existing optimizers that can be turned into functional optimizers, and possibly integrate with the foreach API.
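As a rough illustration of that last point (an assumption about possible future work, not code from this PR), the per-parameter loop in a functional optimizer could be replaced with batched torch._foreach_* operations so that a whole parameter group is updated with a handful of multi-tensor calls:

```python
from typing import List

import torch


# Hedged sketch of an Adagrad-style update written with the private foreach ops
# (lr_decay and weight_decay omitted for brevity). Only an illustration of the
# idea; wiring this into the functional optimizers is the possible future work
# mentioned above.
def adagrad_foreach(params: List[torch.Tensor],
                    grads: List[torch.Tensor],
                    state_sums: List[torch.Tensor],
                    lr: float = 1e-2,
                    eps: float = 1e-10) -> None:
    torch._foreach_addcmul_(state_sums, grads, grads, value=1.0)    # state_sum += grad * grad
    std = torch._foreach_add(torch._foreach_sqrt(state_sums), eps)  # std = sqrt(state_sum) + eps
    torch._foreach_addcdiv_(params, grads, std, value=-lr)          # param -= lr * grad / std
```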
