[dist_optim] introduce distributed functional optimizer #45221
Conversation
Stack from ghstack:

This PR introduces a distributed functional optimizer so that the distributed optimizer can reuse the functional optimizer APIs and maintain its own state. This enables a TorchScript-compatible functional optimizer when using the distributed optimizer, which helps get rid of the GIL and improves the overall performance of training, especially distributed model parallel training.

Differential Revision: D23935256
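For illustration, a minimal sketch of what a functional optimizer in this spirit could look like; the class name, constructor arguments, and update rule below are assumptions for the sketch, not the PR's actual code:

```python
from typing import List, Optional

import torch
from torch import Tensor


# Hypothetical sketch: a TorchScript class that owns its parameters and
# per-parameter state, and applies explicitly passed gradients (instead of
# reading param.grad), so it can be constructed and stepped on a remote
# worker without going through the regular Python optimizer API.
@torch.jit.script
class _FunctionalAdagradSketch(object):
    def __init__(self, params: List[Tensor], lr: float = 1e-2, eps: float = 1e-10):
        self.params = params
        self.lr = lr
        self.eps = eps
        # State is kept inside the optimizer object itself.
        self.state_sums = [torch.zeros_like(p) for p in params]

    def step(self, gradients: List[Optional[Tensor]]):
        with torch.no_grad():
            for i in range(len(self.params)):
                grad = gradients[i]
                if grad is not None:
                    self.state_sums[i].addcmul_(grad, grad, value=1.0)
                    std = self.state_sums[i].sqrt().add_(self.eps)
                    self.params[i].addcdiv_(grad, std, value=-self.lr)


# Example usage: gradients are collected by the caller (e.g. from the
# distributed autograd context) and handed to the optimizer explicitly.
params = [torch.randn(4, requires_grad=True)]
opt = _FunctionalAdagradSketch(params, lr=0.1)
opt.step([torch.ones(4)])
```

Because the class is scripted and gradients are passed in explicitly, a remote worker can step it from compiled code rather than through the Python optimizer interface.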
💊 CI failures summary (Dr. CI, as of commit 500ac24):
XLA failure: job pytorch_xla_linux_bionic_py3_6_clang9_test is failing.
pritamdamania87 left a comment:
Looks good overall, requesting changes to see if we can further optimize the local optimizer step to run completely in TorchScript.
@@ -0,0 +1,90 @@
from typing import List, Dict, Optional
nit: maybe rename this file to functional_adagrad.py
torch/distributed/optim/adagrad.py (Outdated)
# NOTE: This should be only used by distributed optimizer internals
# and not meant to expose to the user.
@torch.jit.script
class FunctionalAdagrad(object):
nit: Should this be _FunctionalAdagrad for now to indicate it's internal only?
# TODO: use foreach API in optim.functional to do all the computation

def _make_sparse(grad, grad_indices, values):
@vincentqb Could you review these changes?
LGTM. What is the advantage of writing it this way?
It's mainly because nested functions are not supported by TorchScript, so I moved it out as a separate function. Also, I feel like defining a nested function inside a for loop is not good practice.
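For context, a small illustrative example (hypothetical names, not the PR's code) of the restructuring described here: TorchScript cannot compile a nested function definition, so the helper is hoisted to module level where it can be compiled and called from scripted code.

```python
import torch
from torch import Tensor

# TorchScript cannot compile a nested definition like this:
#
#   @torch.jit.script
#   def update(grad: Tensor) -> Tensor:
#       def scale(g: Tensor) -> Tensor:  # nested function: unsupported
#           return g * 0.5
#       return scale(grad)
#
# Hoisting the helper to module level makes both functions scriptable;
# TorchScript compiles _scale recursively when it compiles update.

def _scale(g: Tensor) -> Tensor:
    return g * 0.5


@torch.jit.script
def update(grad: Tensor) -> Tensor:
    return _scale(grad)
```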
torch/distributed/optim/optimizer.py (Outdated)
  rpc_futs.append(rpc.rpc_async(
-     optim.owner(),
+     optimizer.owner(),
      _local_optimizer_step,
Can we have two versions of _local_optimizer_step? If we're using a TorchScript functional optimizer, we can have a TorchScript _local_optimizer_step, which means RPC will call a TorchScript function and there would be no GIL anywhere. If it is an optimizer that is not TorchScript, we call a regular Python _local_optimizer_step.
In the current version, we still end up holding the GIL since _local_optimizer_step is not TorchScript.
Hmm, I see, this makes sense. The problem is that _LocalOptimizer still contains Lock() objects to protect the optimizers that don't have a functional optimizer implementation. I could try adding a new _LocalScriptOptimizer to enable the use case you mentioned.
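To make the trade-off concrete, here is a hedged sketch of the two paths being discussed; the class names and bodies are illustrative assumptions rather than the PR's implementation. The Python wrapper serializes steps with a lock, while the scripted wrapper can update parameters without the lock or the GIL.

```python
import threading
from typing import List, Optional

import torch
from torch import Tensor


class _LocalOptimizerSketch(object):
    # Python path: a process-wide lock guards step() because regular Python
    # optimizers stepping shared parameters from multiple RPC threads are not
    # safe to run concurrently.
    global_lock = threading.Lock()

    def __init__(self, params: List[Tensor], lr: float):
        self.params = params
        self.lr = lr

    def step(self, gradients: List[Optional[Tensor]]):
        with _LocalOptimizerSketch.global_lock, torch.no_grad():
            for param, grad in zip(self.params, gradients):
                if grad is not None:
                    param.add_(grad, alpha=-self.lr)


@torch.jit.script
class _LocalScriptOptimizerSketch(object):
    # TorchScript path: step() is compiled, so it can run without the lock
    # and without holding the Python GIL while updating parameters.
    def __init__(self, params: List[Tensor], lr: float):
        self.params = params
        self.lr = lr

    def step(self, gradients: List[Optional[Tensor]]):
        with torch.no_grad():
            for i in range(len(self.params)):
                grad = gradients[i]
                if grad is not None:
                    self.params[i].add_(grad, alpha=-self.lr)
```

With two wrappers along these lines, _local_optimizer_step could also have a scripted variant that RPC invokes directly for functional optimizers, which would be the GIL-free path suggested above.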