Skip allreducing local_used_maps_dev_ when find_unused_param=False
#40407
Conversation
Summary: Solves issue #38942. In reducer.cpp, I infer that `find_unused_param` is effectively `false` from `unused_parameters_.empty()`; in that case, `allreduce(local_used_maps_dev_)` is skipped.

Test Plan: Run `test/distributed/test_c10d.py` and make sure all tests pass.

Reviewers: Pritam Damania
Subscribers: Pritam Damania, Shen Li
Tasks: T68705534
Tags: DDP
ghstack-source-id: 3da54f3
Pull Request resolved: #40407
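As a plain-Python sketch (no torch, hypothetical names) of why this collective is redundant when no parameter can go unused: each rank keeps a "local used map" with 1 for every parameter that received a gradient locally, and DDP allreduces (sums) these maps so all ranks agree on which parameters were globally unused. If every entry is known to be 1 up front, the allreduce carries no information.

```python
def allreduce_sum(maps):
    """Element-wise sum across ranks -- the collective this PR skips."""
    return [sum(vals) for vals in zip(*maps)]

world_size = 2
num_params = 4

# find_unused_parameters=False: every rank used every parameter,
# so the allreduce result is fully predictable and adds nothing.
local_maps = [[1] * num_params for _ in range(world_size)]
global_map = allreduce_sum(local_maps)
assert all(count == world_size for count in global_map)

# find_unused_parameters=True: rank 1 skipped parameter 2, so the
# allreduce is genuinely needed to discover the globally-unused entry.
local_maps = [[1, 1, 1, 1], [1, 1, 0, 1]]
assert allreduce_sum(local_maps) == [2, 2, 1, 2]
```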
💊 CI failures summary (Dr. CI): as of commit fe145bb, 2 extra GitHub checks failed.
And since the unused parameters from different processes can be different, let's also add a test for that case.
Updated "Skip allreducing `local_used_maps_dev_` when `find_unused_param=False`"

Summary:
1. Solves issue #38942.
2. In reducer.cpp, I check whether `find_unused_param` is `false` via `unused_parameters_.empty()`; when it is empty, `allreduce(local_used_maps_dev_)` is skipped.
3. `unused_parameters_` is always empty when we set `find_unused_param = false`. Further, there may be additional cases with an empty `unused_parameters_` but `find_unused_param = true`; so in addition to issue #38942, this diff also avoids `allreduce(local_used_maps_dev_)` in that case. Since we never call `unused_parameters_.clear()` in separate processes, I think this will not cause any issue or lead to a DDP comm hang. In fact, it is beneficial: if all parameters are used, each gradient is already reduced.

Test Plan: Run `test/distributed/test_c10d.py` and make sure all tests pass.

Reviewers: Pritam Damania
Subscribers: Pritam Damania, Shen Li
Tasks: T68705534
Tags: DDP
Differential Revision: [D22176231](https://www.internalfb.com/intern/diff/D22176231/)
Pull Request resolved: #40407

1. Solves issue #38942.
2. In reducer.cpp, we have a new boolean `find_unused_param_` and its value is set in `Reducer::prepare_for_backward`. If `!find_unused_param_`, then it avoids `allreduce(local_used_maps_dev_)`.

Differential Revision: [D22176231](https://our.internmc.facebook.com/intern/diff/D22176231/)
ghstack-source-id: 106447051
Updated "Skip allreducing `local_used_maps_dev_` when `find_unused_param=False`"

Summary:
1. Solves issue #38942.
2. In reducer.cpp, we have a new boolean `find_unused_param_` and its value is set in `Reducer::prepare_for_backward`. If `!find_unused_param_`, then it avoids `allreduce(local_used_maps_dev_)`.

Test Plan: Run `test/distributed/test_c10d.py` and make sure all tests pass.

Reviewers: Pritam Damania
Subscribers: Pritam Damania, Shen Li, Yanli Zhao
Tasks: T68705534
Tags: DDP
Differential Revision: [D22176231](https://www.internalfb.com/intern/diff/D22176231/)
pritamdamania87
left a comment
And since the unused parameters from different processes can be different, let's also add a test for that case.
@mrshenli Could you elaborate on the test case we need here? Looks like we have unit tests where we test both cases for find_unused_parameters_. Is there something missing in our current tests?
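The earlier attempt in this PR used `!unused_parameters_.empty()` as a stand-in for the flag. A toy illustration (plain Python, hypothetical function names) of why that proxy is ambiguous: an empty unused list cannot distinguish "flag off" from "flag on, but this iteration happened to use every parameter".

```python
def should_skip_allreduce_old(unused_parameters):
    # Old proxy: treat an empty unused-parameter list as if
    # find_unused_parameters were False.
    return len(unused_parameters) == 0

def should_skip_allreduce_new(find_unused_parameters):
    # New check: consult the flag itself.
    return not find_unused_parameters

# Flag on, but every parameter was used this iteration:
assert should_skip_allreduce_old([]) is True      # old check skips the collective
assert should_skip_allreduce_new(True) is False   # new check still allreduces

# Flag off: both checks agree, so behavior only differs in the ambiguous case.
assert should_skip_allreduce_old([]) is True
assert should_skip_allreduce_new(False) is True
```

Since another rank with the flag on may have genuinely unused parameters and still issue the allreduce, ranks deciding from the local list alone could stop matching each other's collectives, which is why the flag-based check is safer.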
Sure. The first two attempts in this PR remind us that we might want to add tests for the following cases:

pytorch/torch/csrc/distributed/c10d/reducer.cpp, lines 751 to 756 (at e490352):
Pull Request resolved: #40407

1. In reducer.cpp, we have a new boolean `find_unused_param_` and its value is set in the Reducer's Python API.
2. Solves issue #38942. When `find_unused_parameters_` is set to false, there is no need to allreduce `local_used_maps_dev_`, because all parameters will be reduced anyway. Therefore, we can avoid allocating memory for `local_used_maps` and `local_used_maps_dev_` if `find_unused_parameters_` is false.

ghstack-source-id: 106525355
Differential Revision: [D22176231](https://our.internmc.facebook.com/intern/diff/D22176231/)
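A minimal sketch (plain Python, hypothetical structure, not the actual C++ Reducer) of the allocation decision this commit describes: the used-map bookkeeping is only allocated when the flag is on.

```python
def make_reducer_state(find_unused_parameters, num_params):
    """Toy model of constructor-time state; names are illustrative only."""
    state = {"find_unused_parameters": find_unused_parameters}
    if find_unused_parameters:
        # Analogue of local_used_maps / local_used_maps_dev_; when the flag
        # is off these buffers are never allocated, saving memory.
        state["local_used_map"] = [0] * num_params
    return state

assert "local_used_map" in make_reducer_state(True, 8)
assert "local_used_map" not in make_reducer_state(False, 8)
```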
Avoided some VS Code editorial changes.

I am looking into it.
The new changes in the final commit:
The problem is with
pritamdamania87
left a comment
Looks like there are some merge conflicts in CI; you probably need to rebase and resubmit the PR.
```cpp
if (!has_rebuilt_bucket_ && !find_unused_parameters_ &&
    index.replica_index == 0) {
  rebuilt_params_.push_back(
      replicas_[index.replica_index][index.variable_index]);
  rebuilt_param_indices_.push_back(index.variable_index);
}
```
```diff
-// If there are model parameters that went unused when computing the model
-// output, they won't be part of the autograd graph, and won't receive
-// gradients. These parameters are discovered in the `prepare_for_backward`
-// function and their indexes stored in the `unused_parameters_` vector.
-if (!has_marked_unused_parameters_ && !unused_parameters_.empty()) {
+// If `find_unused_parameters_` is true there may be model parameters that
+// went unused when computing the model output, they won't be part of the
+// autograd graph, and won't receive gradients. These parameters are discovered
+// in the `prepare_for_backward` function and their indexes stored in
+// the `unused_parameters_` vector.
+if (!has_marked_unused_parameters_ && find_unused_parameters_) {
```
These changes look good to me, although I'll let Shen stamp the PR.
Thanks for letting me know about the merge conflict :) Just rebased and resolved the conflict at init.cpp.
Updated "Skip allreducing `local_used_maps_dev_` when `find_unused_param=False`"

Summary:
1. In reducer.cpp, we have a new boolean `find_unused_param_` and its value is set in `Reducer::prepare_for_backward`. If `!find_unused_param_`, then it avoids `allreduce(local_used_maps_dev_)`.
2. Solves issue #38942.
3. Fixes incorrect `find_unused_parameters_` passing, like checking `outputs.empty()` or `unused_parameters_.empty()`.

Test Plan:
1. Run `test/distributed/test_c10d.py` and make sure all tests pass.
2. A new test case `test_find_unused_parameters_when_unused_parameters_empty` is included. The old `reducer.cpp` failed this unit test because it checked `find_unused_parameters_` via `unused_parameters_.empty()`; the current `reducer.cpp` passes it.
3. Two test cases, `test_forward_backward_unused_parameters` and `test_forward_backward_optimizer`, were failing because `find_unused_parameter_` of their `reducer` object was not set properly. I fixed that as well.

Tasks: T68705534
Tags: DDP
Differential Revision: [D22176231](https://www.internalfb.com/intern/diff/D22176231/)
zhaojuanmao
left a comment
overall lgtm
mrshenli
left a comment
LGTM! Thanks for working on this!
```cpp
  }
}

// Note [Skip allreducing local_used_maps_dev]
```
Just curious, any reason for adding this note to the dtor?
In Pritam's example, the notes were placed after the constructor. That's why I put it there.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp#L100. Example of referring to a note: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp#L64.
```python
output.mean().backward()

# Now locally unused parameter should have grad updated on all ranks.
[self.assertIsNotNone(t_p.grad) for t_p in model.module.task_parameters()]
```
nit: I wonder if we should also verify that the grad values are as expected? We can do this by:
- creating two local models, one whose `forward` uses both `t0` and `t1`, and another whose `forward` only uses `t1`
- running forward/backward on them, and then computing the mean grads of these two models
- comparing the mean grad with DDP's grad.

I noticed the test above (`test_global_local_unused_params_grad`) didn't test the accuracy either. So feel free to do this in a followup PR.
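The arithmetic behind the suggested accuracy check, with hypothetical numbers (the names `t0`/`t1` follow the discussion above): DDP averages gradients across ranks, and a rank that never used a parameter contributes a zero gradient to that average.

```python
# Rank 0 used t0, rank 1 did not (contributes 0); DDP's grad for t0
# should equal the mean of the per-rank grads.
rank_grads_t0 = [0.5, 0.0]
expected_t0 = sum(rank_grads_t0) / len(rank_grads_t0)
assert expected_t0 == 0.25

# Both ranks used t1, so its expected grad is an ordinary mean.
rank_grads_t1 = [0.5, 0.25]
expected_t1 = sum(rank_grads_t1) / len(rank_grads_t1)
assert expected_t1 == 0.375
```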
I noted this and will do a followup soon.
```python
process_group = c10d.ProcessGroupGloo(store, self.rank, self.world_size)

# Test on CPU
cpu_model = DistributedDataParallel(
```
nit: In a follow-up PR, it may be worth separating the GPU and CPU tests so that (a portion of) this test can run in CPU-only builds.
Yes, I noted that.
Stack from ghstack:
- #40407 Skip allreducing `local_used_maps_dev_` when `find_unused_param=False`

Summary:
1. In reducer.cpp, we have a new boolean `find_unused_param_` and its value is set in `Reducer::prepare_for_backward`. If `!find_unused_param_`, then it avoids `allreduce(local_used_maps_dev_)`.
2. Solves issue #38942.
3. Fixes incorrect `find_unused_parameters_` passing, like checking `outputs.empty()` or `unused_parameters_.empty()`.

Test Plan:
1. Run `test/distributed/test_c10d.py` and make sure all tests pass.
2. A new test case `test_find_unused_parameters_when_unused_parameters_empty` is included. The old `reducer.cpp` failed this unit test because it checked `find_unused_parameters_` via `unused_parameters_.empty()`; the current `reducer.cpp` passes it.
3. Two test cases, `test_forward_backward_unused_parameters` and `test_forward_backward_optimizer`, were failing because `find_unused_parameter_` of their `reducer` object was not set properly. I fixed that as well.

Tasks: T68705534
Tags: DDP
Differential Revision: D22176231