Remove unnecessary copies in ProcessGroupGloo for multiple inputs allreduce #43543
Closes #14691. The extra copies are not needed in the multiple outputs case, because gloo allreduce
already broadcasts the result tensor to all the outputs. See
pytorch/gloo#152 and commit
pytorch/gloo@9cabb5a
for more details. I came across this when debugging #42577.
This effectively reverts #14688 while keeping the tests.
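
To illustrate why the copies are redundant, here is a minimal pure-Python sketch of the semantics described above. It is a toy model, not the gloo or ProcessGroupGloo implementation: the function name `allreduce_sum_model` and the list-of-lists representation of tensors are hypothetical and chosen only for illustration. It assumes that a multi-tensor allreduce writes the reduced result into every tensor on every rank.

```python
from typing import List


def allreduce_sum_model(per_rank_tensors: List[List[List[float]]]) -> None:
    """Toy in-place model of a multi-tensor allreduce (sum).

    Hypothetical helper: after the call, every tensor on every rank
    holds the elementwise sum over all tensors of all ranks.
    """
    all_tensors = [t for rank in per_rank_tensors for t in rank]
    total = [sum(values) for values in zip(*all_tensors)]
    for t in all_tensors:
        t[:] = total  # the result is written to every output


# Two simulated ranks, each contributing two tensors.
ranks = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
allreduce_sum_model(ranks)

# Copying tensors[0] into the remaining tensors of a rank, as an extra
# post-processing step would do, changes nothing in this model.
before = [list(t) for t in ranks[0]]
for t in ranks[0][1:]:
    t[:] = ranks[0][0]
copy_was_noop = before == ranks[0]
```

In this model `copy_was_noop` is true, which mirrors the argument of this PR: if the reduction already fills every output, a follow-up copy from the first tensor only costs time.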
Tested by ensuring test_allreduce_basics in test_c10d.py still works as expected.
Differential Revision: D23173945
NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!