Add alltoall_ to CommTensor #90512
Conversation
This PR adds alltoall_ to CommTensor.
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90512
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 Failures as of commit b8818e9: the following jobs have failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
yifuwang left a comment:
Nice!
torch/csrc/distributed/c10d/init.cpp (outdated):

        const std::vector<at::Tensor>& output_tensors,
        const std::vector<at::Tensor>& input_tensors) {
Do we expect calling this through pg.alltoall(..) to be backward compatible?
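For concreteness, a minimal sketch of the direct call path in question; the backend, ranks, and tensor shapes here are illustrative assumptions, not taken from the PR:

    import torch
    import torch.distributed as dist

    def demo(rank: int, world_size: int) -> None:
        dist.init_process_group("gloo", rank=rank, world_size=world_size)
        pg = dist.distributed_c10d._get_default_group()
        # Each rank sends one tensor to every peer and receives one back.
        inputs = [torch.full((2,), float(rank)) for _ in range(world_size)]
        outputs = [torch.empty(2) for _ in range(world_size)]
        # Public wrapper; user code may also call pg.alltoall(outputs, inputs)
        # directly, which is the BC surface this question is about.
        dist.all_to_all(outputs, inputs)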
Good point. Yeah, that BC-breaking issue is weird; let me investigate a bit what happened there.
The BC error is on the return type for alltoall_. It looks like this was recently added to the BC check. This should be fine, as it is not a user-facing API, and not even user-facing for PG extension devs.
    2022-12-09T03:04:38.6291268Z processing existing schema: c10d::alltoall_(Tensor[] _0, Tensor[] _1, __torch__.torch.classes.c10d.ProcessGroup _2, int _3) -> __torch__.torch.classes.c10d.Work _0
    2022-12-09T03:04:38.6292120Z Can NOT find forward compatible schemas after changes for schema c10d::alltoall_(Tensor[] _0, Tensor[] _1, __torch__.torch.classes.c10d.ProcessGroup _2, int _3) -> __torch__.torch.classes.c10d.Work _0 from the following candidates:
    2022-12-09T03:04:38.6292185Z [
    2022-12-09T03:04:38.6292525Z c10d::alltoall_(Tensor[] _0, Tensor[] _1, __torch__.torch.classes.c10d.ProcessGroup _2, int _3) -> (Tensor[] _0, __torch__.torch.classes.c10d.Work _1)
    2022-12-09T03:04:38.6292634Z ]
Adding const here should be fine; we are not going to change the tensor instance.
Got it, thanks for the insights @mrshenli! Let me update the BC test to skip it :)
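For reference, the BC check's skip mechanism is an allow list of schema-name prefixes paired with expiry dates; a sketch of the kind of entry meant here (the file path and the date are assumptions, not from this PR):

    # In test/forward_backward_compatibility/check_forward_backward_compatibility.py
    # (path assumed): schemas matching the prefix are exempt from the BC check
    # until the given date.
    import datetime

    ALLOW_LIST = [
        ("c10d::alltoall_", datetime.date(2023, 3, 1)),  # date is illustrative
    ]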
torch/csrc/distributed/c10d/init.cpp (outdated):

        },
        py::arg("output"),
        py::arg("input"),
        py::arg("output_tensors"),
However, changing the names might have a BC issue (though your change is indeed the right thing to do) for programs that directly call into pg.alltoall(). Shall we do the name change in a separate PR and mark that one as BC-breaking?
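A sketch of the kind of caller that would break under the rename (pg, outputs, and inputs are placeholders):

    # Old keyword names: break once the arguments are renamed.
    work = pg.alltoall(output=outputs, input=inputs)
    # New keyword names: break on older builds that still use output/input.
    work = pg.alltoall(output_tensors=outputs, input_tensors=inputs)
    # Positional calls, pg.alltoall(outputs, inputs), are unaffected either way.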
Yeah, makes sense. Updated.
mrshenli left a comment:
Stamp to unblock. Please address comments :)
This pull request has been merged in 3ba9e4c.
Stack from ghstack (oldest at bottom):
This PR adds alltoall_ to CommTensor.
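For context, a rough usage sketch of the feature; the CommTensor import path and the idea that plain wrapping is all that tracing needs are assumptions, not confirmed by this PR:

    import torch
    import torch.distributed as dist
    # Import path assumed; CommTensor wraps a tensor so tracing (e.g. make_fx)
    # can record the inplace collective and insert the matching wait() before
    # the wrapped result is consumed.
    from torch.distributed._spmd.comm_tensor import CommTensor

    def traced_all_to_all(rank: int, world_size: int):
        inputs = [CommTensor(torch.full((2,), float(rank)))
                  for _ in range(world_size)]
        outputs = [CommTensor(torch.empty(2)) for _ in range(world_size)]
        dist.all_to_all(outputs, inputs)  # dispatches to c10d::alltoall_
        return outputs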