Conversation

@wanchaol
Collaborator

@wanchaol wanchaol commented Dec 9, 2022

This PR adds alltoall_ to the CommTensor

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Dec 9, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90512

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Failures

As of commit b8818e9:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: distributed (c10d) release notes category label Dec 9, 2022
wanchaol added a commit that referenced this pull request Dec 9, 2022
This PR adds alltoall_ to the CommTensor

ghstack-source-id: 30e8e84
Pull Request resolved: #90512
wanchaol added a commit that referenced this pull request Dec 9, 2022
This PR adds alltoall_ to the CommTensor

ghstack-source-id: 5e71d33
Pull Request resolved: #90512
Collaborator

@yifuwang yifuwang left a comment

Nice!

Comment on lines 1488 to 1489
const std::vector<at::Tensor>& output_tensors,
const std::vector<at::Tensor>& input_tensors) {
Collaborator

Do we expect calling this through pg.alltoall(..) to be backward compatible?

Collaborator Author

Good point. Yeah, that BC-breaking issue is weird; let me investigate what happened there.

Contributor

The BC error is on the return type for alltoall_. It looks like this was recently added to the BC check. This should be fine, as it is not a user-facing API, not even user-facing for ProcessGroup extension devs.

2022-12-09T03:04:38.6291268Z processing existing schema:  c10d::alltoall_(Tensor[] _0, Tensor[] _1, __torch__.torch.classes.c10d.ProcessGroup _2, int _3) -> __torch__.torch.classes.c10d.Work _0
2022-12-09T03:04:38.6292120Z Can NOT find forward compatible schemas after changes for schema c10d::alltoall_(Tensor[] _0, Tensor[] _1, __torch__.torch.classes.c10d.ProcessGroup _2, int _3) -> __torch__.torch.classes.c10d.Work _0 from the following candidates:
2022-12-09T03:04:38.6292185Z [
2022-12-09T03:04:38.6292525Z c10d::alltoall_(Tensor[] _0, Tensor[] _1, __torch__.torch.classes.c10d.ProcessGroup _2, int _3) -> (Tensor[] _0, __torch__.torch.classes.c10d.Work _1)
2022-12-09T03:04:38.6292634Z ]
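In other words, the return type of `c10d::alltoall_` changed from a bare `Work` handle to the tuple `(Tensor[], Work)`. A minimal sketch of why a return-type change trips the compatibility check (illustrative only, not PyTorch's actual checker, which also compares argument types):

```python
def returns_of(schema: str) -> str:
    # Everything after the "->" is the return annotation.
    return schema.rsplit("->", 1)[1].strip()

old = ("c10d::alltoall_(Tensor[] _0, Tensor[] _1, "
       "__torch__.torch.classes.c10d.ProcessGroup _2, int _3) "
       "-> __torch__.torch.classes.c10d.Work _0")
new = ("c10d::alltoall_(Tensor[] _0, Tensor[] _1, "
       "__torch__.torch.classes.c10d.ProcessGroup _2, int _3) "
       "-> (Tensor[] _0, __torch__.torch.classes.c10d.Work _1)")

def forward_compatible(old_schema: str, new_schema: str) -> bool:
    # A caller compiled against the old schema expects the old return
    # arity/type, so any return change fails the comparison.
    return returns_of(old_schema) == returns_of(new_schema)
```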

Contributor

Adding const here should be fine, we are not going to change the tensor instance.
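A rough analogy for why the `const` is safe: `at::Tensor` is a reference-counted handle, so `const std::vector<at::Tensor>&` only freezes the container, not the storage the tensors point at. A Python sketch of the same idea (plain lists standing in for tensors, an immutable tuple standing in for the const vector):

```python
def alltoall_sketch(output_tensors, input_tensors):
    # Mirrors `const std::vector<at::Tensor>&`: we never reassign or
    # resize the containers, but writing *through* each element is fine,
    # because each element is a handle to mutable storage.
    for out, inp in zip(output_tensors, input_tensors):
        out[:] = inp  # in-place write through the handle

outs = ([0, 0], [0, 0])   # tuple = "const" container of mutable buffers
ins = ([1, 2], [3, 4])
alltoall_sketch(outs, ins)
```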

Collaborator Author

Got it, thanks for the insights @mrshenli! Let me update the BC test to skip it :)
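Skipping the check typically means adding an entry to the BC test's allow list. The sketch below models that mechanism with a hypothetical entry and expiry date; the exact file layout in PyTorch's `check_forward_backward_compatibility.py` may differ:

```python
import datetime
import re

# Hypothetical allow list, modeled on the BC test's exemption table:
# (schema-name pattern, date on which the exemption expires).
ALLOW_LIST = [
    ("c10d::alltoall_", datetime.date(2023, 1, 9)),
]

def is_allowed(schema_name: str, today: datetime.date) -> bool:
    """Skip the BC check for schemas matched by an unexpired entry."""
    return any(
        re.match(pattern, schema_name) and today < expiry
        for pattern, expiry in ALLOW_LIST
    )
```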

},
py::arg("output"),
py::arg("input"),
py::arg("output_tensors"),
Contributor

However, changing the names might have a BC issue for programs that directly call into pg.alltoall() with keyword arguments (though your change is indeed the right thing to do). Shall we do the name change in a separate PR and mark that one as BC-breaking?
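The keyword hazard is easy to show in plain Python: pybind11 exposes `py::arg` names as Python keyword arguments, so renaming them breaks any caller that used the old keywords (the function names below are hypothetical stand-ins for the binding):

```python
def alltoall_old(output_tensors, input_tensors):   # old py::arg names
    return output_tensors, input_tensors

def alltoall_new(output, input):                   # renamed py::arg names
    return output, input

# Positional calls keep working across the rename...
assert alltoall_new([1], [2]) == ([1], [2])

# ...but a keyword call written against the old names now raises.
try:
    alltoall_new(output_tensors=[1], input_tensors=[2])
    broke = False
except TypeError:
    broke = True
```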

Collaborator Author

Yeah, makes sense; updated.

Contributor

@mrshenli mrshenli left a comment

Stamp to unblock. Please address comments :)

@wanchaol wanchaol added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 9, 2022
@facebook-github-bot
Contributor

This pull request has been merged in 3ba9e4c.

@facebook-github-bot facebook-github-bot deleted the gh/wanchaol/229/head branch June 8, 2023 19:06

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (c10d) release notes category
