All Gather and gather APIs for Python Objects #42189
Conversation
Rehash of #28811, which was several months old. As part of addressing #23232, this PR adds support for the following APIs: `allgather_object` and `gather_object`, to support gather/allgather of generic, picklable Python objects. This has been a long-requested feature, so PyTorch should provide these helpers built in.

The methodology is the one proposed in the original issue:

1) Pickle the object to a ByteTensor using torch.save
2) Communicate the tensor sizes
3) Copy the local ByteTensor into a tensor of maximal size
4) Call the tensor-based collectives on the result of (3)
5) Unpickle back into an object using torch.load

Note that the API is designed to match the tensor-based collectives, except that it does not support `async_op`. For now, it is a blocking call. If we see demand for `async_op`, we will have to make more progress on merging work/future to support it. If this is a suitable approach, we can support `scatter` and `broadcast` in follow-up PRs.

Differential Revision: [D22785387](https://our.internmc.facebook.com/intern/diff/D22785387/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22785387/)!
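A rough, self-contained sketch of how these five steps compose for `all_gather_object` (the helper names and exact signatures here are illustrative and may not match the PR's implementation):

```python
import io
import torch
import torch.distributed as dist

def _object_to_tensor(obj):
    # 1) Pickle the object into a ByteTensor via torch.save.
    buffer = io.BytesIO()
    torch.save(obj, buffer)
    byte_tensor = torch.ByteTensor(list(buffer.getvalue()))
    return byte_tensor, torch.LongTensor([byte_tensor.numel()])

def _tensor_to_object(tensor, size):
    # 5) Unpickle the first `size` bytes back into a Python object.
    return torch.load(io.BytesIO(bytes(tensor[:size].tolist())))

def all_gather_object_sketch(object_list, obj, group=None):
    input_tensor, local_size = _object_to_tensor(obj)
    group_size = dist.get_world_size(group=group)

    # 2) Communicate the serialized sizes so every rank knows the maximum.
    object_sizes = [torch.LongTensor([0]) for _ in range(group_size)]
    dist.all_gather(object_sizes, local_size, group=group)
    max_size = max(int(s.item()) for s in object_sizes)

    # 3) Copy the local ByteTensor into a tensor of maximal size.
    padded = torch.empty(max_size, dtype=torch.uint8)
    padded[: local_size.item()] = input_tensor

    # 4) Run the tensor-based collective on the padded tensors.
    output_tensors = [torch.empty(max_size, dtype=torch.uint8) for _ in range(group_size)]
    dist.all_gather(output_tensors, padded, group=group)

    # 5) Deserialize each rank's payload using its true size.
    for i, tensor in enumerate(output_tensors):
        object_list[i] = _tensor_to_object(tensor, int(object_sizes[i].item()))
```

`gather_object` would follow the same shape, except that only the destination rank materializes the output list.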
def _object_to_tensor(obj):
(can add in follow-up PRs) Do we need to accept optional `pickle_module` and `pickle_protocol` arguments, as `torch.save` does?
I didn't see a need to add these protocol options, since we have control over most of the serialization here, i.e. we directly serialize and deserialize ourselves. Looking at the docs, it seems that specifying a higher protocol might give some performance win over a lower one, but I would advocate for keeping this interface simpler unless we see demand for it.
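If such options were ever added, they could presumably just be forwarded to `torch.save`; a hypothetical sketch, not part of this PR, with argument names chosen to mirror `torch.save`:

```python
import io
import pickle
import torch

def _object_to_tensor(obj, pickle_module=pickle, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    # Hypothetical: forward the caller's pickle options to torch.save.
    buffer = io.BytesIO()
    torch.save(obj, buffer, pickle_module=pickle_module, pickle_protocol=pickle_protocol)
    byte_tensor = torch.ByteTensor(list(buffer.getvalue()))
    return byte_tensor, torch.LongTensor([byte_tensor.numel()])
```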
def _object_to_tensor(obj):
    buffer = io.BytesIO()
    torch.save(obj, buffer)
Question: since we only accept picklable objects, is there any reason for using `torch.save` instead of `pickle.dump`? It looks like `torch.save` will convert the object into a zip-file format; do we know how much overhead that adds?
I see. I went with `torch.save`/`torch.load` since that seems to be the de-facto way of doing serialization in PyTorch. I think it might be more efficient to just use the `pickle` module directly, though.
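For comparison, the pickle-only variant discussed here would look roughly like the following (illustrative only; not what the PR ships):

```python
import pickle
import torch

def _object_to_tensor_pickle_only(obj):
    # pickle.dumps avoids torch.save's zip-file container around the payload.
    data = pickle.dumps(obj)
    byte_tensor = torch.ByteTensor(list(data))
    return byte_tensor, torch.LongTensor([byte_tensor.numel()])

def _tensor_to_object_pickle_only(tensor, size):
    return pickle.loads(bytes(tensor[:size].tolist()))
```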
group (ProcessGroup, optional): The process group to work on
Returns:
    None. If the calling rank is part of this group, the output of the
Let's also document that if the rank is not in the group, the `object_list` argument will stay intact.
# Avoid copying intermediate tensors back and forth to CUDA by using gloo PG
# for object collectives.
if get_backend(group) == Backend.NCCL:
    gloo_group = new_group(
This will introduce one more rendezvous. Will this be faster than copying the tensor to the GPU when using the NCCL backend?
I see, maybe we can just go with the transfer-to-CUDA approach. I was thinking that if we cache this gloo group, we might pay less amortized cost when this function is called many times.
I see, let's use the provided group in this PR. If we get complaints regarding its speed, we can find ways to optimize. Another reason for not creating a new Gloo group is that users might build the distributed package without Gloo.
Sounds good. I think this can work for `all_gather_object`, but for `gather_object` we get the error `RuntimeError: ProcessGroupNCCL does not support gather`. I guess if users try to call `gather_object` with the NCCL backend, it makes sense to throw, since NCCL does not implement gather. So should we just throw in this case?
I see. Yep, I think it makes sense to just throw when calling `gather_object` with the NCCL backend.
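A minimal sketch of that check (illustrative; the helper name, error message, and placement in the PR may differ):

```python
import torch.distributed as dist

def _check_gather_backend(group=None):
    # gather is not implemented by ProcessGroupNCCL, so fail fast for gather_object.
    if dist.get_backend(group) == dist.Backend.NCCL:
        raise RuntimeError(
            "gather_object is not supported with the NCCL backend: "
            "ProcessGroupNCCL does not implement gather."
        )
```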
local_max_size_tensor = torch.ByteTensor(size=(max_object_size,))
local_max_size_tensor[: local_size.item()] = input_tensor
output_tensors = [
    torch.empty(max_object_size, dtype=torch.uint8) for _ in range(group_size)
(not sure if this can work) We could avoid multiple memory allocations by creating one big tensor and then letting the output tensors point to views of it.
We would still end up allocating the same amount of memory, right? Although there may be a perf win from making fewer memory allocation calls.
Yep, exactly.
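For reference, the single-allocation idea would look roughly like this (a sketch under the assumption that the downstream collective accepts contiguous views; the helper name is made up):

```python
import torch

def _make_output_views(max_object_size, group_size):
    # One allocation for the whole buffer; each output tensor is a view into it.
    flat = torch.empty(group_size * max_object_size, dtype=torch.uint8)
    return [flat[i * max_object_size : (i + 1) * max_object_size] for i in range(group_size)]
```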
Pull Request resolved: #43887. As part of addressing #23232, this PR adds support for `broadcast_object_list`, an API to broadcast arbitrary picklable objects to all the other ranks. This has been a long-requested feature, so it would be good for PyTorch to support it natively. The implementation follows a similar approach to #42189. The input is a list of objects to be broadcast, and the operation is in place, meaning all ranks that are part of the group will have their input list modified to contain the broadcast objects from the src rank. Note that the API is designed to match the tensor-based collectives, except that it does not support `async_op`. For now, it is a blocking call. If we see demand for `async_op`, we will have to make more progress on merging work/future to support it.

Differential Revision: [D23422577](https://our.internmc.facebook.com/intern/diff/D23422577/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23422577/)!
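A usage example of the intended in-place semantics (assumes an initialized process group; the exact signature may differ from the final API):

```python
import torch.distributed as dist

# The src rank provides the real payloads; other ranks pass placeholders of the same length.
if dist.get_rank() == 0:
    objects = [{"lr": 0.01}, "config-v2", 42]
else:
    objects = [None, None, None]

dist.broadcast_object_list(objects, src=0)
# After the call, every rank's `objects` holds the three objects from rank 0.
```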
Pull Request resolved: #43930. Closes #23232. As part of addressing #23232, this PR adds support for `scatter_object_list`, an API to scatter arbitrary picklable objects to all the other ranks. The implementation follows a similar approach to #42189. The result of the `scatter` is stored as the first element of `scatter_object_output_list`, and the src rank is expected to provide an input list `scatter_object_input_list` containing the objects to scatter. Note that this API requires 1 broadcast and 2 scatters: we must first communicate the maximum object size to be scattered, which only the src rank knows about, and after that we also need to communicate the objects themselves as well as their true sizes. Note that the API is designed to match the tensor-based collectives, except that it does not support `async_op`. For now, it is a blocking call. If we see demand for `async_op`, we will have to make more progress on merging work/future to support it. It only works for Gloo because NCCL doesn't support scatter.

Differential Revision: [D23430686](https://our.internmc.facebook.com/intern/diff/D23430686/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23430686/)!
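A usage example of the described calling convention (assumes an initialized process group; names follow the description above and the exact signature may differ):

```python
import torch.distributed as dist

world_size = dist.get_world_size()
if dist.get_rank() == 0:
    # The src rank supplies one picklable object per rank.
    scatter_object_input_list = [f"payload-for-rank-{r}" for r in range(world_size)]
else:
    scatter_object_input_list = [None] * world_size

# Each rank receives its object in the first (and only) slot of the output list.
scatter_object_output_list = [None]
dist.scatter_object_list(scatter_object_output_list, scatter_object_input_list, src=0)
```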