[BC-breaking] Use ScatterGatherKernel for scatter_reduce (CPU-only) #74226
Conversation
CI failures summary (Dr. CI): as of commit 241b898, 1 new failure was recognized by patterns; it does not appear to be due to an upstream breakage.
Update signature of `scatter_reduce_` to match `scatter_`/`scatter_add_`:

`Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce, *, bool include_input=True)`

- Update `scatter_reduce` to call into the cpu/cuda kernels for `scatter.reduce`
- `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_`
- Add an argument `include_input` which indicates whether the value in the `self` Tensor at a given position is included in the reduction with the elements from `src` scattered to that position (see the sketch after this description). Formally, let
  - `I_self = {all indices of self}`
  - `I_src = {all indices of src}`
  - `S = {indices of self modified by the scatter}`
  - `self_indices_to_src_indices : I_self --> I_src` map each index of `self` to the tuple of indices in `src` scattered to it.

  Then for `s ∈ S` and `t ∈ I_self \ S`, when `include_input=False`:

  `self[s] = reduction_op(src[self_indices_to_src_indices[s]])`
  `self[t] = self[t]`

  and when `include_input=True` (the regular `scatter_(reduce=op)` behavior):

  `self[s] = reduction_op(self[s], src[self_indices_to_src_indices[s]])`
  `self[t] = self[t]`

The [`optional_out` case of pytorch_scatter.scatter](https://github.com/rusty1s/pytorch_scatter/blob/master/csrc/scatter.cpp#L32) can then be handled by

`torch.zeros(shape).scatter_reduce_(dim, index, src, reduce, include_input=False)`
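To make the `include_input` semantics concrete, here is a minimal sketch using the ATen C++ API. This is illustrative only: the flag shipped in released PyTorch under the name `include_self` (which is what the code below uses), and the expected values in the comments assume the `"sum"` reduction.

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  auto src   = torch::tensor({1.0, 2.0, 3.0, 4.0});
  auto index = torch::tensor({0, 0, 1, 1});  // int64 by default
  auto base  = torch::full({3}, 10.0);

  // include_input=True: self[s] joins the reduction.
  // out[0] = 10 + 1 + 2 = 13, out[1] = 10 + 3 + 4 = 17, out[2] = 10 (t ∈ I_self \ S)
  auto with_self = base.clone().scatter_reduce_(0, index, src, "sum",
                                                /*include_self=*/true);

  // include_input=False: only the scattered src values are reduced.
  // out[0] = 1 + 2 = 3, out[1] = 3 + 4 = 7, out[2] = 10 (untouched)
  auto without_self = base.clone().scatter_reduce_(0, index, src, "sum",
                                                   /*include_self=*/false);

  std::cout << with_self << "\n" << without_self << "\n";
  return 0;
}
```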
`auto index_stride = self_dim_stride;`
…
`AT_DISPATCH_FLOATING_TYPES_AND2(`
If the only difference between this kernel and `cuda_scatter_gather_base_kernel` is `AT_DISPATCH_FLOATING_TYPES_AND2` vs `AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3`, it might be easier to explicitly check for a specific scalar type and error out in a unified kernel than to copy-paste the code and change just this line.
The error message to throw can be found in the dispatch macro definitions.
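For illustration, a minimal sketch of that suggestion. The names `scatter_reduce_kernel_sketch` and `needs_floating_point` are hypothetical stand-ins (this is not the actual PyTorch kernel code); the point is the check-then-single-dispatch pattern.

```cpp
#include <ATen/ATen.h>
#include <ATen/Dispatch.h>

// Hypothetical unified kernel: reject unsupported scalar types up front,
// then dispatch once over the full dtype set, instead of maintaining a
// copy-pasted kernel whose only difference is the dispatch macro.
void scatter_reduce_kernel_sketch(const at::Tensor& self,
                                  bool needs_floating_point) {
  if (needs_floating_point) {
    TORCH_CHECK(c10::isFloatingType(self.scalar_type()),
                "scatter_reduce(): reduction not implemented for ",
                self.scalar_type());
  }
  AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(
      at::ScalarType::Half, at::ScalarType::BFloat16, at::ScalarType::Bool,
      self.scalar_type(), "scatter_reduce_sketch", [&] {
        // shared kernel body, instantiated once per supported scalar_t
        (void)sizeof(scalar_t);
      });
}
```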
cpuhrsch left a comment:
Looks pretty good, but I'd try to avoid the code duplication of copy-pasting `cpu_scatter_gather_base_kernel` to deal with the different set of dtypes.
`cpu_scatter_gather_base_kernel<>()(self, dim, index, value,`
`  "scatter_scalar_reduce_multiply_", reduce_multiply);`
`break;`
`default:`
Why did you add this?
Without this, the build on CI will fail. I think it's due to this:
> Using the `-Werror` compiler flag, a switch statement over a value of an enum type without a `default` label will fail to compile if any enumerator of the enum doesn't have a corresponding case. This is sometimes called an exhaustive or defaultless switch statement.
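For reference, a standalone reproduction of that behavior (not the kernel code itself): compiled with something like `g++ -c -Wall -Werror`, the switch below only builds because of the `default:` label.

```cpp
// Minimal reproduction: with -Wall -Werror, GCC's -Wswitch turns a
// defaultless switch that omits an enumerator into a hard error, e.g.
//   error: enumeration value 'Mean' not handled in switch [-Werror=switch]
// Adding the `default:` label (as done in the kernel) makes the build pass.
enum class ReduceOp { Add, Multiply, Mean };

const char* name(ReduceOp op) {
  switch (op) {
    case ReduceOp::Add:      return "add";
    case ReduceOp::Multiply: return "multiply";
    default:                 return "unknown";  // covers ReduceOp::Mean
  }
}
```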
cpuhrsch left a comment:
Looks great, just added two small comments.
…CPU-only)" Update signature of `scatter_reduce_` to match `scatter_/scatter_add_` `Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)` - Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce` - `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_` - Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py` [ghstack-poisoned]
|
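For reference, a small sketch of how the new reduction options surface through the ATen C++ API. The set of `reduce` strings shown ("sum", "prod", "mean", "amax", "amin") is taken from the released `scatter_reduce` documentation, not from this thread, so treat it as an assumption about the final option set.

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  auto src   = torch::tensor({1.0, 2.0, 3.0, 4.0});
  auto index = torch::tensor({0, 0, 1, 1});
  auto base  = torch::ones({2});

  // Each reduce string selects a different reduction in the kernel. With the
  // released default include_self=true, the base values join the reduction.
  for (const char* reduce : {"sum", "prod", "mean", "amax", "amin"}) {
    auto out = base.clone().scatter_reduce_(0, index, src, reduce);
    std::cout << reduce << ":\n" << out << "\n";
  }
  return 0;
}
```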
@mikaylagawarecki has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…CPU-only)" Update signature of `scatter_reduce_` to match `scatter_/scatter_add_` `Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)` - Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce` - `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_` - Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py` Differential Revision: [D35222842](https://our.internmc.facebook.com/intern/diff/D35222842) [ghstack-poisoned]
|
@mikaylagawarecki has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…CPU-only)" Update signature of `scatter_reduce_` to match `scatter_/scatter_add_` `Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)` - Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce` - `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_` - Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py` Differential Revision: [D35222842](https://our.internmc.facebook.com/intern/diff/D35222842) [ghstack-poisoned]
|
@mikaylagawarecki has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…CPU-only)" Update signature of `scatter_reduce_` to match `scatter_/scatter_add_` `Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)` - Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce` - `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_` - Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py` Differential Revision: [D35222842](https://our.internmc.facebook.com/intern/diff/D35222842) [ghstack-poisoned]
|
@mikaylagawarecki has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…CPU-only)" Update signature of `scatter_reduce_` to match `scatter_/scatter_add_` `Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)` - Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce` - `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_` - Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py` Differential Revision: [D35222842](https://our.internmc.facebook.com/intern/diff/D35222842) [ghstack-poisoned]
|
@mikaylagawarecki has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…CPU-only)" Update signature of `scatter_reduce_` to match `scatter_/scatter_add_` `Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)` - Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce` - `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_` - Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py` Differential Revision: [D35222842](https://our.internmc.facebook.com/intern/diff/D35222842) [ghstack-poisoned]
|
@mikaylagawarecki has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Summary:
Pull Request resolved: #74226

Update signature of `scatter_reduce_` to match `scatter_`/`scatter_add_`:
`Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)`
- Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce`
- `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_`
- Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py`

Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D35222842
Pulled By: mikaylagawarecki
fbshipit-source-id: 84930add2ad30baf872c495251373313cb7428bd
Stack from ghstack:
Update signature of
scatter_reduce_to matchscatter_/scatter_add_Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)scatter_reduceto call into the cpu kernel forscatter.reducescatter_reducenow has the same shape constraints asscatter_andscatter_add_test/test_torch.py:test_scatter_reducetotest/test_scatter_gather_ops.pyDifferential Revision: D35222842