Commit 2435d94
Fix FP16 fastAtomicAdd for one case where tensor start address is not 32 bit aligned (#44642)
Summary:
For #44206 and #42218, I'd like to update trilinear interpolate backward and grid_sample backward to use `fastAtomicAdd`.
As a prelude, I spotted a UB risk in `fastAtomicAdd`. I think existing code incurs a misaligned `__half2` atomicAdd when `index` is odd and `tensor` is not 32-bit aligned (`index % 2 == 1` and `(reinterpret_cast<std::uintptr_t>(tensor) % sizeof(__half2) == 1`). In this case we think we're `!low_bit` and go down the `!low_bit` code path, but in fact we are `low_bit`. It appears the original [fastAtomicAdd PR](#21879 (comment) discussion did not consider that case explicitly.
I wanted to push my tentative fix for discussion ASAP. jjsjann123 and mkolod as original authors of `fastAtomicAdd`. (I'm also curious why we need to `reinterpret_cast<std::uintptr_t>(tensor...` for the address modding, but that's minor.)
Pull Request resolved: #44642
Reviewed By: mruberry
Differential Revision: D23699820
Pulled By: ngimel
fbshipit-source-id: 0db57150715ebb45e6a1fb36897e46f00d61defd1 parent 2fd142a commit 2435d94
1 file changed
+7
-6
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
24 | | - | |
25 | | - | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
26 | 27 | | |
27 | | - | |
| 28 | + | |
28 | 29 | | |
29 | 30 | | |
30 | 31 | | |
31 | | - | |
| 32 | + | |
32 | 33 | | |
33 | | - | |
| 34 | + | |
34 | 35 | | |
35 | 36 | | |
36 | 37 | | |
37 | | - | |
| 38 | + | |
38 | 39 | | |
39 | 40 | | |
40 | 41 | | |
| |||
0 commit comments