Back out "Back out "free up dispatch key space (in C++)"" #74963
Conversation
albanD left a comment:
Yes!
Summary: Pull Request resolved: #74963

This is a re-land of D35192346 (9872a06) and D35192317 (a9216cd), which together change the internal representation of `DispatchKeySet` in pytorch core to free up the number of dispatch keys that we have available. See a more detailed description of the design in the original PR, #69633, and the background and bug analysis in the PR description below. The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`.

**Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan?**

Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (a subset of mobile), I'm not sure what's specific about Milan's builds that caused it to manifest only there. @dreiss, I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this?

**The debugging experience was pretty difficult**

Debugging the Milan-specific failure was made difficult by the following:

(1) Lack of CI. The original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky and if they can produce reliable failure logs for debugging.

(2) It's difficult to get a repro. My work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space). There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from @MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/).

(3) Lack of stack traces. Most Milan failures didn't include actionable stack traces. @phding generously helped me debug by running my suggested patches locally and reporting back if there were any failures. The failing test didn't include a stack trace, though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash.

Test Plan: Confirmed with @phding that the broken Milan workflow from the previous version of this diff is now passing.

Reviewed By: phding, albanD

Differential Revision: D35222806

fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30
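For intuition about what "freeing up dispatch key space" buys, here is a toy sketch of the packed-representation idea from #69633 (invented names and an invented bit split; not the actual c10 layout): backend bits and functionality bits live in separate regions of one 64-bit word, so n backends and m functionalities cost roughly n + m bits instead of one key per (backend, functionality) pair.

```cpp
#include <cstdint>

// Toy model: low bits identify backends, high bits identify functionalities
// (dense, sparse, quantized, autograd, ...). The split point is invented.
struct ToyDispatchKeySet {
  static constexpr int kNumBackendBits = 16;

  uint64_t repr_ = 0;

  void addBackend(int backendIndex) {
    repr_ |= (uint64_t{1} << backendIndex);
  }
  void addFunctionality(int functionalityIndex) {
    repr_ |= (uint64_t{1} << (kNumBackendBits + functionalityIndex));
  }
  bool hasBackend(int backendIndex) const {
    return (repr_ >> backendIndex) & 1;
  }
  bool hasFunctionality(int functionalityIndex) const {
    return (repr_ >> (kNumBackendBits + functionalityIndex)) & 1;
  }
};
```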
Hey @bdhirsh.
Original commit changeset: b962de5d5eff
Original Phabricator Diff: D35192346

Back out "Back out "DispatchKeySet perf improvements""
Original commit changeset: e38081810a56
Original Phabricator Diff: D35192317

Differential Revision: [D35222806](https://our.internmc.facebook.com/intern/diff/D35222806/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35222806/)!

ghstack-source-id: eb033a6
Pull Request resolved: pytorch/pytorch#74963
…mentations for quantized & non-quantized tensors in item

Summary: This PR is part of a series of PRs addressing #54150, related to using the dispatcher for calls to quantized backends as opposed to if/else conditionals. This particular PR separates the calls to quantized & non-quantized backends for item using the dispatcher. Simultaneous support of the CompositeImplicitAutograd and Quantized dispatch keys was made possible with #74963.

Test plan: There are numerous tests in the suite that make use of torch.Tensor.item.

```
python test/run_test.py
```

can be used for comprehensive evaluation. Alternatively, because this PR should not affect torch.Tensor.item calls on non-quantized tensors, we can specifically test on quantized tensors:

```
python test/test_quantization.py
```

ghstack-source-id: 3264c38
Pull Request resolved: #72333
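For background on the registration pattern involved, here is a minimal sketch (hypothetical `myops` namespace and placeholder kernel bodies; not the real aten registration) of routing one op to different kernels per dispatch key instead of branching on the tensor's type at the call site:

```cpp
#include <torch/library.h>
#include <ATen/ATen.h>

// Regular-tensor path (placeholder body: defer to the built-in item).
at::Scalar my_item_cpu(const at::Tensor& self) {
  return self.item();
}

// Quantized path: dequantize first, then read the value out.
at::Scalar my_item_quantized(const at::Tensor& self) {
  return self.dequantize().item();
}

TORCH_LIBRARY(myops, m) {
  m.def("item(Tensor self) -> Scalar");
}

TORCH_LIBRARY_IMPL(myops, CPU, m) {
  m.impl("item", my_item_cpu);
}

TORCH_LIBRARY_IMPL(myops, QuantizedCPU, m) {
  m.impl("item", my_item_quantized);
}
```

The dispatcher picks the kernel from the tensor's dispatch key set, so no is_quantized() if/else is needed at the call site.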
This PR is a re-land of #69633 (this is the second re-land attempt; the first one is at #72827). The original PR had a memory corruption bug that only surfaced on mobile builds.
Background: Existing Mobile Optimization
Pytorch mobile builds have an existing optimization ([here](https://github.com/pytorch/pytorch/blob/cc23725e89713138aa1c81ce5fb4a8dbcd440ccf/c10/core/DispatchKey.h#L382) and [here](https://github.com/pytorch/pytorch/blob/cc23725e89713138aa1c81ce5fb4a8dbcd440ccf/aten/src/ATen/core/dispatch/OperatorEntry.h#L214)), which works as follows:
Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc).
In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys.
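To make the tradeoff concrete, here is a minimal sketch of the idea (illustrative only: `KernelFunction` and `kNumDispatchKeySlots` are stand-in names, not the actual c10 internals; `C10_MOBILE` is the real mobile build macro):

```cpp
#include <array>
#include <cstddef>

// Stand-in for c10's KernelFunction; illustrative only.
using KernelFunction = void (*)();

#ifdef C10_MOBILE
// Mobile only ever dispatches to a handful of keys (cpu, quantized,
// autograd, ...), so each operator's table can be much smaller.
constexpr std::size_t kNumDispatchKeySlots = 8;
#else
constexpr std::size_t kNumDispatchKeySlots = 64;
#endif

struct OperatorEntry {
  // One function-pointer slot per runtime dispatch key. Every operator
  // carries one of these tables, so shrinking it saves space per operator.
  std::array<KernelFunction, kNumDispatchKeySlots> dispatchTable_{};
};
```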
The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key, but should be able to work for "any operator". The array of fallback kernels is defined [here](https://github.com/pytorch/pytorch/blob/cc23725e89713138aa1c81ce5fb4a8dbcd440ccf/aten/src/ATen/core/dispatch/Dispatcher.h#L294).
The mobile-optimization currently does not extend to this array (it wouldn't be that useful anyway, because there is only one array of fallback kernels globally, vs. a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64.
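For contrast, a sketch of the global fallback table (same stand-in names as above; the real member is `backendFallbackKernels_` on the `Dispatcher` singleton):

```cpp
#include <array>
#include <cstddef>

using KernelFunction = void (*)();

struct Dispatcher {
  // Exactly one of these exists per process, shared by all operators, so
  // shrinking it from 64 to 8 slots saves only ~56 pointers in total,
  // which is negligible next to the per-operator savings above.
  std::array<KernelFunction, 64> backendFallbackKernels_{};
};
```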
The Bug
The original PR made it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and it incidentally shrunk the size of the fallback array from 64 to 8 for mobile (that happened on [this line](https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294)).
That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to dump the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`.
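A simplified sketch of that failure mode (hypothetical code, not the actual diff): the fallback array is sized with the mobile-shrunk slot count, but startup registration still walks the full 64-key range:

```cpp
#include <array>
#include <cstddef>

using KernelFunction = void (*)();

constexpr std::size_t kMobileSlots = 8;  // what the array was shrunk to
constexpr std::size_t kNumKeys = 64;     // what registration still assumes

std::array<KernelFunction, kMobileSlots> backendFallbackKernels_{};

void registerAllFallbacks(KernelFunction fallback) {
  for (std::size_t key = 0; key < kNumKeys; ++key) {
    // Out of bounds once key >= 8: the kernel pointer is written past the
    // end of the array, corrupting whatever sits after it in memory.
    backendFallbackKernels_[key] = fallback;
  }
}
```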
NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35222806/)!