Towards supporting quantized structured kernels #74560

Conversation
This PR adds support for quantized tensors with an "unknown quantizer", which means we can use standard APIs like torch.empty to allocate quantized tensors, with the understanding that the quantizer will be set later. This makes meta functions applicable to quantized tensors (they allocate with an unknown quantizer, and the kernel sets the quantizer later) and fixes a bug David Dang reported where structured kernels give a confusing error message when called with quantized inputs.

This is not complete support for quantized structured kernels, because I haven't actually tried porting any of the quantized implementations to structured; qadd is probably a good choice to try first, as it does its broadcasting implementation using TensorIterator. My goal here is just to show that the error message is better.

See also #52680

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
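To illustrate the distinction the description draws, here is a minimal sketch (assuming a PyTorch build that includes this change; exact behavior may differ across versions):

```python
import torch

# An ordinary quantized tensor is created through an explicit quantization
# API, which attaches a quantizer (scale, zero_point) at construction time.
x = torch.quantize_per_tensor(torch.randn(3), scale=0.1, zero_point=0,
                              dtype=torch.qint8)
print(x.qscheme())  # torch.per_tensor_affine

# With this change, a plain factory call like torch.empty can allocate a
# quantized tensor with no quantizer attached yet ("unknown quantizer");
# a kernel writing into it is expected to set the quantizer afterwards.
y = torch.empty(3, dtype=torch.qint8)
print(y.is_quantized)  # True
```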
CI failures summary (Dr. CI): as of commit 5a934fd, 18 failures were not recognized by known patterns and may be due to changes from this PR.
```diff
 runtime_empty_supported_check = ""
-elif backend_index.dispatch_key == DispatchKey.CompositeExplicitAutograd:
+elif backend_index.dispatch_key in (
+        DispatchKey.CompositeExplicitAutograd, DispatchKey.QuantizedCPU, DispatchKey.QuantizedCUDA):
```
Tiny nit, if we care about the perf (probably not necessary, just calling it out): we could avoid the dispatcher hop here that comes from calling at::empty, but we'd have to abide by the naming convention and name the native kernels at::native::empty_quantizedcpu/cuda.
oh this used to not be easy to do but now it is easy
```python
empty_impl = "at::empty"
empty_strided_impl = "at::empty_strided"
runtime_empty_supported_check = """\
if (!c10::detail::backend_supports_empty_operator(options)) {{
```
thanks for killing this :')
```python
        r"Registration to both CompositeImplicitAutograd and CompositeExplicitAutograd is not allowed"):
    dispatcher.register(["CompositeExplicitAutograd", "CompositeImplicitAutograd"])


def test_quantized_structured_not_implemented(self):
```
Can we add a test in https://github.com/pytorch/pytorch/blob/master/test/quantization/core/test_quantized_tensor.py#L142 for calling some methods (e.g. qscheme) on a tensor with an unknown quantizer as well?
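A hypothetical test along the lines the review suggests might look like the sketch below. The class and method names are illustrative (the real test would live in test/quantization/core/test_quantized_tensor.py), and it only exercises metadata that should be safe on a tensor allocated without an explicit quantizer:

```python
import torch
import unittest


class TestUnknownQuantizer(unittest.TestCase):
    # Illustrative test: allocate a quantized tensor with no quantizer
    # attached and check that basic metadata methods still behave sanely.
    def test_metadata_on_unknown_quantizer_tensor(self):
        t = torch.empty(4, dtype=torch.qint8)
        # The tensor is quantized even though no scale/zero_point was given.
        self.assertTrue(t.is_quantized)
        self.assertEqual(t.dtype, torch.qint8)
        self.assertEqual(t.numel(), 4)


result = unittest.TextTestRunner().run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestUnknownQuantizer))
```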
sure
jerryzh168 left a comment
Follow-ups are non-blocking; will accept first.
@pytorchbot merge this
Merge failed: matched rule superuser, but it was not yet reviewed by any of: pbelevich, H-Huang, albanD, hlu1, jamesr66a, ...
@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Pull Request resolved: #74560

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D35317441

Pulled By: dzdang

fbshipit-source-id: ffb85b0e06ccbcc2b01052ca6760517684048b39
Hey @ezyang.
…tion_pad1d_quantized_cpu and" Summary: With the introduction of structured kernel support for quantized tensors in #74560, we are able to remove the dimension and output resizing code in reflection_pad1d_out_template (this code is already present in reflection_pad1d), as well as the implementation of reflection_pad1d_quantized_cpu. This PR should introduce no functional changes. Test plan:
```
python run_test.py
```
Differential Revision: [D35148152](https://our.internmc.facebook.com/intern/diff/D35148152) [ghstack-poisoned]
…tion_pad1d_quantized_cpu, dimension and output resizing code in reflection_pad1d_out_template and implemented reflection_pad1d_out_quantized_cpu" Summary: With the introduction of structured kernel support for quantized tensors in #74560, we are able to remove the dimension and output resizing code in reflection_pad1d_out_template; this code is already present in reflection_pad1d. reflection_pad1d_quantized_cpu has also been removed, as quantized tensors can now use reflection_pad1d after the changes in the linked PR. reflection_pad1d_out_quantized_cpu was implemented for quantized tensors. This PR should introduce no functional changes. Test plan:
```
python run_test.py
```
Differential Revision: [D35148152](https://our.internmc.facebook.com/intern/diff/D35148152) [ghstack-poisoned]
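As a usage-level sketch of what this follow-up enables (assuming a PyTorch build that includes the quantized reflection-pad kernels), a quantized tensor can be reflection-padded directly through the ordinary functional API:

```python
import torch
import torch.nn.functional as F

# Quantize a small (N, C, W) tensor, then reflection-pad its last dimension.
# With the quantized kernels in place, this routes through the same
# reflection_pad1d path a float tensor would use.
x = torch.quantize_per_tensor(torch.randn(1, 2, 4), scale=0.1, zero_point=0,
                              dtype=torch.quint8)
y = F.pad(x, (1, 1), mode='reflect')
print(y.shape)  # padding of 1 on each side: width 4 -> 6
```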
…tch registration for max_pool2d & quantized_max_pool2d and implemented max_pool2d_with_indices_out_quantized_cpu" Summary: This PR is part of a series of PRs addressing #54150, related to using the dispatcher for calls to quantized backends instead of if/else conditionals. This particular PR removes the is_quantized check from max_pool2d and implements a quantized kernel for max_pool2d_with_indices. This PR also introduces isnan() support for vectorized int tensors. This PR relies on #74560, which introduces structured kernel support for quantized tensors. Test plan:
```
python test/test_quantization.py -k test_max_pool2d
```
Differential Revision: [D35420901](https://our.internmc.facebook.com/intern/diff/D35420901) [ghstack-poisoned]
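The user-visible effect of routing quantized pooling through the dispatcher can be sketched as follows (a minimal example, assuming a PyTorch build where quantized max_pool2d is registered):

```python
import torch
import torch.nn.functional as F

# A quantized (N, C, H, W) tensor pools through the same functional API as
# a float tensor; the dispatcher selects the quantized kernel.
x = torch.quantize_per_tensor(torch.randn(1, 1, 4, 4), scale=0.1,
                              zero_point=0, dtype=torch.quint8)
y = F.max_pool2d(x, kernel_size=2)
print(y.shape)  # 2x2 pooling over 4x4: spatial dims halve
```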
…tch registration for max_pool1d & quantized_max_pool1d" Summary: This PR is part of a series of PRs addressing #54150, related to using the dispatcher for calls to quantized backends instead of if/else conditionals. This particular PR removes the is_quantized check from max_pool1d and modifies max_pool1d_impl to be compatible with int tensors. This PR relies on #74560, which introduces structured kernel support for quantized tensors, and #72353. Test plan:
```
python test/test_quantization.py -k test_max_pool1d
```
Differential Revision: [D35431831](https://our.internmc.facebook.com/intern/diff/D35431831) [ghstack-poisoned]