Change quantizer to account for input tensor's memory format. #42178
Conversation
Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format.
Test Plan: Quantization tests.
💊 CI failures summary and remediations as of commit ecb3848 (more details on the Dr. CI page):
Extra GitHub checks: 1 failed
ci.pytorch.org: 2 failed
Looks like there are some failing tests.
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
dreiss left a comment
I'm not sure the existing tests are sufficient to test this. At the very least, I think we need tests for quantizing channels-first, channels-last, and fully non-contiguous input tensors. Unless those are already there and I'm just missing them.
qtensor.scalar_type(), "quantize_tensor_per_tensor_affine_cpu", [&]() {
  TORCH_CHECK(
      rtensor.is_contiguous(), "Float tensor should be contiguous");
  const float* const rdata = rtensor.data_ptr<float>();
I don't understand this. It's usually not safe to call data_ptr on a non-contiguous tensor. Maybe the check should be that rtensor is contiguous in the memory format of qtensor? And what if qtensor is not contiguous?
Oh yes. Thanks for the catch. Don't know what I was thinking removing this; maybe it will come back.
That sounds good. Let's check with Jerry. @jerryzh168, do you know if the quantization unit tests cover all these cases? If not, I will add them.
Right, we don't have tests for different memory formats; we always make the Tensor contiguous first. Please add them.
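For reference, here is a minimal standalone sketch (not the PR's actual test code) of how the three kinds of inputs asked about above could be constructed, assuming a 4D NCHW float tensor; the shapes are illustrative.

```python
import torch

# Channels-first input: contiguous in the default (NCHW) memory format.
r_cf = torch.rand(3, 2, 4, 5)

# Channels-last input: same logical NCHW shape, but NHWC strides.
r_cl = r_cf.contiguous(memory_format=torch.channels_last)

# Fully non-contiguous input: a strided slice that is contiguous in neither format.
r_nc = torch.rand(3, 2, 8, 10)[:, :, ::2, ::2]

assert r_cf.is_contiguous()
assert r_cl.is_contiguous(memory_format=torch.channels_last) and not r_cl.is_contiguous()
assert not r_nc.is_contiguous()
assert not r_nc.is_contiguous(memory_format=torch.channels_last)
```

Each of these could then be fed to torch.quantize_per_tensor / torch.quantize_per_channel and the round-trip compared against a reference.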
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. pytest test/quantization/test_quantized_tensor.py ghstack-source-id: c3a7da3 Pull Request resolved: #42178
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. pytest test/quantization/test_quantized_tensor.py ghstack-source-id: 9364161 Pull Request resolved: #42178
TORCH_CHECK(
    qtensor.is_contiguous(qtensor.suggest_memory_format()),
    "Quantized tensor should be contiguous");
TORCH_CHECK(
    rtensor.is_contiguous(qtensor.suggest_memory_format()),
    "Float tensor should be contiguous "
    "in same memory format as quantizd tensor");
can we create a helper function for these two checks?
Just for checks? I know there are a few places we are doing checks, but it seems strange to have a separate two-line function to just do asserts.
zero_points = torch.tensor([5, 10], dtype=torch.long)
axis = 1

def quantize_c_4d(data, scales, zero_points):
Can you make this function general to all dimensions? We can probably reshape the tensor to 1D, do the computation on the 1D tensor, and then reshape it back, I think.
We can't use a 1D tensor, because then we would have to deconstruct the channel dimension to apply per-channel quantization. But I think we can combine the h, w, and d dimensions and then use just one function.
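A hedged sketch of what such a combined N-dimensional reference helper might look like (the body below is an illustration, not the code that landed in the PR; the name, rounding, and quint8 clamp are assumptions):

```python
import torch

def _quantize_per_channel_ref_nd(data, scales, zero_points, axis=1):
    # Move the channel axis to the front, flatten every other dimension into
    # one combined "spatial" dimension, quantize each channel row, then
    # restore the original layout.
    t = data.transpose(0, axis).contiguous()
    flat = t.reshape(t.size(0), -1).double()
    q = torch.round(flat / scales.reshape(-1, 1) + zero_points.reshape(-1, 1))
    q = q.clamp(0, 255).to(torch.uint8)  # assumes a quint8 destination range
    return q.reshape(t.shape).transpose(0, axis)
```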
for memory_format in [torch.contiguous_format, torch.channels_last]:
    r = r.contiguous(memory_format=memory_format)
    qr = torch.quantize_per_channel(r, scales, zero_points, axis, torch.quint8)
    rqr = qr.dequantize()
    self.assertTrue(np.allclose(qr.int_repr(), quantize_c_4d(r, scales, zero_points)))
    self.assertTrue(np.allclose(r.numpy(), rqr.numpy(), atol=2 / np.min(scales.numpy())))
and merge the test code as well
out = torch.nn.functional.max_pool2d(out, 2, 2)
out = self.cat.cat([out, out])
out = out.view(-1, 3 * 2 * 2)
out = out.reshape(-1, 3 * 2 * 2)
why is this change needed?
Because view is broken for the channels_last format and there is no easy way to fix that. Although it's not clear why this PR in particular exposes that.
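For context, this is reproducible outside the PR: view needs strides that are compatible with the requested shape, which a channels_last tensor generally does not have, while reshape silently falls back to a copy. A small standalone illustration:

```python
import torch

out = torch.rand(4, 3, 2, 2).contiguous(memory_format=torch.channels_last)

flat = out.reshape(-1, 3 * 2 * 2)  # works: reshape copies when a zero-copy view is impossible
try:
    out.view(-1, 3 * 2 * 2)        # raises: size is not compatible with the tensor's strides
except RuntimeError as err:
    print(err)                     # the error message suggests using .reshape(...) instead
```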
qtensor.scalar_type(), "quantize_tensor_per_tensor_affine_cpu", [&]() {
  TORCH_CHECK(
      rtensor.is_contiguous(), "Float tensor should be contiguous");
      rtensor.is_contiguous(rtensor.suggest_memory_format()),
how do we guarantee that these functions are always passed contiguous tensors as inputs?
It is already broken if we are not passing contiguous tensors, because the kernels extract raw pointers and operate on them. This check is to ensure that the kernels get what they expect.
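In Python terms, the distinction the new check relies on is roughly the following (illustrative only): a channels_last tensor is not contiguous in the default sense, but it is contiguous in its own memory format, so a kernel can still walk its raw data pointer linearly.

```python
import torch

x = torch.rand(2, 3, 4, 5)                               # default NCHW layout
nhwc = x.contiguous(memory_format=torch.channels_last)   # same values, NHWC strides

print(x.is_contiguous())                                       # True
print(nhwc.is_contiguous())                                    # False (default format)
print(nhwc.is_contiguous(memory_format=torch.channels_last))   # True (its own format)
```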
ref_res = _quantize_per_channel_ref_nd(r, scales, zero_points)
r = r.contiguous(memory_format=memory_format)
qr = torch.quantize_per_channel(r, scales, zero_points, axis, torch.quint8)
rqr = qr.dequantize()
self.assertTrue(np.allclose(qr.int_repr(), ref_res))
self.assertTrue(np.allclose(r.numpy(), rqr.numpy(), atol=2 / np.min(scales.numpy())))
same here
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
TORCH_CHECK(
    qtensor.is_contiguous(qtensor.suggest_memory_format()),
    "Quantized tensor should be contiguous");
TORCH_CHECK(
    rtensor.is_contiguous(qtensor.suggest_memory_format()),
    "Float tensor should be contiguous "
    "in same memory format as quantizd tensor");
Looks like these are repeated many times; I think it would be better to put them into a function.
Yes, but it seems weird to just move this to a function whose sole purpose is to assert. If you insist I can do this, but I don't think it is very meaningful.
we do have a lot of these checking functions here: https://codebrowser.bddppq.com/pytorch/pytorch/aten/src/ATen/native/quantized/affine_quantizer.cpp.html#_ZN2at6native12_GLOBAL__N_114checkCPUTensorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_6TensorE
yeah I feel we should not repeat code
You can put these in an anonymous namespace to avoid affecting other files.
Sure. Although my feeling is that it's a little too much deduplication just for the sake of it.
print("--- reference output---")
print(Y_exp)
print("--- actual output---")
print(Y_act)
remove?
Aaah. Thanks for the catch.
# Check 4D tensor with 2 different memory formats.
r = torch.rand(3, 2, 4, 5, dtype=torch.float) * 4 - 2
scales = torch.tensor([0.2, 0.03], dtype=torch.double)
zero_points = torch.tensor([5, 10], dtype=torch.long)
self._test_quantize_per_channel(r, scales, zero_points, 1, False)

scales = torch.tensor([0.2, 0.03, 0.5], dtype=torch.double)
zero_points = torch.tensor([5, 10, 7], dtype=torch.long)
self._test_quantize_per_channel(r, scales, zero_points, 0, False)

# Check 5D tensor.
r = torch.rand(3, 2, 4, 5, 7, dtype=torch.float) * 4 - 2
scales = torch.tensor([0.2, 0.03], dtype=torch.double)
zero_points = torch.tensor([5, 10], dtype=torch.long)
self._test_quantize_per_channel(r, scales, zero_points, 1, False)

scales = torch.tensor([0.2, 0.03, 0.5], dtype=torch.double)
zero_points = torch.tensor([5, 10, 7], dtype=torch.long)
self._test_quantize_per_channel(r, scales, zero_points, 0, False)
nit: these can be in a loop as well, with r, scale, zero_points, axis being configurable
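One possible shape for that loop, sitting inside the same test method (with torch already imported) and reusing the existing _test_quantize_per_channel helper; the case table below just restates the cases already listed above:

```python
# (shape, scales, zero_points, axis) -- the four cases from the test above.
cases = [
    ((3, 2, 4, 5),    [0.2, 0.03],      [5, 10],    1),
    ((3, 2, 4, 5),    [0.2, 0.03, 0.5], [5, 10, 7], 0),
    ((3, 2, 4, 5, 7), [0.2, 0.03],      [5, 10],    1),
    ((3, 2, 4, 5, 7), [0.2, 0.03, 0.5], [5, 10, 7], 0),
]
for shape, scales, zero_points, axis in cases:
    r = torch.rand(*shape, dtype=torch.float) * 4 - 2
    scales = torch.tensor(scales, dtype=torch.double)
    zero_points = torch.tensor(zero_points, dtype=torch.long)
    self._test_quantize_per_channel(r, scales, zero_points, axis, False)
```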
self.assertTrue(np.allclose(qr.int_repr(), ref))
self.assertTrue(np.allclose(r.numpy(), dequant_tensor.numpy(), atol=1))

# Check 4D tensor with 2 different memory formats.
Same here; I think maybe you can also merge this test with the previous test.
That introduces unrelated changes. We should merge it with the previous one in a separate PR if we want to do that.
sure, sounds good
jerryzh168 left a comment
Thanks, had a few inline nit comments
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
This pull request has been merged in b52e6d0.
Stack from ghstack:
Summary:
This otherwise introduces unnecessary calls to contiguous in the rest of
the network, where certain ops want channels last format.
Test Plan:
Quantization tests.
Differential Revision: D22796479