Change quantizer to account for input tensor's memory format. #42178
Conversation
Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format.
Test Plan: Quantization tests.
💊 CI failures summary and remediations as of commit ecb3848 (more details on the Dr. CI page):
Extra GitHub checks: 1 failed
ci.pytorch.org: 2 failed
Looks like there are some failing tests.
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
dreiss left a comment
I'm not sure the existing tests are sufficient to test this. At the very least, I think we need tests for quantizing channels-first, channels-last, and fully non-contiguous input tensors. Unless those are already there and I'm just missing them.
qtensor.scalar_type(), "quantize_tensor_per_tensor_affine_cpu", [&]() {
  TORCH_CHECK(
      rtensor.is_contiguous(), "Float tensor should be contiguous");
  const float* const rdata = rtensor.data_ptr<float>();
I don't understand this. It's usually not safe to call data_ptr on a non-contiguous tensor. Maybe the check should be that rtensor is contiguous in the memory format of qtensor? And what if qtensor is not contiguous?
Oh yes. Thanks for the catch. Don't know what I was thinking removing this; maybe it will come back.
That sounds good. Let's check with Jerry. @jerryzh168, do you know if the quantization unit tests cover all these cases? If not, I will add them.
Right, we don't have tests for different memory formats; we always make the Tensor contiguous first. Please add them.
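For reference, here is a minimal standalone sketch (not the PR's actual test code) of how the three kinds of inputs asked about above could be constructed, assuming a 4D NCHW float tensor; the shapes are illustrative.

```python
import torch

# Channels-first input: contiguous in the default (NCHW) memory format.
r_cf = torch.rand(3, 2, 4, 5)

# Channels-last input: same logical NCHW shape, but NHWC strides.
r_cl = r_cf.contiguous(memory_format=torch.channels_last)

# Fully non-contiguous input: a strided slice that is contiguous in neither format.
r_nc = torch.rand(3, 2, 8, 10)[:, :, ::2, ::2]

assert r_cf.is_contiguous()
assert r_cl.is_contiguous(memory_format=torch.channels_last) and not r_cl.is_contiguous()
assert not r_nc.is_contiguous()
assert not r_nc.is_contiguous(memory_format=torch.channels_last)
```

Each of these could then be fed to torch.quantize_per_tensor / torch.quantize_per_channel and the round-trip compared against a reference.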
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. pytest test/quantization/test_quantized_tensor.py ghstack-source-id: c3a7da3 Pull Request resolved: #42178
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. pytest test/quantization/test_quantized_tensor.py ghstack-source-id: 9364161 Pull Request resolved: #42178
TORCH_CHECK(
    qtensor.is_contiguous(qtensor.suggest_memory_format()),
    "Quantized tensor should be contiguous");
TORCH_CHECK(
    rtensor.is_contiguous(qtensor.suggest_memory_format()),
    "Float tensor should be contiguous "
    "in same memory format as quantizd tensor");
can we create a helper function for these two checks?
Just for checks? I know there are a few places we are doing checks, but it seems strange to have a separate two-line function to just do asserts.
zero_points = torch.tensor([5, 10], dtype=torch.long)
axis = 1

def quantize_c_4d(data, scales, zero_points):
Can you make this function general to all dimensions? We can probably reshape the tensor to 1D, do the computation on the 1D tensor, and then reshape it back, I think.
We can't use a 1D tensor, because then we would have to deconstruct the channel dimension to apply per-channel quantization. But I think we can combine the h, w, and d dimensions and then use just one function.
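A hedged sketch of what such a combined N-dimensional reference helper might look like (the body below is an illustration, not the code that landed in the PR; the name, rounding, and quint8 clamp are assumptions):

```python
import torch

def _quantize_per_channel_ref_nd(data, scales, zero_points, axis=1):
    # Move the channel axis to the front, flatten every other dimension into
    # one combined "spatial" dimension, quantize each channel row, then
    # restore the original layout.
    t = data.transpose(0, axis).contiguous()
    flat = t.reshape(t.size(0), -1).double()
    q = torch.round(flat / scales.reshape(-1, 1) + zero_points.reshape(-1, 1))
    q = q.clamp(0, 255).to(torch.uint8)  # assumes a quint8 destination range
    return q.reshape(t.shape).transpose(0, axis)
```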
for memory_format in [torch.contiguous_format, torch.channels_last]:
    r = r.contiguous(memory_format=memory_format)
    qr = torch.quantize_per_channel(r, scales, zero_points, axis, torch.quint8)
    rqr = qr.dequantize()
    self.assertTrue(np.allclose(qr.int_repr(), quantize_c_4d(r, scales, zero_points)))
    self.assertTrue(np.allclose(r.numpy(), rqr.numpy(), atol=2 / np.min(scales.numpy())))
and merge the test code as well
out = torch.nn.functional.max_pool2d(out, 2, 2)
out = self.cat.cat([out, out])
out = out.view(-1, 3 * 2 * 2)
out = out.reshape(-1, 3 * 2 * 2)
why is this change needed?
Because view is broken for the channels_last format and there is no easy way to fix that. Although it's not clear why this PR in particular exposes that.
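For context, this is reproducible outside the PR: view needs strides that are compatible with the requested shape, which a channels_last tensor generally does not have, while reshape silently falls back to a copy. A small standalone illustration:

```python
import torch

out = torch.rand(4, 3, 2, 2).contiguous(memory_format=torch.channels_last)

flat = out.reshape(-1, 3 * 2 * 2)  # works: reshape copies when a zero-copy view is impossible
try:
    out.view(-1, 3 * 2 * 2)        # raises: size is not compatible with the tensor's strides
except RuntimeError as err:
    print(err)                     # the error message suggests using .reshape(...) instead
```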
qtensor.scalar_type(), "quantize_tensor_per_tensor_affine_cpu", [&]() {
  TORCH_CHECK(
      rtensor.is_contiguous(), "Float tensor should be contiguous");
      rtensor.is_contiguous(rtensor.suggest_memory_format()),
how do we guarantee that these functions are always passed contiguous tensors as inputs?
It is already broken if we are not passing contiguous tensors, because the kernels extract raw pointers and operate on them. This check is to ensure that the kernels get what they expect.
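In Python terms, the distinction the new check relies on is roughly the following (illustrative only): a channels_last tensor is not contiguous in the default sense, but it is contiguous in its own memory format, so a kernel can still walk its raw data pointer linearly.

```python
import torch

x = torch.rand(2, 3, 4, 5)                               # default NCHW layout
nhwc = x.contiguous(memory_format=torch.channels_last)   # same values, NHWC strides

print(x.is_contiguous())                                       # True
print(nhwc.is_contiguous())                                    # False (default format)
print(nhwc.is_contiguous(memory_format=torch.channels_last))   # True (its own format)
```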
ref_res = _quantize_per_channel_ref_nd(r, scales, zero_points)
r = r.contiguous(memory_format=memory_format)
qr = torch.quantize_per_channel(r, scales, zero_points, axis, torch.quint8)
rqr = qr.dequantize()
self.assertTrue(np.allclose(qr.int_repr(), ref_res))
self.assertTrue(np.allclose(r.numpy(), rqr.numpy(), atol=2 / np.min(scales.numpy())))
same here
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
TORCH_CHECK(
    qtensor.is_contiguous(qtensor.suggest_memory_format()),
    "Quantized tensor should be contiguous");
TORCH_CHECK(
    rtensor.is_contiguous(qtensor.suggest_memory_format()),
    "Float tensor should be contiguous "
    "in same memory format as quantizd tensor");
Looks like these are repeated many times; I think it would be better to put them into a function.
Yes, but it seems weird to just move this to a function whose sole purpose is to assert. If you insist I can do this, but I don't think it is very meaningful.
we do have a lot of these checking functions here: https://codebrowser.bddppq.com/pytorch/pytorch/aten/src/ATen/native/quantized/affine_quantizer.cpp.html#_ZN2at6native12_GLOBAL__N_114checkCPUTensorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_6TensorE
yeah I feel we should not repeat code
You can put these in an anonymous namespace to avoid affecting other files.
Sure. Although my feeling is that it's a little too much deduplication just for the sake of it.
print("--- reference output---")
print(Y_exp)
print("--- actual output---")
print(Y_act)
remove?
Aaah. Thanks for the catch.
# Check 4D tensor with 2 different memory formats.
r = torch.rand(3, 2, 4, 5, dtype=torch.float) * 4 - 2
scales = torch.tensor([0.2, 0.03], dtype=torch.double)
zero_points = torch.tensor([5, 10], dtype=torch.long)
self._test_quantize_per_channel(r, scales, zero_points, 1, False)

scales = torch.tensor([0.2, 0.03, 0.5], dtype=torch.double)
zero_points = torch.tensor([5, 10, 7], dtype=torch.long)
self._test_quantize_per_channel(r, scales, zero_points, 0, False)

# Check 5D tensor.
r = torch.rand(3, 2, 4, 5, 7, dtype=torch.float) * 4 - 2
scales = torch.tensor([0.2, 0.03], dtype=torch.double)
zero_points = torch.tensor([5, 10], dtype=torch.long)
self._test_quantize_per_channel(r, scales, zero_points, 1, False)

scales = torch.tensor([0.2, 0.03, 0.5], dtype=torch.double)
zero_points = torch.tensor([5, 10, 7], dtype=torch.long)
self._test_quantize_per_channel(r, scales, zero_points, 0, False)
nit: these can be in a loop as well, with r, scale, zero_points, axis being configurable
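One possible shape for that loop, sitting inside the same test method (with torch already imported) and reusing the existing _test_quantize_per_channel helper; the case table below just restates the cases already listed above:

```python
# (shape, scales, zero_points, axis) -- the four cases from the test above.
cases = [
    ((3, 2, 4, 5),    [0.2, 0.03],      [5, 10],    1),
    ((3, 2, 4, 5),    [0.2, 0.03, 0.5], [5, 10, 7], 0),
    ((3, 2, 4, 5, 7), [0.2, 0.03],      [5, 10],    1),
    ((3, 2, 4, 5, 7), [0.2, 0.03, 0.5], [5, 10, 7], 0),
]
for shape, scales, zero_points, axis in cases:
    r = torch.rand(*shape, dtype=torch.float) * 4 - 2
    scales = torch.tensor(scales, dtype=torch.double)
    zero_points = torch.tensor(zero_points, dtype=torch.long)
    self._test_quantize_per_channel(r, scales, zero_points, axis, False)
```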
self.assertTrue(np.allclose(qr.int_repr(), ref))
self.assertTrue(np.allclose(r.numpy(), dequant_tensor.numpy(), atol=1))

# Check 4D tensor with 2 different memory formats.
Same here; I think maybe you can also merge this test with the previous test.
That introduces unrelated changes. We should merge it with the previous one in a separate PR if we want to do that.
sure, sounds good
jerryzh168 left a comment
Thanks, had a few inline nit comments
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
…at." Summary: This otherwise introduces unnecessary calls to contiguous in the rest of the network, where certain ops want channels last format. Test Plan: Quantization tests. Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22796479](https://our.internmc.facebook.com/intern/diff/D22796479) [ghstack-poisoned]
This pull request has been merged in b52e6d0.
Stack from ghstack:
Summary:
This otherwise introduces unnecessary calls to contiguous in the rest of
the network, where certain ops want channels last format.
Test Plan:
Quantization tests.
Differential Revision: D22796479