[quant] Create PerRowQuantizer for floating point scale and zero_point #42612
Conversation
Note to reviewers: this change touches quite a few files, but only a handful contain the core of the change; the rest add the necessary code to hook up the new quantizer to the top-level API and add print support.
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
…nt scale and zero_point Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 82ceb39 Pull Request resolved: #42612
```cpp
auto zero_points_data = zero_points.data_ptr<float>();
const float* rdata = rtensor.data_ptr<float>();
auto qdata = qtensor.data_ptr<scalar_t>();
for (auto b = 0; b < batches; ++b) {
```
I'm guessing parallelizing these is saved for a future PR?
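For context, a hedged sketch of one way this batch loop could later be parallelized with ATen's existing `at::parallel_for`. The grain size is a placeholder and the loop body is elided; this is a sketch of a possible future change, not code from this PR:

```cpp
#include <ATen/Parallel.h>

// Rows (batches) are quantized independently, so the outer loop can be
// split across threads. grain_size = 1 is a placeholder; a real change
// would tune it to amortize scheduling overhead.
at::parallel_for(0, batches, /*grain_size=*/1, [&](int64_t begin, int64_t end) {
  for (int64_t b = begin; b < end; ++b) {
    // ... same per-batch quantize body as in the loop above ...
  }
});
```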
```cpp
/*
 * Quantize value based on the following equation
 * Qx = (Xf - bias) * inv_scale
```
Just to clarify: does bias = -1 * zero_point * scale (to get Qx = round(Xf / scale - zero_point))?
If yes, maybe we can make the API clearer by having the call sites also use the "bias" name?
I think this was a little confusing, as I was trying to match the C2 equation (Xf - bias) * inv_scale numerically by setting zero_point = bias.
I had a chat with Raghu about this, and we thought it would be better to use zero_point = -bias / scale for this case, so that we can keep a quantize equation similar to what we have now.
So, starting from the current equation Xq = Xf * inv_scale + zero_point and substituting zero_point = -bias * inv_scale:
Xq = Xf * inv_scale + (-bias * inv_scale)
Xq = (Xf - bias) * inv_scale
which is the same as what Caffe2 uses for embedding layers today. There may be some numerical differences due to the division rounding.
I'll update the PR with this change.
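For illustration, a minimal standalone check of that equivalence (plain C++; `quantize_affine` and `quantize_with_bias` are made-up helper names, not functions from this PR, and the rounding placement follows the equations as written above):

```cpp
#include <cmath>
#include <cstdio>

// Existing-style affine quantization: Xq = round(Xf * inv_scale + zero_point)
float quantize_affine(float x, float scale, float zero_point) {
  return std::nearbyint(x / scale + zero_point);
}

// Caffe2-style equation: Xq = round((Xf - bias) * inv_scale)
float quantize_with_bias(float x, float scale, float bias) {
  return std::nearbyint((x - bias) / scale);
}

int main() {
  float scale = 0.5f, bias = -1.25f, x = 3.0f;
  // Substituting zero_point = -bias / scale makes the two forms agree.
  float zero_point = -bias / scale;
  std::printf("%f %f\n",
              quantize_affine(x, scale, zero_point),  // (3.0 / 0.5) + 2.5 = 8.5 -> 8
              quantize_with_bias(x, scale, bias));    // (3.0 + 1.25) / 0.5 = 8.5 -> 8
}
```

Both calls print 8, since the substitution makes the two expressions algebraically identical before rounding.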
Yes, thanks, that matches what I assumed from reading the code. I think that works well, but maybe we can have the callers use "bias" and not "zero_point" as the argument name then, to prevent confusion? i.e.

```cpp
quantize_val_float_qparams<qint8>(float scale, float bias, float value);
// instead of
quantize_val_float_qparams<qint8>(float scale, float zero_point, float value);
```
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
aten/src/ATen/quantized/Quantizer.h (outdated)
```cpp
 * kPerChannelAffine.
 *
 * The quantize equation in this case looks like -
 * Xq = (Xf - zero_point) * inv_scale, where inv_scale = 1.0/scale
```
Hmm, so just to confirm: this is still not the same zero_point, conceptually, that we use elsewhere? I feel like people might get confused by this. What would you think of naming it bias (as I think it was earlier in this PR), or zero_point_caffe2, etc.?
I think this is not a problem; we already have different quantize functions for each quantizer, e.g. PerChannelAffineQuantizer.
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
```diff
 template <typename T>
-void checkZeroPoint(const std::string& fn_name, int64_t zero_point) {
+void checkZeroPoint(const std::string& fn_name, T zero_point) {
```
I think T here is for things like int8, uint8, etc.? And the zero_point is always going to be int64_t as of now.
Yeah we don't need to check zero_point when it is float.
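A minimal sketch of how the templated check could skip range validation for floating-point zero points (the bounds, helper structure, and error message here are illustrative assumptions, not the PR's exact code):

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <type_traits>

// Integer zero points must fit in a representable range; float zero
// points (the new per-row case) need no such range check.
template <typename T>
void checkZeroPoint(const std::string& fn_name, T zero_point) {
  if constexpr (std::is_integral_v<T>) {
    if (zero_point > std::numeric_limits<std::int32_t>::max() ||
        zero_point < std::numeric_limits<std::int32_t>::min()) {
      throw std::runtime_error(fn_name + ": zero_point is out of range");
    }
  }
  // Floating-point zero_point: nothing to validate here.
}
```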
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
jerryzh168 left a comment:
LGTM
```cpp
checkRoundingMode(fn_name);
checkFloatTensor(fn_name, rtensor);
checkCPUTensor(fn_name, rtensor);
checkSameDevice(fn_name, rtensor, qtensor);
checkSameSize(fn_name, qtensor, rtensor);
```
these probably should be put into one function
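Something along these lines, perhaps; a sketch of folding the calls above into a single helper (the function name is made up here, and it assumes the helper sits next to the existing check functions in the same file):

```cpp
// Bundles the per-call preconditions for quantizing rtensor into qtensor,
// so each quantize path can invoke a single check.
void checkQuantizationInputs(
    const std::string& fn_name,
    const at::Tensor& rtensor,
    const at::Tensor& qtensor) {
  checkRoundingMode(fn_name);
  checkFloatTensor(fn_name, rtensor);
  checkCPUTensor(fn_name, rtensor);
  checkSameDevice(fn_name, rtensor, qtensor);
  checkSameSize(fn_name, qtensor, rtensor);
}
```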
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
|
This pull request has been merged in 6f84468.
Stack from ghstack:

Summary:
Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is

Xq = (Xf - bias) * inv_scale, where bias is the float zero_point value.

We start with a per-row implementation and can extend to per-tensor in the future, if necessary.

Test Plan:
python test/test_quantization.py TestQuantizedTensor
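For illustration, a self-contained sketch of the per-row equation above in plain C++ (hypothetical function and names; the actual implementation is the ATen quantizer added by this PR, which dispatches over quantized dtypes):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-row quantization with float scale and float zero point (bias):
// for each row r, Xq = (Xf - bias[r]) * inv_scale[r], rounded and clamped.
std::vector<std::uint8_t> quantize_per_row(
    const std::vector<float>& x, std::size_t rows, std::size_t cols,
    const std::vector<float>& scales, const std::vector<float>& biases) {
  std::vector<std::uint8_t> q(x.size());
  for (std::size_t r = 0; r < rows; ++r) {
    const float inv_scale = 1.0f / scales[r];
    for (std::size_t c = 0; c < cols; ++c) {
      float v = std::nearbyint((x[r * cols + c] - biases[r]) * inv_scale);
      // Clamp to the quantized type's representable range (uint8 here).
      v = std::fmin(std::fmax(v, 0.0f), 255.0f);
      q[r * cols + c] = static_cast<std::uint8_t>(v);
    }
  }
  return q;
}
```

Each row carries its own float scale and float bias, which is what distinguishes this quantizer from the existing integer-zero-point per-channel path.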
Differential Revision: D22960142