[quant] Create PerRowQuantizer for floating point scale and zero_point #42612
Conversation
Note to reviewers: this change touches quite a few files, but only a handful contain the core of the change; the rest add the necessary code to hook up the new quantizer to the top-level API and add print support.
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
…nt scale and zero_point Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 82ceb39 Pull Request resolved: #42612
```cpp
auto zero_points_data = zero_points.data_ptr<float>();
const float* rdata = rtensor.data_ptr<float>();
auto qdata = qtensor.data_ptr<scalar_t>();
for (auto b = 0; b < batches; ++b) {
```
I'm guessing parallelizing these is saved for a future PR?
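For context, a hedged sketch of one way this batch loop could later be parallelized with ATen's existing `at::parallel_for`. The grain size is a placeholder and the loop body is elided; this is a sketch of a possible future change, not code from this PR:

```cpp
#include <ATen/Parallel.h>

// Rows (batches) are quantized independently, so the outer loop can be
// split across threads. grain_size = 1 is a placeholder; a real change
// would tune it to amortize scheduling overhead.
at::parallel_for(0, batches, /*grain_size=*/1, [&](int64_t begin, int64_t end) {
  for (int64_t b = begin; b < end; ++b) {
    // ... same per-batch quantize body as in the loop above ...
  }
});
```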
```cpp
/*
 * Quantize value based on the following equation
 * Qx = (Xf - bias) * inv_scale
```
Just to clarify: does bias = -1 * zero_point * scale (to get Qx = round(Xf / scale - zero_point))?
If yes, maybe we can make the API clearer by having the call sites also use the "bias" name?
I think this was a little confusing, as I was trying to match the C2 equation (Xf - bias) * inv_scale numerically by setting zero_point = bias.
I had a chat with Raghu about this, and we thought it would be better to use zero_point = -bias / scale for this case, so that we can keep a quantize equation similar to what we have now.
So, starting from the current equation Xq = Xf * inv_scale + zero_point and substituting zero_point = -bias * inv_scale:
Xq = Xf * inv_scale + (-bias * inv_scale)
Xq = (Xf - bias) * inv_scale
which is the same as what Caffe2 uses for embedding layers today. There may be some numerical differences due to the division rounding.
I'll update the PR with this change.
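For illustration, a minimal standalone check of that equivalence (plain C++; `quantize_affine` and `quantize_with_bias` are made-up helper names, not functions from this PR, and the rounding placement follows the equations as written above):

```cpp
#include <cmath>
#include <cstdio>

// Existing-style affine quantization: Xq = round(Xf * inv_scale + zero_point)
float quantize_affine(float x, float scale, float zero_point) {
  return std::nearbyint(x / scale + zero_point);
}

// Caffe2-style equation: Xq = round((Xf - bias) * inv_scale)
float quantize_with_bias(float x, float scale, float bias) {
  return std::nearbyint((x - bias) / scale);
}

int main() {
  float scale = 0.5f, bias = -1.25f, x = 3.0f;
  // Substituting zero_point = -bias / scale makes the two forms agree.
  float zero_point = -bias / scale;
  std::printf("%f %f\n",
              quantize_affine(x, scale, zero_point),  // (3.0 / 0.5) + 2.5 = 8.5 -> 8
              quantize_with_bias(x, scale, bias));    // (3.0 + 1.25) / 0.5 = 8.5 -> 8
}
```

Both calls print 8, since the substitution makes the two expressions algebraically identical before rounding.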
Yes, thanks, that matches what I assumed from reading the code. I think that works well, but maybe we can have the callers use "bias" and not "zero_point" as the argument name then, to prevent confusion? i.e.

```cpp
quantize_val_float_qparams<qint8>(float scale, float bias, float value);
// instead of
quantize_val_float_qparams<qint8>(float scale, float zero_point, float value);
```
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
aten/src/ATen/quantized/Quantizer.h (outdated)
```cpp
 * kPerChannelAffine.
 *
 * The quantize equation in this case looks like -
 * Xq = (Xf - zero_point) * inv_scale, where inv_scale = 1.0/scale
```
Hmm, so just to confirm: this is still not the same zero_point, conceptually, that we use elsewhere? I feel like people might get confused by this. What would you think of naming it bias (as I think it was earlier in this PR), or zero_point_caffe2, etc.?
I think this is not a problem; we already have different quantize functions for each quantizer, e.g. PerChannelAffineQuantizer.
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
```diff
 template <typename T>
-void checkZeroPoint(const std::string& fn_name, int64_t zero_point) {
+void checkZeroPoint(const std::string& fn_name, T zero_point) {
```
I think T here is for things like int8, uint8, etc.? And the zero_point is always going to be int64_t as of now.
Yeah we don't need to check zero_point when it is float.
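A minimal sketch of how the templated check could skip range validation for floating-point zero points (the bounds, helper structure, and error message here are illustrative assumptions, not the PR's exact code):

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <type_traits>

// Integer zero points must fit in a representable range; float zero
// points (the new per-row case) need no such range check.
template <typename T>
void checkZeroPoint(const std::string& fn_name, T zero_point) {
  if constexpr (std::is_integral_v<T>) {
    if (zero_point > std::numeric_limits<std::int32_t>::max() ||
        zero_point < std::numeric_limits<std::int32_t>::min()) {
      throw std::runtime_error(fn_name + ": zero_point is out of range");
    }
  }
  // Floating-point zero_point: nothing to validate here.
}
```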
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
jerryzh168 left a comment:
LGTM
```cpp
checkRoundingMode(fn_name);
checkFloatTensor(fn_name, rtensor);
checkCPUTensor(fn_name, rtensor);
checkSameDevice(fn_name, rtensor, qtensor);
checkSameSize(fn_name, qtensor, rtensor);
```
these probably should be put into one function
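Something along these lines, perhaps; a sketch of folding the calls above into a single helper (the function name is made up here, and it assumes the helper sits next to the existing check functions in the same file):

```cpp
// Bundles the per-call preconditions for quantizing rtensor into qtensor,
// so each quantize path can invoke a single check.
void checkQuantizationInputs(
    const std::string& fn_name,
    const at::Tensor& rtensor,
    const at::Tensor& qtensor) {
  checkRoundingMode(fn_name);
  checkFloatTensor(fn_name, rtensor);
  checkCPUTensor(fn_name, rtensor);
  checkSameDevice(fn_name, rtensor, qtensor);
  checkSameSize(fn_name, qtensor, rtensor);
}
```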
…d zero_point" Summary: Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is Xq = (Xf - bias) * inv_scale, where bias is float zero_point value We start with per-row implementation and can extend to per-tensor in the future, if necessary Test Plan: python test/test_quantization.py TestQuantizedTensor Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22960142](https://our.internmc.facebook.com/intern/diff/D22960142) [ghstack-poisoned]
|
This pull request has been merged in 6f84468.
Stack from ghstack:

Summary:
Add a new Quantizer that supports an input zero point (bias) that can be float. The quantization equation in this case is

Xq = (Xf - bias) * inv_scale, where bias is the float zero_point value.

We start with a per-row implementation and can extend to per-tensor in the future, if necessary.

Test Plan:
python test/test_quantization.py TestQuantizedTensor
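For illustration, a self-contained sketch of the per-row equation above in plain C++ (hypothetical function and names; the actual implementation is the ATen quantizer added by this PR, which dispatches over quantized dtypes):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-row quantization with float scale and float zero point (bias):
// for each row r, Xq = (Xf - bias[r]) * inv_scale[r], rounded and clamped.
std::vector<std::uint8_t> quantize_per_row(
    const std::vector<float>& x, std::size_t rows, std::size_t cols,
    const std::vector<float>& scales, const std::vector<float>& biases) {
  std::vector<std::uint8_t> q(x.size());
  for (std::size_t r = 0; r < rows; ++r) {
    const float inv_scale = 1.0f / scales[r];
    for (std::size_t c = 0; c < cols; ++c) {
      float v = std::nearbyint((x[r * cols + c] - biases[r]) * inv_scale);
      // Clamp to the quantized type's representable range (uint8 here).
      v = std::fmin(std::fmax(v, 0.0f), 255.0f);
      q[r * cols + c] = static_cast<std::uint8_t>(v);
    }
  }
  return q;
}
```

Each row carries its own float scale and float bias, which is what distinguishes this quantizer from the existing integer-zero-point per-channel path.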
Differential Revision: D22960142