Conversation

supriyar (Contributor) commented Aug 5, 2020

Stack from ghstack:

Summary:
Add a new Quantizer that supports an input zero point (bias) that can be float.
The quantization equation in this case is

Xq = (Xf - bias) * inv_scale, where bias is the float zero_point value.
We start with a per-row implementation and can extend it to per-tensor in the future, if necessary.
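As a standalone illustration of the per-row scheme (a sketch only; quantize_per_row and its parameter names are made up here and are not the kernel added in this PR), each row gets its own float scale and bias:

#include <algorithm>
#include <cmath>
#include <cstdint>

// Sketch: apply Xq = (Xf - bias) * inv_scale row by row, clamping to uint8.
// Row r uses its own scales[r] and biases[r] (the float zero_points).
void quantize_per_row(const float* xf, uint8_t* xq, int64_t rows, int64_t cols,
                      const float* scales, const float* biases) {
  for (int64_t r = 0; r < rows; ++r) {
    const float inv_scale = 1.0f / scales[r];
    for (int64_t c = 0; c < cols; ++c) {
      float q = std::nearbyint((xf[r * cols + c] - biases[r]) * inv_scale);
      xq[r * cols + c] = static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, q)));
    }
  }
}

Dequantization is the inverse, again per row: Xf ≈ Xq * scale + bias.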

Test Plan:
python test/test_quantization.py TestQuantizedTensor

Differential Revision: D22960142

supriyar added a commit that referenced this pull request Aug 5, 2020
ghstack-source-id: 2c651ce
Pull Request resolved: #42612
supriyar (Contributor, Author) commented Aug 5, 2020

Note to reviewers: This change touches quite a few files, but the relevant ones are:
Quantizer.h/.cpp - Adds a new type of quantizer - PerRowFloatQParamsQuantizer
QScheme.h - Defines a new qscheme corresponding to this quantizer
QuantizedOpKernels - Kernel for the new quantize/dequantize functions for this quantizer
affine_quantizer.cpp - quantize_val_float_qparams, which implements the quantize_val equation for this case

The rest of the files add the necessary code to hook up the new quantizer to the top-level API and add print support.
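For orientation, here is a rough sketch of what a per-value helper along the lines of quantize_val_float_qparams could look like; this is an assumption for illustration, not the exact code in affine_quantizer.cpp:

#include <algorithm>
#include <cmath>
#include <limits>

// Sketch only: rounds (value - zero_point) / scale and clamps to T's range.
// The real quantize_val_float_qparams may differ in signature and rounding/clamping details.
template <typename T>
T quantize_val_float_qparams_sketch(float scale, float zero_point, float value) {
  const float inv_scale = 1.0f / scale;
  long q = std::lrintf((value - zero_point) * inv_scale);
  q = std::max<long>(q, std::numeric_limits<T>::min());
  q = std::min<long>(q, std::numeric_limits<T>::max());
  return static_cast<T>(q);
}

For example, quantize_val_float_qparams_sketch<uint8_t>(0.5f, 1.0f, 3.0f) gives round((3.0 - 1.0) / 0.5) = 4.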

dr-ci bot commented Aug 5, 2020

💊 CI failures summary and remediations

As of commit 57c2d76 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



supriyar added a commit that referenced this pull request Aug 5, 2020
…nt scale and zero_point

ghstack-source-id: 82ceb39
Pull Request resolved: #42612
auto zero_points_data = zero_points.data_ptr<float>();
const float* rdata = rtensor.data_ptr<float>();
auto qdata = qtensor.data_ptr<scalar_t>();
for (auto b = 0; b < batches; ++b) {
Contributor:

I'm guessing parallelizing these is saved for a future PR?
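If that is done later, one possible shape (a sketch only, assuming the per-batch iterations are independent) is to wrap the outer loop in at::parallel_for from ATen/Parallel.h:

#include <ATen/Parallel.h>

// Sketch: split the batch range across threads; the body would be the same
// per-batch quantization loop as in the snippet above. Grain size 1 is a placeholder.
at::parallel_for(0, batches, /*grain_size=*/1, [&](int64_t begin, int64_t end) {
  for (int64_t b = begin; b < end; ++b) {
    // ... per-batch quantization as above ...
  }
});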


/*
* Quantize value based on the following equation
* Qx = (Xf - bias) * inv_scale
Contributor:

Just to clarify, does bias = -1 * zero_point * scale (to get Qx = round(Xf/scale - zero_point))?

If yes, maybe we can make the API clearer by having the call sites also use the "bias" name?

supriyar (Contributor, Author):

I think this was a little confusing: I was trying to match the C2 equation (Xf - bias) * inv_scale in numerics by setting zero_point = bias.
I had a chat with Raghu about this and we thought it would be better to use zero_point = (-bias/scale) for this case, so that we can use a quantize equation similar to what we have now.
So the current equation: Xq = Xf * inv_scale + zp
Substituting zero_point = -bias/scale:
Xq = Xf * inv_scale + (-bias * inv_scale)
Xq = (Xf - bias) * inv_scale, which is the same as what Caffe2 uses for embedding layers today. There may be some numerical differences due to the division rounding.
I'll update the PR with the change for this.
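As a concrete check with illustrative numbers: with scale = 0.5 and bias = 1.0, zero_point = -bias/scale = -2, so Xq = Xf * inv_scale + zero_point = 2 * Xf - 2 = (Xf - 1.0) * 2, which matches (Xf - bias) * inv_scale.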

Contributor:

Yes, thanks, that matches what I assumed from reading the code. I think that works well, but maybe we can have the callers use "bias" and not "zero_point" as the argument name then, to prevent confusion? i.e.

quantize_val_float_qparams<qint8>(float scale, float bias, float value);
// instead of
quantize_val_float_qparams<qint8>(float scale, float zero_point, float value);

* kPerChannelAffine.
*
* The quantize equation in this case looks like -
* Xq = (Xf - zero_point) * inv_scale, where inv_scale = 1.0/scale
Contributor:

Hmm, so just to confirm, this is still not the same zero_point conceptually that we use elsewhere? I feel like people might get confused by this. What would you think of naming it bias (as I think it was earlier in this PR), or zero_point_caffe2, etc.?

Contributor:

I think this is not a problem; we have different quantize functions, as with PerChannelAffineQuantizer.


template <typename T>
- void checkZeroPoint(const std::string& fn_name, int64_t zero_point) {
+ void checkZeroPoint(const std::string& fn_name, T zero_point) {
jerryzh168 (Contributor) commented Aug 11, 2020:

I think T here is for things like int8, uint8, etc.? And the zero_point is always going to be int64_t as of now.

supriyar (Contributor, Author):

Yeah we don't need to check zero_point when it is float.
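A minimal sketch of that idea (assuming an if constexpr-style dispatch; the actual templated checkZeroPoint may be written differently):

#include <string>
#include <type_traits>

// Sketch only: skip the range check entirely when the zero_point is a float qparam.
template <typename T>
void checkZeroPointSketch(const std::string& fn_name, T zero_point) {
  if constexpr (std::is_floating_point<T>::value) {
    return; // float qparams: nothing to range-check
  }
  // integral zero_points would be range-checked against the quantized dtype here
  (void)fn_name;
  (void)zero_point;
}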

supriyar requested review from jerryzh168 and vkuzo on August 12, 2020 17:48
jerryzh168 (Contributor) left a comment:

LGTM

Comment on lines 160 to 164
checkRoundingMode(fn_name);
checkFloatTensor(fn_name, rtensor);
checkCPUTensor(fn_name, rtensor);
checkSameDevice(fn_name, rtensor, qtensor);
checkSameSize(fn_name, qtensor, rtensor);
Contributor:

these probably should be put into one function
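One possible shape for that, sketched here with a made-up helper name (checkQuantizationInputs is not part of this PR):

// Sketch only: fold the repeated preconditions into one helper so each kernel
// entry point makes a single call.
void checkQuantizationInputs(const std::string& fn_name, const Tensor& rtensor, const Tensor& qtensor) {
  checkRoundingMode(fn_name);
  checkFloatTensor(fn_name, rtensor);
  checkCPUTensor(fn_name, rtensor);
  checkSameDevice(fn_name, rtensor, qtensor);
  checkSameSize(fn_name, qtensor, rtensor);
}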

facebook-github-bot (Contributor):
This pull request has been merged in 6f84468.

MauiDesign pushed a commit to MauiDesign/PyTorchPyTorch that referenced this pull request Aug 16, 2020
…nt scale and zero_point

ghstack-source-id: 50b376d
Pull Request resolved: pytorch/pytorch#42612