[quant] Add optimized approach to calculate qparams for qembedding_bag #45149
Conversation
Summary: `choose_qparams_optimized` calculates the optimized qparams. It uses a greedy approach to nudge the min and max of the input range, and minimizes the quantization error `torch.norm(x - fake_quant(x, s, z))`. [ghstack-poisoned]
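The greedy search described in the summary can be sketched as follows. This is a minimal illustration, not the actual ATen implementation: the function name, `n_bins`, and `ratio` parameters are assumptions chosen for the sketch, and the loop simply shrinks whichever boundary reduces the L2 quantization error most.

```python
import torch

def choose_qparams_greedy(x, n_bins=200, ratio=0.16, bit_width=8):
    """Greedy sketch: nudge min/max inward, keep the range that
    minimizes torch.norm(x - fake_quant(x, s, z))."""
    n_steps = 2 ** bit_width - 1

    def l2_error(lo, hi):
        # Affine fake-quantization with scale s and offset lo.
        scale = max((hi - lo) / n_steps, 1e-8)
        q = torch.clamp(torch.round((x - lo) / scale), 0, n_steps)
        x_dq = q * scale + lo
        return torch.norm(x - x_dq).item(), scale, lo

    lo, hi = x.min().item(), x.max().item()
    best_err, best_scale, best_zp = l2_error(lo, hi)
    step = (hi - lo) / n_bins
    for _ in range(int(n_bins * ratio)):
        # Try nudging each boundary inward by one step.
        err_lo = l2_error(lo + step, hi)
        err_hi = l2_error(lo, hi - step)
        if err_lo[0] <= err_hi[0]:
            cand, new_range = err_lo, (lo + step, hi)
        else:
            cand, new_range = err_hi, (lo, hi - step)
        if cand[0] >= best_err:
            break  # no further improvement; stop nudging
        best_err, best_scale, best_zp = cand
        lo, hi = new_range
    return best_scale, best_zp
```

Because the search starts from the plain min/max range and only accepts strict improvements, its result is never worse than naive min/max qparams.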
💊 CI status (Dr. CI, automated): as of commit b91bd0d, there are no failures. This comment has been revised 34 times.
'std::tuple<Tensor,Tensor,Tensor,Tensor,int64_t>',
'std::tuple<Tensor,Tensor,double,Tensor,int64_t>',
'std::tuple<double,int64_t>',
'std::tuple<double,double>',
Is this related to your change?
Yes, this adds a new combination of return types that is not currently supported.
// and packs the float weight tensor. In the next step it will be replaced by a
// quantize and pack function once we support FP scale and FP zero_point
- Tensor qembeddingbag_byte_prepack(const Tensor& weight) {
+ Tensor qembeddingbag_byte_prepack(const Tensor& weight, bool optimized_qparams) {
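For context, byte prepack performs row-wise 8-bit quantization and appends each row's fp32 scale and bias to the packed bytes. A hedged Python sketch of that layout (the function name and details here are illustrative, not the ATen/fbgemm code):

```python
import numpy as np

def byte_rowwise_prepack(weight):
    """Row-wise 8-bit quantization sketch: per row,
    scale = (max - min) / 255 and bias = min; the quantized bytes
    are followed by the fp32 scale and fp32 bias (8 extra bytes)."""
    rows, cols = weight.shape
    packed = np.empty((rows, cols + 8), dtype=np.uint8)
    for i, row in enumerate(weight.astype(np.float32)):
        lo, hi = float(row.min()), float(row.max())
        scale = (hi - lo) / 255.0 or 1.0  # avoid divide-by-zero rows
        q = np.clip(np.round((row - lo) / scale), 0, 255).astype(np.uint8)
        packed[i, :cols] = q
        # Append scale and bias as raw little-endian fp32 bytes.
        packed[i, cols:cols + 4] = np.frombuffer(
            np.float32(scale).tobytes(), dtype=np.uint8)
        packed[i, cols + 4:] = np.frombuffer(
            np.float32(lo).tobytes(), dtype=np.uint8)
    return packed
```

Dequantizing a row as `q * scale + bias` then reconstructs the original values to within one quantization step.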
This function is not doing the optimized_qparams calculation; can you please add that?
In caffe2 we currently don't have optimized_qparam calculation for byte_prepack. Do we need it in PT?
m.def("embedding_bag_prepack(Tensor weight) -> __torch__.torch.classes.quantized.EmbeddingPackedParamsBase W_prepack");
m.def("embedding_bag_unpack(__torch__.torch.classes.quantized.EmbeddingPackedParamsBase W_prepack) -> Tensor W_origin");
- m.def("embedding_bag_byte_prepack(Tensor weight) -> Tensor");
+ m.def("embedding_bag_byte_prepack(Tensor weight, bool optimized_qparams=False) -> Tensor");
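Once registered, the op is callable from Python through `torch.ops.quantized`. A small usage sketch (with `optimized_qparams` left at its default of `False`, as in the schema above; the exact packed shape assumes the row-wise uint8 format with 8 trailing bytes of fp32 scale and bias per row):

```python
import torch

weight = torch.randn(10, 16, dtype=torch.float32)
packed = torch.ops.quantized.embedding_bag_byte_prepack(weight)
# Each row: 16 uint8 values plus fp32 scale and bias appended,
# so the packed tensor is expected to be uint8 of shape (10, 24).
print(packed.shape, packed.dtype)
```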
When we enable this at the Python level, how does the user control which observer is used for embeddings? That is, should we change the default for this op to optimized_qparams=True?
We can have a separate observer that calls the aten op to calculate optimized qparams.
I'm not sure if setting this to true will have any implications on PyPer perf.
This pull request has been merged in 60665ac.
Stack from ghstack:

Summary:
`choose_qparams_optimized` calculates the optimized qparams. It uses a greedy approach to nudge the min and max of the input range, and minimizes the quantization error `torch.norm(x - fake_quant(x, s, z))`.

Differential Revision: D23848060