[quant] Add optimized approach to calculate qparams for qembedding_bag #45149
Conversation
Summary: `choose_qparams_optimized` calculates the optimized qparams. It uses a greedy approach to nudge the min and max of the input range, and minimizes the quantization error `torch.norm(x - fake_quant(x, s, z))`. [ghstack-poisoned]
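The greedy search described in the summary can be sketched as follows. This is a minimal illustration, not the actual ATen implementation: the function name, `n_bins`, and `ratio` parameters are assumptions chosen for the sketch, and the loop simply shrinks whichever boundary reduces the L2 quantization error most.

```python
import torch

def choose_qparams_greedy(x, n_bins=200, ratio=0.16, bit_width=8):
    """Greedy sketch: nudge min/max inward, keep the range that
    minimizes torch.norm(x - fake_quant(x, s, z))."""
    n_steps = 2 ** bit_width - 1

    def l2_error(lo, hi):
        # Affine fake-quantization with scale s and offset lo.
        scale = max((hi - lo) / n_steps, 1e-8)
        q = torch.clamp(torch.round((x - lo) / scale), 0, n_steps)
        x_dq = q * scale + lo
        return torch.norm(x - x_dq).item(), scale, lo

    lo, hi = x.min().item(), x.max().item()
    best_err, best_scale, best_zp = l2_error(lo, hi)
    step = (hi - lo) / n_bins
    for _ in range(int(n_bins * ratio)):
        # Try nudging each boundary inward by one step.
        err_lo = l2_error(lo + step, hi)
        err_hi = l2_error(lo, hi - step)
        if err_lo[0] <= err_hi[0]:
            cand, new_range = err_lo, (lo + step, hi)
        else:
            cand, new_range = err_hi, (lo, hi - step)
        if cand[0] >= best_err:
            break  # no further improvement; stop nudging
        best_err, best_scale, best_zp = cand
        lo, hi = new_range
    return best_scale, best_zp
```

Because the search starts from the plain min/max range and only accepts strict improvements, its result is never worse than naive min/max qparams.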
💊 CI status (Dr. CI, automated): as of commit b91bd0d, there are no failures. This comment has been revised 34 times.
'std::tuple<Tensor,Tensor,Tensor,Tensor,int64_t>',
'std::tuple<Tensor,Tensor,double,Tensor,int64_t>',
'std::tuple<double,int64_t>',
'std::tuple<double,double>',
Is this related to your change?
Yes, this adds a new combination of return types that is not currently supported.
// and packs the float weight tensor. In the next step it will be replaced by a
// quantize and pack function once we support FP scale and FP zero_point
- Tensor qembeddingbag_byte_prepack(const Tensor& weight) {
+ Tensor qembeddingbag_byte_prepack(const Tensor& weight, bool optimized_qparams) {
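For context, byte prepack performs row-wise 8-bit quantization and appends each row's fp32 scale and bias to the packed bytes. A hedged Python sketch of that layout (the function name and details here are illustrative, not the ATen/fbgemm code):

```python
import numpy as np

def byte_rowwise_prepack(weight):
    """Row-wise 8-bit quantization sketch: per row,
    scale = (max - min) / 255 and bias = min; the quantized bytes
    are followed by the fp32 scale and fp32 bias (8 extra bytes)."""
    rows, cols = weight.shape
    packed = np.empty((rows, cols + 8), dtype=np.uint8)
    for i, row in enumerate(weight.astype(np.float32)):
        lo, hi = float(row.min()), float(row.max())
        scale = (hi - lo) / 255.0 or 1.0  # avoid divide-by-zero rows
        q = np.clip(np.round((row - lo) / scale), 0, 255).astype(np.uint8)
        packed[i, :cols] = q
        # Append scale and bias as raw little-endian fp32 bytes.
        packed[i, cols:cols + 4] = np.frombuffer(
            np.float32(scale).tobytes(), dtype=np.uint8)
        packed[i, cols + 4:] = np.frombuffer(
            np.float32(lo).tobytes(), dtype=np.uint8)
    return packed
```

Dequantizing a row as `q * scale + bias` then reconstructs the original values to within one quantization step.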
This function is not doing the optimized_qparams calculation; can you please add that?
In caffe2 we currently don't have optimized_qparam calculation for byte_prepack. Do we need it in PT?
m.def("embedding_bag_prepack(Tensor weight) -> __torch__.torch.classes.quantized.EmbeddingPackedParamsBase W_prepack");
m.def("embedding_bag_unpack(__torch__.torch.classes.quantized.EmbeddingPackedParamsBase W_prepack) -> Tensor W_origin");
- m.def("embedding_bag_byte_prepack(Tensor weight) -> Tensor");
+ m.def("embedding_bag_byte_prepack(Tensor weight, bool optimized_qparams=False) -> Tensor");
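Once registered, the op is callable from Python through `torch.ops.quantized`. A small usage sketch (with `optimized_qparams` left at its default of `False`, as in the schema above; the exact packed shape assumes the row-wise uint8 format with 8 trailing bytes of fp32 scale and bias per row):

```python
import torch

weight = torch.randn(10, 16, dtype=torch.float32)
packed = torch.ops.quantized.embedding_bag_byte_prepack(weight)
# Each row: 16 uint8 values plus fp32 scale and bias appended,
# so the packed tensor is expected to be uint8 of shape (10, 24).
print(packed.shape, packed.dtype)
```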
When we enable this at the Python level, how does the user control which observer is used for embeddings? That is, should we change the default for this op to optimized_qparams=True?
We can have a separate observer that calls the aten op to calculate optimized qparams.
I'm not sure if setting this to true will have any implications on PyPer perf.
This pull request has been merged in 60665ac.
Stack from ghstack:

Summary:
`choose_qparams_optimized` calculates the optimized qparams. It uses a greedy approach to nudge the min and max of the input range, and minimizes the quantization error `torch.norm(x - fake_quant(x, s, z))`.

Differential Revision: D23848060