[quant] Add quantized Embedding module #44208
Conversation
Summary: Add a quantized Embedding module in the static quantization namespace. Embedding quantization requires only the weights to be quantized, so it is static. Internally the module calls the embedding_bag_byte op with the offsets set to correspond to the indices. A future PR will move EmbeddingBag quantization from dynamic to static as well.

Test Plan: python test/test_quantization.py test_embedding_api
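To make the offsets trick concrete, here is a minimal float-only sketch (not the PR's code) showing that an Embedding lookup is equivalent to an EmbeddingBag call in which every bag holds exactly one index, i.e. the offsets are simply arange(len(indices)); the quantized module applies the same idea on top of the byte-quantized weight:

```python
# Float-only illustration of "offsets set corresponding to the indices":
# each bag contains exactly one index, so EmbeddingBag reduces to Embedding.
import torch
import torch.nn.functional as F

weight = torch.randn(10, 4)              # 10 embeddings of dimension 4
indices = torch.tensor([1, 3, 7])
offsets = torch.arange(indices.numel())  # one bag per index

via_embedding = F.embedding(indices, weight)
via_embedding_bag = F.embedding_bag(indices, weight, offsets, mode='sum')
assert torch.allclose(via_embedding, via_embedding_bag)
```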
💊 CI failures summary: as of commit b4a14f7, 1 ci.pytorch.org job failed (see the Dr. CI page for details).
```python
scales = torch.ones(num_embeddings, dtype=torch.float)
zero_points = torch.ones(num_embeddings, dtype=torch.float)
```
Seems like this is the same code as in packed params; perhaps we can do it only once?
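A hypothetical way to act on that suggestion (the helper name and placement are made up, not part of the PR) would be to keep the default qparams initialization in one shared function that both the module and the packed-params class call:

```python
import torch

# Hypothetical shared helper; illustrative only, mirrors the diff above.
def _default_embedding_qparams(num_embeddings):
    scales = torch.ones(num_embeddings, dtype=torch.float)
    zero_points = torch.ones(num_embeddings, dtype=torch.float)
    return scales, zero_points
```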
```python
def set_weight(self, weight):
    # type: (torch.Tensor) -> None
    if self.dtype == torch.quint8:
        self._packed_weight = torch.ops.quantized.embedding_bag_prepack(weight)
```
Do we support per tensor quantization for packed params?
Not at the moment; we only have per-row quantization support with float qparams.
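For reference, a rough numeric sketch of what per-row quantization with float qparams means (my own illustration of the byte row-wise scheme, not the kernel's exact math): each row of the weight gets its own float scale and float zero point.

```python
import torch

weight = torch.randn(10, 4)
w_min = weight.min(dim=1, keepdim=True).values
w_max = weight.max(dim=1, keepdim=True).values
scales = ((w_max - w_min) / 255.0).clamp(min=1e-8)  # one float scale per row
zero_points = w_min                                 # one float zero point per row

q_rows = torch.clamp(((weight - zero_points) / scales).round(), 0, 255).to(torch.uint8)
dq_rows = q_rows.float() * scales + zero_points     # per-row dequantization
print((weight - dq_rows).abs().max())               # small reconstruction error
```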
```python
# |--- _packed_weight : Tensor representing weight of EmbeddingPackedParamsBase
# |--- dtype : torch.dtype

def _save_to_state_dict(self, destination, prefix, keep_vars):
```
Should we also have a field for bitwidth?
We can use the tensor dtype to determine the bitwidth, right? Currently it only supports 8-bit, but once we add 4-bit qtensors the bitwidth should be encoded in the dtype.
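A tiny sketch of that idea (the 4-bit entry is hypothetical until such a dtype exists):

```python
import torch

# Derive the bitwidth from the tensor dtype instead of storing a separate field.
DTYPE_TO_BITWIDTH = {
    torch.quint8: 8,
    # a future 4-bit qtensor dtype would map to 4 here
}

def bitwidth_from_dtype(dtype):
    return DTYPE_TO_BITWIDTH[dtype]

print(bitwidth_from_dtype(torch.quint8))  # 8
```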
```python
super(Embedding, self).__init__()
self.num_embeddings = num_embeddings
self.embedding_dim = embedding_dim
self.sparse = sparse
```
For my understanding: what does self.sparse do?
Currently it doesn't do anything for the quantized module, so I'll remove it from here.
In the float module it enables sparse gradients for the weight tensor (see the example below).
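For context, a small example of the float-module behavior being referred to (standard nn.Embedding usage):

```python
import torch
import torch.nn as nn

# With sparse=True the gradient w.r.t. the weight is a sparse tensor:
# only the rows that were actually looked up carry gradient values.
emb = nn.Embedding(10, 4, sparse=True)
emb(torch.tensor([1, 3, 7])).sum().backward()
print(emb.weight.grad.is_sparse)  # True
```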
Codecov Report

```
@@            Coverage Diff              @@
##   gh/supriyar/174/base   #44208   +/- ##
===========================================
+ Coverage        69.24%   69.31%   +0.06%
===========================================
  Files              381      382       +1
  Lines            47573    47714     +141
===========================================
+ Hits             32943    33072     +129
- Misses           14630    14642      +12
```

Continue to review the full report at Codecov.
This pull request has been merged in 57b87aa.
Stack from ghstack:
Summary:
Add a quantized Embedding module in the static quantization namespace. Embedding
quantization requires only the weights to be quantized, so it is static.
Internally the module calls the embedding_bag_byte op with the offsets set to correspond to the
indices.
A future PR will move EmbeddingBag quantization from dynamic to static as well.
Test Plan:
python test/test_quantization.py test_embedding_api
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D23547384
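Finally, a hedged usage sketch of the module this PR adds (the constructor is assumed to mirror nn.Embedding's first two arguments, as the diff snippet above suggests, and the output is assumed to be a dequantized float tensor; this is an illustration, not verified API):

```python
import torch

# Assumed signature: torch.nn.quantized.Embedding(num_embeddings, embedding_dim).
# Only the weight is quantized (weight-only / "static"); indices are plain int64.
q_emb = torch.nn.quantized.Embedding(10, 4)

indices = torch.tensor([1, 3, 7])
out = q_emb(indices)   # lookup of byte-quantized rows, returned as float
print(out.shape)       # expected: torch.Size([3, 4])
```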