[quant] Add embeddingbag_prepack function that works on quantized tensor. #42762
Conversation
💊 CI failures summary and remediations: as of commit bc1d0f9 (more details on the Dr. CI page):
ci.pytorch.org: 1 failed
This comment was automatically generated by Dr. CI.
// TODO: Extend this to support 4-bits once 4-bit qtensor support is added.
Tensor qembeddingbag_prepack(at::Tensor qweight) {
  Tensor weight_contig = qweight.contiguous(qweight.suggest_memory_format());
  const uint8_t* weight_data =
do we want to check the dtype here?
Sure, I can add a check. We don't check dtype in many ops, though, probably because data_ptr would error out for an incorrect type.
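For context, here is a minimal sketch of the kind of dtype check being discussed, assuming it would sit at the top of `qembeddingbag_prepack`; the message text and exact placement are illustrative, not the merged code:

```cpp
// Hypothetical guard: reject inputs that are not 8-bit quantized, since only
// quint8 packing is implemented at this point.
TORCH_CHECK(
    qweight.scalar_type() == c10::kQUInt8,
    "qembeddingbag_prepack currently expects quint8 weights, got ",
    qweight.scalar_type());
```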
jerryzh168 left a comment:
Looks good
This pull request has been merged in 7632a9b.
Stack from ghstack:
Summary:
Add a prepack function that accepts a quantized tensor (qtensor) as input and returns a byte tensor containing the packed data.
This is currently implemented only for 8-bit quantization; once 4-bit qtensor support is added, the function will be extended to cover 4-bit as well.
Note: a follow-up change will add TorchBind support so that packed weights can be serialized.
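To make "byte tensor with packed data" more concrete, below is a small, self-contained sketch of one plausible rowwise layout for the 8-bit case: each row's uint8 values are followed by that row's scale and zero point stored as floats. The function name and exact layout are assumptions for illustration, not necessarily the layout produced by the merged kernel.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative rowwise packing for 8-bit quantized embedding weights:
// each output row holds num_cols uint8 values followed by the row's
// float scale and float zero point, so each packed row is
// num_cols + 2 * sizeof(float) bytes wide.
std::vector<uint8_t> pack_rowwise_8bit(
    const std::vector<std::vector<uint8_t>>& rows,
    const std::vector<float>& scales,
    const std::vector<float>& zero_points) {
  const size_t num_cols = rows.empty() ? 0 : rows[0].size();
  const size_t out_row_bytes = num_cols + 2 * sizeof(float);
  std::vector<uint8_t> packed(rows.size() * out_row_bytes);
  for (size_t r = 0; r < rows.size(); ++r) {
    uint8_t* out = packed.data() + r * out_row_bytes;
    std::memcpy(out, rows[r].data(), num_cols);                // quantized values
    std::memcpy(out + num_cols, &scales[r], sizeof(float));    // per-row scale
    std::memcpy(out + num_cols + sizeof(float), &zero_points[r], sizeof(float));  // per-row zero point
  }
  return packed;
}
```

A 4-bit variant would follow the same idea but pack two values per byte, which is why the TODO in the kernel leaves room for extension.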
Test Plan:
python test/test_quantization.py TestQuantizedEmbeddingBag
Differential Revision: D23070632