[quant][pyper] Add embedding_bag weight quantize and dequantize ops #41293
Conversation
Stack from ghstack:

Summary:
Add new operators that perform quantization and packing for the 8-bit and 4-bit embedding bag operators.
This is an initial change to help unblock testing. It will be followed by graph mode passes to enable quantization of the embedding_bag module.
Note to reviewers: future PRs will replace this op with separate quantize and pack operators and add support for floating point scale and zero point.

Test Plan:
python test/test_quantization.py TestQuantizedEmbeddingBag

Differential Revision: D22506700
💊 CI failures summary and remediations
As of commit d7b399d (more details on the Dr. CI page):
ci.pytorch.org: 1 failed
This comment was automatically generated by Dr. CI.
…ntize ops" Summary: Add new operators that does quantize and packing for 8 bit and 4 bit embedding bag operators. This is an initial change to help unblock testing. This will be follwed by adding graph mode passes to enable quantization of embedding_bag module Note to reviewers: Future PRs will replace this op with a separate quantize and pack operator and add support for floating point scale and zero point. Test Plan: python test/test_quantization.py TestQuantizedEmbeddingBag Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
Summary: Add new operators that does quantize and packing for 8 bit and 4 bit embedding bag operators. This is an initial change to help unblock testing. This will be follwed by adding graph mode passes to enable quantization of embedding_bag module Note to reviewers: Future PRs will replace this op with a separate quantize and pack operator and add support for floating point scale and zero point. Test Plan: python test/test_quantization.py TestQuantizedEmbeddingBag Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 782b7da Pull Request resolved: #41293
```cpp
constexpr int NUM_ELEM_PER_BYTE = 8 / BIT_RATE;
TORCH_CHECK(
    weight_contig.size(weight.dim() - 1) % NUM_ELEM_PER_BYTE == 0,
    "FloatToFused4BitRowwiseQuantizedOp only works for the number of "
```
I think this is a nit.
You mean this check isn't required?
The error message says "FloatToFused4BitRowwiseQuantizedOp". :) It should be "qembeddingbag_4bit_prepack only works for the number of columns a multiple of 2".
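For context, the constraint exists because at BIT_RATE == 4 each output byte holds NUM_ELEM_PER_BYTE == 2 quantized values, so a row only packs cleanly when its column count is even. A minimal sketch of that packing (illustrative only, not the PR's code; the nibble order is an assumption):

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Pack a row of already-quantized 4-bit values (one per byte, 0..15)
// into bytes, two values per byte.
std::vector<uint8_t> pack_4bit_row(const std::vector<uint8_t>& quantized) {
  constexpr int BIT_RATE = 4;
  constexpr int NUM_ELEM_PER_BYTE = 8 / BIT_RATE;  // == 2
  // Mirrors the TORCH_CHECK above: the column count must be a
  // multiple of NUM_ELEM_PER_BYTE, i.e. even.
  if (quantized.size() % NUM_ELEM_PER_BYTE != 0) {
    throw std::invalid_argument("number of columns must be a multiple of 2");
  }
  std::vector<uint8_t> packed(quantized.size() / NUM_ELEM_PER_BYTE);
  for (size_t i = 0; i < packed.size(); ++i) {
    // Even column in the low nibble, odd column in the high nibble
    // (this ordering is an assumption for illustration).
    packed[i] = (quantized[2 * i] & 0x0F) |
                ((quantized[2 * i + 1] & 0x0F) << 4);
  }
  return packed;
}
```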
…ntize ops" Summary: Add new operators that does quantize and packing for 8 bit and 4 bit embedding bag operators. This is an initial change to help unblock testing. This will be follwed by adding graph mode passes to enable quantization of embedding_bag module Note to reviewers: Future PRs will replace this op with a separate quantize and pack operator and add support for floating point scale and zero point. Test Plan: python test/test_quantization.py TestQuantizedEmbeddingBag Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22506700](https://our.internmc.facebook.com/intern/diff/D22506700) [ghstack-poisoned]
vkuzo left a comment:
lg, feel free to ignore the comments if this implementation will be replaced in the near future
```cpp
auto* output_data = output.data_ptr<uint8_t>();
const auto output_columns = output.size(output.dim() - 1);

for (int row = 0; row < embedding_rows; ++row) {
```
does performance matter, or is this a reference implementation? Could probably parallelize if needed (same for the other op)
This op is a temporary solution until we decouple quantize and packing. We can revisit optimizations then.
makes sense
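Since each row is written independently, the loop is trivially parallelizable. A minimal sketch of what that could look like with at::parallel_for (not the PR's code; the per-row body is assumed unchanged from the serial kernel):

```cpp
#include <ATen/Parallel.h>
#include <cstdint>

// Hypothetical parallel version of the row loop above; parameter
// names mirror the variables in the serial kernel.
void quantize_rows_parallel(
    const float* weight_data,
    uint8_t* output_data,
    int64_t embedding_rows,
    int64_t embedding_cols,
    int64_t output_columns) {
  at::parallel_for(
      0, embedding_rows, /*grain_size=*/1,
      [&](int64_t begin, int64_t end) {
        for (int64_t row = begin; row < end; ++row) {
          const float* input_row = weight_data + row * embedding_cols;
          uint8_t* output_row = output_data + row * output_columns;
          // ... per-row quantize + pack, as in the serial loop ...
        }
      });
}
```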
```cpp
const float* input_row = weight_data + row * embedding_cols;
std::uint8_t* output_row = output_data + row * output_columns;

at::Half* output_row_scale_zp = reinterpret_cast<at::Half*>(
```
optional readability nit: if this is packed at the end of a row, maybe we can move the code down to be below the weight packing, so the code structure follows the data format?
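To make the data format in that nit concrete: in the fused 8-bit rowwise layout, each output row is the quantized payload followed by an fp16 (scale, zero_point) pair at the end of the row. A minimal sketch of quantizing one row under that layout (the choice of scale = range/255 and zero point = row minimum follows the usual fused rowwise scheme and is an assumption here, not a quote of the PR's kernel):

```cpp
#include <ATen/ATen.h>
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantize one row: embedding_cols uint8 values, then scale and
// zero point stored as two at::Half values at the end of the row.
void quantize_row_8bit(
    const float* input_row,
    uint8_t* output_row,
    int64_t embedding_cols) {
  const float minimum =
      *std::min_element(input_row, input_row + embedding_cols);
  const float maximum =
      *std::max_element(input_row, input_row + embedding_cols);
  const float range = maximum - minimum;
  const float scale = range == 0.0f ? 1.0f : range / 255.0f;

  // The (scale, zero_point) pair lives after the payload, matching
  // the output_row_scale_zp pointer in the snippet above.
  at::Half* output_row_scale_zp =
      reinterpret_cast<at::Half*>(output_row + embedding_cols);
  output_row_scale_zp[0] = scale;
  output_row_scale_zp[1] = minimum;  // zero point as row minimum (assumed)

  for (int64_t col = 0; col < embedding_cols; ++col) {
    output_row[col] = static_cast<uint8_t>(
        std::lrintf((input_row[col] - minimum) / scale));
  }
}
```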
…ntize ops" Summary: Add new operators that does quantize and packing for 8 bit and 4 bit embedding bag operators. This is an initial change to help unblock testing. This will be follwed by adding graph mode passes to enable quantization of embedding_bag module Note to reviewers: Future PRs will replace this op with a separate quantize and pack operator and add support for floating point scale and zero point. Test Plan: python test/test_quantization.py TestQuantizedEmbeddingBag Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22506700](https://our.internmc.facebook.com/intern/diff/D22506700) [ghstack-poisoned]
Summary: Add new operators that does quantize and packing for 8 bit and 4 bit embedding bag operators. This is an initial change to help unblock testing. This will be follwed by adding graph mode passes to enable quantization of embedding_bag module Note to reviewers: Future PRs will replace this op with a separate quantize and pack operator and add support for floating point scale and zero point. Test Plan: python test/test_quantization.py TestQuantizedEmbeddingBag Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 80fec1a Pull Request resolved: #41293
…ntize ops" Summary: Add new operators that does quantize and packing for 8 bit and 4 bit embedding bag operators. This is an initial change to help unblock testing. This will be follwed by adding graph mode passes to enable quantization of embedding_bag module Note to reviewers: Future PRs will replace this op with a separate quantize and pack operator and add support for floating point scale and zero point. Test Plan: python test/test_quantization.py TestQuantizedEmbeddingBag Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22506700](https://our.internmc.facebook.com/intern/diff/D22506700) [ghstack-poisoned]
Summary: Add new operators that does quantize and packing for 8 bit and 4 bit embedding bag operators. This is an initial change to help unblock testing. This will be follwed by adding graph mode passes to enable quantization of embedding_bag module Note to reviewers: Future PRs will replace this op with a separate quantize and pack operator and add support for floating point scale and zero point. Test Plan: python test/test_quantization.py TestQuantizedEmbeddingBag Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: fe24a52 Pull Request resolved: #41293
This pull request has been merged in 008ab27.