[quant] Add Graph Mode Passes to quantize EmbeddingBag operators #41612
Conversation
Summary:

This change adds preliminary support to quantize the EmbeddingBag operators. We currently support 4-bit and 8-bit quantization+packing of the weights.

To quantize these operators, specify the operator name in the `custom_op_name` field of the NoopObserver. Based on the op name (4bit or 8bit) we call the corresponding quantization functions. Refer to the test plan for how to invoke the qconfig for the embedding_bag ops.

Future versions of this will support 4-bit and 2-bit qtensors with native support to observe and quantize them.

NB: This version assumes that the weights in the EmbeddingBag module reside on the same device.

Test Plan:

`python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag`
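The op-name-based selection described above can be sketched in plain Python. Everything below is illustrative: the function and the dispatch table are placeholders, not the PR's actual pass; `"embedding_bag_4bit"` matches the convention shown in the test plan, while the 8-bit name and the prepack op strings are assumptions.

```python
# Illustrative sketch of picking a quantization routine from a
# NoopObserver's custom_op_name. The dispatch table and target op
# strings are placeholders, not the PR's actual implementation.
def select_prepack_op(custom_op_name):
    dispatch = {
        "embedding_bag_4bit": "quantized::embedding_bag_4bit_prepack",
        "embedding_bag_byte": "quantized::embedding_bag_byte_prepack",
    }
    if custom_op_name not in dispatch:
        raise ValueError(f"unsupported embedding_bag op: {custom_op_name}")
    return dispatch[custom_op_name]

print(select_prepack_op("embedding_bag_4bit"))
```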
💊 CI failures summary and remediations, as of commit 3210bd9 (more details on the Dr. CI page):
Extra GitHub checks: 1 failed
vkuzo left a comment:
lgtm, accepting to unblock. Feel free to wait for @jerryzh168 if a deeper review on the JIT pass is needed.
```python
offsets = torch.tensor([0, 19, 20, 28, 28, 32])

from torch.quantization import QConfigDynamic, NoopObserver
int4_dynamic_qconfig = QConfigDynamic(activation=NoopObserver.with_args(custom_op_name="embedding_bag_4bit"),
```
makes sense that custom_op_name is a temporary solution. Do we have thoughts on what to replace it with eventually / why not now? Would it be EmbeddingBag{8|4|2}BitObserver / something else?
what's the longer term solution? are we planning to add torch.qint4, torch.qint2 etc.?
yes, long term this will be replaced with an observer that can support torch.qint4 and torch.qint2
```cpp
auto observer_module = module.attr(findObserverName(v).value()).toModule();
if (observer_module.hasattr("custom_op")) {
  auto op_name = observer_module.attr("custom_op").toStringRef();
  return isNoopObserver(observer) ? op_name : "";
```
follow up PR: since NoopObserver is special, probably better to add a "_" prefix to reserve this for internal use.
```cpp
}
// Insert prepack op
Node* prepack = g->create(Symbol::fromQualString(prepack_fn), prepack_inputs);
g->insertNode(prepack);
```
could you also add a WithInsertPoint? we want to insert before the use node I think.
should just be: `WithInsertPoint ins(embedding_bag_float_op);`
We've already added the insert point in insertQuantizationOps at the output of the observer node.
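The concern in this thread is ordering: the prepack node must be inserted before the node that consumes it. The toy sketch below illustrates that ordering concern with a plain Python node list; the names are placeholders and this is not the JIT's `WithInsertPoint` API.

```python
# Toy sketch of inserting a node before its consumer in a linear node
# list; names are placeholders, unrelated to the TorchScript graph API.
def insert_before(nodes, new_node, consumer):
    idx = nodes.index(consumer)  # position of the consuming node
    nodes.insert(idx, new_node)  # new node now precedes its consumer
    return nodes

graph = ["observer", "embedding_bag_float_op"]
insert_before(graph, "prepack", "embedding_bag_float_op")
print(graph)  # → ['observer', 'prepack', 'embedding_bag_float_op']
```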
```cpp
for (const Use& use : uses) {
  if (matchCallFuncToUse(use, "embedding_bag", 2)) {
    embedding_bag_float_op = use.user;
  }
}
```
what are the possible cases? is the observer_out always going to be used by an embedding_bag op here?
```cpp
inputs.push_back(g->insertGetAttr(self, qparam_name));
// Temporary solution to quantize embedding_bag operators.
auto embedding_bag_name = getEmbeddingBagObsName(module, observer);
if (quant_type == QuantType::DYNAMIC && embedding_bag_name &&
```
can you merge this branch with the one in L399 now?
I prefer keeping it separate since this is a special case and this code will be removed in the future. Wanted to make that a little more obvious :)
might also be good to have an isEmbeddingBagOp function to be consistent with isFP16NoopObserver
I see, sure. Do you mean we plan to remove the swapping of input and weight in the embedding bag module in the future? I think this will be needed in graph mode if that does not change.
```cpp
observer_out->replaceAllUsesWith(original_val);
original_val->replaceAllUsesAfterNodeWith(dequant, dequant->output());
```
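The rewiring in this hunk (redirect every use of the observer output to the original value, then point later uses at the dequantize output) can be mimicked with a toy value/use structure. This is plain Python illustrating the replace-all-uses pattern only; the classes are stand-ins, not the TorchScript IR types.

```python
# Toy illustration of the replaceAllUsesWith pattern; these classes are
# stand-ins, not the TorchScript IR types used in the hunk above.
class Value:
    def __init__(self, name):
        self.name = name
        self.uses = []  # nodes that consume this value

class Node:
    def __init__(self, inputs):
        self.inputs = list(inputs)
        for v in inputs:
            v.uses.append(self)

def replace_all_uses_with(old, new):
    # Point every consumer of `old` at `new` instead, analogous to
    # observer_out->replaceAllUsesWith(original_val).
    for node in old.uses:
        node.inputs = [new if v is old else v for v in node.inputs]
        new.uses.append(node)
    old.uses = []

observer_out = Value("observer_out")
original_val = Value("original_val")
consumer = Node([observer_out])

replace_all_uses_with(observer_out, original_val)
print([v.name for v in consumer.inputs])  # → ['original_val']
```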
these two lines are the same as 417-419
jerryzh168 left a comment:
Looks good, thanks! Had a few more inline comments.
```cpp
}

// find the observer for Value `v` and return the name of the observer
c10::optional<std::string> findObserverName(Value* v) {
```
btw we can check for types now I think
|
This pull request has been merged in 36fb14b.
Stack from ghstack:
Differential Revision: D22609342