[quant][graph] Add support for FP16 dynamic quant #42222
Conversation
```diff
   }
   if (quant_type == QuantType::DYNAMIC) {
-    if (isFP16NoopObserver(module, observer)) {
+    if (isFp16Observer(observer->input(0))) {
```
Reviewer: can we just check dtype here?
Author: I think so; I was just being more explicit by checking for the observer type as well.
```diff
-  auto observer_module = module.attr(findObserverName(v).value()).toModule();
-  return (observer_module.attr("dtype") == at::ScalarType::Half) &&
-      isNoopObserver(observer);
+bool isFp16Observer(Value* observer) {
```
Reviewer: do we need this check? I think checking dtype is enough for our purposes.
```python
observer_name = 'Fp16Observer = prim::GetAttr[name="_observer_'
FileCheck().check(observer_name) \
           .run(m.fc.graph)
```
Reviewer: Looks like this check is not very useful; what do we want to check here?
Author: It is just an additional check that the observer name matches.
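For context, a FileCheck assertion like the one above usually runs inside a larger graph-mode test. The following is a minimal, hypothetical sketch of that flow; the `LinearModel` module, the qconfig-dict layout, and the use of `prepare_dynamic_jit` with `float16_dynamic_qconfig` are illustrative assumptions, not code taken from this PR or from test_quantization.py.

```python
import torch
from torch.testing import FileCheck
from torch.quantization import float16_dynamic_qconfig, prepare_dynamic_jit


class LinearModel(torch.nn.Module):  # illustrative module, not from the PR
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(5, 5)

    def forward(self, x):
        return self.fc(x)


m = torch.jit.script(LinearModel()).eval()
# With a float16 qconfig, activation observers are skipped and only the
# weight gets an observer inserted into the submodule's graph.
m = prepare_dynamic_jit(m, {'': float16_dynamic_qconfig})

# Generic form of the assertion discussed above: an observer attribute
# was attached to the fc submodule's forward graph.
FileCheck().check('prim::GetAttr[name="_observer_') \
           .run(m.fc.graph)
```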
jerryzh168 left a comment: LG, had a few inline comments.
This pull request has been merged in 6bd46b5.
Stack from ghstack:
Summary:
This change adds the necessary passes to perform FP16 dynamic quantization.
We skip inserting observers for activations based on the dtype (torch.float16) and only insert the Fp16Observer for weights.
Test Plan:
python test/test_quantization.py TestQuantizeJitOps
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D22849220
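As a usage-level illustration of what this change enables, here is a hedged sketch of invoking graph-mode FP16 dynamic quantization. The API names (`quantize_dynamic_jit`, `float16_dynamic_qconfig`) and the module `M` are assumptions based on the torch.quantization surface of this era and may differ in later releases (e.g. under torch.ao.quantization).

```python
import torch
from torch.quantization import float16_dynamic_qconfig, quantize_dynamic_jit


class M(torch.nn.Module):  # illustrative model, not from the PR
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.fc(x)


model = torch.jit.script(M()).eval()

# With a float16 qconfig, no activation observers are inserted; only the
# linear weight goes through the Fp16 observer path.
quantized = quantize_dynamic_jit(model, {'': float16_dynamic_qconfig})

out = quantized(torch.randn(2, 8))
```

This matches the behavior described in the summary: activations are left in fp32 and only the weights receive fp16 handling.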