[quant] Add FP16Observer for fp16 quant support #42221
Conversation
Summary: Adds a new observer that emits a warning if the range of a tensor is beyond the fp16 range. This will be further used in graph mode quantization to insert cast-to-fp16 ops in the graph.

Test Plan: python test/test_quantization.py TestObserver.test_fp16_observer
💊 Dr. CI summary as of commit e8d108d: 💚 Looks good so far! There are no failures yet. 💚
Review comment on the diff:

```python
super(HistogramObserver, self)._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                                      missing_keys, unexpected_keys, error_msgs)


class Fp16Observer(ObserverBase):
```
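For readers without the full diff open, here is a minimal sketch of the kind of observer under review. Only the class name and base-class pattern come from the diff context above; the exact warning condition and message are assumptions, not the PR's actual implementation, and the sketch subclasses torch.nn.Module instead of ObserverBase to stay self-contained:

```python
import warnings

import torch


class Fp16Observer(torch.nn.Module):
    """Sketch: warn when observed values fall outside the representable
    range of float16 (torch.finfo(torch.float16).max == 65504.0)."""

    def forward(self, x):
        fp16_max = torch.finfo(torch.float16).max
        # Warn if any observed value would saturate when cast to fp16.
        if x.numel() > 0 and (x.min() < -fp16_max or x.max() > fp16_max):
            warnings.warn(
                "Observed values exceed the fp16 range [-65504, 65504]; "
                "casting to fp16 may saturate."
            )
        return x
```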
This seems to be the same as the MinMax observer except for the warning. Can we just use the MinMax observer and put the warning somewhere else?
We don't have calculate_qparams and other methods defined for this observer. Also, I feel that from a user standpoint it might be better to separate this from the MinMax observer, since that one actually observes the values to calculate the qparams. I can add a docblock to better explain the usage of this.
I see. Since we just use min_val/max_val for the warning, can we remove them?
If we can remove the warning, then we can just use the NoOp observer for this.
Having the warning is useful: it lets users know if their weight values might be getting saturated, and it tells them that they may need to update the model. Users can just insert the observers and run the model to check for potential overflow issues.
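A hedged illustration of that workflow, reusing the sketched Fp16Observer above: in-range values pass through silently, while out-of-range values trigger the warning.

```python
import torch

obs = Fp16Observer()
obs(torch.randn(4))            # within fp16 range: no warning
obs(torch.tensor([70000.0]))   # beyond fp16 max (65504): emits a warning
```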
Can this be done during convert, though? We only have one user-facing API, quantize_dynamic_jit.
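For context, the eager-mode counterpart of that API, torch.quantization.quantize_dynamic, already accepts dtype=torch.float16. A hedged example of requesting fp16 dynamic quantization that way; the graph-mode quantize_dynamic_jit flow this PR targets is qconfig-driven and differs in detail:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 8))

# Eager-mode fp16 dynamic quantization: Linear weights are stored as
# float16, activations stay float32.
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.float16
)
```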
This pull request has been merged in 8c5bf10.
Stack from ghstack:
Summary:
Adds a new observer that emits a warning if the range of a tensor is beyond the fp16 range. This will be further used in graph mode quantization to insert cast-to-fp16 ops in the graph.
Test Plan:
python test/test_quantization.py TestObserver.test_fp16_observer (a hedged sketch of this test follows below)
Differential Revision: [D22849222](https://our.internmc.facebook.com/intern/diff/D22849222)
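A hedged sketch of what the test named in the Test Plan might check; the import path and assertion style are assumptions, and the actual test lives in the PR diff:

```python
import unittest

import torch

# Assumption: Fp16Observer is exposed from torch.quantization.observer,
# as the diff context in this PR suggests.
from torch.quantization.observer import Fp16Observer


class TestObserver(unittest.TestCase):
    def test_fp16_observer(self):
        obs = Fp16Observer()
        # Values beyond the fp16 max (~65504) should emit a warning.
        with self.assertWarns(Warning):
            obs(torch.tensor([1e6]))


if __name__ == "__main__":
    unittest.main()
```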