[quant][fx] Add support for BinaryOpQuantizeHandler in backend_config_dict #74882
Conversation
…dict Summary: This PR adds support for ops like add/mul in backend_config_dict. These ops have a different observation_type based on the number of tensor inputs: when the number of tensor inputs is 1, we share the output observer with the input; otherwise we use a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps
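For context, here is a minimal sketch of what such a backend_config_dict entry could look like, pieced together from the snippets quoted in the review below. The import path for ObservationType and the enum choices for the 0- and 2-tensor cases are assumptions; only the names shown in the quoted diff are confirmed by this PR:

```python
import operator

# assumption: import path for ObservationType (it has moved across releases)
from torch.ao.quantization.fx.backend_config.observation_type import ObservationType

add_config = {
    "pattern": operator.add,
    "num_tensor_args_to_observation_type": {
        # 0 tensor args: kept only so lookups don't error out
        # (TODO in the diff: maybe change this to NO_OBSERVER)
        0: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
        # 1 tensor arg (e.g. x + scalar): output shares the input's observer
        1: ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT,
        # 2 tensor args: output gets its own observer
        2: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
    },
}
```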
💊 CI failures summary: as of commit cc63999 (more details on the Dr. CI page), 💚 looks good so far, there are no failures yet. 💚
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
I had a question about this. If at a later time we want to improve this numerically, will the framework extend easily to support that? For example, there are some cases where we can just move the zero_point for scalar_add instead of recalculating the scale. It's not P0 to do it now, but it would be good to have a path to do it later for edge-case things.
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
|
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…dict Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: fb5b63c Pull Request resolved: #74882
Could you elaborate a bit on the use case? I think we are already moving the zero_point for scalar add: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/qadd.cpp#L52
If we are moving the zero point, why is the observer shared between input and output? Would it be more correct to not have an observer at the output at all, since that formula assumes a correct scale+zp for the input and calculates the scale+zp of the output from it?
Yeah, currently we do not model the numerics for the add_scalar op correctly; we can't really model a change of zero_point right now, or it might require some more thinking. Sharing the observer helps simplify the lowering for add_scalar, since it then has a very similar pattern to the normal add.
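For reference, a minimal sketch of the zero_point shift being discussed, assuming per-tensor affine quantization where x = scale * (q - zero_point); the function name is hypothetical:

```python
def add_scalar_by_zp_shift(scale: float, zero_point: int, c: float):
    # x + c = scale * (q - (zero_point - c / scale)), so adding a constant c
    # can be absorbed into the zero_point while keeping the same scale
    # (modulo rounding, and clamping to the quantized dtype's range)
    new_zero_point = zero_point - round(c / scale)
    return scale, new_zero_point
```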
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
|
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
| "pattern": operator.add, | ||
| "num_tensor_args_to_observation_type": { | ||
| # TODO: maybe change this to NO_OBSERVER | ||
| 0: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, |
nit: does the zero case actually appear, or does FX inline the calculation?
If so, should we add an assert somewhere for self.num_tensor_args > 0?
Yeah, this case does occur actually. If the number inputs are constants they will be inlined; if the number inputs are produced by other ops, they will appear as normal Nodes in the graph.
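A minimal sketch of both cases under symbolic tracing (assuming standard torch.fx behavior; the module name is made up):

```python
import torch
import torch.fx

class M(torch.nn.Module):
    def forward(self, x):
        a = x + 1                  # constant scalar: inlined as a literal arg
        b = x.size(0) + x.size(1)  # both args are Nodes, but neither is a tensor
        return a, b

# the printed graph shows one add with a literal 1, and one add whose
# args are both size() Nodes, i.e. the num_tensor_args == 0 case
print(torch.fx.symbolic_trace(M()).graph)
```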
If there are zero tensor args, should we not insert observers then? Observers expect a tensor, and violating this will lead to a runtime error.
Oh, yeah, we do not insert an observer in that case actually. This entry is not really used right now since we have some extra checks in prepare: https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/fx/prepare.py#L316. We can change it to NO_OBSERVER later, after we have dtype inference implemented; maybe I can add more TODO comments. It exists just so that code doing lookups with 0 tensor args doesn't error out.
| "pattern": operator.add, | ||
| "num_tensor_args_to_observation_type": { | ||
| # TODO: maybe change this to NO_OBSERVER | ||
| 0: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so should we add an assert somewhere for self.num_tensor_args > 0?
arg = self.root_node.args[arg_idx]
if isinstance(arg, Node) and (
        not all_node_args_have_no_tensors(
            arg, self.modules, cache_for_no_tensor_check)):
nit: This logic looks quite complicated. Maybe in a future PR we can rewrite all_node_args_have_no_tensors to return the number of tensor args instead, so we don't need to call this recursive function in a loop. I'm also not sure we need the cache argument; at the very least we should avoid exposing it to the caller.
yeah both suggestions sound good
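A sketch of what that refactor might look like, assuming all_node_args_have_no_tensors keeps its current location and signature; count_tensor_args is a hypothetical name:

```python
from torch.fx import Node
# assumption: current location and signature of the existing helper
from torch.ao.quantization.fx.utils import all_node_args_have_no_tensors

def count_tensor_args(node: Node, modules) -> int:
    # keep the recursion cache internal instead of exposing it to callers
    cache: dict = {}
    return sum(
        1 for arg in node.args
        if isinstance(arg, Node)
        and not all_node_args_have_no_tensors(arg, modules, cache)
    )
```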
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
|
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
|
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
|
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…dict (#74882)
Summary: Pull Request resolved: #74882. This PR adds support for ops like add/mul in backend_config_dict. These ops have a different observation_type based on the number of tensor inputs: when the number of tensor inputs is 1, we share the output observer with the input; otherwise we use a new observer.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo, andrewor14
Differential Revision: D35236032
fbshipit-source-id: 7077f3ccee8a5d8d19b40107cf8ff16cceafc535
Stack from ghstack (oldest at bottom):
Summary:
This PR adds support for ops like add/mul in backend_config_dict. These ops have a different observation_type based on the number of tensor inputs: when the number of tensor inputs is 1, we share the output observer with the input; otherwise we use a new observer.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Differential Revision: D35236032