Conversation

@jerryzh168 (Contributor) commented on Mar 29, 2022

Stack from ghstack (oldest at bottom):

Summary:
This PR adds support for ops like add/mul in backend_config_dict. These ops have a different
observation_type depending on the number of tensor inputs: when the number of tensor inputs is 1,
we share the output observer with the input; otherwise, we use a new observer.
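
For illustration, here is a minimal sketch of the kind of backend_config_dict entry this enables. The "pattern" and "num_tensor_args_to_observation_type" keys and the ObservationType values follow the snippets quoted later in this thread; the import path and the omitted fields (e.g. dtype configs) are assumptions.

```python
import operator

# Import path is an assumption; ObservationType has moved between releases.
from torch.ao.quantization.fx.backend_config import ObservationType

add_config = {
    "pattern": operator.add,
    "num_tensor_args_to_observation_type": {
        # 0 tensor args: placeholder so lookups with 0 tensors don't error out
        # (see the review discussion below; a TODO suggests NO_OBSERVER later).
        0: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
        # 1 tensor arg (e.g. x + 2.0): share the input's observer at the output.
        1: ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT,
        # 2 tensor args (e.g. x + y): observe the output with a new observer.
        2: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
    },
}
```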

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Differential Revision: D35236032

@facebook-github-bot (Contributor) commented on Mar 29, 2022

💊 CI failures summary and remediations

As of commit cc63999 (more details on the Dr. CI page):

💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI.

@vkuzo (Contributor) commented on Mar 29, 2022

> when the number of tensor inputs is 1, we share the output observer with the input

I had a question about this. If at a later time we want to improve this numerically, is the framework going to extend easily to support that? For example, let's say there are some cases where we can just move the zero_point for scalar_add instead of recalculating the scale. It's not P0 to do it now, but it would be good to have a path to do it later for edge-case things.

@jerryzh168 (Contributor, Author):

@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

jerryzh168 added a commit that referenced this pull request Mar 30, 2022
…dict

ghstack-source-id: fb5b63c
Pull Request resolved: #74882

@jerryzh168 (Contributor, Author):

> I had a question about this. If at a later time we want to improve this numerically, is the framework going to extend easily to support that? For example, let's say there are some cases where we can just move the zero_point for scalar_add instead of recalculating the scale. It's not P0 to do it now, but it would be good to have a path to do it later for edge-case things.

Could you elaborate a bit on the use case? I think we are moving the zero_point right now for scalar add: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/qadd.cpp#L52
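
For context, a minimal sketch of the zero_point-shift math for scalar add, assuming per-tensor affine quantization; this mirrors the idea behind the linked qadd.cpp code, not its exact implementation (the helper name is hypothetical):

```python
def shifted_qparams_for_scalar_add(scale, zero_point, scalar, qmin=0, qmax=255):
    # x = scale * (q - zero_point), so
    # x + c = scale * (q - (zero_point - c / scale)):
    # the quantized values q are unchanged; only zero_point moves.
    new_zero_point = zero_point - round(scalar / scale)
    if qmin <= new_zero_point <= qmax:
        return scale, new_zero_point
    # If the shifted zero_point falls outside the quantized range, the scale
    # has to be recomputed instead (the harder case discussed here).
    raise NotImplementedError("shifted zero_point out of range")
```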

@vkuzo (Contributor) commented on Mar 30, 2022

> Could you elaborate a bit on the use case? I think we are moving the zero_point right now for scalar add: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/qadd.cpp#L52

If we are moving the zero point, why is the observer shared between input and output? Would it be more correct to not have an observer at the output at all, since that formula assumes a correct scale+zp for the input and calculates the scale+zp of the output from it?

@jerryzh168 (Contributor, Author):

> If we are moving the zero point, why is the observer shared between input and output? Would it be more correct to not have an observer at the output at all, since that formula assumes a correct scale+zp for the input and calculates the scale+zp of the output from it?

Yeah, currently we do not model the numerics for the add_scalar op correctly; we can't really model a change of zero_point right now, I think, or it might require some more thinking. For now, sharing the observer helps simplify the lowering for add_scalar, since it then has a very similar pattern to normal add.
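
As a rough illustration of that point (a sketch, assuming the ObservationType names used in this PR; not output from any tool):

```python
# OUTPUT_SHARE_OBSERVER_WITH_INPUT (1 tensor arg, e.g. x + 2.0):
#   x -> obs0 -> add_scalar -> obs0      # same observer instance reused, so
#                                        # input and output qparams agree
#
# OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT (2 tensor args, e.g. x + y):
#   x -> obs0 --+
#               +-> add -> obs2          # fresh observer for the output
#   y -> obs1 --+
```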

@jerryzh168 jerryzh168 requested a review from vkuzo March 31, 2022 20:54
@jerryzh168 jerryzh168 requested a review from vkuzo April 1, 2022 20:56
@jerryzh168 (Contributor, Author):

@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

"pattern": operator.add,
"num_tensor_args_to_observation_type": {
# TODO: maybe change this to NO_OBSERVER
0: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
Contributor:

nit: does the zero case actually appear, or does FX inline the calculation?

Contributor:

If so, should we add an assert somewhere for self.num_tensor_args > 0?

Contributor (Author):

Yeah, this does actually occur: if the number inputs are constants, they will be inlined; if the number inputs are produced by other ops, then they appear as normal Nodes in the graph.
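
A small torch.fx illustration of why all three cases can show up (a sketch under standard symbolic_trace behavior, not code from this PR):

```python
import operator
import torch
import torch.fx

def one_tensor(x):
    return x + 2.0        # constant stays inline: add args = (x, 2.0)

def two_tensors(x, y):
    return x + y          # two tensor args: add args = (x, y)

def zero_tensors(x):
    # size(0) is an int produced by another op, so add gets a Node
    # argument but no tensor args.
    return x.size(0) + 1

for fn in (one_tensor, two_tensors, zero_tensors):
    gm = torch.fx.symbolic_trace(fn)
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target is operator.add:
            print(fn.__name__, node.args)
```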

Contributor:

If there are zero tensors, should we not insert observers then? Observers expect a tensor; violating this will lead to a runtime error.

@jerryzh168 (Contributor, Author) commented Apr 4, 2022:

Oh, yeah, we do not actually insert an observer. This entry is not really used right now, since we have some extra checks in prepare: https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/fx/prepare.py#L316. We can change this to NO_OBSERVER later, after we have dtype inference implemented; maybe I can add more TODO comments. The entry exists just so that code doing lookups with 0 tensor args doesn't error out.

@andrewor14 andrewor14 changed the title [quant][fx] Add support for BinarOpQuantizeHandler in backend_config_dict [quant][fx] Add support for BinaryOpQuantizeHandler in backend_config_dict Apr 4, 2022
"pattern": operator.add,
"num_tensor_args_to_observation_type": {
# TODO: maybe change this to NO_OBSERVER
0: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so should we add an assert somewhere for self.num_tensor_args > 0?

```python
arg = self.root_node.args[arg_idx]
if isinstance(arg, Node) and (
        not all_node_args_have_no_tensors(
            arg, self.modules, cache_for_no_tensor_check)):
```

Contributor:

nit: This logic looks quite complicated. Maybe in a future PR we can rewrite all_node_args_have_no_tensors to return the number of tensor args instead, so we don't need to call this recursive function in a loop. I'm also not sure if we need the cache argument, or at least we should avoid exposing it to the caller.

@jerryzh168 (Contributor, Author) commented Apr 4, 2022:

Yeah, both suggestions sound good.
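
A hypothetical sketch of that refactor; the helper name num_tensor_args and the internal cache handling are assumptions, not code from this PR:

```python
from typing import Dict, Optional

from torch.fx import Node
# Existing helper; exact import path may vary across versions.
from torch.ao.quantization.fx.utils import all_node_args_have_no_tensors

def num_tensor_args(node: Node, modules, cache: Optional[Dict] = None) -> int:
    # Count how many args carry tensors, keeping the recursion cache
    # internal instead of exposing it to the caller.
    if cache is None:
        cache = {}
    return sum(
        1
        for arg in node.args
        if isinstance(arg, Node)
        and not all_node_args_have_no_tensors(arg, modules, cache)
    )
```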


@jerryzh168 jerryzh168 changed the title [quant][fx] Add support for BinaryOpQuantizeHandler in backend_config_dict [quant][fx] Add support for BinarOpQuantizeHandler in backend_config_dict Apr 5, 2022
@jerryzh168 (Contributor, Author):

@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Apr 6, 2022
…dict (#74882)

Summary:
Pull Request resolved: #74882

This PR adds support for ops like add/mul in backend_config_dict. These ops have a different
observation_type depending on the number of tensor inputs: when the number of tensor inputs is 1,
we share the output observer with the input; otherwise, we use a new observer.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo, andrewor14

Differential Revision: D35236032

fbshipit-source-id: 7077f3ccee8a5d8d19b40107cf8ff16cceafc535
@facebook-github-bot facebook-github-bot deleted the gh/jerryzh168/761/head branch April 9, 2022 14:16