[quant][fx] Add support for BinaryOpQuantizeHandler in backend_config_dict #74882
Conversation
…dict Summary: This PR adds support for ops like add/mul in backend_config_dict. These ops have a different observation_type based on the number of tensor inputs: when the number of tensor inputs is 1, we share the output observer with the input; otherwise we use a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps
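For context, here is a minimal sketch of what such a backend_config_dict entry could look like, pieced together from the snippets quoted in the review below. The import path for ObservationType and the enum choices for the 0- and 2-tensor cases are assumptions; only the names shown in the quoted diff are confirmed by this PR:

```python
import operator

# assumption: import path for ObservationType (it has moved across releases)
from torch.ao.quantization.fx.backend_config.observation_type import ObservationType

add_config = {
    "pattern": operator.add,
    "num_tensor_args_to_observation_type": {
        # 0 tensor args: kept only so lookups don't error out
        # (TODO in the diff: maybe change this to NO_OBSERVER)
        0: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
        # 1 tensor arg (e.g. x + scalar): output shares the input's observer
        1: ObservationType.OUTPUT_SHARE_OBSERVER_WITH_INPUT,
        # 2 tensor args: output gets its own observer
        2: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
    },
}
```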
💊 CI failures summary: as of commit cc63999 (more details on the Dr. CI page), 💚 looks good so far, there are no failures yet. 💚
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
I had a question about this. If at a later time we want to improve this numerically, will the framework extend easily to support that? For example, there are some cases where we can just move the zero_point for scalar_add instead of recalculating the scale. It's not P0 to do it now, but it would be good to have a path to do it later for edge-case things.
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
|
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…dict Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: fb5b63c Pull Request resolved: #74882
Could you elaborate a bit on the use case? I think we are already moving the zero_point for scalar add: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/qadd.cpp#L52
If we are moving the zero point, why is the observer shared between input and output? Would it be more correct to not have an observer at the output at all, since that formula assumes a correct scale+zp for the input and calculates the scale+zp of the output from it?
Yeah, currently we do not model the numerics for the add_scalar op correctly; we can't really model a change of zero_point right now, or it might require some more thinking. Sharing the observer helps simplify the lowering for add_scalar, since it then has a very similar pattern to the normal add.
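For reference, a minimal sketch of the zero_point shift being discussed, assuming per-tensor affine quantization where x = scale * (q - zero_point); the function name is hypothetical:

```python
def add_scalar_by_zp_shift(scale: float, zero_point: int, c: float):
    # x + c = scale * (q - (zero_point - c / scale)), so adding a constant c
    # can be absorbed into the zero_point while keeping the same scale
    # (modulo rounding, and clamping to the quantized dtype's range)
    new_zero_point = zero_point - round(c / scale)
    return scale, new_zero_point
```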
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
|
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
| "pattern": operator.add, | ||
| "num_tensor_args_to_observation_type": { | ||
| # TODO: maybe change this to NO_OBSERVER | ||
| 0: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, |
nit: does the zero case actually appear, or does FX inline the calculation?
If so, should we add an assert somewhere for self.num_tensor_args > 0?
Yeah, this case does occur actually. If the number inputs are constants they will be inlined; if the number inputs are produced by other ops, they will appear as normal Nodes in the graph.
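A minimal sketch of both cases under symbolic tracing (assuming standard torch.fx behavior; the module name is made up):

```python
import torch
import torch.fx

class M(torch.nn.Module):
    def forward(self, x):
        a = x + 1                  # constant scalar: inlined as a literal arg
        b = x.size(0) + x.size(1)  # both args are Nodes, but neither is a tensor
        return a, b

# the printed graph shows one add with a literal 1, and one add whose
# args are both size() Nodes, i.e. the num_tensor_args == 0 case
print(torch.fx.symbolic_trace(M()).graph)
```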
If there are zero tensor args, should we not insert observers then? Observers expect a tensor, and violating this will lead to a runtime error.
Oh, yeah, we do not insert an observer in that case actually. This entry is not really used right now since we have some extra checks in prepare: https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/fx/prepare.py#L316. We can change it to NO_OBSERVER later, after we have dtype inference implemented; maybe I can add more TODO comments. It exists just so that code doing lookups with 0 tensor args doesn't error out.
| "pattern": operator.add, | ||
| "num_tensor_args_to_observation_type": { | ||
| # TODO: maybe change this to NO_OBSERVER | ||
| 0: ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so should we add an assert somewhere for self.num_tensor_args > 0?
arg = self.root_node.args[arg_idx]
if isinstance(arg, Node) and (
        not all_node_args_have_no_tensors(
            arg, self.modules, cache_for_no_tensor_check)):
nit: This logic looks quite complicated. Maybe in a future PR we can rewrite all_node_args_have_no_tensors to return the number of tensor args instead, so we don't need to call this recursive function in a loop. I'm also not sure we need the cache argument; at the very least we should avoid exposing it to the caller.
yeah both suggestions sound good
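A sketch of what that refactor might look like, assuming all_node_args_have_no_tensors keeps its current location and signature; count_tensor_args is a hypothetical name:

```python
from torch.fx import Node
# assumption: current location and signature of the existing helper
from torch.ao.quantization.fx.utils import all_node_args_have_no_tensors

def count_tensor_args(node: Node, modules) -> int:
    # keep the recursion cache internal instead of exposing it to callers
    cache: dict = {}
    return sum(
        1 for arg in node.args
        if isinstance(arg, Node)
        and not all_node_args_have_no_tensors(arg, modules, cache)
    )
```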
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
|
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
|
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…kend_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
…end_config_dict" Summary: This PR adds support for ops like add/mul in backend_config_dict, these ops have different observation_type based on the number of tensor inputs, when number of tensor inputs is 1, we will share the output observer with input, otherwise we'll have a new observer. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D35236032](https://our.internmc.facebook.com/intern/diff/D35236032) [ghstack-poisoned]
|
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…dict (#74882)
Summary: Pull Request resolved: #74882. This PR adds support for ops like add/mul in backend_config_dict. These ops have a different observation_type based on the number of tensor inputs: when the number of tensor inputs is 1, we share the output observer with the input; otherwise we use a new observer.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo, andrewor14
Differential Revision: D35236032
fbshipit-source-id: 7077f3ccee8a5d8d19b40107cf8ff16cceafc535
Stack from ghstack (oldest at bottom):
Summary:
This PR adds support for ops like add/mul in backend_config_dict. These ops have a different observation_type based on the number of tensor inputs: when the number of tensor inputs is 1, we share the output observer with the input; otherwise we use a new observer.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Differential Revision: D35236032