[quant][fx] Fix dynamic weighted op lowering when input is used multiple times #74364
Conversation
…ple times
Summary:
If an input is used multiple times by modules that are dynamically quantized:
```
x -- linear1
\-- linear2
```
we insert quantize_per_tensor_dynamic and dequantize for the input, and a duplication pass then duplicates the dequantize ops for pattern matching:
```
x - quantize_per_tensor_dynamic - dequantize1 - linear1
\----- dequantize2 - linear2
```
However, the lowering code also has a check that skips the pattern when quantize_per_tensor_dynamic is used by multiple nodes, so the pattern is not recognized. In this case we need to duplicate quantize_per_tensor_dynamic as well, to recover both patterns:
```
x - quantize_per_tensor_dynamic1 -- dequantize1 -- linear1
\- quantize_per_tensor_dynamic2 -- dequantize2 -- linear2
```
so that they can be fused into dynamic linear:
```
x - linear_dynamic1
\-- linear_dynamic2
```
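To make the above concrete, here is a minimal sketch of how a model with this shape could go through the FX dynamic quantization workflow. The module, layer sizes, and the qconfig-dict form of `prepare_fx` are assumptions for illustration (newer releases take a `QConfigMapping` and `example_inputs`); this is not the PR's actual test code.

```python
import torch
from torch.ao.quantization import default_dynamic_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

class SharedInput(torch.nn.Module):
    """Hypothetical repro: the same input feeds two linear layers."""
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(8, 8)
        self.linear2 = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.linear1(x), self.linear2(x)

model = SharedInput().eval()
# dynamic quantization: activations are quantized on the fly, weights ahead of time
qconfig_dict = {"": default_dynamic_qconfig}
prepared = prepare_fx(model, qconfig_dict)
quantized = convert_fx(prepared)

# with the fix, printing the graph should show both linears lowered to
# dynamic quantized linears, instead of one of them being left behind
# with explicit quantize_per_tensor_dynamic/dequantize nodes
print(quantized.graph)
```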
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_dynamic_linear_input_multiple_use
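The actual test lives in `TestQuantizeFx`; as a rough, hypothetical sketch of the kind of assertion it likely makes (continuing from the `quantized` module produced in the sketch above, and assuming module linears are swapped to `torch.nn.quantized.dynamic.Linear`):

```python
import torch

# both uses of the shared input should end up in their own dynamically
# quantized linear; before the fix, one of the two patterns was skipped
dynamic_linears = [
    m for m in quantized.modules()
    if isinstance(m, torch.nn.quantized.dynamic.Linear)
]
assert len(dynamic_linears) == 2, dynamic_linears
```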
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Review comment on this hunk of the diff:

```
# run the weight observer
weight_observer_module()

# this method is temporary will be removed soon
```
Is the real fix removing the check for whether dequantize nodes have multiple users in the lowering code? Is there a reason why we don't just directly change that instead of adding a temporary fix?
Yes, the real fix is to stop duplicating dequantize and remove the check; planning to talk to you in today's sync.
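For context on what "duplicating" a node means here, below is a minimal, generic torch.fx sketch of giving each consumer of a node its own copy. It illustrates the technique only; it is not the actual pass added in this PR, which handles the quantize_per_tensor_dynamic pattern inside the lowering code.

```python
import torch
from torch.fx import GraphModule

def duplicate_node_per_user(gm: GraphModule, target) -> GraphModule:
    """Give every user of a call_function node targeting `target`
    its own copy of that node (illustrative sketch only)."""
    graph = gm.graph
    for node in list(graph.nodes):
        if node.op != "call_function" or node.target is not target:
            continue
        users = list(node.users)
        if len(users) <= 1:
            continue
        # keep the original node for the first user, copy it for the rest
        for user in users[1:]:
            with graph.inserting_after(node):
                new_node = graph.node_copy(node, lambda n: n)
            user.replace_input_with(node, new_node)
    graph.lint()
    gm.recompile()
    return gm

# hypothetical usage on a traced/quantized GraphModule `gm`:
# duplicate_node_per_user(gm, torch.quantize_per_tensor_dynamic)
```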
…ple times (#74364): landed commit, with the same summary as the description above. Pull Request resolved: #74364. Imported from OSS. Reviewed By: yixin94. Differential Revision: D34952755. fbshipit-source-id: a950159fd6a661e84faf0baf1692f6783904cfb3 (cherry picked from commit 8a68968)