[Quant][Fx] Fix issue: qconfig_mappings of onednn backend are not correctly set for fused modules #91297
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91297
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit f850cad.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed a5b01ce to e47778d.
Does setting qconfig to None mean we use FP32 for Tanh? I thought we should use the same qconfig for linear and Tanh to enable the fusion, shouldn't we?
right, None means don't quantize
Thanks, good point. I thought None meant the global qconfig, and I intended tanh to use the global qconfig so that tanh and qlinear share the same qconfig. Since that was wrong and did not work, I have to find a new way to fix it.
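For context, a minimal sketch (not from this PR; the module type and mapping names are illustrative) of the difference discussed above, using the FX `QConfigMapping` API:

```python
import torch
from torch.ao.quantization import QConfigMapping, get_default_qconfig

global_qconfig = get_default_qconfig("onednn")

# Mapping an op type to None tells FX graph mode quantization to skip it
# (keep it in FP32); it does NOT mean "fall back to the global qconfig".
skip_tanh = (
    QConfigMapping()
    .set_global(global_qconfig)
    .set_object_type(torch.nn.Tanh, None)
)

# Omitting the entry entirely is what makes Tanh pick up the global qconfig
# (which still fails here, since tanh needs a fixed-qparams qconfig).
use_global_for_tanh = QConfigMapping().set_global(global_qconfig)
```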
It is required that the QConfig specify a FixedQParamsObserver or a FixedQParamsFakeQuantize for fixed-qparams ops (e.g. tanh):
pytorch/torch/ao/quantization/fx/utils.py, line 811 at ad782ff:
    if not isinstance(activation_post_process, FixedQParamsObserver) and \
@jerryzh168 Do you know any solution or workaround? Thanks!
Looks like there is no easy way to set correct qconfig_mappings for both tanh and linear - tanh fusion. I removed these lines. Users will need to set qconfig manually if they want to fuse linear - tanh with onednn backend.
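As a rough illustration of the manual setup mentioned here, a sketch under the following assumptions: the fixed-qparams scale/zero_point values mirror PyTorch's defaults for tanh's [-1, 1] output range, and giving `Linear` the same qconfig is what the fusion check expects. This is not code from the PR:

```python
import torch
from torch.ao.quantization import QConfig, get_default_qconfig_mapping
from torch.ao.quantization.observer import FixedQParamsObserver, default_weight_observer

# Fixed qparams for a [-1, 1] output range in quint8 (assumed values).
tanh_activation = FixedQParamsObserver.with_args(
    scale=2.0 / 256.0, zero_point=128, dtype=torch.quint8
)
fixed_qparams_qconfig = QConfig(activation=tanh_activation, weight=default_weight_observer)

# Start from the default onednn mapping, then give Linear and Tanh the same
# qconfig so that FX fusion accepts the linear - tanh pattern.
qconfig_mapping = (
    get_default_qconfig_mapping("onednn")
    .set_object_type(torch.nn.Linear, fixed_qparams_qconfig)
    .set_object_type(torch.nn.Tanh, fixed_qparams_qconfig)
)
```

The trade-off described in the summary follows from this: once `Tanh` is pinned to a fixed-qparams qconfig shared with `Linear`, a standalone `tanh` elsewhere in the model cannot be quantized differently at the same time.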
Force-pushed e47778d to 5274795.
Hi @jgong5, could you review this PR? Thanks!
jgong5 left a comment:
I'm fine with the change. But requiring the same qconfig for all the ops in a fusion pattern sounds like too rigid a requirement. May I know if this will be improved in the PT 2.0 quant flow? @jerryzh168
Force-pushed c16bb47 to ae7f76b.
Force-pushed ae7f76b to 0e417e3.
Hi @jerryzh168, please review. Thanks!
why is this not part of get_default_qconfig_mapping?
I see, I just saw the summary message
Also, please add a TODO comment to all callsites that are using this; we want to refactor them later when we have the new API ready.
Ok, it's been added.
could you please add a comment here summarizing what you said in the summary? it will be helpful for future reference when we have the new api ready
Ok, it's been added.
jerryzh168 left a comment:
Please add a comment in the code to describe the current status of how to set qconfig for linear - tanh fusion.
Yes, this will be improved in the quantize_pt2e flow.
Force-pushed aaf321d to b90b5fe.
@pytorchbot merge
Merge failed. Reason: Not merging any PRs at the moment because there is a merge-blocking https://github.com/pytorch/pytorch/labels/ci:%20sev issue open at:
Details for Dev Infra team: raised by workflow job.
Force-pushed b90b5fe to 895cf15.
Force-pushed 895cf15 to f850cad.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
… (#90001)

**Summary**

This work continues with #83784 by @vkuzo and includes all the changes in that PR. Quote from #83784:

> Issue #83658 reports that ops followed by a certain pattern of `view` and `size` ops were not quantized correctly by FX graph mode quantization. Before this PR, the "size" op was in the "op shares qparams with input" category, and the code assumed that the input of this op has the same dtype as its output. This led to incorrectly propagating the `int` dtype as the output of whichever op was preceding the `view` op, which in turn made that op blocklisted from quantization.
> The fix is to create a new category of ops which work on different dtypes of tensors but are not observed. This PR does so for `size`, and also for `shape` since it works the same way.

**Note**: This PR needs #91297 to be landed first, otherwise there is a UT failure.

**Test plan**

```
python test/test_quantization.py -k test_linear_size_view
python test/test_quantization.py -k test_linear_shape_view
```

Pull Request resolved: #90001
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
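For reference, a rough sketch (hypothetical module, modeled on the test names above) of the `linear -> size -> view` pattern that #90001 addresses:

```python
import torch

class LinearSizeView(torch.nn.Module):
    """Hypothetical module reproducing the pattern from #90001's tests."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        y = self.linear(x)
        # size() yields Python ints; before the fix, treating "size" as an op
        # that shares qparams/dtype with its input made FX propagate an int
        # dtype back to linear's output and skip quantizing it.
        n = y.size(-1)
        return y.view(-1, n)
```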
**Summary**
For onednn quantization backend only.
Currently, FX fusion requires that all separate ops in a fused module/op have the same qconfig. To support `linear - leaky_relu` and `linear - tanh` fusion with the onednn backend, we previously explicitly set the same `qconfig` to `linear`, `leaky_relu` and `tanh`. However, this brings two problems:

- It breaks fusion of `linear - relu`, since `relu` does not have the same `qconfig` as `linear` does. And it does not look good if we set `qconfig` to all these ops; they should use a global `qconfig` by default.
- `Tanh` requires `fixed_qparams_qconfig`, otherwise it is not quantized. So, we cannot set another `qconfig` to `tanh`.

Looks like there is not a straightforward way to solve the problems. This PR fixes them by the following:

- Do not set `qconfig` to these ops, so that these ops use a global `qconfig` and `linear - relu` and `linear - leaky_relu` can be fused correctly.
- `qconfig` has to be set to `linear` and `tanh` manually by users when they want to fuse `linear - tanh` with the onednn backend.

A known issue still exists: users cannot fuse `linear - tanh` and quantize standalone `tanh` at the same time.

**Test plan**
python test/test_quantization.py -k test_qconfig_dict_with_fused_modules
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
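To tie the summary together, a minimal end-to-end sketch (model, shapes, and calibration are illustrative, not taken from the PR) of the default flow this fix targets, where `linear - leaky_relu` fuses under the global onednn qconfig:

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = torch.nn.Sequential(
    torch.nn.Linear(16, 16),
    torch.nn.LeakyReLU(),
).eval()

torch.backends.quantized.engine = "onednn"
qconfig_mapping = get_default_qconfig_mapping("onednn")
example_inputs = (torch.randn(4, 16),)

prepared = prepare_fx(model, qconfig_mapping, example_inputs)
prepared(*example_inputs)          # calibration pass
quantized = convert_fx(prepared)   # linear - leaky_relu is expected to fuse
```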