fx quant: fix issue with FX quant for x.view(x.size(...), ...) #83784
Conversation
Summary: #83658 reported that ops followed by a certain pattern of `view` and `size` ops were not quantized correctly by FX graph mode quantization. Before this PR, the `size` op was in the "op shares qparams with input" category, and the code assumed that the input of this op has the same dtype as its output. This led to incorrectly propagating the `int` dtype as the output of whichever op preceded the `view` op, which in turn blocklisted that op from quantization. The fix is to create a new category of ops which work on different dtypes of tensors but are not observed. This PR does so for `size`, and also for `shape`, since it works the same way.

Test plan:
```
python test/test_quantization.py -k test_linear_view_size
```
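For context, a minimal sketch of the pattern reported in #83658 that this PR fixes; the module, shapes, and variable names below are illustrative assumptions, not code from the PR:

```
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        x = self.linear(x)
        # x.size(0) returns an int; before this PR, the int dtype was
        # propagated backwards as the output dtype of the preceding op,
        # which blocklisted the linear from quantization
        return x.view(x.size(0), -1)

model_fp32 = M().eval()
x = torch.randn(2, 4)
prepared = prepare_fx(model_fp32, get_default_qconfig_mapping(), (x,))
prepared(x)  # calibrate
quantized = convert_fx(prepared)  # with this PR, linear is quantized as expected
```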
❌ 1 New Failure, 1 Base Failure as of commit d30244c (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
jerryzh168 left a comment:
what about input? do we insert observers for input of these ops?
I don't know why that would be needed today. What are your thoughts?
Not saying input observers are needed, just asking to document the behavior clearly, since the previous observation types describe the relationship between the input and output observers and assume both are inserted. I think the question is whether we need to dequantize when the user calls `size`/`shape` on a quantized tensor. Which version do we have right now, and how do we insert observers in the flow?
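For reference, `size()` and `shape` can be called directly on a quantized tensor, so reading the shape should not by itself require a dequantize; a quick self-contained check (my own example, not from the PR):

```
import torch

xq = torch.quantize_per_tensor(
    torch.randn(2, 3), scale=0.1, zero_point=0, dtype=torch.qint8
)
print(xq.size())  # torch.Size([2, 3]), no dequantize needed
print(xq.shape)   # same result via the attribute
```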
```
OUTPUT_SHARE_OBSERVER_WITH_INPUT = 1
# this means the output is never observed
# example: x.shape, x.size
OUTPUT_NOT_OBSERVED = 2
```
if both input and output are not observed, maybe we want to rename this to INPUT_OUTPUT_NOT_OBSERVED?
I see, sure, that sounds good
thanks, can we also document whether we will dequantize the Tensor before calling `tensor.size`/`tensor.shape` here as well? see #83784 (comment)
```
prepared_model = prepare_fx(model_fp32, qconfig_mapping, x)
prepared_model(x)
quantized_model = convert_fx(prepared_model)
self.assertTrue(type(quantized_model.linear) == nnq.Linear)
```
can we also test how many dequantize ops appear in the model as well? I think ideally we should not dequantize the Tensor to get the size
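One possible way to check this, counting `dequantize` nodes in the converted FX graph — a sketch, assuming `quantized_model` from the test above and that this runs inside the test method (so `self` is the TestCase):

```
num_dequantize = sum(
    1
    for node in quantized_model.graph.nodes
    if node.op == "call_method" and node.target == "dequantize"
)
# ideally a single dequantize at the model output, and none before size()
self.assertEqual(num_dequantize, 1)
```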
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as `Stale`.
#90001

**Summary**
This work continues with #83784 by @vkuzo and includes all the changes in that PR. Quote from #83784:

> Issue #83658 reports that ops followed by a certain pattern of `view` and `size` ops were not quantized correctly by FX graph mode quantization. Before this PR, the "size" op was in the "op shares qparams with input" category, and the code assumed that the input of this op has the same dtype as its output. This led to incorrectly propagating the `int` dtype as the output of whichever op was preceding the `view` op, which in turn made that op blocklisted from quantization.
> The fix is to create a new category of ops which work on different dtypes of tensors but are not observed. This PR does so for `size`, and also for `shape` since it works the same way.

**Note**: This PR needs #91297 to be landed first, otherwise there is a UT failure.

**Test plan**
```
python test/test_quantization.py -k test_linear_size_view
python test/test_quantization.py -k test_linear_shape_view
```

Pull Request resolved: #90001
Approved by: https://github.com/jgong5, https://github.com/jerryzh168