[Quant] Use the true src zero point to query and create conv pd #90818
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90818
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 0656a6e.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Xia-Weiwen Please help to take a look.
```diff
  }
- // Since src zero point is unknown, set runtime value here
- op_attr.set_zero_points(DNNL_ARG_SRC, ideep::utils::tensor_zp_mask(1), {DNNL_RUNTIME_S32_VAL});
+ op_attr.set_zero_points(DNNL_ARG_SRC, ideep::utils::tensor_zp_mask(1), src_zero_points);
```
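For context, the `src_zero_points` value passed here corresponds to the quantization zero point of the input (`src`) tensor, which is what oneDNN now sees when the pd is created. A minimal Python-side illustration (not the C++ code path itself; the scale and zero point are arbitrary example values):

```python
import torch

x = torch.randn(1, 3, 8, 8)
# Asymmetric quantization with an arbitrary non-zero zero point for illustration.
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=128, dtype=torch.quint8)
# This is the "true" src zero point the pd is created with after this PR,
# in place of the DNNL_RUNTIME_S32_VAL placeholder.
print(qx.q_zero_point())  # 128
```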
Are we using the src zero point to initialize the primitive in the cache, or `DNNL_RUNTIME_S32_VAL` instead? If it is the latter, won't the primitive cache stop taking effect?
We used to use the runtime value to create the primitive and put it in the cache. This PR uses the true src zero point values to create the primitive.
The src zero point is part of the cache key, so if the src zero point changes, the cache misses. This has always been the behavior; it is not introduced by this PR.
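To make the cache-key point concrete, here is a toy sketch (not the actual ideep primitive cache, just an illustration of the lookup behavior): when the src zero point is part of the key, a different zero point simply produces a new entry rather than reusing the old one.

```python
# Toy stand-in for a primitive cache keyed on (shapes, ..., src_zero_point).
# The real cache lives in ideep/oneDNN; this only illustrates the lookup behavior.
primitive_cache = {}

def get_primitive(src_shape, weight_shape, src_zero_point):
    key = (src_shape, weight_shape, src_zero_point)
    if key not in primitive_cache:
        # Cache miss: query the pd and build a new primitive for this key.
        primitive_cache[key] = f"primitive for {key}"
    return primitive_cache[key]

get_primitive((1, 3, 8, 8), (4, 3, 3, 3), src_zero_point=0)    # miss -> create
get_primitive((1, 3, 8, 8), (4, 3, 3, 3), src_zero_point=0)    # hit
get_primitive((1, 3, 8, 8), (4, 3, 3, 3), src_zero_point=128)  # miss -> create again
```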
OK. If we were using `DNNL_RUNTIME_S32_VAL` previously, then the src zero point shouldn't have been part of the key? Anyway, the change looks good then.
Xia-Weiwen left a comment:
LGTM
BTW, `try_reorder` is no longer needed after PR #90354 is landed, which switches to the new ideep API; the `try_reorder` behavior is implemented inside ideep.
```python
qx = torch.quantize_per_tensor(x, scale=1.0, zero_point=0, dtype=torch.quint8)
# The following should pass when input shape is changed
torch.ops.quantized.conv2d(qx, w_packed, output_scale=1.0, output_zero_point=0)
# conv_transposed part
```
maybe split this into a separate test?
Thanks for the comment. I've put it into a separate test.
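For reference, the split-out transposed-convolution case might look roughly like the sketch below. This is a hypothetical shape of such a test, not the code that actually landed, and the `conv_transpose2d_prepack` argument order (weight, bias, stride, padding, output_padding, dilation, groups) is my assumption.

```python
import torch

torch.backends.quantized.engine = 'onednn'  # assumes the onednn backend is available

# ConvTranspose2d weight layout: (in_channels, out_channels // groups, kH, kW)
w = torch.randn(3, 4, 3, 3)
qw = torch.quantize_per_tensor(w, scale=1.0, zero_point=0, dtype=torch.qint8)
# Assumed argument order: weight, bias, stride, padding, output_padding, dilation, groups
w_packed = torch.ops.quantized.conv_transpose2d_prepack(
    qw, None, [1, 1], [0, 0], [0, 0], [1, 1], 1)
for shape in [(1, 3, 8, 8), (1, 3, 16, 16)]:  # second iteration changes the input shape
    qx = torch.quantize_per_tensor(torch.randn(shape), scale=1.0, zero_point=0, dtype=torch.quint8)
    torch.ops.quantized.conv_transpose2d(qx, w_packed, 1.0, 0)  # output_scale, output_zero_point
```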
@jerryzh168 I have made the changes according to your comments. Could you help take a look at this PR again?
ghstack-source-id: ffa41ac Pull Request resolved: pytorch#90818
Hi @jerryzh168, are there any other comments on this PR? Could you help take a look again?
```python
if 'onednn' not in supported_qengines:
    return
```
Good suggestions. Changed it.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
Summary
Previously, we used `DNNL_RUNTIME_S32_VAL` as the zero point for `src` in both weight prepacking and the convolution forward pass, to ensure that the same blocked weight format is used in both. The problem is that querying with `DNNL_RUNTIME_S32_VAL` may return a different blocked weight format than querying with the true `src` zero point, which sends the oneDNN convolution down the `jit` path instead of the `brgconv` path. With this PR we use the true `src` zero point to create the primitive descriptor (pd), and reorder the weight if the queried block format differs from the one produced by weight prepacking.

Test Plan: `python -m pytest quantization/core/test_quantized_op.py::TestQuantizedConv::test_conv_reorder_issue_onednn`
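The scenario the test exercises is roughly: prepack the weight once, then run the quantized convolution with inputs of different shapes; previously the forward pass could query a differently blocked weight than prepack did and fall back to the `jit` path. A minimal sketch of that repro (shapes, scales, and zero points are arbitrary example values, not the exact test that landed):

```python
import torch

torch.backends.quantized.engine = 'onednn'  # assumes the onednn backend is available

w = torch.randn(4, 3, 3, 3)
qw = torch.quantize_per_tensor(w, scale=1.0, zero_point=0, dtype=torch.qint8)
w_packed = torch.ops.quantized.conv2d_prepack(
    qw, None, [1, 1], [0, 0], [1, 1], 1)  # weight, bias, stride, padding, dilation, groups
for shape in [(1, 3, 8, 8), (1, 3, 16, 16)]:  # second iteration changes the input shape
    qx = torch.quantize_per_tensor(torch.randn(shape), scale=1.0, zero_point=0, dtype=torch.quint8)
    # With this PR the pd is created with qx's true zero point, and the prepacked
    # weight is reordered if the queried block format differs from the prepacked one.
    torch.ops.quantized.conv2d(qx, w_packed, output_scale=1.0, output_zero_point=0)
```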
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10