[Quant] Add fused conv2d_add_relu op for onednn backend #90364
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90364
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e4877f1.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
**Summary**
Post-op fusion can reduce data movement overhead and improve inference performance. This PR adds a fused conv2d_add_relu op for the onednn backend, to be used for int8 inference with the onednn backend. Calling this op with any other quantization backend throws an error.

**Test Plan**
```
python -m pytest test_quantization.py::TestQuantizedConv
```

**TODO**
There is a oneDNN issue that may cause a kernel core dump with some input shapes. This PR should be merged after that issue is resolved.

cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
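For context, selecting the quantization engine is the prerequisite for onednn-specific ops; a minimal sketch (which engines are listed depends on how PyTorch was built):

```python
import torch

# The fused op is only supported under the onednn quantization backend;
# calling it under any other engine raises an error. The contents of
# supported_engines depend on the build.
print(torch.backends.quantized.supported_engines)
torch.backends.quantized.engine = 'onednn'  # assumes a oneDNN-enabled build
```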
@jerryzh168 Thanks for the comments; I have addressed them. Could you take a look at this PR again?
ghstack-source-id: 7c95466
Pull Request resolved: pytorch#90364
Hi @jerryzh168, are there any other comments on this PR? Could you take another look?
Review thread on this diff hunk:

```python
Y_scale, Y_zero_point, use_bias, "add", use_channelwise, False,
input_dtype=X_qdtype, output_dtype=X_qdtype, X2_scale=X2_scale, X2_zero_point=X2_zero_point)

@given(batch_size=st.integers(1, 3),
```
Please don't add new calls to hypothesis; it has caused a lot of flaky test errors in CI before. Can you change them to loops instead?
Thanks for the comments. I have replaced the hypothesis calls with for loops over itertools.product. Please take another look, @jerryzh168.
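For illustration, the conversion pattern looks roughly like this (parameter names and ranges are illustrative, not the exact ones in the test suite):

```python
import itertools

# Explicit parameter grid replacing hypothesis' @given sampling.
batch_size_opts = [1, 2, 3]
use_bias_opts = [True, False]
use_channelwise_opts = [True, False]

for batch_size, use_bias, use_channelwise in itertools.product(
        batch_size_opts, use_bias_opts, use_channelwise_opts):
    # Every combination is exercised deterministically on each CI run,
    # instead of a random sample drawn by hypothesis.
    print(batch_size, use_bias, use_channelwise)
```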
Hi @jerryzh168, thanks for your review and comments on this ghstack. The previous comments have all been addressed. Could you kindly take another look at this ghstack? There are still some PRs in this ghstack that may need your approval.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
I have a build error blamed on this PR when trying to build PyTorch on my machine: https://gist.github.com/vkuzo/43fa00f625fa099eb50b1e41bf8a9b2e, specifically the …
Is this related to the version of the ideep dependency?
@vkuzo Can you try updating your ideep version to match PyTorch master?
thanks, my …
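For anyone hitting the same build error, the usual way to bring a stale ideep checkout in line with what the current PyTorch revision pins is the standard git submodule commands, run from the root of the pytorch checkout:

```
git submodule sync
git submodule update --init --recursive  # includes third_party/ideep
```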
Stack from ghstack (oldest at bottom):
**Summary**
Post-op fusion can reduce data movement overhead and improve inference performance. This PR adds a fused conv2d_add_relu op for the onednn backend, to be used for int8 inference with the onednn backend. Calling this op with any other quantization backend throws an error.
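For reference, the unfused eager pattern this op targets looks like the sketch below (float tensors and shapes are illustrative; the real op takes quantized int8 tensors with scales and zero points):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)          # conv input
residual = torch.randn(1, 4, 8, 8)   # skip-connection tensor added after the conv
weight = torch.randn(4, 3, 3, 3)

# Unfused: three kernels, each writing its intermediate result to memory.
y = F.conv2d(x, weight, padding=1)
y = y + residual
y = F.relu(y)
# conv2d_add_relu computes the same composition in a single oneDNN
# kernel, avoiding the intermediate reads and writes.
```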
**Test Plan**
```
python -m pytest test_quantization.py::TestQuantizedConv
```
cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10