[Quant] Add fused conv2d_add op for onednn backend #90262
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/90262. Note: links to docs will display an error until the docs builds have completed.
✅ No failures as of commit 8baef3e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
Y_zero_point=st.integers(0, 4),
use_bias=st.booleans(),
use_relu=st.booleans(),
post_op=st.sampled_from(["none", "relu"]),
```
This could be a separate PR, but it might make sense to split the conv and conv_relu tests as well.
Thanks for the suggestion. I have split the conv and conv_relu tests, and done the same for the other similar test cases. A minimal sketch of the resulting shape is shown below.
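For context, here is a minimal sketch of the split: each variant becomes its own `@given`-decorated test that forwards a fixed `post_op` to a shared helper. The class and helper names here are illustrative, not the actual ones in `test_quantized_op.py`:

```python
from hypothesis import given, strategies as st
from torch.testing._internal.common_utils import TestCase, run_tests


class TestQConvSplitSketch(TestCase):
    def _check_qconv(self, post_op, use_bias):
        # Stand-in for the shared verification logic in the real suite
        # (building inputs, prepacking weights, comparing against a
        # reference implementation).
        self.assertIn(post_op, ("none", "relu"))

    @given(use_bias=st.booleans())
    def test_qconv2d(self, use_bias):
        # Plain quantized conv, no fused post op.
        self._check_qconv(post_op="none", use_bias=use_bias)

    @given(use_bias=st.booleans())
    def test_qconv2d_relu(self, use_bias):
        # Quantized conv with fused relu.
        self._check_qconv(post_op="relu", use_bias=use_bias)


if __name__ == "__main__":
    run_tests()
```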
```python
Y_zero_point=st.sampled_from([0]),
use_bias=st.booleans(),
use_relu=st.booleans(),
post_op=st.sampled_from(["none", "relu"]),
```
same here
Thanks for the suggestion; I have split these into separate tests as well.
| if post_op == "add": | ||
| qconv = torch.ops.quantized.conv2d_add |
If this test only covers "add", we can remove the post_op argument and this check.
Thanks for the suggestion. I have removed the post_op argument and the check. In the next PR, I will put conv2d_add_relu into a separate test.
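With the post_op argument gone, the add test can bind the fused op unconditionally instead of branching on a string flag; a minimal sketch:

```python
import torch

# The test now targets the fused op directly.
qconv = torch.ops.quantized.conv2d_add
```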
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
**Summary**
Post-op fusion can reduce data movement overhead and improve inference performance. This PR adds a fused `conv2d_add` op for the onednn backend, to be used for int8 inference with that backend. Calling this op with any other quantization backend raises an error.

**Test Plan**
`python -m pytest test_quantization.py::TestQuantizedConv`
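For illustration only, here is a rough sketch of how the fused op might be exercised. The `conv2d_add` argument order shown (quantized input, quantized accumulation tensor, prepacked weight, output scale, output zero point) is an assumption modeled on the existing `quantized.conv2d` schema, not taken from this PR; consult the op registration for the actual signature.

```python
import torch

# The fused op targets the onednn backend only; per the PR summary,
# calling it under other quantization backends raises an error.
torch.backends.quantized.engine = "onednn"

x = torch.randn(1, 3, 8, 8)
accum = torch.randn(1, 4, 8, 8)  # tensor added onto the conv output
w = torch.randn(4, 3, 3, 3)
b = torch.randn(4)

qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
qaccum = torch.quantize_per_tensor(accum, scale=0.1, zero_point=0, dtype=torch.quint8)
qw = torch.quantize_per_tensor(w, scale=0.05, zero_point=0, dtype=torch.qint8)

# conv2d_prepack(weight, bias, stride, padding, dilation, groups)
packed = torch.ops.quantized.conv2d_prepack(qw, b, [1, 1], [1, 1], [1, 1], 1)

# Assumed argument order -- check the schema added in this PR.
qy = torch.ops.quantized.conv2d_add(qx, qaccum, packed, 0.2, 0)
```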
cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen