quant: add q_batchnorm_1d op #42491
Conversation
Summary:

Hooks up quantized batchnorm_1d to the quantized_bn kernel. Eager mode hookup will be in a future PR, and graph mode should work after this PR.

Note: the implementation is currently ~2x slower on the benchmark than q_batch_norm2d because we convert back to contiguous memory format at the end, since channels_last is only defined for rank >= 4. If further optimization is needed, that can be a separate PR (it will need the NHWC folks to see if there is a workaround). Meanwhile, having this is better than not having anything.

Context: there have been both internal and external requests for various quantized BN1d use cases.

Test Plan:

```
python test/test_quantization.py TestQuantizedOps.test_batch_norm_1d_2d_3d
python test/test_quantization.py TestQuantizedOps.test_batch_norm_1d_2d_3d_relu
python test/test_quantization.py TestQuantizeJitOps.test_qbatch_norm
```

Performance: https://gist.github.com/vkuzo/73a07c0f24c05f5804990d9ebfaecf5e
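For readers unfamiliar with the rank constraint mentioned in the note above, the sketch below illustrates the general idea of routing a rank-3 quantized input through a rank-4 kernel. This is a simplified Python sketch, not the actual ATen implementation; the op signature of `torch.ops.quantized.batch_norm2d` is assumed to match the existing 2d path.

```python
# Simplified sketch (assumption: this mirrors the idea in the summary, not the
# actual ATen kernel). A rank-3 quantized input (N, C, L) gets a dummy spatial
# dimension so the rank-4 kernel can use its channels_last path; at the end the
# dummy dimension is dropped and the tensor is made contiguous again -- that
# extra copy is the source of the ~2x slowdown noted above.
import torch

def qbatch_norm1d_via_2d(qx, weight, bias, mean, var, eps, out_scale, out_zero_point):
    qx_4d = qx.unsqueeze(2)  # (N, C, L) -> (N, C, 1, L)
    qy_4d = torch.ops.quantized.batch_norm2d(
        qx_4d, weight, bias, mean, var, eps, out_scale, out_zero_point)
    # channels_last is only defined for rank >= 4, so after squeezing back to
    # rank 3 we convert to contiguous memory format.
    return qy_4d.squeeze(2).contiguous()
```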
💊 Dr. CI failures summary as of commit 322626b: ci.pytorch.org — 1 failed.
 @skipIfNoFBGEMM
-def test_batch_norm2d_relu(self):
+def test_batch_norm_1d_2d_3d_relu(self):
nit: maybe just test_batch_norm_relu and add dimensions in the docstring if needed.
sure, sounds good
 @skipIfNoFBGEMM
-def test_batch_norm3d(self):
+def test_batch_norm_1d_2d_3d(self):
same here
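To make the reviewer's suggestion concrete, a single combined test might look roughly like the sketch below. This is hypothetical and not the code in test/test_quantization.py; the op names follow the quantized::batch_norm{1d,2d,3d} pattern, and the shapes and tolerances are made up for illustration.

```python
# Hypothetical sketch of one test covering the 1d/2d/3d cases by looping over
# input ranks, quantizing, running the quantized op, and comparing against a
# float reference computed on the dequantized input.
import torch
import torch.nn.functional as F

def _check_quantized_batch_norm(shape, q_op):
    c = shape[1]
    x = torch.randn(shape)
    weight, bias = torch.rand(c), torch.rand(c)
    mean, var = torch.rand(c), torch.rand(c) + 0.1
    scale, zero_point = 0.1, 128

    qx = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)
    y_ref = F.batch_norm(qx.dequantize(), mean, var, weight, bias, eps=1e-5)

    qy = q_op(qx, weight, bias, mean, var, 1e-5, scale, zero_point)
    # Loose tolerance to absorb output quantization and rounding differences.
    assert torch.allclose(qy.dequantize(), y_ref, atol=2 * scale)

for shape, q_op in [((2, 4, 8), torch.ops.quantized.batch_norm1d),
                    ((2, 4, 8, 8), torch.ops.quantized.batch_norm2d),
                    ((2, 4, 4, 8, 8), torch.ops.quantized.batch_norm3d)]:
    _check_quantized_batch_norm(shape, q_op)
```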
jerryzh168 left a comment:
Looks good
Summary: Hooks up quantized batchnorm_1d to the quantized_bn kernel. Eager mode hookup will be in a future PR, and graph mode should work after this PR. Note: currently the implementation is ~2x slower on the benchmark than q_batch_norm2d because we convert back to contiguous memory format at the end, since channels_last is only defined for rank >= 4. If further optimization is needed, that can be a separate PR (will need the NHWC folks to see if there is a workaround). Meanwhile, having this is better than not having anything. Context: There have been both internal and external requests for various quantized BN1d use cases. Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_batch_norm_1d_2d_3d python test/test_quantization.py TestQuantizedOps.test_batch_norm_1d_2d_3d_relu python test/test_quantization.py TestQuantizeJitOps.test_qbatch_norm // performance: // https://gist.github.com/vkuzo/73a07c0f24c05f5804990d9ebfaecf5e ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D22926254](https://our.internmc.facebook.com/intern/diff/D22926254) [ghstack-poisoned]
This pull request has been merged in 50f0d2b.
Differential Revision: [D22926254](https://our.internmc.facebook.com/intern/diff/D22926254)