
Conversation

@vkuzo (Contributor) commented Aug 3, 2020

Stack from ghstack:

Summary:

Hooks up quantized batchnorm_1d to the quantized_bn kernel. Eager mode
hookup will be in a future PR, and graph mode should work after this PR.
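
For intuition, the op's semantics are dequantize, then inference-mode batch norm, then requantize. A minimal NumPy sketch of that reference math (a hypothetical helper for illustration, not the fused quantized_bn kernel):

```python
import numpy as np

def ref_quantized_batch_norm_1d(q_x, scale_x, zp_x, gamma, beta,
                                mean, var, scale_y, zp_y, eps=1e-5):
    """Hypothetical reference semantics: dequantize -> inference-mode
    batch norm -> requantize to uint8. Not PyTorch's fused kernel.
    q_x has shape (N, C, L); gamma/beta/mean/var have shape (C,)."""
    x = (q_x.astype(np.float32) - zp_x) * scale_x            # dequantize
    inv_std = 1.0 / np.sqrt(var + eps)                       # per-channel
    y = (x - mean[None, :, None]) * inv_std[None, :, None]   # normalize
    y = y * gamma[None, :, None] + beta[None, :, None]       # affine
    q_y = np.round(y / scale_y) + zp_y                       # requantize
    return np.clip(q_y, 0, 255).astype(np.uint8)
```

The fused kernel avoids materializing the intermediate float tensor, which is where the performance benefit over running these steps separately comes from.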

Note: the current implementation is ~2x slower on the benchmark than q_batch_norm2d
because we convert back to contiguous memory format at the end, since
channels_last is only defined for rank >= 4. If further optimization is
needed, that can be a separate PR (we will need the NHWC folks to check whether
there is a workaround). Meanwhile, having this is better than having nothing.
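
The rank-promotion involved can be sketched as follows (an illustrative outline, not the PR's actual kernel code): since channels_last is only defined for rank >= 4, a rank-3 (N, C, L) input is viewed as (N, C, 1, L) so the 2d path applies, then converted back at the end.

```python
import torch

# Illustrative sketch: channels_last requires rank >= 4, so a rank-3
# (N, C, L) tensor is promoted to (N, C, 1, L) before the 2d path.
x = torch.randn(2, 3, 5)                         # (N, C, L)
x4d = x.unsqueeze(2)                             # (N, C, 1, L), rank 4
x4d = x4d.contiguous(memory_format=torch.channels_last)
# ... the quantized batch_norm2d kernel would run on x4d here ...
out = x4d.contiguous().squeeze(2)                # back to contiguous (N, C, L)
```

The final `.contiguous()` round-trip is the conversion back to contiguous memory format that accounts for the ~2x slowdown noted above.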

Context: There have been both internal and external requests for various
quantized BN1d use cases.

Test Plan:

```
python test/test_quantization.py TestQuantizedOps.test_batch_norm_1d_2d_3d
python test/test_quantization.py TestQuantizedOps.test_batch_norm_1d_2d_3d_relu
python test/test_quantization.py TestQuantizeJitOps.test_qbatch_norm
```

Performance benchmark: https://gist.github.com/vkuzo/73a07c0f24c05f5804990d9ebfaecf5e


Differential Revision: D22926254

@dr-ci bot commented Aug 3, 2020
💊 CI failures summary and remediations

As of commit 322626b (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed



vkuzo added a commit that referenced this pull request Aug 3, 2020
ghstack-source-id: e8890a9
Pull Request resolved: #42491
@vkuzo changed the title from "quant: add q_batchnorm_1d kernel" to "quant: add q_batchnorm_1d op" on Aug 4, 2020

A reviewer (Contributor) commented on the diff:

```diff
 @skipIfNoFBGEMM
-def test_batch_norm2d_relu(self):
+def test_batch_norm_1d_2d_3d_relu(self):
```

nit: maybe just `test_batch_norm_relu` and add dimensions in the docstring if needed.

@vkuzo (Contributor, Author) replied:

sure, sounds good


The same reviewer commented on the diff:

```diff
 @skipIfNoFBGEMM
-def test_batch_norm3d(self):
+def test_batch_norm_1d_2d_3d(self):
```

same here

@jerryzh168 (Contributor) left a review comment:

Looks good

@facebook-github-bot (Contributor) commented:

This pull request has been merged in 50f0d2b.

@facebook-github-bot deleted the gh/vkuzo/111/head branch on August 9, 2020 at 14:16
MauiDesign pushed a commit to MauiDesign/PyTorchPyTorch that referenced this pull request Aug 16, 2020
ghstack-source-id: 8c1e75b
Pull Request resolved: pytorch/pytorch#42491