[Quant] Add fused linear-leaky_relu op for onednn backend #88478
Conversation
🔗 See artifacts and rendered test results at hud.pytorch.org/pr/88478
✅ No failures as of commit c74af6a. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
This is part of the replacement of #76424, which is too big to land.
I have added the summary and test plan.
jerryzh168 left a comment:
looks good to me
btw, the Summary and Test Plan should appear after the stack.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 additional jobs have failed; the first few of them are: trunk, trunk / cuda11.6-py3.10-gcc7-sm86 / test (default, 1, 4, linux.g5.4xlarge.nvidia.gpu). Details for the Dev Infra team: raised by workflow job.
Hi @jerryzh168, it was due to some weird rounding errors, I think.
The tests passed before we separated test_qlinear and test_qlinear_relu, but the issue occurred after the separation. To avoid the issue and keep the test cases reasonable, I changed the scale value to 12.34. Now all checks have passed. How does that sound to you?
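For context, a minimal sketch (not taken from the PR's test code, values chosen purely for illustration) of how a mismatch of exactly 1 can appear: when a real value lands exactly halfway between two quantized levels, different rounding modes disagree by one unit.

```python
import torch

# With scale = 0.5, the value 1.25 maps to 1.25 / 0.5 = 2.5, exactly halfway
# between quantized levels 2 and 3.
x = torch.tensor([1.25])
q = torch.quantize_per_tensor(x, scale=0.5, zero_point=0, dtype=torch.qint8)

# PyTorch rounds half to even, so the stored integer is 2; a backend that
# rounds half away from zero would store 3 (an absolute difference of 1).
print(q.int_repr())  # tensor([2], dtype=torch.int8)
```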
Hi @jerryzh168, it's been a while since your last review. Could you take another look? Do you have more comments on this PR and the others? Thanks!
| m.def(TORCH_SELECTIVE_SCHEMA("quantized::linear_relu_dynamic(Tensor X, __torch__.torch.classes.quantized.LinearPackedParamsBase W_prepack, bool reduce_range=False) -> Tensor Y")); | ||
| m.def(TORCH_SELECTIVE_SCHEMA("quantized::linear_dynamic_fp16(Tensor X, __torch__.torch.classes.quantized.LinearPackedParamsBase W_prepack) -> Tensor Y")); | ||
| m.def(TORCH_SELECTIVE_SCHEMA("quantized::linear_relu_dynamic_fp16(Tensor X, __torch__.torch.classes.quantized.LinearPackedParamsBase W_prepack) -> Tensor Y")); | ||
| m.def(TORCH_SELECTIVE_SCHEMA("quantized::linear_leaky_relu(Tensor X, __torch__.torch.classes.quantized.LinearPackedParamsBase W_prepack, float Y_scale_i, int Y_zero_point_i, float negative_slope) -> Tensor Y")); |
nit: what does `i` in `Y_scale_i` mean?
> nit: what does `i` in `Y_scale_i` mean?
Good question; I don't know. I just copied the arg name from the line above so the names look aligned:
| m.def(TORCH_SELECTIVE_SCHEMA("quantized::linear_relu(Tensor X, __torch__.torch.classes.quantized.LinearPackedParamsBase W_prepack, float Y_scale_i, int Y_zero_point_i) -> Tensor Y")); |
Shall I keep it or remove it?
Yeah, LGTM.
What is the error? Is it a numerical error or something else?
Yes, it's a numerical error from fbgemm. There was only one mismatched value, with an absolute difference of 1, so it was probably a rounding issue.
Thanks. Please also review the other PRs at your convenience.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#88478
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
Stack from ghstack (oldest at bottom):
**Summary**

Post op fusion can reduce data movement overhead and improve inference performance. This PR adds a fused `linear-leaky_relu` op for the `onednn` backend, which will be used for int8 inference with the `onednn` backend. Calling this op with any other quantization backend throws an error.

**Test Plan**

`python test_quantization.py TestQuantizedLinear`
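For reference, a minimal usage sketch of the new op. The call signature follows the schema registered in this PR; the shapes, scales, and zero points are illustrative assumptions (not values from the PR's tests), and it assumes a PyTorch build where the `onednn` quantization backend is available:

```python
import torch

# The fused op is only registered for the onednn backend; other quantization
# backends raise an error.
torch.backends.quantized.engine = "onednn"

x = torch.randn(4, 8)
w = torch.randn(16, 8)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=128, dtype=torch.quint8)
qw = torch.quantize_per_tensor(w, scale=0.05, zero_point=0, dtype=torch.qint8)

# Prepack the weight, then run linear + leaky_relu as a single fused op.
w_packed = torch.ops.quantized.linear_prepack(qw, None)  # bias = None
qy = torch.ops.quantized.linear_leaky_relu(
    qx, w_packed,
    0.2,    # Y_scale_i: output scale
    128,    # Y_zero_point_i: output zero point
    0.01,   # negative_slope of leaky_relu
)
print(qy.dequantize().shape)  # torch.Size([4, 16])
```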
cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @leslie-fang-intel @VitalyFedyunin @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10