
Conversation

@Xia-Weiwen
Collaborator

@Xia-Weiwen Xia-Weiwen commented Nov 4, 2022

Stack from ghstack (oldest at bottom):

Summary
Post-op fusion can reduce data movement overhead and improve inference performance. This PR adds a fused linear-leaky_relu op for the onednn backend, which will be used for int8 inference with the onednn backend. Calling this op with any other quantization backend raises an error.

Test Plan
python test_quantization.py TestQuantizedLinear
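For context, here is a minimal usage sketch of how the fused op is expected to be called (not taken from this PR's code; the op name and argument order follow the schema added by this PR, the tensors are made up, and availability of the `onednn` engine depends on the build):

```python
import torch

torch.backends.quantized.engine = 'onednn'  # the fused op is onednn-only

# Quantize a toy input and weight, then prepack the weight (no bias).
x = torch.randn(2, 4)
w = torch.randn(3, 4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
qw = torch.quantize_per_tensor(w, scale=0.1, zero_point=0, dtype=torch.qint8)
packed_w = torch.ops.quantized.linear_prepack(qw, None)

# quantized::linear_leaky_relu(X, W_prepack, Y_scale_i, Y_zero_point_i, negative_slope)
out = torch.ops.quantized.linear_leaky_relu(qx, packed_w, 1.0, 0, 0.01)
```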

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @leslie-fang-intel @VitalyFedyunin @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot

pytorch-bot bot commented Nov 4, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88478

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c74af6a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the module: cpu (CPU specific problem, e.g., perf, algorithm) and oncall: quantization (Quantization support in PyTorch) labels Nov 4, 2022
Xia-Weiwen added a commit that referenced this pull request Nov 4, 2022
@Xia-Weiwen Xia-Weiwen added the intel (This tag is for PR from Intel) label Nov 4, 2022
@Xia-Weiwen Xia-Weiwen marked this pull request as draft November 4, 2022 08:28
@Xia-Weiwen Xia-Weiwen requested a review from jgong5 November 4, 2022 08:29
@Xia-Weiwen
Collaborator Author

This is part of the replacement for #76424, which is too big to land.

Xia-Weiwen added a commit that referenced this pull request Nov 8, 2022
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review November 8, 2022 03:46
@Xia-Weiwen Xia-Weiwen changed the title Add quantized linear-leaky_relu fusion for onednn backend [Quant] Add fued linear-leaky_relu op for onednn backend Nov 8, 2022
@Xia-Weiwen Xia-Weiwen changed the title [Quant] Add fued linear-leaky_relu op for onednn backend [Quant] Add fused linear-leaky_relu op for onednn backend Nov 8, 2022
@Xia-Weiwen
Collaborator Author

Also, for this one and future PRs, please add a "Summary" section that explains the motivation and context for the change, and a "Test Plan" section that explains how to run the tests, e.g. #88620.

Sure. I will add the description later.

I have added the summary and test plan.

**Summary**
Post-op fusion can reduce data movement overhead and improve inference performance. This PR adds a fused `linear-leaky_relu` op for the `onednn` backend, which will be used for int8 inference with the `onednn` backend. Calling this op with any other quantization backend raises an error.

**Test Plan**
python test_quantization.py TestQuantizedLinear




Contributor

@jerryzh168 jerryzh168 left a comment

looks good to me

@jerryzh168
Contributor

BTW, the Summary and Test Plan should appear after the stack.

@Xia-Weiwen
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Nov 24, 2022
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

Merge failed

Reason: 2 additional jobs have failed, first few of them are: trunk, trunk / cuda11.6-py3.10-gcc7-sm86 / test (default, 1, 4, linux.g5.4xlarge.nvidia.gpu)


@Xia-Weiwen
Collaborator Author

Xia-Weiwen commented Nov 25, 2022

Hi @jerryzh168. The separated test_qlinear and test_qlinear_relu tests failed, so the merge failed, but they pass when run locally. I also created a draft PR (#89678) that only separates the two tests, with no other changes, just for testing, but they failed there as well.
Do you have any idea why?

@Xia-Weiwen
Collaborator Author

Xia-Weiwen commented Dec 1, 2022

Hi @jerryzh168. The separated test_qlinear and test_qlinear_relu tests failed, so the merge failed, but they pass when run locally. I also created a draft PR (#89678) that only separates the two tests, with no other changes, just for testing, but they failed there as well. Do you have any idea why?

Hi @jerryzh168. I think it was due to a weird rounding error: Y_scale = 125.1234 caused the issue in fbgemm.


The tests were fine before we separated test_qlinear and test_qlinear_relu, but the issue appeared after the separation.
To avoid the issue and keep the test cases reasonable, I changed the scale value to 12.34. All checks have now passed. How does that sound to you?
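To illustrate the suspected failure mode, here is a small hypothetical sketch (the values are made up; the actual mismatch was a single off-by-one element in the fbgemm output): with a large output scale such as 125.1234, a result that lands almost exactly halfway between two quantized steps can round either way, while a smaller scale such as 12.34 keeps these particular values clear of the boundary.

```python
import torch

# Two nearly identical values that sit right at the rounding boundary for
# scale 125.1234 (62.5617 is exactly half the scale), but not for scale 12.34.
x = torch.tensor([62.5617, 62.57])
q_large = torch.quantize_per_tensor(x, scale=125.1234, zero_point=0, dtype=torch.quint8)
q_small = torch.quantize_per_tensor(x, scale=12.34, zero_point=0, dtype=torch.quint8)
# Near the halfway point the two inputs can land in different bins (0 vs. 1),
# while with the smaller scale both map to the same bin (5).
print(q_large.int_repr(), q_small.int_repr())
```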

@Xia-Weiwen
Collaborator Author

Hi @jerryzh168. It's been a while since your last review. Could you take another look? Do you have any more comments on this PR and the others? Thanks!

m.def(TORCH_SELECTIVE_SCHEMA("quantized::linear_relu_dynamic(Tensor X, __torch__.torch.classes.quantized.LinearPackedParamsBase W_prepack, bool reduce_range=False) -> Tensor Y"));
m.def(TORCH_SELECTIVE_SCHEMA("quantized::linear_dynamic_fp16(Tensor X, __torch__.torch.classes.quantized.LinearPackedParamsBase W_prepack) -> Tensor Y"));
m.def(TORCH_SELECTIVE_SCHEMA("quantized::linear_relu_dynamic_fp16(Tensor X, __torch__.torch.classes.quantized.LinearPackedParamsBase W_prepack) -> Tensor Y"));
m.def(TORCH_SELECTIVE_SCHEMA("quantized::linear_leaky_relu(Tensor X, __torch__.torch.classes.quantized.LinearPackedParamsBase W_prepack, float Y_scale_i, int Y_zero_point_i, float negative_slope) -> Tensor Y"));
Contributor

nit: what does i in Y_scale_i mean?

Collaborator Author

nit: what does i in Y_scale_i mean?

Good question. I don't know. I just copied the argument name from the declaration above so the schemas look aligned:

m.def(TORCH_SELECTIVE_SCHEMA("quantized::linear_relu(Tensor X, __torch__.torch.classes.quantized.LinearPackedParamsBase W_prepack, float Y_scale_i, int Y_zero_point_i) -> Tensor Y"));

Shall I keep it or remove it?
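For reference, a hypothetical sketch of where the argument name actually matters: positional callers are unaffected by the `_i` suffix, while a keyword call has to spell the schema names exactly (`qx` and `packed_w` below are assumed to be a quantized input and prepacked weights, as in the usage sketch near the top of this thread):

```python
# Positional call: the schema argument names are irrelevant.
out = torch.ops.quantized.linear_leaky_relu(qx, packed_w, 1.0, 0, 0.01)

# Keyword call: names must match the schema exactly, including the `_i` suffix.
out = torch.ops.quantized.linear_leaky_relu(
    qx, packed_w, Y_scale_i=1.0, Y_zero_point_i=0, negative_slope=0.01)
```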

@jerryzh168
Contributor

yeah LGTM

@jerryzh168
Contributor

The tests were fine before we separated test_qlinear and test_qlinear_relu, but the issue appeared after the separation.
To avoid the issue and keep the test cases reasonable, I changed the scale value to 12.34. All checks have now passed. How does that sound to you?

What is the error? Is it a numerical error or something else?

@Xia-Weiwen
Collaborator Author

Xia-Weiwen commented Dec 6, 2022

The tests were fine before we separated test_qlinear and test_qlinear_relu, but the issue appeared after the separation.
To avoid the issue and keep the test cases reasonable, I changed the scale value to 12.34. All checks have now passed. How does that sound to you?

What is the error? Is it a numerical error or something else?

Yes, it's a numerical error in fbgemm. There was only one mismatched value, with an absolute difference of 1, so it was probably a rounding issue.

@Xia-Weiwen
Collaborator Author

yeah LGTM

Thanks. Please also review the other PRs at your convenience.

@Xia-Weiwen
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Pull Request resolved: pytorch#88478
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
hasanyeganeh pushed a commit to hasanyeganeh/pytorch-pytorch that referenced this pull request Dec 21, 2022
@facebook-github-bot facebook-github-bot deleted the gh/Xia-Weiwen/1/head branch June 8, 2023 14:57

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), intel (This tag is for PR from Intel), Merged, module: cpu (CPU specific problem, e.g., perf, algorithm), oncall: quantization (Quantization support in PyTorch), open source, release notes: quantization (release notes category)

Projects

Status: Done


6 participants