Conversation

@Xia-Weiwen
Collaborator

@Xia-Weiwen Xia-Weiwen commented Dec 17, 2022

Stack from ghstack (oldest at bottom):

Reopen of #90354

Summary
The oneDNN quantization backend switches to the new API in third_party/ideep.

  • The struct forward_params for conv/deconv has changed; the primitive cache is modified accordingly.
  • The new versions of the prepare and compute APIs are used, which separate the fp32 and int8 paths; the old versions will be deprecated.
  • ideep::tensor::reorder_if_differ_in now supports block-to-block reorder, so it is used instead of defining the utility function onednn_utils::try_reorder.
  • The new transposed-convolution API provides a flag that keeps the weight desc aligned with oneDNN, so the weight no longer needs to be transposed explicitly in PyTorch.
  • The is_channels_last flag is used to specify the layout of src/dst when querying the expected weight desc.

This change does not affect correctness, and performance should be unaffected or slightly better.
The FBGEMM and QNNPACK backends are not affected.
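For context, here is a minimal sketch (not part of this PR) of how the onednn backend is selected and exercised from Python; the model and shapes are arbitrary placeholders, and the snippet assumes a PyTorch build with the onednn engine enabled:

```python
import torch
import torch.ao.quantization as tq

# Route quantized ops through the oneDNN backend this PR touches.
torch.backends.quantized.engine = 'onednn'

model = torch.nn.Sequential(
    tq.QuantStub(),
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.ReLU(),
    tq.DeQuantStub(),
).eval()

# Static (post-training) quantization with the onednn qconfig.
model.qconfig = tq.get_default_qconfig('onednn')
prepared = tq.prepare(model)
prepared(torch.randn(1, 3, 32, 32))         # one calibration pass
quantized = tq.convert(prepared)
out = quantized(torch.randn(1, 3, 32, 32))  # int8 conv runs in the backend
```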

Performance results are given below.

  1. End-to-end performance of static quantized models (from torchvision)
     (throughput: fps, higher is better)
     ![image](https://user-images.githubusercontent.com/12522207/206105879-45c59996-9804-4531-aa1f-dc962e6db5ab.png)

  2. Op benchmark of dynamic quantized linear
     (latency: ms, lower is better; a timing sketch follows this list)
     ![image](https://user-images.githubusercontent.com/12522207/206124949-77352991-0fda-4285-a484-e20a5797262b.png)
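As a reference for item 2, a hedged timing sketch for a dynamically quantized linear layer; the layer size and iteration counts are illustrative, not the configuration behind the chart above:

```python
import time
import torch

torch.backends.quantized.engine = 'onednn'  # assumes onednn is available

# Dynamic quantization: int8 weights, activations quantized at runtime.
fp32 = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).eval()
dyn = torch.ao.quantization.quantize_dynamic(
    fp32, {torch.nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 1024)
for _ in range(10):                         # warm-up
    dyn(x)
start = time.perf_counter()
for _ in range(100):
    dyn(x)
print(f'latency: {(time.perf_counter() - start) / 100 * 1e3:.3f} ms')
```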

Test method & env:

  • Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
  • Multiple instances are run on a single node, with one core per instance (a throughput-measurement sketch follows this list).
  • jemalloc and Intel OpenMP are used.
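
Below is a hedged sketch of the per-instance throughput loop implied by this setup. Core pinning, jemalloc (via LD_PRELOAD), and Intel OpenMP are configured at the process level and are not shown; measure_fps is a hypothetical helper, not a utility from this PR:

```python
import time
import torch

# One core per instance: limit intra-op threads; pinning is done externally.
torch.set_num_threads(1)

def measure_fps(model, example, iters=100):
    with torch.no_grad():
        model(example)                      # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            model(example)
    return iters / (time.perf_counter() - start)
```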

Test plan
python test/test_quantization.py

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot

pytorch-bot bot commented Dec 17, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91056

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e166522:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: quantization label Dec 17, 2022
Xia-Weiwen added a commit that referenced this pull request Dec 17, 2022
@github-actions github-actions bot added the module: cpu label Dec 17, 2022
…acting performance"



> Reopen of #90354

**Summary**
Onednn quantization backend switch to new API in `third_party/ideep`.
- `struct forward_params` for conv/deconv are changed. Modify primitive cache accordingly.
- Use new versions of `prepare` and `compute` API. Fp32 and int8 paths separated. The old ones will be deprecated.
- Now `ideep::tensor::reorder_if_differ_in` supports block-to-block reorder. Use it instead of defining a util function `onednn_utils::try_reorder`.
- For new API of transposed convolution, we can use a flag to keep weight desc align with oneDNN thus needless to transpose it explicitly in PyTorch.
- Use `is_channels_last` flag to specify layout of src/dst when querying expected weight desc.

It won't impact correctness. Performance should be unaffected or slightly better.
FBGEMM and QNNPACK backends are not affected.

Performance results are given below.
1. End-to-end performance of static quantized models (from torchvision)
(throughput: fps, higher is better)
![image](https://user-images.githubusercontent.com/12522207/206105879-45c59996-9804-4531-aa1f-dc962e6db5ab.png)

2. Op benchmark of dynamic quantized linear
(Latency: ms, lower is better)
![image](https://user-images.githubusercontent.com/12522207/206124949-77352991-0fda-4285-a484-e20a5797262b.png)

Test method & env:
- Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
- Run multi-instances on a single node. Use one core for each instance.
- Use Jemalloc and Intel OpenMP

**Test plan**
python test/test_quantization.py


cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
Xia-Weiwen added a commit that referenced this pull request Dec 18, 2022
@Xia-Weiwen Xia-Weiwen added the intel and ciflow/trunk labels Dec 18, 2022
…acting performance"



> Reopen of #90354

**Summary**
Onednn quantization backend switch to new API in `third_party/ideep`.
- `struct forward_params` for conv/deconv are changed. Modify primitive cache accordingly.
- Use new versions of `prepare` and `compute` API. Fp32 and int8 paths separated. The old ones will be deprecated.
- Now `ideep::tensor::reorder_if_differ_in` supports block-to-block reorder. Use it instead of defining a util function `onednn_utils::try_reorder`.
- For new API of transposed convolution, we can use a flag to keep weight desc align with oneDNN thus needless to transpose it explicitly in PyTorch.
- Use `is_channels_last` flag to specify layout of src/dst when querying expected weight desc.

It won't impact correctness. Performance should be unaffected or slightly better.
FBGEMM and QNNPACK backends are not affected.

Performance results are given below.
1. End-to-end performance of static quantized models (from torchvision)
(throughput: fps, higher is better)
![image](https://user-images.githubusercontent.com/12522207/206105879-45c59996-9804-4531-aa1f-dc962e6db5ab.png)

2. Op benchmark of dynamic quantized linear
(Latency: ms, lower is better)
![image](https://user-images.githubusercontent.com/12522207/206124949-77352991-0fda-4285-a484-e20a5797262b.png)

Test method & env:
- Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
- Run multi-instances on a single node. Use one core for each instance.
- Use Jemalloc and Intel OpenMP

**Test plan**
python test/test_quantization.py


cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
Xia-Weiwen added a commit that referenced this pull request Jan 16, 2023
…acting performance"



> Reopen of #90354

**Summary**
Onednn quantization backend switch to new API in `third_party/ideep`.
- `struct forward_params` for conv/deconv are changed. Modify primitive cache accordingly.
- Use new versions of `prepare` and `compute` API. Fp32 and int8 paths separated. The old ones will be deprecated.
- Now `ideep::tensor::reorder_if_differ_in` supports block-to-block reorder. Use it instead of defining a util function `onednn_utils::try_reorder`.
- For new API of transposed convolution, we can use a flag to keep weight desc align with oneDNN thus needless to transpose it explicitly in PyTorch.
- Use `is_channels_last` flag to specify layout of src/dst when querying expected weight desc.

It won't impact correctness. Performance should be unaffected or slightly better.
FBGEMM and QNNPACK backends are not affected.

Performance results are given below.
1. End-to-end performance of static quantized models (from torchvision)
(throughput: fps, higher is better)
![image](https://user-images.githubusercontent.com/12522207/206105879-45c59996-9804-4531-aa1f-dc962e6db5ab.png)

2. Op benchmark of dynamic quantized linear
(Latency: ms, lower is better)
![image](https://user-images.githubusercontent.com/12522207/206124949-77352991-0fda-4285-a484-e20a5797262b.png)

Test method & env:
- Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
- Run multi-instances on a single node. Use one core for each instance.
- Use Jemalloc and Intel OpenMP

**Test plan**
python test/test_quantization.py


cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
Xia-Weiwen added a commit that referenced this pull request Jan 16, 2023
@Xia-Weiwen
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@malfet
Contributor

malfet commented Jan 19, 2023

Hmm, this seems to be using new iDeep APIs, why was it landed before internal update?

@Xia-Weiwen
Collaborator Author

Hmm, this seems to be using new iDeep APIs, why was it landed before internal update?

Hi @malfet. Since the double-checkout issue has been solved by #92239, I thought it would be OK to land this. If it is breaking something, please go ahead and revert it.
BTW, how can I know when the internal update is done? Thanks.

@facebook-github-bot facebook-github-bot deleted the gh/Xia-Weiwen/9/head branch June 8, 2023 14:58