Conversation

@Xia-Weiwen
Collaborator

@Xia-Weiwen Xia-Weiwen commented Dec 17, 2022

Stack from ghstack (oldest at bottom):

Reopen of #90354

Summary
The oneDNN quantization backend switches to the new API in third_party/ideep.

  • The struct forward_params for conv/deconv has changed; the primitive cache is modified accordingly.
  • The new versions of the prepare and compute APIs are used, which separate the fp32 and int8 paths; the old versions will be deprecated.
  • ideep::tensor::reorder_if_differ_in now supports block-to-block reorder, so it is used instead of defining the utility function onednn_utils::try_reorder.
  • The new transposed-convolution API provides a flag that keeps the weight desc aligned with oneDNN, so the weight no longer needs to be transposed explicitly in PyTorch.
  • The is_channels_last flag is used to specify the layout of src/dst when querying the expected weight desc.

This change does not affect correctness, and performance should be unaffected or slightly better.
The FBGEMM and QNNPACK backends are not affected.
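For context, here is a minimal sketch (not part of this PR) of how the onednn backend is selected and exercised from Python; the model and shapes are arbitrary placeholders, and the snippet assumes a PyTorch build with the onednn engine enabled:

```python
import torch
import torch.ao.quantization as tq

# Route quantized ops through the oneDNN backend this PR touches.
torch.backends.quantized.engine = 'onednn'

model = torch.nn.Sequential(
    tq.QuantStub(),
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.ReLU(),
    tq.DeQuantStub(),
).eval()

# Static (post-training) quantization with the onednn qconfig.
model.qconfig = tq.get_default_qconfig('onednn')
prepared = tq.prepare(model)
prepared(torch.randn(1, 3, 32, 32))         # one calibration pass
quantized = tq.convert(prepared)
out = quantized(torch.randn(1, 3, 32, 32))  # int8 conv runs in the backend
```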

Performance results are given below.

  1. End-to-end performance of static quantized models (from torchvision)
     (throughput: fps, higher is better)
     ![image](https://user-images.githubusercontent.com/12522207/206105879-45c59996-9804-4531-aa1f-dc962e6db5ab.png)

  2. Op benchmark of dynamic quantized linear
     (latency: ms, lower is better; a timing sketch follows this list)
     ![image](https://user-images.githubusercontent.com/12522207/206124949-77352991-0fda-4285-a484-e20a5797262b.png)
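As a reference for item 2, a hedged timing sketch for a dynamically quantized linear layer; the layer size and iteration counts are illustrative, not the configuration behind the chart above:

```python
import time
import torch

torch.backends.quantized.engine = 'onednn'  # assumes onednn is available

# Dynamic quantization: int8 weights, activations quantized at runtime.
fp32 = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).eval()
dyn = torch.ao.quantization.quantize_dynamic(
    fp32, {torch.nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 1024)
for _ in range(10):                         # warm-up
    dyn(x)
start = time.perf_counter()
for _ in range(100):
    dyn(x)
print(f'latency: {(time.perf_counter() - start) / 100 * 1e3:.3f} ms')
```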

Test method & env:

  • Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
  • Multiple instances are run on a single node, with one core per instance (a throughput-measurement sketch follows this list).
  • jemalloc and Intel OpenMP are used.
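
Below is a hedged sketch of the per-instance throughput loop implied by this setup. Core pinning, jemalloc (via LD_PRELOAD), and Intel OpenMP are configured at the process level and are not shown; measure_fps is a hypothetical helper, not a utility from this PR:

```python
import time
import torch

# One core per instance: limit intra-op threads; pinning is done externally.
torch.set_num_threads(1)

def measure_fps(model, example, iters=100):
    with torch.no_grad():
        model(example)                      # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            model(example)
    return iters / (time.perf_counter() - start)
```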

Test plan
python test/test_quantization.py

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot

pytorch-bot bot commented Dec 17, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91056

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e166522:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: quantization label Dec 17, 2022
Xia-Weiwen added a commit that referenced this pull request Dec 17, 2022
@github-actions github-actions bot added the module: cpu label Dec 17, 2022
…acting performance"



> Reopen of #90354

**Summary**
Onednn quantization backend switch to new API in `third_party/ideep`.
- `struct forward_params` for conv/deconv are changed. Modify primitive cache accordingly.
- Use new versions of `prepare` and `compute` API. Fp32 and int8 paths separated. The old ones will be deprecated.
- Now `ideep::tensor::reorder_if_differ_in` supports block-to-block reorder. Use it instead of defining a util function `onednn_utils::try_reorder`.
- For new API of transposed convolution, we can use a flag to keep weight desc align with oneDNN thus needless to transpose it explicitly in PyTorch.
- Use `is_channels_last` flag to specify layout of src/dst when querying expected weight desc.

It won't impact correctness. Performance should be unaffected or slightly better.
FBGEMM and QNNPACK backends are not affected.

Performance results are given below.
1. End-to-end performance of static quantized models (from torchvision)
(throughput: fps, higher is better)
![image](https://user-images.githubusercontent.com/12522207/206105879-45c59996-9804-4531-aa1f-dc962e6db5ab.png)

2. Op benchmark of dynamic quantized linear
(Latency: ms, lower is better)
![image](https://user-images.githubusercontent.com/12522207/206124949-77352991-0fda-4285-a484-e20a5797262b.png)

Test method & env:
- Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
- Run multi-instances on a single node. Use one core for each instance.
- Use Jemalloc and Intel OpenMP

**Test plan**
python test/test_quantization.py


cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
Xia-Weiwen added a commit that referenced this pull request Dec 18, 2022
@Xia-Weiwen Xia-Weiwen added the intel and ciflow/trunk labels Dec 18, 2022
…acting performance"



> Reopen of #90354

**Summary**
Onednn quantization backend switch to new API in `third_party/ideep`.
- `struct forward_params` for conv/deconv are changed. Modify primitive cache accordingly.
- Use new versions of `prepare` and `compute` API. Fp32 and int8 paths separated. The old ones will be deprecated.
- Now `ideep::tensor::reorder_if_differ_in` supports block-to-block reorder. Use it instead of defining a util function `onednn_utils::try_reorder`.
- For new API of transposed convolution, we can use a flag to keep weight desc align with oneDNN thus needless to transpose it explicitly in PyTorch.
- Use `is_channels_last` flag to specify layout of src/dst when querying expected weight desc.

It won't impact correctness. Performance should be unaffected or slightly better.
FBGEMM and QNNPACK backends are not affected.

Performance results are given below.
1. End-to-end performance of static quantized models (from torchvision)
(throughput: fps, higher is better)
![image](https://user-images.githubusercontent.com/12522207/206105879-45c59996-9804-4531-aa1f-dc962e6db5ab.png)

2. Op benchmark of dynamic quantized linear
(Latency: ms, lower is better)
![image](https://user-images.githubusercontent.com/12522207/206124949-77352991-0fda-4285-a484-e20a5797262b.png)

Test method & env:
- Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
- Run multi-instances on a single node. Use one core for each instance.
- Use Jemalloc and Intel OpenMP

**Test plan**
python test/test_quantization.py


cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
Xia-Weiwen added a commit that referenced this pull request Jan 16, 2023
…acting performance"



> Reopen of #90354

**Summary**
Onednn quantization backend switch to new API in `third_party/ideep`.
- `struct forward_params` for conv/deconv are changed. Modify primitive cache accordingly.
- Use new versions of `prepare` and `compute` API. Fp32 and int8 paths separated. The old ones will be deprecated.
- Now `ideep::tensor::reorder_if_differ_in` supports block-to-block reorder. Use it instead of defining a util function `onednn_utils::try_reorder`.
- For new API of transposed convolution, we can use a flag to keep weight desc align with oneDNN thus needless to transpose it explicitly in PyTorch.
- Use `is_channels_last` flag to specify layout of src/dst when querying expected weight desc.

It won't impact correctness. Performance should be unaffected or slightly better.
FBGEMM and QNNPACK backends are not affected.

Performance results are given below.
1. End-to-end performance of static quantized models (from torchvision)
(throughput: fps, higher is better)
![image](https://user-images.githubusercontent.com/12522207/206105879-45c59996-9804-4531-aa1f-dc962e6db5ab.png)

2. Op benchmark of dynamic quantized linear
(Latency: ms, lower is better)
![image](https://user-images.githubusercontent.com/12522207/206124949-77352991-0fda-4285-a484-e20a5797262b.png)

Test method & env:
- Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
- Run multi-instances on a single node. Use one core for each instance.
- Use Jemalloc and Intel OpenMP

**Test plan**
python test/test_quantization.py


cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
Xia-Weiwen added a commit that referenced this pull request Jan 16, 2023
@Xia-Weiwen
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@malfet
Contributor

malfet commented Jan 19, 2023

Hmm, this seems to be using new iDeep APIs, why was it landed before internal update?

@Xia-Weiwen
Collaborator Author

Hmm, this seems to be using new iDeep APIs, why was it landed before internal update?

Hi @malfet. Since the double-checkout issue has been solved by #92239, I thought it would be OK to land this. If it is breaking something, please go ahead and revert it.
BTW, how can I know when the internal update is done? Thanks.

@facebook-github-bot facebook-github-bot deleted the gh/Xia-Weiwen/9/head branch June 8, 2023 14:58