Add ONEDNN quantization backend #69820
Conversation
⚛️ CI Flow Status
You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun
# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
For more information, please take a look at the CI Flow Wiki.
Force-pushed from 5a30331 to 6c45802
💊 CI failures summary and remediations
As of commit 8a40b8c (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚
This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
Force-pushed from ebf6c82 to 0543f13
Hi @jerryzh168 @VitalyFedyunin, please review this PR. Thanks.

The failure does not seem to be caused by this patch.
Force-pushed from ac072a4 to 07777f7
…e=True` in qconfig for unit test TestQuantizedOps.test_custom_module_multi_head_attention. Skip unsupported tests (output padding for deconv)
Force-pushed from 07777f7 to 8bfa256
Now all checks have passed. Please review. Thanks.
```python
# ONEDNN only supports symmetric quantization of weight
if torch.backends.quantized.engine == 'onednn':
    W_q = torch.quantize_per_tensor(W, 0.1, 0, torch.qint8)
```
would L100 error out since it's not symmetric quantization?
maybe we can select scale/zero_point based on qengine instead of hardcoding them here
I think L100 is OK.

> maybe we can select scale/zero_point based on qengine instead of hardcoding them here

Do you mean something like this?
`zp_weight = 0 if qengine_is_onednn() else torch.randint(1, 10, (1,)).item()`
yes exactly
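A minimal sketch of that suggestion, picking the weight zero point from the active engine instead of hardcoding it (`weight_qparams` is a hypothetical helper for illustration, not part of the test suite):

```python
# Sketch only: onednn requires symmetric weight quantization (zero_point == 0),
# while fbgemm/qnnpack also accept non-zero weight zero points.
import torch

def weight_qparams():  # hypothetical helper, not the actual test code
    scale = 0.1
    if torch.backends.quantized.engine == 'onednn':
        zero_point = 0
    else:
        zero_point = int(torch.randint(1, 10, (1,)).item())
    return scale, zero_point

W = torch.rand(4, 8)
scale, zero_point = weight_qparams()
W_q = torch.quantize_per_tensor(W, scale, zero_point, torch.qint8)
```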
```python
             X_scale, X_zero_point, W_scale, W_zero_point, Y_scale, Y_zero_point,
             use_bias, use_fused, use_channelwise):
    # ONEDNN only supports symmetric quantization of weight
    if torch.backends.quantized.engine == 'onednn' and not all(zp == 0 for zp in W_zero_point):
```
if we select scale/zero_point based on qengine we won't need this check
Do you mean to select the proper weight scale/zp with an `if ... else ...` in each unit test?
Yes, this function is called from https://github.com/pytorch/pytorch/blob/master/test/quantization/core/test_quantized_module.py#L387, so we can generate the scale/zp based on the engine. The current implementation will just skip the check, I think.
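A possible shape for that, sketched against the quoted signature (illustrative only; the surrounding test harness is omitted and the variable names simply follow the hunk above):

```python
# Illustrative only: generate per-channel weight zero points at the call site
# based on the active engine, so the shared helper no longer needs to skip the check.
import torch

out_channels = 4
if torch.backends.quantized.engine == 'onednn':
    W_zero_point = [0] * out_channels                      # symmetric weights only
else:
    W_zero_point = torch.randint(-5, 5, (out_channels,)).tolist()
```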
```python
W = torch.rand(out_features, in_features).float()
W_scale, W_zp = _calculate_dynamic_qparams(W, torch.qint8)
# ONEDNN only supports symmetric quantization of weight
if torch.backends.quantized.engine == 'onednn' and W_zp != 0:
```
same here
Here, weight scale and zero point are calculated, not selected manually. Do you mean we need a new function to calculate weight scale/zero point for symmetric quantization?
Yes, I think so. We can add a qscheme argument to https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_quantized.py#L49 to support symmetric quantization, and set the qscheme to symmetric quantization when the qengine is mkldnn/onednn.
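A rough sketch of what such a qscheme-aware helper could look like (hypothetical function, not the actual `_calculate_dynamic_qparams` signature): for a symmetric qscheme the zero point is pinned to 0 and the scale comes from max(|x|).

```python
# Hypothetical sketch, not the real helper: qparams that respect the qscheme.
import torch

def calculate_qparams(x, qscheme=torch.per_tensor_affine):
    qmin, qmax = -128, 127                      # torch.qint8 range
    if qscheme == torch.per_tensor_symmetric:
        scale = x.abs().max().item() / qmax
        zero_point = 0
    else:
        x_min, x_max = x.min().item(), x.max().item()
        scale = (x_max - x_min) / (qmax - qmin)
        zero_point = int(round(qmin - x_min / scale)) if scale > 0 else 0
    return max(scale, 1e-8), zero_point

# Symmetric qparams would be selected when the engine is mkldnn/onednn:
qscheme = (torch.per_tensor_symmetric
           if torch.backends.quantized.engine == 'onednn'
           else torch.per_tensor_affine)
scale, zp = calculate_qparams(torch.randn(8, 8), qscheme=qscheme)
```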
```python
# ONEDNN only supports symmetric quantization of weight
if torch.backends.quantized.engine == 'onednn':
    W_zps = np.zeros(output_channels).astype(np.int)
```
nit: we can put L3211 in the else branch
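For clarity, the nit amounts to something like the following; the else branch here is only a stand-in for the line the reviewer refers to, not the actual test code:

```python
# Sketch of the reviewer's nit: compute W_zps once in an if/else instead of
# overwriting it afterwards. The else branch is illustrative only.
import numpy as np
import torch

output_channels = 8
if torch.backends.quantized.engine == 'onednn':
    # ONEDNN only supports symmetric quantization of weight
    W_zps = np.zeros(output_channels).astype(int)
else:
    W_zps = np.random.randint(-5, 5, size=output_channels).astype(int)
```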
…ackend only supports symmetric quantization of weight
I did a rebase and it looks like there are still errors. I think the third-party import is probably still not done yet.
Hi @jerryzh168, thanks for the update. Could you please remind @frank-wei to import? Thanks.
Hi @Xia-Weiwen, can you resolve the merge conflict? @frank-wei just finished the update of the ideep library, so we can import the PR now.
Hi @jerryzh168, conflict resolved. Please move on. Thanks.
@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hi @Xia-Weiwen, I have imported it again, and it looks like there are some other errors. There are other lint warnings as well, and I'm not sure what the best way to communicate them is, but I feel it might be OK to fix them later.
Hi @jerryzh168, thanks for the update. Are they all warnings about unused variables? Do you think it's better to fix them now, or later in another PR? Anyway, could you please provide a log of these warnings so I can fix them?
Hi @Xia-Weiwen, unfortunately there is no easy way to export those warnings right now. Can you just fix the blocking ones for now, i.e. the one I pasted in the previous comment? I think we can leave the rest for now. We're also moving to GitHub First soon, so hopefully these problems can be addressed during that move.
@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hi @Xia-Weiwen, I can confirm there are no more internal errors now, but it looks like there is a new merge conflict; can you help resolve it? I think we should be able to land after that.
@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:
This PR adds a new quantization backend, ONEDNN, with quantized conv and linear kernels in the same code path as the FBGEMM backend.

The ONEDNN backend is an alternative to the FBGEMM and QNNPACK backends. It takes advantage of features of the latest Intel® CPU products. It supports VNNI on Cascade Lake and the AMX instruction set, which will be available on Sapphire Rapids and offers 8X the int8 peak TOPS of VNNI.

ONEDNN demonstrates better performance than FBGEMM on conv kernels of popular CNN models. It also supports more fused ops, such as convolution-add-ReLU, than FBGEMM and QNNPACK.

To use this backend, users only need to set the quantization backend to 'onednn' before any calculation, without any change to their models.

```python
torch.backends.quantized.engine = 'onednn'
```

## Design docs
#21120 (comment)
#67177 (comment)

## File changes
**Add ONEDNN to qengine list**
- aten/src/ATen/Context.cpp
- c10/core/QEngine.h
- torch/ao/quantization/qconfig.py
- torch/backends/quantized/\_\_init\_\_.py

**Implement qconv & qlinear for ONEDNN backend**
- aten/src/ATen/native/quantized/cpu/conv_serialization.h
- aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp
- aten/src/ATen/native/quantized/cpu/onednn_utils.h
- aten/src/ATen/native/quantized/cpu/qconv.cpp
- aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp

**Skip tests that are not supported by ONEDNN**
- test/ao/sparsity/test_kernels.py
- test/quantization/core/test_quantized_module.py
- test/quantization/core/test_quantized_op.py

## Validation results
This PR has passed `test_quantization.py` and `test_mkldnn.py`. Below are performance data of int8 2d convolution and linear on the Cascade Lake Xeon® platform. (Note: tested with a single instance on a single core, using the latest oneDNN library.)

**Table 1. Performance comparison of int8 2d convolution operator**

| No. | Shape | FBGEMM | ONEDNN | Gain |
|-|-|-|-|-|
| 1 | IC=128, OC=128, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0 | 668.310us | 535.630us | 24.8% |
| 2 | IC=128, OC=128, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0 | 290.630us | 281.810us | 3.1% |
| 3 | IC=128, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0 | 1.045ms | 893.010us | 17.0% |
| 4 | IC=128, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0 | 385.320us | 373.720us | 3.1% |
| 5 | IC=256, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0 | 1.876ms | 1.641ms | 14.3% |
| 6 | IC=256, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0 | 660.460us | 638.470us | 3.4% |

**Table 2. Performance comparison of int8 linear operator**

| No. | Shape (m, n, k) | FBGEMM | ONEDNN | Gap |
|-|-|-|-|-|
| 1 | 64, 800, 320 | 80.550us | 96.770us | 20.10% |
| 2 | 64, 768, 512 | 101.230us | 130.720us | 29.10% |
| 3 | 16, 256, 512 | 30.230us | 51.450us | 70.20% |
| 4 | 128, 128, 128 | 33.810us | 50.480us | 49.30% |
| 5 | 256, 512, 256 | 154.490us | 195.050us | 26.30% |
| 6 | 1024, 1024, 1024 | 3.134ms | 3.514ms | 12.10% |

ONEDNN showed advantages over FBGEMM for convolution. However, it has a performance gap relative to FBGEMM for linear ops. The gap is a known issue, and further optimization is in progress in the oneDNN library. On the latest platforms, better performance of ONEDNN is achieved for both conv and linear.
Pull Request resolved: #69820
Reviewed By: HDCharles
Differential Revision: D33716039
Pulled By: jerryzh168
fbshipit-source-id: 6f7bb807e85798142dfcffccfca8b8bd652fb3dd
Hey @Xia-Weiwen. |
```cpp
#include <ATen/Config.h>
#if AT_MKLDNN_ENABLED()
#include <ATen/Tensor.h>
#include <ATen/native/quantized/cpu/conv_packed_params.h>
```
Hi @Xia-Weiwen, I landed the PR, but it looks like this line is not up to date; we should remove this line. I'm reverting the change right now. Can you help recreate the PR after this is reverted?
Hi @jerryzh168, I created a new PR #74137. Please take a look. Thanks.
This pull request has been reverted by 5a89753. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
Summary
This PR adds a new quantization backend, ONEDNN, with quantized conv and linear kernels in the same code path as the FBGEMM backend.
The ONEDNN backend is an alternative to the FBGEMM and QNNPACK backends. It takes advantage of features of the latest Intel® CPU products. It supports VNNI on Cascade Lake and the AMX instruction set, which will be available on Sapphire Rapids and offers 8X the int8 peak TOPS of VNNI.
ONEDNN demonstrates better performance than FBGEMM on conv kernels of popular CNN models. It also supports more fused ops, such as convolution-add-ReLU, than FBGEMM and QNNPACK.
To use this backend, users only need to set the quantization backend to 'onednn' before any calculation, without any change to their models.
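For example (the same one-liner shown in the commit summary above):

```python
import torch

torch.backends.quantized.engine = 'onednn'
```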
Design docs
#21120 (comment)
#67177 (comment)
File changes
Add ONEDNN to qengine list
Implement qconv & qlinear for ONEDNN backend
Skip tests that are not supported by ONEDNN
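As a hedged illustration of what such a skip can look like in a test, the class and test names below are made up; the actual skip conditions live in the test files listed in this section:

```python
# Illustrative only: skip a case the onednn backend does not support
# (e.g. output padding for deconv); names here are hypothetical.
import unittest
import torch

class TestQDeconvOneDNN(unittest.TestCase):
    def test_output_padding(self):
        if torch.backends.quantized.engine == 'onednn':
            self.skipTest("output padding for deconv is not supported by onednn")
        # ... the actual deconv checks would go here ...

if __name__ == "__main__":
    unittest.main()
```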
Validation results
This PR has passed `test_quantization.py` and `test_mkldnn.py`. Below are performance data of int8 2d convolution and linear on the Cascade Lake Xeon® platform.
(Note: tested with a single instance on a single core, using the latest oneDNN library.)
Table 1. Performance comparison of int8 2d convolution operator

| No. | Shape | FBGEMM | ONEDNN | Gain |
|-|-|-|-|-|
| 1 | IC=128, OC=128, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0 | 668.310us | 535.630us | 24.8% |
| 2 | IC=128, OC=128, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0 | 290.630us | 281.810us | 3.1% |
| 3 | IC=128, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0 | 1.045ms | 893.010us | 17.0% |
| 4 | IC=128, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0 | 385.320us | 373.720us | 3.1% |
| 5 | IC=256, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0 | 1.876ms | 1.641ms | 14.3% |
| 6 | IC=256, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0 | 660.460us | 638.470us | 3.4% |

Table 2. Performance comparison of int8 linear operator

| No. | Shape (m, n, k) | FBGEMM | ONEDNN | Gap |
|-|-|-|-|-|
| 1 | 64, 800, 320 | 80.550us | 96.770us | 20.10% |
| 2 | 64, 768, 512 | 101.230us | 130.720us | 29.10% |
| 3 | 16, 256, 512 | 30.230us | 51.450us | 70.20% |
| 4 | 128, 128, 128 | 33.810us | 50.480us | 49.30% |
| 5 | 256, 512, 256 | 154.490us | 195.050us | 26.30% |
| 6 | 1024, 1024, 1024 | 3.134ms | 3.514ms | 12.10% |
ONEDNN showed advantages over FBGEMM for convolution. However, it has a performance gap relative to FBGEMM for linear ops. The gap is a known issue, and further optimization is in progress in the oneDNN library. On the latest platforms, better performance of ONEDNN is achieved for both conv and linear.