
Conversation

@mingfeima (Collaborator) commented Aug 19, 2022

Stack from ghstack (oldest at bottom):

Motivation of this PR

This patch migrates spmm_reduce from torch-sparse (a third-party dependency for PyG) to torch, in response to the initial proposal for fusing Gather, Apply, Scatter in Message Passing for GNN inference/training: #71300

Gather-Apply-Scatter (GAS) is the major step in Message Passing. Its behavior falls into two kinds depending on the storage format of EdgeIndex, which records the connections between nodes:

  • COO: the hotspot is scatter_reduce
  • CSR: the hotspot is spmm_reduce
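As a toy illustration of why the two storage formats lead to different hotspots (this is plain Python for exposition, not the actual kernels), the same sum-aggregation over a 3-node graph can be written as a scatter over COO edge pairs or as a row-wise reduce over CSR row pointers:

```python
# Toy sketch (pure Python, not the actual kernels) of the two GAS hotspots
# on the same 3-node graph with edges 0->2, 0->1, 1->2.

# COO: EdgeIndex stored as (source, destination) pairs; hotspot is scatter_reduce.
row = [0, 0, 1]          # source node of each edge
col = [2, 1, 2]          # destination node of each edge
feat = [1.0, 2.0, 3.0]   # one scalar feature per node

out_coo = [0.0, 0.0, 0.0]
for r, c in zip(row, col):
    out_coo[c] += feat[r]            # scatter-add each message to its destination

# CSR: incoming edges compressed per destination row; hotspot is spmm_reduce.
crow = [0, 0, 1, 3]      # node i's incoming sources are cols[crow[i]:crow[i+1]]
cols = [0, 0, 1]
out_csr = [0.0, 0.0, 0.0]
for i in range(3):
    for j in range(crow[i], crow[i + 1]):
        out_csr[i] += feat[cols[j]]  # row-wise reduce; each row owned by one thread

print(out_coo)  # [0.0, 1.0, 3.0]
print(out_csr)  # [0.0, 1.0, 3.0]
```

The CSR form is the one this PR targets: each output row is reduced independently, so it parallelizes over rows without the write conflicts a scatter incurs.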

The reduce type can be chosen from: "sum", "mean", "max", "min".

This PR extends torch.sparse.mm with a reduce argument, which maps to torch.sparse_mm.reduce internally.
sparse_mm_reduce is registered under the TensorTypeId of SparseCsrCPU; the operator is backed by an internal interface _sparse_mm_reduce_impl which has dual outputs:

  • out - the actual output
  • arg_out - records the indices of the non-zero elements selected per output when the reduce type is "max" or "min"; this is only needed for training, so it is not computed for inference.
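A minimal pure-Python sketch of the dual-output semantics (again for exposition only, not the C++ kernel): for reduce type "max", out holds the reduced value per (row, feature) and arg_out holds the index of the winning non-zero element, which is what the backward pass needs to route gradients:

```python
# Sketch of _sparse_mm_reduce_impl's dual outputs for reduce="max":
# out[i][k]     = max over row i's non-zeros j of val[j] * dense[col[j]][k]
# arg_out[i][k] = index j (into the values array) of the selected non-zero

crow = [0, 2, 3]                 # 2x2 CSR sparse matrix with 3 non-zeros
col = [0, 1, 1]
val = [1.0, 2.0, 3.0]
dense = [[10.0, -1.0],           # 2x2 dense operand
         [20.0, -2.0]]

n_rows, n_cols = 2, 2
out = [[float("-inf")] * n_cols for _ in range(n_rows)]
arg_out = [[-1] * n_cols for _ in range(n_rows)]

for i in range(n_rows):
    for j in range(crow[i], crow[i + 1]):   # non-zeros of sparse row i
        for k in range(n_cols):
            v = val[j] * dense[col[j]][k]
            if v > out[i][k]:
                out[i][k] = v
                arg_out[i][k] = j            # remember which non-zero won

print(out)      # [[40.0, -1.0], [60.0, -6.0]]
print(arg_out)  # [[1, 0], [2, 2]]
```

For inference only out is needed, so skipping the arg_out bookkeeping saves both the extra writes and the memory for the index tensor.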

Performance

Benchmarked on GCN with ogbn-products on a single Xeon socket, the workload is improved by 4.3x with this patch.

The performance benefit for training will be even bigger: the original backward impl for sum|mean is sequential, and the original backward impl for max|min is not fused.

before:

-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
       torch_sparse::spmm_sum        97.09%       56.086s        97.09%       56.088s        6.232s             9
                 aten::linear         0.00%      85.000us         1.38%     795.485ms      88.387ms             9
                 aten::matmul         0.00%      57.000us         1.38%     795.260ms      88.362ms             9
                     aten::mm         1.38%     795.201ms         1.38%     795.203ms      88.356ms             9
                   aten::relu         0.00%      50.000us         0.76%     440.434ms      73.406ms             6
              aten::clamp_min         0.76%     440.384ms         0.76%     440.384ms      73.397ms             6
                   aten::add_         0.57%     327.801ms         0.57%     327.801ms      36.422ms             9
            aten::log_softmax         0.00%      23.000us         0.10%      55.503ms      18.501ms             3

after:

-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
               aten::spmm_sum        87.35%       11.826s        87.36%       11.827s        1.314s             9
                 aten::linear         0.00%      92.000us         5.87%     794.451ms      88.272ms             9
                 aten::matmul         0.00%      62.000us         5.87%     794.208ms      88.245ms             9
                     aten::mm         5.87%     794.143ms         5.87%     794.146ms      88.238ms             9
                   aten::relu         0.00%      53.000us         3.35%     452.977ms      75.496ms             6
              aten::clamp_min         3.35%     452.924ms         3.35%     452.924ms      75.487ms             6
                   aten::add_         2.58%     348.663ms         2.58%     348.663ms      38.740ms             9
                 aten::argmax         0.42%      57.473ms         0.42%      57.475ms      14.369ms             4
            aten::log_softmax         0.00%      22.000us         0.39%      52.605ms      17.535ms             3

cc @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10 @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @Guobing-Chen @chunyuan-w @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire @VitalyFedyunin

@facebook-github-bot (Contributor) commented Aug 19, 2022


❌ 18 New Failures

As of commit 75bfbc3 (more details on the Dr. CI page):

  • 18/18 failures introduced in this PR

🕵️ 18 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge) (1/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T08:01:29.1192741Z ##[error]Process completed with exit code 1.
2022-08-19T08:01:29.1148336Z Non-cacheable calls                   0
2022-08-19T08:01:29.1148688Z Non-compilation calls                 0
2022-08-19T08:01:29.1148973Z Unsupported compiler calls            0
2022-08-19T08:01:29.1149239Z Average cache write               0.000 s
2022-08-19T08:01:29.1149484Z Average cache read miss           0.000 s
2022-08-19T08:01:29.1149694Z Average cache read hit            0.000 s
2022-08-19T08:01:29.1149918Z Failed distributed compilations       0
2022-08-19T08:01:29.1150498Z Cache location                  S3, bucket: Bucket(name=ossci-compiler-cache-circleci-v2, base_url=http://ossci-compiler-cache-circleci-v2.s3.amazonaws.com/)
2022-08-19T08:01:29.1150927Z + echo ::endgroup::
2022-08-19T08:01:29.1151493Z ##[endgroup]
2022-08-19T08:01:29.1192741Z ##[error]Process completed with exit code 1.
2022-08-19T08:01:29.1244007Z Prepare all required actions
2022-08-19T08:01:29.1244325Z Getting action download info
2022-08-19T08:01:29.2820254Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-19T08:01:29.2820556Z with:
2022-08-19T08:01:29.2820910Z   github-token: ***
2022-08-19T08:01:29.2821073Z env:
2022-08-19T08:01:29.2821256Z   GIT_DEFAULT_BRANCH: master
2022-08-19T08:01:29.2821450Z ##[endgroup]
2022-08-19T08:01:29.2847937Z ##[group]Run nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a
2022-08-19T08:01:29.2848178Z with:

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge) (2/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T07:59:23.3157643Z RuntimeError: test_schema_check failed!
2022-08-19T07:59:22.2345045Z FAILED (errors=21, skipped=168, expected failures=63)
2022-08-19T07:59:22.2345191Z 
2022-08-19T07:59:22.2345275Z Generating XML reports...
2022-08-19T07:59:22.2345707Z Generated XML report: test-reports/python-unittest/test_schema_check/TEST-TestSchemaCheck-20220819075421.xml
2022-08-19T07:59:22.8086713Z Generated XML report: test-reports/python-unittest/test_schema_check/TEST-TestSchemaCheckModeOpInfoCPU-20220819075421.xml
2022-08-19T07:59:23.3152930Z Traceback (most recent call last):
2022-08-19T07:59:23.3153297Z   File "test/run_test.py", line 990, in <module>
2022-08-19T07:59:23.3154735Z     main()
2022-08-19T07:59:23.3155075Z   File "test/run_test.py", line 968, in main
2022-08-19T07:59:23.3157402Z     raise RuntimeError(err_message)
2022-08-19T07:59:23.3157643Z RuntimeError: test_schema_check failed!
2022-08-19T07:59:23.6141943Z 
2022-08-19T07:59:23.6142396Z real	5m7.097s
2022-08-19T07:59:23.6142663Z user	9m14.053s
2022-08-19T07:59:23.6142889Z sys	0m12.995s
2022-08-19T07:59:23.6181872Z ##[error]Process completed with exit code 1.
2022-08-19T07:59:23.6221850Z Prepare all required actions
2022-08-19T07:59:23.6222158Z Getting action download info
2022-08-19T07:59:23.8567051Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-19T07:59:23.8567278Z with:
2022-08-19T07:59:23.8567611Z   github-token: ***

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (dynamo, 1, 2, linux.2xlarge) (3/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T07:56:47.5627530Z RuntimeError: test_ops failed!
2022-08-19T07:56:46.5999820Z FAILED test_ops.py::TestCommonCPU::test_noncontiguous_samples_scatter_add_cpu_float32
2022-08-19T07:56:46.6002624Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-19T07:56:46.6005054Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
2022-08-19T07:56:46.6013716Z = 1 failed, 1232 passed, 352 skipped, 8 xfailed, 63 warnings, 2 rerun in 133.29s (0:02:13) =
2022-08-19T07:56:46.6255560Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-08-19T07:56:47.5622528Z Traceback (most recent call last):
2022-08-19T07:56:47.5622826Z   File "test/run_test.py", line 990, in <module>
2022-08-19T07:56:47.5625039Z     main()
2022-08-19T07:56:47.5625706Z   File "test/run_test.py", line 968, in main
2022-08-19T07:56:47.5627053Z     raise RuntimeError(err_message)
2022-08-19T07:56:47.5627530Z RuntimeError: test_ops failed!
2022-08-19T07:56:47.9031650Z 
2022-08-19T07:56:47.9032075Z real	2m21.899s
2022-08-19T07:56:47.9032467Z user	11m25.076s
2022-08-19T07:56:47.9033973Z sys	0m11.537s
2022-08-19T07:56:47.9075140Z ##[error]Process completed with exit code 1.
2022-08-19T07:56:47.9118501Z Prepare all required actions
2022-08-19T07:56:47.9118823Z Getting action download info
2022-08-19T07:56:48.0634351Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-19T07:56:48.0634581Z with:
2022-08-19T07:56:48.0634925Z   github-token: ***

See GitHub Actions build pull / linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu) (4/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T08:37:35.1368325Z RuntimeError: test_autograd failed!
2022-08-19T08:37:34.5365476Z Generated XML report: test-reports/python-unittest/test_autograd/TEST-TestAutogradForwardModeBatchedGrad-20220819083708.xml
2022-08-19T08:37:34.5484439Z Generated XML report: test-reports/python-unittest/test_autograd/TEST-autograd.test_functional.TestAutogradFunctional-20220819083708.xml
2022-08-19T08:37:34.5503558Z Generated XML report: test-reports/python-unittest/test_autograd/TEST-TestAutogradInferenceMode-20220819083708.xml
2022-08-19T08:37:34.5508969Z Generated XML report: test-reports/python-unittest/test_autograd/TEST-TestAutogradMultipleDispatchCUDA-20220819083708.xml
2022-08-19T08:37:34.5519340Z Generated XML report: test-reports/python-unittest/test_autograd/TEST-TestMultithreadAutograd-20220819083708.xml
2022-08-19T08:37:35.1363887Z Traceback (most recent call last):
2022-08-19T08:37:35.1364739Z   File "/var/lib/jenkins/workspace/test/run_test.py", line 990, in <module>
2022-08-19T08:37:35.1365391Z     main()
2022-08-19T08:37:35.1366101Z   File "/var/lib/jenkins/workspace/test/run_test.py", line 968, in main
2022-08-19T08:37:35.1367608Z     raise RuntimeError(err_message)
2022-08-19T08:37:35.1368325Z RuntimeError: test_autograd failed!
2022-08-19T08:37:35.4056427Z 
2022-08-19T08:37:35.4057456Z real	16m2.522s
2022-08-19T08:37:35.4058056Z user	15m50.753s
2022-08-19T08:37:35.4058344Z sys	0m43.521s
2022-08-19T08:37:35.4112783Z ##[error]Process completed with exit code 1.
2022-08-19T08:37:35.4156869Z Prepare all required actions
2022-08-19T08:37:35.4157282Z Getting action download info
2022-08-19T08:37:35.6841843Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-19T08:37:35.6842138Z with:
2022-08-19T08:37:35.6842586Z   github-token: ***

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge) (5/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T08:11:14.2597948Z RuntimeError: test_proxy_tensor failed!
2022-08-19T08:11:12.8617245Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorFake-20220819080418.xml
2022-08-19T08:11:12.8633038Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorReal-20220819080418.xml
2022-08-19T08:11:12.8651254Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorSymbolic-20220819080418.xml
2022-08-19T08:11:13.2739345Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestProxyTensorOpInfoCPU-20220819080418.xml
2022-08-19T08:11:13.2743944Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestSymbolicTracing-20220819080418.xml
2022-08-19T08:11:14.2592973Z Traceback (most recent call last):
2022-08-19T08:11:14.2593239Z   File "test/run_test.py", line 990, in <module>
2022-08-19T08:11:14.2595721Z     main()
2022-08-19T08:11:14.2595930Z   File "test/run_test.py", line 968, in main
2022-08-19T08:11:14.2597702Z     raise RuntimeError(err_message)
2022-08-19T08:11:14.2597948Z RuntimeError: test_proxy_tensor failed!
2022-08-19T08:11:14.5327266Z 
2022-08-19T08:11:14.5327635Z real	16m55.347s
2022-08-19T08:11:14.5328002Z user	47m26.961s
2022-08-19T08:11:14.5328294Z sys	6m19.107s
2022-08-19T08:11:14.5368822Z ##[error]Process completed with exit code 1.
2022-08-19T08:11:14.5412392Z Prepare all required actions
2022-08-19T08:11:14.5412694Z Getting action download info
2022-08-19T08:11:14.7183793Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-19T08:11:14.7184000Z with:
2022-08-19T08:11:14.7184353Z   github-token: ***

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (6/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T07:56:12.0536079Z RuntimeError: test_ops failed!
2022-08-19T07:56:11.2780915Z FAILED test_ops.py::TestCommonCPU::test_noncontiguous_samples_scatter_add_cpu_float32
2022-08-19T07:56:11.2781428Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-19T07:56:11.2782730Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
2022-08-19T07:56:11.2788927Z = 1 failed, 1157 passed, 292 skipped, 8 xfailed, 60 warnings, 2 rerun in 112.94s (0:01:52) =
2022-08-19T07:56:11.3016196Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-08-19T07:56:12.0531323Z Traceback (most recent call last):
2022-08-19T07:56:12.0531645Z   File "test/run_test.py", line 990, in <module>
2022-08-19T07:56:12.0533945Z     main()
2022-08-19T07:56:12.0534149Z   File "test/run_test.py", line 968, in main
2022-08-19T07:56:12.0535837Z     raise RuntimeError(err_message)
2022-08-19T07:56:12.0536079Z RuntimeError: test_ops failed!
2022-08-19T07:56:12.3091256Z 
2022-08-19T07:56:12.3091672Z real	1m59.212s
2022-08-19T07:56:12.3092025Z user	9m50.064s
2022-08-19T07:56:12.3093261Z sys	0m15.657s
2022-08-19T07:56:12.3130361Z ##[error]Process completed with exit code 1.
2022-08-19T07:56:12.3169733Z Prepare all required actions
2022-08-19T07:56:12.3170039Z Getting action download info
2022-08-19T07:56:12.5005635Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-19T07:56:12.5005858Z with:
2022-08-19T07:56:12.5006172Z   github-token: ***

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge) (7/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T07:59:11.1486118Z RuntimeError: test_schema_check failed!
2022-08-19T07:59:10.1223112Z FAILED (errors=21, skipped=168, expected failures=63)
2022-08-19T07:59:10.1223266Z 
2022-08-19T07:59:10.1223353Z Generating XML reports...
2022-08-19T07:59:10.1223788Z Generated XML report: test-reports/python-unittest/test_schema_check/TEST-TestSchemaCheck-20220819075350.xml
2022-08-19T07:59:10.6010585Z Generated XML report: test-reports/python-unittest/test_schema_check/TEST-TestSchemaCheckModeOpInfoCPU-20220819075350.xml
2022-08-19T07:59:11.1481313Z Traceback (most recent call last):
2022-08-19T07:59:11.1481746Z   File "test/run_test.py", line 990, in <module>
2022-08-19T07:59:11.1483609Z     main()
2022-08-19T07:59:11.1483987Z   File "test/run_test.py", line 968, in main
2022-08-19T07:59:11.1485693Z     raise RuntimeError(err_message)
2022-08-19T07:59:11.1486118Z RuntimeError: test_schema_check failed!
2022-08-19T07:59:11.4766143Z 
2022-08-19T07:59:11.4766496Z real	5m26.418s
2022-08-19T07:59:11.4766893Z user	8m15.820s
2022-08-19T07:59:11.4767127Z sys	0m2.196s
2022-08-19T07:59:11.4804709Z ##[error]Process completed with exit code 1.
2022-08-19T07:59:11.4851677Z Prepare all required actions
2022-08-19T07:59:11.4852239Z Getting action download info
2022-08-19T07:59:11.6811210Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-19T07:59:11.6811436Z with:
2022-08-19T07:59:11.6811907Z   github-token: ***

See GitHub Actions build pull / win-vs2019-cpu-py3 / build (8/18)

Step: "Build" (full log | diagnosis details)

2022-08-19T07:25:09.2510856Z C:\actions-runner\...error C3861: '__builtin_clz': identifier not found
2022-08-19T07:25:09.2040607Z         with
2022-08-19T07:25:09.2040865Z         [
2022-08-19T07:25:09.2041152Z             _Ty=int64_t,
2022-08-19T07:25:09.2041483Z             K=int64_t,
2022-08-19T07:25:09.2042028Z             V=int64_t
2022-08-19T07:25:09.2042303Z         ]
2022-08-19T07:25:09.2498830Z C:/actions-runner/_work/pytorch/pytorch/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp(711): note: see reference to function template instantiation 'void at::native::`anonymous-namespace'::cpu_scatter_add_contig_kernel<scalar_t>(const at::Tensor &,const at::Tensor &,const at::Tensor &)' being compiled
2022-08-19T07:25:09.2502256Z C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen/native/cpu/radix_sort.h(147): error C2131: expression did not evaluate to a constant
2022-08-19T07:25:09.2508438Z C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen/native/cpu/radix_sort.h(147): note: failure was caused by a read of a variable outside its lifetime
2022-08-19T07:25:09.2509896Z C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen/native/cpu/radix_sort.h(147): note: see usage of 'maxthreads'
2022-08-19T07:25:09.2510856Z C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen/native/cpu/radix_sort.h(151): error C3861: '__builtin_clz': identifier not found
2022-08-19T07:25:09.2773499Z [4543/5902] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\native\cpu\WeightNormKernel.cpp.DEFAULT.cpp.obj
2022-08-19T07:25:09.2909752Z [4544/5902] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\native\cpu\MaxUnpoolKernel.cpp.DEFAULT.cpp.obj
2022-08-19T07:25:09.4273158Z [4545/5902] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\native\cpu\SpmmReduceKernel.cpp.DEFAULT.cpp.obj
2022-08-19T07:25:10.0198780Z [4546/5902] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\native\cpu\MaxPooling.cpp.DEFAULT.cpp.obj
2022-08-19T07:25:16.5129092Z [4547/5902] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\native\quantized\cpu\kernels\QuantizedOpKernels.cpp.DEFAULT.cpp.obj
2022-08-19T07:25:17.0418000Z [4548/5902] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\native\cpu\SparseFactories.cpp.DEFAULT.cpp.obj
2022-08-19T07:25:19.3549997Z [4549/5902] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\core\ATenOpList.cpp.obj
2022-08-19T07:25:19.3550879Z ninja: build stopped: subcommand failed.
2022-08-19T07:25:19.3651510Z -- Building version 1.13.0a0+git75bfbc3
2022-08-19T07:25:19.3652795Z cmake -GNinja -DBUILD_ENVIRONMENT=win-vs2019-cpu-py3 -DBUILD_PYTHON=True -DBUILD_TEST=True -DBUILD_TYPE=release -DBUILD_WHEEL=1 -DCMAKE_BUILD_TYPE=Release -DCMAKE_GENERATOR=Ninja -DCMAKE_INCLUDE_PATH=C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\mkl\include -DCMAKE_INSTALL_PREFIX=C:\actions-runner\_work\pytorch\pytorch\torch -DCMAKE_PREFIX_PATH=C:\Jenkins\Miniconda3\Lib\site-packages -DNUMPY_INCLUDE_DIR=C:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -DPYTHON_EXECUTABLE=C:\Jenkins\Miniconda3\python.exe -DPYTHON_INCLUDE_DIR=C:\Jenkins\Miniconda3\Include -DPYTHON_LIBRARY=C:\Jenkins\Miniconda3/libs/python39.lib -DTORCH_BUILD_VERSION=1.13.0a0+git75bfbc3 -DUSE_CUDA=0 -DUSE_NUMPY=True C:\actions-runner\_work\pytorch\pytorch

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (functorch, 1, 1, linux.2xlarge) (9/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T08:06:27.2497227Z RuntimeError: /var.../jenkins/workspace/functorch/test/test_ops failed!
2022-08-19T08:06:25.6687285Z 
2022-08-19T08:06:25.6687429Z FAILED (errors=61, skipped=2191, expected failures=481, unexpected successes=1)
2022-08-19T08:06:25.6687435Z 
2022-08-19T08:06:25.6687520Z Generating XML reports...
2022-08-19T08:06:26.5606358Z Generated XML report: test-reports/python-unittest/functorch.test.test_ops/TEST-TestOperatorsCPU-20220819075425.xml
2022-08-19T08:06:27.2492404Z Traceback (most recent call last):
2022-08-19T08:06:27.2492689Z   File "test/run_test.py", line 990, in <module>
2022-08-19T08:06:27.2494059Z     main()
2022-08-19T08:06:27.2494339Z   File "test/run_test.py", line 968, in main
2022-08-19T08:06:27.2496857Z     raise RuntimeError(err_message)
2022-08-19T08:06:27.2497227Z RuntimeError: /var/lib/jenkins/workspace/functorch/test/test_ops failed!
2022-08-19T08:06:27.5802939Z ##[error]Process completed with exit code 1.
2022-08-19T08:06:27.5843318Z Prepare all required actions
2022-08-19T08:06:27.5843627Z Getting action download info
2022-08-19T08:06:27.7952944Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-19T08:06:27.7953170Z with:
2022-08-19T08:06:27.7953491Z   github-token: ***
2022-08-19T08:06:27.7953662Z env:
2022-08-19T08:06:27.7953839Z   GIT_DEFAULT_BRANCH: master
2022-08-19T08:06:27.7954015Z ##[endgroup]
2022-08-19T08:06:27.7979520Z ##[group]Run nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (10/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T07:55:40.3444109Z RuntimeError: test_ops failed!
2022-08-19T07:55:39.4749914Z FAILED test_ops.py::TestCommonCPU::test_noncontiguous_samples_scatter_add_cpu_float32
2022-08-19T07:55:39.4750293Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-19T07:55:39.4750801Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
2022-08-19T07:55:39.4756461Z = 1 failed, 1157 passed, 292 skipped, 8 xfailed, 60 warnings, 2 rerun in 116.56s (0:01:56) =
2022-08-19T07:55:39.4964744Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-08-19T07:55:40.3439697Z Traceback (most recent call last):
2022-08-19T07:55:40.3440034Z   File "test/run_test.py", line 990, in <module>
2022-08-19T07:55:40.3440959Z     main()
2022-08-19T07:55:40.3441246Z   File "test/run_test.py", line 968, in main
2022-08-19T07:55:40.3443793Z     raise RuntimeError(err_message)
2022-08-19T07:55:40.3444109Z RuntimeError: test_ops failed!
2022-08-19T07:55:40.6546726Z 
2022-08-19T07:55:40.6547195Z real	2m3.150s
2022-08-19T07:55:40.6547473Z user	7m57.761s
2022-08-19T07:55:40.6547652Z sys	0m5.155s
2022-08-19T07:55:40.6585969Z ##[error]Process completed with exit code 1.
2022-08-19T07:55:40.6625288Z Prepare all required actions
2022-08-19T07:55:40.6625579Z Getting action download info
2022-08-19T07:55:40.8473386Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-19T07:55:40.8473607Z with:
2022-08-19T07:55:40.8473948Z   github-token: ***

See GitHub Actions build pull / win-vs2019-cuda11.6-py3 / build (11/18)

Step: "Build" (full log | diagnosis details)

2022-08-19T07:33:40.1792053Z C:\actions-runner\...error C3861: '__builtin_clz': identifier not found
2022-08-19T07:33:40.1724637Z         with
2022-08-19T07:33:40.1725676Z         [
2022-08-19T07:33:40.1726010Z             _Ty=int64_t,
2022-08-19T07:33:40.1782367Z             K=int64_t,
2022-08-19T07:33:40.1782704Z             V=int64_t
2022-08-19T07:33:40.1784026Z         ]
2022-08-19T07:33:40.1785113Z C:/actions-runner/_work/pytorch/pytorch/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp(711): note: see reference to function template instantiation 'void at::native::`anonymous-namespace'::cpu_scatter_add_contig_kernel<scalar_t>(const at::Tensor &,const at::Tensor &,const at::Tensor &)' being compiled
2022-08-19T07:33:40.1786376Z C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen/native/cpu/radix_sort.h(147): error C2131: expression did not evaluate to a constant
2022-08-19T07:33:40.1787248Z C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen/native/cpu/radix_sort.h(147): note: failure was caused by a read of a variable outside its lifetime
2022-08-19T07:33:40.1788005Z C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen/native/cpu/radix_sort.h(147): note: see usage of 'maxthreads'
2022-08-19T07:33:40.1792053Z C:\actions-runner\_work\pytorch\pytorch\aten\src\ATen/native/cpu/radix_sort.h(151): error C3861: '__builtin_clz': identifier not found
2022-08-19T07:33:40.5757003Z [4965/6397] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\native\cpu\CopyKernel.cpp.DEFAULT.cpp.obj
2022-08-19T07:33:40.6581737Z [4966/6397] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\native\cpu\DistributionKernels.cpp.DEFAULT.cpp.obj
2022-08-19T07:33:41.0439331Z [4967/6397] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\native\cpu\DepthwiseConvKernel.cpp.DEFAULT.cpp.obj
2022-08-19T07:33:41.8692988Z [4968/6397] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\Operators_1.cpp.obj
2022-08-19T07:33:47.8866186Z [4969/6397] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\RegisterFunctionalization_1.cpp.obj
2022-08-19T07:33:49.4163923Z [4970/6397] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\native\cpu\PixelShuffleKernel.cpp.DEFAULT.cpp.obj
2022-08-19T07:34:05.9057183Z [4971/6397] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\aten\src\ATen\RegisterCPU.cpp.obj
2022-08-19T07:34:05.9059148Z ninja: build stopped: subcommand failed.
2022-08-19T07:34:05.9166267Z -- Building version 1.13.0a0+git75bfbc3
2022-08-19T07:34:05.9168232Z cmake -GNinja -DBUILD_ENVIRONMENT=win-vs2019-cuda11.6-py3 -DBUILD_PYTHON=True -DBUILD_SPLIT_CUDA=ON -DBUILD_TEST=True -DBUILD_TYPE=release -DBUILD_WHEEL=1 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_COMPILER=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.6/bin/nvcc.exe -DCMAKE_CUDA_COMPILER_LAUNCHER=C:/actions-runner/_work/pytorch/pytorch/build/win_tmp/bin/randomtemp.exe;C:/actions-runner/_work/pytorch/pytorch/build/win_tmp\bin\sccache.exe -DCMAKE_GENERATOR=Ninja -DCMAKE_INCLUDE_PATH=C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\mkl\include -DCMAKE_INSTALL_PREFIX=C:\actions-runner\_work\pytorch\pytorch\torch -DCMAKE_PREFIX_PATH=C:\Jenkins\Miniconda3\Lib\site-packages -DCUDA_NVCC_EXECUTABLE=C:/actions-runner/_work/pytorch/pytorch/build/win_tmp/bin/nvcc.bat -DCUDNN_LIBRARY=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\lib\x64 -DNUMPY_INCLUDE_DIR=C:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -DPYTHON_EXECUTABLE=C:\Jenkins\Miniconda3\python.exe -DPYTHON_INCLUDE_DIR=C:\Jenkins\Miniconda3\Include -DPYTHON_LIBRARY=C:\Jenkins\Miniconda3/libs/python39.lib -DTORCH_BUILD_VERSION=1.13.0a0+git75bfbc3 -DUSE_CUDA=1 -DUSE_NUMPY=True C:\actions-runner\_work\pytorch\pytorch

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (functorch, 1, 1, linux.2xlarge) (12/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T08:05:52.9177548Z RuntimeError: /var.../jenkins/workspace/functorch/test/test_ops failed!
2022-08-19T08:05:51.4759359Z 
2022-08-19T08:05:51.4759478Z FAILED (errors=61, skipped=2191, expected failures=482)
2022-08-19T08:05:51.4759483Z 
2022-08-19T08:05:51.4759572Z Generating XML reports...
2022-08-19T08:05:52.3429742Z Generated XML report: test-reports/python-unittest/functorch.test.test_ops/TEST-TestOperatorsCPU-20220819075448.xml
2022-08-19T08:05:52.9172831Z Traceback (most recent call last):
2022-08-19T08:05:52.9173116Z   File "test/run_test.py", line 990, in <module>
2022-08-19T08:05:52.9175003Z     main()
2022-08-19T08:05:52.9175330Z   File "test/run_test.py", line 968, in main
2022-08-19T08:05:52.9177070Z     raise RuntimeError(err_message)
2022-08-19T08:05:52.9177548Z RuntimeError: /var/lib/jenkins/workspace/functorch/test/test_ops failed!
2022-08-19T08:05:53.2104986Z ##[error]Process completed with exit code 1.
2022-08-19T08:05:53.2146274Z Prepare all required actions
2022-08-19T08:05:53.2146583Z Getting action download info
2022-08-19T08:05:53.4500994Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-19T08:05:53.4501222Z with:
2022-08-19T08:05:53.4501556Z   github-token: ***
2022-08-19T08:05:53.4501776Z env:
2022-08-19T08:05:53.4501952Z   GIT_DEFAULT_BRANCH: master
2022-08-19T08:05:53.4502139Z ##[endgroup]
2022-08-19T08:05:53.4527357Z ##[group]Run nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge) (13/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T08:03:43.2591859Z RuntimeError: test_torch failed!
2022-08-19T08:03:42.4275989Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestTorchDeviceTypeCPU-20220819080156.xml
2022-08-19T08:03:42.4278955Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestVitalSignsCudaCPU-20220819080156.xml
2022-08-19T08:03:43.2312764Z [TORCH_VITAL] Dataloader.enabled		 True
2022-08-19T08:03:43.2313259Z [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
2022-08-19T08:03:43.2313511Z [TORCH_VITAL] CUDA.used		 False
2022-08-19T08:03:43.2587536Z Traceback (most recent call last):
2022-08-19T08:03:43.2587819Z   File "test/run_test.py", line 990, in <module>
2022-08-19T08:03:43.2589960Z     main()
2022-08-19T08:03:43.2590233Z   File "test/run_test.py", line 968, in main
2022-08-19T08:03:43.2591611Z     raise RuntimeError(err_message)
2022-08-19T08:03:43.2591859Z RuntimeError: test_torch failed!
2022-08-19T08:03:43.5227515Z 
2022-08-19T08:03:43.5227953Z real	9m32.973s
2022-08-19T08:03:43.5228327Z user	15m13.723s
2022-08-19T08:03:43.5228572Z sys	1m52.602s
2022-08-19T08:03:43.5267928Z ##[error]Process completed with exit code 1.
2022-08-19T08:03:43.5324008Z Prepare all required actions
2022-08-19T08:03:43.5324307Z Getting action download info
2022-08-19T08:03:43.6987634Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-19T08:03:43.6987851Z with:
2022-08-19T08:03:43.6988206Z   github-token: ***

See GitHub Actions build pull / linux-focal-py3.7-gcc7-mobile-lightweight-dispatch-build / build (14/18)

Step: "Build" (full log | diagnosis details)

2022-08-19T07:22:07.7771587Z collect2: error: ld returned 1 exit status
2022-08-19T07:21:50.2379786Z [100%] �[32m�[1mLinking CXX executable ../bin/inline_container_test�[0m
2022-08-19T07:21:50.5756716Z /usr/bin/ld: /var/lib/jenkins/workspace/build/custom_test_artifacts/build/lib/libtorch_cpu.so: undefined reference to `at::native::spmm_sum_cpu(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, at::Tensor const&)'
2022-08-19T07:21:50.5757525Z collect2: error: ld returned 1 exit status
2022-08-19T07:21:50.5774200Z make[2]: *** [caffe2/CMakeFiles/inline_container_test.dir/build.make:109: bin/inline_container_test] Error 1
2022-08-19T07:21:50.5775051Z make[1]: *** [CMakeFiles/Makefile2:3784: caffe2/CMakeFiles/inline_container_test.dir/all] Error 2
2022-08-19T07:21:50.5777007Z make[1]: *** Waiting for unfinished jobs....
2022-08-19T07:21:58.6661187Z [100%] �[32m�[1mLinking CXX shared library ../lib/libbackend_with_compiler_runtime.so�[0m
2022-08-19T07:21:58.8749631Z [100%] Built target backend_with_compiler_runtime
2022-08-19T07:22:07.5088471Z [100%] �[32m�[1mLinking CXX executable ../bin/test_codegen_unboxing�[0m
2022-08-19T07:22:07.7771123Z /usr/bin/ld: /var/lib/jenkins/workspace/build/custom_test_artifacts/build/lib/libtorch_cpu.so: undefined reference to `at::native::spmm_sum_cpu(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, at::Tensor const&)'
2022-08-19T07:22:07.7771587Z collect2: error: ld returned 1 exit status
2022-08-19T07:22:07.7785172Z make[2]: *** [test_codegen_unboxing/CMakeFiles/test_codegen_unboxing.dir/build.make:123: bin/test_codegen_unboxing] Error 1
2022-08-19T07:22:07.7786142Z make[1]: *** [CMakeFiles/Makefile2:5280: test_codegen_unboxing/CMakeFiles/test_codegen_unboxing.dir/all] Error 2
2022-08-19T07:22:07.7788278Z make: *** [Makefile:146: all] Error 2
2022-08-19T07:22:07.8176418Z ##[error]Process completed with exit code 1.

See GitHub Actions build pull / linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test (15/18)

Step: "Build" (full log | diagnosis details)

2022-08-19T07:26:19.2907629Z FAILED: ../../.....ake/release/obj/armeabi-v7a/libpytorch_jni_lite.so
2022-08-19T07:26:19.2900449Z   [11/20] Building CXX object fbjni/armeabi-v7a/CMakeFiles/fbjni.dir/cxx/lyra/lyra_breakpad.cpp.o
2022-08-19T07:26:19.2901163Z   [12/20] Building CXX object fbjni/armeabi-v7a/CMakeFiles/fbjni.dir/cxx/lyra/cxa_throw.cpp.o
2022-08-19T07:26:19.2901785Z   [13/20] Building CXX object fbjni/armeabi-v7a/CMakeFiles/fbjni.dir/cxx/fbjni/fbjni.cpp.o
2022-08-19T07:26:19.2902474Z   [14/20] Building CXX object fbjni/armeabi-v7a/CMakeFiles/fbjni.dir/cxx/lyra/lyra.cpp.o
2022-08-19T07:26:19.2903206Z   [15/20] Building CXX object fbjni/armeabi-v7a/CMakeFiles/fbjni.dir/cxx/lyra/lyra_exceptions.cpp.o
2022-08-19T07:26:19.2903998Z   [16/20] Building CXX object fbjni/armeabi-v7a/CMakeFiles/fbjni.dir/cxx/fbjni/detail/Exceptions.cpp.o
2022-08-19T07:26:19.2904792Z   [17/20] Linking CXX shared library ../../../../build/intermediates/cmake/release/obj/armeabi-v7a/libfbjni.so
2022-08-19T07:26:19.2905430Z   [18/20] Building CXX object CMakeFiles/pytorch_jni_lite.dir/src/main/cpp/pytorch_jni_common.cpp.o
2022-08-19T07:26:19.2906061Z   [19/20] Building CXX object CMakeFiles/pytorch_jni_lite.dir/src/main/cpp/pytorch_jni_lite.cpp.o
2022-08-19T07:26:19.2906896Z   [20/20] Linking CXX shared library ../../../../build/intermediates/cmake/release/obj/armeabi-v7a/libpytorch_jni_lite.so
2022-08-19T07:26:19.2907629Z   FAILED: ../../../../build/intermediates/cmake/release/obj/armeabi-v7a/libpytorch_jni_lite.so 
2022-08-19T07:26:19.2913460Z   : && /opt/ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --target=armv7-none-linux-androideabi21 --gcc-toolchain=/opt/ndk/toolchains/llvm/prebuilt/linux-x86_64 --sysroot=/opt/ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot -fPIC -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -mfpu=vfpv3-d16 -fno-addrsig -march=armv7-a -mthumb -Wa,--noexecstack -Wformat -Werror=format-security -stdlib=libc++  -Oz -DNDEBUG  -Wl,--exclude-libs,libgcc.a -Wl,--exclude-libs,libatomic.a -Wl,--build-id -Wl,--warn-shared-textrel -Wl,--fatal-warnings -Wl,--exclude-libs,libunwind.a -Wl,--no-undefined -Qunused-arguments -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -shared -Wl,-soname,libpytorch_jni_lite.so -o ../../../../build/intermediates/cmake/release/obj/armeabi-v7a/libpytorch_jni_lite.so CMakeFiles/pytorch_jni_lite.dir/src/main/cpp/pytorch_jni_common.cpp.o CMakeFiles/pytorch_jni_lite.dir/src/main/cpp/pytorch_jni_lite.cpp.o  ../../../../build/intermediates/cmake/release/obj/armeabi-v7a/libfbjni.so -Wl,--gc-sections -Wl,--whole-archive ../../../../src/main/jniLibs/armeabi-v7a/libtorch.a ../../../../src/main/jniLibs/armeabi-v7a/libtorch_cpu.a -Wl,--no-whole-archive ../../../../src/main/jniLibs/armeabi-v7a/libc10.a ../../../../src/main/jniLibs/armeabi-v7a/libnnpack.a ../../../../src/main/jniLibs/armeabi-v7a/libXNNPACK.a ../../../../src/main/jniLibs/armeabi-v7a/libpytorch_qnnpack.a ../../../../src/main/jniLibs/armeabi-v7a/libpthreadpool.a ../../../../src/main/jniLibs/armeabi-v7a/libeigen_blas.a ../../../../src/main/jniLibs/armeabi-v7a/libcpuinfo.a ../../../../src/main/jniLibs/armeabi-v7a/libclog.a libVulkanWrapper.a -landroid -llog -ldl -latomic -lm && :
2022-08-19T07:26:19.2918077Z   ../../../../src/main/jniLibs/armeabi-v7a/libtorch_cpu.a(RegisterCPU.cpp.o):RegisterCPU.cpp:function at::(anonymous namespace)::(anonymous namespace)::wrapper__spmm_sum(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, at::Tensor const&): error: undefined reference to 'at::native::spmm_sum_cpu(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, at::Tensor const&)'
2022-08-19T07:26:19.2919362Z   clang++: error: linker command failed with exit code 1 (use -v to see invocation)
2022-08-19T07:26:19.2919833Z   ninja: build stopped: subcommand failed.
2022-08-19T07:26:19.2920410Z * Try:
2022-08-19T07:26:19.2921103Z Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
2022-08-19T07:26:19.2921496Z 

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge) (16/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T07:56:35.7107023Z RuntimeError: test_ops failed!
2022-08-19T07:56:34.9345748Z FAILED test_ops.py::TestCommonCPU::test_noncontiguous_samples_scatter_add_cpu_float32
2022-08-19T07:56:34.9347870Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-08-19T07:56:34.9348235Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
2022-08-19T07:56:34.9354539Z = 1 failed, 1211 passed, 294 skipped, 8 xfailed, 63 warnings, 2 rerun in 125.28s (0:02:05) =
2022-08-19T07:56:34.9550194Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-08-19T07:56:35.7102163Z Traceback (most recent call last):
2022-08-19T07:56:35.7102451Z   File "test/run_test.py", line 990, in <module>
2022-08-19T07:56:35.7104227Z     main()
2022-08-19T07:56:35.7104568Z   File "test/run_test.py", line 968, in main
2022-08-19T07:56:35.7106623Z     raise RuntimeError(err_message)
2022-08-19T07:56:35.7107023Z RuntimeError: test_ops failed!
2022-08-19T07:56:35.9725135Z 
2022-08-19T07:56:35.9725470Z real	2m11.852s
2022-08-19T07:56:35.9725824Z user	11m15.355s
2022-08-19T07:56:35.9726125Z sys	0m17.993s
2022-08-19T07:56:35.9769664Z ##[error]Process completed with exit code 1.

See GitHub Actions build pull / linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu) (17/18)

Step: "Test" (full log | diagnosis details)

2022-08-19T08:45:00.5482076Z RuntimeError: test_torch failed!
2022-08-19T08:44:59.9509835Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestTorchDeviceTypeCUDA-20220819084311.xml
2022-08-19T08:44:59.9512920Z Generated XML report: test-reports/python-unittest/test_torch/TEST-TestVitalSignsCudaCUDA-20220819084311.xml
2022-08-19T08:45:00.4204569Z [TORCH_VITAL] Dataloader.enabled		 True
2022-08-19T08:45:00.4204968Z [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
2022-08-19T08:45:00.4205273Z [TORCH_VITAL] CUDA.used		 true
2022-08-19T08:45:00.5476553Z Traceback (most recent call last):
2022-08-19T08:45:00.5477154Z   File "/var/lib/jenkins/workspace/test/run_test.py", line 990, in <module>
2022-08-19T08:45:00.5478598Z     main()
2022-08-19T08:45:00.5479108Z   File "/var/lib/jenkins/workspace/test/run_test.py", line 968, in main
2022-08-19T08:45:00.5481785Z     raise RuntimeError(err_message)
2022-08-19T08:45:00.5482076Z RuntimeError: test_torch failed!
2022-08-19T08:45:00.8074489Z 
2022-08-19T08:45:00.8074880Z real	24m17.800s
2022-08-19T08:45:00.8075338Z user	39m2.714s
2022-08-19T08:45:00.8075750Z sys	3m8.401s
2022-08-19T08:45:00.8125248Z ##[error]Process completed with exit code 1.

See GitHub Actions build Lint / lintrunner (18/18)

Step: "Run lintrunner on all files" (full log | diagnosis details)

2022-08-19T07:14:21.9618034Z ##[error]This line has trailing spaces; please remove them.
2022-08-19T07:14:21.7745367Z # Use jq to massage the JSON lint output into GitHub Actions workflow commands.
2022-08-19T07:14:21.7745644Z jq --raw-output \
2022-08-19T07:14:21.7746033Z   '"::\(if .severity == "advice" or .severity == "disabled" then "warning" else .severity end) file=\(.path),line=\(.line),col=\(.char),title=\(.code) \(.name)::" + (.description | gsub("\\n"; "%0A"))' \
2022-08-19T07:14:21.7746379Z   lint.json
2022-08-19T07:14:21.7791858Z shell: /usr/bin/bash -e {0}
2022-08-19T07:14:21.7792052Z env:
2022-08-19T07:14:21.7792278Z   pythonLocation: /opt/hostedtoolcache/Python/3.8.13/x64
2022-08-19T07:14:21.7792582Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.8.13/x64/lib
2022-08-19T07:14:21.7792823Z ##[endgroup]
2022-08-19T07:14:21.9616279Z ##[error]Trailing newline found. Run `lintrunner --take NEWLINE -a` to apply changes.
2022-08-19T07:14:21.9618034Z ##[error]This line has trailing spaces; please remove them.
2022-08-19T07:14:21.9660649Z Post job cleanup.
2022-08-19T07:14:21.9694548Z Post job cleanup.
2022-08-19T07:14:22.0773163Z [command]/usr/bin/git version
2022-08-19T07:14:22.0825940Z git version 2.37.2
2022-08-19T07:14:22.0870429Z Temporarily overriding HOME='/home/runner/work/_temp/2f7bc0cd-fe7d-46df-83b4-c55da2a9de60' before making global git config changes
2022-08-19T07:14:22.0870925Z Adding repository directory to the temporary git global config as a safe directory
2022-08-19T07:14:22.0876934Z [command]/usr/bin/git config --global --add safe.directory /home/runner/work/pytorch/pytorch
2022-08-19T07:14:22.0925644Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2022-08-19T07:14:22.0971257Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2022-08-19T07:14:22.1225930Z Entering 'android/libs/fbjni'

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

mingfeima added a commit that referenced this pull request Aug 19, 2022
ghstack-source-id: b41151f
Pull Request resolved: #83727
@mingfeima mingfeima marked this pull request as draft August 19, 2022 07:07
@mingfeima
Collaborator Author

mingfeima commented Aug 19, 2022

  • refine API definition
  • add bfloat16 support
  • revise thread partition logic

@frank-wei
Contributor

Is it possible to make this implementation in torch.sparse?

@mingfeima
Collaborator Author

Is it possible to make this implementation in torch.sparse?

Sure, definitely a better idea. Currently I am just trying to show the performance benefit on some GNN workloads (to convince the management that this type of work is worthwhile). There is still a lot of ongoing work here: make a better API definition for spmm_reduce, add docs, register the backward, etc. And this one should go under torch.sparse, I suppose.

@pytorch-bot

pytorch-bot bot commented Sep 22, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/83727

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit 510535e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mingfeima added a commit that referenced this pull request Sep 23, 2022
ghstack-source-id: 8f6ad79
Pull Request resolved: #83727
mingfeima added a commit that referenced this pull request Sep 29, 2022
ghstack-source-id: c49eed5
Pull Request resolved: #83727
### Motivation of this PR

This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. #71300

**GAS** is the major step of Message Passing; its behavior can be classified into two kinds depending on the storage type of `EdgeIndex`, which records the connections of nodes:

* COO: the hotspot is `scatter_reduce`
* CSR: the hotspot is `spmm_reduce`

The reduce type can be chosen from: "sum", "mean", "max", "min".

Extend `torch.sparse.mm` with a `reduce` argument, which maps to `torch.sparse_mm.reduce` internally.
`sparse_mm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_sparse_mm_reduce_impl` which has dual outputs:
* `out` - the actual output
* `arg_out` - records the output indices among the non-zero elements when the reduce type is "max" or "min"; this is only needed for training, so it is not computed during inference.
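The dual-output contract above can be sketched as a pure-Python reference (illustrative only; `spmm_reduce_ref` and its signature are made up for this sketch, not the PyTorch API): `out[i][j]` reduces `values[k] * other[col[k]][j]` over row `i`'s non-zeros, and `arg_out` records the index of the winning non-zero for "max"/"min".

```python
def spmm_reduce_ref(crow, col, values, other, reduce):
    """Reference semantics: CSR (crow, col, values) of shape (M, K)
    times dense `other` (K x N), reduced per output cell."""
    m, n = len(crow) - 1, len(other[0])
    out = [[0.0] * n for _ in range(m)]
    # arg_out is only materialized for "max"/"min" (training path)
    arg_out = [[-1] * n for _ in range(m)] if reduce in ("max", "min") else None
    for i in range(m):
        start, end = crow[i], crow[i + 1]
        for j in range(n):
            acc, arg = None, -1
            for k in range(start, end):
                v = values[k] * other[col[k]][j]
                if acc is None:
                    acc, arg = v, k
                elif reduce == "max" and v > acc:
                    acc, arg = v, k
                elif reduce == "min" and v < acc:
                    acc, arg = v, k
                elif reduce in ("sum", "mean"):
                    acc += v
            if acc is None:       # empty row
                acc = 0.0
            elif reduce == "mean":
                acc /= (end - start)
            out[i][j] = acc
            if arg_out is not None:
                arg_out[i][j] = arg
    return out, arg_out
```

For "sum"/"mean" the `arg_out` tensor is skipped entirely, matching the inference fast path described above.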


### Performance

Benchmarked with GCN on ogbn-products on a single Xeon socket, the workload is improved by `4.3x` with this patch.

The performance benefit for training will be even bigger: the original backward impl for `sum|mean` is sequential, and the original backward impl for `max|min` is not fused.
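As a sketch of why `arg_out` makes a fused `max|min` backward cheap (hypothetical helper, not the actual kernel): each output cell routes its gradient to the single non-zero that won the reduction, so the gradient w.r.t. the CSR values is a simple scatter over `arg_out`.

```python
def spmm_max_backward_values(grad_out, arg_out, col, other, nnz):
    """Gradient w.r.t. the CSR `values` for reduce="max"/"min":
    each output cell's gradient flows only to the winning non-zero k,
    scaled by d(values[k] * other[col[k], j]) / d values[k] = other[col[k], j]."""
    grad_values = [0.0] * nnz
    for i, row in enumerate(grad_out):
        for j, g in enumerate(row):
            k = arg_out[i][j]
            if k >= 0:  # k == -1 marks an empty row
                grad_values[k] += g * other[col[k]][j]
    return grad_values
```

Without `arg_out`, the backward would have to re-run the reduction to rediscover the winners, which is part of why the unfused original impl is slow.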


#### before:
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
       torch_sparse::spmm_sum        97.09%       56.086s        97.09%       56.088s        6.232s             9
                 aten::linear         0.00%      85.000us         1.38%     795.485ms      88.387ms             9
                 aten::matmul         0.00%      57.000us         1.38%     795.260ms      88.362ms             9
                     aten::mm         1.38%     795.201ms         1.38%     795.203ms      88.356ms             9
                   aten::relu         0.00%      50.000us         0.76%     440.434ms      73.406ms             6
              aten::clamp_min         0.76%     440.384ms         0.76%     440.384ms      73.397ms             6
                   aten::add_         0.57%     327.801ms         0.57%     327.801ms      36.422ms             9
            aten::log_softmax         0.00%      23.000us         0.10%      55.503ms      18.501ms             3
```

#### after
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
               aten::spmm_sum        87.35%       11.826s        87.36%       11.827s        1.314s             9
                 aten::linear         0.00%      92.000us         5.87%     794.451ms      88.272ms             9
                 aten::matmul         0.00%      62.000us         5.87%     794.208ms      88.245ms             9
                     aten::mm         5.87%     794.143ms         5.87%     794.146ms      88.238ms             9
                   aten::relu         0.00%      53.000us         3.35%     452.977ms      75.496ms             6
              aten::clamp_min         3.35%     452.924ms         3.35%     452.924ms      75.487ms             6
                   aten::add_         2.58%     348.663ms         2.58%     348.663ms      38.740ms             9
                 aten::argmax         0.42%      57.473ms         0.42%      57.475ms      14.369ms             4
            aten::log_softmax         0.00%      22.000us         0.39%      52.605ms      17.535ms             3
```

cc jgong5 XiaobingSuper sanchitintel ashokei jingxu10 mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang Guobing-Chen chunyuan-w zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire VitalyFedyunin

[ghstack-poisoned]
mingfeima added a commit that referenced this pull request Feb 9, 2023
ghstack-source-id: 758faa1
Pull Request resolved: #83727
@mingfeima mingfeima requested a review from cpuhrsch February 9, 2023 02:35
@mingfeima mingfeima removed the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Feb 9, 2023
Collaborator

@pearu pearu left a comment

Thanks, @mingfeima for this PR!

I have a number of nits, a request to generate CSC test samples, and a suggestion to reduce the number of arguments in the new native functions to reduce the ambiguity of naming conventions. For instance, row_indices means something different in the CSC context than in the CSR context. So, the suggestion is to capture the arguments ccol_indices, row_indices, csr2csc in a single argument self_permute that carries the same information and is easier to compute (my review suggestions are not tested and I may have missed a few details, but hopefully the general idea is clear).

```
- func: _sparse_mm_reduce_impl(Tensor self, Tensor other, str reduce, *, Tensor? row_indices=None, Tensor? ccol_indices=None, Tensor? csr2csc=None) -> (Tensor, Tensor)
  python_module: sparse
  dispatch:
    SparseCsrCPU: _sparse_mm_reduce_impl_sparse_csr_cpu
```
Collaborator

SparseCsrCPU is the dispatch key for sparse compressed tensors in general, so it includes CSR, CSC, BSR, and BSC storage formats. To distinguish the formats, the layout must be used within the _sparse_mm_reduce_impl_sparse_csr_cpu implementation.

@mingfeima
Collaborator Author

mingfeima commented Feb 10, 2023

@pearu Thanks for your review! I have addressed your comments and updated this PR. Could you please review again? Thank you very much!

updates:

  1. disable CSC support for now, see the discussion above in #83727 (comment); right now CSC computes A^T * B, which does not match the CSR result (A * B)
  2. remove all the meta tensors (csr2csc, ccol_indices, col_indices) from the func signature to make the API cleaner. I will optimize it in the future; right now the backward should be slow due to the argsort.
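The A^T * B mismatch in point 1 can be demonstrated with a small pure-Python sketch (the `spmm_sum` helper here is hypothetical, written just for this illustration): feeding CSC's `(ccol_indices, row_indices)` into a kernel that expects CSR's `(crow_indices, col_indices)` computes `A^T @ B` instead of `A @ B`.

```python
def spmm_sum(comp, idx, vals, other):
    """Treat (comp, idx, vals) as CSR (crow, col, values) and compute A @ other."""
    m, n = len(comp) - 1, len(other[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for k in range(comp[i], comp[i + 1]):
            for j in range(n):
                out[i][j] += vals[k] * other[idx[k]][j]
    return out

# A = [[1, 2], [0, 3]] encoded in both layouts (values in each layout's nnz order)
crow, col, csr_vals = [0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0]   # CSR
ccol, row, csc_vals = [0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0]   # CSC
B = [[1.0], [10.0]]
```

Running `spmm_sum` on the CSR triple gives `A @ B = [[21], [30]]`, while reusing it on the CSC triple gives `A^T @ B = [[1], [32]]`, which is exactly why CSC support is disabled until the kernels handle the layout explicitly.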

@mingfeima mingfeima requested a review from fegin as a code owner February 10, 2023 03:15
mingfeima added a commit that referenced this pull request Feb 10, 2023
ghstack-source-id: a54124a
Pull Request resolved: #83727
@mingfeima mingfeima requested a review from pearu February 10, 2023 03:18
Collaborator

@pearu pearu left a comment

LGTM! Thanks, @mingfeima!

Some notes:

  1. The feature in this PR is implemented only for 2-D CSR tensors using CPU storage. Extending it to CSC tensors is possible but requires more work. For instance, the discussion in #83727 (comment) reveals that the kernels need to be extended so that the invariant
    torch.sparse.mm(x.to_sparse_csr(), y, "mean") == torch.sparse.mm(x.to_sparse_csc(), y, "mean")
    holds. My initial estimate is that this requires both the CSR and CSC indices to be materialized (the csc.transpose(0, 1) trick will not be sufficient). Notice that this note is not relevant for non-mean reductions, which do not require value normalization.
  2. The argument triple row_indices, ccol_indices, csr2csc represents the tensor self_permute = sparse_csr_tensor(csr.crow_indices(), csr.col_indices(), arange(csr._nnz())).to_sparse_csc() and is likely relevant elsewhere as well. So we might want to re-design this API to use a single indices tensor argument self_permute and apply it to other cases. But not in this PR!
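The csr2csc permutation discussed in note 2, and the argsort-based fallback the author mentions, can be sketched in pure Python (illustrative only, not PyTorch code): csr2csc is a stable argsort of the CSR non-zeros by column index, plus the CSC column pointers obtained by counting.

```python
def csr_to_csc_permutation(col, n_cols):
    """csr2csc: maps CSC nnz position -> CSR nnz index, via a stable
    argsort of the CSR non-zeros by column (ties keep CSR row order)."""
    perm = sorted(range(len(col)), key=lambda k: col[k])  # stable argsort
    ccol = [0] * (n_cols + 1)                             # CSC column pointers
    for c in col:
        ccol[c + 1] += 1
    for c in range(n_cols):                               # prefix sum
        ccol[c + 1] += ccol[c]
    return ccol, perm
```

Precomputing this once and passing it in (or bundling it as a single self_permute tensor) is what would let the backward avoid re-running the sort on every call.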

@mingfeima
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
fkhan1337, qxy11, boyuantan, likethesky, rsemenov, ...

Details for Dev Infra team Raised by workflow job

@mingfeima
Collaborator Author

@ezyang need super user for this one!

@cpuhrsch
Contributor

@pytorchbot merge

@mingfeima
Collaborator Author

@pearu Thanks for the comments, really helpful! I will come back and perfect this PR later on.

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
Hangjun, kyulee-com, XilunWu, glaringlee, ydwu4, ...

Details for Dev Infra team Raised by workflow job

Contributor

@ezyang ezyang left a comment

ACKing reviewer list, no direct review

@ezyang
Contributor

ezyang commented Feb 10, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
erichan1, zanqi, JustinPinero, fduwjj, ashkan-software, ...

Details for Dev Infra team Raised by workflow job

@cpuhrsch
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

rusty1s added a commit to pyg-team/pytorch_geometric that referenced this pull request Feb 19, 2023
…nel using CSR format (#6699)

Related to an important optimization in PyTorch:
✅ port sparse_mm.reduce to pytorch and optimize it on CPU (pytorch/pytorch#83727)
Updating the simplified high-level API for the `spmm_reduce()` kernel and tests.

The current kernel implementation is limited to processing `src` of
type `torch.Tensor` in `torch.sparse_csr` format, therefore I've added
an option to auto-convert `src` to CSR format using
`src.to_sparse_csr()`. The option is `False` by default, which results in
a `ValueError` if the input is not provided in the correct format.

The conversion from `SparseTensor` to `torch.Tensor` is enabled by default for PyTorch > 1.13.

Added a transform to remove duplicates in the ogbn-products dataset, because
the new kernel can't handle duplicate entries (useful for benchmarks).

_Re-opened this PR because the draft (#6689) needed to be scrapped after
a rebase._

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rusty1s <matthias.fey@tu-dortmund.de>
@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/85/head branch June 8, 2023 18:01

Labels

ciflow/trunk Trigger trunk jobs on your pull request cla signed intel This tag is for PR from Intel Merged module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor open source release notes: sparse release notes category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.