optimize scatter_add performance for gnn usage on CPU #82703
Conversation
### Motivation of this PR

This PR targets improving the performance of `scatter_add` for GNN usage scenarios on PyG. Currently only the CPU path is optimized.

`Message Passing` is the major step in GNN learning: it exchanges/aggregates information between nodes. From a performance point of view, when the `EdgeIndex` is stored as `[2, num_edges]`, `scatter_reduce` is a major hotspot in the current PyTorch implementation.

To be more specific, in message passing `scatter_add` is used in a very similar way to `index_select`, except that the `self` tensor is written to while `index_select` only reads. The `index` tensor passed to `scatter_add` is therefore a tensor expanded on dim 0, which means that along all the remaining dims it holds the same value.

### Algorithm

For this case the current scatter implementation parallelizes over the inner dims, which leads to bad performance: non-contiguous memory access and no vectorization. This PR sorts the `index` to resolve the write conflicts that would otherwise occur when parallelizing directly over dim 0. The algorithm is equivalent to:

* convert the memory format from `COO` to `CSR`
* do an spmm-style reduce

### Perf improvement

The benchmark comes from https://github.com/pyg-team/pytorch_geometric/tree/master/examples: `python reddit.py`, which runs the SAGE model on the Reddit dataset.

CPU type: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz

`aten::scatter_add_` has been reduced from **37.797s** to **5.989s**:

* breakdown before

```
Name                  Self CPU %   Self CPU   CPU total %   CPU total   CPU time avg   # of Calls
--------------------  ----------   --------   -----------   ---------   ------------   ----------
aten::scatter_add_    49.00%       37.797s    49.00%        37.797s     41.445ms       912
aten::index_select    19.74%       15.223s    19.74%        15.227s     6.678ms        2280
aten::linear          0.01%        5.706ms    15.04%        11.602s     12.721ms       912
aten::addmm           6.62%        5.108s     7.92%         6.112s      13.403ms       456
aten::matmul          0.00%        2.339ms    7.10%         5.475s      12.006ms       456
```

* breakdown after

```
Name                  Self CPU %   Self CPU   CPU total %   CPU total   CPU time avg   # of Calls
--------------------  ----------   --------   -----------   ---------   ------------   ----------
aten::index_select    32.41%       14.677s    32.42%        14.681s     6.439ms        2280
aten::linear          0.01%        6.665ms    26.43%        11.968s     13.123ms       912
aten::addmm           11.76%       5.328s     13.76%        6.232s      13.667ms       456
aten::scatter_add_    13.22%       5.989s     13.22%        5.989s      6.566ms        912
aten::matmul          0.01%        2.303ms    12.63%        5.720s      12.543ms       456
```
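To make the targeted pattern concrete, here is a minimal sketch in plain PyTorch (not the kernel added by this PR; all sizes and variable names are illustrative) of how `scatter_add_` shows up in message passing with an index expanded on dim 0, and of the sorted-index / CSR-style reduction the algorithm above is equivalent to:

```python
import torch

# Illustrative only: toy sizes standing in for a real graph.
num_nodes, num_edges, feat_dim = 1000, 20000, 64
x = torch.randn(num_nodes, feat_dim)                       # node features
edge_index = torch.randint(0, num_nodes, (2, num_edges))   # COO edges, shape [2, num_edges]
src, dst = edge_index[0], edge_index[1]

# Gather messages from source nodes (read-only, analogous to index_select).
messages = x.index_select(0, src)                          # [num_edges, feat_dim]

# Scatter-add the messages into destination nodes. The index is expanded on
# dim 0, so every column of a given row carries the same node id -- exactly
# the case this PR special-cases.
index = dst.view(-1, 1).expand(-1, feat_dim)
out = torch.zeros(num_nodes, feat_dim)
out.scatter_add_(0, index, messages)

# Equivalent formulation the PR's algorithm boils down to: sort the index
# (COO -> CSR-style row pointers), then reduce each contiguous segment.
# Each output row is now independent, so parallelizing over dim 0 has no
# write conflicts and the inner reduction is contiguous and vectorizable.
sorted_dst, perm = torch.sort(dst)
sorted_messages = messages[perm]
row_ptr = torch.zeros(num_nodes + 1, dtype=torch.long)
row_ptr[1:] = torch.bincount(sorted_dst, minlength=num_nodes).cumsum(0)

out_ref = torch.zeros(num_nodes, feat_dim)
for row in range(num_nodes):
    beg, end = row_ptr[row], row_ptr[row + 1]
    if end > beg:
        out_ref[row] = sorted_messages[beg:end].sum(0)

# Results match up to floating-point accumulation order.
print(torch.allclose(out, out_ref, atol=1e-4))
```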
@rusty1s could you please help review this one?

Yes, I can take a look ASAP.
cc @mananshah99 |
Thanks for this PR. This is really cool :)

The overall implementation makes sense to me. I guess there might be some threshold on `K` and `index.numel()` beyond which spmm is actually preferred.

In addition, we can also think about wrapping this into a higher-level `segment_add` implementation which assumes sorted indices as input.
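For reference, a rough sketch of what such a `segment_add` wrapper could look like (hypothetical name and signature, not an existing ATen/PyTorch op; the reduction is written as a plain loop only to show the contract of pre-sorted indices):

```python
import torch

def segment_add(src: torch.Tensor, sorted_index: torch.Tensor, num_segments: int) -> torch.Tensor:
    """Sum rows of `src` that share the same value in the pre-sorted `sorted_index`.
    Hypothetical helper, sketched only to illustrate the suggested API."""
    counts = torch.bincount(sorted_index, minlength=num_segments)
    row_ptr = torch.zeros(num_segments + 1, dtype=torch.long)
    row_ptr[1:] = counts.cumsum(0)                      # CSR-style row pointers

    out = src.new_zeros(num_segments, src.size(1))
    for seg in range(num_segments):                     # segments are independent -> parallel-friendly
        beg, end = row_ptr[seg], row_ptr[seg + 1]
        if end > beg:
            out[seg] = src[beg:end].sum(0)
    return out

# Usage: the index must already be sorted, e.g. by sorting edge destinations.
src = torch.randn(8, 4)
idx, perm = torch.sort(torch.tensor([2, 0, 1, 2, 0, 1, 2, 2]))
print(segment_add(src[perm], idx, num_segments=3).shape)   # torch.Size([3, 4])
```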
/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details. This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.
ezyang left a comment:
just reviewed for the framework side of things, did not review algos
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: this PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase by leaving the following comment on this PR:
@pytorchbot merge

Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack:

### Motivation of this PR

This PR targets improving the performance of `scatter_add` for GNN usage scenarios on PyG. Currently only the CPU path is optimized.

`Message Passing` is the major step in GNN learning: it exchanges/aggregates information between nodes. From a performance point of view, when the `EdgeIndex` is stored as `[2, num_edges]`, `scatter_reduce` is a major hotspot in the current PyTorch implementation.

To be more specific, in message passing `scatter_add` is used in a very similar way to `index_select`, except that the `self` tensor is written to while `index_select` only reads. The `index` tensor passed to `scatter_add` is therefore a tensor expanded on dim 0, which means that along all the remaining dims it holds the same value.

### Algorithm

For this case the current scatter implementation parallelizes over the inner dims, which leads to bad performance: non-contiguous memory access and no vectorization. This PR sorts the `index` to resolve the write conflicts that would otherwise occur when parallelizing directly over dim 0. The algorithm is equivalent to:

* convert the memory format from `COO` to `CSR`
* do an spmm-style reduce

### Perf improvement

The benchmark comes from https://github.com/pyg-team/pytorch_geometric/tree/master/examples: `python reddit.py`, which runs the SAGE model on the Reddit dataset.

CPU type: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz

`aten::scatter_add_` has been reduced from **37.797s** to **5.989s**.

cc @VitalyFedyunin @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10
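The operator breakdowns quoted above come from profiling PyG's `reddit.py`; they are not reproduced by the snippet below. For completeness, here is a minimal sketch of how such an operator-level CPU breakdown can be collected with `torch.profiler`, using a toy `scatter_add_` workload in place of the real model (all sizes are made up for illustration):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy workload standing in for the SAGE/Reddit run; sizes are arbitrary.
num_nodes, num_edges, feat_dim = 100_000, 1_000_000, 128
dst = torch.randint(0, num_nodes, (num_edges,))
index = dst.view(-1, 1).expand(-1, feat_dim)
messages = torch.randn(num_edges, feat_dim)
out = torch.zeros(num_nodes, feat_dim)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    for _ in range(10):
        out.scatter_add_(0, index, messages)

# Prints an operator table similar in shape to the breakdowns above.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5))
```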