[WIP] max_pool2d without indices optimization [CPU] #43267
Conversation
💊 CI failures summary and remediations
As of commit 42a5360 (more details on the Dr. CI page):
🕵️ 15 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
Included benchmark file for reference. Will remove on final PR. [ghstack-poisoned]
glaringlee left a comment:
Overall this is a great algorithm, and it should be easy to extend to 3D. Please see my comments.
glaringlee left a comment:
Some minor comments. Please rebase the code, and then I will approve it.
@heitorschueroff
LGTM now, except for the std::max error. Where is your benchmark?
glaringlee left a comment:
LGTM, approving. Please rebase and import to Phabricator.
```python
helper(10, 512, 31, 31, 3, stride=2)
helper(1, 129, 8, 8, 3, stride=2)

@onlyCUDA
```
Why do you think this was onlyCUDA before? Isn't your test (on CPU) going to run the same thing twice and check that the results match? That's fine, I guess?
The original purpose of onlyCUDA here was to test the CUDA implementation against the CPU implementation as a reference. I would suggest we keep onlyCUDA; otherwise this becomes a duplicate CPU-CPU comparison.
To have a purely CPU test, we can hard-code a few pooling inputs and expected results.
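For illustration, a hard-coded CPU check along these lines might look like this (the input and expected values are invented for the example, not taken from the PR):

```python
import torch

# 1x1x4x4 input; MaxPool2d(2) takes the max over each non-overlapping 2x2 window.
x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])
pool = torch.nn.MaxPool2d(kernel_size=2)
expected = torch.tensor([[[[ 6.,  8.],
                           [14., 16.]]]])
assert torch.equal(pool(x), expected)
```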
```diff
-def helper(n, c, h, w, ks):
-    x = torch.randn(n, c, h, w, device='cuda', dtype=torch.float, requires_grad=True)
+def helper(n, c, h, w, ks, requires_grad):
+    x = torch.randn(n, c, h, w, device=device, dtype=torch.float, requires_grad=requires_grad)
```
Not your code, but on the line below: does the detach() actually do anything? I also think x.to('cpu', copy=True).requires_grad_() captures the intent more clearly.
x.to('cpu', copy=True).requires_grad_() is returning None for some reason.
Or rather, calling .grad on the returned Tensor gives None.
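A likely explanation (my reading of PyTorch autograd, not stated in the thread): x.to('cpu', copy=True) is a differentiable op, so its result stays attached to the graph as a non-leaf tensor, and backward() does not populate .grad on non-leaf tensors; calling detach() first makes the copy a fresh leaf. A small sketch:

```python
import torch

x = torch.randn(2, 3, requires_grad=True)

# detach() first: the clone is a fresh leaf, so backward() fills its .grad.
ref_x = x.detach().clone().requires_grad_()

# .to(copy=True) is differentiable: the copy is a non-leaf attached to x's
# graph, and autograd leaves .grad unset (None) on non-leaf tensors.
attached = x.to('cpu', copy=True)

print(ref_x.is_leaf, attached.is_leaf)  # True False

ref_x.sum().backward()
print(ref_x.grad is None)  # False: the leaf's grad was populated
```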
aten/src/ATen/native/Pooling.cpp (Outdated)

```cpp
}
#endif
auto output_and_indices = at::max_pool2d_with_indices(
if (self.requires_grad() || self.device() != at::kCPU) {
```
What's up with the gradient check here? Maybe another TODO?
If we require grad, then we need to compute the indices for the backward pass.
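Paraphrasing the quoted diff, the dispatch looks roughly like this (a sketch only; max_pool2d_impl stands in for the PR's actual index-free kernel and is not its real name):

```cpp
// Sketch: fall back to the indices-computing kernel whenever autograd will
// need the argmax indices for the backward pass, or we are not on CPU.
Tensor max_pool2d(const Tensor& self, IntArrayRef kernel_size, IntArrayRef stride,
                  IntArrayRef padding, IntArrayRef dilation, bool ceil_mode) {
  if (self.requires_grad() || self.device() != at::kCPU) {
    auto output_and_indices = at::max_pool2d_with_indices(
        self, kernel_size, stride, padding, dilation, ceil_mode);
    return std::get<0>(output_and_indices);
  }
  // Fast path: CPU inference, so skip computing indices entirely.
  return max_pool2d_impl(self, kernel_size, stride, padding, dilation, ceil_mode);
}
```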
```python
y = pool(x)
ref_y = pool(ref_x)
pool = torch.nn.MaxPool2d(kernel_size=ks, return_indices=True)
```
Would you elaborate on this change? In particular:
- In the original, return_indices was not set, so it defaulted to False.
- Doesn't your change only affect the return_indices=False codepath?
glaringlee left a comment:
@heitorschueroff
I put this back to [WIP] since you will add the with_indices part to this PR as well. Feel free to remove [WIP] once you are ready.
Would you post your benchmark script? cc @ngimel for perf, too. Maybe a couple more sizes as a sanity check?
Cross with params for Inception v3 (kernel size 3, stride 2), GoogLeNet (kernel size 3, stride 2, ceil mode True), and ResNet (kernel size 3, stride, padding 1). Are there tests that the other options to max_pool2d are working correctly, like padding, ceil mode, stride, and dilation?
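For reference, one quick way to time those network-shaped configs (a sketch; the input shape and the ResNet stride value are assumptions, since the comment leaves the stride unspecified):

```python
import time
import torch

configs = {
    "inception_v3": dict(kernel_size=3, stride=2),
    "googlenet":    dict(kernel_size=3, stride=2, ceil_mode=True),
    "resnet":       dict(kernel_size=3, stride=2, padding=1),  # stride=2 assumed
}
x = torch.randn(10, 64, 112, 112)  # illustrative shape
for name, kwargs in configs.items():
    pool = torch.nn.MaxPool2d(**kwargs)
    pool(x)  # warm-up
    start = time.perf_counter()
    for _ in range(20):
        pool(x)
    elapsed = (time.perf_counter() - start) / 20
    print(f"{name}: {elapsed * 1e3:.2f} ms/iter")
```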
Besides your changes to test_max_pool2d, there is also another test, test_max_pool2d_indices. Would you mind combining the two tests, rather than turning the current test_max_pool2d into another duplicate of test_max_pool2d_indices? Thanks!
Lines 9868 to 9889 in 42a5360:

```python
@onlyCUDA
def test_max_pool2d_indices(self, device):
    def helper(n, c, h, w, ks):
        if n is None:
            x = torch.randn(c, h, w, device='cuda', dtype=torch.float, requires_grad=True)
        else:
            x = torch.randn(n, c, h, w, device='cuda', dtype=torch.float, requires_grad=True)

        ref_x = x.detach().clone().cpu().requires_grad_()

        pool = torch.nn.MaxPool2d(kernel_size=ks, return_indices=True)

        y, idx = pool(x)
        ref_y, ref_idx = pool(ref_x)

        y.sum().backward()
        ref_y.sum().backward()

        self.assertEqual(y, ref_y)
        self.assertEqual(idx, ref_idx)  # assertEqual implicitly compares shape for tensors
        self.assertEqual(x.grad, ref_x.grad)
```
This is part of a larger effort to refactor and optimize the pooling code. Previously I started working on MaxPool2d here (#43267), but since it uses MaxPool1d as a subroutine, it made more sense to work on 1D first, get it tested and optimized, and then move up to 2D and then 3D. TODO: I'll add some bigger tests and some early benchmarking code and results here. [ghstack-poisoned]
Summary: Pull Request resolved: #43745

This is part of a larger effort to refactor and optimize the pooling code. Previously I started working on MaxPool2d here (#43267), but since it uses MaxPool1d as a subroutine, it made more sense to work on 1D first, get it tested and optimized, and then move up to 2D and then 3D. Below are some benchmarking results; the Python script I used is below the results.

## Benchmarking

```
Name (time in us)                          Min               Max              Mean            StdDev            Median             IQR        Outliers  OPS (Kops/s)  Rounds  Iterations
test_googlenet[(3, 2, 0, 1, 0)-new]      79.7659 (1.03)  1,059.6327 (5.32)    90.6280 (1.01)   19.1196 (1.41)    84.2176 (1.01)    2.4289 (1.0)   1079;2818  11.0341 (0.99)   9055  1
test_googlenet[(3, 2, 0, 1, 0)-old]     505.1531 (6.55)    830.8962 (4.17)   563.4763 (6.29)   65.3974 (4.81)   538.3361 (6.43)   80.5371 (33.16)    242;99   1.7747 (0.16)   1742  1
test_googlenet[(3, 2, 0, 1, 1)-new]      80.2949 (1.04)    233.0020 (1.17)    97.6498 (1.09)   19.1228 (1.41)    89.2282 (1.07)   18.5743 (7.65)   1858;741  10.2407 (0.92)   9587  1
test_googlenet[(3, 2, 0, 1, 1)-old]     513.5350 (6.66)    977.4677 (4.91)   594.4559 (6.63)   69.9372 (5.15)   577.9080 (6.90)   79.8218 (32.86)    503;84   1.6822 (0.15)   1675  1
test_googlenet[(3, 2, 1, 1, 0)-new]      77.1061 (1.0)     199.1168 (1.0)     89.6529 (1.0)    13.5864 (1.0)     83.7557 (1.0)     7.5139 (3.09)  1419;1556  11.1541 (1.0)    7434  1
test_googlenet[(3, 2, 1, 1, 0)-old]     543.6055 (7.05)    964.5708 (4.84)   636.9867 (7.11)   84.0732 (6.19)   616.7777 (7.36)  100.4562 (41.36)    434;65   1.5699 (0.14)   1552  1
test_inception[(3, 2, 0, 1, 0)-new]      84.5827 (1.00)    184.2827 (1.0)     90.5438 (1.01)    9.6324 (1.0)     89.3027 (1.05)    4.5672 (1.03)    637;759  11.0444 (0.99)   6274  1
test_inception[(3, 2, 0, 1, 0)-old]     641.2268 (7.59)  1,704.8977 (9.25)   686.9383 (7.65)   57.2499 (5.94)   682.5905 (8.01)   58.3753 (13.17)     86;21   1.4557 (0.13)    802  1
test_inception[(3, 2, 0, 1, 1)-new]      84.5008 (1.0)   1,093.6335 (5.93)    89.8233 (1.0)    14.0443 (1.46)    85.2682 (1.0)     4.4331 (1.0)    802;1106  11.1330 (1.0)    9190  1
test_inception[(3, 2, 0, 1, 1)-old]     643.7078 (7.62)    851.4188 (4.62)   687.4905 (7.65)   41.1116 (4.27)   685.1386 (8.04)   60.2733 (13.60)    286;14   1.4546 (0.13)   1300  1
test_inception[(3, 2, 1, 1, 0)-new]     106.0739 (1.26)    258.5649 (1.40)   115.3597 (1.28)   17.5436 (1.82)   106.9643 (1.25)    5.5470 (1.25)   894;1402   8.6685 (0.78)   7635  1
test_inception[(3, 2, 1, 1, 0)-old]     651.0504 (7.70)    955.2278 (5.18)   698.0295 (7.77)   45.5097 (4.72)   692.8109 (8.13)   64.6794 (14.59)    145;15   1.4326 (0.13)    909  1
test_large_batch_size[new]                2.9608 (1.0)       5.1127 (1.0)      3.3096 (1.0)     0.1936 (1.0)      3.3131 (1.0)     0.2093 (1.0)        71;6  302.1515 (1.0)     297  1
test_large_batch_size[old]              130.6583 (44.13)   152.9521 (29.92)  137.1385 (41.44)   7.4352 (38.40)  135.1784 (40.80)   5.1358 (24.53)       1;1    7.2919 (0.02)      7  1
test_large_channel_size[new]              2.9696 (1.0)       5.5595 (1.0)      3.5997 (1.0)     0.5836 (1.0)      3.3497 (1.0)     0.3445 (1.0)       58;54  277.8014 (1.0)     277  1
test_large_channel_size[old]             19.6838 (6.63)     22.6637 (4.08)    21.1775 (5.88)    0.8610 (1.48)    21.3739 (6.38)    1.4930 (4.33)       13;0   47.2199 (0.17)     36  1
test_large_width[new]                     1.7714 (1.0)       2.4104 (1.0)      1.8988 (1.0)     0.0767 (1.0)      1.8911 (1.0)     0.0885 (1.0)       86;13  526.6454 (1.0)     373  1
test_large_width[old]                    19.5708 (11.05)    22.8755 (9.49)    20.7987 (10.95)   0.7009 (9.14)    20.6623 (10.93)   0.8584 (9.70)       14;1   48.0799 (0.09)     46  1
test_multithreaded[new]                  15.0560 (1.0)      24.2891 (1.0)     16.1627 (1.0)     1.5657 (1.0)     15.7182 (1.0)     0.7598 (1.0)         4;6   61.8709 (1.0)      65  1
test_multithreaded[old]                 115.7614 (7.69)    120.9670 (4.98)   118.3004 (7.32)    1.6259 (1.04)   118.4164 (7.53)    1.9613 (2.58)        2;0    8.4531 (0.14)      8  1

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
```

### Benchmarking script

To run the benchmark, make sure you have pytest-benchmark installed with `pip install pytest-benchmark` and use the following command: `pytest benchmark.py --benchmark-sort='name'`

```
import torch
import pytest


def _test_speedup(benchmark, batches=1, channels=32, width=32, kernel_size=2,
                  stride=None, padding=0, dilation=1, ceil_mode=False,
                  return_indices=False):
    torch.set_num_threads(1)
    x = torch.randn((batches, channels, width))
    model = torch.nn.MaxPool1d(kernel_size, stride, padding, dilation,
                               return_indices, ceil_mode)
    benchmark(model, x)


@pytest.mark.benchmark(group="inception")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
@pytest.mark.parametrize("params", [(3, 2), (3, 2, 0, 1, True), (3, 2, 1)],
                         ids=["(3, 2, 0, 1, 0)", "(3, 2, 0, 1, 1)", "(3, 2, 1, 1, 0)"])
def test_inception(benchmark, params, return_indices):
    _test_speedup(benchmark, 10, 64, 147, *params, return_indices=return_indices)


@pytest.mark.benchmark(group="googlenet")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
@pytest.mark.parametrize("params", [(3, 2), (3, 2, 0, 1, True), (3, 2, 1)],
                         ids=["(3, 2, 0, 1, 0)", "(3, 2, 0, 1, 1)", "(3, 2, 1, 1, 0)"])
def test_googlenet(benchmark, params, return_indices):
    _test_speedup(benchmark, 10, 64, 112, *params, return_indices=return_indices)


@pytest.mark.benchmark(group="large batch size")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_large_batch_size(benchmark, return_indices):
    _test_speedup(benchmark, 100000, 1, 32, return_indices=return_indices)


@pytest.mark.benchmark(group="large channel size")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_large_channel_size(benchmark, return_indices):
    _test_speedup(benchmark, 1, 100000, 32, return_indices=return_indices)


@pytest.mark.benchmark(group="large width")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_large_width(benchmark, return_indices):
    _test_speedup(benchmark, 1, 32, 100000, return_indices=return_indices)


@pytest.mark.benchmark(group="multithreading")
@pytest.mark.parametrize("return_indices", [True, False], ids=["old", "new"])
def test_multithreaded(benchmark, return_indices):
    x = torch.randn((40, 10000, 32))
    model = torch.nn.MaxPool1d(2, return_indices=return_indices)
    benchmark(model, x)
```

## Discussion

The new algorithm is on average 7x faster than the old one. And because the old algorithm had many issues with how it parallelized the code and made use of the cache, one can come up with input parameters (like a large batch size) that make the new algorithm far faster still than the original, as much as ~40x in the large-batch case above.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D23425348

Pulled By: heitorschueroff

fbshipit-source-id: 3fa3f9b8e71200da48424a95510124a83f50d7b2
Stack from ghstack:
This PR implements a version of max_pool2d that doesn't compute indices when it's not needed. It also makes some optimizations that will be carried over to other pooling functions in future PRs.
Benchmarking:
Tensor Parameters
BATCH = 10
CHANNEL = 16
HEIGHT = 2048
WIDTH = 2048
DTYPE = torch.float32
DEVICE = "cpu"
Pooling Parameters
KERNEL_SIZE = 2
STRIDE = None
PADDING = 0
DILATION = 1
CEIL_MODE = False
Results (time in ms) (speedup factor)
test_max_pool2d: 118.4793 (1.0)
test_mkldnn_max_pool2d: 360.2836 (3.04)
test_max_pool2d_with_indices: 626.9831 (5.29)
Discussion
The new implementation is on average 2~3 times faster than mkldnn and 5x faster than with_indices. The original with_indices code only parallelized over batches and channels, so when those dimensions were small it could not achieve optimal parallelism.
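A minimal sketch of the comparison described above (the benchmark harness itself is not shown in this thread; this just sets up the tensor from the parameters listed and exercises the two codepaths):

```python
import torch

BATCH, CHANNEL, HEIGHT, WIDTH = 10, 16, 2048, 2048
# ~2.7 GB float32 tensor, matching the parameters above.
x = torch.randn(BATCH, CHANNEL, HEIGHT, WIDTH, dtype=torch.float32, device="cpu")

# New fast path: no indices are computed when they are not requested.
y = torch.max_pool2d(x, kernel_size=2)

# Old path: also computes the argmax indices.
y_ref, idx = torch.nn.functional.max_pool2d_with_indices(x, kernel_size=2)

assert torch.equal(y, y_ref)
```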
This algorithm also reduces duplicate comparisons in the case of overlapping kernel windows. For instance, if we change the pooling parameters above to:
KERNEL_SIZE = 4
STRIDE = 1
PADDING = 1
DILATION = 2
CEIL_MODE = True
Results (time in ms) (speedup factor)
test_max_pool2d: 136.4228 (1.0)
test_mkldnn_max_pool2d: 608.4158 (4.46)
test_max_pool2d_with_indices: 1,230.1916 (9.02)
There is also an issue with the existing pooling implementations: they use nested at::parallel_for loops, and since at::parallel_for does not support nesting, only the outermost loop is actually parallelized, as the sketch below illustrates.
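A simplified sketch of that nesting problem (illustrative only, not the actual kernel code):

```cpp
#include <ATen/Parallel.h>

// at::parallel_for does not support nesting: once execution is inside the
// outer parallel region, an inner call simply runs its range sequentially.
void pool_over_batches_and_channels(int64_t nbatch, int64_t nchannel) {
  at::parallel_for(0, nbatch, 0, [&](int64_t b_begin, int64_t b_end) {
    // Inside this lambda at::in_parallel_region() is true, so the inner
    // parallel_for below degrades to a plain sequential loop.
    at::parallel_for(0, nchannel, 0, [&](int64_t c_begin, int64_t c_end) {
      // ... pool one (batch, channel) slab ...
    });
  });
}
```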
Differential Revision: D23273406
closes #28733