
Conversation

@lantiga (Contributor) commented Mar 14, 2018

Note: this is a re-submission of #5698, which had an issue with CI.

This PR addresses #5648. In particular, following the discussion there, it:

  • adds Catch as a submodule (https://github.com/catchorg/Catch2) in torch/aten/utils
  • ports all ATen tests to Catch
  • ports torch/csrc/jit/test_jit.cpp to Catch (libtorch only; the Python build is unaffected); a sketch of the Catch test style follows this list
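
For illustration, this is roughly what a test looks like in Catch's single-header style (a made-up example, not one of the actual ported tests):

#define CATCH_CONFIG_MAIN  // let Catch generate main() for the test binary
#include "catch.hpp"

#include <vector>

TEST_CASE("vectors grow and shrink", "[example]") {
  std::vector<int> v{1, 2, 3};

  SECTION("push_back adds an element") {
    v.push_back(4);
    REQUIRE(v.size() == 4);
  }

  SECTION("pop_back removes one") {  // each SECTION re-runs the setup above
    v.pop_back();
    REQUIRE(v.size() == 2);
  }
}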

@ezyang (Contributor) commented Mar 15, 2018

@pytorchbot retest this please

@ezyang (Contributor) commented Mar 15, 2018

It looks like the CI hook-up for libtorch is still not working. I strongly advise downloading the Docker image and testing off of that.
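
(For reference, reproducing a CI job locally looks roughly like the following; the registry host and image tag are placeholders, not the actual CI image coordinates:)

docker pull registry.example.com/pytorch-linux-xenial-py3-clang5-asan:latest
docker run -it --rm -v "$PWD":/workspace -w /workspace \
  registry.example.com/pytorch-linux-xenial-py3-clang5-asan:latest bash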

@lantiga (Contributor, Author) commented Mar 15, 2018

Yes, I’m doing that now. Sorry for the trial and error.

@ezyang (Contributor) commented Mar 15, 2018

I don't mind :) The advice is mostly for your sanity ;)

@lantiga (Contributor, Author) commented Mar 16, 2018

OK, it looks like the failures are genuine now ("Can only index tensors with integral scalars (got CPUDoubleType)"). The linking issues seem to be gone.
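
(For context on that message: index tensors must have an integral dtype, so a floating-point index has to be cast first. A minimal sketch, written against today's at:: factory functions rather than the exact 2018 API:)

#include <ATen/ATen.h>

int main() {
  at::Tensor t = at::rand({5});
  at::Tensor bad = at::arange(0, 3, at::kFloat);  // float indices are rejected
  at::Tensor idx = bad.to(at::kLong);             // cast to an integral dtype first
  at::Tensor picked = t.index_select(0, idx);     // OK: index tensor is integral
  return 0;
}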

@lantiga (Contributor, Author) commented Mar 16, 2018

I get a failure in a Python multiprocessing test on pr/pytorch-linux-xenial-cuda9-cudnn7-py2, independent of this PR. I'll rerun the tests.

On the other hand, I need a suggestion for the failure in pr/pytorch-linux-xenial-py3-clang5-asan.

Building CXX object contrib/meter/CMakeFiles/test-meter.dir/test/basic.cc.o
09:01:03 src/ATen/test/CMakeFiles/scalar_test.dir/build.make:62: recipe for target 'src/ATen/test/CMakeFiles/scalar_test.dir/scalar_test.cpp.o' failed
09:01:03 g++-5: internal compiler error: Killed (program cc1plus)

It looks like the compiler crashes when ASAN is enabled. Does this ring a bell, @ezyang?

@ezyang (Contributor) commented Mar 16, 2018

Yes, I know exactly what the ASAN problem is. Look at the bottom of build.sh here:

if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
  export ASAN_OPTIONS=detect_leaks=0:symbolize=1
  # Disable Valgrind tests in run_aten_tests.sh; otherwise
  # we'll be valgrind'ing an ASAN'ed binary!  ASANity.
  export VALGRIND=0

  sudo apt-get update
  sudo apt-get install clang-5.0

  export PATH="/usr/lib/llvm-5.0/bin:$PATH"

  # TODO: Figure out how to avoid hard-coding these paths
  LD_LIBRARY_PATH=/usr/lib/llvm-5.0/lib/clang/5.0.0/lib/linux \
    CC="sccache clang" \
    CXX="sccache clang++" \
    LDSHARED="clang --shared" \
    LDFLAGS="-stdlib=libstdc++" \
    CFLAGS="-fsanitize=address -shared-libasan" \
    NO_CUDA=1 \
    python setup.py install

  export LD_PRELOAD=/usr/lib/llvm-5.0/lib/clang/5.0.0/lib/linux/libclang_rt.asan-x86_64.so

else
  python setup.py install

fi

To ensure that the build smoke tests properly load the ASAN runtime, we are messing around with LD_PRELOAD. But your code is building with gcc, not clang. That's not right, and linking in clang's ASAN runtime is definitely bad juju.

I think the right thing to do is to make sure libtorch gets built with clang and with ASAN. That means figuring out how to apply the flags from the invocation above to the cmake invocation, and also doing the build BEFORE we export the LD_PRELOAD. Another way to make things safer would be to make the LD_PRELOAD opt-in per command and then manually reapply it where needed (I think it was necessary in all of the cases, but I'm not sure).
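
(A rough, untested sketch of what that could look like, with the build happening before the export; the source path is a placeholder:)

mkdir -p build && cd build
CC="sccache clang" \
CXX="sccache clang++" \
CFLAGS="-fsanitize=address -shared-libasan" \
CXXFLAGS="-fsanitize=address -shared-libasan" \
LDFLAGS="-stdlib=libstdc++" \
  cmake /path/to/libtorch-source
make -j"$(nproc)"

# only after the ASAN'ed binaries are built:
export LD_PRELOAD=/usr/lib/llvm-5.0/lib/clang/5.0.0/lib/linux/libclang_rt.asan-x86_64.so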

@ezyang (Contributor) commented Mar 16, 2018

Re the other test, it does seem like

09:30:37 ======================================================================
09:30:37 FAIL: test_backwards_fork (__main__.TestMultiprocessing)
09:30:37 backwards() should succeed when called before and after a fork
09:30:37 ----------------------------------------------------------------------
09:30:37 Traceback (most recent call last):
09:30:37   File "test_multiprocessing.py", line 436, in test_backwards_fork
09:30:37     self.assertFalse(p.is_alive())
09:30:37 AssertionError: True is not false
09:30:37 

has been flaky.

@ezyang (Contributor) commented Mar 16, 2018

@pytorchbot retest this please

@lantiga (Contributor, Author) commented Mar 16, 2018

Thank you @ezyang! What I don't get is why the error shows up on a build that doesn't involve libtorch, since libtorch is only built on pr/pytorch-linux-xenial-cuda9-cudnn7-py3 and pr/pytorch-linux-trusty-py3.6-gcc7.2. Could it be a leftover CMakeCache.txt or the like?
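
(If a stale cache were the culprit, wiping it forces a full reconfigure; the build directory name here is illustrative:)

# force a clean reconfigure by removing the cached configuration
rm -f build/CMakeCache.txt
rm -rf build/CMakeFiles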

@lantiga force-pushed the catch_tests branch 3 times, most recently from 2d5a66f to d985c9a on March 17, 2018
@lantiga (Contributor, Author) commented Mar 17, 2018

Thanks for the hint @ezyang, I think I finally figured it out.

The crash was coming from gcc during the tools/run_aten_tests.sh step, on the ASAN build that does not involve libtorch. Note that the ATen tests (which are built for all builds except *cuda*) are always built with gcc, even in ASAN builds.

Since the ASAN build exports LD_PRELOAD after the main build, LD_PRELOAD is also visible from within run_aten_tests.sh.
It has always been that way, even before this PR, but apparently the fact that we now use Catch in the ATen tests makes gcc crash. Why it wasn't crashing before, I'm not sure.

I now unset LD_PRELOAD in a subshell (so the change has no effect on downstream commands) right before executing run_aten_tests.sh:

( unset LD_PRELOAD; time tools/run_aten_tests.sh )

This way the build succeeds and tests pass.
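
(The subshell is what keeps the unset local; a quick illustration using the preload path from build.sh:)

export LD_PRELOAD=/usr/lib/llvm-5.0/lib/clang/5.0.0/lib/linux/libclang_rt.asan-x86_64.so
( unset LD_PRELOAD; echo "inside:  ${LD_PRELOAD:-<unset>}" )  # prints "inside:  <unset>"
echo "outside: ${LD_PRELOAD:-<unset>}"                        # the parent shell still has it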

@lantiga (Contributor, Author) commented Mar 17, 2018

The current failures are unrelated; I tried re-testing, but it didn't help. I'll try again later.

@ezyang (Contributor) commented Mar 18, 2018

It's not been a good weekend for CI.

@pytorchbot retest this please

@ezyang (Contributor) commented Mar 18, 2018

@pytorchbot retest this please

@ezyang merged commit 6f80023 into pytorch:master on Mar 19, 2018
@lantiga (Contributor, Author) commented Mar 19, 2018

Thanks @ezyang. Just a note: ATen should also get Catch2 as a submodule.
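
(For the standalone ATen repo, that would be along these lines; the destination path is an assumption mirroring this PR's layout:)

git submodule add https://github.com/catchorg/Catch2 aten/utils/catch
git submodule update --init --recursive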

ezyang added a commit that referenced this pull request Mar 19, 2018
soumith added a commit that referenced this pull request Mar 19, 2018
soumith added a commit that referenced this pull request Mar 19, 2018
…simple") moving average" (#5892)

* Revert "Port ATen and JIT C++ tests to Catch2 (#5788)"

This reverts commit 6f80023.

* Revert "Fix error message for cat-ing zero-dim tensors (#5819)"

This reverts commit cf2e176.

* Revert "Softmax symbolic should account for negative dim (#5846)"

This reverts commit ba64724.

* Revert "[fft][1 of 3] build system and helpers to support cuFFT and MKL (#5855)"

This reverts commit 22ef8e5.

* Revert "Don't modify requires_grad when running DataParallel in no_grad mode (#5880)"

This reverts commit d11b7fb.

* Revert "fix some methods not showing up in doc (#5882)"

This reverts commit 24fca0e.

* Revert "ReduceOps cleanup and set_num_threads (#5723)"

This reverts commit 84400d5.

* Revert "introduce shape_as_tensor and reshape_from_variable_shape (#5824)"

This reverts commit f446b82.

* Revert "Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#5766)"

This reverts commit 99b1f6c.
jekbradbury pushed a commit to jekbradbury/pytorch that referenced this pull request Mar 21, 2018
jekbradbury pushed a commit to jekbradbury/pytorch that referenced this pull request Mar 21, 2018
…simple") moving average" (pytorch#5892)

* Revert "Port ATen and JIT C++ tests to Catch2 (pytorch#5788)"

This reverts commit 6f80023.

* Revert "Fix error message for cat-ing zero-dim tensors (pytorch#5819)"

This reverts commit cf2e176.

* Revert "Softmax symbolic should account for negative dim (pytorch#5846)"

This reverts commit ba64724.

* Revert "[fft][1 of 3] build system and helpers to support cuFFT and MKL (pytorch#5855)"

This reverts commit 22ef8e5.

* Revert "Don't modify requires_grad when running DataParallel in no_grad mode (pytorch#5880)"

This reverts commit d11b7fb.

* Revert "fix some methods not showing up in doc (pytorch#5882)"

This reverts commit 24fca0e.

* Revert "ReduceOps cleanup and set_num_threads (pytorch#5723)"

This reverts commit 84400d5.

* Revert "introduce shape_as_tensor and reshape_from_variable_shape (pytorch#5824)"

This reverts commit f446b82.

* Revert "Enable resetting of batchnorm running moments and cumulative ("simple") moving average (pytorch#5766)"

This reverts commit 99b1f6c.
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020