Conversation

@yf225 (Contributor) commented Jun 20, 2018

The current test_nn would time out and be disabled in GreenWarden, so we need an option to split it up in order to pass the stress test. GreenWarden currently allows roughly 100 test cases in test_nn before timing out; this PR adds an option to divide test_nn into 30 shards (~40 tests per shard), which also leaves headroom for future test suite growth.
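
For illustration only, here is a minimal sketch of the round-robin sharding idea, using a hypothetical helper name; the actual selection in this PR goes through the load_tests / disable_tests_not_in_shard hooks shown further down.

# Hypothetical sketch, not the code from this PR: assign each test to one of
# NUM_SHARDS buckets by its position in a sorted list of test names.
NUM_SHARDS = 30

def tests_in_shard(test_names, shard, num_shards=NUM_SHARDS):
    # Round-robin assignment keeps shard sizes within one test of each other,
    # e.g. ~1200 tests / 30 shards ~= 40 tests per shard.
    return [name for index, name in enumerate(sorted(test_names))
            if index % num_shards == shard]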

This will unblock the addition of GPU tests to Sandcastle.

We need to make sure that the CUDA memory leak checks are still working correctly in CI after this change.

cc @orionr

@yf225 requested review from ezyang and ssnl, June 20, 2018 00:46
test/test_nn.py Outdated
fullname='AdaptiveLogSoftmax'))


def disable_tests_not_in_shard(module, shard):


test/common.py Outdated
# This logic is similar to https://github.com/python/cpython/blob/master/Lib/unittest/case.py#L406-L413
try:
    test_method = getattr(self, method_name)
    setattr(self, method_name, self.wrap_with_cuda_memory_check(test_method))

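For context, a hedged sketch of what a CUDA-memory-leak-checking wrapper could look like. It only assumes torch.cuda.memory_allocated() and functools.wraps; it is not the actual wrap_with_cuda_memory_check from test/common.py.

import functools
import torch

def cuda_memory_check_sketch(test_method):
    # Compare allocated CUDA memory before and after the wrapped test runs
    # and fail loudly if the test left memory behind.
    @functools.wraps(test_method)
    def wrapper(*args, **kwargs):
        before = torch.cuda.memory_allocated()
        result = test_method(*args, **kwargs)
        torch.cuda.synchronize()
        after = torch.cuda.memory_allocated()
        assert after == before, 'CUDA memory leaked: {} bytes'.format(after - before)
        return result
    return wrapper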

@yf225 (Contributor, Author) commented Jun 20, 2018

@pytorchbot retest this please

test/common.py Outdated
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument('--seed', type=int, default=1234)
parser.add_argument('--accept', action='store_true')
parser.add_argument('--shard', type=int, required=False)

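As a rough sketch of how this parser could be consumed from a test file; the parse_known_args handoff back to unittest below is an assumption, not necessarily what common.py does:

import argparse
import sys

def parse_args():
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('--seed', type=int, default=1234)
    parser.add_argument('--accept', action='store_true')
    parser.add_argument('--shard', type=int, required=False)
    # Parse only the flags we know about and hand everything else back to
    # unittest, so standard unittest options keep working.
    options, remaining = parser.parse_known_args()
    sys.argv = [sys.argv[0]] + remaining
    return options

# Example invocation: python test/test_nn.py --shard 3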

test/test_nn.py Outdated
torch.double: 1e-5,
torch.half: 1e-2}

NUM_SHARDS = 30


if __name__ == '__main__':
    options = parse_args()
    if options.shard is not None:
        def load_tests(loader, tests, pattern):

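A minimal sketch of how the standard unittest load_tests hook can keep only the tests that belong to the requested shard; options and NUM_SHARDS come from the surrounding file, and the exact filtering in this PR (disable_tests_not_in_shard) may differ:

import unittest

def load_tests(loader, tests, pattern):
    # 'tests' is a TestSuite of per-class suites; flatten it and keep only the
    # cases whose running index falls into the requested shard.
    filtered = unittest.TestSuite()
    index = 0
    for class_suite in tests:
        for test in class_suite:
            if index % NUM_SHARDS == options.shard:
                filtered.addTest(test)
            index += 1
    return filtered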

@ezyang (Contributor) left a comment


Fix as many of the CR comments as you can do easily, and then merge this please.

@yf225 merged commit d6c873a into pytorch:master Jun 20, 2018
petrex pushed a commit to petrex/pytorch that referenced this pull request Jun 20, 2018
* upstream/master: (92 commits)
  more formatting (pytorch#8701)
  Fix pytorch#8692 (pytorch#8699)
  Create captured inputs recursively for loop to resolve loop-carried dependencies across nested blocks (pytorch#8345)
  Shard test_nn to reduce runtime for each test target (pytorch#8678)
  Create at::tensor (pytorch#8475)
  Clarify mp note about sharing a tensor's grad field. (pytorch#8688)
  Add owner rule for cpp_extension.py (pytorch#8700)
  fix formatting in :math: in fold docstring (pytorch#8696)
  Some 0-sized dimension support, port catArray away from resizeLegacy. (pytorch#8666)
  Implement flatten function (pytorch#8578)
  Created Tensor::to functions (pytorch#8643)
  Add a warning in gradcheck if inputs precision < float64 (pytorch#8663)
  Fix parsing of floating point defaults in python_arg_parser (pytorch#8681)
  Export ProcessGroupGloo options to Python (pytorch#8664)
  Fix build error in pybind_state_ideep (pytorch#8684)
  Compatibility: write nDimension/_nDimension corresponding to dim()/_dim(). (pytorch#8676)
  Improve win-build.sh for local build (pytorch#8674)
  don't do unnecessary copies for bernoulli_ (pytorch#8682)
  Use parallel if get_num_threads 0 (pytorch#8677)
  Fix serialization for Parameters (pytorch#8633)
  ...
petrex pushed a commit to petrex/pytorch that referenced this pull request Jun 21, 2018
* Shard test_nn to reduce runtime for each test target

* Use load_tests for selecting tests to enable

* fix lint

* Use arg parser from common.py