Conversation

@eellison (Contributor) commented Mar 2, 2022

[ghstack-poisoned]

pytorch-bot bot commented Mar 2, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/2a4bb83cff85ed2cdb4c53f6f0fb3c68fdf2c641/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla 🚫 skipped

@facebook-github-bot (Contributor) commented Mar 2, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 715b8b6 (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-03-10T03:04:18.9368582Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 315, in _setup_replication
2022-03-10T03:04:18.9368995Z     device = xm.xla_device()
2022-03-10T03:04:18.9369689Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/core/xla_model.py", line 232, in xla_device
2022-03-10T03:04:18.9370178Z     devkind=devkind if devkind is not None else None)
2022-03-10T03:04:18.9370783Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/core/xla_model.py", line 137, in get_xla_supported_devices
2022-03-10T03:04:18.9371255Z     xla_devices = _DEVICES.value
2022-03-10T03:04:18.9371661Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/utils/utils.py", line 32, in value
2022-03-10T03:04:18.9372026Z     self._value = self._gen_fn()
2022-03-10T03:04:18.9372777Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/core/xla_model.py", line 19, in <lambda>
2022-03-10T03:04:18.9373164Z     _DEVICES = xu.LazyProperty(lambda: torch_xla._XLAC._xla_get_devices())
2022-03-10T03:04:18.9373585Z RuntimeError: tensorflow/compiler/xla/xla_client/xrt_local_service.cc:56 : Check failed: tensorflow::NewServer(server_def, &server_) == ::tensorflow::Status::OK() (UNKNOWN: Could not start gRPC server vs. OK)
2022-03-10T03:04:18.9456086Z xla:0 is not a TPU or GPU device
2022-03-10T03:04:18.9481291Z xla:0 is not a TPU or GPU device
2022-03-10T03:04:18.9571039Z xla:0 is not a TPU or GPU device
2022-03-10T03:04:19.1457592Z Traceback (most recent call last):
2022-03-10T03:04:19.1458142Z   File "/var/lib/jenkins/workspace/xla/test/test_mp_all_gather.py", line 49, in <module>
2022-03-10T03:04:19.1458742Z     xmp.spawn(_mp_fn, args=())
2022-03-10T03:04:19.1459331Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 395, in spawn
2022-03-10T03:04:19.1459653Z     start_method=start_method)
2022-03-10T03:04:19.1460030Z   File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
2022-03-10T03:04:19.1460300Z     while not context.join():

1 failure not recognized by patterns:

Job Step Action
GitHub Actions Lint / mypy Run mypy 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@facebook-github-bot added the "oncall: jit" (Add this issue/PR to JIT oncall triage queue) label Mar 2, 2022
The number of bailouts for each executor should just be initialized from whatever the current fusion strategy is, and then stored on the executor.

[ghstack-poisoned]
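The idea in the description can be sketched in a few lines of pure Python. This is an illustrative model only, not PyTorch's actual C++ implementation; `set_fusion_strategy` and `GraphExecutor` below are stand-ins (the real user-facing knob this models is `torch.jit.set_fusion_strategy`, which takes a list of ("STATIC" | "DYNAMIC", depth) pairs):

```python
# Illustrative sketch only; not PyTorch's real implementation.

_fusion_strategy = [("STATIC", 2), ("DYNAMIC", 10)]  # assumed default

def set_fusion_strategy(strategy):
    """Replace the process-wide fusion strategy and return the old one."""
    global _fusion_strategy
    old = _fusion_strategy
    _fusion_strategy = list(strategy)
    return old

class GraphExecutor:
    """Snapshot the strategy at construction instead of reading a global
    DefaultNumBailouts constant, and store the budget on the executor."""
    def __init__(self):
        self.remaining_bailouts = sum(d for _, d in _fusion_strategy)

set_fusion_strategy([("STATIC", 1), ("DYNAMIC", 3)])
ex = GraphExecutor()
# ex.remaining_bailouts is 4, derived from the strategy in effect
# at construction time; later strategy changes don't affect it.
```

The point of the snapshot is that each executor carries its own budget, so changing the global strategy only affects executors created afterwards.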
eellison pushed a commit that referenced this pull request Mar 2, 2022
ghstack-source-id: 7626312
Pull Request resolved: #73689
# we haven't vetted every single test in this file,
# but we enable FULL_PROFILER for a large subset of the
# tests with "with enable_profiling_mode_for_profiling_tests"
torch._C._jit_set_profiling_mode(False)
@eellison (Contributor Author) commented:
also finally updates this; needed to fix a couple of tests @Krovatkin

@davidberard98 (Contributor) left a comment:

Awesome! Looks good to me

nit: is there a reason we can't get rid of all the bailOut arguments, e.g. in graph_executor.cpp? It doesn't seem like they are used anywhere

@eellison (Contributor Author) replied:

> nit: is there a reason we can't get rid of all the bailOut arguments, e.g. in graph_executor.cpp? It doesn't seem like they are used anywhere

I think some of it is still being used for passing in the number of remaining specializations, but yeah, some cleanup could probably still be done.
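A hypothetical sketch of the "remaining specializations" budget mentioned in that reply (not the real graph_executor.cpp code; the class and method names are illustrative):

```python
class ProfilingExecutor:
    """Toy model: specialize (recompile for observed shapes) only while a
    per-executor budget, snapshotted from the fusion strategy, remains."""
    def __init__(self, remaining_specializations):
        self.remaining = remaining_specializations

    def maybe_specialize(self):
        # Consume one unit of the budget per specialization; fall back to
        # the generic graph once the budget is exhausted.
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False

ex = ProfilingExecutor(2)
results = [ex.maybe_specialize() for _ in range(3)]
# results == [True, True, False]
```

Passing the remaining count down through the executor, rather than each callee consulting a global constant, is what makes the per-executor budget coherent.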

NesrineMHB pushed a commit to NesrineMHB/pytorch that referenced this pull request Apr 8, 2022
ghstack-source-id: 2cb01bd
Pull Request resolved: pytorch/pytorch#73875

Get rid of DefaultNumBailouts

Pull Request resolved: pytorch/pytorch#73689
@github-actions bot commented:

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions bot added the Stale label May 22, 2022
@github-actions bot closed this Jun 21, 2022
@facebook-github-bot deleted the gh/eellison/274/head branch July 21, 2022 14:21

Labels

cla signed, oncall: jit (Add this issue/PR to JIT oncall triage queue), Stale

5 participants