Conversation

@eellison (Contributor) commented Mar 2, 2022

[ghstack-poisoned]

pytorch-bot bot commented Mar 2, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/2a4bb83cff85ed2cdb4c53f6f0fb3c68fdf2c641/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla 🚫 skipped

@facebook-github-bot (Contributor) commented Mar 2, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 715b8b6 (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-03-10T03:04:18.9368582Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 315, in _setup_replication
2022-03-10T03:04:18.9368995Z     device = xm.xla_device()
2022-03-10T03:04:18.9369689Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/core/xla_model.py", line 232, in xla_device
2022-03-10T03:04:18.9370178Z     devkind=devkind if devkind is not None else None)
2022-03-10T03:04:18.9370783Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/core/xla_model.py", line 137, in get_xla_supported_devices
2022-03-10T03:04:18.9371255Z     xla_devices = _DEVICES.value
2022-03-10T03:04:18.9371661Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/utils/utils.py", line 32, in value
2022-03-10T03:04:18.9372026Z     self._value = self._gen_fn()
2022-03-10T03:04:18.9372777Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/core/xla_model.py", line 19, in <lambda>
2022-03-10T03:04:18.9373164Z     _DEVICES = xu.LazyProperty(lambda: torch_xla._XLAC._xla_get_devices())
2022-03-10T03:04:18.9373585Z RuntimeError: tensorflow/compiler/xla/xla_client/xrt_local_service.cc:56 : Check failed: tensorflow::NewServer(server_def, &server_) == ::tensorflow::Status::OK() (UNKNOWN: Could not start gRPC server vs. OK)
2022-03-10T03:04:18.9456086Z xla:0 is not a TPU or GPU device
2022-03-10T03:04:18.9481291Z xla:0 is not a TPU or GPU device
2022-03-10T03:04:18.9571039Z xla:0 is not a TPU or GPU device
2022-03-10T03:04:19.1457592Z Traceback (most recent call last):
2022-03-10T03:04:19.1458142Z   File "/var/lib/jenkins/workspace/xla/test/test_mp_all_gather.py", line 49, in <module>
2022-03-10T03:04:19.1458742Z     xmp.spawn(_mp_fn, args=())
2022-03-10T03:04:19.1459331Z   File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.11-py3.7-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 395, in spawn
2022-03-10T03:04:19.1459653Z     start_method=start_method)
2022-03-10T03:04:19.1460030Z   File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
2022-03-10T03:04:19.1460300Z     while not context.join():

1 failure not recognized by patterns:

Job Step Action
GitHub Actions Lint / mypy Run mypy 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@facebook-github-bot added the "oncall: jit" (Add this issue/PR to JIT oncall triage queue) label Mar 2, 2022
The number of bailouts for each executor should just be initialized from whatever the current fusion strategy is, and then stored on the executor.

[ghstack-poisoned]
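The idea in the description can be sketched in a few lines of pure Python. This is an illustrative model only, not PyTorch's actual C++ implementation; `set_fusion_strategy` and `GraphExecutor` below are stand-ins (the real user-facing knob this models is `torch.jit.set_fusion_strategy`, which takes a list of ("STATIC" | "DYNAMIC", depth) pairs):

```python
# Illustrative sketch only; not PyTorch's real implementation.

_fusion_strategy = [("STATIC", 2), ("DYNAMIC", 10)]  # assumed default

def set_fusion_strategy(strategy):
    """Replace the process-wide fusion strategy and return the old one."""
    global _fusion_strategy
    old = _fusion_strategy
    _fusion_strategy = list(strategy)
    return old

class GraphExecutor:
    """Snapshot the strategy at construction instead of reading a global
    DefaultNumBailouts constant, and store the budget on the executor."""
    def __init__(self):
        self.remaining_bailouts = sum(d for _, d in _fusion_strategy)

set_fusion_strategy([("STATIC", 1), ("DYNAMIC", 3)])
ex = GraphExecutor()
# ex.remaining_bailouts is 4, derived from the strategy in effect
# at construction time; later strategy changes don't affect it.
```

The point of the snapshot is that each executor carries its own budget, so changing the global strategy only affects executors created afterwards.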
eellison pushed a commit that referenced this pull request Mar 2, 2022
ghstack-source-id: 7626312
Pull Request resolved: #73689
# we haven't vetted every single test in this file,
# but we enable FULL_PROFILER for a large subset of the
# tests with "with enable_profiling_mode_for_profiling_tests"
torch._C._jit_set_profiling_mode(False)
@eellison (Contributor Author) commented:
also finally updates this; needed to fix a couple of tests @Krovatkin

@davidberard98 (Contributor) left a comment:

Awesome! Looks good to me

nit: is there a reason we can't get rid of all the bailOut arguments, e.g. in graph_executor.cpp? It doesn't seem like they are used anywhere

@eellison (Contributor Author) replied:

> nit: is there a reason we can't get rid of all the bailOut arguments, e.g. in graph_executor.cpp? It doesn't seem like they are used anywhere

I think some of it is still being used for passing in the number of remaining specializations, but yeah, some cleanup could probably still be done.
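A hypothetical sketch of the "remaining specializations" budget mentioned in that reply (not the real graph_executor.cpp code; the class and method names are illustrative):

```python
class ProfilingExecutor:
    """Toy model: specialize (recompile for observed shapes) only while a
    per-executor budget, snapshotted from the fusion strategy, remains."""
    def __init__(self, remaining_specializations):
        self.remaining = remaining_specializations

    def maybe_specialize(self):
        # Consume one unit of the budget per specialization; fall back to
        # the generic graph once the budget is exhausted.
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False

ex = ProfilingExecutor(2)
results = [ex.maybe_specialize() for _ in range(3)]
# results == [True, True, False]
```

Passing the remaining count down through the executor, rather than each callee consulting a global constant, is what makes the per-executor budget coherent.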

NesrineMHB pushed a commit to NesrineMHB/pytorch that referenced this pull request Apr 8, 2022
ghstack-source-id: 2cb01bd
Pull Request resolved: pytorch/pytorch#73875

Get rid of DefaultNumBailouts

Pull Request resolved: pytorch/pytorch#73689
@github-actions bot commented:

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions bot added the Stale label May 22, 2022
@github-actions bot closed this Jun 21, 2022
@facebook-github-bot deleted the gh/eellison/274/head branch July 21, 2022 14:21

Labels

cla signed, oncall: jit (Add this issue/PR to JIT oncall triage queue), Stale

5 participants