@davidberard98
Contributor

@davidberard98 davidberard98 commented Mar 3, 2022

Stack from ghstack:

This adds CPU-only slow test jobs; previously, the CPU-only slow tests never ran.

Includes fixes for fx slow tests that were failing but went unnoticed because they never ran.

Differential Revision: D34628803

There are no CPU-only slow test machines, so this enables CPU device_type slow tests on the CUDA slow machines (previously they never ran).

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Mar 3, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/9accb66ede90993c9e9f5a2f8a72490b171a77b6/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

| Workflows | Labels (bold enabled) | Status |
| --- | --- | --- |
| Triggered Workflows | | |
| linux-binary-conda | ciflow/binaries, ciflow/binaries_conda, ciflow/default | ✅ triggered |
| linux-binary-libtorch-cxx11-abi | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| linux-binary-libtorch-pre-cxx11 | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| linux-binary-manywheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default | ✅ triggered |
| linux-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk | ✅ triggered |
| linux-bionic-rocm4.5-py3.7 | ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk | ✅ triggered |
| linux-docs | ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-vulkan-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-cuda11.3-py3.7-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3-clang5-mobile-build | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-static | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-clang7-asan | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-clang7-onnx | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build | ciflow/all, ciflow/cpu, ciflow/default, ciflow/libtorch, ciflow/linux, ciflow/mobile, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc7 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc7-no-ops | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| macos-arm64-binary-conda | ciflow/binaries, ciflow/binaries_conda, ciflow/default | ✅ triggered |
| macos-arm64-binary-wheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default | ✅ triggered |
| macos-binary-conda | ciflow/binaries, ciflow/binaries_conda, ciflow/default | ✅ triggered |
| macos-binary-libtorch-cxx11-abi | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| macos-binary-libtorch-pre-cxx11 | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| macos-binary-wheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win | ✅ triggered |
| windows-binary-libtorch-cxx11-abi | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| windows-binary-libtorch-pre-cxx11 | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| windows-binary-wheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default | ✅ triggered |
| Skipped Workflows | | |
| caffe2-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| docker-builds | ciflow/all, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled | 🚫 skipped |
| ios-12-5-1-arm64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled | 🚫 skipped |
| ios-12-5-1-arm64-custom-ops | ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled | 🚫 skipped |
| ios-12-5-1-arm64-metal | ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled | 🚫 skipped |
| ios-12-5-1-x86-64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-x86-64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| libtorch-linux-xenial-cuda10.2-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk | 🚫 skipped |
| linux-docs-push | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| linux-xenial-cuda11.3-py3.7-gcc7-no-ops | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk | 🚫 skipped |
| macos-10-15-py3-arm64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| macos-10-15-py3-lite-interpreter-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| macos-11-py3-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| parallelnative-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | 🚫 skipped |
| periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.5-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build | ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| pytorch-xla-linux-bionic-py3.7-clang8 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla | 🚫 skipped |

@facebook-github-bot
Contributor

facebook-github-bot commented Mar 3, 2022

💊 CI failures summary and remediations

As of commit 5baacf0 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

1 failure not recognized by patterns:

| Job | Step | Action |
| --- | --- | --- |
| GitHub Actions trunk / macos-11-py3-x86-64 / test (default, 1, 2, macos-11) | Unknown | 🔁 rerun |


davidberard98 added a commit that referenced this pull request Mar 3, 2022
There are no CPU-only slow test machines, so this enables device_type slow tests (which previously would never run)

ghstack-source-id: 0c9385d
Pull Request resolved: #73748
@davidberard98
Contributor Author

@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@davidberard98 davidberard98 requested a review from albanD March 4, 2022 21:35
@davidberard98
Contributor Author

@albanD regarding slowTest on CPU, previously some of the CPU slowTests weren't running (e.g. TestNNCOpInfoCPU).

You can see that the runtime increased for the slow job:
On this PR (2.5 hrs): https://github.com/pytorch/pytorch/runs/5414065481?check_suite_focus=true
On a recent commit on master (1.5 hrs): https://github.com/pytorch/pytorch/runs/5424674750?check_suite_focus=true

Note: this also enables the CPU tests on slow-gradcheck jobs. Should that be the case, or would we only want to run them on the slow jobs?

@albanD
Collaborator

albanD commented Mar 7, 2022

Why does this change the slow gradcheck tests?

@davidberard98
Contributor Author

@albanD not entirely sure what the slow-gradcheck tests do... but it seems like they're a subset of slow tests, in that PYTORCH_TEST_WITH_SLOW=1 and PYTORCH_TEST_SKIP_FAST=1 are set. So if there were any slow-gradcheck tests marked @slowTest that use instantiate_device_type_tests(), I think we'd want to enable the CPU versions? (Otherwise they would never run, since there are no CPU-only slow-gradcheck jobs either.)
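
As a rough illustration of how the two environment variables mentioned above can interact with a slow-test marker, here is a minimal sketch. The names `should_run`, `slow_test`, and `GatedTestCase` are hypothetical, and the logic is a simplified assumption about the gating behaviour, not PyTorch's actual implementation.

```python
import os
import unittest
from functools import wraps

# Flags read from the environment variables named in the discussion.
TEST_WITH_SLOW = os.getenv("PYTORCH_TEST_WITH_SLOW", "0") == "1"
TEST_SKIP_FAST = os.getenv("PYTORCH_TEST_SKIP_FAST", "0") == "1"


def should_run(is_slow, with_slow, skip_fast):
    """Slow tests need with_slow; fast tests run unless skip_fast is set."""
    if is_slow:
        return with_slow
    return not skip_fast


def slow_test(fn):
    """Hypothetical decorator that marks a test as slow and gates it."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        if not should_run(True, TEST_WITH_SLOW, TEST_SKIP_FAST):
            raise unittest.SkipTest("slow test; set PYTORCH_TEST_WITH_SLOW=1")
        return fn(*args, **kwargs)

    wrapper.is_slow = True  # marker consulted by GatedTestCase.setUp
    return wrapper


class GatedTestCase(unittest.TestCase):
    def setUp(self):
        method = getattr(self, self._testMethodName)
        is_slow = getattr(method, "is_slow", False)
        if not is_slow and not should_run(False, TEST_WITH_SLOW, TEST_SKIP_FAST):
            self.skipTest("fast test skipped; PYTORCH_TEST_SKIP_FAST=1")


class ExampleTests(GatedTestCase):
    def test_fast(self):
        self.assertEqual(1 + 1, 2)

    @slow_test
    def test_slow(self):
        self.assertEqual(sum(range(1000)), 499500)
```

With both variables set to 1, only `test_slow` runs; with neither set, only `test_fast` runs. A job that sets both therefore exercises exactly the tests carrying the slow marker.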

davidberard98 added a commit that referenced this pull request Mar 9, 2022
There are no CPU-only slow test machines, so this enables device_type slow tests (which previously would never run)

ghstack-source-id: 5afd37a
Pull Request resolved: #73748
@davidberard98
Contributor Author

@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@davidberard98
Contributor Author

Actually, I'm going to disable the CPU tests on slow-gradcheck for now, since otherwise it times out.

@davidberard98 davidberard98 requested a review from eellison March 10, 2022 21:30
Contributor

@malfet malfet left a comment


Hmm, it feels like a much better solution would be to add slow CPU tests, wouldn't it? (This can be done with a one-line change to https://github.com/pytorch/pytorch/blob/master/.github/scripts/generate_ci_workflows.py.)

@davidberard98 davidberard98 changed the title from "Enable CPU device_type tests on CUDA slow machines" to "Add CPU slow_test jobs" Mar 17, 2022
This adds CPU-only slow test jobs, which previously would never run.

Includes fixes for fx slow tests which were previously failing, but which never ran.

Differential Revision: [D34628803](https://our.internmc.facebook.com/intern/diff/D34628803)

[ghstack-poisoned]
@davidberard98 davidberard98 added the ciflow/trunk (Trigger trunk jobs on your pull request) label Mar 21, 2022
@davidberard98 davidberard98 changed the title from "Add CPU slow_test job" to "Add CPU slow test job" Mar 21, 2022
davidberard98 added a commit that referenced this pull request Mar 21, 2022
This adds CPU-only slow test jobs, which previously would never run.

Includes fixes for fx slow tests which were previously failing, but which never ran.

ghstack-source-id: 27300a4
Pull Request resolved: #73748
Comment on lines +130 to +135
  linux-bionic-py3_7-clang9-slow-build:
    name: linux-bionic-py3.7-clang9-slow
    uses: pytorch/pytorch/.github/workflows/_linux-build.yml@master
    with:
      build-environment: linux-bionic-py3.7-clang9-slow
      docker-image-name: pytorch-linux-bionic-py3.7-clang9
Contributor

Do you need a separate build because there are no trunk-only builds?
@suo, with the new lint workflow, is there a way to add a trunk-only test shard to the PR workflow?

Member

It's just standard GH stuff now, so a job with an if statement would work.

Although: can we not just use one of the many trunk gcc builds instead? Is there something about clang that is special here?

Contributor Author

@suo how would the job with an if statement work? E.g., where would the if statement go?
Re: gcc builds, I think we just want a CPU-only build. AFAICT all the other builds in trunk.yml are for CUDA (other than parallelnative, which seems like a special configuration).

Member

Oh, that's interesting. Yeah, it looks like there is no trunk-only CPU build. Idk, personally I think this change is fine; I'd rather just do an extra build than make the workflows marginally more complicated.

Member

For reference, it would look like the same thing you have, with one additional line:

  if: github.event_name == 'push'

But like I said, I feel like we should just keep pull jobs in pull.yml and trunk jobs in trunk.yml. CPU builds take under 10 minutes and are super cheap, so it's not a big deal to overbuild a bit.

@onlyCPU
@slowTest
@dtypes(torch.float)
@unittest.skipIf(True, "Insufficient memory on linux.(2|4)xlarge")
Contributor Author

@suo @malfet any opinions on whether we should skip this test entirely, or move the slow test to a 9xlarge just to have enough memory for this test?
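
For comparison with the unconditional `skipIf(True, ...)` above, a third option would be a skip that depends on the memory actually installed on the runner. The sketch below is only an illustration under stated assumptions: `total_memory_gib` and `skip_unless_memory_gib` are hypothetical helpers, the 64 GiB threshold is a placeholder rather than a measured requirement, and the memory query relies on `os.sysconf`, which is not available on every platform.

```python
import os
import unittest


def total_memory_gib():
    """Return physical memory in GiB, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (e.g. on Windows) or the name is unsupported.
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size / 2**30


def skip_unless_memory_gib(required_gib):
    """Hypothetical decorator: skip when less than required_gib is installed."""
    available = total_memory_gib()
    if available is None:
        return unittest.skip("could not determine physical memory")
    return unittest.skipIf(
        available < required_gib,
        f"needs {required_gib} GiB of RAM, found {available:.1f} GiB",
    )


class ExampleLargeMemoryTest(unittest.TestCase):
    @skip_unless_memory_gib(64)  # placeholder threshold, not a measured value
    def test_large_allocation(self):
        self.assertTrue(True)
```

A check like this would let the test run automatically if the job were ever moved to a larger machine, at the cost of one more helper to maintain.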

@davidberard98
Contributor Author

@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@davidberard98 davidberard98 marked this pull request as ready for review March 22, 2022 15:38
facebook-github-bot pushed a commit that referenced this pull request Mar 23, 2022
Summary:
Pull Request resolved: #73748

This adds CPU-only slow test jobs, which previously would never run.

Includes fixes/skips for slow tests which fail (they need to be skipped now because they used to never run)

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D34628803

Pulled By: davidberard98

fbshipit-source-id: c090ab7bf7bda9e24ec5cdefa6fd35c6310dbac0
@github-actions
Contributor

Hey @davidberard98.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@davidberard98 davidberard98 added the topic: not user facing (topic category) label Mar 23, 2022
shahofblah pushed a commit that referenced this pull request Mar 25, 2022
Summary:
Pull Request resolved: #73748

This adds CPU-only slow test jobs, which previously would never run.

Includes fixes/skips for slow tests which fail (they need to be skipped now because they used to never run)

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D34628803

Pulled By: davidberard98

fbshipit-source-id: c090ab7bf7bda9e24ec5cdefa6fd35c6310dbac0
(cherry picked from commit 06f7a94)
@facebook-github-bot facebook-github-bot deleted the gh/davidberard98/56/head branch March 27, 2022 14:17

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), cla signed, topic: not user facing (topic category), with-ssh

6 participants