Make lazy codegen honor per-operator-headers flag #74450

wconstab · 2022-03-21T04:21:45Z

Summary:

per-operator-headers is a strict build mode where compilation units aren't allowed
to depend on bulk headers like ATen/Functions.h, but must instead depend only on the
specific operator headers used. (In other configurations, the reverse is required: code must include the bulk headers rather than the per-operator ones.)

Test Plan: CI to make sure nothing breaks for existing backends, plus a manual test of the next diff (rebased on this one) to make sure it actually helps

Differential Revision: D35002666

pytorch-bot · 2022-03-21T04:21:49Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/wconstab/pytorch/blob/78c0cfc3cacc10bf752779d460d30aa63a7b0506/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows	Labels (bold enabled)	Status
Triggered Workflows
deploy-linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
linux-binary-libtorch-cxx11-abi	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`, `ciflow/trunk`	✅ triggered
linux-binary-libtorch-pre-cxx11	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`, `ciflow/trunk`	✅ triggered
linux-binary-manywheel	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`, `ciflow/trunk`	✅ triggered
linux-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/trunk`	✅ triggered
linux-bionic-rocm4.5-py3.7	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/rocm`, `ciflow/trunk`	✅ triggered
linux-docs	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/docs`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-vulkan-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
macos-arm64-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
macos-arm64-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
macos-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
macos-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
macos-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
macos-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
windows-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
windows-binary-libtorch-debug	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`, `ciflow/trunk`	✅ triggered
windows-binary-libtorch-release	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`, `ciflow/trunk`	✅ triggered
windows-binary-wheel	`ciflow/all`, `ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`, `ciflow/trunk`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
docker-builds	`ciflow/all`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`, `ciflow/trunk`	🚫 skipped
linux-bionic-rocm4.5-py3.7-distributed	`ciflow/all`, `ciflow/linux`, `ciflow/rocm`, `ciflow/trunk`	🚫 skipped
linux-docs-push	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-11-py3-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.5-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`, `ciflow/xla`	🚫 skipped

facebook-github-bot · 2022-03-21T04:21:51Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/74450
Need help or want to give feedback on the CI? Visit our office hours

💊 CI failures summary and remediations

As of commit 4425d38 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚


facebook-github-bot · 2022-03-21T04:22:06Z

This pull request was exported from Phabricator. Differential Revision: D35002666

bdhirsh · 2022-03-21T17:49:53Z

.jenkins/pytorch/test.sh

mostly just wondering - were the cpp tests not running in CI before this? And any reason to only enable them for CUDA builds on CI? (once the XLA plugin is ready, I'm guessing we want it to run on XLA too?)

correct, it was an oversight, but not a big deal since we mostly looked at the lazy_tensor_staging CI, and also this test was running in sandcastle. But it did confuse me for a while when I was trying to find the output of the expanded test suite in my later diff!

also note, this change is supposed to be landed first in #74449

I'm not sure why the rebase didn't work as intended, but this diff should only be adding the per-operator-headers stuff, and I'll land the other one first and make sure this one is clean.

bdhirsh · 2022-03-21T17:59:54Z

caffe2/CMakeLists.txt

hmm I might just be misunderstanding. But why does the actual codegen need to know if we're building the per-operator flag or not?

I think the way it works today is that the codegen will always emit the per-operator headers. The codegen template files themselves include an IFDEF like this:

    #ifndef AT_PER_OPERATOR_HEADERS
    #include <ATen/Operators.h>
    #else
    ${operator_headers}
    #endif

That way the conditional code just lives directly in the C++ template file, instead of needing to clutter up the codegen with it. Example template:

pytorch/aten/src/ATen/templates/RegisterFunctionalization.cpp

Line 9 in 14bf20c

#ifndef AT_PER_OPERATOR_HEADERS
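
To make the template-guard approach described above concrete, here is a minimal Python sketch, not the actual PyTorch codegen. The template text is a trimmed-down stand-in, `render_includes` is a hypothetical helper, and the `ATen/ops/<name>.h` paths are only illustrative. The point is that the generator always fills in `${operator_headers}` and leaves the choice of branch to the C++ preprocessor.

```python
from string import Template

# Hypothetical, trimmed-down stand-in for a codegen template file.
# The guard lives in the template, so the generator needs no build-mode flag.
TEMPLATE = Template("""\
#ifndef AT_PER_OPERATOR_HEADERS
#include <ATen/Operators.h>
#else
${operator_headers}
#endif
""")


def render_includes(op_names):
    """Always emit per-operator includes; the preprocessor picks a branch."""
    # Header paths are illustrative, not taken from the PR.
    headers = "\n".join(
        f"#include <ATen/ops/{name}.h>" for name in sorted(op_names)
    )
    return TEMPLATE.substitute(operator_headers=headers)


if __name__ == "__main__":
    print(render_includes(["mul", "add"]))
```

With this layout the same generated file compiles in both build modes, which is why the generator itself would not need to know about the flag.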

I think I didn't realize this logic existed, but it is still not enough since I also had to fix this (see below)

'ops_headers': '#include <ATen/Functions.h>' if not per_operator_headers else '',

Perhaps we could move ATen/Functions.h into the template ifdef and then things would have been OK.
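
For contrast, the flag-driven variant quoted above can be sketched as follows. This is an illustration under assumptions: only the `ops_headers` key and the `ATen/Functions.h` include come from the quoted line, while `SOURCE_TEMPLATE`, `header_env`, and `render` are hypothetical names.

```python
from string import Template
from typing import Dict

# Hypothetical one-line template; a real template contains much more.
SOURCE_TEMPLATE = Template("${ops_headers}\n// ... generated kernels ...\n")


def header_env(per_operator_headers: bool) -> Dict[str, str]:
    """Build the substitution env; the bulk header is dropped in strict mode."""
    return {
        "ops_headers": ""
        if per_operator_headers
        else "#include <ATen/Functions.h>",
    }


def render(per_operator_headers: bool) -> str:
    """Render the generated source for the given build mode."""
    return SOURCE_TEMPLATE.substitute(header_env(per_operator_headers))


if __name__ == "__main__":
    print(render(per_operator_headers=False))
    print(render(per_operator_headers=True))
```

Here the decision is made at generation time, so the generator has to be told which build mode is active.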

> I think the way it works today is that the codegen will always emit the per-operator headers. The codegen template files themselves include an IFDEF like this:

I think this isn't quite correct. The RegisterFunctionalization template is a nice thing to emulate, but it is not how RegisterDispatchKey works currently. RegisterDispatchKey currently has its per_operator_headers option hardcoded to False in gen_backend_stubs.py (probably because that script is presumed to be used only out of tree), and RegisterDispatchKey.cpp doesn't have the nice #ifndef that you pointed out in RegisterFunctionalization.

I was starting to mess with this, and I am kind of wondering if it's ok to land as-is and then update how it works. I mainly don't want to add any more delay to landing the TS backend since that is blocking other stuff downstream.

ok yep, I definitely don't want to hold up the PR on this if landing this PR is actively blocking other things.

you're also totally right - it looks like we do something similar for RegisterDispatchKey.cpp (plumbing this flag into the codegen), e.g. here:

pytorch/cmake/Codegen.cmake

Line 144 in 37c5f11

set(GEN_PER_OPERATOR_FLAG)
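
Plumbing the flag from the build system into the generator, as the `GEN_PER_OPERATOR_FLAG` variable suggests, might look roughly like the sketch below. It is a sketch under assumptions: the `--per-operator-headers` option name is taken from the PR title, and `build_flags` and `parse_args` are hypothetical stand-ins for the CMake variable and the codegen entry point.

```python
import argparse
from typing import List


def build_flags(use_per_operator_headers: bool) -> List[str]:
    """Mimic a build-system variable that is empty unless the mode is on."""
    return ["--per-operator-headers"] if use_per_operator_headers else []


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Hypothetical codegen entry point that accepts the build-mode flag."""
    parser = argparse.ArgumentParser(description="illustrative codegen CLI")
    parser.add_argument(
        "--per-operator-headers",
        action="store_true",
        help="generate includes of per-operator headers only",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    for enabled in (False, True):
        args = parse_args(build_flags(enabled))
        print(enabled, args.per_operator_headers)
```

The parsed boolean would then be passed to whatever builds the template environment.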

facebook-github-bot · 2022-03-21T22:37:27Z

This pull request was exported from Phabricator. Differential Revision: D35002666

Summary:
Pull Request resolved: pytorch#74450

per-operator-headers is a strict build mode where compilation units aren't allowed to depend on bulk headers like ATen/Functions.h, but must instead depend only on the specific operator headers used. (In other configurations, the reverse is required).

Test Plan: CI to make sure nothing breaks for existing backends, and rebased next diff manual test to make sure it actually helps

Reviewed By: ezyang

Differential Revision: D35002666

fbshipit-source-id: 37d1142e165a1f326df2f463c261c5058c244d10

facebook-github-bot · 2022-03-22T03:47:25Z

This pull request was exported from Phabricator. Differential Revision: D35002666

Summary:
Pull Request resolved: #74450

per-operator-headers is a strict build mode where compilation units aren't allowed to depend on bulk headers like ATen/Functions.h, but must instead depend only on the specific operator headers used. (In other configurations, the reverse is required).

Test Plan: CI to make sure nothing breaks for existing backends, and rebased next diff manual test to make sure it actually helps

Reviewed By: ezyang, bdhirsh

Differential Revision: D35002666

fbshipit-source-id: 712445f8d146cf026759444fbd42a20705be9bef

github-actions · 2022-03-22T16:32:03Z

Hey @wconstab.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

Summary:
Pull Request resolved: #74450

per-operator-headers is a strict build mode where compilation units aren't allowed to depend on bulk headers like ATen/Functions.h, but must instead depend only on the specific operator headers used. (In other configurations, the reverse is required).

Test Plan: CI to make sure nothing breaks for existing backends, and rebased next diff manual test to make sure it actually helps

Reviewed By: ezyang, bdhirsh

Differential Revision: D35002666

fbshipit-source-id: 712445f8d146cf026759444fbd42a20705be9bef

(cherry picked from commit f13e552)

pytorch-bot bot added the ciflow/default label Mar 21, 2022

facebook-github-bot added the cla signed label Mar 21, 2022

facebook-github-bot added the fb-exported label Mar 21, 2022

wconstab requested a review from bdhirsh March 21, 2022 13:57

bdhirsh reviewed Mar 21, 2022

wconstab force-pushed the export-D35002666 branch from 78c0cfc to d41bc9c Compare March 21, 2022 22:37

wconstab force-pushed the export-D35002666 branch from d41bc9c to 4425d38 Compare March 22, 2022 03:47

bdhirsh approved these changes Mar 22, 2022

pytorchmergebot closed this in 93f7f58 Mar 22, 2022

wconstab added topic: not user facing topic category release notes: lazy release notes category labels Mar 22, 2022

WBobby mentioned this pull request Aug 17, 2022

Add ROCm5.2.3/AMDGPU support for PyTorch WBobby/pytorch#2

Closed
