[JIT] make RegisterCudaFuseGraph use TORCH_API instead of C10_EXPORT #73742

davidberard98 · 2022-03-03T16:26:30Z

Stack from ghstack:

I think this was causing multiple implementations of
RegisterCudaFuseGraph on windows. It looks like both torch_cpu.dll and
torch_python.dll were exporting RegisterCudaFuseGraph implementations,
so RegisterCudaFuseGraph::isRegistered() would refer to a different
static bool depending on whether the caller was in torch_cpu.dll or
torch_python.dll. See #73717 for a demonstration.

Differential Revision: D34618623

I think this was causing multiple implementations of RegisterCudaFuseGraph on windows. It looks like both torch_cpu.dll and torch_python.dll were exporting RegisterCudaFuseGraph implementations, so RegisterCudaFuseGraph::isRegistered() would refer to a _different_ static bool depending on whether the caller was in torch_cpu.dll or torch_python.dll. See #73717 for a demonstration. [ghstack-poisoned]

pytorch-bot · 2022-03-03T16:26:35Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/8adca65d09ce4ffef2bb0cc5c677a55795049190/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
linux-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
linux-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
linux-binary-manywheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
linux-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/trunk`	✅ triggered
linux-bionic-rocm4.5-py3.7	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/rocm`, `ciflow/trunk`	✅ triggered
linux-docs	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/docs`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-vulkan-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
macos-arm64-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
macos-arm64-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
macos-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
macos-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
macos-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
macos-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
windows-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
windows-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
windows-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
docker-builds	`ciflow/all`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`, `ciflow/trunk`	🚫 skipped
linux-docs-push	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-11-py3-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.5-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`, `ciflow/xla`	🚫 skipped

facebook-github-bot · 2022-03-03T16:26:37Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/73742
📄 Preview docs built from this PR
📄 Preview C++ docs built from this PR
🔧 Opt-in to CIFlow to control what jobs run on your PRs

💊 CI failures summary and remediations

As of commit 595968a (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

…C10_EXPORT" I think this was causing multiple implementations of RegisterCudaFuseGraph on windows. It looks like both torch_cpu.dll and torch_python.dll were exporting RegisterCudaFuseGraph implementations, so RegisterCudaFuseGraph::isRegistered() would refer to a _different_ static bool depending on whether the caller was in torch_cpu.dll or torch_python.dll. See #73717 for a demonstration. [ghstack-poisoned]

davidberard98 · 2022-03-03T18:48:35Z

@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

eellison

Should we just grep replace C10_EXPORT with TORCH_API everywhere in torch/csrc/jit ?

jjsjann123 · 2022-03-04T06:04:47Z

Should we just grep replace C10_EXPORT with TORCH_API everywhere in torch/csrc/jit ?

Similar question, we are also using C10_EXPORT on a few different places within nvfuser codebase. I guess those should be fine since they are not exposed in python API. But would TORCH_API be a safer bet in general?

jjsjann123

Thanks for the fix!

davidberard98 · 2022-03-04T18:31:16Z

@eellison @jjsjann123 I think some of them belong to TORCH_CUDA_CPP_API (or possibly TORCH_CUDA_CU_API ?). I'll go through and change some of the other instances in torch/csrc/jit/

These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available. I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742 ghstack-source-id: 9c08b52 Pull Request resolved: #73818

These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available. I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742 [ghstack-poisoned]

…73742) Summary: Pull Request resolved: #73742 I think this was causing multiple implementations of RegisterCudaFuseGraph on windows. It looks like both torch_cpu.dll and torch_python.dll were exporting RegisterCudaFuseGraph implementations, so RegisterCudaFuseGraph::isRegistered() would refer to a _different_ static bool depending on whether the caller was in torch_cpu.dll or torch_python.dll. See #73717 for a demonstration. Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D34618623 Pulled By: davidberard98 fbshipit-source-id: f9bc4fe792098bfabf50ecfcd1f785ed039184bd

github-actions · 2022-03-07T22:08:20Z

Hey @davidberard98.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available. I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742 [ghstack-poisoned]

These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available. I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742 ghstack-source-id: 9ca88a0 Pull Request resolved: #73818

These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available. I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742 Differential Revision: [D34698175](https://our.internmc.facebook.com/intern/diff/D34698175) [ghstack-poisoned]

These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available. I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742 ghstack-source-id: 73592b7 Pull Request resolved: #73818

…(#73742) Summary: Pull Request resolved: pytorch/pytorch#73742 I think this was causing multiple implementations of RegisterCudaFuseGraph on windows. It looks like both torch_cpu.dll and torch_python.dll were exporting RegisterCudaFuseGraph implementations, so RegisterCudaFuseGraph::isRegistered() would refer to a _different_ static bool depending on whether the caller was in torch_cpu.dll or torch_python.dll. See #73717 for a demonstration. Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D34618623 Pulled By: davidberard98 fbshipit-source-id: f9bc4fe792098bfabf50ecfcd1f785ed039184bd (cherry picked from commit 2de58679342914f456d63a9f952357ac163cc4ec)

Summary: Pull Request resolved: #73818 These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available. I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742 Test Plan: Imported from OSS Reviewed By: george-qi Differential Revision: D34698175 Pulled By: davidberard98 fbshipit-source-id: cb871e861cf966bff596cfa8340a32a17fca0b66

Summary: Pull Request resolved: #73818 These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available. I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742 Test Plan: Imported from OSS Reviewed By: george-qi Differential Revision: D34698175 Pulled By: davidberard98 fbshipit-source-id: cb871e861cf966bff596cfa8340a32a17fca0b66 (cherry picked from commit 6b9988e)

Summary: Pull Request resolved: pytorch/pytorch#73818 These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available. I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: pytorch/pytorch#73742 Test Plan: Imported from OSS Reviewed By: george-qi Differential Revision: D34698175 Pulled By: davidberard98 fbshipit-source-id: cb871e861cf966bff596cfa8340a32a17fca0b66 (cherry picked from commit 6b9988e5688e6d4a9928c3e331efb74f000a9e4a)

pytorch-bot bot added the ciflow/default label Mar 3, 2022

facebook-github-bot added the cla signed label Mar 3, 2022

facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Mar 3, 2022

davidberard98 mentioned this pull request Mar 3, 2022

[JIT] Enable NVFuser tests in CI #73322

Closed

davidberard98 linked an issue Mar 3, 2022 that may be closed by this pull request

[JIT] NVFuser tests that fail on windows #73620

Closed

davidberard98 requested review from eellison and jjsjann123 March 3, 2022 23:18

eellison approved these changes Mar 4, 2022

View reviewed changes

jjsjann123 approved these changes Mar 4, 2022

View reviewed changes

davidberard98 mentioned this pull request Mar 5, 2022

[JIT] C10_EXPORT -> TORCH_API #73818

Closed

pytorchmergebot closed this in a3d099e Mar 7, 2022

davidberard98 added the topic: not user facing topic category label Mar 7, 2022

facebook-github-bot deleted the gh/davidberard98/55/head branch March 11, 2022 15:17

jjsjann123 mentioned this pull request Apr 6, 2022

[JIT] NVFuser tests that fail on windows #73620

Closed

WBobby mentioned this pull request Aug 17, 2022

Add ROCm5.2.3/AMDGPU support for PyTorch WBobby/pytorch#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[JIT] make RegisterCudaFuseGraph use TORCH_API instead of C10_EXPORT #73742

[JIT] make RegisterCudaFuseGraph use TORCH_API instead of C10_EXPORT #73742

Uh oh!

davidberard98 commented Mar 3, 2022 •

edited

Loading

Uh oh!

pytorch-bot bot commented Mar 3, 2022

⚛️ CI Flow

Uh oh!

facebook-github-bot commented Mar 3, 2022 •

edited

Loading

Uh oh!

davidberard98 commented Mar 3, 2022

Uh oh!

eellison left a comment

Uh oh!

jjsjann123 commented Mar 4, 2022

Uh oh!

jjsjann123 left a comment

Uh oh!

davidberard98 commented Mar 4, 2022

Uh oh!

github-actions bot commented Mar 7, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[JIT] make RegisterCudaFuseGraph use TORCH_API instead of C10_EXPORT #73742

[JIT] make RegisterCudaFuseGraph use TORCH_API instead of C10_EXPORT #73742

Uh oh!

Conversation

davidberard98 commented Mar 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Mar 3, 2022

⚛️ CI Flow

Uh oh!

facebook-github-bot commented Mar 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful links

💊 CI failures summary and remediations

Uh oh!

davidberard98 commented Mar 3, 2022

Uh oh!

eellison left a comment

Choose a reason for hiding this comment

Uh oh!

jjsjann123 commented Mar 4, 2022

Uh oh!

jjsjann123 left a comment

Choose a reason for hiding this comment

Uh oh!

davidberard98 commented Mar 4, 2022

Uh oh!

github-actions bot commented Mar 7, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

davidberard98 commented Mar 3, 2022 •

edited

Loading

facebook-github-bot commented Mar 3, 2022 •

edited

Loading