Skip to content

Conversation

@davidberard98
Copy link
Contributor

@davidberard98 davidberard98 commented Mar 3, 2022

Stack from ghstack:

I think this was causing multiple implementations of
RegisterCudaFuseGraph on windows. It looks like both torch_cpu.dll and
torch_python.dll were exporting RegisterCudaFuseGraph implementations,
so RegisterCudaFuseGraph::isRegistered() would refer to a different
static bool depending on whether the caller was in torch_cpu.dll or
torch_python.dll. See #73717 for a demonstration.

Differential Revision: D34618623

I think this was causing multiple implementations of
RegisterCudaFuseGraph on windows. It looks like both torch_cpu.dll and
torch_python.dll were exporting RegisterCudaFuseGraph implementations,
so RegisterCudaFuseGraph::isRegistered() would refer to a _different_
static bool depending on whether the caller was in torch_cpu.dll or
torch_python.dll. See #73717 for a demonstration.

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Mar 3, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/8adca65d09ce4ffef2bb0cc5c677a55795049190/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build ciflow/all, ciflow/cpu, ciflow/default, ciflow/libtorch, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla 🚫 skipped

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Mar 3, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 595968a (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Mar 3, 2022
…C10_EXPORT"

I think this was causing multiple implementations of
RegisterCudaFuseGraph on windows. It looks like both torch_cpu.dll and
torch_python.dll were exporting RegisterCudaFuseGraph implementations,
so RegisterCudaFuseGraph::isRegistered() would refer to a _different_
static bool depending on whether the caller was in torch_cpu.dll or
torch_python.dll. See #73717 for a demonstration.

[ghstack-poisoned]
@davidberard98
Copy link
Contributor Author

@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@davidberard98 davidberard98 linked an issue Mar 3, 2022 that may be closed by this pull request
Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just grep replace C10_EXPORT with TORCH_API everywhere in torch/csrc/jit ?

@jjsjann123
Copy link
Collaborator

Should we just grep replace C10_EXPORT with TORCH_API everywhere in torch/csrc/jit ?

Similar question, we are also using C10_EXPORT on a few different places within nvfuser codebase. I guess those should be fine since they are not exposed in python API. But would TORCH_API be a safer bet in general?

Copy link
Collaborator

@jjsjann123 jjsjann123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@davidberard98
Copy link
Contributor Author

@eellison @jjsjann123 I think some of them belong to TORCH_CUDA_CPP_API (or possibly TORCH_CUDA_CU_API ?). I'll go through and change some of the other instances in torch/csrc/jit/

davidberard98 added a commit that referenced this pull request Mar 5, 2022
These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available.

I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742

ghstack-source-id: 9c08b52
Pull Request resolved: #73818
davidberard98 added a commit that referenced this pull request Mar 5, 2022
These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available.

I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742

[ghstack-poisoned]
facebook-github-bot pushed a commit that referenced this pull request Mar 7, 2022
…73742)

Summary:
Pull Request resolved: #73742

I think this was causing multiple implementations of
RegisterCudaFuseGraph on windows. It looks like both torch_cpu.dll and
torch_python.dll were exporting RegisterCudaFuseGraph implementations,
so RegisterCudaFuseGraph::isRegistered() would refer to a _different_
static bool depending on whether the caller was in torch_cpu.dll or
torch_python.dll. See #73717 for a demonstration.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34618623

Pulled By: davidberard98

fbshipit-source-id: f9bc4fe792098bfabf50ecfcd1f785ed039184bd
@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2022

Hey @davidberard98.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@davidberard98 davidberard98 added the topic: not user facing topic category label Mar 7, 2022
davidberard98 added a commit that referenced this pull request Mar 7, 2022
These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available.

I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Mar 7, 2022
These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available.

I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742

ghstack-source-id: 9ca88a0
Pull Request resolved: #73818
davidberard98 added a commit that referenced this pull request Mar 8, 2022
These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available.

I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742

Differential Revision: [D34698175](https://our.internmc.facebook.com/intern/diff/D34698175)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Mar 8, 2022
These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available.

I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742

Differential Revision: [D34698175](https://our.internmc.facebook.com/intern/diff/D34698175)

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Mar 8, 2022
These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available.

I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742

ghstack-source-id: 73592b7
Pull Request resolved: #73818
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 9, 2022
…(#73742)

Summary:
Pull Request resolved: pytorch/pytorch#73742

I think this was causing multiple implementations of
RegisterCudaFuseGraph on windows. It looks like both torch_cpu.dll and
torch_python.dll were exporting RegisterCudaFuseGraph implementations,
so RegisterCudaFuseGraph::isRegistered() would refer to a _different_
static bool depending on whether the caller was in torch_cpu.dll or
torch_python.dll. See #73717 for a demonstration.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34618623

Pulled By: davidberard98

fbshipit-source-id: f9bc4fe792098bfabf50ecfcd1f785ed039184bd
(cherry picked from commit 2de58679342914f456d63a9f952357ac163cc4ec)
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 9, 2022
…(#73742)

Summary:
Pull Request resolved: pytorch/pytorch#73742

I think this was causing multiple implementations of
RegisterCudaFuseGraph on windows. It looks like both torch_cpu.dll and
torch_python.dll were exporting RegisterCudaFuseGraph implementations,
so RegisterCudaFuseGraph::isRegistered() would refer to a _different_
static bool depending on whether the caller was in torch_cpu.dll or
torch_python.dll. See #73717 for a demonstration.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34618623

Pulled By: davidberard98

fbshipit-source-id: f9bc4fe792098bfabf50ecfcd1f785ed039184bd
(cherry picked from commit 2de58679342914f456d63a9f952357ac163cc4ec)
@facebook-github-bot facebook-github-bot deleted the gh/davidberard98/55/head branch March 11, 2022 15:17
facebook-github-bot pushed a commit that referenced this pull request Mar 14, 2022
Summary:
Pull Request resolved: #73818

These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available.

I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742

Test Plan: Imported from OSS

Reviewed By: george-qi

Differential Revision: D34698175

Pulled By: davidberard98

fbshipit-source-id: cb871e861cf966bff596cfa8340a32a17fca0b66
pytorchmergebot pushed a commit that referenced this pull request Mar 14, 2022
Summary:
Pull Request resolved: #73818

These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available.

I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: #73742

Test Plan: Imported from OSS

Reviewed By: george-qi

Differential Revision: D34698175

Pulled By: davidberard98

fbshipit-source-id: cb871e861cf966bff596cfa8340a32a17fca0b66
(cherry picked from commit 6b9988e)
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Oct 29, 2022
Summary:
Pull Request resolved: pytorch/pytorch#73818

These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available.

I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: pytorch/pytorch#73742

Test Plan: Imported from OSS

Reviewed By: george-qi

Differential Revision: D34698175

Pulled By: davidberard98

fbshipit-source-id: cb871e861cf966bff596cfa8340a32a17fca0b66
(cherry picked from commit 6b9988e5688e6d4a9928c3e331efb74f000a9e4a)
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request Nov 10, 2022
Summary:
Pull Request resolved: pytorch/pytorch#73818

These all appear to be defined in libtorch_cpu.so, so they should be marked with TORCH_API. TORCH_API means that these symbols are exported from libtorch_cpu.so and no other libraries. In comparison, C10_EXPORT will export the symbol in _all_ built libraries, if it's available.

I think most of these were fine because most were only defined in cpp files (which would only be included in the targets for one .so file). However, the change in pass_manager.h affects behavior, since the class is defined in the .h file, which could result in two separate implementations of the same static functions. Previously we saw issues on windows with this: pytorch/pytorch#73742

Test Plan: Imported from OSS

Reviewed By: george-qi

Differential Revision: D34698175

Pulled By: davidberard98

fbshipit-source-id: cb871e861cf966bff596cfa8340a32a17fca0b66
(cherry picked from commit 6b9988e5688e6d4a9928c3e331efb74f000a9e4a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed oncall: jit Add this issue/PR to JIT oncall triage queue topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[JIT] NVFuser tests that fail on windows

5 participants