@fegin fegin commented Feb 18, 2022

Users may need summon_full_params() to get the original parameters.

Differential Revision: [D34353034](https://our.internmc.facebook.com/intern/diff/D34353034/)

[ghstack-poisoned]

pytorch-bot bot commented Feb 18, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/9b4d5e5166e8c8d356911d449c9e133cf6a067ec/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla 🚫 skipped

facebook-github-bot commented Feb 18, 2022

💊 CI failures summary and remediations

As of commit c981be7 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


@facebook-github-bot facebook-github-bot added the oncall: distributed label Feb 18, 2022
fegin added a commit that referenced this pull request Feb 18, 2022
Users may need summon_full_params() to get the original parameters.

Differential Revision: [D34353034](https://our.internmc.facebook.com/intern/diff/D34353034/)

ghstack-source-id: 149524472
Pull Request resolved: #73116

@rohan-varma rohan-varma left a comment

I think we can do this as long as we have consensus on name, API, defaults, etc.

The main question, I think, is the default for writeback, which is currently True: users have to explicitly pass writeback=False to get better performance. However, summon_full_params is mostly used for debugging or checkpointing, so I think the usability improvement is worth the tradeoff here. We can always flip the default before the release if need be.
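
To make the tradeoff concrete, here is a minimal toy sketch of what a writeback flag means for a context manager that exposes full parameters. It is not FSDP's implementation: `toy_summon_full_params` and the list-of-lists "shards" are hypothetical stand-ins, and the concatenation only imitates an all-gather.

```python
from contextlib import contextmanager


@contextmanager
def toy_summon_full_params(shards, writeback=True):
    """Toy model of the semantics discussed here (hypothetical, not FSDP code).

    `shards` is a list of per-rank lists. The context yields one flat copy
    of all values; edits to that copy reach the shards only if writeback=True.
    """
    sizes = [len(shard) for shard in shards]
    # Stand-in for an all-gather: build a full copy from every shard.
    full = [value for shard in shards for value in shard]
    yield full
    if writeback:
        # Extra work on exit: copy each slice of the full view back in place.
        offset = 0
        for shard, size in zip(shards, sizes):
            shard[:] = full[offset:offset + size]
            offset += size


shards = [[1.0, 2.0], [3.0, 4.0]]

# writeback=True: the modification persists in the shards.
with toy_summon_full_params(shards, writeback=True) as full:
    full[0] = 10.0
persisted = [list(shard) for shard in shards]

# writeback=False: the modification is discarded on exit.
with toy_summon_full_params(shards, writeback=False) as full:
    full[3] = -1.0
discarded = [list(shard) for shard in shards]
```

In this sketch the writeback=True path pays for one extra copy on exit, which is the performance cost mentioned above, while writeback=False skips that copy and silently drops any edits.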

fegin added a commit that referenced this pull request Feb 23, 2022
Pull Request resolved: #73116

Users may need summon_full_params() to get the original parameters.
ghstack-source-id: 149776412

Differential Revision: [D34353034](https://our.internmc.facebook.com/intern/diff/D34353034/)

fegin commented Feb 23, 2022

@rohan-varma Agreed. The default value of writeback should be False.

fegin added a commit that referenced this pull request Feb 23, 2022
Pull Request resolved: #73116

Users may need summon_full_params() to get the original parameters.
ghstack-source-id: 149787096

Differential Revision: [D34353034](https://our.internmc.facebook.com/intern/diff/D34353034/)
fegin added a commit that referenced this pull request Feb 23, 2022
Pull Request resolved: #73116

Users may need summon_full_params() to get the original parameters.
ghstack-source-id: 149807641

Differential Revision: [D34353034](https://our.internmc.facebook.com/intern/diff/D34353034/)

@rohan-varma rohan-varma left a comment

@fegin Maybe I miscommunicated, but why should it be False rather than the current True? From what I understand, summon_full_params is not currently on a perf-critical path, and writeback=True by default is nice because modifications will persist by default (or perhaps modifying state by default is not great?).

fegin commented Feb 24, 2022

@rohan-varma I misunderstood your original comment. My thought was that summon_full_params and many other functions in FSDP change the storage or create copies of the original parameters. Given that, writeback gives me the impression that it should not be the default behavior. This is not a strong argument, so either way works for me.

fegin added a commit that referenced this pull request Feb 25, 2022
Pull Request resolved: #73116

Users may need summon_full_params() to get the original parameters.
ghstack-source-id: 149951203

Differential Revision: [D34353034](https://our.internmc.facebook.com/intern/diff/D34353034/)
fegin added a commit that referenced this pull request Feb 28, 2022
Pull Request resolved: #73116

Users may need summon_full_params() to get the original parameters.
ghstack-source-id: 150134237

Differential Revision: [D34353034](https://our.internmc.facebook.com/intern/diff/D34353034/)
facebook-github-bot pushed a commit that referenced this pull request Mar 1, 2022
Summary:
Pull Request resolved: #73116

Users may need summon_full_params() to get the original parameters.
ghstack-source-id: 150134237

Test Plan: CI

Reviewed By: rohan-varma

Differential Revision: D34353034

fbshipit-source-id: ac69cc032da177903cd9969094f3f82dc6a61636

github-actions bot commented Mar 1, 2022

Hey @fegin.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 3, 2022
Summary:
Pull Request resolved: pytorch/pytorch#73116

Users may need summon_full_params() to get the original parameters.
ghstack-source-id: 150134237

Test Plan: CI

Reviewed By: rohan-varma

Differential Revision: D34353034

fbshipit-source-id: ac69cc032da177903cd9969094f3f82dc6a61636
(cherry picked from commit 55d34fdee3778110a165a13ae987d0339e8d33c7)
@facebook-github-bot facebook-github-bot deleted the gh/fegin/3/head branch March 5, 2022 15:17

Labels

cla signed, oncall: distributed

4 participants