[follow-up] Python Attr Serialization #88913

kshitij12345 · 2022-11-11T22:33:45Z

Ref: #81616 (comment)

cc @ezyang @gchanan

pytorch-bot · 2022-11-11T22:33:47Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88913

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit fa63eaf:

NEW FAILURES - The following jobs have failed:

win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kshitij12345 · 2022-11-25T15:54:08Z

@albanD do you think it would be good to merge this now? Is this past the FC (forward-compatibility) period?

kshitij12345 · 2022-11-28T17:25:46Z

@pytorchbot merge

pytorchmergebot · 2022-11-28T17:27:34Z

Merge failed

Reason: PR #88913 has not been reviewed yet (Rule superuser)

Details for Dev Infra team

Raised by workflow job

kshitij12345 · 2022-11-28T17:28:28Z

Oops, sorry! Wrong PR 😓🙏

Also, ping @albanD

albanD

Thanks!

albanD · 2022-11-29T14:33:06Z

@pytorchbot merge -g

pytorchmergebot · 2022-11-29T14:35:03Z

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2022-11-29T14:35:07Z

Merge failed

Reason: 2 additional jobs have failed, first few of them are: trunk, trunk / cuda11.6-py3.10-gcc7-sm86 / test (default, 1, 4, linux.g5.4xlarge.nvidia.gpu)

Details for Dev Infra team

Raised by workflow job

kshitij12345 · 2022-11-29T15:22:17Z

pytorchbot merge -f"Unrelated CI failure: RuntimeError: test_jit_cuda_fuser failed! Received signal: SIGSEGV"

kshitij12345 · 2022-11-29T16:44:09Z

@pytorchbot merge -f"Unrelated CI failure: RuntimeError: test_jit_cuda_fuser failed! Received signal: SIGSEGV"

pytorchmergebot · 2022-11-29T16:46:16Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

facebook-github-bot · 2022-12-02T20:07:42Z

@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).

pytorchmergebot · 2022-12-02T20:14:06Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot · 2022-12-02T20:14:15Z

@kshitij12345 your PR has been successfully reverted.

This reverts commit 086b251. Reverted #88913 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally

mehtanirav · 2022-12-02T20:15:59Z

@kshitij12345 Unfortunately, this PR had to be reverted because the earlier changes haven't been released to production yet, owing to the code freeze. @singlaiiit can add more context if I missed anything.

kshitij12345 · 2023-01-12T14:22:27Z

For now, I have updated _acc_grad and _optimizer_hook_handles so that each is mapped to its parameter through a WeakTensorKeyDictionary. I will follow up in a separate PR for the other attributes if avoiding their serialisation is deemed worthwhile.
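To make the idea concrete, here is a minimal sketch of the side-table pattern described above. It is an illustration under assumptions, not the code from this PR: `Param` is a hypothetical stand-in for a parameter, `acc_grad_by_param` is an invented name, and the standard library's `weakref.WeakKeyDictionary` stands in for torch's WeakTensorKeyDictionary.

```python
import gc
import pickle
import weakref


class Param:
    """Hypothetical stand-in for a parameter; not a torch class."""

    def __init__(self, value):
        self.value = value


# Side table (assumed name): per-parameter state is kept here rather than
# as an attribute on the parameter, so it never enters the object's __dict__.
acc_grad_by_param = weakref.WeakKeyDictionary()

p = Param([1.0, 2.0])
acc_grad_by_param[p] = [0.1, 0.2]

# Pickling the parameter carries only its own attributes; the side-table
# entry is not part of the pickled state.
restored = pickle.loads(pickle.dumps(p))
state_was_pickled = hasattr(restored, "_acc_grad") or restored in acc_grad_by_param

# Because keys are held weakly, the entry goes away with the parameter.
del p
gc.collect()
entries_left = len(acc_grad_by_param)
```

The weak keys are the reason for choosing this kind of dictionary in the sketch: the table does not keep a parameter alive, and its entry is dropped once the parameter is collected.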

The distributed tests are passing. The only failure is in the Windows CI and is unrelated: torch_test_cpp_extension.cpp fails with the following error:

2023-01-12T12:51:45.3403220Z FAILED: C:/actions-runner/_work/pytorch/pytorch/test/cpp_extensions/build/temp.win-amd64-cpython-39/Release/extension.obj 
2023-01-12T12:51:45.3406353Z cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\include -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\include\torch\csrc\api\include -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\include\TH -IC:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\include\THC -IC:\actions-runner\_work\pytorch\pytorch\test\cpp_extensions\self_compiler_include_dirs_test -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29333\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c C:\actions-runner\_work\pytorch\pytorch\test\cpp_extensions\extension.cpp /FoC:\actions-runner\_work\pytorch\pytorch\test\cpp_extensions\build\temp.win-amd64-cpython-39\Release\extension.obj /sdl /permissive- -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=cpp -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++17
2023-01-12T12:51:45.3409533Z C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
2023-01-12T12:51:45.3410543Z C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\include\c10/util/Optional.h(212): warning C4624: 'c10::constexpr_storage_t<T>': destructor was implicitly defined as deleted
2023-01-12T12:51:45.3411050Z         with
2023-01-12T12:51:45.3411272Z         [
2023-01-12T12:51:45.3411516Z             T=c10::SymInt
2023-01-12T12:51:45.3411752Z         ]

kshitij12345 · 2023-01-13T14:57:50Z

Ping @albanD @awgu for another review. Thanks :)

albanD

Looks pretty clean!
Sounds good to me!

kshitij12345 · 2023-01-13T17:20:36Z

@pytorchbot merge

pytorchmergebot · 2023-01-13T17:22:19Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2023-01-13T17:22:24Z

Merge failed

Reason: The following mandatory check(s) failed (Rule superuser):

pull

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

kshitij12345 · 2023-01-13T17:37:06Z

@pytorchbot merge -f"Unrelated Windows CI failure"

pytorchmergebot · 2023-01-13T17:38:47Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

…pickling errors

After #88913, user-defined parameter states will be pickled. For a FlatParameter, this means `_local_shard` will also be pickled. Since state_dict and load_state_dict only require the tensor, returning the full FlatParameter does not give us any extra benefit. This PR changes the behavior to simply return a view of the FlatParameter.

Differential Revision: [D43205127](https://our.internmc.facebook.com/intern/diff/D43205127/)

[ghstack-poisoned]

…pickling errors

Pull Request resolved: #94637

After #88913, user-defined parameter states will be pickled. For a FlatParameter, this means `_local_shard` will also be pickled. Since state_dict and load_state_dict only require the tensor, returning the full FlatParameter does not give us any extra benefit. This PR changes the behavior to simply return a view of the FlatParameter.

ghstack-source-id: 179983735

Differential Revision: [D43205127](https://our.internmc.facebook.com/intern/diff/D43205127/)

…pickling errors (#94637)

After #88913, user-defined parameter states will be pickled. For a FlatParameter, this means `_local_shard` will also be pickled. Since state_dict and load_state_dict only require the tensor, returning the full FlatParameter does not give us any extra benefit. This PR changes the behavior to simply return a view of the FlatParameter.

Differential Revision: [D43205127](https://our.internmc.facebook.com/intern/diff/D43205127/)

Pull Request resolved: #94637

Approved by: https://github.com/rohan-varma
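As a rough illustration of the reasoning in that commit message, the toy sketch below shows why extra attributes matter once an object's attributes are pickled along with it. Everything here is hypothetical: `FlatParamLike` and `payload_only` are invented names, the lock is an arbitrary unpicklable stand-in (it is not what `_local_shard` actually holds), and the list copy stands in for returning a tensor view.

```python
import pickle
import threading


class FlatParamLike:
    """Hypothetical stand-in: a data payload plus extra per-instance state."""

    def __init__(self, data):
        self.data = data
        # Assumption for illustration only: extra state that pickle rejects.
        self._local_shard = threading.Lock()


def payload_only(param):
    """Return just the data, leaving the extra attributes behind."""
    return list(param.data)


param = FlatParamLike([1.0, 2.0, 3.0])

# Pickling the full object pulls in every attribute, including the extra state.
try:
    pickle.dumps(param)
    full_object_pickles = True
except TypeError:
    full_object_pickles = False

# Pickling only the payload sidesteps the extra state entirely.
restored = pickle.loads(pickle.dumps(payload_only(param)))
```

In this sketch the payload is copied, whereas a tensor view shares storage with its source; the point being illustrated is only that the returned object no longer carries the extra attributes.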

[follow-up] Python Attr Serialization

35b3341

pytorchbot added the open source label Nov 11, 2022

kshitij12345 marked this pull request as ready for review November 25, 2022 15:53

kshitij12345 requested review from albanD and jbschlosser as code owners November 25, 2022 15:53

kshitij12345 changed the title ~~[DO-NOT-MERGE][follow-up] Python Attr Serialization~~ [follow-up] Python Attr Serialization Nov 25, 2022

Merge branch 'pytorch:master' into fix/tensor-serialization/attr-foll…

a578eb5

…ow-up

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 28, 2022

albanD approved these changes Nov 29, 2022

pytorchmergebot added the Merged label Nov 29, 2022

pytorchmergebot closed this in 086b251 Nov 29, 2022

pytorchmergebot added the Reverted label Dec 2, 2022

pytorchmergebot added a commit that referenced this pull request Dec 2, 2022

Revert "[follow-up] Python Attr Serialization (#88913)"

f5fbb50

This reverts commit 086b251. Reverted #88913 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally

albanD reopened this Dec 5, 2022

kshitij12345 requested review from H-Huang, awgu, kwen2501 and wanchaol as code owners January 12, 2023 10:12

albanD approved these changes Jan 13, 2023

pytorchmergebot closed this in 745fe35 Jan 13, 2023

albanD added the module: bc-breaking Related to a BC-breaking change label Jan 19, 2023

pytorch-bot bot added the topic: bc breaking topic category label Jan 19, 2023

awgu mentioned this pull request Feb 8, 2023

FSDP checkpoint load RuntimeError appeared between jan11 and feb8 #94409

Closed

fegin mentioned this pull request Feb 10, 2023

[FSDP][state_dict] Return tensors instead of FlatParameters to avoid pickling errors #94637

Closed

awgu mentioned this pull request Feb 13, 2023

[FSDP][1/N] Refactor module materialization #94196

Closed

awgu pushed a commit that referenced this pull request Feb 13, 2023

Revert "[follow-up] Python Attr Serialization (#88913)"

fe0edeb

This reverts commit 745fe35. [ghstack-poisoned]

This was referenced Feb 13, 2023

[FSDP][2/N] Add util for computing shared param LCA #94197

Closed

[FSDP][3/N] Add LCA logic to fully_shard #94198

Closed

Revert "[follow-up] Python Attr Serialization (#88913)" #94741

Closed

awgu pushed a commit that referenced this pull request Feb 13, 2023

Revert "[follow-up] Python Attr Serialization (#88913)"

86dbc24

This reverts commit 745fe35. ghstack-source-id: 47fc202 Pull Request resolved: #94741


8 participants
