@awgu
Collaborator

@awgu awgu commented Apr 5, 2022

Stack from ghstack:

Test Plan
Check that test_fsdp_optim_state.py still passes.

Differential Revision: D35384883

@facebook-github-bot
Contributor

facebook-github-bot commented Apr 5, 2022

💊 CI failures summary and remediations

As of commit 1dc189f (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge) (1/1)

Step: "Test"

2022-04-05T02:36:07.6707021Z     #10 0x559e01408c81 in run_mod /home/builder/tkoch/workspace/python_1648536129212/work/Python/pythonrun.c:1037
2022-04-05T02:36:07.6708109Z     #11 0x559e01413c69 in PyRun_StringFlags /home/builder/tkoch/workspace/python_1648536129212/work/Python/pythonrun.c:961
2022-04-05T02:36:07.6709074Z     #12 0x559e01413ccb in PyRun_SimpleStringFlags /home/builder/tkoch/workspace/python_1648536129212/work/Python/pythonrun.c:455
2022-04-05T02:36:07.6709964Z     #13 0x559e01413dc8 in pymain_run_command /home/builder/tkoch/workspace/python_1648536129212/work/Modules/main.c:420
2022-04-05T02:36:07.6710655Z     #14 0x559e01413dc8 in pymain_run_python /home/builder/tkoch/workspace/python_1648536129212/work/Modules/main.c:2907
2022-04-05T02:36:07.6711268Z     #15 0x559e01413dc8 in pymain_main /home/builder/tkoch/workspace/python_1648536129212/work/Modules/main.c:3460
2022-04-05T02:36:07.6711926Z     #16 0x559e0141418b in _Py_UnixMain /home/builder/tkoch/workspace/python_1648536129212/work/Modules/main.c:3495
2022-04-05T02:36:07.7238263Z     #17 0x7f889734583f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291
2022-04-05T02:36:07.7238803Z     #18 0x559e013b9039 in _start (/opt/conda/bin/python3.7+0x1d8039)
2022-04-05T02:36:07.7238971Z 
2022-04-05T02:36:07.7239292Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
2022-04-05T02:36:07.7482870Z + retcode=1
2022-04-05T02:36:07.7483327Z + set -e
2022-04-05T02:36:07.7483623Z + return 1
2022-04-05T02:36:07.7487589Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]
2022-04-05T02:36:07.7488092Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X ]]
2022-04-05T02:36:07.7488827Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]
2022-04-05T02:36:07.7489123Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]]
2022-04-05T02:36:07.7489478Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]
2022-04-05T02:36:07.7489756Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\5\1\2 ]]
2022-04-05T02:36:07.7492129Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]

This comment was automatically generated by Dr. CI (expand for details).

@facebook-github-bot added the 'oncall: distributed' label (Add this issue/PR to distributed oncall triage queue) on Apr 5, 2022
awgu pushed a commit that referenced this pull request Apr 5, 2022
ghstack-source-id: 4b40de9
Pull Request resolved: #75243
@awgu
Collaborator Author

awgu commented Apr 5, 2022

@awgu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

unpadded_numel = flat_param._orig_size.numel() # type: ignore[attr-defined]
tensor_state[state_name] = tensor_buffer[:unpadded_numel].cpu()
# Zero-dimension tensor state and non-tensor state: take this rank's
# value directly (`deepcopy()`ing to avoid aliasing surprises)
Collaborator Author

Delete outdated comment.

    elif to_save:
        if _is_zero_dim_tensor(value):
-           zero_dim_tensor_state[state_name] = value
+           zero_dim_tensor_state[state_name] = value.cpu()
Collaborator Author

Typically, this zero-dimension tensor is on CPU, but we should add a .cpu() call to ensure this.
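
A minimal sketch of the pattern discussed in this comment, not the FSDP implementation itself. The helper names `is_zero_dim_tensor` and `split_state_for_saving` and the sample state keys are hypothetical stand-ins chosen for illustration; only `torch.is_tensor`, `Tensor.dim()`, and `Tensor.cpu()` are real PyTorch calls. The point it illustrates is that calling `.cpu()` unconditionally keeps the saved state on CPU regardless of where the zero-dimension tensor happened to live.

```python
import torch


def is_zero_dim_tensor(x):
    # Hypothetical stand-in for the private helper used in the diff.
    return torch.is_tensor(x) and x.dim() == 0


def split_state_for_saving(state):
    # Hypothetical helper: separates zero-dimension tensor state from
    # non-tensor state, moving the former to CPU unconditionally.
    zero_dim_tensor_state = {}
    non_tensor_state = {}
    for state_name, value in state.items():
        if is_zero_dim_tensor(value):
            # Usually already on CPU, but the call makes that a guarantee.
            zero_dim_tensor_state[state_name] = value.cpu()
        elif not torch.is_tensor(value):
            non_tensor_state[state_name] = value
    return zero_dim_tensor_state, non_tensor_state


step = torch.tensor(3.0)
if torch.cuda.is_available():
    # Simulate the uncommon case where the value lives on the GPU.
    step = step.cuda()

zero_dim, non_tensor = split_state_for_saving({"step": step, "foreach": False})
```

For a tensor that is already in CPU memory, `.cpu()` returns the original object without copying, so the extra call costs essentially nothing in the typical case.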

@awgu awgu marked this pull request as ready for review April 5, 2022 12:51
facebook-github-bot pushed a commit that referenced this pull request Apr 5, 2022
Summary: Pull Request resolved: #75243

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D35384883

Pulled By: awgu

fbshipit-source-id: 8dfc12035b79861df093d5921ed7b36050c9f3a0
@github-actions
Contributor

github-actions bot commented Apr 5, 2022

Hey @awgu.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@awgu added the 'topic: improvements' label (topic category) on Apr 5, 2022
@facebook-github-bot facebook-github-bot deleted the gh/awgu/22/head branch April 8, 2022 23:01
Labels

cla signed
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
topic: improvements (topic category)

4 participants