[FSDP] named_buffers fix #74517

rohan-varma · 2022-03-22T02:49:43Z

Stack from ghstack (oldest at bottom):

-> [FSDP] named_buffers fix #74517

Partially addresses #73890 to fix named_buffers by stripping FSDP info in summon_full_params context, similar to named_params.

Differential Revision: D35023191
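To make the intent concrete, here is a minimal sketch of the prefix-stripping idea. It uses plain-Python stand-ins rather than the real FSDP classes: `WRAPPER_PREFIX`, `State`, and `ToyWrapper` are hypothetical names chosen for illustration, and the actual prefix string and state enum in `torch/distributed/fsdp` may differ.

```python
from enum import Enum, auto
from typing import Dict, Iterator, Tuple

# Hypothetical stand-ins: the real prefix string and state enum live in
# torch/distributed/fsdp and may be named differently.
WRAPPER_PREFIX = "_wrapped_module."


class State(Enum):
    IDLE = auto()
    SUMMON_FULL_PARAMS = auto()


class ToyWrapper:
    """Toy wrapper that reports buffer names the way a wrapping module might."""

    def __init__(self, raw_buffers: Dict[str, object]) -> None:
        self.training_state = State.IDLE
        self._raw_buffers = dict(raw_buffers)

    def named_buffers(self) -> Iterator[Tuple[str, object]]:
        in_summon = self.training_state == State.SUMMON_FULL_PARAMS
        for name, buf in self._raw_buffers.items():
            if in_summon:
                # Remove every occurrence of the prefix, not just the first,
                # because nested wrappers each contribute one.
                name = name.replace(WRAPPER_PREFIX, "")
            yield name, buf


wrapper = ToyWrapper({"_wrapped_module.bn._wrapped_module.running_mean": 0.0})
idle_names = [n for n, _ in wrapper.named_buffers()]

# Simulate being inside the summon_full_params context.
wrapper.training_state = State.SUMMON_FULL_PARAMS
summoned_names = [n for n, _ in wrapper.named_buffers()]
```

Outside the simulated context the names are returned untouched; inside it the wrapper-specific prefix is dropped so callers see the names of the original module.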


facebook-github-bot · 2022-03-22T02:49:49Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/74517
Need help or want to give feedback on the CI? Visit our office hours

💊 CI failures summary and remediations

As of commit 88fd8c8 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI (expand for details).



rohan-varma · 2022-03-27T14:14:31Z

CI did not run properly on the latest commit, but everything passed on 46d93e7, so landing.

kumpera · 2022-03-28T13:23:37Z

torch/distributed/fsdp/fully_sharded_data_parallel.py

+        remove all occurrences of the FSDP-specific flattened buffer prefix
+        when inside the :meth:`summon_full_params` context manager.
+        """
+        in_summon_full_params = getattr(self, "training_state", None) == \


I'm curious why we can't access training_state directly here.
This attribute is unconditionally initialized in FSDP's ctor and there are no del calls.

That is my fault for not digging into this more carefully when landing the named_parameters() override.

We use this getattr() check for named_parameters() because the constructor calls named_parameters() (twice) before setting training_state.
This should not be needed for named_buffers() since named_buffers() is not called in the constructor.
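The initialization-order issue described above can be reproduced with a small sketch. The classes below are hypothetical and only mimic the pattern (an overridden `named_parameters()` called from `__init__` before `training_state` is assigned); they are not the FSDP implementation.

```python
class DirectAccess:
    def __init__(self):
        # The override below runs before training_state is assigned.
        self.seen = list(self.named_parameters())
        self.training_state = "idle"

    def named_parameters(self):
        # Direct attribute access: fails while __init__ is still running.
        in_summon = self.training_state == "summon_full_params"
        yield ("weight", in_summon)


class GuardedAccess(DirectAccess):
    def named_parameters(self):
        # getattr() with a default tolerates the not-yet-assigned attribute.
        in_summon = getattr(self, "training_state", None) == "summon_full_params"
        yield ("weight", in_summon)


try:
    DirectAccess()
    direct_failed = False
except AttributeError:
    direct_failed = True

guarded = GuardedAccess()
```

In this sketch, constructing `DirectAccess` raises `AttributeError`, while `GuardedAccess` constructs normally because the missing attribute falls back to `None`.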

My thought is that we can either:

1. move the definition of self.training_state = TrainingState_.IDLE to before the first usage of named_parameters() in the constructor, or
2. keep named_parameters() as is, add a comment explaining why we use getattr(), and remove the getattr() from named_buffers().

pytorch/torch/distributed/fsdp/fully_sharded_data_parallel.py, line 328 in 88fd8c8:

for param_name, param in module.named_parameters():

pytorch/torch/distributed/fsdp/fully_sharded_data_parallel.py, line 346 in 88fd8c8:

for n, p in self.named_parameters():
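The two options proposed in this comment can be sketched side by side. This is an illustration only, under the assumption that the state is a simple attribute; the class names and the `flat_param` and `_wrapped_module.` strings are hypothetical placeholders, not names taken from the FSDP source.

```python
class OptionOne:
    """Assign the state before anything in __init__ calls named_parameters()."""

    def __init__(self):
        self.training_state = "idle"
        self.param_names = [n for n, _ in self.named_parameters()]

    def named_parameters(self):
        # Direct access is safe: the attribute always exists by now.
        in_summon = self.training_state == "summon_full_params"
        yield ("weight" if in_summon else "flat_param", None)


class OptionTwo:
    """Keep getattr() in named_parameters() only, with a comment saying why."""

    def __init__(self):
        self.param_names = [n for n, _ in self.named_parameters()]
        self.training_state = "idle"

    def named_parameters(self):
        # getattr() is required: __init__ calls this before training_state is set.
        in_summon = getattr(self, "training_state", None) == "summon_full_params"
        yield ("weight" if in_summon else "flat_param", None)

    def named_buffers(self):
        # Never called from __init__, so direct access is enough here.
        in_summon = self.training_state == "summon_full_params"
        yield ("running_mean" if in_summon else "_wrapped_module.running_mean", None)


one = OptionOne()
two = OptionTwo()
buffers_idle = [n for n, _ in two.named_buffers()]
two.training_state = "summon_full_params"
buffers_summoned = [n for n, _ in two.named_buffers()]
```

Option one removes the need for any guard; option two keeps the constructor untouched and confines the guard to the single method that needs it.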

Summary:
Pull Request resolved: #74517

Partially addresses #73890 to fix named_buffers by stripping FSDP info in summon_full_params context, similar to named_params.

ghstack-source-id: 152309646

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D35023191

fbshipit-source-id: 091c7afa73c595f54e303dbfc938010e08278d64

github-actions · 2022-03-28T14:45:00Z

Hey @rohan-varma.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

[FSDP] named_buffers fix

0e6d146


rohan-varma requested review from H-Huang, mingzhe09088, mrshenli, pritamdamania87 and zhaojuanmao as code owners March 22, 2022 02:49

facebook-github-bot added the cla signed label Mar 22, 2022

facebook-github-bot added the oncall: distributed label Mar 22, 2022

This was referenced Mar 22, 2022

[FSDP] Fix summon_full_params test #74456

Closed

[FSDP] Mixed precision enablement #74452

Closed

zhaojuanmao approved these changes Mar 22, 2022


rohan-varma added 2 commits March 23, 2022 16:53

Update on "[FSDP] named_buffers fix"

46d93e7


Update on "[FSDP] named_buffers fix"

88fd8c8


kumpera reviewed Mar 28, 2022


pytorchmergebot closed this in 7104f39 Mar 28, 2022

facebook-github-bot deleted the gh/rohan-varma/527/head branch April 1, 2022 14:17

awgu mentioned this pull request Apr 13, 2022

[FSDP][Easy] named_parameters(), named_buffers() refactor #75732

Closed

WBobby mentioned this pull request Aug 17, 2022

Add ROCm5.2.3/AMDGPU support for PyTorch WBobby/pytorch#2

Closed


Review comments on Mar 28, 2022, in order: kumpera, rohan-varma, awgu, awgu.


6 participants
