rohan-varma commented Mar 22, 2022

Stack from ghstack (oldest at bottom):

Partially addresses #73890: fixes named_buffers() by stripping the FSDP-specific prefix from buffer names inside the summon_full_params context, similar to what named_parameters() already does.
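
For readers unfamiliar with the change, here is a minimal pure-Python sketch of the prefix stripping described above. It is not the FSDP implementation: `HYPOTHETICAL_FSDP_PREFIX` and `strip_prefix_from_names` are names made up for illustration, and the prefix value is an assumption rather than something stated in this PR.

```python
from typing import Any, Iterable, Iterator, Tuple

# Assumption for illustration only; the real prefix constant is defined in the FSDP source.
HYPOTHETICAL_FSDP_PREFIX = "_fsdp_wrapped_module._fpw_module."


def strip_prefix_from_names(
    named_items: Iterable[Tuple[str, Any]],
    in_summon_full_params: bool,
    prefix: str = HYPOTHETICAL_FSDP_PREFIX,
) -> Iterator[Tuple[str, Any]]:
    """Yield (name, item) pairs, removing every occurrence of `prefix`
    from the name only when inside the summon_full_params context."""
    for name, item in named_items:
        if in_summon_full_params:
            # str.replace removes all occurrences, which matters for nested wrappers.
            name = name.replace(prefix, "")
        yield name, item


# Usage: a buffer name as it might look under one level of wrapping.
wrapped = [(HYPOTHETICAL_FSDP_PREFIX + "bn.running_mean", 0.0)]
clean = list(strip_prefix_from_names(wrapped, in_summon_full_params=True))
untouched = list(strip_prefix_from_names(wrapped, in_summon_full_params=False))
```

Outside the context the names pass through unchanged, so callers that rely on the wrapped names should not be affected by this sketch.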

Differential Revision: D35023191

facebook-github-bot commented Mar 22, 2022

💊 CI failures summary and remediations

As of commit 88fd8c8 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


facebook-github-bot added the "oncall: distributed" label Mar 22, 2022
rohan-varma added a commit that referenced this pull request Mar 27, 2022
Pull Request resolved: #74517

Partially addresses #73890 to fix named_buffers by stripping FSDP info in summon_full_params context, similar to named_params.
ghstack-source-id: 152309646

Differential Revision: [D35023191](https://our.internmc.facebook.com/intern/diff/D35023191/)
@rohan-varma
Contributor Author

CI did not run properly, but everything passed on 46d93e7, so I'm landing this.

remove all occurrences of the FSDP-specific flattened buffer prefix
when inside the :meth:`summon_full_params` context manager.
"""
in_summon_full_params = getattr(self, "training_state", None) == \
Contributor

I'm curious why we can't access training_state directly here.
This attribute is unconditionally initialized in FSDP's constructor, and there are no del calls.

Contributor Author

cc @awgu

Collaborator

It is my fault for not digging into this more carefully when landing the named_parameters() override.

We use this getattr() check for named_parameters() because the constructor calls named_parameters() (twice) before setting training_state.
This should not be needed for named_buffers() since named_buffers() is not called in the constructor.

My thought is that we can either:

  • move the definition of self.training_state = TrainingState_.IDLE to before the first usage of named_parameters() in the constructor, or
  • keep named_parameters() as is, add a comment explaining why we use getattr(), and remove the getattr() from named_buffers().
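
To make the ordering issue concrete, here is a toy sketch of the second option. It does not use FSDP or torch: `ToyWrapper`, `TrainingState`, its `SUMMON_FULL_PARAMS` member, and `PREFIX` are hypothetical stand-ins chosen for illustration.

```python
from enum import Enum, auto
from typing import Iterator, Tuple

PREFIX = "wrapped."  # hypothetical stand-in for the FSDP-specific prefix


class TrainingState(Enum):  # hypothetical stand-in for FSDP's TrainingState_
    IDLE = auto()
    SUMMON_FULL_PARAMS = auto()


class ToyWrapper:
    def __init__(self) -> None:
        self._params = {"wrapped.weight": 1.0}
        self._buffers = {"wrapped.running_mean": 0.0}
        # named_parameters() runs here, before training_state is assigned,
        # mirroring the constructor ordering described in this thread.
        self.names_seen_in_ctor = [name for name, _ in self.named_parameters()]
        self.training_state = TrainingState.IDLE

    @staticmethod
    def _strip(name: str, in_summon: bool) -> str:
        return name.replace(PREFIX, "") if in_summon else name

    def named_parameters(self) -> Iterator[Tuple[str, float]]:
        # getattr() is needed only because the constructor calls this method
        # before training_state exists.
        in_summon = (
            getattr(self, "training_state", None) == TrainingState.SUMMON_FULL_PARAMS
        )
        for name, value in self._params.items():
            yield self._strip(name, in_summon), value

    def named_buffers(self) -> Iterator[Tuple[str, float]]:
        # Never called from the constructor, so direct attribute access is safe.
        in_summon = self.training_state == TrainingState.SUMMON_FULL_PARAMS
        for name, value in self._buffers.items():
            yield self._strip(name, in_summon), value
```

With the first option (assigning `training_state` at the top of `__init__`), both methods could use direct attribute access and the `getattr()` guard would go away entirely.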

Collaborator

for param_name, param in module.named_parameters():

for n, p in self.named_parameters():

facebook-github-bot pushed a commit that referenced this pull request Mar 28, 2022
Summary:
Pull Request resolved: #74517

Partially addresses #73890 to fix named_buffers by stripping FSDP info in summon_full_params context, similar to named_params.
ghstack-source-id: 152309646

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D35023191

fbshipit-source-id: 091c7afa73c595f54e303dbfc938010e08278d64
@github-actions
Contributor

Hey @rohan-varma.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.
