[FSDP][optim_state_dict][2/N] Add _get_fqn_to_fsdp_param_info to map from original FQN to flat_param #89899
Conversation
…o flat_param [ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89899
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Failure as of commit 97ff650.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…ginal FQN to flat_param" [ghstack-poisoned]
…ginal FQN to flat_param" **Motivation:** Add a helper to map from the FQN to the corresponding flat_param. The helper will directly get flat_param from fsdp_state and flat_handler as flat_param is not registered to the module if `use_orig_params` is True. [ghstack-poisoned]
…ginal FQN to flat_param" **Motivation:** Add a helper to map from the FQN to the corresponding flat_param. The helper will directly get flat_param from fsdp_state and flat_handler as flat_param is not registered to the module if `use_orig_params` is True. [ghstack-poisoned]
…ginal FQN to flat_param" **Motivation:** Add a helper to map from the FQN to the corresponding flat_param. The helper will directly get flat_param from fsdp_state and flat_handler as flat_param is not registered to the module if `use_orig_params` is True. [ghstack-poisoned]
awgu
left a comment
I made an initial pass with some conceptual questions.
def _get_fqn_to_fsdp_param_info(
    model: nn.Module, dedup_shared_fqns: Set[str]
The name dedup_shared_fqns is a bit unclear to me. Is the deduplication just happening because this is a set, but this data structure mainly just represents the FQNs to add as keys to the returned fqn_to_param_info?
Will add a comment. In general, we only need to track the first FQN of a shared parameter. The same applies to `_get_param_to_fqns()` and `param_to_fqns`.
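As a toy illustration of the shared-parameter case being discussed (not the PR's code, and using plain `named_parameters()` rather than FSDP's bookkeeping): a parameter shared between two modules surfaces only once, under the first FQN that reaches it, so that first FQN is the only one that needs tracking.

```python
import torch.nn as nn

# Toy illustration: named_parameters() deduplicates by object identity, so a
# shared Parameter is yielded only once, under the first FQN that reaches it.
linear = nn.Linear(4, 4)
tied = nn.Linear(4, 4)
tied.weight = linear.weight  # tie the weights: one Parameter, two FQNs
model = nn.Sequential(linear, tied)

print([fqn for fqn, _ in model.named_parameters()])
# ['0.weight', '0.bias', '1.bias'] -- '1.weight' is dropped as a duplicate
```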
After checking the code, I realized this protection is redundant. Please check the comment I added to the code.
class FSDPParamInfo:
    state: nn.Module
    flat_param: FlatParameter
    fqn_indices: Dict[str, int]
Could we add a comment explaining what these indices mean? (My understanding is that this maps each FQN in FlatParameter._fqns to its corresponding index in FlatParameter._fqns.)
It also looks like this is not used for this PR (but probably for the next?).
Changed to param_indices, and yes, it is used in the next PR.
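For reference, a rough sketch of how the renamed field could be documented (hypothetical comments, not the PR's exact code; `FlatParameter` is FSDP's internal flattened-parameter class, so it is left as a string annotation here and exact types in the PR may differ):

```python
from dataclasses import dataclass, field
from typing import Dict

import torch.nn as nn


@dataclass
class FSDPParamInfo:
    # The FSDP-managed module (or FSDP state) that owns the FlatParameter.
    state: nn.Module
    # FSDP's internal flattened parameter that holds the original params.
    flat_param: "FlatParameter"
    # Maps a param's FQN to the index of that param within flat_param._fqns,
    # so later PRs can locate a param's slot inside the flat_param.
    param_indices: Dict[str, int] = field(default_factory=dict)
```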
…nfo to map from original FQN to flat_param" **Motivation:** Add a helper to map from the FQN to the corresponding flat_param. The helper will directly get flat_param from fsdp_state and flat_handler as flat_param is not registered to the module if `use_orig_params` is True. [ghstack-poisoned]
awgu
left a comment
LGTM! I left some nits.
def _get_fqn_to_fsdp_param_info(model: nn.Module) -> Dict[str, FSDPParamInfo]:
    """
    Construct the maaping from a param's fqn to its corresponding FSDPParamInfo
nit:
- Construct the maaping from a param's fqn to its corresponding FSDPParamInfo
+ Construct the mapping from a param's fqn to its corresponding FSDPParamInfo
| """ | ||
| Construct the maaping from a param's fqn to its corresponding FSDPParamInfo | ||
| if the param is managed by FSDP. FlatParam only stores the first FQN of a | ||
| shared parameter. So the keys in the mapping are guranteed to map to unique |
nit:
- shared parameter. So the keys in the mapping are guranteed to map to unique
+ shared parameter. So the keys in the mapping are guaranteed to map to unique
    shared parameter. So the keys in the mapping are guranteed to map to unique
    parameters.
    """
    def module_fn(module, prefix, fqn_to_param_info):
Should we add a comment saying we need to use _apply_to_modules to get the global FQN (since the saved FQNs are like local FQNs, not necessarily prefixed from the global root module)?
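A toy illustration of the prefixing in question (hypothetical names, not the PR's traversal): the visitor receives the prefix accumulated from the root and joins it with module-local FQNs to produce FQNs rooted at the top-level model.

```python
import torch.nn as nn


def collect_global_fqns(model: nn.Module):
    """Walk the module tree carrying the accumulated prefix, the way a
    per-module visitor would, so module-local FQNs become global FQNs."""
    global_fqns = []

    def visit(module: nn.Module, prefix: str) -> None:
        # Stand-in for the locally saved FQNs (e.g. FlatParameter._fqns);
        # here we just use the module's direct parameters.
        for local_fqn, _ in module.named_parameters(recurse=False):
            global_fqns.append(prefix + local_fqn)
        for name, child in module.named_children():
            visit(child, prefix + name + ".")

    visit(model, "")
    return global_fqns


model = nn.Sequential(nn.Linear(2, 2), nn.Sequential(nn.Linear(2, 2)))
print(collect_global_fqns(model))
# ['0.weight', '0.bias', '1.0.weight', '1.0.bias']
```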
def _get_fqn_to_fsdp_param_info(model: nn.Module) -> Dict[str, FSDPParamInfo]:
    """
    Construct the maaping from a param's fqn to its corresponding FSDPParamInfo
    if the param is managed by FSDP. FlatParam only stores the first FQN of a
nit: I got confused at first, but I think I understand. Let me know if this is actually the wrong understanding.
- if the param is managed by FSDP. FlatParam only stores the first FQN of a
+ if the param is managed by FSDP. FlatParameter._fqns only stores the first FQN of a
(add backticks if you want, but maybe not, to keep it consistent with the rest of the comment)
This is correct, thanks!
@pytorchbot merge -f "The failing test is not related."
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…from original FQN to flat_param (pytorch#89899) **Motivation:** Add a helper to map from the FQN to the corresponding flat_param. The helper will directly get flat_param from fsdp_state and flat_handler as flat_param is not registered to the module if `use_orig_params` is True. Pull Request resolved: pytorch#89899 Approved by: https://github.com/awgu
Stack from ghstack (oldest at bottom):
Motivation:
Add a helper to map from the FQN to the corresponding flat_param. The helper will directly get flat_param from fsdp_state and flat_handler as flat_param is not registered to the module if `use_orig_params` is True.
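For readers following the stack, here is a minimal structural sketch of what such a mapping helper might look like. It is not the PR's code: `get_fsdp_state_and_flat_param` is a hypothetical stand-in for however the real helper reaches fsdp_state and the flat-param handle (which it must do because flat_param is not registered on the module when `use_orig_params` is True), and all names and signatures below are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

import torch.nn as nn


@dataclass
class FSDPParamInfo:
    state: nn.Module
    flat_param: object  # FSDP's internal FlatParameter
    param_indices: Dict[str, int] = field(default_factory=dict)


def get_fsdp_state_and_flat_param(module: nn.Module) -> Optional[Tuple[nn.Module, object]]:
    """Hypothetical stand-in: return (fsdp_state, flat_param) if `module` is
    FSDP-wrapped, else None. The real helper reads these from fsdp_state and
    the flat-param handle rather than from registered module parameters."""
    ...


def fqn_to_fsdp_param_info_sketch(model: nn.Module) -> Dict[str, FSDPParamInfo]:
    fqn_to_param_info: Dict[str, FSDPParamInfo] = {}

    def visit(module: nn.Module, prefix: str) -> None:
        result = get_fsdp_state_and_flat_param(module)
        if result is not None:
            fsdp_state, flat_param = result
            info = FSDPParamInfo(fsdp_state, flat_param, {})
            # flat_param._fqns holds module-local FQNs; only the first FQN of a
            # shared parameter is stored, so keys map to unique parameters.
            for idx, local_fqn in enumerate(flat_param._fqns):
                global_fqn = prefix + local_fqn
                info.param_indices[global_fqn] = idx
                fqn_to_param_info[global_fqn] = info
        for name, child in module.named_children():
            visit(child, prefix + name + ".")

    visit(model, "")
    return fqn_to_param_info
```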