[FSDP][2/N] Add util for computing shared param LCA #94197
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94197
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit a0a4cb8.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: d63f1ef Pull Request resolved: pytorch#94197
**Overview**

- This PR implements a utility function `get_shared_param_info_to_lca()` that returns a `Dict[SharedParamInfo, nn.Module]` mapping `SharedParamInfo` (representing a shared parameter) to its lowest common ancestor (LCA) module.
- This function can be used as a subroutine for assigning shared parameters to their LCA modules during FSDP initialization (for the composable code path in the short term).

**Details**

The implementation follows a simple version of [Tarjan's offline LCA algorithm](https://en.wikipedia.org/wiki/Tarjan%27s_off-line_lowest_common_ancestors_algorithm) that is based on a union-find data structure. We can use this algorithm because the set of LCA queries is fixed a priori (i.e. this is offline).

Each module represents a vertex in the module tree, where there is a directed edge from parent module to child module (i.e. `p` is a parent of `c` if `c` is returned from `p.children()`). The LCA module `lca` of two modules `a` and `b` is the lowest (i.e. greatest-depth) module that includes both `a` and `b` in its subtree.

For the unit test, here is a visualization of the module tree:

*(figure: module tree used in the unit test)*

[ghstack-poisoned]
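To make the union-find-based procedure concrete, here is a minimal sketch of Tarjan's offline LCA over an `nn.Module` tree. The function name `offline_module_lca` and its signature are illustrative only; this is not the PR's `get_shared_param_info_to_lca()`, and it assumes the module hierarchy forms a tree (no submodule registered under two parents).

```python
from typing import Dict, List, Optional, Tuple

import torch.nn as nn


def offline_module_lca(
    root: nn.Module, queries: List[Tuple[nn.Module, nn.Module]]
) -> List[Optional[nn.Module]]:
    """Answer a fixed batch of (module_a, module_b) -> LCA-module queries with
    Tarjan's offline LCA algorithm over the module tree rooted at ``root``.

    Illustrative sketch only (not the PR's implementation). ``answers[i]`` is the
    LCA module for ``queries[i]``, or ``None`` if an endpoint is not under ``root``.
    """
    parent: Dict[nn.Module, nn.Module] = {}    # union-find parent pointers
    ancestor: Dict[nn.Module, nn.Module] = {}  # set root -> subtree root merged so far
    visited = set()

    def find(m: nn.Module) -> nn.Module:
        # Union-find "find" with path compression.
        rep = m
        while parent[rep] is not rep:
            rep = parent[rep]
        while parent[m] is not rep:
            parent[m], m = rep, parent[m]
        return rep

    # Index queries by endpoint so each DFS visit can resolve the pairs whose
    # other endpoint has already been fully processed.
    by_endpoint: Dict[nn.Module, List[Tuple[int, nn.Module]]] = {}
    for i, (a, b) in enumerate(queries):
        by_endpoint.setdefault(a, []).append((i, b))
        by_endpoint.setdefault(b, []).append((i, a))
    answers: List[Optional[nn.Module]] = [None] * len(queries)

    def dfs(module: nn.Module) -> None:
        parent[module] = module
        ancestor[module] = module
        for child in module.children():
            dfs(child)
            parent[find(child)] = module      # union the child's set into module's set
            ancestor[find(module)] = module   # module is the subtree root of the merged set
        visited.add(module)
        for i, other in by_endpoint.get(module, []):
            if other in visited:
                # Classic Tarjan step: the LCA is the ancestor label of other's set.
                answers[i] = ancestor[find(other)]

    dfs(root)
    return answers
```

Because every query pair is known before the traversal starts, a single depth-first pass over the module tree answers all of them in near-linear time, which is why the offline variant suffices here.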
ghstack-source-id: a3e61ab Pull Request resolved: pytorch#94197
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as `Stale`.
Stack from ghstack:
- #94198 [FSDP][3/N] Add LCA logic to `fully_shard`

**Overview**
- This PR implements a utility function `get_shared_param_info_to_lca()` that returns a `Dict[SharedParamInfo, nn.Module]` mapping `SharedParamInfo` (representing a shared parameter) to its lowest common ancestor (LCA) module.

**Details**
The implementation follows a simple version of Tarjan's offline LCA algorithm that is based on a union-find data structure. We can use this algorithm because the set of LCA queries is fixed a priori (i.e. this is offline).
Each module represents a vertex in the module tree, where there is a directed edge from parent module to child module (i.e. `p` is a parent of `c` if `c` is returned from `p.children()`). The LCA module `lca` of two modules `a` and `b` is the lowest (i.e. greatest-depth) module that includes both `a` and `b` in its subtree.

For the unit test, here is a visualization of the module tree:
