@awgu commented Feb 6, 2023

Overview

  • This PR implements a utility function get_shared_param_info_to_lca() that returns a Dict[SharedParamInfo, nn.Module] mapping each SharedParamInfo (which represents one shared parameter) to the lowest common ancestor (LCA) of the modules that share that parameter.
  • This function can serve as a subroutine for assigning each shared parameter to its LCA module during FSDP initialization (in the short term, for the composable code path).
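To make "shared parameter" concrete, here is a minimal sketch that assumes a parameter is shared when the same object is registered directly on more than one module (for example, tied embedding and output weights). `Module` and `find_shared_params` are hypothetical names for illustration: with a real nn.Module one would iterate named_parameters(recurse=False) over each module, and this sketch does not reproduce the fields of PyTorch's actual SharedParamInfo.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Module:
    """Hypothetical stand-in for nn.Module: direct parameters plus children()."""

    def __init__(self, name, params=None, children=()):
        self.name = name
        self.params = dict(params or {})  # local parameter name -> parameter object
        self._children = list(children)

    def children(self):
        return iter(self._children)


def find_shared_params(root: Module) -> Dict[int, List[Tuple[Module, str]]]:
    """Map id(param) -> [(owning module, local name), ...] for params registered more than once."""
    owners: Dict[int, List[Tuple[Module, str]]] = defaultdict(list)
    stack, seen = [root], set()
    while stack:
        module = stack.pop()
        if id(module) in seen:  # visit each module once even if it is reachable twice
            continue
        seen.add(id(module))
        # Only parameters registered directly on this module (like recurse=False).
        for name, param in module.params.items():
            owners[id(param)].append((module, name))
        stack.extend(module.children())
    # Keep only parameter objects that have more than one owner.
    return {pid: regs for pid, regs in owners.items() if len(regs) > 1}
```

The owning modules reported for each shared parameter are exactly the vertices an LCA query is asked about: if `embed` and `head` are both children of `root` and register the same `weight` object, that parameter's LCA is `root`.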

Details
The implementation follows a simple version of Tarjan's offline LCA algorithm (https://en.wikipedia.org/wiki/Tarjan%27s_off-line_lowest_common_ancestors_algorithm), which is based on a union-find data structure. The algorithm applies here because the set of LCA queries is known in advance (i.e., the problem is offline).
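A minimal sketch of Tarjan's offline LCA in pure Python, assuming only that each vertex exposes children() the way nn.Module does. `Node` and `tarjan_offline_lca` are hypothetical names, not the PR's code. The union-find uses path compression but no union by rank, and the DFS is recursive, which is acceptable for shallow module trees.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


class Node:
    """Hypothetical stand-in for nn.Module; only children() is used."""

    def __init__(self, name: str, children: Iterable["Node"] = ()):
        self.name = name
        self._children = list(children)

    def children(self):
        return iter(self._children)

    def __repr__(self) -> str:
        return self.name


def tarjan_offline_lca(root: Node, queries: List[Tuple[Node, Node]]) -> List[Optional[Node]]:
    """Answer every (a, b) query during one DFS; returns the LCAs in query order."""
    parent: Dict[Node, Node] = {}    # union-find parent pointers
    ancestor: Dict[Node, Node] = {}  # set representative -> current ancestor of that set
    visited = set()
    pending = defaultdict(list)      # node -> [(other endpoint, query index)]
    for i, (a, b) in enumerate(queries):
        pending[a].append((b, i))
        pending[b].append((a, i))
    answers: List[Optional[Node]] = [None] * len(queries)

    def find(x: Node) -> Node:
        rep = x
        while parent[rep] is not rep:
            rep = parent[rep]
        while parent[x] is not rep:  # path compression
            nxt = parent[x]
            parent[x] = rep
            x = nxt
        return rep

    def dfs(u: Node) -> None:
        parent[u] = u  # make-set
        ancestor[u] = u
        for child in u.children():
            dfs(child)
            parent[find(child)] = find(u)  # union the child's subtree into u's set
            ancestor[find(u)] = u
        visited.add(u)
        # A query is answered when its second endpoint finishes visiting.
        for other, i in pending[u]:
            if other in visited:
                answers[i] = ancestor[find(other)]

    dfs(root)
    return answers


if __name__ == "__main__":
    c, d, e = Node("c"), Node("d"), Node("e")
    a, b = Node("a", [c, d]), Node("b", [e])
    root = Node("root", [a, b])
    print(tarjan_offline_lca(root, [(c, d), (c, e), (e, e)]))  # prints [a, root, e]
```

Every query is answered during a single traversal, which is why the queries must be fixed up front.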

Each module is a vertex in the module tree, with a directed edge from each parent module to each of its child modules (i.e., p is the parent of c if c is returned by p.children()). The LCA of two modules a and b is the lowest (i.e., greatest-depth) module whose subtree contains both a and b.
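As a cross-check of this definition, here is a brute-force LCA sketch that walks parent pointers: collect a and all of its ancestors, then climb from b until the first shared vertex, which is the deepest common one. It assumes a true tree (each module reachable through exactly one parent) and uses a hypothetical `Node` stand-in for nn.Module; `build_parent_map` and `naive_lca` are illustrative names only.

```python
from typing import Dict, Iterable, Optional


class Node:
    """Hypothetical stand-in for nn.Module; only children() is used."""

    def __init__(self, name: str, children: Iterable["Node"] = ()):
        self.name = name
        self._children = list(children)

    def children(self):
        return iter(self._children)


def build_parent_map(root: Node) -> Dict[Node, Optional[Node]]:
    """Record each vertex's parent using the parent -> child edges from children()."""
    parents: Dict[Node, Optional[Node]] = {root: None}
    stack = [root]
    while stack:
        p = stack.pop()
        for c in p.children():
            parents[c] = p
            stack.append(c)
    return parents


def naive_lca(parents: Dict[Node, Optional[Node]], a: Node, b: Node) -> Node:
    # A module lies in its own subtree, so a itself is one of the candidates.
    ancestors_of_a = set()
    node: Optional[Node] = a
    while node is not None:
        ancestors_of_a.add(node)
        node = parents[node]
    # Climbing from b, the first vertex that is also an ancestor of a is the
    # deepest module whose subtree contains both a and b.
    node = b
    while node not in ancestors_of_a:
        node = parents[node]
    return node
```

This costs one upward walk per query, which is fine as a reference oracle when testing a faster offline implementation.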

For the unit test, here is a visualization of the module tree:
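[Figure: module tree used in the unit test (image: https://user-images.githubusercontent.com/31054793/202576843-688694dc-ccbd-4d98-9cf8-b82d51d05e8e.png)]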

pytorch-bot bot commented Feb 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94197

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a0a4cb8:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the "release notes: distributed (fsdp)" label Feb 6, 2023
awgu pushed a commit to awgu/pytorch that referenced this pull request Feb 6, 2023
@awgu marked this pull request as ready for review February 6, 2023 21:37
@awgu added the "topic: not user facing" label Feb 6, 2023
awgu pushed a commit to awgu/pytorch that referenced this pull request Feb 13, 2023
@awgu requested a review from fegin as a code owner February 13, 2023 17:15
awgu pushed a commit to awgu/pytorch that referenced this pull request Apr 3, 2023
@github-actions

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions added the Stale label Apr 14, 2023
@github-actions closed this May 14, 2023
@facebook-github-bot deleted the gh/awgu/323/head branch June 14, 2023 14:15