
Conversation

@yhcharles (Contributor)

Summary:
X-link: meta-pytorch/torchrec#781

Move a bunch of globals to instance methods and replace all uses of them.

We move all PG-related globals under _World and use a singleton instance, _world.

This creates an undocumented extension point for injecting full control of how c10d
state behaves.

One simple hack is to change _world to an implementation that uses a thread-local,
enabling per-thread PGs.

This almost gets DDP working; the PG is only missing an implementation of all_reduce.
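A minimal sketch of that thread-local variant, with assumed attribute names (the PR itself only ships the plain singleton described below):

```python
import threading

class _ThreadLocalWorld:
    """Hypothetical _World replacement: each thread sees its own PG state."""

    def __init__(self):
        self._local = threading.local()

    @property
    def default_pg(self):
        # None until this thread initializes its own process group.
        return getattr(self._local, "default_pg", None)

    @default_pg.setter
    def default_pg(self, value):
        self._local.default_pg = value

# Opting in would mean swapping the module-level singleton:
# torch.distributed.distributed_c10d._world = _ThreadLocalWorld()
```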

Per-thread PGs enable notebook usage of PTD, which is a big deal for learning it:
https://gist.github.com/kumpera/32cb051fa26b8cad8bdf671f968dcd68

This change ensures BC by keeping the global variables around and having the default _World wrap them.
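A minimal sketch of that wrapping pattern, under assumed names (the real distributed_c10d module wraps many more globals, e.g. _pg_map and _pg_names):

```python
# Legacy module-level global, kept so external code that reads it
# directly keeps working (BC).
_default_pg = None

class _World:
    """Container for c10d process-group state; the default
    implementation wraps the legacy module-level globals."""

    @property
    def default_pg(self):
        return _default_pg

    @default_pg.setter
    def default_pg(self, value):
        global _default_pg
        _default_pg = value

# Singleton the rest of the module goes through; replacing it is the
# undocumented extension point mentioned above.
_world = _World()
```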

I have relinked this diff to a new GitHub PR so that I can update it. The original PR is:

> Pull Request resolved: #86348

Differential Revision: D40236769

Pulled By: yhcharles


pytorch-bot bot commented Nov 4, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88471

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a47a15c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40236769


@gnadathur left a comment:

LGTM.

Member:

nit: typo, "mechanism"

Comment on lines +278 to +281
@H-Huang (Member) commented Nov 4, 2022:

Are these vars only used by class _World? Why not put them under that class? You could define them as attributes; then you also wouldn't need a global abc declaration each time you access them.
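(For illustration of the suggestion above, a hypothetical sketch of the two styles:)

```python
# Style 1: module-level global; every writer needs a global declaration.
_group_count = 0

def _bump_group_count():
    global _group_count
    _group_count += 1

# Style 2: attribute on _World; plain instance access, no global needed.
class _World:
    def __init__(self):
        self.group_count = 0

    def bump_group_count(self):
        self.group_count += 1
```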

Contributor Author replied:

They are used in some other projects, so we have to keep them. Let me ping you with some examples.

@wanchaol (Collaborator) commented Dec 30, 2022:

Hey, I randomly browsed into this piece of code and am wondering why we need to keep them. I know they're being used in other projects, but since we made all of those variables private, shouldn't that mean we are free to change them however we like without worrying about BC?

If it's related to internal changes, maybe we can just refactor it directly.

Member:

FYI, I have a PR, #88351, that refactors some of this code, since this and the if statements below are basically repeated. Feel free to land yours first if you are ready, though.

@rohan-varma (Contributor) left a comment:

Stamp to unblock, but please address @H-Huang's comments

yhcharles pushed a commit to yhcharles/torchrec that referenced this pull request Nov 4, 2022
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D40236769

facebook-github-bot pushed a commit to meta-pytorch/torchrec that referenced this pull request Nov 7, 2022
@facebook-github-bot (Contributor)

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Nov 7, 2022
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

lequytra pushed a commit to lequytra/torchrec that referenced this pull request Dec 6, 2022
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Pull Request resolved: pytorch#88471
Approved by: https://github.com/gnadathur, https://github.com/rohan-varma

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), fb-exported, Merged, release notes: distributed (c10d)
