[dtensor][8/N] switch DeviceMesh to use numpy array for devices #92931


Closed

wanchaol wants to merge 13 commits into gh/wanchaol/251/base from gh/wanchaol/251/head

Collaborator

wanchaol commented Jan 24, 2023

Stack from ghstack (oldest at bottom):

Switch DeviceMesh to store the device mesh as a NumPy array instead of a torch.Tensor, to avoid unintended interactions with tracing subsystems (e.g. FakeTensorMode and make_fx).

Differential Revision: D42876247
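Once the mesh is a NumPy ndarray, the topology lookups a device mesh needs become plain NumPy calls, so there is no tensor for a dispatch mode to intercept. The helpers below are a minimal sketch under that assumption; `rank_coordinate` and `ranks_along_dim` are hypothetical names, not DeviceMesh APIs.

```python
from typing import List, Optional, Tuple

import numpy as np


def rank_coordinate(mesh: np.ndarray, rank: int) -> Optional[Tuple[int, ...]]:
    """Hypothetical helper: return the coordinate of `rank` in the mesh, or None."""
    coords = np.argwhere(mesh == rank)  # shape (num_matches, mesh.ndim)
    if coords.size == 0:
        return None
    return tuple(int(c) for c in coords[0])


def ranks_along_dim(mesh: np.ndarray, dim: int) -> List[List[int]]:
    """Hypothetical helper: list the rank groups that vary only along `dim`.

    Swapping `dim` to the last axis and flattening the remaining axes yields
    one row per group, i.e. the ranks that share every coordinate except `dim`.
    """
    return np.swapaxes(mesh, dim, -1).reshape(-1, mesh.shape[dim]).tolist()


mesh = np.arange(8).reshape(2, 4)  # 2 x 4 mesh of ranks 0..7
print(rank_coordinate(mesh, 5))    # (1, 1)
print(ranks_along_dim(mesh, 0))    # [[0, 4], [1, 5], [2, 6], [3, 7]]
```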


          [dtensor][7/N] switch DeviceMesh to use numpy array for devices

158c45b

Switching from torch.Tensor to numpy array to avoid possible
interactions with tracing subsystems

[ghstack-poisoned]

wanchaol requested review from H-Huang, awgu, kwen2501, mrshenli, rohan-varma and zhaojuanmao as code owners

January 24, 2023 23:31

This was referenced Jan 24, 2023

[dtensor][4/N] refactor dispatching logic and add propagator #90733

Closed

[dtensor][5/N] add cached propagator for TP #90734

Closed

[dtensor][6/N] change to a better/safer op registration #90735

Closed

add numpy typing plugin to mypy config #92930

Closed

pytorch-bot bot commented Jan 24, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92931


❌ 14 Failures, 4 Pending

As of commit 9e7eb45:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wanchaol mentioned this pull request

[WIP] prototype: op decomposition #92126

Closed

wanchaol requested review from XilunWu, aazzolini and fduwjj

January 24, 2023 23:33

wanchaol added the release notes: distributed (dtensor) label

aazzolini approved these changes


Contributor

aazzolini left a comment

Thanks! One suggestion for unit testing: create a DeviceMesh under FakeTensorMode to reproduce the issue I hit. Alternatively, create a DeviceMesh inside a simple function and run make_fx on it.

torch/distributed/_tensor/device_mesh.py (review thread resolved)


Update on "[dtensor][7/N] switch DeviceMesh to use numpy array for devices"

308bd3f


wanchaol mentioned this pull request

[dtensor][7/N] remove backend in with_comms #93040

Closed


Update on "[dtensor][8/N] switch DeviceMesh to use numpy array for devices"

b2ecc10


wanchaol changed the title ~~[dtensor][7/N] switch DeviceMesh to use numpy array for devices~~ [dtensor][8/N] switch DeviceMesh to use numpy array for devices

Collaborator Author

wanchaol commented Jan 26, 2023

> Thanks! One suggestion for unit testing would be to create a DeviceMesh in FakeMode to reproduce the issue that I had! Or maybe create a DeviceMesh inside of a simple function and make_fx on it?

Thanks @aazzolini! I just added a test with FakeTensorMode, and it works as expected. However, to avoid this issue entirely, we might want to stop accepting a tensor for DeviceMesh's `mesh` argument and accept only a NumPy array or an n-d list instead. Let me know if you want to go in that direction :)
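If DeviceMesh stopped accepting tensors, input validation could look roughly like the following. This is a sketch of the proposal, not the merged code; `normalize_mesh` is a hypothetical name, and the integer-dtype and uniqueness checks are illustrative assumptions.

```python
from typing import Any

import numpy as np


def normalize_mesh(mesh: Any) -> np.ndarray:
    """Hypothetical validator: accept only a NumPy array or an n-d list of ranks."""
    if isinstance(mesh, np.ndarray):
        arr = mesh.copy()  # copy so later mutations by the caller don't leak in
    elif isinstance(mesh, list):
        arr = np.array(mesh)
    else:
        # Anything else (including a torch.Tensor) is rejected outright, so no
        # tensor ever becomes part of the mesh state.
        raise TypeError(
            f"mesh must be a numpy.ndarray or an n-d list, got {type(mesh).__name__}"
        )
    if arr.dtype.kind not in "iu":  # signed or unsigned integer ranks only
        raise TypeError(f"mesh entries must be integer ranks, got dtype {arr.dtype}")
    if np.unique(arr).size != arr.size:
        raise ValueError("mesh must not contain duplicate ranks")
    return arr
```

Rejecting tensors at the boundary trades a little convenience for a simple guarantee: nothing stored on the mesh is a tensor that FakeTensorMode or make_fx could capture.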


Update on "[dtensor][8/N] switch DeviceMesh to use numpy array for devices"

2eb24f7


wanchaol added a commit that referenced this pull request


          Update base for Update on "add numpy typing plugin to mypy config"

b136461

This adds the NumPy typing plugin to the mypy config so that we can use it for DeviceMesh type annotations.

See #92931 for why we need this. For example, we currently store DeviceMesh's `mesh` field as a torch.Tensor, so code like:
```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from torch.distributed._tensor.device_mesh import DeviceMesh

with FakeTensorMode():
    device_mesh = DeviceMesh("cuda", torch.arange(4))  # errors while mesh is a torch.Tensor
```
throws an error, because FakeTensorMode (like any TorchDispatchMode) intercepts every tensor creation and operation. DeviceMesh only needs to store an n-d array that records the mesh topology and should not interact with subsystems like FakeTensor, so we want to support storing `mesh` as a NumPy array instead.


[ghstack-poisoned]
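NumPy ships a mypy plugin, `numpy.typing.mypy_plugin`, which is enabled by listing it under `plugins` in the mypy configuration; the exact line PyTorch's config uses is not shown here. The sketch below only illustrates the kind of annotation this enables; `MeshSpec` is a hypothetical class standing in for DeviceMesh.

```python
import numpy as np
import numpy.typing as npt


class MeshSpec:
    """Hypothetical stand-in for DeviceMesh, showing NumPy-typed fields."""

    device_type: str
    mesh: npt.NDArray[np.int64]  # n-d array of global ranks

    def __init__(self, device_type: str, mesh: npt.ArrayLike) -> None:
        self.device_type = device_type
        # np.array copies the input, so the stored topology is independent of it
        self.mesh = np.array(mesh, dtype=np.int64)

    @property
    def ndim(self) -> int:
        # Number of mesh dimensions, e.g. 2 for a 2-D (data x tensor parallel) mesh
        return self.mesh.ndim
```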

wanchaol added a commit that referenced this pull request


          Update on "add numpy typing plugin to mypy config"

162b870



Update on "[dtensor][8/N] switch DeviceMesh to use numpy array for devices"

41a1472


wanchaol added a commit that referenced this pull request


          Update base for Update on "add numpy typing plugin to mypy config"

fb47b3f


wanchaol added a commit that referenced this pull request


          Update on "add numpy typing plugin to mypy config"

90aff87


Contributor

XilunWu commented Jan 26, 2023

> > Thanks! One suggestion for unit testing would be to create a DeviceMesh in FakeMode to reproduce the issue that I had! Or maybe create a DeviceMesh inside of a simple function and make_fx on it?
>
> Thanks @aazzolini! Just added a test with the FakeTensorMode, it works as expected. However if we want to fully avoid this issue, we might want to stop supporting DeviceMesh taking tensor as input for the mesh field, and use either numpy array or n-d list instead. Let me know if you want to go with that direction :)

Would accepting a Tensor argument but converting it to a NumPy array or n-d list in `DeviceMesh.__init__` avoid the tracing issue?
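One way to explore this question: normalize the argument to NumPy at construction time, so the mesh object never holds a tensor afterwards. The sketch below is hypothetical (`ConvertingMesh` is not a real class) and duck-types on `.tolist()`, which torch.Tensor provides. A hedged caveat: if the tensor itself was created under FakeTensorMode (as with `torch.arange(4)` inside the mode), it has no real data to copy, so converting it is likely to fail too; conversion mainly helps for tensors created outside the mode.

```python
from typing import Any

import numpy as np


class ConvertingMesh:
    """Hypothetical mesh that accepts tensor-like input but stores NumPy only."""

    def __init__(self, device_type: str, mesh: Any) -> None:
        if not isinstance(mesh, np.ndarray) and hasattr(mesh, "tolist"):
            # Tensor-like input (e.g. a real torch.Tensor): materialize it as
            # nested Python lists so no tensor is kept on the object.
            mesh = mesh.tolist()
        self.device_type = device_type
        # np.array copies, so the stored mesh is a standalone int64 ndarray
        self.mesh = np.array(mesh, dtype=np.int64)
```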


Update on "[dtensor][8/N] switch DeviceMesh to use numpy array for devices"

81ac010


wanchaol added a commit that referenced this pull request


          Update base for Update on "add numpy typing plugin to mypy config"

ee2e8ab


wanchaol added a commit that referenced this pull request


          Update on "add numpy typing plugin to mypy config"

cbbaba4


wanchaol added 2 commits

January 30, 2023 16:28


Update on "[dtensor][8/N] switch DeviceMesh to use numpy array for devices"

91e071a



Update on "[dtensor][8/N] switch DeviceMesh to use numpy array for devices"

31e9ebe


pytorchmergebot pushed a commit that referenced this pull request


          add numpy typing plugin to mypy config (#92930)

5f1ac18

Pull Request resolved: #92930
Approved by: https://github.com/ezyang, https://github.com/malfet


Update on "[dtensor][8/N] switch DeviceMesh to use numpy array for devices"

37c9af1


Collaborator Author

wanchaol commented Jan 31, 2023

@wanchaol has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


Update on "[dtensor][8/N] switch DeviceMesh to use numpy array for devices"

249e355


Differential Revision: [D42876247](https://our.internmc.facebook.com/intern/diff/D42876247)

[ghstack-poisoned]


wanchaol added 3 commits

January 31, 2023 08:26


Update on "[dtensor][8/N] switch DeviceMesh to use numpy array for devices"

09f2909



Update on "[dtensor][8/N] switch DeviceMesh to use numpy array for devices"



Update on "[dtensor][8/N] switch DeviceMesh to use numpy array for devices"

9e7eb45


wanchaol mentioned this pull request

[dtensor] remove typing hack of DeviceMesh #94526

Closed

wanchaol added a commit that referenced this pull request


          [dtensor] remove typing hack of DeviceMesh

945b50c

This removes the typing hack, part of #92931

[ghstack-poisoned]

wanchaol added a commit that referenced this pull request


          Update base for Update on "[dtensor] remove typing hack of DeviceMesh"

c74a1de


wanchaol added a commit that referenced this pull request


          Update on "[dtensor] remove typing hack of DeviceMesh"

bb41f41


wanchaol added a commit that referenced this pull request


          [dtensor] remove typing hack of DeviceMesh

This removes the typing hack, part of #92931

ghstack-source-id: fdb52c7
Pull Request resolved: #94526

wanchaol added a commit that referenced this pull request


          [dtensor] remove typing hack of DeviceMesh

49b4476

This removes the typing hack, part of #92931

ghstack-source-id: e5890ae
Pull Request resolved: #94526

wanchaol added a commit that referenced this pull request


          Update base for Update on "[dtensor] remove typing hack of DeviceMesh"

5610cb0


wanchaol added a commit that referenced this pull request


          Update on "[dtensor] remove typing hack of DeviceMesh"

2e410ef


wanchaol closed this

pytorchmergebot pushed a commit that referenced this pull request


          [dtensor] remove typing hack of DeviceMesh (#94526)

70b063d

This removes the typing hack, part of #92931
Pull Request resolved: #94526
Approved by: https://github.com/XilunWu

facebook-github-bot deleted the gh/wanchaol/251/head branch

June 8, 2023 19:08


Reviewers

mrshenli (awaiting requested review)

zhaojuanmao (awaiting requested review)

rohan-varma (awaiting requested review)

H-Huang (awaiting requested review)

awgu (awaiting requested review)

kwen2501 (awaiting requested review)

XilunWu (awaiting requested review)

fduwjj (awaiting requested review)

1 more reviewer

aazzolini (approved these changes)

Labels

release notes: distributed (dtensor)