add numpy typing plugin to mypy config #92930
Conversation
This adds the numpy typing plugin to the mypy config so that we can use it for DeviceMesh typing annotations. [ghstack-poisoned]
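For context, the change this PR describes would look roughly like the sketch below; the exact config file and any pre-existing plugin entries are assumptions here, not copied from the diff:

```ini
# Hypothetical sketch: enable NumPy's bundled mypy plugin so that
# numpy.typing annotations are checked precisely. The file name (mypy.ini)
# and the rest of the section are assumed, not taken from this PR.
[mypy]
plugins = numpy.typing.mypy_plugin
```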
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92930
Note: Links to docs will display an error until the docs builds have been completed. ❌ 3 Failures. As of commit cbbaba4, NEW FAILURES: the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ezyang
left a comment
Sure? Would be nice to know what this does.
Yeah thanks! Just updated the PR description to include more concrete details. tl;dr: we want DeviceMesh's mesh tensor field to avoid interacting with our tensor subclass systems like FakeTensorMode.
Saving device mesh as a numpy array... feels wrong. Tell me about why it wants to be a tensor/ndarray in the first place?
Sure, the underlying rationale is that we want to use DeviceMesh to represent the cluster topology, so that we can set up communicators based on that topology. It uses a mesh concept similar to other frameworks, where the devices can be structured into an n-d array. For example, if we have two hosts and each host has four devices, we would like to save it as something like a 2 x 4 array. This represents the topology of the two hosts, and we want to easily know the shape/size of each dim of this n-d array, hence either torch.Tensor or numpy.array works as the storage of the mesh. We were using torch.Tensor, but later we realized it interacts with fake tensor or tracing (where we don't need it to), so I would like to switch to a numpy array to avoid the subsystem interactions.
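As a concrete illustration of the two-host example above (the specific rank layout below is an assumption for illustration, not taken from the discussion):

```python
import numpy as np

# Hypothetical 2 hosts x 4 devices each: global ranks 0-3 on host 0,
# ranks 4-7 on host 1, stored as a plain 2-D array.
mesh = np.arange(8).reshape(2, 4)
print(mesh)        # [[0 1 2 3]
                   #  [4 5 6 7]]
print(mesh.shape)  # (2, 4) -> dim 0: hosts, dim 1: devices per host
```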
This adds the numpy typing plugin to the mypy config so that we can use it for DeviceMesh typing annotations. Please see #92931 for why we need this. For example, we currently save DeviceMesh's mesh field as a torch.Tensor, so when we do something like:

```python
with FakeTensorMode():
    device_mesh = DeviceMesh("cuda", torch.arange(4))
```

it throws an error, because FakeTensorMode (or any TorchDispatchMode) tracks every tensor creation and interaction. DeviceMesh just wants to save an nd-array to record the mesh topology and would like to avoid interacting with subsystems like FakeTensor, so we want to support saving `mesh` as a numpy array instead. [ghstack-poisoned]
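A minimal sketch of the contrast the description draws; the import path for FakeTensorMode and the placeholder torch.empty call are assumptions here, not part of the PR:

```python
import numpy as np
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    # A plain numpy array never goes through torch dispatch, so the mode
    # does not see it; this is the property the PR wants for the mesh field.
    mesh = np.arange(4)

    # A tensor created under the mode *is* intercepted and becomes a
    # FakeTensor, which is the interaction DeviceMesh wants to avoid.
    t = torch.empty(2, 2)

print(type(mesh))  # <class 'numpy.ndarray'>
print(type(t))     # a FakeTensor (a torch.Tensor subclass)
```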
cc @eellison But I feel like I would rather make a tensor subclass that denotes "don't fakify this ever"...
That works, and you can also do whatever non-module bookkeeping under a disable_current_modes context. This pattern is going to come up again; it does feel a bit weird to recommend numpy.
I anti-recommend disable_current_modes. The subclass should be handled by FakeTensor and it shouldn't be necessary for the user to manually disable the mode.
Why? In general I think you'd prefer changes that don't require intrusive changes to other core classes. That also won't compose with proxy tensor tracing if they're doing deferred init or something. API-wise it might be easier to not have to re-call
@ezyang @eellison for the tensor subclass approach or the disable mode approach, is it possible for the user to do something like the snippet in the summary:

```python
with FakeTensorMode():
    device_mesh = DeviceMesh("cuda", torch.arange(4))
```

i.e. the user creates a plain tensor, passes it in under FakeTensorMode, and it is not fakified by the mode?
No, you would have to provide special factory functions or have the user turn on a separate mode that is "these are constants, don't fakeify them" |
malfet
left a comment
Hmm, don't you need to add it to https://github.com/pytorch/pytorch/blob/master/.circleci/docker/requirements-ci.txt and here?
(Line 134 in d9f0d14: `'python3',`)
@ezyang I think that would not be ideal for the user; the user would need to use some special factory method, and it might not be an intuitive UX. Another option: maybe we can just make DeviceMesh a subclass of Tensor? Would that make fake tensor and tracing happy? edit: actually I think directly making DeviceMesh a tensor subclass also doesn't seem like it will work, as we still need to pass the torch.Tensor to that tensor subclass.
@malfet it looks like lintrunner already has numpy in the
Sorry, I misunderstood where the plugin should be coming from. And CI preinstalls numpy, but from anaconda rather than PyPI; this is why it's commented out there.
Is this not what you have done if you make people allocate numpy arrays?
Let's rewind a sec actually. In normal eager mode, there is no problem with fake tensor mode, because people don't write code under fake tensor mode. So you can use a regular tensor for the mesh, no problem. The problem is when you trace through the model. If we can arrange the tracing process to not fakeify the tensor computation that goes to the mesh, that's what we want.
Another possibility is to add a 'mesh' or 'const' device type, which is always small and OK to always materialize...
Yeah this makes sense, let me think about how to make a custom factory method that provides a similar UX compared to numpy and Tensor. Regardless of the approach we end up taking, the numpy ndarray typing is still valuable, as we want to accept both an n-d list directly (
Yep, agreed that it's mostly tracing that has issues right now. But I think most likely even eager mode would want FakeTensorMode once it supports deferred_init like torchdistx, as we would still want to lazily materialize the model (after tensor parallel) when the model is too large.
Do you mean adding a device type to core like
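On the point above about accepting both an n-d list and an ndarray, here is a minimal sketch of the kind of numpy.typing annotation that makes this convenient; the helper name and signature are hypothetical, not DeviceMesh's actual API:

```python
from __future__ import annotations

import numpy as np
import numpy.typing as npt

# Hypothetical helper (not the actual DeviceMesh API): accept either a
# nested Python sequence or an existing ndarray and normalize it to an
# ndarray. npt.ArrayLike covers both cases.
def normalize_mesh(mesh: npt.ArrayLike) -> npt.NDArray[np.int64]:
    # np.asarray handles nested lists/tuples as well as ndarrays.
    return np.asarray(mesh, dtype=np.int64)

print(normalize_mesh([[0, 1, 2, 3], [4, 5, 6, 7]]).shape)  # (2, 4)
print(normalize_mesh(np.arange(4)).shape)                  # (4,)
```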
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed. Reason: 1 mandatory check(s) failed (Rule ...). Dig deeper by viewing the failures on hud. Details for Dev Infra team: raised by workflow job
@pytorchbot merge -f "failure not related" |
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
@pytorchbot revert -m "causing test_doc_examples (main.TestTypeHints) to fail https://github.com/pytorch/pytorch/actions/runs/4049393005/jobs/6965869223 https://hud.pytorch.org/pytorch/pytorch/commit/5f1ac188f8dd01a81d0ddeebdbc4d22e25311b72, note for revert review: PR was forced merged after first failure, which was flaky" -c ignoredsignal |
@pytorchbot successfully started a revert job. Check the current status here.
@wanchaol your PR has been successfully reverted.
This reverts commit 5f1ac18. Reverted #92930 on behalf of https://github.com/clee2000 due to causing test_doc_examples (main.TestTypeHints) to fail https://github.com/pytorch/pytorch/actions/runs/4049393005/jobs/6965869223 https://hud.pytorch.org/pytorch/pytorch/commit/5f1ac188f8dd01a81d0ddeebdbc4d22e25311b72, note for revert review: PR was forced merged after first failure, which was flaky
…onfig" reland of #92930 [ghstack-poisoned]
reland of #92930 [ghstack-poisoned]
…onfig" reland of #92930 [ghstack-poisoned]
reland of #92930 [ghstack-poisoned]
reland of #92930 Pull Request resolved: #94525 Approved by: https://github.com/huydhn
Stack from ghstack (oldest at bottom):
This adds the numpy typing plugin to the mypy config so that we can use it for DeviceMesh typing annotations.
Please see #92931 about why we need this. For example, we currently save DeviceMesh's mesh field as a torch.Tensor, so when we do something like:

```python
with FakeTensorMode():
    device_mesh = DeviceMesh("cuda", torch.arange(4))
```

it throws an error, because FakeTensorMode (or any TorchDispatchMode) tracks every tensor creation and interaction. DeviceMesh just wants to save an nd-array to record the mesh topology and would like to avoid interacting with subsystems like FakeTensor, so we want to support saving `mesh` as a numpy array instead.