Add lazy tensor python bindings #74508
Conversation
💊 CI failures summary (Dr. CI): As of commit 733cccf, 💚 no failures so far.
This pull request was exported from Phabricator. Differential Revision: D35032152
Force-pushed 9f0b012 to bee135a
Force-pushed bee135a to ba052fe
test/lazy/test_ts_opinfo.py (Outdated)
@Krovatkin I assume this line is not important. (I also assume you wrote it; maybe @desertfire did?)
Yes, you can remove it. When core folks fixed convolution to flow through one op, @desertfire removed the graph checks, since convolutions became just another op.
torch/csrc/lazy/python/init.cpp (Outdated)
As we land these bindings, I'm wondering if we should try to simplify by not having helper functions like `StepMarker`.
Taking the full view of the mark step API flow, it looks like:
1. `torch._lazy.mark_step` (python code in torch/_lazy/__init__.py)
2. -> `torch._C._lazy._mark_step` (binding code in lazy/python/init.cpp)
3. -> `StepMarker` (local helper function in lazy/python/init.cpp)
4. -> finally calls the underlying `torch::lazy::` APIs

(2) can always be squished into (3), but that can make gdb debugging more annoying. I also wonder whether we can simplify down to only three layers, and generally flush stuff that would be in (2) either up into python in (1) or down into the core APIs (4).
Thoughts? @Krovatkin @alanwaketan
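For illustration, here is the four-layer flow sketched in plain Python. All of these functions are stand-ins for the real torch internals (which live in C++ and pybind11); flattening (2) into (3) would simply delete one hop.

```python
# (4) stand-in for the underlying torch::lazy:: core API
def core_sync_tensors(device):
    return "synced " + device

# (3) stand-in for the local StepMarker helper in lazy/python/init.cpp
def step_marker(device):
    return core_sync_tensors(device)

# (2) stand-in for the torch._C._lazy._mark_step pybind11 binding
def _mark_step(device):
    return step_marker(device)

# (1) stand-in for torch._lazy.mark_step, the user-facing python wrapper
def mark_step(device="lazy:0"):
    return _mark_step(device)
```

Each layer here only delegates downward, which is why squishing layers is mechanically easy; the cost is purely in debuggability and where glue logic can live.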
I think this is just Google's coding style: prefer wrapping a pile of code into a function for readability, not reusability.
I'm fine either way.

> (2) can always be squished into (3) but that can make gdb debugging more annoying.

If users don't really need a python binding for (2), I'd suggest we go for it.
torch/_lazy/__init__.py (Outdated)
@JackCaoG here's a great example to discuss. How do you envision the mark_step API working when torch-xla is part of torch::lazy? Do you envision your users ultimately importing torch._lazy.mark_step once you get past the stages of migration?
Note: I did not intend to stop supporting _run_step_closures(), but I might consider landing without it until we actually add some test coverage / usage scenarios for it.
Generally, my approach would be to land only the minimum bindings/APIs we know we need and then keep adding to them, rather than landing a bunch up front.
In my vision, torch._lazy.mark_step and xm.mark_step will do the same thing after we finish the LazyTensor class migration: they will both trigger a SyncTensors. We will slowly redirect users to the lazy versions of the APIs and update our own tutorial.
I see one small difference here: lazy:0 (or in our case xla:0) is not always the default device. In our TPU training case, xla:0 is a CPU and xla:1 is a TPU. I think we discussed this before and you folks didn't want to have a GetDefaultDevice or GetCurrentDevice API. Is this still true?
I remember the decision was to add the GetDefaultDevice API? It's just that we never found time to do it. Maybe we should add a comment above.
I think we actually do have a 'GetDefaultDevice' API for the backend, but we assume it is static after initialization time. Maybe what I didn't want was to have 'set/get current device' with state changing.
I was also thinking about how to improve our configuration flags and backend init APIs. If we provide a good way to configure the backend (a python binding plus an ENV override for each flag), could you use one of those flags to set the default device type? Or do you need something more than that?
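A minimal sketch of the flag scheme floated here, assuming a dict registry and an `LTC_` environment-variable prefix; both are hypothetical, not the real LTC configuration machinery.

```python
import os

# Hypothetical flag registry: each flag has a default, can be set through a
# python binding, and an LTC_<NAME> environment variable overrides both.
_FLAGS = {"default_device_type": "cpu"}

def set_flag(name, value):
    # stand-in for a per-flag python-binding setter
    _FLAGS[name] = value

def get_flag(name):
    # the environment override, if present, wins over the configured value
    env = os.environ.get("LTC_" + name.upper())
    return env if env is not None else _FLAGS[name]
```

Under a scheme like this, a backend could set its default device type at init time via the binding, while a user could still force it per-run from the shell.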
This is the dummy stub we have in the staging branch: `m.def("_ltc_get_default_device", []() { …`
Force-pushed ba052fe to bd7c3f6
I will take a look today.
torch/_lazy/metrics.py (Outdated)
Will LTC provide a way for a backend to register its own counters and metrics?
Hadn't thought about this.
Right now we treat the counters' pybindings as just 'getters' (with the exception of reset). We don't actually have any API for 'configuring' the counters, because I believe they are just configured by static macros.
I am not sure whether this system will work once we combine some in-tree and external-backend code, or whether we will have to adapt it to 'merge' the counters from the external backend together with the ones from torch::lazy:: code.
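If the counters do end up needing to be merged, the merge itself could be as simple as this sketch; plain dicts stand in here for the real C++ counter registries, and the counter names are made up.

```python
# Hypothetical merge of counter snapshots from torch::lazy core and an
# external backend: copy one side, then fold in the other, summing any
# counter name both sides report.
def merge_counters(core, backend):
    merged = dict(core)
    for name, value in backend.items():
        merged[name] = merged.get(name, 0) + value
    return merged
```

The harder question is not the merge but ownership: whether the external backend's static-macro counters would register into the same registry or stay in a separate one that a getter has to union at query time.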
Force-pushed bd7c3f6 to e640140
Krovatkin left a comment
Force-pushed e640140 to 433f8b3
Force-pushed 4ff1dd3 to 1b2954f
Force-pushed 1b2954f to 6114a7f
Force-pushed 65043b9 to 779ea3b
@wconstab has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed 779ea3b to 3399e61
Force-pushed 3399e61 to eaef63f
torch/csrc/lazy/python/init.cpp (Outdated)
nit: instead of making a separate submodule, what are your thoughts on making ts_backend a submodule of _lazy? i.e., `_lazy.ts_backend` instead of `_lazy_ts_backend`
In fact, this might be what you intended all along based on its usage in test_ts_opinfo.py:
https://github.com/pytorch/pytorch/pull/74508/files#diff-222fb8e63ff517f76308939550ca744de90dd6ea5706e2afd3f72db019df9b90R24
Yeah, it's a good question.
I tried to make it clean from the actual user API side, which is how it looks when called from test_ts_opinfo.py.
Generally, the C++ bindings are flatter all over pytorch and are then structured at the glue layer, where .py files re-export them in a proper namespace (the stuff happening in torch/_lazy/*.py), so I didn't think it mattered too much.
I might be convinced to change it, though. Wdyt @Krovatkin? One annoying thing is that I would then have to define a 'submodule' correctly in the .pyi file that provides the mypy typing, but I'm sure that's not too hard.
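A toy sketch of that flat-binding-plus-glue-layer pattern, using `types.ModuleType` as a stand-in for both the extension module and the glue module; the names and the return string are illustrative, not real torch identifiers.

```python
import types

# Stand-in for a flat C++ extension module (think torch._C._lazy_ts_backend):
# bindings live at the top level with underscore-prefixed names.
flat_binding = types.ModuleType("_lazy_ts_backend")
flat_binding._init = lambda: "ts backend initialized"

# Glue layer: a plain .py module re-exports the flat binding under a clean
# namespace, the way modules under torch/_lazy/ wrap the flat _C bindings.
ts_backend = types.ModuleType("ts_backend")
ts_backend.init = flat_binding._init
```

This is why the flat layout rarely bites users: they only ever see the namespaced glue module, while the flat extension name stays an internal detail (at the cost of the .pyi stub having to describe whichever shape is chosen).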
torch/_lazy/__init__.py (Outdated)
Is the plan to expose all LTC python functions via torch._lazy? Doesn't the underscore usually imply "private", i.e. users shouldn't call it? Or is it eventually going to be something like torch.lazy, with the API exposed more similarly to how it's currently done on the staging branch?
Yeah, torch._lazy signifies it is not a stable API; that's our convention. It can't be torch.lazy until we (as a team) make a stronger commitment to long-term support, etc.
Force-pushed eaef63f to 3973f94
Force-pushed 3973f94 to 733cccf
Summary: This adds a minimal set of python bindings for lazy tensor and the torchscript backend. It targets the APIs that are used by the `test_ts_opinfo.py` test (which it also lands, from lazy_tensor_staging, where it is [lazy_tensor_core/test/test_lazy.py](https://github.com/pytorch/pytorch/blob/lazy_tensor_staging/lazy_tensor_core/test/test_lazy.py)). We should land more python bindings obviously; I just wanted to focus on a minimal set that can also be tested, and use it to agree on how we organize the bindings, so others could easily contribute bindings on top of this infrastructure.

cc @JackCaoG

Pull Request resolved: #74508
Reviewed By: pbelevich
Differential Revision: D35032152
Pulled By: wconstab
fbshipit-source-id: 526505ab355b7ad27037ece0ff814b2a4b69f1e2
Hey @wconstab.