
Conversation

@wconstab (Contributor) commented Mar 21, 2022:

Differential Revision: D35032152

This adds a minimal set of python bindings for lazy tensor and the torchscript backend.

It targets the APIs that are used by the test_ts_opinfo.py test (which it also lands, from lazy_tensor_staging, where it is lazy_tensor_core/test/test_lazy.py).

We should land more python bindings obviously. I just wanted to focus on a minimal set that can also be tested, and use it to agree on how we organize the bindings, then others could easily contribute bindings on top of this infrastructure.
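
For context, a minimal sketch of the kind of usage these bindings enable; the exact calls exercised by test_ts_opinfo.py may differ, and the ts_backend.init() entry point and "lazy" device string here are assumptions:

```python
import torch
import torch._lazy
import torch._lazy.ts_backend

# Register the TorchScript backend so lazy tensors have somewhere to run.
torch._lazy.ts_backend.init()

# Ops on a lazy device are traced into a graph rather than executed eagerly.
x = torch.randn(2, 2, device="lazy")
y = x + x

# Cut the trace and hand the accumulated graph to the backend for execution.
torch._lazy.mark_step()
```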

cc @JackCaoG

@facebook-github-bot (Contributor) commented Mar 21, 2022:

💊 CI failures summary and remediations

As of commit 733cccf (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@facebook-github-bot (Contributor):

This pull request was exported from Phabricator. Differential Revision: D35032152

@wconstab (Contributor Author):

@Krovatkin I assume this line is not important. (I also assume you wrote it; maybe @desertfire did?)

Contributor:

Yes, you can remove it. When core folks fixed convolution to flow through one op, @desertfire removed the graph checks, since convolutions became just another op.

@wconstab (Contributor Author):

As we land these bindings, I'm wondering if we should try to simplify by not having helper functions like 'StepMarker'.

Taking the full view of the mark_step API flow, it looks like:

1. torch._lazy.mark_step (Python code in torch/_lazy/__init__.py)
2. -> torch._C._lazy._mark_step (binding code in lazy/python/init.cpp)
3. -> StepMarker (local helper function in lazy/python/init.cpp)
4. -> finally calls the torch::lazy:: core APIs

(2) can always be squished into (3) but that can make gdb debugging more annoying. But I also wonder if we can simplify to only have 3 layers, and generally flush stuff that would be in (2) either up into python in (1) or down into the core APIs (4).
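
For reference, a rough sketch of what layers (1) and (2) look like under this scheme; the signature of _mark_step here is an approximation, not necessarily what this PR lands:

```python
# Layer (1): thin Python wrapper, roughly torch/_lazy/__init__.py
import torch._C._lazy

def mark_step(device: str = "", wait: bool = False):
    """Cut the current trace and dispatch the accumulated graph to the backend."""
    # Layer (2): the pybind11 binding, which calls the StepMarker helper (3)
    # and, through it, the torch::lazy:: core APIs (4).
    torch._C._lazy._mark_step(device, [], wait=wait)
```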

Thoughts? @Krovatkin @alanwaketan

Collaborator:

I think this is just Google's coding style: prefer wrapping a pile of code into a function for readability, not reusability.

Contributor:

I'm fine either way.

> (2) can always be squished into (3) but that can make gdb debugging more annoying.

If users don't really need a python binding for (2), I'd suggest we go for it.

@wconstab (Contributor Author):

@JackCaoG here's a great example to discuss. How do you envision the mark_step API working when torch-xla is part of torch::lazy? Do you envision your users would ultimately import torch._lazy.mark_step, once you get past the stages of migration?

Note: I did not intend to stop supporting _run_step_closures(), but I might consider landing without it until we actually add some test coverage / usage scenarios for it.

Generally, my approach would be to only land the minimum bindings/APIs we know we need and then keep adding them, rather than land a bunch up front.

Collaborator:

In my vision, torch._lazy.mark_step and xm.mark_step will do the same thing after we finish the LazyTensor class migration. They will both trigger a SyncTensors call. We will slowly redirect users to the lazy versions of the APIs and update our own tutorials.

Collaborator:

I see one small difference here: lazy:0 (or in our case xla:0) is not always the default device. In our TPU training case, xla:0 is a CPU and xla:1 is a TPU. I think we discussed this before and you guys don't want to have a GetDefaultDevice or GetCurrentDevice API. Is this still true?

Collaborator:

I remember the decision was to add the GetDefaultDevice API? It's just that we never found time to do it. Maybe we should add a comment above.

@wconstab (Contributor Author):

I think we actually do have a 'GetDefaultDevice' API for the backend, but we assume it is static after initialization time. Maybe what I didn't want to do was have 'set/get current device' and have some state changing.

I was also thinking about how to improve our configuration flags and backend init APIs. If we provide a good way to configure the backend (a python binding and then an ENV override for each flag), could you use one of those flags to set the default device type? Or do you need something more than that?
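
To make that concrete, a purely hypothetical sketch of the "python binding plus ENV override for each flag" idea; none of these names exist in the PR:

```python
import os

def get_backend_flag(name: str, default: str) -> str:
    # Hypothetical: an environment variable such as LTC_DEFAULT_DEVICE_TYPE
    # overrides the compiled-in default; otherwise the default is returned.
    return os.environ.get(f"LTC_{name.upper()}", default)

# A backend could consult such a flag at init time to pick its default device type.
default_device_type = get_backend_flag("default_device_type", "cpu")
```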

Collaborator:

This is the dummy stub we have in the staging branch:

`m.def("_ltc_get_default_device", []() { ... });`

@wconstab requested a review from @Krovatkin on March 22, 2022.

@JackCaoG (Collaborator):

I will take a look today.

Collaborator:

Will LTC provide a way for the backend to register its counters and metrics?

@wconstab (Contributor Author):

Hadn't thought about this.

Right now we treat the counters' pybindings as just 'getters' (with the exception of reset). We don't actually have any API for 'configuring' the counters, because I believe they are just configured by static macros.

I am not sure whether this system will work once we combine some in-tree and external-backend code, or whether we will have to adapt it to 'merge' the counters from the external backend with the ones from the torch::lazy:: code.
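
For illustration, a sketch of the getter-style surface described above; the module and function names are assumptions, not necessarily the exact bindings in this PR:

```python
import torch._lazy.metrics as metrics

# Counters are registered by static macros on the C++ side; Python only reads them.
for name in metrics.counter_names():
    print(name, metrics.counter_value(name))

# reset() is the one mutating call: it clears all counters.
metrics.reset()
```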

@Krovatkin (Contributor) left a comment:

:shipit:

@facebook-github-bot (Contributor):

@wconstab has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Collaborator:

Nit: instead of making a separate module, what are your thoughts on making ts_backend a submodule of _lazy? I.e., _lazy.ts_backend instead of _lazy_ts_backend.

@antoniojkim (Collaborator) commented Mar 28, 2022:

In fact, this might be what you intended all along based on its usage in test_ts_opinfo.py:
https://github.com/pytorch/pytorch/pull/74508/files#diff-222fb8e63ff517f76308939550ca744de90dd6ea5706e2afd3f72db019df9b90R24
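
For comparison, the two spellings under discussion at the import site; only one would actually exist, depending on how the module is registered:

```python
# Spelling as used in test_ts_opinfo.py: ts_backend as a submodule of _lazy
import torch._lazy.ts_backend

# Alternative flat spelling the nit contrasts it with:
# import torch._lazy_ts_backend
```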

@wconstab (Contributor Author):

Yeah, it's a good question.
I tried to make it clean from the actual user API side, which is how it looks when called from test_ts_opinfo.py.

Generally the C++ bindings are flatter all over PyTorch and then get structured at the glue layer, where .py files re-export them in a proper namespace (the stuff happening in torch/_lazy/*.py), so I didn't think it mattered too much.

I might be convinced to change it, though. Wdyt @Krovatkin? One annoying thing is that I would then have to define a 'submodule' correctly in the .pyi file that provides the mypy typing, but I'm sure that's not too hard.
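
To illustrate the glue-layer pattern described here, an approximate sketch of how a flat binding module can be re-exported under a tidier namespace (file and function names are assumptions, not the exact contents of the PR):

```python
# torch/_lazy/ts_backend.py (approximate sketch)
import torch._C._lazy_ts_backend

def init():
    """Register and initialize the TorchScript lazy backend."""
    # The flat C++ binding module is re-exported here with a cleaner name,
    # so user code can write torch._lazy.ts_backend.init().
    torch._C._lazy_ts_backend._init()
```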

Collaborator:

Is the plan to expose all LTC python functions via torch._lazy? Doesn't the underscore usually imply "private", i.e. something users shouldn't call? Or will it eventually become something like torch.lazy, with the API exposed more like how it's currently done on the staging branch?

@wconstab (Contributor Author):

Yeah, torch._lazy signifies that it is not a stable API; that's our convention. It can't be torch.lazy until we (as a team) make a stronger commitment to long-term support, etc.

facebook-github-bot pushed a commit that referenced this pull request Mar 29, 2022
Summary:
This adds a minimal set of python bindings for lazy tensor and the torchscript backend.

It targets the APIs that are used by the `test_ts_opinfo.py` test (which it also lands, from lazy_tensor_staging, where it is [lazy_tensor_core/test/test_lazy.py](https://github.com/pytorch/pytorch/blob/lazy_tensor_staging/lazy_tensor_core/test/test_lazy.py)).

We should land more python bindings obviously.  I just wanted to focus on a minimal set that can also be tested, and use it to agree on how we organize the bindings, then others could easily contribute bindings on top of this infrastructure.

cc JackCaoG

Pull Request resolved: #74508

Reviewed By: pbelevich

Differential Revision: D35032152

Pulled By: wconstab

fbshipit-source-id: 526505ab355b7ad27037ece0ff814b2a4b69f1e2
@github-actions (Contributor):

Hey @wconstab.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.
