Conversation

@shunting314
Contributor

Merge the code needed for dynamo+LTC integration from the staging branch to the master branch.

Test plan:
Unit test

pytest test_extract_compiled_graph

Test through dynamo

LTC_TS_CUDA=1 time python torchbench.py --speedup-ltc -dcuda --nvfuser --randomize-input --only <model name>

@facebook-github-bot
Contributor

facebook-github-bot commented Mar 31, 2022

💊 CI failures summary and remediations

As of commit f98d4f0 (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build Lint / clang-tidy (1/2)

Step: "Check for warnings"

2022-03-31T23:03:04.5058299Z                 ivalues.push_back(tsDataPtr->scalar.value());
2022-03-31T23:03:04.5058613Z                         ^~~~~~~~~~
2022-03-31T23:03:04.5058850Z                         emplace_back(
2022-03-31T23:03:04.5059645Z /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:232:15: error: loop variable is copied but only used as const reference; consider making it a const reference [performance-for-range-copy,-warnings-as-errors]
2022-03-31T23:03:04.5060139Z     for (auto arg : graph_inputs) {
2022-03-31T23:03:04.5060315Z               ^
2022-03-31T23:03:04.5062185Z          const  &
2022-03-31T23:03:04.5063304Z /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:233:7: error: 'emplace_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
2022-03-31T23:03:04.5064062Z       stack.emplace_back(arg);
2022-03-31T23:03:04.5064311Z       ^
2022-03-31T23:03:04.5065072Z /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:238:7: error: 'push_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
2022-03-31T23:03:04.5065566Z       result.push_back(elem.toTensor());
2022-03-31T23:03:04.5065762Z       ^
2022-03-31T23:03:04.5065925Z Warnings detected!
2022-03-31T23:03:04.5066089Z Summary:
2022-03-31T23:03:04.5070503Z [performance-inefficient-vector-operation] occurred 3 times
2022-03-31T23:03:04.5071208Z     /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:149
2022-03-31T23:03:04.5071826Z     /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:233
2022-03-31T23:03:04.5072413Z     /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:238
2022-03-31T23:03:04.5072745Z 
2022-03-31T23:03:04.5073052Z [modernize-use-emplace] occurred 2 times

See GitHub Actions build Lint / quick-checks (2/2)

Step: "Ensure all test files have header containing ownership information"

2022-03-31T22:56:32.4519260Z ##[group]Run set -eux
2022-03-31T22:56:32.4519556Z �[36;1mset -eux�[0m
2022-03-31T22:56:32.4519982Z �[36;1mpython torch/testing/_check_kernel_launches.py |& tee "${GITHUB_WORKSPACE}"/cuda_kernel_launch_checks.txt�[0m
2022-03-31T22:56:32.4569178Z shell: /bin/bash -e {0}
2022-03-31T22:56:32.4569455Z env:
2022-03-31T22:56:32.4569787Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.4/x64
2022-03-31T22:56:32.4570206Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.4/x64/lib
2022-03-31T22:56:32.4570512Z ##[endgroup]
2022-03-31T22:56:32.4650294Z + python torch/testing/_check_kernel_launches.py
2022-03-31T22:56:32.4651506Z + tee /home/runner/work/pytorch/pytorch/cuda_kernel_launch_checks.txt
2022-03-31T22:56:32.4857170Z python: can't open file '/home/runner/work/pytorch/pytorch/torch/testing/_check_kernel_launches.py': [Errno 2] No such file or directory
2022-03-31T22:56:32.4923384Z ##[group]Run (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2022-03-31T22:56:32.4924594Z �[36;1m(! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))�[0m
2022-03-31T22:56:32.4978451Z shell: /bin/bash -e {0}
2022-03-31T22:56:32.4978717Z env:
2022-03-31T22:56:32.4979040Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.4/x64
2022-03-31T22:56:32.4979454Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.4/x64/lib
2022-03-31T22:56:32.4979780Z ##[endgroup]
2022-03-31T22:56:32.5338878Z ##[group]Run (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2022-03-31T22:56:32.5339878Z �[36;1m(! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))�[0m
2022-03-31T22:56:32.5389498Z shell: /bin/bash -e {0}

🕵️‍♀️ 1 failure not recognized by patterns:

The following CI failures may be due to changes from the PR
Job: GitHub Actions Lint / mypy
Step: Run mypy

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@shunting314 shunting314 requested a review from Krovatkin March 31, 2022 22:55
@facebook-github-bot
Contributor

@shunting314 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

"""Return a unique id of the lazy tensor maintained by LTC"""
return torch._C._lazy._get_tensor_id(tensor)

def get_tensors_text(tensors):
Contributor

i wonder if we want to make a _lazy.debug module for APIs like this,
and if we want to rename the APIs:

get_tensors_text -> debug.dump_ir(tensors)?

get_tensors_dot -> debug.render_ir_graph(tensors)?

it's easier to land it as-is, i am ok with possibly renaming/moving in a separate PR.

Contributor Author

And for get_tensors_backend, rename to debug.dump_ir_backend(tensors)?

I can do that in a followup PR

Contributor

yea, sgtm. (Or just add a parameter: debug.dump_ir(tensors, format="text"/"backend")? Which is cleaner?)
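The format-switch idea from this thread, sketched in plain Python. Everything here is hypothetical, not from the PR: the module shape, the `dump_ir` signature, and the two stand-in functions that take the place of the real C++ bindings (`torch._C._lazy._get_tensors_text` / `_get_tensors_backend`) so the snippet is self-contained.

```python
# Hypothetical sketch of a torch._lazy.debug-style entry point.
# The two helpers below are stand-ins for the real C++ bindings.

def _get_tensors_text(tensors):
    return f"text IR for {len(tensors)} tensor(s)"

def _get_tensors_backend(tensors):
    return f"backend IR for {len(tensors)} tensor(s)"

_DUMPERS = {
    "text": _get_tensors_text,
    "backend": _get_tensors_backend,
}

def dump_ir(tensors, format="text"):
    """One debug entry point with a format switch, instead of separate
    dump_ir / dump_ir_backend functions."""
    try:
        return _DUMPERS[format](tensors)
    except KeyError:
        raise ValueError(f"unknown IR dump format: {format!r}") from None
```

Call sites would then read `dump_ir(lazy_tensors)` or `dump_ir(lazy_tensors, format="backend")`, and new formats (e.g. a dot renderer) become one more dictionary entry rather than one more public function.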

import torch._C._lazy
import torch._C._lazy_ts_backend

def get_tensors_ts_device_data_node(tensors):
Contributor

I think anything that is ts_backend specific should be under the torch._lazy.ts_backend module.

Contributor Author

I thought about the same thing. The reason I'm putting the method here is that we should generalize it for XLA soon. But I'm okay with moving this to ts_backend.py before that. Just let me know.

Contributor

yea, I think ideally we want it at this level, but it is unclear if we can actually deliver that easily or if it will be a little different for XLA. Let's leave it here for now and just add some comments explaining the proposal to add XLA here, and that this is still 'unstable/prototype'.
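The suggested annotation could look something like this; the docstring wording is mine, not from the PR, and the real binding call (`torch._C._lazy_ts_backend._get_tensors_ts_device_data_node`) is replaced with a stub so the snippet stands alone:

```python
def _ts_binding_stub(tensors):
    # Stand-in for torch._C._lazy_ts_backend._get_tensors_ts_device_data_node.
    return [id(t) for t in tensors], list(tensors)

def get_tensors_ts_device_data_node(tensors):
    """Return (tensor_ids, ivalues) for the DeviceData nodes of the lazy
    graph behind `tensors`.

    NOTE: unstable/prototype API. This is TS-backend specific today; the
    plan is to generalize it (e.g. for XLA) while keeping it at this
    level, but the shape may change.
    """
    return _ts_binding_stub(tensors)
```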

"""Return the graph hash for the passed in lazy tensors"""
return torch._C._lazy._get_graph_hash(tensors)

def run_cached_graph(hash_str, graph_inputs):
Contributor

maybe for now you can leave this 'run_cached_graph' api under the ts_backend namespace, and later we can consider a generic api at this level if we can make it work for all backends
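For illustration, here is the hash-then-replay flow that `run_cached_graph` enables, reduced to a self-contained sketch. The in-memory dictionary and `cache_graph` helper are stand-ins of my own; the real path goes through `torch._C._lazy._get_graph_hash` and the TS-backend cached-graph binding.

```python
# Sketch of compile-once, replay-by-hash (all names here are stand-ins).
_compiled_graphs = {}

def cache_graph(hash_str, compiled_fn):
    """Record a 'compiled graph' (here just a Python callable) under its hash."""
    _compiled_graphs[hash_str] = compiled_fn

def run_cached_graph(hash_str, graph_inputs):
    """Replay a previously compiled graph on fresh inputs, keyed by the
    graph hash instead of retracing the model."""
    return _compiled_graphs[hash_str](*graph_inputs)

# First run: trace/compile and cache.
cache_graph("hash123", lambda a, b: a + b)
# Subsequent runs skip tracing and replay by hash with new inputs.
result = run_cached_graph("hash123", (2, 3))
```

The point of keying on the graph hash is that any call with the same traced structure, regardless of input values, can reuse the compiled artifact.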

print("Fx code:\n", model.code)
print("LTC IR:", lazy.get_tensors_text(lazy_out))

graph_input_tensor_ids, graph_input_ivalues = computation.get_tensors_ts_device_data_node(lazy_out)
Contributor

should we leave this file here and leave the ts_backend specific things in it? I think there are 2 alternatives that are not that great

  • move this file and all of this related stuff to under the _lazy.ts_backend module (to be most correct, even though we want this functionality to be generic later)
  • or just leave it all like it is now, but add some comments saying we intend to make this file more generic in the future and replace ts_ functions with generic ones in _lazy.computation layer

Contributor Author

Since we need to make things generic across all backends anyway, I would prefer option 2.

namespace torch {
namespace lazy {

struct NoGilSection {
Contributor

we don't want to land NoGilSection; can you switch to `pybind11::gil_scoped_release no_gil;` like mark_step uses?

@@ -1,3 +1,5 @@
#pragma once
Contributor

woah! good catch

@shunting314
Contributor Author

@wconstab this PR was not generated by ghstack, thus it shows 'Merging is blocked'. I created another one with ghstack: #75046. Sorry about the confusion.

I'll resolve comments from this PR in the new one

@wconstab
Contributor

> @wconstab this PR was not generated by ghstack, thus it shows 'Merging is blocked'. I created another one with ghstack: #75046. Sorry about the confusion.
>
> I'll resolve comments from this PR in the new one

ok, no problem. But i'm confused, i never use ghstack but i seem to be able to land PRs that I just directly push. did something change since this morning?

@shunting314
Contributor Author

@wconstab I don't know. I just figured that since I'm merging to master, I need to follow some doc which says I should use ghstack. Maybe not using ghstack would also work.

@wconstab
Contributor

ah. you don't have to do it any particular way. you can export from sandcastle to gh, or you can push to gh any way and import to sandcastle. you still have to land via sandcastle either way, until GH first workflow is live.

@shunting314
Contributor Author

> ah. you don't have to do it any particular way. you can export from sandcastle to gh, or you can push to gh any way and import to sandcastle. you still have to land via sandcastle either way, until GH first workflow is live.

Got it. That's good to know. I guess I was just scared by the red 'Merging is blocked' warning.

@shunting314 shunting314 deleted the merge-from-staging branch March 31, 2022 23:34


4 participants