Conversation

@shunting314
Contributor

Merge the code needed for dynamo+LTC integration from the staging branch to the master branch.

Test plan:
Unit test

pytest test_extract_compiled_graph

Test through dynamo

LTC_TS_CUDA=1 time python torchbench.py --speedup-ltc -dcuda --nvfuser --randomize-input --only <model name>

@facebook-github-bot
Contributor

facebook-github-bot commented Mar 31, 2022

💊 CI failures summary and remediations

As of commit f98d4f0 (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build Lint / clang-tidy (1/2)

Step: "Check for warnings"

2022-03-31T23:03:04.5058299Z                 ivalues.push_back(tsDataPtr->scalar.value());
2022-03-31T23:03:04.5058613Z                         ^~~~~~~~~~
2022-03-31T23:03:04.5058850Z                         emplace_back(
2022-03-31T23:03:04.5059645Z /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:232:15: error: loop variable is copied but only used as const reference; consider making it a const reference [performance-for-range-copy,-warnings-as-errors]
2022-03-31T23:03:04.5060139Z     for (auto arg : graph_inputs) {
2022-03-31T23:03:04.5060315Z               ^
2022-03-31T23:03:04.5062185Z          const  &
2022-03-31T23:03:04.5063304Z /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:233:7: error: 'emplace_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
2022-03-31T23:03:04.5064062Z       stack.emplace_back(arg);
2022-03-31T23:03:04.5064311Z       ^
2022-03-31T23:03:04.5065072Z /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:238:7: error: 'push_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
2022-03-31T23:03:04.5065566Z       result.push_back(elem.toTensor());
2022-03-31T23:03:04.5065762Z       ^
2022-03-31T23:03:04.5065925Z Warnings detected!
2022-03-31T23:03:04.5066089Z Summary:
2022-03-31T23:03:04.5070503Z [performance-inefficient-vector-operation] occurred 3 times
2022-03-31T23:03:04.5071208Z     /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:149
2022-03-31T23:03:04.5071826Z     /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:233
2022-03-31T23:03:04.5072413Z     /__w/pytorch/pytorch/torch/csrc/lazy/python/init.cpp:238
2022-03-31T23:03:04.5072745Z 
2022-03-31T23:03:04.5073052Z [modernize-use-emplace] occurred 2 times

See GitHub Actions build Lint / quick-checks (2/2)

Step: "Ensure all test files have header containing ownership information"

2022-03-31T22:56:32.4519260Z ##[group]Run set -eux
2022-03-31T22:56:32.4519556Z �[36;1mset -eux�[0m
2022-03-31T22:56:32.4519982Z �[36;1mpython torch/testing/_check_kernel_launches.py |& tee "${GITHUB_WORKSPACE}"/cuda_kernel_launch_checks.txt�[0m
2022-03-31T22:56:32.4569178Z shell: /bin/bash -e {0}
2022-03-31T22:56:32.4569455Z env:
2022-03-31T22:56:32.4569787Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.4/x64
2022-03-31T22:56:32.4570206Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.4/x64/lib
2022-03-31T22:56:32.4570512Z ##[endgroup]
2022-03-31T22:56:32.4650294Z + python torch/testing/_check_kernel_launches.py
2022-03-31T22:56:32.4651506Z + tee /home/runner/work/pytorch/pytorch/cuda_kernel_launch_checks.txt
2022-03-31T22:56:32.4857170Z python: can't open file '/home/runner/work/pytorch/pytorch/torch/testing/_check_kernel_launches.py': [Errno 2] No such file or directory
2022-03-31T22:56:32.4923384Z ##[group]Run (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2022-03-31T22:56:32.4924594Z �[36;1m(! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))�[0m
2022-03-31T22:56:32.4978451Z shell: /bin/bash -e {0}
2022-03-31T22:56:32.4978717Z env:
2022-03-31T22:56:32.4979040Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.4/x64
2022-03-31T22:56:32.4979454Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.4/x64/lib
2022-03-31T22:56:32.4979780Z ##[endgroup]
2022-03-31T22:56:32.5338878Z ##[group]Run (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2022-03-31T22:56:32.5339878Z �[36;1m(! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))�[0m
2022-03-31T22:56:32.5389498Z shell: /bin/bash -e {0}

🕵️‍♀️ 1 failure not recognized by patterns:

The following CI failures may be due to changes from the PR
Job: GitHub Actions Lint / mypy
Step: Run mypy

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@shunting314 shunting314 requested a review from Krovatkin March 31, 2022 22:55
@facebook-github-bot
Contributor

@shunting314 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

"""Return a unique id of the lazy tensor maintained by LTC"""
return torch._C._lazy._get_tensor_id(tensor)

def get_tensors_text(tensors):
Contributor

i wonder if we want to make a _lazy.debug module for APIs like this,
and if we want to rename the APIs:

get_tensors_text -> debug.dump_ir(tensors)?

get_tensors_dot -> debug.render_ir_graph(tensors)?

it's easier to land it as-is, i am ok with possibly renaming/moving in a separate PR.

Contributor Author

And for get_tensors_backend, rename to debug.dump_ir_backend(tensors)?

I can do that in a followup PR

Contributor

yea, sgtm. (Or just add a parameter: debug.dump_ir(tensors, format="text"/"backend")? Which is cleaner?)
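The format-switch idea from this thread, sketched in plain Python. Everything here is hypothetical, not from the PR: the module shape, the `dump_ir` signature, and the two stand-in functions that take the place of the real C++ bindings (`torch._C._lazy._get_tensors_text` / `_get_tensors_backend`) so the snippet is self-contained.

```python
# Hypothetical sketch of a torch._lazy.debug-style entry point.
# The two helpers below are stand-ins for the real C++ bindings.

def _get_tensors_text(tensors):
    return f"text IR for {len(tensors)} tensor(s)"

def _get_tensors_backend(tensors):
    return f"backend IR for {len(tensors)} tensor(s)"

_DUMPERS = {
    "text": _get_tensors_text,
    "backend": _get_tensors_backend,
}

def dump_ir(tensors, format="text"):
    """One debug entry point with a format switch, instead of separate
    dump_ir / dump_ir_backend functions."""
    try:
        return _DUMPERS[format](tensors)
    except KeyError:
        raise ValueError(f"unknown IR dump format: {format!r}") from None
```

Call sites would then read `dump_ir(lazy_tensors)` or `dump_ir(lazy_tensors, format="backend")`, and new formats (e.g. a dot renderer) become one more dictionary entry rather than one more public function.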

import torch._C._lazy
import torch._C._lazy_ts_backend

def get_tensors_ts_device_data_node(tensors):
Contributor

I think anything that is ts_backend specific should be under the torch._lazy.ts_backend module.

Contributor Author

I thought about the same thing. The reason I'm putting the method here is that we should generalize it for XLA soon. But I'm okay with moving this to ts_backend.py before that. Just let me know.

Contributor

yea, I think ideally we want it at this level, but it is unclear if we can actually deliver that easily or if it will be a little different for XLA. Let's leave it here for now and just add some comments explaining the proposal to add XLA here, and that this is still 'unstable/prototype'.
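The suggested annotation could look something like this; the docstring wording is mine, not from the PR, and the real binding call (`torch._C._lazy_ts_backend._get_tensors_ts_device_data_node`) is replaced with a stub so the snippet stands alone:

```python
def _ts_binding_stub(tensors):
    # Stand-in for torch._C._lazy_ts_backend._get_tensors_ts_device_data_node.
    return [id(t) for t in tensors], list(tensors)

def get_tensors_ts_device_data_node(tensors):
    """Return (tensor_ids, ivalues) for the DeviceData nodes of the lazy
    graph behind `tensors`.

    NOTE: unstable/prototype API. This is TS-backend specific today; the
    plan is to generalize it (e.g. for XLA) while keeping it at this
    level, but the shape may change.
    """
    return _ts_binding_stub(tensors)
```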

"""Return the graph hash for the passed in lazy tensors"""
return torch._C._lazy._get_graph_hash(tensors)

def run_cached_graph(hash_str, graph_inputs):
Contributor

maybe for now you can leave this 'run_cached_graph' api under the ts_backend namespace, and later we can consider a generic api at this level if we can make it work for all backends
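For illustration, here is the hash-then-replay flow that `run_cached_graph` enables, reduced to a self-contained sketch. The in-memory dictionary and `cache_graph` helper are stand-ins of my own; the real path goes through `torch._C._lazy._get_graph_hash` and the TS-backend cached-graph binding.

```python
# Sketch of compile-once, replay-by-hash (all names here are stand-ins).
_compiled_graphs = {}

def cache_graph(hash_str, compiled_fn):
    """Record a 'compiled graph' (here just a Python callable) under its hash."""
    _compiled_graphs[hash_str] = compiled_fn

def run_cached_graph(hash_str, graph_inputs):
    """Replay a previously compiled graph on fresh inputs, keyed by the
    graph hash instead of retracing the model."""
    return _compiled_graphs[hash_str](*graph_inputs)

# First run: trace/compile and cache.
cache_graph("hash123", lambda a, b: a + b)
# Subsequent runs skip tracing and replay by hash with new inputs.
result = run_cached_graph("hash123", (2, 3))
```

The point of keying on the graph hash is that any call with the same traced structure, regardless of input values, can reuse the compiled artifact.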

print("Fx code:\n", model.code)
print("LTC IR:", lazy.get_tensors_text(lazy_out))

graph_input_tensor_ids, graph_input_ivalues = computation.get_tensors_ts_device_data_node(lazy_out)
Contributor

should we leave this file here and leave the ts_backend specific things in it? I think there are 2 alternatives that are not that great

  • move this file and all of this related stuff to under the _lazy.ts_backend module (to be most correct, even though we want this functionality to be generic later)
  • or just leave it all like it is now, but add some comments saying we intend to make this file more generic in the future and replace ts_ functions with generic ones in _lazy.computation layer

Contributor Author

Since we need to make things generic across all backends anyway, I would prefer option 2.

namespace torch {
namespace lazy {

struct NoGilSection {
Contributor

we don't want to land NoGilSection; can you switch to `pybind11::gil_scoped_release no_gil;` like mark_step uses?

@@ -1,3 +1,5 @@
#pragma once
Contributor

woah! good catch

@shunting314
Contributor Author

@wconstab this PR was not generated by ghstack, thus it shows 'Merging is blocked'. I created another one with ghstack: #75046. Sorry about the confusion.

I'll resolve comments from this PR in the new one

@wconstab
Contributor

> @wconstab this PR was not generated by ghstack, thus it shows 'Merging is blocked'. I created another one with ghstack: #75046. Sorry about the confusion.
>
> I'll resolve comments from this PR in the new one

ok, no problem. But i'm confused, i never use ghstack but i seem to be able to land PRs that I just directly push. did something change since this morning?

@shunting314
Contributor Author

@wconstab I don't know. I just figured that since I'm merging to master, I need to follow some doc which says I should use ghstack. Maybe not using ghstack would also work.

@wconstab
Contributor

ah. you don't have to do it any particular way. you can export from sandcastle to gh, or you can push to gh any way and import to sandcastle. you still have to land via sandcastle either way, until GH first workflow is live.

@shunting314
Contributor Author

> ah. you don't have to do it any particular way. you can export from sandcastle to gh, or you can push to gh any way and import to sandcastle. you still have to land via sandcastle either way, until GH first workflow is live.

Got it. That's good to know. I guess I was just scared by the red 'Merging is blocked' warning.

@shunting314 shunting314 deleted the merge-from-staging branch March 31, 2022 23:34


4 participants