Conversation

@huiguoo
Contributor

@huiguoo huiguoo commented Mar 4, 2022

Stack from ghstack (oldest at bottom):

This PR adds the lowering function for aten::stack in NNC.
Differential Revision: D34647822

@pytorch-bot

pytorch-bot bot commented Mar 4, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/59e95b891f3864083e9e39b3319657d2c286e0f3/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build ciflow/all, ciflow/cpu, ciflow/default, ciflow/libtorch, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla 🚫 skipped

@facebook-github-bot
Contributor

facebook-github-bot commented Mar 4, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 7819277 (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build pull / linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-03-29T17:55:32.3059660Z The PR is introduc...m to confirm whether this change is wanted or not.
2022-03-29T17:55:32.3046997Z processing existing schema:  text(__torch__.torch.classes.profiling.SourceRef _0) -> (str _0)
2022-03-29T17:55:32.3048396Z processing existing schema:  count(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-03-29T17:55:32.3049665Z processing existing schema:  duration_ns(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-03-29T17:55:32.3050955Z processing existing schema:  source(__torch__.torch.classes.profiling.SourceStats _0) -> (__torch__.torch.classes.profiling.SourceRef _0)
2022-03-29T17:55:32.3052709Z processing existing schema:  line_map(__torch__.torch.classes.profiling.SourceStats _0) -> (Dict(int, __torch__.torch.classes.profiling.InstructionStats) _0)
2022-03-29T17:55:32.3053655Z processing existing schema:  __init__(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-03-29T17:55:32.3055097Z processing existing schema:  enable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-03-29T17:55:32.3056328Z processing existing schema:  disable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-03-29T17:55:32.3057910Z processing existing schema:  _dump_stats(__torch__.torch.classes.profiling._ScriptProfile _0) -> (__torch__.torch.classes.profiling.SourceStats[] _0)
2022-03-29T17:55:32.3059307Z processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (NoneType _0)
2022-03-29T17:55:32.3059660Z The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not. 
2022-03-29T17:55:32.3059752Z 
2022-03-29T17:55:32.3059834Z Broken ops: [
2022-03-29T17:55:32.3060157Z 	aten::histogramdd(Tensor self, int[] bins, float[]? range=None, Tensor? weight=None, bool density=False) -> (Tensor hist, Tensor[] bin_edges)
2022-03-29T17:55:32.3060476Z 	aten::histogramdd.int_bins(Tensor self, int bins, float[]? range=None, Tensor? weight=None, bool density=False) -> (Tensor hist, Tensor[] bin_edges)
2022-03-29T17:55:32.3060816Z 	aten::histogramdd.TensorList_bins(Tensor self, Tensor[] bins, float[]? range=None, Tensor? weight=None, bool density=False) -> (Tensor hist, Tensor[] bin_edges)
2022-03-29T17:55:32.3060974Z 	aten::_to_dense(Tensor self, int? dtype=None) -> (Tensor)
2022-03-29T17:55:32.3061037Z ]
2022-03-29T17:55:32.4047002Z + cleanup
2022-03-29T17:55:32.4047122Z + retcode=1
2022-03-29T17:55:32.4047240Z + set +x

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See GitHub Actions build pull / linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun) ❄️

2022-03-29T19:32:03.3354853Z [       OK ] Kernel.CatInputTypesPromotion (71 ms)
2022-03-29T19:32:03.3355300Z [ RUN      ] Kernel.CatAndInlineWithAConstantDim
2022-03-29T19:32:03.7075755Z [       OK ] Kernel.CatAndInlineWithAConstantDim (372 ms)
2022-03-29T19:32:03.7076251Z [ RUN      ] Kernel.CatWithEmptyInputs
2022-03-29T19:32:03.9370204Z [       OK ] Kernel.CatWithEmptyInputs (229 ms)
2022-03-29T19:32:03.9370548Z [ RUN      ] Kernel.CatWoConditionals
2022-03-29T19:32:04.0070285Z [       OK ] Kernel.CatWoConditionals (68 ms)
2022-03-29T19:32:04.0070636Z [ RUN      ] Kernel.OptimizeConditionals
2022-03-29T19:32:04.0642583Z [       OK ] Kernel.OptimizeConditionals (58 ms)
2022-03-29T19:32:04.0642884Z [ RUN      ] Kernel.Stack
2022-03-29T19:32:04.0763708Z unknown file: Failure
2022-03-29T19:32:04.0763989Z C++ exception with description "Expected to not find "\n" but found it
2022-03-29T19:32:04.0765641Z       for (int64_t k = 0ll; k < 2ll; k++) {
2022-03-29T19:32:04.0766071Z         for (int64_t l = 0ll; l < 3ll; l++) {
2022-03-29T19:32:04.0766499Z           for (int64_t m = 0ll; m < 6ll; m++) {
2022-03-29T19:32:04.0766878Z             aten_stack[(((108ll * i + 18ll * k) + m) + 36ll * j) + 6ll * l] = k==1ll ? (ty_1[((18ll * j + m) + 54ll * i) + 6ll * l]) : (tx_1[((18ll * j + m) + 54ll * i) + 6ll * l]);
2022-03-29T19:32:04.0767383Z           }
2022-03-29T19:32:04.0767663Z         }
2022-03-29T19:32:04.0768051Z From CHECK-NEXT: aten_stack
2022-03-29T19:32:04.0769408Z " thrown in the test body.
2022-03-29T19:32:04.0769939Z [  FAILED  ] Kernel.Stack (12 ms)

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Mar 4, 2022
huiguoo added a commit that referenced this pull request Mar 4, 2022
ghstack-source-id: ede6abb
Pull Request resolved: #73801
@huiguoo
Contributor Author

huiguoo commented Mar 4, 2022

@huiguoo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@eellison eellison left a comment


Also we need shape analysis support for this for dynamic shapes..

size_t min_group_size = 2,
bool add_composed_op = false,
bool fuse_to_dynamic_shapes = false);
bool fuse_to_dynamic_shapes = true);
Contributor

We shouldn't be changing the default as part of this PR, I don't think.

Contributor Author

Oh sorry it was a draft PR. Just updated it, and now it's ready for review!

In this PR, aten::stack is not enabled for fusion yet. Fusion (with and without dynamic shapes) will be enabled in separate PRs.

huiguoo added a commit that referenced this pull request Mar 8, 2022
ghstack-source-id: 740715a
Pull Request resolved: #73801
@huiguoo
Contributor Author

huiguoo commented Mar 9, 2022

@huiguoo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


static bool checkStackInputShape(const std::vector<BufHandle>& bufList) {
  if (bufList.size() == 0) {
    throw std::runtime_error("Empty input list is passed to aten::stack");


Nit: let's be consistent in handling error cases. Two lines below we use TORCH_INTERNAL_ASSERT and here we're throwing an error.

Comment on lines 723 to 724
load = ifThenElse(
    CompareSelect::make(


Why do we generate both ifThenElse and CompareSelect?

Comment on lines 706 to 707
std::vector<ExprHandle> newAxes(axes.begin(), axes.begin() + dim);
newAxes.insert(newAxes.end(), axes.begin() + dim + 1, axes.end());


This doesn't look right. First line makes

newAxes[:] = axes[:]

Second line then appends it with axes[dim+1:].

That means that if we have 10-d tensors and dim=0 the stacked tensor will have rank 19. I think it should be 11-d instead, am I missing something?

Also, please update the comment to describe what new axes are created by this op and where they are inserted.

Contributor Author

The first line makes

newAxes = axes[:dim]

and the second line appends axes[dim+1:]. Overall, there will be N-1 elements in newAxes, where N is the size of axes.


@ZolotukhinM ZolotukhinM Mar 10, 2022


Ah, got it, it makes sense. Maybe it'd be easier to grasp if we renamed newAxes to inputAxes and axes to outputAxes; currently "new" corresponds to the inputs, which is somewhat counterintuitive (even though I understand why it's done that way: we're deducing input indexes from output indexes).
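A small pure-Python sketch of the mapping discussed above (a hypothetical helper, not the NNC code): the stacked output has rank N, and dropping the stack dimension `dim` from the output axes yields the N-1 axes used to index each input.

```python
def input_axes_from_output(output_axes, dim):
    # Drop the stack dimension from the output indices to get the
    # indices used to load from each input buffer.
    return output_axes[:dim] + output_axes[dim + 1:]

out_axes = ["i", "j", "k", "l"]                        # 4-d stacked output
assert input_axes_from_output(out_axes, 1) == ["i", "k", "l"]
```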

Comment on lines 66 to 67
auto at = at::rand(dims, at::kFloat);
auto bt = at::rand(dims, at::kFloat);


The test only checks stacking two tensors with equal dimensions. We should also check other cases, e.g.:

  1. stacking just one tensor
  2. stacking N>2 tensors
  3. stacking tensors with different dimensions

It would be good to also have such tests in python, so that whenever opinfo covers stack we can use that too.

It might also be a good idea to have a test verifying the IR that we generate.
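The shape effect of the cases listed above can be made concrete with a small pure-Python helper (illustrative only; the names are hypothetical, not the NNC code):

```python
def stacked_shape(shapes, dim):
    # torch.stack inserts a new dimension of size len(shapes) at `dim`;
    # all inputs must share the same shape.
    assert shapes and all(s == shapes[0] for s in shapes)
    ndim = len(shapes[0]) + 1            # output rank is input rank + 1
    d = dim % ndim                       # normalize negative dims
    return shapes[0][:d] + (len(shapes),) + shapes[0][d:]

assert stacked_shape([(2, 3)], 0) == (1, 2, 3)        # just one tensor
assert stacked_shape([(2, 3)] * 4, 2) == (2, 3, 4)    # N > 2 tensors
assert stacked_shape([(2, 3)] * 2, -1) == (2, 3, 2)   # negative dim
```

(The "different dimensions" case would trip the assert, which is the point: stack requires inputs of equal shape.)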

Contributor Author

The test only checks stacking two tensors with equal dimensions. We should also check other cases...

That's a good point! Will add more tests.

It would be good to also have such tests in python, so that whenever opinfo covers stack we can use that too.

Not sure if I understand this. Are you suggesting to test above cases from Python instead of C++? We don't have many unit tests for ops right now. I only know aten::sum, aten::cat, and aten::conv. For these 3 ops, we put the unit tests in 3 files under test/cpp/tensorexpr. Maybe we should put them together in one file like test_ops.cpp.

It might be also a good idea to have a test verifying IR that we generate.

Agree. Will add it.


Are you suggesting to test above cases from Python instead of C++?

Having C++ tests is fine, but currently the most comprehensive coverage that we have is in python:

class TestNNCOpInfo(JitCommonTestCase):

It leverages OpInfo, which is basically a collection of representative inputs for an op. What I suggested was to add a test for stack to those tests as well (if possible). If stack is not supported by OpInfo, then I think it still would make sense to write a python test manually, as most of our ops are currently tested from python rather than from C++.

Contributor Author

Thanks for the pointer!! Added such a test for aten::stack in PR #74077.

huiguoo added a commit that referenced this pull request Mar 10, 2022
ghstack-source-id: 92268ec
Pull Request resolved: #73801
@huiguoo
Contributor Author

huiguoo commented Mar 10, 2022

@huiguoo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@huiguoo huiguoo requested a review from ZolotukhinM March 10, 2022 21:10
return hasEmptyDims;
}

Tensor computeStack(
Contributor

Can we add an implementation that lowers into multiple loops like in the case of aten::cat, as done in computeCatWoConditionals?

The conditionals based lowering does not give good perf on CPUs, mainly because we can't vectorize it. I think the conditionals based lowering still makes sense for CUDA.
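The two lowering strategies being compared can be sketched as a scalar pure-Python analogue (hypothetical and illustrative only, not the IR NNC actually emits):

```python
# Conditional lowering: one loop nest over the whole output, selecting the
# source buffer per element (the if-then-else / select form).
def stack_conditional(x, y, n):
    out = [0.0] * (2 * n)
    for k in range(2):              # the new stack dimension
        for i in range(n):
            out[k * n + i] = y[i] if k == 1 else x[i]
    return out

# Multiple-loops lowering (the computeCatWoConditionals style): one
# branch-free copy loop per input, easier to vectorize and cache-friendly.
def stack_multi_loop(x, y, n):
    out = [0.0] * (2 * n)
    for i in range(n):
        out[i] = x[i]
    for i in range(n):
        out[n + i] = y[i]
    return out

xs, ys = [1.0, 2.0], [3.0, 4.0]
assert stack_conditional(xs, ys, 2) == stack_multi_loop(xs, ys, 2) == [1.0, 2.0, 3.0, 4.0]
```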

Contributor Author

The conditionals based lowering does not give good perf on CPUs, mainly because we can't vectorize it.

aten::stack only uses compareSelect conditionals which can be vectorized. I'll see if the other implementation has better performance. Will add it if it is.

Contributor

Please use some representative shapes from the models you have been working on, for perf analysis.

I still believe the multiple loops approach would be faster than conditionals because we will have both spatial and temporal cache locality for all inputs. It would be good to confirm with a perf analysis.

If you want to do this as a follow-up, that's fine too.

Contributor Author

Sounds great! I'll first enable aten::stack in the fusion, and then do a perf analysis on real models for both approaches.

}

int64_t dim_ = c10::get<int64_t>(argDim);
auto dim = normalizeAndCheckIndex(dim_, axes.size());
Contributor

It looks like the function errors out when dim_ == axes.size(). Isn't that a correct input for stack?

Contributor Author

The axes are for the output of aten::stack. For example,

a = [1, 2]
b = [3, 4]
output = aten::stack([a, b], dim=dim_)

axes.size() equals 3, and dim_ should be a number between 0 and 2, or between -1 and -3, right? It cannot be 3. We have such dim tests for stack.

Contributor

Aah, I missed the point that axes is for the output of stack. Makes sense.
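The dim bounds under discussion can be sketched with a hypothetical pure-Python analogue of normalizeAndCheckIndex (illustrative only): for an output of rank N, a valid dim satisfies -N <= dim < N, so dim == N is rejected.

```python
def normalize_and_check_index(dim, ndim):
    # Accept dim in [-ndim, ndim); map negative dims to their positive
    # equivalents and reject anything out of range.
    if dim < -ndim or dim >= ndim:
        raise IndexError(f"dim {dim} out of range for rank {ndim}")
    return dim % ndim

assert normalize_and_check_index(2, 3) == 2    # largest valid positive dim
assert normalize_and_check_index(-3, 3) == 0   # most negative valid dim
```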

@huiguoo
Contributor Author

huiguoo commented Mar 11, 2022

@huiguoo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@navahgar navahgar left a comment


LGTM

Please ensure you address all comments from Mikhail.

@huiguoo
Contributor Author

huiguoo commented Mar 14, 2022

@huiguoo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@ZolotukhinM ZolotukhinM left a comment


Looks good! I think it's still a good idea to add a python test though.

Comment on lines +794 to +798
# CHECK: for
# CHECK-NEXT: for
# CHECK-NEXT: for
# CHECK-NEXT: for
# CHECK-NEXT: aten_stack)IR";


This check essentially just verifies that the output is 4-d. Do we want to check the actual IR?

Contributor Author

Attempted to check the stmt IR in the following way,

CHECK-NEXT: aten_stack[Ramp(5ll * (j + 2ll * i), 1ll, 4)] = (Broadcast(j, 4))==(Broadcast(1ll, 4)) ? (ty_1[Ramp(5ll * i, 1ll, 4)]) : (tx_1[Ramp(5ll * i, 1ll, 4)]))

it failed one CI test (linux-xenial-py3.7-clang7-asan). With the configuration in this CI test, the vectorization failed, so the generated stmt is different. There are two ways to fix this:

  1. implement a CHECK that can check patterns in a string such as ".?.:" for stack
  2. save the original stmt before prepare_to_codegen in TEK for checking issues

Option 2 does not seem good, as it introduces additional member vars in TEK only for testing purposes; Option 1 can be implemented in a separate PR. Currently landing this PR with the original checks.

@huiguoo
Contributor Author

huiguoo commented Mar 22, 2022

@huiguoo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@huiguoo
Contributor Author

huiguoo commented Mar 25, 2022

@huiguoo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@huiguoo
Contributor Author

huiguoo commented Mar 25, 2022

@huiguoo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@huiguoo
Contributor Author

huiguoo commented Mar 29, 2022

@huiguoo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Mar 30, 2022
Summary: Pull Request resolved: #73801

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D34647822

Pulled By: huiguoo

fbshipit-source-id: 3b863c71886c7c6616b16f5d3313079714c8b82a
pytorchmergebot pushed a commit that referenced this pull request Mar 30, 2022
Summary: Pull Request resolved: #73801

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D34647822

Pulled By: huiguoo

fbshipit-source-id: 3b863c71886c7c6616b16f5d3313079714c8b82a
(cherry picked from commit c71778c)
@github-actions
Contributor

Hey @huiguoo.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@zengk95
Contributor

zengk95 commented Mar 31, 2022

@pytorchbot revert this

@pytorchmergebot
Collaborator

Reverting PR 73801 failed due to Can't revert PR that was landed via phabricator as D34647822
Raised by https://github.com/pytorch/pytorch/actions/runs/2068620191

@malfet
Contributor

malfet commented Mar 31, 2022

Reverting, as it broke ASAN at Kernel.Stack test, see https://github.com/pytorch/pytorch/runs/5761874340?check_suite_focus=true

@facebook-github-bot
Contributor

This pull request has been reverted by 3cc49984f4116be253e08ede8408021cfde30509. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).

@facebook-github-bot
Contributor

This pull request has been reverted by 43313cb. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).



Labels

cla signed oncall: jit Add this issue/PR to JIT oncall triage queue Reverted


10 participants