[tensorexpr] Add support for aten::stack #73801
Conversation
CI failures summary: as of commit 7819277, 1 new failure was recognized by patterns; it does not appear to be due to upstream breakages (more details on the Dr. CI page).
@huiguoo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
eellison left a comment:
Also, we need shape analysis support for this for dynamic shapes.
      size_t min_group_size = 2,
      bool add_composed_op = false,
-     bool fuse_to_dynamic_shapes = false);
+     bool fuse_to_dynamic_shapes = true);
I don't think we should be changing the default as part of this PR.
Oh, sorry, it was a draft PR. Just updated it, and now it's ready for review!
In this PR, aten::stack is not enabled for fusion yet. Will enable the fusion (with and without dynamic shapes) in separate PRs.
Differential Revision: [D34647822](https://our.internmc.facebook.com/intern/diff/D34647822) [ghstack-poisoned]
static bool checkStackInputShape(const std::vector<BufHandle>& bufList) {
  if (bufList.size() == 0) {
    throw std::runtime_error("Empty input list is passed to aten::stack");
Nit: let's be consistent in handling error cases. Two lines below we use TORCH_INTERNAL_ASSERT, and here we're throwing an error.
load = ifThenElse(
    CompareSelect::make(
Why do we generate both ifThenElse and CompareSelect?
std::vector<ExprHandle> newAxes(axes.begin(), axes.begin() + dim);
newAxes.insert(newAxes.end(), axes.begin() + dim + 1, axes.end());
This doesn't look right. The first line makes
newAxes[:] = axes[:]
and the second line then appends axes[dim+1:] to it.
That means that if we have 10-d tensors and dim=0, the stacked tensor will have rank 19. I think it should be 11-d instead; am I missing something?
Also, please update the comment to describe what new axes are created by this op and where they are inserted.
The first line makes
newAxes = axes[:dim]
Overall, there will be N-1 elements in newAxes, where N is the size of axes.
Ah, got it, that makes sense. Maybe it would be easier to grasp if we renamed newAxes to inputAxes and axes to outputAxes; currently "new" corresponds to the inputs, which is somewhat counterintuitive (even though I understand why it's done that way: we're deducing input indexes from output indexes).
test/cpp/tensorexpr/test_ops.cpp (outdated)
auto at = at::rand(dims, at::kFloat);
auto bt = at::rand(dims, at::kFloat);
The test only checks stacking two tensors with equal dimensions. We should also check other cases, e.g.:
- stacking just one tensor
- stacking N>2 tensors
- stacking tensors with different dimensions
It would be good to also have such tests in python, so that whenever opinfo covers stack we can use that too.
It might also be a good idea to have a test verifying the IR that we generate.
The test only checks stacking two tensors with equal dimensions. We should also check other cases...
That's a good point! Will add more tests.
It would be good to also have such tests in python, so that whenever opinfo covers stack we can use that too.
Not sure if I understand this. Are you suggesting testing the above cases from Python instead of C++? We don't have many unit tests for ops right now; I only know of aten::sum, aten::cat, and aten::conv. For these 3 ops, we put the unit tests in 3 files under test/cpp/tensorexpr. Maybe we should put them together in one file like test_ops.cpp.
It might be also a good idea to have a test verifying IR that we generate.
Agree. Will add it.
Are you suggesting testing the above cases from Python instead of C++?
Having C++ tests is fine, but currently the most comprehensive coverage that we have is in python:
pytorch/test/test_jit_fuser_te.py, line 2476 (at 122f864): class TestNNCOpInfo(JitCommonTestCase)
It leverages OpInfo, which is basically a collection of representative inputs for an op. What I suggested was to add a test for stack to those tests as well (if possible). If stack is not supported by OpInfo, then I think it still would make sense to write a python test manually, as most of our ops are currently tested from python rather than from C++.
Thanks for the pointer!! Added such a test for aten::stack in PR #74077.
This PR adds the lowering function for aten::stack in NNC. Differential Revision: [D34647822](https://our.internmc.facebook.com/intern/diff/D34647822) [ghstack-poisoned]
  return hasEmptyDims;
}

Tensor computeStack(
Can we add an implementation that lowers into multiple loops, like in the case of aten::cat, as done in computeCatWoConditionals?
The conditionals-based lowering does not give good perf on CPUs, mainly because we can't vectorize it. I think the conditionals-based lowering still makes sense for CUDA.
The conditionals based lowering does not give good perf on CPUs, mainly because we can't vectorize it.
aten::stack only uses CompareSelect conditionals, which can be vectorized. I'll see if the other implementation has better performance, and will add it if it does.
Please use some representative shapes from the models you have been working on for the perf analysis.
I still believe the multiple-loops approach would be faster than conditionals because we will have both spatial and temporal cache locality for all inputs. It would be good to confirm with a perf analysis.
If you want to do this as a followup, that's fine too.
Sounds great! I'll first enable aten::stack in the fusion, and then do a perf analysis on real models for both approaches.
}

int64_t dim_ = c10::get<int64_t>(argDim);
auto dim = normalizeAndCheckIndex(dim_, axes.size());
It looks like the function errors out when dim_ == axes.size(). Isn't that a correct input for stack?
The axes are for the output of aten::stack. For example,
a = [1, 2]
b = [3, 4]
output = aten::stack([a, b], dim=dim_)
Here the output is 2-D, so axes.size() equals 2, and dim_ should be a number between 0 and 1, or -1 and -2, right? It cannot be 2. We have such dim tests for stack.
Aah, I missed the point that axes is for the output of stack. Makes sense.
navahgar left a comment:
LGTM
Please ensure you address all comments from Mikhail.
ZolotukhinM left a comment:
Looks good! I think it's still a good idea to add a python test though.
# CHECK: for
# CHECK-NEXT: for
# CHECK-NEXT: for
# CHECK-NEXT: for
# CHECK-NEXT: aten_stack)IR";
This check essentially just verifies that the output is 4-d. Do we want to check the actual IR?
Attempted to check the stmt IR in the following way:
CHECK-NEXT: aten_stack[Ramp(5ll * (j + 2ll * i), 1ll, 4)] = (Broadcast(j, 4))==(Broadcast(1ll, 4)) ? (ty_1[Ramp(5ll * i, 1ll, 4)]) : (tx_1[Ramp(5ll * i, 1ll, 4)]))
but it failed one CI test (linux-xenial-py3.7-clang7-asan). With the configuration in that CI test, vectorization failed, so the generated stmt is different. There are two ways to fix this:
- implement a CHECK that can match patterns in a string, such as ".?.:" for stack
- save the original stmt before prepare_to_codegen in TEK for checking purposes
Option 2 does not seem good, as it introduces additional member vars in TEK purely for testing; Option 1 can be implemented in a separate PR. Landing this PR with the original checks for now.
Summary: Pull Request resolved: #73801
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D34647822
Pulled By: huiguoo
fbshipit-source-id: 3b863c71886c7c6616b16f5d3313079714c8b82a
Hey @huiguoo.
@pytorchbot revert this |
Reverting PR 73801 failed: can't revert a PR that was landed via Phabricator (D34647822).
Reverting, as it broke the ASAN Kernel.Stack test; see https://github.com/pytorch/pytorch/runs/5761874340?check_suite_focus=true
This pull request has been reverted by 3cc49984f4116be253e08ede8408021cfde30509. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
This pull request has been reverted by 43313cb. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
Stack from ghstack (oldest at bottom):
This PR adds the lowering function for aten::stack in NNC.
Differential Revision: D34647822