Small fixes to improve TensorIterator overhead for the common case of inputs and outputs of the same type #27457
Conversation
This pull request was exported from Phabricator. Differential Revision: D17784532
indentation
Should this check be sunk into assert_no_partial_overlap?
I had already added a short-circuit in get_overlap_status, but it sat behind .contiguous, so it was slower. I've now pulled it up there and removed it from here; seems to be OK.
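The discussion above is about checking the cheap identity case before any expensive stride analysis. A minimal sketch of that idea follows; the type and struct names here are simplified stand-ins (the real PyTorch code in MemOverlap.h differs in detail), but get_overlap_status and the FULL status come from the thread itself:

```cpp
// Illustrative sketch only: TensorRef is a hypothetical stand-in for
// the real tensor type; the overlap enum mirrors MemoryOverlap::FULL
// mentioned in the PR summary.
enum class MemOverlapStatus { NO, PARTIAL, FULL };

struct TensorRef {
  const void* data;  // base data pointer; sizes/strides elided
};

// Cheap identity check first: if both operands are literally the same
// tensor, they fully overlap, so the slower contiguity/stride walk
// (the part that was "behind .contiguous") is skipped entirely.
MemOverlapStatus get_overlap_status(const TensorRef& a, const TensorRef& b) {
  if (&a == &b || a.data == b.data) {
    return MemOverlapStatus::FULL;  // short-circuit before any stride analysis
  }
  // ... fall through to the slower stride-based overlap analysis ...
  return MemOverlapStatus::NO;
}
```

The point of hoisting the check is that the identity comparison costs two pointer compares, while the full analysis has to materialize contiguity information first.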
… inputs and outputs of the same type (pytorch#27457)

Summary:
Pull Request resolved: pytorch#27457

1) Short-circuits computing the common type and the type-promotion logic for the common case of operands and result of the same type.
2) Improves the performance of the memory-overlap check by returning MemoryOverlap::FULL if the tensors are the same, and skips the call from TensorIterator when the tensors are the same.
3) Changes the default size of DimVector from 5 to 6, so it need not be resized in the common case of a binary operation. The `strides` DimVector is forced to have at least 2*num_tensors elements, which for an operation with 2 inputs and one output is 6.
4) If `offset` is 0 (the common non-broadcasting case), don't fill the `strides` vector with zeros, because all the values will subsequently be written to.

These changes combined reduce the overhead from 1.02 us to 0.74 us for a simple in-place operation.

Test Plan: should be covered by existing tests

Differential Revision: D17784532

fbshipit-source-id: 5e523e1892cfe4b7aee0d90cc8e820409c309e9f
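Point 1 of the summary is the classic fast path for type promotion: when every operand already has the same dtype, the common type is known immediately and the promotion lattice is never consulted. A minimal sketch, assuming a toy ScalarType enum and promote() function in place of PyTorch's real c10 machinery:

```cpp
#include <vector>

// Hypothetical stand-ins for c10::ScalarType and promoteTypes.
enum class ScalarType { Int, Float, Double };

ScalarType promote(ScalarType a, ScalarType b) {
  // Toy promotion rule: equal types stay put, mixed types widen to Double.
  return a == b ? a : ScalarType::Double;
}

// Fast path first: if all operands share a dtype, return it immediately
// and skip the pairwise promotion walk entirely.
ScalarType compute_common_dtype(const std::vector<ScalarType>& operands) {
  bool all_same = true;
  for (ScalarType t : operands) {
    if (t != operands.front()) { all_same = false; break; }
  }
  if (all_same) {
    return operands.front();  // short-circuit: no promotion logic needed
  }
  ScalarType common = operands.front();
  for (ScalarType t : operands) {
    common = promote(common, t);
  }
  return common;
}
```

The same-type scan is a handful of equality comparisons, which is why it pays off for the dominant case of homogeneous operands.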
```cpp
struct CAFFE2_API TensorIterator {
  using DimMask = std::bitset<64>;
  using PtrVector = SmallVector<char*, 4>;
  using StrideVector = SmallVector<int64_t, 6>;
```
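The inline capacities above (4 and 6) matter because a SmallVector<T, N> stores up to N elements in-place and only heap-allocates beyond that. Picking 6 for StrideVector matches the 2*num_tensors requirement from the summary, so a typical binary op (2 inputs + 1 output) never allocates. A sketch of that arithmetic, with hypothetical constant names along the lines the reviewer suggested:

```cpp
#include <cstddef>

// Hypothetical named constants for the magic numbers in the diff.
constexpr std::size_t kTypicalTensors = 3;                        // 2 inputs + 1 output
constexpr std::size_t kStrideInlineSize = 2 * kTypicalTensors;    // = 6, matches SmallVector<int64_t, 6>

// The strides vector needs at least 2*num_tensors entries; it stays in
// the SmallVector's inline storage (no heap allocation) only while that
// requirement fits within the inline capacity.
bool strides_fit_inline(std::size_t num_tensors) {
  return 2 * num_tensors <= kStrideInlineSize;
}
```

With inline size 5 (the previous DimVector default), the 6-entry strides vector for a binary op would spill to the heap on every call, which is exactly the resize the PR avoids.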
super nit: maybe move these magic numbers into constants with good names
super super nit: make PtrVector the same size as StrideVector
facebook-github-bot
left a comment
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.