
Conversation

@zasdfgbnm
Collaborator

@zasdfgbnm zasdfgbnm commented Oct 20, 2019

Stack from ghstack:

Type casting is used in copy, and will also be used in TensorIterator in the next stacked diff. I move it to c10 so that it can serve as a common utility for different things.

I also add two dynamic casting functions:

  • fetch_and_cast
  • cast_and_store

fetch_and_cast fetches a value whose dynamic type is specified by a ScalarType from a void pointer and casts it to a static type.

cast_and_store casts a statically typed value to the dynamic type specified by a ScalarType and stores it into a void pointer.
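As a usage sketch (the include path, the copy_loop name, and the strided byte-pointer arguments below are illustrative assumptions, not part of this diff), a plain copy loop built on these two functions would look roughly like:

#include <c10/util/TypeCast.h>   // assumed location of the casting utilities after the move
#include <c10/core/ScalarType.h>
#include <cstdint>

// Copy n elements from src (runtime type src_dtype) to dst (runtime type dst_dtype),
// using float as the static intermediate type. Strides are in bytes.
void copy_loop(char* dst, c10::ScalarType dst_dtype, int64_t dst_stride,
               const char* src, c10::ScalarType src_dtype, int64_t src_stride,
               int64_t n) {
  for (int64_t i = 0; i < n; i++) {
    // read one element of dynamic type src_dtype and cast it to float
    float v = c10::fetch_and_cast<float>(src_dtype, src + i * src_stride);
    // cast the float to dynamic type dst_dtype and write it out
    c10::cast_and_store<float>(dst_dtype, dst + i * dst_stride, v);
  }
}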

@zasdfgbnm zasdfgbnm requested review from colesbury and ezyang October 20, 2019 07:38
@zasdfgbnm zasdfgbnm added the module: internals (Related to internal abstractions in c10 and ATen) and module: type promotion (Related to semantics of type promotion) labels Oct 20, 2019
@zasdfgbnm zasdfgbnm mentioned this pull request Oct 21, 2019
@ezyang
Contributor

ezyang commented Oct 21, 2019

Fetch and cast seems like a bad idea. It does a dynamic test, which means if you ever do it in a loop, you are going to do a billion tests, no?

I'm OK with the movement, that can go in whenever.

@zasdfgbnm
Collaborator Author

zasdfgbnm commented Oct 21, 2019

@ezyang Yes and no. It will execute a billion tests, but it is not as bad as it might look:

  • on CPU, this is a branch that always has the same outcome, so hopefully the branch predictor can do its job pretty well
  • on GPU, these branches do not diverge, so the same warp still executes the same line of code
  • most kernels, like add, are bandwidth bound; adding a few clock cycles to check an integer does not hurt performance much, because the ALUs would be waiting on load instructions anyway

For example, for the benchmark in #28344, add_ speeds up ~3x, which supports the argument about being bandwidth bound (previously: 1 read + 1 write to cast the input, 2 reads + 1 write for the add, and 1 read + 1 write to cast back, i.e. 7 memory accesses in total; now: 1 read, cast, add, cast, 1 write, i.e. 2 memory accesses, roughly 3x less).
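To make the branch-prediction point concrete, a simplified sketch of what the dynamic fetch boils down to (illustrative only, not the actual implementation in this diff) is a single switch over the runtime ScalarType; inside a copy or add loop the ScalarType never changes, so the switch takes the same case on every iteration:

#include <c10/core/ScalarType.h>
#include <cstdint>
#include <stdexcept>

// One dynamic check per element, but the outcome is loop-invariant: the CPU
// branch predictor sees the same target every time, and all GPU threads in a
// warp take the same case, so there is no divergence.
template <typename dest_t>
dest_t fetch_and_cast_sketch(c10::ScalarType src_type, const void* ptr) {
  switch (src_type) {
    case c10::ScalarType::Float:
      return static_cast<dest_t>(*static_cast<const float*>(ptr));
    case c10::ScalarType::Double:
      return static_cast<dest_t>(*static_cast<const double*>(ptr));
    case c10::ScalarType::Long:
      return static_cast<dest_t>(*static_cast<const int64_t*>(ptr));
    // ... remaining dtypes elided ...
    default:
      throw std::runtime_error("unsupported dtype");
  }
}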

The choice of using dynamic casting is mostly driven by compilation time and binary size. For example, to implement add_ with type promotion without dynamic casting, if we support N dtypes, we would need to do something like

AT_DISPATCH_ALL_TYPES(output.scalar_type(), "add_", [&] {
  using out_t = scalar_t;
  AT_DISPATCH_ALL_TYPES(input1.scalar_type(), "add_", [&] {
    using arg0_t = scalar_t;
    AT_DISPATCH_ALL_TYPES(input2.scalar_type(), "add_", [&] {
      using arg1_t = scalar_t;
      // instantiate the a + b kernel for this (out_t, arg0_t, arg1_t) combination
      [](arg0_t a, arg1_t b) -> out_t { return a + b; };
    });
  });
});

which would instantiate the a + b kernel for all N * N * N combinations of supported types; the compilation time and binary size would become horrible.
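With dynamic casting we instead dispatch once on a single computation dtype and convert at the load/store boundary, so only N kernels are instantiated instead of N * N * N. A rough sketch (common_dtype, the *_dtype/*_ptr variables, and the byte strides below are made-up names for illustration, not the literal TensorIterator code):

AT_DISPATCH_ALL_TYPES(common_dtype, "add_", [&] {
  // scalar_t is the single computation type chosen by type promotion
  for (int64_t i = 0; i < n; i++) {
    // inputs are read through char* byte pointers and converted to scalar_t on the fly
    scalar_t a = c10::fetch_and_cast<scalar_t>(input1_dtype, input1_ptr + i * stride1);
    scalar_t b = c10::fetch_and_cast<scalar_t>(input2_dtype, input2_ptr + i * stride2);
    // the result is converted back to the output's dynamic dtype before being stored
    c10::cast_and_store<scalar_t>(output_dtype, output_ptr + i * stride_out, a + b);
  }
});

Only one a + b instantiation per computation dtype is generated, and the dynamic checks sit only at the memory boundary, which keeps compilation time and binary size under control.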

@zasdfgbnm
Collaborator Author

@ezyang See also #28352 (comment). The performance of the copy kernel with static vs. dynamic casting is also comparable.

@ezyang
Contributor

ezyang commented Oct 22, 2019

OK, that's a lovely explanation. Let's put it in the code? Then I'll approve.

@zasdfgbnm zasdfgbnm merged commit b3009ac into gh/zasdfgbnm/8/base Oct 22, 2019
@zasdfgbnm
Collaborator Author

It seems that I messed things up...
