Conversation


@ezyang ezyang commented May 2, 2018

In order to split ATen's CPU/CUDA code into two separate libraries
which don't require a build flag (AT_CUDA_ENABLED) to separate them,
we need to be able to split source files based on whether or not they
handle CPU functionality only, or also touch CUDA. Copy poses a unique
challenge here, because the naive implementation involves writing
a matrix for all combinations of CPU/GPU in a single file.

This PR splits up Copy.cpp into CPUCopy.cpp and CUDACopy.cpp, respecting
the following matrix:

to\from    CPU           CUDA
      +---------------------------
CPU   | CPUCopy.cpp   CUDACopy.cpp
CUDA  | CUDACopy.cpp  CUDACopy.cpp

When you run x.copy_(y) where x is CPU and y is CUDA, we do a second
virtual dispatch to copy_from(y, x) on y's type, so that we can get
from CPUCopy.cpp to CUDACopy.cpp.
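
Conceptually, CPUCopy.cpp only knows how to handle CPU sources; anything else
bounces to the source tensor's type, which is how the CUDA column of the matrix
ends up in CUDACopy.cpp. Here is a minimal, self-contained sketch of that
double-dispatch pattern (hypothetical names and types, not the generated ATen code):

// Minimal sketch of the double dispatch (hypothetical names, not ATen's code).
// "Type" stands in for the per-backend type objects emitted by the codegen.
#include <iostream>
#include <stdexcept>

struct Tensor;

struct Type {
  virtual ~Type() = default;
  virtual bool is_cuda() const = 0;
  // First dispatch: x.copy_(y) goes through the destination's type.
  virtual void copy_(Tensor& dst, const Tensor& src) const = 0;
  // Second dispatch: called on the source's type, with the arguments reversed.
  virtual void copy_from(const Tensor& src, Tensor& dst) const = 0;
};

struct Tensor { const Type* type; };

struct CPUType : Type {
  bool is_cuda() const override { return false; }
  void copy_(Tensor& dst, const Tensor& src) const override {
    if (!src.type->is_cuda()) {
      std::cout << "CPU <- CPU, handled in CPUCopy.cpp\n";
    } else {
      // CPUCopy.cpp knows nothing about CUDA: bounce to the source's type.
      src.type->copy_from(src, dst);
    }
  }
  void copy_from(const Tensor&, Tensor&) const override {
    // Not needed in this scheme: the CUDA side handles CPU sources itself.
    throw std::runtime_error("unreachable");
  }
};

struct CUDAType : Type {
  bool is_cuda() const override { return true; }
  void copy_(Tensor& dst, const Tensor& src) const override {
    std::cout << "CUDA <- " << (src.type->is_cuda() ? "CUDA" : "CPU")
              << ", handled in CUDACopy.cpp\n";
  }
  void copy_from(const Tensor& src, Tensor& dst) const override {
    // Reached via the fall-through when dst is CPU and src is CUDA.
    std::cout << "CPU <- CUDA, handled in CUDACopy.cpp\n";
  }
};

int main() {
  CPUType cpu; CUDAType cuda;
  Tensor x{&cpu}, y{&cuda};
  x.type->copy_(x, y);  // prints: CPU <- CUDA, handled in CUDACopy.cpp
}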

The new autogenerated code for CPU looks like this:

Tensor & CPUByteType::s_copy_(Tensor & dst, const Tensor & src, bool non_blocking) const {
  // code generated by copy_wrapper
  checked_cast_tensor<CPUByteTensor>(dst.pImpl, "dst", 0, false);
  switch (src.type().ID()) {
    case TypeID::CPUByte:
        THByteTensor_copyByte(static_cast<CPUByteTensor*>(dst.pImpl)->tensor, static_cast<CPUByteTensor*>(src.pImpl)->tensor);
        break;
    case TypeID::CPUChar:
        THByteTensor_copyChar(static_cast<CPUByteTensor*>(dst.pImpl)->tensor, static_cast<CPUCharTensor*>(src.pImpl)->tensor);
        break;
    ...
    default:
      return src.type().s_copy_from(src, dst, non_blocking);

Notice that the fall-through (default) case goes to s_copy_from. s_copy_from is
like s_copy_, but with the arguments reversed.
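
The shape of the two entry points is roughly the following (a sketch inferred
from the call above, not the exact generated declarations):

// Sketch only: reversing the arguments is what moves the virtual dispatch
// from dst's type (s_copy_) over to src's type (s_copy_from).
struct Tensor;  // stand-in
struct TypeSketch {
  virtual ~TypeSketch() = default;
  // First hop, dispatched on dst.type():
  virtual Tensor & s_copy_(Tensor & dst, const Tensor & src, bool non_blocking) const = 0;
  // Second hop, dispatched on src.type(); same tensors, roles swapped:
  virtual Tensor & s_copy_from(const Tensor & src, Tensor & dst, bool non_blocking) const = 0;
};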

This is a TEMPORARY state of affairs; once the multiple dispatcher is online we can get rid of all of this goo.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>


gchanan commented May 2, 2018

This doesn't seem to compile for me.


gchanan commented May 3, 2018

looks like lint is failing.

ezyang added 4 commits May 3, 2018 20:01
* Double-dispatch copy.
* Lintfix and no-CUDA fix
* Fix compilation error.
* CR
@ezyang ezyang merged commit 4abb229 into pytorch:master May 4, 2018
Jorghi12 pushed a commit to wsttiger/pytorch that referenced this pull request May 10, 2018
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018