Conversation


@ezyang ezyang commented May 2, 2018

In order to split ATen's CPU/CUDA code into two separate libraries
which don't require a build flag (AT_CUDA_ENABLED) to separate them,
we need to be able to split source files based on whether or not they
handle CPU functionality only, or also touch CUDA. Copy poses a unique
challenge here, because the naive implementation involves writing
a matrix for all combinations of CPU/GPU in a single file.

This PR splits up Copy.cpp into CPUCopy.cpp and CUDACopy.cpp, respecting
the following matrix:

to\from    CPU           CUDA
      +---------------------------
CPU   | CPUCopy.cpp   CUDACopy.cpp
CUDA  | CUDACopy.cpp  CUDACopy.cpp

When you run x.copy_(y) where x is CPU and y is CUDA, we do a second
virtual dispatch to copy_from(y, x) on y's type, so that we can get
from CPUCopy.cpp to CUDACopy.cpp.
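
Conceptually, CPUCopy.cpp only knows how to handle CPU sources; anything else
bounces to the source tensor's type, which is how the CUDA column of the matrix
ends up in CUDACopy.cpp. Here is a minimal, self-contained sketch of that
double-dispatch pattern (hypothetical names and types, not the generated ATen code):

// Minimal sketch of the double dispatch (hypothetical names, not ATen's code).
// "Type" stands in for the per-backend type objects emitted by the codegen.
#include <iostream>
#include <stdexcept>

struct Tensor;

struct Type {
  virtual ~Type() = default;
  virtual bool is_cuda() const = 0;
  // First dispatch: x.copy_(y) goes through the destination's type.
  virtual void copy_(Tensor& dst, const Tensor& src) const = 0;
  // Second dispatch: called on the source's type, with the arguments reversed.
  virtual void copy_from(const Tensor& src, Tensor& dst) const = 0;
};

struct Tensor { const Type* type; };

struct CPUType : Type {
  bool is_cuda() const override { return false; }
  void copy_(Tensor& dst, const Tensor& src) const override {
    if (!src.type->is_cuda()) {
      std::cout << "CPU <- CPU, handled in CPUCopy.cpp\n";
    } else {
      // CPUCopy.cpp knows nothing about CUDA: bounce to the source's type.
      src.type->copy_from(src, dst);
    }
  }
  void copy_from(const Tensor&, Tensor&) const override {
    // Not needed in this scheme: the CUDA side handles CPU sources itself.
    throw std::runtime_error("unreachable");
  }
};

struct CUDAType : Type {
  bool is_cuda() const override { return true; }
  void copy_(Tensor& dst, const Tensor& src) const override {
    std::cout << "CUDA <- " << (src.type->is_cuda() ? "CUDA" : "CPU")
              << ", handled in CUDACopy.cpp\n";
  }
  void copy_from(const Tensor& src, Tensor& dst) const override {
    // Reached via the fall-through when dst is CPU and src is CUDA.
    std::cout << "CPU <- CUDA, handled in CUDACopy.cpp\n";
  }
};

int main() {
  CPUType cpu; CUDAType cuda;
  Tensor x{&cpu}, y{&cuda};
  x.type->copy_(x, y);  // prints: CPU <- CUDA, handled in CUDACopy.cpp
}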

The new autogenerated code for CPU looks like this:

Tensor & CPUByteType::s_copy_(Tensor & dst, const Tensor & src, bool non_blocking) const {
  // code generated by copy_wrapper
  checked_cast_tensor<CPUByteTensor>(dst.pImpl, "dst", 0, false);
  switch (src.type().ID()) {
    case TypeID::CPUByte:
        THByteTensor_copyByte(static_cast<CPUByteTensor*>(dst.pImpl)->tensor, static_cast<CPUByteTensor*>(src.pImpl)->tensor);
        break;
    case TypeID::CPUChar:
        THByteTensor_copyChar(static_cast<CPUByteTensor*>(dst.pImpl)->tensor, static_cast<CPUCharTensor*>(src.pImpl)->tensor);
        break;
    ...
    default:
      return src.type().s_copy_from(src, dst, non_blocking);

Notice that the fall-through (default) case goes to s_copy_from. s_copy_from is
like s_copy_, but with the arguments reversed.
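
The shape of the two entry points is roughly the following (a sketch inferred
from the call above, not the exact generated declarations):

// Sketch only: reversing the arguments is what moves the virtual dispatch
// from dst's type (s_copy_) over to src's type (s_copy_from).
struct Tensor;  // stand-in
struct TypeSketch {
  virtual ~TypeSketch() = default;
  // First hop, dispatched on dst.type():
  virtual Tensor & s_copy_(Tensor & dst, const Tensor & src, bool non_blocking) const = 0;
  // Second hop, dispatched on src.type(); same tensors, roles swapped:
  virtual Tensor & s_copy_from(const Tensor & src, Tensor & dst, bool non_blocking) const = 0;
};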

This is a TEMPORARY state of affairs; once the multiple dispatcher is online we can get rid of all of this goo.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>


gchanan commented May 2, 2018

This doesn't seem to compile for me.


gchanan commented May 3, 2018

looks like lint is failing.

ezyang added 4 commits May 3, 2018 20:01
* Double-dispatch copy.
* Lintfix and no-CUDA fix
* Fix compilation error.
* CR
@ezyang ezyang merged commit 4abb229 into pytorch:master May 4, 2018
Jorghi12 pushed a commit to wsttiger/pytorch that referenced this pull request May 10, 2018
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018