Conversation

@bstriner
Contributor

Hi Guys!

This PR exports the cuDNN descriptors (TensorDescriptor, ConvolutionDescriptor, etc.). It also re-implements a previous PR that included cudnn/*.h in the install, which was knocked out by recent refactoring: #7749

Descriptors.h, Handles.h and Exceptions.h eliminate a lot of boilerplate in cudnn extensions, so this PR makes them available.

Cheers

@fmassa
Member

fmassa commented Jun 10, 2018

Hi,

I'm all in for exporting more things that make life easier when writing extensions!
But I think it's getting to be time to clearly define what is stable API and what isn't.
There is a lot of refactoring going on in ATen / TH, so many things will break. With a clearly documented stable API, users will at least know that some of the functionality they are using might change.

@bstriner
Contributor Author

@fmassa what were you thinking of specifically? API documentation, or something like more cpp_extension examples and tests?

Those specific APIs don't seem to have changed much recently, but the build itself has been refactored a lot. That said, there is a lot of code in ATen that could be refactored, especially around creating descriptors. For example, I'm pretty sure you could eliminate 3/4 duplicate descriptors in this file:
https://github.com/pytorch/pytorch/blob/master/caffe2/operators/rnn/recurrent_op_cudnn.cc

My specific use case is just implementing cudnnCTCLoss, but it should be similar for any cudnn extension. This is what I'm using (everything builds and tests correctly), but please let me know if there are others I should be using and whether this list seems reasonable:

  • at::Tensor
  • at::TensorArg, checkDim, checkScalarType, checkSize, checkSameSize, checkBackend, checkContiguous for argument checking
  • at::native::TensorDescriptor to make the cudnnTensorDescriptor_t
  • at::native::getCudnnHandle to get the cudnn handle
  • at::native::CUDNN_CHECK to check cudnn return codes
  • TORCH_CUDA_CHECK to check cuda return codes

These are the includes I ended up with:

#include <ATen/TensorUtils.h>
#include <ATen/cudnn/Descriptors.h>
#include <ATen/cudnn/Exceptions.h>
#include <ATen/cudnn/Handles.h>
#include <torch/csrc/cuda/cuda_check.h>
#include <torch/torch.h>

Would it make sense to add an extension test? It would just be running cpp_extension to see if some specified list of headers and functions link correctly in the extension.
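
For what it's worth, here is a rough sketch of what such a test could look like, using torch.utils.cpp_extension.load_inline. It is only illustrative: the TensorDescriptor constructor, the with_cuda flag discussed later in this thread, and the -lcudnn link flag (cudnn.lib on Windows) are assumptions about the setup rather than guaranteed API.

import torch
from torch.utils.cpp_extension import load_inline

# C++ source that only references the exported cudnn helpers; it does not
# need to do anything useful, it just has to compile and link.
cpp_source = """
#include <ATen/TensorUtils.h>
#include <ATen/cudnn/Descriptors.h>
#include <ATen/cudnn/Exceptions.h>
#include <ATen/cudnn/Handles.h>

int touch_cudnn_symbols(at::Tensor input) {
  at::TensorArg arg_input(input, "input", 0);
  at::checkContiguous("touch_cudnn_symbols", arg_input);
  cudnnHandle_t handle = at::native::getCudnnHandle();   // exported handle getter
  at::native::TensorDescriptor desc(input);              // assumed Tensor constructor
  (void)handle;
  return 0;
}
"""

ext = load_inline(
    name='cudnn_export_test',
    cpp_sources=cpp_source,
    functions=['touch_cudnn_symbols'],
    with_cuda=True,               # pure C++ sources, but CUDA include/lib paths are still needed
    extra_ldflags=['-lcudnn'],    # cudnn.lib on Windows; assumes cudnn is on the linker path
    verbose=True)

# Requires a CUDA/cudnn build of pytorch and a visible GPU.
assert ext.touch_cudnn_symbols(torch.randn(2, 2).cuda()) == 0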

@bstriner
Contributor Author

@pytorchbot retest this please.

@soumith
Contributor

soumith commented Jun 10, 2018

cc: @ssnl, who was looking into implementing cudnnCTCLoss bindings as well; looks like Ben has done some work :)

@fmassa
Member

fmassa commented Jun 10, 2018

I think that everything currently exposed in the public ATen API is considered stable, but it would be good to have some clear instructions to avoid problems like the one in #8267

@bstriner
Contributor Author

@soumith @ssnl I actually implemented them a little while ago, but I'm having some fun trying to use more torch APIs. I think I've got the SLOC down to a bare minimum (with things like this PR). The guy I was working with was away for a while and the code is under his GitHub, but it will probably be released this week or so.

@bstriner
Contributor Author

@ssnl just saw dotaPredict. Awesome! If you're ever doing any dota-based projects, please let me know.

@bstriner
Contributor Author

@soumith @peterjc123 Any thoughts on an approach to this current error (unrelated to this PR)? It looks like onnx sets /WX (treat warnings as errors) and includes protobuf, and protobuf has a small warning in it (forcing an int to bool). Making that an atomic bool instead of an atomic int is marked as a TODO in protobuf (at least in the version pinned as the submodule).

Maybe remove the /WX from onnx, fix or suppress the warning in protobuf, or try updating the submodules to see if there are any changes. I'm not sure there would be a good way to inject a pragma disable from torch that would propagate through onnx to protobuf.

Relevant build log below:

 C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64\CL.exe /c /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src" /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\cmake\..\third_party\eigen" /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\cmake\..\third_party\pybind11\include" /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\cmake\..\third_party\cub" /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx" /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\onnx" /nologo /W3 /WX /MP /O2 /Ob2 /D WIN32 /D _WINDOWS /D NDEBUG /D ONNX_NAMESPACE=onnx_c2 /D "CMAKE_INTDIR=\"Release\"" /D _MBCS /Gm- /EHsc /MT /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR /Fo"onnx_proto.dir\Release\\" /Fd"onnx_proto.dir\Release\onnx_proto.pdb" /Gd /TP /wd4146 /errorReport:queue "C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx_onnx_c2.pb.cc" "C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx-operators_onnx_c2.pb.cc"
...
20:26:13      3>Done Building Project "C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\caffe2\caffe2_protos.vcxproj" (default targets).
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/io/coded_stream.h(869): error C2220: warning treated as error - no 'object' file generated (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/io/coded_stream.h(869): warning C4800: 'google::protobuf::internal::Atomic64': forcing value to bool 'true' or 'false' (performance warning) (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/io/coded_stream.h(869): error C2220: warning treated as error - no 'object' file generated (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx-operators_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/io/coded_stream.h(869): warning C4800: 'google::protobuf::internal::Atomic64': forcing value to bool 'true' or 'false' (performance warning) (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx-operators_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/generated_message_util.h(160): warning C4800: 'google::protobuf::uint32': forcing value to bool 'true' or 'false' (performance warning) (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/generated_message_util.h(160): warning C4800: 'google::protobuf::uint32': forcing value to bool 'true' or 'false' (performance warning) (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx-operators_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]

@bstriner
Contributor Author

This looks like the culprit: onnx/onnx#1104

If we just set onnx to the version before that, it would be a temporary fix.

@bstriner
Contributor Author

And this looks like the fix (1 hr ago): onnx/onnx#1105

Thanks @bddppq !

@ezyang
Contributor

ezyang commented Jun 11, 2018

Hey @bstriner, as author of these descriptor classes, I can say a little more about what @fmassa is talking about.

Basically, if I'm writing some code which I know is only going to be used in one context, I may take shortcuts / make assumptions that I wouldn't make if I was designing a general purpose API. Glancing over the descriptor classes, I can see a few assumptions I made:

  • FilterDescriptor hard-codes CUDNN_TENSOR_NCHW (because that's the only layout we expose at the moment)
  • Similarly, ConvolutionDescriptor, RNNDescriptor, and SpatialTransformerDescriptor hard-code some parameter choices

And there are some arguably questionable design choices, such as having mut_desc be responsible for actually initializing the descriptor.

So the danger is that, at some point, someone will want to add support for something that is not in the code, and the most straightforward thing to do will be to just refactor the (public!) APIs of these classes so that they take one more parameter or something. We'll accept that patch because none of the tests test for a particular API, and then your code will break.

That being said, I'm not opposed to making this part of the public API and committing to supporting these classes for the foreseeable future; they're pretty self-contained, and unlikely to be misused, so keeping them BC shouldn't be a big deal. But we haven't really set up a process for making sure we don't accidentally break pieces like this, which is why @fmassa is nervous :)

@bstriner
Contributor Author

@ezyang I was actually thinking about some updates along those lines. I like the idea of some central code for creating descriptors but it could use some extra options/tests/whatever.

Personally, I'm only using TensorDescriptor, I just figured it would make sense to export the rest of that header while I was in there.

It's not the end of the world if I go back to calling cudnnSetTensorNdDescriptor myself in my own helper, but TensorDescriptor is just so much cleaner if you already have a bunch of at::Tensors. Same with things like CUDNN_CHECK: easy to write, but it makes much more sense to just import it from pytorch.

If the concern is that something will break, I was thinking about just adding a test: write a CPP file that uses all of the relevant functions, call cpp_extension, and assert that everything links correctly. It doesn't need to actually do anything in particular, but you could also add some functional tests. That would check that the headers are in the right place and that the functions are exported correctly.

@bstriner
Contributor Author

You might have a point about the mutex. I'll have to check if I can see any noticeable performance difference between inlining everything and using the exports, but I don't expect much.

@ezyang
Contributor

ezyang commented Jun 11, 2018

Yes, I think some tests would be just the ticket!


@ezyang ezyang left a comment
Contributor

Fix the executable bit on the files and this is OK to merge.

@goldsborough goldsborough left a comment
Contributor

I need some time to look at this; this change is new to me.

@goldsborough goldsborough left a comment
Contributor

Please take another look. I think there are some necessary changes

(Several inline review comments on the new cudnn test extension and the cpp_extension changes were marked as off-topic.)

@bstriner
Contributor Author

@goldsborough going through the changes now. I think some sort of extra flag is required if you want the cuda libraries but don't happen to be building a new kernel. An example use case is if you compiled your kernels separately or something.

What is your take on the actual semantics/name/documentation of that option?

An extra option like with_cudnn might make sense as well. It would be more stable than hardcoding the ldflags like I did in the test. The test works fine on CI, but if cudnn.lib is off the path it might not work. I would probably do that as a separate issue, though, since I would want to add it to both the JIT and setup paths. Torch finds cudnn during install, so extensions that need cudnn should use the same linkage.

@goldsborough
Contributor

@bstriner I just replied inline to say that we should keep with_cuda, but default it to None.

I did see the extra cudnn library includes in the test, and it looks a bit fragile. I'd be happy to add with_cudnn, but I'm not sure how easy it is to determine where cudnn lives. If it's always in the same CUDA_HOME directory as the normal cuda libs, this would be fine. Happy to hear your thoughts on this.

@bstriner
Contributor Author

If it is in the same location as cuda, then it isn't a problem. Some people have multiple cudnn installs in separate directories, which is a complete pain.

One possibility is to pull the paths found by cmake and use them as extra include and library paths if they exist. That would mean cudnn would work if it is in CUDA_HOME or in the same place as when pytorch was built; whatever libraries you built with would be used for extensions.

That should cover most situations. I'm not sure there is a situation where you would want to compile with one cudnn in one directory and build extensions against a different one in a different directory.

https://github.com/pytorch/pytorch/blob/master/cmake/Modules/FindCuDNN.cmake

@goldsborough
Contributor

Not all users of C++ extensions will be compiling PyTorch from source, however, so there may not be any cmake configuration to pull from.

I think it's ok to add with_cudnn and assume it is in CUDA_HOME. If users have complicated setups, they can always set with_cuda=False and pass the libraries themselves; that's totally fine.

@bstriner
Contributor Author

The cmake paths would just be a hint, so if there isn't anything there, it isn't going to do any worse than it did before. The current test just assumes cudnn is in one of the known paths (including CUDA_HOME), which should work for most situations. Maybe the simplest workaround would be to just respect some sort of CUDNN_HOME environment variable and get includes and libraries from there if provided (see the sketch below).

Will have to think about it some more and maybe start another issue later.
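
Roughly something like this (purely a sketch: CUDNN_HOME is a hypothetical convention here, not something torch reads today, and the flags are Linux-style):

import os
from torch.utils.cpp_extension import load

cudnn_home = os.environ.get('CUDNN_HOME')  # hypothetical user-provided location
include_paths = [os.path.join(cudnn_home, 'include')] if cudnn_home else []
# Linker flags are platform dependent: cudnn.lib on Windows, -lcudnn elsewhere.
ldflags = (['-L' + os.path.join(cudnn_home, 'lib64')] if cudnn_home else []) + ['-lcudnn']

ext = load(name='my_cudnn_ext',
           sources=['my_cudnn_ext.cpp'],
           with_cuda=True,
           extra_include_paths=include_paths,
           extra_ldflags=ldflags)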

@bstriner bstriner requested a review from fmassa as a code owner June 20, 2018 23:22
@bstriner
Contributor Author

@goldsborough I think I've got all of the comments addressed now. Thanks for all the help! The biggest change is the parameter logic and documentation.

Existing logic:

        with_cuda = any(map(_is_cuda_file, sources))
        ...
        if _is_cuda_file(source_file):
            rule = 'cuda_compile'
            ...
        else:
            rule = 'compile'
            ...

My first version:

        with_cuda = with_cuda or any(map(_is_cuda_file, sources))
        ...
        if _is_cuda_file(source_file):
            rule = 'cuda_compile'
            ...
        else:
            rule = 'compile'
            ...

New logic based on this discussion:

        if with_cuda is None:
            with_cuda = any(map(_is_cuda_file, sources))
        ...
        if _is_cuda_file(source_file) and with_cuda:
            rule = 'cuda_compile'
            ...
        else:
            rule = 'compile'
            ...

Most people would use with_cuda=True or with_cuda=None but maybe someone will find with_cuda=False useful.
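
For illustration, a hypothetical usage of the resulting flag from the JIT side (file names made up, and assuming with_cuda is plumbed through to torch.utils.cpp_extension.load as in this PR):

from torch.utils.cpp_extension import load

# Default (with_cuda=None): CUDA support is inferred from the presence of .cu sources.
ext = load(name='inferred_cuda_ext', sources=['op.cpp', 'op_kernel.cu'])

# Pure C++ extension that still needs the CUDA/cudnn include and library paths,
# e.g. kernels compiled separately or a cudnn-only extension.
ext = load(name='forced_cuda_ext', sources=['op.cpp'], with_cuda=True)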

@bstriner
Contributor Author

@pytorchbot retest this please.

@bstriner
Contributor Author

Currently some third_party issues are blocking the tests (onnx sqrt issue). Needs retest after those are resolved.

@bstriner
Contributor Author

Looks like this commit changed sqrt in onnx: onnx/onnx@068f1a4

@goldsborough goldsborough left a comment
Contributor

Looks good! Thanks a lot for implementing this

@bstriner
Contributor Author

I just pushed something to onnx that should fix the issue:

onnx/onnx#1129

@bstriner
Contributor Author

@pytorchbot retest this please.

@bstriner
Contributor Author

@goldsborough @ezyang @fmassa Thanks for all the comments! Looks like that onnx PR got CI working again. I think this is now good to go, but please let me know if there is anything else missing.

Possible follow-up tasks:

  • More tests for descriptors
  • Add with_cudnn flag similar to with_cuda
  • Add clang-format checks to CI
  • Add permission checks to CI (e.g., 664 for all cpp)

@goldsborough goldsborough merged commit 4f604a4 into pytorch:master Jun 21, 2018
@bstriner bstriner deleted the export_TensorDescriptor branch June 21, 2018 05:33