Export tensor descriptor #8313

bstriner · 2018-06-10T04:01:32Z

Hi Guys!

This exports descriptors (TensorDescriptor, ConvolutionDescriptor, etc.). Also re-implements previous PR to include cudnn/*.h in the install, which was knocked out by recent refactoring: #7749

Descriptors.h, Handles.h and Exceptions.h eliminate a lot of boilerplate in cudnn extensions, so this PR makes them available.

Cheers

…/pytorch into export_TensorDescriptor

fmassa · 2018-06-10T08:34:58Z

Hi,

I'm all in for exporting more things that makes life easier when writing extensions!
But I think it's getting time to clearly define what is stable API and what isn't.
There is a lot of refactoring going on in ATen / TH, so that many things will break. With a clear stable API explained users will at least know that a few functionalities that they are using might change.

bstriner · 2018-06-10T20:15:20Z

@fmassa what were you thinking specifically? API documentation or something like more cpp_extension examples and tests.

Those specific APIs don't seem to have changed much recently, but the build itself has been refactored a lot. That said, there is a lot of code in ATen that could be refactored, especially around creating descriptors. For example, I'm pretty sure you could eliminate 3/4 duplicate descriptors in this file:
https://github.com/pytorch/pytorch/blob/master/caffe2/operators/rnn/recurrent_op_cudnn.cc

My specific use case is just implementing cudnnCTCLoss but it should be similar to any cudnn extension. These are what I'm using, and everything builds and tests correctly, but please let me know if there are some others that I should be using and if this list seems reasonable:

at::Tensor
at::TensorArg, checkDim, checkScalarType, checkSize, checkSameSize, checkBackend, checkContiguous for argument checking
at::native::TensorDescriptor to make the cudnnTensorDescriptor_t
at::native::getCudnnHandle to get the cudnn handle
at::native::CUDNN_CHECK to check cudnn return codes
TORCH_CUDA_CHECK to check cuda return codes

These are the includes I ended up with:

#include <ATen/TensorUtils.h>
#include <ATen/cudnn/Descriptors.h>
#include <ATen/cudnn/Exceptions.h>
#include <ATen/cudnn/Handles.h>
#include <torch/csrc/cuda/cuda_check.h>
#include <torch/torch.h>

Would it make sense to add an extension test? It would just be running cpp_extension to see if some specified list of headers and functions link correctly in the extension.

bstriner · 2018-06-10T20:16:35Z

@pytorchbot retest this please.

soumith · 2018-06-10T20:29:35Z

cc: @ssnl who was looking into implementing cudnnCTCLoss bindings as well, looks like Ben has done some work :)

fmassa · 2018-06-10T20:37:35Z

I think that everything which is currently exposed in the public ATen is considered stable, but I think it would be good to have some clear instructions to avoid problems like the one in #8267

bstriner · 2018-06-10T20:38:39Z

@soumith @ssnl actually implemented them a little while ago, but having some fun trying to use more torch APIs. I think I've got the SLOC to a bare minimum (with things like this PR). Guy I was working with was away for a while and it is under his github, but probably will release this week or so.

bstriner · 2018-06-10T21:13:40Z

@ssnl just saw dotaPredict. Awesome! If you're ever doing any some dota-based projects please let me know.

…ptor

bstriner · 2018-06-10T21:46:28Z

@soumith @peterjc123 Any thoughts on an approach to this current error (unrelated to this PR)? It looks like onnx sets /WX (treat warnings as errors) and includes protobuf, and protobuf has a small warning in it (forcing int to bool). Making that an atomic bool instead of an atomic int is marked as a todo in protobuf (at least the version that is the submodule).

Maybe remove the /WX from onnx, fix or supress the warning in protobuf, or try updating the submodules to see if there are any changes. Not sure if there would be a good way to inject a pragma disable from torch that would propagate through onnx to protobuf.

Relevant build log below:

 C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64\CL.exe /c /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src" /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\cmake\..\third_party\eigen" /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\cmake\..\third_party\pybind11\include" /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\cmake\..\third_party\cub" /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx" /I"C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\onnx" /nologo /W3 /WX /MP /O2 /Ob2 /D WIN32 /D _WINDOWS /D NDEBUG /D ONNX_NAMESPACE=onnx_c2 /D "CMAKE_INTDIR=\"Release\"" /D _MBCS /Gm- /EHsc /MT /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR /Fo"onnx_proto.dir\Release\\" /Fd"onnx_proto.dir\Release\onnx_proto.pdb" /Gd /TP /wd4146 /errorReport:queue "C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx_onnx_c2.pb.cc" "C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx-operators_onnx_c2.pb.cc"
...
20:26:13      3>Done Building Project "C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\caffe2\caffe2_protos.vcxproj" (default targets).
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/io/coded_stream.h(869): error C2220: warning treated as error - no 'object' file generated (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/io/coded_stream.h(869): warning C4800: 'google::protobuf::internal::Atomic64': forcing value to bool 'true' or 'false' (performance warning) (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/io/coded_stream.h(869): error C2220: warning treated as error - no 'object' file generated (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx-operators_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/io/coded_stream.h(869): warning C4800: 'google::protobuf::internal::Atomic64': forcing value to bool 'true' or 'false' (performance warning) (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx-operators_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/generated_message_util.h(160): warning C4800: 'google::protobuf::uint32': forcing value to bool 'true' or 'false' (performance warning) (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]
20:26:13     19>C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\third_party\protobuf\src\google/protobuf/generated_message_util.h(160): warning C4800: 'google::protobuf::uint32': forcing value to bool 'true' or 'false' (performance warning) (compiling source file C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx\onnx-operators_onnx_c2.pb.cc) [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\third_party\onnx\onnx_proto.vcxproj]

bstriner · 2018-06-10T21:51:44Z

This looks like the culprit: onnx/onnx#1104

If we just set onnx to the version before that, it would be a temporary fix.

bstriner · 2018-06-10T21:57:23Z

And this looks like the fix (1 hr ago): onnx/onnx#1105

Thanks @bddppq !

ezyang · 2018-06-11T00:35:55Z

Hey @bstriner, as author of these descriptor classes, I can say a little more about what @fmassa is talking about.

Basically, if I'm writing some code which I know is only going to be used in one context, I may take shortcuts / make assumptions that I wouldn't make if I was designing a general purpose API. Glancing over the descriptor classes, I can see a few assumptions I made:

FilterDescriptor hard-codes CUDNN_TENSOR_NCHW (because that's the only layout we expose at the moment)
Similarly, ConvolutionDescriptor, RNNDescriptor, and SpatialTransformerDescriptor hard-code some parameter choices

And there are some maybe questionable design choices, such as having mut_desc be responsible for actually initializing the descriptor.

So, the danger is, at some point someone will want to actually add support for something that is not in the code, and the most straightforward thing to do will be to just refactor the (public APIs!) of these classes so that they have one more parameter or something, and we'll accept that patch because none of the tests test for a particular API, and then your code will break.

That being said, I'm not opposed to making this part of the public API and committing to supporting them for the forseeable future; they're pretty self-contained classes, and unlikely to be misused, so keeping them BC shouldn't be a big deal. But we haven't really setup the process for making sure we don't accidentally break pieces like this; which is why @fmassa is nervous :)

bstriner · 2018-06-11T00:52:43Z

@ezyang I was actually thinking about some updates along those lines. I like the idea of some central code for creating descriptors but it could use some extra options/tests/whatever.

Personally, I'm only using TensorDescriptor, I just figured it would make sense to export the rest of that header while I was in there.

It's not the end of the world if I go back to calling cudnnSetTensorNdDescriptor myself in my own helper, but TensorDescriptor is just so much cleaner if you already have a bunch of at::Tensors. Same with things like CUDNN_CHECK. Easy to write but makes much more sense to just import from pytorch.

If the concern is that something will break, I was thinking about just adding a test. Write a CPP file that uses all of the relevant functions, call cpp_extension and assert that everything links correctly. Doesn't need to actually do anything in particular but you could also add some tests. That would check that the headers are in the right place and that the functions are exported correctly.

bstriner · 2018-06-11T05:25:59Z

You might have a point about the mutex. I'll have to check if I can see any noticeable performance difference between inlining everything and using the exports, but I don't expect much.

ezyang · 2018-06-11T15:06:00Z

Yes, I think some tests would be just the ticket!

…ptor

torch/utils/cpp_extension.py

            bindings. If a dictionary is given, it should map function names to
            docstrings (which are otherwise just the function names).
+        with_cuda: If ``cuda_sources`` is provided, CUDA will always be
+            enabled. Use this flag for pure CPP extensions that require CUDA.


ezyang

Fix the executable bit on the files and this is OK to merge.

goldsborough

I need some time to look at this, this change is new to me

goldsborough

Please take another look. I think there are some necessary changes