Autocast support for cudnn RNNs #42385
Conversation
… if bias=False, need to debug.
💊 CI failures summary (Dr. CI, as of commit 8cf8c98): ci.pytorch.org: 1 failed.
oh god I probably should review this XD
Thanks for volunteering 🚪🔥 abandon hope all ye who enter here. Given the complexity of the code and the small number of people who have safaried through it, I made the test exhaustive to keep us on track. The best thing I can say about my implementation is "if it's stupid and it works, it ain't stupid."
ngimel left a comment
Sorry, can't continue today.
aten/src/ATen/autocast_mode.cpp
Outdated
```
#include <ATen/cuda/CUDAConfig.h>

#if AT_CUDNN_ENABLED()
#include <ATen/native/cudnn/RNNUtils.h>
```
Danger, danger, Will Robinson: ATen/autocast_mode.cpp is compiled as part of ATen_cpu and it should not access any headers in the CUDA directory. You will probably have to chuck these autocast wrappers in a separate file in ATen/cuda.
Hmmm, but if I move my wrapper (along with the above include) to an inline function in a header in ATen/cuda, then include that header in autocast_mode.cpp, it seems like I'm no better off.
Do you mean I should put my wrapper (_cudnn_rnn_cast_reflatten) declaration+definition in, say, ATen/cuda/AutocastRNN.h+.cpp,
have ATen/autocast_mode.cpp include AutocastRNN.h (or forward declare _cudnn_rnn_cast_reflatten),
and have ATen/cuda/AutocastRNN.cpp be the thing that includes ATen/native/cudnn/RNNUtils.h?
That would mean ATen/native/cudnn/RNNUtils.h doesn't wind up directly included in autocast_mode.cpp. Or am I misinterpreting?
Also, are the files in ATen/cuda compiled into a separate library from ATen_cpu? If so, and I do the above, should I declare _cudnn_rnn_cast_reflatten with TORCH_CUDA_API?
Would doing the above using ATen/cudnn instead of ATen/cuda to host my files be equally valid? If so, I think cudnn rather than cuda is a more appropriate home for the RNN wrapper.
You should do the registration inside the cpp file in cuda. Then you do not need to include the header from ATen/autocast_mode.cpp.
Feel free to put it in cudnn directory; both dirs end up in torch_cuda library in the end.
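A minimal sketch of what that could look like, assuming the wrapper lands in a torch_cuda-compiled file such as ATen/cudnn/AutocastRNN.cpp (the file name and the elided wrapper argument list are illustrative; TORCH_LIBRARY_IMPL is the registration macro PyTorch actually uses for dispatch-key overrides):

```
// AutocastRNN.cpp (illustrative name) -- compiled into torch_cuda, so it
// can include cudnn headers that ATen_cpu must never see.
#include <ATen/native/cudnn/RNNUtils.h>
#include <torch/library.h>

namespace at {
namespace autocast {

// Wrapper defined entirely in this translation unit; its signature must
// match the aten::_cudnn_rnn schema exactly (argument list elided here).
std::tuple<Tensor, Tensor, Tensor, Tensor, Tensor>
_cudnn_rnn_cast_reflatten(/* ...same arguments as _cudnn_rnn... */);

// Registering against the Autocast dispatch key from this file means
// ATen/autocast_mode.cpp never needs to include or even declare any of
// this, so no CUDA headers leak into ATen_cpu.
TORCH_LIBRARY_IMPL(aten, Autocast, m) {
  m.impl("_cudnn_rnn", TORCH_FN(_cudnn_rnn_cast_reflatten));
}

} // namespace autocast
} // namespace at
```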
Hopefully resolved.
```
// Utilities exposed in RNNUtils.h
namespace cudnn_rnn {

TORCH_CUDA_API std::tuple<Tensor, std::vector<Tensor>> copy_weights_to_flat_buf_views(
```
Any substantive change to logic here?
The added knobs are there to make sure it can service both _cudnn_rnn_flatten_weight and autocast::_cudnn_rnn_cast_reflatten (see the signature sketch below). The existing behavior of _cudnn_rnn_flatten_weight should remain unchanged.
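For concreteness, here is the helper's signature as it can be reconstructed from the GCC error quoted later on this page; the parameter names and the per-knob comments are guesses, but the types and the trailing "knob" arguments match that error:

```
// ATen/native/cudnn/RNNUtils.h -- parameter names are guesses; the types
// are taken from the compiler error quoted further down this thread.
TORCH_CUDA_API std::tuple<Tensor, std::vector<Tensor>> copy_weights_to_flat_buf_views(
    TensorList weight_arr,
    int64_t weight_stride0,
    int64_t input_size,
    int64_t mode,
    int64_t hidden_size,
    int64_t num_layers,
    bool batch_first,
    bool bidirectional,
    cudnnDataType_t flat_buf_datatype,      // knob: dtype of the flat buffer
    const TensorOptions& flat_buf_options,  // knob: how/where to allocate it
    bool set_orig_weights_to_flat_buf,      // knob (guessed semantics): keep
                                            //   _cudnn_rnn_flatten_weight's in-place behavior
    bool allow_type_change,                 // knob (guessed semantics)
    bool include_bias);                     // knob (guessed semantics)
```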
While I am still not sure why you had to factor out a chunk of code into a separate helper, overall the changes seem reasonable and lightweight. Happy to approve when this is out of WIP.
```
# so they get a dedicated test.
# Despite the large number of RNN cases it tries, the test takes < 15 seconds on a Titan V (similar to V100).
@unittest.skipIf(not TEST_CUDNN, 'CUDNN not available')
def test_autocast_rnn(self):
```
cc @mruberry for an interesting ad hoc testing example
The rocm failure is real; there's also a lint error.
Thanks for the quick review! Users will be happy about this. Oops, I left the lint failure deliberately as a reminder to discuss whether the test should be tried without cudnn as well. What do you think? As for rocm, are we ok to …
Yes, …
Failures look spurious now (rocm failure is in test_nn).
facebook-github-bot left a comment
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
OK, I need to lock changes to this PR from the OSS side, as it looks like this PR needs fbcode-side build system changes.
Summary:
In some versions of GCC, the tuple constructor from an initializer list is marked as explicit, which results in the following compilation error:
```
/var/lib/jenkins/workspace/aten/src/ATen/native/cudnn/RNN.cpp: In function 'std::tuple<at::Tensor, std::vector<at::Tensor, std::allocator<at::Tensor> > > at::native::cudnn_rnn::copy_weights_to_flat_buf_views(at::TensorList, int64_t, int64_t, int64_t, int64_t, int64_t, bool, bool, cudnnDataType_t, const c10::TensorOptions&, bool, bool, bool)':
/var/lib/jenkins/workspace/aten/src/ATen/native/cudnn/RNN.cpp:687:35: error: converting to 'std::tuple<at::Tensor, std::vector<at::Tensor, std::allocator<at::Tensor> > >' from initializer list would use explicit constructor 'constexpr std::tuple<_T1, _T2>::tuple(_U1&&, _U2&&) [with _U1 = at::Tensor&; _U2 = std::vector<at::Tensor>&; <template-parameter-2-3> = void; _T1 = at::Tensor; _T2 = std::vector<at::Tensor>]'
return {weight_buf, params_arr};
```
This regression was introduced by #42385
Fixes #{issue number}
Pull Request resolved: #43244
Reviewed By: pbelevich
Differential Revision: D23205656
Pulled By: malfet
fbshipit-source-id: 51470386ad95290c7c99d733fc1fe655aa27d009
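Distilled into a self-contained repro (hypothetical types; the std::make_tuple spelling is one conventional workaround for those toolchains):

```
#include <tuple>
#include <vector>

// On some older GCC/libstdc++ versions, tuple's converting constructor
// is marked explicit, so copy-list-initialization of the return value is
// rejected with "would use explicit constructor".
std::tuple<int, std::vector<int>> make_result(int a, std::vector<int> v) {
  // return {a, v};               // fails to compile on those GCC versions
  return std::make_tuple(a, v);   // explicit construction works everywhere
}

int main() {
  auto r = make_result(1, {2, 3});
  return std::get<0>(r);
}
```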
Should close #36428.
The cudnn RNN API expects weights to occupy a flat buffer in memory with a particular layout. This PR implements a "speed of light" fix:
`_cudnn_rnn_cast_reflatten` (the autocast wrapper assigned to `_cudnn_rnn`) copies weights to the right slices of a flat FP16 buffer with a single read/write per weight (as opposed to casting them to FP16 individually, then reflattening the individual FP16 weights, which would require two reads/writes per weight). It isn't pretty, but IMO it doesn't make the rnn bindings much more tortuous than they already are.
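As a rough sketch of the single read/write idea (the helper name and the naive sequential layout are illustrative only; the actual wrapper must place each slice exactly where cudnn's flat-buffer layout expects it):

```
#include <ATen/ATen.h>

// Illustrative only -- not the PR's code. Allocate one flat FP16 buffer,
// then cast-and-copy each FP32 weight straight into its slice: copy_()
// converts dtype during the copy, so every weight is read once and
// written once, with no intermediate per-weight FP16 tensors.
// Assumes `weights` is non-empty; lays slices out sequentially.
at::Tensor flatten_weights_to_fp16(at::TensorList weights) {
  int64_t total = 0;
  for (const auto& w : weights) total += w.numel();
  at::Tensor flat = at::empty({total}, weights[0].options().dtype(at::kHalf));
  int64_t offset = 0;
  for (const auto& w : weights) {
    flat.narrow(0, offset, w.numel()).copy_(w.reshape({-1}));
    offset += w.numel();
  }
  return flat;
}
```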
The test tries a forward under autocast and a backward for the full cross product of RNN options and input/weight/hidden dtypes. As with all FP16-list autocast tests, forward output and backward grads are checked against a control where inputs (including RNN module weights in this case) are pre-cast to FP16 on the Python side.
Not sure who to ask for review; tagging @ezyang and @ngimel because Ed wrote this file (almost 2 years ago) and Natalia did the most recent major surgery.
Side quests discovered: