Enable TF32 support for cuDNN #40737
Conversation
💊 CI failures summary and remediations

As of commit 2d85abe (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:
I think we need something like this:

```cpp
} else if (dataType == CUDNN_DATA_FLOAT && !allow_tf32) {
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= 8000
  cudnnSetRNNMatrixMathType(mut_desc(), CUDNN_FMA_MATH);
#endif
}
```

where the math mode is set for RNNs as well. (GitHub won't let me comment on the line directly.)
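For context, a minimal sketch of how the RNN descriptor could pick its math type, assuming `mut_desc()` returns a `cudnnRNNDescriptor_t` and `allow_tf32` mirrors `torch.backends.cudnn.allow_tf32` (the helper name `setRNNMathType` is hypothetical, not PyTorch's actual code):

```cpp
#include <cudnn.h>

// Hypothetical helper: applies the !allow_tf32 branch suggested above,
// alongside the usual half-precision tensor-op case.
void setRNNMathType(cudnnRNNDescriptor_t desc, cudnnDataType_t dataType,
                    bool allow_tf32) {
  if (dataType == CUDNN_DATA_HALF) {
    cudnnSetRNNMatrixMathType(desc, CUDNN_TENSOR_OP_MATH);
  } else if (dataType == CUDNN_DATA_FLOAT && !allow_tf32) {
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= 8000
    // cuDNN 8 lets FP32 use TF32 tensor cores by default on Ampere;
    // CUDNN_FMA_MATH opts back into plain FP32 FMAs.
    cudnnSetRNNMatrixMathType(desc, CUDNN_FMA_MATH);
#endif
  }
  // Otherwise leave the default math type, which permits TF32 where available.
}
```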
Summary: For mobile custom build, we only generate code for ops that are used by specific models to reduce binary size. There are multiple places where we apply the op filtering:

- generated_unboxing_wrappers_*.cpp
- autograd/VariableType*.cpp
- c10 op registration (in aten/gen.py)

For c10 op registration, we filter by the main op name: all overloads that match the main op name part are kept. For generated_unboxing_wrappers_*, we filter by the full op name: only those having exactly the same overload name are kept. This PR changes the generated_unboxing_wrappers_* and autograd/VariableType*.cpp codegen to also filter by the main op name. The reasons are:

- keeping all overloads gives better backward compatibility;
- generated_unboxing_wrappers_* are relatively small, as they only contain thin wrappers for root ops;
- generated_unboxing_wrappers_* will be replaced by c10 op registration soon anyway;
- autograd/VariableType*.cpp are not included in the OSS build.

Why does it offer better backward compatibility? #40737 is an example: it introduced a new `_convolution` overload and renamed the original one to `_convolution.deprecated`. Before this PR, a model prepared by an old version of PyTorch would not be able to run on the custom mobile build generated on that PR, because `_convolution.deprecated` would not be kept in the custom build under the full-op-name matching policy. By relaxing it to a partial matching policy, the mobile custom build CI on the PR can pass. Will test the size impact for the FB production build before landing.

Pull Request resolved: #42214. Differential Revision: [D22809564](https://our.internmc.facebook.com/intern/diff/D22809564). Test Plan: Imported from OSS. Reviewed By: iseeyuan. Pulled By: ljk53. fbshipit-source-id: e2fc017da31f38b9430cc2113f33e6d21a0eaf0b
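The two matching policies differ only in how much of the schema name they compare. A minimal sketch of the distinction (the real filtering lives in the Python codegen; these helper names are made up for illustration):

```cpp
#include <string>
#include <unordered_set>

// "aten::_convolution.deprecated" -> "aten::_convolution"
std::string rootOpName(const std::string& opName) {
  auto dot = opName.find('.');
  return dot == std::string::npos ? opName : opName.substr(0, dot);
}

// Full-name matching: only exact overloads survive the custom build.
bool keepOpFullMatch(const std::string& opName,
                     const std::unordered_set<std::string>& selected) {
  return selected.count(opName) > 0;
}

// Partial (main-op-name) matching: every overload of a selected op survives,
// so an old model calling "aten::_convolution.deprecated" still loads.
bool keepOpPartialMatch(const std::string& opName,
                        const std::unordered_set<std::string>& selected) {
  return selected.count(rootOpName(opName)) > 0;
}
```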
FYI, if you rebase on #42214 the OSS mobile CIs should pass. @ngimel could you please take a look at my question on #42214 (comment)? 1) What's the reason we want to keep the old schema as `_convolution.deprecated`? (Can't the non-mobile case handle BC with the default value?) 2) If we do want to keep the old schema, can we give the new schema a different overload name instead? This way we can work around internal lite-interpreter BC problems easily.
@zasdfgbnm per @ljk53's point, it makes sense, instead of creating a new overload for `_convolution`, to just add an argument with a default value. Server-side BC should be able to handle it, and that should make the mobile fixes easier?
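To make the two options concrete, a hypothetical C++-level sketch (the real schemas live in native_functions.yaml; these signatures are simplified stand-ins, not the actual `_convolution` signature):

```cpp
struct Tensor {};

// Option A (what the PR does): keep the old schema as a separate
// "_convolution.deprecated" overload next to the extended one.
Tensor convolution_deprecated(const Tensor& input) { return {}; }
Tensor convolution(const Tensor& input, bool allow_tf32) { return {}; }

// Option B (suggested here): one schema with a defaulted trailing argument,
// so old call sites keep working without a second overload.
Tensor convolution_defaulted(const Tensor& input, bool allow_tf32 = true) {
  return {};
}
```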
When a default argument is added, it does not break backward compatibility (BC) for full-JIT, but it does break BC for mobile bytecode, as in #40737. To make the bytecode BC in this case, we:

1. Introduce kMinSupportedBytecodeVersion. The loaded model version should be between kMinSupportedBytecodeVersion and kProducedBytecodeVersion.
2. If an operator is updated and we can handle BC, bump kProducedBytecodeVersion (for example, from 3 to 4).
3. If the model version is at the older version of the operator, add an adapter function at loading. For the added default arg, we push this default arg onto the stack before calling the actual operator function (sketched below).
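A minimal sketch of that scheme, with hypothetical names and a simplified stack type (the real loader lives in the mobile interpreter; only the constant names and the 3-to-4 bump follow this thread):

```cpp
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

constexpr uint64_t kMinSupportedBytecodeVersion = 0x3L;
constexpr uint64_t kProducedBytecodeVersion = 0x4L;  // bumped 3 -> 4 for the new arg

using Stack = std::vector<double>;           // stand-in for the interpreter stack
using OpFn = std::function<void(Stack&)>;

// Step 1: reject models outside the supported version window.
void checkBytecodeVersion(uint64_t modelVersion) {
  if (modelVersion < kMinSupportedBytecodeVersion ||
      modelVersion > kProducedBytecodeVersion) {
    throw std::runtime_error("unsupported bytecode version");
  }
}

// Step 3: for a model built before the op gained its default argument,
// wrap the new op so the default is pushed onto the stack before dispatch.
OpFn adaptOpWithNewDefaultArg(OpFn newOp, double defaultValue) {
  return [newOp, defaultValue](Stack& stack) {
    stack.push_back(defaultValue);  // supply the newly added trailing default
    newOp(stack);
  };
}
```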
No - it's the same for the purpose of fixing lite-interp. I asked just to check my understanding. Sounds like there was a legit reason not to pick a true/false default value for the new bool param. We have split the conversions across different PRs, so here is a summary of the status:
I mentioned 2c) just to show a possible way to work around the lite-interp BC problem in general, but it might not be the most reasonable fix for this specific case. 2b) might be a reasonable thing to do, but I'll let you guys decide. Since Martin has prepared 2a), we can go with it if we decide to add the new param.
Summary: Pull Request resolved: #42413 (the bytecode BC change described above). Test Plan: Imported from OSS. Reviewed By: xcheng16. Differential Revision: [D22898314](https://our.internmc.facebook.com/intern/diff/D22898314). Pulled By: iseeyuan. fbshipit-source-id: 90d339f8e1365f4bb178db8db7c147390173372b
#42413 has been landed. Please bump `kProducedBytecodeVersion` (in `caffe2/serialize/inline_container.h`) to `0x4L` in this PR so that it will not break backward compatibility for mobile.
@iseeyuan Great! Thanks for the reminder!
This should be ready?
facebook-github-bot left a comment:
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Pull Request resolved: #40737. Reviewed By: mruberry. Differential Revision: D22801525. Pulled By: ngimel. fbshipit-source-id: ac7f7e728b4b3e01925337e8c9996f26a6433fd2