Conversation

@albanD (Collaborator) commented May 18, 2018

This PR contains two main features:

  • A metadata dictionary associated with every C++ Function. If unused, it costs a single if statement in the Function destructor. The dictionary remains alive even after the Python Function object goes out of scope. This is not documented, since it can only be used from Python by recovering the Function object through graph traversal, which is not documented either.
  • A new anomaly-detection feature for autograd. If unused, it costs one if statement on Function creation and one per autograd function execution during backward. It can be enabled with a context manager or a global flag. When enabled, it has two effects:
    • If any function returns nan during the backward pass, an error is raised.
    • If an error is raised during the backward pass, the traceback of the forward call that created the failing backward function is printed.
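For reference, the behavior described above is reachable from Python through the `torch.autograd.detect_anomaly` context manager this PR introduces. A minimal sketch (sqrt of a negative number is just a convenient way to make the backward pass produce nan):

```python
import torch

x = torch.tensor([-1.0], requires_grad=True)
with torch.autograd.detect_anomaly():
    y = torch.sqrt(x)  # forward produces nan; this alone does not raise
    try:
        # The gradient of sqrt at -1 is 0.5 / sqrt(-1) = nan,
        # so anomaly mode raises during the backward pass.
        y.backward()
    except RuntimeError as err:
        print("anomaly detected:", err)
```

Alongside the error, the traceback of the forward call that created the failing backward function is printed as a warning.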

Please double-check the Python refcounting, as I'm not used to writing this kind of code and I might have missed something.

To be done in a future PR:

  • Add checks for nan during the nn.Module forward pass. I can't find a good way to do this check for a generic forward that does not use nn.Module, especially when no grads are required.
  • Use the metadata to instrument an nn.Module-based forward to allow better graph printing (e.g. collapsing modules).
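Though undocumented, the metadata dictionary can already be exercised from Python by walking the graph through `grad_fn`. A minimal sketch; the "name" key is purely illustrative, not an API convention:

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
y = x * 2

# y.grad_fn is the C++ Function node for the multiplication.
# Its metadata dict persists with the graph, independently of
# any Python-side references to the node.
y.grad_fn.metadata["name"] = "double"
print(y.grad_fn.metadata["name"])  # -> double
```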


@soumith (Contributor) commented May 30, 2018

@pytorchbot retest this please


@colesbury (Member) left a comment

This is great! Some comments about the C++:


@albanD (Collaborator, Author) commented Jun 11, 2018

@ezyang I fixed it so that it works without any NO_PYTHON macro.
It still uses std::cout at the moment. I think the right thing to do is to introduce a new warning system for this to use, which would print to std::cerr in the C++ build and raise a Python warning in the Python build.
Given that this might create more discussion, can we get this in as is, and I'll switch it to the warning system as soon as that is done?

@ezyang ezyang merged commit 78e3259 into pytorch:master Jun 12, 2018
@ezyang (Contributor) commented Jun 12, 2018

Yeah, getting cout to the right place is a discussion in and of itself, and we've got a few other places that would benefit from that treatment. No need to block this PR on it.

@albanD albanD deleted the detect_anomaly branch June 12, 2018 09:22
@apaszke (Contributor) left a comment

Did we do any microbenchmarks before merging this?

    bool keep_graph,
    bool create_graph,
    const edge_list& outputs = {}) override;
virtual std::unique_ptr<AnomalyMetadata> make_anomaly_metadata() override;

}

private:
  static bool _enabled;

@albanD (Collaborator, Author) commented Jun 15, 2018

@apaszke I ran some benchmarks on an op that adds a constant to a small tensor of size 10. Both before this PR, and with this PR with anomaly mode disabled, one call takes 1.08 µs.
Note that for such a small op, the overhead is significant when the feature is enabled: one call then takes 85 µs. But given that we have to acquire the GIL, call into Python, and gather a whole traceback, that's expected.
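A microbenchmark along these lines can be sketched with `timeit` and the global switch (`torch.autograd.set_detect_anomaly`); the exact op and iteration count here are arbitrary choices, and absolute numbers will vary by machine:

```python
import timeit
import torch

x = torch.ones(10, requires_grad=True)

def step():
    # Small op: add a constant to a size-10 tensor, then backward.
    y = x + 1.0
    y.backward(torch.ones_like(y))

torch.autograd.set_detect_anomaly(False)
off = timeit.timeit(step, number=1000) / 1000

torch.autograd.set_detect_anomaly(True)
on = timeit.timeit(step, number=1000) / 1000
torch.autograd.set_detect_anomaly(False)  # restore the default

print(f"disabled: {off * 1e6:.2f} us/call, enabled: {on * 1e6:.2f} us/call")
```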

@nnop commented Apr 18, 2020

Does this work for Inf values?

@albanD (Collaborator, Author) commented Apr 20, 2020

No, this only detects NaN.
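To illustrate: an inf gradient passes through anomaly mode silently, so it has to be caught by hand (e.g. with `torch.isinf`). A minimal sketch, using sqrt at zero to produce an inf gradient:

```python
import torch

x = torch.tensor([0.0], requires_grad=True)
with torch.autograd.detect_anomaly():
    y = torch.sqrt(x)
    # The gradient of sqrt at 0 is 0.5 / sqrt(0) = inf.
    # Unlike nan, this does NOT raise under anomaly mode.
    y.backward()

# inf must be detected with an explicit check instead:
assert torch.isinf(x.grad).any()
print("inf gradient went undetected by anomaly mode")
```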
