Distributed autograd converts all python exceptions to RuntimeError

## 🐛 Bug

If the `backward` method of a custom `torch.autograd.Function` raises a Python exception, distributed autograd always converts that exception to a `RuntimeError`, so the original exception type is lost.

## To Reproduce

See `DistAutogradTest.test_backward_autograd_engine_error`:
https://github.com/pytorch/pytorch/blob/6ad9e5c70d1f496b9eaca191ec18e26c106c6436/torch/testing/_internal/distributed/rpc/dist_autograd_test.py#L1055-L1058

The test explicitly expects a `RuntimeError`, even though the `backward` method of `SimulateBackwardError` raises a plain `Exception`:
https://github.com/pytorch/pytorch/blob/6ad9e5c70d1f496b9eaca191ec18e26c106c6436/torch/testing/_internal/distributed/rpc/dist_autograd_test.py#L144-L145

## Expected behavior

The exception surfaced to the caller of `dist_autograd.backward` should have the same type as the one raised in the Python code.
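One subtlety for a regression test: because `RuntimeError` is itself a subclass of `Exception`, simply changing the expectation in `test_backward_autograd_engine_error` to `assertRaisesRegex(Exception, ...)` would pass both before and after a fix. The pure-Python sketch below (no PyTorch involved) assumes the custom `backward` raises a dedicated, hypothetical `BackwardTestError` subclass; `current_behavior` and `expected_behavior` are hypothetical stand-ins for what `dist_autograd.backward` surfaces today and what it should surface after a fix.

```python
import unittest

MESSAGE = 'Simulate error on backward pass'


class BackwardTestError(Exception):
    """Hypothetical exception type that a custom backward() could raise."""


def current_behavior():
    # Stand-in for today's behavior: the original type is replaced by RuntimeError.
    raise RuntimeError(MESSAGE)


def expected_behavior():
    # Stand-in for the expected behavior: the original type reaches the caller.
    raise BackwardTestError(MESSAGE)


class ExceptionTypeTest(unittest.TestCase):
    def test_checking_for_exception_is_too_weak(self):
        # RuntimeError subclasses Exception, so both behaviors pass this check.
        for fn in (current_behavior, expected_behavior):
            with self.assertRaisesRegex(Exception, MESSAGE):
                fn()

    def test_dedicated_subclass_tells_them_apart(self):
        # Passes only when the original exception type is preserved.
        with self.assertRaisesRegex(BackwardTestError, MESSAGE):
            expected_behavior()
        # The type-specific check does not swallow a RuntimeError; it escapes.
        with self.assertRaises(RuntimeError):
            with self.assertRaises(BackwardTestError):
                current_behavior()
```

Run it with `python -m unittest <module>`. The design point is that only an exception type unrelated to `RuntimeError` lets a test detect the type loss described above.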

## Additional context

The root cause seems to be that autograd raises a `python_error` exception in C++, and the default `pybind11` exception translator converts it to a `RuntimeError`. In #30588 I registered a new exception translator that handles `python_error`s correctly. However, this failed CI because some workers crashed (see https://github.com/pytorch/pytorch/pull/30588#discussion_r359909521).
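The mechanism can be illustrated without PyTorch or C++. The sketch below is a pure-Python analogy, not PyTorch's or `pybind11`'s actual code: the hypothetical `_CapturedError` plays the role of a C++ exception that carries the original Python error, `lossy_boundary` mimics a generic translator that keeps only the message and raises a `RuntimeError`, and `preserving_boundary` re-raises the captured original, which is conceptually what a `python_error`-aware translator would do. All names here are illustrative assumptions.

```python
from typing import Callable


class _CapturedError(Exception):
    """Hypothetical stand-in for a native exception carrying the Python error."""

    def __init__(self, original: Exception) -> None:
        super().__init__(str(original))
        self.original = original


def run_backward(fn: Callable[[], None]) -> None:
    # "Native" layer: wrap any Python error, similar in spirit to python_error.
    try:
        fn()
    except Exception as exc:
        raise _CapturedError(exc) from None


def lossy_boundary(fn: Callable[[], None]) -> None:
    # Generic translation: only the message survives; the type becomes RuntimeError.
    try:
        run_backward(fn)
    except _CapturedError as err:
        raise RuntimeError(str(err)) from None


def preserving_boundary(fn: Callable[[], None]) -> None:
    # Type-aware translation: re-raise the original exception object.
    try:
        run_backward(fn)
    except _CapturedError as err:
        raise err.original from None


def failing_backward() -> None:
    raise ValueError('Simulate error on backward pass')


for boundary in (lossy_boundary, preserving_boundary):
    try:
        boundary(failing_backward)
    except Exception as exc:
        # lossy_boundary RuntimeError Simulate error on backward pass
        # preserving_boundary ValueError Simulate error on backward pass
        print(boundary.__name__, type(exc).__name__, exc)
```

In both cases the message is identical; only the type differs, which is why tests that match on the message alone cannot detect the problem.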


cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jjlilley

Distributed autograd converts all python exceptions to RuntimeError #32636

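Referenced test code (`dist_autograd_test.py`, lines 1055-1058):

```python
with self.assertRaisesRegex(RuntimeError, 'Simulate error on backward pass'):
    # Run backwards, and validate we receive an error.
    dist_autograd.backward([val.sum()])
```

Referenced `SimulateBackwardError.backward` (`dist_autograd_test.py`, lines 144-145):

```python
def backward(ctx, input):
    raise Exception('Simulate error on backward pass')
```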
