🐛 Describe the bug
This bug is reproducible in the current stable release (2.0.1), but not in the latest nightly. I suspect it was fixed in #92986 or #97184, but I am still reporting it for future reference, as the failure mode and error messages are slightly different from those in previously reported issues. I am also not sure whether the issue is actually fixed or simply hidden.
The reproducer is fairly fragile, but the key elements appear to be inductor, DDP with at least two buckets, and weight_norm.
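For orientation, here is a condensed sketch of the triggering combination, distilled from the full script in the Minified repro section below (module sizes and input shape are copied from that script):

import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.nn.utils import weight_norm

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(1, 128, (5, 1))
        self.c2 = weight_norm(nn.Conv2d(128, 512, (5, 1)))  # weight_norm-wrapped conv

    def forward(self, x):
        # the same conv pair is applied twice, as in the repro script
        return self.c2(self.c1(x)), self.c2(self.c1(x))

# per spawned rank, after dist.init_process_group("gloo", ...):
model = torch.compile(Network())             # inductor is the default backend
ddp_model = DDP(model)                       # DDPOptimizer assigns two parameter buckets (see logs)
out = ddp_model(torch.randn(20, 1, 16, 16))  # compilation is triggered here and fails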
The errors I've seen (from the same script, just enabling/disabling TORCHDYNAMO_REPRO_AFTER):
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised Unsupported: Unsupported: meta converter nyi with fake tensor propagation.
and
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
Possibly related issues: #92877, #92941, #94574, maybe #61470
Error logs
Log 1:
Running on rank 0.
Running on rank 1.
[2023-06-29 22:37:25,098] torch._dynamo.eval_frame: [DEBUG] skipping __init__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:37:25,098] torch._dynamo.eval_frame: [DEBUG] skipping __init__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:37:25,098] torch._dynamo.eval_frame: [DEBUG] skipping __enter__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:37:25,098] torch._dynamo.eval_frame: [DEBUG] skipping __enter__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:37:25,098] torch._dynamo.eval_frame: [DEBUG] skipping __init__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:37:25,098] torch._dynamo.eval_frame: [DEBUG] skipping __init__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:37:25,098] torch._dynamo.eval_frame: [DEBUG] skipping __enter__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:37:25,098] torch._dynamo.eval_frame: [DEBUG] skipping __enter__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:37:25,098] torch._dynamo.eval_frame: [DEBUG] skipping enable_dynamic /usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py
[2023-06-29 22:37:25,098] torch._dynamo.eval_frame: [DEBUG] skipping enable_dynamic /usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py
[2023-06-29 22:37:25,105] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing forward
[2023-06-29 22:37:25,105] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing forward
[2023-06-29 22:37:25,106] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:28
[2023-06-29 22:37:25,106] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:28
[2023-06-29 22:37:25,106] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self []
[2023-06-29 22:37:25,106] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self []
[2023-06-29 22:37:25,106] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c2 [NNModuleVariable()]
[2023-06-29 22:37:25,106] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c2 [NNModuleVariable()]
[2023-06-29 22:37:25,107] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self [NNModuleVariable()]
[2023-06-29 22:37:25,107] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self [NNModuleVariable()]
[2023-06-29 22:37:25,107] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c1 [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:37:25,107] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c1 [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:37:25,108] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:37:25,108] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:37:25,108] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), NNModuleVariable(), TensorVariable()]
[2023-06-29 22:37:25,108] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), NNModuleVariable(), TensorVariable()]
[2023-06-29 22:37:25,119] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), TensorVariable()]
[2023-06-29 22:37:25,119] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), TensorVariable()]
[2023-06-29 22:37:25,135] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_FAST a [TensorVariable()]
[2023-06-29 22:37:25,135] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:29
[2023-06-29 22:37:25,136] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self []
[2023-06-29 22:37:25,136] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c2 [NNModuleVariable()]
[2023-06-29 22:37:25,136] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_FAST a [TensorVariable()]
[2023-06-29 22:37:25,136] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:29
[2023-06-29 22:37:25,136] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self []
[2023-06-29 22:37:25,136] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c2 [NNModuleVariable()]
[2023-06-29 22:37:25,137] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self [NNModuleVariable()]
[2023-06-29 22:37:25,137] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c1 [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:37:25,137] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self [NNModuleVariable()]
[2023-06-29 22:37:25,137] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c1 [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:37:25,138] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:37:25,138] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), NNModuleVariable(), TensorVariable()]
[2023-06-29 22:37:25,138] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:37:25,138] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), NNModuleVariable(), TensorVariable()]
[2023-06-29 22:37:25,149] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), TensorVariable()]
[2023-06-29 22:37:25,149] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), TensorVariable()]
[2023-06-29 22:37:25,164] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_FAST c [TensorVariable()]
[2023-06-29 22:37:25,164] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_FAST c [TensorVariable()]
[2023-06-29 22:37:25,164] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:30
[2023-06-29 22:37:25,164] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:30
[2023-06-29 22:37:25,164] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST a []
[2023-06-29 22:37:25,164] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST a []
[2023-06-29 22:37:25,164] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST c [TensorVariable()]
[2023-06-29 22:37:25,164] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST c [TensorVariable()]
[2023-06-29 22:37:25,164] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_TUPLE 2 [TensorVariable(), TensorVariable()]
[2023-06-29 22:37:25,164] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_TUPLE 2 [TensorVariable(), TensorVariable()]
[2023-06-29 22:37:25,165] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TupleVariable()]
[2023-06-29 22:37:25,165] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TupleVariable()]
[2023-06-29 22:37:25,165] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo done tracing forward (RETURN_VALUE)
[2023-06-29 22:37:25,165] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo done tracing forward (RETURN_VALUE)
[2023-06-29 22:37:25,165] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile
[2023-06-29 22:37:25,165] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile
[2023-06-29 22:37:25,165] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to GraphCompileReason(reason='return_value', user_stack=[<FrameSummary file /content/dist.py, line 30 in forward>])
[2023-06-29 22:37:25,165] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to GraphCompileReason(reason='return_value', user_stack=[<FrameSummary file /content/dist.py, line 30 in forward>])
[2023-06-29 22:37:25,168] torch._dynamo.output_graph: [INFO] Step 2: calling compiler function compile_fn
[2023-06-29 22:37:25,168] torch._dynamo.output_graph: [INFO] Step 2: calling compiler function compile_fn
[2023-06-29 22:37:25,168] torch._dynamo.backends.distributed: [INFO] DDPOptimizer used bucket cap 26214400 and produced the following buckets:
[2023-06-29 22:37:25,168] torch._dynamo.backends.distributed: [INFO] DDPOptimizer used bucket cap 26214400 and produced the following buckets:
[2023-06-29 22:37:25,175] torch._dynamo.backends.distributed: [INFO]
DDPOptimizer bucket assignments
Index Size (b) Param Names
------- ---------- ----------------
0 1314816 self_c2_bias
self_c2_weight_g
self_c2_weight_v
1 1320960 self_c1_weight
self_c1_bias
self_c2_bias
self_c2_weight_g
self_c2_weight_v
self_c1_weight
self_c1_bias
[2023-06-29 22:37:25,175] torch._dynamo.backends.distributed: [INFO]
DDPOptimizer bucket assignments
Index Size (b) Param Names
------- ---------- ----------------
0 1314816 self_c2_bias
self_c2_weight_g
self_c2_weight_v
1 1320960 self_c1_weight
self_c1_bias
self_c2_bias
self_c2_weight_g
self_c2_weight_v
self_c1_weight
self_c1_bias
[2023-06-29 22:37:25,179] torch._dynamo.backends.distributed: [DEBUG]
---orig graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2_1 : [#users=1] = call_module[target=self_c2](args = (%self_c1_1,), kwargs = {})
return (self_c2, self_c2_1)
---split graph---
graph():
%x : torch.Tensor [#users=1] = placeholder[target=x]
%submod_0 : [#users=2] = call_module[target=submod_0](args = (%x,), kwargs = {})
%getitem : [#users=1] = call_function[target=operator.getitem](args = (%submod_0, 0), kwargs = {})
%getitem_1 : [#users=1] = call_function[target=operator.getitem](args = (%submod_0, 1), kwargs = {})
%submod_1 : [#users=1] = call_module[target=submod_1](args = (%getitem,), kwargs = {})
return (getitem_1, submod_1)
---submod_0 graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
return (self_c1_1, self_c2)
---submod_1 graph---
graph():
%self_c1_1 : [#users=1] = placeholder[target=self_c1_1]
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1_1,), kwargs = {})
return self_c2
---------------
[2023-06-29 22:37:25,179] torch._dynamo.backends.distributed: [DEBUG]
---orig graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2_1 : [#users=1] = call_module[target=self_c2](args = (%self_c1_1,), kwargs = {})
return (self_c2, self_c2_1)
---split graph---
graph():
%x : torch.Tensor [#users=1] = placeholder[target=x]
%submod_0 : [#users=2] = call_module[target=submod_0](args = (%x,), kwargs = {})
%getitem : [#users=1] = call_function[target=operator.getitem](args = (%submod_0, 0), kwargs = {})
%getitem_1 : [#users=1] = call_function[target=operator.getitem](args = (%submod_0, 1), kwargs = {})
%submod_1 : [#users=1] = call_module[target=submod_1](args = (%getitem,), kwargs = {})
return (getitem_1, submod_1)
---submod_0 graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
return (self_c1_1, self_c2)
---submod_1 graph---
graph():
%self_c1_1 : [#users=1] = placeholder[target=self_c1_1]
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1_1,), kwargs = {})
return self_c2
---------------
[2023-06-29 22:37:25,180] torch._dynamo.backends.distributed: [DEBUG] run_node placeholder, x got args tuple()
[2023-06-29 22:37:25,180] torch._dynamo.backends.distributed: [DEBUG] run_node placeholder, x got args tuple()
[2023-06-29 22:37:25,180] torch._dynamo.backends.distributed: [DEBUG] run_node call_module, submod_0 got args tuple(T[torch.Size([20, 1, 16, 16])])
[2023-06-29 22:37:25,180] torch._dynamo.backends.distributed: [DEBUG] run_node call_module, submod_0 got args tuple(T[torch.Size([20, 1, 16, 16])])
[2023-06-29 22:37:25,195] torch._dynamo.backends.distributed: [DEBUG]
---submod_0 graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
return (self_c1_1, self_c2)
[2023-06-29 22:37:25,197] torch._dynamo.debug_utils: [WARNING] Compiled Fx GraphModule failed. Creating script to minify the error.
[2023-06-29 22:37:25,200] torch._dynamo.debug_utils: [WARNING] Writing minified repro to /content/torch_compile_debug/run_2023_06_29_22_37_25_088214-pid_1997/minifier/minifier_launcher.py
[2023-06-29 22:37:25,206] torch._dynamo.backends.distributed: [DEBUG]
---submod_0 graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
return (self_c1_1, self_c2)
[2023-06-29 22:37:25,209] torch._dynamo.debug_utils: [WARNING] Compiled Fx GraphModule failed. Creating script to minify the error.
[2023-06-29 22:37:25,212] torch._dynamo.debug_utils: [WARNING] Writing minified repro to /content/torch_compile_debug/run_2023_06_29_22_37_25_091232-pid_1998/minifier/minifier_launcher.py
Traceback (most recent call last):
File "/content/dist.py", line 44, in <module>
mp.spawn(demo_basic,
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 670, in call_user_compiler
compiled_fn = compiler_fn(gm, self.fake_example_inputs())
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/backends/distributed.py", line 349, in compile_fn
submod_compiler.run(*example_inputs)
File "/usr/local/lib/python3.10/dist-packages/torch/fx/interpreter.py", line 136, in run
self.env[node] = self.run_node(node)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/backends/distributed.py", line 330, in run_node
compiled_submod_real = self.compile_submod(
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/backends/distributed.py", line 273, in compile_submod
self.compiler(input_mod, args),
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/debug_utils.py", line 1031, in debug_wrapper
compiled_gm = compiler_fn(copy.deepcopy(gm), example_inputs)
File "/usr/lib/python3.10/copy.py", line 153, in deepcopy
y = copier(memo)
File "/usr/local/lib/python3.10/dist-packages/torch/fx/graph_module.py", line 712, in __deepcopy__
fake_mod.__dict__ = copy.deepcopy(self.__dict__, memo)
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.10/copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
File "/usr/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.10/copy.py", line 153, in deepcopy
y = copier(memo)
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 86, in __deepcopy__
raise RuntimeError(
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
While executing %submod_0 : [#users=2] = call_module[target=submod_0](args = (%x,), kwargs = {})
Original traceback:
None
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/dist.py", line 38, in demo_basic
outputs = ddp_model(torch.randn(20, 1, 16, 16))
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1113, in _run_ddp_forward
return module_to_run(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 82, in forward
return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 334, in catch_errors
return hijacked_callback(frame, cache_size, hooks)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 404, in _convert_frame
result = inner_convert(frame, cache_size, hooks)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 104, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 262, in _convert_frame_assert
return _compile(
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 324, in _compile
out_code = transform_code_object(code, transform)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/bytecode_transformation.py", line 445, in transform_code_object
transformations(instructions, code_options)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 311, in transform
tracer.run()
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
super().run()
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
and self.step()
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
getattr(self, inst.opname)(inst)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 1792, in RETURN_VALUE
self.output.compile_subgraph(
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 541, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
While executing %submod_0 : [#users=2] = call_module[target=submod_0](args = (%x,), kwargs = {})
Original traceback:
None
Set torch._dynamo.config.verbose=True for more information
Minifier script written to /content/torch_compile_debug/run_2023_06_29_22_37_25_091232-pid_1998/minifier/minifier_launcher.py. Run this script to find the smallest traced graph which reproduces this error.
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
Log 2:
Running on rank 1.
[W socket.cpp:601] [c10d] The client socket has failed to connect to [localhost]:12355 (errno: 99 - Cannot assign requested address).
Running on rank 0.
[2023-06-29 22:42:19,651] torch._dynamo.eval_frame: [DEBUG] skipping __init__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:42:19,651] torch._dynamo.eval_frame: [DEBUG] skipping __init__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:42:19,652] torch._dynamo.eval_frame: [DEBUG] skipping __enter__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:42:19,652] torch._dynamo.eval_frame: [DEBUG] skipping __enter__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:42:19,652] torch._dynamo.eval_frame: [DEBUG] skipping __init__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:42:19,652] torch._dynamo.eval_frame: [DEBUG] skipping __init__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:42:19,652] torch._dynamo.eval_frame: [DEBUG] skipping __enter__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:42:19,652] torch._dynamo.eval_frame: [DEBUG] skipping __enter__ /usr/lib/python3.10/contextlib.py
[2023-06-29 22:42:19,652] torch._dynamo.eval_frame: [DEBUG] skipping enable_dynamic /usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py
[2023-06-29 22:42:19,652] torch._dynamo.eval_frame: [DEBUG] skipping enable_dynamic /usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py
[2023-06-29 22:42:19,661] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing forward
[2023-06-29 22:42:19,661] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing forward
[2023-06-29 22:42:19,662] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:28
[2023-06-29 22:42:19,662] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:28
[2023-06-29 22:42:19,662] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self []
[2023-06-29 22:42:19,662] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self []
[2023-06-29 22:42:19,662] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c2 [NNModuleVariable()]
[2023-06-29 22:42:19,662] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c2 [NNModuleVariable()]
[2023-06-29 22:42:19,663] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self [NNModuleVariable()]
[2023-06-29 22:42:19,663] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self [NNModuleVariable()]
[2023-06-29 22:42:19,663] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c1 [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:42:19,663] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c1 [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:42:19,664] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:42:19,664] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:42:19,664] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), NNModuleVariable(), TensorVariable()]
[2023-06-29 22:42:19,664] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), NNModuleVariable(), TensorVariable()]
[2023-06-29 22:42:19,677] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), TensorVariable()]
[2023-06-29 22:42:19,677] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), TensorVariable()]
[2023-06-29 22:42:19,698] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_FAST a [TensorVariable()]
[2023-06-29 22:42:19,699] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_FAST a [TensorVariable()]
[2023-06-29 22:42:19,699] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:29
[2023-06-29 22:42:19,699] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:29
[2023-06-29 22:42:19,699] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self []
[2023-06-29 22:42:19,699] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self []
[2023-06-29 22:42:19,699] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c2 [NNModuleVariable()]
[2023-06-29 22:42:19,699] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c2 [NNModuleVariable()]
[2023-06-29 22:42:19,700] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self [NNModuleVariable()]
[2023-06-29 22:42:19,700] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST self [NNModuleVariable()]
[2023-06-29 22:42:19,700] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c1 [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:42:19,700] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR c1 [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:42:19,701] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:42:19,701] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [NNModuleVariable(), NNModuleVariable()]
[2023-06-29 22:42:19,701] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), NNModuleVariable(), TensorVariable()]
[2023-06-29 22:42:19,701] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), NNModuleVariable(), TensorVariable()]
[2023-06-29 22:42:19,716] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), TensorVariable()]
[2023-06-29 22:42:19,716] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [NNModuleVariable(), TensorVariable()]
[2023-06-29 22:42:19,735] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_FAST c [TensorVariable()]
[2023-06-29 22:42:19,735] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:30
[2023-06-29 22:42:19,736] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST a []
[2023-06-29 22:42:19,736] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST c [TensorVariable()]
[2023-06-29 22:42:19,736] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_TUPLE 2 [TensorVariable(), TensorVariable()]
[2023-06-29 22:42:19,736] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TupleVariable()]
[2023-06-29 22:42:19,736] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo done tracing forward (RETURN_VALUE)
[2023-06-29 22:42:19,736] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile
[2023-06-29 22:42:19,737] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to GraphCompileReason(reason='return_value', user_stack=[<FrameSummary file /content/dist.py, line 30 in forward>])
[2023-06-29 22:42:19,737] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_FAST c [TensorVariable()]
[2023-06-29 22:42:19,737] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /content/dist.py:30
[2023-06-29 22:42:19,737] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST a []
[2023-06-29 22:42:19,738] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST c [TensorVariable()]
[2023-06-29 22:42:19,738] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_TUPLE 2 [TensorVariable(), TensorVariable()]
[2023-06-29 22:42:19,738] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TupleVariable()]
[2023-06-29 22:42:19,738] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo done tracing forward (RETURN_VALUE)
[2023-06-29 22:42:19,738] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile
[2023-06-29 22:42:19,738] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to GraphCompileReason(reason='return_value', user_stack=[<FrameSummary file /content/dist.py, line 30 in forward>])
[2023-06-29 22:42:19,740] torch._dynamo.output_graph: [INFO] Step 2: calling compiler function compile_fn
[2023-06-29 22:42:19,740] torch._dynamo.backends.distributed: [INFO] DDPOptimizer used bucket cap 26214400 and produced the following buckets:
[2023-06-29 22:42:19,741] torch._dynamo.output_graph: [INFO] Step 2: calling compiler function compile_fn
[2023-06-29 22:42:19,742] torch._dynamo.backends.distributed: [INFO] DDPOptimizer used bucket cap 26214400 and produced the following buckets:
[2023-06-29 22:42:19,749] torch._dynamo.backends.distributed: [INFO]
DDPOptimizer bucket assignments
Index Size (b) Param Names
------- ---------- ----------------
0 1314816 self_c2_bias
self_c2_weight_g
self_c2_weight_v
1 1320960 self_c1_weight
self_c1_bias
self_c2_bias
self_c2_weight_g
self_c2_weight_v
self_c1_weight
self_c1_bias
[2023-06-29 22:42:19,749] torch._dynamo.backends.distributed: [INFO]
DDPOptimizer bucket assignments
Index Size (b) Param Names
------- ---------- ----------------
0 1314816 self_c2_bias
self_c2_weight_g
self_c2_weight_v
1 1320960 self_c1_weight
self_c1_bias
self_c2_bias
self_c2_weight_g
self_c2_weight_v
self_c1_weight
self_c1_bias
[2023-06-29 22:42:19,753] torch._dynamo.backends.distributed: [DEBUG]
---orig graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2_1 : [#users=1] = call_module[target=self_c2](args = (%self_c1_1,), kwargs = {})
return (self_c2, self_c2_1)
---split graph---
graph():
%x : torch.Tensor [#users=1] = placeholder[target=x]
%submod_0 : [#users=2] = call_module[target=submod_0](args = (%x,), kwargs = {})
%getitem : [#users=1] = call_function[target=operator.getitem](args = (%submod_0, 0), kwargs = {})
%getitem_1 : [#users=1] = call_function[target=operator.getitem](args = (%submod_0, 1), kwargs = {})
%submod_1 : [#users=1] = call_module[target=submod_1](args = (%getitem,), kwargs = {})
return (getitem_1, submod_1)
---submod_0 graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
return (self_c1_1, self_c2)
---submod_1 graph---
graph():
%self_c1_1 : [#users=1] = placeholder[target=self_c1_1]
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1_1,), kwargs = {})
return self_c2
---------------
[2023-06-29 22:42:19,754] torch._dynamo.backends.distributed: [DEBUG]
---orig graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2_1 : [#users=1] = call_module[target=self_c2](args = (%self_c1_1,), kwargs = {})
return (self_c2, self_c2_1)
---split graph---
graph():
%x : torch.Tensor [#users=1] = placeholder[target=x]
%submod_0 : [#users=2] = call_module[target=submod_0](args = (%x,), kwargs = {})
%getitem : [#users=1] = call_function[target=operator.getitem](args = (%submod_0, 0), kwargs = {})
%getitem_1 : [#users=1] = call_function[target=operator.getitem](args = (%submod_0, 1), kwargs = {})
%submod_1 : [#users=1] = call_module[target=submod_1](args = (%getitem,), kwargs = {})
return (getitem_1, submod_1)
---submod_0 graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
return (self_c1_1, self_c2)
---submod_1 graph---
graph():
%self_c1_1 : [#users=1] = placeholder[target=self_c1_1]
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1_1,), kwargs = {})
return self_c2
---------------
[2023-06-29 22:42:19,755] torch._dynamo.backends.distributed: [DEBUG] run_node placeholder, x got args tuple()
[2023-06-29 22:42:19,755] torch._dynamo.backends.distributed: [DEBUG] run_node call_module, submod_0 got args tuple(T[torch.Size([20, 1, 16, 16])])
[2023-06-29 22:42:19,755] torch._dynamo.backends.distributed: [DEBUG] run_node placeholder, x got args tuple()
[2023-06-29 22:42:19,755] torch._dynamo.backends.distributed: [DEBUG] run_node call_module, submod_0 got args tuple(T[torch.Size([20, 1, 16, 16])])
[2023-06-29 22:42:19,774] torch._dynamo.backends.distributed: [DEBUG]
---submod_0 graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
return (self_c1_1, self_c2)
[2023-06-29 22:42:19,774] torch._dynamo.backends.distributed: [DEBUG]
---submod_0 graph---
graph():
%x : torch.Tensor [#users=2] = placeholder[target=x]
%self_c1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
%self_c2 : [#users=1] = call_module[target=self_c2](args = (%self_c1,), kwargs = {})
%self_c1_1 : [#users=1] = call_module[target=self_c1](args = (%x,), kwargs = {})
return (self_c1_1, self_c2)
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
[2023-06-29 22:42:23,364] torch._inductor.compile_fx: [INFO] Step 3: torchinductor compiling FORWARDS graph 0
[2023-06-29 22:42:23,368] torch._inductor.compile_fx: [INFO] Step 3: torchinductor compiling FORWARDS graph 0
[2023-06-29 22:42:23,384] torch._inductor.graph: [INFO] Creating implicit fallback for:
target: aten._weight_norm_interface.default
args[0]: TensorBox(StorageBox(
InputBuffer(name='primals_5', layout=FixedLayout('cpu', torch.float32, size=[512, 128, 5, 1], stride=[640, 5, 1, 1]))
))
args[1]: TensorBox(StorageBox(
InputBuffer(name='primals_4', layout=FixedLayout('cpu', torch.float32, size=[512, 1, 1, 1], stride=[1, 1, 1, 1]))
))
[2023-06-29 22:42:23,387] torch._inductor.graph: [INFO] Creating implicit fallback for:
target: aten._weight_norm_interface.default
args[0]: TensorBox(StorageBox(
InputBuffer(name='primals_5', layout=FixedLayout('cpu', torch.float32, size=[512, 128, 5, 1], stride=[640, 5, 1, 1]))
))
args[1]: TensorBox(StorageBox(
InputBuffer(name='primals_4', layout=FixedLayout('cpu', torch.float32, size=[512, 1, 1, 1], stride=[1, 1, 1, 1]))
))
[2023-06-29 22:42:23,406] torch._inductor.graph: [INFO] Using FallbackKernel: torch.ops.aten._weight_norm_interface.default
[2023-06-29 22:42:23,413] torch._inductor.graph: [INFO] Using FallbackKernel: torch.ops.aten._weight_norm_interface.default
[2023-06-29 22:42:23,430] torch._inductor.compile_fx: [INFO] Step 3: torchinductor done compiling FORWARDS graph 0
[2023-06-29 22:42:23,430] torch._inductor.debug: [WARNING] model__0_forward_1 debug trace: /tmp/torchinductor_root/4b/c4bjupqmt2t2wjnxedlu3nozk4gbcjot2cglrvvjco75ypzce326.debug
[2023-06-29 22:42:23,442] torch._inductor.compile_fx: [INFO] Step 3: torchinductor done compiling FORWARDS graph 0
[2023-06-29 22:42:23,442] torch._inductor.debug: [WARNING] model__0_forward_1 debug trace: /tmp/torchinductor_root/4b/c4bjupqmt2t2wjnxedlu3nozk4gbcjot2cglrvvjco75ypzce326.debug
[2023-06-29 22:42:23,455] torch._dynamo.backends.distributed: [DEBUG] run_node call_function, <built-in function getitem> got args tuple(tuple(T[torch.Size([20, 128, 12, 16])], T[torch.Size([20, 512, 8, 16])]), 0)
[2023-06-29 22:42:23,456] torch._dynamo.backends.distributed: [DEBUG] run_node call_function, <built-in function getitem> got args tuple(tuple(T[torch.Size([20, 128, 12, 16])], T[torch.Size([20, 512, 8, 16])]), 1)
[2023-06-29 22:42:23,456] torch._dynamo.backends.distributed: [DEBUG] run_node call_module, submod_1 got args tuple(T[torch.Size([20, 128, 12, 16])])
[2023-06-29 22:42:23,464] torch._dynamo.utils: [WARNING] Unsupported: meta converter nyi with fake tensor propagation.
[2023-06-29 22:42:23,469] torch._dynamo.backends.distributed: [DEBUG] run_node call_function, <built-in function getitem> got args tuple(tuple(T[torch.Size([20, 128, 12, 16])], T[torch.Size([20, 512, 8, 16])]), 0)
[2023-06-29 22:42:23,469] torch._dynamo.backends.distributed: [DEBUG] run_node call_function, <built-in function getitem> got args tuple(tuple(T[torch.Size([20, 128, 12, 16])], T[torch.Size([20, 512, 8, 16])]), 1)
[2023-06-29 22:42:23,469] torch._dynamo.backends.distributed: [DEBUG] run_node call_module, submod_1 got args tuple(T[torch.Size([20, 128, 12, 16])])
[2023-06-29 22:42:23,476] torch._dynamo.utils: [WARNING] Unsupported: meta converter nyi with fake tensor propagation.
Traceback (most recent call last):
File "/content/dist.py", line 44, in <module>
mp.spawn(demo_basic,
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 808, in wrap_fake_exception
return fn()
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 819, in <lambda>
return wrap_fake_exception(lambda: copy.deepcopy(obj))
File "/usr/lib/python3.10/copy.py", line 153, in deepcopy
y = copier(memo)
File "/usr/local/lib/python3.10/dist-packages/torch/fx/graph_module.py", line 712, in __deepcopy__
fake_mod.__dict__ = copy.deepcopy(self.__dict__, memo)
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.10/copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
File "/usr/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.10/copy.py", line 153, in deepcopy
y = copier(memo)
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 84, in __deepcopy__
return handle_torch_function(Tensor.__deepcopy__, (self,), self, memo)
File "/usr/local/lib/python3.10/dist-packages/torch/overrides.py", line 1534, in handle_torch_function
result = mode.__torch_function__(public_api, types, args, kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1417, in __torch_function__
out = self.fake_mode.from_tensor(tensor, static_shapes=True)
File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1324, in from_tensor
return self.fake_tensor_converter(
File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 314, in __call__
return self.from_real_tensor(
File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 280, in from_real_tensor
raise UnsupportedFakeTensorException("meta converter nyi")
torch._subclasses.fake_tensor.UnsupportedFakeTensorException: meta converter nyi
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 670, in call_user_compiler
compiled_fn = compiler_fn(gm, self.fake_example_inputs())
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/backends/distributed.py", line 349, in compile_fn
submod_compiler.run(*example_inputs)
File "/usr/local/lib/python3.10/dist-packages/torch/fx/interpreter.py", line 136, in run
self.env[node] = self.run_node(node)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/backends/distributed.py", line 318, in run_node
curr_submod = deepcopy_to_fake_tensor(real_mod, fake_mode)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 819, in deepcopy_to_fake_tensor
return wrap_fake_exception(lambda: copy.deepcopy(obj))
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 814, in wrap_fake_exception
raise unimplemented(msg) from e
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/exc.py", line 71, in unimplemented
raise Unsupported(msg)
torch._dynamo.exc.Unsupported: Unsupported: meta converter nyi with fake tensor propagation.
While executing %submod_1 : [#users=1] = call_module[target=submod_1](args = (%getitem,), kwargs = {})
Original traceback:
None
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/dist.py", line 38, in demo_basic
outputs = ddp_model(torch.randn(20, 1, 16, 16))
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1113, in _run_ddp_forward
return module_to_run(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 82, in forward
return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 334, in catch_errors
return hijacked_callback(frame, cache_size, hooks)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 404, in _convert_frame
result = inner_convert(frame, cache_size, hooks)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 104, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 262, in _convert_frame_assert
return _compile(
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 324, in _compile
out_code = transform_code_object(code, transform)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/bytecode_transformation.py", line 445, in transform_code_object
transformations(instructions, code_options)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 311, in transform
tracer.run()
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
super().run()
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
and self.step()
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
getattr(self, inst.opname)(inst)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 1792, in RETURN_VALUE
self.output.compile_subgraph(
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 541, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised Unsupported: Unsupported: meta converter nyi with fake tensor propagation.
While executing %submod_1 : [#users=1] = call_module[target=submod_1](args = (%getitem,), kwargs = {})
Original traceback:
None
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
Minified repro
# automatic repro does not work
# Does NOT crash with backend='eager', unless TORCHDYNAMO_REPRO_AFTER is uncommented
# Unsupported: meta converter nyi with fake tensor propagation.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
import torch.multiprocessing as mp
from torch.nn import Conv2d
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.nn.utils import weight_norm


def setup(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'
    os.environ['TORCH_COMPILE_DEBUG'] = '1'
    # os.environ['TORCHDYNAMO_REPRO_AFTER'] = "dynamo"  # uncomment for a different failure mode :)

    # initialize the process group
    dist.init_process_group("gloo", rank=rank, world_size=world_size)


class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.c1 = Conv2d(1, 128, (5, 1))
        self.c2 = weight_norm(Conv2d(128, 512, (5, 1)))

    def forward(self, x):
        a = self.c2(self.c1(x))
        c = self.c2(self.c1(x))
        return a, c


def demo_basic(rank, world_size):
    print(f"Running on rank {rank}.")
    setup(rank, world_size)

    model = Network()
    model = torch.compile(model)
    ddp_model = DDP(model)

    outputs = ddp_model(torch.randn(20, 1, 16, 16))

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.spawn(demo_basic,
             args=(world_size,),
             nprocs=world_size)
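As the comment at the top of the repro notes, the crash does not occur with the eager backend (unless TORCHDYNAMO_REPRO_AFTER is also set). A hypothetical variant of demo_basic for that sanity check only changes the compile call:

    # compile with the eager backend instead of the default inductor backend;
    # per the comment above, this variant runs through (unless
    # TORCHDYNAMO_REPRO_AFTER is also set)
    model = Network()
    model = torch.compile(model, backend="eager")
    ddp_model = DDP(model)
    outputs = ddp_model(torch.randn(20, 1, 16, 16))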
Versions
Collecting environment information...
PyTorch version: 2.0.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.25.2
Libc version: glibc-2.31
Python version: 3.10.12 (main, Jun 7 2023, 12:45:35) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.107+-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping: 0
CPU MHz: 2200.150
BogoMIPS: 4400.30
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32 KiB
L1i cache: 32 KiB
L2 cache: 256 KiB
L3 cache: 55 MiB
NUMA node0 CPU(s): 0,1
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable; SMT Host state unknown
Vulnerability Meltdown: Vulnerable
Vulnerability Mmio stale data: Vulnerable
Vulnerability Retbleed: Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==2.0.1+cu118
[pip3] torchaudio==2.0.2+cu118
[pip3] torchdata==0.6.1
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.15.2
[pip3] torchvision==0.15.2+cu118
[pip3] triton==2.0.0
[conda] Could not collect