Commit 28a4db8

Aidyn-A authored and pytorchmergebot committed
[ARM] Fix infinite recursion in unwind (#134387)
Fixes #119905

The `TORCH_SHOW_CPP_STACKTRACES=1` setting on ARM causes an infinitely recursive unwind, because on failure a `StackTraceFetcher` attempts to unwind the failed instruction:
https://github.com/pytorch/pytorch/blob/5ad759ca33ba8299cf7e1a6bb1dff7c9a5555e29/torch/csrc/profiler/combined_traceback.cpp#L25

Then the unwind itself fails:
https://github.com/pytorch/pytorch/blob/5ad759ca33ba8299cf7e1a6bb1dff7c9a5555e29/torch/csrc/profiler/unwind/unwind.cpp#L10-L12

That failure triggers another attempt to unwind, this time from inside `unwind()`, and so on.

In summary, the executed instruction is equivalent to:

```C++
std::vector<void*> unwind() {
  // some instructions
  ...
  return unwind();
}
```

This PR replaces `TORCH_CHECK` with `TORCH_WARN_ONCE`, which does not cause an uncontrolled recursion. The only side effect would be an empty back-trace.

Huge thanks to @nWEIdia who found the root cause!

Pull Request resolved: #134387
Approved by: https://github.com/eqy, https://github.com/nWEIdia, https://github.com/malfet
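The control flow described above can be modeled in a few lines of Python. This is only a sketch of the failure mode and of the fix, not PyTorch code: `CheckError`, `make_error`, `warn_once`, `old_fetcher`, and `new_fetcher` are hypothetical names that stand in for a failed `TORCH_CHECK`, the stack trace fetcher, and `TORCH_WARN_ONCE`.

```python
UNSUPPORTED = "unwinding is not supported on this platform"


class CheckError(RuntimeError):
    """Hypothetical stand-in for the error raised by a failed check."""


def make_error(message, fetcher):
    # A failed check asks the fetcher for a stack trace before the error
    # object exists, so the fetcher runs while the check is still failing.
    return CheckError(f"{message}\n{fetcher()}")


def unwind_with_check(fetcher):
    # Old behavior: the unsupported-platform stub fails a check, which
    # calls the fetcher, which calls this stub again.
    raise make_error(UNSUPPORTED, fetcher)


def old_fetcher():
    # Never returns normally: each call re-enters unwind_with_check.
    return "\n".join(unwind_with_check(old_fetcher))


_warned = set()


def warn_once(message, log):
    # Record the message only the first time it is seen.
    if message not in _warned:
        _warned.add(message)
        log.append(message)


def unwind_with_warning(log):
    # New behavior: warn once and return an empty list of frames.
    warn_once(UNSUPPORTED, log)
    return []


def new_fetcher(log):
    # Produces an empty trace instead of recursing.
    return "\n".join(unwind_with_warning(log))
```

In this model, calling `old_fetcher()` ends in Python's `RecursionError`, the analogue of the unbounded recursion in the C++ code, while `new_fetcher(log)` returns an empty string and appends the warning to `log` only on its first call.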
1 parent 900c508 commit 28a4db8

3 files changed, +11 -16 lines changed


test/run_test.py

Lines changed: 2 additions & 7 deletions
@@ -573,13 +573,8 @@ def run_test(
 
 def try_set_cpp_stack_traces(env, command, set=True):
     # Print full c++ stack traces during retries
-    # Don't do it for macos inductor tests as it makes them
-    # segfault for some reason
-    if not (
-        IS_MACOS and len(command) >= 2 and command[2].startswith(INDUCTOR_TEST_PREFIX)
-    ):
-        env = env or {}
-        env["TORCH_SHOW_CPP_STACKTRACES"] = "1" if set else "0"
+    env = env or {}
+    env["TORCH_SHOW_CPP_STACKTRACES"] = "1" if set else "0"
     return env
 
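For reference, the helper as it reads after this hunk can be exercised on its own. The function body below is copied from the diff; the command list in the usage note is a hypothetical example, and `command` is now unused by the function.

```python
def try_set_cpp_stack_traces(env, command, set=True):
    # Print full C++ stack traces during retries
    env = env or {}
    env["TORCH_SHOW_CPP_STACKTRACES"] = "1" if set else "0"
    return env


# Hypothetical command, for illustration only
example_command = ["python", "-bb", "inductor/test_example.py"]
enabled = try_set_cpp_stack_traces(None, example_command)
disabled = try_set_cpp_stack_traces({"PATH": "/usr/bin"}, example_command, set=False)
```

Here `enabled` is `{"TORCH_SHOW_CPP_STACKTRACES": "1"}` and `disabled` keeps its `PATH` entry alongside `"TORCH_SHOW_CPP_STACKTRACES": "0"`. Because of `env or {}`, a non-empty dict is modified in place, whereas `None` or an empty dict yields a fresh dict.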

torch/csrc/Module.cpp

Lines changed: 1 addition & 1 deletion
@@ -170,7 +170,7 @@ static PyObject* THPModule_initExtension(
     PyObject* _unused,
     PyObject* shm_manager_path) {
   HANDLE_TH_ERRORS
-#if !defined(FBCODE_CAFFE2)
+#if !defined(FBCODE_CAFFE2) && !defined(__aarch64__)
   if (torch::get_cpp_stacktraces_enabled()) {
     c10::SetStackTraceFetcher([]() -> std::string {
       auto tb = torch::CapturedTraceback::gather(false, false, true);

torch/csrc/profiler/unwind/unwind.cpp

Lines changed: 8 additions & 8 deletions
@@ -7,29 +7,29 @@
     !__has_include("ext/stdio_filebuf.h")
 namespace torch::unwind {
 std::vector<void*> unwind() {
-  TORCH_CHECK(
-      false,
+  TORCH_WARN_ONCE(
       "record_context_cpp is not support on non-linux non-x86_64 platforms");
+  return {};
 }
 
 std::optional<std::pair<std::string, uint64_t>> libraryFor(void* addr) {
-  TORCH_CHECK(
-      false,
+  TORCH_WARN_ONCE(
       "record_context_cpp is not support on non-linux non-x86_64 platforms");
+  return {};
 }
 
 #ifndef FBCODE_CAFFE2
 std::vector<Frame> symbolize(const std::vector<void*>& frames, Mode mode) {
-  TORCH_CHECK(
-      false,
+  TORCH_WARN_ONCE(
       "record_context_cpp is not support on non-linux non-x86_64 platforms");
+  return {};
 }
 #endif
 
 Stats stats() {
-  TORCH_CHECK(
-      false,
+  TORCH_WARN_ONCE(
       "record_context_cpp is not support on non-linux non-x86_64 platforms");
+  return {};
 }
 
 } // namespace torch::unwind
