@nairbv nairbv commented Nov 13, 2018

Summary:

torch.nn.utils.rnn.pack_padded_sequence segment fault if not in
decreasing order #13324

We were seeing this segfault on throw; checking pre-emptively avoids
it:

*** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 ***

Test Plan:

Added unit test based on example provided in issue.

@apaszke apaszke left a comment

It would be good to understand where the segfault comes from, because I can’t see how this solution is safer than the previous algorithm.

nairbv commented Nov 14, 2018

"It would be good to understand where the segfault comes"

I'm not an expert in C++ memory, but it seems to be an issue in how the stack gets unwound when the exception is thrown. The error message suggests a double-free. Stepping through in the debugger I see that it reaches the expected AT_CHECK, satisfies the condition, and throws the error in the original code as expected, but then segfaults when returning. I think failing faster is safer because there's nothing yet allocated that would need to be freed when the exception is thrown.

@nairbv nairbv force-pushed the pack_padded_order_13324 branch from 4effde1 to a14623f Compare November 14, 2018 17:35
@zou3519 zou3519 self-requested a review November 14, 2018 19:49

ezyang commented Nov 15, 2018

Did ASAN report anything on this test? It's possible it is a stack unwinding problem but I don't see anything on the stack in this function that should cause the problem. ASAN might be able to say something better.

nairbv commented Nov 15, 2018

Here's the full output from running with ASAN; I haven't read through it in detail yet:

In [6]: pack_padded_sequence(b_a, [22, 25])

==213533==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60f000027c90 at pc 0x7f510d37d464 bp 0x7fff530ffab0 sp 0x7fff530ffaa8
WRITE of size 8 at 0x60f000027c90 thread T0
SCARINESS: 42 (8-byte-write-heap-buffer-overflow)
#0 0x7f510d37d463 in at::native::_pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool) caffe2/aten/src/ATen/native/PackedSequence.cpp:67
#1 0x7f510cf2d4dd in at::TypeDefault::_pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool) const buck-out/dev/gen/caffe2/aten/gen_aten=TypeDefault.cpp/TypeDefault.cpp:4742
#2 0x7f51586327f1 in torch::autograd::VariableType::_pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool) const buck-out/dev/gen/caffe2/generate-code=VariableType_2.cpp/VariableType_2.cpp:380
#3 0x7f515c7a4e42 in at::_pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool) buck-out/dev/gen/caffe2/aten/generated-aten-headers-cpu#header-mode-symlink-tree-with-header-map,headers/ATen/Functions.h:5085
#4 0x7f515c7a4c58 in torch::autograd::dispatch__pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool) buck-out/dev/gen/caffe2/generated-autograd-headers-bare#header-mode-symlink-tree-with-header-map,headers/python_torch_functions_dispatch.h:250
#5 0x7f515c5a8bfc in torch::autograd::THPVariable__pack_padded_sequence(_object*, _object*, _object*) buck-out/dev/gen/caffe2/generate-code=python_torch_functions.cpp/python_torch_functions.cpp:970
#6 0x7f517c670bcd in _PyCFunction_FastCallDict /home/engshare/third-party2/python/3.6/src/cpython-3.6/Objects/methodobject.c:231
#7 0x7f517c770366 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5011
#8 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#9 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#10 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#11 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#12 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#13 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#14 0x7f517c770582 in PyEval_EvalCodeEx /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4361
#15 0x7f517c7705ae in PyEval_EvalCode /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:878
#16 0x7f517c76c394 in builtin_exec_impl /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/bltinmodule.c:983
#17 0x7f517c76c394 in builtin_exec /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/clinic/bltinmodule.c.h:283
#18 0x7f517c670d87 in _PyCFunction_FastCallDict /home/engshare/third-party2/python/3.6/src/cpython-3.6/Objects/methodobject.c:234
#19 0x7f517c770366 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5011
#20 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#21 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#22 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#23 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#24 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#25 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#26 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#27 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#28 0x7f517c774130 in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3525
#29 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#30 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#31 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#32 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#33 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#34 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#35 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#36 0x7f517c774130 in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3525
#37 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#38 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#39 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#40 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#41 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#42 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#43 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#44 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#45 0x7f517c76e708 in _PyFunction_FastCall /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5093
#46 0x7f517c770502 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5128
#47 0x7f517c770502 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#48 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#49 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#50 0x7f517c77e1e9 in _PyFunction_FastCallDict /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5244
#51 0x7f517c5df21d in _PyObject_FastCallDict /home/engshare/third-party2/python/3.6/src/cpython-3.6/Objects/abstract.c:2310
#52 0x7f517c5df31a in _PyObject_Call_Prepend /home/engshare/third-party2/python/3.6/src/cpython-3.6/Objects/abstract.c:2373
#53 0x7f517c5dedc9 in PyObject_Call /home/engshare/third-party2/python/3.6/src/cpython-3.6/Objects/abstract.c:2261
#54 0x7f517c7742a5 in do_call_core /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5280
#55 0x7f517c7742a5 in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3578
#56 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#57 0x7f517c770554 in PyEval_EvalCodeEx /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4361
#58 0x7f517c62d171 in function_call /home/engshare/third-party2/python/3.6/src/cpython-3.6/Objects/funcobject.c:604
#59 0x7f517c5dedc9 in PyObject_Call /home/engshare/third-party2/python/3.6/src/cpython-3.6/Objects/abstract.c:2261
#60 0x7f517c7742a5 in do_call_core /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5280
#61 0x7f517c7742a5 in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3578
#62 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#63 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#64 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#65 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#66 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#67 0x7f517c770582 in PyEval_EvalCodeEx /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4361
#68 0x7f517c7705ae in PyEval_EvalCode /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:878
#69 0x7f517c76c394 in builtin_exec_impl /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/bltinmodule.c:983
#70 0x7f517c76c394 in builtin_exec /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/clinic/bltinmodule.c.h:283
#71 0x7f517c670d87 in _PyCFunction_FastCallDict /home/engshare/third-party2/python/3.6/src/cpython-3.6/Objects/methodobject.c:234
#72 0x7f517c770366 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5011
#73 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#74 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#75 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#76 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#77 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#78 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#79 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#80 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#81 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#82 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#83 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#84 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#85 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#86 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#87 0x7f517c770582 in PyEval_EvalCodeEx /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4361
#88 0x7f517c7705ae in PyEval_EvalCode /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:878
#89 0x7f517c7cda11 in run_mod /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/pythonrun.c:980
#90 0x7f517c7cda11 in PyRun_StringFlags /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/pythonrun.c:904
#91 0x7f517c7cda9b in PyRun_SimpleStringFlags /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/pythonrun.c:421
#92 0x7f517c7fb2f0 in run_command /home/engshare/third-party2/python/3.6/src/cpython-3.6/Modules/main.c:299
#93 0x7f517c7fb2f0 in Py_Main /home/engshare/third-party2/python/3.6/src/cpython-3.6/Modules/main.c:747
#94 0x400bbc in main (/usr/local/fbcode/gcc-5-glibc-2.23/bin/python3.6+0x400bbc)
#95 0x7f517b887857 in __libc_start_main /home/engshare/third-party2/glibc/2.23/src/glibc-2.23/csu/../csu/libc-start.c:289
#96 0x400db8 in _start (/usr/local/fbcode/gcc-5-glibc-2.23/bin/python3.6+0x400db8)

0x60f000027c90 is located 0 bytes to the right of 176-byte region [0x60f000027be0,0x60f000027c90)
allocated by thread T0 here:
#0 0x7f517cc926d8 in malloc (/data/users/bvaughan/fbsource/fbcode/buck-out/dev/gen/pytorch/ifbpy#link-tree/libtools_build_sanitizers_asan-ubsan-py.so+0xf96d8)
#1 0x7f510dbba9f2 in THAllocInternal(long) caffe2/aten/src/TH/THGeneral.cpp:180
#2 0x7f510dbba769 in THAlloc caffe2/aten/src/TH/THGeneral.cpp:196
#3 0x7f510db78ab9 in THDefaultAllocator::allocate(unsigned long) const caffe2/aten/src/TH/THAllocator.cpp:24
#4 0x7f510d4c26ce in at::native::empty_cpu(c10::ArrayRef, at::TensorOptions const&) caffe2/aten/src/ATen/native/TensorFactories.cpp:117
#5 0x7f510cd045ad in at::CPULongType::empty(c10::ArrayRef, at::TensorOptions const&) const buck-out/dev/gen/caffe2/aten/gen_aten=CPULongType.cpp/CPULongType.cpp:2200
#6 0x7f510d37e958 in at::empty(c10::ArrayRef, at::TensorOptions const&) buck-out/dev/gen/caffe2/aten/generated-aten-headers-cpu#header-mode-symlink-tree-with-header-map,headers/ATen/Functions.h:3893
#7 0x7f510d37c9c1 in at::native::_pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool) caffe2/aten/src/ATen/native/PackedSequence.cpp:28
#8 0x7f510cf2d4dd in at::TypeDefault::_pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool) const buck-out/dev/gen/caffe2/aten/gen_aten=TypeDefault.cpp/TypeDefault.cpp:4742
#9 0x7f51586327f1 in torch::autograd::VariableType::_pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool) const buck-out/dev/gen/caffe2/generate-code=VariableType_2.cpp/VariableType_2.cpp:380
#10 0x7f515c7a4e42 in at::_pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool) buck-out/dev/gen/caffe2/aten/generated-aten-headers-cpu#header-mode-symlink-tree-with-header-map,headers/ATen/Functions.h:5085
#11 0x7f515c7a4c58 in torch::autograd::dispatch__pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool) buck-out/dev/gen/caffe2/generated-autograd-headers-bare#header-mode-symlink-tree-with-header-map,headers/python_torch_functions_dispatch.h:250
#12 0x7f515c5a8bfc in torch::autograd::THPVariable__pack_padded_sequence(_object*, _object*, _object*) buck-out/dev/gen/caffe2/generate-code=python_torch_functions.cpp/python_torch_functions.cpp:970
#13 0x7f517c670bcd in _PyCFunction_FastCallDict /home/engshare/third-party2/python/3.6/src/cpython-3.6/Objects/methodobject.c:231
#14 0x7f517c770366 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5011
#15 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#16 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#17 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#18 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#19 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#20 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#21 0x7f517c770582 in PyEval_EvalCodeEx /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4361
#22 0x7f517c7705ae in PyEval_EvalCode /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:878
#23 0x7f517c76c394 in builtin_exec_impl /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/bltinmodule.c:983
#24 0x7f517c76c394 in builtin_exec /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/clinic/bltinmodule.c.h:283
#25 0x7f517c670d87 in _PyCFunction_FastCallDict /home/engshare/third-party2/python/3.6/src/cpython-3.6/Objects/methodobject.c:234
#26 0x7f517c770366 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5011
#27 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#28 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340
#29 0x7f517c770274 in fast_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5152
#30 0x7f517c770274 in call_function /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:5032
#31 0x7f517c77746e in _PyEval_EvalFrameDefault /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:3509
#32 0x7f517c76fa0f in _PyEval_EvalCodeWithName /home/engshare/third-party2/python/3.6/src/cpython-3.6/Python/ceval.c:4340

SUMMARY: AddressSanitizer: heap-buffer-overflow caffe2/aten/src/ATen/native/PackedSequence.cpp:67 in at::native::_pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool)
Shadow bytes around the buggy address:
0x0c1e7fffcf40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c1e7fffcf50: fd fd fd fd fd fd fa fa fa fa fa fa fa fa 00 00
0x0c1e7fffcf60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c1e7fffcf70: 00 00 00 00 fa fa fa fa fa fa fa fa 00 00 00 00
0x0c1e7fffcf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c1e7fffcf90: 00 00[fa]fa fa fa fa fa fa fa 00 00 00 00 00 00
0x0c1e7fffcfa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
0x0c1e7fffcfb0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c1e7fffcfc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa
0x0c1e7fffcfd0: fa fa fa fa fa fa fd fd fd fd fd fd fd fd fd fd
0x0c1e7fffcfe0: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==213533==ABORTING

nairbv commented Nov 15, 2018

I'm a bit confused: the original error was "double free or corruption", and when I stepped through the code it reached the throw on line 71 (AT_ERROR("'lengths' array has to be sorted in decreasing order");) before segfaulting, whereas ASAN points to line 67 ((*batch_sizes++) = current_batch_size;) and reports "heap-buffer-overflow".

igormq commented Nov 15, 2018

Hi @nairbv and @apaszke, I think that I know what might be causing the exception.

In Line 33, we have:

at::Tensor batch_sizes_t = at::empty(lengths[0], _lengths.options());

If the parameter lengths is not sorted, lengths[0] is not the maximum length, so batch_sizes_t is probably being allocated too small, which could explain the overflow reported at Line 67.

So this code should be something like (I have no skill using the ATEN library :( )

at::Tensor batch_sizes_t = at::empty(at::max(lengths), _lengths.options());

(I do not know if at::max exists, but it gives you the idea)

Am I right? Does it make sense to you? I did not test anything, I only skimmed the code, so the probability that my analysis is wrong is really high. I hope this helps.
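
The allocation mismatch described above can be replayed without ATen. The sketch below is a pure-Python model, not the actual PackedSequence.cpp code: the loop shape is reconstructed from the fragments quoted in this thread (at::empty(lengths[0], ...), (*batch_sizes++) = current_batch_size;, the prev_l > l branch), the back-to-front iteration order is an assumption, and the function name simulate_batch_sizes and its return tuple are hypothetical. It counts how many slots the loop would write against a buffer sized from lengths[0].

```python
def simulate_batch_sizes(lengths):
    """Model the batch_sizes write loop against a buffer sized from lengths[0].

    Returns (capacity, writes_attempted, first_out_of_bounds_index,
    raised_sort_error). Nothing is actually written out of bounds here.
    """
    batch_size = len(lengths)
    capacity = lengths[0]  # mirrors at::empty(lengths[0], ...)
    writes = 0
    first_oob = None
    prev_l = 0
    # Assumption: sequences are visited from the last entry to the first.
    for i in range(batch_size):
        l = lengths[batch_size - 1 - i]
        if l > prev_l:
            for _ in range(l - prev_l):
                # Models (*batch_sizes++) = current_batch_size;
                if writes >= capacity and first_oob is None:
                    first_oob = writes
                writes += 1
            prev_l = l
        elif prev_l > l:
            # Models the late sortedness error, reached only after the writes.
            return capacity, writes, first_oob, True
    return capacity, writes, first_oob, False


print(simulate_batch_sizes([22, 25]))  # (22, 25, 22, True)
print(simulate_batch_sizes([25, 22]))  # (25, 25, None, False)
```

For lengths [22, 25], the modeled buffer holds 22 int64 slots (22 * 8 = 176 bytes, matching the 176-byte region in the ASAN report), the loop attempts 25 writes, and the first out-of-bounds write lands at index 22, before the sortedness branch is reached. That ordering would be consistent with both symptoms: ASAN stops at the write, while a non-instrumented build could corrupt the heap silently and fail only later, when memory is freed.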

nairbv commented Nov 15, 2018

@igormq ah, yes, that does make sense, and much better explains why this PR avoided the segfault. Thanks

@zou3519 zou3519 left a comment

The issue, as @igormq discovered and @nairbv explained to me offline, was that we were performing the check for sortedness too late and indexing past the end of the batch_sizes_t tensor as a result, causing a buffer overflow.

This looks good to me. I had two minor comments in the code (please read them!). I think it would be nice to remove the sortedness requirement in the future, but that needs discussion.
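
The reordering discussed in this review, validating lengths before anything is allocated, can be sketched in a few lines. This is a pure-Python illustration rather than the merged C++ change: the function name check_lengths and the use of ValueError are hypothetical, the body of the sortedness loop is an assumption (only its header is quoted in this thread), and a non-empty lengths list is assumed.

```python
def check_lengths(lengths):
    """Fail fast on invalid lengths, before any buffer is sized from lengths[0]."""
    batch_size = len(lengths)
    # If the list is sorted in decreasing order, the last entry is the smallest.
    if lengths[batch_size - 1] <= 0:
        raise ValueError(
            "Length of all samples has to be greater than 0, but found an "
            "element in 'lengths' that is <= 0"
        )
    # Assumed loop body: every entry must be >= the entry that follows it.
    for i in range(batch_size - 1):
        if lengths[i] < lengths[i + 1]:
            raise ValueError(
                "'lengths' array has to be sorted in decreasing order"
            )


check_lengths([25, 22])  # passes silently

try:
    check_lengths([22, 25])
except ValueError as err:
    print(err)  # 'lengths' array has to be sorted in decreasing order
```

Once both checks pass, every entry is positive and lengths[0] is the maximum, so a buffer sized from lengths[0] is large enough for the writes that follow.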

(*batch_sizes++) = current_batch_size;
}
prev_l = l;
} else if (prev_l > l) {

AT_CHECK(lengths[batch_size - 1] > 0,
"Length of all samples has to be greater than 0, but found an element "
"in 'lengths' that is <= 0");
for(auto i = 0; i < batch_size - 1; i++ ) {

@nairbv nairbv dismissed apaszke’s stale review November 15, 2018 19:56

found reason for segfault (see other comments)

@facebook-github-bot facebook-github-bot left a comment

@nairbv is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 16, 2018
Summary:
torch.nn.utils.rnn.pack_padded_sequence segment fault if not in
decreasing order #13324

We were seeing this segfault on throw, pre-emptively checking avoids
this:

*** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 ***
Pull Request resolved: pytorch/pytorch#13933

Differential Revision: D13090389

Pulled By: nairbv

fbshipit-source-id: 6f6b319e74cb55830be799e9c46bc33aa59256d8
@ezyang ezyang added the merged label Jun 25, 2019