gh-123940: Ensure force-terminated daemon threads can be joined#124150
gh-123940: Ensure force-terminated daemon threads can be joined#124150mpage wants to merge 16 commits intopython:mainfrom
Conversation
During finalization, daemon threads are force to exit immediately (without returning through the call-stack normally) upon acquiring the GIL. Finalizers that run after this must be able to join the forcefully terminated threads. The current implementation notified of thread completion before returning from `thread_run`. This code will never execute if the thread is forced to exit during finalization. Any code that attempts to join such a thread will block indefinitely. To fix this, use the old approach of notifying of thread completion when the thread state is cleared. This happens both when `thread_run` exits normally and when thread states are destroyed as part of finalization (which happens immediately after forcing daemon threads to exit, before any python code can run).
| def loop(): | ||
| while True: | ||
| pass |
There was a problem hiding this comment.
A slight variant where loop() calls time.sleep(1) hard crashes in 3.11 and 3.12. I'm not sure what to make of that, other than this sort of behavior wasn't robust previously either. I think it's noteworthy that the change that led to the issue was from just a few days ago -- it doesn't look like it was some longstanding code that just broke now.
There was a problem hiding this comment.
This is the issue #116514 colesbury linked when I asked (I wasn't able to repro just the time.sleep variant mentioned here)
…on_threads_exited
|
Marking this as a draft while I figure out the linker errors on windows. |
Include threadmodule when building _freeze_module
|
@colesbury @ericsnowcurrently - Would you please have another look at this? I think it makes sense to move |
ericsnowcurrently
left a comment
There was a problem hiding this comment.
mostly LGTM
There's just the matter of where the code lives. It should mostly go in Python/thread.c. However, we have a bit of a mess in threadmodule.c already, which we needn't try to fix all in this PR. I might be okay with punting on that in favor of a follow-up PR, but I'd like the opinion of others on that first.
| llist_init(&interp->threads.daemon_handles); | ||
| llist_init(&interp->threads.non_daemon_handles); |
There was a problem hiding this comment.
It would make sense to have a _PyThread_InitThreadHandles() for this, to match _PyThread_ClearThreadHandles().
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
@ericsnowcurrently and I chatted in person and the consensus is that this PR is ok as-is, but we should follow it up with a refactor (only on main) to move functionality out of |
FYI - #105805, if merged, is set to change this so that these threads do not exit because our C API has no way for them to do so cleanly to cause an unwind of the call stack. They'll be made to hang instead. |
|
And now with #130402, you get It looks like there's nothing to do here, but, I'm not sure if someone still wants to go back to (and improve) force-killing rather than hanging. @mpage, do you want to close the PR? |
During finalization, daemon threads are forced to exit immediately:
cpython/Python/pylifecycle.c
Lines 2023 to 2026 in 10de360
without returning through the call-stack normally upon acquiring the GIL:
cpython/Python/ceval_gil.c
Lines 294 to 302 in 10de360
Finalizers that run after this must be able to join the forcefully terminated threads.
The current implementation notified of thread completion before returning from
thread_run:cpython/Modules/_threadmodule.c
Line 361 in 10de360
Unfortunately, this code will never execute if the thread is forced to exit during finalization. Any code that attempts to join such a thread will block indefinitely.
To fix this, keep track of daemon threads and have the runtime notify their events after they are forced to exit.