Make outputs go to correct cell when generated in threads/asyncio#1186
Conversation
5dae96f to
10c149c
Compare
1ef474c to
9e9c40e
Compare
davidbrochart
left a comment
There was a problem hiding this comment.
Impressive PR @krassowski, that's great.
Just to confirm, this wouldn't work with threads that are not created from Python, right? For instance in a C extension.
|
Yes, writing from threads which were created outside of Python will still write to the most recently executed cell, as will writing directly to OS-level file descriptor corresponding to stdout/stderr. |
ipykernel/ipkernel.py
Outdated
| if isinstance(stream, OutStream): | ||
| stream._thread_parents[self.ident] = parent_header | ||
|
|
||
| threading.Thread.start = start_closure # type:ignore[method-assign] |
There was a problem hiding this comment.
I was wondering if there could be a way to not monkey-patch threading.Thread.start.
In akernel it's the print function that is monkey-patched, so this PR goes further because it works for lower-level writes to stdout/stderr, but with monkey-patching there is always a risk that another library changes the same object, and we cannot know that our change will be the last applied.
There was a problem hiding this comment.
My understanding as per https://bugs.python.org/issue14073 is that there is no other way, but we could mention our use case as another situation motivating introduction of start/exit hooks/callbacks to threads (which is something Python committers have considered in the past but I presume it was not a sufficiently high priority until now).
There was a problem hiding this comment.
We crossed 'comments', see my other thread on this exact line for (what I think is) a better solution.
ipykernel/ipkernel.py
Outdated
| if isinstance(stream, OutStream): | ||
| stream._thread_parents[self.ident] = parent_header | ||
|
|
||
| threading.Thread.start = start_closure # type:ignore[method-assign] |
There was a problem hiding this comment.
I don't think it thread-safe. If a thread starts a new thread, that new thread will start outputting in the last executed cell.
I've taken a different approach in Solara, where I patch the ctor and run from the start:
https://github.com/widgetti/solara/blob/0ac4767a8c5f8c8b221cafe41fa9ac36270adbe8/solara/server/patch.py#L336
I think this approach might work better.
There was a problem hiding this comment.
To summarize the strategy, the ctor (__init__) would take a reference to the parent_header, and in run, that parent header is then set in a thread_local storage (in your case _thread_parents).
There was a problem hiding this comment.
If a thread starts a new thread, that new thread will start outputting in the last executed cell.
Correct (there is no such a problem with asyncio side of things).
I think this approach might work better.
Thank you for the link, I will take a look!
There was a problem hiding this comment.
Taking a closer look at solara, current_context takes the role of _thread_parents, right? This does not seem to differ that much different. Your approach of patching earlier on initialization rather than startup has the advantage that it will work with:
from threading import Thread
from time import sleep
def child_target():
for i in range(iterations):
print(i, end='', flush=True)
sleep(interval)
def parent_target():
thread = Thread(target=child_target)
sleep(interval)
thread.start()
Thread(target=parent_target).start()but still not with:
def parent_target():
sleep(interval)
Thread(target=child_target).start()
Thread(target=parent_target).start()do I see this right?
There was a problem hiding this comment.
In the end my implementation converged to overriding the same methods as yours after all in e1258de :)
| self._parent_header: contextvars.ContextVar[Dict[str, Any]] = contextvars.ContextVar( | ||
| "parent_header" | ||
| ) | ||
| self._parent_header.set({}) | ||
| self._thread_parents = {} | ||
| self._parent_header_global = {} |
There was a problem hiding this comment.
Curious why the ContextVar and the _thread_parents dict? Is a threading.local not more standard?
There was a problem hiding this comment.
Because contextvar works well for asyncio edge cases. There is some more explanation in the PEP 567, but the gist is:
Thread-local variables are insufficient for asynchronous tasks that execute concurrently in the same OS thread. Any context manager that saves and restores a context value using
threading.local()will have its context values bleed to other code unexpectedly when used in async/await code.
That said, I will take another look at using threading.local instead of _thread_parents.
There was a problem hiding this comment.
I understand that we need ContextVar now, because also when we create new tasks we want to output to the right output cell, didn't think of that!
I think if we use ContextVar, we do not need the thread local storage, it's a superset of threading.local. In combination with overriding the Thread ctor, we can drop _thread_parents.
34867ac to
e1258de
Compare
|
As of now this PR also handles nested threads (started after execution of other cells) as well. |
|
Hi folks, just bumping this PR as there was no further review/comments in the past three weeks. Any suggestions, or is it good to merge? |
|
I'll merge tomorrow if there are no further comments. |
References
Code changes
contextvarsto retrieve the parent header of emitting asyncio taskUser facing changes
Backward-incompatible changes
None