-
Notifications
You must be signed in to change notification settings - Fork 115
Fix race condition in new_session() by avoiding list-sessions query #625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
0f803dd to
e81a44f
Compare
|
Thanks @tony! Signed the CLA. I checked and the version is And in terms of CI where this failed, the reason why we were able to identify this is because this CI run failed: https://github.com/OpenHands/software-agent-sdk/actions/runs/21638051841/job/62369267088 We were able to trace it back to the upgrade from Python 3.12 to 3.13, and after reverting to 3.12, the consistent failures due to the issue I cited here disappeared (although this particular run still has a few failures): https://github.com/OpenHands/software-agent-sdk/actions/runs/21710957108 |
Previously, new_session() would run 'tmux new-session -P -F#{session_id}'
then immediately query 'tmux list-sessions' to fetch full session data.
This created a race condition in PyInstaller + Python 3.13+ + Docker
environments where list-sessions might not see the newly created session.
The fix expands the -F format string to include all Obj fields, parsing
the output directly into a Session object without a separate query.
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Tony Narlock <tony@git-pull.com>
Co-authored-by: Tony Narlock <tony@git-pull.com>
8b00212 to
058ee6d
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #625 +/- ##
==========================================
+ Coverage 45.39% 45.43% +0.03%
==========================================
Files 22 22
Lines 2249 2256 +7
Branches 360 360
==========================================
+ Hits 1021 1025 +4
- Misses 1082 1085 +3
Partials 146 146 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@neubig Thank you very much! I'll look at merging this as a pre-release tomorrow so we can test if it works on your side. |
|
Great, thank you! |
why: Graham's race-condition fix introduced get_output_format() and parse_output() in neo.py; fetch_objs() had duplicate logic doing the same thing. Functions also lacked the NumPy-style docstrings and doctests required by project standards. what: - Add NumPy-style docstrings with doctests to get_output_format(), parse_output(), and fetch_objs() - Make parse_output() call get_output_format() instead of duplicating the field-list computation - Refactor fetch_objs() to use get_output_format() and parse_output() instead of inline format/parse logic - Cache get_output_format() with @functools.cache (fields are static) - Remove unused formats import from server.py - Simplify format_string = get_output_format()[1] in new_session()
e672929 to
bc33be0
Compare
why: The race condition fix changed new_session() to construct Session from -P output instead of a follow-up list-sessions query, but had no dedicated test verifying session_id and session_name are populated. what: - Add test asserting session_id is not None and session_name matches
bc33be0 to
6d202e0
Compare
|
@neubig (when github is back online): what's the best way for you to test this? Can you pin the branch? Do you need a PyPI release (and would an alpha do?) Any disagreement / opinion on the changes I made in the PR? |
Fixes: #624
Previously, new_session() would:
This created a race condition in some environments where list-sessions might not see the newly created session yet, causing TmuxObjectDoesNotExist errors.
The fix:
This is more efficient (one fewer subprocess call) and eliminates the race condition by making session creation atomic.
I tested this using the script in the original issue and it works in all three cases: