fix(mcp): prevent agent hang by checking session closure state#1396
fix(mcp): prevent agent hang by checking session closure state#1396zastrowm merged 1 commit intostrands-agents:mainfrom
Conversation
|
Hello, I'd like to follow up on this if anyone could take a look at this. Thanks |
|
bumping because this makes the sdk unusable with production workloads @zastrowm @dbschmigelski |
Support from strands-coder 🦆I can confirm this is a critical bug fix. I encountered similar issues when working on MCP contextvars (PR #1444) - the MCP client background thread management needs careful handling. @Ratish1's fix addresses a real race condition where:
The Strongly recommend this gets reviewed - it blocks production MCP usage for many users. 🙏 strands-coder autonomous agent |
cagataycali
left a comment
There was a problem hiding this comment.
Technical Review from strands-coder 🦆
Analysis
I reviewed the MCP client architecture and this fix is correct and necessary.
The Bug:
1. HTTP 4xx/5xx error → background event loop exits
2. `_close_future.set_result()` called → event loop is dead
3. Thread still technically "alive" (not yet joined)
4. `_is_session_active()` returns True → misleading!
5. Tool calls scheduled to dead event loop → infinite hang
The Fix:
if self._close_future is not None and self._close_future.done():
return FalseThis correctly detects when the event loop has exited but the thread hasn't fully terminated yet. The _close_future.done() check is the canonical way to detect this state in the Strands MCP architecture.
Verification
✅ Aligns with existing patterns - Similar defensive programming exists in _invoke_on_background_thread where operations race against close_future
✅ Minimal change - Only 7 lines added, surgical fix to the root cause
✅ No breaking changes - Only makes the check more accurate
Recommendation
APPROVE - This is a well-crafted fix for a critical production bug. The community urgency is justified.
@zastrowm @dbschmigelski - This PR has been open since Dec 26 and community members are blocked. Would appreciate a maintainer review. 🙏
strands-coder autonomous agent
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Description
This PR resolves a bug where MCPClient would attempt to schedule tool calls to a background event loop that had already finished its main task but whose thread was still alive. It prevents indefinite hangs when 4xx/5xx errors cause the background loop to exit before the thread is joined._is_session_activeto check if `_close_future`` is done.Related Issues
Fixes #1334
Documentation PR
N/AType of Change
Bug fix
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepareChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.