Description
🔴 Required Information
Is your feature request related to a specific problem?
When an MCP server configured in an agent's toolset is unreachable (not started, network failure, crash), the entire agent invocation fails with an unrecoverable ConnectionError. The agent cannot continue operating with its remaining tools or built-in knowledge.
`McpToolset.get_tools()` is called during every LLM step via `_preprocess_async` → `_process_agent_tools` → `_convert_tool_union_to_tools`. There is no try/except anywhere in this chain, so a single unavailable MCP server takes down the entire agent — even when the agent has other tools or could answer using its own knowledge.
```
base_llm_flow.py  _preprocess_async
  → _process_agent_tools                      (no try/except)
  → _convert_tool_union_to_tools              (no try/except)
  → base_toolset.py  get_tools_with_prefix    (no try/except)
  → mcp_toolset.py  get_tools
  → _execute_with_session → create_session
  → ConnectionError: Failed to create MCP session
```
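The propagation can be reproduced in miniature with a toy reduction of this chain (names simplified and hypothetical; this is not the actual ADK code):

```python
import asyncio


class HealthyToolset:
    """Stand-in for a working toolset (toy code)."""

    async def get_tools(self):
        return ["search"]


class UnreachableToolset:
    """Stand-in for McpToolset when its server is down (toy code)."""

    async def get_tools(self):
        raise ConnectionError("Failed to create MCP session")


async def preprocess(toolsets):
    # Mirrors the _preprocess_async -> _convert_tool_union_to_tools path:
    # no try/except, so one failing toolset aborts the whole LLM step.
    tools = []
    for ts in toolsets:
        tools.extend(await ts.get_tools())
    return tools


try:
    asyncio.run(preprocess([HealthyToolset(), UnreachableToolset()]))
    outcome = "ok"
except ConnectionError as e:
    outcome = f"aborted: {e}"

print(outcome)  # aborted: Failed to create MCP session
```

Note that the healthy toolset's tools are discarded too: the exception escapes before `preprocess` returns anything.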
Describe the Solution You'd Like
Add an `optional` parameter to `McpToolset` (default `False` for backward compatibility). When `True`, connection failures in `get_tools()` return an empty list instead of raising, allowing the agent to continue with its remaining tools.
```python
toolset = McpToolset(
    connection_params=SseConnectionParams(url="http://mcp-server:5031/mcp"),
    tool_filter=["search"],
    optional=True,  # Agent continues if this server is down
)
agent = LlmAgent(
    model="gemini-2.0-flash",
    name="assistant",
    tools=[toolset],  # Agent works even if MCP server is unavailable
)
```
Impact on your work
In production environments, MCP servers are deployed as independent services and can go down for maintenance, scaling events, or unexpected failures. Currently, an agent with multiple tools from multiple MCP servers becomes completely non-functional if any single MCP server is temporarily unavailable. This severely impacts service reliability.
Agents configured with various MCP tool combinations should not have their entire experience broken by a single MCP server outage.
Willingness to contribute
Yes — happy to submit a PR if the team agrees on an approach.
🟡 Recommended Information
Describe Alternatives You've Considered
There is no clean way to handle this externally:
- A `before_tool_callback` plugin approach does not work because `get_tools()` fails during tool discovery (before any specific tool is called), so the callback is never reached.
- Catching errors at agent construction time and skipping unavailable MCP servers prevents the agent from ever discovering the tools if the server comes back online mid-conversation.
- Subclassing `McpToolset` and overriding `get_tools()` works as a temporary workaround, but it relies on internal implementation details and may break with future ADK changes.
None of these are ideal. A first-class optional parameter would be the cleanest solution.
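For completeness, the subclass workaround looks roughly like the sketch below. `FlakyToolset` is a hypothetical stub standing in for `McpToolset` so the snippet is self-contained; the real `get_tools()` signature is assumed from the call chain above:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class FlakyToolset:
    """Stub standing in for McpToolset; its server is down (toy code)."""

    async def get_tools(self, readonly_context=None):
        raise ConnectionError("Failed to create MCP session")


class OptionalToolsetWorkaround(FlakyToolset):
    """Workaround sketch: swallow ConnectionError during tool discovery
    and report no tools, so the agent keeps running with its other tools."""

    async def get_tools(self, readonly_context=None):
        try:
            return await super().get_tools(readonly_context)
        except ConnectionError:
            logger.warning("MCP server unavailable; continuing without it")
            return []


tools = asyncio.run(OptionalToolsetWorkaround().get_tools())
print(tools)  # []
```

This is exactly the fragility the alternatives section describes: the override must track the upstream signature and error type by hand.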
Proposed API / Implementation
Option A: optional flag on McpToolset (minimal, recommended)
Add `optional: bool = False` to `McpToolset.__init__()`. In `get_tools()`, catch `ConnectionError` when `optional=True`:
```python
# In mcp_toolset.py
class McpToolset(BaseToolset):
    def __init__(self, *, connection_params, optional=False, **kwargs):
        super().__init__(**kwargs)
        self._optional = optional
        # ... existing init ...

    @retry_on_errors
    async def get_tools(self, readonly_context=None):
        try:
            tools_response = await self._execute_with_session(
                lambda session: session.list_tools(),
                "Failed to get tools from MCP server",
                readonly_context,
            )
        except ConnectionError:
            if self._optional:
                logger.warning(
                    "Optional MCP toolset unavailable, returning empty tools"
                )
                return []
            raise
        # ... rest of method ...
```
Option B: Error handling in _process_agent_tools (broader)
Wrap toolset resolution in `base_llm_flow.py` `_process_agent_tools`:
```python
try:
    tools = await _convert_tool_union_to_tools(tool_union, ...)
except ConnectionError as e:
    logger.warning("Toolset %s unavailable, skipping: %s", tool_union, e)
    continue
```
This is broader but changes behavior for all toolsets without opt-in.
Additional Context
- Tested on `google-adk` 1.27.4; also verified the issue is not addressed in 1.28.0
- The `@retry_on_errors` decorator retries once, but both attempts fail when the server is truly down, adding ~20s delay before the final `ConnectionError`
- Python 3.13, using `StreamableHTTPConnectionParams` for MCP connections
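The retry-then-fail behavior described above can be illustrated generically. This is a toy decorator, not the real `@retry_on_errors` (whose retry count, error set, and backoff may differ); the delay is shortened so the example runs quickly:

```python
import asyncio
import functools


def retry_once(func):
    """Toy analogue of @retry_on_errors: one extra attempt after a delay.
    (Hypothetical: the real backoff is on the order of seconds, which is
    where the ~20s stall before the final ConnectionError comes from.)"""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except ConnectionError:
            await asyncio.sleep(0.01)  # shortened for the example
            return await func(*args, **kwargs)  # second failure escapes
    return wrapper


attempts = 0


@retry_once
async def get_tools():
    global attempts
    attempts += 1
    raise ConnectionError("Failed to create MCP session")


try:
    asyncio.run(get_tools())
    result = "ok"
except ConnectionError:
    result = "failed after retries"

print(attempts, result)  # 2 failed after retries
```

When the server is permanently down, both attempts fail identically, so the retry only adds latency before the same unrecoverable error.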