fix: session resumption reconnection loop never iterates#5007

Open
brucearctor wants to merge 5 commits into google:main from brucearctor:fix/live-session-resumption-4996

Conversation

@brucearctor

Summary

Fixes #4996 — the run_live() reconnection loop was unreachable due to unconditional re-raising of exceptions, redundant history transmission on reconnection, and ignored goAway server messages.

Changes

1. Exception handlers continue instead of raise (base_llm_flow.py)

  • ConnectionClosed / ConnectionClosedOK handler now continues the while True loop when live_session_resumption_handle is present
  • Added APIError handling (genai SDK wraps ConnectionClosed as APIError) with the same continue-on-handle logic
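The continue-on-handle pattern described above can be sketched as follows. This is a simplified stand-alone illustration, not the actual base_llm_flow.py code: `ConnectionClosedOK` and `APIError` are stand-in classes for the real websockets / genai SDK exceptions, and the context is a plain dict.

```python
class ConnectionClosedOK(Exception):
    """Stand-in for websockets.exceptions.ConnectionClosedOK."""


class APIError(Exception):
    """Stand-in for the genai SDK's APIError (which wraps ConnectionClosed)."""


def run_loop(connect, ctx):
    """Keep reconnecting while a session resumption handle is available."""
    while True:
        try:
            return connect(ctx)
        except (ConnectionClosedOK, APIError):
            if ctx.get("live_session_resumption_handle"):
                continue  # reconnect using the stored handle
            raise  # no handle: preserve the old fail-fast behavior
```

Without a handle the exception propagates exactly as before, so existing behavior is unchanged for non-resumable sessions.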

2. Skip send_history on reconnection (base_llm_flow.py)

  • Added guard: if llm_request.contents and not invocation_context.live_session_resumption_handle
  • Server already has the session context via the resumption handle — no need to resend
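The guard reduces to a one-line condition; a hypothetical sketch (the function name and the callback parameter are illustrative, not the real API):

```python
def maybe_send_history(contents, resumption_handle, send_history):
    # Only send history on a fresh connection. A resumption handle means
    # the server already has the session context, so resending is redundant.
    if contents and not resumption_handle:
        send_history(contents)
```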

3. Surface go_away messages (gemini_llm_connection.py + llm_response.py)

  • GeminiLlmConnection.receive() now detects message.go_away and yields it as LlmResponse.go_away
  • Added go_away: Optional[LiveServerGoAway] field to LlmResponse
  • Enables proactive reconnection ~60s before server terminates the connection
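The go_away surfacing can be sketched with a simplified receive() generator. `LlmResponse` here is a dataclass stand-in (the real field type is `Optional[LiveServerGoAway]`), and server messages are modeled as dicts:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LlmResponse:
    text: Optional[str] = None
    go_away: Optional[dict] = None  # real type: Optional[LiveServerGoAway]


def receive(messages):
    for message in messages:
        if message.get("go_away"):
            # Surface the warning so the caller can reconnect proactively
            # (~60s before the server terminates the connection).
            yield LlmResponse(go_away=message["go_away"])
        else:
            yield LlmResponse(text=message.get("text"))
```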

Testing

Suite                                      Tests
test_gemini_llm_connection.py              27 (incl. new test_receive_go_away)
test_run_live_reconnection.py              7 (new)
tests/unittests/flows/llm_flows/ (all)     364

New reconnection tests cover:

  • Loop continues on ConnectionClosedOK / APIError when resumption handle exists
  • Exceptions propagate without handle (preserves old behavior)
  • Non-APIError exceptions always propagate, even with handle
  • send_history skipped with handle, called without

Three fixes for the run_live() reconnection loop:

1. Exception handlers in base_llm_flow.py now continue instead of
   raise when a session resumption handle is available. This covers
   both ConnectionClosed (from websockets) and APIError (from genai
   SDK wrapping ConnectionClosed).

2. send_history is skipped on reconnection — the server already has
   the session context via the resumption handle.

3. go_away messages from the server are now surfaced through
   gemini_llm_connection.py's receive() as LlmResponse.go_away,
   enabling proactive reconnection before server termination.

Fixes: google#4996
Seven tests covering the exception handling and reconnection behavior
of the outer while-True loop in run_live():

- test_reconnects_on_connection_closed_with_handle
- test_reconnects_on_api_error_with_handle
- test_raises_connection_closed_without_handle
- test_raises_api_error_without_handle
- test_raises_non_api_error_with_handle
- test_skips_history_on_reconnect
- test_sends_history_without_handle

Uses a _LoopBreak sentinel and _make_connect_fn helper for
deterministic loop termination in tests.
@adk-bot adk-bot added the live [Component] This issue is related to live, voice and video chat label Mar 26, 2026
@rohityan rohityan self-assigned this Mar 26, 2026
    logger.info(
        'Connection closed (%s), reconnecting with session handle.', e
    )
    continue
Collaborator

@rohityan rohityan Mar 26, 2026


This is great for brief network glitches. How do you think this would behave if the server was down for a few minutes?

Author

@brucearctor brucearctor Mar 27, 2026


Good question! Right now the loop relies on the inherent connection timeout from llm.connect() to throttle reconnection attempts. For brief glitches that's sufficient, but you're right that a multi-minute outage would result in aggressive retries.

Makes me think:

  • Add exponential backoff with jitter (e.g., 1s → 2s → 4s → ... capped at ~30s) between reconnection attempts
    AND
  • Add a max retry count and raise after N failures

I'll add that to this PR. Or would you suggest a different approach?

@rohityan rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Mar 26, 2026
    )
    async with llm.connect(llm_request) as llm_connection:
    -  if llm_request.contents:
    +  if llm_request.contents and not invocation_context.live_session_resumption_handle:
Collaborator


What happens if the resumption handle is rejected by the server?

Author


Great point. Looking at the Gemini Live API behavior: if the handle is rejected, the server sends back a session_resumption_update with resumable=False (or simply doesn't echo back a new_handle). It does not silently drop the context.

However, to be safe, we could:

1. Clear the handle on rejection: when we receive a session_resumption_update where resumable is False, clear live_session_resumption_handle so the next reconnection falls back to sending full history.
2. Add a fallback: if the connection succeeds but no session_resumption_update arrives within a timeout, assume the handle was rejected and resend history.

I think option (1) is already partially covered by the existing _receive_from_model logic that updates the handle from session_resumption_update.

Want me to verify the server behavior and add an explicit guard for the rejection case?
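Option (1) could be sketched roughly like this (a hypothetical helper over a dict-based context and dict-based update message; the real types and the actual _receive_from_model logic differ):

```python
def update_resumption_handle(ctx, update):
    """Apply a session_resumption_update to the stored handle.

    If the server reports the session is not resumable, clear the handle
    so the next reconnection falls back to sending full history.
    """
    if not update.get("resumable"):
        ctx["live_session_resumption_handle"] = None
    elif update.get("new_handle"):
        ctx["live_session_resumption_handle"] = update["new_handle"]
```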

Add exponential backoff with jitter (1s base, 30s max) and a retry cap
(10 attempts) to the run_live() reconnection loop. This prevents
aggressive reconnection attempts during extended server outages.

- Backoff delay: min(1s * 2^(attempt-1), 30s) + random(0,1) jitter
- Max retries: 10 (configurable via MAX_RECONNECT_ATTEMPTS)
- Attempt counter resets on successful connection
- New tests: test_raises_after_max_retries_connection_closed,
  test_raises_after_max_retries_api_error

Addresses reviewer feedback on PR google#5007.
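The delay formula from the commit message, as a small self-contained sketch (constant names follow the commit message; the function name is illustrative):

```python
import random

BASE_DELAY = 1.0   # seconds
MAX_DELAY = 30.0   # cap for the exponential term
MAX_RECONNECT_ATTEMPTS = 10


def backoff_delay(attempt: int) -> float:
    """Delay before reconnect attempt `attempt` (1-based):
    min(1s * 2^(attempt-1), 30s) plus up to 1s of random jitter."""
    return min(BASE_DELAY * 2 ** (attempt - 1), MAX_DELAY) + random.random()
```

The jitter spreads out retries from many clients so they don't reconnect in lockstep after a shared outage.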
Instead of re-raising the raw ConnectionClosedOK/APIError when max
retries are exhausted, wrap it in a ConnectionError with a clear
message and chain the original exception via 'from e'. This lets
callers distinguish 'reconnection was attempted and exhausted' from
a single unexpected disconnect.
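The wrap-and-chain pattern can be sketched as follows (OSError stands in for ConnectionClosedOK/APIError, and the function name is illustrative):

```python
def reconnect_with_cap(connect, max_attempts=10):
    """Retry connect() up to max_attempts times; on exhaustion, raise a
    ConnectionError chained to the last underlying failure via `from`."""
    last = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError as e:  # stand-in for ConnectionClosedOK / APIError
            last = e
    raise ConnectionError(
        f'Reconnection failed after {max_attempts} attempts.'
    ) from last
```

Because the original exception is chained, callers can still inspect `e.__cause__` to distinguish an exhausted retry loop from a single unexpected disconnect.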


Development

Successfully merging this pull request may close these issues.

Session resumption reconnection loop in run_live() never iterates

3 participants