Skip to content

feat(plugins): add on_agent_error_callback and on_run_error_callback#5045

Draft
caohy1988 wants to merge 4 commits intogoogle:mainfrom
caohy1988:feat/agent-run-error-callbacks
Draft

feat(plugins): add on_agent_error_callback and on_run_error_callback#5045
caohy1988 wants to merge 4 commits intogoogle:mainfrom
caohy1988:feat/agent-run-error-callbacks

Conversation

@caohy1988
Copy link
Copy Markdown

@caohy1988 caohy1988 commented Mar 28, 2026

Summary

Implements RFC #5044 to close the error callback coverage gap at the agent and runner levels (fixes #4863).

  • Add on_agent_error_callback(callback_context, agent, error) and on_run_error_callback(invocation_context, error) to BasePlugin
  • Wire try/except Exception in BaseAgent.run_async(), run_live(), and Runner._exec_with_plugin()
  • Add AGENT_ERROR and INVOCATION_ERROR event types to BigQueryAgentAnalyticsPlugin with traceback capture and TraceManager cleanup
  • Enrich v_agent_error and v_invocation_error analytics views with error_traceback column
  • Keep after_agent_callback and after_run_callback as success-only — no semantic change to existing callbacks
  • Catch Exception, not BaseExceptionCancelledError, KeyboardInterrupt, SystemExit do not trigger error callbacks
  • Notification-only dispatch_run_notification_callbacks() ensures all plugins are notified regardless of return values

Design decisions

  • Notification-only: Error callbacks cannot suppress exceptions (unlike on_model_error_callback). The exception is always re-raised.
  • Exactly-once per layer: Agent crash → 1× on_agent_error_callback + 1× on_run_error_callback. Neither fires more than once per failure.
  • No after_* semantic change: All 6 built-in after_* callback usages (logging_plugin, debug_logging_plugin, bigquery_agent_analytics_plugin) treat them as success callbacks. Moving to finally would produce dual events (e.g., both AGENT_ERROR and AGENT_COMPLETED).
  • Dedicated dispatch path: _run_notification_callbacks() always iterates all plugins, unlike _run_callbacks() which short-circuits on non-None returns.

Files changed

File Change
base_plugin.py +2 new callback methods
plugin_manager.py +2 PluginCallbackName entries, +2 dispatch methods, +_run_notification_callbacks()
base_agent.py try/except Exception in run_async()/run_live(), _handle_agent_error_callback()
runners.py try/except Exception in _exec_with_plugin()
bigquery_agent_analytics_plugin.py AGENT_ERROR/INVOCATION_ERROR events, traceback capture, cleanup, view columns
test_error_callbacks.py 14 framework-level tests
test_bigquery_agent_analytics_plugin.py 4 BQ plugin-specific tests

Test plan

  • 14 framework tests: error callbacks fire on crash, after_* NOT called on failure, exception re-raised, CancelledError doesn't trigger, exactly-once-per-layer, notification dispatch (no short-circuit)
  • 4 BQ plugin tests: AGENT_ERROR/INVOCATION_ERROR rows logged correctly, cleanup safety, view SQL correctness
  • 36 existing test_base_agent.py tests pass (0 regressions)
  • 9 existing test_plugin_manager.py tests pass (0 regressions)
  • 190 existing BQ plugin tests pass (0 regressions)
  • Total: 249 passed, 0 failures

🤖 Generated with Claude Code

Implements RFC google#5044 to close the error callback coverage gap at the
agent and runner levels. Previously, unhandled exceptions escaping
agent execution or runner execution left dangling *_STARTING events
with no terminal error event.

Changes:
- Add on_agent_error_callback and on_run_error_callback to BasePlugin
  (notification-only, exception always re-raised)
- Wire try/except Exception in BaseAgent.run_async() and run_live()
- Wire try/except Exception in Runner._exec_with_plugin()
- Add AGENT_ERROR and INVOCATION_ERROR event types to
  BigQueryAgentAnalyticsPlugin with traceback capture and cleanup
- Keep after_agent_callback and after_run_callback as success-only
  (no semantic change to existing callbacks)
- Catch Exception (not BaseException) to exclude CancelledError/
  KeyboardInterrupt from error callback dispatch

Closes google#4863

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@google-cla
Copy link
Copy Markdown

google-cla bot commented Mar 28, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@adk-bot adk-bot added the core [Component] This issue is related to the core interface and implementation label Mar 28, 2026
@caohy1988 caohy1988 marked this pull request as draft March 28, 2026 15:48
@adk-bot
Copy link
Copy Markdown
Collaborator

adk-bot commented Mar 28, 2026

Response from ADK Triaging Agent

Hello @caohy1988, thank you for creating this PR!

It looks like you haven't signed the Contributor License Agreement (CLA) yet. Please visit https://cla.developers.google.com/ to sign the agreement. Once you've signed it, the "cla/google" check will pass.

Thanks!

Error callbacks are documented as notification-only (all plugins must
be notified), but were dispatched through _run_callbacks() which
short-circuits on the first non-None return. A buggy plugin returning
a value would prevent subsequent plugins from seeing the error.

Add _run_notification_callbacks() that always iterates all plugins
regardless of return values, and switch run_on_agent_error_callback()
and run_on_run_error_callback() to use it.

Update tests to assert both plugins are always called even when one
returns non-None.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
caohy1988 added a commit to caohy1988/adk-python that referenced this pull request Mar 28, 2026
Follow-up to google#5045. Adds BQ plugin-specific test coverage and enriches
the analytics views for AGENT_ERROR and INVOCATION_ERROR events.

Changes:
- Add error_traceback column extraction to v_agent_error and
  v_invocation_error views (JSON_VALUE from content field)
- Add test_on_agent_error_callback_logs_correctly: verifies AGENT_ERROR
  event is logged with correct error_message, status, and traceback
- Add test_on_run_error_callback_logs_correctly: verifies
  INVOCATION_ERROR event is logged with correct fields
- Add TestRunErrorCallbackCleanupSafety: verifies TraceManager cleanup
  runs even when _log_event raises during on_run_error_callback
- Add test_error_views_contain_traceback_column: verifies view SQL
  includes error_traceback extraction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
caohy1988 and others added 2 commits March 28, 2026 10:53
Follow-up to google#5045. Adds BQ plugin-specific test coverage and enriches
the analytics views for AGENT_ERROR and INVOCATION_ERROR events.

Changes:
- Add error_traceback column extraction to v_agent_error and
  v_invocation_error views (JSON_VALUE from content field)
- Add test_on_agent_error_callback_logs_correctly: verifies AGENT_ERROR
  event is logged with correct error_message, status, and traceback
- Add test_on_run_error_callback_logs_correctly: verifies
  INVOCATION_ERROR event is logged with correct fields
- Add TestRunErrorCallbackCleanupSafety: verifies TraceManager cleanup
  runs even when _log_event raises during on_run_error_callback
- Add test_error_views_contain_traceback_column: verifies view SQL
  includes error_traceback extraction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_run_notification_callbacks() previously raised RuntimeError on the
first plugin callback failure, which masked the original application
exception and prevented later plugins from being notified.

Now it logs the error and continues iterating all plugins. The caller's
original exception (agent crash / runner crash) remains the primary
error that propagates to the caller.

Add regression tests:
- test_plugin_callback_failure_does_not_mask_app_error: verifies p2
  is still notified when p1's error callback raises
- test_original_exception_propagates_despite_plugin_failure: end-to-end
  test verifying the caller sees the original agent exception, not the
  plugin's internal error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core [Component] This issue is related to the core interface and implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement on_agent_error_callback and support AGENT_ERROR event in BigQuery Analytics Plugin

2 participants