
fix: ensure LLM callbacks share the same OTel span context#4854

Open
brucearctor wants to merge 11 commits into google:main from brucearctor:fix/otel-span-id-mismatch-4851

Conversation

@brucearctor

Description

Fixes #4851.

When OpenTelemetry tracing is enabled, before_model_callback and after_model_callback / on_model_error_callback see different span IDs, causing LLM_REQUEST.span_id != LLM_RESPONSE.span_id in the BigQuery Analytics Plugin.

Root Cause

Two issues in base_llm_flow.py:

  1. before_model_callback ran outside the call_llm span
  2. after_model_callback ran inside a child generate_content span (created by _run_and_handle_error via use_inference_span)

Fix

  1. Move before_model_callback inside the call_llm span so it shares the same span context as the other callbacks
  2. Wrap after_model_callback with trace.use_span(span) to re-activate the call_llm span (needed because the async generator from _run_and_handle_error yields responses inside the child generate_content span)
  3. Import trace from opentelemetry

Testing

Added 3 new tests in test_llm_callback_span_consistency.py:

  • test_before_and_after_model_callbacks_share_span_id — core regression test
  • test_before_and_on_error_model_callbacks_share_span_id — error path
  • test_before_model_callback_short_circuit_has_span — short-circuit case

All 51 existing callback/tracing tests continue to pass.

Move before_model_callback inside the call_llm span and wrap
after_model_callback with trace.use_span(span) to re-activate the
call_llm span context. This ensures before_model_callback,
after_model_callback, and on_model_error_callback all see the same
span_id, fixing the mismatch that broke the BigQuery Analytics Plugin.

The root cause was twofold:
1. before_model_callback ran outside the call_llm span
2. after_model_callback ran inside a child generate_content span
   (created by _run_and_handle_error via use_inference_span)

Fixes google#4851
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical OpenTelemetry tracing issue where LLM callbacks (before_model_callback, after_model_callback, on_model_error_callback) were not consistently sharing the same span ID. This inconsistency led to fragmented tracing data, particularly affecting analytics plugins. The fix involves strategically adjusting the execution context of these callbacks within the _call_llm_async method to ensure they all operate under the same call_llm span, thereby providing accurate and unified tracing information.

Highlights

  • OpenTelemetry Span Consistency: Resolved an issue where before_model_callback and after_model_callback/on_model_error_callback had different OpenTelemetry span IDs, leading to inconsistent tracing data in the BigQuery Analytics Plugin.
  • Callback Execution Context Adjustment: Modified base_llm_flow.py to ensure before_model_callback executes within the call_llm span and after_model_callback reactivates the call_llm span, guaranteeing all callbacks share the same span context.
  • New Span Consistency Tests: Introduced three new unit tests to validate that LLM callbacks consistently share the same OpenTelemetry span ID across various scenarios, including successful calls, error handling, and short-circuiting.


Changelog
  • src/google/adk/flows/llm_flows/base_llm_flow.py
    • Imported the opentelemetry.trace module.
    • Refactored the _call_llm_async method to move the before_model_callback invocation inside the call_llm span.
    • Wrapped after_model_callback calls with trace.use_span(span) to re-activate the call_llm span.
  • tests/unittests/flows/llm_flows/test_llm_callback_span_consistency.py
    • Added a new test file to verify LLM callback span consistency.
    • Implemented test_before_and_after_model_callbacks_share_span_id to confirm span ID matching for successful LLM calls.
    • Implemented test_before_and_on_error_model_callbacks_share_span_id to verify span ID consistency during error handling.
    • Implemented test_before_model_callback_short_circuit_has_span to ensure before_model_callback has a valid span even when short-circuiting the LLM call.
Activity
  • New unit tests were added to validate the fix for OpenTelemetry span consistency in LLM callbacks.
  • All 51 existing callback and tracing tests were confirmed to still pass after the changes.

@adk-bot adk-bot added the tracing [Component] This issue is related to OpenTelemetry tracing label Mar 17, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively resolves the OpenTelemetry span inconsistency between before_model_callback and after_model_callback. The approach of moving before_model_callback into the call_llm span and reactivating this span for after_model_callback is correct. The new regression tests are comprehensive and well-written, covering success, error, and short-circuit scenarios. I have one suggestion to refactor a small piece of duplicated code that was introduced with this fix to improve maintainability.

@brucearctor brucearctor reopened this Mar 17, 2026
Extract the duplicated after_model_callback + trace.use_span(span)
logic into a local _apply_after_model_callback coroutine for DRY.
@brucearctor
Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively resolves an issue with OpenTelemetry span context consistency between LLM callbacks. The changes correctly ensure that before_model_callback, after_model_callback, and on_model_error_callback all share the same span by adjusting their execution context within the call_llm span. The introduction of the _apply_after_model_callback wrapper is a clean solution for reactivating the correct span. The new tests are comprehensive, covering success, error, and short-circuit scenarios, which provides strong confidence in the fix. I have one minor suggestion regarding import ordering for better code style.

Move 'from opentelemetry import trace' to the third-party imports
group per PEP 8 import ordering convention.
@rohityan
Collaborator

Hi @brucearctor, thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix the formatting errors by running autoformat.sh

@rohityan rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Mar 17, 2026
@brucearctor
Author

@rohityan -- on it!

@brucearctor
Author

@rohityan -- looks fixed :-)


@caohy1988 caohy1988 left a comment


I validated the PR head locally and I do not think this is proven fixed yet.

Findings

  1. The new regression tests fail on the current PR head.

    • tests/unittests/flows/llm_flows/test_llm_callback_span_consistency.py
    • Running:
      .venv/bin/pytest tests/unittests/flows/llm_flows/test_llm_callback_span_consistency.py -q
      
    • Result: 3 failed
    • All three failures are because before_model_callback does not capture a valid span at all.
  2. In runtime instrumentation, before_model_callback still sees NonRecordingSpan / invalid context.

    • Right before _handle_before_model_callback: NonRecordingSpan False 0
    • Inside plugin before_model_callback: NonRecordingSpan False 0
    • Inside plugin after_model_callback: valid _Span
  3. Even when I force a global TracerProvider, the callbacks are still not proven to share the same span.

    • With trace.set_tracer_provider(TracerProvider()), before_model_callback and after_model_callback both get valid spans, but they still showed different span IDs in my repro.

Interpretation

  • Moving before_model_callback under start_as_current_span('call_llm') is the right direction.
  • Wrapping after_model_callback with trace.use_span(span) is also the right idea.
  • But on the current PR head, the fix is not yet validated:
    1. before_model_callback still does not reliably observe the call_llm span
    2. after_model_callback is not yet proven to be rebound to the exact same span

Recommendation

  • I would not remove the plugin-side workaround yet.

  • I would first fix the test/tracing setup and get a passing framework-level regression test that demonstrates:

    • before_model_callback
    • after_model_callback
    • on_model_error_callback

    all observe the same valid call_llm span ID.

@brucearctor
Author

I'll take a look @caohy1988 , thanks!

…r in tests

Address review feedback from caohy1988:

1. Pass the call_llm span explicitly to _run_and_handle_error and
   re-activate it with trace.use_span(parent_span) for error callbacks.
   This ensures on_model_error_callback reliably sees the same span
   context as before_model_callback, defending against async context
   propagation issues across generator yield boundaries.

2. Replace mock.patch-based tracer setup in tests with a real global
   TracerProvider. The original tests masked the production code path
   by mocking the tracer — now they validate the same proxy tracer
   behavior used at runtime.

Fixes google#4851
Add span name assertions so tests prove each callback sees specifically
the 'call_llm' span (not just any span with a matching ID). This directly
addresses the reviewer's request for proof that before_model_callback,
after_model_callback, and on_model_error_callback all observe the same
valid call_llm span ID.
@brucearctor
Author

@caohy1988 - I think I got this addressed. Do advise if further concerns.


Labels

request clarification [Status] The maintainer need clarification or more information from the author
tracing [Component] This issue is related to OpenTelemetry tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenTelemetry integration creates span ID mismatch between LLM_REQUEST and LLM_RESPONSE/LLM_ERROR (BigQuery Analytics Plugin)

4 participants