Comparing changes

base repository: databricks/databricks-sql-python
base: v4.2.0
head repository: databricks/databricks-sql-python
compare: main
  • 17 commits
  • 49 files changed
  • 7 contributors

Commits on Nov 20, 2025

  1. Add ignore_transactions config to disable transaction operations (#711)

    Introduces a new `ignore_transactions` configuration parameter (default: True)
    to control transaction-related behavior in the Connection class.
    
    When ignore_transactions=True (default):
    - commit(): no-op, returns immediately
    - rollback(): raises NotSupportedError with message "Transactions are not supported on Databricks"
    - autocommit setter: no-op, returns immediately
    
    When ignore_transactions=False:
    - All transaction methods execute normally
    
    Changes:
    - Added ignore_transactions parameter to Connection.__init__() with default value True
    - Modified commit(), rollback(), and autocommit setter to check ignore_transactions flag
    - Updated unit tests to pass ignore_transactions=False when testing transaction functionality
    - Updated e2e transaction tests to pass ignore_transactions=False
    - Added three new unit tests to verify ignore_transactions
    jayantsing-db authored Nov 20, 2025
    a4899cb
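
The gating described in #711 can be sketched as follows. This is a minimal illustration, not the driver's `Connection` class: `SketchConnection`, the `statements` list, and the local `NotSupportedError` stand-in are all hypothetical names introduced here; only the flag name, its default, and the error message come from the commit message.

```python
class NotSupportedError(Exception):
    """Local stand-in for the DB-API NotSupportedError (assumption)."""


class SketchConnection:
    """Hypothetical connection that models only the transaction gating."""

    def __init__(self, ignore_transactions: bool = True):
        self.ignore_transactions = ignore_transactions
        self._autocommit = True
        self.statements = []  # records what would have been sent to the server

    def commit(self) -> None:
        if self.ignore_transactions:
            return  # no-op, returns immediately
        self.statements.append("commit")

    def rollback(self) -> None:
        if self.ignore_transactions:
            raise NotSupportedError("Transactions are not supported on Databricks")
        self.statements.append("rollback")

    @property
    def autocommit(self) -> bool:
        return self._autocommit

    @autocommit.setter
    def autocommit(self, value: bool) -> None:
        if self.ignore_transactions:
            return  # no-op, the stored value is left untouched
        self._autocommit = value
        self.statements.append(f"autocommit={value}")
```

With the default flag, `commit()` and the `autocommit` setter leave `statements` empty, while `rollback()` raises. Passing `ignore_transactions=False` lets all three calls through.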
  2. Ready for 4.2.1 release (#713)

    Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
    vikrantpuppala authored Nov 20, 2025
    ad227ca

Commits on Nov 21, 2025

  1. Change default use_hybrid_disposition to False (#714)

    This changes the default value of use_hybrid_disposition from True to False
    in the SEA backend, disabling hybrid disposition by default.
    samikshya-db authored Nov 21, 2025
    b8494ff

Commits on Nov 26, 2025

  1. Circuit breaker changes using pybreaker (#705)

    * Added driver connection params
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * Added model fields for chunk/result latency
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fixed linting issues
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * lint issue fixing
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * circuit breaker changes using pybreaker
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * Added interface layer on top of http client to use circuit breaker
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * Added test cases to validate circuit breaker
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fixing broken tests
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fixed linting issues
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fixed failing test cases
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fixed urllib3 issue
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * added more test cases for telemetry
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * simplified CB config
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * poetry lock
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fix minor issues & improvement
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * improved circuit breaker for handling only 429/503
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * linting issue fixed
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * raise CB only for 429/503
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fix broken test cases
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fixed untyped references
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * added more tests to verify the changes
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * description changed
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * remove cb config class in favor of constants
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * removed mocked response and use a new excluded exception in CB
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fixed broken test
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * added e2e test to verify circuit breaker
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * lower log level for telemetry
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fixed broken test, removed tests on log assertions
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * modified unit tests to reduce noise and follow the DRY principle
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    ---------
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    nikhilsuri-db authored Nov 26, 2025
    73580fe
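
The "raise CB only for 429/503" behaviour from #705 can be illustrated with a small hand-rolled breaker. The commit itself uses pybreaker; the sketch below deliberately does not, so that it stays dependency-free. `SketchBreaker`, `HttpStatusError`, `CircuitOpenError`, and the default thresholds are assumptions made for this example, and the `fail_max` / `reset_timeout` parameter names only echo pybreaker's.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class HttpStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


class SketchBreaker:
    # Only these statuses count towards opening the circuit.
    TRIPPING_STATUSES = {429, 503}

    def __init__(self, fail_max=3, reset_timeout=30.0, clock=time.monotonic):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # After the timeout a trial call is allowed (half-open behaviour).
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, func, *args, **kwargs):
        if self.is_open:
            raise CircuitOpenError("telemetry endpoint circuit is open")
        try:
            result = func(*args, **kwargs)
        except HttpStatusError as exc:
            if exc.status in self.TRIPPING_STATUSES:
                self._failures += 1
                if self._failures >= self.fail_max:
                    self._opened_at = self._clock()
            raise  # other statuses propagate without being counted
        # A successful call closes the circuit and resets the counter.
        self._failures = 0
        self._opened_at = None
        return result
```

Errors such as HTTP 500 pass straight through without moving the breaker, which is the design point the later sub-commits converge on: only rate limiting and service unavailability should suppress further telemetry requests.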

Commits on Nov 27, 2025

  1. perf: Optimize telemetry latency logging to reduce overhead (#715)

    perf: Optimize telemetry latency logging to reduce overhead
    
    Optimizations implemented:
    1. Eliminated extractor pattern - replaced wrapper classes with direct
       attribute access functions, removing object creation overhead
    2. Added feature flag early exit - checks cached telemetry_enabled flag
       to skip heavy work when telemetry is disabled
    3. Simplified code structure with early returns for better readability
    
    
    Signed-off-by: Samikshya Chand <samikshya.chand@databricks.com>
    samikshya-db authored Nov 27, 2025
    8d5e155
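
The two main optimizations listed in #715 (direct attribute access and the feature-flag early exit) might look roughly like this. It is a sketch under assumptions: `log_latency`, `extract_statement_id`, `SketchCursor`, and the `latency_events` list are illustrative names, and only the cached `telemetry_enabled` flag is taken from the commit message.

```python
import functools
import time


def extract_statement_id(obj):
    """Plain function with direct attribute access, no wrapper object."""
    return getattr(obj, "statement_id", None)


def log_latency(func):
    """Hypothetical decorator that records latency only when telemetry is on."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # Early exit: skip all timing work when the cached flag is off.
        if not getattr(self, "telemetry_enabled", False):
            return func(self, *args, **kwargs)

        start = time.monotonic()
        try:
            return func(self, *args, **kwargs)
        finally:
            elapsed_ms = (time.monotonic() - start) * 1000
            self.latency_events.append((extract_statement_id(self), elapsed_ms))

    return wrapper


class SketchCursor:
    """Hypothetical cursor used only to exercise the decorator."""

    def __init__(self, telemetry_enabled: bool):
        self.telemetry_enabled = telemetry_enabled
        self.statement_id = "stmt-1"
        self.latency_events = []

    @log_latency
    def execute(self, query: str) -> str:
        return f"ran {query}"
```

When the flag is off, the wrapped call does no timing and builds no event, so disabled telemetry costs one attribute lookup per call.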

Commits on Nov 28, 2025

  1. basic e2e test for force telemetry verification (#708)

    * basic e2e test for force telemetry verification
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * Added more integration test scenarios
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * default on telemetry + logs to investigate failing test
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fixed linting issue
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * added more logs to identify server side flag evaluation
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * remove unused logs
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fix broken test case for default enable telemetry
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * reduced test length and made code more reusable
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * moved telemetry e2e tests to a single daily run
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    ---------
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    nikhilsuri-db authored Nov 28, 2025
    d524f0e

Commits on Dec 3, 2025

  1. feat: Implement host-level telemetry batching to reduce rate limiting (#718)
    
    * feat: Implement host-level telemetry batching to reduce rate limiting
    
    Changes telemetry client architecture from per-session to per-host batching,
    matching the JDBC driver implementation. This reduces the number of HTTP
    requests to the telemetry endpoint and prevents rate limiting in test
    environments.
    
    Key changes:
    - Add _TelemetryClientHolder with reference counting for shared clients
    - Change TelemetryClientFactory to key clients by host_url instead of session_id
    - Add getHostUrlSafely() helper for defensive null handling
    - Update all callers (client.py, exc.py, latency_logger.py) to pass host_url
    
    Before: 100 connections to same host = 100 separate TelemetryClients
    After:  100 connections to same host = 1 shared TelemetryClient (refcount=100)
    
    This fixes rate limiting issues seen in e2e tests where 300+ parallel
    connections were overwhelming the telemetry endpoint with 429 errors.
    
    * chore: Change all telemetry logging to DEBUG level
    
    Reduces log noise by changing all telemetry-related log statements
    (info, warning, error) to debug level. Telemetry operations are
    background tasks and should not clutter logs with operational messages.
    
    Changes:
    - Circuit breaker state changes: info/warning -> debug
    - Telemetry send failures: error -> debug
    - All telemetry operations now consistently use debug level
    
    * chore: Fix remaining telemetry warning log to debug
    
    Changes remaining logger.warning in telemetry_push_client.py to debug level
    for consistency with other telemetry logging.
    
    * fix: Update tests to use host_url instead of session_id_hex
    
    - Update circuit breaker test to check logger.debug instead of logger.info
    - Replace all session_id_hex test parameters with host_url
    - Apply Black formatting to exc.py and telemetry_client.py
    
    This fixes test failures caused by the signature change from session_id_hex
    to host_url in the Error class and TelemetryClientFactory.
    
    * fix: Revert session_id_hex in tests for functions that still use it
    
    Only Error classes changed from session_id_hex to host_url.
    Other classes (TelemetryClient, ResultSetDownloadHandler, etc.) still use session_id_hex.
    
    Reverted:
    - test_telemetry.py: TelemetryClient and initialize_telemetry_client
    - test_downloader.py: ResultSetDownloadHandler
    - test_download_manager.py: ResultFileDownloadManager
    
    Kept as host_url:
    - test_client.py: Error class instantiation
    
    * fix: Update all Error raises and test calls to use host_url
    
    Changes:
    1. client.py: Changed all error raises from session_id_hex to host_url
       - Connection class: session_id_hex=self.get_session_id_hex() -> host_url=self.session.host
       - Cursor class: session_id_hex=self.connection.get_session_id_hex() -> host_url=self.connection.session.host
    
    2. test_telemetry.py: Updated get_telemetry_client() and close() calls
       - get_telemetry_client(session_id) -> get_telemetry_client(host_url)
       - close(session_id) -> close(host_url=host_url)
    
    3. test_telemetry_push_client.py: Changed logger.warning to logger.debug
       - Updated test assertion to match debug logging level
    
    These changes complete the migration from session-level to host-level
    telemetry client management.
    
    * fix: Update thrift_backend.py to use host_url instead of session_id_hex
    
    Changes:
    1. Added self._host attribute to store server_hostname
    2. Updated all error raises to use host_url=self._host
    3. Changed method signatures from session_id_hex to host_url:
       - _check_response_for_error
       - _hive_schema_to_arrow_schema
       - _col_to_description
       - _hive_schema_to_description
       - _check_direct_results_for_error
    4. Updated all method calls to pass self._host instead of self._session_id_hex
    
    This completes the migration from session-level to host-level error reporting.
    
    * Fix Black formatting by adjusting fmt directive placement
    
    Moved the `# fmt: on` directive to the except block level instead
    of inside the if statement to resolve Black parsing confusion.
    
    * Fix telemetry feature flag tests to set mock session host
    
    The tests were failing because they called get_telemetry_client("test")
    but the mock session didn't have .host set, so the telemetry client was
    registered under a different key (likely None or MagicMock). This caused
    the factory to return NoopTelemetryClient instead of the expected client.
    
    Fixed by setting mock_session_instance.host = "test" in all three tests.
    
    * Add teardown_method to clear telemetry factory state between tests
    
    Without this cleanup, tests were sharing telemetry clients because they
    all used the same host key ("test"), causing test pollution. The first
    test would create an enabled client, and subsequent tests would reuse it
    even when they expected a disabled client.
    
    * Clear feature flag context cache in teardown to fix test pollution
    
    The FeatureFlagsContextFactory caches feature flag contexts per session,
    causing tests to share the same feature flag state. This resulted in the
    first test creating a context with telemetry enabled, and subsequent tests
    incorrectly reusing that enabled state even when they expected disabled.
    
    * fix: Access actual client from holder in flush worker
    
    The flush worker was calling _flush() on _TelemetryClientHolder objects
    instead of the actual TelemetryClient. Fixed by accessing holder.client
    before calling _flush().
    
    Fixes AttributeError in e2e tests: '_TelemetryClientHolder' object has
    no attribute '_flush'
    
    * Clear telemetry client cache in e2e test teardown
    
    Added _clients.clear() to the teardown fixture to prevent telemetry
    clients from persisting across e2e tests, which was causing session ID
    pollution in test_concurrent_queries_sends_telemetry.
    
    * Pass session_id parameter to telemetry export methods
    
    With host-level telemetry batching, multiple connections share one
    TelemetryClient. Each client stores session_id_hex from the first connection
    that created it. This caused all subsequent connections' telemetry events
    to use the wrong session ID.
    
    Changes:
    - Modified telemetry export method signatures to accept optional session_id
    - Updated Connection.export_initial_telemetry_log() to pass session_id
    - Updated latency_logger.py export_latency_log() to pass session_id
    - Updated Error.__init__() to accept optional session_id_hex and pass it
    - Updated all error raises in Connection and Cursor to pass session_id_hex
    
    🤖 Generated with [Claude Code](https://claude.com/claude-code)
    
    Co-Authored-By: Claude <noreply@anthropic.com>
    
    * Fix Black formatting in telemetry_client.py
    
    * Use 'test-host' instead of 'test' for mock host in telemetry tests
    
    * Replace test-session-id with test-host in test_client.py
    
    * Fix telemetry client lookup to use test-host in tests
    
    * Make session_id_hex keyword-only parameter in Error.__init__
    
    ---------
    
    Co-authored-by: Claude <noreply@anthropic.com>
    samikshya-db and claude authored Dec 3, 2025
    ebe4b07
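
The per-host sharing with reference counting described in #718 can be sketched as below. This is not the driver's `TelemetryClientFactory`: `SketchTelemetryFactory`, `SketchTelemetryClient`, `_Holder`, `acquire`, and `release` are hypothetical names. The sketch also shows two follow-up points from the commit message: the session ID is passed with each event rather than stored on the shared client, and cleanup reaches through the holder to the client.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchTelemetryClient:
    host_url: str
    events: list = field(default_factory=list)
    closed: bool = False

    def export_event(self, payload: str, session_id: Optional[str] = None) -> None:
        # The session ID travels with each event because the client is shared.
        self.events.append((session_id, payload))

    def close(self) -> None:
        self.closed = True


class _Holder:
    """Pairs a shared client with the number of connections using it."""

    def __init__(self, client: SketchTelemetryClient):
        self.client = client
        self.refcount = 1


class SketchTelemetryFactory:
    _lock = threading.Lock()
    _clients = {}  # keyed by host_url, not by session ID

    @classmethod
    def acquire(cls, host_url: str) -> SketchTelemetryClient:
        with cls._lock:
            holder = cls._clients.get(host_url)
            if holder is None:
                holder = _Holder(SketchTelemetryClient(host_url))
                cls._clients[host_url] = holder
            else:
                holder.refcount += 1
            return holder.client

    @classmethod
    def release(cls, host_url: str) -> None:
        with cls._lock:
            holder = cls._clients.get(host_url)
            if holder is None:
                return
            holder.refcount -= 1
            if holder.refcount <= 0:
                del cls._clients[host_url]
                holder.client.close()  # close the client, not the holder
```

Two connections to the same host get the same client object, and the client is closed only when the last of them releases it.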

Commits on Dec 4, 2025

  1. Prepare for a release with telemetry on by default (#717)

    * Prepare for a release with telemetry on by default
    
    Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
    
    * Make edits
    
    Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
    
    * Update version
    
    Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
    
    * Fix CHANGELOG formatting to match previous style
    
    Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
    
    * Fix telemetry e2e tests for default-enabled behavior
    
    - Update test expectations to reflect telemetry being enabled by default
    - Add feature flags cache cleanup in teardown to prevent state leakage between tests
    - This ensures each test runs with fresh feature flag state
    
    * Add wait after connection close for async telemetry submission
    
    * Remove debug logging from telemetry tests
    
    * Mark telemetry e2e tests as serial - must not run in parallel
    
    Root cause: Telemetry tests share host-level client across pytest-xdist workers,
    causing test isolation issues with patches. Tests pass serially but fail with -n auto.
    
    Solution: Add @pytest.mark.serial marker. CI needs to run these separately without -n auto.
    
    * Split test execution to run serial tests separately
    
    Telemetry e2e tests must run serially due to shared host-level
    telemetry client across pytest-xdist workers. Running with -n auto
    causes test isolation issues where futures aren't properly captured.
    
    Changes:
    - Run parallel tests with -m 'not serial' -n auto
    - Run serial tests with -m 'serial' without parallelization
    - Use --cov-append for serial tests to combine coverage
    - Mark telemetry e2e tests with @pytest.mark.serial
    - Update test expectations for default telemetry behavior
    - Add feature flags cache cleanup in test teardown
    
    * Mark telemetry e2e tests as serial - must not run in parallel
    
    The concurrent telemetry e2e test globally patches telemetry methods
    to capture events. When run in parallel with other tests via pytest-xdist,
    it captures telemetry events from other concurrent tests, causing
    assertion failures (expected 60 events, got 88).
    
    All telemetry e2e tests must run serially to avoid cross-test
    interference with the shared host-level telemetry client.
    
    ---------
    
    Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
    samikshya-db authored Dec 4, 2025
    d2ae1e8

Commits on Dec 11, 2025

  1. added pandas < 2.4.0 support and tests for py 3.14 (#720)

    * added pandas 2.3.3 support and tests for py 3.14
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    
    * generated poetry.lock
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    
    * lz4 version update for py 3.14
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    
    * dependency selection based on py version
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    
    * pyarrow version update for py 3.14
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    
    * poetry.lock with latest poetry version
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    
    ---------
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    sreekanth-db authored Dec 11, 2025
    7c6adee

Commits on Dec 18, 2025

  1. pandas 2.3.3 support for py < 3.14 (#721)

    * pandas 2.3.3 support for py < 3.14
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    
    * poetry lock
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    
    ---------
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    sreekanth-db authored Dec 18, 2025
    f7822fd
  2. ce55e7b

Commits on Jan 1, 2026

  1. Fixed the exception handler close() on _TelemetryClientHolder (#723)

    Fixed the exception handler, which called close() on _TelemetryClientHolder objects instead of on the client inside them.
    msrathore-db authored Jan 1, 2026
    9b4e577

Commits on Jan 5, 2026

  1. created util method to normalise http protocol in http path (#724)

    * created util method to normalise http protocol in http path
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * Added impacted files using util method
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * Fixed linting issues
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * fixed broken test with mock host string
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * mocked http client
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * made case sensitive check in url utils
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * linting issue resolved
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * removed unnecessary md files
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * made test readable
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    * changes done in auth util as well as sea http
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    
    ---------
    
    Signed-off-by: Nikhil Suri <nikhil.suri@databricks.com>
    nikhilsuri-db authored Jan 5, 2026
    946a265
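
A utility of the kind #724 describes could be sketched as follows. The commit message gives no details, so everything here is an assumption: the function name `normalize_host_with_protocol`, the choice to default to `https://`, the trimming of trailing slashes, and in particular the case-insensitive scheme check, which may differ from what the commit actually does.

```python
def normalize_host_with_protocol(host: str) -> str:
    """Return the host with an explicit scheme, adding https:// if missing.

    Hypothetical helper; not the driver's actual utility.
    """
    host = host.strip().rstrip("/")
    lowered = host.lower()
    if lowered.startswith(("https://", "http://")):
        # Keep the caller's scheme but normalise its case.
        scheme, rest = host.split("://", 1)
        return f"{scheme.lower()}://{rest}"
    return f"https://{host}"


if __name__ == "__main__":
    for raw in ("example.test", "https://example.test/", "HTTP://example.test"):
        print(raw, "->", normalize_host_with_protocol(raw))
```

Centralising this in one helper means callers such as the auth and HTTP layers do not each re-implement the prefix check.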

Commits on Jan 8, 2026

  1. New minor version release 4.2.4 (#725)

    New minor version release
    samikshya-db authored Jan 8, 2026
    03eb369

Commits on Jan 9, 2026

  1. [PECOBLR-1168] query tags telemetry (#716)

    * query tags telemetry
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    
    * code linting fix
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    
    ---------
    
    Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
    sreekanth-db authored Jan 9, 2026
    4b7df5b

Commits on Feb 5, 2026

  1. [ES-1717039] Fix 60 seconds delay in gov cloud connections + Fix PR check failures in the repo (#735)
    
    * Fix 60 seconds delay in gov cloud connections
    
    * keep it simple :)
    
    * Add fix for krb error
    
    * pin poetry
    
    * Pin for publish flow too
    
    * Fix failing tests
    
    * Edit order for pypi
    
    * One last fix : pls work
    samikshya-db authored Feb 5, 2026
    cafed60

Commits on Feb 6, 2026

  1. [PECOBLR-1735] Fix #729 and #731: Telemetry lifecycle management (#734)

    * Fix #729 and #731: Telemetry lifecycle management
    
    Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
    
    * Address review comments: revert timeout and telemetry_enabled changes
    
    Per reviewer feedback on PR #734:
    
    1. Revert timeout from 30s back to 900s (line 299)
       - Reviewer noted that with wait=False, timeout is not critical
       - The async nature and wait=False handle the exit speed
    
    2. Revert telemetry_enabled parameter back to True (line 734)
       - Reviewer noted this is redundant given the early return
       - If enable_telemetry=False, we return early (line 729)
       - Line 734 only executes when enable_telemetry=True
       - Therefore using the parameter here is unnecessary
    
    These changes address the reviewer's valid technical concerns while
    keeping the core fixes intact:
    - wait=False for non-blocking shutdown (critical for Issue #729)
    - Early return when enable_telemetry=False (critical for Issue #729)
    - All Issue #731 fixes (null-safety, __del__, documentation)
    
    Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
    
    * Fix Black formatting violations
    
    Apply Black formatting to files modified in previous commits:
    - src/databricks/sql/common/unified_http_client.py
    - src/databricks/sql/telemetry/telemetry_client.py
    
    Changes are purely cosmetic (quote style consistency).
    
    Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
    
    * Fix CI test failure: Prevent parallel execution of telemetry tests
    
    Add @pytest.mark.xdist_group to telemetry test classes to ensure they
    run sequentially on the same worker when using pytest-xdist (-n auto).
    
    Root cause: Tests marked @pytest.mark.serial were still being
    parallelized in CI because pytest-xdist doesn't respect custom markers
    by default. With host-level telemetry batching (PR #718), tests
    running in parallel would share the same TelemetryClient and interfere
    with each other's event counting, causing test_concurrent_queries_sends_telemetry
    to see 88 events instead of the expected 60.
    
    The xdist_group marker ensures all tests in the "serial_telemetry"
    group run on the same worker sequentially, preventing state interference.
    
    Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
    
    * Fix telemetry test fixtures: Clean up state before AND after tests
    
    Modified telemetry_setup_teardown fixtures to clean up
    TelemetryClientFactory state both BEFORE and AFTER each test, not just
    after. This prevents leftover state from previous tests (pending events,
    active executors) from interfering with the current test.
    
    Root cause: In CI with sequential execution on the same worker, if a
    previous test left pending telemetry events in the executor, those
    events could be captured by the next test's mock, causing inflated
    event counts (88 instead of 60).
    
    Now ensures complete isolation between tests by resetting all shared
    state before each test starts.
    
    Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
    
    * Fix CI test failure: Clear _flush_event between tests
    
    The _flush_event threading.Event was never cleared after stopping the
    flush thread, remaining in "set" state. This caused timing issues in
    subsequent tests where the Event was already signaled, triggering
    unexpected flush behavior and causing extra telemetry events to be
    captured (88 instead of 60).
    
    Now explicitly clear the _flush_event flag in both setup (before test)
    and teardown (after test) to ensure clean state isolation between tests.
    
    This explains why CI consistently got 88 events - the flush_event from
    previous tests triggered additional flushes during test execution.
    
    Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
    
    * Add debug workflow and output to diagnose CI test failure
    
    1. Created new workflow 'test-telemetry-only.yml' that runs only the
       failing telemetry test with -n auto, mimicking real CI but much faster
    
    2. Added debug output to test showing:
       - Client-side captured events
       - Number of futures/batches
       - Number of server responses
       - Server-reported successful events
    
    This will help identify why CI gets 88 events vs local 60 events.
    
    Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
    
    * Fix workflow: Add krb5 system dependency
    
    The workflow was failing during poetry install due to missing krb5
    system libraries needed for kerberos dependencies.
    
    Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
    
    * Fix xdist_group: Add --dist=loadgroup to pytest commands
    
    The @pytest.mark.xdist_group markers were being ignored because
    pytest-xdist uses --dist=load by default, which doesn't respect groups.
    
    With --dist=loadgroup, tests in the same xdist_group run sequentially
    on the same worker, preventing telemetry state interference between
    tests.
    
    This is the ROOT CAUSE of the 88 vs 60 events issue - tests were
    running in parallel across workers instead of sequentially on one
    worker as intended.
    
    Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
    
    * Add aggressive flush before test to prevent event interference
    
    CI shows 72 events instead of 60. Debug output reveals:
    - Client captured: 60 events (correct)
    - Server received: 72 events across 2 batches
    
    The 12 extra events accumulate in the timing window between fixture
    cleanup and mock setup. Other tests (like circuit breaker tests not in
    our xdist_group) may be sending telemetry concurrently.
    
    Solution: Add an explicit flush+shutdown RIGHT BEFORE setting up the
    mock to ensure a completely clean slate with zero buffered events.
    
    Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
    
    * Split workflow: Isolate telemetry tests in separate job
    
    To prevent interference from other e2e tests, split into two jobs:
    
    Job 1 (run-non-telemetry-tests):
    - Runs all e2e tests EXCEPT telemetry tests
    - Uses -n auto for parallel execution
    
    Job 2 (run-telemetry-tests):
    - Runs ONLY telemetry tests
    - Depends on Job 1 completing (needs: run-non-telemetry-tests)
    - Fresh Python process = complete isolation
    - No ambient telemetry from other tests
    
    This eliminates the 68 vs 60 event discrepancy by ensuring
    telemetry tests run in a clean environment with zero interference.
    
    Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
    
    * Fix workflows: Add krb5 deps and cleanup debug code
    
    Changes across multiple workflows:
    
    1. integration.yml:
       - Add krb5 system dependency to telemetry job
       - Fixes: krb5-config command not found error during poetry install
    
    2. code-coverage.yml:
       - Add krb5 system dependency
       - Split telemetry tests into separate step for isolation
       - Maintains coverage accumulation with --cov-append
    
    3. publish-test.yml:
       - Add krb5 system dependency for consistent builds
    
    4. test_concurrent_telemetry.py:
       - Remove debug print statements
    
    5. Delete test-telemetry-only.yml:
       - Remove temporary debug workflow
    
    All workflows now have proper telemetry test isolation and
    required system dependencies for kerberos packages.
    
    Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
    
    * Fix publish-test.yml: Update Python 3.9 -> 3.10
    
    Poetry 2.3.2 installation fails with Python 3.9:
      Installing Poetry (2.3.2): An error occurred.
    
    Other workflows use Python 3.10 and work fine. Updating to match
    ensures consistency and avoids Poetry installation issues.
    
    Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
    
    * Fix integration workflow: Remove --dist=loadgroup from non-telemetry tests
    
    - Remove --dist=loadgroup from non-telemetry job (only needed for telemetry)
    - Remove test_telemetry_e2e.py from telemetry job (was skipped before)
    - This should fix test_uc_volume_life_cycle failure caused by changed test distribution
    
    * Fix code-coverage workflow: Remove test_telemetry_e2e.py from coverage tests
    
    - Only run test_concurrent_telemetry.py in isolated telemetry step
    - test_telemetry_e2e.py was excluded in original workflow, keep it excluded
    
    * Fix publish-test workflow: Remove cache conditional
    
    - Always run poetry install (not just on cache miss)
    - Ensures fresh install with system dependencies (krb5)
    - Matches pattern used in integration.yml
    
    * Fix publish-test.yml: Remove duplicate krb5 install, restore cache conditional
    
    - Remove duplicate system dependencies step
    - Restore cache conditional to match main branch
    - Keep Python 3.10 (our change from 3.9)
    
    * Fix code-coverage: Remove serial tests step
    
    - All serial tests are telemetry tests (test_concurrent_telemetry.py and test_telemetry_e2e.py)
    - They're already run in the isolated telemetry step
    - Running -m serial with --ignore on both files results in 0 tests (exit code 5)
    
    ---------
    
    Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
    Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
    msrathore-db authored Feb 6, 2026
    61f8029
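
The `_flush_event` sub-commit in #734 relies on a property of `threading.Event`: once set, it stays set until `clear()` is called, so a later `wait()` returns immediately. The sketch below demonstrates that behaviour in isolation. `SketchFlusher` and its methods are hypothetical; only the `_flush_event` attribute name comes from the commit message.

```python
import threading


class SketchFlusher:
    """Hypothetical stand-in for a flush loop driven by a threading.Event."""

    def __init__(self):
        self._flush_event = threading.Event()
        self.flush_count = 0

    def stop(self) -> None:
        # Setting the event wakes a waiting loop so it can exit.
        self._flush_event.set()

    def wait_for_flush(self, timeout: float) -> bool:
        """Return True if the event fired before the timeout expired."""
        triggered = self._flush_event.wait(timeout)
        if triggered:
            self.flush_count += 1
        return triggered

    def reset(self) -> None:
        # Without this, the event stays set and the next wait fires at once.
        self._flush_event.clear()


if __name__ == "__main__":
    flusher = SketchFlusher()
    flusher.stop()
    print("stale signal fires:", flusher.wait_for_flush(0.01))
    flusher.reset()
    print("after clear:", flusher.wait_for_flush(0.01))
```

Calling `reset()` in both test setup and teardown, as the commit describes, ensures that a signal left over from a previous test cannot trigger an early flush in the next one.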