Skip to content

fix(manifest server): default concurrency settings#792

Merged
maxi297 merged 6 commits intomainfrom
maxi297/connector_builder_fix_concurrency
Oct 15, 2025
Merged

fix(manifest server): default concurrency settings#792
maxi297 merged 6 commits intomainfrom
maxi297/connector_builder_fix_concurrency

Conversation

@maxi297
Copy link
Contributor

@maxi297 maxi297 commented Oct 15, 2025

Users are entering a deadlock state where the partition generator threads are filling the queue of partitions to process without having a thread to relieve the queue.

See: https://airbytehq-team.slack.com/archives/C09B97YAB2B/p1760532962689779

Summary by CodeRabbit

  • Bug Fixes

    • The system no longer injects a default concurrency value when a manifest omits concurrency settings — manifests without an explicit concurrency level remain unchanged.
    • Manifests that already specify a concurrency level continue to behave as configured.
  • Tests

    • Tests updated to validate behavior when concurrency is present or absent; an explicit-case test for enforced concurrency was added and one overly-specific constructor assertion test was removed.

@github-actions github-actions bot added the bug Something isn't working label Oct 15, 2025
@github-actions
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/connector_builder_fix_concurrency#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/connector_builder_fix_concurrency

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 15, 2025

📝 Walkthrough

Walkthrough

Removed automatic injection of concurrency_level.default_concurrency = 1 from build_source; router test_read now deep-copies the manifest and explicitly ensures a concurrency_level with default_concurrency = 1 before calling safe_build_source.

Changes

Cohort / File(s) Summary
Manifest server utils
airbyte_cdk/manifest_server/command_processor/utils.py
Removed logic that copied the manifest and injected/updated concurrency_level with default_concurrency = 1; build_source no longer adds a default concurrency when missing.
Manifest router (test flow)
airbyte_cdk/manifest_server/routers/manifest.py
In test_read, deep-copies the incoming manifest and inserts/updates a concurrency_level structure with default_concurrency = 1 before calling safe_build_source (so caller enforces concurrency now). Added contextual comment and copy usage.
Unit tests
unit_tests/manifest_server/command_processor/test_utils.py, unit_tests/manifest_server/routers/test_manifest.py
Removed a test that asserted build_source constructed a ConcurrentDeclarativeSource with extensive params; updated router tests to pass a manifest that includes concurrency_level (merged into expected config) when asserting build_source calls.

Sequence Diagram(s)

sequenceDiagram
  participant Router as Manifest Router (test_read)
  participant Request as Request.Manifest
  participant SafeBuild as safe_build_source / build_source

  Router->>Request: deep-copy manifest
  Router->>Router: ensure concurrency_level.default_concurrency = 1
  Router->>SafeBuild: call safe_build_source(augmented manifest, config, catalog, state)
  SafeBuild->>SafeBuild: build_source uses provided manifest (no injection)
  SafeBuild-->>Router: return built source / error
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • brianjlai
  • maxi297

Should we add a unit test that verifies behavior when a manifest lacks concurrency_level (to prevent regressions at other callers), wdyt?

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title accurately captures the core change by specifying a fix to the manifest server’s default concurrency settings, directly reflecting the adjustments made in build_source and related tests. It uses a clear, concise format with the “fix(manifest server)” scope to immediately convey context without extraneous detail. Reviewers can understand the intent and area of impact at a glance.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch maxi297/connector_builder_fix_concurrency

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20cece8 and dbe6661.

📒 Files selected for processing (2)
  • unit_tests/manifest_server/command_processor/test_utils.py (0 hunks)
  • unit_tests/manifest_server/routers/test_manifest.py (2 hunks)
💤 Files with no reviewable changes (1)
  • unit_tests/manifest_server/command_processor/test_utils.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (1)
unit_tests/manifest_server/routers/test_manifest.py (1)

112-124: LGTM! Tests correctly verify the explicit concurrency behavior.

Both test functions now properly assert that build_source receives a manifest with explicit concurrency_level settings rather than relying on auto-injection. This aligns with the PR's fix for the deadlock issue in partition generator threads. The pattern is consistent across both tests, and I notice other endpoints (resolve, check, discover) weren't updated—which makes sense since they don't involve concurrent partition processing.

Also applies to: 176-189


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Oct 15, 2025

PyTest Results (Fast)

3 796 tests   - 1   3 784 ✅  - 1   6m 35s ⏱️ +11s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 1991c07. ± Comparison against base commit 8158f0d.

This pull request removes 1 test.
unit_tests.manifest_server.command_processor.test_utils.TestManifestUtils ‑ test_build_source_creates_source

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Oct 15, 2025

PyTest Results (Full)

3 799 tests   - 1   3 787 ✅  - 1   11m 19s ⏱️ +5s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 1991c07. ± Comparison against base commit 8158f0d.

This pull request removes 1 test.
unit_tests.manifest_server.command_processor.test_utils.TestManifestUtils ‑ test_build_source_creates_source

♻️ This comment has been updated with latest results.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/manifest_server/routers/manifest.py (1)

91-100: Nice fix for the deadlock issue!

The logic correctly enforces default_concurrency = 1 for test_read to prevent the partition generator from overwhelming worker threads, while preserving ordering for the builder message responses. The deep copy and conditional handling look solid.

One optional thought: if this pattern might be needed elsewhere in the future, would extracting this into a helper function like enforce_single_threaded_concurrency(manifest_dict) make it more reusable? Just a suggestion though—totally fine to keep it inline for now, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3205424 and db8312d.

📒 Files selected for processing (2)
  • airbyte_cdk/manifest_server/command_processor/utils.py (0 hunks)
  • airbyte_cdk/manifest_server/routers/manifest.py (2 hunks)
💤 Files with no reviewable changes (1)
  • airbyte_cdk/manifest_server/command_processor/utils.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-intercom
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
airbyte_cdk/manifest_server/routers/manifest.py (2)

1-1: LGTM!

The copy import is needed for the deep copy operation below and is placed appropriately with the other imports.


102-110: LGTM!

The call to safe_build_source now correctly uses the modified manifest with the enforced concurrency settings. This ensures the single-threaded behavior is applied during source building.

@maxi297 maxi297 requested a review from brianjlai October 15, 2025 19:02
Copy link
Contributor

@pedroslopez pedroslopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@maxi297
Copy link
Contributor Author

maxi297 commented Oct 15, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

@maxi297 maxi297 merged commit f360d91 into main Oct 15, 2025
27 of 28 checks passed
@maxi297 maxi297 deleted the maxi297/connector_builder_fix_concurrency branch October 15, 2025 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants