fix: schema inferrer supports more than two types #830

Merged
maxi297 merged 1 commit into main from oncall_9920/schema_inferrer_support_more_than_two_types
Nov 10, 2025

Conversation

@maxi297

@maxi297 maxi297 commented Nov 7, 2025

What

https://github.com/airbytehq/oncall/issues/9920

More specifically, following this comment, I'm not sure why we only supported two entries in the anyOf, so I've updated the code to support any number greater than 1.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Enhanced schema type inference to more robustly handle anyOf structures with multiple type options and null types, improving consistency in schema cleaning and normalization across varying configurations.
  • Tests

    • Updated test cases to validate the improved handling of complex anyOf type structures with expanded type options.

@github-actions

github-actions bot commented Nov 7, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@oncall_9920/schema_inferrer_support_more_than_two_types#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch oncall_9920/schema_inferrer_support_more_than_two_types

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment


@coderabbitai

coderabbitai bot commented Nov 7, 2025

📝 Walkthrough

The schema inferrer utility updates the logic for handling anyOf structures to assign null-type defaults in broader scenarios beyond the previous exact-length-2 requirement. Corresponding test cases are expanded to validate the new behavior with nested items and additional schema options.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| anyOf null-type handling: airbyte_cdk/utils/schema_inferrer.py | Modified the conditional in _clean_any_of to apply null-type assignment when anyOf length is greater than 1 (instead of exactly 2 with no null type present), expanding the scope of null-aware type defaults |
| Test data expansion: unit_tests/utils/test_schema_inferrer.py | Added Nested_3 string entry to data_with_nested_arrays test case and expanded the anyOf schema for the value field to include a string option alongside existing array and object types |
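
The conditional change summarized above can be sketched as two predicates. This is a paraphrase of the walkthrough's description, not the CDK source: the function names `has_null_entry`, `old_condition`, and `new_condition` are hypothetical, and the real `_clean_any_of` may contain other branches that are not shown here.

```python
from typing import Any, Dict

Schema = Dict[str, Any]


def has_null_entry(node: Schema) -> bool:
    """Return True if any anyOf entry is the null type (hypothetical helper)."""
    return any(entry.get("type") == "null" for entry in node["anyOf"])


def old_condition(node: Schema) -> bool:
    # As described in the walkthrough: exactly two entries, none of them null.
    return len(node["anyOf"]) == 2 and not has_null_entry(node)


def new_condition(node: Schema) -> bool:
    # After the change: any anyOf with more than one entry qualifies.
    return len(node["anyOf"]) > 1


two_types: Schema = {"anyOf": [{"type": "array"}, {"type": "object"}]}
three_types: Schema = {
    "anyOf": [{"type": "string"}, {"type": "array"}, {"type": "object"}]
}

print(old_condition(two_types), new_condition(two_types))      # True True
print(old_condition(three_types), new_condition(three_types))  # False True
```

The two-entry case is accepted by both predicates, while the three-entry case is only accepted by the relaxed one.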

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Areas for focused attention:

  • The conditional change in _clean_any_of warrants verification that it doesn't inadvertently alter type normalization for existing anyOf configurations with length > 2 or those already containing null types
  • Ensure the expanded test case adequately covers the new conditional branches and doesn't mask edge cases

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main change: extending schema inferrer support from exactly two types to more than two types in anyOf structures. |
Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
unit_tests/utils/test_schema_inferrer.py (1)

215-270: Consider additional test coverage for edge cases?

The test nicely validates the core scenario with 3 non-null types in anyOf. For even more comprehensive coverage, you might consider adding test cases for:

  • anyOf with 3+ types where one entry is null
  • anyOf with exactly 1 type (edge case to confirm it's handled gracefully)

These would help ensure robustness, but the current test definitely covers the main objective of the PR. Wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6504148 and 14d3632.

📒 Files selected for processing (2)
  • airbyte_cdk/utils/schema_inferrer.py (1 hunks)
  • unit_tests/utils/test_schema_inferrer.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: MyPy Check
  • GitHub Check: preview_docs
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (python)
🔇 Additional comments (3)
airbyte_cdk/utils/schema_inferrer.py (1)

123-124: Logic expansion looks good!

The change from == 2 to > 1 correctly extends support to anyOf structures with 3+ entries. The temporary type: ["null"] placeholder is properly cleaned up at line 163 by _remove_type_from_any_of, so this won't affect the final schema output. The logic remains backward compatible for the 2-entry case. Nice fix!
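
The placeholder lifecycle described in this comment might look roughly like the sketch below. It is a standalone illustration, not the CDK implementation: `add_placeholder_type` and `remove_type_from_any_of` are hypothetical stand-ins (the latter only loosely modeled on the `_remove_type_from_any_of` name mentioned above), and the intermediate checks that rely on `type` being present are omitted.

```python
import copy
from typing import Any, Dict

Schema = Dict[str, Any]


def add_placeholder_type(node: Schema) -> None:
    """Hypothetical stand-in: give a multi-entry anyOf node a temporary `type`."""
    if len(node.get("anyOf", [])) > 1:
        node["type"] = ["null"]


def remove_type_from_any_of(node: Schema) -> None:
    """Hypothetical stand-in for the cleanup step: drop `type` next to `anyOf`."""
    if "anyOf" in node:
        node.pop("type", None)


node: Schema = {"anyOf": [{"type": "string"}, {"type": "array"}, {"type": "object"}]}
original = copy.deepcopy(node)

add_placeholder_type(node)
with_placeholder = copy.deepcopy(node)  # snapshot of the intermediate state

remove_type_from_any_of(node)

print(with_placeholder["type"])  # ['null']
print(node == original)          # True
```

Under these assumptions the placeholder exists only between the two steps, so the node that comes out is identical to the one that went in.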

unit_tests/utils/test_schema_inferrer.py (2)

231-231: Good test coverage for the 3-entry anyOf scenario!

The addition of Nested_3 with a string value creates the perfect test case to validate the expanded anyOf logic. This ensures the inferrer correctly handles scenarios with more than two types.


249-261: Expected schema correctly reflects the three-type anyOf!

The updated expected schema with the string type in the anyOf properly validates that the inferrer can handle 3+ distinct types for a field. The schema structure looks correct with all three type variants (string, array, object) represented.
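
To see why three anyOf entries are expected, here is a simplified sketch that collects the JSON Schema type names observed for one field. It does not use the CDK or its schema builder: `json_type` and `observed_types` are hypothetical helpers, the record values are made up for illustration (only the `Nested_3` string entry is taken from the PR description), and real inferred entries would also carry `items` and `properties` details.

```python
from typing import Any, Dict, List


def json_type(value: Any) -> str:
    """Map a Python value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def observed_types(records: List[Dict[str, Any]], field: str) -> List[str]:
    """Return the distinct type names seen for `field`, sorted alphabetically."""
    return sorted({json_type(record[field]) for record in records})


# Illustrative records: the same field holds an array, an object, and a string.
records = [
    {"title": "Nested_1", "value": [{"nested_key": "a"}]},
    {"title": "Nested_2", "value": {"nested_key": "b"}},
    {"title": "Nested_3", "value": "plain string"},
]

any_of = [{"type": name} for name in observed_types(records, "value")]
print(any_of)  # [{'type': 'array'}, {'type': 'object'}, {'type': 'string'}]
```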

@github-actions

github-actions bot commented Nov 7, 2025

PyTest Results (Fast)

3 817 tests  ±0   3 805 ✅ ±0   6m 31s ⏱️ -12s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 14d3632. ± Comparison against base commit 6504148.

@github-actions

github-actions bot commented Nov 7, 2025

PyTest Results (Full)

3 820 tests   3 808 ✅  11m 15s ⏱️
    1 suites     12 💤
    1 files        0 ❌

Results for commit 14d3632.

@maxi297 maxi297 changed the title from "schema inferrer supports more than two types" to "fix: schema inferrer supports more than two types" Nov 10, 2025
@maxi297 maxi297 merged commit 6395265 into main Nov 10, 2025
29 of 32 checks passed
@maxi297 maxi297 deleted the oncall_9920/schema_inferrer_support_more_than_two_types branch November 10, 2025 16:07

2 participants