fix: schema inferrer supports more than two types#830
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@oncall_9920/schema_inferrer_support_more_than_two_types#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch oncall_9920/schema_inferrer_support_more_than_two_typesHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
📝 WalkthroughWalkthroughThe schema inferrer utility updates the logic for handling anyOf structures to assign null-type defaults in broader scenarios beyond the previous exact-length-2 requirement. Corresponding test cases are expanded to validate the new behavior with nested items and additional schema options. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Areas for focused attention:
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
unit_tests/utils/test_schema_inferrer.py (1)
215-270: Consider additional test coverage for edge cases?The test nicely validates the core scenario with 3 non-null types in anyOf. For even more comprehensive coverage, you might consider adding test cases for:
- anyOf with 3+ types where one entry is null
- anyOf with exactly 1 type (edge case to confirm it's handled gracefully)
These would help ensure robustness, but the current test definitely covers the main objective of the PR. Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/utils/schema_inferrer.py(1 hunks)unit_tests/utils/test_schema_inferrer.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: MyPy Check
- GitHub Check: preview_docs
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
airbyte_cdk/utils/schema_inferrer.py (1)
123-124: Logic expansion looks good!The change from
== 2to> 1correctly extends support to anyOf structures with 3+ entries. The temporarytype: ["null"]placeholder is properly cleaned up at line 163 by_remove_type_from_any_of, so this won't affect the final schema output. The logic remains backward compatible for the 2-entry case. Nice fix!unit_tests/utils/test_schema_inferrer.py (2)
231-231: Good test coverage for the 3-entry anyOf scenario!The addition of Nested_3 with a string value creates the perfect test case to validate the expanded anyOf logic. This ensures the inferrer correctly handles scenarios with more than two types.
249-261: Expected schema correctly reflects the three-type anyOf!The updated expected schema with the string type in the anyOf properly validates that the inferrer can handle 3+ distinct types for a field. The schema structure looks correct with all three type variants (string, array, object) represented.
PyTest Results (Full)3 820 tests 3 808 ✅ 11m 15s ⏱️ Results for commit 14d3632. |
What
https://github.com/airbytehq/oncall/issues/9920
More specifically following this comment, I'm not sure why we only supported two entries in the
anyOfso I've updated to code to support any number higher than 1.Summary by CodeRabbit
Release Notes
Bug Fixes
Tests