feat(low-code cdk): Allow for multiple schema_loaders to be defined for a stream#536
feat(low-code cdk): Allow for multiple schema_loaders to be defined for a stream#536
Conversation
📝 WalkthroughWalkthroughThis change introduces support for specifying multiple schema loaders in declarative streams by allowing the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant DeclarativeStream
participant ModelToComponentFactory
participant CompositeSchemaLoader
participant SchemaLoader(s)
User->>DeclarativeStream: Define stream with schema_loader (single or list)
DeclarativeStream->>ModelToComponentFactory: Pass model with schema_loader(s)
ModelToComponentFactory->>CompositeSchemaLoader: If list, wrap loaders
CompositeSchemaLoader->>SchemaLoader(s): Call get_json_schema() on each
CompositeSchemaLoader->>ModelToComponentFactory: Merge schemas, return combined
ModelToComponentFactory->>DeclarativeStream: Return stream with composite loader
Suggested reviewers
Would you like to add more integration tests for edge cases, such as deeply nested property conflicts, to further ensure the robustness of the schema merging logic? Wdyt? Tip ⚡️ Faster reviews with caching
Enjoy the performance boost—your workflow just got faster. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (9)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1928-1934:⚠️ Potential issueFix type check errors in the implementation
There are mypy errors in this section:
- Line 1928: Type error when passing
model.schema_loaderto_create_component_from_model- Line 1934: Incompatible types in assignment (DefaultSchemaLoader vs CompositeSchemaLoader)
The first error is because mypy expects a BaseModel but gets a list. The second might be related to how the else branch assigns a DefaultSchemaLoader to a variable that previously held a CompositeSchemaLoader.
To fix the type issues, consider using a type annotation with Union for the schema_loader variable:
+ schema_loader: Union[CompositeSchemaLoader, DefaultSchemaLoader, JsonFileSchemaLoader, InlineSchemaLoader, DynamicSchemaLoader] if model.schema_loader and isinstance(model.schema_loader, list): nested_schema_loaders = [ self._create_component_from_model(model=schema_loader, config=config) for schema_loader in model.schema_loader ] schema_loader = CompositeSchemaLoader( schema_loaders=nested_schema_loaders, parameters=model.parameters or {} ) elif model.schema_loader: schema_loader = self._create_component_from_model( model=model.schema_loader, config=config ) else: options = model.parameters or {} if "name" not in options: options["name"] = model.name schema_loader = DefaultSchemaLoader(config=config, parameters=options)🧰 Tools
🪛 GitHub Actions: Linters
[error] 1928-1928: mypy error: Argument "model" to "_create_component_from_model" of "ModelToComponentFactory" has incompatible type "InlineSchemaLoader | DynamicSchemaLoader | JsonFileSchemaLoader | CustomSchemaLoader | list[InlineSchemaLoader | DynamicSchemaLoader | JsonFileSchemaLoader | CustomSchemaLoader]"; expected "BaseModel" [arg-type]
[error] 1934-1934: mypy error: Incompatible types in assignment (expression has type "DefaultSchemaLoader", variable has type "CompositeSchemaLoader") [assignment]
🧹 Nitpick comments (10)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1424-1427: Polish description grammar?
The new description is clear, but maybe we could tighten up the phrasing—for example, “Schema loaders defined first taking precedence…” could read better as “Loaders listed first take precedence in case of conflict.” Would you be up for a quick tweak here? wdyt?
1428-1439: Consider tightening the array schema constraints?
The addition of- type: arraywithitems.anyOfnicely enables multiple loaders. A couple of ideas to guard against edge cases:
- Add
minItems: 1to ensure the array isn’t empty.- Add
uniqueItems: trueto prevent duplicate loader definitions.- Switch
anyOftooneOfif you want each value to match exactly one branch (single loader vs array) rather than any matching subset.Any of these sound useful to you? wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
1918-1926: Good implementation of multiple schema loader support, but consider passing model parametersThe implementation correctly handles both list and single schema loader cases, maintaining backward compatibility. However, you're creating the CompositeSchemaLoader with an empty parameters dictionary. Should you pass
model.parameters or {}instead to be consistent with other component creations? wdyt?- schema_loader = CompositeSchemaLoader( - schema_loaders=nested_schema_loaders, parameters={} - ) + schema_loader = CompositeSchemaLoader( + schema_loaders=nested_schema_loaders, parameters=model.parameters or {} + )
1918-1926: Consider adding a brief comment explaining the new functionalityIt would be helpful to add a comment explaining the purpose of allowing multiple schema loaders and how they're combined. This makes the code more maintainable.
if model.schema_loader and isinstance(model.schema_loader, list): + # Handle multiple schema loaders by creating a CompositeSchemaLoader + # which merges the properties from each schema loader, with earlier loaders taking priority nested_schema_loaders = [ self._create_component_from_model(model=schema_loader, config=config) for schema_loader in model.schema_loader ]unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
4567-4633: Good test case for multiple schema loaders functionality.This test verifies that the factory correctly handles declarative streams defined with multiple schema loaders specified as a list. The test properly:
- Defines a stream with two different inline schemas
- Processes the manifest through parsing and transformation
- Verifies the resulting schema loader is a
CompositeSchemaLoadercontaining twoInlineSchemaLoaderinstancesHave you considered adding a test case that mixes different schema loader types (like
JsonFileSchemaLoaderwithInlineSchemaLoader) to verify that works too? Also, would it be worth adding an assertion to verify the actual schema content after merging? Just thinking about edge cases, wdyt?airbyte_cdk/sources/declarative/schema/composite_schema_loader.py (2)
11-32: Clean implementation of the CompositeSchemaLoader!The implementation is straightforward and the docstring clearly explains the precedence behavior. The merging of schemas is handled efficiently.
One thing to consider: the current implementation only merges top-level properties, but doesn't perform deep merging of nested properties. In most cases this is probably fine, but would a deep merge be beneficial for some use cases? For example:
def get_json_schema(self) -> Mapping[str, Any]: combined_schema: Dict[str, Any] = { "$schema": "http://json-schema.org/draft-07/schema#", "type": ["null", "object"], "properties": {}, } for schema_loader in self.schema_loaders: schema_properties = schema_loader.get_json_schema()["properties"] - combined_schema["properties"] = {**schema_properties, **combined_schema["properties"]} + # Deep merge properties + self._deep_merge_properties(schema_properties, combined_schema["properties"]) return combined_schema + def _deep_merge_properties(self, source: Dict[str, Any], target: Dict[str, Any]) -> None: + """Deep merge source properties into target, with target taking precedence.""" + for key, value in source.items(): + if key not in target: + target[key] = value + elif isinstance(value, dict) and isinstance(target[key], dict): + if "properties" in value and "properties" in target[key]: + if "properties" not in target[key]: + target[key]["properties"] = {} + self._deep_merge_properties(value["properties"], target[key]["properties"])Would that be useful? Or perhaps overkill for the current use cases?
29-30: Schema merging logic works but could be clearer!The current implementation has properties from earlier loaders taking precedence, which is mentioned in the docstring. However, the dictionary unpacking order might be confusing at first glance.
Consider making the precedence more explicit with a comment or by rearranging the expression:
- combined_schema["properties"] = {**schema_properties, **combined_schema["properties"]} + # Ensure properties from earlier loaders take precedence + combined_schema["properties"] = {**schema_properties, **combined_schema["properties"]}Or alternatively:
- combined_schema["properties"] = {**schema_properties, **combined_schema["properties"]} + # First apply new properties, then keep existing ones (which take precedence) + new_properties = dict(schema_properties) # Make a copy + new_properties.update(combined_schema["properties"]) # Existing properties override new ones + combined_schema["properties"] = new_propertiesThis makes the intention clearer even to readers who aren't familiar with dictionary unpacking behavior. What do you think?
unit_tests/sources/declarative/schema/test_composite_schema_loader.py (3)
55-151: Great test for the dynamic/static schema combination!This test thoroughly verifies that the CompositeSchemaLoader correctly combines properties from both static and dynamic schema loaders.
One suggestion - it might be helpful to add assertions that specifically verify the schema structure beyond just comparing the entire expected schema. For example:
# Verify specific aspects of the schema assert "dynamic_property_one" in schema["properties"] assert "id" in schema["properties"] assert schema["properties"]["nestedDisplayOptions"]["properties"]["theme"]["type"] == ["null", "string"]This would make it clearer what aspects of the schema are most important to validate and provide more specific failure messages if something goes wrong. What do you think?
153-211: Good test for property precedence behavior!This test properly verifies that properties from earlier schema loaders take precedence over later ones.
Consider adding a comment explaining why certain properties appear the way they do in the expected schema. For example, a comment noting that "updatedAt" from the first loader takes precedence over the version in the second loader, and "id" from the first loader takes precedence over the version in the third loader. This would make the test's intent clearer to future readers.
# Test specifically verifies: # - "updatedAt" from first loader (format: date-time) overrides second loader (format: date) # - "id" from first loader (type: string) overrides third loader (type: integer) # - "organization" from second loader is included expected_schema = { ... }Would that help clarify the test's purpose?
1-211: Consider adding tests for edge cases!The tests cover the main functionality well, but there are a few edge cases that might be worth testing.
Consider adding tests for:
- Empty schema_loaders list
- Schema loader that returns a schema without properties
- Schema loader that returns None or raises an exception
These would help ensure the CompositeSchemaLoader is robust against unexpected inputs. Would these additional tests be valuable?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml(2 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py(2 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(2 hunks)airbyte_cdk/sources/declarative/schema/composite_schema_loader.py(1 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py(2 hunks)unit_tests/sources/declarative/schema/test_composite_schema_loader.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (1)
DynamicSchemaLoader(119-285)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
[error] 1928-1928: mypy error: Argument "model" to "_create_component_from_model" of "ModelToComponentFactory" has incompatible type "InlineSchemaLoader | DynamicSchemaLoader | JsonFileSchemaLoader | CustomSchemaLoader | list[InlineSchemaLoader | DynamicSchemaLoader | JsonFileSchemaLoader | CustomSchemaLoader]"; expected "BaseModel" [arg-type]
[error] 1934-1934: mypy error: Incompatible types in assignment (expression has type "DefaultSchemaLoader", variable has type "CompositeSchemaLoader") [assignment]
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
507-507: Clean import addition for CompositeSchemaLoaderThe import is correctly placed in the schema section, maintaining the organizational structure of the imports.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
155-156: Well-structured import changes.Adding the necessary imports for
InlineSchemaLoaderandCompositeSchemaLoaderto support testing the new functionality.airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
2150-2164: Nice extension of the schema_loader field!This change allows the flexibility to use multiple schema loaders while maintaining backward compatibility. The Union type correctly handles both single schema loaders and lists of loaders.
Just a thought - does the implementation handle any possible conflicts or overlaps between schemas from different loaders? Looking at the CompositeSchemaLoader implementation, it seems earlier loaders take precedence, which is likely the expected behavior, but might be worth documenting this behavior in the field description, wdyt?
What
While migrating airbytehq/airbyte#58105, @tolik0 identified that it was very tedious and limiting to introduce static schema fields into a dynamic schema loader because it required everything to be defined in a transformation.
This PR allows for the definition of multiple schema_loaders on a single stream that are then merged together so that we can use the more intuitive and readable InlineSchemaLoader in combination with DynamicSchemaLoader
How
Modifies the low code manifest interface to accept either a list or a single loader to retain backwards compatibility
Under the hood we have a
CompositeSchemaLoaderwhich combines the properties of each schema loader together with earlier defined schema loaders taking priority. This is more intuitive so users don't need to learn or understand another component compared to just a regular list of schema loaders.Summary by CodeRabbit