feat: Removes stream_state interpolation from CDK#320
Conversation
…stream_state` to jinja interpolation
This comment was marked as off-topic.
This comment was marked as off-topic.
stream_state interpolation)stream_state interpolation from CDK
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/interpolation/jinja.py (2)
41-43: Great error message! Consider adding docstring?The mapping and error message are clear and helpful. What do you think about adding a brief docstring to explain the purpose of this mapping and how to extend it for future unsupported variables? wdyt?
_UNSUPPORTED_INTERPOLATION_VARIABLES: Mapping[str, str] = { + # Maps variable names that are no longer supported for interpolation to their corresponding error messages. + # Add new entries here when deprecating interpolation variables. "stream_state": "`stream_state` is no longer supported for interpolation. We recommend using `stream_interval` instead. Please reference the CDK Migration Guide for more information.", }
104-110: Consider using AST for more accurate variable detection?The current implementation checks for variable names using string containment, which might have false positives (e.g., if "stream_state" appears in a string literal). Since we already have
_find_undeclared_variables, what do you think about using it for a more robust check? wdyt?- for variable_name in _UNSUPPORTED_INTERPOLATION_VARIABLES: - if variable_name in input_str: + if isinstance(input_str, str): + undeclared = self._find_undeclared_variables(input_str) + for variable_name in _UNSUPPORTED_INTERPOLATION_VARIABLES: + if variable_name in undeclared: raise AirbyteTracedException( message=_UNSUPPORTED_INTERPOLATION_VARIABLES[variable_name], internal_message=_UNSUPPORTED_INTERPOLATION_VARIABLES[variable_name], failure_type=FailureType.config_error, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py(0 hunks)airbyte_cdk/sources/declarative/declarative_component_schema.yaml(0 hunks)airbyte_cdk/sources/declarative/interpolation/jinja.py(3 hunks)airbyte_cdk/sources/declarative/requesters/http_requester.py(0 hunks)unit_tests/sources/declarative/test_concurrent_declarative_source.py(0 hunks)
💤 Files with no reviewable changes (4)
- unit_tests/sources/declarative/test_concurrent_declarative_source.py
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py
- airbyte_cdk/sources/declarative/requesters/http_requester.py
- airbyte_cdk/sources/declarative/declarative_component_schema.yaml
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/interpolation/jinja.py (1)
14-14: LGTM! Good error handling setup.The new imports for
FailureTypeandAirbyteTracedExceptionare well-organized and align with the PR's objective to handle unsupported interpolation variables gracefully.Also applies to: 19-19
natikgadzhi
left a comment
There was a problem hiding this comment.
The code change looks good, and I for once understood it. Didn't run tests locally, and see they're failing — good to go on green CI?
..._cdk/sources/declarative/requesters/request_options/interpolated_request_options_provider.py
Show resolved
Hide resolved
|
@pnilan lol you got got: |
|
@natikgadzhi Just went through and updated all the tests to either convert stream_state interpolation to stream_interval equivalent or remove outright if no longer relevant. CI should be green now. @brianjlai Can you review? |
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
unit_tests/sources/declarative/requesters/request_options/test_interpolated_request_options_provider.py (1)
64-66: 💡 Verification agent🧩 Analysis chain
Consider updating method signature to reflect new interpolation context.
The test is using
stream_intervalin templates but still passesstream_stateto the method. Should we update the method signature to better reflect the new design, wdyt?
🏁 Script executed:
#!/bin/bash # Search for other method signatures that might need similar updates rg -A 2 "def get_request_params"Length of output: 5962
Could you please provide the output from this script?
🏁 Script executed:
#!/bin/bash # Let's extract the method signature from interpolated_request_options_provider.py to check if it uses stream_state or stream_interval. rg -n "def get_request_params" airbyte_cdk/sources/declarative/requesters/request_options/interpolated_request_options_provider.pyLength of output: 163
Consistency Check: Update Parameter Naming in get_request_params?
There appears to be an inconsistency between the new interpolation context and the method signature. In your test (lines 64–66 of
unit_tests/sources/declarative/requesters/request_options/test_interpolated_request_options_provider.py), you callget_request_paramswithstream_state, yet elsewhere in the updated interpolation logic the termstream_intervalis being introduced.
- The method signature in
airbyte_cdk/sources/declarative/requesters/request_options/interpolated_request_options_provider.py(line 72) has not been updated to reflect this naming change.- Would you consider updating the parameter name in the method signature (or alternatively, aligning the test call) so that it consistently reflects the new context using
stream_interval? wdyt?
🧹 Nitpick comments (5)
unit_tests/sources/declarative/extractors/test_record_selector.py (1)
26-26: LGTM! Consider adding a comment about the migration.The change from
stream_statetostream_interval.extra_fieldsaligns with the PR objectives. Would it be helpful to add a comment explaining this migration for future reference, wdyt? 🤔+ # Note: Using stream_interval.extra_fields instead of stream_state as part of the stream_state interpolation removal "{{ record['created_at'] > stream_interval.extra_fields['created_at'] }}",unit_tests/sources/declarative/datetime/test_min_max_datetime.py (1)
25-25: Consider standardizing the stream_slice key naming?I notice we're using both
start_dateandneweras keys instream_sliceacross different test cases. For consistency and to avoid confusion, should we standardize on one key name throughout the test file? wdyt?Also applies to: 31-31, 45-45, 54-54, 59-60, 68-68, 80-80
unit_tests/sources/declarative/requesters/request_options/test_interpolated_request_options_provider.py (1)
11-12: Consider updating test data setup.The test uses
statevariable but templates usestream_interval. Should we rename the variable to match the new context, wdyt?-state = {"date": "2021-01-01"} +stream_interval = {"start_date": "2021-01-01"}unit_tests/sources/declarative/extractors/test_record_filter.py (2)
59-60: Verify the impact of changing comparison operator.The comparison operator changed from
>to>=. This means records with exact matching dates will now be included. Is this intentional? If so, should we add test cases for the edge case wherecreated_atexactly equalsstream_interval.extra_fields['created_at'], wdyt?Would you like me to generate additional test cases to cover this edge case?
64-65: Consider updating test data to include edge cases.The test data doesn't include a case where
created_atexactly equals the comparison value. Should we add such a case to verify the new>=behavior, wdyt?[ {"id": 1, "created_at": "06-06-21"}, {"id": 2, "created_at": "06-07-21"}, + {"id": 4, "created_at": "06-07-21"}, # Edge case: exact match {"id": 3, "created_at": "06-08-21"}, ], - [{"id": 2, "created_at": "06-07-21"}, {"id": 3, "created_at": "06-08-21"}], + [{"id": 2, "created_at": "06-07-21"}, {"id": 4, "created_at": "06-07-21"}, {"id": 3, "created_at": "06-08-21"}],
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
unit_tests/sources/declarative/datetime/test_min_max_datetime.py(6 hunks)unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py(1 hunks)unit_tests/sources/declarative/extractors/test_record_filter.py(1 hunks)unit_tests/sources/declarative/extractors/test_record_selector.py(2 hunks)unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py(2 hunks)unit_tests/sources/declarative/partition_routers/test_parent_state_stream.py(2 hunks)unit_tests/sources/declarative/requesters/request_options/test_interpolated_request_options_provider.py(5 hunks)unit_tests/sources/declarative/transformations/test_add_fields.py(0 hunks)
💤 Files with no reviewable changes (1)
- unit_tests/sources/declarative/transformations/test_add_fields.py
✅ Files skipped from review due to trivial changes (1)
- unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (7)
unit_tests/sources/declarative/extractors/test_record_selector.py (1)
83-87: LGTM! Consider adding more test cases.The addition of
extra_fieldstoStreamSlicelooks good. Would you like to add more test cases to verify edge cases, wdyt? For example:
- Missing
extra_fields- Empty
extra_fields- Different field names in
extra_fieldsunit_tests/sources/declarative/partition_routers/test_parent_state_stream.py (2)
148-148: LGTM! The change fromstream_statetostream_intervalaligns with the CDK migration.The update correctly replaces the deprecated
stream_stateinterpolation withstream_intervalin the record filter condition. The test cases thoroughly validate this change across various scenarios.
261-1074: Comprehensive test coverage for the migration, nice work!The test cases thoroughly validate the migration from
stream_statetostream_intervalby:
- Testing basic functionality with different state configurations
- Verifying state persistence and updates
- Handling edge cases with empty responses
- Testing state migration scenarios
unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py (2)
146-146: The change fromstream_statetostream_interval.extra_fieldsaligns with the PR objectives.The condition has been updated to use
stream_interval.extra_fieldsinstead ofstream_state, which is consistent with the PR's goal of removingstream_stateinterpolation. The test case validates that the new approach works correctly.
2546-2546: Similar change in the request options manifest looks good.The condition in
SUBSTREAM_REQUEST_OPTIONS_MANIFESThas been updated to usestream_interval['extra_fields']instead ofstream_state, maintaining consistency with the earlier change.unit_tests/sources/declarative/datetime/test_min_max_datetime.py (2)
89-89: LGTM! Clean transition from stream_state to stream_sliceThe test setup and get_datetime call have been updated correctly to use stream_slice instead of stream_state.
Also applies to: 95-95
102-102: Test coverage looks good! One small suggestion...The custom format tests have been updated correctly to use stream_slice. However, should we add a test case to verify behavior when stream_slice is None or missing the expected key? This would help ensure robust error handling, wdyt?
Also applies to: 108-108, 111-111, 120-120, 126-126, 129-129
brianjlai
left a comment
There was a problem hiding this comment.
A few suggestions here and there, but overall this is looking good.
One other area we might be able to cleanup, is simple_retriever.py._get_request_options(). Although its probably too annoying to remove stream_state from the method signature, we can stop passing it into the subsequent paginator_method and stream_slicer methods since at that point stream_state defaults to None. And from there it all becomes the kwargs you already removed. Might be worth trying to remove and see if the tests still pass and then we know it wasn't needed
unit_tests/sources/declarative/datetime/test_min_max_datetime.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py(0 hunks)airbyte_cdk/sources/declarative/transformations/add_fields.py(2 hunks)unit_tests/sources/declarative/datetime/test_min_max_datetime.py(6 hunks)unit_tests/sources/declarative/interpolation/test_jinja.py(2 hunks)unit_tests/sources/declarative/retrievers/test_simple_retriever.py(1 hunks)
💤 Files with no reviewable changes (1)
- airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
🚧 Files skipped from review as they are similar to previous changes (2)
- unit_tests/sources/declarative/datetime/test_min_max_datetime.py
- airbyte_cdk/sources/declarative/transformations/add_fields.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)
132-132:Details
❓ Verification inconclusive
LGTM! The test has been updated to match the new method signature.
The change aligns with the PR objectives of removing
stream_stateinterpolation from the codebase. The test now correctly verifies that_request_paramsaccepts two parameters instead of three while maintaining the same assertion logic.Let's verify that this change is consistent across the codebase:
🏁 Script executed:
#!/bin/bash # Description: Check for any remaining instances of _request_params with three parameters # This helps ensure the change has been applied consistently ast-grep --pattern $'_request_params\(\$_, \$_, \$_\)'Length of output: 55
LGTM! The test for
_request_paramsnow properly reflects the updated signature.
The assertion on line 132 correctly verifies that calling_request_paramswith two parameters returns an empty dictionary, which aligns with the PR’s objective of removingstream_stateinterpolation. I didn't find any evidence of a three-parameter call using our initial verification command, although the lack of output suggests that you might want to manually confirm or run additional checks to be 100% sure. wdyt?
brianjlai
left a comment
There was a problem hiding this comment.
looks good to me! coderabbit had a thing and one more adjustment to the schema file, but no need to block pending the fix.
Have you functionally tested it against one of our existing connectors that uses stream_state?
Confirmed w/ |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
cdk-migrations.md (2)
3-18: Great Introduction to the Breaking Change Section!
You've done a clear job outlining the removal ofstream_stateand recommending the update tostream_intervalin the "Upgrading to 6.X.X" section. Would you consider adding a brief note (for example, after line 17) to remind developers to re-test their connectors after these changes or to reference specific sections of the migration guide for further details? wdyt?
21-31: Clear YAML Example Demonstrating the Migration!
The before-and-after YAML snippet provides an excellent illustration of how to update the RecordFilter condition. Would you consider adding a short comment within the code block (or immediately after) to alert users about potential pitfalls when migrating fromstream_statetostream_interval? This might help reinforce best practices as they update their configuration. wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cdk-migrations.md(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
What
stream_stateinterpolation frominterpolation_contextindeclarative_component_schemastream_stateto theHttpRequester,AddFields, andRequestOptionsProvider, andRecordFiltercomponents.stream_stateinaccessible as a interpolated variable.JinjaInterpolationclass to raiseAirbyteTracedExceptionwhen connector attempts to evaluate a jinja expression that containsstream_state.DefaultStreams(concurrent) ifstream_stateinterpolation was used.CDK Migration GuideRecommended Reading Order
declarative_component_schema.yamlhttp_requester.pyinterpolated_requests_input_provider.pyinterpolated_nested_request_input_provider.pyadd_fields.pyinterpolated_requests_options_provider.pyjinja.pycdk_migrations.mdSummary by CodeRabbit
stream_stateparameter with a new interval-based approach across request handling and transformations.