fix: zero out declarative cursor on concurrent streams#723
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/fix-cursor-datetime-format-issue#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/fix-cursor-datetime-format-issueHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
📝 WalkthroughWalkthroughImplements a temporary workaround in create_declarative_stream: if the retriever is a SimpleRetriever, its cursor is cleared (set to None) before building the partition generator. Other flow and public interfaces remain unchanged. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant C as create_declarative_stream
participant R as SimpleRetriever
participant P as PartitionGenerator
C->>R: Instantiate retriever
alt retriever is SimpleRetriever
Note over C,R: TEMP workaround (FIXME)
C->>R: Set cursor = None
end
C->>P: Construct partition generator (uses retriever)
Note over C,P: Remaining flow unchanged
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
2093-2098: Scope the zero-out to concurrent mode and add a debug trace for observability, wdyt?The intent is clear and aligns with restoring the Concurrent CDK behavior. To make this safer and self-documenting, could we:
- Only zero out when a concurrent cursor is actually being used; and
- Emit a debug log so operators understand why SimpleRetriever.cursor is being nulled?
- # FIXME to be removed once we migrate everything to DefaultStream - if isinstance(retriever, SimpleRetriever): - # We zero it out here, but since this is a cursor reference, the state is still properly - # instantiated for the other components that reference it - retriever.cursor = None + # FIXME to be removed once we migrate everything to DefaultStream + # When running in concurrent mode, the concurrent cursor owns state progression. + # Ensure the SimpleRetriever does not attempt to process a declarative cursor. + if isinstance(retriever, SimpleRetriever) and isinstance(concurrent_cursor, ConcurrentCursor): + logging.getLogger(f"airbyte.{stream_name}").debug( + "Zeroing SimpleRetriever.cursor; ConcurrentCursor manages state for stream '%s'.", + stream_name, + ) + # We zero it out here, but since this is a cursor reference, the state instance remains + # available to other components that have their own references. + retriever.cursor = None
2093-2098: Add a regression test to lock in the behavior, wdyt?A small test would catch future regressions where SimpleRetriever starts observing/closing slices in concurrent flows. For example:
- Build a declarative stream with a DatetimeBasedCursor that resolves to a ConcurrentCursor.
- Assert that in the produced DefaultStream, the underlying SimpleRetriever has cursor is None while DefaultStream.cursor is a ConcurrentCursor.
If helpful, I can draft a test skeleton targeting create_declarative_stream and asserting the invariants above. Want me to push that?
1956-1961: Double-check: partition_field_start set from partition_field_end looks unintended, wdyt?This block configures DatetimeBasedRequestOptionsProvider for DatetimeBasedCursorModel, but both partition fields are sourced from
cursor_model.partition_field_end. Shouldpartition_field_startcome fromcursor_model.partition_field_startinstead?request_options_provider = DatetimeBasedRequestOptionsProvider( start_time_option=start_time_option, end_time_option=end_time_option, - partition_field_start=cursor_model.partition_field_end, - partition_field_end=cursor_model.partition_field_end, + partition_field_start=cursor_model.partition_field_start, + partition_field_end=cursor_model.partition_field_end, config=config, parameters=model.parameters or {}, )If this is intentional (e.g., a specific API quirk), could we add a short comment to prevent future “fixes”?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
SimpleRetriever(53-602)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
What
The v6.60.16 of the CDK brought back by accident some of the processing of the declarative cursor which is less resilient than the Concurrent CDK one. In order to fix this, we need to add this logic as part of the ModelToComponentFactory.
For more information, see https://airbytehq-team.slack.com/archives/C09BWEEGWTE
Summary by CodeRabbit