
feat: add ClientSideIncrementalRetrieverDecorator for custom retrievers #893

Draft
darynaishchenko wants to merge 2 commits into main from devin/1770312809-client-side-incremental-custom-retriever

@darynaishchenko

Summary

Fixes a gap where custom retrievers bypass client-side incremental filtering even when is_client_side_incremental: true is configured in the manifest.

Problem: When a stream uses a custom retriever (not SimpleRetriever), the ClientSideIncrementalRecordFilterDecorator that normally filters records based on cursor state is never applied. This causes all records to be emitted on every sync, defeating the purpose of incremental syncing.

Solution: Add a new ClientSideIncrementalRetrieverDecorator that wraps custom retrievers at the retriever level (rather than record filter level) and applies the same filtering logic using cursor.should_be_synced().
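
The retriever-level filtering described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `FilteringRetrieverDecorator`, `ListRetriever`, and `UpdatedAtCursor` are hypothetical stand-ins, and `read_records()` is simplified to take no arguments. Only the idea of delegating to the wrapped retriever and keeping records for which `cursor.should_be_synced()` returns true comes from the description.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Mapping, Protocol


class Cursor(Protocol):
    def should_be_synced(self, record: Mapping[str, Any]) -> bool: ...


class Retriever(Protocol):
    def read_records(self) -> Iterable[Mapping[str, Any]]: ...


@dataclass
class UpdatedAtCursor:
    """Hypothetical cursor: keeps records strictly newer than the saved state."""

    state: str  # ISO date string, so lexicographic comparison matches date order

    def should_be_synced(self, record: Mapping[str, Any]) -> bool:
        return record["updated_at"] > self.state


@dataclass
class ListRetriever:
    """Hypothetical custom retriever that always returns every record."""

    records: List[Mapping[str, Any]]

    def read_records(self) -> Iterable[Mapping[str, Any]]:
        yield from self.records


@dataclass
class FilteringRetrieverDecorator:
    """Wraps any retriever and drops records the cursor says are already synced."""

    retriever: Retriever
    cursor: Cursor

    def read_records(self) -> Iterable[Mapping[str, Any]]:
        for record in self.retriever.read_records():
            if self.cursor.should_be_synced(record):
                yield record


inner = ListRetriever(
    [
        {"id": 1, "updated_at": "2024-01-01"},
        {"id": 2, "updated_at": "2024-01-03"},
    ]
)
wrapped = FilteringRetrieverDecorator(inner, UpdatedAtCursor(state="2024-01-02"))
synced_ids = [record["id"] for record in wrapped.read_records()]
```

Because the filter sits outside the retriever, the custom retriever itself needs no changes; it still fetches everything, and only emission is narrowed.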

Changes:

  • New ClientSideIncrementalRetrieverDecorator class in retrievers/
  • Updated create_default_stream() in model_to_component_factory.py to wrap custom retrievers when:
    • is_client_side_incremental is enabled
    • Retriever is NOT a SimpleRetriever (which already handles this via record filter)
    • Cursor is NOT a FinalStateCursor (i.e., there's actual incremental state)
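
As a rough sketch of the wrapping condition listed above (not the actual branch in `create_default_stream()`), the three checks can be read as a single predicate. The classes below are empty stand-ins for the real CDK types, and `should_wrap` is a hypothetical helper name. The `AsyncRetriever` exclusion is an assumption taken from the review checklist, which says that case is handled in an earlier branch.

```python
class SimpleRetriever:
    """Stand-in: already filters through its record filter."""


class AsyncRetriever:
    """Stand-in: assumed to be handled by an earlier branch."""


class CustomRetriever:
    """Stand-in for any user-defined retriever."""


class FinalStateCursor:
    """Stand-in: no real incremental state to filter against."""


class IncrementalCursor:
    """Stand-in for a cursor that tracks actual state."""


def should_wrap(retriever: object, cursor: object, is_client_side_incremental: bool) -> bool:
    """Return True when the retriever needs the retriever-level filter."""
    return (
        is_client_side_incremental
        and not isinstance(retriever, (SimpleRetriever, AsyncRetriever))
        and not isinstance(cursor, FinalStateCursor)
    )


wrap_custom = should_wrap(CustomRetriever(), IncrementalCursor(), True)
wrap_simple = should_wrap(SimpleRetriever(), IncrementalCursor(), True)
wrap_final_state = should_wrap(CustomRetriever(), FinalStateCursor(), True)
wrap_disabled = should_wrap(CustomRetriever(), IncrementalCursor(), False)
```

Only the first combination qualifies for wrapping; the other three leave the retriever untouched.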

Related to: airbytehq/oncall#11113

Review & Testing Checklist for Human

  • Verify wrapping condition logic: Confirm that the elif branch in create_default_stream() correctly excludes AsyncRetriever (handled in the if branch above) and SimpleRetriever (already has filtering)
  • Test with a real custom retriever connector: The unit tests only test the decorator in isolation. Recommend testing with a connector like Notion that uses a custom retriever with is_client_side_incremental: true to verify end-to-end behavior
  • Review Record creation for Mapping records: In the decorator, when a Mapping is received, a Record is created with stream_name="" - verify this doesn't cause issues with cursor implementations
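
To make the last checklist item concrete, here is a small sketch of the coercion it describes. `Record` below is a simplified stand-in with only `data` and `stream_name` fields, and `as_record` is a hypothetical helper; the real decorator may differ. The point is that a plain mapping ends up with an empty stream name, which matters only if a cursor implementation reads that field.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Union


@dataclass
class Record:
    """Simplified stand-in for the CDK record type."""

    data: Mapping[str, Any]
    stream_name: str


def as_record(item: Union[Record, Mapping[str, Any]]) -> Record:
    """Pass records through; wrap plain mappings with an empty stream name."""
    if isinstance(item, Record):
        return item
    return Record(data=item, stream_name="")


from_mapping = as_record({"id": 1})
from_record = as_record(Record(data={"id": 2}, stream_name="users"))
```

A cursor that only inspects the record data is unaffected; one that looks up state by `stream_name` would see `""` for records that arrived as mappings.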

Notes

This change automatically applies client-side incremental filtering to
custom retrievers when is_client_side_incremental is enabled in the
manifest.

Previously, custom retrievers bypassed the ClientSideIncrementalRecordFilterDecorator
that SimpleRetriever uses, causing all records to be emitted on every sync
even when is_client_side_incremental was configured.

Changes:
- Add ClientSideIncrementalRetrieverDecorator class that wraps any Retriever
  and filters records using cursor.should_be_synced()
- Update model_to_component_factory.create_default_stream() to wrap custom
  retrievers with the decorator when is_client_side_incremental is enabled
- Add unit tests for the new decorator

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@github-actions

github-actions bot commented Feb 5, 2026

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.


Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1770312809-client-side-incremental-custom-retriever#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1770312809-client-side-incremental-custom-retriever

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@github-actions

github-actions bot commented Feb 5, 2026

PyTest Results (Fast)

3,860 tests (+5): 3,848 passed ✅ (+5), 12 skipped 💤 (±0), 0 failed ❌ (±0)
1 suite (±0), 1 file (±0), duration 6m 29s ⏱️ (+2s)

Results for commit 3a00ee5. ± Comparison against base commit 15542de.

@github-actions

github-actions bot commented Feb 5, 2026

PyTest Results (Full)

3,863 tests (+5): 3,851 passed ✅ (+5), 12 skipped 💤 (±0), 0 failed ❌ (±0)
1 suite (±0), 1 file (±0), duration 10m 53s ⏱️ (-8s)

Results for commit 3a00ee5. ± Comparison against base commit 15542de.
