Summary
S2 is serverless storage for streams, designed to make streaming data simple and reliable.
Add native s2_input connector support in Feldera for direct ingestion from S2 streams.
Motivation
Feldera currently has no first-class S2 input connector.
This feature enables users to configure S2 as a native input transport in SQL pipeline connectors, with resume/replay semantics aligned with Feldera’s fault-tolerance model.
Expected outcomes:
- `s2_input` is accepted and usable in SQL connector configuration.
- Pipelines can resume from checkpoints using S2 sequence positions.
- Behavior is consistent with existing sequence-based input connectors.
Reference-level explanation
Proposed scope:
- Add `S2InputConfig` in `feldera-types` (e.g. `basin`, `stream`, `auth_token`, optional `endpoint`, `start_from`).
- Add `TransportConfig::S2Input(...)`.
- Implement S2 adapter (`S2InputEndpoint` / `S2Reader`) in `crates/adapters/src/transport/s2`.
- Use S2 `read_session` for streaming reads.
- Track checkpoint metadata as `seq_num_range: Range<u64>`.
- Replay by reading from checkpoint start and consuming to checkpoint end.
- Register `S2Input` in adapter factory behind `with-s2`.
- Update pipeline-manager connector classification so `S2Input` is treated as an input variant during SQL connector validation.
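The checkpoint and replay items above can be sketched as follows. This is a minimal, std-only illustration and not the proposed implementation: `Record`, `StepMetadata`, `metadata_for_step`, and `replay` are hypothetical names, the in-memory iterator stands in for an S2 `read_session`, and records are assumed to arrive in ascending sequence order.

```rust
use std::ops::Range;

/// One record as delivered by the stream: a sequence number plus its payload.
/// Hypothetical stand-in for whatever the S2 client returns.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    seq_num: u64,
    body: String,
}

/// Checkpoint metadata for one step: the half-open range of sequence
/// numbers consumed during that step.
#[derive(Debug, Clone, PartialEq)]
struct StepMetadata {
    seq_num_range: Range<u64>,
}

/// Build the metadata for a step that started at `start` and consumed
/// `records`. An empty step yields the empty range `start..start`.
fn metadata_for_step(start: u64, records: &[Record]) -> StepMetadata {
    let end = records.last().map_or(start, |r| r.seq_num + 1);
    StepMetadata { seq_num_range: start..end }
}

/// Replay a step: skip anything before the checkpoint start, yield records
/// inside the range, and stop once the checkpoint end is reached.
/// Assumes `source` is ordered by ascending `seq_num`.
fn replay<I>(source: I, meta: &StepMetadata) -> Vec<Record>
where
    I: IntoIterator<Item = Record>,
{
    source
        .into_iter()
        .skip_while(|r| r.seq_num < meta.seq_num_range.start)
        .take_while(|r| r.seq_num < meta.seq_num_range.end)
        .collect()
}
```

Because the range is half-open, the `end` of one step is also the position from which the next step (or a resumed pipeline) would start reading, so consecutive steps neither overlap nor leave gaps.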
Example connector payload:
CREATE TABLE events (
    id BIGINT,
    data STRING
) WITH (
    'append_only' = 'true',
    'connectors' = '[{
        "transport": {
            "name": "s2_input",
            "config": {
                "basin": "test-buket",
                "stream": "feldera",
                "auth_token": "your-s2-token",
                "endpoint": "http://localhost:7070/",
                "start_from": "Beginning"
            }
        },
        "format": {
            "name": "json",
            "config": {
                "update_format": "raw"
            }
        }
    }]'
);
CREATE MATERIALIZED VIEW res AS SELECT * FROM events;
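For orientation, the `config` object in the payload could map onto a Rust type roughly like the one below. This is a sketch under assumptions: only the field names and the `Beginning` value come from this proposal, while the `Tail` and `SeqNum` variants, the choice to model only `endpoint` as optional, and the `validate` and `example_config` helpers are hypothetical. The real type would presumably derive serde traits, which are omitted here to keep the example std-only.

```rust
/// Where to begin reading when no checkpoint exists.
/// Only `Beginning` appears in the example payload; `Tail` and `SeqNum`
/// are assumed variants added for illustration.
#[derive(Debug, Clone, PartialEq)]
enum StartFrom {
    Beginning,
    Tail,
    SeqNum(u64),
}

/// Sketch of the connector configuration described in the proposal.
#[derive(Debug, Clone, PartialEq)]
struct S2InputConfig {
    basin: String,
    stream: String,
    auth_token: String,
    /// Optional override, e.g. for a local S2-compatible endpoint.
    endpoint: Option<String>,
    start_from: StartFrom,
}

impl S2InputConfig {
    /// Hypothetical sanity check: required string fields must be non-empty.
    fn validate(&self) -> Result<(), String> {
        let required = [
            ("basin", &self.basin),
            ("stream", &self.stream),
            ("auth_token", &self.auth_token),
        ];
        for (name, value) in required {
            if value.trim().is_empty() {
                return Err(format!("`{}` must not be empty", name));
            }
        }
        Ok(())
    }
}

/// The example payload from this proposal, expressed as a config value.
fn example_config() -> S2InputConfig {
    S2InputConfig {
        basin: "test-buket".to_string(),
        stream: "feldera".to_string(),
        auth_token: "your-s2-token".to_string(),
        endpoint: Some("http://localhost:7070/".to_string()),
        start_from: StartFrom::Beginning,
    }
}
```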
Rationale and alternatives
Why this design:
- S2 sequence numbers map naturally to Feldera checkpoint/replay metadata.
- Reusing patterns from existing FT input connectors reduces integration risk.
Alternatives considered:
- No native S2 connector: keeps status quo but leaves a key input source unsupported.
- Generic HTTP-only ingestion path: does not provide native sequence-based replay semantics.
Impact of not doing this:
- Users cannot configure S2 as a first-class Feldera input transport.
- Fault-tolerant resume/replay for S2 ingestion remains unavailable in native connector flow.