[RFC][Connectors][Draft]: Add native s2 input connector support in Feldera #5726

@Mrhs121

Summary

S2 is serverless storage for streams, designed to make streaming data simple and reliable.
This RFC proposes adding a native s2_input connector to Feldera for direct ingestion from S2 streams.

Motivation

Feldera currently has no first-class S2 input connector.
This feature enables users to configure S2 as a native input transport in SQL pipeline connectors, with resume/replay semantics aligned with Feldera’s fault-tolerance model.

Expected outcomes:

  • s2_input is accepted and usable in SQL connector configuration.
  • Pipelines can resume from checkpoints using S2 sequence positions.
  • Behavior is consistent with existing sequence-based input connectors.

Reference-level explanation

Proposed scope:

  • Add S2InputConfig in feldera-types (e.g. basin, stream, auth_token, optional endpoint, start_from).
  • Add TransportConfig::S2Input(...).
  • Implement S2 adapter (S2InputEndpoint / S2Reader) in crates/adapters/src/transport/s2.
  • Use S2 read_session for streaming reads.
  • Track checkpoint metadata as seq_num_range: Range<u64>.
  • Replay by reading from the start of the checkpointed range and consuming up to, but not including, its end (Range<u64> is half-open).
  • Register S2Input in the adapter factory behind the with-s2 feature.
  • Update pipeline-manager connector classification so S2Input is treated as an input variant during SQL connector validation.
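
The checkpoint and replay bullets above can be sketched in plain Rust. This is only an illustration under stated assumptions: Record, StepTracker, and replay are hypothetical names that do not come from the RFC or from the S2 SDK, and an in-memory iterator stands in for an S2 read_session. The only element taken from the proposal is the seq_num_range: Range<u64> metadata.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a record delivered by an S2 read session.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub seq_num: u64,
    pub body: String,
}

/// Hypothetical helper that accumulates the half-open range of sequence
/// numbers consumed during one step, i.e. the proposed seq_num_range.
#[derive(Debug, Default)]
pub struct StepTracker {
    range: Option<Range<u64>>,
}

impl StepTracker {
    /// Extend the tracked range so that it covers `seq_num`.
    /// (Overflow at u64::MAX is ignored in this sketch.)
    pub fn observe(&mut self, seq_num: u64) {
        self.range = Some(match self.range.take() {
            None => seq_num..seq_num + 1,
            Some(r) => r.start.min(seq_num)..r.end.max(seq_num + 1),
        });
    }

    /// Close the step and return its checkpoint metadata.
    /// Returns None when no record was consumed in the step.
    pub fn finish(&mut self) -> Option<Range<u64>> {
        self.range.take()
    }
}

/// Replay a checkpointed step: skip records before `checkpoint.start`,
/// then consume until the first record at or past `checkpoint.end`.
/// Assumes `records` is ordered by increasing seq_num.
pub fn replay<I>(records: I, checkpoint: &Range<u64>) -> Vec<Record>
where
    I: IntoIterator<Item = Record>,
{
    records
        .into_iter()
        .skip_while(|r| r.seq_num < checkpoint.start)
        .take_while(|r| r.seq_num < checkpoint.end)
        .collect()
}
```

In a real connector the replay would presumably open the read session directly at checkpoint.start rather than skipping client-side; the skip_while here only keeps the sketch independent of any S2 API.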

Example connector payload:

CREATE TABLE events (
    id BIGINT,
    data STRING
) WITH (
    'append_only' = 'true',
    'connectors' = '[{
        "transport": {
            "name": "s2_input",
            "config": {
                "basin": "test-buket",
                "stream": "feldera",
                "auth_token": "your-s2-token",
                "endpoint": "http://localhost:7070/",
                "start_from": "Beginning"
            }
        },
        "format": {
            "name": "json",
            "config": {
                "update_format": "raw"
            }
        }
    }]'
);

CREATE MATERIALIZED VIEW res AS SELECT * FROM events;
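
For orientation, the "config" object in the payload above might map onto a Rust type roughly like the following. This is a sketch, not the proposed definition: the field names come from the RFC, but the field types, the StartFrom::SeqNum variant, and the validate and initial_seq_num helpers are assumptions made for illustration. The real S2InputConfig in feldera-types would presumably also derive serde traits, which are omitted here to keep the example dependency-free.

```rust
/// Where to start reading when there is no checkpoint to resume from.
/// Only "Beginning" appears in the RFC; `SeqNum` is a hypothetical variant.
#[derive(Debug, Clone, PartialEq)]
pub enum StartFrom {
    Beginning,
    SeqNum(u64),
}

/// Sketch of the connector configuration; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct S2InputConfig {
    pub basin: String,
    pub stream: String,
    pub auth_token: String,
    /// Optional endpoint override, e.g. for a local test server.
    pub endpoint: Option<String>,
    pub start_from: StartFrom,
}

impl S2InputConfig {
    /// Hypothetical sanity checks run before the connector starts.
    pub fn validate(&self) -> Result<(), String> {
        if self.basin.is_empty() {
            return Err("basin must not be empty".to_string());
        }
        if self.stream.is_empty() {
            return Err("stream must not be empty".to_string());
        }
        if self.auth_token.is_empty() {
            return Err("auth_token must not be empty".to_string());
        }
        if let Some(ep) = &self.endpoint {
            if !(ep.starts_with("http://") || ep.starts_with("https://")) {
                return Err(format!("endpoint must be an http(s) URL: {}", ep));
            }
        }
        Ok(())
    }

    /// First sequence number to request on a fresh start.
    /// Assumes S2 sequence numbers begin at 0.
    pub fn initial_seq_num(&self) -> u64 {
        match self.start_from {
            StartFrom::Beginning => 0,
            StartFrom::SeqNum(n) => n,
        }
    }
}
```

When a checkpoint exists, the resume position would come from the checkpointed sequence range rather than from start_from, so initial_seq_num only matters for a pipeline's first run.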

Rationale and alternatives

Why this design:

  • S2 sequence numbers map naturally to Feldera checkpoint/replay metadata.
  • Reusing patterns from existing fault-tolerant (FT) input connectors reduces integration risk.

Alternatives considered:

  • No native S2 connector: keeps the status quo but leaves a key input source unsupported.
  • Generic HTTP-only ingestion path: does not provide native sequence-based replay semantics.

Impact of not doing this:

  • Users cannot configure S2 as a first-class Feldera input transport.
  • Fault-tolerant resume/replay for S2 ingestion remains unavailable in the native connector flow.
