K2I is a standalone Rust service for Kafka-to-Apache-Iceberg ingestion. It consumes a configured Kafka topic, decodes raw or Confluent-framed Protobuf messages, keeps recent rows visible through an Arrow-backed local read path, and writes Parquet data files through Iceberg catalog commits.
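As a rough picture of that path, here is a hedged Rust sketch. Every name in it (`KafkaSource`, `IcebergSink`, `Row`, `ingest_cycle`) is an illustrative stand-in, not K2I's actual types, and the real hot path is Arrow-backed rather than a plain `Vec`:

```rust
// A minimal sketch of the K2I data path, assuming nothing about the real
// internals: all names here are hypothetical stand-ins for illustration.

/// A decoded row, carrying the Kafka offset it came from.
struct Row {
    offset: i64,
    encoded: Vec<u8>,
}

/// Stand-in for the Kafka consumer: yields (offset, raw message payload).
trait KafkaSource {
    fn poll(&mut self) -> Option<(i64, Vec<u8>)>;
}

/// Stand-in for the Parquet writer plus atomic Iceberg catalog commit.
trait IcebergSink {
    fn write_and_commit(&mut self, rows: &[Row]);
}

/// One ingestion cycle: consume, decode, expose rows through the hot
/// buffer immediately, then flush them cold behind an Iceberg commit.
fn ingest_cycle(
    source: &mut dyn KafkaSource,
    decode: impl Fn(&[u8]) -> Option<Vec<u8>>, // raw or Confluent-framed Protobuf
    hot_buffer: &mut Vec<Row>,                 // stands in for the Arrow read path
    sink: &mut dyn IcebergSink,
    flush_threshold: usize,
) {
    while let Some((offset, payload)) = source.poll() {
        if let Some(encoded) = decode(&payload) {
            // Hot path: the row is visible to local reads right away.
            hot_buffer.push(Row { offset, encoded });
        }
        if hot_buffer.len() >= flush_threshold {
            // Cold path: an immutable Parquet file plus a catalog commit
            // makes the batch visible to external query engines.
            sink.write_and_commit(hot_buffer);
            hot_buffer.clear();
        }
    }
}
```

The Architecture guide describes the real hot/cold visibility split in detail.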
The current release docs are organized around the working implementation and local verification flows. Historical PRDs, research notes, and older website drafts live under `archive`.
Core guides:

| Guide | Use It For |
|---|---|
| Kafka to Iceberg | Main explanation of the K2I data path |
| Quickstart | Local Docker proof and first manual run |
| Configuration | Complete TOML reference |
| Architecture | System design, ordering, and hot/cold visibility |
| Comparisons | K2I vs Kafka Connect, Flink, Spark, TableFlow, and Moonlink |
| FAQ | Short answers for common user questions |
Operational and reference guides:

| Guide | Use It For |
|---|---|
| DuckDB Iceberg Validation | Docker E2E, direct Parquet reads, and DuckDB iceberg_scan |
| Schema Registry Protobuf | Confluent Protobuf decoding and schema evolution behavior |
| Iceberg REST Catalog | REST catalog commits and catalog backend caveats |
| Commands | CLI command reference and E2E scripts |
| Man Pages | Generated man pages for every CLI command and subcommand |
| Deployment | Deployment patterns and operational notes |
| Troubleshooting | Common issues and recovery guidance |
| Production Readiness | Verification status, caveats, and follow-up issues |
Three end-to-end verification flows are scripted:

```bash
# Correctness flow: Protobuf evolution, read-state RPC, DuckDB Parquet checks
scripts/e2e-docker.sh

# Real Iceberg REST metadata and DuckDB iceberg_scan
scripts/e2e-docker-iceberg.sh

# 100,000-row Iceberg load profile
K2I_E2E_LOAD_MESSAGES=100000 scripts/e2e-docker-iceberg-load.sh
```

The Iceberg E2E success line is:

```
ok: DuckDB iceberg_scan validated real Iceberg metadata
```
K2I is production-oriented, but the docs intentionally keep caveats visible:
- one configured Kafka topic and one configured Iceberg table per process today;
- REST catalog real-metadata path validated locally; Glue, Hive, and Nessie abstractions require backend-specific validation;
- hot reads are local read-state RPC, while query engines see data after an Iceberg commit;
- exactly-once-style durability is designed around manual Kafka offsets, transaction-log records, idempotency records, immutable Parquet writes, and atomic Iceberg commits (see the ordering sketch after this list);
- multi-partition hardening, startup recovery application, async Kafka commit acknowledgement, per-entry fsync behavior, GCS/Azure writer wiring, and maintenance scheduler wiring remain production follow-ups.
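To make the durability bullet concrete, here is a hedged sketch of one plausible flush ordering built from those five ingredients; every function in it is a hypothetical stub, not K2I's real API:

```rust
// Hypothetical flush ordering assembled from the five durability
// ingredients listed above. All functions are illustrative stubs.

fn write_parquet(batch_id: u64) -> std::io::Result<String> {
    // Immutable data file first: a crash after this step orphans a file
    // but never exposes a partially visible batch.
    Ok(format!("data/batch-{batch_id}.parquet"))
}

fn append_txn_log(_batch: u64, _file: &str, _last_offset: i64) -> std::io::Result<()> {
    Ok(()) // transaction-log entry plus idempotency record for recovery
}

fn commit_iceberg(_file: &str) -> std::io::Result<()> {
    Ok(()) // atomic catalog commit: readers see all of the batch or none
}

fn commit_kafka_offset(_last_offset: i64) -> std::io::Result<()> {
    Ok(()) // manual offset commit, deliberately the last step
}

/// Flush one batch. The ordering is the point: Parquet file, then the
/// transaction log, then the Iceberg commit, and only then the Kafka
/// offset, so a crash at any step is recoverable without duplicates.
fn flush_batch(batch_id: u64, last_offset: i64) -> std::io::Result<()> {
    let file = write_parquet(batch_id)?;
    append_txn_log(batch_id, &file, last_offset)?;
    commit_iceberg(&file)?;
    // A crash before this line triggers redelivery from Kafka; the
    // idempotency record lets recovery skip the already-committed batch.
    commit_kafka_offset(last_offset)?;
    Ok(())
}
```

The ordering matters more than the stubs: committing the Kafka offset last means a crash can only cause redelivery, which the idempotency record turns into a no-op rather than a duplicate row.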
See Production Readiness before broad rollout.