# K2I Documentation

K2I is a standalone Rust service for Kafka-to-Apache-Iceberg ingestion. It consumes a configured Kafka topic, decodes raw or Confluent-framed Protobuf messages, keeps recent rows visible through an Arrow-backed local read path, and writes Parquet data files through Iceberg catalog commits.
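K2I's actual decoder is part of its Rust source; as a minimal illustration of the Confluent framing it has to strip before Protobuf decoding, here is a Python sketch of the standard wire format (magic byte, big-endian schema ID, zigzag-varint message-index list). The function name and error handling are illustrative, not K2I's API.

```python
import struct

def parse_confluent_frame(buf: bytes):
    """Split a Confluent-framed Protobuf record into (schema_id, indexes, payload).

    Standard Confluent wire format:
      byte 0      magic byte, always 0x00
      bytes 1-4   schema ID, big-endian u32
      then        zigzag-varint message-index array (count, then each index);
                  the common single-index [0] case is encoded as one 0x00 byte
      rest        raw Protobuf message bytes
    """
    if not buf or buf[0] != 0:
        raise ValueError("not Confluent-framed (bad magic byte)")
    (schema_id,) = struct.unpack_from(">I", buf, 1)
    pos = 5

    def read_zigzag_varint(p):
        shift, raw = 0, 0
        while True:
            b = buf[p]
            p += 1
            raw |= (b & 0x7F) << shift
            if not (b & 0x80):
                break
            shift += 7
        return (raw >> 1) ^ -(raw & 1), p  # zigzag decode to signed int

    count, pos = read_zigzag_varint(pos)
    if count == 0:
        indexes = [0]  # shorthand: first message type in the schema file
    else:
        indexes = []
        for _ in range(count):
            idx, pos = read_zigzag_varint(pos)
            indexes.append(idx)
    return schema_id, indexes, buf[pos:]
```

Raw (non-framed) Protobuf messages skip this step entirely; only records produced through a Confluent Schema Registry serializer carry the header.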

The current release docs are organized around the working implementation and local verification flows. Historical PRDs, research notes, and older website drafts live under `archive`.

## Start Here

| Guide | Use It For |
| --- | --- |
| Kafka to Iceberg | Main explanation of the K2I data path |
| Quickstart | Local Docker proof and first manual run |
| Configuration | Complete TOML reference |
| Architecture | System design, ordering, and hot/cold visibility |
| Comparisons | K2I vs Kafka Connect, Flink, Spark, TableFlow, and Moonlink |
| FAQ | Short answers to common user questions |

## Implementation Deep Dives

| Guide | Use It For |
| --- | --- |
| DuckDB Iceberg Validation | Docker E2E, direct Parquet reads, and DuckDB `iceberg_scan` |
| Schema Registry Protobuf | Confluent Protobuf decoding and schema evolution behavior |
| Iceberg REST Catalog | REST catalog commits and catalog backend caveats |
| Commands | CLI command reference and E2E scripts |
| Man Pages | Generated man pages for every CLI command and subcommand |
| Deployment | Deployment patterns and operational notes |
| Troubleshooting | Common issues and recovery guidance |
| Production Readiness | Verification status, caveats, and follow-up issues |

## Quick Local Proof

```sh
# Correctness flow: Protobuf evolution, read-state RPC, DuckDB Parquet checks
scripts/e2e-docker.sh

# Real Iceberg REST metadata and DuckDB iceberg_scan
scripts/e2e-docker-iceberg.sh

# 100,000-row Iceberg load profile
K2I_E2E_LOAD_MESSAGES=100000 scripts/e2e-docker-iceberg-load.sh
```

The Iceberg E2E success line is:

```
ok: DuckDB iceberg_scan validated real Iceberg metadata
```

## Current Release Scope

K2I is production-oriented, but the docs intentionally keep caveats visible:

- one configured Kafka topic and one configured Iceberg table per process today;
- the REST catalog real-metadata path is validated locally; the Glue, Hive, and Nessie abstractions require backend-specific validation;
- hot reads are served by the local read-state RPC, while query engines see data only after an Iceberg commit;
- exactly-once-style durability is designed around manual Kafka offsets, transaction-log records, idempotency records, immutable Parquet writes, and atomic Iceberg commits;
- multi-partition hardening, startup recovery application, async Kafka commit acknowledgement, per-entry fsync behavior, GCS/Azure writer wiring, and maintenance scheduler wiring remain production follow-ups.
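The idempotency-record piece of the durability design can be sketched as a dedupe keyed by Kafka coordinates: a replayed record (same topic, partition, and offset) is detected before a second write is attempted. This in-memory Python sketch is purely illustrative; K2I's real mechanism is persistent and Rust-based, and the class and method names here are hypothetical.

```python
class IdempotencyLog:
    """Illustrative offset-keyed dedupe for at-least-once Kafka delivery.

    A real implementation would persist keys (e.g. alongside the
    transaction log) so replays are still caught after a restart.
    """

    def __init__(self):
        self._seen = set()

    def first_time(self, topic: str, partition: int, offset: int) -> bool:
        """Return True the first time a (topic, partition, offset) is seen."""
        key = (topic, partition, offset)
        if key in self._seen:
            return False  # replayed record: skip the write
        self._seen.add(key)
        return True
```

Combined with immutable Parquet files and atomic Iceberg commits, this is what lets a consumer replay after a crash without producing duplicate rows.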

See Production Readiness before broad rollout.