Steps to reproduce the behavior (Required)
Against any external Iceberg catalog with a partitioned table that has undergone rewrite_manifests / compactManifests (any normal housekeeping flow):
-- Iceberg table with partitions p_a, p_b. Files of p_a were added in snapshot S1,
-- files of p_b in S2. Later, a maintenance snapshot S_M rewrites both manifests
-- (no data change).
SELECT partition_value, last_updated_at, last_updated_snapshot_id
FROM iceberg_cat.db.tbl$iceberg_partitions_table;
Expected behavior (Required)
Per Iceberg's PARTITIONS metadata schema, these columns should be "commit time / id of snapshot that last updated this partition" — i.e. the snapshot in which a data or delete file was added/removed for that partition. The same query against Spark/Flink returns:
p_a | <ts of S1> | <S1>
p_b | <ts of S2> | <S2>
last_updated_snapshot_id has existed in Apache Iceberg's PARTITIONS metadata table since Iceberg 1.4.0 (apache/iceberg#7581). StarRocks runs Iceberg 1.10.0, so the column should be exposed in iceberg_partitions_table.
Real behavior (Required)
Two bugs:
- last_updated_at granularity is wrong. IcebergPartitionsTableScanner resolves last_updated_at from ManifestFile.snapshotId() (the snapshot that wrote the manifest), not from ManifestEntry.snapshotId() (the snapshot that added the file). After manifest rewrite, every partition that lived in a rewritten manifest reports the maintenance snapshot's timestamp:
p_a | <ts of S_M> | (missing column)
p_b | <ts of S_M> | (missing column)
Information about when each partition's data actually last changed is lost.
- last_updated_snapshot_id is missing entirely from the iceberg_partitions_table schema, even though:
  - the Iceberg PARTITIONS metadata table has exposed it since Iceberg 1.4.0;
  - IcebergPartitionsTableScanner already reads via ManifestReader, so the per-entry snapshot id is available.
A user writing portable queries against $iceberg_partitions_table cannot get the same answer they would get from Spark/Flink.
The root cause for both is that IcebergPartitionsTableScanner iterates ContentFile rather than ManifestEntry. Iceberg's own org.apache.iceberg.PartitionsTable.update() iterates entries and resolves the snapshot per-entry, so manifest rewrites do not lose history.
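To make the difference concrete, here is a minimal, self-contained sketch of the two resolution strategies. It is a toy model, not StarRocks or Iceberg code: the `Entry` and `Manifest` classes, the method names, the snapshot ids 1, 2 and 3 (standing in for S1, S2 and S_M) and the commit times are all assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: Entry and Manifest are hypothetical stand-ins,
// not Iceberg or StarRocks classes.
class PartitionLastUpdatedSketch {

    // One file entry: the snapshot that added the file and the file's partition.
    static final class Entry {
        final long snapshotId;
        final String partition;

        Entry(long snapshotId, String partition) {
            this.snapshotId = snapshotId;
            this.partition = partition;
        }
    }

    // One manifest: the snapshot that wrote the manifest and the entries it lists.
    static final class Manifest {
        final long snapshotId;
        final List<Entry> entries;

        Manifest(long snapshotId, List<Entry> entries) {
            this.snapshotId = snapshotId;
            this.entries = entries;
        }
    }

    // Return whichever of the two snapshot ids has the later commit time.
    static long later(long a, long b, Map<Long, Long> commitTimes) {
        return commitTimes.get(b) > commitTimes.get(a) ? b : a;
    }

    // Manifest-level resolution: each partition inherits the id of the snapshot
    // that wrote the manifest it was found in (the behavior reported above).
    static Map<String, Long> byManifestSnapshot(List<Manifest> manifests,
                                                Map<Long, Long> commitTimes) {
        Map<String, Long> lastUpdated = new LinkedHashMap<>();
        for (Manifest m : manifests) {
            for (Entry e : m.entries) {
                lastUpdated.merge(e.partition, m.snapshotId,
                        (a, b) -> later(a, b, commitTimes));
            }
        }
        return lastUpdated;
    }

    // Entry-level resolution: each partition takes the id of the snapshot
    // recorded on the entry itself, which a manifest rewrite does not change.
    static Map<String, Long> byEntrySnapshot(List<Manifest> manifests,
                                             Map<Long, Long> commitTimes) {
        Map<String, Long> lastUpdated = new LinkedHashMap<>();
        for (Manifest m : manifests) {
            for (Entry e : m.entries) {
                lastUpdated.merge(e.partition, e.snapshotId,
                        (a, b) -> later(a, b, commitTimes));
            }
        }
        return lastUpdated;
    }

    // Fixture: both manifests were rewritten by snapshot 3 (S_M), but the
    // entries still carry snapshot 1 (S1) for p_a and snapshot 2 (S2) for p_b.
    static List<Manifest> rewrittenManifests() {
        return Arrays.asList(
                new Manifest(3L, Arrays.asList(new Entry(1L, "p_a"))),
                new Manifest(3L, Arrays.asList(new Entry(2L, "p_b"))));
    }

    // Fixture: made-up commit times keyed by snapshot id.
    static Map<Long, Long> commitTimes() {
        Map<Long, Long> times = new HashMap<>();
        times.put(1L, 100L);
        times.put(2L, 200L);
        times.put(3L, 300L);
        return times;
    }

    public static void main(String[] args) {
        System.out.println("manifest-level: "
                + byManifestSnapshot(rewrittenManifests(), commitTimes()));
        System.out.println("entry-level:    "
                + byEntrySnapshot(rewrittenManifests(), commitTimes()));
    }
}
```

On this fixture the manifest-level variant maps both partitions to snapshot 3, while the entry-level variant keeps snapshot 1 for p_a and snapshot 2 for p_b, which matches the expected Spark/Flink output described above. The sketch ignores delete files and entry status, which a real scanner would also have to handle.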
StarRocks version (Required)
Reproducible on current main (verified against b91aafbcc5f). Iceberg dependency: 1.10.0 (fe/pom.xml:82, java-extensions/pom.xml:38).
Fix proposed in #73307.