Commit ebce99f
feat(merge): legacy-prefix promotion path + schema-evolution body cols
Two adversarial-review follow-ups grouped because they share the
streaming engine's input-routing and union-schema seams.
## (b) Legacy-prefix promotion
A new operation type pairs `prefix_len = 0` splits with `prefix_len > 0`
peers in one merge, so legacy splits can be folded into prefix-aligned
buckets instead of aging out via retention. Adds:
- `ParquetMergeOperation::promote_legacy(splits, target_prefix_len)`: relaxes MP-3 to allow mixed
`rg_partition_prefix_len` as long as every input is `<= target_prefix_len`. The `sort_fields` and
window equality checks are unchanged.
- `ParquetMergeOperation::target_prefix_len_override: Option<u32>` field records the promotion
target; `None` is the default regular-merge form.
- `merge_parquet_split_metadata(..., mixed_prefix_ok)`: skips the input-side prefix-len equality
check in promotion mode. The output prefix_len still comes from the writer's KV stamp via
`MergeOutputFile.output_rg_partition_prefix_len` (CS-1 holds by construction post-F1).
- `merge::execute_merge_operation(op, sources, ...)`: new thin executor that opens each input as
either `LegacyInputAdapter` (when `split.rg_partition_prefix_len < target`) or
`StreamingParquetReader` (otherwise), then feeds them to the streaming engine. This is the seam
PR-7 will wire in from above.
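
A minimal sketch of the relaxed check and the per-input routing described above, using stand-in types. `SplitMeta`, `InputKind`, `validate_promotion`, and `route_input` are hypothetical names for illustration, not the crate's API. Only the `<= target` rule, the unchanged sort_fields/window equality, and the `< target` adapter choice are taken from this commit message.

```rust
/// Hypothetical stand-in for the split metadata fields the checks read.
#[derive(Debug, Clone, PartialEq)]
pub struct SplitMeta {
    pub rg_partition_prefix_len: u32,
    pub sort_fields: String,
    pub window: (i64, i64),
}

/// Which reader an input is opened with (variant names mirror the commit message).
#[derive(Debug, PartialEq)]
pub enum InputKind {
    LegacyInputAdapter,
    StreamingParquetReader,
}

/// Relaxed MP-3 (sketch): prefix lengths may differ, but none may exceed the
/// target. Sort fields and window must still match across all inputs.
pub fn validate_promotion(splits: &[SplitMeta], target_prefix_len: u32) -> Result<(), String> {
    let first = splits.first().ok_or_else(|| "no input splits".to_string())?;
    for split in splits {
        if split.rg_partition_prefix_len > target_prefix_len {
            return Err(format!(
                "input prefix_len {} exceeds target {}",
                split.rg_partition_prefix_len, target_prefix_len
            ));
        }
        if split.sort_fields != first.sort_fields {
            return Err("sort_fields mismatch".to_string());
        }
        if split.window != first.window {
            return Err("window mismatch".to_string());
        }
    }
    Ok(())
}

/// Inputs below the target go through the legacy adapter; inputs already at
/// the target are streamed directly.
pub fn route_input(split: &SplitMeta, target_prefix_len: u32) -> InputKind {
    if split.rg_partition_prefix_len < target_prefix_len {
        InputKind::LegacyInputAdapter
    } else {
        InputKind::StreamingParquetReader
    }
}
```

With `target_prefix_len = 1`, a `prefix_len = 0` split validates and routes to the legacy adapter, a `prefix_len = 1` split routes to the streaming reader, and a `prefix_len = 2` split is rejected.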
Tests:
- `test_promote_legacy_pairs_legacy_with_aligned_peer`, `test_promote_legacy_rejects_higher_prefix_input`,
`test_promote_legacy_still_enforces_sort_fields`, `test_promote_legacy_all_at_target_is_valid`.
- `test_mixed_prefix_ok_skips_input_equality_check`.
- `test_promote_legacy_executor_end_to_end`: legacy single-RG + aligned multi-RG → 3-RG output
passing `assert_unique_rg_prefix_keys` with `prefix_len = 1`, plus metastore CS-1.
- `test_executor_mismatched_sources_count_bails`.
## F6 + F13: Schema evolution for body columns
The merger now supports MC-4 across heterogeneous body-col schemas:
- F6: `normalize_type` collapses `Binary`/`LargeBinary` (and dict variants) to `Binary`, analogous
to the existing string-flavour collapse. Two inputs whose body col differs only by byte-array
flavour merge cleanly; before this they hit a "type conflict" at alignment time.
- F13: `streaming_writer.rs::write_list_via_serialized_column_writer` (renamed from
`..._non_nullable_...`) now handles nullable outer `List<T>` / `LargeList<T>`. MC-4 forces the
union to be nullable when a List col is present in only some inputs; before this the writer
rejected the merged output. The nullable-outer path uses Dremel `max_def_level = 2` (0 = outer
null, 1 = empty list, 2 = element present); the non-nullable path is unchanged.
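
The level assignment for a nullable outer list can be sketched as below. This is an illustration of the standard Dremel encoding, not the code in `streaming_writer.rs`. The function name `list_levels` is hypothetical, elements are assumed non-nullable (which is what makes the maximum definition level 2), and the repetition levels follow the usual single-level list rule (0 starts a row, 1 continues it).

```rust
/// Computes (definition levels, repetition levels, leaf values) for a column
/// of nullable lists with non-nullable f64 elements.
///
/// Definition levels: 0 = outer list is null, 1 = list is empty,
/// 2 = an element is present. Repetition levels: 0 = first entry of a row,
/// 1 = a further element in the same list.
pub fn list_levels(rows: &[Option<Vec<f64>>]) -> (Vec<i16>, Vec<i16>, Vec<f64>) {
    let mut def_levels = Vec::new();
    let mut rep_levels = Vec::new();
    let mut values = Vec::new();
    for row in rows {
        match row {
            // Null list: one level entry, no leaf value.
            None => {
                def_levels.push(0);
                rep_levels.push(0);
            }
            // Empty list: one level entry, no leaf value.
            Some(items) if items.is_empty() => {
                def_levels.push(1);
                rep_levels.push(0);
            }
            // Non-empty list: one level entry and one leaf value per element.
            Some(items) => {
                for (i, v) in items.iter().enumerate() {
                    def_levels.push(2);
                    rep_levels.push(if i == 0 { 0 } else { 1 });
                    values.push(*v);
                }
            }
        }
    }
    (def_levels, rep_levels, values)
}
```

Null and empty lists each emit a level entry but no leaf value, which is how a reader tells the two apart.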
Test: `test_mc2_mixed_schemas_round_trip` builds two inputs A and B
with the same sort schema but different body cols (Utf8 vs
Dict<Utf8>, LargeBinary vs Binary, List<Float64> in A only, Int32
A-only, Int64 B-only, common Float64). The merge produces the
union schema; per-row rendering via `render_cell` matches across
flavour boundaries; List cells from B render as nulls.
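
A rough sketch of the union-schema behaviour this test exercises, with stand-in types. `ColType` and `union_schema` are hypothetical, and this `normalize_type` is a simplified illustration rather than the crate's function. It assumes column names are unique within one input and ignores each column's original nullability.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical, reduced set of column types.
#[derive(Debug, Clone, PartialEq)]
pub enum ColType {
    Utf8,
    LargeUtf8,
    Binary,
    LargeBinary,
    Int32,
    Int64,
    Float64,
    Dict(Box<ColType>),
    List(Box<ColType>),
}

/// Collapses string flavours to Utf8 and byte-array flavours to Binary,
/// including their dictionary-encoded variants. Other types pass through.
pub fn normalize_type(t: &ColType) -> ColType {
    match t {
        ColType::Utf8 | ColType::LargeUtf8 => ColType::Utf8,
        ColType::Binary | ColType::LargeBinary => ColType::Binary,
        ColType::Dict(inner) => match normalize_type(inner) {
            flat @ (ColType::Utf8 | ColType::Binary) => flat,
            other => ColType::Dict(Box::new(other)),
        },
        other => other.clone(),
    }
}

/// Builds the union of the input schemas as name -> (type, nullable).
/// A column missing from at least one input is marked nullable; a column
/// whose normalized types disagree is a type conflict.
pub fn union_schema(
    inputs: &[Vec<(&str, ColType)>],
) -> Result<BTreeMap<String, (ColType, bool)>, String> {
    let mut seen: BTreeMap<String, (ColType, usize)> = BTreeMap::new();
    for schema in inputs {
        for (name, ty) in schema {
            let norm = normalize_type(ty);
            match seen.entry(name.to_string()) {
                Entry::Occupied(mut e) => {
                    let (existing, count) = e.get_mut();
                    if *existing != norm {
                        return Err(format!("type conflict on column {name}"));
                    }
                    *count += 1;
                }
                Entry::Vacant(e) => {
                    e.insert((norm, 1));
                }
            }
        }
    }
    Ok(seen
        .into_iter()
        .map(|(name, (ty, count))| (name, (ty, count < inputs.len())))
        .collect())
}
```

Under these assumptions, the A/B inputs above yield a union in which the Utf8 and Binary columns are shared and non-nullable, while the List, Int32, and Int64 columns are nullable because each appears in only one input.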
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 8882c0f commit ebce99f
7 files changed
Lines changed: 938 additions & 116 deletions
File tree
- quickwit
- quickwit-indexing/src/actors/parquet_pipeline
- quickwit-parquet-engine/src
- merge
- policy
- storage
Lines changed: 15 additions & 3 deletions
Lines changed: 60 additions & 20 deletions