Refactor: Name the implicit pipeline stages — detect, parse, accumulate, finalize, render #180

## Summary

Today's main flow in `ltl` already forms an implicit 5-stage pipeline, but the stages are unnamed and the boundaries between them are not enforced. Name them (**detect, parse, accumulate, finalize, render**) with explicit entry-point subroutines and clear inter-stage data contracts. This is a light-touch refactor: it adds structure without changing behavior.

This is a prerequisite for #23. With named stages in place, Phase 1 of #23 inserts the format registry into the `detect` stage, and Phase 2 adds per-bucket lifecycle hooks inside `finalize`; both become refactors between named stages rather than green-field redesigns.

## Motivation

The current implicit pipeline is in `## MAIN ##` (`ltl:7677`):

| Order | Sub | Lines | Implicit stage |
|---|---|---|---|
| 1 | `read_and_process_logs()` | `ltl:3590-4407` | detect + parse + accumulate (interleaved) |
| 2 | `initialize_empty_time_windows()` | `ltl:7692` | accumulate |
| 3 | `group_similar_messages()` | `ltl:7702` | finalize (consolidation) |
| 4 | `calculate_all_statistics()` | `ltl:4822-5070`, called `ltl:7868` | finalize |
| 5 | `calculate_heatmap_buckets()` | `ltl:4435`, called `ltl:7876` | finalize |
| 6 | `calculate_histogram_buckets()` | `ltl:4552`, called `ltl:7885` | finalize |
| 7 | `normalize_data_for_output()` | `ltl:5590`, called `ltl:7893` | render |
| 8 | `print_bar_graph()` | `ltl:6251`, called `ltl:7913` | render |
| 9 | `print_histograms()` | called `ltl:7914` | render |
| 10 | `write_index_file()` | `ltl:524`, called `ltl:7942` | render (post-output) |

The interleaving in row 1 (`read_and_process_logs()`) is what hides the boundaries: detection, parsing, and accumulation all happen per line in a single tight loop.

## Stages

### detect
Format detection. Today: inlined per line in `read_and_process_logs()` as 13 cascading regex tests (`ltl:3689-3840`). After this refactor: a named subroutine that runs at the start of each file (over its first N lines), determines the format, and caches the choice. Falls through to per-line detection in low-confidence cases (this fallback requires the buffered-read architecture from the companion issue, #181).
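A minimal sketch of detect-once-and-cache, assuming a hypothetical two-entry format table. The real cascade at `ltl:3689-3840` has 13 tests; the `iso_access` and `syslog` names, their regexes, and the 0.8 confidence threshold are placeholders. `pipeline_detect()` returns `undef` when no single format covers enough of the sample, which is where the per-line fallback would take over:

```perl
use strict;
use warnings;

# Hypothetical format table: [name, regex]. The real cascade in
# read_and_process_logs() (ltl:3689-3840) has 13 tests; these two
# patterns are placeholders for illustration only.
my @FORMATS = (
    [ iso_access => qr/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/ ],
    [ syslog     => qr/^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} / ],
);

my %format_cache;    # file name => detected format name (or undef)

# Assumed contract: (file name, ref to first N lines[, min share]) ->
# format name, or undef when confidence is too low.
sub pipeline_detect {
    my ($path, $sample, $min_share) = @_;
    $min_share //= 0.8;    # assumed confidence threshold
    return $format_cache{$path} if exists $format_cache{$path};

    my %hits;
    for my $line (@$sample) {
        for my $fmt (@FORMATS) {
            my ($name, $re) = @$fmt;
            if ($line =~ $re) { $hits{$name}++; last }    # first match wins, like the cascade
        }
    }
    my ($best) = sort { $hits{$b} <=> $hits{$a} || $a cmp $b } keys %hits;
    my $n = @$sample;
    my $choice = (defined $best && $n && $hits{$best} / $n >= $min_share)
        ? $best : undef;
    $format_cache{$path} = $choice;    # cache low-confidence results too
    return $choice;
}
```

Caching the `undef` result as well means a low-confidence file is not re-sampled on every call.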

### parse
Regex match + field extraction for a single line. Takes a line + cached format; emits a structured record (timestamp, message, duration, bytes, count, fields). Today: also inlined in `read_and_process_logs()`.
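A sketch of the parse contract, assuming one named-capture regex per format (the `iso_access` line layout below is invented for illustration). It takes a line plus the cached format and returns a record with the fields listed above, or `undef` when the line does not match:

```perl
use strict;
use warnings;

# Hypothetical per-format parsers using named captures; the format name
# and field layout are placeholders, not ltl's real ones.
my %PARSERS = (
    iso_access => qr/^(?<timestamp>\S+)\s+(?<message>.*?)\s+(?<duration>\d+)ms\s+(?<bytes>\d+)B$/,
);

# Parse contract sketch: (line, cached format) -> record hashref, or
# undef if the format is unknown or the line does not match.
sub pipeline_parse {
    my ($line, $format) = @_;
    my $re = $PARSERS{$format} or return;
    return unless $line =~ $re;
    my %cap = %+;    # copy named captures before any other match runs
    return {
        timestamp => $cap{timestamp},
        message   => $cap{message},
        duration  => $cap{duration},
        bytes     => $cap{bytes},
        count     => 1,           # assumed: one occurrence per line
        fields    => { %cap },    # raw captures kept for later stages
    };
}
```

`count` is assumed to be 1 per line here; a format that reports pre-aggregated counts would fill it from a capture instead.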

### accumulate
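Push parsed records into bucket structures: `%log_analysis`, `%log_messages`, `%heatmap_raw`, `%histogram_values`, `%log_threadpools`, `%log_sessions`, `%udm_values`. Today: also inlined, plus `initialize_empty_time_windows()` (`ltl:7692`), which `## MAIN ##` calls after `read_and_process_logs()` returns.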
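A sketch of the accumulate contract, assuming per-minute buckets keyed by a truncated timestamp and the hash shapes noted in the comments (all hypothetical; only three of the seven accumulators are shown). The point is the boundary: this stage consumes parsed records and never touches the raw line:

```perl
use strict;
use warnings;

# Three of the accumulators named in the issue. Their internal shapes
# here are assumptions for illustration, not ltl's real layout:
#   $log_analysis{$bucket}{count|bytes}, $log_messages{$msg},
#   $histogram_values{duration} = [ ... ]
our (%log_analysis, %log_messages, %histogram_values);

# Hypothetical bucket key: the ISO timestamp truncated to the minute.
sub bucket_key {
    my ($ts) = @_;
    return substr($ts, 0, 16);    # "2026-05-09T10:00:59" -> "2026-05-09T10:00"
}

# Accumulate contract sketch: takes one parsed record, mutates the
# accumulators, returns nothing.
sub pipeline_accumulate {
    my ($rec) = @_;
    my $bucket = bucket_key($rec->{timestamp});
    $log_analysis{$bucket}{count} += $rec->{count};
    $log_analysis{$bucket}{bytes} += $rec->{bytes} // 0;
    $log_messages{ $rec->{message} } += $rec->{count};
    push @{ $histogram_values{duration} }, $rec->{duration}
        if defined $rec->{duration};
    return;
}
```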

### finalize
Close-of-pipeline computations, in today's call order: `group_similar_messages()`, `calculate_all_statistics()`, `calculate_heatmap_buckets()`, `calculate_histogram_buckets()`. Today: separate subs called sequentially after `read_and_process_logs()` returns.
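Because the refactor must be behavior-preserving, `pipeline_finalize()` can start as a plain wrapper that calls the existing subs in the order the Motivation table lists them. A sketch, with stand-in stubs that only record the call order (the real subs' arguments are not described in this issue, so the stubs take none):

```perl
use strict;
use warnings;

my @calls;    # records call order so the sketch can be checked

# Stand-in stubs for the existing ltl subs.
sub group_similar_messages      { push @calls, 'group_similar_messages' }
sub calculate_all_statistics    { push @calls, 'calculate_all_statistics' }
sub calculate_heatmap_buckets   { push @calls, 'calculate_heatmap_buckets' }
sub calculate_histogram_buckets { push @calls, 'calculate_histogram_buckets' }

# Wrapper only: same subs, same order as ## MAIN ## today.
sub pipeline_finalize {
    group_similar_messages();
    calculate_all_statistics();
    calculate_heatmap_buckets();
    calculate_histogram_buckets();
    return;
}
```

This ignores whatever `## MAIN ##` does between these calls today (the call sites are spread across `ltl:7702-7885`); that code would move into the wrapper too, unchanged.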

### render
`normalize_data_for_output()`, `print_bar_graph()`, `print_histograms()`, then `write_index_file()`.
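The render stage can follow the same wrapper pattern. A sketch using an ordered list of code references, with stubs standing in for the real subs (arguments not specified here); the list keeps `write_index_file()` last as the post-output step:

```perl
use strict;
use warnings;

my @calls;    # records call order so the sketch can be checked

# Stand-in stubs for the existing ltl subs.
sub normalize_data_for_output { push @calls, 'normalize' }
sub print_bar_graph           { push @calls, 'bar_graph' }
sub print_histograms          { push @calls, 'histograms' }
sub write_index_file          { push @calls, 'index_file' }

# Render steps in today's call order; write_index_file() is post-output.
my @RENDER_STEPS = (
    \&normalize_data_for_output,
    \&print_bar_graph,
    \&print_histograms,
    \&write_index_file,
);

sub pipeline_render {
    $_->() for @RENDER_STEPS;
    return;
}
```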

## Scope

- Add named entry-point subroutines: `pipeline_detect()`, `pipeline_parse()`, `pipeline_accumulate()`, `pipeline_finalize()`, `pipeline_render()`.
- Add a top-level dispatcher in `## MAIN ##` that calls them in order.
- Define and document inter-stage data contracts: what each stage receives and emits.
- Move existing logic into the named stages with **zero behavioral change**. Do not restructure intra-stage logic.
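A sketch of the thin dispatcher the second bullet describes, with stub stages carrying the proposed contracts as comments. Assumptions: each input arrives as a `[name, filehandle]` pair, detection samples a hypothetical 20 lines, and both the low-confidence per-line fallback and `initialize_empty_time_windows()` are omitted:

```perl
use strict;
use warnings;

my @trace;    # records stage calls so the sketch can be checked

# Stub stages; the contracts in the comments are the proposal's, the
# bodies are placeholders.
sub pipeline_detect     { my ($name, $sample) = @_; push @trace, "detect:$name"; 'fmt' }  # -> format
sub pipeline_parse      { my ($line, $fmt) = @_; return { message => $line } }          # -> record|undef
sub pipeline_accumulate { my ($rec) = @_; push @trace, "acc:$rec->{message}"; return }  # mutates buckets
sub pipeline_finalize   { push @trace, 'finalize'; return }                             # buckets -> results
sub pipeline_render     { push @trace, 'render'; return }                               # results -> output

use constant SAMPLE_LINES => 20;    # hypothetical N for detection

# Thin dispatcher: what ## MAIN ## could reduce to.
sub run_pipeline {
    for my $input (@_) {
        my ($name, $fh) = @$input;

        # Buffer only the detection sample, then stream the rest.
        my @buffer;
        while (@buffer < SAMPLE_LINES and defined(my $line = <$fh>)) {
            chomp $line;
            push @buffer, $line;
        }
        my $fmt = pipeline_detect($name, \@buffer);

        my $process = sub {
            my $rec = pipeline_parse($_[0], $fmt);
            pipeline_accumulate($rec) if $rec;
        };
        $process->($_) for @buffer;    # replay the sampled lines first
        while (my $line = <$fh>) {
            chomp $line;
            $process->($line);
        }
    }
    pipeline_finalize();
    pipeline_render();
    return;
}
```

Buffering only the sample keeps the read streaming; the sampled lines are replayed through `parse` and `accumulate` before the rest of the file.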

## Out of scope

- Per-bucket open/close hooks for sliding window (belongs in #23 Phase 2).
- Replacing the chained-regex detection (belongs in #23 Phase 1).
- Reorganizing intra-stage logic (e.g., splitting `calculate_all_statistics` into smaller pieces).
- Performance optimization beyond what falls out naturally from the refactor.

## Acceptance criteria

- [ ] Five named entry-point subroutines exist with clear contracts.
- [ ] `## MAIN ##` is a thin dispatcher that calls the stages in order.
- [ ] All golden-file tests pass byte-identically (regression suite from #56).
- [ ] Benchmark suite shows no regression beyond noise (compare against current baseline).
- [ ] Stage contracts are documented (inline comments or a short addendum to `docs/staged-processing-pipeline.md`).
- [ ] Pre-existing functions (`calculate_all_statistics`, `calculate_heatmap_buckets`, etc.) become callees of the named stages.

## Why coarse not fine-grained

Finer stage boundaries (e.g., per-bucket open/close hooks for sliding window) belong inside #23 Phase 2. Landing them now would expand scope and risk regression. The 5-stage shape is enough to make Phase 1 (#58) and Phase 2 (#59) into refactors-between-named-stages.

## Dependency of

- #23 — Phase 1 #58 inserts the format registry into `detect`; Phase 2 #59 adds per-bucket lifecycle inside `finalize`. Both become refactors-between-named-stages once this lands.
- #181 — The buffered read pipeline needs a named insertion point between file I/O and `parse` (or between `parse` and `accumulate`); named stages provide that.
- #34 — Two-pass streaming for histogram/heatmap inserts a bound-discovery sub-stage before `accumulate`; named stages give it a clean home.

## Depends on

- None.

## Related

- features/log-format-registry.md (planning session 2026-05-09)
- docs/staged-processing-pipeline.md (architectural template from #96)