Summary
Today's main flow in ltl already forms an implicit 5-stage pipeline, but the stages are unnamed and the boundaries between them are not enforced. Name them — detect, parse, accumulate, finalize, render — with explicit entry-point subroutines and clear inter-stage data contracts. The light-touch refactor adds structure without changing behavior.
This is a prerequisite for Issue #23. With named stages in place, Phase 1 of #23 inserts the format registry into the detect stage, and Phase 2 adds per-bucket lifecycle hooks inside finalize — both become refactors between named stages rather than green-field redesigns.
Motivation
The current implicit pipeline is in ## MAIN ## (ltl:7677):
| Order | Sub | Lines | Implicit stage |
|---|---|---|---|
| 1 | read_and_process_logs() | ltl:3590-4407 | detect + parse + accumulate (interleaved) |
| 2 | initialize_empty_time_windows() | ltl:7692 | accumulate |
| 3 | group_similar_messages() | ltl:7702 | finalize (consolidation) |
| 4 | calculate_all_statistics() | ltl:4822-5070, called ltl:7868 | finalize |
| 5 | calculate_heatmap_buckets() | ltl:4435, called ltl:7876 | finalize |
| 6 | calculate_histogram_buckets() | ltl:4552, called ltl:7885 | finalize |
| 7 | normalize_data_for_output() | ltl:5590, called ltl:7893 | render |
| 8 | print_bar_graph() | ltl:6251, called ltl:7913 | render |
| 9 | print_histograms() | called ltl:7914 | render |
| 10 | write_index_file() | ltl:524, called ltl:7942 | render (post-output) |
The interleaving in row 1 (read_and_process_logs()) is what hides the boundaries: detection, parsing, and accumulation all happen per line in a single tight loop.
Stages
detect
Format detection. Today: inlined per-line in read_and_process_logs() as 13 cascading regex tests (ltl:3689-3840). After this refactor: a named subroutine that runs at the start of each file (or over its first N lines), determines the format, and caches the choice. In low-confidence cases it falls back to per-line detection (this fallback requires the buffered-read architecture from the companion issue).
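A minimal sketch of what a cached, sample-based detect stage might look like, assuming ltl is Perl (as the % sigils on the bucket names suggest). The format table, the %format_cache hash, the $min_ratio confidence threshold, and the argument list are all hypothetical; the two patterns stand in for the 13 existing tests and are not taken from ltl.

```perl
use strict;
use warnings;

# Hypothetical format table: names and patterns are illustrative only,
# not the tests ltl actually runs.
my @FORMATS = (
    { name => 'iso8601', re => qr/^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}/ },
    { name => 'syslog',  re => qr/^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}/ },
);

my %format_cache;    # file name => detected format name (undef = low confidence)

# Decide a file's format from a sample of its first lines and cache the result.
# Returns the format name, or undef when no format reaches the threshold;
# the caller would then fall back to per-line detection.
sub pipeline_detect {
    my ($file, $sample_lines, $min_ratio) = @_;
    return $format_cache{$file} if exists $format_cache{$file};

    my %hits;
    for my $line (@$sample_lines) {
        for my $fmt (@FORMATS) {
            if ($line =~ $fmt->{re}) {
                $hits{ $fmt->{name} }++;
                last;    # cascade: the first matching test wins
            }
        }
    }

    my $n = scalar @$sample_lines;
    my ($best) = sort { $hits{$b} <=> $hits{$a} || $a cmp $b } keys %hits;
    my $confident = defined $best && $n > 0 && $hits{$best} / $n >= $min_ratio;

    return $format_cache{$file} = $confident ? $best : undef;
}

my @sample = (
    '2024-01-02 03:04:05 INFO start',
    '2024-01-02 03:04:06 INFO ok',
    'garbage',
);
my $format = pipeline_detect('app.log', \@sample, 0.5);
print defined $format ? "format: $format\n" : "low confidence\n";
```

Caching the undef result as well means a low-confidence file is sampled only once; whether ltl should retry detection later in such a file is left open here.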
parse
Regex match + field extraction for a single line. Takes a line + cached format; emits a structured record (timestamp, message, duration, bytes, count, fields). Today: also inlined in read_and_process_logs().
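The parse contract could be sketched as follows. The record keys mirror the list above; everything else is an assumption: the %PARSERS table, the single illustrative pattern, the duration= and bytes= line syntax, and the defaults for count and fields.

```perl
use strict;
use warnings;

# Hypothetical per-format parsers, keyed by the name the detect stage cached.
# The pattern is illustrative; it is not one of ltl's real expressions.
my %PARSERS = (
    iso8601 => qr/
        ^(?<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2})
        \s+(?<message>.*?)
        (?:\s+duration=(?<duration>\d+(?:\.\d+)?))?
        (?:\s+bytes=(?<bytes>\d+))?
        \s*$
    /x,
);

# Takes one line plus the cached format; returns a record hashref,
# or nothing when the format is unknown or the line does not match.
sub pipeline_parse {
    my ($line, $format) = @_;
    my $re = defined $format ? $PARSERS{$format} : undef;
    return unless $re && $line =~ $re;

    return {
        timestamp => $+{timestamp},
        message   => $+{message},
        duration  => $+{duration},    # undef when the line carries none
        bytes     => $+{bytes},       # undef when the line carries none
        count     => 1,               # assumed default: one event per line
        fields    => {},              # placeholder for format-specific extras
    };
}

my $record = pipeline_parse(
    '2024-01-02 03:04:05 GET /index duration=12.5 bytes=2048', 'iso8601');
print "$record->{timestamp} | $record->{message} | $record->{duration}\n";
```

Returning a plain hashref keeps the stage free of side effects, so it can be tested on single lines without touching the bucket hashes.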
accumulate
Push parsed records into bucket structures: %log_analysis, %log_messages, %heatmap_raw, %histogram_values, %log_threadpools, %log_sessions, %udm_values. Today: also inlined.
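As a rough illustration of the accumulate contract, the sketch below takes one parsed record and updates three of the bucket hashes named above. The hash names come from this issue, but their internal layout, the minute-granularity bucket key, and the return value are assumptions made for the example and will differ from ltl's real structures.

```perl
use strict;
use warnings;

# Bucket names are from the issue; the layout inside each hash is assumed.
our (%log_analysis, %log_messages, %histogram_values);

# Takes one parsed record and pushes it into the bucket structures.
# Returns the bucket key it updated, or nothing for an empty record.
sub pipeline_accumulate {
    my ($record) = @_;
    return unless $record;

    # Hypothetical bucket key: the timestamp truncated to the minute.
    my $bucket = substr $record->{timestamp}, 0, 16;

    $log_analysis{$bucket}{count}                   += $record->{count};
    $log_messages{ $record->{message} }{count}      += $record->{count};
    push @{ $histogram_values{duration} }, $record->{duration}
        if defined $record->{duration};

    return $bucket;
}

pipeline_accumulate({ timestamp => '2024-01-02 03:04:05', message => 'GET /index',
                      duration => 12.5, count => 1 });
pipeline_accumulate({ timestamp => '2024-01-02 03:04:40', message => 'GET /index',
                      duration => undef, count => 1 });
print "events in bucket: $log_analysis{'2024-01-02 03:04'}{count}\n";
```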
finalize
Close-of-pipeline computations: calculate_all_statistics(), calculate_heatmap_buckets(), calculate_histogram_buckets(), group_similar_messages(). Today: separate subs called sequentially after read_and_process_logs() returns.
render
Output generation, in call order: normalize_data_for_output(), print_bar_graph(), print_histograms(), then write_index_file().
Scope
- Add named entry-point subroutines: pipeline_detect(), pipeline_parse(), pipeline_accumulate(), pipeline_finalize(), pipeline_render().
- Add a top-level dispatcher in ## MAIN ## that calls them in order.
- Define and document inter-stage data contracts: what each stage receives and emits.
- Move existing logic into the named stages with zero behavioral change. Do not restructure intra-stage logic.
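The scope items above could take roughly the following shape. This is a sketch with stub stage bodies, not the proposed implementation: run_pipeline(), the %state hashref, and the @trace array are hypothetical names used only to make the call order and the per-stage inputs and outputs visible. In ltl the stages would wrap the existing subs and would most likely keep using the existing global hashes.

```perl
use strict;
use warnings;

my @trace;    # records stage order, for illustration only

# Stub stages. Each stands in for logic that would be moved, unchanged,
# out of read_and_process_logs() and the existing finalize and render subs.
sub pipeline_detect     { my ($lines) = @_; push @trace, 'detect'; return 'plain' }
sub pipeline_parse      { my ($line, $format) = @_; return { message => $line, format => $format } }
sub pipeline_accumulate { my ($state, $record) = @_; push @{ $state->{records} }, $record; return }
sub pipeline_finalize   { my ($state) = @_; push @trace, 'finalize'; $state->{total} = @{ $state->{records} || [] }; return }
sub pipeline_render     { my ($state) = @_; push @trace, 'render'; return "lines: $state->{total}" }

# Thin dispatcher: calls the stages in order; the comments state each contract.
sub run_pipeline {
    my ($lines) = @_;
    my %state;

    my $format = pipeline_detect($lines);               # lines -> format
    for my $line (@$lines) {
        my $record = pipeline_parse($line, $format)     # line + format -> record
            or next;
        pipeline_accumulate(\%state, $record);          # record -> bucket state
    }
    pipeline_finalize(\%state);                         # bucket state -> statistics
    return pipeline_render(\%state);                    # statistics -> output
}

print run_pipeline([ 'first line', 'second line', 'third line' ]), "\n";
print join(' > ', @trace), "\n";
```

Keeping parse and accumulate inside one loop preserves the current single pass over the input, which matters for the zero-behavioral-change requirement.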
Out of scope
- […] calculate_all_statistics into smaller pieces).
Acceptance criteria
- ## MAIN ## is a thin dispatcher that calls the stages in order.
- […] docs/staged-processing-pipeline.md).
- […] calculate_all_statistics, calculate_heatmap_buckets, etc.) become callees of the named stages.
Why coarse not fine-grained
Finer stage boundaries (e.g., per-bucket open/close hooks for sliding window) belong inside #23 Phase 2. Landing them now would expand scope and risk regression. The 5-stage shape is enough to make Phase 1 (#58) and Phase 2 (#59) into refactors-between-named-stages.
Dependency of
- […] detect; Phase 2 (Issue #23 Phase 2: Sliding window deferred-per-bucket processing, #59) adds per-bucket lifecycle inside finalize. Both become refactors-between-named-stages once this lands.
- […] parse (or between parse and accumulate); named stages provide that.
- […] accumulate; named stages give it a clean home.
Depends on
Related