Skip to content

decision tree codegen and projection abstraction#1174

Merged
sbillig merged 27 commits intoargotorg:masterfrom
micahscopes:decision-tree-codegen
Dec 18, 2025
Merged

decision tree codegen and projection abstraction#1174
sbillig merged 27 commits intoargotorg:masterfrom
micahscopes:decision-tree-codegen

Conversation

@micahscopes
Copy link
Collaborator

@micahscopes micahscopes commented Dec 5, 2025

This PR introduces a projection abstraction for navigating into data structures. Field access and pattern matching now work via semantic paths rather than scattered offset calculations.

What changed

The projection system flows through the whole pipeline:

  • HIR has a generic Projection<Ty, Var, Idx> vocabulary used by the decision tree
  • MIR uses Place { base, projection } for both field access and match lowering
  • Codegen walks projections through one centralized pathway (lower_place_address) to emit pointer arithmetic

This replaces the old approach of calling GetField/GetVariantField helpers everywhere, which made it hard to reason about layout correctness.

Changes / To-do

  • Generic projection module (hir/src/projection.rs) with Projection<Ty, Var, Idx>
  • HIR uses Idx = Infallible so pattern matching can't construct dynamic indexing
  • Decision tree --> nested MIR switches
  • MIR Place { base, projection, address_space } with PlaceLoad/PlaceRef
  • Field access uses Place abstraction
  • Single codegen choke point (lower_place_address)
  • Layout centralized in HIR with snapshot tests
  • Shared match temp (one let tmp := 0 for nested switches)
  • Direct memory loads with masking/signextend
  • Nested enum bindings work end-to-end (match_nested_enum.fe)
  • Fix off-by-one in field_offset_bytes / variant_field_offset_bytes
  • Make "unknown size --> 32 bytes" explicit in layout API instead of unwrap_or(32) in codegen
  • MIR should route default --> wildcard leaf explicitly (codegen shouldn't rediscover it)
  • Add fixtures for wildcard + nested defaults

Future directions

Near-term

  • storage_size_bytes / field_storage_size_bytes in layout API
    • make the "unknown types are 32-byte words" rule explicit rather than scattered unwrap_or(32)

Medium-term

  • Index/Deref projection variants (for arrays, references)
  • may_alias() on projection paths, needed for SROA, escape analysis
  • Span tracking for editability (i.e. IDE refactoring support)
  • Editable/EditablePath traits (stubbed out, not yet implemented)
  • HIR semantic API: FieldAccessView::as_projection()
    • unify field access tracking with projection vocabulary
    • investigate consolidating with broader HIR semantic traversal API
    • projection abstraction could be super useful for e.g. the doc engine / language server

Long-term

  • On-demand graph view of MIR for optimization passes (e.g. SROA, escape analysis)
  • Target-agnostic MIR
    • remove AddressSpaceKind from Place, add explicit Alloca
    • let codegen decide memory vs storage based on target
  • Producing sea of nodes and other similar graph representations i.e. projections as graph edges rather than address-computation instructions

@micahscopes micahscopes force-pushed the decision-tree-codegen branch 3 times, most recently from cf916d4 to 6a4d0b6 Compare December 6, 2025 21:30
- Apply LoadableScalar conversions (masking, sign-ext) for PlaceLoad
- Add defensive fallbacks for empty/missing field type info
- Fix merge_block return consistency for terminating arms
- Remove debug prints, add proper error handling for missing body
- Include variant index in hash_place for dedup safety
- Convert defensive fallbacks in lower_place_address to hard YulError
  failures for any field_types query returning empty or out-of-bounds
- Error on target field access when field doesn't exist in type info
- Include enum_ty in hash_place via pretty_print to preserve type
  system semantics during MIR function deduplication
- Improve comment for Absent patterns explaining the design decision
@micahscopes micahscopes force-pushed the decision-tree-codegen branch from 6a4d0b6 to 0b712b4 Compare December 15, 2025 18:54
- Add generic Projection/ProjectionPath types with Infallible for HIR index slots
- Refactor match lowering to use Place with projections (defer offsets to codegen)
- Extract shared layout module eliminating duplicate ty_size_bytes implementations
- Inline field loads in codegen, removing helper function indirection
@micahscopes micahscopes changed the title decision tree codegen decision tree codegen and projection abstraction Dec 15, 2025
@micahscopes
Copy link
Collaborator Author

@sbillig I just realized you also have a projection abstraction in #1183 😅

I'm working to make this one dovetail with yours

@micahscopes micahscopes force-pushed the decision-tree-codegen branch from c2fe225 to b6f6763 Compare December 16, 2025 05:38
@sbillig
Copy link
Collaborator

sbillig commented Dec 16, 2025

@micahscopes What's the status on this? Reviewable? I'm wrapping up the contract lowering, and would like to rebase on this if you don't anticipate big changes.

@micahscopes micahscopes marked this pull request as ready for review December 17, 2025 00:09
@micahscopes
Copy link
Collaborator Author

@sbillig no big changes planned on my end

@sbillig
Copy link
Collaborator

sbillig commented Dec 17, 2025

Code review

Found 1 issue:

  1. Missing insert_place_expr call in default arm binding emission. The regular switch cases (lines 345-350) cache the place expression with insert_place_expr, but the default arm (lines 402-406) does not include this caching. This inconsistency means bindings in default arms won't have their place expressions cached, which could break future reference-semantics codegen for default arm bindings.

// Emit decision tree bindings (handles tuple/struct/enum patterns uniformly).
for binding in &arm.decision_tree_bindings {
let load_expr = self.lower_value(binding.value, &default_state)?;
let temp_name = default_state.alloc_local();
arm_docs.push(YulDoc::line(format!("let {temp_name} := {load_expr}")));
default_state.insert_binding(binding.name.clone(), temp_name);
}

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@sbillig
Copy link
Collaborator

sbillig commented Dec 17, 2025

More input from Claude:

When computing the byte offset of a field in a struct/tuple, there are two approaches in the codebase:

  1. Centralized API (field_offset_bytes_or_word_aligned): When ANY preceding field has an unknown size (like an enum), it falls back to word-aligned layout: field_idx * 32 bytes. This ensures all fields start at word boundaries.
  2. Manual calculation in lower_place_address(): Sums individual field sizes using ty_size_bytes_or_word(), which returns 32 bytes per unknown field.

Why this matters:

Consider a struct { a: MyEnum, b: u8 } where MyEnum has unknown size:

Approach Field a offset Field b offset
Centralized API 0 32 (1 * 32)
Manual calculation 0 32 (sum of a's 32)

These happen to match here. But for { a: u8, b: MyEnum, c: u16 }:

Approach Field c offset
Centralized API 64 (2 * 32, because field b is unknown)
Manual calculation 1 + 32 = 33 (sum of actual sizes with 32-byte fallback per unknown)

@micahscopes
Copy link
Collaborator Author

@sbillig those are good catches, thank you

Both should be fixed now. I also found one more area of duplicated logic in this API stub I'd added and tightened it up. I also added comments about the stubs, feel free to keep / delete them in your followup stuff.

@micahscopes micahscopes mentioned this pull request Dec 17, 2025
Copy link
Collaborator

@sbillig sbillig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@sbillig sbillig merged commit a23b484 into argotorg:master Dec 18, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants