# GitHub Copilot Instructions for LODA Python Project

## Project Overview

This is a Python implementation of LODA, an assembly language designed for integer sequences. The project supports reading, writing, and evaluating LODA programs, as well as generating new ones with machine learning techniques to discover programs for integer sequences.

## Core Concepts

### LODA Assembly Language
- **Memory Model**: Integer memory cells addressed by index; cell `$0` holds the input before execution and the output afterwards
- **Operand Types**:
  - Constants: `5`, `-3`
  - Direct memory: `$1`, `$2` (value of the cell at that index)
  - Indirect memory: `$$1` (value of the cell whose index is stored in `$1`)
- **Operations** include: `mov`, `add`, `sub`, `mul`, `div`, `dif`, `mod`, `pow`, `gcd`, `bin`, `cmp`, `min`, `max`, `lpb`, `lpe`
- **Loops**: `lpb $n` starts a loop and `lpe` ends it; the body repeats only while the counter cell `$n` strictly decreases and stays non-negative, so every loop terminates (though possibly only after many steps)

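To make the memory model, operand types, and loop rule concrete, here is a minimal toy interpreter for a small subset of LODA (`mov`, `add`, `sub`, `mul`, `lpb`, `lpe`). It is an illustrative sketch, not the project's `Interpreter`. It assumes that unset cells read as 0, supports only the single-argument form `lpb $n`, and assumes that when a loop's counter fails to decrease (or goes negative), memory is rolled back to its state at the start of that final iteration.

```python
from collections import defaultdict


def resolve_cell(operand: str, mem: dict) -> int:
    """Return the memory index a $n or $$n operand refers to."""
    if operand.startswith("$$"):
        return mem[int(operand[2:])]  # indirect: the index is stored in cell n
    if operand.startswith("$"):
        return int(operand[1:])  # direct: cell n itself
    raise ValueError(f"not a memory operand: {operand}")


def read(operand: str, mem: dict) -> int:
    """Return the value of a constant, direct, or indirect operand."""
    if operand.startswith("$"):
        return mem[resolve_cell(operand, mem)]
    return int(operand)  # constant


def run(source: str, n: int, max_steps: int = 10_000) -> int:
    """Toy evaluator: store n in $0, run the program, return $0."""
    ops = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and blank lines
        if line:
            name, _, args = line.partition(" ")
            ops.append((name, [a.strip() for a in args.split(",") if a.strip()]))

    mem = defaultdict(int)  # assumption: unset cells read as 0
    mem[0] = n
    loops = []  # stack of (pc of lpb, counter cell, memory snapshot)
    pc = steps = 0
    while pc < len(ops):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded")
        name, args = ops[pc]
        if name == "lpb":
            loops.append((pc, resolve_cell(args[0], mem), dict(mem)))
        elif name == "lpe":
            start, cell, saved = loops[-1]
            # Assumed loop rule: continue only if the counter strictly
            # decreased and is still non-negative.
            if 0 <= mem[cell] < saved.get(cell, 0):
                loops[-1] = (start, cell, dict(mem))  # snapshot, then repeat
                pc = start + 1
                continue
            mem = defaultdict(int, saved)  # roll back the final iteration
            loops.pop()
        else:
            target = resolve_cell(args[0], mem)
            value = read(args[1], mem)
            if name == "mov":
                mem[target] = value
            elif name == "add":
                mem[target] += value
            elif name == "sub":
                mem[target] -= value
            elif name == "mul":
                mem[target] *= value
            else:
                raise NotImplementedError(name)
        pc += 1
    return mem[0]


# Example: $1 doubles while $0 counts down, so the result is 2^n
POWERS_OF_TWO = """
mov $1,1
lpb $0
  sub $0,1
  mul $1,2
lpe
mov $0,$1
"""
print([run(POWERS_OF_TWO, n) for n in range(6)])  # [1, 2, 4, 8, 16, 32]
```

In the example, the iteration that would drive `$0` to `-1` is discarded, which is why the result is exactly `2^n` rather than `2^(n+1)`.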
### Token Encoding for ML
Each operation becomes 3 tokens: `[operation_type, target_operand, source_operand]`.
Example: `mov $1,5` → `["mov", "$1", "5"]`

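A minimal sketch of this three-token encoding for a single line of LODA text, written independently of the project's `program_to_tokens`. How operations with fewer than two operands (such as `lpe`) are encoded is not specified above, so this sketch pads them with a hypothetical `"<none>"` placeholder token; that padding is an assumption, not the project's format.

```python
PAD = "<none>"  # hypothetical placeholder for a missing operand


def line_to_tokens(line: str) -> list[str]:
    """Encode one LODA instruction as [operation_type, target, source]."""
    code = line.split(";")[0].strip()  # ignore trailing comments
    name, _, args = code.partition(" ")
    operands = [a.strip() for a in args.split(",") if a.strip()]
    if len(operands) > 2:
        raise ValueError(f"too many operands: {line!r}")
    operands += [PAD] * (2 - len(operands))  # always emit exactly 3 tokens
    return [name, *operands]


def tokens_to_line(tokens: list[str]) -> str:
    """Inverse of line_to_tokens for a single 3-token group."""
    name, target, source = tokens
    operands = [t for t in (target, source) if t != PAD]
    return f"{name} {','.join(operands)}".strip()


print(line_to_tokens("mov $1,5"))          # ['mov', '$1', '5']
print(line_to_tokens("lpe"))               # ['lpe', '<none>', '<none>']
print(tokens_to_line(["mov", "$1", "5"]))  # mov $1,5
```

Fixing the group size at three keeps token positions aligned with instruction boundaries, which makes it easy to split a generated token stream back into instructions.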
## Source Code Structure

### Core Language (`loda/lang/`)
- **`operand.py`**: `Operand` class with the types CONSTANT, DIRECT, and INDIRECT
- **`operation.py`**: `Operation` class representing a single assembly instruction
- **`program.py`**: `Program` class holding a list of operations; also handles parsing

### Runtime System (`loda/runtime/`)
- **`interpreter.py`**: `Interpreter` executes programs with memory management and resource limits
- **`evaluator.py`**: `Evaluator` is a high-level interface for computing the terms of an integer sequence
- **`operations.py`**: Implementations of all arithmetic operations

### OEIS Integration (`loda/oeis/`)
- **`sequence.py`**: `Sequence` class with OEIS metadata and b-file loading
- **`program_cache.py`**: `ProgramCache` loads programs from the filesystem and caches them
- **`prefix_index.py`**: `PrefixIndex` enables matching sequences by prefixes of their terms

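Prefix matching can be pictured with a trie keyed on successive terms. The sketch below is a hypothetical stand-alone class (`TriePrefixIndex`), not the project's `PrefixIndex`, whose internals are not described here; it only illustrates the idea of looking up sequences by their leading terms.

```python
class TriePrefixIndex:
    """Hypothetical trie mapping term prefixes to sequence IDs."""

    def __init__(self) -> None:
        self.children: dict[int, "TriePrefixIndex"] = {}
        self.ids: set[str] = set()

    def add(self, seq_id: str, terms: list[int]) -> None:
        """Index a sequence under every prefix of its known terms."""
        node = self
        for term in terms:
            node = node.children.setdefault(term, TriePrefixIndex())
            node.ids.add(seq_id)  # every node on the path records the ID

    def match(self, prefix: list[int]) -> set[str]:
        """Return IDs of all sequences whose stored terms start with `prefix`."""
        node = self
        for term in prefix:
            node = node.children.get(term)
            if node is None:
                return set()
        return set(node.ids)


index = TriePrefixIndex()
index.add("A000079", [1, 2, 4, 8, 16, 32])  # powers of 2
index.add("A000027", [1, 2, 3, 4, 5, 6])    # positive integers
index.add("A000045", [0, 1, 1, 2, 3, 5])    # Fibonacci numbers
print(sorted(index.match([1, 2])))     # ['A000027', 'A000079']
print(sorted(index.match([1, 2, 4])))  # ['A000079']
```

Storing the ID set at each node makes a lookup cost proportional to the prefix length, at the price of extra memory per node.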
### Machine Learning (`loda/ml/`)
- **`util.py`**: Token conversion utilities (program ↔ tokens, merging)
- **`keras/program_generation_rnn.py`**: RNN model for program generation using TensorFlow

### Mining (`loda/mine/`)
- **`miner.py`**: `Miner` searches for programs matching OEIS sequences

## Coding Guidelines

### When working with Programs:
```python
from loda.lang.program import Program
from loda.runtime.interpreter import Interpreter

# Always handle parsing errors
try:
    program = Program.parse(program_text)
except Exception as e:
    # Re-raise with context instead of continuing with an undefined `program`
    raise ValueError(f"Failed to parse LODA program: {e}") from e

# Use resource limits for evaluation
interpreter = Interpreter(max_memory=1000, max_stack=10, max_steps=100000)
```

### When working with Operands:
```python
# Check operand types before operations
if operand.type == OperandType.CONSTANT:
    value = operand.value  # the constant itself, e.g. 5
elif operand.type == OperandType.DIRECT:
    value = memory[operand.value]  # $n: contents of cell n
elif operand.type == OperandType.INDIRECT:
    value = memory[memory[operand.value]]  # $$n: cell whose index is stored in cell n
```

### When working with ML tokens:
```python
# Convert programs to tokens for ML
from loda.ml.util import program_to_tokens, tokens_to_program

tokens = program_to_tokens(program)
reconstructed = tokens_to_program(tokens)
```

### When working with sequences:
```python
from loda.runtime.evaluator import Evaluator

# Always specify the term count and handle evaluation errors
num_terms = 10
evaluator = Evaluator(program, interpreter)
try:
    terms = [evaluator(i) for i in range(num_terms)]
except Exception as e:
    # Evaluation can fail, e.g. when a step or memory limit is exceeded
    raise RuntimeError(f"Evaluation failed: {e}") from e
```

## Common Patterns

### Program Evaluation Pattern:
```python
from loda.lang.program import Program
from loda.runtime.evaluator import Evaluator
from loda.runtime.interpreter import Interpreter

program = Program.parse(program_text)
interpreter = Interpreter()
evaluator = Evaluator(program, interpreter)
sequence_terms = []
for i in range(10):
    try:
        term = evaluator(i)
    except Exception:
        break  # stop at the first term that cannot be computed
    sequence_terms.append(term)
```

### Caching Pattern:
```python
from loda.oeis.program_cache import ProgramCache

# Use caches for performance
program_cache = ProgramCache("path/to/programs")
program = program_cache.get_program(sequence_id)  # sequence_id: OEIS sequence number
```

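The idea behind caching can be sketched without the project's `ProgramCache`: read each program file at most once and serve repeat requests from memory. This hypothetical `make_program_loader` assumes a flat directory of `A######.asm` files, which may not match the layout `ProgramCache` actually uses.

```python
from functools import lru_cache
from pathlib import Path


def make_program_loader(program_dir: str, max_entries: int = 1024):
    """Return a caching loader for program text by OEIS number.

    Hypothetical helper: assumes a flat directory of A######.asm files.
    """
    base = Path(program_dir)

    @lru_cache(maxsize=max_entries)
    def load(seq_id: int) -> str:
        path = base / f"A{seq_id:06d}.asm"
        return path.read_text()  # raises FileNotFoundError if missing

    return load


# Usage (path is a placeholder):
# load = make_program_loader("path/to/programs")
# text = load(45)            # reads A000045.asm from disk
# text = load(45)            # served from the cache
# print(load.cache_info())   # hits/misses statistics from functools
```

Because `lru_cache` does not store results for calls that raise, a missing file is looked up again on the next request rather than being cached as a failure.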
### Token Conversion Pattern:
```python
from loda.ml.util import program_to_tokens, tokens_to_program

# ML workflow
tokens = program_to_tokens(program)
# Process with an ML model (`model.generate` stands in for your model's API)
new_tokens = model.generate(tokens)
new_program = tokens_to_program(new_tokens)
```

## Testing Conventions

- Use CSV files in `tests/operations/` for operation test cases
- Sample programs go in `tests/programs/`
- Unit tests follow the `test_*.py` naming convention
- Test both valid and invalid inputs for robustness

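The column layout of the CSV files in `tests/operations/` is not described here. Assuming each row holds two inputs and the expected result under a hypothetical `a,b,result` header, a table-driven check could look like the sketch below; `check_csv_cases` and the sample table are illustrative only.

```python
import csv
import io
import math
from typing import Callable


def check_csv_cases(csv_text: str, op: Callable[[int, int], int]) -> list[dict]:
    """Run `op` on every CSV row and return the rows whose result differs.

    Assumes hypothetical column names a, b, result.
    """
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        a, b, expected = int(row["a"]), int(row["b"]), int(row["result"])
        actual = op(a, b)
        if actual != expected:
            failures.append({**row, "actual": actual})
    return failures


# Hypothetical gcd table covering negative and zero inputs
GCD_CASES = """a,b,result
12,18,6
-4,6,2
0,0,0
7,0,7
"""

print(check_csv_cases(GCD_CASES, math.gcd))  # []
```

Inside a `test_*.py` file, a test function can simply assert that the returned list is empty; any failure then reports the offending row together with the actual value.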
## Resource Management

Always set appropriate limits:
- `max_memory`: Prevent excessive memory usage
- `max_steps`: Prevent runaway execution of long-running loops
- `max_stack`: Limit the nesting depth of loops to prevent stack overflow
- Handle `MemoryError`, `RuntimeError`, and `TimeoutError`

## File Naming and Organization

- Programs: `A######.asm` format (OEIS sequence numbers)
- B-files: `b######.txt` format for sequence terms
- Models: Use descriptive names that include key hyperparameters
- Use relative paths from the project root

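Since OEIS A-numbers are six digits with zero padding, both file names can be derived from an integer ID. The sketch below also parses b-file text, where each non-comment line holds an index and a term separated by whitespace and `#` starts a comment line. The helper names are hypothetical, and no particular directory layout is assumed.

```python
def program_filename(seq_id: int) -> str:
    """OEIS number -> program file name, e.g. 45 -> 'A000045.asm'."""
    return f"A{seq_id:06d}.asm"


def bfile_filename(seq_id: int) -> str:
    """OEIS number -> b-file name, e.g. 45 -> 'b000045.txt'."""
    return f"b{seq_id:06d}.txt"


def parse_bfile(text: str) -> list[int]:
    """Return the terms of a b-file, checking that indices are consecutive."""
    terms: list[int] = []
    expected_index = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        index_str, value_str = line.split()[:2]
        index = int(index_str)
        if expected_index is not None and index != expected_index:
            raise ValueError(f"non-consecutive index {index}")
        expected_index = index + 1
        terms.append(int(value_str))
    return terms


print(program_filename(45), bfile_filename(45))  # A000045.asm b000045.txt
print(parse_bfile("# Fibonacci\n0 0\n1 1\n2 1\n3 2\n"))  # [0, 1, 1, 2]
```

The first index is taken as given, so sequences with offset 0 or 1 parse the same way.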
## Integration Points

- OEIS database integration via sequence IDs
- TensorFlow/Keras for neural networks
- File system caching for performance
- CSV parsing for test data

Remember: LODA programs are deterministic and should produce consistent integer sequences. Always validate generated programs before use.
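Validating a generated program usually means evaluating it on the first few inputs and comparing the results with known terms. The sketch assumes the program is wrapped in a callable that returns the term for a given index (a hypothetical interface; the `Evaluator` pattern above could serve as such a wrapper), and treats any evaluation error as a failed validation.

```python
from typing import Callable, Sequence


def matches_known_terms(
    evaluate: Callable[[int], int],
    known_terms: Sequence[int],
    offset: int = 0,
) -> bool:
    """Return True if evaluate(offset + i) == known_terms[i] for every i.

    Any exception during evaluation (e.g. an exceeded resource limit)
    counts as a failed validation.
    """
    try:
        return all(
            evaluate(offset + i) == term for i, term in enumerate(known_terms)
        )
    except Exception:
        return False


# Squares (A000290) starting at offset 0
print(matches_known_terms(lambda n: n * n, [0, 1, 4, 9, 16]))   # True
print(matches_known_terms(lambda n: 2 ** n, [0, 1, 4, 9, 16]))  # False
```

Because `all` stops at the first mismatch, a wrong program is rejected without evaluating the remaining terms.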