
Commit b8e25c0

add copilot instructions
1 parent 59f8153 commit b8e25c0

1 file changed: +151 −0 lines changed

.github/copilot-instructions.md

# GitHub Copilot Instructions for LODA Python Project

## Project Overview

This is a Python implementation of LODA, an assembly language designed for integer sequences. The project enables reading, writing, evaluating, and generating LODA programs, using machine learning techniques to discover programs for new integer sequences.

## Core Concepts
### LODA Assembly Language

- **Memory Model**: Integer memory cells accessed by index; cell 0 contains the input and the output
- **Operand Types**:
  - Constants: `5`, `-3`
  - Direct memory: `$1`, `$2` (value at a memory location)
  - Indirect memory: `$$1` (value at the location pointed to by `$1`)
- **Operations**: `mov`, `add`, `sub`, `mul`, `div`, `dif`, `mod`, `pow`, `gcd`, `bin`, `cmp`, `min`, `max`, `lpb`, `lpe`
- **Loops**: `lpb $n` starts a loop, `lpe` ends it (counter-based termination); see the example after this list
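
A minimal sketch of how these pieces fit together, assuming the usual counted-loop idiom (the loop body repeats while the counter cell keeps decreasing and stays non-negative); the program computes 2^n for an input n in cell 0:

```
; Sketch: compute 2^n ($0 holds n on entry and the result on exit)
mov $1,1      ; accumulator = 1
lpb $0        ; begin loop, using $0 as the decreasing counter
  mul $1,2    ; double the accumulator
  sub $0,1    ; decrement the counter
lpe           ; end loop
mov $0,$1     ; write the result back to cell 0
```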
### Token Encoding for ML

Each operation becomes 3 tokens: `[operation_type, target_operand, source_operand]`.
Example: `mov $1,5` → `["mov", "$1", "5"]`
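
For illustration, a two-operation program would be encoded as a flat token list. This is only a sketch of the scheme described above; the exact layout produced by `loda.ml.util` may differ:

```python
# Illustrative assumption: each operation contributes its three tokens in order,
# so a whole program becomes one flat list of strings.
# Program:
#   mov $1,5
#   add $1,$0
tokens = [
    "mov", "$1", "5",   # first operation
    "add", "$1", "$0",  # second operation
]
```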
## Source Code Structure

### Core Language (`loda/lang/`)

- **`operand.py`**: `Operand` class with types CONSTANT, DIRECT, INDIRECT
- **`operation.py`**: `Operation` class representing a single assembly instruction
- **`program.py`**: `Program` class containing the list of operations; handles parsing

### Runtime System (`loda/runtime/`)

- **`interpreter.py`**: `Interpreter` executes programs with memory management and resource limits
- **`evaluator.py`**: `Evaluator` provides a high-level interface for generating integer sequences
- **`operations.py`**: Implementation of all arithmetic operations

### OEIS Integration (`loda/oeis/`)

- **`sequence.py`**: `Sequence` class with OEIS metadata and b-file loading
- **`program_cache.py`**: `ProgramCache` manages filesystem loading/caching of programs
- **`prefix_index.py`**: `PrefixIndex` enables sequence matching by prefix patterns

### Machine Learning (`loda/ml/`)

- **`util.py`**: Token conversion utilities (program ↔ tokens, merging)
- **`keras/program_generation_rnn.py`**: RNN model for program generation using TensorFlow

### Mining (`loda/mine/`)

- **`miner.py`**: `Miner` searches for programs matching OEIS sequences
## Coding Guidelines

### When working with Programs:

```python
# Import paths assumed from the module layout described above.
from loda.lang.program import Program
from loda.runtime.interpreter import Interpreter

# Always handle parsing errors
try:
    program = Program.parse(program_text)
except Exception as e:
    program = None  # handle the parse error (log it, skip the input, etc.)

# Use resource limits for evaluation
interpreter = Interpreter(max_memory=1000, max_stack=10, max_steps=100000)
```
### When working with Operands:

```python
# Check operand types before operations
if operand.type == OperandType.CONSTANT:
    value = operand.value
elif operand.type == OperandType.DIRECT:
    value = memory[operand.value]
elif operand.type == OperandType.INDIRECT:
    value = memory[memory[operand.value]]
```
### When working with ML tokens:

```python
# Convert programs to tokens for ML
from loda.ml.util import program_to_tokens, tokens_to_program

tokens = program_to_tokens(program)
reconstructed = tokens_to_program(tokens)
```
### When working with sequences:

```python
# Always specify the term count and handle evaluation errors
evaluator = Evaluator(program, interpreter)
try:
    terms = [evaluator(i) for i in range(num_terms)]
except Exception:
    terms = []  # handle the evaluation error (infinite loop, overflow, etc.)
```
## Common Patterns

### Program Evaluation Pattern:

```python
program = Program.parse(program_text)
interpreter = Interpreter()
evaluator = Evaluator(program, interpreter)
sequence_terms = []
for i in range(10):
    try:
        term = evaluator(i)
        sequence_terms.append(term)
    except Exception:
        break
```

### Caching Pattern:

```python
# Use caches for performance
program_cache = ProgramCache("path/to/programs")
program = program_cache.get_program(sequence_id)
```

### Token Conversion Pattern:

```python
# ML workflow
tokens = program_to_tokens(program)
# Process with the ML model
new_tokens = model.generate(tokens)
new_program = tokens_to_program(new_tokens)
```
## Testing Conventions

- Use CSV files in `tests/operations/` for operation test cases
- Sample programs go in `tests/programs/`
- Unit tests follow the `test_*.py` naming convention
- Test both valid and invalid inputs for robustness; a sketch of a CSV-driven test follows this list
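
A hedged sketch of a CSV-driven operation test in this style. The file name `add.csv`, the column names (`a`, `b`, `expected`), and the `operations.add` call are assumptions for illustration, not the project's actual schema or API:

```python
import csv
import unittest

# Assumed import path, based on the loda/runtime/operations.py layout above.
from loda.runtime import operations


class AddOperationTest(unittest.TestCase):
    def test_add_cases(self):
        # Each CSV row is assumed to hold two inputs and the expected result.
        with open("tests/operations/add.csv", newline="") as f:
            for row in csv.DictReader(f):
                a, b, expected = int(row["a"]), int(row["b"]), int(row["expected"])
                self.assertEqual(operations.add(a, b), expected)
```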
## Resource Management

Always set appropriate limits (see the sketch after this list):

- `max_memory`: Prevent excessive memory usage
- `max_steps`: Prevent infinite loops
- `max_stack`: Prevent stack overflow in nested loops
- Handle `MemoryError`, `RuntimeError`, and `TimeoutError`
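
A minimal sketch combining these limits with the error handling above, given an already parsed `program`; the keyword arguments reuse the `Interpreter` parameters shown in the Coding Guidelines and the exception types listed in this section:

```python
# Sketch: bounded evaluation of the first 20 terms of a program.
interpreter = Interpreter(max_memory=1000, max_stack=10, max_steps=100000)
evaluator = Evaluator(program, interpreter)
try:
    terms = [evaluator(i) for i in range(20)]
except (MemoryError, RuntimeError, TimeoutError):
    terms = []  # the program exceeded one of the configured limits
```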
## File Naming and Organization

- Programs: `A######.asm` format (OEIS sequence numbers)
- B-files: `b######.txt` format for sequence terms
- Models: Use descriptive names with hyperparameters
- Use relative paths from the project root; a small naming sketch follows this list
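
For example, both file names can be built from a numeric OEIS sequence ID (a small sketch; the helper name is hypothetical, and only the file names are constructed since the directory layout is not specified here):

```python
# Hypothetical helper illustrating the naming conventions above.
def sequence_file_names(sequence_number: int) -> tuple[str, str]:
    program_name = f"A{sequence_number:06d}.asm"  # e.g. A000045.asm
    bfile_name = f"b{sequence_number:06d}.txt"    # e.g. b000045.txt
    return program_name, bfile_name
```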
## Integration Points

- OEIS database integration via sequence IDs
- TensorFlow/Keras for neural networks
- File system caching for performance
- CSV parsing for test data
Remember: LODA programs are deterministic and should produce consistent integer sequences. Always validate generated programs before use.
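
A hedged sketch of such a validation step, reusing only the `Evaluator` and `Interpreter` interfaces shown above: re-evaluate the generated program and compare its first terms with the expected sequence.

```python
# Sketch: accept a generated program only if it reproduces the expected terms.
def reproduces_terms(program, expected_terms) -> bool:
    evaluator = Evaluator(program, Interpreter())
    try:
        return all(evaluator(i) == term for i, term in enumerate(expected_terms))
    except Exception:
        return False  # evaluation failed, so the program is rejected
```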
