A compact, educational codebase for building and training a Transformer language model end‑to‑end:
from byte-pair encoding (BPE) tokenization, through a minimal Transformer implementation
with RoPE and RMSNorm, to training, evaluation, and text generation. The repo is organized
for readability and testability, with lightweight dependencies managed via uv.
- Tokenizer (BPE): Train a vocabulary and merges, tokenize raw text, and compute bytes-per-token.
- Model: Attention, RoPE, RMSNorm, FFN, embeddings, and a Transformer LM composed from simple modules (a minimal RMSNorm sketch follows this list).
- Training: Scripts for training, checkpointing, logging, and generation.
- Optimizers: SGD, AdamW, schedulers, and gradient clipping.
- Utilities: Reproducible env via uv, logging config, and checkpoint serialization.
- Tests: Unit tests and reference snapshots to validate each component.
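For a sense of the module style, here is a minimal RMSNorm sketch. It assumes a PyTorch-based implementation; the actual class and argument names in nn/modules/rmsnorm.py may differ.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales by the RMS of the features, with no mean-centering."""

    def __init__(self, d_model: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(d_model))  # learnable per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., d_model); normalize over the last dimension
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```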
Quick start:

```bash
# 1) Ensure uv is installed (https://github.com/astral-sh/uv)
uv --version

# 2) Run inside the managed environment
uv run python -V

# 3) Run tests
uv run pytest
```

The environment is fully managed by uv, so dependencies are resolved and cached automatically.
You can run any entry point with:
```bash
uv run <python_or_module_path>
```

For end‑to‑end commands (data download, BPE training, dataset tokenization, model training, and generation), see the detailed instructions in USAGE.md. That document contains copy‑pasteable commands and options.
```text
LM_training/
    data_load/               # dataset loading
    nn/                      # core neural modules and optimizers
        functional.py
        modules/
            attention.py, rope.py, rmsnorm.py, ffn.py, transformer.py, embedding.py, linear.py
        optim/
            adamw.py, sgd.py, scheduler.py, clipping.py
    tokenizer/               # BPE training and CLI tools
        bpe/
        cli/
    scripts/
        train.py             # training entry point
    utils/
        checkpointing.py, logging_config.py
    tests/                   # unit tests and snapshots
```
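To show how these pieces typically fit together, here is an illustrative training-step sketch. It uses stock PyTorch utilities (the standard optimizer/scheduler interfaces and clip_grad_norm_) as stand-ins for the repo's own nn/optim implementations, and the model, batch, and scheduler objects are placeholders rather than the actual API of scripts/train.py.

```python
import torch
from torch.nn.utils import clip_grad_norm_

def train_step(model, optimizer, scheduler, batch, max_grad_norm=1.0):
    """One optimization step: forward, loss, backward, clip, update, schedule."""
    inputs, targets = batch                                   # token-ID tensors of shape (B, T)
    logits = model(inputs)                                    # (B, T, vocab_size)
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    clip_grad_norm_(model.parameters(), max_grad_norm)        # gradient clipping
    optimizer.step()                                          # e.g. AdamW update
    scheduler.step()                                          # e.g. cosine LR schedule
    return loss.item()
```

A real run wraps this in a loop that also saves checkpoints and configures logging (utils/checkpointing.py, utils/logging_config.py); see USAGE.md for the actual commands.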
Run the test suite with:

```bash
uv run pytest
```

If you're implementing components from scratch, some tests may initially fail with NotImplementedError. Hook your implementation up via the adapter functions in tests/adapters.py.
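As a rough illustration of the adapter pattern, an adapter usually just forwards the test's inputs to your own code. The function name, signature, and import path below are hypothetical; the real ones are defined in tests/adapters.py.

```python
import torch

def run_rmsnorm(d_model: int, eps: float, weights: torch.Tensor, in_features: torch.Tensor) -> torch.Tensor:
    # Hypothetical adapter: replace the NotImplementedError stub with a call into your module.
    from nn.modules.rmsnorm import RMSNorm  # assumed import path; adjust to the package layout

    layer = RMSNorm(d_model, eps=eps)
    layer.weight.data = weights              # load the reference weights provided by the test
    return layer(in_features)                # output is compared against the reference snapshot
```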
TinyStories and a small OpenWebText sample are used in examples. For curated, step‑by‑step data download and preprocessing instructions, see USAGE.md.
- Designed for clarity over raw performance; suitable for learning, prototyping, and experimentation.
- Reproducible environments with uv (see uv.lock).
For full workflows, tunables, and CLI examples, head to USAGE.md.