A PyTorch implementation of a GPT-style language model, built from scratch for educational purposes and designed to scale.
This project demonstrates how transformer-based language models can be trained, evaluated, and deployed.
- Minimal, modular PyTorch implementation of GPT
- Configurable hyperparameters (`n_embd`, `n_layer`, `n_head`, etc.)
- Training loop with evaluation and checkpoint saving (see the sketch after this list)
- Hugging Face integration for easy upload and inference
- Colab notebook for quick experimentation
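
For the training-loop feature, the snippet below is a minimal sketch of what evaluation plus checkpoint saving can look like. The actual logic lives in `GPT/trainer/trainer.py`; the `get_batch` and `estimate_loss` helpers here are placeholders, not the repository's API.

```python
import os
import torch

def train(model, optimizer, get_batch, estimate_loss,
          max_iters=5000, eval_interval=500, ckpt_dir="my_model"):
    """Sketch only: `get_batch` and `estimate_loss` are placeholder callables."""
    os.makedirs(ckpt_dir, exist_ok=True)
    best_val = float("inf")
    for step in range(max_iters):
        # Periodic evaluation and checkpointing
        if step % eval_interval == 0:
            losses = estimate_loss(model)        # e.g. {"train": 2.1, "val": 2.3}
            print(f"step {step}: train {losses['train']:.4f}, val {losses['val']:.4f}")
            if losses["val"] < best_val:         # keep only the best checkpoint
                best_val = losses["val"]
                torch.save(model.state_dict(), os.path.join(ckpt_dir, "ckpt.pt"))
        # One optimization step on a random batch
        xb, yb = get_batch("train")              # batch of input/target token ids
        _, loss = model(xb, yb)                  # assumes model returns (logits, loss)
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
```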
To run inference with a model pushed to the Hugging Face Hub:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer from Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained('your_repo_id', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('your_repo_id')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
print("Generating text...")
# Start with BOS token
context = torch.tensor([[tokenizer.bos_token_id]], dtype=torch.long, device=device)
# Generate sequence
generated_ids = model.generate(context, max_new_tokens=256)[0].tolist()
print(tokenizer.decode(generated_ids))
```

Project structure:

```
GPT/
├── configs/   # Configuration files (config.py)
├── data/      # Data loading and preprocessing (dataset.py)
├── model/     # Model definition (model.py)
├── trainer/   # Training loop and saving logic (trainer.py)
└── main.py    # Entry point for training and generation
```
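
As a rough illustration of what `model/model.py` defines, here is a minimal decoder-only transformer that reuses the hyperparameter names from this README. It is a sketch under assumed defaults, not the repository's actual implementation; the class name `MiniGPT` and the use of `nn.TransformerEncoder` are this example's own choices.

```python
import torch
import torch.nn as nn

class MiniGPT(nn.Module):
    """Illustrative decoder-only transformer; not the repo's exact model.py."""

    def __init__(self, vocab_size, n_embd=384, n_layer=6, n_head=6, block_size=256):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, n_embd)   # token embeddings
        self.pos_emb = nn.Embedding(block_size, n_embd)   # learned position embeddings
        layer = nn.TransformerEncoderLayer(
            d_model=n_embd, nhead=n_head, dim_feedforward=4 * n_embd,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layer)
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: position t may only attend to positions <= t
        mask = torch.triu(torch.full((T, T), float("-inf"), device=idx.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.lm_head(self.ln_f(x))                 # (B, T, vocab_size) logits

if __name__ == "__main__":
    model = MiniGPT(vocab_size=50257)
    logits = model(torch.randint(0, 50257, (1, 8)))
    print(logits.shape)  # torch.Size([1, 8, 50257])
```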
To get started:

- Clone the repository:

  ```bash
  git clone https://github.com/OE-Void/GPT.git
  cd GPT
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To train the model:

```bash
python -m GPT.main
```

The trained model will be saved in the `my_model` directory.
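
To make that checkpoint usable with the Hugging Face inference example above, one option is to load it locally and push it to the Hub. This assumes `trainer.py` writes the checkpoint in Hugging Face format (via `save_pretrained`); adapt the loading step if it stores a raw `state_dict` instead.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes my_model/ contains a Hugging Face-format checkpoint
# (config.json, weights, tokenizer files); adjust if trainer.py
# saves a plain state_dict instead.
model = AutoModelForCausalLM.from_pretrained("my_model", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("my_model")

# Publish to the Hub so the inference example can load your_repo_id.
# Requires a Hugging Face login (huggingface-cli login).
model.push_to_hub("your_repo_id")
tokenizer.push_to_hub("your_repo_id")
```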
Edit `GPT/configs/config.py` to adjust hyperparameters such as:

- `n_embd` → embedding dimension size
- `n_layer` → number of transformer layers
- `n_head` → number of attention heads
- `block_size` → maximum sequence length
- `batch_size` → training batch size
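
For example, a small configuration might look like this; the values below are illustrative, not the repository's defaults.

```python
# GPT/configs/config.py (illustrative values only)
n_embd = 384       # embedding dimension
n_layer = 6        # number of transformer blocks
n_head = 6         # attention heads per block (n_embd must be divisible by n_head)
block_size = 256   # maximum sequence length (context window)
batch_size = 64    # sequences per optimization step
```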
Pull requests are welcome! For major changes, please open an issue first to discuss what you’d like to change.
This project is licensed under the MIT License; see the LICENSE file for details.