A high-performance translation API supporting 200+ languages using Meta's NLLB-200 model. Features optimized INT8 inference (600MB), event-driven architecture with Celery/Redis, and production-ready Docker deployment.
- ⚡ Fast & Lightweight: CTranslate2 INT8 quantization (4x faster, 2.5GB → 600MB)
- 🔄 Asynchronous Processing: Celery + Redis for non-blocking operations
- 🚀 Production Ready: Dockerized, Swagger docs, zero cold-start with singleton pattern
- 🌐 200+ Languages: English, Arabic, French, Chinese, Spanish, and more
Run the pre-built containers directly from Docker Hub:
```bash
# Clone the repo
git clone https://github.com/Mohammed2372/Translation-API.git
cd Translation-API

# Pull and run the complete stack
docker-compose -f docker-compose.prod.yml up
```

This will automatically pull:

- `mohammed237/translation-api:webv1` (Django API)
- `mohammed237/translationapi:workerv1` (Celery worker with model)
- `redis:7-alpine` (official Redis image)
Access API: http://localhost:8000/api/docs/
- Docker & Docker Compose OR Python 3.10+ & Redis
Option A - Use Kaggle Notebook (5 min):
- Open: Model Preparation Notebook
- Click "Copy and Edit" → Run all cells
- Download `translator_model.zip`
Option B - Run Locally:

```bash
jupyter notebook notebooks/quantize_translator_model.ipynb

# Test the quantized model
python test_translator_model.py
```

Whichever option you used, place the model files so they end up here (the conversion step itself is sketched after the layout):

```
/translationapi
├── model/
│   └── translator_model/          <-- Extract here
│       ├── model.bin
│       ├── config.json
│       └── shared_vocabulary.json
```
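For reference, the conversion that the notebook performs boils down to CTranslate2's Transformers converter. A minimal sketch, assuming the distilled 600M NLLB checkpoint and the output path shown above (the notebook may use different options):

```python
# Sketch only: convert the Hugging Face NLLB checkpoint to CTranslate2 with INT8 weights.
# Checkpoint name and output directory are assumptions; check the notebook for the exact values.
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter("facebook/nllb-200-distilled-600M")
converter.convert(output_dir="model/translator_model", quantization="int8", force=True)
```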
```bash
# With Docker
docker-compose up --build

# OR manually (2 terminals)
# Terminal 1
python manage.py runserver

# Terminal 2
celery -A core worker -l info   # Add -P solo on Windows
```

Endpoint: `POST /api/translate/`
Request:

```json
{
  "text": "Hello, how are you?",
  "source": "eng_Latn",
  "target": "arb_Arab"
}
```

Response:
```json
{
  "status": "completed",
  "result": {
    "translated": "مرحبا، كيف حالك؟",
    "original": "Hello, how are you?"
  }
}
```

Interactive Docs: http://localhost:8000/api/docs/
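A minimal client call, sketched with the `requests` library (the URL matches the local dev server above; the pending-response shape is an assumption):

```python
import requests

# Submit a translation job to the locally running API
resp = requests.post(
    "http://localhost:8000/api/translate/",
    json={"text": "Hello, how are you?", "source": "eng_Latn", "target": "arb_Arab"},
    timeout=30,
)
resp.raise_for_status()
payload = resp.json()

if payload.get("status") == "completed":
    print(payload["result"]["translated"])
else:
    # Under heavy load the API falls back to async mode and returns a task ID (see below)
    print("Still processing:", payload)
```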
```mermaid
graph LR
    A[Client] -->|HTTP| B[Django API]
    B -->|Task| C[Redis Queue]
    C --> D[Celery Worker]
    D -->|INT8 Model| E[CTranslate2]
```
Components:
- Django API: Request validation, task dispatch
- Redis: Message queue, traffic buffering
- Celery Worker: Background AI processing (model loaded once)
- CTranslate2: Optimized INT8 inference engine
Standard PyTorch inference is slow and memory-hungry for a model this size. CTranslate2 provides:
- 4x faster inference via C++ optimization
- INT8 quantization: 2.5GB → 600MB
- CPU-friendly (no GPU needed)
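To make the dispatch path and the CTranslate2 call concrete, here is a minimal sketch of a worker-side task. Function and helper names are assumptions; the real code lives in `translationapi/tasks.py` and `translationapi/ai_loader.py`:

```python
# translationapi/tasks.py -- sketch only; names besides the module path are assumptions
from celery import shared_task

from .ai_loader import get_translator  # hypothetical accessor for the cached model/tokenizer


@shared_task
def translate_text(text: str, source: str, target: str) -> dict:
    # Runs inside the Celery worker, where the INT8 model is already resident
    translator, tokenizer = get_translator()

    # NLLB expects the source language code as a tokenizer setting
    tokenizer.src_lang = source
    source_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))

    # CTranslate2 consumes subword strings; the target language code goes in as a prefix
    results = translator.translate_batch([source_tokens], target_prefix=[[target]])
    target_tokens = results[0].hypotheses[0][1:]  # drop the language-code prefix

    translated = tokenizer.decode(tokenizer.convert_tokens_to_ids(target_tokens))
    return {"translated": translated, "original": text}
```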
The model loads once at worker startup (`translationapi/ai_loader.py`), eliminating the 5+ second cold start that reloading it on every request would incur.
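A minimal sketch of that singleton, assuming the same `get_translator()` accessor used in the task sketch above (only the module path comes from the repo):

```python
# translationapi/ai_loader.py -- sketch; the accessor name is an assumption
from functools import lru_cache

import ctranslate2
import transformers


@lru_cache(maxsize=1)
def get_translator():
    """Load the INT8 model and NLLB tokenizer once per worker process, then reuse them."""
    translator = ctranslate2.Translator("model/translator_model", device="cpu")
    tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
    return translator, tokenizer
```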
The API waits up to 10 seconds for the result (so light traffic feels synchronous); under heavy load it falls back to returning the task ID so the client can check back asynchronously.
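A sketch of that hybrid wait using Celery's result API; the view wiring and the pending-response shape are assumptions:

```python
# translationapi/views.py -- sketch; endpoint wiring is an assumption
from celery.exceptions import TimeoutError as CeleryTimeoutError
from rest_framework.decorators import api_view
from rest_framework.response import Response

from .tasks import translate_text  # the task sketched earlier


@api_view(["POST"])
def translate(request):
    task = translate_text.delay(
        request.data["text"], request.data["source"], request.data["target"]
    )
    try:
        # Block briefly so light traffic feels synchronous
        result = task.get(timeout=10)
        return Response({"status": "completed", "result": result})
    except CeleryTimeoutError:
        # Under load, hand back the task ID so the client can poll later
        return Response({"status": "pending", "task_id": task.id}, status=202)
```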
```
translationapi/
├── translationapi/
│   ├── views.py                        # API endpoints
│   ├── tasks.py                        # Celery tasks
│   └── ai_loader.py                    # Model singleton
├── model/
│   └── translator_model/               # INT8 model (600MB)
├── notebooks/
│   ├── quantize_translator_model.ipynb
│   └── test_translator_model.py        # Sanity-check the quantized model
├── docker-compose.yml                  # Build images
├── docker-compose.prod.yml             # Production with Docker Hub images
└── requirements.txt
```
| Metric | Value |
|---|---|
| Model Size | 600MB (INT8) |
| Inference Latency | ~200ms/sentence (CPU) |
| Memory | ~800MB total |
| Throughput | 100+ req/min |
- Kaggle Notebook: quantize-translator-model
- Docker Hub:
- Issues: Report Bug