MLE-Leetcode is a comprehensive collection of 103 production-grade coding questions designed to prepare you for top-tier Machine Learning Engineer (MLE) interviews.
Unlike traditional LeetCode, this repository focuses on Large Language Models (LLMs), Transformers, and Multimodal AI systems. These are not toy problems—they are simplified versions of real-world engineering challenges faced at companies like OpenAI, Google, and Meta.
```mermaid
graph LR
A[Start] --> B{Choose Module};
B --> C[Read question.md];
C --> D[Implement solution.py];
D --> E[Run Public Tests];
E -- Pass --> F[Run Private Evals];
E -- Fail --> D;
F -- Pass --> G[Mastered!];
F -- Fail --> D;
style A fill:#4CAF50,stroke:#333,stroke-width:2px;
style G fill:#4CAF50,stroke:#333,stroke-width:2px;
style E fill:#2196F3,stroke:#333,stroke-width:2px;
style F fill:#9C27B0,stroke:#333,stroke-width:2px;
```
The curriculum is organized into 9 Modules covering the full spectrum of modern AI engineering.
Module 1: HuggingFace Transformers Engineering (Q02-Q14)
Focus: Practical engineering with the HuggingFace ecosystem. Model loading, tokenizer customization, training optimization.
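To give a flavor of this module, here is a minimal, hedged sketch of the loading and tokenizer-customization pattern it builds on; the `gpt2` checkpoint and the `<tool_call>` token are placeholders, not part of the actual exercises.

```python
# Minimal sketch: load a causal LM and add a custom special token.
# The checkpoint name is only a placeholder; any HuggingFace causal LM works.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

# Tokenizer customization: register a new special token and resize embeddings.
tokenizer.add_special_tokens({"additional_special_tokens": ["<tool_call>"]})
model.resize_token_embeddings(len(tokenizer))

inputs = tokenizer("Hello, world! <tool_call>", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (batch, seq_len, vocab_size)
```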
Module 2: Attention & Transformer Core (Q15-Q26)
Focus: Hand-written implementations of core components from scratch to understand the mathematics and logic.
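As a reference point for the from-scratch exercises, here is a minimal scaled dot-product attention sketch; the shapes and masking convention are one common choice, not necessarily the ones the repository's tests expect.

```python
# Minimal scaled dot-product attention, written from scratch for clarity.
# Shapes: q, k, v are (batch, heads, seq_len, head_dim); mask broadcasts to
# (batch, heads, q_len, k_len) with True marking positions to keep.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)      # (B, H, Lq, Lk)
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))  # hide masked keys
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                                      # (B, H, Lq, head_dim)

q = k = v = torch.randn(1, 2, 4, 8)
causal = torch.tril(torch.ones(4, 4, dtype=torch.bool))
out = scaled_dot_product_attention(q, k, v, causal)
print(out.shape)  # torch.Size([1, 2, 4, 8])
```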
Module 3: RoPE & Positional Encoding (Q27-Q31, Q75-Q83)
Focus: Mastering Rotary Positional Embeddings (RoPE), ALiBi, and long-context strategies.
| ID | Topic |
|---|---|
| Q27 | Advanced Multi Head Attention |
| Q28 | Transformer Normalization Strategies |
| Q29 | RoPE Rotary Position Embedding |
| Q30 | RoPE With Position Offset |
| Q31 | RoPE Context Extension Scaling |
| Q75-Q83 | Advanced RoPE Variants & Numerical Stability |
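For orientation, here is a minimal sketch of the rotate-half RoPE formulation, with the position offset from Q30 exposed as a parameter. The base value and layout conventions (half-split vs. interleaved) vary by model, so treat this as one possible variant rather than the repository's reference solution.

```python
# Rotate-half RoPE sketch; conventions (base=10000, half-split layout) vary.
import torch

def rope_angles(seq_len, head_dim, base=10000.0, offset=0):
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(offset, offset + seq_len).float()
    freqs = torch.outer(positions, inv_freq)          # (seq_len, head_dim/2)
    return torch.cat([freqs, freqs], dim=-1)          # (seq_len, head_dim)

def apply_rope(x, angles):
    # x: (batch, heads, seq_len, head_dim)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x.chunk(2, dim=-1)
    rotated = torch.cat([-x2, x1], dim=-1)            # "rotate half"
    return x * cos + rotated * sin

q = torch.randn(1, 4, 16, 64)
q_rot = apply_rope(q, rope_angles(seq_len=16, head_dim=64))
print(q_rot.shape)  # torch.Size([1, 4, 16, 64])
```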
Module 4: KV Cache & Inference Optimization (Q32-Q37, Q84-Q89)
Focus: Efficient inference, memory management, PagedAttention, and KV cache optimizations.
| ID | Topic |
|---|---|
| Q32 | ALiBi Attention Bias |
| Q33 | RoPE Shape Bug Debugging |
| Q34 | Advanced RoPE Implementation |
| Q35 | Positional Encoding Comparison |
| Q36 | Paged Attention Simplified |
| Q37 | KV Cache Memory Estimator |
| Q84-Q89 | KV Cache Quantization, Streaming, Compression |
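In the spirit of Q37, here is a back-of-the-envelope KV cache size estimate; the configuration numbers below are illustrative only and not tied to any specific model.

```python
# KV cache size estimate:
# 2 (K and V) * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, dtype_bytes=2):
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * dtype_bytes

gib = kv_cache_bytes(
    num_layers=32, num_kv_heads=8, head_dim=128,
    seq_len=8192, batch_size=4, dtype_bytes=2,  # fp16/bf16
) / (1024 ** 3)
print(f"{gib:.2f} GiB")  # 4.00 GiB for this illustrative config
```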
Module 5: Sampling, Decoding & Evaluation (Q38-Q42, Q90-Q94)
Focus: Decoding strategies such as beam search, top-k/top-p sampling, and speculative sampling, plus evaluation via perplexity.
| ID | Topic |
|---|---|
| Q38 | Top K Top P Sampling |
| Q39 | Repetition Penalty |
| Q40 | Beam Search Length Penalty |
| Q41 | Constrained Decoding |
| Q42 | Perplexity Packed Sequences |
| Q90-Q94 | Diverse Beam Search, Speculative Sampling, Token Healing |
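Here is a hedged sketch of combined top-k/top-p filtering in the style of Q38; the exact cutoff convention (e.g. whether the token that crosses the probability threshold is kept) differs between implementations.

```python
# Top-k + top-p (nucleus) filtering over a single next-token distribution.
import torch

def top_k_top_p_filter(logits, top_k=50, top_p=0.9):
    # logits: (vocab_size,)
    logits = logits.clone()
    if top_k > 0:
        kth = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = torch.softmax(sorted_logits, dim=-1)
        cumprobs = probs.cumsum(dim=-1)
        remove = cumprobs - probs > top_p          # keep tokens until mass exceeds p
        logits[sorted_idx[remove]] = float("-inf")
    return logits

logits = torch.randn(32000)
next_token = torch.multinomial(torch.softmax(top_k_top_p_filter(logits), dim=-1), 1)
print(next_token.item())
```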
Module 6: Training Engineering (Q43-Q50)
Focus: Distributed training, Mixed Precision (AMP), Checkpointing, FSDP/ZeRO concepts.
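As a taste of the module, here is a minimal sketch of a mixed-precision training step with gradient scaling, assuming a CUDA device is available; the model, data, and hyperparameters are placeholders rather than anything from the exercises.

```python
# One AMP training step with a GradScaler to avoid fp16 gradient underflow.
import torch
from torch import nn

model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(32, 128, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")

optimizer.zero_grad(set_to_none=True)
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()   # scale the loss before backprop
scaler.step(optimizer)          # unscales grads, skips the step if any are inf/NaN
scaler.update()
print(loss.item())
```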
Module 7: Multimodal Modeling (Q51-Q64, Q95-Q104)
Focus: Vision-Language Models (VLM), Audio, Video, CLIP, LLaVA, Adapters.
| ID | Topic |
|---|---|
| Q51 | ViT Patch Embedding |
| Q52 | CLIP Contrastive Loss |
| Q53 | Vision Projector |
| Q54 | Image Token Inserter |
| Q55 | SigLIP Similarity Loss |
| Q56 | LLaVA Fusion |
| Q57-Q64 | Cross Attention Adapters, Q-Former, Video/Audio Processing |
| Q95-Q104 | Advanced Multimodal: 3D, Video Temporal Modeling, Cross-Modal Retrieval |
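As an example of the kind of component covered here (Q52), a sketch of a symmetric CLIP-style contrastive loss; the embedding dimension, batch size, and temperature below are illustrative, not the repository's reference values.

```python
# Symmetric CLIP-style contrastive (InfoNCE) loss over paired embeddings.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so the dot product is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))              # matching pairs lie on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```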
Module 8: Data Engineering (Q65-Q69)
Focus: Data mixing, deduplication, quality filtering, and packing strategies.
| ID | Topic |
|---|---|
| Q65 | Multi Source Data Mixer |
| Q66 | Online Deduplication |
| Q67 | Quality Filtering Scoring |
| Q68 | Sample Packing Collator |
| Q69 | Safety Compliance Filtering |
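A simple sketch of exact-match online deduplication in the spirit of Q66; production pipelines typically use approximate methods such as MinHash/LSH, which this does not attempt to cover.

```python
# Streaming exact-match deduplication using a set of normalized content hashes.
import hashlib

def dedup_stream(samples):
    seen = set()
    for text in samples:
        digest = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue            # drop exact (normalized) duplicates
        seen.add(digest)
        yield text

docs = ["Hello world", "hello world ", "Something new"]
print(list(dedup_stream(docs)))  # ['Hello world', 'Something new']
```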
Module 9: Debugging & Consistency (Q70-Q74)
Focus: Real-world bug hunting—fixing deadlocks, silent failures, and numerical instability.
| ID | Topic |
|---|---|
| Q70 | Attention Mask Bug |
| Q71 | RoPE Position Offset Bug |
| Q72 | Dataloader Resume Bug |
| Q73 | Multi GPU Deadlock |
| Q74 | Mixed Precision NaN |
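As an illustration of this category (my own example, not the repository's actual Q70 exercise), here is a classic attention-mask bug: building the causal mask with the triangle oriented the wrong way, which lets every token attend to the future.

```python
# Buggy vs. fixed causal mask; row i should only attend to columns <= i.
import torch

seq_len = 5
buggy_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool))  # leaks future positions
fixed_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))  # standard causal mask

print(buggy_mask[0])  # tensor([True, True, True, True, True])  <- token 0 sees the whole sequence
print(fixed_mask[0])  # tensor([True, False, False, False, False])
```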
Install the core dependencies:

```bash
pip install torch transformers accelerate pytest datasets
```

- Select a Module: Pick a topic you want to master.
- Read: Go to the folder (e.g., `Q15_Scaled_Dot_Product_Attention`) and read `question.md`.
- Code: Write your implementation in a new file or modify `solution.py` (if practicing blind).
- Test: Run the public test suite.

```bash
# Example: Testing your attention implementation
cd questions/Module_2_Attention_Transformer_Core/Q15_Scaled_Dot_Product_Attention
pytest test_case_public.py -v
```

- Evaluate: For a deeper check, run the private evaluation script.

```bash
python eval_script_private.py
```

Contributions are welcome! If you have a new interview question idea or want to improve an existing solution:
- Fork the repo.
- Create a branch for your feature.
- Submit a Pull Request.
This project is licensed under the CC BY-SA 4.0 license.
Designed to bridge the gap between theory and production engineering. Special thanks to the open-source community for the simplified model components used as references.
