
SergiuDeveloper/SergiuDeveloper


Sergiu Nistor

AI research engineer focused on deep learning and scalable ML infrastructure. I'm drawn to the unsolved parts of the field, where the science is still being written and the engineering hasn't caught up yet, and to building the systems that make those ideas real.

Website · LinkedIn · Calendly · Substack


Projects

| Project | Description |
| --- | --- |
| yoro-full-pretraining | Novel LLM architecture (YORO) in which the reasoning block runs once at prefill and is reused across all token-generation steps, so reasoning cost is O(1) instead of O(T) in the number of generated tokens T. Pretrained from scratch on 10B tokens with 8×H100s and DeepSpeed ZeRO. |
| yoro-finetuning | Fine-tuning stage for YORO: the reasoning block is frozen, and lightweight adaptation, compensation, and concatenation subnets are trained via knowledge distillation with temperature-scaled soft labels. |
| llm-layer-prefetch | Pipelined layer-streaming system that runs full LLM inference within a fraction of the model's VRAM footprint by overlapping disk, CPU, and GPU transfers in parallel. |
| cuda-kernel-verifier | Runtime correctness checker for custom CUDA/Triton kernels: decorator-based, with outlier-biased sampling and zero impact on the training graph. |
| self-attention-cuda-kernel-comparison | Benchmarks of hand-written CUDA C, Numba, and Triton self-attention kernels against PyTorch SDPA across sequence lengths, batch sizes, and head dimensions. |
| mojo-tensor | GPU-accelerated deep learning framework in Mojo: tensors, autograd, and neural network layers with custom GPU kernel implementations. |
| distributed-llama.cpp | Distributed LLM inference across machines: routes OpenAI-compatible requests to llama.cpp nodes with automatic model distribution, load balancing, and mutual TLS. |
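The yoro-finetuning row mentions knowledge distillation with temperature-scaled soft labels. As a generic illustration (not code taken from that repository), the standard soft-label loss divides both teacher and student logits by a temperature T, takes the KL divergence between the softened distributions, and multiplies by T² so gradient magnitudes stay comparable as T changes. The helper names `softmax` and `distillation_loss` are hypothetical, and plain Python is used only to keep the sketch dependency-free.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """Hypothetical helper: T^2 * KL(teacher || student) on softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)  # soft labels from the teacher
    p_student = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(p_teacher, p_student)
        if p > 0  # terms with p == 0 contribute nothing to the KL sum
    )
    # Soft-target gradients shrink roughly as 1/T^2, so rescale by T^2.
    return temperature ** 2 * kl
```

A higher temperature flattens the teacher's distribution, exposing how it ranks the non-argmax classes; a student whose logits match the teacher's gets a loss of exactly zero.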
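The llm-layer-prefetch row describes streaming layers so that transfers overlap with other work. A minimal sketch of that pipelining idea, assuming a single background loader and purely illustrative callables, is a double-buffered loop: while layer i computes, layer i + 1 is already being fetched. `run_pipelined`, `load_layer`, and `compute_layer` are hypothetical names, not the project's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_pipelined(
    num_layers: int,
    load_layer: Callable[[int], Any],
    compute_layer: Callable[[Any, Any], Any],
    x: Any,
) -> Any:
    """Apply num_layers layers in order, prefetching layer i + 1 while layer i computes.

    load_layer(i) stands in for streaming layer i's weights from disk to CPU to GPU;
    compute_layer(weights, x) applies one layer to the activations x.
    """
    if num_layers == 0:
        return x
    # One background worker: loads run one at a time, but each overlaps with compute.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_layer, 0)  # start fetching the first layer
        for i in range(num_layers):
            weights = pending.result()  # block until layer i is resident
            if i + 1 < num_layers:
                pending = pool.submit(load_layer, i + 1)  # prefetch the next layer
            # Layer i computes while layer i + 1 loads, so roughly two layers'
            # weights (current and prefetched) are held at any moment.
            x = compute_layer(weights, x)
    return x
```

With equal load and compute times per layer, total time approaches one load plus all the computes instead of the sum of both. A real GPU implementation would typically use pinned host memory and a separate CUDA stream rather than a Python thread; that detail is left out of this sketch.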

Skills

Languages · Python · C · C++ · Go · Mojo · Java · JavaScript

DL / ML · DeepSpeed · PyTorch · TensorFlow · Keras · CUDA · scikit-learn

Training · Pretraining · Distributed Training · Fine-Tuning · PEFT · SFT · RLHF · RLAIF · DPO · GRPO · LoRA · QLoRA · Unsloth

GenAI · Transformers · Diffusers · vLLM · LangChain · LangGraph · LlamaIndex · llama.cpp

Infrastructure · Docker · Kubernetes · AWS · GCP · Azure · Terraform
