GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 27 days ago • 218
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 6 days ago • 139
TranslateGemma VLLM Collection Modified version of google/translategemma-4/12/27b-it optimized for deployment with vLLM. • 3 items • Updated 1 day ago • 1
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning Paper • 2601.09088 • Published 22 days ago • 62