huggingface / trl Public

Fork 2.7k
Star 18.4k

Pull requests: huggingface/trl


135 Open 3,164 Closed


Continuous Batching support for AsyncGRPO

#5781 opened May 16, 2026 by qgallouedec Member • Draft

8 tasks

Drop unjustified model.visual. skip in GRPO / RLOO Qwen2.5-VL tests

#5780 opened May 15, 2026 by qgallouedec Member

Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip

#5779 opened May 15, 2026 by qgallouedec Member

Make the LLaVA / LLaVA-Next test guard explicit

#5778 opened May 15, 2026 by qgallouedec Member

Fix spurious KL gradients for zero-std reward groups when beta > 0

#5777 opened May 15, 2026 by xodn348 Contributor

5 of 7 tasks

skip vision parts of the model for test_train_vlm_multi_image as well

#5774 opened May 15, 2026 by kaixuanliu Contributor

cleanup xpu cache memory after each test

#5771 opened May 15, 2026 by kaixuanliu Contributor

Fix OOM in CI by reducing batch size in GRPO/RLOO VLM tests

#5767 opened May 13, 2026 by albertvillanova Member

Memory-efficient PEFT/LoRA vLLM weight sync under DeepSpeed ZeRO-3

#5766 opened May 13, 2026 by rak96

7 of 14 tasks

feat(grpo): replace deprecated use_transformers_paged with transformers continuous batching

#5765 opened May 13, 2026 by sergiopaniego Member

4 of 8 tasks

Add Qwen3-VL training chat template with generation markers

#5764 opened May 13, 2026 by aazizyan Contributor

5 of 8 tasks

docs: set max_completion_length=1024 in GRPO quickstart examples

#5759 opened May 13, 2026 by dhruvnigam93

5 of 8 tasks

Add telemetry to trainers

#5758 opened May 12, 2026 by qgallouedec Member

Tighten old_per_token_logps recomputation check in GRPO

#5757 opened May 12, 2026 by wengeezhang

5 of 8 tasks

async_grpo don't return on queue.Empty

#5751 opened May 12, 2026 by AmineDiro Member

feat: move async rollout worker to separate process

#5749 opened May 11, 2026 by AmineDiro Member

3 tasks done

[AsyncGRPO] Fix missing tool gates in worker init (fixes #5742)

#5748 opened May 11, 2026 by aazizyan Contributor

5 of 8 tasks

Add end-to-end GRPO + OpenReward notebook (Local ORS / Toolathlon Gym / Qwen3.5-4B)

#5747 opened May 11, 2026 by rycerzes Contributor

3 of 8 tasks

Align tiny Qwen2.5-VL with Qwen/Qwen2.5-VL-3B-Instruct

#5739 opened May 9, 2026 by qgallouedec Member

cleanup vram

#5738 opened May 9, 2026 by ved1beta

3 of 8 tasks

Fix OpenRewardSpec omitting task‑scoped tools during rollout binding (fixes #5727)

#5729 opened May 7, 2026 by rycerzes Contributor

[gold] Implement seq_kd in GOLDTrainer

#5725 opened May 7, 2026 by roycho96 Contributor

3 of 8 tasks

feat: add Falcon Mamba training chat templates with generation markers

#5723 opened May 7, 2026 by DagaBhai Contributor

4 of 8 tasks

[feat, algo] Support RandOpt algo

#5719 opened May 7, 2026 by sunrainyg

4 of 8 tasks

Align tiny DeepSeekV3 config with deepseek-ai/DeepSeek-R1-0528

#5715 opened May 6, 2026 by qgallouedec Member • Draft

8 tasks
