generated from fastai/nbdev_template
Pull requests: huggingface/trl
Remove max_prompt_length from experimental ORPO
#4966
opened Feb 4, 2026 by
albertvillanova
Remove padding_value from CPO and use pad_token_id
#4962
opened Feb 4, 2026 by
albertvillanova
Use local variable instead of attribute in collator tests
#4957
opened Feb 3, 2026 by
qgallouedec
fix: add gradient checkpointing to PolicyAndValueWrapper
#4955
opened Feb 3, 2026 by
lvhungdev
3 of 5 tasks
[Experimental] Add SDFT trainer, config, docs, and tests
#4941
opened Jan 31, 2026 by
Shekswess
4 of 5 tasks
Update RewardFunc type to use RewardCallable protocol
#4938
opened Jan 31, 2026 by
amit9oct
2 of 5 tasks
documentation for modifying chat templates for assistant-only loss
#4937
opened Jan 30, 2026 by
jiosephlee
Add Wordle example with Qwen3 thinking activated
#4936
opened Jan 30, 2026 by
sergiopaniego
Draft
5 tasks
Add SDPO (Self-Distillation Policy Optimization) trainer
#4935
opened Jan 30, 2026 by
MengAiDev
fix: prevent O(2^n) regex backtracking in qwen3_schema
#4934
opened Jan 29, 2026 by
wingding12
Draft
1 task done
fix: sanitize malformed tool calls to prevent TypeError in chat templates
#4933
opened Jan 29, 2026 by
wingding12
Draft
3 tasks done
Automatically add generation tags to chat template for assistant_only_loss=True training (TRL Issue #4879)
#4900
opened Jan 26, 2026 by
Neelectric
Draft
3 of 5 tasks
Expose generation index to tool callables in GRPOTrainer
#4894
opened Jan 25, 2026 by
lukehinds
4 tasks done