Pull requests: huggingface/trl
feat(dpo): implement Adaptive Beta-DPO (arXiv:2407.08639)
#6123
opened Jun 19, 2026 by
mukund1985
3 of 8 tasks
fix(online_dpo): add prediction_step to fix eval_strategy=steps crash
#6122
opened Jun 19, 2026 by
mukund1985
3 of 8 tasks
fix(ppo): exclude padding tokens from entropy calculation
#6121
opened Jun 19, 2026 by
mukund1985
3 of 8 tasks
feat(ppo): add save_value_model flag to PPOConfig
#6120
opened Jun 19, 2026 by
mukund1985
3 of 8 tasks
Handle missing vllm_ascend imports gracefully
🐛 bug
#6118
opened Jun 19, 2026 by
burtenshaw
Collaborator
Make evaluate() accept the same dataset types as the trainer
#6116
opened Jun 19, 2026 by
qgallouedec
Member
Support LFM2-VL multimodal inputs in GRPO and RLOO
#6114
opened Jun 19, 2026 by
zwischenraum
4 of 8 tasks
fix: address 4 remaining GMPO bugs from code review
#6095
opened Jun 18, 2026 by
abderahmane-ai
Contributor
4 of 6 tasks
Add packing-aware dynamic batching to AsyncGRPO
#6092
opened Jun 17, 2026 by
AmineDiro
Member
Auto-load processing_class in CPO/ORPO trainers when omitted
#6087
opened Jun 17, 2026 by
DaoyuanLi2816
Contributor
2 of 4 tasks
fix(data): correct extract_prompt index for divergence at start and prefix completions
#6079
opened Jun 16, 2026 by
he-yufeng
[async GRPO] Per-generation reset() observation
#6072
opened Jun 16, 2026 by
qgallouedec
Member
async_grpo: inject the environment_factory reset() observation into the prompt
#6068
opened Jun 15, 2026 by
adithya-s-k
Collaborator
Warn when sequence-level importance sampling is combined with a token-summed loss type
#6042
opened Jun 13, 2026 by
discobot
Contributor
4 of 8 tasks
fix: preserve OnlineDPO vLLM completion ids
#6038
opened Jun 13, 2026 by
he-yufeng
4 of 8 tasks
refactor(sft): build labels during dataset preparation instead of collation
#6037
opened Jun 12, 2026 by
0xadvait
5 of 8 tasks
test: don't hard-code bf16=True on devices that lack bf16 support
#6036
opened Jun 12, 2026 by
behroozazarkhalili
Collaborator
fix: load image-text policy for async grpo
#6032
opened Jun 12, 2026 by
he-yufeng
5 of 8 tasks
fix: pass AsyncGRPO environment rewards
#6031
opened Jun 12, 2026 by
he-yufeng
5 of 8 tasks