GRPO Trainer by michaelbenayoun · Pull Request #1020 · huggingface/optimum-neuron

michaelbenayoun · 2025-11-04T16:32:24Z

What does this PR do?

This PR adds partial support for GRPO.

It was broken down into smaller PRs:

It adds the NeuronGRPOTrainer, with a set of optimizations and modifications for the Torch XLA backend used to run on Trainium instances. Several core features are still missing:

Integration with vLLM: we use a custom CPU vLLM hack for now. The vLLM part will be handled in another PR.
Weight synchronization between NeuronGRPOTrainer and vLLM
No tensor parallelism

HuggingFaceDocBuilderDev · 2025-11-04T16:38:15Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2026-01-21T08:06:58Z

This PR is stale because it has been open 15 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions · 2026-01-26T08:07:16Z

This PR was closed because it has been stalled for 5 days with no activity.

michaelbenayoun · 2026-02-04T15:04:32Z

optimum/neuron/trainers/grpo_config.py

+        if not self.experimental:
+            raise ValueError(
+                "NeuronGRPOTrainer is experimental and not production-ready. To proceed, set `experimental=True` in "
+                "your NeuronGRPOConfig. This flag exists to ensure users are aware of the current state of the implementation."
+            )


For now, we disable access to the NeuronGRPOTrainer unless `experimental=True` is set in the config.
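
The guard in the diff above can be sketched in isolation as follows. This is a minimal illustration, not the actual NeuronGRPOConfig: the class name `ExperimentalConfigSketch`, the `can_build` helper, the default `experimental=False`, and the placement of the check in `__post_init__` are all assumptions made for the example. Only the check itself and the error message come from the diff.

```python
from dataclasses import dataclass


@dataclass
class ExperimentalConfigSketch:
    # Hypothetical stand-in for NeuronGRPOConfig.
    # The default value of False is an assumption, not taken from the PR.
    experimental: bool = False

    def __post_init__(self):
        # Same check as in the diff: refuse to build the config unless the user opts in.
        if not self.experimental:
            raise ValueError(
                "NeuronGRPOTrainer is experimental and not production-ready. To proceed, set `experimental=True` in "
                "your NeuronGRPOConfig. This flag exists to ensure users are aware of the current state of the implementation."
            )


def can_build(experimental: bool) -> bool:
    """Hypothetical helper: report whether the config can be constructed with the given flag."""
    try:
        ExperimentalConfigSketch(experimental=experimental)
    except ValueError:
        return False
    return True
```

Under these assumptions, `can_build(False)` fails at construction time, so the error surfaces before any trainer or Trainium resources are set up, while `can_build(True)` passes through.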

michaelbenayoun added 23 commits October 15, 2025 15:46

fix: remove wrong trl imports

19ad728

feat: align to latest trl release

34f4698

chore: update pyproject.toml

07437bc

style

e5256bf

feat: sync with SFTTrainer

954cfdf

Merge branch 'main' into sync_trl

3f1f700

fix: minor issues

3c72216

chore: sync with trl==0.24.0

d286f50

chore: sync sft_trainer

0992738

chore: sync sft_trainer

5a847ec

chore: sync sft_trainer

cddbf5f

fix: sft trainer

0200820

Merge branch 'main' into sync_trl

5bd79e6

chore: update dependency version for trl

2c8c1d1

chore: cleanup and fix no-packing test

7eda163

chore: restore finetune_qwen3.sh

b6ee2a3

feat: add model card creation when saving a checkpoint

72b338a

chore: remove model card support

98a6210

doc: align with trl==0.24.0

ee6caeb

test: fix broken sft + peft test

ac0c9f2

chore: add GRPO imports in optimum.neuron

8892d51

chore: add GRPO imports in optimum.neuron.trainers

f26497d

chore: add skeleton for GRPO trainer

f574d3e

michaelbenayoun added 6 commits November 4, 2025 18:07

feat: add mock class for vLLM

b105f91

Merge branch 'main' into grpo

b2d45f0

fix: add is_vllm_available imports

781b27f

chore: add data loading

e932e28

chore: add _prepare_inputs

cef6d30

chore: keep replacing stub methods

a567289

michaelbenayoun added 3 commits December 19, 2025 15:01

feat: optimization for XLA

998fd64

Merge branch 'main' into grpo

07cd31b

Merge branch 'main' into grpo

bf000df

github-actions bot added the Stale label Jan 21, 2026

github-actions bot closed this Jan 26, 2026

michaelbenayoun added 5 commits January 29, 2026 14:36

debug: training produces NaNs

4407687

fix: no NaNs anymore

131df14

rewrite _get_per_token_logps_and_entropies for better breaks

7a0167e

optimize _compute_loss

ac36687

optimize _generate_and_score_completions

8c816f6

michaelbenayoun reopened this Jan 30, 2026

github-actions bot removed the Stale label Jan 31, 2026

michaelbenayoun added 10 commits February 4, 2026 11:16

fix: use separate model for ref model to avoid XLA NaN issues

4fa42ba

fix: use separate model for ref model to avoid XLA NaN issues

8fc448a

chore: vllm_client.py remove unused functions

39660dc

chore: remove useless docstrings in vllm_client.py

6828a9b

chore: add safeguard for the GRPO feature

69252e8

chore: grpo_trainer.py cleanup

c2e6582

chore: untrack example

22e1c22

chore: clean trl_utils.py

ba1ac45

chore: clean trl_utils.py

5a19bd6

chore: clean trl_utils.py

51b68be

michaelbenayoun commented Feb 4, 2026

fix: add training extra for doc building

bce138b

michaelbenayoun marked this pull request as ready for review February 4, 2026 17:16

michaelbenayoun requested review from JingyaHuang, dacorvo and tengomucho February 4, 2026 17:17

michaelbenayoun wants to merge 84 commits into main from grpo

2 participants
