[Bugfix] Handle the edge case in detokenizer where processed tokens contain both `stop` str and `eos` token #23938

dtransposed · 2025-08-29T13:14:27Z

Bug description 🐛

Right now, `BaseIncrementalDetokenizer` can stop generation in two separate cases:

1. EOS token found: if `<eos>` appears in `new_token_ids` and `stop_terminated=True`, we hit the check at line 124 and eventually return `None`.
2. Stop string found: otherwise, we use `StopChecker` to look for stop strings inside the text decoded from `new_token_ids` and eventually return `stop_string`.

The missing case

We don’t handle the situation where both conditions happen at once:

- `new_token_ids` contains an `<eos>` token, and
- the decoded text also matches a stop string.

In this case, the current code only respects the EOS path and skips the stop string check.
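
To make the missing case concrete, here is a minimal toy model of the EOS-first ordering. It is a sketch, not vLLM's code: `update_eos_first`, `EOS_TOKEN_ID`, and `TOY_VOCAB` are hypothetical names invented for illustration, and the stop-string matching is reduced to a plain substring check.

```python
from typing import Optional

EOS_TOKEN_ID = 2  # hypothetical id standing in for <eos>
# Hypothetical toy vocabulary; <eos> decodes to an empty string.
TOY_VOCAB = {0: "Hello", 1: " world", 2: "", 3: "\n\n"}


def update_eos_first(new_token_ids: list[int], stop: list[str],
                     stop_terminated: bool) -> Optional[str]:
    """Toy model of the EOS-first ordering: the EOS branch returns early."""
    if stop_terminated and new_token_ids and new_token_ids[-1] == EOS_TOKEN_ID:
        # EOS path: we return before any stop-string matching happens.
        return None
    text = "".join(TOY_VOCAB[t] for t in new_token_ids)
    for s in stop:
        if s in text:
            return s
    return None


# One call can deliver several tokens at once, so the stop string
# and <eos> may arrive together.
both_at_once = update_eos_first([0, 1, 3, 2], stop=["\n\n"], stop_terminated=True)
stop_only = update_eos_first([0, 1, 3], stop=["\n\n"], stop_terminated=False)
print(repr(both_at_once), repr(stop_only))
```

In this toy, the stop string is reported when it arrives alone, but is silently dropped when `<eos>` arrives in the same call.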

The fix

We should:

1. Always run `StopChecker` first (to handle stop strings properly).
2. Then apply EOS termination (`stop_terminated`).

By swapping the order, stop strings take priority over EOS, which produces the expected behavior.
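
The reordering can be sketched on a toy model as follows. This is an illustration under stated assumptions, not the actual patch: `update_stop_first`, `EOS_TOKEN_ID`, and `TOY_VOCAB` are hypothetical, the substring check stands in for `StopChecker`, and the return value is simplified to a `(stop_string, finished)` pair.

```python
from typing import Optional

EOS_TOKEN_ID = 2  # hypothetical id standing in for <eos>
# Hypothetical toy vocabulary used only for this sketch.
TOY_VOCAB = {0: "Hello", 1: " world", 2: "", 3: "\n\n"}


def update_stop_first(new_token_ids: list[int], stop: list[str],
                      stop_terminated: bool) -> tuple[Optional[str], bool]:
    """Return (matched_stop_string, finished), checking stop strings first."""
    # Decode everything except <eos>, which contributes no text here.
    text = "".join(TOY_VOCAB[t] for t in new_token_ids if t != EOS_TOKEN_ID)
    # 1) Stop strings take priority.
    for s in stop:
        if s in text:
            return s, True
    # 2) Only then apply EOS termination.
    return None, stop_terminated


both_at_once = update_stop_first([0, 1, 3, 2], stop=["\n\n"], stop_terminated=True)
eos_only = update_stop_first([0, 1, 2], stop=["\n\n"], stop_terminated=True)
still_running = update_stop_first([0, 1], stop=["\n\n"], stop_terminated=False)
print(both_at_once, eos_only, still_running)
```

With this ordering, a call that carries both the stop string and `<eos>` reports the stop string, while an EOS-only call still finishes without one.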

gemini-code-assist

Code Review

This pull request addresses a bug where stop strings were ignored if an EOS token was present in the same batch of tokens. The proposed fix correctly reorders the logic to prioritize stop string evaluation before handling EOS token termination. The change is simple, effective, and accompanied by a new, thorough unit test that validates the corrected behavior for both including and excluding the stop string in the output. The fix appears correct and complete.
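
As a rough illustration of the two output variants such a test would need to cover, the sketch below truncates decoded text at the earliest stop string, either keeping or dropping the stop string itself. The helper `truncate_at_stop` and its `include_stop_str_in_output` flag are hypothetical names for this example and are not taken from the PR's test.

```python
from typing import Optional


def truncate_at_stop(text: str, stop: list[str],
                     include_stop_str_in_output: bool) -> tuple[str, Optional[str]]:
    """Cut `text` at the earliest stop string; return (output, matched_stop)."""
    best: Optional[tuple[int, str]] = None
    for s in stop:
        idx = text.find(s)
        if idx != -1 and (best is None or idx < best[0]):
            best = (idx, s)
    if best is None:
        return text, None  # no stop string present, nothing to cut
    idx, s = best
    end = idx + len(s) if include_stop_str_in_output else idx
    return text[:end], s


decoded = "Hello world\n\nmore text"
excluded = truncate_at_stop(decoded, ["\n\n"], include_stop_str_in_output=False)
included = truncate_at_stop(decoded, ["\n\n"], include_stop_str_in_output=True)
print(excluded, included)
```

A regression test along these lines would assert on both variants for an input where the stop string and the end of generation arrive together.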

njhill · 2025-08-29T17:00:54Z

@dtransposed presumably you're talking about the spec decoding case?

What about when the eos token occurs in new_token_ids prior to the stop string?

dtransposed · 2025-08-29T18:25:08Z

> @dtransposed presumably you're talking about the spec decoding case?
>
> What about when the eos token occurs in new_token_ids prior to the stop string?

Yes, this is precisely how I stumbled upon this edge case: I was testing a very efficient speculator.

Is it possible, though, for the detokenizer to see `new_token_ids` that contain any tokens past the `<eos>` token?
I thought that `EngineCoreOutput.new_token_ids` would never go past the `<eos>` token here: https://github.com/vllm-project/vllm/blob/main/vllm/v1/engine/output_processor.py#L352 (this is the method where the detokenization is used).

dtransposed · 2025-09-05T13:06:14Z

@njhill Could I ask you for further review please?

njhill

Thanks @dtransposed yes I see what you mean.

Please see inline comment and then we can merge this.

vllm/v1/engine/detokenizer.py

njhill

Thanks @dtransposed!

dtransposed · 2025-09-09T10:26:00Z

@njhill, could you take a look at the failing tests? They seem orthogonal to the PR logic. If it looks good to you, could you update the branch and force merge?

detokenizer: handle stop_string when model terminates; add regression…

5c9081c

… test Signed-off-by: dtransposed <damian.bogunowicz@gmail.com>

dtransposed requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners August 29, 2025 13:14

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

493d9db

…22704

mergify bot added the v1 label Aug 29, 2025

gemini-code-assist bot reviewed Aug 29, 2025

dtransposed mentioned this pull request Aug 29, 2025

[BUG] Suffix decoding ignores stop param in vllm.SamplingParams snowflakedb/ArcticInference#161

Closed

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

29b5449

…22704

dtransposed added 2 commits September 3, 2025 09:08

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

08de50d

…22704

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

cb6fdb4

…22704

dtransposed added 2 commits September 5, 2025 15:06

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

66323bb

…22704

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

ef5092f

…22704

njhill reviewed Sep 5, 2025

vllm/v1/engine/detokenizer.py (outdated, resolved)

dtransposed added 2 commits September 8, 2025 09:23

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

ef28fde

…22704

Simplification

cf28176

Signed-off-by: dtransposed <damian.bogunowicz@gmail.com>

dtransposed requested a review from njhill September 8, 2025 08:07

dtransposed added 3 commits September 8, 2025 11:45

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

73b41cf

…22704

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

23066fc

…22704

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

ff2c2ca

…22704

njhill approved these changes Sep 8, 2025

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Sep 8, 2025

njhill enabled auto-merge (squash) September 8, 2025 17:02

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

e7e9239

…22704

Merge branch 'main' into feat/detokenizer-stop-termination-20250829-1…

100fabf

…22704

njhill added the force-merge label Sep 9, 2025

vllm-bot merged commit 922d3b4 into vllm-project:main Sep 9, 2025
36 of 38 checks passed

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025

[Bugfix] Handle the edge case in detokenizer where processed tokens c…

2d2c71e

…ontain both `stop` str and `eos` token (vllm-project#23938) Signed-off-by: dtransposed <damian.bogunowicz@gmail.com>

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Bugfix] Handle the edge case in detokenizer where processed tokens c…

d0f9a92

…ontain both `stop` str and `eos` token (vllm-project#23938) Signed-off-by: dtransposed <damian.bogunowicz@gmail.com>
