Patch Mistral Common Tokenizer 2: fix add_special_tokens #41962

juliendenize · 2025-10-31T11:07:04Z

What does this PR do?

Fixes the behavior of add_special_tokens to match the intent of PreTrainedTokenizer:

If mode is finetuning: add both the bos and eos tokens.
If mode is test: add only the bos token, so that the model can generate freely.
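
The rule above can be sketched in a few lines. This is a minimal illustration, not the code from this PR: `Mode`, `BOS_ID`, `EOS_ID`, and `wrap_with_special_tokens` are hypothetical stand-ins, and the token ids are arbitrary placeholders.

```python
from enum import Enum


# Hypothetical stand-in for mistral-common's ValidationMode; only the two
# modes discussed in this PR are modeled here.
class Mode(str, Enum):
    finetuning = "finetuning"
    test = "test"


# Arbitrary placeholder ids, not taken from any real tokenizer.
BOS_ID = 1
EOS_ID = 2


def wrap_with_special_tokens(ids, mode, add_special_tokens=True):
    """Return a copy of ids with BOS/EOS added according to the mode."""
    if not add_special_tokens:
        return list(ids)
    out = [BOS_ID, *ids]
    if mode is Mode.finetuning:
        # Training examples are complete sequences, so close them with EOS.
        out.append(EOS_ID)
    # In test mode no EOS is appended, which leaves the sequence open
    # for the model to continue generating.
    return out
```

Under these assumptions, the same input yields a closed sequence in finetuning mode and an open one in test mode, and passing `add_special_tokens=False` leaves the ids unchanged.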

Also allows ValidationMode to be passed as a string, so that users do not need to import it from mistral-common.
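
One way to accept either a string or an enum member is to normalize the argument up front. The sketch below is an assumption about the general approach, not the implementation in this PR: `Mode` is a hypothetical stand-in for ValidationMode, and `get_validation_mode` is an illustrative helper name.

```python
from enum import Enum
from typing import Union


# Hypothetical stand-in for mistral-common's ValidationMode.
class Mode(str, Enum):
    finetuning = "finetuning"
    test = "test"


def get_validation_mode(mode: Union[str, Mode]) -> Mode:
    """Coerce a string or enum member into a Mode member."""
    if isinstance(mode, Mode):
        return mode
    try:
        # Enum lookup by value: "test" -> Mode.test
        return Mode(mode)
    except ValueError:
        valid = ", ".join(m.value for m in Mode)
        raise ValueError(
            f"Invalid mode {mode!r}; expected one of: {valid}"
        ) from None
```

With a helper like this, callers can write `mode="test"` without importing the enum, while an unknown string fails early with a message listing the accepted values.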

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

cc @patrickvonplaten for viz

@ArthurZucker and @itazap

juliendenize added 5 commits October 31, 2025 12:03

Patch Mistral Common Tokenizer 2: fix add_special_tokens

186821a

Fix typos

a276753

Fix typing issue

856d441

Make _get_validation_mode static

2ad74a9

wip

bafcd62
