Conversation

@vasqu
Contributor

@vasqu vasqu commented Nov 25, 2025

As per the title: this now checks the model type properly and adds a sanity check for v5+.

Fixes #42374
Fixes #42378
Fixes #42369
Closes #42379
Closes #42388

@vasqu vasqu requested a review from ArthurZucker November 25, 2025 11:41
"mistral",
"mistral3",
"voxstral",
"voxtral",
Contributor Author

Fixed a typo here: `voxstral` should be `voxtral`.
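For illustration, a membership check against the model types visible in this hunk could look like the sketch below. The set contains only the three correctly spelled names shown above (the real list in the PR may be longer), and both the helper name `is_mistral_family` and the assumption that a local `config.json` carries a `model_type` key are made up for the example, not taken from the PR's code.

```python
import json
from pathlib import Path

# Only the model types visible in the hunk above; the actual list may contain more.
MISTRAL_MODEL_TYPES = {"mistral", "mistral3", "voxtral"}


def is_mistral_family(model_dir: str) -> bool:
    """Hypothetical helper: True if config.json declares a Mistral-family model_type."""
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        # No config to inspect, so the model type cannot be confirmed.
        return False
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # A missing "model_type" key yields None, which is never in the set.
    return config.get("model_type") in MISTRAL_MODEL_TYPES
```

With the misspelled entry `"voxstral"` in the list, a checkpoint whose `model_type` is `"voxtral"` would never match, which is why the typo matters for detection.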

Comment on lines 2483 to 2484
elif version.parse(transformers_version) > version.parse("4.57.2"):
    return tokenizer
Contributor Author

I assume we won't need this fix for the newest versions?

Contributor Author

Made it explicit: the check is now `5.0.0 <=` the saved version.
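To make the difference concrete, here is a minimal sketch of the two comparisons using `packaging.version`. The helper names are hypothetical and the sketch assumes `packaging` is installed; it only illustrates how an open-ended `> 4.57.2` bound differs from an explicit `5.0.0 <=` bound, not how the PR structures the surrounding logic.

```python
from packaging import version


def newer_than_4_57_2(transformers_version: str) -> bool:
    """Open-ended check from the hunk above: anything after 4.57.2 qualifies."""
    return version.parse(transformers_version) > version.parse("4.57.2")


def is_v5_or_later(transformers_version: str) -> bool:
    """Explicit lower bound: only versions at or above 5.0.0 qualify."""
    return version.parse("5.0.0") <= version.parse(transformers_version)


# Compare both checks on an older, an in-between, and a v5 version string.
for v in ("4.57.1", "4.57.3", "5.0.0"):
    print(v, newer_than_4_57_2(v), is_v5_or_later(v))
```

A hypothetical later 4.x patch release such as `4.57.3` passes the open-ended check but not the explicit one, so the explicit bound no longer treats every post-4.57.2 version as already fixed.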

Comment on lines 2486 to 2488
# Expose the `fix_mistral_regex` flag on the tokenizer when provided, even if no correction is applied.
if "fix_mistral_regex" in init_kwargs:
    setattr(tokenizer, "fix_mistral_regex", init_kwargs["fix_mistral_regex"])
Contributor Author

Same as before, but it now needs to bypass our two safety checks, which indicate that the Mistral fix is not needed.
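As a rough sketch of what exposing the flag means in isolation: the snippet below uses a stand-in `DummyTokenizer` class and a hypothetical `expose_fix_flag` helper (neither is part of the library) to show that the attribute is set whenever the caller passed the keyword, regardless of whether any regex correction happens.

```python
class DummyTokenizer:
    """Stand-in for a real tokenizer object; purely illustrative."""


def expose_fix_flag(tokenizer, init_kwargs: dict):
    """Hypothetical helper mirroring the hunk above."""
    if "fix_mistral_regex" in init_kwargs:
        # Record the caller's choice even when it is False and nothing gets patched.
        setattr(tokenizer, "fix_mistral_regex", init_kwargs["fix_mistral_regex"])
    return tokenizer


# The flag was passed explicitly, so it shows up on the tokenizer.
tok = expose_fix_flag(DummyTokenizer(), {"fix_mistral_regex": False})
print(tok.fix_mistral_regex)

# The flag was not passed, so no attribute is created.
untouched = expose_fix_flag(DummyTokenizer(), {})
print(hasattr(untouched, "fix_mistral_regex"))
```

Keeping this step separate from the early returns means an explicitly passed value stays visible on the tokenizer even when the checks conclude that nothing has to be patched.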

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

return False

if _is_local or is_base_mistral(pretrained_model_name_or_path):
    is_official_mistral_tokenizer = is_base_mistral(pretrained_model_name_or_path)
Collaborator

You cannot call the Hub when `_is_local` is set.

Contributor Author

Yup, fixed: it now checks `_is_local` explicitly. Good point.
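One way to picture the concern is the sketch below, where the network-backed lookup is passed in as a plain callable so the example runs offline. `resolve_official_flag` and `fake_hub_check` are hypothetical names, and returning `False` for local paths is an assumption made for the example; the merged code may handle the local case differently.

```python
from typing import Callable


def resolve_official_flag(
    path: str,
    is_local: bool,
    hub_check: Callable[[str], bool],
) -> bool:
    """Hypothetical helper: decide whether a tokenizer is an official Mistral one.

    `hub_check` stands in for a network-backed lookup such as `is_base_mistral`.
    """
    if is_local:
        # Local checkpoints must not trigger network access, so skip the Hub lookup.
        return False
    return hub_check(path)


calls = []


def fake_hub_check(path: str) -> bool:
    """Records each lookup so we can verify whether the 'Hub' was contacted."""
    calls.append(path)
    return True


resolve_official_flag("./my-local-checkpoint", is_local=True, hub_check=fake_hub_check)
print(calls)  # still empty: the local path never reached the lookup
```

In the original hunk the second line calls `is_base_mistral` unconditionally inside the branch, so a local path would still reach the Hub; guarding on `_is_local` first avoids that.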

@vasqu vasqu added the `for patch` label (Tag issues / labels that should be included in the next patch) Nov 25, 2025
Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks, the test was much needed 😢

Can you just isolate this into a `patch_mistral_regex` function?

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, llama

@jrp2014

jrp2014 commented Nov 25, 2025

Any chance of issuing a new release, please? Breaking the usage of local caches is quite a big regression, at least for me.

@ArthurZucker ArthurZucker merged commit b605555 into huggingface:main Nov 25, 2025
23 checks passed
@ArthurZucker
Collaborator

Yes, we were just waiting on the CI, sir.

ArthurZucker added a commit that referenced this pull request Nov 25, 2025
* fix

* sanity check

* style

* comments

* make it v5 explicit

* make explicit fixes possible in local tokenizers

* remove hub usage on local

* fix

* extend test for no config case

* move mistral patch outside to separate fn

* fix local path only

* add a test

* make sure test does not pass before this PR

* styling

* make sure it exists

* fix

* fix

* rename

* up

* last nit i hope lord

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
@vasqu vasqu deleted the fix-mistral-detection branch December 18, 2025 10:42
Labels

for patch Tag issues / labels that should be included in the next patch

4 participants