Conversation

pcuenca (Member) commented Sep 24, 2025

This will do for now, but in fact we should not even attempt to download config.json when loading a tokenizer.

pcuenca (Member, Author) commented Sep 24, 2025

I'll add a test later.

pcuenca force-pushed the tokenizers-dont-need-model-config branch from 3d78220 to 46cd5ab on September 25, 2025 at 07:41
pcuenca (Member, Author) commented Sep 25, 2025

I tried to skip downloads of model configurations, but I realized there are still edge cases for old tokenizers where the tokenizer class is retrieved from the model config. We need to look into this more carefully. This is another fallback mechanism in addition to #251.
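
To make the fallback order concrete, here is a minimal sketch, with hypothetical function and parameter names rather than this repository's actual API: `tokenizer_config.json` is consulted first for the tokenizer class, and the model's `config.json` is only fetched, lazily, when the class is missing there (the old-repo case).

```swift
enum TokenizerLoadingError: Error {
    case missingTokenizerClass
}

/// Resolve the tokenizer class name, preferring tokenizer_config.json and only
/// falling back to the model's config.json for old repos that declare it there.
/// The model config is passed as a throwing closure so it is only downloaded
/// when the fallback is actually needed.
func resolveTokenizerClass(
    tokenizerConfig: [String: Any]?,
    modelConfig: () throws -> [String: Any]?
) throws -> String {
    if let cls = tokenizerConfig?["tokenizer_class"] as? String {
        return cls
    }
    // Fallback for older repos: the class only appears in config.json.
    if let cls = try modelConfig()?["tokenizer_class"] as? String {
        return cls
    }
    throw TokenizerLoadingError.missingTokenizerClass
}
```

Keeping the model config behind a closure would keep the common path from ever touching config.json, which is the direction this PR is heading.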

pcuenca mentioned this pull request on Sep 25, 2025
pcuenca (Member, Author) commented Sep 25, 2025

Merging this for now, will improve the fallback cases later.

pcuenca merged commit b2bc56e into main on Sep 25, 2025
2 checks passed
pcuenca deleted the tokenizers-dont-need-model-config branch on September 25, 2025 at 12:35