chore(deps): update transformers dep #627
This PR contains the following updates:
- accelerate: ==1.10.0 -> ==1.10.1
- sentence-transformers: ==5.1.0 -> ==5.1.1
- transformers: ==4.55.4 -> ==4.56.2
Release Notes
huggingface/accelerate (accelerate)
v1.10.1: Patchfix
Full Changelog: huggingface/accelerate@v1.10.0...v1.10.1
UKPLab/sentence-transformers (sentence-transformers)
v5.1.1: Explicit incorrect arguments, fixes for multi-GPU, evaluator, and hard negatives
This patch makes Sentence Transformers more explicit about incorrect arguments and introduces some fixes for multi-GPU processing, evaluators, and hard negatives mining.
Install this version with pip install sentence-transformers==5.1.1
Error if unused kwargs are passed & get_model_kwargs (#3500)
Some SentenceTransformer or SparseEncoder models support custom model-specific keyword arguments, such as jinaai/jina-embeddings-v4. As of this release, calling model.encode with keyword arguments that aren't used by the model will result in an error. This is quite useful when you, for example, accidentally forget that the parameter to get normalized embeddings is normalize_embeddings; prior to this version, a mistyped parameter would simply be ignored quietly.
To check which custom extra keyword arguments may be used by your model, you can call the new get_model_kwargs method:
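A minimal sketch of that check (the model id is the one mentioned above; it may require trust_remote_code=True, and the exact kwargs returned depend on the checkpoint):

```python
# Minimal sketch: list the model-specific encode() kwargs a checkpoint accepts.
# Assumes sentence-transformers >= 5.1.1; jinaai/jina-embeddings-v4 is the
# example model mentioned above and may require trust_remote_code=True.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True)

# New in 5.1.1: returns the custom keyword arguments this model understands
print(model.get_model_kwargs())

# A mistyped or unsupported kwarg now raises an error instead of being ignored:
# model.encode("hello", normalize=True)  # -> error if `normalize` is not supported
```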
Note: You can always pass the task parameter; it's the only model-specific parameter that will be quietly ignored. This means that you can always use model.encode(..., task="query") and model.encode(..., task="document").
Minor Features
Minor Fixes
- Fix batch_size being ignored in CrossEncoderRerankingEvaluator (#3497)
- During multi-process encode, embeddings are now moved from the various devices to the CPU before being stacked into one tensor (#3488)
- encode_query and encode_document are now used in mine_hard_negatives, automatically using defined "query" and "document" prompts (#3502)
- Fix writing to an output_path that doesn't exist yet (#3516)
- Fix the number of missing negatives in mine_hard_negatives (#3504)
All Changes
- [fix] add batch size parameter to model prediction in CrossEncoderRerankingEvaluator by @emapco in #3497
- [fix] Ensure multi-process embeddings are moved to CPU for concatenation by @tomaarsen in #3488
- [model_card] Don't override manually provided languages in model card by @tomaarsen in #3501
- [tests] Add hard negatives test showing multiple positives are correctly handled by @tomaarsen in #3503
- [feat] Use encode_document and encode_query in mine_hard_negatives by @tomaarsen in #3502
- input_ids, attention_mask, token_type_ids, inputs_embeds to forward by @Samoed in #3509
- [feat] add get_model_kwargs method; throw error if unused kwarg is passed by @tomaarsen in #3500
- [fix] Fix the number of missing negatives in mine_hard_negatives by @tomaarsen in #3504
New Contributors
Full Changelog: UKPLab/sentence-transformers@v5.1.0...v5.1.1
huggingface/transformers (transformers)
v4.56.2: Patch release v4.56.2
v4.56.1: Patch release v4.56.1
This patch most notably fixes an issue with the new dtype argument (replacing torch_dtype) in pipelines!
Bug Fixes & Improvements
v4.56.0: Dino v3, X-Codec, Ovis 2, MetaCLIP 2, Florence 2, SAM 2, Kosmos 2.5, HunYuan, GLM-4.5V
New model additions
Dino v3
DINOv3 is a family of versatile vision foundation models that outperforms the specialized state of the art across a broad range of settings, without fine-tuning. DINOv3 produces high-quality dense features that achieve outstanding performance on various vision tasks, significantly surpassing previous self- and weakly-supervised foundation models.
You can find all the original DINOv3 checkpoints under the DINOv3 collection.
X-Codec
The X-Codec model was proposed in Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model by Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, and Wei Xue.
The X-Codec model is a neural audio codec that integrates semantic information from self-supervised models (e.g., HuBERT) alongside traditional acoustic information. This enables:
Ovis 2
Ovis2 is an updated version of the Ovis model developed by the AIDC-AI team at Alibaba International Digital Commerce Group.
Ovis2 is the latest advancement in multi-modal large language models (MLLMs), succeeding Ovis1.6. It retains the architectural design of the Ovis series, which focuses on aligning visual and textual embeddings, and introduces major improvements in data curation and training methods.
MetaCLIP 2
MetaCLIP 2 is a replication of the original CLIP model trained on 300+ languages. It achieves state-of-the-art (SOTA) results on multilingual benchmarks (e.g., XM3600, CVQA, Babel‑ImageNet), surpassing previous SOTA such as mSigLIP and SigLIP‑2. The authors show that English and non-English worlds can mutually benefit and elevate each other.
Florence 2
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Florence-2 can interpret simple text prompts to perform tasks like captioning, object detection, and segmentation. It leverages the FLD-5B dataset, containing 5.4 billion annotations across 126 million images, to master multi-task learning. The model's sequence-to-sequence architecture enables it to excel in both zero-shot and fine-tuned settings, proving to be a competitive vision foundation model.
SAM 2
SAM2 (Segment Anything Model 2) was proposed in Segment Anything in Images and Videos by Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer.
The model can be used to predict segmentation masks of any object of interest given an input image or video, and input points or bounding boxes.
Kosmos 2.5
The Kosmos-2.5 model was proposed in KOSMOS-2.5: A Multimodal Literate Model by Microsoft.
The abstract from the paper is the following:
We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures into the markdown format. This unified multimodal literate capability is achieved through a shared Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-markdown text generation. Furthermore, the model can be readily adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images. This work also paves the way for the future scaling of multimodal large language models.
HunYuan
More information at release 🤗
Seed OSS
More information at release 🤗
GLM-4.5V
More information at release 🤗
Cache
Beyond a large refactor of the caching system in Transformers, making it much more practical and general, models using sliding-window or chunked attention no longer waste memory when caching past states. This was most notably enabled by:
See the following improvements in memory usage for Mistral (using only sliding layers) and GPT-OSS (1 out of 2 layers is sliding), respectively:
[Figures: memory usage comparison for Mistral and GPT-OSS]
Beyond memory usage, it will also improve generation/forward speed by a large margin for large contexts, as only necessary states are passed to the attention computation, which is very sensitive to the sequence length.
Quantization
MXFP4
Since the GPT-OSS release, which introduced the MXFP4 quantization type, several improvements have been made to its support, which should now stabilize.
- swiglu_limit not passed in for MXFP4 by @danielhanchen in #40197
- [Mxfp4] Add a way to save with a quantization method by @ArthurZucker in #40176
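As an illustration, a hedged sketch of loading and re-saving a checkpoint with MXFP4 quantization; the Mxfp4Config class and the openai/gpt-oss-20b checkpoint are assumptions based on the GPT-OSS release, not taken from the notes above:

```python
# Hedged sketch: load a GPT-OSS checkpoint with MXFP4 quantization, then save it
# locally. Assumptions: Mxfp4Config is the MXFP4 quantization config class and
# openai/gpt-oss-20b is a suitable checkpoint; adjust to your model and hardware.
from transformers import AutoModelForCausalLM, Mxfp4Config

# dequantize=True unpacks the MXFP4 weights to a higher-precision dtype on load;
# leave it out to keep the packed MXFP4 weights on supported GPUs.
quant_config = Mxfp4Config(dequantize=True)

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=quant_config,
    device_map="auto",
)

# Re-save locally; #40176 above adds a way to save with a quantization method.
model.save_pretrained("gpt-oss-20b-local")
```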
New standard
Now that we have deprecated TensorFlow and JAX, we felt that torch_dtype was not only misaligned with torch, but also redundant and hard to remember. For this reason, we switched to the much more standard dtype argument! torch_dtype will still be valid usage for as long as needed to ensure a smooth transition, but new code should use dtype, and we encourage you to update older code as well.
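For illustration, a minimal sketch of the new argument (gpt2 is just a placeholder checkpoint):

```python
# Minimal sketch of the dtype argument; gpt2 is only an illustrative checkpoint.
import torch
from transformers import AutoModelForCausalLM, pipeline

# New style
model = AutoModelForCausalLM.from_pretrained("gpt2", dtype=torch.float16)

# Old style, still accepted during the transition period
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)

# Pipelines take the new argument as well (the v4.56.1 patch above fixes an issue here)
pipe = pipeline("text-generation", model="gpt2", dtype=torch.float16)
```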
Breaking changes
The following commits are breaking changes in workflows that were either buggy or not working as expected.
Saner hub-defaults for hybrid cache implementation
On models where the Hub checkpoint specifies cache_implementation="hybrid" (static sliding-window hybrid cache), this value is now unset, making the model use dynamic sliding-window layers by default.
The old default caused widespread, very slow first generate calls on models with hybrid caches, which should no longer be the case.
- Unset cache_implementation="hybrid" hub defaults by @gante in #40135
Sine positional embeddings for MaskFormer & LRU cache
Cache the computation of sine positional embeddings for MaskFormer; this results in a 6% performance improvement.
Explicit cache initialization
Adds explicit cache initialization to prepare for the deprecation of the from_legacy_cache utility.
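A minimal sketch of what explicit cache initialization looks like (gpt2 and the prompt are illustrative; DynamicCache is one of the cache classes that generate() accepts directly):

```python
# Minimal sketch: create the cache object explicitly and pass it to generate(),
# instead of relying on the legacy tuple format / from_legacy_cache.
# gpt2 and the prompt are only illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Explicit caches are", return_tensors="pt")
past_key_values = DynamicCache()  # explicitly initialized cache

outputs = model.generate(**inputs, past_key_values=past_key_values, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```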
Default compilation with fullgraph=False
Having fullgraph set to True during compilation ended up being very restrictive, especially with the arrival of widely-used MoEs.
Remove decoding strategies
The DoLa decoding strategy was moved a few versions ago to the following remote-code repository: https://huggingface.co/transformers-community/dola
The Contrastive Search decoding strategy was moved a few versions ago to the following remote-code repository: https://huggingface.co/transformers-community/contrastive-search
Both have now been removed from the library as a result.
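To keep using one of these strategies, it can be loaded from its remote-code repository through generate()'s custom_generate argument. A hedged sketch (gpt2 is illustrative, and dola_layers is an argument defined by the remote repository rather than by transformers itself):

```python
# Hedged sketch: run DoLa from the transformers-community/dola remote-code repo
# through the custom_generate argument. gpt2 is only an illustrative model and
# dola_layers is assumed to be defined by that repository, not by transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("DoLa decoding", return_tensors="pt")

outputs = model.generate(
    **inputs,
    custom_generate="transformers-community/dola",
    trust_remote_code=True,
    dola_layers="high",      # assumption: layer-selection argument from the repo
    max_new_tokens=20,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```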
Fix sliding window in flash attention
Flash attention used sliding-window sizes that were off by one. This affected generations whose initial context was larger than the sliding-window size.
- [Flash Attention] Fix sliding window size by @vasqu in #40163
Minimum Torch version is now 2.2
Torch 2.1 support has been unreliable for some time, so we've now made it official and bumped our minimum version to 2.2.
Bugfixes and improvements
- GptOss fixes for green CI by @gante in #39929
- utils/check_bad_commit.py failing due to rate limit (requesting api.github.com) by @ydshieh in #39918
- torch.device('cpu').index being None by @manueldeprada in #39933
- torchcodec is updated by @ydshieh in #39951
- triton_kernels dep replaced with kernels instead by @SunMarc in #39926
- fix_and_overwrite mode of utils/check_docstring.py by @manueldeprada in #39369
- find_file_type by @yonigozlan in #39897
- past_key_value to past_key_valueS everywhere by @Cyrilvallez in #39956
- notification_service.py about time_spent by @ydshieh in #40037
- Revert "notification_service.py about time_spent" by @ydshieh in #40044
- torchcodec==0.5.0 and use torch 2.8 on daily CI by @ydshieh in #40072
- time_spent in notification_service.py by @ydshieh in #40081
- [GPT Big Code] Fix attention scaling by @vasqu in #40041
- ForConditionalGeneration by @qgallouedec in #39973
- is_fast to ImageProcessor by @MilkClouds in #39603
- logger.warning replaced with logger.warning_once in GradientCheckpointingLayer by @qgallouedec in #40091
- [Flash Attention] Fix flash attention integration by @vasqu in #40002
- custom_generate collections by @gante in #39894
- tiny_agents.md to Korean by @AhnJoonSung in #39913
- content inputs for LLMs by @gante in #39829
- decoding_method argument in generate by @manueldeprada in #40085
- generation_config by @gante in #40127
- main_classes/processors.md to Korean by @TaskerJang in #39519
- jamba.md to Korean by @skwh54 in #39890
- main_classes/optimizer_schedules.md to Korean by @luckyvickyricky in #39713
- gpt2.md to Korean by @taemincode in #39808
- optimizers.md to Korean by @chelsseeey in #40011
- pipelines.md to Korean by @xhaktm00 in #39577
- gemma3.md to Korean by @seopp in #39865
- torch_compile_test and torch_export_test by @ydshieh in #39950
- self.tokenizer replaced by self.processing_class by @qgallouedec in #40119
- too long with no output by @ydshieh in #40201
- model_input_names for PixtralImageProcessor by @rohitrango in #40226
- chat_template (jinja2) as an extra dependency by @tboerstad in #40128
- [CI] Fix repo consistency by @vasqu in #40249
Configuration
📅 Schedule: Branch creation - "on friday" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
👻 Immortal: This PR will be recreated if closed unmerged. Get config help if that's undesired.
This PR was generated by Mend Renovate. View the repository job log.