Skip to content

Conversation

renovate[bot]
Copy link
Contributor

@renovate renovate bot commented Sep 5, 2025

Coming soon: The Renovate bot (GitHub App) will be renamed to Mend. PRs from Renovate will soon appear from 'Mend'. Learn more here.

This PR contains the following updates:

Package Change Age Confidence
accelerate ==1.10.0 -> ==1.10.1 age confidence
sentence-transformers ==5.1.0 -> ==5.1.1 age confidence
transformers ==4.55.4 -> ==4.56.2 age confidence

Release Notes

huggingface/accelerate (accelerate)

v1.10.1: : Patchfix

Compare Source

Full Changelog: huggingface/accelerate@v1.10.0...v1.10.1

UKPLab/sentence-transformers (sentence-transformers)

v5.1.1: - Explicit incorrect arguments, fixes for multi-GPU, evaluator, and hard negative

Compare Source

This patch makes Sentence Transformers more explicit with incorrect arguments and introduces some fixes for multi-GPU processing, evaluators, and hard negatives mining.

Install this version with

### Training + Inference
pip install sentence-transformers[train]==5.1.1

### Inference only, use one of:
pip install sentence-transformers==5.1.1
pip install sentence-transformers[onnx-gpu]==5.1.1
pip install sentence-transformers[onnx]==5.1.1
pip install sentence-transformers[openvino]==5.1.1

Error if unused kwargs is passed & get_model_kwargs (#​3500)

Some SentenceTransformer or SparseEncoder models support custom model-specific keyword arguments, such as jinaai/jina-embeddings-v4. As of this release, calling model.encode with keyword arguments that aren't used by the model will result in an error.

>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("all-MiniLM-L6-v2")
>>> model.encode("Who is Amelia Earhart?", normalize=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[sic]/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "[sic]/SentenceTransformer.py", line 983, in encode
    raise ValueError(
ValueError: SentenceTransformer.encode() has been called with additional keyword arguments that this model does not use: ['normalize']. As per SentenceTransformer.get_model_kwargs(), this model does not accept any additional keyword arguments.

Quite useful when you, for example, accidentally forget that the parameter to get normalized embeddings is normalize_embeddings. Prior to this version, this parameter would simply quietly be ignored.

To check which custom extra keyword arguments may be used for your model, you can call the new get_model_kwargs method:

>>> from sentence_transformers import SentenceTransformer, SparseEncoder
>>> SentenceTransformer("all-MiniLM-L6-v2").get_model_kwargs()
[]
>>> SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True).get_model_kwargs()
['task', 'truncate_dim']
>>> SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill").get_model_kwargs()
['task']

Note: You can always pass the task parameter, it's the only model-specific parameter that will be quietly ignored. This means that you can always use model.encode(..., task="query") and model.encode(..., task="document").

Minor Features

Minor Fixes

  • Fix batch_size being ignored in CrossEncoderRerankingEvaluator (#​3497)
  • Fix multi-GPU processing with encode, embeddings are now moved from the various devices to the CPU before being stacked into one tensor (#​3488)
  • Use encode_query and encode_document in mine_hard_negatives, automatically using defined "query" and "document" prompts (#​3502)
  • Fix "Path does not exist" errors when calling an Evaluator with a output_path that doesn't exist yet (#​3516)
  • Fix the number of reported number of missing negatives in mine_hard_negatives (#​3504)

All Changes

New Contributors

Full Changelog: UKPLab/sentence-transformers@v5.1.0...v5.1.1

huggingface/transformers (transformers)

v4.56.2: Patch release v4.56.2

Compare Source

v4.56.1: Patch release v4.56.1

Compare Source

Patch release v4.56.1

This patch most notably fixes an issue with the new dtype argument (replacing torch_dtype) in pipelines!

Bug Fixes & Improvements
  • Fix broken Llama4 accuracy in MoE part (#​40609)
  • fix pipeline dtype (#​40638)
  • Fix self.dropout_p is not defined for SamAttention/Sam2Attention (#​40667)
  • Fix backward compatibility with accelerate in Trainer (#​40668)
  • fix broken offline mode when loading tokenizer from hub (#​40669)
  • [Glm4.5V] fix vLLM support (#​40696)

v4.56.0: v4.56: Dino v3, X-Codec, Ovis 2, MetaCLIP 2, Florence 2, SAM 2, Kosmos 2.5, HunYuan, GLMV-4.5

Compare Source

New model additions
Dino v3

DINOv3 is a family of versatile vision foundation models that outperforms the specialized state of the art across a broad range of settings, without fine-tuning. DINOv3 produces high-quality dense features that achieve outstanding performance on various vision tasks, significantly surpassing previous self- and weakly-supervised foundation models.

You can find all the original DINOv3 checkpoints under the DINOv3 collection.

image
X-Codec

he X-Codec model was proposed in Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model by Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue

The X-Codec model is a neural audio codec that integrates semantic information from self-supervised models (e.g., HuBERT) alongside traditional acoustic information. This enables :

  • Music continuation : Better modeling of musical semantics yields more coherent continuations.
  • Text-to-Sound Synthesis : X-Codec captures semantic alignment between text prompts and generated audio.
  • Semantic aware audio tokenization: X-Codec is used as an audio tokenizer in the YuE lyrics to song generation model.
image
Ovis 2

The Ovis2 is an updated version of the Ovis model developed by the AIDC-AI team at Alibaba International Digital Commerce Group.

Ovis2 is the latest advancement in multi-modal large language models (MLLMs), succeeding Ovis1.6. It retains the architectural design of the Ovis series, which focuses on aligning visual and textual embeddings, and introduces major improvements in data curation and training methods.

MetaCLIP 2

MetaCLIP 2 is a replication of the original CLIP model trained on 300+ languages. It achieves state-of-the-art (SOTA) results on multilingual benchmarks (e.g., XM3600, CVQA, Babel‑ImageNet), surpassing previous SOTA such as mSigLIP and SigLIP‑2. The authors show that English and non-English worlds can mutually benefit and elevate each other.

image
Florence 2

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Florence-2 can interpret simple text prompts to perform tasks like captioning, object detection, and segmentation. It leverages the FLD-5B dataset, containing 5.4 billion annotations across 126 million images, to master multi-task learning. The model's sequence-to-sequence architecture enables it to excel in both zero-shot and fine-tuned settings, proving to be a competitive vision foundation model.

image
SAM 2

SAM2 (Segment Anything Model 2) was proposed in Segment Anything in Images and Videos by Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer.

The model can be used to predict segmentation masks of any object of interest given an input image or video, and input points or bounding boxes.

image
Kosmos 2.5

The Kosmos-2.5 model was proposed in KOSMOS-2.5: A Multimodal Literate Model by Microsoft.

The abstract from the paper is the following:

We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures into the markdown format. This unified multimodal literate capability is achieved through a shared Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-markdown text generation. Furthermore, the model can be readily adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images. This work also paves the way for the future scaling of multimodal large language models.

drawing

HunYuan
image

More information at release 🤗

Seed OSS
image

More information at release 🤗

GLM-4.5V

More information at release 🤗

Cache

Beyond a large refactor of the caching system in Transformers, making it much more practical and general, models using sliding window attention/chunk attention do not waste memory anymore when caching past states. It was allowed most notable by:

See the following improvements on memory usage for Mistral (using only sliding layers) and GPT-OSS (1 out of 2 layers is sliding) respectively: image image

Beyond memory usage, it will also improve generation/forward speed by a large margin for large contexts, as only necessary states are passed to the attention computation, which is very sensitive to the sequence length.

Quantization
MXFP4

Since the GPT-OSS release which introduced the MXPF4 quantization type, several improvements have been made to the support, which should now stabilize.

New standard

Now that we deprecated tensorflow and jax, we felt that torch_dtype was not only misaligned with torch, but was redundant and hard to remember. For this reason, we switched to a much more standard dtype argument!

torch_dtype will still be a valid usage for as long as needed to ensure a smooth transition, but new code should use dtype, and we encourage you to update older code as well!

Breaking changes

The following commits are breaking changes in workflows that were either buggy or not working as expected.

Saner hub-defaults for hybrid cache implementation

On models where the hub checkpoint specifies cache_implementation="hybrid" (static sliding window hybrid cache), UNSETS this value. This will make the model use the dynamic sliding window layers by default.

This default meant that there were widespread super slow 1st generate calls on models with hybrid caches, which should nol onger be the case.

  • 🚨🚨 [generate] ignore cache_implementation="hybrid" hub defaults by @​gante in #​40135
Sine positional embeddings for MaskFormer & LRU cache

Cache the computation of sine positional embeddings for MaskFormer; results in a 6% performance improvement.

Explicit cache initialization

Adds explicit cache initialization to prepare for the deprecation of the from_legacy_cache utility.

Default compilation with fullgraph=False

Having fullgraph set to True during compilation ended up being very restrictive, especially with the arrival of widely-used MoEs.

Remove decoding strategies

The DoLa decoding strategy has been moved to the following remote-code repository a few versions ago: https://huggingface.co/transformers-community/dola

The Contrastive Search decoding strategy has been moved to the following remote-code repository a few versions ago: https://huggingface.co/transformers-community/contrastive-search

Both have now been removed from the library as a result.

Fix sliding window in flash attention

Flash attention has used sliding window sizes which were off by one. This affected generations that had initially bigger contexts than the sliding window size.

Minimum Torch version is now 2.2

Torch 2.1 support has been unreliable for some time, so we've now made it official and bumped our minimum version to 2.2.

Bugfixes and improvements

Configuration

📅 Schedule: Branch creation - "on friday" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

👻 Immortal: This PR will be recreated if closed unmerged. Get config help if that's undesired.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot added the dependencies Pull requests that update a dependency file label Sep 5, 2025
@renovate renovate bot force-pushed the renovate/transformers-dep branch from cb89fe6 to 52bf0c8 Compare September 19, 2025 17:50
@renovate renovate bot force-pushed the renovate/transformers-dep branch from 52bf0c8 to 14b67bb Compare September 22, 2025 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0 participants