Insights: huggingface/transformers
Overview
7 Releases published by 2 people
-
v4.53.2-modernbert-decoder-preview ModernBERT Decoder (based on v4.53.2)
published
Jul 16, 2025 -
v4.53.3 Patch release v4.53.3
published
Jul 22, 2025 -
v4.53.2-Ernie-4.5-preview Ernie-4.5 and Ernie-4.5 MoE (based on v4.53.2)
published
Jul 23, 2025 -
4.54.1 Patch release 4.54.1
published
Jul 29, 2025 -
v4.55.0 v4.55.0: New openai GPT OSS model!
published
Aug 5, 2025 -
4.55.0-GLM-4.5V-preview GLM-4.5V preview based on 4.55.0
published
Aug 11, 2025
364 Pull requests merged by 145 people
-
feat: extract rev in attn_implementation kernels via @
#40009 merged
Aug 11, 2025 -
[GPT Big Code] Fix attention scaling
#40041 merged
Aug 11, 2025 -
chore: standardize DeBERTa model card
#37409 merged
Aug 11, 2025 -
Fix `time_spent` in `notification_service.py`.
#40081 merged
Aug 11, 2025 -
added Textnet fast image processor
#39884 merged
Aug 11, 2025 -
Fix repo consistency
#40077 merged
Aug 11, 2025 -
guard on model.eval when using torch.compile + FSDP2
#37413 merged
Aug 11, 2025 -
Remove deprecated cache-related objects
#40035 merged
Aug 11, 2025 -
fix: move super().__init__ after vision_config init in Mistral3Config
#40063 merged
Aug 11, 2025 -
[gemma3] update conversion key mapping
#39778 merged
Aug 11, 2025 -
[qwen-vl] fix beam search with videos
#39726 merged
Aug 11, 2025 -
fix: resolve triton version check compatibility on windows
#39986 merged
Aug 11, 2025 -
unpin `torchcodec==0.5.0` and use torch 2.8 on daily CI
#40072 merged
Aug 10, 2025 -
Update HuBERT model card according to template
#39742 merged
Aug 10, 2025 -
Revert "fix `notification_service.py` about `time_spent`"
#40044 merged
Aug 8, 2025 -
GLM-4.5V Model Support
#39805 merged
Aug 8, 2025 -
fix `notification_service.py` about `time_spent`
#40037 merged
Aug 8, 2025 -
Bnb failing tests
#40026 merged
Aug 8, 2025 -
Tie weights recursively on all submodels
#39996 merged
Aug 8, 2025 -
[core] Refactor the Cache logic to make it simpler and more general
#39797 merged
Aug 8, 2025 -
Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991)
#40024 merged
Aug 8, 2025 -
Harmonize `past_key_value` to `past_key_values` everywhere
#39956 merged
Aug 8, 2025 -
Fix an annoying flaky test
#40000 merged
Aug 8, 2025 -
Higgs modules_to_not_convert standardization
#39989 merged
Aug 8, 2025 -
Fix broken image inference for Fuyu model
#39915 merged
Aug 8, 2025 -
pin torchcodec==0.5.0 for now with torch 2.7.1 on daily CI
#40013 merged
Aug 7, 2025 -
Update expected output values after #39885 (part 2)
#40015 merged
Aug 7, 2025 -
Raising error when quantizing a quantized model
#39998 merged
Aug 7, 2025 -
docs: fix duplication in 'en/optimizers.md'
#40014 merged
Aug 7, 2025 -
unpin torch<2.8 on circleci
#40012 merged
Aug 7, 2025 -
FA2 can continue generation from cache
#39843 merged
Aug 7, 2025 -
Fix default values of getenv
#39867 merged
Aug 7, 2025 -
Fix HGNetV2 Model Card and Image Classification Pipeline Usage Tips
#39965 merged
Aug 7, 2025 -
fix: remove CHAT_TEMPLATE import in tests for deepseek-vl
#40003 merged
Aug 7, 2025 -
Fix missing video inputs for PerceptionLM.
#39971 merged
Aug 7, 2025 -
Fix int4 quantized model cannot work with cpu
#39724 merged
Aug 7, 2025 -
Update expected output values after #39885 (part 1)
#39990 merged
Aug 7, 2025 -
Fix consistency
#39995 merged
Aug 7, 2025 -
Fix return typehint for decoder and annotate inv_freq
#39610 merged
Aug 7, 2025 -
Bump transformers from 4.48.0 to 4.53.0 in /examples/tensorflow/language-modeling-tpu
#39967 merged
Aug 7, 2025 -
Fix gemma3n feature extractor's incorrect squeeze
#39919 merged
Aug 7, 2025 -
[Idefics] fix device mismatch
#39981 merged
Aug 7, 2025 -
Various test fixes for AMD
#39978 merged
Aug 7, 2025 -
Support input_embeds in torch exportable decoders
#39836 merged
Aug 7, 2025 -
[superglue] Fixed the way batch mask was applied to the scores before match assignment computation
#39968 merged
Aug 7, 2025 -
Gemma3 fixes
#39960 merged
Aug 7, 2025 -
Modular fix: remove the model name in `find_file_type`
#39897 merged
Aug 6, 2025 -
chore: update Deformable_Detr model card
#39902 merged
Aug 6, 2025 -
[bugfix] fix flash_attention_2 unavailable error on Ascend NPU
#39844 merged
Aug 6, 2025 -
Fix `fix_and_overwrite` mode of `utils/check_docstring.py`
#39369 merged
Aug 6, 2025 -
remove `triton_kernels` dep with `kernels` instead
#39926 merged
Aug 6, 2025 -
fix glm4v image process
#39964 merged
Aug 6, 2025 -
fix typo
#39936 merged
Aug 6, 2025 -
Fix grammatical error in MoE variable name: expert_hitted → expert_hit, hitted_experts → hit_experts
#39959 merged
Aug 6, 2025 -
docs: fix typo in 'quantization-aware training'
#39904 merged
Aug 6, 2025 -
Enable gpt-oss mxfp4 on older hardware (sm75+)
#39940 merged
Aug 6, 2025 -
Fix MXFP4 quantizer validation to allow CPU inference with dequantize option
#39953 merged
Aug 6, 2025 -
[docs] ko toc fix
#39927 merged
Aug 6, 2025 -
circleci: pin torch 2.7.1 until `torchcodec` is updated
#39951 merged
Aug 6, 2025 -
Fix CI: Tests failing on CPU due to `torch.device('cpu').index` being None
#39933 merged
Aug 6, 2025 -
Avoid `utils/check_bad_commit.py` failing due to rate limit (requesting `api.github.com`)
#39918 merged
Aug 5, 2025 -
[CI] post-`GptOss` fixes for green CI
#39929 merged
Aug 5, 2025 -
gpt_oss last chat template changes
#39925 merged
Aug 5, 2025 -
Add GPT OSS model from OpenAI
#39923 merged
Aug 5, 2025 -
🌐 [i18n-KO] Translated `cache_explanation.md` to Korean
#39535 merged
Aug 5, 2025 -
Export SmolvLM
#39614 merged
Aug 5, 2025 -
Update object_detection.md
#39909 merged
Aug 5, 2025 -
run model debugging with forward arg
#39905 merged
Aug 5, 2025 -
Revert "remove dtensors, not explicit (#39840)"
#39912 merged
Aug 5, 2025 -
Fix aria tests
#39879 merged
Aug 5, 2025 -
Fix eval thread fork bomb
#39717 merged
Aug 5, 2025 -
Replace video_fps with fps in tests
#39898 merged
Aug 5, 2025 -
Fix misleading WandB error when WANDB_DISABLED is set
#39891 merged
Aug 5, 2025 -
Avoid aliasing in cond's branches for torch 2.8
#39488 merged
Aug 5, 2025 -
Remove unnecessary CUDA sync in qwen2_5_vl
#39870 merged
Aug 5, 2025 -
fix test_working_of_tp failure of accelerate ut
#39828 merged
Aug 5, 2025 -
[Exaone4] Fixes the attn implementation!
#39906 merged
Aug 5, 2025 -
Reorder serving docs
#39634 merged
Aug 5, 2025 -
chore: update DETR model card
#39822 merged
Aug 4, 2025 -
Add support for `ModernBertForMultipleChoice`
#39232 merged
Aug 4, 2025 -
send some feedback when manually building doc via comment
#39889 merged
Aug 4, 2025 -
Update cohere2 vision test
#39888 merged
Aug 4, 2025 -
[DOCS] : Improved mimi model card
#39824 merged
Aug 4, 2025 -
Fix link to models in README
#39880 merged
Aug 4, 2025 -
Better return type hint for `AutoModelForCausalLM` and `AutoModelForImageTextToText`
#39881 merged
Aug 4, 2025 -
Set `torch.backends.cudnn.allow_tf32 = False` for CI
#39885 merged
Aug 4, 2025 -
Replace `Tokenizer` with `PreTrainedTokenizerFast` in `ContinuousBatchProcessor`
#39858 merged
Aug 4, 2025 -
Rework add-new-model-like with modular and make test filenames coherent
#39612 merged
Aug 4, 2025 -
Fix quant docker for fp-quant
#39641 merged
Aug 4, 2025 -
Fix attn_implementation setter for models with `backbone_config`
#39855 merged
Aug 4, 2025 -
Add support for including in-memory videos (not just files/urls) in apply_chat_template
#39494 merged
Aug 4, 2025 -
Use comment to build doc on PRs
#39846 merged
Aug 4, 2025 -
Refactor label name handling for PEFT models in Trainer class
#39265 merged
Aug 4, 2025 -
Improve `is_wandb_available` function to verify WandB installation
#39875 merged
Aug 4, 2025 -
remove dtensors, not explicit
#39840 merged
Aug 1, 2025 -
Allow `TrackioCallback` to work when pynvml is not installed
#39851 merged
Aug 1, 2025 -
Add fast image processor Janus, Deepseek VL, Deepseek VL hybrid
#39739 merged
Aug 1, 2025 -
Fix responses add tests
#39848 merged
Aug 1, 2025 -
Update ux cb
#39845 merged
Aug 1, 2025 -
[WIP] Add MM Grounding DINO
#37925 merged
Aug 1, 2025 -
Export private symbols
#39729 merged
Aug 1, 2025 -
[`attn_implementation`] remove recursive, allows custom kernels with wrappers
#39823 merged
Aug 1, 2025 -
[VLMs] split out "get placeholder mask" to helper
#39777 merged
Aug 1, 2025 -
Fix tp cb
#39838 merged
Aug 1, 2025 -
Fix bad markdown links
#39819 merged
Jul 31, 2025 -
Fix broken links
#39809 merged
Jul 31, 2025 -
[cohere2 vision] move doc to multimodal section
#39820 merged
Jul 31, 2025 -
Update documentation for Cohere2Vision models
#39817 merged
Jul 31, 2025 -
[Model] Cohere2 Vision
#39810 merged
Jul 31, 2025 -
[docs] fix korean docs yet again
#39813 merged
Jul 31, 2025 -
feat(tokenization): add encode_message to tokenize messages one by one
#39507 merged
Jul 31, 2025 -
fix: providing a tensor to cache_position in model.generate kwargs always crashes because of boolean test
#39300 merged
Jul 30, 2025 -
Add callback to monitor progress in whisper transcription
#37483 merged
Jul 30, 2025 -
Update mT5 model card
#39702 merged
Jul 30, 2025 -
chore: update cohere2 (Command R7B) model card
#39604 merged
Jul 30, 2025 -
standardized BARThez model card
#39701 merged
Jul 30, 2025 -
Fix re-compilations for cross attention cache
#39788 merged
Jul 30, 2025 -
Simplify conditional code
#39781 merged
Jul 30, 2025 -
Fix an invalid condition
#39762 merged
Jul 30, 2025 -
fix chameleonvision UT failure
#39646 merged
Jul 30, 2025 -
Super tiny update
#39727 merged
Jul 30, 2025 -
more info in `model_results.json`
#39783 merged
Jul 30, 2025 -
[ASR pipeline] fix with datasets 4.0
#39504 merged
Jul 30, 2025 -
enable static cache on vision encoder decoder
#39773 merged
Jul 30, 2025 -
Fix Evolla and xLSTM tests
#39769 merged
Jul 30, 2025 -
Don't set `run_name` when none
#39695 merged
Jul 30, 2025 -
Standardize CLAP model card format
#39738 merged
Jul 29, 2025 -
docs: Update EfficientLoFTR documentation
#39620 merged
Jul 29, 2025 -
Fix OmDet test after arg deprecation
#39766 merged
Jul 29, 2025 -
Remove python3.7 reference from doc link
#39706 merged
Jul 29, 2025 -
[docs] Ko doc fixes after toc update
#39660 merged
Jul 29, 2025 -
Fix Cache.max_cache_len max value for Hybrid models
#39737 merged
Jul 29, 2025 -
fix(trainer): Correct loss scaling for incomplete gradient accumulation steps
#39659 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `how_to_hack_models.md` to Korean
#39536 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `perf_train_gpu_one.md` to Korean
#39552 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `pipeline_gradio.md` to Korean
#39520 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `tokenizer.md` to Korean
#39532 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `tvp.md` to Korean
#39578 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated albert.md to Korean
#39524 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `main_classes/peft.md`
#39515 merged
Jul 29, 2025 -
[modernbert] fix regression
#39750 merged
Jul 29, 2025 -
add `libcst` to `extras["testing"]` in `setup.py`
#39761 merged
Jul 29, 2025 -
Fix version issue in modeling_utils.py
#39759 merged
Jul 29, 2025 -
Enable xpu allocator on caching_allocator_warmup
#39654 merged
Jul 29, 2025 -
Support loading Qwen3 MoE GGUF
#39638 merged
Jul 29, 2025 -
Fix GPT2 with cross attention
#39754 merged
Jul 29, 2025 -
Avoid OOM when other tests are failing
#39758 merged
Jul 29, 2025 -
AMD disable torchcodec
#39757 merged
Jul 29, 2025 -
Use `--gpus all` in workflow files
#39752 merged
Jul 29, 2025 -
Apply several ruff SIM rules
#37283 merged
Jul 29, 2025 -
Fix mamba regression
#39728 merged
Jul 29, 2025 -
Update IMPORTANT_MODELS list
#39734 merged
Jul 29, 2025 -
update `GemmaIntegrationTest::test_model_2b_bf16_dola` again
#39731 merged
Jul 29, 2025 -
Fix: add back base model plan
#39733 merged
Jul 29, 2025 -
[Fix] import two missing typos in `models/__init__.py` for typo checking
#39745 merged
Jul 29, 2025 -
fix cache inheritance
#39748 merged
Jul 29, 2025 -
extend more trainer test cases to XPU, all pass
#39652 merged
Jul 29, 2025 -
BLIPs clean-up
#35560 merged
Jul 29, 2025 -
Add Fast Segformer Processor
#37024 merged
Jul 28, 2025 -
Superpoint fast image processor
#37804 merged
Jul 28, 2025 -
Fix AMD dockerfile for audio models
#39669 merged
Jul 28, 2025 -
Fix cache-related tests
#39676 merged
Jul 28, 2025 -
Fix Layer device placement in Caches
#39732 merged
Jul 28, 2025 -
Fix `Qwen2AudioForConditionalGeneration.forward()` and `test_flash_attn_kernels_inference_equivalence`
#39503 merged
Jul 28, 2025 -
skip `Glm4MoeModelTest::test_torch_compile_for_training`
#39670 merged
Jul 28, 2025 -
Update `QAPipelineTests::test_large_model_course` after #39193
#39666 merged
Jul 28, 2025 -
mllama outputs refactor
#39643 merged
Jul 28, 2025 -
Remove all expired deprecation cycles
#39725 merged
Jul 28, 2025 -
[CI] Add Eric to comment slow ci
#39601 merged
Jul 28, 2025 -
PATCH: add back n-dim device-mesh + fix tp trainer saving
#39693 merged
Jul 28, 2025 -
Add self-hosted runner scale set workflow for mi325 CI
#39651 merged
Jul 28, 2025 -
[configuration] remove redundant `classmethod`
#38812 merged
Jul 28, 2025 -
update ernie model card
#39657 merged
Jul 28, 2025 -
[processors] add tests for helper fn
#39629 merged
Jul 28, 2025 -
xpu optimization for generation case
#39573 merged
Jul 28, 2025 -
fix(tokenization): check token.content for trie
#39587 merged
Jul 28, 2025 -
Fix missing initialization of `FastSpeech2Conformer`
#39689 merged
Jul 28, 2025 -
fix missing model._tp_size from ep refactor
#39688 merged
Jul 26, 2025 -
More robust tied weight test
#39681 merged
Jul 25, 2025 -
Add padding-free to Granite hybrid moe models
#39677 merged
Jul 25, 2025 -
Fix tied weight test
#39680 merged
Jul 25, 2025 -
fix break for ckpt without _tp_plan
#39658 merged
Jul 25, 2025 -
Add EXAONE 4.0 model
#39129 merged
Jul 25, 2025 -
Support `typing.Literal` as type of tool parameters or return value
#39633 merged
Jul 25, 2025 -
Add ep
#39501 merged
Jul 25, 2025 -
bad_words_ids no longer slow on mps
#39556 merged
Jul 25, 2025 -
Add xlstm model
#39665 merged
Jul 25, 2025 -
Use auto_docstring for perception_lm fast image processor
#39679 merged
Jul 25, 2025 -
fix: HWIO to OIHW
#39200 merged
Jul 25, 2025 -
Fix auto_docstring crashing when dependencies are missing
#39564 merged
Jul 25, 2025 -
Add support for DeepseekAI's DeepseekVL
#36248 merged
Jul 25, 2025 -
Add missing flag for CacheLayer
#39678 merged
Jul 25, 2025 -
Add evolla rebase main
#36232 merged
Jul 25, 2025 -
update expected outputs for whisper after #38778
#39304 merged
Jul 25, 2025 -
fix `kyutai` tests
#39416 merged
Jul 25, 2025 -
Fixes the BC
#39636 merged
Jul 25, 2025 -
Delete bad rebasing functions
#39672 merged
Jul 25, 2025 -
[Ernie 4.5] Post merge adaptations
#39664 merged
Jul 25, 2025 -
[CI] revert device in `test_export_static_cache`
#39662 merged
Jul 25, 2025 -
Fix ModernBERT Decoder model
#39671 merged
Jul 25, 2025 -
🚨[Fast Image Processor] Force Fast Image Processor for Qwen2_VL/2_5_VL + Refactor
#39591 merged
Jul 25, 2025 -
Rename huggingface_cli to hf
#39630 merged
Jul 25, 2025 -
fix(voxtral): correct typo in apply_transcription_request
#39572 merged
Jul 25, 2025 -
make fixup
#39661 merged
Jul 25, 2025 -
[docs] fix ko cache docs
#39644 merged
Jul 25, 2025 -
Make pytorch examples UV-compatible
#39635 merged
Jul 25, 2025 -
revert change to cu_seqlen_k and max_k when preparing from position_ids
#39653 merged
Jul 25, 2025 -
Fix: explicit not none check for tensors in flash attention
#39639 merged
Jul 25, 2025 -
[attention] fix test for packed padfree masking
#39582 merged
Jul 25, 2025 -
Add owlv2 fast processor
#39041 merged
Jul 25, 2025 -
revert behavior of _prepare_from_posids
#39622 merged
Jul 24, 2025 -
[Voxtral] values for A10 runners
#39605 merged
Jul 24, 2025 -
[timm] new timm pin
#39640 merged
Jul 24, 2025 -
Fix EfficientLoFTR model id in tests
#39621 merged
Jul 24, 2025 -
Update recent processors for vLLM backend
#39583 merged
Jul 24, 2025 -
[Docs] Translate audio_classification.md from English to Spanish
#39513 merged
Jul 23, 2025 -
standardized YOLOS model card according to template in #36979
#39528 merged
Jul 23, 2025 -
Feature/standardize opt model card
#39568 merged
Jul 23, 2025 -
🔴 Fix EnCodec internals and integration tests
#39431 merged
Jul 23, 2025 -
Fix DAC integration tests and checkpoint conversion.
#39313 merged
Jul 23, 2025 -
Move openai import
#39613 merged
Jul 23, 2025 -
Transformers serve VLM
#39454 merged
Jul 23, 2025 -
Fix important models CI
#39576 merged
Jul 23, 2025 -
Fix typos and grammar issues in documentation and code
#39598 merged
Jul 23, 2025 -
Allow `device_mesh` to have multiple dims
#38949 merged
Jul 23, 2025 -
enable triton backend on awq xpu
#39443 merged
Jul 23, 2025 -
[idefics3] fix for vLLM
#39470 merged
Jul 23, 2025 -
fix moe routing_weights
#39581 merged
Jul 23, 2025 -
FP-Quant support
#38696 merged
Jul 23, 2025 -
Rename `supports_static_cache` to `can_compile_fullgraph`
#39505 merged
Jul 23, 2025 -
[Trackio] Allow single-gpu training and monitor power
#39595 merged
Jul 23, 2025 -
Generic task-specific base classes
#39584 merged
Jul 23, 2025 -
Fix DynamicCache and simplify Cache classes a bit
#39590 merged
Jul 23, 2025 -
Mask2former & Maskformer Fast Image Processor
#35685 merged
Jul 23, 2025 -
🎯 Trackio integration
#38814 merged
Jul 22, 2025 -
[WIP] Add OneformerFastImageProcessor
#38343 merged
Jul 22, 2025 -
Fix link in "Inference server backends" doc
#39589 merged
Jul 22, 2025 -
Torchdec RuntimeError catch
#39580 merged
Jul 22, 2025 -
[Paged-Attention] Handle continuous batching for repetition penalty
#39457 merged
Jul 22, 2025 -
updated mistral3 model card
#39531 merged
Jul 22, 2025 -
Update `docs/source/ko/_toctree.yml`
#39516 merged
Jul 22, 2025 -
[cache refactor] Move all the caching logic to a per-layer approach
#39106 merged
Jul 22, 2025 -
General weight initialization scheme
#39579 merged
Jul 22, 2025 -
Add AMD GPU expectations for LLaVA tests
#39486 merged
Jul 22, 2025 -
Kernels flash attn
#39474 merged
Jul 22, 2025 -
Add AMD expectations to Mistral3 tests
#39481 merged
Jul 22, 2025 -
[docs] Create page on inference servers with transformers backend
#39550 merged
Jul 22, 2025 -
[docs] update attention implementation and cache docs
#39547 merged
Jul 22, 2025 -
Add AMD test expectations to DETR model
#39539 merged
Jul 22, 2025 -
feat: add support for gradient checkpointing for TimmWrapperModel and TimmWrapperForImageClassification
#39287 merged
Jul 22, 2025 -
Fixes needed for n-d parallelism and TP
#39562 merged
Jul 22, 2025 -
Bump AMD container for 2.7.1 PyTorch
#39458 merged
Jul 22, 2025 -
Add EfficientLoFTR model
#36355 merged
Jul 22, 2025 -
[gemma3] fix bidirectional image mask
#39396 merged
Jul 22, 2025 -
Update OLMoE model card
#39344 merged
Jul 21, 2025 -
Update modernbertdecoder docs
#39453 merged
Jul 21, 2025 -
[CI] Fix post merge ernie 4.5
#39561 merged
Jul 21, 2025 -
[Fast image processors] Improve handling of image-like inputs other than images (segmentation_maps)
#39489 merged
Jul 21, 2025 -
[Ernie 4.5] Add ernie text models
#39228 merged
Jul 21, 2025 -
Refactor embedding input/output getter/setter
#39339 merged
Jul 21, 2025 -
🌐 [i18n-KO] Translated `perf_infer_gpu_multi.md` to Korean
#39441 merged
Jul 21, 2025 -
[Fast image processor] refactor fast image processor glm4v
#39490 merged
Jul 21, 2025 -
fix ndim check of device_mesh for TP
#39538 merged
Jul 21, 2025 -
Refactor `MambaCache` to `modeling_mamba.py`
#38086 merged
Jul 21, 2025 -
Fix Docstring of BarkProcessor
#39546 merged
Jul 21, 2025 -
use the enable_gqa param in torch.nn.functional.scaled_dot_product_at…
#39412 merged
Jul 21, 2025 -
Fix missing initializations for models created in 2023
#39239 merged
Jul 21, 2025 -
Raise `TypeError` instead of ValueError for invalid types
#38660 merged
Jul 21, 2025 -
Fix pylint warnings
#39477 merged
Jul 21, 2025 -
Fix Qwen Omni integration test
#39553 merged
Jul 21, 2025 -
🚨🚨🚨 [Trainer] Enable `average_tokens_across_devices` by default in `TrainingArguments`
#39395 merged
Jul 21, 2025 -
Rename `_supports_flash_attn_2` in examples and tests
#39471 merged
Jul 21, 2025 -
Fix the check in flex test
#39548 merged
Jul 21, 2025 -
Fix bad tensor shape in failing Hubert test.
#39502 merged
Jul 21, 2025 -
GLM-4 Update
#39393 merged
Jul 21, 2025 -
[qwen2 vl] fix packing with all attentions
#39447 merged
Jul 21, 2025 -
[gemma3] support sequence classification task
#39465 merged
Jul 21, 2025 -
Fix placeholders replacement logic in auto_docstring
#39433 merged
Jul 18, 2025 -
Update SAM/SAM HQ attention implementation + fix Cuda sync issues
#39386 merged
Jul 18, 2025 -
Improve @auto_docstring doc and rename `args_doc.py` to `auto_docstring.py`
#39439 merged
Jul 18, 2025 -
Add fast image processor SAM
#39385 merged
Jul 18, 2025 -
Fix BatchEncoding.to() for nested elements
#38985 merged
Jul 18, 2025 -
[gemma3] Fix do_convert_rgb in image processors.
#39438 merged
Jul 18, 2025 -
[chat template] return assistant mask in processors
#38545 merged
Jul 18, 2025 -
[dependencies] Update `datasets` pin
#39500 merged
Jul 18, 2025 -
Slack CI bot: set default result for non-existing artifacts
#39499 merged
Jul 18, 2025 -
🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling
#39423 merged
Jul 18, 2025 -
[doc builder job] temporary pyarrow pin
#39496 merged
Jul 18, 2025 -
Add voxtral
#39429 merged
Jul 18, 2025 -
Fix typing order
#39467 merged
Jul 17, 2025 -
Add unified logits_to_keep support to LLMClass
#39472 merged
Jul 17, 2025 -
[serve] Add speech to text (`/v1/audio/transcriptions`)
#39434 merged
Jul 17, 2025 -
Update integration_utils.py
#39469 merged
Jul 17, 2025 -
fix: ImageTextToTextPipeline handles user-defined generation_config
#39374 merged
Jul 17, 2025 -
Enable some ruff checks for performance and readability
#39383 merged
Jul 17, 2025 -
Fix convert_and_export_with_cache failures for GPU models
#38976 merged
Jul 17, 2025 -
Update `GemmaIntegrationTest::test_model_2b_bf16_dola`
#39362 merged
Jul 17, 2025 -
fix a comment typo in utils.py
#39459 merged
Jul 17, 2025 -
Use newer typing notation
#38934 merged
Jul 17, 2025 -
Fix tests due to breaking change in accelerate
#39451 merged
Jul 17, 2025 -
fix max_length calculating using cu_seq_lens
#39341 merged
Jul 17, 2025 -
fix(pipelines): QA pipeline returns fewer than top_k results in batch mode
#39193 merged
Jul 17, 2025 -
Corrections to PR #38642 and enhancements to Wav2Vec2Processor __call__ and pad docstrings
#38822 merged
Jul 16, 2025 -
create ijepa modelcard (ref : PR #36979 ).
#39354 merged
Jul 16, 2025 -
Improve grammar and clarity in perf_hardware.md
#39428 merged
Jul 16, 2025 -
fix cached file error when repo type is dataset
#36909 merged
Jul 16, 2025 -
Fix indentation bug in SmolVLM image processor causing KeyError
#39452 merged
Jul 16, 2025 -
Updated Megatron conversion script for gpt2 checkpoints
#38969 merged
Jul 16, 2025 -
[CI] Fix partially red CI
#39448 merged
Jul 16, 2025 -
Fixes #39204: add fallback if get_base_model missing
#39226 merged
Jul 16, 2025 -
make the loss context manager easier to extend
#39321 merged
Jul 16, 2025 -
Remove something that should have never been there
#38254 merged
Jul 16, 2025 -
Fix processor tests
#39450 merged
Jul 16, 2025 -
[Bugfix] [Quantization] Remove unused init arg
#39324 merged
Jul 16, 2025 -
Better typing for model.config
#39132 merged
Jul 16, 2025 -
Fix typo in generation configuration for Janus model weight conversion
#39432 merged
Jul 16, 2025 -
Responses API in `transformers serve`
#39155 merged
Jul 16, 2025 -
[cache] make all classes cache compatible finally
#38635 merged
Jul 16, 2025 -
docs: add missing numpy import to minimal example
#39444 merged
Jul 16, 2025 -
Remove runtime conditions for type checking
#37340 merged
Jul 16, 2025 -
Add StableAdamW Optimizer
#39446 merged
Jul 16, 2025 -
add test scanner
#39419 merged
Jul 16, 2025 -
Fix missing definition of diff_file_url in notification service
#39445 merged
Jul 16, 2025 -
Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer
#31870 merged
Jul 16, 2025 -
Change log level from warning to info for scheduled request logging in `ContinuousBatchProcessor`
#39372 merged
Jul 16, 2025 -
Defaults to adamw_torch_fused for Pytorch>=2.8
#37358 merged
Jul 16, 2025 -
Fix L270 - hasattr("moe_args") returning False error
#38715 merged
Jul 16, 2025 -
[chat template] add a testcase for kwargs
#39415 merged
Jul 16, 2025 -
Fixed a bug calculating cross entropy loss in `JetMoeForCausalLM`
#37830 merged
Jul 16, 2025 -
Remove double soft-max in load-balancing loss. Fixes #39055 .
#39056 merged
Jul 16, 2025 -
[Core] [Offloading] Fix saving offloaded submodules
#39280 merged
Jul 16, 2025 -
[autodocstring] add video and audio inputs
#39420 merged
Jul 16, 2025 -
Responses API (to be merged into #39155)
#39338 merged
Jul 16, 2025 -
CI workflow for performed test regressions
#39198 merged
Jul 16, 2025 -
docs: update LightGlue docs
#39407 merged
Jul 15, 2025 -
docs: update SuperGlue docs
#39406 merged
Jul 15, 2025 -
[vlm] fix loading of retrieval VLMs
#39242 merged
Jul 15, 2025 -
handle training summary when creating modelcard but offline mode is set
#37095 merged
Jul 15, 2025 -
Remove residual quantization attribute from dequantized models
#39373 merged
Jul 15, 2025 -
Remove deprecated audio utils functions
#39330 merged
Jul 15, 2025 -
Fix bugs in pytorch example run_clm when streaming is enabled
#39286 merged
Jul 15, 2025 -
Fix bugs from pipeline preprocessor overhaul
#39425 merged
Jul 15, 2025 -
refactor: remove `set_tracer_provider` and `set_meter_provider` calls
#39422 merged
Jul 15, 2025 -
Fix invalid property
#39384 merged
Jul 15, 2025 -
set document_question_answering pipeline _load_tokenizer to True
#39411 merged
Jul 15, 2025 -
Ignore extra position embeddings weights for ESM
#39063 merged
Jul 15, 2025 -
support loading qwen3 gguf
#38645 merged
Jul 15, 2025 -
Add ModernBERT Decoder Models - ModernBERT, but trained with CLM!
#38967 merged
Jul 15, 2025 -
Fix typo in `/v1/models` output payload
#39414 merged
Jul 15, 2025 -
[refactor] set attention implementation
#38974 merged
Jul 15, 2025 -
Fix/siglip2 pooling comment
#39378 merged
Jul 14, 2025 -
Update phi4_multimodal.md
#38830 merged
Jul 14, 2025 -
[Docs] Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic
#39391 merged
Jul 14, 2025 -
Use np.pad instead of np.lib.pad.
#39346 merged
Jul 14, 2025 -
🚨 Totally rewrite how pipelines load preprocessors
#38947 merged
Jul 14, 2025 -
Remove do_reduce_labels Argument from model initialization in run_semantic_segmentation_no_trainer
#39322 merged
Jul 14, 2025 -
Fix Lfm2 and common tests
#39398 merged
Jul 14, 2025 -
Deprecate AutoModelForVision2Seq
#38900 merged
Jul 14, 2025 -
[Qwen2.5-VL] Fix torch.finfo() TypeError for integer attention_mask_tensor
#39333 merged
Jul 14, 2025 -
[BLIP] remove cache from Qformer
#39335 merged
Jul 14, 2025 -
[shieldgemma] fix checkpoint loading
#39348 merged
Jul 14, 2025 -
Fix overriding Fast Image/Video Processors instance attributes affect other instances
#39363 merged
Jul 12, 2025 -
update docker file to use latest `timm` (for `perception_lm`)
#39380 merged
Jul 12, 2025
190 Pull requests opened by 128 people
-
Add Apertus
#39381 opened
Jul 12, 2025 -
Fix: Docker Build Vulnerable to Malicious Package Installation Attack in docker/custom-tokenizers.dockerfile
#39394 opened
Jul 14, 2025 -
No repeat kv
#39402 opened
Jul 14, 2025 -
Add Vocos model
#39403 opened
Jul 14, 2025 -
Add a unit test for BartModel to compare eager, sdpa on one particular set of inputs
#39435 opened
Jul 15, 2025 -
Fix logger warnings in Gemma model test files
#39449 opened
Jul 16, 2025 -
Add eurobert
#39455 opened
Jul 16, 2025 -
Fix quantized model initialization for int8 dtypes
#39456 opened
Jul 16, 2025 -
Skipping `initialize_weights` when model is quantized
#39464 opened
Jul 17, 2025 -
README: Update Bert Japanese model card
#39466 opened
Jul 17, 2025 -
Fix quantized model dispatch with device_map='auto'
#39468 opened
Jul 17, 2025 -
Fix Bark failing tests
#39478 opened
Jul 17, 2025 -
Add model arcinstitute state
#39480 opened
Jul 17, 2025 -
Bye bye env vars, keep everything as configs
#39483 opened
Jul 17, 2025 -
Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling
#39485 opened
Jul 17, 2025 -
Update CTRL model card with improved usage examples and documentation notes
#39487 opened
Jul 17, 2025 -
Fix: Skip weight initialization for quantized int8 models
#39491 opened
Jul 17, 2025 -
[Voxtral] nit + pin correct mistral common version
#39493 opened
Jul 18, 2025 -
Make sure Moshi is exportable with static cache
#39506 opened
Jul 18, 2025 -
[WIP] :broom: :broom: :broom: Get set decoder cleanup
#39509 opened
Jul 18, 2025 -
🌐 [i18n-KO] Translated `compressed_tensor.md` to Korean
#39517 opened
Jul 19, 2025 -
🌐 [i18n-KO] Translated `models.md` to Korean
#39518 opened
Jul 19, 2025 -
🌐 [i18n-KO] Translated `main_classes/processors.md` to Korean
#39519 opened
Jul 19, 2025 -
build: Add fast image processor tvp
#39529 opened
Jul 20, 2025 -
Add Beit3 model
#39534 opened
Jul 20, 2025 -
Add Muon optimizer implementation and integration
#39541 opened
Jul 20, 2025 -
🌐 [i18n-KO] Translated `feature_extractors.md` to Korean
#39544 opened
Jul 21, 2025 -
[WIP] try to relax the tie_weights method
#39555 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `imageprocessor.md` to Korean
#39557 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `main_classes/deepspeed.md` to Korean
#39559 opened
Jul 21, 2025 -
fix load_model_end = true work when save_steps < eval_steps
#39560 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `vision-encoder-decoder.md` to Korean
#39563 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `auto_docstring.md` to Korean
#39571 opened
Jul 22, 2025 -
feat(autoformer): Improve ValueError for insufficient sequence length
#39574 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `vitpose.md` to Korean
#39575 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `pipelines.md` to Korean
#39577 opened
Jul 22, 2025 -
[`Ernie 4.5`] Ernie VL models
#39585 opened
Jul 22, 2025 -
WIP, reference modeling
#39588 opened
Jul 22, 2025 -
Add Fast Image Processor for ImageGPT
#39592 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated 'xclip.md' to Korean
#39594 opened
Jul 22, 2025 -
Fix: check TrainerState file exists before loading during resume
#39599 opened
Jul 23, 2025 -
[video processors] decode only sampled videos -> less RAM and faster processing
#39600 opened
Jul 23, 2025 -
feat: add `is_fast` to ImageProcessor
#39603 opened
Jul 23, 2025 -
HunYuan opensource
#39606 opened
Jul 23, 2025 -
Chat schemas
#39609 opened
Jul 23, 2025 -
Fix FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39617 opened
Jul 23, 2025 -
fix tensor device when loading state dict
#39623 opened
Jul 24, 2025 -
Fix: allow Union[str, dict, None] fields like deepspeed to be passed via CLI
#39625 opened
Jul 24, 2025 -
[serve] Add speech-to-text
#39631 opened
Jul 24, 2025 -
fix dead NVIDIA link
#39632 opened
Jul 24, 2025 -
🌐 [i18n-KO] Translated `deepseek_v3.md` to Korean
#39649 opened
Jul 24, 2025 -
Fix loss scaling and token aggregation to use only data parallel group
#39674 opened
Jul 25, 2025 -
[BugFix]: Support dict and config file path for deepspeed
#39675 opened
Jul 25, 2025 -
Fix issue #39191 respect accelerate config to disable torch.dynamo compilation
#39683 opened
Jul 25, 2025 -
Allow custom hf_quantizer in from_pretrained
#39690 opened
Jul 26, 2025 -
fix misspelled issues
#39691 opened
Jul 26, 2025 -
use untyped storage for dtensors due to deprecation
#39697 opened
Jul 26, 2025 -
Fix exaone4 layer_types ZeroDivision/TypeError when sliding_window_pattern is None/"LLLG"
#39698 opened
Jul 26, 2025 -
Fix Causality Handling in Flash Attention to Support Bidirectional Attention
#39707 opened
Jul 27, 2025 -
🌐[i18n-bn] Introduce Bengali version of Transformers documentation
#39708 opened
Jul 27, 2025 -
🌐 [i18n-KO] Translated `attention_interface.md` to Korean
#39712 opened
Jul 27, 2025 -
🌐 [i18n-KO] Translated `main_classes/optimizer_schedules.md` to Korean
#39713 opened
Jul 27, 2025 -
🌐 [i18n-KO] Translated `main_classes/backbones.md` to Korean
#39714 opened
Jul 27, 2025 -
Fix SigLIP2 documentation model/processor mismatch
#39718 opened
Jul 28, 2025 -
[Feat] Adding Intern-S1
#39722 opened
Jul 28, 2025 -
handle multimodal models with tp_plan on the text_config
#39735 opened
Jul 28, 2025 -
[Tests] [Bugfix] Make weights tied for `dynamic_tied_weights` test
#39740 opened
Jul 28, 2025 -
Fix HfArgumentParser to filter out dict types from Union
#39741 opened
Jul 28, 2025 -
Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock
#39743 opened
Jul 29, 2025 -
🌐 [i18n-KO] Translated `text-to-speech.md` to Korean
#39751 opened
Jul 29, 2025 -
Fix rope_deltas corruption in Qwen2.5VL during CFG generation
#39756 opened
Jul 29, 2025 -
[Draft] Add Llasa TTS family of models
#39760 opened
Jul 29, 2025 -
Improve Gemma3n model and tests
#39764 opened
Jul 29, 2025 -
Stop using `from_legacy_cache` as Cache initialization
#39765 opened
Jul 29, 2025 -
Benchmarking improvements
#39768 opened
Jul 29, 2025 -
[Bugfix] Fix `AutoModel.from_pretrained(..., quantization_config=None)` regression
#39770 opened
Jul 29, 2025 -
Fix missing initializations for models created in 2022
#39772 opened
Jul 30, 2025 -
Use `dtype` instead of `torch_dtype` everywhere!
#39782 opened
Jul 30, 2025 -
fix mllama integration tests
#39785 opened
Jul 30, 2025 -
Fix pil dependency torch extra
#39790 opened
Jul 30, 2025 -
Served models handle with nested content
#39792 opened
Jul 30, 2025 -
Fix DAC conversion script
#39793 opened
Jul 30, 2025 -
Fix ProphetNet forward to handle tuple encoder_outputs
#39794 opened
Jul 30, 2025 -
[pipelines] text-to-audio pipeline standardization
#39796 opened
Jul 30, 2025 -
Mistral: Add support for interleaved attention
#39799 opened
Jul 30, 2025 -
[WIP] Add EdgeTAM
#39800 opened
Jul 30, 2025 -
fix: qwen 25vl rope if item is masked
#39802 opened
Jul 30, 2025 -
Enable SIM rules
#39806 opened
Jul 31, 2025 -
🌐 [i18n-KO] Translated `bamba.md` to Korean
#39807 opened
Jul 31, 2025 -
🌐 [i18n-KO] Translated `gpt2.md` to Korean
#39808 opened
Jul 31, 2025 -
[chat template] update when "push_to_hub"
#39815 opened
Jul 31, 2025 -
Refactor vit-like models
#39816 opened
Jul 31, 2025 -
Support MetaCLIP 2
#39821 opened
Jul 31, 2025 -
[serve] guard imports
#39825 opened
Jul 31, 2025 -
Add MetaCLIP 2
#39826 opened
Jul 31, 2025 -
[serve] allow array `content` inputs for LLMs
#39829 opened
Jul 31, 2025 -
refactor(modeling_llama): make RotaryEmbedding default path explicit
#39831 opened
Jul 31, 2025 -
add step3v in VLMS
#39837 opened
Aug 1, 2025 -
[WIP] RoPE refactor
#39847 opened
Aug 1, 2025 -
Fix DeepSpeed mixed precision precedence over Accelerate defaults
#39856 opened
Aug 1, 2025 -
WIP: Initial support for bnb 4bit on any nn.Parameter
#39859 opened
Aug 1, 2025 -
🌐 [i18n-KO] Translated grounding-dino.md to Korean
#39861 opened
Aug 2, 2025 -
Update model card for gpt neox japanese
#39862 opened
Aug 2, 2025 -
🌐 [i18n-KO] Translated `chat_extras.md` to Korean
#39863 opened
Aug 2, 2025 -
🌐 [i18n-KO] Translated `gemma3.md` to Korean
#39865 opened
Aug 2, 2025 -
make sure model.save_pretrained has the correct is_main_process
#39866 opened
Aug 2, 2025 -
Update README.md
#39869 opened
Aug 3, 2025 -
fix: Catch correct ConnectionError for additional_chat_templates
#39874 opened
Aug 3, 2025 -
FP-Quant NVFP4 and Python 3.9 support
#39876 opened
Aug 3, 2025 -
Remove deprecated max_size parameter from ConditionalDetrImageProcessor
#39883 opened
Aug 4, 2025 -
🌐 [i18n-KO] Translated `perf_train_gaudi.md` to Korean
#39886 opened
Aug 4, 2025 -
🌐 [i18n-KO] Translated `jamba.md` to Korean
#39890 opened
Aug 4, 2025 -
[docs] Add reference to HF-maintained `custom_generate` collections
#39894 opened
Aug 4, 2025 -
Add Videoprism
#39895 opened
Aug 4, 2025 -
[model] Support MiniCPM-V 4.0
#39899 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `fp_quant` to Korean
#39901 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated clipseg.md to Korean
#39903 opened
Aug 5, 2025 -
Update dynamic attnt setter for multimodals
#39908 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `tiny_agents.md` to Korean
#39913 opened
Aug 5, 2025 -
🌐 [i18n-KO] Updated ko/perf_train_cpu.md
#39917 opened
Aug 5, 2025 -
🌐 [i18n-KO] Updated ko/perf_train_special.md
#39920 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `attention_interface.md` to Korean
#39922 opened
Aug 5, 2025 -
Add chat template tests
#39924 opened
Aug 5, 2025 -
Fix hidden torchvision>=0.15 dependency issue
#39928 opened
Aug 5, 2025 -
Add missing special token properties to MistralCommonTokenizer
#39930 opened
Aug 5, 2025 -
Registers StaticCache serialization functions for torch.export.export
#39931 opened
Aug 5, 2025 -
Fix whisper `return_language` with `return_timestamp=word`
#39938 opened
Aug 5, 2025 -
fixing image_utils.py todo
#39941 opened
Aug 6, 2025 -
fix llama issue
#39942 opened
Aug 6, 2025 -
Add back `_tp_plan` attribute
#39944 opened
Aug 6, 2025 -
Add pytest marker: `torch_compile_test` and `torch_export_test`
#39950 opened
Aug 6, 2025 -
Use torch._check instead of a test to make the model Gemma3 exportable
#39962 opened
Aug 6, 2025 -
Add Keypoint Matcher pipeline
#39970 opened
Aug 6, 2025 -
Causal loss for `ForConditionalGeneration`
#39973 opened
Aug 7, 2025 -
[bugfix] Fix tensor device in Idefics2, Idefics3, and SmolVLM
#39975 opened
Aug 7, 2025 -
Fix Qwen3 MoE GGUF architecture mismatch
#39976 opened
Aug 7, 2025 -
Fix cross-attention masking before residual connection
#39979 opened
Aug 7, 2025 -
Fix setting attention for multimodal models
#39984 opened
Aug 7, 2025 -
Add a VGGT(Visual Geometry Grounded Transformer) model compatible with huggingface transfromers
#39987 opened
Aug 7, 2025 -
Update Glm4V processor and add tests
#39988 opened
Aug 7, 2025 -
Default to dequantize if cpu in device_map for mxfp4
#39993 opened
Aug 7, 2025 -
chore: Add type hints to import_utils.py module
#39994 opened
Aug 7, 2025 -
make sure position_ids are passed in for causal mask creation for gpt-oss
#39997 opened
Aug 7, 2025 -
allow TP to work in ND-parallel with fsdp cpu ram efficient loading
#39999 opened
Aug 7, 2025 -
[`Flash Attention`] Fix flash attention integration
#40002 opened
Aug 7, 2025 -
Fix PerceptionLM image preprocessing for non-tiled image input.
#40006 opened
Aug 7, 2025 -
🚨 Use lru_cache for sine pos embeddings MaskFormer
#40007 opened
Aug 7, 2025 -
Fixes for EncoderDecoderCache
#40008 opened
Aug 7, 2025 -
🌐 [i18n-KO] Translated `optimizers.md` to Korean
#40011 opened
Aug 7, 2025 -
Feat/add gpt oss sequence classification
#40019 opened
Aug 8, 2025 -
[fix] batch inference for llava_onevision
#40021 opened
Aug 8, 2025 -
fix: resolve dropout type error in DogeDecoder
#40022 opened
Aug 8, 2025 -
Add support for SDPA for OWLViT and OWLv2
#40023 opened
Aug 8, 2025 -
Add amd runners to run-slow command
#40027 opened
Aug 8, 2025 -
Revert FA2 kwargs construction
#40029 opened
Aug 8, 2025 -
Update boxes expectations for OWLViT test
#40030 opened
Aug 8, 2025 -
Add model card for MobileViT
#40033 opened
Aug 8, 2025 -
Fix error on importing unavailable torch.distributed
#40038 opened
Aug 8, 2025 -
New DynamicSlidingWindowLayer & associated Cache
#40039 opened
Aug 8, 2025 -
Add GptOssForSequenceClassification for GPT-OSS models
#40043 opened
Aug 8, 2025 -
(small) fix conditional for input_ids and input_embeds in marian
#40045 opened
Aug 8, 2025 -
Update wavlm.md to match new model card template
#40047 opened
Aug 8, 2025 -
Standardize BARTpho model card: badges, new examples, fixed broken im…
#40051 opened
Aug 9, 2025 -
Auto-log parallelism info to wandb.config using HF Accelerate
#40055 opened
Aug 9, 2025 -
updated visualBERT modelcard
#40057 opened
Aug 9, 2025 -
GGUF Qwen2VL
#40058 opened
Aug 9, 2025 -
Fix Inefficient GELU implementation in GPT2
#40059 opened
Aug 9, 2025 -
Avoid CUDA stream sync
#40060 opened
Aug 10, 2025 -
🌐 [i18n-KO] Translated `vitdet.md` to Korean
#40061 opened
Aug 10, 2025 -
🌐 [i18n-KO] Translated `videomae.md` to Korean
#40064 opened
Aug 10, 2025 -
Delay float32 upcast in ForCausalLMLoss after filtering ignore_index
#40065 opened
Aug 10, 2025 -
Change Qwen2RMSNorm to RMSNorm from PyTorch
#40066 opened
Aug 10, 2025 -
Add missing arguments to class constructors
#40068 opened
Aug 10, 2025 -
Remove _prepare_flash_attention_from_position_ids
#40069 opened
Aug 10, 2025 -
initializing branch and draft PR
#40074 opened
Aug 11, 2025 -
Skipping pytree registration in case fsdp is enabled
#40075 opened
Aug 11, 2025 -
rm pytorch-triton dependency
#40076 opened
Aug 11, 2025 -
Update notification service MI325
#40078 opened
Aug 11, 2025 -
[WIP] Collated reports
#40080 opened
Aug 11, 2025 -
Removes DoLa decoding strategy
#40082 opened
Aug 11, 2025 -
Fix regression in mllama vision encoder
#40083 opened
Aug 11, 2025 -
remove sequence parallel in llama4
#40084 opened
Aug 11, 2025 -
`decoding_method` argument in generate
#40085 opened
Aug 11, 2025 -
build unittest for `ViTImageProcessorFast`
#40086 opened
Aug 11, 2025 -
DOCS: Add missing space in SECURITY.md
#40087 opened
Aug 11, 2025 -
Fix RuntimeError when loading quantized models with int8 weights (#39366)
#40090 opened
Aug 11, 2025 -
Replace `logger.warning` with `logger.warning_once` in `GradientCheckpointingLayer`
#40091 opened
Aug 12, 2025 -
Optimize LlamaAttention by fusing QKV projections
#40092 opened
Aug 12, 2025 -
fix(modeling_utils): correct initialization of missing and mismatched…
#40093 opened
Aug 12, 2025
180 Issues closed by 57 people
-
Whisper v-3 pipeline requiring a lot of memory when setting return_timestamps="word"
#27834 closed
Aug 11, 2025 -
Incorrect scaling of Gemma embeddings in float32 regime
#38702 closed
Aug 11, 2025 -
🐛 Bug Report: Accelerate config to disable torch dynamo is ignored by transformers automatic compilation
#39191 closed
Aug 11, 2025 -
Inconsistant `input_feature` length and `attention_mask` length in `WhisperFeatureExtractor`
#39214 closed
Aug 11, 2025 -
[Mistral3] attn_implementation not applied to vision_tower.config in Mistral3Config due to init order
#40062 closed
Aug 11, 2025 -
Instantiating `google/gemma-3-4b-pt` with AutoModelForSequenceClassification Reports Unitialized Model
#39763 closed
Aug 11, 2025 -
`num_beams` > 1 leads to exception for Qwen2.5VL (Qwen family or all VLM models?)
#39723 closed
Aug 11, 2025 -
Triton version check compatibility on windows
#39985 closed
Aug 11, 2025 -
Whisper `.generate()` function not respecting `max_new_tokens` or `max_length`
#36183 closed
Aug 10, 2025 -
Gemma2 fall back to cpu execusion when attn_implementation='flash_attention_2'
#39188 closed
Aug 10, 2025 -
Previous PRs introduced a bug on Accumulated Gradients Losses
#40052 closed
Aug 9, 2025 -
Incorrect word timestamps and word repetitions with Whisper-Large-v3-turbo model
#37248 closed
Aug 9, 2025 -
Pretrainedtokenizerfast Segmentation fault
#39099 closed
Aug 9, 2025 -
New release 4.53.0 breaks HF trainer/model
#39111 closed
Aug 9, 2025 -
Gradient accumulation steps for Vision Languge model
#39123 closed
Aug 9, 2025 -
Not capable of exporting Mistral to ONNX format with the use of caching
#39162 closed
Aug 9, 2025 -
Error when loading gguf file
#40040 closed
Aug 9, 2025 -
Weights not tied when loading `from_pretrained` with a wrapped model
#39900 closed
Aug 8, 2025 -
`TypeError: 'builtins.safe_open' object is not iterable` in `load_pytorch_state_dict_in_tf2_model `
#40028 closed
Aug 8, 2025 -
Major issues with transformers version causing rubbish generations with Gemma3 family using vllm
#40017 closed
Aug 8, 2025 -
Gemma3n get_placeholder_mask issue
#39991 closed
Aug 8, 2025 -
flash-attn cannot perform deterministic computation
#39982 closed
Aug 8, 2025 -
[DeepSeek-V3] Different rotary embedding implementation between DeepSeek-AI and Transformers
#39687 closed
Aug 8, 2025 -
ModernBertUnpaddedRotaryEmbedding __init__ error
#39934 closed
Aug 7, 2025 -
video_inputs are not passed to perception_lm
#40004 closed
Aug 7, 2025 -
Flash Attention fails with non aligned position_ids
#39814 closed
Aug 7, 2025 -
`convert_deepseek_vl_weights_to_hf.py` not included in v4.55.0 release.
#39966 closed
Aug 7, 2025 -
[Gemma3N] Audio processing issue
#39911 closed
Aug 7, 2025 -
v4.55.0 Idefics3 RuntimeError Tensors on different devices
#39947 closed
Aug 7, 2025 -
[gpt-oss] eager_attention_forward not using sliding-window attention for GPT-OSS models
#39954 closed
Aug 7, 2025 -
Finetune `gpt-oss-20b` with `mxfp4` quantization
#39969 closed
Aug 6, 2025 -
Fix grammatically incorrect variable name "expert_hitted" → "expert_hit" in MoE implementation
#39955 closed
Aug 6, 2025 -
transformers serve doesn't handle OPTIONS http method
#39932 closed
Aug 6, 2025 -
454545
#39864 closed
Aug 6, 2025 -
ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#38442 closed
Aug 6, 2025 -
Streaming mode support on HF vs kyutai-labs for the mimi model
#38535 closed
Aug 6, 2025 -
enable GraniteMoeHybridIntegrationTest in UT
#38542 closed
Aug 6, 2025 -
Llama4 inference encounter unsupported op in dynamo ?
#38118 closed
Aug 6, 2025 -
Misleading WandB error when WANDB_DISABLED=True and report_to="wandb" are both set
#39878 closed
Aug 5, 2025 -
Inefficient memory resharding in attention layer
#39072 closed
Aug 5, 2025 -
Inefficient default GELU implementation in GPT2
#39073 closed
Aug 5, 2025 -
AttributeError: 'HfTrainerDeepSpeedConfig' object has no attribute 'is_zero3'
#39081 closed
Aug 5, 2025 -
Why `lm-head` weight still exists with `"tie_word_embeddings": true`
#39812 closed
Aug 4, 2025 -
Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows
#39704 closed
Aug 4, 2025 -
ValueError: Max cache length is not consistent across layers
#39877 closed
Aug 4, 2025 -
Allow video objects (np array etc.) in apply_chat_template (not just paths or urls)
#36560 closed
Aug 4, 2025 -
Exception while inference Qwen2VL and Qwen2VL, assert module.weight.shape[1] == 1
#38665 closed
Aug 4, 2025 -
model.generate custom encoder and decoder outputs/inputs
#39871 closed
Aug 3, 2025 -
Vision Encoder-Decoder fails with LLaMA decoder due to missing cross-attention implementation
#34674 closed
Aug 3, 2025 -
Only with newest version (4.52.4): from_pretrained() esm.embeddings.position_embeddings.weight missing
#39038 closed
Aug 3, 2025 -
pytorch version 1.8.1 compatibility
#39049 closed
Aug 3, 2025 -
TypeError: couldn't find storage object Float8_e4m3fnStorage - which version is needed for this?
#39409 closed
Aug 2, 2025 -
'Mistral3Model' object has no attribute 'prepare_inputs_for_generation'
#39007 closed
Aug 2, 2025 -
Not able to use flash attention with torch.compile with model like BERT
#39017 closed
Aug 2, 2025 -
Tool-Calling Model (ToolACE-2-Llama-3.1-8B) Responds with Irrelevant Tool message on General Question
#39833 closed
Aug 1, 2025 -
Add MM Grounding DINO
#37744 closed
Aug 1, 2025 -
Using Gemma3n with text-only generation requires image dependencies
#39169 closed
Aug 1, 2025 -
Qwen2-VL err
#39818 closed
Jul 31, 2025 -
Option to tokenize messages one after the other
#39417 closed
Jul 31, 2025 -
[rank0]: ValueError: Your setup doesn't support bf16/gpu.
#39716 closed
Jul 31, 2025 -
Blip model got performance regression on compile mode after refactor cache.
#39774 closed
Jul 30, 2025 -
BioGPT Implementation Bug Report
#39776 closed
Jul 30, 2025 -
tokenizer decode decode with timestamp fails for extended vocabulary
#35330 closed
Jul 30, 2025 -
How to streaming output audio of Qwen2.5-omni-7b
#37570 closed
Jul 30, 2025 -
Significant WER Increase with Whisper Chunking Compared to Long-Form Transcription
#38347 closed
Jul 30, 2025 -
Transformers version causing my finetuned model to hallucinate
#38378 closed
Jul 30, 2025 -
`load_balancing_loss_func` doesn't support 4D attention mask
#38910 closed
Jul 30, 2025 -
Max cache length issue with Gemma 3
#39711 closed
Jul 29, 2025 -
ModernBERT has been totally destroyed by PR #38974 and #38838
#39747 closed
Jul 29, 2025 -
Support loading Qwen3 MoE GGUF
#39721 closed
Jul 29, 2025 -
[XPU] Model get OOM when loading models
#39627 closed
Jul 29, 2025 -
encoder decoder model compile failed after refactor cache
#39746 closed
Jul 29, 2025 -
_supports_static_cache disappear
#39744 closed
Jul 29, 2025 -
device mismatch error when using `SlidingWindowLayer`.
#39730 closed
Jul 28, 2025 -
AddedToken should check content on `_update`
#39586 closed
Jul 28, 2025 -
Checkpointing broken for classifier training multi-gpu
#38925 closed
Jul 28, 2025 -
vlmm 0.10.0 load baidu/ERNIE-4.5-300B-A47B-Base-PT error
#39719 closed
Jul 28, 2025 -
[i18n-<languageCode>] Translating docs to <Arabic>
#38381 closed
Jul 27, 2025 -
Not installable on arm64 due to jaxlib upper bound
#36611 closed
Jul 27, 2025 -
KeyError in Llama-4-Maverick-17B-128E-Instruct-FP8 Inference with Offloading
#38281 closed
Jul 27, 2025 -
ImportError: DLL load failed while importing _safetensors_rust: The specified module could not be found
#38479 closed
Jul 27, 2025 -
Contribute to Transformers on windows natively without WSL
#38601 closed
Jul 27, 2025 -
Reproducibility Issue of Siglip2 with Blackwell Architecture GPUs (RTX 5090)
#38874 closed
Jul 27, 2025 -
The wrong config parameter found in src/transformers/models/qwen2_5_vl/configuration_qwen2_5_vl.py.
#38889 closed
Jul 27, 2025 -
CRITICAL ISSUE REPORT! GEMMA 3 1B CANNOT RUN!
#39686 closed
Jul 26, 2025 -
text-generation extremely slow with large `bad_words_ids` list
#39512 closed
Jul 25, 2025 -
Does Gemma 3 need positions ids to be 1-indexed explicitly?
#39023 closed
Jul 25, 2025 -
Add Deepseek-VL
#36110 closed
Jul 25, 2025 -
Grammatical error in the "Loading model's" page
#39018 closed
Jul 25, 2025 -
Inference API Returning 404
#39650 closed
Jul 25, 2025 -
Backwards incompatible change in returned hidden states
#39558 closed
Jul 25, 2025 -
Typo in `apply_transcrition_request` method name
#39530 closed
Jul 25, 2025 -
video_auto_processing.py breaks everything
#38846 closed
Jul 25, 2025 -
Should `compute_metrics` only run on the main process when doing DDP?
#38851 closed
Jul 25, 2025 -
VoxtralForConditionalGeneration import error
#39611 closed
Jul 24, 2025 -
`Trainer._save()` May Incorrectly Save Empty Model State (safetensors)
#38686 closed
Jul 24, 2025 -
Wandb isn't logging config in offline mode
#38968 closed
Jul 23, 2025 -
The similarity between image and text in siglip2 is very low
#39597 closed
Jul 23, 2025 -
Does Qwen_2_5_VL support variable length attention computation?
#38007 closed
Jul 23, 2025 -
Have to import cv2 and pop up window frist, or else it stuck forever
#38139 closed
Jul 23, 2025 -
CI skipped failures tracking issue
#38820 closed
Jul 23, 2025 -
"ValueError: Predictions and/or references don't match the expected format." error
#39510 closed
Jul 22, 2025 -
Clarification on Recent Changes to Loss and Gradient Accumulation
#39567 closed
Jul 22, 2025 -
Add EfficientLoFTR model
#36354 closed
Jul 22, 2025 -
Gemma3 bidirectional mask for image tokens isn't reaching attention forward
#39389 closed
Jul 22, 2025 -
Is the new Intel–Weizmann speculative decoding algorithm integrated into Transformers?
#39545 closed
Jul 21, 2025 -
Enabling `average_tokens_across_devices` by default in Trainer
#39392 closed
Jul 21, 2025 -
T5Gemma problem with tokenizer(?)
#39521 closed
Jul 21, 2025 -
Causal mask is not compatible with Qwen2-VL when using padding-free training
#39400 closed
Jul 21, 2025 -
KeyError: 'llava_qwen2'
#39533 closed
Jul 21, 2025 -
Add Gemma 3 For Sequence Classification
#36755 closed
Jul 21, 2025 -
Expected all tensors to be on the same device, but found at least two devices
#37545 closed
Jul 21, 2025 -
DynamicCache results in too many torch recompiles after 4.51
#37908 closed
Jul 21, 2025 -
Confusion about num_labels and problem_type in classification logic 🐛
#38219 closed
Jul 21, 2025 -
Silent Overwrite of Custom Optimizer When Using DeepSpeed with Transformers Trainer
#38753 closed
Jul 21, 2025 -
DTensor issues when running Llama4ForConditionalGeneration with tensor parallel.
#38803 closed
Jul 21, 2025 -
Version 4.52.3 leads to error after bundling with pyinstaller
#38402 closed
Jul 20, 2025 -
Issue importing models in jupyter notebooks 'No module named transformers.models.ipynb_checkpoints'
#38726 closed
Jul 19, 2025 -
T5Gemma returning 0 loss for s2s training
#39514 closed
Jul 19, 2025 -
Whisper models appear to be broken with Flash Attention 2
#38662 closed
Jul 18, 2025 -
Speculative Decoding(do_sample=False) get different outputs
#39421 closed
Jul 18, 2025 -
BarkProcessor voice_preset doesn't work
#34634 closed
Jul 18, 2025 -
dataset 4.0.0 , issue with load_dataset loading audio dataset
#39497 closed
Jul 18, 2025 -
Gemma3n don't support chat with history
#39498 closed
Jul 18, 2025 -
modeling_flax_gemma.FlaxGemmaModule failed with incompatible shapes when running with GemmaConfig
#39492 closed
Jul 18, 2025 -
Error for `return_assistant_tokens_mask` in MLLM processor
#38521 closed
Jul 18, 2025 -
`get_video_features` in XCLIPModel always returns `pooled_output`
#38709 closed
Jul 18, 2025 -
I can't make sense of this works on Windows but not on Linux AutoModelForCausalLM.from_pretrained
#39461 closed
Jul 17, 2025 -
HfArgumentParser cannot parse `str` for local path
#39462 closed
Jul 17, 2025 -
breaking changes in ESM model classes
#39405 closed
Jul 17, 2025 -
[torch.export] Unhandled FakeTensor Device Propagation for two different devices
#38975 closed
Jul 17, 2025 -
QA pipeline prediction generates wrong response when `top_k` param > 1
#38984 closed
Jul 17, 2025 -
When will transformers 4.51.4 be released?
#37812 closed
Jul 17, 2025 -
CheckpointLoaderSimple ..... Error while deserializing header: InvalidHeaderDeserialization
#38692 closed
Jul 17, 2025 -
can't torch.export.export tinyllama model
#39463 closed
Jul 17, 2025 -
Missing 4 spaces in SmolVLMImageProcessorFast
#39442 closed
Jul 16, 2025 -
ModernBERT for Sequence Classification - issues with finetuning
#38720 closed
Jul 16, 2025 -
SigLip2 text pooler output selection
#39269 closed
Jul 16, 2025 -
[YosoConfig] Missing `architectures` field
#39424 closed
Jul 16, 2025 -
Qwen3 tokenizer wrong offset_mapping
#39401 closed
Jul 16, 2025 -
OpenTelemetry Collector Connection error when installing the latest release 4.53.0 during `docker build`
#39143 closed
Jul 16, 2025 -
DBRX model passes probabilities and not logits to the load balancer
#39055 closed
Jul 16, 2025 -
`verify_tp_plan` function raises an error if a key without '.' is given
#38419 closed
Jul 16, 2025 -
Whisper chunking algorithm increases WER
#37789 closed
Jul 16, 2025 -
model_type = self._reverse_config_mapping[key.__name__] KeyError: 'Qwen2RMConfig'
#38517 closed
Jul 16, 2025 -
TypeError: 'NoneType' object is not iterable in ESM when using DDP training
#38667 closed
Jul 16, 2025 -
LlamaAttention forward function type hint is incorrect
#38739 closed
Jul 15, 2025 -
`quantization_method` is not cleared after calling `.dequantize()`
#39295 closed
Jul 15, 2025 -
Saving model with shared tensors fails on cpu but succeeds on gpu
#33688 closed
Jul 15, 2025 -
Mypy errors since v4.51.0
#37339 closed
Jul 15, 2025 -
Errors using TinyLlama-1.1B-Chat-v1.0 and DirectML
#38340 closed
Jul 15, 2025 -
Pytorch language_modelling example run_clm fails when streaming is enabled
#39285 closed
Jul 15, 2025 -
`transformers.utils.metrics` sets global `TracerProvider`
#39115 closed
Jul 15, 2025 -
There is no transformers version that can run DeepSeek V3 generate
#38710 closed
Jul 15, 2025 -
Support of Qwen3 GGUF model
#38650 closed
Jul 15, 2025 -
Latest Transformers release causes CUDA out-of-memory errors during VisionLLM fine-tuning
#39337 closed
Jul 14, 2025 -
Paligemma model card needs update
#38544 closed
Jul 14, 2025 -
Using resnet-18 in flax
#39388 closed
Jul 14, 2025 -
Getting Warnings When Instantiating Object Detection Models Due to Meta Tensor Initialization
#37615 closed
Jul 14, 2025 -
4.52.2 error: Could not import module 'Qwen3ForCausalLM'
#38291 closed
Jul 14, 2025 -
Transformers fail to load deepseek-ai/DeepSeek-V3 with vllm
#38588 closed
Jul 13, 2025 -
MambaInnerFnBackward
#38600 closed
Jul 13, 2025 -
Failed to full fine tuning code5p 2B
#38602 closed
Jul 13, 2025 -
Exporting google/gemma-3n-e4b-it language_model (decoder) into ONNX format
#39328 closed
Jul 12, 2025 -
Removing the modification of loss value due to rounding off to 4 digits
#38032 closed
Jul 12, 2025 -
Clarification on default top_k sampling parameter
#38549 closed
Jul 12, 2025 -
hidden_states, self_attn_weights = self.self_attn( ValueError: too many values to unpack (expected 2)
#38554 closed
Jul 12, 2025
123 Issues opened by 118 people
-
Could not import module 'AutoTokenizer'. Are this object's requirements defined correctly?
#40089 opened
Aug 11, 2025 -
Default behavior of llama tokenizers breaks text by removing spaces (round trip is not identity function)
#40088 opened
Aug 11, 2025 -
TypeError in DogeDecoderLayer with MoE Configuration when using dropout()
#40079 opened
Aug 11, 2025 -
gpt_oss inference activates *all* experts for every token
#40073 opened
Aug 11, 2025 -
Issue running model from ImageSegmentationPipeline
#40071 opened
Aug 10, 2025 -
Transformer GGUF support philosophy / naive question
#40070 opened
Aug 10, 2025 -
[BUG] No umt5 config for GGUF. This is not supported configuration.
#40067 opened
Aug 10, 2025 -
Question: How to write a custome tokenizer form scratch
#40056 opened
Aug 9, 2025 -
Whisper transcription accuracy improves when last 1600 samples of input audio are muted
#40054 opened
Aug 9, 2025 -
Support text classification with GPT-OSS models
#40050 opened
Aug 9, 2025 -
Please support loading Qwen 2.5 VL from GGUF
#40049 opened
Aug 9, 2025 -
Recent releases break backwards-compatibility with key_cache
#40046 opened
Aug 8, 2025 -
Support loading glm4moe GGUF
#40042 opened
Aug 8, 2025 -
`plamo-2-1b` broken on latest main
#40034 opened
Aug 8, 2025 -
Add Padding Strategy to DataCollatorForLanguageModeling
#40032 opened
Aug 8, 2025 -
[gpt-oss] MoE routing bug in the mxfp4 implementation (in distributed setting)
#40031 opened
Aug 8, 2025 -
accelerate==1.10.0 and safetensors==0.6.1 are incompatible with transformers==4.53.1
#40020 opened
Aug 8, 2025 -
need GptOssForSequenceClassification
#40018 opened
Aug 8, 2025 -
Customizable Logit Warping Strategies for Generation
#40010 opened
Aug 7, 2025 -
Possible wrong init call
#40001 opened
Aug 7, 2025 -
[gpt-oss] Transform checkpoint from safetensors to state dict
#39992 opened
Aug 7, 2025 -
CVE fix for v4.37.2 and v4.38.0
#39983 opened
Aug 7, 2025 -
FSDP2 not compatible with transformers >= 4.54.0 GenericForTokenClassification
#39977 opened
Aug 7, 2025 -
bug in new transformers: 'Florence2ForConditionalGeneration' object has no attribute '_supports_sdpa'
#39974 opened
Aug 7, 2025 -
Gemma3 with fp16 in inference (I don't know if this change is working in fine-tune) #BUG FIX
#39972 opened
Aug 6, 2025 -
change `dataloader_persistent_workers` default value to `True`
#39963 opened
Aug 6, 2025 -
Retaining computational graph after using AutoImageProcessor
#39946 opened
Aug 6, 2025 -
GPT-OSS mxfp4 with triton_kernel: make_default_matmul_mxfp4_w_layout not found
#39945 opened
Aug 6, 2025 -
Breaking change in unset `_tp_plan` attribute
#39943 opened
Aug 6, 2025 -
Still getting "fp16 mixed precision requires a GPU (not 'mps')." error
#39935 opened
Aug 5, 2025 -
[Gemma3N] Not able to add new special tokens to model/tokenizer due to projection error
#39921 opened
Aug 5, 2025 -
When using batch_eval_metrics, inputs are not gathered from different device, which is wrong behavior
#39916 opened
Aug 5, 2025 -
Question: Llama4 weight reshaping
#39910 opened
Aug 5, 2025 -
Hidden torchvision>=0.19.0 dependency results in quiet import failures of e.g. PreTrainedModel
#39907 opened
Aug 5, 2025 -
Add VideoPrism
#39893 opened
Aug 4, 2025 -
[Feature Request] Automatically log parallelism configuration from Accelerate to W&B
#39882 opened
Aug 4, 2025 -
Checking for additional_chat_templates doesn't work without internet (ConnectionError)
#39873 opened
Aug 3, 2025 -
InternVL, PerceptionLM inference freeze in 4.54.1
#39872 opened
Aug 3, 2025 -
Tensor parallelism for GLM-4.5
#39868 opened
Aug 2, 2025 -
Florence2ForConditionalGeneration does not support Flash Attention 2.0 yet ?...
#39860 opened
Aug 2, 2025 -
`make fixup` can't find PLC1802
#39853 opened
Aug 1, 2025 -
Inconsistent Function calling behaviour by Mistral-7B-Instruct-v0.3
#39852 opened
Aug 1, 2025 -
Support topNSigma sampling in `generate`
#39850 opened
Aug 1, 2025 -
Accelerate seems to default mixed precision to bf16 when passing a DeepSpeed config.
#39849 opened
Aug 1, 2025 -
Expected behavior of `compute_result` is hard to expect and inconsistent
#39842 opened
Aug 1, 2025 -
MistralCommonTokenizer does not match PreTrainedTokenizer
#39841 opened
Aug 1, 2025 -
pack_image_features RuntimeError when vision_feature_select_strategy="full"
#39839 opened
Aug 1, 2025 -
Crash when running Llama4 on transformers-4.54.1
#39835 opened
Aug 1, 2025 -
Allow extra outputs from `GenerationMixin.generate`
#39834 opened
Aug 1, 2025 -
Missing einops dependency causing ModuleNotFoundError
#39811 opened
Jul 31, 2025 -
Fine tuning qwen2.5 error
#39804 opened
Jul 31, 2025 -
Memory leak occurred during training qwen-2.5-vl
#39803 opened
Jul 31, 2025 -
Regression - High memory usage when using transformers model with FSDP + LoRA
#39795 opened
Jul 30, 2025 -
`transformers serve` Fails to Handle Messages with Nested Content
#39791 opened
Jul 30, 2025 -
ViTPose+ models post processing doest not work for `dataset_index : 5`
#39789 opened
Jul 30, 2025 -
"CSM audio generation lacks reliable EOS: does not generate all-zero frames → never stops early"
#39787 opened
Jul 30, 2025 -
pip install 'transformers[torch]' pulls nvidia dependencies
#39780 opened
Jul 30, 2025 -
transformers env fails with: ModuleNotFoundError: No module named 'PIL'
#39779 opened
Jul 30, 2025 -
Granite 4.0 Tiny Preview inference broken in
#39775 opened
Jul 30, 2025 -
would it be possible to standardize on the vx.y.z format for all tags
#39771 opened
Jul 30, 2025 -
Model with non-string type property tool giving incomplete response using VLLM
#39767 opened
Jul 29, 2025 -
Follow-up on Issues Regarding Training State Restoration from Interruptions
#39755 opened
Jul 29, 2025 -
Inv frequency has not default, going against our philosophy
#39753 opened
Jul 29, 2025 -
Qwen2_5_VLForConditionalGeneration cfg forward twice error
#39749 opened
Jul 29, 2025 -
[transformers==4.54.0] FSDP1 forward misalignment after loading state dict
#39720 opened
Jul 28, 2025 -
OWLv2 with visual prompt - alternative query embedding selection method
#39710 opened
Jul 27, 2025 -
[i18n-<bn>] Translating docs to <Bengali>
#39705 opened
Jul 27, 2025 -
ValueError: Number of image placeholders in the prompt does not match the number of images. internVL3
#39703 opened
Jul 26, 2025 -
No flag to support Conditional Parameter Loading for gemma-3n-E2B models in transformer
#39699 opened
Jul 26, 2025 -
SigLIP2 documentation example has multiple errors (model/processor mismatch + quantization failure)
#39692 opened
Jul 26, 2025 -
Qwen 2.5 VL - error without attention_mask
#39685 opened
Jul 26, 2025 -
Add multi-candidate & tree search for assisted decoding (speculative decoding)
#39684 opened
Jul 25, 2025 -
Accelerate beam search decoding via tree attention
#39682 opened
Jul 25, 2025 -
error: argument --deepspeed: invalid dict value: '<path>'
#39673 opened
Jul 25, 2025 -
Issue when initializing a DynamicCache
#39668 opened
Jul 25, 2025 -
T5Gemma training not working
#39656 opened
Jul 25, 2025 -
Please develop DataCollatorForVisionLanguageModeling to support visual model training !!!
#39647 opened
Jul 24, 2025 -
FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39619 opened
Jul 23, 2025 -
SageAttention for attention implementation?
#39618 opened
Jul 23, 2025 -
Trainer: Error when folded metrics are saved
#39616 opened
Jul 23, 2025 -
Qwen3 Fails w/4D Attn Mask when using FA2
#39608 opened
Jul 23, 2025 -
ImageClassificationPipeline preprocess should accept numpy/tensor arrays
#39607 opened
Jul 23, 2025 -
Does transformers support python3.13 -- disable-gil or python3.14 free threading?
#39596 opened
Jul 23, 2025 -
Model forward execution in full eager mode?
#39565 opened
Jul 21, 2025 -
Why `is_causal` is not used in `flash_attention_forward` ?
#39554 opened
Jul 21, 2025 -
Is there plan to integrate ColQwen2.5 into Transformers?
#39549 opened
Jul 21, 2025 -
ValueError: You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time
#39542 opened
Jul 21, 2025 -
Add muon and flash-muon optimizer
#39537 opened
Jul 20, 2025 -
training google colab error
#39527 opened
Jul 19, 2025 -
paged attention NOT working with Qwen Models
#39525 opened
Jul 19, 2025 -
T5Gemma failing on provided example
#39522 opened
Jul 19, 2025 -
Export voxtral to ExecuTorch
#39511 opened
Jul 18, 2025 -
Whisper transcription is 2x slower between 4.51.3 -> 4.52.1
#39508 opened
Jul 18, 2025 -
Add Muon Optimiser for 2x faster convergence
#39495 opened
Jul 18, 2025 -
Transformers still tries to use apex.amp which is no longer a thing in apex.
#39484 opened
Jul 17, 2025 -
Adding Space-Time-MiniLM-v0
#39479 opened
Jul 17, 2025 -
Allow `load_best_model_at_end=True` to work when `save_steps < eval_steps` and best model is saved
#39476 opened
Jul 17, 2025 -
Unexpected behaviour with transformers versions above 4.28 for Donut
#39473 opened
Jul 17, 2025 -
Autoformer get_lagged_subsequences always true if condition
#39460 opened
Jul 16, 2025 -
Add Interactive Multi-Modal Attention Visualization for Vision-Language Models
#39440 opened
Jul 15, 2025 -
Export LFM2 to ExecuTorch
#39436 opened
Jul 15, 2025 -
Add DiCoW: Diarization-Conditioned Whisper
#39430 opened
Jul 15, 2025 -
Gemma 3 Compilation Issues During Generation
#39427 opened
Jul 15, 2025 -
object detection: matching outputs.last_hidden_state with results
#39426 opened
Jul 15, 2025 -
Exception 3 type mismatch
#39413 opened
Jul 15, 2025 -
FP8 training support for Model Parallel / Tensor Parallel (MP/TP)
#39410 opened
Jul 15, 2025 -
Off-by-one error when using flash_attention with a sliding window
#39408 opened
Jul 15, 2025 -
Whisper `return_language` with pipeline no longer working
#39404 opened
Jul 14, 2025 -
Qwen2.5-VL Sharding error when using Tensor Parallelism
#39399 opened
Jul 14, 2025 -
Mask2FormerImageProcessor yields inconsistent results between single and batch inference
#39382 opened
Jul 12, 2025 -
Handling of full_text_row_masked_out_mask in mllama is incorrect.
#39379 opened
Jul 12, 2025
138 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Segment Anything 2 (SAM2)
#32317 commented on
Aug 12, 2025 • 166 new comments -
[WiP] Add xcodec2 model
#37868 commented on
Jul 30, 2025 • 76 new comments -
Add support for Florence-2
#38188 commented on
Aug 7, 2025 • 61 new comments -
blt wip
#38579 commented on
Aug 11, 2025 • 47 new comments -
Feat: add Kwai-Keye transformers
#39292 commented on
Aug 11, 2025 • 42 new comments -
Add fastconformer encoder support for nvidia/parakeet and nvidia/canary models
#39062 commented on
Aug 3, 2025 • 21 new comments -
[WIP] Computer vision util: vision visualizer
#36892 commented on
Aug 11, 2025 • 16 new comments -
support MiniCPM-o2.6
#37917 commented on
Aug 12, 2025 • 16 new comments -
Add NVIDIA Cosmos
#36476 commented on
Jul 16, 2025 • 14 new comments -
Add T5LA models
#39293 commented on
Jul 25, 2025 • 14 new comments -
feat: Add ConvaiCausalLM model for Hindi Causal Language Modeling
#37837 commented on
Jul 16, 2025 • 10 new comments -
Add standardized model card for facebook/data2vec-audio-base-960h
#39368 commented on
Jul 24, 2025 • 10 new comments -
Fix the issue that csm model cannot work with pipeline mode.
#39349 commented on
Aug 11, 2025 • 7 new comments -
[omni modality] support composite processor config
#38142 commented on
Aug 5, 2025 • 7 new comments -
Add Ovis2 model and processor implementation
#37088 commented on
Aug 11, 2025 • 5 new comments -
fix: filter None router logits in Qwen3 MoE and handle empty router logits (#39203)
#39206 commented on
Jul 21, 2025 • 5 new comments -
Force real tensors and clone state_dict in src/transformers/modeling_utils.py
#38114 commented on
Jul 15, 2025 • 3 new comments -
Add X-Codec model
#38248 commented on
Jul 23, 2025 • 3 new comments -
Fix: Add version check for timm to support mobilenetv5 models (fixes #39208)
#39264 commented on
Jul 14, 2025 • 3 new comments -
feat: add sliding window attention to Continuous Batching
#39225 commented on
Aug 1, 2025 • 3 new comments -
fix bug when using DP in trl, the batch size of input and output dism…
#38938 commented on
Aug 11, 2025 • 2 new comments -
Fix audio pipeline with torchcodec input
#39309 commented on
Aug 1, 2025 • 2 new comments -
Make executorch integration more seamless by analyzing model signature
#36969 commented on
Jul 16, 2025 • 1 new comment -
Provide clearer instructions on how to specify target language.
#38786 commented on
Jul 21, 2025 • 1 new comment -
add pin memory and block table
#39130 commented on
Aug 11, 2025 • 1 new comment -
deci gguf support
#38669 commented on
Jul 29, 2025 • 1 new comment -
Fix ModernBERT tokenizer issue with is_split_into_words flag
#38564 commented on
Jul 16, 2025 • 1 new comment -
another way to use shift_labels
#38533 commented on
Jul 16, 2025 • 1 new comment -
Bug in modeling_bart.eager_attention_forward
#39365 commented on
Aug 11, 2025 • 0 new comments -
Add StyleTTS 2
#35790 commented on
Jul 28, 2025 • 0 new comments -
Fix ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#36011 commented on
Jul 21, 2025 • 0 new comments -
env.useBrowserCache = true causes JSON parsing error, forced to disable cache making app slower.
#39352 commented on
Aug 11, 2025 • 0 new comments -
Add Phi-3.5-vision
#36036 commented on
Aug 1, 2025 • 0 new comments -
Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs
#39364 commented on
Aug 7, 2025 • 0 new comments -
[Validation] First implementation of `@strict` from `huggingface_hub`
#36534 commented on
Jul 29, 2025 • 0 new comments -
fix unexpected kws of input_ids when setup no speech detection of whisper
#36809 commented on
Jul 23, 2025 • 0 new comments -
Update docstring for glm4v
#39357 commented on
Jul 14, 2025 • 0 new comments -
RoBERTa is not well implemented for tokenizers with pad_token_id != 1
#34528 commented on
Aug 11, 2025 • 0 new comments -
Add FAST
#35476 commented on
Jul 30, 2025 • 0 new comments -
Add JinaBERT model
#35320 commented on
Jul 15, 2025 • 0 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Jul 21, 2025 • 0 new comments -
[Community contributions] Model cards
#36979 commented on
Aug 11, 2025 • 0 new comments -
Load a pretrained fast tokenizer if fast=true and tokenizer.json exists
#33751 commented on
Jul 15, 2025 • 0 new comments -
[Contributions Welcome] Add Fast Image Processors
#36978 commented on
Aug 11, 2025 • 0 new comments -
DeepSpeed sequence parallelism (aka Ulysses) integration with HF transformer
#32305 commented on
Jul 15, 2025 • 0 new comments -
Implement MambaForSequenceClassification
#31155 commented on
Jul 15, 2025 • 0 new comments -
AutoConfig has potential issue with composite config.
#38258 commented on
Aug 12, 2025 • 0 new comments -
Fix Inconsistent `input_feature` length and `attention_mask` length in `WhisperFeatureExtractor`
#39221 commented on
Jul 21, 2025 • 0 new comments -
feat(trainer): emergency checkpointing on crashes & SIGTERM/SIGINT
#39140 commented on
Aug 5, 2025 • 0 new comments -
Disable static cache on certain MoE models
#39108 commented on
Jul 28, 2025 • 0 new comments -
Update Dockerfiles to install packages inside a virtual environment
#39098 commented on
Aug 11, 2025 • 0 new comments -
Bug/38843 fix pos idx in fp32 parameter error
#39064 commented on
Jul 22, 2025 • 0 new comments -
Fix slow test_moshika_greedy_unconditional_fp16
#39251 commented on
Jul 15, 2025 • 0 new comments -
Allow compression on meta device
#39039 commented on
Aug 8, 2025 • 0 new comments -
Add Dust3R
#38805 commented on
Jul 22, 2025 • 0 new comments -
Adding custom 3d mask into ModernBert
#38671 commented on
Jul 29, 2025 • 0 new comments -
Adds Universal Intelligence to awesome transformers documentation
#38641 commented on
Aug 7, 2025 • 0 new comments -
Add Bagel
#38569 commented on
Aug 10, 2025 • 0 new comments -
🔴[`Attention`] Bert-based Models Attention Refactor
#38301 commented on
Jul 16, 2025 • 0 new comments -
Fix the shape of ModernBertForMaskedLM's output hidden_states
#38272 commented on
Jul 16, 2025 • 0 new comments -
Add dates to the model docs
#39320 commented on
Aug 11, 2025 • 0 new comments -
add profiler to trainer
#37889 commented on
Jul 29, 2025 • 0 new comments -
fix colpali mapping
#39353 commented on
Jul 14, 2025 • 0 new comments -
Update ruff to 0.12.3 and apply its fixes
#37809 commented on
Jul 21, 2025 • 0 new comments -
Vectorize deepseek moe
#37769 commented on
Jul 16, 2025 • 0 new comments -
fix: qwen2.5 omni apply_chat_template system content check
#37511 commented on
Aug 8, 2025 • 0 new comments -
[RFC] Fix Gemma 3 FP16 with activation scaling
#37226 commented on
Jul 16, 2025 • 0 new comments -
trying custom tokenizer fix
#37177 commented on
Jul 16, 2025 • 0 new comments -
Add Plain-DETR
#37096 commented on
Aug 11, 2025 • 0 new comments -
Add Matching Anything by Segmenting Anything (MASA) MOT tracking model
#32164 commented on
Jul 14, 2025 • 0 new comments -
_load_rng_state after get_batch_samples may break training reproducibility when dataloader has random operations
#39215 commented on
Jul 23, 2025 • 0 new comments -
🌐 [i18n-KO] Translating docs to Korean
#20179 commented on
Jul 24, 2025 • 0 new comments -
Adding support for Gemma 3n GGUFs
#39329 commented on
Jul 24, 2025 • 0 new comments -
Model implementation using Liger Kernel layers
#38416 commented on
Jul 24, 2025 • 0 new comments -
Support for Multiple Datasets and Domain-Specific Loss Calculation in Trainer
#30725 commented on
Jul 24, 2025 • 0 new comments -
Trainer/accelerate doesn't save model when using FSDP with SHARDED_STATE_DICT
#30491 commented on
Jul 24, 2025 • 0 new comments -
Segfault on Apple M4 using AutoModelForSequenceClassification with BETO model on CPU
#39020 commented on
Jul 25, 2025 • 0 new comments -
Whisper word-level timestamp extraction fails with beam search
#36093 commented on
Jul 28, 2025 • 0 new comments -
Issue with module.smart_apply(module._initialize_weights) in the initialize_weights Function of modeling_utils.py
#39027 commented on
Jul 28, 2025 • 0 new comments -
Output logits differ significantly for different attn_implementations on image inputs
#39067 commented on
Jul 28, 2025 • 0 new comments -
ValueError: GGUF model with architecture deci is not supported yet.
#37736 commented on
Jul 28, 2025 • 0 new comments -
Resuming training from an interrupted checkpoint fails to save the final checkpoint.
#38939 commented on
Jul 30, 2025 • 0 new comments -
How to use other acceleration apis of npu?
#39105 commented on
Jul 31, 2025 • 0 new comments -
Adding native support to load GGUF models using transformers
#38063 commented on
Jul 31, 2025 • 0 new comments -
Add support for BAGEL from ByteDance
#38267 commented on
Jul 31, 2025 • 0 new comments -
Object detection training/fine-tuning for Owl-vit/Owlv2
#33664 commented on
Aug 1, 2025 • 0 new comments -
Community contribution: Adding GGUF support for more architectures
#33260 commented on
Aug 2, 2025 • 0 new comments -
If a training job job failed MLFlow will not be reported and MLFlow shows job still running
#30333 commented on
Jul 15, 2025 • 0 new comments -
[DOCS] Add `pruna` as optimization framework
#38740 commented on
Jul 16, 2025 • 0 new comments -
Modernbert 3D attention mask
#38040 commented on
Jul 16, 2025 • 0 new comments -
Automatic dynamic batch size selection for DataCollatorWithFlattening
#33945 commented on
Jul 16, 2025 • 0 new comments -
Flex attention support with arbitrary 4d mask for LlamaModel
#33898 commented on
Jul 17, 2025 • 0 new comments -
Add `pruna` integration for loading model through `transformers.from_pretrained` / `pipeline`.
#37971 commented on
Jul 17, 2025 • 0 new comments -
Add HF integration dates + paper release dates to the model docs
#39319 commented on
Jul 18, 2025 • 0 new comments -
The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct
#33399 commented on
Jul 18, 2025 • 0 new comments -
Safetensors deserializing silently mishandles tied parameters
#38870 commented on
Jul 18, 2025 • 0 new comments -
Support 2D Array Inputs in Wav2Vec2FeatureExtractor for Non-Waveform Modalities
#39291 commented on
Jul 18, 2025 • 0 new comments -
RuntimeError when loading llmcompressor W8A8 quantized model: int8 dtype in weight initialization
#39366 commented on
Jul 19, 2025 • 0 new comments -
Error: StaticCache.__init__() got an unexpected keyword argument 'batch_size'
#38914 commented on
Jul 20, 2025 • 0 new comments -
Implement Titans Architecture with GRPO Fine-Tuning
#36352 commented on
Jul 21, 2025 • 0 new comments -
`AutoTokenizer.from_pretrained` does not propagate `token`
#39030 commented on
Jul 21, 2025 • 0 new comments -
Caching of model code in ~/.cache/huggingface/modules/transformers_modules
#39107 commented on
Jul 22, 2025 • 0 new comments -
add MiniCPM-o
#37029 commented on
Jul 22, 2025 • 0 new comments -
Unknown Model (mobilenetv5_300m_enc) when loading Gemma 3n
#39208 commented on
Jul 22, 2025 • 0 new comments -
Inference with model.generate( ) using a quantized model leads to assertion error
#39311 commented on
Aug 9, 2025 • 0 new comments -
Qwen3 MOE models w/non-empty `mlp_only_layers` fail when `output_router_logits=True`
#39203 commented on
Aug 9, 2025 • 0 new comments -
Improve CI/CD by completing migration from setup.py to pyproject.toml
#38928 commented on
Aug 9, 2025 • 0 new comments -
CUDA OOM when running meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
#37532 commented on
Aug 9, 2025 • 0 new comments -
`MoshiIntegrationTests` started to fail after #34464
#38725 commented on
Aug 9, 2025 • 0 new comments -
TypeError: GenerationMixin._extract_past_from_model_output() got an unexpected keyword argument 'standardize_cache_format'
#39336 commented on
Aug 10, 2025 • 0 new comments -
[Trainer] Eval loss depends on batch size (with solution)
#39241 commented on
Aug 10, 2025 • 0 new comments -
bf16_full_eval=True moves model to device before FSDP application and causes cuda OOM
#39136 commented on
Aug 10, 2025 • 0 new comments -
QWEN2VLProcessor missing video_token_id in mm_token_type_ids
#39112 commented on
Aug 10, 2025 • 0 new comments -
AutoModelForCausalLM.from_pretrained(..., device_map=...) ignore `Tensor.retain_grad()` in Multi-GPUs setting
#39036 commented on
Aug 10, 2025 • 0 new comments -
CPMANT Model Fails to Run Following Official Tutorial
#39026 commented on
Aug 10, 2025 • 0 new comments -
Potential Memory Leak or Caching in Fast Image Processor
#38656 commented on
Aug 10, 2025 • 0 new comments -
Attention refactor in #35235 adds a `__getitem__` into the forward pass, which causes errors with torch dynamo.
#38271 commented on
Aug 10, 2025 • 0 new comments -
YaRN: factor is not effective with original_max_position_embeddings
#38224 commented on
Aug 10, 2025 • 0 new comments -
"pipeline" is not exported from module "transformers"
#37646 commented on
Aug 10, 2025 • 0 new comments -
Please support GGUF format for UMT5EncoderModel
#36774 commented on
Aug 10, 2025 • 0 new comments -
FlashAttention2 support for GSAI-ML / LLaDA-8B-Instruct?
#39377 commented on
Aug 11, 2025 • 0 new comments -
v4.53.0 - Qwen 2.5 VL Flash Attention error - object has no attribute is_causal
#39231 commented on
Aug 4, 2025 • 0 new comments -
torch fake_tensor load hf model failed
#39217 commented on
Aug 4, 2025 • 0 new comments -
Exporting Llava decoder into ONNX format
#38924 commented on
Aug 4, 2025 • 0 new comments -
transformers: FlaubertTokenizer: do_lowercase_and_remove_accent: make the logger warning actionable (don't only tell what's wrong, rather suggest what could be done about that)
#39224 commented on
Aug 4, 2025 • 0 new comments -
Failed to export PyTorch traced graph of Mixtral-8x7B-Instruct-v0.1 due to the PR #32429
#38518 commented on
Aug 4, 2025 • 0 new comments -
Torch patches tracker for HPU/Gaudi
#39175 commented on
Aug 5, 2025 • 0 new comments -
[FEAT] [non-CUDA]: Support alternative implementation for `constraints.positive_definite.check`
#36660 commented on
Aug 5, 2025 • 0 new comments -
We now require users to upgrade torch to at least v2.6 in order to use the function.
#38464 commented on
Aug 5, 2025 • 0 new comments -
../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [267,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
#33985 commented on
Aug 6, 2025 • 0 new comments -
ImportError: cannot import name 'pipeline' from 'transformers'
#39137 commented on
Aug 6, 2025 • 0 new comments -
Loading audio in video from video URLs fail with chat template
#39076 commented on
Aug 7, 2025 • 0 new comments -
[RFC] Updating pipeline models
#26690 commented on
Aug 7, 2025 • 0 new comments -
Support for context-free-grammars (CFG) to constrain model output
#25778 commented on
Aug 7, 2025 • 0 new comments -
hangs during training using deepspeed
#39275 commented on
Aug 8, 2025 • 0 new comments -
Please help i am trying to run model but issue
#39260 commented on
Aug 8, 2025 • 0 new comments -
Support `StaticCache` in assisted generation
#32946 commented on
Aug 8, 2025 • 0 new comments -
Whisper demo code for model + processor API is broken
#39318 commented on
Aug 9, 2025 • 0 new comments