Insights: huggingface/transformers
Overview
7 Releases published by 2 people
-
v4.53.2-modernbert-decoder-preview ModernBERT Decoder (based on v4.53.2)
published
Jul 16, 2025 -
v4.53.3 Patch release v4.53.3
published
Jul 22, 2025 -
v4.53.2-Ernie-4.5-preview Ernie-4.5 and Ernie-4.5 MoE (based on v4.53.2)
published
Jul 23, 2025 -
4.54.1 Patch release 4.54.1
published
Jul 29, 2025 -
v4.55.0 v4.55.0: New openai GPT OSS model!
published
Aug 5, 2025 -
4.55.0-GLM-4.5V-preview GLM-4.5V preview based on 4.55.0
published
Aug 11, 2025
364 Pull requests merged by 145 people
-
feat: extract rev in attn_implementation kernels via @
#40009 merged
Aug 11, 2025 -
[GPT Big Code] Fix attention scaling
#40041 merged
Aug 11, 2025 -
chore: standardize DeBERTa model card
#37409 merged
Aug 11, 2025 -
Fix `time_spent` in `notification_service.py`.
#40081 merged
Aug 11, 2025 -
added Textnet fast image processor
#39884 merged
Aug 11, 2025 -
Fix repo consistency
#40077 merged
Aug 11, 2025 -
guard on model.eval when using torch.compile + FSDP2
#37413 merged
Aug 11, 2025 -
Remove deprecated cache-related objects
#40035 merged
Aug 11, 2025 -
fix: move super().__init__ after vision_config init in Mistral3Config
#40063 merged
Aug 11, 2025 -
[gemma3] update conversion key mapping
#39778 merged
Aug 11, 2025 -
[qwen-vl] fix beam search with videos
#39726 merged
Aug 11, 2025 -
fix: resolve triton version check compatibility on windows
#39986 merged
Aug 11, 2025 -
unpin `torchcodec==0.5.0` and use torch 2.8 on daily CI
#40072 merged
Aug 10, 2025 -
Update HuBERT model card according to template
#39742 merged
Aug 10, 2025 -
Revert "fix `notification_service.py` about `time_spent`"
#40044 merged
Aug 8, 2025 -
GLM-4.5V Model Support
#39805 merged
Aug 8, 2025 -
fix `notification_service.py` about `time_spent`
#40037 merged
Aug 8, 2025 -
Bnb failing tests
#40026 merged
Aug 8, 2025 -
Tie weights recursively on all submodels
#39996 merged
Aug 8, 2025 -
[core] Refactor the Cache logic to make it simpler and more general
#39797 merged
Aug 8, 2025 -
Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991)
#40024 merged
Aug 8, 2025 -
Harmonize `past_key_value` to `past_key_values` everywhere
#39956 merged
Aug 8, 2025 -
Fix an annoying flaky test
#40000 merged
Aug 8, 2025 -
Higgs modules_to_not_convert standardization
#39989 merged
Aug 8, 2025 -
Fix broken image inference for Fuyu model
#39915 merged
Aug 8, 2025 -
pin torchcodec==0.5.0 for now with torch 2.7.1 on daily CI
#40013 merged
Aug 7, 2025 -
Update expected output values after #39885 (part 2)
#40015 merged
Aug 7, 2025 -
Raising error when quantizing a quantized model
#39998 merged
Aug 7, 2025 -
docs: fix duplication in 'en/optimizers.md'
#40014 merged
Aug 7, 2025 -
unpin torch<2.8 on circleci
#40012 merged
Aug 7, 2025 -
FA2 can continue generation from cache
#39843 merged
Aug 7, 2025 -
Fix default values of getenv
#39867 merged
Aug 7, 2025 -
Fix HGNetV2 Model Card and Image Classification Pipeline Usage Tips
#39965 merged
Aug 7, 2025 -
fix: remove CHAT_TEMPLATE import in tests for deepseek-vl
#40003 merged
Aug 7, 2025 -
Fix missing video inputs for PerceptionLM.
#39971 merged
Aug 7, 2025 -
Fix int4 quantized model cannot work with cpu
#39724 merged
Aug 7, 2025 -
Update expected output values after #39885 (part 1)
#39990 merged
Aug 7, 2025 -
Fix consistency
#39995 merged
Aug 7, 2025 -
Fix return typehint for decoder and annotate inv_freq
#39610 merged
Aug 7, 2025 -
Bump transformers from 4.48.0 to 4.53.0 in /examples/tensorflow/language-modeling-tpu
#39967 merged
Aug 7, 2025 -
Fix gemma3n feature extractor's incorrect squeeze
#39919 merged
Aug 7, 2025 -
[Idefics] fix device mismatch
#39981 merged
Aug 7, 2025 -
Various test fixes for AMD
#39978 merged
Aug 7, 2025 -
Support input_embeds in torch exportable decoders
#39836 merged
Aug 7, 2025 -
[superglue] Fixed the way batch mask was applied to the scores before match assignment computation
#39968 merged
Aug 7, 2025 -
Gemma3 fixes
#39960 merged
Aug 7, 2025 -
Modular fix: remove the model name in `find_file_type`
#39897 merged
Aug 6, 2025 -
chore: update Deformable_Detr model card
#39902 merged
Aug 6, 2025 -
[bugfix] fix flash_attention_2 unavailable error on Ascend NPU
#39844 merged
Aug 6, 2025 -
Fix `fix_and_overwrite` mode of `utils/check_docstring.py`
#39369 merged
Aug 6, 2025 -
remove `triton_kernels` dep with `kernels` instead
#39926 merged
Aug 6, 2025 -
fix glm4v image process
#39964 merged
Aug 6, 2025 -
fix typo
#39936 merged
Aug 6, 2025 -
Fix grammatical error in MoE variable name: expert_hitted → expert_hit, hitted_experts → hit_experts
#39959 merged
Aug 6, 2025 -
docs: fix typo in 'quantization-aware training'
#39904 merged
Aug 6, 2025 -
Enable gpt-oss mxfp4 on older hardware (sm75+)
#39940 merged
Aug 6, 2025 -
Fix MXFP4 quantizer validation to allow CPU inference with dequantize option
#39953 merged
Aug 6, 2025 -
[docs] ko toc fix
#39927 merged
Aug 6, 2025 -
circleci: pin torch 2.7.1 until `torchcodec` is updated
#39951 merged
Aug 6, 2025 -
Fix CI: Tests failing on CPU due to `torch.device('cpu').index` being None
#39933 merged
Aug 6, 2025 -
Avoid `utils/check_bad_commit.py` failing due to rate limit (requesting `api.github.com`)
#39918 merged
Aug 5, 2025 -
[CI] post-`GptOss` fixes for green CI
#39929 merged
Aug 5, 2025 -
gpt_oss last chat template changes
#39925 merged
Aug 5, 2025 -
Add GPT OSS model from OpenAI
#39923 merged
Aug 5, 2025 -
🌐 [i18n-KO] Translated `cache_explanation.md` to Korean
#39535 merged
Aug 5, 2025 -
Export SmolvLM
#39614 merged
Aug 5, 2025 -
Update object_detection.md
#39909 merged
Aug 5, 2025 -
run model debugging with forward arg
#39905 merged
Aug 5, 2025 -
Revert "remove dtensors, not explicit (#39840)"
#39912 merged
Aug 5, 2025 -
Fix aria tests
#39879 merged
Aug 5, 2025 -
Fix eval thread fork bomb
#39717 merged
Aug 5, 2025 -
Replace video_fps with fps in tests
#39898 merged
Aug 5, 2025 -
Fix misleading WandB error when WANDB_DISABLED is set
#39891 merged
Aug 5, 2025 -
Avoid aliasing in cond's branches for torch 2.8
#39488 merged
Aug 5, 2025 -
Remove unnecessary CUDA sync in qwen2_5_vl
#39870 merged
Aug 5, 2025 -
fix test_working_of_tp failure of accelerate ut
#39828 merged
Aug 5, 2025 -
[Exaone4] Fixes the attn implementation!
#39906 merged
Aug 5, 2025 -
Reorder serving docs
#39634 merged
Aug 5, 2025 -
chore: update DETR model card
#39822 merged
Aug 4, 2025 -
Add support for `ModernBertForMultipleChoice`
#39232 merged
Aug 4, 2025 -
send some feedback when manually building doc via comment
#39889 merged
Aug 4, 2025 -
Update cohere2 vision test
#39888 merged
Aug 4, 2025 -
[DOCS] : Improved mimi model card
#39824 merged
Aug 4, 2025 -
Fix link to models in README
#39880 merged
Aug 4, 2025 -
Better return type hint for `AutoModelForCausalLM` and `AutoModelForImageTextToText`
#39881 merged
Aug 4, 2025 -
Set `torch.backends.cudnn.allow_tf32 = False` for CI
#39885 merged
Aug 4, 2025 -
Replace `Tokenizer` with `PreTrainedTokenizerFast` in `ContinuousBatchProcessor`
#39858 merged
Aug 4, 2025 -
Rework add-new-model-like with modular and make test filenames coherent
#39612 merged
Aug 4, 2025 -
Fix quant docker for fp-quant
#39641 merged
Aug 4, 2025 -
Fix attn_implementation setter for models with `backbone_config`
#39855 merged
Aug 4, 2025 -
Add support for including in-memory videos (not just files/urls) in apply_chat_template
#39494 merged
Aug 4, 2025 -
Use comment to build doc on PRs
#39846 merged
Aug 4, 2025 -
Refactor label name handling for PEFT models in Trainer class
#39265 merged
Aug 4, 2025 -
Improve `is_wandb_available` function to verify WandB installation
#39875 merged
Aug 4, 2025 -
remove dtensors, not explicit
#39840 merged
Aug 1, 2025 -
Allow `TrackioCallback` to work when pynvml is not installed
#39851 merged
Aug 1, 2025 -
Add fast image processor Janus, Deepseek VL, Deepseek VL hybrid
#39739 merged
Aug 1, 2025 -
Fix responses add tests
#39848 merged
Aug 1, 2025 -
Update ux cb
#39845 merged
Aug 1, 2025 -
[WIP] Add MM Grounding DINO
#37925 merged
Aug 1, 2025 -
Export private symbols
#39729 merged
Aug 1, 2025 -
[`attn_implementation`] remove recursive, allows custom kernels with wrappers
#39823 merged
Aug 1, 2025 -
[VLMs] split out "get placeholder mask" to helper
#39777 merged
Aug 1, 2025 -
Fix tp cb
#39838 merged
Aug 1, 2025 -
Fix bad markdown links
#39819 merged
Jul 31, 2025 -
Fix broken links
#39809 merged
Jul 31, 2025 -
[cohere2 vision] move doc to multimodal section
#39820 merged
Jul 31, 2025 -
Update documentation for Cohere2Vision models
#39817 merged
Jul 31, 2025 -
[Model] Cohere2 Vision
#39810 merged
Jul 31, 2025 -
[docs] fix korean docs yet again
#39813 merged
Jul 31, 2025 -
feat(tokenization): add encode_message to tokenize messages one by one
#39507 merged
Jul 31, 2025 -
fix: providing a tensor to cache_position in model.generate kwargs always crashes because of boolean test
#39300 merged
Jul 30, 2025 -
Add callback to monitor progress in whisper transcription
#37483 merged
Jul 30, 2025 -
Update mT5 model card
#39702 merged
Jul 30, 2025 -
chore: update cohere2 (Command R7B) model card
#39604 merged
Jul 30, 2025 -
standardized BARThez model card
#39701 merged
Jul 30, 2025 -
Fix re-compilations for cross attention cache
#39788 merged
Jul 30, 2025 -
Simplify conditional code
#39781 merged
Jul 30, 2025 -
Fix an invalid condition
#39762 merged
Jul 30, 2025 -
fix chameleonvision UT failure
#39646 merged
Jul 30, 2025 -
Super tiny update
#39727 merged
Jul 30, 2025 -
more info in `model_results.json`
#39783 merged
Jul 30, 2025 -
[ASR pipeline] fix with datasets 4.0
#39504 merged
Jul 30, 2025 -
enable static cache on vision encoder decoder
#39773 merged
Jul 30, 2025 -
Fix Evolla and xLSTM tests
#39769 merged
Jul 30, 2025 -
Don't set `run_name` when none
#39695 merged
Jul 30, 2025 -
Standardize CLAP model card format
#39738 merged
Jul 29, 2025 -
docs: Update EfficientLoFTR documentation
#39620 merged
Jul 29, 2025 -
Fix OmDet test after arg deprecation
#39766 merged
Jul 29, 2025 -
Remove python3.7 reference from doc link
#39706 merged
Jul 29, 2025 -
[docs] Ko doc fixes after toc update
#39660 merged
Jul 29, 2025 -
Fix Cache.max_cache_len max value for Hybrid models
#39737 merged
Jul 29, 2025 -
fix(trainer): Correct loss scaling for incomplete gradient accumulation steps
#39659 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `how_to_hack_models.md` to Korean
#39536 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `perf_train_gpu_one.md` to Korean
#39552 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `pipeline_gradio.md` to Korean
#39520 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `tokenizer.md` to Korean
#39532 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `tvp.md` to Korean
#39578 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated albert.md to Korean
#39524 merged
Jul 29, 2025 -
🌐 [i18n-KO] Translated `main_classes/peft.md`
#39515 merged
Jul 29, 2025 -
[modernbert] fix regression
#39750 merged
Jul 29, 2025 -
add `libcst` to `extras["testing"]` in `setup.py`
#39761 merged
Jul 29, 2025 -
Fix version issue in modeling_utils.py
#39759 merged
Jul 29, 2025 -
Enable xpu allocator on caching_allocator_warmup
#39654 merged
Jul 29, 2025 -
Support loading Qwen3 MoE GGUF
#39638 merged
Jul 29, 2025 -
Fix GPT2 with cross attention
#39754 merged
Jul 29, 2025 -
Avoid OOM when other tests are failing
#39758 merged
Jul 29, 2025 -
AMD disable torchcodec
#39757 merged
Jul 29, 2025 -
Use `--gpus all` in workflow files
#39752 merged
Jul 29, 2025 -
Apply several ruff SIM rules
#37283 merged
Jul 29, 2025 -
Fix mamba regression
#39728 merged
Jul 29, 2025 -
Update IMPORTANT_MODELS list
#39734 merged
Jul 29, 2025 -
update `GemmaIntegrationTest::test_model_2b_bf16_dola` again
#39731 merged
Jul 29, 2025 -
Fix: add back base model plan
#39733 merged
Jul 29, 2025 -
[Fix] import two missing typos in `models/__init__.py` for typo checking
#39745 merged
Jul 29, 2025 -
fix cache inheritance
#39748 merged
Jul 29, 2025 -
extend more trainer test cases to XPU, all pass
#39652 merged
Jul 29, 2025 -
BLIPs clean-up
#35560 merged
Jul 29, 2025 -
Add Fast Segformer Processor
#37024 merged
Jul 28, 2025 -
Superpoint fast image processor
#37804 merged
Jul 28, 2025 -
Fix AMD dockerfile for audio models
#39669 merged
Jul 28, 2025 -
Fix cache-related tests
#39676 merged
Jul 28, 2025 -
Fix Layer device placement in Caches
#39732 merged
Jul 28, 2025 -
Fix `Qwen2AudioForConditionalGeneration.forward()` and `test_flash_attn_kernels_inference_equivalence`
#39503 merged
Jul 28, 2025 -
skip `Glm4MoeModelTest::test_torch_compile_for_training`
#39670 merged
Jul 28, 2025 -
Update `QAPipelineTests::test_large_model_course` after #39193
#39666 merged
Jul 28, 2025 -
mllama outputs refactor
#39643 merged
Jul 28, 2025 -
Remove all expired deprecation cycles
#39725 merged
Jul 28, 2025 -
[CI] Add Eric to comment slow ci
#39601 merged
Jul 28, 2025 -
PATCH: add back n-dim device-mesh + fix tp trainer saving
#39693 merged
Jul 28, 2025 -
Add self-hosted runner scale set workflow for mi325 CI
#39651 merged
Jul 28, 2025 -
[configuration] remove redundant `classmethod`
#38812 merged
Jul 28, 2025 -
update ernie model card
#39657 merged
Jul 28, 2025 -
[processors] add tests for helper fn
#39629 merged
Jul 28, 2025 -
xpu optimization for generation case
#39573 merged
Jul 28, 2025 -
fix(tokenization): check token.content for trie
#39587 merged
Jul 28, 2025 -
Fix missing initialization of `FastSpeech2Conformer`
#39689 merged
Jul 28, 2025 -
fix missing model._tp_size from ep refactor
#39688 merged
Jul 26, 2025 -
More robust tied weight test
#39681 merged
Jul 25, 2025 -
Add padding-free to Granite hybrid moe models
#39677 merged
Jul 25, 2025 -
Fix tied weight test
#39680 merged
Jul 25, 2025 -
fix break for ckpt without _tp_plan
#39658 merged
Jul 25, 2025 -
Add EXAONE 4.0 model
#39129 merged
Jul 25, 2025 -
Support `typing.Literal` as type of tool parameters or return value
#39633 merged
Jul 25, 2025 -
Add ep
#39501 merged
Jul 25, 2025 -
bad_words_ids no longer slow on mps
#39556 merged
Jul 25, 2025 -
Add xlstm model
#39665 merged
Jul 25, 2025 -
Use auto_docstring for perception_lm fast image processor
#39679 merged
Jul 25, 2025 -
fix: HWIO to OIHW
#39200 merged
Jul 25, 2025 -
Fix auto_docstring crashing when dependencies are missing
#39564 merged
Jul 25, 2025 -
Add support for DeepseekAI's DeepseekVL
#36248 merged
Jul 25, 2025 -
Add missing flag for CacheLayer
#39678 merged
Jul 25, 2025 -
Add evolla rebase main
#36232 merged
Jul 25, 2025 -
update expected outputs for whisper after #38778
#39304 merged
Jul 25, 2025 -
fix `kyutai` tests
#39416 merged
Jul 25, 2025 -
Fixes the BC
#39636 merged
Jul 25, 2025 -
Delete bad rebasing functions
#39672 merged
Jul 25, 2025 -
[Ernie 4.5] Post merge adaptations
#39664 merged
Jul 25, 2025 -
[CI] revert device in `test_export_static_cache`
#39662 merged
Jul 25, 2025 -
Fix ModernBERT Decoder model
#39671 merged
Jul 25, 2025 -
🚨[Fast Image Processor] Force Fast Image Processor for Qwen2_VL/2_5_VL + Refactor
#39591 merged
Jul 25, 2025 -
Rename huggingface_cli to hf
#39630 merged
Jul 25, 2025 -
fix(voxtral): correct typo in apply_transcription_request
#39572 merged
Jul 25, 2025 -
make fixup
#39661 merged
Jul 25, 2025 -
[docs] fix ko cache docs
#39644 merged
Jul 25, 2025 -
Make pytorch examples UV-compatible
#39635 merged
Jul 25, 2025 -
revert change to cu_seqlen_k and max_k when preparing from position_ids
#39653 merged
Jul 25, 2025 -
Fix: explicit not none check for tensors in flash attention
#39639 merged
Jul 25, 2025 -
[attention] fix test for packed padfree masking
#39582 merged
Jul 25, 2025 -
Add owlv2 fast processor
#39041 merged
Jul 25, 2025 -
revert behavior of _prepare_from_posids
#39622 merged
Jul 24, 2025 -
[Voxtral] values for A10 runners
#39605 merged
Jul 24, 2025 -
[timm] new timm pin
#39640 merged
Jul 24, 2025 -
Fix EfficientLoFTR model id in tests
#39621 merged
Jul 24, 2025 -
Update recent processors for vLLM backend
#39583 merged
Jul 24, 2025 -
[Docs] Translate audio_classification.md from English to Spanish
#39513 merged
Jul 23, 2025 -
standardized YOLOS model card according to template in #36979
#39528 merged
Jul 23, 2025 -
Feature/standardize opt model card
#39568 merged
Jul 23, 2025 -
🔴 Fix EnCodec internals and integration tests
#39431 merged
Jul 23, 2025 -
Fix DAC integration tests and checkpoint conversion.
#39313 merged
Jul 23, 2025 -
Move openai import
#39613 merged
Jul 23, 2025 -
Transformers serve VLM
#39454 merged
Jul 23, 2025 -
Fix important models CI
#39576 merged
Jul 23, 2025 -
Fix typos and grammar issues in documentation and code
#39598 merged
Jul 23, 2025 -
Allow `device_mesh` to have multiple dims
#38949 merged
Jul 23, 2025 -
enable triton backend on awq xpu
#39443 merged
Jul 23, 2025 -
[idefics3] fix for vLLM
#39470 merged
Jul 23, 2025 -
fix moe routing_weights
#39581 merged
Jul 23, 2025 -
FP-Quant support
#38696 merged
Jul 23, 2025 -
Rename `supports_static_cache` to `can_compile_fullgraph`
#39505 merged
Jul 23, 2025 -
[Trackio] Allow single-gpu training and monitor power
#39595 merged
Jul 23, 2025 -
Generic task-specific base classes
#39584 merged
Jul 23, 2025 -
Fix DynamicCache and simplify Cache classes a bit
#39590 merged
Jul 23, 2025 -
Mask2former & Maskformer Fast Image Processor
#35685 merged
Jul 23, 2025 -
🎯 Trackio integration
#38814 merged
Jul 22, 2025 -
[WIP] Add OneformerFastImageProcessor
#38343 merged
Jul 22, 2025 -
Fix link in "Inference server backends" doc
#39589 merged
Jul 22, 2025 -
Torchdec RuntimeError catch
#39580 merged
Jul 22, 2025 -
[Paged-Attention] Handle continuous batching for repetition penalty
#39457 merged
Jul 22, 2025 -
updated mistral3 model card
#39531 merged
Jul 22, 2025 -
Update `docs/source/ko/_toctree.yml`
#39516 merged
Jul 22, 2025 -
[cache refactor] Move all the caching logic to a per-layer approach
#39106 merged
Jul 22, 2025 -
General weight initialization scheme
#39579 merged
Jul 22, 2025 -
Add AMD GPU expectations for LLaVA tests
#39486 merged
Jul 22, 2025 -
Kernels flash attn
#39474 merged
Jul 22, 2025 -
Add AMD expectations to Mistral3 tests
#39481 merged
Jul 22, 2025 -
[docs] Create page on inference servers with transformers backend
#39550 merged
Jul 22, 2025 -
[docs] update attention implementation and cache docs
#39547 merged
Jul 22, 2025 -
Add AMD test expectations to DETR model
#39539 merged
Jul 22, 2025 -
feat: add support for gradient checkpointing for TimmWrapperModel and TimmWrapperForImageClassification
#39287 merged
Jul 22, 2025 -
Fixes needed for n-d parallelism and TP
#39562 merged
Jul 22, 2025 -
Bump AMD container for 2.7.1 PyTorch
#39458 merged
Jul 22, 2025 -
Add EfficientLoFTR model
#36355 merged
Jul 22, 2025 -
[gemma3] fix bidirectional image mask
#39396 merged
Jul 22, 2025 -
Update OLMoE model card
#39344 merged
Jul 21, 2025 -
Update modernbertdecoder docs
#39453 merged
Jul 21, 2025 -
[CI] Fix post merge ernie 4.5
#39561 merged
Jul 21, 2025 -
[Fast image processors] Improve handling of image-like inputs other than images (segmentation_maps)
#39489 merged
Jul 21, 2025 -
[Ernie 4.5] Add ernie text models
#39228 merged
Jul 21, 2025 -
Refactor embedding input/output getter/setter
#39339 merged
Jul 21, 2025 -
🌐 [i18n-KO] Translated `perf_infer_gpu_multi.md` to Korean
#39441 merged
Jul 21, 2025 -
[Fast image processor] refactor fast image processor glm4v
#39490 merged
Jul 21, 2025 -
fix ndim check of device_mesh for TP
#39538 merged
Jul 21, 2025 -
Refactor `MambaCache` to `modeling_mamba.py`
#38086 merged
Jul 21, 2025 -
Fix Docstring of BarkProcessor
#39546 merged
Jul 21, 2025 -
use the enable_gqa param in torch.nn.functional.scaled_dot_product_at…
#39412 merged
Jul 21, 2025 -
Fix missing initializations for models created in 2023
#39239 merged
Jul 21, 2025 -
Raise `TypeError` instead of ValueError for invalid types
#38660 merged
Jul 21, 2025 -
Fix pylint warnings
#39477 merged
Jul 21, 2025 -
Fix Qwen Omni integration test
#39553 merged
Jul 21, 2025 -
🚨🚨🚨 [Trainer] Enable `average_tokens_across_devices` by default in `TrainingArguments`
#39395 merged
Jul 21, 2025 -
Rename `_supports_flash_attn_2` in examples and tests
#39471 merged
Jul 21, 2025 -
Fix the check in flex test
#39548 merged
Jul 21, 2025 -
Fix bad tensor shape in failing Hubert test.
#39502 merged
Jul 21, 2025 -
GLM-4 Update
#39393 merged
Jul 21, 2025 -
[qwen2 vl] fix packing with all attentions
#39447 merged
Jul 21, 2025 -
[gemma3] support sequence classification task
#39465 merged
Jul 21, 2025 -
Fix placeholders replacement logic in auto_docstring
#39433 merged
Jul 18, 2025 -
Update SAM/SAM HQ attention implementation + fix Cuda sync issues
#39386 merged
Jul 18, 2025 -
Improve @auto_docstring doc and rename `args_doc.py` to `auto_docstring.py`
#39439 merged
Jul 18, 2025 -
Add fast image processor SAM
#39385 merged
Jul 18, 2025 -
Fix BatchEncoding.to() for nested elements
#38985 merged
Jul 18, 2025 -
[gemma3] Fix do_convert_rgb in image processors.
#39438 merged
Jul 18, 2025 -
[chat template] return assistant mask in processors
#38545 merged
Jul 18, 2025 -
[dependencies] Update `datasets` pin
#39500 merged
Jul 18, 2025 -
Slack CI bot: set default result for non-existing artifacts
#39499 merged
Jul 18, 2025 -
🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling
#39423 merged
Jul 18, 2025 -
[doc builder job] temporary pyarrow pin
#39496 merged
Jul 18, 2025 -
Add voxtral
#39429 merged
Jul 18, 2025 -
Fix typing order
#39467 merged
Jul 17, 2025 -
Add unified logits_to_keep support to LLMClass
#39472 merged
Jul 17, 2025 -
[serve] Add speech to text (`/v1/audio/transcriptions`)
#39434 merged
Jul 17, 2025 -
Update integration_utils.py
#39469 merged
Jul 17, 2025 -
fix: ImageTextToTextPipeline handles user-defined generation_config
#39374 merged
Jul 17, 2025 -
Enable some ruff checks for performance and readability
#39383 merged
Jul 17, 2025 -
Fix convert_and_export_with_cache failures for GPU models
#38976 merged
Jul 17, 2025 -
Update `GemmaIntegrationTest::test_model_2b_bf16_dola`
#39362 merged
Jul 17, 2025 -
fix a comment typo in utils.py
#39459 merged
Jul 17, 2025 -
Use newer typing notation
#38934 merged
Jul 17, 2025 -
Fix tests due to breaking change in accelerate
#39451 merged
Jul 17, 2025 -
fix max_length calculating using cu_seq_lens
#39341 merged
Jul 17, 2025 -
fix(pipelines): QA pipeline returns fewer than top_k results in batch mode
#39193 merged
Jul 17, 2025 -
Corrections to PR #38642 and enhancements to Wav2Vec2Processor __call__ and pad docstrings
#38822 merged
Jul 16, 2025 -
create ijepa modelcard (ref : PR #36979 ).
#39354 merged
Jul 16, 2025 -
Improve grammar and clarity in perf_hardware.md
#39428 merged
Jul 16, 2025 -
fix cached file error when repo type is dataset
#36909 merged
Jul 16, 2025 -
Fix indentation bug in SmolVLM image processor causing KeyError
#39452 merged
Jul 16, 2025 -
Updated Megatron conversion script for gpt2 checkpoints
#38969 merged
Jul 16, 2025 -
[CI] Fix partially red CI
#39448 merged
Jul 16, 2025 -
Fixes #39204: add fallback if get_base_model missing
#39226 merged
Jul 16, 2025 -
make the loss context manager easier to extend
#39321 merged
Jul 16, 2025 -
Remove something that should have never been there
#38254 merged
Jul 16, 2025 -
Fix processor tests
#39450 merged
Jul 16, 2025 -
[Bugfix] [Quantization] Remove unused init arg
#39324 merged
Jul 16, 2025 -
Better typing for model.config
#39132 merged
Jul 16, 2025 -
Fix typo in generation configuration for Janus model weight conversion
#39432 merged
Jul 16, 2025 -
Responses API in `transformers serve`
#39155 merged
Jul 16, 2025 -
[cache] make all classes cache compatible finally
#38635 merged
Jul 16, 2025 -
docs: add missing numpy import to minimal example
#39444 merged
Jul 16, 2025 -
Remove runtime conditions for type checking
#37340 merged
Jul 16, 2025 -
Add StableAdamW Optimizer
#39446 merged
Jul 16, 2025 -
add test scanner
#39419 merged
Jul 16, 2025 -
Fix missing definition of diff_file_url in notification service
#39445 merged
Jul 16, 2025 -
Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer
#31870 merged
Jul 16, 2025 -
Change log level from warning to info for scheduled request logging in `ContinuousBatchProcessor`
#39372 merged
Jul 16, 2025 -
Defaults to adamw_torch_fused for Pytorch>=2.8
#37358 merged
Jul 16, 2025 -
Fix L270 - hasattr("moe_args") returning False error
#38715 merged
Jul 16, 2025 -
[chat template] add a testcase for kwargs
#39415 merged
Jul 16, 2025 -
Fixed a bug calculating cross entropy loss in `JetMoeForCausalLM`
#37830 merged
Jul 16, 2025 -
Remove double soft-max in load-balancing loss. Fixes #39055 .
#39056 merged
Jul 16, 2025 -
[Core] [Offloading] Fix saving offloaded submodules
#39280 merged
Jul 16, 2025 -
[autodocstring] add video and audio inputs
#39420 merged
Jul 16, 2025 -
Responses API (to be merged into #39155)
#39338 merged
Jul 16, 2025 -
CI workflow for performed test regressions
#39198 merged
Jul 16, 2025 -
docs: update LightGlue docs
#39407 merged
Jul 15, 2025 -
docs: update SuperGlue docs
#39406 merged
Jul 15, 2025 -
[vlm] fix loading of retrieval VLMs
#39242 merged
Jul 15, 2025 -
handle training summary when creating modelcard but offline mode is set
#37095 merged
Jul 15, 2025 -
Remove residual quantization attribute from dequantized models
#39373 merged
Jul 15, 2025 -
Remove deprecated audio utils functions
#39330 merged
Jul 15, 2025 -
Fix bugs in pytorch example run_clm when streaming is enabled
#39286 merged
Jul 15, 2025 -
Fix bugs from pipeline preprocessor overhaul
#39425 merged
Jul 15, 2025 -
refactor: remove `set_tracer_provider` and `set_meter_provider` calls
#39422 merged
Jul 15, 2025 -
Fix invalid property
#39384 merged
Jul 15, 2025 -
set document_question_answering pipeline _load_tokenizer to True
#39411 merged
Jul 15, 2025 -
Ignore extra position embeddings weights for ESM
#39063 merged
Jul 15, 2025 -
support loading qwen3 gguf
#38645 merged
Jul 15, 2025 -
Add ModernBERT Decoder Models - ModernBERT, but trained with CLM!
#38967 merged
Jul 15, 2025 -
Fix typo in `/v1/models` output payload
#39414 merged
Jul 15, 2025 -
[refactor] set attention implementation
#38974 merged
Jul 15, 2025 -
Fix/siglip2 pooling comment
#39378 merged
Jul 14, 2025 -
Update phi4_multimodal.md
#38830 merged
Jul 14, 2025 -
[Docs] Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic
#39391 merged
Jul 14, 2025 -
Use np.pad instead of np.lib.pad.
#39346 merged
Jul 14, 2025 -
🚨 Totally rewrite how pipelines load preprocessors
#38947 merged
Jul 14, 2025 -
Remove do_reduce_labels Argument from model initialization in run_semantic_segmentation_no_trainer
#39322 merged
Jul 14, 2025 -
Fix Lfm2 and common tests
#39398 merged
Jul 14, 2025 -
Deprecate AutoModelForVision2Seq
#38900 merged
Jul 14, 2025 -
[Qwen2.5-VL] Fix torch.finfo() TypeError for integer attention_mask_tensor
#39333 merged
Jul 14, 2025 -
[BLIP] remove cache from Qformer
#39335 merged
Jul 14, 2025 -
[shieldgemma] fix checkpoint loading
#39348 merged
Jul 14, 2025 -
Fix overriding Fast Image/Video Processors instance attributes affect other instances
#39363 merged
Jul 12, 2025 -
update docker file to use latest `timm` (for `perception_lm`)
#39380 merged
Jul 12, 2025
190 Pull requests opened by 128 people
-
Add Apertus
#39381 opened
Jul 12, 2025 -
Fix: Docker Build Vulnerable to Malicious Package Installation Attack in docker/custom-tokenizers.dockerfile
#39394 opened
Jul 14, 2025 -
No repeat kv
#39402 opened
Jul 14, 2025 -
Add Vocos model
#39403 opened
Jul 14, 2025 -
Add a unit test for BartModel to compare eager, sdpa on one particular set of inputs
#39435 opened
Jul 15, 2025 -
Fix logger warnings in Gemma model test files
#39449 opened
Jul 16, 2025 -
Add eurobert
#39455 opened
Jul 16, 2025 -
Fix quantized model initialization for int8 dtypes
#39456 opened
Jul 16, 2025 -
Skipping `initialize_weights` when model is quantized
#39464 opened
Jul 17, 2025 -
README: Update Bert Japanese model card
#39466 opened
Jul 17, 2025 -
Fix quantized model dispatch with device_map='auto'
#39468 opened
Jul 17, 2025 -
Fix Bark failing tests
#39478 opened
Jul 17, 2025 -
Add model arcinstitute state
#39480 opened
Jul 17, 2025 -
Bye bye env vars, keep everything as configs
#39483 opened
Jul 17, 2025 -
Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling
#39485 opened
Jul 17, 2025 -
Update CTRL model card with improved usage examples and documentation notes
#39487 opened
Jul 17, 2025 -
Fix: Skip weight initialization for quantized int8 models
#39491 opened
Jul 17, 2025 -
[Voxtral] nit + pin correct mistral common version
#39493 opened
Jul 18, 2025 -
Make sure Moshi is exportable with static cache
#39506 opened
Jul 18, 2025 -
[WIP] :broom: :broom: :broom: Get set decoder cleanup
#39509 opened
Jul 18, 2025 -
🌐 [i18n-KO] Translated `compressed_tensor.md` to Korean
#39517 opened
Jul 19, 2025 -
🌐 [i18n-KO] Translated `models.md` to Korean
#39518 opened
Jul 19, 2025 -
🌐 [i18n-KO] Translated `main_classes/processors.md` to Korean
#39519 opened
Jul 19, 2025 -
build: Add fast image processor tvp
#39529 opened
Jul 20, 2025 -
Add Beit3 model
#39534 opened
Jul 20, 2025 -
Add Muon optimizer implementation and integration
#39541 opened
Jul 20, 2025 -
🌐 [i18n-KO] Translated `feature_extractors.md` to Korean
#39544 opened
Jul 21, 2025 -
[WIP] try to relax the tie_weights method
#39555 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `imageprocessor.md` to Korean
#39557 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `main_classes/deepspeed.md` to Korean
#39559 opened
Jul 21, 2025 -
fix load_model_end = true work when save_steps < eval_steps
#39560 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `vision-encoder-decoder.md` to Korean
#39563 opened
Jul 21, 2025 -
🌐 [i18n-KO] Translated `auto_docstring.md` to Korean
#39571 opened
Jul 22, 2025 -
feat(autoformer): Improve ValueError for insufficient sequence length
#39574 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `vitpose.md` to Korean
#39575 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated `pipelines.md` to Korean
#39577 opened
Jul 22, 2025 -
[`Ernie 4.5`] Ernie VL models
#39585 opened
Jul 22, 2025 -
WIP, reference modeling
#39588 opened
Jul 22, 2025 -
Add Fast Image Processor for ImageGPT
#39592 opened
Jul 22, 2025 -
🌐 [i18n-KO] Translated 'xclip.md' to Korean
#39594 opened
Jul 22, 2025 -
Fix: check TrainerState file exists before loading during resume
#39599 opened
Jul 23, 2025 -
[video processors] decode only sampled videos -> less RAM and faster processing
#39600 opened
Jul 23, 2025 -
feat: add `is_fast` to ImageProcessor
#39603 opened
Jul 23, 2025 -
HunYuan opensource
#39606 opened
Jul 23, 2025 -
Chat schemas
#39609 opened
Jul 23, 2025 -
Fix FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39617 opened
Jul 23, 2025 -
fix tensor device when loading state dict
#39623 opened
Jul 24, 2025 -
Fix: allow Union[str, dict, None] fields like deepspeed to be passed via CLI
#39625 opened
Jul 24, 2025 -
[serve] Add speech-to-text
#39631 opened
Jul 24, 2025 -
fix dead NVIDIA link
#39632 opened
Jul 24, 2025 -
🌐 [i18n-KO] Translated `deepseek_v3.md` to Korean
#39649 opened
Jul 24, 2025 -
Fix loss scaling and token aggregation to use only data parallel group
#39674 opened
Jul 25, 2025 -
[BugFix]: Support dict and config file path for deepspeed
#39675 opened
Jul 25, 2025 -
Fix issue #39191 respect accelerate config to disable torch.dynamo compilation
#39683 opened
Jul 25, 2025 -
Allow custom hf_quantizer in from_pretrained
#39690 opened
Jul 26, 2025 -
fix misspelled issues
#39691 opened
Jul 26, 2025 -
use untyped storage for dtensors due to deprecation
#39697 opened
Jul 26, 2025 -
Fix exaone4 layer_types ZeroDivision/TypeError when sliding_window_pattern is None/"LLLG"
#39698 opened
Jul 26, 2025 -
Fix Causality Handling in Flash Attention to Support Bidirectional Attention
#39707 opened
Jul 27, 2025 -
🌐[i18n-bn] Introduce Bengali version of Transformers documentation
#39708 opened
Jul 27, 2025 -
🌐 [i18n-KO] Translated `attention_interface.md` to Korean
#39712 opened
Jul 27, 2025 -
🌐 [i18n-KO] Translated `main_classes/optimizer_schedules.md` to Korean
#39713 opened
Jul 27, 2025 -
🌐 [i18n-KO] Translated `main_classes/backbones.md` to Korean
#39714 opened
Jul 27, 2025 -
Fix SigLIP2 documentation model/processor mismatch
#39718 opened
Jul 28, 2025 -
[Feat] Adding Intern-S1
#39722 opened
Jul 28, 2025 -
handle multimodal models with tp_plan on the text_config
#39735 opened
Jul 28, 2025 -
[Tests] [Bugfix] Make weights tied for `dynamic_tied_weights` test
#39740 opened
Jul 28, 2025 -
Fix HfArgumentParser to filter out dict types from Union
#39741 opened
Jul 28, 2025 -
Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock
#39743 opened
Jul 29, 2025 -
🌐 [i18n-KO] Translated `text-to-speech.md` to Korean
#39751 opened
Jul 29, 2025 -
Fix rope_deltas corruption in Qwen2.5VL during CFG generation
#39756 opened
Jul 29, 2025 -
[Draft] Add Llasa TTS family of models
#39760 opened
Jul 29, 2025 -
Improve Gemma3n model and tests
#39764 opened
Jul 29, 2025 -
Stop using `from_legacy_cache` as Cache initialization
#39765 opened
Jul 29, 2025 -
Benchmarking improvements
#39768 opened
Jul 29, 2025 -
[Bugfix] Fix `AutoModel.from_pretrained(..., quantization_config=None)` regression
#39770 opened
Jul 29, 2025 -
Fix missing initializations for models created in 2022
#39772 opened
Jul 30, 2025 -
Use `dtype` instead of `torch_dtype` everywhere!
#39782 opened
Jul 30, 2025 -
fix mllama integration tests
#39785 opened
Jul 30, 2025 -
Fix pil dependency torch extra
#39790 opened
Jul 30, 2025 -
Served models handle with nested content
#39792 opened
Jul 30, 2025 -
Fix DAC conversion script
#39793 opened
Jul 30, 2025 -
Fix ProphetNet forward to handle tuple encoder_outputs
#39794 opened
Jul 30, 2025 -
[pipelines] text-to-audio pipeline standardization
#39796 opened
Jul 30, 2025 -
Mistral: Add support for interleaved attention
#39799 opened
Jul 30, 2025 -
[WIP] Add EdgeTAM
#39800 opened
Jul 30, 2025 -
fix: qwen 25vl rope if item is masked
#39802 opened
Jul 30, 2025 -
Enable SIM rules
#39806 opened
Jul 31, 2025 -
🌐 [i18n-KO] Translated `bamba.md` to Korean
#39807 opened
Jul 31, 2025 -
🌐 [i18n-KO] Translated `gpt2.md` to Korean
#39808 opened
Jul 31, 2025 -
[chat template] update when "push_to_hub"
#39815 opened
Jul 31, 2025 -
Refactor vit-like models
#39816 opened
Jul 31, 2025 -
Support MetaCLIP 2
#39821 opened
Jul 31, 2025 -
[serve] guard imports
#39825 opened
Jul 31, 2025 -
Add MetaCLIP 2
#39826 opened
Jul 31, 2025 -
[serve] allow array `content` inputs for LLMs
#39829 opened
Jul 31, 2025 -
refactor(modeling_llama): make RotaryEmbedding default path explicit
#39831 opened
Jul 31, 2025 -
add step3v in VLMS
#39837 opened
Aug 1, 2025 -
[WIP] RoPE refactor
#39847 opened
Aug 1, 2025 -
Fix DeepSpeed mixed precision precedence over Accelerate defaults
#39856 opened
Aug 1, 2025 -
WIP: Initial support for bnb 4bit on any nn.Parameter
#39859 opened
Aug 1, 2025 -
🌐 [i18n-KO] Translated grounding-dino.md to Korean
#39861 opened
Aug 2, 2025 -
Update model card for gpt neox japanese
#39862 opened
Aug 2, 2025 -
🌐 [i18n-KO] Translated `chat_extras.md` to Korean
#39863 opened
Aug 2, 2025 -
🌐 [i18n-KO] Translated `gemma3.md` to Korean
#39865 opened
Aug 2, 2025 -
make sure model.save_pretrained has the correct is_main_process
#39866 opened
Aug 2, 2025 -
Update README.md
#39869 opened
Aug 3, 2025 -
fix: Catch correct ConnectionError for additional_chat_templates
#39874 opened
Aug 3, 2025 -
FP-Quant NVFP4 and Python 3.9 support
#39876 opened
Aug 3, 2025 -
Remove deprecated max_size parameter from ConditionalDetrImageProcessor
#39883 opened
Aug 4, 2025 -
🌐 [i18n-KO] Translated `perf_train_gaudi.md` to Korean
#39886 opened
Aug 4, 2025 -
🌐 [i18n-KO] Translated `jamba.md` to Korean
#39890 opened
Aug 4, 2025 -
[docs] Add reference to HF-maintained `custom_generate` collections
#39894 opened
Aug 4, 2025 -
Add Videoprism
#39895 opened
Aug 4, 2025 -
[model] Support MiniCPM-V 4.0
#39899 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `fp_quant` to Korean
#39901 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated clipseg.md to Korean
#39903 opened
Aug 5, 2025 -
Update dynamic attnt setter for multimodals
#39908 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `tiny_agents.md` to Korean
#39913 opened
Aug 5, 2025 -
🌐 [i18n-KO] Updated ko/perf_train_cpu.md
#39917 opened
Aug 5, 2025 -
🌐 [i18n-KO] Updated ko/perf_train_special.md
#39920 opened
Aug 5, 2025 -
🌐 [i18n-KO] Translated `attention_interface.md` to Korean
#39922 opened
Aug 5, 2025 -
Add chat template tests
#39924 opened
Aug 5, 2025 -
Fix hidden torchvision>=0.15 dependency issue
#39928 opened
Aug 5, 2025 -
Add missing special token properties to MistralCommonTokenizer
#39930 opened
Aug 5, 2025 -
Registers StaticCache serialization functions for torch.export.export
#39931 opened
Aug 5, 2025 -
Fix whisper `return_language` with `return_timestamp=word`
#39938 opened
Aug 5, 2025 -
fixing image_utils.py todo
#39941 opened
Aug 6, 2025 -
fix llama issue
#39942 opened
Aug 6, 2025 -
Add back `_tp_plan` attribute
#39944 opened
Aug 6, 2025 -
Add pytest marker: `torch_compile_test` and `torch_export_test`
#39950 opened
Aug 6, 2025 -
Use torch._check instead of a test to make the model Gemma3 exportable
#39962 opened
Aug 6, 2025 -
Add Keypoint Matcher pipeline
#39970 opened
Aug 6, 2025 -
Causal loss for `ForConditionalGeneration`
#39973 opened
Aug 7, 2025 -
[bugfix] Fix tensor device in Idefics2, Idefics3, and SmolVLM
#39975 opened
Aug 7, 2025 -
Fix Qwen3 MoE GGUF architecture mismatch
#39976 opened
Aug 7, 2025 -
Fix cross-attention masking before residual connection
#39979 opened
Aug 7, 2025 -
Fix setting attention for multimodal models
#39984 opened
Aug 7, 2025 -
Add a VGGT(Visual Geometry Grounded Transformer) model compatible with huggingface transfromers
#39987 opened
Aug 7, 2025 -
Update Glm4V processor and add tests
#39988 opened
Aug 7, 2025 -
Default to dequantize if cpu in device_map for mxfp4
#39993 opened
Aug 7, 2025 -
chore: Add type hints to import_utils.py module
#39994 opened
Aug 7, 2025 -
make sure position_ids are passed in for causal mask creation for gpt-oss
#39997 opened
Aug 7, 2025 -
allow TP to work in ND-parallel with fsdp cpu ram efficient loading
#39999 opened
Aug 7, 2025 -
[`Flash Attention`] Fix flash attention integration
#40002 opened
Aug 7, 2025 -
Fix PerceptionLM image preprocessing for non-tiled image input.
#40006 opened
Aug 7, 2025 -
🚨 Use lru_cache for sine pos embeddings MaskFormer
#40007 opened
Aug 7, 2025 -
Fixes for EncoderDecoderCache
#40008 opened
Aug 7, 2025 -
🌐 [i18n-KO] Translated `optimizers.md` to Korean
#40011 opened
Aug 7, 2025 -
Feat/add gpt oss sequence classification
#40019 opened
Aug 8, 2025 -
[fix] batch inference for llava_onevision
#40021 opened
Aug 8, 2025 -
fix: resolve dropout type error in DogeDecoder
#40022 opened
Aug 8, 2025 -
Add support for SDPA for OWLViT and OWLv2
#40023 opened
Aug 8, 2025 -
Add amd runners to run-slow command
#40027 opened
Aug 8, 2025 -
Revert FA2 kwargs construction
#40029 opened
Aug 8, 2025 -
Update boxes expectations for OWLViT test
#40030 opened
Aug 8, 2025 -
Add model card for MobileViT
#40033 opened
Aug 8, 2025 -
Fix error on importing unavailable torch.distributed
#40038 opened
Aug 8, 2025 -
New DynamicSlidingWindowLayer & associated Cache
#40039 opened
Aug 8, 2025 -
Add GptOssForSequenceClassification for GPT-OSS models
#40043 opened
Aug 8, 2025 -
(small) fix conditional for input_ids and input_embeds in marian
#40045 opened
Aug 8, 2025 -
Update wavlm.md to match new model card template
#40047 opened
Aug 8, 2025 -
Standardize BARTpho model card: badges, new examples, fixed broken im…
#40051 opened
Aug 9, 2025 -
Auto-log parallelism info to wandb.config using HF Accelerate
#40055 opened
Aug 9, 2025 -
updated visualBERT modelcard
#40057 opened
Aug 9, 2025 -
GGUF Qwen2VL
#40058 opened
Aug 9, 2025 -
Fix Inefficient GELU implementation in GPT2
#40059 opened
Aug 9, 2025 -
Avoid CUDA stream sync
#40060 opened
Aug 10, 2025 -
🌐 [i18n-KO] Translated `vitdet.md` to Korean
#40061 opened
Aug 10, 2025 -
🌐 [i18n-KO] Translated `videomae.md` to Korean
#40064 opened
Aug 10, 2025 -
Delay float32 upcast in ForCausalLMLoss after filtering ignore_index
#40065 opened
Aug 10, 2025 -
Change Qwen2RMSNorm to RMSNorm from PyTorch
#40066 opened
Aug 10, 2025 -
Add missing arguments to class constructors
#40068 opened
Aug 10, 2025 -
Remove _prepare_flash_attention_from_position_ids
#40069 opened
Aug 10, 2025 -
initializing branch and draft PR
#40074 opened
Aug 11, 2025 -
Skipping pytree registration in case fsdp is enabled
#40075 opened
Aug 11, 2025 -
rm pytorch-triton dependency
#40076 opened
Aug 11, 2025 -
Update notification service MI325
#40078 opened
Aug 11, 2025 -
[WIP] Collated reports
#40080 opened
Aug 11, 2025 -
Removes DoLa decoding strategy
#40082 opened
Aug 11, 2025 -
Fix regression in mllama vision encoder
#40083 opened
Aug 11, 2025 -
remove sequence parallel in llama4
#40084 opened
Aug 11, 2025 -
`decoding_method` argument in generate
#40085 opened
Aug 11, 2025 -
build unittest for `ViTImageProcessorFast`
#40086 opened
Aug 11, 2025 -
DOCS: Add missing space in SECURITY.md
#40087 opened
Aug 11, 2025 -
Fix RuntimeError when loading quantized models with int8 weights (#39366)
#40090 opened
Aug 11, 2025 -
Replace `logger.warning` with `logger.warning_once` in `GradientCheckpointingLayer`
#40091 opened
Aug 12, 2025 -
Optimize LlamaAttention by fusing QKV projections
#40092 opened
Aug 12, 2025 -
fix(modeling_utils): correct initialization of missing and mismatched…
#40093 opened
Aug 12, 2025
180 Issues closed by 57 people
-
Whisper v-3 pipeline requiring a lot of memory when setting return_timestamps="word"
#27834 closed
Aug 11, 2025 -
Incorrect scaling of Gemma embeddings in float32 regime
#38702 closed
Aug 11, 2025 -
🐛 Bug Report: Accelerate config to disable torch dynamo is ignored by transformers automatic compilation
#39191 closed
Aug 11, 2025 -
Inconsistant `input_feature` length and `attention_mask` length in `WhisperFeatureExtractor`
#39214 closed
Aug 11, 2025 -
[Mistral3] attn_implementation not applied to vision_tower.config in Mistral3Config due to init order
#40062 closed
Aug 11, 2025 -
Instantiating `google/gemma-3-4b-pt` with AutoModelForSequenceClassification Reports Unitialized Model
#39763 closed
Aug 11, 2025 -
`num_beams` > 1 leads to exception for Qwen2.5VL (Qwen family or all VLM models?)
#39723 closed
Aug 11, 2025 -
Triton version check compatibility on windows
#39985 closed
Aug 11, 2025 -
Whisper `.generate()` function not respecting `max_new_tokens` or `max_length`
#36183 closed
Aug 10, 2025 -
Gemma2 fall back to cpu execusion when attn_implementation='flash_attention_2'
#39188 closed
Aug 10, 2025 -
Previous PRs introduced a bug on Accumulated Gradients Losses
#40052 closed
Aug 9, 2025 -
Incorrect word timestamps and word repetitions with Whisper-Large-v3-turbo model
#37248 closed
Aug 9, 2025 -
Pretrainedtokenizerfast Segmentation fault
#39099 closed
Aug 9, 2025 -
New release 4.53.0 breaks HF trainer/model
#39111 closed
Aug 9, 2025 -
Gradient accumulation steps for Vision Languge model
#39123 closed
Aug 9, 2025 -
Not capable of exporting Mistral to ONNX format with the use of caching
#39162 closed
Aug 9, 2025 -
Error when loading gguf file
#40040 closed
Aug 9, 2025 -
Weights not tied when loading `from_pretrained` with a wrapped model
#39900 closed
Aug 8, 2025 -
`TypeError: 'builtins.safe_open' object is not iterable` in `load_pytorch_state_dict_in_tf2_model `
#40028 closed
Aug 8, 2025 -
Major issues with transformers version causing rubbish generations with Gemma3 family using vllm
#40017 closed
Aug 8, 2025 -
Gemma3n get_placeholder_mask issue
#39991 closed
Aug 8, 2025 -
flash-attn cannot perform deterministic computation
#39982 closed
Aug 8, 2025 -
[DeepSeek-V3] Different rotary embedding implementation between DeepSeek-AI and Transformers
#39687 closed
Aug 8, 2025 -
ModernBertUnpaddedRotaryEmbedding __init__ error
#39934 closed
Aug 7, 2025 -
video_inputs are not passed to perception_lm
#40004 closed
Aug 7, 2025 -
Flash Attention fails with non aligned position_ids
#39814 closed
Aug 7, 2025 -
`convert_deepseek_vl_weights_to_hf.py` not included in v4.55.0 release.
#39966 closed
Aug 7, 2025 -
[Gemma3N] Audio processing issue
#39911 closed
Aug 7, 2025 -
v4.55.0 Idefics3 RuntimeError Tensors on different devices
#39947 closed
Aug 7, 2025 -
[gpt-oss] eager_attention_forward not using sliding-window attention for GPT-OSS models
#39954 closed
Aug 7, 2025 -
Finetune `gpt-oss-20b` with `mxfp4` quantization
#39969 closed
Aug 6, 2025 -
Fix grammatically incorrect variable name "expert_hitted" → "expert_hit" in MoE implementation
#39955 closed
Aug 6, 2025 -
transformers serve doesn't handle OPTIONS http method
#39932 closed
Aug 6, 2025 -
454545
#39864 closed
Aug 6, 2025 -
ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#38442 closed
Aug 6, 2025 -
Streaming mode support on HF vs kyutai-labs for the mimi model
#38535 closed
Aug 6, 2025 -
enable GraniteMoeHybridIntegrationTest in UT
#38542 closed
Aug 6, 2025 -
Llama4 inference encounter unsupported op in dynamo ?
#38118 closed
Aug 6, 2025 -
Misleading WandB error when WANDB_DISABLED=True and report_to="wandb" are both set
#39878 closed
Aug 5, 2025 -
Inefficient memory resharding in attention layer
#39072 closed
Aug 5, 2025 -
Inefficient default GELU implementation in GPT2
#39073 closed
Aug 5, 2025 -
AttributeError: 'HfTrainerDeepSpeedConfig' object has no attribute 'is_zero3'
#39081 closed
Aug 5, 2025 -
Why `lm-head` weight still exists with `"tie_word_embeddings": true`
#39812 closed
Aug 4, 2025 -
Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows
#39704 closed
Aug 4, 2025 -
ValueError: Max cache length is not consistent across layers
#39877 closed
Aug 4, 2025 -
Allow video objects (np array etc.) in apply_chat_template (not just paths or urls)
#36560 closed
Aug 4, 2025 -
Exception while inference Qwen2VL and Qwen2VL, assert module.weight.shape[1] == 1
#38665 closed
Aug 4, 2025 -
model.generate custom encoder and decoder outputs/inputs
#39871 closed
Aug 3, 2025 -
Vision Encoder-Decoder fails with LLaMA decoder due to missing cross-attention implementation
#34674 closed
Aug 3, 2025 -
Only with newest version (4.52.4): from_pretrained() esm.embeddings.position_embeddings.weight missing
#39038 closed
Aug 3, 2025 -
pytorch version 1.8.1 compatibility
#39049 closed
Aug 3, 2025 -
TypeError: couldn't find storage object Float8_e4m3fnStorage - which version is needed for this?
#39409 closed
Aug 2, 2025 -
'Mistral3Model' object has no attribute 'prepare_inputs_for_generation'
#39007 closed
Aug 2, 2025 -
Not able to use flash attention with torch.compile with model like BERT
#39017 closed
Aug 2, 2025 -
Tool-Calling Model (ToolACE-2-Llama-3.1-8B) Responds with Irrelevant Tool message on General Question
#39833 closed
Aug 1, 2025 -
Add MM Grounding DINO
#37744 closed
Aug 1, 2025 -
Using Gemma3n with text-only generation requires image dependencies
#39169 closed
Aug 1, 2025 -
Qwen2-VL err
#39818 closed
Jul 31, 2025 -
Option to tokenize messages one after the other
#39417 closed
Jul 31, 2025 -
[rank0]: ValueError: Your setup doesn't support bf16/gpu.
#39716 closed
Jul 31, 2025 -
Blip model got performance regression on compile mode after refactor cache.
#39774 closed
Jul 30, 2025 -
BioGPT Implementation Bug Report
#39776 closed
Jul 30, 2025 -
tokenizer decode decode with timestamp fails for extended vocabulary
#35330 closed
Jul 30, 2025 -
How to streaming output audio of Qwen2.5-omni-7b
#37570 closed
Jul 30, 2025 -
Significant WER Increase with Whisper Chunking Compared to Long-Form Transcription
#38347 closed
Jul 30, 2025 -
Transformers version causing my finetuned model to hallucinate
#38378 closed
Jul 30, 2025 -
`load_balancing_loss_func` doesn't support 4D attention mask
#38910 closed
Jul 30, 2025 -
Max cache length issue with Gemma 3
#39711 closed
Jul 29, 2025 -
ModernBERT has been totally destroyed by PR #38974 and #38838
#39747 closed
Jul 29, 2025 -
Support loading Qwen3 MoE GGUF
#39721 closed
Jul 29, 2025 -
[XPU] Model get OOM when loading models
#39627 closed
Jul 29, 2025 -
encoder decoder model compile failed after refactor cache
#39746 closed
Jul 29, 2025 -
_supports_static_cache disappear
#39744 closed
Jul 29, 2025 -
device mismatch error when using `SlidingWindowLayer`.
#39730 closed
Jul 28, 2025 -
AddedToken should check content on `_update`
#39586 closed
Jul 28, 2025 -
Checkpointing broken for classifier training multi-gpu
#38925 closed
Jul 28, 2025 -
vlmm 0.10.0 load baidu/ERNIE-4.5-300B-A47B-Base-PT error
#39719 closed
Jul 28, 2025 -
[i18n-<languageCode>] Translating docs to <Arabic>
#38381 closed
Jul 27, 2025 -
Not installable on arm64 due to jaxlib upper bound
#36611 closed
Jul 27, 2025 -
KeyError in Llama-4-Maverick-17B-128E-Instruct-FP8 Inference with Offloading
#38281 closed
Jul 27, 2025 -
ImportError: DLL load failed while importing _safetensors_rust: The specified module could not be found
#38479 closed
Jul 27, 2025 -
Contribute to Transformers on windows natively without WSL
#38601 closed
Jul 27, 2025 -
Reproducibility Issue of Siglip2 with Blackwell Architecture GPUs (RTX 5090)
#38874 closed
Jul 27, 2025 -
The wrong config parameter found in src/transformers/models/qwen2_5_vl/configuration_qwen2_5_vl.py.
#38889 closed
Jul 27, 2025 -
CRITICAL ISSUE REPORT! GEMMA 3 1B CANNOT RUN!
#39686 closed
Jul 26, 2025 -
text-generation extremely slow with large `bad_words_ids` list
#39512 closed
Jul 25, 2025 -
Does Gemma 3 need positions ids to be 1-indexed explicitly?
#39023 closed
Jul 25, 2025 -
Add Deepseek-VL
#36110 closed
Jul 25, 2025 -
Grammatical error in the "Loading model's" page
#39018 closed
Jul 25, 2025 -
Inference API Returning 404
#39650 closed
Jul 25, 2025 -
Backwards incompatible change in returned hidden states
#39558 closed
Jul 25, 2025 -
Typo in `apply_transcrition_request` method name
#39530 closed
Jul 25, 2025 -
video_auto_processing.py breaks everything
#38846 closed
Jul 25, 2025 -
Should `compute_metrics` only run on the main process when doing DDP?
#38851 closed
Jul 25, 2025 -
VoxtralForConditionalGeneration import error
#39611 closed
Jul 24, 2025 -
`Trainer._save()` May Incorrectly Save Empty Model State (safetensors)
#38686 closed
Jul 24, 2025 -
Wandb isn't logging config in offline mode
#38968 closed
Jul 23, 2025 -
The similarity between image and text in siglip2 is very low
#39597 closed
Jul 23, 2025 -
Does Qwen_2_5_VL support variable length attention computation?
#38007 closed
Jul 23, 2025 -
Have to import cv2 and pop up window frist, or else it stuck forever
#38139 closed
Jul 23, 2025 -
CI skipped failures tracking issue
#38820 closed
Jul 23, 2025 -
"ValueError: Predictions and/or references don't match the expected format." error
#39510 closed
Jul 22, 2025 -
Clarification on Recent Changes to Loss and Gradient Accumulation
#39567 closed
Jul 22, 2025 -
Add EfficientLoFTR model
#36354 closed
Jul 22, 2025 -
Gemma3 bidirectional mask for image tokens isn't reaching attention forward
#39389 closed
Jul 22, 2025 -
Is the new Intel–Weizmann speculative decoding algorithm integrated into Transformers?
#39545 closed
Jul 21, 2025 -
Enabling `average_tokens_across_devices` by default in Trainer
#39392 closed
Jul 21, 2025 -
T5Gemma problem with tokenizer(?)
#39521 closed
Jul 21, 2025 -
Causal mask is not compatible with Qwen2-VL when using padding-free training
#39400 closed
Jul 21, 2025 -
KeyError: 'llava_qwen2'
#39533 closed
Jul 21, 2025 -
Add Gemma 3 For Sequence Classification
#36755 closed
Jul 21, 2025 -
Expected all tensors to be on the same device, but found at least two devices
#37545 closed
Jul 21, 2025 -
DynamicCache results in too many torch recompiles after 4.51
#37908 closed
Jul 21, 2025 -
Confusion about num_labels and problem_type in classification logic 🐛
#38219 closed
Jul 21, 2025 -
Silent Overwrite of Custom Optimizer When Using DeepSpeed with Transformers Trainer
#38753 closed
Jul 21, 2025 -
DTensor issues when running Llama4ForConditionalGeneration with tensor parallel.
#38803 closed
Jul 21, 2025 -
Version 4.52.3 leads to error after bundling with pyinstaller
#38402 closed
Jul 20, 2025 -
Issue importing models in jupyter notebooks 'No module named transformers.models.ipynb_checkpoints'
#38726 closed
Jul 19, 2025 -
T5Gemma returning 0 loss for s2s training
#39514 closed
Jul 19, 2025 -
Whisper models appear to be broken with Flash Attention 2
#38662 closed
Jul 18, 2025 -
Speculative Decoding(do_sample=False) get different outputs
#39421 closed
Jul 18, 2025 -
BarkProcessor voice_preset doesn't work
#34634 closed
Jul 18, 2025 -
dataset 4.0.0 , issue with load_dataset loading audio dataset
#39497 closed
Jul 18, 2025 -
Gemma3n don't support chat with history
#39498 closed
Jul 18, 2025 -
modeling_flax_gemma.FlaxGemmaModule failed with incompatible shapes when running with GemmaConfig
#39492 closed
Jul 18, 2025 -
Error for `return_assistant_tokens_mask` in MLLM processor
#38521 closed
Jul 18, 2025 -
`get_video_features` in XCLIPModel always returns `pooled_output`
#38709 closed
Jul 18, 2025 -
I can't make sense of this works on Windows but not on Linux AutoModelForCausalLM.from_pretrained
#39461 closed
Jul 17, 2025 -
HfArgumentParser cannot parse `str` for local path
#39462 closed
Jul 17, 2025 -
breaking changes in ESM model classes
#39405 closed
Jul 17, 2025 -
[torch.export] Unhandled FakeTensor Device Propagation for two different devices
#38975 closed
Jul 17, 2025 -
QA pipeline prediction generates wrong response when `top_k` param > 1
#38984 closed
Jul 17, 2025 -
When will transformers 4.51.4 be released?
#37812 closed
Jul 17, 2025 -
CheckpointLoaderSimple ..... Error while deserializing header: InvalidHeaderDeserialization
#38692 closed
Jul 17, 2025 -
can't torch.export.export tinyllama model
#39463 closed
Jul 17, 2025 -
Missing 4 spaces in SmolVLMImageProcessorFast
#39442 closed
Jul 16, 2025 -
ModernBERT for Sequence Classification - issues with finetuning
#38720 closed
Jul 16, 2025 -
SigLip2 text pooler output selection
#39269 closed
Jul 16, 2025 -
[YosoConfig] Missing `architectures` field
#39424 closed
Jul 16, 2025 -
Qwen3 tokenizer wrong offset_mapping
#39401 closed
Jul 16, 2025 -
OpenTelemetry Collector Connection error when installing the latest release 4.53.0 during `docker build`
#39143 closed
Jul 16, 2025 -
DBRX model passes probabilities and not logits to the load balancer
#39055 closed
Jul 16, 2025 -
`verify_tp_plan` function raises an error if a key without '.' is given
#38419 closed
Jul 16, 2025 -
Whisper chunking algorithm increases WER
#37789 closed
Jul 16, 2025 -
model_type = self._reverse_config_mapping[key.__name__] KeyError: 'Qwen2RMConfig'
#38517 closed
Jul 16, 2025 -
TypeError: 'NoneType' object is not iterable in ESM when using DDP training
#38667 closed
Jul 16, 2025 -
LlamaAttention forward function type hint is incorrect
#38739 closed
Jul 15, 2025 -
`quantization_method` is not cleared after calling `.dequantize()`
#39295 closed
Jul 15, 2025 -
Saving model with shared tensors fails on cpu but succeeds on gpu
#33688 closed
Jul 15, 2025 -
Mypy errors since v4.51.0
#37339 closed
Jul 15, 2025 -
Errors using TinyLlama-1.1B-Chat-v1.0 and DirectML
#38340 closed
Jul 15, 2025 -
Pytorch language_modelling example run_clm fails when streaming is enabled
#39285 closed
Jul 15, 2025 -
`transformers.utils.metrics` sets global `TracerProvider`
#39115 closed
Jul 15, 2025 -
There is no transformers version that can run DeepSeek V3 generate
#38710 closed
Jul 15, 2025 -
Support of Qwen3 GGUF model
#38650 closed
Jul 15, 2025 -
Latest Transformers release causes CUDA out-of-memory errors during VisionLLM fine-tuning
#39337 closed
Jul 14, 2025 -
Paligemma model card needs update
#38544 closed
Jul 14, 2025 -
Using resnet-18 in flax
#39388 closed
Jul 14, 2025 -
Getting Warnings When Instantiating Object Detection Models Due to Meta Tensor Initialization
#37615 closed
Jul 14, 2025 -
4.52.2 error: Could not import module 'Qwen3ForCausalLM'
#38291 closed
Jul 14, 2025 -
Transformers fail to load deepseek-ai/DeepSeek-V3 with vllm
#38588 closed
Jul 13, 2025 -
MambaInnerFnBackward
#38600 closed
Jul 13, 2025 -
Failed to full fine tuning code5p 2B
#38602 closed
Jul 13, 2025 -
Exporting google/gemma-3n-e4b-it language_model (decoder) into ONNX format
#39328 closed
Jul 12, 2025 -
Removing the modification of loss value due to rounding off to 4 digits
#38032 closed
Jul 12, 2025 -
Clarification on default top_k sampling parameter
#38549 closed
Jul 12, 2025 -
hidden_states, self_attn_weights = self.self_attn( ValueError: too many values to unpack (expected 2)
#38554 closed
Jul 12, 2025
123 Issues opened by 118 people
-
Could not import module 'AutoTokenizer'. Are this object's requirements defined correctly?
#40089 opened
Aug 11, 2025 -
Default behavior of llama tokenizers breaks text by removing spaces (round trip is not identity function)
#40088 opened
Aug 11, 2025 -
TypeError in DogeDecoderLayer with MoE Configuration when using dropout()
#40079 opened
Aug 11, 2025 -
gpt_oss inference activates *all* experts for every token
#40073 opened
Aug 11, 2025 -
Issue running model from ImageSegmentationPipeline
#40071 opened
Aug 10, 2025 -
Transformer GGUF support philosophy / naive question
#40070 opened
Aug 10, 2025 -
[BUG] No umt5 config for GGUF. This is not supported configuration.
#40067 opened
Aug 10, 2025 -
Question: How to write a custome tokenizer form scratch
#40056 opened
Aug 9, 2025 -
Whisper transcription accuracy improves when last 1600 samples of input audio are muted
#40054 opened
Aug 9, 2025 -
Support text classification with GPT-OSS models
#40050 opened
Aug 9, 2025 -
Please support loading Qwen 2.5 VL from GGUF
#40049 opened
Aug 9, 2025 -
Recent releases break backwards-compatibility with key_cache
#40046 opened
Aug 8, 2025 -
Support loading glm4moe GGUF
#40042 opened
Aug 8, 2025 -
`plamo-2-1b` broken on latest main
#40034 opened
Aug 8, 2025 -
Add Padding Strategy to DataCollatorForLanguageModeling
#40032 opened
Aug 8, 2025 -
[gpt-oss] MoE routing bug in the mxfp4 implementation (in distributed setting)
#40031 opened
Aug 8, 2025 -
accelerate==1.10.0 and safetensors==0.6.1 are incompatible with transformers==4.53.1
#40020 opened
Aug 8, 2025 -
need GptOssForSequenceClassification
#40018 opened
Aug 8, 2025 -
Customizable Logit Warping Strategies for Generation
#40010 opened
Aug 7, 2025 -
Possible wrong init call
#40001 opened
Aug 7, 2025 -
[gpt-oss] Transform checkpoint from safetensors to state dict
#39992 opened
Aug 7, 2025 -
CVE fix for v4.37.2 and v4.38.0
#39983 opened
Aug 7, 2025 -
FSDP2 not compatible with transformers >= 4.54.0 GenericForTokenClassification
#39977 opened
Aug 7, 2025 -
bug in new transformers: 'Florence2ForConditionalGeneration' object has no attribute '_supports_sdpa'
#39974 opened
Aug 7, 2025 -
Gemma3 with fp16 in inference (I don't know if this change is working in fine-tune) #BUG FIX
#39972 opened
Aug 6, 2025 -
change `dataloader_persistent_workers` default value to `True`
#39963 opened
Aug 6, 2025 -
Retaining computational graph after using AutoImageProcessor
#39946 opened
Aug 6, 2025 -
GPT-OSS mxfp4 with triton_kernel: make_default_matmul_mxfp4_w_layout not found
#39945 opened
Aug 6, 2025 -
Breaking change in unset `_tp_plan` attribute
#39943 opened
Aug 6, 2025 -
Still getting "fp16 mixed precision requires a GPU (not 'mps')." error
#39935 opened
Aug 5, 2025 -
[Gemma3N] Not able to add new special tokens to model/tokenizer due to projection error
#39921 opened
Aug 5, 2025 -
When using batch_eval_metrics, inputs are not gathered from different device, which is wrong behavior
#39916 opened
Aug 5, 2025 -
Question: Llama4 weight reshaping
#39910 opened
Aug 5, 2025 -
Hidden torchvision>=0.19.0 dependency results in quiet import failures of e.g. PreTrainedModel
#39907 opened
Aug 5, 2025 -
Add VideoPrism
#39893 opened
Aug 4, 2025 -
[Feature Request] Automatically log parallelism configuration from Accelerate to W&B
#39882 opened
Aug 4, 2025 -
Checking for additional_chat_templates doesn't work without internet (ConnectionError)
#39873 opened
Aug 3, 2025 -
InternVL, PerceptionLM inference freeze in 4.54.1
#39872 opened
Aug 3, 2025 -
Tensor parallelism for GLM-4.5
#39868 opened
Aug 2, 2025 -
Florence2ForConditionalGeneration does not support Flash Attention 2.0 yet ?...
#39860 opened
Aug 2, 2025 -
`make fixup` can't find PLC1802
#39853 opened
Aug 1, 2025 -
Inconsistent Function calling behaviour by Mistral-7B-Instruct-v0.3
#39852 opened
Aug 1, 2025 -
Support topNSigma sampling in `generate`
#39850 opened
Aug 1, 2025 -
Accelerate seems to default mixed precision to bf16 when passing a DeepSpeed config.
#39849 opened
Aug 1, 2025 -
Expected behavior of `compute_result` is hard to expect and inconsistent
#39842 opened
Aug 1, 2025 -
MistralCommonTokenizer does not match PreTrainedTokenizer
#39841 opened
Aug 1, 2025 -
pack_image_features RuntimeError when vision_feature_select_strategy="full"
#39839 opened
Aug 1, 2025 -
Crash when running Llama4 on transformers-4.54.1
#39835 opened
Aug 1, 2025 -
Allow extra outputs from `GenerationMixin.generate`
#39834 opened
Aug 1, 2025 -
Missing einops dependency causing ModuleNotFoundError
#39811 opened
Jul 31, 2025 -
Fine tuning qwen2.5 error
#39804 opened
Jul 31, 2025 -
Memory leak occurred during training qwen-2.5-vl
#39803 opened
Jul 31, 2025 -
Regression - High memory usage when using transformers model with FSDP + LoRA
#39795 opened
Jul 30, 2025 -
`transformers serve` Fails to Handle Messages with Nested Content
#39791 opened
Jul 30, 2025 -
ViTPose+ models post processing doest not work for `dataset_index : 5`
#39789 opened
Jul 30, 2025 -
"CSM audio generation lacks reliable EOS: does not generate all-zero frames → never stops early"
#39787 opened
Jul 30, 2025 -
pip install 'transformers[torch]' pulls nvidia dependencies
#39780 opened
Jul 30, 2025 -
transformers env fails with: ModuleNotFoundError: No module named 'PIL'
#39779 opened
Jul 30, 2025 -
Granite 4.0 Tiny Preview inference broken in
#39775 opened
Jul 30, 2025 -
would it be possible to standardize on the vx.y.z format for all tags
#39771 opened
Jul 30, 2025 -
Model with non-string type property tool giving incomplete response using VLLM
#39767 opened
Jul 29, 2025 -
Follow-up on Issues Regarding Training State Restoration from Interruptions
#39755 opened
Jul 29, 2025 -
Inv frequency has not default, going against our philosophy
#39753 opened
Jul 29, 2025 -
Qwen2_5_VLForConditionalGeneration cfg forward twice error
#39749 opened
Jul 29, 2025 -
[transformers==4.54.0] FSDP1 forward misalignment after loading state dict
#39720 opened
Jul 28, 2025 -
OWLv2 with visual prompt - alternative query embedding selection method
#39710 opened
Jul 27, 2025 -
[i18n-<bn>] Translating docs to <Bengali>
#39705 opened
Jul 27, 2025 -
ValueError: Number of image placeholders in the prompt does not match the number of images. internVL3
#39703 opened
Jul 26, 2025 -
No flag to support Conditional Parameter Loading for gemma-3n-E2B models in transformer
#39699 opened
Jul 26, 2025 -
SigLIP2 documentation example has multiple errors (model/processor mismatch + quantization failure)
#39692 opened
Jul 26, 2025 -
Qwen 2.5 VL - error without attention_mask
#39685 opened
Jul 26, 2025 -
Add multi-candidate & tree search for assisted decoding (speculative decoding)
#39684 opened
Jul 25, 2025 -
Accelerate beam search decoding via tree attention
#39682 opened
Jul 25, 2025 -
error: argument --deepspeed: invalid dict value: '<path>'
#39673 opened
Jul 25, 2025 -
Issue when initializing a DynamicCache
#39668 opened
Jul 25, 2025 -
T5Gemma training not working
#39656 opened
Jul 25, 2025 -
Please develop DataCollatorForVisionLanguageModeling to support visual model training !!!
#39647 opened
Jul 24, 2025 -
FSDP v1 bug: trainer incorrectly uses an unwrapped model
#39619 opened
Jul 23, 2025 -
SageAttention for attention implementation?
#39618 opened
Jul 23, 2025 -
Trainer: Error when folded metrics are saved
#39616 opened
Jul 23, 2025 -
Qwen3 Fails w/4D Attn Mask when using FA2
#39608 opened
Jul 23, 2025 -
ImageClassificationPipeline preprocess should accept numpy/tensor arrays
#39607 opened
Jul 23, 2025 -
Does transformers support python3.13 -- disable-gil or python3.14 free threading?
#39596 opened
Jul 23, 2025 -
Model forward execution in full eager mode?
#39565 opened
Jul 21, 2025 -
Why `is_causal` is not used in `flash_attention_forward` ?
#39554 opened
Jul 21, 2025 -
Is there plan to integrate ColQwen2.5 into Transformers?
#39549 opened
Jul 21, 2025 -
ValueError: You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time
#39542 opened
Jul 21, 2025 -
Add muon and flash-muon optimizer
#39537 opened
Jul 20, 2025 -
training google colab error
#39527 opened
Jul 19, 2025 -
paged attention NOT working with Qwen Models
#39525 opened
Jul 19, 2025 -
T5Gemma failing on provided example
#39522 opened
Jul 19, 2025 -
Export voxtral to ExecuTorch
#39511 opened
Jul 18, 2025 -
Whisper transcription is 2x slower between 4.51.3 -> 4.52.1
#39508 opened
Jul 18, 2025 -
Add Muon Optimiser for 2x faster convergence
#39495 opened
Jul 18, 2025 -
Transformers still tries to use apex.amp which is no longer a thing in apex.
#39484 opened
Jul 17, 2025 -
Adding Space-Time-MiniLM-v0
#39479 opened
Jul 17, 2025 -
Allow `load_best_model_at_end=True` to work when `save_steps < eval_steps` and best model is saved
#39476 opened
Jul 17, 2025 -
Unexpected behaviour with transformers versions above 4.28 for Donut
#39473 opened
Jul 17, 2025 -
Autoformer get_lagged_subsequences always true if condition
#39460 opened
Jul 16, 2025 -
Add Interactive Multi-Modal Attention Visualization for Vision-Language Models
#39440 opened
Jul 15, 2025 -
Export LFM2 to ExecuTorch
#39436 opened
Jul 15, 2025 -
Add DiCoW: Diarization-Conditioned Whisper
#39430 opened
Jul 15, 2025 -
Gemma 3 Compilation Issues During Generation
#39427 opened
Jul 15, 2025 -
object detection: matching outputs.last_hidden_state with results
#39426 opened
Jul 15, 2025 -
Exception 3 type mismatch
#39413 opened
Jul 15, 2025 -
FP8 training support for Model Parallel / Tensor Parallel (MP/TP)
#39410 opened
Jul 15, 2025 -
Off-by-one error when using flash_attention with a sliding window
#39408 opened
Jul 15, 2025 -
Whisper `return_language` with pipeline no longer working
#39404 opened
Jul 14, 2025 -
Qwen2.5-VL Sharding error when using Tensor Parallelism
#39399 opened
Jul 14, 2025 -
Mask2FormerImageProcessor yields inconsistent results between single and batch inference
#39382 opened
Jul 12, 2025 -
Handling of full_text_row_masked_out_mask in mllama is incorrect.
#39379 opened
Jul 12, 2025
138 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Segment Anything 2 (SAM2)
#32317 commented on
Aug 12, 2025 • 166 new comments -
[WiP] Add xcodec2 model
#37868 commented on
Jul 30, 2025 • 76 new comments -
Add support for Florence-2
#38188 commented on
Aug 7, 2025 • 61 new comments -
blt wip
#38579 commented on
Aug 11, 2025 • 47 new comments -
Feat: add Kwai-Keye transformers
#39292 commented on
Aug 11, 2025 • 42 new comments -
Add fastconformer encoder support for nvidia/parakeet and nvidia/canary models
#39062 commented on
Aug 3, 2025 • 21 new comments -
[WIP] Computer vision util: vision visualizer
#36892 commented on
Aug 11, 2025 • 16 new comments -
support MiniCPM-o2.6
#37917 commented on
Aug 12, 2025 • 16 new comments -
Add NVIDIA Cosmos
#36476 commented on
Jul 16, 2025 • 14 new comments -
Add T5LA models
#39293 commented on
Jul 25, 2025 • 14 new comments -
feat: Add ConvaiCausalLM model for Hindi Causal Language Modeling
#37837 commented on
Jul 16, 2025 • 10 new comments -
Add standardized model card for facebook/data2vec-audio-base-960h
#39368 commented on
Jul 24, 2025 • 10 new comments -
Fix the issue that csm model cannot work with pipeline mode.
#39349 commented on
Aug 11, 2025 • 7 new comments -
[omni modality] support composite processor config
#38142 commented on
Aug 5, 2025 • 7 new comments -
Add Ovis2 model and processor implementation
#37088 commented on
Aug 11, 2025 • 5 new comments -
fix: filter None router logits in Qwen3 MoE and handle empty router logits (#39203)
#39206 commented on
Jul 21, 2025 • 5 new comments -
Force real tensors and clone state_dict in src/transformers/modeling_utils.py
#38114 commented on
Jul 15, 2025 • 3 new comments -
Add X-Codec model
#38248 commented on
Jul 23, 2025 • 3 new comments -
Fix: Add version check for timm to support mobilenetv5 models (fixes #39208)
#39264 commented on
Jul 14, 2025 • 3 new comments -
feat: add sliding window attention to Continuous Batching
#39225 commented on
Aug 1, 2025 • 3 new comments -
fix bug when using DP in trl, the batch size of input and output dism…
#38938 commented on
Aug 11, 2025 • 2 new comments -
Fix audio pipeline with torchcodec input
#39309 commented on
Aug 1, 2025 • 2 new comments -
Make executorch integration more seamless by analyzing model signature
#36969 commented on
Jul 16, 2025 • 1 new comment -
Provide clearer instructions on how to specify target language.
#38786 commented on
Jul 21, 2025 • 1 new comment -
add pin memory and block table
#39130 commented on
Aug 11, 2025 • 1 new comment -
deci gguf support
#38669 commented on
Jul 29, 2025 • 1 new comment -
Fix ModernBERT tokenizer issue with is_split_into_words flag
#38564 commented on
Jul 16, 2025 • 1 new comment -
another way to use shift_labels
#38533 commented on
Jul 16, 2025 • 1 new comment -
Bug in modeling_bart.eager_attention_forward
#39365 commented on
Aug 11, 2025 • 0 new comments -
Add StyleTTS 2
#35790 commented on
Jul 28, 2025 • 0 new comments -
Fix ImportError: cannot import name 'GenerationMixin' from 'transformers.generation'
#36011 commented on
Jul 21, 2025 • 0 new comments -
env.useBrowserCache = true causes JSON parsing error, forced to disable cache making app slower.
#39352 commented on
Aug 11, 2025 • 0 new comments -
Add Phi-3.5-vision
#36036 commented on
Aug 1, 2025 • 0 new comments -
Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs
#39364 commented on
Aug 7, 2025 • 0 new comments -
[Validation] First implementation of `@strict` from `huggingface_hub`
#36534 commented on
Jul 29, 2025 • 0 new comments -
fix unexpected kws of input_ids when setup no speech detection of whisper
#36809 commented on
Jul 23, 2025 • 0 new comments -
Update docstring for glm4v
#39357 commented on
Jul 14, 2025 • 0 new comments -
RoBERTa is not well implemented for tokenizers with pad_token_id != 1
#34528 commented on
Aug 11, 2025 • 0 new comments -
Add FAST
#35476 commented on
Jul 30, 2025 • 0 new comments -
Add JinaBERT model
#35320 commented on
Jul 15, 2025 • 0 new comments -
Add Molmo (7B-D, 7B-O, 70B)
#33962 commented on
Jul 21, 2025 • 0 new comments -
[Community contributions] Model cards
#36979 commented on
Aug 11, 2025 • 0 new comments -
Load a pretrained fast tokenizer if fast=true and tokenizer.json exists
#33751 commented on
Jul 15, 2025 • 0 new comments -
[Contributions Welcome] Add Fast Image Processors
#36978 commented on
Aug 11, 2025 • 0 new comments -
DeepSpeed sequence parallelism (aka Ulysses) integration with HF transformer
#32305 commented on
Jul 15, 2025 • 0 new comments -
Implement MambaForSequenceClassification
#31155 commented on
Jul 15, 2025 • 0 new comments -
AutoConfig has potential issue with composite config.
#38258 commented on
Aug 12, 2025 • 0 new comments -
Fix Inconsistent `input_feature` length and `attention_mask` length in `WhisperFeatureExtractor`
#39221 commented on
Jul 21, 2025 • 0 new comments -
feat(trainer): emergency checkpointing on crashes & SIGTERM/SIGINT
#39140 commented on
Aug 5, 2025 • 0 new comments -
Disable static cache on certain MoE models
#39108 commented on
Jul 28, 2025 • 0 new comments -
Update Dockerfiles to install packages inside a virtual environment
#39098 commented on
Aug 11, 2025 • 0 new comments -
Bug/38843 fix pos idx in fp32 parameter error
#39064 commented on
Jul 22, 2025 • 0 new comments -
Fix slow test_moshika_greedy_unconditional_fp16
#39251 commented on
Jul 15, 2025 • 0 new comments -
Allow compression on meta device
#39039 commented on
Aug 8, 2025 • 0 new comments -
Add Dust3R
#38805 commented on
Jul 22, 2025 • 0 new comments -
Adding custom 3d mask into ModernBert
#38671 commented on
Jul 29, 2025 • 0 new comments -
Adds Universal Intelligence to awesome transformers documentation
#38641 commented on
Aug 7, 2025 • 0 new comments -
Add Bagel
#38569 commented on
Aug 10, 2025 • 0 new comments -
🔴[`Attention`] Bert-based Models Attention Refactor
#38301 commented on
Jul 16, 2025 • 0 new comments -
Fix the shape of ModernBertForMaskedLM's output hidden_states
#38272 commented on
Jul 16, 2025 • 0 new comments -
Add dates to the model docs
#39320 commented on
Aug 11, 2025 • 0 new comments -
add profiler to trainer
#37889 commented on
Jul 29, 2025 • 0 new comments -
fix colpali mapping
#39353 commented on
Jul 14, 2025 • 0 new comments -
Update ruff to 0.12.3 and apply its fixes
#37809 commented on
Jul 21, 2025 • 0 new comments -
Vectorize deepseek moe
#37769 commented on
Jul 16, 2025 • 0 new comments -
fix: qwen2.5 omni apply_chat_template system content check
#37511 commented on
Aug 8, 2025 • 0 new comments -
[RFC] Fix Gemma 3 FP16 with activation scaling
#37226 commented on
Jul 16, 2025 • 0 new comments -
trying custom tokenizer fix
#37177 commented on
Jul 16, 2025 • 0 new comments -
Add Plain-DETR
#37096 commented on
Aug 11, 2025 • 0 new comments -
Add Matching Anything by Segmenting Anything (MASA) MOT tracking model
#32164 commented on
Jul 14, 2025 • 0 new comments -
_load_rng_state after get_batch_samples may break training reproducibility when dataloader has random operations
#39215 commented on
Jul 23, 2025 • 0 new comments -
🌐 [i18n-KO] Translating docs to Korean
#20179 commented on
Jul 24, 2025 • 0 new comments -
Adding support for Gemma 3n GGUFs
#39329 commented on
Jul 24, 2025 • 0 new comments -
Model implementation using Liger Kernel layers
#38416 commented on
Jul 24, 2025 • 0 new comments -
Support for Multiple Datasets and Domain-Specific Loss Calculation in Trainer
#30725 commented on
Jul 24, 2025 • 0 new comments -
Trainer/accelerate doesn't save model when using FSDP with SHARDED_STATE_DICT
#30491 commented on
Jul 24, 2025 • 0 new comments -
Segfault on Apple M4 using AutoModelForSequenceClassification with BETO model on CPU
#39020 commented on
Jul 25, 2025 • 0 new comments -
Whisper word-level timestamp extraction fails with beam search
#36093 commented on
Jul 28, 2025 • 0 new comments -
Issue with module.smart_apply(module._initialize_weights) in the initialize_weights Function of modeling_utils.py
#39027 commented on
Jul 28, 2025 • 0 new comments -
Output logits differ significantly for different attn_implementations on image inputs
#39067 commented on
Jul 28, 2025 • 0 new comments -
ValueError: GGUF model with architecture deci is not supported yet.
#37736 commented on
Jul 28, 2025 • 0 new comments -
Resuming training from an interrupted checkpoint fails to save the final checkpoint.
#38939 commented on
Jul 30, 2025 • 0 new comments -
How to use other acceleration apis of npu?
#39105 commented on
Jul 31, 2025 • 0 new comments -
Adding native support to load GGUF models using transformers
#38063 commented on
Jul 31, 2025 • 0 new comments -
Add support for BAGEL from ByteDance
#38267 commented on
Jul 31, 2025 • 0 new comments -
Object detection training/fine-tuning for Owl-vit/Owlv2
#33664 commented on
Aug 1, 2025 • 0 new comments -
Community contribution: Adding GGUF support for more architectures
#33260 commented on
Aug 2, 2025 • 0 new comments -
If a training job job failed MLFlow will not be reported and MLFlow shows job still running
#30333 commented on
Jul 15, 2025 • 0 new comments -
[DOCS] Add `pruna` as optimization framework
#38740 commented on
Jul 16, 2025 • 0 new comments -
Modernbert 3D attention mask
#38040 commented on
Jul 16, 2025 • 0 new comments -
Automatic dynamic batch size selection for DataCollatorWithFlattening
#33945 commented on
Jul 16, 2025 • 0 new comments -
Flex attention support with arbitrary 4d mask for LlamaModel
#33898 commented on
Jul 17, 2025 • 0 new comments -
Add `pruna` integration for loading model through `transformers.from_pretrained` / `pipeline`.
#37971 commented on
Jul 17, 2025 • 0 new comments -
Add HF integration dates + paper release dates to the model docs
#39319 commented on
Jul 18, 2025 • 0 new comments -
The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct
#33399 commented on
Jul 18, 2025 • 0 new comments -
Safetensors deserializing silently mishandles tied parameters
#38870 commented on
Jul 18, 2025 • 0 new comments -
Support 2D Array Inputs in Wav2Vec2FeatureExtractor for Non-Waveform Modalities
#39291 commented on
Jul 18, 2025 • 0 new comments -
RuntimeError when loading llmcompressor W8A8 quantized model: int8 dtype in weight initialization
#39366 commented on
Jul 19, 2025 • 0 new comments -
Error: StaticCache.__init__() got an unexpected keyword argument 'batch_size'
#38914 commented on
Jul 20, 2025 • 0 new comments -
Implement Titans Architecture with GRPO Fine-Tuning
#36352 commented on
Jul 21, 2025 • 0 new comments -
`AutoTokenizer.from_pretrained` does not propagate `token`
#39030 commented on
Jul 21, 2025 • 0 new comments -
Caching of model code in ~/.cache/huggingface/modules/transformers_modules
#39107 commented on
Jul 22, 2025 • 0 new comments -
add MiniCPM-o
#37029 commented on
Jul 22, 2025 • 0 new comments -
Unknown Model (mobilenetv5_300m_enc) when loading Gemma 3n
#39208 commented on
Jul 22, 2025 • 0 new comments -
Inference with model.generate( ) using a quantized model leads to assertion error
#39311 commented on
Aug 9, 2025 • 0 new comments -
Qwen3 MOE models w/non-empty `mlp_only_layers` fail when `output_router_logits=True`
#39203 commented on
Aug 9, 2025 • 0 new comments -
Improve CI/CD by completing migration from setup.py to pyproject.toml
#38928 commented on
Aug 9, 2025 • 0 new comments -
CUDA OOM when running meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
#37532 commented on
Aug 9, 2025 • 0 new comments -
`MoshiIntegrationTests` started to fail after #34464
#38725 commented on
Aug 9, 2025 • 0 new comments -
TypeError: GenerationMixin._extract_past_from_model_output() got an unexpected keyword argument 'standardize_cache_format'
#39336 commented on
Aug 10, 2025 • 0 new comments -
[Trainer] Eval loss depends on batch size (with solution)
#39241 commented on
Aug 10, 2025 • 0 new comments -
bf16_full_eval=True moves model to device before FSDP application and causes cuda OOM
#39136 commented on
Aug 10, 2025 • 0 new comments -
QWEN2VLProcessor missing video_token_id in mm_token_type_ids
#39112 commented on
Aug 10, 2025 • 0 new comments -
AutoModelForCausalLM.from_pretrained(..., device_map=...) ignore `Tensor.retain_grad()` in Multi-GPUs setting
#39036 commented on
Aug 10, 2025 • 0 new comments -
CPMANT Model Fails to Run Following Official Tutorial
#39026 commented on
Aug 10, 2025 • 0 new comments -
Potential Memory Leak or Caching in Fast Image Processor
#38656 commented on
Aug 10, 2025 • 0 new comments -
Attention refactor in #35235 adds a `__getitem__` into the forward pass, which causes errors with torch dynamo.
#38271 commented on
Aug 10, 2025 • 0 new comments -
YaRN: factor is not effective with original_max_position_embeddings
#38224 commented on
Aug 10, 2025 • 0 new comments -
"pipeline" is not exported from module "transformers"
#37646 commented on
Aug 10, 2025 • 0 new comments -
Please support GGUF format for UMT5EncoderModel
#36774 commented on
Aug 10, 2025 • 0 new comments -
FlashAttention2 support for GSAI-ML / LLaDA-8B-Instruct?
#39377 commented on
Aug 11, 2025 • 0 new comments -
v4.53.0 - Qwen 2.5 VL Flash Attention error - object has no attribute is_causal
#39231 commented on
Aug 4, 2025 • 0 new comments -
torch fake_tensor load hf model failed
#39217 commented on
Aug 4, 2025 • 0 new comments -
Exporting Llava decoder into ONNX format
#38924 commented on
Aug 4, 2025 • 0 new comments -
transformers: FlaubertTokenizer: do_lowercase_and_remove_accent: make the logger warning actionable (don't only tell what's wrong, rather suggest what could be done about that)
#39224 commented on
Aug 4, 2025 • 0 new comments -
Failed to export PyTorch traced graph of Mixtral-8x7B-Instruct-v0.1 due to the PR #32429
#38518 commented on
Aug 4, 2025 • 0 new comments -
Torch patches tracker for HPU/Gaudi
#39175 commented on
Aug 5, 2025 • 0 new comments -
[FEAT] [non-CUDA]: Support alternative implementation for `constraints.positive_definite.check`
#36660 commented on
Aug 5, 2025 • 0 new comments -
We now require users to upgrade torch to at least v2.6 in order to use the function.
#38464 commented on
Aug 5, 2025 • 0 new comments -
../aten/src/ATen/native/cuda/Indexing.cu:1289: indexSelectLargeIndex: block: [267,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
#33985 commented on
Aug 6, 2025 • 0 new comments -
ImportError: cannot import name 'pipeline' from 'transformers'
#39137 commented on
Aug 6, 2025 • 0 new comments -
Loading audio in video from video URLs fail with chat template
#39076 commented on
Aug 7, 2025 • 0 new comments -
[RFC] Updating pipeline models
#26690 commented on
Aug 7, 2025 • 0 new comments -
Support for context-free-grammars (CFG) to constrain model output
#25778 commented on
Aug 7, 2025 • 0 new comments -
hangs during training using deepspeed
#39275 commented on
Aug 8, 2025 • 0 new comments -
Please help i am trying to run model but issue
#39260 commented on
Aug 8, 2025 • 0 new comments -
Support `StaticCache` in assisted generation
#32946 commented on
Aug 8, 2025 • 0 new comments -
Whisper demo code for model + processor API is broken
#39318 commented on
Aug 9, 2025 • 0 new comments