
Automated PR: Downstream develop rebase new changes #69

Merged Nov 14, 2024 (1,841 commits)
Commits (1,841 total; the diff below shows changes from 1 commit):
bab32d6
Added mamba.py backend (#30139)
alxndrTL Jul 23, 2024
034b477
Rename Phi-3 rope scaling type (#31436)
garg-amit Jul 23, 2024
3263b34
Revert "Incorrect Whisper long-form decoding timestamps " (#32148)
sanchit-gandhi Jul 23, 2024
a009fbd
Fix typing to be compatible with later py versions (#32155)
amyeroberts Jul 23, 2024
6370062
feat(cache): StaticCache uses index_copy_ to avoid useless copy (#31857)
tengomucho Jul 23, 2024
7d92009
Added additional kwarg for successful running of optuna hyperparamete…
DeF0017 Jul 23, 2024
9cf4f2a
Enhancing SFT Training Efficiency Using Packing and FlashAttention2 w…
RhuiDih Jul 23, 2024
d2c687b
Updated `ruff` to the latest version (#31926)
Sai-Suraj-27 Jul 23, 2024
ff0d708
Dev version: v4.44.0.dev0
LysandreJik Jul 23, 2024
d5a99df
Llama 3.1 conversion
LysandreJik Jul 23, 2024
23f6a43
fix (#32162)
gante Jul 23, 2024
bc2adb0
fix: Fixed an if condition that is always evaluating to true (#32160)
Sai-Suraj-27 Jul 23, 2024
c85510f
[docs] change temperature to a positive value (#32077)
faaany Jul 23, 2024
01be5b4
adds: extra_repr() to MambaRMSNorm to include hidden size / size of w…
rohitdwivedula Jul 24, 2024
8678879
fix: default value reflects the runtime environment variables rather …
junrae6454 Jul 24, 2024
5f4ee98
Update qwen2.md (#32108)
ArtificialZeng Jul 24, 2024
165116b
Remove conversational pipeline tests (#32099)
amyeroberts Jul 24, 2024
e0182f3
RoPE: relaxed rope validation (#32182)
gante Jul 24, 2024
8d2534c
let's not warn when someone is running a forward (#32176)
ArthurZucker Jul 24, 2024
1392a68
Fix resize embedding with Deepspeed (#32192)
zucchini-nlp Jul 24, 2024
af0e4b7
Fix float8_e4m3fn in modeling_utils (#32193)
SunMarc Jul 24, 2024
1c122a4
Support dequantizing GGUF FP16 format (#31783)
PenutChen Jul 24, 2024
edd68f4
:rotating_light: No more default chat templates (#31733)
Rocketknight1 Jul 24, 2024
85a1269
fix: Replaced deprecated `unittest method` with the correct one (#32198)
Sai-Suraj-27 Jul 24, 2024
5658e74
[whisper] fix short-form output type (#32178)
sanchit-gandhi Jul 25, 2024
f53a5de
remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1…
ji-huazhong Jul 25, 2024
1ecedf1
Update question_answering.py (#32208)
avlewis Jul 25, 2024
9b9a54e
[BigBird Pegasus] set _supports_param_buffer_assignment to False (#32…
kashif Jul 25, 2024
de23188
[warnings] fix E721 warnings (#32223)
kashif Jul 25, 2024
df6eee9
Follow up for #31973 (#32025)
ydshieh Jul 25, 2024
6ed0bf1
translate philosophy.md to chinese (#32177)
ji-huazhong Jul 25, 2024
3a83ec4
Allow a specific microphone to be used by the ffmpeg audio pipeline u…
jrhe Jul 25, 2024
9d6c064
Fix code snippet for Grounding DINO (#32229)
qubvel Jul 25, 2024
4ab33c2
Generation: stop at `eos` for assisted decoding (#31301)
zucchini-nlp Jul 26, 2024
fad15fb
Llava: generate without images (#32183)
zucchini-nlp Jul 26, 2024
c46edfb
Resize embeds with DeepSpeed (#32214)
zucchini-nlp Jul 26, 2024
1c7ebf1
don't log base model architecture in wandb if log model is false (#32…
joaonadkarni Jul 26, 2024
b8e5cd5
Refactor: Removed un-necessary `object` base class (#32230)
Sai-Suraj-27 Jul 26, 2024
f9756d9
Adds: extra_repr for RMSNorm layers in most models (#32204)
rohitdwivedula Jul 26, 2024
5f841c7
Add check for `target_sizes is None` in `post_process_image_guided_de…
catalys1 Jul 26, 2024
27c7f97
[tests] fix `static` cache implementation is not compatible with `att…
faaany Jul 26, 2024
81233c0
Flash-Attn: fix generation when no attention mask or no pading (#32241)
zucchini-nlp Jul 26, 2024
8da9068
More flexible trigger condition (#32251)
ydshieh Jul 26, 2024
44f6fdd
Llama 3.1: replace for loop by tensor ops at inv_freq initialization …
gante Jul 27, 2024
f739687
🚨 Bloom support for cache class (#31445)
zucchini-nlp Jul 29, 2024
f2122cc
Upload new model failure report to Hub (#32264)
ydshieh Jul 29, 2024
5019aab
Optimize t5 tokenize logic to avoid redundant calls (#32270)
leejet Jul 29, 2024
a2ad9d5
fix: Fixed wrong argument passed to `convert_blip_checkpoint` functio…
Sai-Suraj-27 Jul 29, 2024
535fe78
Repo: remove exceptions in `check_docstrings` (#32259)
gante Jul 29, 2024
6494479
make `p_mask` a numpy array before passing to `select_starts_ends` (#…
faaany Jul 29, 2024
4992889
fix(docs): Fixed a link in docs (#32274)
Sai-Suraj-27 Jul 29, 2024
7ffe25f
Generate: end-to-end compilation (#30788)
gante Jul 29, 2024
3fbaaaa
Whisper tokenizer word level timestamps (#32197)
kamilakesbi Jul 29, 2024
7f5d644
[pipeline] fix padding for 1-d tensors (#31776)
sanchit-gandhi Jul 29, 2024
811a9ca
Make static cache compatible with torch.export (#32168)
guangy10 Jul 29, 2024
a24a9a6
Add stream messages from agent run for gradio chatbot (#32142)
aymeric-roucher Jul 29, 2024
f0bc49e
use torch 2.4 in 2 CI jobs (#32302)
ydshieh Jul 29, 2024
3e8106d
Docs: fix GaLore optimizer code example (#32249)
gil2rok Jul 30, 2024
934fe15
Fix GGUF dequantize for `gguf==0.9.1` (#32298)
Isotr0py Jul 30, 2024
20528f0
Cast epochs_trained to int when resuming training (#32286)
teddy-f-47 Jul 30, 2024
084b509
feat(ci): set `fetch-depth: 0` in trufflehog checkout step (#31663)
McPatate Jul 30, 2024
2fbbcf5
Fix M4T for ASR pipeline (#32296)
ylacombe Jul 30, 2024
e68ec18
Docs: formatting nits (#32247)
gante Jul 30, 2024
bd54ed2
Alternative agent plan (#32295)
plaggy Jul 30, 2024
1627108
fix: Added missing raise keyword for few exceptions (#32333)
Sai-Suraj-27 Jul 30, 2024
62c60a3
fixes to properly shard FSDP across cpu and meta for cpu_efficient_lo…
winglian Jul 30, 2024
516af4b
fixes #32329 : The Torch code is correct - to get an average of 10% o…
fkrasnov2 Jul 30, 2024
026a173
Repo checks: skip docstring checks if not in the diff (#32328)
gante Jul 30, 2024
6e2d04e
Fix slow GemmaTokenizer and improve SPM slow -> fast conversion proce…
xenova Jul 30, 2024
a326433
LLaVA-NeXT: fix anyres shapes (#32314)
zucchini-nlp Jul 31, 2024
7f552e2
Gemma2 and flash-attention (#32188)
zucchini-nlp Jul 31, 2024
b75ad56
Llama 3.1: Fix incorrect `inv_freq` assignment (#32330)
gante Jul 31, 2024
5f1fcc2
[Idefics2] - Fix FA2 call for Perceiver layer (#32275)
amyeroberts Jul 31, 2024
ef177a5
Gemma 2: support assisted generation (#32357)
gante Jul 31, 2024
b46bd8b
Fix error when streaming to gradio with non-string tool arguments (#3…
aymeric-roucher Jul 31, 2024
92abe60
>3-5x faster torch.compile forward compilation for autoregressive dec…
fxmarty Jul 31, 2024
53f0c9c
fix: Removed unnecessary `@staticmethod` decorator (#32361)
Sai-Suraj-27 Jul 31, 2024
14ee232
fix: warmup_steps check for training_args (#32236)
Ricardo-L-C Jul 31, 2024
453e748
LLaVa: add cache class attribute (#32278)
zucchini-nlp Aug 1, 2024
9451a38
[enc-dec cache] fix bug in indexing (#32370)
sanchit-gandhi Aug 1, 2024
e234061
[whisper] compile compatibility with long-form decoding (#31772)
sanchit-gandhi Aug 1, 2024
48ed24c
Remove size check between attn_weights and kv_seq_len for phi3 (#32339)
helunwencser Aug 1, 2024
9e28284
add missing attribute _supports_param_buffer_assignment for gpt-j. (#…
nv-guomingz Aug 1, 2024
05c1f9a
Check device map for saving tokenizer config on TPU (fix for issue #3…
ayukh Aug 1, 2024
2229ebe
update clean_up_tokenization_spaces warning (#32371)
itazap Aug 1, 2024
db8c7ca
Empty list in defaults for LLaMA special tokens during weights conver…
ViktorooReps Aug 1, 2024
b4727a1
Fix conflicting key in init kwargs in PreTrainedTokenizerBase (#31233)
OmarManzoor Aug 1, 2024
ca59d6f
Offloaded KV Cache (#31325)
n17s Aug 1, 2024
e3d8285
Docker: add `speech` dep to the consistency docker image (#32374)
gante Aug 1, 2024
51ab25e
Fixed Hybrid Cache Shape Initialization. (#32163)
OsamaS99 Aug 1, 2024
82efc53
Yell at the user if zero-3 init wasn't performed, but expected to hav…
muellerzr Aug 1, 2024
2af199c
Update docs (#32368)
zucchini-nlp Aug 2, 2024
083e13b
RoPE: Add numerical tests ✨ (#32380)
gante Aug 2, 2024
c1aa0ed
[generate] only require an attention mask for mps with torch<2.4 (#32…
sanchit-gandhi Aug 2, 2024
7c31d05
fix: (issue #32124) Exception raised when running `transformers/examp…
fshp971 Aug 3, 2024
621fb3c
MixtralFlashAttention2: put "plus 1" inside parentheses when calculat…
xenshinu Aug 3, 2024
847bb85
Bump keras from 2.8.0 to 2.13.1 in /examples/research_projects/decisi…
dependabot[bot] Aug 5, 2024
05ae3a3
fix: SeamlessM4TFeatureExtractor stride remainder (#32088)
TechInterMezzo Aug 5, 2024
3bb646a
Phi3 tests: fix typing for Python 3.8 (#32388)
zucchini-nlp Aug 5, 2024
3d7c2f9
#32184 save total_vocab_size (#32240)
itazap Aug 5, 2024
ea5da52
add values for neftune (#32399)
nbroad1881 Aug 5, 2024
f5f1e52
Fix documentation references to google/bit-50 model (#32407)
JuanFKurucz Aug 5, 2024
baf7e5c
Persist embedding type of BART and mBART models after resize (#32242)
AbdiHaryadi Aug 5, 2024
458b0cd
fix: Updated `test_embeded_special_tokens` for luke and mluke models …
Sai-Suraj-27 Aug 5, 2024
7e5d46d
Respect the config's attn_implementation if set (#32383)
amyeroberts Aug 5, 2024
13dc6b0
Fix documentation links and code reference to model llava-next (#32434)
JuanFKurucz Aug 5, 2024
37c5ca5
Cache: create docs (#32150)
zucchini-nlp Aug 6, 2024
0aa8328
Llava: fix checkpoint_doc (#32458)
RUFFY-369 Aug 6, 2024
e85d863
add the missing flash attention test marker (#32419)
faaany Aug 6, 2024
fb66ef8
Update kwargs validation for `preprocess` with decorator (#32024)
qubvel Aug 6, 2024
438d06c
Fix get large model config for Switch Transformer encoder only tester…
JuanFKurucz Aug 6, 2024
36fd35e
Dependencies: fix typo (#32389)
gante Aug 6, 2024
6a03942
Add Nemotron HF Support (#31699)
suiyoubi Aug 6, 2024
3d8bd11
Generate: fix end to end compilation (#32465)
gante Aug 6, 2024
80b90e7
Add codestral mamba2 (#32080)
molbap Aug 6, 2024
194cf1f
Migrate import checks not need accelerate, and be more clear on min v…
muellerzr Aug 6, 2024
50c3ba8
Documentation: BOS token_id deprecation change for NLLB (#32443)
christoukmaji Aug 6, 2024
26a9443
dev version 4.45.0
ArthurZucker Aug 6, 2024
4fdc702
`is_torchdynamo_compiling` -- cast a wide exception net (#32476)
gante Aug 6, 2024
ac2707e
Revert "fixes to properly shard FSDP across cpu and meta for cpu_effc…
matthewdouglas Aug 6, 2024
5301b98
🌐 [i18n-KO] Translated `mask_generation.md` to Korean (#32257)
jeongiin Aug 6, 2024
3b193c7
🌐 [i18n-KO] Translated `idefics.md` to Korean (#32258)
boyunJang Aug 6, 2024
6af0854
🌐 [i18n-KO] Translated `image_to_image.md` to Korean (#32327)
shinhyunji36 Aug 6, 2024
a30c865
Cache: new Cache format in decoder-only models (#31421)
zucchini-nlp Aug 7, 2024
7ad784a
Gemma2: add cache warning (#32279)
zucchini-nlp Aug 7, 2024
46d09af
enable xla fsdp (#32048)
hanwen-sun Aug 7, 2024
c54a6f9
Fix typo in tokenization_utils_base.py (#32484)
blubitz Aug 7, 2024
e0d8253
Agents use grammar (#31735)
aymeric-roucher Aug 7, 2024
b640103
fix broken link in docs (#32491)
jorahn Aug 7, 2024
b7fb393
Docs: alert for the possibility of manipulating logits (#32467)
gante Aug 7, 2024
1124d95
🌐 [i18n-KO] Translated `gptq.md` to Korean (#32293)
1kmmk1 Aug 7, 2024
fcc4f2a
🌐 [i18n-KO] Translated `prompting.md` to Korean (#32294)
chhaewxn Aug 7, 2024
fa59fd8
🌐 [i18n-KO] Translated `quantization/quanto.md` to Korean (#32281)
fabxoe Aug 7, 2024
cba7bcf
🌐 [i18n-KO] Translated `image_feature_extraction.md` to Korean (#32239)
mreraser Aug 7, 2024
73a59a2
Fix references to model google mt5 small (#32497)
JuanFKurucz Aug 7, 2024
543df48
Docs: Fixed WhisperModel.forward’s docstring link (#32498)
Sai-Suraj-27 Aug 7, 2024
78566db
🌐 [i18n-KO] Translated `chat_templating.md` to Korean (#32362)
enchantee00 Aug 7, 2024
f5cdbf6
Fix link to autoclass_tutorial.md in i18n.md (#32501)
JuanFKurucz Aug 7, 2024
aefd3e2
Fix typo: depracted -> deprecated (#32489)
tomaarsen Aug 8, 2024
1c944ac
Fix issue #32518: Update llm_tutorial.md (#32523)
doomdagadiggiedahdah Aug 8, 2024
e28784f
Change Phi3 `_supports_sdpa` to True (#32457)
pocca2048 Aug 8, 2024
d3b3551
Uniformize kwargs for processors - GroundingDINO (#31964)
SangbumChoi Aug 8, 2024
b51d414
Fix add-new-model-like (#31773)
molbap Aug 8, 2024
16ed064
Add Qwen2-Audio (#32137)
faychu Aug 8, 2024
cc832cb
filter flash_attn optional imports loading remote code (#30954)
eaidova Aug 8, 2024
43f3fe8
🌐 [i18n-KO] Translated `ko-llm_tutorial_optimization.md` to Korean (#…
010kim Aug 8, 2024
96ba7f0
🌐 [i18n-KO] Translated `trainer.md` to Korean (#32260)
cjfghk5697 Aug 8, 2024
e0396bd
🌐 [i18n-KO] Translated `eetq.md` to Korean (#32352)
jun048098 Aug 8, 2024
496207a
🌐 [i18n-KO] Translated `fsdp.md` to Korean (#32261)
win2dvp21 Aug 8, 2024
b01f9c4
🌐 [i18n-KO] Translated `bitsandbytes.md` to Korean (#32408)
SeungAhSon Aug 8, 2024
0442816
Fix generate with `inputs_embeds` as input (#32493)
molbap Aug 8, 2024
0164560
Fixed test `test_static_cache_exportability` with torch 2.4.0 (#32516)
guangy10 Aug 8, 2024
54ac39c
Fix code example to load bigcode starcoder2 7b (#32474)
JuanFKurucz Aug 8, 2024
85817d9
[docs] Translation guide (#32547)
stevhliu Aug 8, 2024
838d141
Gemma2: fix FA2 generation (#32553)
zucchini-nlp Aug 9, 2024
7728b78
Fix a bug in Qwen2Audio (#32552)
faychu Aug 9, 2024
e4522fe
fix slow integration gemma2 test (#32534)
ArthurZucker Aug 9, 2024
e7f4ace
fix non contiguous tensor value error in save_pretrained (#32422)
congcongke Aug 9, 2024
48101cf
🌐 [i18n-KO] Translated `agent.md` to Korean (#32351)
Jwaminju Aug 9, 2024
7c11491
Add new model (#32615)
younesbelkada Aug 12, 2024
8f2b6d5
Fix: FA2 with packed training (#32487)
zucchini-nlp Aug 12, 2024
342e3f9
Fix sliding window attention used in Gemma2FlashAttention2 (#32522)
brcps12 Aug 12, 2024
bd251e4
fix: Fixed conditional check for `encodec` model names (#32581)
Sai-Suraj-27 Aug 12, 2024
e31a7a2
Fix `.push_to_hub(..., create_pr=True, revision="my-branch")` when cr…
Wauplin Aug 12, 2024
50837f2
Bump aiohttp from 3.9.4 to 3.10.2 in /examples/research_projects/deci…
dependabot[bot] Aug 12, 2024
8a3c55e
Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/visual…
dependabot[bot] Aug 12, 2024
b7ea171
Cleanup tool calling documentation and rename doc (#32337)
Rocketknight1 Aug 12, 2024
4996990
🌐 [i18n-KO] Translated `deepspeed.md` to Korean (#32431)
4N3MONE Aug 12, 2024
7f777ab
🌐 [i18n-KO] Translated `awq.md`to Korean (#32324)
ahnjj Aug 12, 2024
ce4b288
fix: Fixed failing `test_find_base_model_checkpoint` (#32638)
Sai-Suraj-27 Aug 12, 2024
126cbdb
Bump tensorflow from 2.11.1 to 2.12.1 in /examples/research_projects/…
dependabot[bot] Aug 12, 2024
f1c8542
"to be not" -> "not to be" (#32636)
qgallouedec Aug 12, 2024
2a5a6ad
fix: Updated the `is_torch_mps_available()` function to include `min_…
Sai-Suraj-27 Aug 12, 2024
a29eabd
Expand inputs in processors for VLMs (#30962)
zucchini-nlp Aug 13, 2024
29c3a0f
Automatically add `transformers` tag to the modelcard (#32623)
LysandreJik Aug 13, 2024
a5a8291
Fix tests (#32649)
molbap Aug 13, 2024
b5016d5
fix tensors on different devices in `WhisperGenerationMixin` (#32316)
faaany Aug 13, 2024
481e156
Add support for GrokAdamW optimizer (#32521)
ehartford Aug 13, 2024
cc25757
Add Depth Anything V2 Metric models (#32126)
bt2513 Aug 13, 2024
c3cd9d8
Fix: Fixed directory path for utils folder in `test_tokenization_util…
Sai-Suraj-27 Aug 13, 2024
5bcbdff
Modify ProcessorTesterMixin for better generalization (#32637)
yonigozlan Aug 13, 2024
9d2ab88
TF_Deberta supporting mixed precision (#32618)
pinesnow72 Aug 13, 2024
c135783
Fix tests recurrent (#32651)
molbap Aug 13, 2024
a22ff36
Support MUSA (Moore Threads GPU) backend in transformers (#31913)
fmo-mt Aug 14, 2024
df32347
fix: Fixed failing tests in `tests/utils/test_add_new_model_like.py` …
Sai-Suraj-27 Aug 14, 2024
9485289
Update translation docs review (#32662)
stevhliu Aug 14, 2024
78d78cd
Add TorchAOHfQuantizer (#32306)
jerryzh168 Aug 14, 2024
20a0449
Fix `JetMoeIntegrationTest` (#32332)
ydshieh Aug 14, 2024
6577c77
Update the distributed CPU training on Kubernetes documentation (#32669)
dmsuehir Aug 14, 2024
95a7781
fix: Fixed unknown pytest config option `doctest_glob` (#32475)
Sai-Suraj-27 Aug 14, 2024
0cea208
Unpin deepspeed in Docker image/tests (#32572)
muellerzr Aug 14, 2024
8820fe8
Updated workflows to the latest versions (#32405)
Sai-Suraj-27 Aug 14, 2024
e840127
reopen: llava-next fails to consider padding_side during Training (#3…
jp1924 Aug 15, 2024
ab7e893
fix: Corrected ` falcon-mamba-7b` model checkpoint name (#32837)
Sai-Suraj-27 Aug 15, 2024
d6751d9
fix: update doc link for runhouse in README.md (#32664)
muddlebee Aug 15, 2024
f3c8b18
VLMs: small clean-up for cache class (#32417)
zucchini-nlp Aug 16, 2024
c215523
add back the position ids (#32554)
ArthurZucker Aug 16, 2024
5fd7ca7
Use head_dim if in config for RoPE (#32495)
suiyoubi Aug 16, 2024
70d5df6
Generate: unify `LogitsWarper` and `LogitsProcessor` (#32626)
gante Aug 16, 2024
8f9fa3b
[tests] make test_sdpa_equivalence device-agnostic (#32520)
faaany Aug 16, 2024
cf32ee1
Cache: use `batch_size` instead of `max_batch_size` (#32657)
gante Aug 16, 2024
a27182b
Fix AutoConfig and AutoModel support for Llava-Next-Video (#32844)
TKONIY Aug 16, 2024
f20d0e8
improve _get_is_as_tensor_fns (#32596)
zrr1999 Aug 16, 2024
0b066be
Revert PR 32299, flag users when Zero-3 was missed (#32851)
muellerzr Aug 16, 2024
1c36db6
fix multi-gpu with static cache (#32543)
SunMarc Aug 16, 2024
8ec028a
Reduce the error log when using core models that need their weights r…
muellerzr Aug 16, 2024
6806d33
Make beam_constraints.Constraint.advance() docstring more accurate (#…
alex-calderwood Aug 16, 2024
52cb403
generate: missing `to` in DoLa body, causing exceptions in multi-gpu …
gante Aug 17, 2024
843e5e2
Add Flax Dinov2 (#31960)
MHRDYN7 Aug 19, 2024
8260cb3
Add Descript-Audio-Codec model (#31494)
kamilakesbi Aug 19, 2024
54b7703
support torch-speech (#32537)
itazap Aug 19, 2024
e55b33c
[tests] make `test_sdpa_can_compile_dynamic` device-agnostic (#32519)
faaany Aug 19, 2024
f1b720e
Add __repr__ for Conv1D (#32425)
AaronZLT Aug 19, 2024
8a4857c
Support save/load ckpt for XLA FSDP (#32311)
yitongh Aug 19, 2024
5f6c080
RT-DETR parameterized batchnorm freezing (#32631)
AlanBlanchet Aug 19, 2024
59e8f19
Fix incorrect vocab size retrieval in GGUF config (#32551)
Isotr0py Aug 19, 2024
93e538a
Mamba / FalconMamba: Fix mamba left padding (#32677)
younesbelkada Aug 19, 2024
61d89c1
Fix: Mamba2 generation mismatch between input_ids and inputs_embeds (…
vasqu Aug 19, 2024
3720484
Docs: Fixed `whisper-large-v2` model link in docs (#32871)
Sai-Suraj-27 Aug 19, 2024
85345bb
Add tip to clarify tool calling (#32883)
Rocketknight1 Aug 19, 2024
13e645b
Allow-head-dim (#32857)
ArthurZucker Aug 20, 2024
fd06ad5
🚨🚨🚨 Update min version of accelerate to 0.26.0 (#32627)
SunMarc Aug 20, 2024
65f4bc9
Fix repr for conv (#32897)
ArthurZucker Aug 20, 2024
01c4fc4
fix: jamba cache fails to use torch.nn.module (#32894)
xgal Aug 20, 2024
c63a3d0
Fix: Mamba2 `norm_before_gate` usage (#32686)
vasqu Aug 20, 2024
9800e6d
Bump nltk from 3.7 to 3.9 in /examples/research_projects/decision_tra…
dependabot[bot] Aug 20, 2024
078d5a8
Replace `tensor.norm()` with decomposed version for CLIP executorch e…
qubvel Aug 20, 2024
1dde50c
link for optimizer names (#32400)
nbroad1881 Aug 20, 2024
8713466
[i18n-ar] add README_ar.md to README.md (#32583)
AhmedAlmaghz Aug 20, 2024
c6d484e
fix: [whisper] don't overwrite GenerationConfig's `return_timestamps`…
hrl Aug 21, 2024
3bb7b05
Update docker image building (#32918)
ArthurZucker Aug 21, 2024
f6e2586
Jamba: update integration tests (#32250)
gante Aug 22, 2024
af638c4
fix: Added missing `huggingface_hub` installation to workflows (#32891)
Sai-Suraj-27 Aug 22, 2024
6baa6f2
fix: no need to dtype A in jamba (#32924)
xgal Aug 22, 2024
c42d264
FEAT / Trainer: Add adamw 4bit optimizer (#31865)
SunMarc Aug 22, 2024
8b94d28
CI: separate step to download nltk files (#32935)
gante Aug 22, 2024
eeea712
FIX / Hub: Also catch for `exceptions.ConnectionError` (#31469)
younesbelkada Aug 22, 2024
9282413
Add SynCode to llm_tutorial (#32884)
shubhamugare Aug 22, 2024
bf97d4a
Fix benchmark script (#32635)
ydshieh Aug 22, 2024
99d67f1
Improve greedy search memory usage (#32895)
regisss Aug 22, 2024
ee8c01f
Add chat_template for tokenizer extracted from GGUF model (#32908)
Isotr0py Aug 22, 2024
f1d822b
fix: (issue #32689) `AttributeError` raised when using `Trainer` with…
fshp971 Aug 22, 2024
975b988
Gemma2: eager attention by default (#32865)
gante Aug 22, 2024
18199b3
[run_slow] idefics2 (#32840)
andimarafioti Aug 22, 2024
273c0af
Fix regression on `Processor.save_pretrained` caused by #31691 (#32921)
leloykun Aug 22, 2024
09e6579
🌐 [i18n-KO] Translated `knowledge_distillation_for_image_classificati…
JinukHong Aug 22, 2024
a26de15
Generate: Deprecate returning legacy cache by default; Handle `use_ca…
gante Aug 22, 2024
d806fa3
docs: fix outdated link to TF32 explanation (#32947)
anakin87 Aug 22, 2024
1f7e953
Merge downstream main into tmp-main-20241114 with conflicts
github-actions[bot] Nov 14, 2024
ddb7b59
conflict updates 11/14/2024
Cemberk Nov 14, 2024
LLaVa: add cache class attribute (huggingface#32278)

cache class flag

zucchini-nlp authored Aug 1, 2024
commit 453e74884fb7e2613e7b45033fbb3c1cadb638b4
src/transformers/models/llava/modeling_llava.py (1 addition, 0 deletions)

@@ -126,6 +126,7 @@ class LlavaPreTrainedModel(PreTrainedModel):
     _no_split_modules = ["LlavaVisionAttention"]
     _skip_keys_device_placement = "past_key_values"
     _supports_flash_attn_2 = True
+    _supports_cache_class = True

     def _init_weights(self, module):
         # important: this ported version of Llava isn't meant for training from scratch - only
src/transformers/models/llava_next/modeling_llava_next.py (1 addition, 0 deletions)

@@ -232,6 +232,7 @@ class LlavaNextPreTrainedModel(PreTrainedModel):
     _no_split_modules = ["LlavaNextVisionAttention"]
     _skip_keys_device_placement = "past_key_values"
     _supports_flash_attn_2 = True
+    _supports_cache_class = True

     def _init_weights(self, module):
         # important: this ported version of LlavaNext isn't meant for training from scratch - only
@@ -272,6 +272,7 @@ class LlavaNextVideoPreTrainedModel(PreTrainedModel):
     _no_split_modules = ["LlavaNextVideoVisionAttention"]
     _skip_keys_device_placement = "past_key_values"
     _supports_flash_attn_2 = True
+    _supports_cache_class = True

     def _init_weights(self, module):
         # important: this ported version of LlavaNextVideo isn't meant for training from scratch - only
src/transformers/models/paligemma/modeling_paligemma.py (1 addition, 0 deletions)

@@ -127,6 +127,7 @@ class PaliGemmaPreTrainedModel(PreTrainedModel):
     _skip_keys_device_placement = "past_key_values"
     _supports_flash_attn_2 = False
     _supports_sdpa = True
+    _supports_cache_class = True

     def _init_weights(self, module):
         # important: this ported version of PaliGemmaisn't meant for training from scratch - only
@@ -126,6 +126,7 @@ class VideoLlavaPreTrainedModel(PreTrainedModel):
     _no_split_modules = ["VideoLlavaVisionAttention"]
     _skip_keys_device_placement = "past_key_values"
     _supports_flash_attn_2 = True
+    _supports_cache_class = True

     def _init_weights(self, module):
         std = (
src/transformers/models/vipllava/modeling_vipllava.py (1 addition, 0 deletions)

@@ -135,6 +135,7 @@ class VipLlavaPreTrainedModel(PreTrainedModel):
     _no_split_modules = ["VipLlavaVisionAttention"]
     _skip_keys_device_placement = "past_key_values"
     _supports_flash_attn_2 = True
+    _supports_cache_class = True

     def _init_weights(self, module):
         # important: this ported version of VipLlava isn't meant for training from scratch - only
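The one-line change repeated across the files above follows a class-attribute capability-flag pattern: each `PreTrainedModel` subclass declares, at class level, which features it supports, and framework code reads the flag off the class before taking a code path. A minimal sketch of that pattern (the class and method names below are illustrative stand-ins, not the actual transformers internals):

```python
# Minimal sketch of the class-attribute capability-flag pattern.
# PreTrainedModelSketch / LlavaLikeModel are hypothetical names used
# for illustration; only the flag names mirror the diff above.

class PreTrainedModelSketch:
    # Capability flags default to False on the base class;
    # subclasses opt in by overriding them.
    _supports_flash_attn_2 = False
    _supports_cache_class = False

    @classmethod
    def supports_cache_class(cls) -> bool:
        # Plain class-attribute lookup: resolves on the subclass first.
        return cls._supports_cache_class


class LlavaLikeModel(PreTrainedModelSketch):
    # Mirrors the one-line addition in the diff: opt in to Cache classes.
    _supports_flash_attn_2 = True
    _supports_cache_class = True


print(LlavaLikeModel.supports_cache_class())        # True
print(PreTrainedModelSketch.supports_cache_class())  # False
```

Because the flag lives on the class rather than on instances, a caller can gate behavior (for example, whether to accept a cache object instead of legacy tuples) without instantiating the model.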