Update dependency transformers to v4.5.1 - abandoned #178
This PR contains the following updates:
transformers ==4.3.3 -> ==4.5.1
Release Notes
huggingface/transformers
v4.5.1 (Compare Source)
Fix for pipeline when used with private models (#11123)
v4.5.0 (Compare Source)
v4.5.0: BigBird, GPT Neo, Examples, Flax support
BigBird (@vasudevgupta7)
Seven new models are released as part of the BigBird implementation:
BigBirdModel, BigBirdForPreTraining, BigBirdForMaskedLM, BigBirdForCausalLM, BigBirdForSequenceClassification, BigBirdForMultipleChoice, and BigBirdForQuestionAnswering, in PyTorch.
BigBird is a sparse-attention-based transformer which extends Transformer-based models, such as BERT, to much longer sequences. In addition to sparse attention, BigBird also applies global attention as well as random attention to the input sequence.
The BigBird model was proposed in Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
It is released with an accompanying blog post: Understanding BigBird's Block Sparse Attention
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=big_bird
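A minimal sketch of encoding a long sequence with one of the new BigBird classes. The google/bigbird-roberta-base checkpoint name and the explicit attention_type override are illustrative assumptions, not something prescribed by these release notes.

```python
# Sketch: run a long input through BigBirdModel (PyTorch).
from transformers import AutoTokenizer, BigBirdModel

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdModel.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type="block_sparse",  # sparse + global + random attention (the default)
)

# ~800 tokens, well beyond typical BERT-style limits but cheap for block-sparse attention.
inputs = tokenizer("A very long document " * 200, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```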
GPT Neo (@patil-suraj)
Two new models are released as part of the GPT Neo implementation:
GPTNeoModel and GPTNeoForCausalLM, in PyTorch.
GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. EleutherAI's primary goal is to replicate a GPT-3 DaVinci-sized model and open-source it to the public.
The implementation within Transformers is a GPT2-like causal language model trained on the Pile dataset.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=gpt_neo
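A minimal sketch of causal generation with GPTNeoForCausalLM. The small 125M checkpoint name, prompt, and sampling settings are assumptions chosen to keep the example lightweight.

```python
# Sketch: sample a continuation from a GPT Neo checkpoint (PyTorch).
from transformers import AutoTokenizer, GPTNeoForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

input_ids = tokenizer("The Pile is a dataset", return_tensors="pt").input_ids
generated = model.generate(input_ids, max_length=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```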
Examples
Features have been added to some examples, and additional examples have been added.
Raw training loop examples
Based on the accelerate library, examples completely exposing the training loop are now part of the library. For easy customization if you want to try a new research idea! A sketch of this pattern follows the list below.
examples/multiple-choice/run_swag_no_trainer.py #10934 (@stancld)
examples/run_ner_no_trainer.py #10902 (@stancld)
examples/language_modeling/run_mlm_no_trainer.py #11001 (@hemildesai)
examples/language_modeling/run_clm_no_trainer.py #11026 (@hemildesai)
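A sketch of the accelerate-based raw training loop pattern that the *_no_trainer scripts expose. The checkpoint, the tiny in-memory dataset, and the hyperparameters below are placeholder assumptions, not the exact contents of those scripts.

```python
# Sketch: a fully exposed training loop driven by accelerate.
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification, AutoTokenizer

accelerator = Accelerator()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tiny in-memory dataset just to make the loop runnable.
texts = ["great library", "terrible bug", "works as expected", "crashes on start"]
labels = [1, 0, 1, 0]
enc = tokenizer(texts, padding=True, return_tensors="pt")
dataset = [
    {"input_ids": enc["input_ids"][i],
     "attention_mask": enc["attention_mask"][i],
     "labels": torch.tensor(labels[i])}
    for i in range(len(texts))
]
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
# accelerate moves everything to the right device(s) and wraps for distributed/fp16 runs.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for epoch in range(2):
    for batch in dataloader:
        loss = model(**batch).loss
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```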
Standardize examples with Trainer
Thanks to the amazing contributions of @bhadreshpsavani, all examples using Trainer are now standardized: they all support the predict stage and return/save metrics in the same fashion.
Trainer & SageMaker Model Parallelism
The Trainer now supports SageMaker model parallelism out of the box; the old SageMakerTrainer is deprecated as a consequence and will be removed in version 5.
FLAX
FLAX support has been widened to cover all model heads of the BERT architecture, alongside a general conversion script for PyTorch checkpoints to be used in FLAX.
Auto models now have a FLAX implementation.
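A minimal sketch of loading a BERT checkpoint through the Flax auto class. The bert-base-cased checkpoint name is an assumption; from_pt=True is only needed when the repository has no native Flax weights and relies on the PyTorch-to-Flax conversion mentioned above.

```python
# Sketch: run a BERT checkpoint with the Flax auto model.
from transformers import AutoTokenizer, FlaxAutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# from_pt=True converts PyTorch weights on the fly if no Flax weights are published.
model = FlaxAutoModel.from_pretrained("bert-base-cased", from_pt=True)

inputs = tokenizer("Flax support now covers the BERT model heads.", return_tensors="np")
outputs = model(**inputs)
print(outputs[0].shape)  # last hidden state
```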
General improvements and bugfixes
pipeline.framework would actually contain a fully qualified model. #10970 (@Narsil)
v4.4.2 (Compare Source)
v4.4.1 (Compare Source)
v4.4.0 (Compare Source)
v4.4.0: S2T, M2M100, I-BERT, mBART-50, DeBERTa-v2, XLSR-Wav2Vec2
SpeechToText
Two new models are released as part of the S2T implementation:
Speech2TextModel and Speech2TextForConditionalGeneration, in PyTorch.
Speech2Text is a speech model that accepts a float tensor of log-mel filter-bank features extracted from the speech signal. It's a transformer-based seq2seq model, so the transcripts/translations are generated autoregressively.
The Speech2Text model was proposed in fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=speech_to_text
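A minimal sketch of transcription with Speech2Text. The facebook/s2t-small-librispeech-asr checkpoint and the random 16 kHz waveform are assumptions standing in for real data; the processor needs sentencepiece and torchaudio installed to compute the filter-bank features.

```python
# Sketch: feed a waveform through the Speech2Text processor and model.
import numpy as np
from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")
model = Speech2TextForConditionalGeneration.from_pretrained("facebook/s2t-small-librispeech-asr")

waveform = np.random.randn(16000).astype(np.float32)  # stand-in for 1 s of 16 kHz audio
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")

generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```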
M2M100
Two new models are released as part of the M2M100 implementation:
M2M100Model and M2M100ForConditionalGeneration, in PyTorch.
M2M100 is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation tasks.
The M2M100 model was proposed in Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=m2m_100
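A minimal sketch of English-to-French translation with M2M100. The 418M checkpoint name and language codes are illustrative assumptions.

```python
# Sketch: translate English to French with an M2M100 checkpoint.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "en"
encoded = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
# Force the decoder to start with the target-language token.
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```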
I-BERT
Six new models are released as part of the I-BERT implementation:
IBertModel, IBertForMaskedLM, IBertForSequenceClassification, IBertForMultipleChoice, IBertForTokenClassification, and IBertForQuestionAnswering, in PyTorch.
I-BERT is a quantized version of RoBERTa running inference up to four times faster.
The I-BERT framework in PyTorch allows you to identify the best parameters for quantization. Once the model is exported to a framework that supports int8 execution (such as TensorRT), a speedup of up to 4x is visible, with no loss in performance thanks to the parameter search.
The I-BERT model was proposed in I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=ibert
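A minimal sketch of loading I-BERT for masked-LM inference. The kssteven/ibert-roberta-base checkpoint name and the quant_mode override are assumptions; the full quantization-aware workflow and int8 export described above are out of scope here.

```python
# Sketch: run I-BERT (quantized RoBERTa) in PyTorch with quantization mode enabled.
from transformers import AutoTokenizer, IBertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = IBertForMaskedLM.from_pretrained("kssteven/ibert-roberta-base", quant_mode=True)

inputs = tokenizer("I-BERT is a quantized version of <mask>.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)
```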
mBART-50
mBART-50 is created using the original mbart-large-cc25 checkpoint by extending its embedding layers with randomly initialized vectors for an extra set of 25 language tokens, and is then pretrained on 50 languages.
The MBart model was presented in Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=mbart-50
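A minimal sketch of many-to-many translation with an mBART-50 checkpoint. The many-to-many checkpoint name and the en_XX/de_DE language codes are illustrative assumptions.

```python
# Sketch: translate English to German with an mBART-50 checkpoint.
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

tokenizer.src_lang = "en_XX"
encoded = tokenizer("The weather is nice today.", return_tensors="pt")
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.lang_code_to_id["de_DE"],  # target language
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```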
DeBERTa-v2
Five new models are released as part of the DeBERTa-v2 implementation:
DebertaV2Model, DebertaV2ForMaskedLM, DebertaV2ForSequenceClassification, DebertaV2ForTokenClassification, and DebertaV2ForQuestionAnswering, in PyTorch.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google's BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in RoBERTa.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=deberta-v2
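A minimal sketch of sequence classification with DeBERTa-v2. The xlarge checkpoint name and label count are assumptions, and the classification head loaded this way is randomly initialized, so the output is only meaningful after fine-tuning.

```python
# Sketch: run a sentence through DebertaV2ForSequenceClassification (PyTorch).
from transformers import AutoTokenizer, DebertaV2ForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
model = DebertaV2ForSequenceClassification.from_pretrained(
    "microsoft/deberta-v2-xlarge", num_labels=2
)

inputs = tokenizer("Disentangled attention separates content and position.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
```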
Wav2Vec2
XLSR-Wav2Vec2
The XLSR-Wav2Vec2 model was proposed in Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
The checkpoint corresponding to that model is added to the model hub: facebook/wav2vec2-large-xlsr-53
Training script
A fine-tuning script showcasing how the Wav2Vec2 model can be trained has been added.
Further improvements
Several changes make the Wav2Vec2 architecture more stable. This release introduces feature extractors and feature processors as the pre-processing aspect of multi-modal speech models.
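A minimal sketch of that feature-extractor/processor flow, using a CTC-finetuned Wav2Vec2 checkpoint. The facebook/wav2vec2-base-960h checkpoint and the random waveform are assumptions standing in for real audio.

```python
# Sketch: pre-process raw audio with Wav2Vec2Processor and decode a CTC transcription.
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

speech = np.random.randn(16000).astype(np.float32)  # stand-in for 1 s of 16 kHz audio
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```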
AMP & XLA Support for TensorFlow models
Most of the TensorFlow models are now compatible with automatic mixed precision and have XLA support.
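A hedged sketch of how mixed precision and XLA can be switched on around a TensorFlow model. The distilbert-base-uncased checkpoint is an assumption, and the two switches shown are standard Keras/TensorFlow mechanisms (the global mixed-precision policy and tf.function's jit_compile flag) rather than transformers-specific APIs.

```python
# Sketch: enable AMP and XLA around a TF transformers model.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Automatic mixed precision via the Keras global policy.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

inputs = tokenizer(["XLA compiles the forward pass."], return_tensors="tf", padding=True)

# Wrap the forward pass with XLA compilation.
@tf.function(jit_compile=True)
def forward(batch):
    return model(batch, training=False).logits

print(forward(dict(inputs)))
```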
SageMaker Trainer for model parallelism
We are rolling out experimental support for model parallelism on SageMaker with a new SageMakerTrainer that can be used in place of the regular Trainer. This is a temporary class that will be removed in a future version; the end goal is to have Trainer support this feature out of the box.
General improvements and bugfixes
[trainer] deepspeed bug fixes and tests #10039 (@stas00)
Removing run_pl_glue.py from text classification docs, include run_xnli.py & run_tf_text_classification.py #10066 (@cbjuan)
remove token_type_ids from TokenizerBertGeneration output #10070 (@sadakmed)
[deepspeed tests] transition to new tests dir #10080 (@stas00)
Added integration tests for Pytorch implementation of the ELECTRA model #10073 (@spatil6)
Fix naming in TF MobileBERT #10095 (@jplu)
[examples/s2s] add test set predictions #10085 (@patil-suraj)
Logging propagation #10092 (@LysandreJik)
Fix some edge cases in report_to and add deprecation warnings #10100 (@sgugger)
Add head_mask and decoder_head_mask to TF LED #9988 (@stancld)
Replace strided slice with tf.expand_dims #10078 (@jplu)
Fix Faiss Import #10103 (@patrickvonplaten)
[RAG] fix generate #10094 (@patil-suraj)
Fix TFConvBertModelIntegrationTest::test_inference_masked_lm Test #10104 (@abhishekkrthakur)
doc: update W&B related doc #10086 (@borisdayma)
Remove speed metrics from default compute objective [WIP] #10107 (@shiva-z)
Fix tokenizers training in notebooks #10110 (@n1t0)
[DeepSpeed docs] new information #9610 (@stas00)
[CI] build docs faster #10115 (@stas00)
[scheduled github CI] add deepspeed fairscale deps #10116 (@stas00)
Line endings should be LF across repo and not CRLF #10119 (@LysandreJik)
Fix TF LED/Longformer attentions computation #10007 (@jplu)
remove adjust_logits_during_generation method #10087 (@patil-suraj)
[DeepSpeed] restore memory for evaluation #10114 (@stas00)
Update run_xnli.py to use Datasets library #9829 (@Qbiwan)
Add new community notebook - Blenderbot #10126 (@lordtt13)
[DeepSpeed in notebooks] Jupyter + Colab #10130 (@stas00)
[examples/run_s2s] remove task_specific_params and update rouge computation #10133 (@patil-suraj)
Fix typo in GPT2DoubleHeadsModel docs #10148 (@M-Salti)
[hf_api] delete deprecated methods and tests #10159 (@julien-c)
Revert propagation #10171 (@LysandreJik)
Conversion from slow to fast for BPE spm vocabs contained an error. #10120 (@Narsil)
Fix typo in comments #10157 (@mrm8488)
Fix typo in comment #10156 (@mrm8488)
[Doc] Fix version control in internal pages #10124 (@sgugger)
[t5 tokenizer] add info logs #9897 (@stas00)
Fix v2 model loading issue #10129 (@BigBird01)
Fix datasets set_format #10178 (@sgugger)
Fixing NER pipeline for list inputs. #10184 (@Narsil)
Add new model to labels that should not stale #10187 (@LysandreJik)
Check TF ops for ONNX compliance #10025 (@jplu)
[RAG] fix tokenizer #10167 (@patil-suraj)
Fix TF template #10189 (@jplu)
fix run_seq2seq.py; porting trainer tests to it #10162 (@stas00)
Specify dataset dtype #10195 (@LysandreJik)
[CI] make the examples sub-group of tests run always #10196 (@stas00)
[WIP][examples/seq2seq] move old s2s scripts to legacy #10136 (@patil-suraj)
set tgt_lang of MBart Tokenizer for summarization #10205 (@HeroadZ)
Store FLOS as floats to avoid overflow. #10213 (@sgugger)
Fix add_token_positions in custom datasets tutorial #10217 (@joeddav)
[trainer] fix ignored columns logger #10219 (@stas00)
Factor out methods #10215 (@LysandreJik)
Fix head masking for TFT5 models #9877 (@stancld)
[CI] 2 fixes #10248 (@stas00)
[trainer] refactor place_model_on_device logic, add deepspeed #10243 (@stas00)
[Trainer] doc update #10241 (@stas00)
Reduce the time spent for the TF slow tests #10152 (@jplu)
Introduce warmup_ratio training argument #10229 (@tanmay17061)
[Trainer] memory tracker metrics #10225 (@stas00)
Script for distilling zero-shot classifier to more efficient student #10244 (@joeddav)
[test] fix func signature #10271 (@stas00)
[trainer] implement support for full fp16 in evaluation/predict #10268 (@stas00)
[ISSUES.md] propose using google colab to reproduce problems #10270 (@stas00)
Introduce logging_strategy training argument #10267 (@tanmay17061)
[CI] Kill any run-away pytest processes #10281 (@stas00)
Patch zero shot distillation script cuda issue #10284 (@joeddav)
Move the TF NER example #10276 (@jplu)
Fix example links in the task summary #10291 (@sgugger)
fixes #10303 #10304 (@cronoik)
[ci] don't fail when there are no zombies #10308 (@stas00)
fix typo in conversion script #10316 (@tagucci)
Add note to resize token embeddings matrix when adding new tokens to voc #10331 (@LysandreJik)
Deprecate prepare_seq2seq_batch #10287 (@sgugger)
[examples/seq2seq] defensive programming + expand/correct README #10295 (@stas00)
[Trainer] implement gradient_accumulation_steps support in DeepSpeed integration #10310 (@stas00)
Loading from last checkpoint functionality in Trainer.train #10334 (@tanmay17061)
[trainer] add Trainer methods for metrics logging and saving #10266 (@stas00)
Fix evaluation with label smoothing in Trainer #10338 (@sgugger)
Fix broken examples/seq2seq/README.md markdown #10344 (@Wikidepia)
[bert-base-german-cased] use model repo, not external bucket #10353 (@julien-c)
[Trainer/Deepspeed] handle get_last_lr() before first step() #10362 (@stas00)
ConvBERT fix torch <> tf weights conversion #10314 (@abhishekkrthakur)
fix deprecated reference tokenizer.max_len in glue.py #10220 (@poedator)
[trainer] move secondary methods into a separate file #10363 (@stas00)
Run GA on every push even on forks #10383 (@LysandreJik)
GA: only run model templates once #10388 (@LysandreJik)
Bugfix: Removal of padding_idx in BartLearnedPositionalEmbedding #10200 (@mingruimingrui)
Remove unused variable in example for Q&A #10392 (@abhishekkrthakur)
Ignore unexpected weights from PT conversion #10397 (@LysandreJik)
Add support for ZeRO-2/3 and ZeRO-offload in fairscale #10354 (@sgugger)
Fix None in add_token_positions - issue #10210 #10374 (@andreabac3)
Make Barthez tokenizer tests a bit faster #10399 (@sgugger)
Fix run_glue evaluation when model has a label correspondence #10401 (@sgugger)
[ci, flax] non-existing models are unlikely to pass tests #10409 (@julien-c)
[LED] Correct Docs #10419 (@patrickvonplaten)
Add Ray Tune hyperparameter search integration test #10414 (@krfricke)
Ray Tune Integration Bug Fixes #10406 (@amogkam)
[examples] better model example #10427 (@stas00)
Fix conda-build #10431 (@LysandreJik)
[run_seq2seq.py] restore functionality: saving to test_generations.txt #10428 (@stas00)
updated logging and saving metrics #10436 (@bhadreshpsavani)
Introduce save_strategy training argument #10286 (@tanmay17061)
Adds terms to Glossary #10443 (@darigovresearch)
Fixes compatibility bug when using grouped beam search and constrained decoding together #10475 (@mnschmit)
Generate can return cross-attention weights too #10493 (@Mehrad0711)
Fix typos #10489 (@WybeKoper)
[T5] Fix speed degradation bug t5 #10496 (@patrickvonplaten)
feat(docs): navigate with left/right arrow keys #10481 (@ydcjeff)
Configuration
📅 Schedule: At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻️ Rebasing: Renovate will not automatically rebase this PR, because other commits have been found.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by WhiteSource Renovate. View repository job log here.