Update dependency transformers to v4.5.1 - abandoned #178
This PR contains the following updates:
transformers ==4.3.3 -> ==4.5.1
Release Notes
huggingface/transformers
v4.5.1 (Compare Source)
Fix for pipeline when used with private models (#11123)
v4.5.0 (Compare Source)
v4.5.0: BigBird, GPT Neo, Examples, Flax support
BigBird (@vasudevgupta7)
Seven new models are released as part of the BigBird implementation:
BigBirdModel, BigBirdForPreTraining, BigBirdForMaskedLM, BigBirdForCausalLM, BigBirdForSequenceClassification, BigBirdForMultipleChoice, and BigBirdForQuestionAnswering, in PyTorch.
BigBird is a sparse-attention-based transformer which extends Transformer-based models, such as BERT, to much longer sequences. In addition to sparse attention, BigBird also applies global attention as well as random attention to the input sequence.
The BigBird model was proposed in Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
It is released with an accompanying blog post: Understanding BigBird's Block Sparse Attention
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=big_bird
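A minimal sketch of encoding a long sequence with one of the new BigBird classes. The google/bigbird-roberta-base checkpoint name and the explicit attention_type override are illustrative assumptions, not something prescribed by these release notes.

```python
# Sketch: run a long input through BigBirdModel (PyTorch).
from transformers import AutoTokenizer, BigBirdModel

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdModel.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type="block_sparse",  # sparse + global + random attention (the default)
)

# ~800 tokens, well beyond typical BERT-style limits but cheap for block-sparse attention.
inputs = tokenizer("A very long document " * 200, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```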
GPT Neo (@patil-suraj)
Two new models are released as part of the GPT Neo implementation:
GPTNeoModel and GPTNeoForCausalLM, in PyTorch.
GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. EleutherAI's primary goal is to replicate a GPT-3 DaVinci-sized model and open-source it to the public.
The implementation within Transformers is a GPT2-like causal language model trained on the Pile dataset.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=gpt_neo
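A minimal sketch of causal generation with GPTNeoForCausalLM. The small 125M checkpoint name, prompt, and sampling settings are assumptions chosen to keep the example lightweight.

```python
# Sketch: sample a continuation from a GPT Neo checkpoint (PyTorch).
from transformers import AutoTokenizer, GPTNeoForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

input_ids = tokenizer("The Pile is a dataset", return_tensors="pt").input_ids
generated = model.generate(input_ids, max_length=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```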
Examples
Features have been added to some examples, and additional examples have been added.
Raw training loop examples
Based on the accelerate library, examples completely exposing the training loop are now part of the library. For easy customization if you want to try a new research idea! A sketch of this pattern follows the list below.
examples/multiple-choice/run_swag_no_trainer.py #10934 (@stancld)
examples/run_ner_no_trainer.py #10902 (@stancld)
examples/language_modeling/run_mlm_no_trainer.py #11001 (@hemildesai)
examples/language_modeling/run_clm_no_trainer.py #11026 (@hemildesai)
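A sketch of the accelerate-based raw training loop pattern that the *_no_trainer scripts expose. The checkpoint, the tiny in-memory dataset, and the hyperparameters below are placeholder assumptions, not the exact contents of those scripts.

```python
# Sketch: a fully exposed training loop driven by accelerate.
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification, AutoTokenizer

accelerator = Accelerator()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tiny in-memory dataset just to make the loop runnable.
texts = ["great library", "terrible bug", "works as expected", "crashes on start"]
labels = [1, 0, 1, 0]
enc = tokenizer(texts, padding=True, return_tensors="pt")
dataset = [
    {"input_ids": enc["input_ids"][i],
     "attention_mask": enc["attention_mask"][i],
     "labels": torch.tensor(labels[i])}
    for i in range(len(texts))
]
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
# accelerate moves everything to the right device(s) and wraps for distributed/fp16 runs.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for epoch in range(2):
    for batch in dataloader:
        loss = model(**batch).loss
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```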
Standardize examples with Trainer
Thanks to the amazing contributions of @bhadreshpsavani, all examples using Trainer are now standardized: they all support the predict stage and return/save metrics in the same fashion.
Trainer & SageMaker Model Parallelism
The Trainer now supports SageMaker model parallelism out of the box; the old SageMakerTrainer is deprecated as a consequence and will be removed in version 5.
FLAX
FLAX support has been widened to cover all model heads of the BERT architecture, alongside a general conversion script for PyTorch checkpoints to be used in FLAX.
Auto models now have a FLAX implementation.
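A minimal sketch of loading a BERT checkpoint through the Flax auto class. The bert-base-cased checkpoint name is an assumption; from_pt=True is only needed when the repository has no native Flax weights and relies on the PyTorch-to-Flax conversion mentioned above.

```python
# Sketch: run a BERT checkpoint with the Flax auto model.
from transformers import AutoTokenizer, FlaxAutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# from_pt=True converts PyTorch weights on the fly if no Flax weights are published.
model = FlaxAutoModel.from_pretrained("bert-base-cased", from_pt=True)

inputs = tokenizer("Flax support now covers the BERT model heads.", return_tensors="np")
outputs = model(**inputs)
print(outputs[0].shape)  # last hidden state
```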
General improvements and bugfixes
pipeline.framework would actually contain a fully qualified model. #10970 (@Narsil)
v4.4.2 (Compare Source)
v4.4.1 (Compare Source)
v4.4.0 (Compare Source)
v4.4.0: S2T, M2M100, I-BERT, mBART-50, DeBERTa-v2, XLSR-Wav2Vec2
SpeechToText
Two new models are released as part of the S2T implementation:
Speech2TextModel and Speech2TextForConditionalGeneration, in PyTorch.
Speech2Text is a speech model that accepts a float tensor of log-mel filter-bank features extracted from the speech signal. It's a transformer-based seq2seq model, so the transcripts/translations are generated autoregressively.
The Speech2Text model was proposed in fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=speech_to_text
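A minimal sketch of transcription with Speech2Text. The facebook/s2t-small-librispeech-asr checkpoint and the random 16 kHz waveform are assumptions standing in for real data; the processor needs sentencepiece and torchaudio installed to compute the filter-bank features.

```python
# Sketch: feed a waveform through the Speech2Text processor and model.
import numpy as np
from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")
model = Speech2TextForConditionalGeneration.from_pretrained("facebook/s2t-small-librispeech-asr")

waveform = np.random.randn(16000).astype(np.float32)  # stand-in for 1 s of 16 kHz audio
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")

generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```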
M2M100
Two new models are released as part of the M2M100 implementation:
M2M100Model and M2M100ForConditionalGeneration, in PyTorch.
M2M100 is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation tasks.
The M2M100 model was proposed in Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=m2m_100
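A minimal sketch of English-to-French translation with M2M100. The 418M checkpoint name and language codes are illustrative assumptions.

```python
# Sketch: translate English to French with an M2M100 checkpoint.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "en"
encoded = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
# Force the decoder to start with the target-language token.
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```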
I-BERT
Six new models are released as part of the I-BERT implementation:
IBertModel, IBertForMaskedLM, IBertForSequenceClassification, IBertForMultipleChoice, IBertForTokenClassification, and IBertForQuestionAnswering, in PyTorch.
I-BERT is a quantized version of RoBERTa running inference up to four times faster.
The I-BERT framework in PyTorch allows you to identify the best parameters for quantization. Once the model is exported to a framework that supports int8 execution (such as TensorRT), a speedup of up to 4x is visible, with no loss in performance thanks to the parameter search.
The I-BERT model was proposed in I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=ibert
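A minimal sketch of loading I-BERT for masked-LM inference. The kssteven/ibert-roberta-base checkpoint name and the quant_mode override are assumptions; the full quantization-aware workflow and int8 export described above are out of scope here.

```python
# Sketch: run I-BERT (quantized RoBERTa) in PyTorch with quantization mode enabled.
from transformers import AutoTokenizer, IBertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = IBertForMaskedLM.from_pretrained("kssteven/ibert-roberta-base", quant_mode=True)

inputs = tokenizer("I-BERT is a quantized version of <mask>.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)
```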
mBART-50
mBART-50 is created using the original mbart-large-cc25 checkpoint by extending its embedding layers with randomly initialized vectors for an extra set of 25 language tokens, and is then pretrained on 50 languages.
The MBart model was presented in Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=mbart-50
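A minimal sketch of many-to-many translation with an mBART-50 checkpoint. The many-to-many checkpoint name and the en_XX/de_DE language codes are illustrative assumptions.

```python
# Sketch: translate English to German with an mBART-50 checkpoint.
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

tokenizer.src_lang = "en_XX"
encoded = tokenizer("The weather is nice today.", return_tensors="pt")
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.lang_code_to_id["de_DE"],  # target language
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```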
DeBERTa-v2
Five new models are released as part of the DeBERTa-v2 implementation:
DebertaV2Model, DebertaV2ForMaskedLM, DebertaV2ForSequenceClassification, DebertaV2ForTokenClassification, and DebertaV2ForQuestionAnswering, in PyTorch.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google's BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in RoBERTa.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=deberta-v2
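A minimal sketch of sequence classification with DeBERTa-v2. The xlarge checkpoint name and label count are assumptions, and the classification head loaded this way is randomly initialized, so the output is only meaningful after fine-tuning.

```python
# Sketch: run a sentence through DebertaV2ForSequenceClassification (PyTorch).
from transformers import AutoTokenizer, DebertaV2ForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
model = DebertaV2ForSequenceClassification.from_pretrained(
    "microsoft/deberta-v2-xlarge", num_labels=2
)

inputs = tokenizer("Disentangled attention separates content and position.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
```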
Wav2Vec2
XLSR-Wav2Vec2
The XLSR-Wav2Vec2 model was proposed in Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
The checkpoint corresponding to that model is added to the model hub: facebook/wav2vec2-large-xlsr-53
Training script
A fine-tuning script showcasing how the Wav2Vec2 model can be trained has been added.
Further improvements
Several changes make the Wav2Vec2 architecture more stable. This release introduces feature extractors and feature processors as the pre-processing aspect of multi-modal speech models.
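A minimal sketch of that feature-extractor/processor flow, using a CTC-finetuned Wav2Vec2 checkpoint. The facebook/wav2vec2-base-960h checkpoint and the random waveform are assumptions standing in for real audio.

```python
# Sketch: pre-process raw audio with Wav2Vec2Processor and decode a CTC transcription.
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

speech = np.random.randn(16000).astype(np.float32)  # stand-in for 1 s of 16 kHz audio
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```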
AMP & XLA Support for TensorFlow models
Most of the TensorFlow models are now compatible with automatic mixed precision and have XLA support.
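A hedged sketch of how mixed precision and XLA can be switched on around a TensorFlow model. The distilbert-base-uncased checkpoint is an assumption, and the two switches shown are standard Keras/TensorFlow mechanisms (the global mixed-precision policy and tf.function's jit_compile flag) rather than transformers-specific APIs.

```python
# Sketch: enable AMP and XLA around a TF transformers model.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Automatic mixed precision via the Keras global policy.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

inputs = tokenizer(["XLA compiles the forward pass."], return_tensors="tf", padding=True)

# Wrap the forward pass with XLA compilation.
@tf.function(jit_compile=True)
def forward(batch):
    return model(batch, training=False).logits

print(forward(dict(inputs)))
```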
SageMaker Trainer for model parallelism
We are rolling out experimental support for model parallelism on SageMaker with a new SageMakerTrainer that can be used in place of the regular Trainer. This is a temporary class that will be removed in a future version; the end goal is to have Trainer support this feature out of the box.
General improvements and bugfixes
[trainer] deepspeed bug fixes and tests #10039 (@stas00)
Removing run_pl_glue.py from text classification docs, include run_xnli.py & run_tf_text_classification.py #10066 (@cbjuan)
remove token_type_ids from TokenizerBertGeneration output #10070 (@sadakmed)
[deepspeed tests] transition to new tests dir #10080 (@stas00)
Added integration tests for Pytorch implementation of the ELECTRA model #10073 (@spatil6)
Fix naming in TF MobileBERT #10095 (@jplu)
[examples/s2s] add test set predictions #10085 (@patil-suraj)
Logging propagation #10092 (@LysandreJik)
Fix some edge cases in report_to and add deprecation warnings #10100 (@sgugger)
Add head_mask and decoder_head_mask to TF LED #9988 (@stancld)
Replace strided slice with tf.expand_dims #10078 (@jplu)
Fix Faiss Import #10103 (@patrickvonplaten)
[RAG] fix generate #10094 (@patil-suraj)
Fix TFConvBertModelIntegrationTest::test_inference_masked_lm Test #10104 (@abhishekkrthakur)
doc: update W&B related doc #10086 (@borisdayma)
Remove speed metrics from default compute objective [WIP] #10107 (@shiva-z)
Fix tokenizers training in notebooks #10110 (@n1t0)
[DeepSpeed docs] new information #9610 (@stas00)
[CI] build docs faster #10115 (@stas00)
[scheduled github CI] add deepspeed fairscale deps #10116 (@stas00)
Line endings should be LF across repo and not CRLF #10119 (@LysandreJik)
Fix TF LED/Longformer attentions computation #10007 (@jplu)
remove adjust_logits_during_generation method #10087 (@patil-suraj)
[DeepSpeed] restore memory for evaluation #10114 (@stas00)
Update run_xnli.py to use Datasets library #9829 (@Qbiwan)
Add new community notebook - Blenderbot #10126 (@lordtt13)
[DeepSpeed in notebooks] Jupyter + Colab #10130 (@stas00)
[examples/run_s2s] remove task_specific_params and update rouge computation #10133 (@patil-suraj)
Fix typo in GPT2DoubleHeadsModel docs #10148 (@M-Salti)
[hf_api] delete deprecated methods and tests #10159 (@julien-c)
Revert propagation #10171 (@LysandreJik)
Conversion from slow to fast for BPE spm vocabs contained an error. #10120 (@Narsil)
Fix typo in comments #10157 (@mrm8488)
Fix typo in comment #10156 (@mrm8488)
[Doc] Fix version control in internal pages #10124 (@sgugger)
[t5 tokenizer] add info logs #9897 (@stas00)
Fix v2 model loading issue #10129 (@BigBird01)
Fix datasets set_format #10178 (@sgugger)
Fixing NER pipeline for list inputs. #10184 (@Narsil)
Add new model to labels that should not stale #10187 (@LysandreJik)
Check TF ops for ONNX compliance #10025 (@jplu)
[RAG] fix tokenizer #10167 (@patil-suraj)
Fix TF template #10189 (@jplu)
fix run_seq2seq.py; porting trainer tests to it #10162 (@stas00)
Specify dataset dtype #10195 (@LysandreJik)
[CI] make the examples sub-group of tests run always #10196 (@stas00)
[WIP][examples/seq2seq] move old s2s scripts to legacy #10136 (@patil-suraj)
set tgt_lang of MBart Tokenizer for summarization #10205 (@HeroadZ)
Store FLOS as floats to avoid overflow. #10213 (@sgugger)
Fix add_token_positions in custom datasets tutorial #10217 (@joeddav)
[trainer] fix ignored columns logger #10219 (@stas00)
Factor out methods #10215 (@LysandreJik)
Fix head masking for TFT5 models #9877 (@stancld)
[CI] 2 fixes #10248 (@stas00)
[trainer] refactor place_model_on_device logic, add deepspeed #10243 (@stas00)
[Trainer] doc update #10241 (@stas00)
Reduce the time spent for the TF slow tests #10152 (@jplu)
Introduce warmup_ratio training argument #10229 (@tanmay17061)
[Trainer] memory tracker metrics #10225 (@stas00)
Script for distilling zero-shot classifier to more efficient student #10244 (@joeddav)
[test] fix func signature #10271 (@stas00)
[trainer] implement support for full fp16 in evaluation/predict #10268 (@stas00)
[ISSUES.md] propose using google colab to reproduce problems #10270 (@stas00)
Introduce logging_strategy training argument #10267 (@tanmay17061)
[CI] Kill any run-away pytest processes #10281 (@stas00)
Patch zero shot distillation script cuda issue #10284 (@joeddav)
Move the TF NER example #10276 (@jplu)
Fix example links in the task summary #10291 (@sgugger)
fixes #10303 #10304 (@cronoik)
[ci] don't fail when there are no zombies #10308 (@stas00)
fix typo in conversion script #10316 (@tagucci)
Add note to resize token embeddings matrix when adding new tokens to voc #10331 (@LysandreJik)
Deprecate prepare_seq2seq_batch #10287 (@sgugger)
[examples/seq2seq] defensive programming + expand/correct README #10295 (@stas00)
[Trainer] implement gradient_accumulation_steps support in DeepSpeed integration #10310 (@stas00)
Loading from last checkpoint functionality in Trainer.train #10334 (@tanmay17061)
[trainer] add Trainer methods for metrics logging and saving #10266 (@stas00)
Fix evaluation with label smoothing in Trainer #10338 (@sgugger)
Fix broken examples/seq2seq/README.md markdown #10344 (@Wikidepia)
[bert-base-german-cased] use model repo, not external bucket #10353 (@julien-c)
[Trainer/Deepspeed] handle get_last_lr() before first step() #10362 (@stas00)
ConvBERT fix torch <> tf weights conversion #10314 (@abhishekkrthakur)
fix deprecated reference tokenizer.max_len in glue.py #10220 (@poedator)
[trainer] move secondary methods into a separate file #10363 (@stas00)
Run GA on every push even on forks #10383 (@LysandreJik)
GA: only run model templates once #10388 (@LysandreJik)
Bugfix: Removal of padding_idx in BartLearnedPositionalEmbedding #10200 (@mingruimingrui)
Remove unused variable in example for Q&A #10392 (@abhishekkrthakur)
Ignore unexpected weights from PT conversion #10397 (@LysandreJik)
Add support for ZeRO-2/3 and ZeRO-offload in fairscale #10354 (@sgugger)
Fix None in add_token_positions - issue #10210 #10374 (@andreabac3)
Make Barthez tokenizer tests a bit faster #10399 (@sgugger)
Fix run_glue evaluation when model has a label correspondence #10401 (@sgugger)
[ci, flax] non-existing models are unlikely to pass tests #10409 (@julien-c)
[LED] Correct Docs #10419 (@patrickvonplaten)
Add Ray Tune hyperparameter search integration test #10414 (@krfricke)
Ray Tune Integration Bug Fixes #10406 (@amogkam)
[examples] better model example #10427 (@stas00)
Fix conda-build #10431 (@LysandreJik)
[run_seq2seq.py] restore functionality: saving to test_generations.txt #10428 (@stas00)
updated logging and saving metrics #10436 (@bhadreshpsavani)
Introduce save_strategy training argument #10286 (@tanmay17061)
Adds terms to Glossary #10443 (@darigovresearch)
Fixes compatibility bug when using grouped beam search and constrained decoding together #10475 (@mnschmit)
Generate can return cross-attention weights too #10493 (@Mehrad0711)
Fix typos #10489 (@WybeKoper)
[T5] Fix speed degradation bug t5 #10496 (@patrickvonplaten)
feat(docs): navigate with left/right arrow keys #10481 (@ydcjeff)
Configuration
📅 Schedule: At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻️ Rebasing: Renovate will not automatically rebase this PR, because other commits have been found.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by WhiteSource Renovate. View repository job log here.