Rewrite TensorFlow train_step and test_step #17057

Rocketknight1 · 2022-05-03T11:01:34Z

Draft PR for a full rewrite of the TF train/test steps. I swear this will fix like 50% of our TF issues in one PR.

Current status:

Correctly handles output mapping across most model classes for losses + metrics
Keras metrics are back, even with the dummy loss. (!!!!)
Keras metrics work correctly even for multi-output models (like QA)
In most cases, users can pass tensors in either the input dict or the labels and the model will handle them correctly.
No more errors when calling fit() when the model has nested output structure (e.g. the model outputting a past tuple)

What's left to do:

Models with multiple unusual outputs that do not match label names may still have issues with metrics. This is relatively uncommon. We support adding a property to those classes to tell Keras what to do with the labels, but we haven't added it to any models yet. (None are failing in tests, so hopefully we won't need to worry too much about this!)
Testing testing testing! I want to rerun all notebooks/examples and make sure the user experience is good.
CI testing - We need to make sure we don't regress on any of this
Discoverability: After this is merged we should update notebooks/examples to show off the cool new features, and document our TF workflow/philosophy somewhere that new users will find.

HuggingFaceDocBuilderDev · 2022-05-03T11:17:01Z

The documentation is not available anymore as the PR was closed or merged.

Rocketknight1 · 2022-05-16T11:33:48Z

(Requesting reviews now that @gante is back)

gante

<3 This is great, Keras users will definitely feel more at home

I've added two comments: a suggestion (for potentially more organized code) and a question. Other than that, LGTM!

gante · 2022-05-16T14:43:28Z

src/transformers/modeling_tf_utils.py

+        if self._label_to_output_map is not None:
+            label_to_output = self._label_to_output_map
+        elif "start_positions" in arg_names:
+            label_to_output = {"start_positions": "start_logits", "end_positions": "end_logits"}
+        elif "sentence_order_label" in arg_names:
+            label_to_output = {"labels": "prediction_logits", "sentence_order_label": "sop_logits"}
+        elif "next_sentence_label" in arg_names:
+            label_to_output = {"labels": "prediction_logits", "next_sentence_label": "seq_relationship_logits"}
+        elif "mc_labels" in arg_names:
+            label_to_output = {"labels": "logits", "mc_labels": "mc_logits"}


Suggestion for discussion: Could these hardcoded defaults be part of the TFXXXLoss (e.g.)? The actual models inherit from these classes (e.g.), and thus we could add them on a per-loss basis, as opposed to having a big if/else in the train/test steps :D

I explored this when writing the PR! I think that would work in a lot of cases, but there are some models which have their own custom losses, and other models that define hf_compute_loss in the model class itself.

So I'm not sure if moving this to the Loss classes would be that easy, but for cleanliness, I can extract this to a method called something like infer_label_to_output_map() and just call that in train_step instead?

Extracting to an external function sounds good 👍 (especially because it is reused between train and test)

gante · 2022-05-16T15:03:57Z

src/transformers/modeling_tf_utils.py

+            if len(y) == 1:
+                _, y = y.popitem()


This converts y from a dictionary with one item to the value of that dictionary entry. Looking below, it seems like it should handle dicts correctly. What's happening here?

The reason I did it this way is to catch more cases, but I realize now I could have been a lot smarter about it. One sec!

Fixed! This code was added because the user often passes a dict where the key is "labels", which is not the name of any of the outputs. The correct thing to do for those models is to map the "labels" tensor to the first model output - I changed this line so that it checks the single key is called "labels" before doing so.

Fixed it up a little more - now we try to map by key name before falling back to mapping to the first output as a last resort.

gante

LGTM 👍

commit 5419205 Author: Patrick von Platen <patrick.v.platen@gmail.com> Date: Thu May 19 23:46:26 2022 +0200 [Test OPT] Add batch generation test opt (huggingface#17359) * up * up commit 48c2269 Author: ddobokki <44228269+ddobokki@users.noreply.github.com> Date: Fri May 20 05:42:44 2022 +0900 Fix bug in Wav2Vec2 pretrain example (huggingface#17326) commit 5d6feec Author: Nathan Dahlberg <58701810+nadahlberg@users.noreply.github.com> Date: Thu May 19 16:21:19 2022 -0400 fix for 17292 (huggingface#17293) commit 518bd02 Author: Patrick von Platen <patrick.v.platen@gmail.com> Date: Thu May 19 22:17:02 2022 +0200 [Generation] Fix Transition probs (huggingface#17311) * [Draft] fix transition probs * up * up * up * make it work * fix * finish * update commit e8714c0 Author: Patrick von Platen <patrick.v.platen@gmail.com> Date: Thu May 19 22:15:36 2022 +0200 [OPT] Run test in lower precision on GPU (huggingface#17353) * [OPT] Run test only in half precision * up * up * up * up * finish * fix on GPU * Update tests/models/opt/test_modeling_opt.py commit 2b28229 Author: Nicolas Patry <patry.nicolas@protonmail.com> Date: Thu May 19 20:28:12 2022 +0200 Adding `batch_size` test to QA pipeline. (huggingface#17330) commit a4386d7 Author: Nicolas Patry <patry.nicolas@protonmail.com> Date: Thu May 19 10:29:16 2022 +0200 [BC] Fixing usage of text pairs (huggingface#17324) * [BC] Fixing usage of text pairs The BC is actually preventing users from misusing the pipeline since users could have been willing to send text pairs and the pipeline would instead understand the thing as a batch returning bogus results. The correct usage of text pairs is preserved in this PR even when that makes the code clunky. Adds support for {"text":..,, "text_pair": ...} inputs for both dataset iteration and more explicit usage to pairs. * Updating the doc. * Update src/transformers/pipelines/text_classification.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/text_classification.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/pipelines/test_pipelines_text_classification.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * quality. Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> commit 3601aa8 Author: Stas Bekman <stas00@users.noreply.github.com> Date: Wed May 18 16:00:47 2022 -0700 [tests] fix copy-n-paste error (huggingface#17312) * [tests] fix copy-n-paste error * fix commit 1b20c97 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Wed May 18 21:49:08 2022 +0200 Fix ci_url might be None (huggingface#17332) * fix * Update utils/notification_service.py Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr> commit 6aad387 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Wed May 18 21:26:44 2022 +0200 fix (huggingface#17337) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 1762ded Author: Zachary Mueller <muellerzr@gmail.com> Date: Wed May 18 14:17:40 2022 -0400 Fix metric calculation in examples and setup tests to run on multi-gpu for no_trainer scripts (huggingface#17331) * Fix length in no_trainer examples * Add setup and teardown * Use new accelerator config generator to automatically make tests able to run based on environment commit 6e195eb Author: Jader Martins <jadermcs@users.noreply.github.com> Date: Wed May 18 14:18:43 2022 -0300 docs for typical decoding (huggingface#17186) Co-authored-by: Jader Martins <jadermcs94@gmail.com> commit 060fe61 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Wed May 18 19:07:48 2022 +0200 Not send successful report (huggingface#17329) * send report only if there is any failure Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit b3b9f99 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Wed May 18 17:57:23 2022 +0200 Fix test_t5_decoder_model_past_large_inputs (huggingface#17320) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 6da76b9 Author: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com> Date: Wed May 18 17:52:13 2022 +0200 Add onnx export cuda support (huggingface#17183) Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> commit adc0ff2 Author: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Date: Wed May 18 17:47:18 2022 +0200 Add CvT (huggingface#17299) * Adding cvt files * Adding cvt files * changes in init file * Adding cvt files * changes in init file * Style fixes * Address comments from code review * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Format lists in docstring * Fix copies * Apply suggestion from code review Co-authored-by: AnugunjNaman <anugunjjha@gmail.com> Co-authored-by: Ayushman Singh <singhayushman13@protonmail.com> Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> commit 4710702 Author: Sylvain Gugger <Sylvain.gugger@gmail.com> Date: Wed May 18 10:46:40 2022 -0400 Fix style commit 5fdb54e Author: mraunak <83710963+mraunak@users.noreply.github.com> Date: Wed May 18 10:39:02 2022 -0400 Add Information Gain Filtration algorithm (huggingface#16953) * Add information gain filtration algorithm * Complying with black requirements * Added author * Fixed import order * flake8 corrections Co-authored-by: Javier Turek <javier.turek@intel.com> commit 91ede48 Author: Kamal Raj <kamalraj97@gmail.com> Date: Wed May 18 19:59:53 2022 +0530 Fix typo (huggingface#17328) commit fe28eb9 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Wed May 18 16:06:41 2022 +0200 remove (huggingface#17325) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 2cb2ea3 Author: Nicolas Patry <patry.nicolas@protonmail.com> Date: Wed May 18 16:06:24 2022 +0200 Accepting real pytorch device as arguments. (huggingface#17318) * Accepting real pytorch device as arguments. * is_torch_available. commit 1c9d1f4 Author: Nicolas Patry <patry.nicolas@protonmail.com> Date: Wed May 18 15:46:12 2022 +0200 Updating the docs for `max_seq_len` in QA pipeline (huggingface#17316) commit 60ad734 Author: Patrick von Platen <patrick.v.platen@gmail.com> Date: Wed May 18 15:08:56 2022 +0200 [T5] Fix init in TF and Flax for pretraining (huggingface#17294) * fix init * Apply suggestions from code review * fix * finish * Update src/transformers/modeling_tf_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> commit 7ba1d4e Author: Joaq <55513213+jQuinRivero@users.noreply.github.com> Date: Wed May 18 09:23:47 2022 -0300 Add type hints for ProphetNet (Pytorch) (huggingface#17223) * added type hints to prophetnet * reformatted with black * fix bc black misformatted some parts * fix imports * fix imports * Update src/transformers/models/prophetnet/configuration_prophetnet.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * update OPTIONAL type hint and docstring Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> commit d6b8e9c Author: Carl <carl.cochet@gmail.com> Date: Wed May 18 01:07:43 2022 +0200 Add trajectory transformer (huggingface#17141) * Add trajectory transformer Fix model init Fix end of lines for .mdx files Add trajectory transformer model to toctree Add forward input docs Fix docs, remove prints, simplify prediction test Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update docs, more descriptive comments Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update readme Small comment update and add conversion script Rebase and reformat Fix copies Fix rebase, remove duplicates Fix rebase, remove duplicates * Remove tapex * Remove tapex * Remove tapex commit c352640 Author: Patrick von Platen <patrick.v.platen@gmail.com> Date: Wed May 18 00:34:31 2022 +0200 fix (huggingface#17310) commit d9050dc Author: Cesare Campagnano <cesare.campagnano@gmail.com> Date: Tue May 17 23:44:37 2022 +0200 [LED] fix global_attention_mask not being passed for generation and docs clarification about grad checkpointing (huggingface#17112) * [LED] fixed global_attention_mask not passed for generation + docs clarification for gradient checkpointing * LED docs clarification Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [LED] gradient_checkpointing=True should be passed to TrainingArguments Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [LED] docs: remove wrong word Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * [LED] docs fix typo Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> commit bad3583 Author: Jean Vancoppenolle <jean.vcop@gmail.com> Date: Tue May 17 23:42:14 2022 +0200 Add support for pretraining recurring span selection to Splinter (huggingface#17247) * Add SplinterForSpanSelection for pre-training recurring span selection. * Formatting. * Rename SplinterForSpanSelection to SplinterForPreTraining. * Ensure repo consistency * Fixup changes * Address SplinterForPreTraining PR comments * Incorporate feedback and derive multiple question tokens per example. * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/splinter/modeling_splinter.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Jean Vancoppenole <jean.vancoppenolle@retresco.de> Co-authored-by: Tobias Günther <tobias.guenther@retresco.de> Co-authored-by: Tobias Günther <github@tobigue.de> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> commit 0511305 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Tue May 17 18:56:58 2022 +0200 Add PR author in CI report + merged by info (huggingface#17298) * Add author info to CI report * Add merged by info * update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 032d63b Author: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Date: Tue May 17 12:56:24 2022 -0400 Fix dummy creation script (huggingface#17304) commit 986dd5c Author: Sylvain Gugger <Sylvain.gugger@gmail.com> Date: Tue May 17 12:50:14 2022 -0400 Fix style commit 38ddab1 Author: Karim Foda <35491698+KMFODA@users.noreply.github.com> Date: Tue May 17 09:32:12 2022 -0700 Doctest longformer (huggingface#16441) * Add initial doctring changes * make fixup * Add TF doc changes * fix seq classifier output * fix quality errors * t * swithc head to random init * Fix expected outputs * Update src/transformers/models/longformer/modeling_longformer.py Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> commit 10704e1 Author: Patrick von Platen <patrick.v.platen@gmail.com> Date: Tue May 17 18:20:36 2022 +0200 [Test] Fix W2V-Conformer integration test (huggingface#17303) * [Test] Fix W2V-Conformer integration test * correct w2v2 * up commit 28a0811 Author: regisss <15324346+regisss@users.noreply.github.com> Date: Tue May 17 17:58:14 2022 +0200 Improve mismatched sizes management when loading a pretrained model (huggingface#17257) - Add --ignore_mismatched_sizes argument to classification examples - Expand the error message when loading a model whose head dimensions are different from expected dimensions commit 1f13ba8 Author: Patrick von Platen <patrick.v.platen@gmail.com> Date: Tue May 17 15:48:23 2022 +0200 correct opt (huggingface#17301) commit 349f1c8 Author: Matt <Rocketknight1@users.noreply.github.com> Date: Tue May 17 14:36:23 2022 +0100 Rewrite TensorFlow train_step and test_step (huggingface#17057) * Initial commit * Better label renaming * Remove breakpoint before pushing (this is your job) * Test a lot more in the Keras fit() test * make fixup * Clarify the case where we flatten y dicts into tensors * Clarify the case where we flatten y dicts into tensors * Extract label name remapping to a method commit 651e48e Author: Matt <Rocketknight1@users.noreply.github.com> Date: Tue May 17 14:14:17 2022 +0100 Fix tests of mixed precision now that experimental is deprecated (huggingface#17300) * Fix tests of mixed precision now that experimental is deprecated * Fix mixed precision in training_args_tf.py too commit 6d21142 Author: SaulLu <55560583+SaulLu@users.noreply.github.com> Date: Tue May 17 14:33:13 2022 +0200 fix retribert's `test_torch_encode_plus_sent_to_model` (huggingface#17231)

fix tokenizer autodoc fix minor CI issues fix minor CI issues fix minor CI issues fix style issue fix minor import issues fix few issues remove def main on the test add require torch replace decorator with 'with' fix style change to bloom add quick fix tokenizer fix tokenizer file fix tokenizer - merge tests - small fixes fix import issue add bloom to readme fix consistency Update docs/source/en/model_doc/bloom.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Apply suggestions from code review fix comment issues on file headers Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> fix doc issue small fix - modeling test some changes - refactor some code - taking into account reviews - more tests should pass - removed pruning tests remove useless division more tests should pass more tests should pass more tests should pass let's try this one -add alibi offset - remove all permutes to make the grad operations work - finger crossed Update data2vec.mdx to include a Colab Notebook link (that shows fine-tuning) (huggingface#17194) * Update data2vec.mdx * Update data2vec.mdx * Update docs/source/en/model_doc/data2vec.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Dev version Add test to ensure models can take int64 inputs (huggingface#17210) * Add test to ensure models can take int64 inputs * is_integer is an attribute, not a method * Fix test when some inputs aren't tensors * Add casts to blenderbot and blenderbot-small * Add casts to the other failing models Fix dependency table update BART docs (huggingface#17212) Black preview (huggingface#17217) * Black preview * Fixup too! * Fix check copies * Use the same version as the CI * Bump black Fix typo in bug report template (huggingface#17178) * Fix typo * Force rerun workflows Co-authored-by: Felix Marty <felix@huggingface.co> Added translation of installation.mdx to Portuguese Issue huggingface#16824 (huggingface#16979) * Added translation of installation.mdx to Portuguese, as well as default templates of _toctree.yml and _config.py * [ build_documentation.yml ] - Updated doc_builder to build documentation in Portuguese. [ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx. * [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder. [ pipeline_tutorial.mdx ] - Grammar changes. * [ accelerate.mdx ] - Translated to Portuguese the acceleration tutorial. * [ multilingual.mdx ] - Added portuguese translation for multilingual tutorial. [ training.mdx ] - Added portuguese translation for training tutorial. * [ preprocessing.mdx ] - WIP * Update _toctree.yml * Adding Pré-processamento to _toctree.yml * Update accelerate.mdx * Nits and eliminate preprocessing file while it is ready Co-authored-by: Omar U. Espejel <espejelomar@gmail.com> OPT-fix (huggingface#17229) * try fixes * Revert "try fixes" This reverts commit a8ad75e. * add correct shape * add correct path OPT - fix docstring and improve tests slighly (huggingface#17228) * correct some stuff * fix doc tests * make style Update self-push workflow (huggingface#17177) * update push ci * install git-python * update comment * update deepspeed jobs * fix report * skip 2 more tests that require fairscale * Fix changes in test_fetcher.py (to deal with `setup.py` is changed) * set RUN_PT_TF_CROSS_TESTS=1 and final clean-up * remove SIGOPT_API_TOKEN * remove echo "$matrix_folders" Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> fix --gpus option for docker (huggingface#17235) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Handle copyright in add-new-model-like (huggingface#17218) Fix Trainer for Datasets that don't have dict items (huggingface#17239) install dev. version of accelerate (huggingface#17243) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Fix push CI channel (huggingface#17242) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Add PR title to push CI report (huggingface#17246) * add PR title to push CI report * add link Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial (huggingface#17076) * [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial * Delete docs/source/pt-br directory * [ fast_tokenizers.mdx ] - Continuing work on file * [ fast_tokenizers.mdx ] - Continuing work on file * Add fast tokenizers to _toctree.yml * Eliminated config and toctree.yml * Nits in fast_tokenizers.mdx Co-authored-by: Omar U. Espejel <espejelomar@gmail.com> Translated version of model_sharing.mdx doc to spanish (huggingface#16184) * Translated version of model_sharing to spanish * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Update docs/source_es/model_sharing.mdx * Addind model sharing to _toctree.yml Co-authored-by: Omar U. Espejel <espejelomar@gmail.com> Guide to create custom models in Spanish (huggingface#17158) * file copied and toctree updated * Intro and configuration translated * model section translated * enter hotfix * Translation over, correction pending * Typos and corrections * Update docs/source/es/create_a_model.mdx Co-authored-by: Omar U. Espejel <espejelomar@gmail.com> * Update docs/source/es/create_a_model.mdx Co-authored-by: Omar U. Espejel <espejelomar@gmail.com> * Update docs/source/es/create_a_model.mdx Co-authored-by: Omar U. Espejel <espejelomar@gmail.com> * Update docs/source/es/create_a_model.mdx Co-authored-by: Omar U. Espejel <espejelomar@gmail.com> Co-authored-by: Omar U. Espejel <espejelomar@gmail.com> Fix obvious typos in flax decoder impl (huggingface#17279) Change config.encoder_ffn_dim -> config.decoder_ffn_dim for decoder. TF - Fix convnext classification example (huggingface#17261) [WIP] [doc] performance/scalability revamp (huggingface#15723) * [doc] performance/scalability revamp * link the new docs * no : * mixed precision * work on the first doc * expand the main doc * Trigger CI * style * revamp single GPU training section * work on training performance * remove files not used anymore or will be added later * final touches * fix rebase * Add hardware section to toctree * fix toctree again * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * remove `fast_tokenizers` entry that was copied in rebase * add warning about DP vs DDP * remove todo * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix missing closure of codeblock * Update docs/source/en/perf_train_gpu_many.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * sync with huggingface#16860 * update toc Co-authored-by: leandro <leandro.vonwerra@spoud.io> Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> fixed bug in run_mlm_flax_stream.py (huggingface#17203) * fixed bug run_mlm_flax_stream.py Fixed bug caused by an update to tokenizer keys introduced in recent transformers versions (between `4.6.2` and `4.18.0`) where additional keys were introduced to the tokenizer output. * Update run_mlm_flax_stream.py * adding missing paranthesis * formatted to black * remove cols from dataset instead * reformat to black * moved rem. columns to map * formatted to black Co-authored-by: KennethEnevoldsen <kennethcenevolsen@gmail.com> Updated checkpoint support for Sagemaker Model Parallel (huggingface#17219) * adding partial checkpoint support for optimizer state * formatted trainer.py * Refactoring based on comments * reformatting * Update src/transformers/trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Cavdar <dcavdar@a07817b12d7e.ant.amazon.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update codeparrot data preprocessing (huggingface#16944) * add new preprocessing arguments * add new filters * add new filters to readme * fix config and test count, update function names and docstrings * reformat code * update readme * Update readme * rename config_test filter Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * rename few_assignments filter Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * rename tokenizer in arguments Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * rename functions and add limit_line argument for config_test filter * update threshold for config_test filter Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com> CodeParrot data pretokenization (huggingface#16932) * add pretokenization arguments * add pretokenization script * add support for pretokenized data * reformat code * fix run command for training * fix model call from config * remove a package * add comments on pretokenization in the readme * remove explicit parallelization Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * update readme Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * update readme -remove username Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * update readme -remove username Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> * keep data parallelization * reformat code * reformat code * update readme * reformat code * Update examples/research_projects/codeparrot/README.md Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com> Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com> Remove next sentence prediction from supported ONNX tasks (huggingface#17276) Align logits and labels in OPT (huggingface#17237) Mlflowcallback fix nonetype error (huggingface#17171) * Fix edge cases TypeError: 'NoneType' object is not callable * fix style Automatically sort auto mappings (huggingface#17250) * Automatically sort auto mappings * Better class extraction * Some auto class magic * Adapt test and underlying behavior * Remove re-used config * Quality Make TrainerHyperParameterSigOptIntegrationTest slow test (huggingface#17288) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Better error in the Auto API when a dep is missing (huggingface#17289) Fix FlavaForPreTrainingIntegrationTest CI test (huggingface#17232) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Use the PR URL in CI report (huggingface#17269) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> logging documentation update (huggingface#17174) * logging documentation * style Co-authored-by: Sander Land <sander@chatdesk.com> docs(transformers): fix typo (huggingface#17263) Add Tensorflow Swin model (huggingface#16988) Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> [Tests] Fix slow opt tests (huggingface#17282) * fix opt tests * remove unused tok * make style * make flake8 happy * Update tests/models/opt/test_modeling_opt.py Fix test_model_parallelization (huggingface#17249) * Fix test_model_parallelization * Modify Add Wav2Vec2Conformer (huggingface#16812) * save intermediate * add wav2vec2 conformer * add more code * more * first test passes * make all checkpoints work * update * up * more clean ups * save clean-up * save clean-up * save more * remove bogus * finalize design conformer * remove vision * finish all tests * more changes * finish code * add doc tests * add slow tests * fix autoconfig test * up * correct docstring * up * update * fix * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com> * Update docs/source/en/model_doc/wav2vec2-conformer.mdx * upload * save copied from * correct configs * fix model outputs * add to docs * fix imports * finish * finish code * correct copied from * correct again * correct make fix * improve make fix copies * save * correct fix copy from * correct init structure * correct * fix import * apply suggestions Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com> Fix missing job action button in CI report (huggingface#17270) * use matrix.machine_type * fix job names used in job_link Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Fix wrong PT/TF categories in CI report (huggingface#17272) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> [ConvNeXT] Fix drop_path_rate (huggingface#17280) * Fix drop_path_rate * Fix TF's drop path rate fix retribert's `test_torch_encode_plus_sent_to_model` (huggingface#17231) Fix tests of mixed precision now that experimental is deprecated (huggingface#17300) * Fix tests of mixed precision now that experimental is deprecated * Fix mixed precision in training_args_tf.py too Rewrite TensorFlow train_step and test_step (huggingface#17057) * Initial commit * Better label renaming * Remove breakpoint before pushing (this is your job) * Test a lot more in the Keras fit() test * make fixup * Clarify the case where we flatten y dicts into tensors * Clarify the case where we flatten y dicts into tensors * Extract label name remapping to a method correct opt (huggingface#17301) refactor - refactor code - style changes - add new threshold for test major changes - change BLOOM to Bloom - add quick doc on bloom.mdx - move embeddings test on modeling test modify readme small fixes small fix - better threshold for a test remove old test file from fetcher fix small typo major change - change BloomLMHead to BloomForCausalLM remove onnx config major changes - refactor the code - remove asserts - change tol for test make style small change adding a slow test + commenting old ones for now make style Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> make style fix duplicates cleaning comments on config clean a bit conversion file refacor a bit modeling file refactor tokenizer file fix tokenization test issue fix tokenization issue second try fix tokenization issue #2 fix test issue make style + add suggestions change test fetcher try this one - slow tests should pass - finger crossed possible final changes make style try fix padding side issue fix side fix padding issue fix ko-readme fix config auto cleaning modeling file keep bloom in caps in ko update config docs remove pretraining_pp remove model parallel update config - add correct config files fix duplicates fix fetcher fix refactor issue - remove divide function try to remove alibi small fixes - fix alibi - remove seq length - refactor a bit the code put correct values - fix bos and eos token ids fix attention mask loop Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com> small fixes: - remove skip bias add small fixes - fix typo in readme - fix typos in config small changes - remove a test - add reconstruction test - change config small changes - change Scaled Softmax to BloomScaledSoftmax small fixes - fix alibi dtype major changes - removing explicit dtype when loading modules - fixing test args (torch_dtype=auto) - add dosctring fix readmes major changes - now bloom supports alibi shifting - refactor a bit the code - better test tolerance now refactor a bit refactor a bit put correct name on test change docstring small changes - fix docstring modeling - fix test tolerance fix small nit - take dtype from tensors in the conversion script minor fix - fix mdx issue minor fix - change config docstring forward contrib credits from PR14084 Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> apply modifications Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> resolve softmax upcast Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> final changes modeling Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2' merge commit Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> apply suggestions Apply suggestions from Stas comments Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Initial commit * Better label renaming * Remove breakpoint before pushing (this is your job) * Test a lot more in the Keras fit() test * make fixup * Clarify the case where we flatten y dicts into tensors * Clarify the case where we flatten y dicts into tensors * Extract label name remapping to a method

Rocketknight1 marked this pull request as ready for review May 3, 2022 13:54

Rocketknight1 added 5 commits May 9, 2022 17:13

Initial commit

a414e65

Better label renaming

03e5208

Remove breakpoint before pushing (this is your job)

a3dcca3

Test a lot more in the Keras fit() test

aa0aee1

make fixup

b7db255

Rocketknight1 force-pushed the tf_train_step_rewrite branch from 57a8ab0 to b7db255 Compare May 9, 2022 16:13

Rocketknight1 requested review from gante and sgugger May 16, 2022 11:33

gante reviewed May 16, 2022

View reviewed changes

Rocketknight1 added 2 commits May 16, 2022 17:26

Clarify the case where we flatten y dicts into tensors

95701a2

Clarify the case where we flatten y dicts into tensors

af1ae3c

gante approved these changes May 16, 2022

View reviewed changes

Extract label name remapping to a method

a9fdf07

Rocketknight1 merged commit 349f1c8 into main May 17, 2022

Rocketknight1 deleted the tf_train_step_rewrite branch May 17, 2022 13:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite TensorFlow train_step and test_step #17057

Rewrite TensorFlow train_step and test_step #17057

Rocketknight1 commented May 3, 2022 •

edited

Loading

HuggingFaceDocBuilderDev commented May 3, 2022 •

edited

Loading

Rocketknight1 commented May 16, 2022

gante left a comment

gante May 16, 2022

Rocketknight1 May 16, 2022

gante May 16, 2022

gante May 16, 2022

Rocketknight1 May 16, 2022

Rocketknight1 May 16, 2022

Rocketknight1 May 16, 2022

gante left a comment

Rewrite TensorFlow train_step and test_step #17057

Rewrite TensorFlow train_step and test_step #17057

Conversation

Rocketknight1 commented May 3, 2022 • edited Loading

HuggingFaceDocBuilderDev commented May 3, 2022 • edited Loading

Rocketknight1 commented May 16, 2022

gante left a comment

Choose a reason for hiding this comment

gante May 16, 2022

Choose a reason for hiding this comment

Rocketknight1 May 16, 2022

Choose a reason for hiding this comment

gante May 16, 2022

Choose a reason for hiding this comment

gante May 16, 2022

Choose a reason for hiding this comment

Rocketknight1 May 16, 2022

Choose a reason for hiding this comment

Rocketknight1 May 16, 2022

Choose a reason for hiding this comment

Rocketknight1 May 16, 2022

Choose a reason for hiding this comment

gante left a comment

Choose a reason for hiding this comment

Rocketknight1 commented May 3, 2022 •

edited

Loading

HuggingFaceDocBuilderDev commented May 3, 2022 •

edited

Loading