🤏 New models for tests #2287

qgallouedec · 2024-10-27T17:29:34Z

What does this PR do?

The models currently used for testing lack uniformity and practicality. Specifically:

They are sourced from various namespaces (trl-internal-testing, facebook, philschmid, hf-internal-testing, etc.).
They are not all small.
Their naming conventions are inconsistent (e.g., tiny-random-MistralForCausalLM, dummy-GPT2-correct-vocab, gpt2, pythia-14m).
There is no existing script for generating these models.

This PR introduces the following improvements:

A script to create tiny models specifically for testing purposes.
A uniform naming convention for these models, all placed under a single namespace (trl-internal-testing).

Once approved, I'll move all the models into the trl-internal-testing namespace (instead of qgallouedec), and probably remove all the old testing models in trl-internal-testing.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2024-10-27T17:35:29Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2024-11-20T21:07:10Z

tests/test_kto_trainer.py

-            self.assertListEqual(tokenized_dataset["prompt_input_ids"][0], [5377, 11141])
-            self.assertListEqual(tokenized_dataset["prompt_attention_mask"][0], [1, 1])
-            self.assertListEqual(tokenized_dataset["answer_input_ids"][0], [318, 1365, 621, 8253, 13])
+            self.assertListEqual(tokenized_dataset["prompt_input_ids"][0], [31137])
+            self.assertListEqual(tokenized_dataset["prompt_attention_mask"][0], [1])
+            self.assertListEqual(tokenized_dataset["answer_input_ids"][0], [374, 2664, 1091, 16965, 13])


The tokenizer has changed, so these values have also changed.

qgallouedec · 2024-11-20T21:07:47Z

tests/test_kto_trainer.py

-            self.assertListEqual(processed_dataset["prompt_input_ids"][0], [50256, 5377, 11141])
-            self.assertListEqual(processed_dataset["prompt_attention_mask"][0], [1, 1, 1])
+            self.assertListEqual(processed_dataset["prompt_input_ids"][0], [31137])
+            self.assertListEqual(processed_dataset["prompt_attention_mask"][0], [1])
            self.assertListEqual(
-                processed_dataset["completion_input_ids"][0], [50256, 5377, 11141, 318, 1365, 621, 8253, 13, 50256]
-            )
-            self.assertListEqual(processed_dataset["completion_attention_mask"][0], [1, 1, 1, 1, 1, 1, 1, 1, 1])
-            self.assertListEqual(
-                processed_dataset["completion_labels"][0], [-100, -100, -100, 318, 1365, 621, 8253, 13, 50256]
+                processed_dataset["completion_input_ids"][0], [31137, 374, 2664, 1091, 16965, 13, 151645]
            )
+            self.assertListEqual(processed_dataset["completion_attention_mask"][0], [1, 1, 1, 1, 1, 1, 1])
+            self.assertListEqual(processed_dataset["completion_labels"][0], [-100, 374, 2664, 1091, 16965, 13, 151645])


The tokenizer has changed, so this value has also changed.
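
The new expected values follow a pattern that can be checked by hand. Below is a minimal sketch, not the trainer's actual code: `build_completion` is a hypothetical helper, and it assumes (as the updated assertions suggest) that the new tokenizer adds no BOS token, that the EOS id is 151645, and that prompt positions are masked with -100 in the labels.

```python
IGNORE_INDEX = -100  # label value skipped by the loss (the -100 seen in the assertions)


def build_completion(prompt_ids, answer_ids, eos_id):
    """Hypothetical helper: concatenate prompt + answer + EOS and mask the prompt in the labels."""
    input_ids = prompt_ids + answer_ids + [eos_id]
    attention_mask = [1] * len(input_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids + [eos_id]
    return input_ids, attention_mask, labels


# Token ids copied from the updated assertions in the diff above.
input_ids, attention_mask, labels = build_completion(
    [31137], [374, 2664, 1091, 16965, 13], eos_id=151645
)
print(input_ids)
print(labels)
```

With the previous GPT-2 based model, the same layout had an extra BOS id (50256) in front, which is why every list in the old assertions was longer.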

qgallouedec · 2024-11-20T21:08:36Z

tests/test_modeling_value_head.py

-    "trl-internal-testing/tiny-random-BartForConditionalGeneration",
-    "trl-internal-testing/tiny-random-BigBirdPegasusForConditionalGeneration",
-    "trl-internal-testing/tiny-random-BlenderbotForConditionalGeneration",
-    "trl-internal-testing/tiny-random-BlenderbotSmallForConditionalGeneration",
-    "trl-internal-testing/tiny-random-FSMTForConditionalGeneration",
-    "trl-internal-testing/tiny-random-LEDForConditionalGeneration",
-    "trl-internal-testing/tiny-random-LongT5ForConditionalGeneration",
-    "trl-internal-testing/tiny-random-M2M100ForConditionalGeneration",
-    "trl-internal-testing/tiny-random-MarianMTModel",
-    "trl-internal-testing/tiny-random-MBartForConditionalGeneration",
-    "trl-internal-testing/tiny-random-MT5ForConditionalGeneration",
-    "trl-internal-testing/tiny-random-MvpForConditionalGeneration",
-    "trl-internal-testing/tiny-random-PegasusForConditionalGeneration",
-    "trl-internal-testing/tiny-random-PegasusXForConditionalGeneration",
-    "trl-internal-testing/tiny-random-PLBartForConditionalGeneration",
-    "trl-internal-testing/tiny-random-ProphetNetForConditionalGeneration",
-    "trl-internal-testing/tiny-random-SwitchTransformersForConditionalGeneration",
-    "trl-internal-testing/tiny-random-T5ForConditionalGeneration",
+    "qgallouedec/tiny-T5ForConditionalGeneration",
+    "qgallouedec/tiny-BartModel",


Only use the two most popular encoder-decoder models.

qgallouedec · 2024-11-20T21:09:43Z

tests/test_online_dpo_trainer.py

        self.model = AutoModelForCausalLM.from_pretrained(self.model_id)
        self.ref_model = AutoModelForCausalLM.from_pretrained(self.model_id)
-        self.reward_model = AutoModelForSequenceClassification.from_pretrained("EleutherAI/pythia-14m", num_labels=1)
-        self.reward_tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-14m")
-        self.reward_tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE


The model already has a chat template.

qgallouedec · 2024-11-20T21:10:10Z

tests/test_orpo_trainer.py

        self.t5_model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
        self.t5_tokenizer = AutoTokenizer.from_pretrained(model_id)
+        self.t5_tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE


T5 doesn't have a chat template

qgallouedec · 2024-11-20T21:10:58Z

tests/test_reward_trainer.py

        self.tokenizer = AutoTokenizer.from_pretrained(self.model_id)
-        self.tokenizer.chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"


The model already has a chat template
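
For reference, the removed template is a ChatML-style format. A rough plain-Python equivalent is sketched below; the function name `render_chatml` is made up for illustration, and the template bundled with the test model may differ in details (for example, a default system prompt).

```python
def render_chatml(messages, add_generation_prompt=False):
    """Plain-Python equivalent of the removed Jinja template (illustrative only)."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text


example = render_chatml(
    [{"role": "user", "content": "Hello"}], add_generation_prompt=True
)
print(example)
```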

qgallouedec · 2024-11-20T21:11:53Z

tests/test_reward_trainer.py

        self.model = AutoModelForSequenceClassification.from_pretrained(self.model_id)
+        self.model.config.pad_token_id = self.tokenizer.pad_token_id


Needed for the batched forward pass in the model.
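
A minimal sketch of why this matters, assuming the classification head pools the hidden state of the last non-padding token in each sequence, as decoder-based sequence-classification models in transformers generally do. The helper name and the pad id below are made up for illustration.

```python
def last_non_pad_index(batch_input_ids, pad_token_id):
    """Return, for each right-padded row, the index of its last non-padding token."""
    if pad_token_id is None:
        # Without a pad id, padding cannot be told apart from real tokens.
        raise ValueError("pad_token_id is required for batch sizes > 1")
    indices = []
    for row in batch_input_ids:
        non_pad = [i for i, tok in enumerate(row) if tok != pad_token_id]
        indices.append(non_pad[-1] if non_pad else -1)
    return indices


PAD = 0  # hypothetical pad id
batch = [[5, 6, 7, PAD], [8, 9, PAD, PAD]]
positions = last_non_pad_index(batch, PAD)
print(positions)
```

If the model config has no pad token id, there is no way to compute these positions for a padded batch, hence the explicit assignment in `setUp`.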

qgallouedec · 2024-11-20T21:15:49Z

tests/test_utils.py

@@ -205,6 +205,7 @@ def setUp(self):
            ignore_index=self.ignore_index,
        )

+    @unittest.skip("This test must be updated.")


This test seems tightly coupled to the way the codellama/CodeLlama-7b-Instruct-hf tokenizer works (especially regarding BOS and EOS) and is not general. It should be rewritten in the future.

That model is largely outdated in any case, so I would be fine to replace it with something more recent / standard like StarCoder2 or QwenCoder / DeepSeekCoder


qgallouedec · 2024-11-20T23:24:55Z

tests/test_dpo_trainer.py

-            ["trl-internal-testing/tiny-random-paligemma"],
-            ["trl-internal-testing/tiny-random-llava-1.5"],
+            ("qgallouedec/tiny-Idefics2ForConditionalGeneration",),
+            # ("qgallouedec/tiny-PaliGemmaForConditionalGeneration",),


PaliGemma doesn't have a chat template (it's not meant for chat). We have to find a way to test it. A dedicated method? We can do that in the future.

lewtun

Excellent refactor and massive QoL improvements to the tests @qgallouedec! LGTM with a minor question about whether you want to distinguish between tiny base and instruct models

lewtun · 2024-11-25T12:59:33Z

scripts/generate_tiny_models.py

For reference, we have a similar script in transformers in case you want to see the generic case: https://github.com/huggingface/transformers/blob/a0f4f3174f4aee87dd88ffda95579f7450934fc8/utils/create_dummy_models.py#L1403

lewtun · 2024-11-25T13:01:04Z

scripts/generate_tiny_models.py

+
+# Decoder models
+for model_id, config_class, model_class, suffix in [
+    ("bigscience/bloomz-560m", BloomConfig, BloomForCausalLM, None),


After this PR is merged, I would be in favour of just relying on a small, curated set of popular architectures for our tests (e.g. Qwen / Mistral / Llama / Gemma) and removing all the rest where appropriate.

Also, is this script supposed to be re-run whenever we add a model to the list? If so, I recommend adding a note either at the top of this script or in our contributor guide

> of just relying on a small, curated set of popular architectures

Would you remove any model from this list?

> supposed to be re-run whenever we add a model

Yes.

> adding a note

I added a note in the script in c851842
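
Since the script is meant to be re-run whenever a model is added, pushes need to be idempotent (the commit titled "Don't push if it already exists" points in that direction). Below is a minimal sketch of that idea, with the Hub calls replaced by injected callables so it runs offline. `push_if_missing`, `exists`, and `upload` are hypothetical names, not the script's actual API, and the repo names are illustrative.

```python
def push_if_missing(repo_id, exists, upload):
    """Upload only when the repo is not already present; return True if an upload happened."""
    if exists(repo_id):
        print(f"{repo_id} already exists, skipping")
        return False
    upload(repo_id)
    return True


# Stand-ins for the Hub: a set of existing repos and a recorder for uploads.
hub = {"trl-internal-testing/tiny-BloomForCausalLM"}
uploaded = []


def fake_upload(repo_id):
    hub.add(repo_id)
    uploaded.append(repo_id)


first = push_if_missing("trl-internal-testing/tiny-BloomForCausalLM", hub.__contains__, fake_upload)
second = push_if_missing("trl-internal-testing/tiny-T5ForConditionalGeneration", hub.__contains__, fake_upload)
third = push_if_missing("trl-internal-testing/tiny-T5ForConditionalGeneration", hub.__contains__, fake_upload)
print(uploaded)
```

In a real script, `exists` could presumably be backed by `huggingface_hub.HfApi().repo_exists`, so that re-running the script only creates the newly added models.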

lewtun · 2024-11-25T13:08:53Z

scripts/generate_tiny_models.py

+
+
+def push_to_hub(model, tokenizer, suffix=None):
+    model_class_name = model.__class__.__name__


Not sure it matters much, but this won't make a distinction between base and instruct models as they share the same class. If we don't care about this difference in our tests, no need to change it

Yes, I wasn't sure what to do about that. Most of our tests are based on Qwen2.5 in its instruct version, so I don't know how compatible the trainers are with non-instruct versions. Let's keep it like this for the moment.
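
To make the concern concrete, here is a minimal sketch, assuming the repo name is built from the class name plus an optional suffix. The `tiny-<ClassName>` pattern matches the model IDs used in the tests; the `-<suffix>` format, the suffix values, and the stand-in class are assumptions for illustration.

```python
class Qwen2ForCausalLM:
    """Stand-in for the real model class; base and instruct checkpoints both load as this class."""


def tiny_repo_id(model, suffix=None, namespace="trl-internal-testing"):
    """Hypothetical naming helper: <namespace>/tiny-<ClassName>[-<suffix>]."""
    repo_id = f"{namespace}/tiny-{model.__class__.__name__}"
    if suffix is not None:
        repo_id += f"-{suffix}"
    return repo_id


base, instruct = Qwen2ForCausalLM(), Qwen2ForCausalLM()

# Without a suffix, both checkpoints map to the same repo name.
same_name = tiny_repo_id(base) == tiny_repo_id(instruct)

# A suffix is one way to tell them apart if that ever becomes necessary.
distinct = tiny_repo_id(base, "2.5") != tiny_repo_id(instruct, "2.5-Instruct")
print(tiny_repo_id(instruct, "2.5-Instruct"))
```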

lewtun · 2024-11-25T13:12:33Z

tests/test_utils.py

@@ -205,6 +205,7 @@ def setUp(self):
            ignore_index=self.ignore_index,
        )

+    @unittest.skip("This test must be updated.")


That model is largely outdated in any case, so I would be fine to replace it with something more recent / standard like StarCoder2 or QwenCoder / DeepSeekCoder

first commit

2ea93cb

qgallouedec mentioned this pull request Oct 27, 2024

Drop GPT2 in our test in favour of a more recent instruct model #2177

Closed

uncomment

6735bf5

qgallouedec and others added 9 commits October 27, 2024 21:47

other tests adaptations

53f3091

Remove unused variable in test_setup_chat_format

2db5415

Merge branch 'main' into tiny-models-for-testing

4202f8d

Remove unused import statement

7c4069e

Merge branch 'main' into tiny-models-for-testing

88e371e

Merge branch 'main' into tiny-models-for-testing

170c950

Merge branch 'main' into tiny-models-for-testing

73bafb8

Merge branch 'main' into tiny-models-for-testing

dd5c131

style

029d758

qgallouedec changed the title ~~New models for tests~~ 🤏 New models for tests Nov 10, 2024

qgallouedec and others added 16 commits November 10, 2024 18:30

Merge branch 'main' into tiny-models-for-testing

ad43271

Add Bart model

79bb504

Update BCOTrainerTester class in test_bco_trainer.py

71d04f0

Update model IDs and tokenizers in test files

2c364c5

Add new models and processors

48bb040

Update model IDs in test files

68d1fa1

Fix formatting issue in test_dataset_formatting.py

5219d9b

Refactor dataset formatting in test_dataset_formatting.py

a45fbcb

Fix dataset sequence length in SFTTrainerTester

e39a75b

Remove tokenizer

e8c0e43

Remove print statement

3393333

Add reward_model_path and sft_model_path to PPO trainer

162fdb2

Fix tokenizer padding issue

8c1effe

Add chat template for testing purposes in PaliGemma model

ea50da1

Update PaliGemma model and chat template

1f52cec

Increase learning rate to speed up test

5855322

qgallouedec commented Nov 20, 2024


qgallouedec added 3 commits November 20, 2024 22:19

Add new vision language models

ae6f210

Commented out unused model IDs in test_vdpo_trainer

0a9b7d7

Update model and vision configurations in generate_tiny_models.py and…

48ff8d8

… test_dpo_trainer.py

qgallouedec commented Nov 20, 2024


qgallouedec marked this pull request as ready for review November 20, 2024 23:31

qgallouedec requested review from lewtun and kashif November 20, 2024 23:32

qgallouedec and others added 4 commits November 21, 2024 15:32

Merge branch 'main' into tiny-models-for-testing

36938c1

Update model and tokenizer references

a3ff8ee

Merge branch 'main' into tiny-models-for-testing

c03aa35

Merge branch 'main' into tiny-models-for-testing

2e7695a

kashif approved these changes Nov 25, 2024


lewtun approved these changes Nov 25, 2024


qgallouedec and others added 6 commits November 25, 2024 13:29

Don't push if it already exists

c851842

Add comment explaining test skip

8ee173a

Fix model_exists function call and add new models

48a134d

Merge branch 'main' into tiny-models-for-testing

58c033a

Update LlavaForConditionalGeneration model and processor

f8b02be

qgallouedec -> trl-internal-testing

33baa27

qgallouedec merged commit 453db5c into main Nov 25, 2024
14 checks passed

qgallouedec deleted the tiny-models-for-testing branch November 25, 2024 15:31

qgallouedec mentioned this pull request Dec 10, 2024

⚖️ Add tests_latest.yml workflow file #2457

Merged

5 tasks
