Refactor dead code - Removing all flash_xxx.py files #2166
Conversation
# Not used anymore
# def decode(self, decoder_ids: List[int]) -> str:
#     return self.tokenizer.decode(
#         decoder_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
#     )
Isn't it surprising that it's not used anymore?
Shouldn't those flags be used somewhere for those models? Do we have tests that cover the raison d'être of this code?
I think this code was to allow models to specify the value of `skip_special_tokens`. For example, for santacoder you needed the fill-in-the-middle special tokens to correctly display the outputs.
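To make the `skip_special_tokens` point concrete, here is a rough sketch (not the removed implementation; it assumes SantaCoder's tokenizer uses FIM special tokens named `<fim-prefix>`/`<fim-suffix>`/`<fim-middle>`):

```python
from transformers import AutoTokenizer

# Sketch: why a model-specific decode() that keeps special tokens can matter.
tokenizer = AutoTokenizer.from_pretrained("bigcode/santacoder")

# Fill-in-the-middle style prompt; the markers are special tokens (assumed names).
ids = tokenizer.encode("<fim-prefix>def add(a, b):<fim-suffix>\n<fim-middle>")

# With skip_special_tokens=True the FIM markers disappear from the decoded text,
# so the output can no longer be displayed or reassembled correctly.
print(tokenizer.decode(ids, skip_special_tokens=True))

# Keeping them (skip_special_tokens=False) preserves the structure.
print(tokenizer.decode(ids, skip_special_tokens=False))
```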
super().__init__()
config.transpose = config.architectures[0].startswith("GPT2")
self.transformer = FlashSantacoderModel(config, weights)
I think this will also need to be `self.model` instead, otherwise the iteration in `adapter_target_to_layer` from `flash_causal_lm.py` does not work.
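Roughly the failure mode being described, as a hypothetical sketch (the attribute walk below is illustrative, not the actual `adapter_target_to_layer` code from `flash_causal_lm.py`):

```python
# Hypothetical sketch: if the shared helper builds its adapter mapping by
# walking `model.model.layers`, a model that exposes its stack as
# `self.transformer` instead will fail this attribute lookup.
def adapter_target_to_layer(model):
    layer_weights = {}
    for i, layer in enumerate(model.model.layers):  # AttributeError when only `model.transformer` exists
        layer_weights[(i, "qkv_proj")] = layer.attention  # illustrative target name
    return layer_weights
```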
I don't think LoRAs for Santacoder even exist, right?
Even if that's the case, ideally we want to push the logic about layer loads into the model itself (which makes more sense than keeping around random layer names in `flash_causal_lm.py`).
Yeah, indeed seems like a better place than `flash_causal_lm.py`.
weights._set_gptq_params(model_id, revision)

prefix = ""
model = model_class(prefix, config, weights)
Currently breaks with Gemma because the `FlashGemmaForCausalLM` takes an extra `causal` argument.
Good catch, fixed it by making it a default (since it seems too niche to be worth an extra flag).
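A minimal sketch of what "making it a default" looks like (assumed constructor shape; the real class takes more arguments):

```python
import torch


class FlashGemmaForCausalLM(torch.nn.Module):
    # Sketch: with a default value for `causal`, the generic
    # `model = model_class(prefix, config, weights)` call no longer breaks for
    # Gemma, while callers that need the non-causal variant can still pass
    # causal=False explicitly.
    def __init__(self, prefix, config, weights, causal: bool = True):
        super().__init__()
        self.causal = causal
```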
Force-pushed from f89c2b4 to 425f348.
* Refactor dead code.
* First working step.
* Remove a lot of duplicated code.
* More dead code.
* More cleanup.
* Fix Santacoder test.
* Fixing the simple tests.
* Fixing sharding.
* Fixes for VLM.
* Fixing santacoder (num_kv_heads hardcoded).
* Removing more dead code.
* Fixing `config.n_head`.
* Stopping earlier because of `<end_of_utterance>` in idefics2.
* Addresses comments.
* Removing the dead code.
* Fuse back mistral into FlashCausalLM.
* Finish removal.
* Fixing docs + causal_lm `batch_class`.
* Fixing docs + causal.lm.
* Add default to Gemma Causality.
* Default value for gemma/gemma2.
* Wrong default.
What does this PR do?
Refactors dead code: removes all the flash_xxx.py files.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.