[python] Rolling batch support for flash models #865

xyang16 · 2023-06-24T17:55:13Z

Description

Brief description of what this PR is about

If this change is a backward incompatible change, why must this change be made?
Interesting edge cases to note here

engines/python/setup/djl_python/rolling_batch/lmi_dist_rolling_batch.py

lanking520 · 2023-06-28T22:25:28Z

engines/python/setup/djl_python/huggingface.py

-            self.rolling_batch = SchedulerRollingBatch(model_id_or_path,
-                                                       self.device, properties,
-                                                       **kwargs)
+            if os.getenv('OMPI_COMM_WORLD_SIZE'):


Check the engine instead: if engine != Python and tensor_parallel != 1.
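
A minimal sketch of the check suggested above, not the code that was merged. The helper name `pick_device`, its parameters, and the example engine value "MPI" are assumptions made for illustration. It takes the device from `LOCAL_RANK` only when the engine is not Python and the tensor-parallel degree is not 1, rather than inferring a distributed launch from `OMPI_COMM_WORLD_SIZE`.

```python
import os


def pick_device(engine, tp_degree, default_device=0, env=None):
    """Hypothetical helper: choose the device for the rolling batch.

    engine and tp_degree are assumed to have been read from the model
    properties already. env defaults to the process environment and is
    injectable so the function can be exercised without touching os.environ.
    """
    env = os.environ if env is None else env
    # Only a non-Python engine running with tensor parallelism has one
    # process per rank, so only then is LOCAL_RANK meaningful.
    if engine != "Python" and tp_degree != 1:
        return int(env.get("LOCAL_RANK", 0))
    return default_device


# "MPI" is used here purely as an example of a non-Python engine name.
print(pick_device("MPI", 2, env={"LOCAL_RANK": "3"}))      # 3
print(pick_device("Python", 2, env={"LOCAL_RANK": "3"}))   # 0
```

Checking explicit configuration keeps the decision independent of how the process happened to be launched.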

lanking520 · 2023-06-28T22:47:47Z

engines/python/setup/djl_python/huggingface.py

-        if self.enable_rolling_batch and self.enable_rolling_batch.lower(
-        ) == "false":
-            self.enable_rolling_batch = None
+        self.enable_rolling_batch = properties.get("rolling_batch") is not None


self.enable_rolling_batch = properties.get("rolling_batch", None)
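
To make the difference between the two forms concrete, here is a small sketch using a plain dict in place of the real properties object (the function names are hypothetical). The `is not None` form reduces the setting to a boolean, while the suggested form keeps the configured string so it can be passed on later.

```python
def rolling_batch_as_flag(properties):
    # Form in the diff: only records whether the key is present.
    return properties.get("rolling_batch") is not None


def rolling_batch_as_value(properties):
    # Suggested form: keeps the configured value, or None when absent.
    # dict.get already defaults to None, so the explicit None is optional.
    return properties.get("rolling_batch", None)


props = {"rolling_batch": "auto"}
print(rolling_batch_as_flag(props))    # True
print(rolling_batch_as_value(props))   # auto
print(rolling_batch_as_value({}))      # None
```

Note that neither form reproduces the removed special case for the string "false": with both, a value of "false" still counts as set.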

lanking520 · 2023-06-28T22:54:16Z

engines/python/setup/djl_python/huggingface.py

-                                                       **kwargs)
+            if self.engine != "Python" and tp_degree != 1:
+                self.device = int(os.getenv("LOCAL_RANK", 0))
+            rolling_batch_type = properties.get("rolling_batch")


No need for this line.

lanking520 · 2023-06-28T22:55:03Z

engines/python/setup/djl_python/huggingface.py

+                self.device = int(os.getenv("LOCAL_RANK", 0))
+            rolling_batch_type = properties.get("rolling_batch")
+            model_config = AutoConfig.from_pretrained(model_id_or_path, **kwargs)
+            _rolling_batch_cls = get_rolling_batch_class_from_str(rolling_batch_type, model_config)


_rolling_batch_cls = get_rolling_batch_class_from_str(self.enable_rolling_batch, model_config)
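
The body of `get_rolling_batch_class_from_str` is not shown in this thread. The sketch below is a hypothetical stand-in, under a different name, for how a lookup of this shape might work. The stub classes, the accepted strings ("auto", "scheduler", "lmi-dist"), and the set of flash-capable architectures are all assumptions; only `SchedulerRollingBatch` and the GPT-NeoX support are mentioned in the PR itself.

```python
from types import SimpleNamespace


class SchedulerRollingBatch:
    """Stub standing in for the scheduler-based implementation."""


class LmiDistRollingBatch:
    """Stub; name assumed from the file lmi_dist_rolling_batch.py."""


# Assumption: architectures that a flash implementation would cover.
FLASH_ARCHITECTURES = {"GPTNeoXForCausalLM"}


def select_rolling_batch_class(rolling_batch_type, model_config):
    """Hypothetical dispatch from a config string to a rolling batch class."""
    if rolling_batch_type == "auto":
        # Prefer the flash implementation when the model architecture allows it.
        architectures = getattr(model_config, "architectures", None) or []
        if any(arch in FLASH_ARCHITECTURES for arch in architectures):
            return LmiDistRollingBatch
        return SchedulerRollingBatch
    if rolling_batch_type == "scheduler":
        return SchedulerRollingBatch
    if rolling_batch_type == "lmi-dist":
        return LmiDistRollingBatch
    raise ValueError(f"Unsupported rolling batch type: {rolling_batch_type!r}")


# SimpleNamespace stands in for the object returned by AutoConfig.from_pretrained.
config = SimpleNamespace(architectures=["GPTNeoXForCausalLM"])
print(select_rolling_batch_class("auto", config).__name__)  # LmiDistRollingBatch
```

A lookup like this needs the configured string, which is why the comment above passes `self.enable_rolling_batch` rather than a boolean flag.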

* [python] Add rolling batch for flash attention models
* Flash gptneox support
* fix py tests
* Set sharded to false for tp 1
* Review changes

---------

Co-authored-by: sindhuso <somasundaram.sindhu@gmail.com>

xyang16 requested a review from sindhuvahinis June 24, 2023 17:55

sindhuvahinis reviewed Jun 25, 2023

engines/python/setup/djl_python/rolling_batch/lmi_dist_rolling_batch.py (outdated, resolved)

xyang16 force-pushed the lmi_dist branch from 0b8b4a9 to 0ad5e0f Compare June 26, 2023 20:23

xyang16 marked this pull request as ready for review June 28, 2023 17:24

xyang16 requested review from zachgk, frankfliu and a team as code owners June 28, 2023 17:24

xyang16 and others added 4 commits June 28, 2023 14:02

[python] Add rolling batch for flash attention models

646dd02

Flash gptneox support

bca8c62

fix py tests

bfb5466

Set sharded to false for tp 1

4829afd

xyang16 force-pushed the lmi_dist branch from 1dbd39f to 4829afd Compare June 28, 2023 21:04

xyang16 changed the title ~~[python] Add rolling batch for flash attention models~~ [python] Rolling batch support for flash attention models Jun 28, 2023

xyang16 changed the title ~~[python] Rolling batch support for flash attention models~~ [python] Rolling batch support for flash models Jun 28, 2023

lanking520 reviewed Jun 28, 2023

xyang16 force-pushed the lmi_dist branch from dedb247 to 22a34db Compare June 28, 2023 22:54

lanking520 reviewed Jun 28, 2023

Review changes

5b352cc

xyang16 force-pushed the lmi_dist branch from 38bbc72 to 5b352cc Compare June 28, 2023 23:05

lanking520 approved these changes Jun 28, 2023

lanking520 merged commit 9d98f2e into deepjavalibrary:master Jun 28, 2023
