
fix: various fixes and enhancements #46

Merged: 4 commits merged into main on Feb 12, 2024
Conversation

@pyetras (Contributor) commented on Feb 12, 2024

This PR enables:

  • decoding without flash attention
  • early termination when decoding (see the sketch after this list)
  • prompt-free guidance
  • kv-cache with any dtype
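As a rough, illustrative sketch of the early-termination and prompt-free (classifier-free-style) guidance behaviour; names such as model, eos_token_id, and guidance_scale are assumptions for illustration, not identifiers from this PR:

import torch

def combine_guidance(cond_logits: torch.Tensor,
                     uncond_logits: torch.Tensor,
                     guidance_scale: float) -> torch.Tensor:
    # Classifier-free-style combine: push the prompted distribution away
    # from the prompt-free one by guidance_scale.
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)

@torch.no_grad()
def greedy_decode(model, tokens: torch.Tensor, eos_token_id: int,
                  max_new_tokens: int = 256, guidance_scale: float = 1.0) -> torch.Tensor:
    for _ in range(max_new_tokens):
        cond = model(tokens)[:, -1, :]            # logits conditioned on the full prompt
        uncond = model(tokens[:, -1:])[:, -1, :]  # prompt-free logits (illustrative)
        logits = combine_guidance(cond, uncond, guidance_scale)
        next_token = logits.argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=-1)
        if (next_token == eos_token_id).all():    # early termination on EOS
            break
    return tokens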

Piotr Sokolski added 4 commits on February 12, 2024 at 10:12
Signed-off-by: Piotr Sokólski <piotr@themetavoice.xyz>
@pyetras (Contributor, Author) commented on Feb 12, 2024

Fixes: #19, #22
Might fix, but not tested: #1

pyetras merged commit 43f97a0 into main on Feb 12, 2024
pyetras mentioned this pull request on Feb 12, 2024
        ).transpose(
            1, 2
        )  # (B, nh, T, hs) -> (B, T, nh, hs)

        return y

    def _fa2_attention(self, c_x: torch.Tensor) -> torch.Tensor:
    def _vanilla_attn(self, c_x: torch.Tensor) -> torch.Tensor:
@vatsalaggarwal (Member) commented on Feb 12, 2024:

why do we need this for this PR? For context, this used to be used as a test; otherwise _torch_attn does the job?
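For context on the question above, a torch-native fallback along these lines (a minimal sketch, not the code from this PR) avoids the flash_attn dependency by using PyTorch's built-in scaled_dot_product_attention:

import torch
import torch.nn.functional as F

def vanilla_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                      causal: bool = True) -> torch.Tensor:
    # q, k, v: (B, nh, T, hs). Uses PyTorch's fused SDPA kernel where available,
    # so flash_attn is not required.
    y = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    return y.transpose(1, 2)  # (B, nh, T, hs) -> (B, T, nh, hs)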

@@ -161,7 +166,7 @@ def _convert_audiodata_to_wav_path(audiodata, wav_tmp):
         seed=1337,
         device=device,
         dtype=GlobalState.config.dtype,
-        compile=False,
+        compile=GlobalState.config.compile,
Member:

this will not work right now and will cause recompilations at each time step

@pyetras (Contributor, Author):

it does not seem to do that for me

Member:

did you try with vanilla kv-cache or flash decoding?

Member:

also, we probably need to change the mode for torch.compile during inference to get the most out of it, and to use CUDA graphs
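As a rough illustration of that suggestion (a sketch, not code from this repository): "reduce-overhead" mode enables CUDA graphs to cut per-step launch overhead, and dynamic shapes are one way to avoid recompiling as the sequence grows at each decode step.

import torch

model = torch.nn.Linear(8, 8)  # stand-in for the decoder

# "reduce-overhead" mode uses CUDA graphs to reduce per-step launch overhead.
fast = torch.compile(model, mode="reduce-overhead")

# dynamic=True asks the compiler for shape-generic kernels, so a growing
# sequence length does not trigger a recompilation at every decode step.
flexible = torch.compile(model, dynamic=True)

x = torch.randn(4, 8)
print(fast(x).shape, flexible(x).shape)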

        flash_attn_with_kvcache,
    )
except ImportError:
    warnings.warn("flash_attn not installed, make sure to replace attention mechanism with torch_attn")
Member:

nit: improve the warning by stating the required change in the command instead
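One way to act on this nit, as a sketch only; the flag and config names below are assumptions, not taken from this repository:

import warnings

try:
    from flash_attn import flash_attn_with_kvcache  # noqa: F401
except ImportError:
    warnings.warn(
        "flash_attn is not installed. Switch the attention mechanism to "
        "torch_attn, e.g. by passing a (hypothetical) --attn_impl torch_attn "
        "flag or setting attn_impl = 'torch_attn' in the model config."
    )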
