using bigdl-llm fused rope for llama #9066

yangw1234 · 2023-09-27T00:12:10Z

Description

Use the bigdl-llm fused RoPE kernel for Llama to reduce generation latency.

jason-dai · 2023-10-04T01:41:21Z

python/llm/src/bigdl/llm/transformers/models/llama.py

+    q_embed = torch.empty(query_states.shape, dtype=query_states.dtype, device=query_states.device)
+    k_embed = torch.empty(key_states.shape, dtype=key_states.dtype, device=key_states.device)
+
+    linear_q4_0.apply_rotary_embedding_half_qk(query_states, key_states, position_ids, q_embed, k_embed)
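For readers without an XPU at hand, here is a minimal unfused sketch of the rotate-half RoPE formulation used by the Hugging Face Llama implementation, which the fused kernel above presumably computes for query and key in one pass. It uses plain Python lists so it runs anywhere. `rotate_half`, `rope_half`, `dot`, and the `base=10000.0` default are illustrative assumptions, not part of bigdl-llm.

```python
import math

def rotate_half(x):
    """Return [-x2, x1], where x1 and x2 are the first and second halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + list(x[:half])

def rope_half(vec, position, base=10000.0):
    """Apply rotate-half rotary position embedding to a single head vector."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("head dimension must be even")
    half = dim // 2
    # One frequency per (i, i + half) pair, duplicated to cover the full vector.
    inv_freq = [base ** (-2.0 * i / dim) for i in range(half)]
    angles = [position * f for f in inv_freq] * 2
    rotated = rotate_half(vec)
    return [v * math.cos(a) + r * math.sin(a)
            for v, r, a in zip(vec, rotated, angles)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# Same relative offset (3) at two different absolute positions.
score_a = dot(rope_half(q, 5), rope_half(k, 2))
score_b = dot(rope_half(q, 105), rope_half(k, 102))
print(abs(score_a - score_b) < 1e-9)
```

The final lines check the property that matters for attention: the query-key score depends only on the relative offset between the two positions, not on their absolute values.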


This only works on GPU, yes?

Yes, only on GPU.

I think IPEX has a similar function for CPU.

We can add our GPU optimizations first.

yangw1234 · 2023-10-04T20:46:55Z

I think I am going to separate rms_norm, rope and other changes into different PRs with rms_norm being the first.

yangw1234 · 2023-10-05T23:28:34Z

Performance numbers are updated here: https://github.com/analytics-zoo/nano/issues/606

@jason-dai would you mind reviewing again?

yangw1234 · 2023-10-06T00:04:55Z

These optimizations do not work for training.

Should we check for training mode in every place where they are applied?
Or can we assume users will not set optimize_model=True for training?

@jason-dai

yangw1234 · 2023-10-06T00:11:38Z

Added a training-mode check for now.
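One way to express such a check is a single predicate that gates the fused path on both the device and the training state. The sketch below is an assumption about the shape of the check, not the merged code. `TensorInfo` is a hypothetical stand-in for the `device.type` and `requires_grad` attributes of a torch tensor, and `should_use_fused_rope` is an illustrative name.

```python
from dataclasses import dataclass

@dataclass
class TensorInfo:
    """Hypothetical stand-in for the tensor attributes the guard inspects."""
    device_type: str      # e.g. "xpu" or "cpu"
    requires_grad: bool   # True when autograd must track this tensor

def should_use_fused_rope(tensor, module_training):
    """Use the fused kernel only on XPU and only when no gradient is needed."""
    on_xpu = tensor.device_type == "xpu"
    needs_grad = module_training and tensor.requires_grad
    return on_xpu and not needs_grad

# Inference on XPU: fused path is allowed.
print(should_use_fused_rope(TensorInfo("xpu", False), module_training=False))
# Training on XPU with gradients: fall back to the unfused path.
print(should_use_fused_rope(TensorInfo("xpu", True), module_training=True))
# CPU: the fused kernel is GPU-only, so fall back.
print(should_use_fused_rope(TensorInfo("cpu", False), module_training=False))
```

Keeping the decision in one function means each call site asks the same question, instead of repeating the device and training checks at every optimized operation.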

jason-dai

LGTM

* optimize llama xpu rope
* fix bug
* fix style
* refine append cache
* remove check
* do not cache cos sin
* remove unnecessary changes
* clean up
* fix style
* check for training

yangw1234 force-pushed the xpu_rope branch from 7da83af to 6d79c53 Compare September 27, 2023 00:21

yangw1234 changed the title ~~[WIP] fused rope~~ [WIP] fused rope and rmsnorm Oct 3, 2023

jason-dai reviewed Oct 4, 2023

yangw1234 added 8 commits October 5, 2023 13:04

optimize llama xpu rope

3b40ae4

fix bug

7b06db0

fix style

8374590

refine append cache

62d7999

remove check

dd6ca01

do not cache cos sin

cf8e4be

remove unnecessary changes

3063cec

clean up

a220366

yangw1234 force-pushed the xpu_rope branch from 27a7587 to a220366 Compare October 5, 2023 20:55

fix style

56acf82

yangw1234 changed the title ~~[WIP] fused rope and rmsnorm~~ using bigdl-llm fused rope for llama Oct 5, 2023

check for training

05571fc

jason-dai approved these changes Oct 6, 2023

yangw1234 merged commit 1739372 into intel-analytics:main Oct 6, 2023
16 checks passed

This was referenced Oct 7, 2023

Arc NF4 OOM using the latest code #9095

Open

Arc prompt num > 1 generates abnormal output #9107

Closed
