[KVCache] INT8 KV cache implementation and related changes #320

pujiang2018 · 2024-04-16T14:56:49Z

Main changes:

INT8 KV cache implementation
Move crossAttnShardedHead from layers to kernels
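
For readers unfamiliar with the idea, here is a minimal sketch of symmetric INT8 quantization for one head vector. It is not the code in this PR: `quantizeHead` and `dequantizeHead` are hypothetical names, and the one-scale-per-vector layout is an assumption made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not from this PR): quantize one head vector to INT8
// using a single symmetric scale, and return that scale for dequantization.
inline float quantizeHead(const float *src, int8_t *dst, int headSize) {
    assert(headSize > 0);
    float maxAbs = 0.0f;
    for (int i = 0; i < headSize; ++i) {
        maxAbs = std::max(maxAbs, std::fabs(src[i]));
    }
    // An all-zero vector gets scale 1 to avoid dividing by zero.
    const float scale = maxAbs > 0.0f ? maxAbs / 127.0f : 1.0f;
    for (int i = 0; i < headSize; ++i) {
        dst[i] = static_cast<int8_t>(std::lround(src[i] / scale));
    }
    return scale;
}

// Hypothetical helper: recover approximate float values from the INT8 data.
inline void dequantizeHead(const int8_t *src, float scale, float *dst, int headSize) {
    for (int i = 0; i < headSize; ++i) {
        dst[i] = static_cast<float>(src[i]) * scale;
    }
}
```

The cache then stores one byte per element plus one float per vector, at the cost of a rounding error of at most about half a scale step per element.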

…feature/model_kvcache_dt

abenmao · 2024-04-19T10:26:02Z

src/layers/attention.h


-                    xft::copy(dstK, srcK, headSize);
-                    xft::copy(dstV, srcV, headSize);
+                    // Suppose dstK and dstV are the same data type


Since storeKVCache performs type conversions, the old comment can be removed.
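
To illustrate why a plain copy stops being enough once the cache type can differ from the source type, here is a rough sketch of a converting store. `storeRow` is a hypothetical stand-in, not the actual `storeKVCache` signature, and the explicit `scale` parameter is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a converting store; the real storeKVCache may differ.
template <typename DstT, typename SrcT>
void storeRow(DstT *dst, const SrcT *src, int headSize, float scale = 1.0f) {
    assert(headSize >= 0);
    if constexpr (std::is_same_v<DstT, SrcT>) {
        // Same type: the old plain-copy behaviour is still correct.
        std::copy(src, src + headSize, dst);
    } else if constexpr (std::is_same_v<DstT, int8_t>) {
        // INT8 destination: quantize with the caller-provided scale and saturate.
        assert(scale > 0.0f);
        for (int i = 0; i < headSize; ++i) {
            const float q = std::nearbyint(static_cast<float>(src[i]) / scale);
            dst[i] = static_cast<int8_t>(std::clamp(q, -127.0f, 127.0f));
        }
    } else {
        // Any other pair of types: element-wise conversion.
        for (int i = 0; i < headSize; ++i) {
            dst[i] = static_cast<DstT>(src[i]);
        }
    }
}
```

With a store like this, a comment that assumes the source and destination share a data type no longer describes what the code does.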

abenmao · 2024-04-19T10:28:12Z

If it is convenient, maybe you can temporarily enable int8_t in type_selector.h to run the CI tests.

pujiang2018 · 2024-04-20T07:56:07Z

> If it is convenient, maybe you can temporarily enable int8_t in type_selector.h to run the CI tests.

There is currently no interface for this. It should be more convenient once the interface is added.

changqi1 · 2024-04-22T01:48:45Z

src/layers/attention.h

+    // ldq: leading dimension of query; lds: LD of score
    template <typename T1, typename T2, typename T3>
-    void gemm1(T1 *query, T2 *key, T3 *score, int M, int N, int headSize, int ldq, int ldk, int lds) {
+    void gemm1(T1 *query, const std::tuple<T2 *, int, float *> &keyMat, T3 *score, int M, int N, int headSize, int ldq,


gemm1 -> gemm_query?
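
For context on the new signature, a naive reference for the query-times-key product might look like the sketch below. It is not the PR's kernel. It assumes the tuple carries (data pointer, leading dimension, per-row scales) and that a null scale pointer means the key is not quantized. Both assumptions, and the name `gemmQueryRef`, are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Naive reference for score = query * key^T (M x headSize times N x headSize).
// ASSUMPTION: keyMat = (data pointer, leading dimension, per-row scales);
// a null scale pointer is taken to mean "no dequantization needed".
template <typename T1, typename T2, typename T3>
void gemmQueryRef(const T1 *query, const std::tuple<T2 *, int, float *> &keyMat, T3 *score,
        int M, int N, int headSize, int ldq, int lds) {
    const auto [key, ldk, scale] = keyMat;
    assert(ldq >= headSize && ldk >= headSize && lds >= N);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < headSize; ++k) {
                acc += static_cast<float>(query[m * ldq + k]) * static_cast<float>(key[n * ldk + k]);
            }
            // One scale per key row lets dequantization happen once per dot product.
            if (scale != nullptr) acc *= scale[n];
            score[m * lds + n] = static_cast<T3>(acc);
        }
    }
}
```

Bundling the pointer, stride, and scales into one argument keeps the call site the same for quantized and non-quantized caches.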

changqi1 · 2024-04-22T01:49:21Z

src/layers/attention.h

    // score: M * K(keyLen), value: K * headSize, output: M * headSize
    template <typename T1, typename T2, typename T3>
-    void gemm2(T1 *score, T2 *value, T3 *output, int M, int headSize, int K, int lds, int ldv, int ldo) {
+    void gemm2(T1 *score, const std::tuple<T2 *, int, float *> &valueMat, T3 *output, int M, int headSize, int K,


gemm2 -> gemm_score?

These functions will be moved into the kernels as the work progresses and will get better names then.
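
The score-times-value product differs slightly, because a per-row scale has to be applied inside the reduction over K rather than once at the end. The sketch below is a naive reference under the same assumptions as before: the tuple is read as (data pointer, leading dimension, per-row scales), and `gemmScoreRef` is a hypothetical name, not something defined in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Naive reference for output = score * value (M x K times K x headSize).
// ASSUMPTION: valueMat = (data pointer, leading dimension, per-row scales);
// a null scale pointer is taken to mean "no dequantization needed".
template <typename T1, typename T2, typename T3>
void gemmScoreRef(const T1 *score, const std::tuple<T2 *, int, float *> &valueMat, T3 *output,
        int M, int headSize, int K, int lds, int ldo) {
    const auto [value, ldv, scale] = valueMat;
    assert(lds >= K && ldv >= headSize && ldo >= headSize);
    for (int m = 0; m < M; ++m) {
        for (int h = 0; h < headSize; ++h) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                // Each value row k carries its own scale, so it is folded into every term.
                const float s = scale != nullptr ? scale[k] : 1.0f;
                acc += static_cast<float>(score[m * lds + k]) * s * static_cast<float>(value[k * ldv + h]);
            }
            output[m * ldo + h] = static_cast<T3>(acc);
        }
    }
}
```

An optimized kernel could instead pre-multiply each score by its row scale once and then run a plain mixed-type GEMM.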

pujiang2018 added 6 commits April 14, 2024 21:16

expose KV cache data type in llama model

ea4d0af

Merge commit '280a915055913764c9c2f7374edb798d46f0216c' into pujiang/…

ae2aced

…feature/model_kvcache_dt

Merge commit '5349b3b90ec721cd5c5bf9753d8f8846edbcd01e' into pujiang/…

1dcd8b5

…feature/model_kvcache_dt

fix bug of small_gemm_transb_1xn_dynk

362193a

Merge commit '15451f20a64ebf5c1d4726bee3afe34f4dc45119' into pujiang/…

a5f1691

…feature/model_kvcache_dt

INT8 KV cache impl.

600dc26

pujiang2018 requested review from abenmao and changqi1 April 16, 2024 14:57

fix the issue cannot deduce KVCacheT in crossAttnShardedHead

3bfb763

abenmao reviewed Apr 19, 2024

remove the old incorrect comment

7094064

abenmao approved these changes Apr 22, 2024

changqi1 reviewed Apr 22, 2024

pujiang2018 merged commit 5270b21 into main Apr 22, 2024

Duyi-Wang deleted the pujiang/feature/model_kvcache_dt branch April 22, 2024 01:58
