@pujiang2018
Contributor

Main changes:

  1. INT8 KV cache implementation
  2. Move crossAttnShardedHead from layers to kernels
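
For readers unfamiliar with the idea behind item 1, here is a minimal sketch of symmetric INT8 quantization of a single K or V head vector with one float scale. The helper names `quantizeHead` and `dequantizeHead` are hypothetical and do not come from this PR; the actual granularity and rounding used in the PR may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not from the PR): quantize one head vector to INT8
// with a single symmetric scale, scale = max|x| / 127. Returns the scale.
inline float quantizeHead(const float *src, int8_t *dst, int headSize) {
    float maxAbs = 0.0f;
    for (int i = 0; i < headSize; ++i) {
        maxAbs = std::max(maxAbs, std::fabs(src[i]));
    }
    // Avoid dividing by zero for an all-zero vector.
    const float scale = maxAbs > 0.0f ? maxAbs / 127.0f : 1.0f;
    for (int i = 0; i < headSize; ++i) {
        float q = std::nearbyint(src[i] / scale);
        q = std::min(127.0f, std::max(-127.0f, q)); // clamp to the symmetric range
        dst[i] = static_cast<int8_t>(q);
    }
    return scale;
}

// Hypothetical helper (not from the PR): recover approximate float values.
inline void dequantizeHead(const int8_t *src, float scale, float *dst, int headSize) {
    for (int i = 0; i < headSize; ++i) {
        dst[i] = static_cast<float>(src[i]) * scale;
    }
}
```

With this scheme the cache stores one byte per element plus one float per vector, and the round-trip error per element is bounded by roughly half the scale.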

@pujiang2018 pujiang2018 requested review from abenmao and changqi1 April 16, 2024 14:57

xft::copy(dstK, srcK, headSize);
xft::copy(dstV, srcV, headSize);
// Suppose dstK and dstV are the same data type
Contributor

Since storeKVCache now performs type conversions, the old comment can be removed.

Contributor Author

Removed.

@abenmao
Contributor

abenmao commented Apr 19, 2024

If it is convenient, maybe you could temporarily enable int8_t in type_selector.h to run the CI tests.

@pujiang2018
Contributor Author

> If it is convenient, maybe you can temporarily enable int8_t in type_selector.h to run ci test

There is currently no interface for that; it should be more convenient once an interface is added.

// ldq: leading dimension of query; lds: LD of score
template <typename T1, typename T2, typename T3>
void gemm1(T1 *query, T2 *key, T3 *score, int M, int N, int headSize, int ldq, int ldk, int lds) {
void gemm1(T1 *query, const std::tuple<T2 *, int, float *> &keyMat, T3 *score, int M, int N, int headSize, int ldq,
Contributor

gemm1 -> gemm_query?

// score: M * K(keyLen), value: K * headSize, output: M * headSize
template <typename T1, typename T2, typename T3>
void gemm2(T1 *score, T2 *value, T3 *output, int M, int headSize, int K, int lds, int ldv, int ldo) {
void gemm2(T1 *score, const std::tuple<T2 *, int, float *> &valueMat, T3 *output, int M, int headSize, int K,
Contributor

gemm2 -> gemm_score?

Contributor Author

These functions will move into the kernels as the work progresses and will get better names then.

@pujiang2018 pujiang2018 merged commit 5270b21 into main Apr 22, 2024
@Duyi-Wang Duyi-Wang deleted the pujiang/feature/model_kvcache_dt branch April 22, 2024 01:58