[KVCache] INT8 KV cache implementation and related changes #320
…feature/model_kvcache_dt
src/layers/attention.h
Outdated
xft::copy(dstK, srcK, headSize);
xft::copy(dstV, srcV, headSize);
// Suppose dstK and dstV are the same data type
Since storeKVCache now performs type conversions, the old comment can be removed.
Removed.
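For readers following this thread, here is a minimal sketch of why a plain copy no longer suffices once the cache element type differs from the source type. `storeHeadInt8` is a hypothetical helper, not the PR's `storeKVCache`, and the symmetric per-head scale is an assumption for illustration only; the PR may quantize differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of the PR): quantize one head vector to INT8
// using a single symmetric scale, so that src[i] ~= dst[i] * scale.
// Returns the scale, which the caller must keep alongside the INT8 data.
inline float storeHeadInt8(int8_t *dst, const float *src, int headSize) {
    float maxAbs = 0.0f;
    for (int i = 0; i < headSize; ++i) {
        maxAbs = std::max(maxAbs, std::fabs(src[i]));
    }
    // Avoid dividing by zero for an all-zero vector.
    const float scale = maxAbs > 0.0f ? maxAbs / 127.0f : 1.0f;
    for (int i = 0; i < headSize; ++i) {
        const float q = std::round(src[i] / scale);
        dst[i] = static_cast<int8_t>(std::min(127.0f, std::max(-127.0f, q)));
    }
    return scale;
}
```

Because the store step changes both the element type and the value range, a comment that assumes `dstK` and `dstV` share the source data type would be misleading.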
If it is convenient, maybe you can temporarily enable int8_t in type_selector.h to run the CI tests.
There is currently no interface for that. It should be more convenient once an interface is added.
// ldq: leading dimension of query; lds: LD of score
template <typename T1, typename T2, typename T3>
void gemm1(T1 *query, T2 *key, T3 *score, int M, int N, int headSize, int ldq, int ldk, int lds) {
void gemm1(T1 *query, const std::tuple<T2 *, int, float *> &keyMat, T3 *score, int M, int N, int headSize, int ldq,
gemm1 -> gemm_query?
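To make the new signature easier to review, here is a naive reference sketch of a query-times-key product that takes the key as a tuple. It assumes the tuple holds (data pointer, leading dimension, per-row scale pointer) and that the scale may be null for non-quantized keys; the diff only shows the tuple type, so that interpretation and the name `gemmQueryRef` are assumptions.

```cpp
#include <cstdint>
#include <tuple>

// Reference sketch only (hypothetical name): score[m][n] = dot(query[m], key[n]),
// optionally multiplied by a per-key-row dequantization scale.
// Assumed tuple layout: (key data, leading dimension of key, scale or nullptr).
template <typename T1, typename T2, typename T3>
void gemmQueryRef(const T1 *query, const std::tuple<T2 *, int, float *> &keyMat, T3 *score,
                  int M, int N, int headSize, int ldq, int lds) {
    const T2 *key = std::get<0>(keyMat);
    const int ldk = std::get<1>(keyMat);
    const float *scale = std::get<2>(keyMat);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int d = 0; d < headSize; ++d) {
                acc += static_cast<float>(query[m * ldq + d]) * static_cast<float>(key[n * ldk + d]);
            }
            // Apply the scale once per output element rather than per multiply.
            score[m * lds + n] = static_cast<T3>(scale ? acc * scale[n] : acc);
        }
    }
}
```

Bundling the pointer, leading dimension, and scale keeps the quantization metadata next to the data it describes, which is presumably why `ldk` no longer appears as a separate argument.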
// score: M * K(keyLen), value: K * headSize, output: M * headSize
template <typename T1, typename T2, typename T3>
void gemm2(T1 *score, T2 *value, T3 *output, int M, int headSize, int K, int lds, int ldv, int ldo) {
void gemm2(T1 *score, const std::tuple<T2 *, int, float *> &valueMat, T3 *output, int M, int headSize, int K,
gemm2 -> gemm_score?
This will move into the kernel as the work progresses, and will get a better name then.
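For reference while the naming is pending, a naive sketch of the score-times-value step with the same tuple convention follows. As before, the tuple layout (data pointer, leading dimension, per-row scale or null) and the name `gemmScoreRef` are assumptions, not taken from the PR.

```cpp
#include <cstdint>
#include <tuple>

// Reference sketch only (hypothetical name): output[m][h] = sum_k score[m][k] * value[k][h],
// where each value row k may carry its own dequantization scale.
// Assumed tuple layout: (value data, leading dimension of value, scale or nullptr).
template <typename T1, typename T2, typename T3>
void gemmScoreRef(const T1 *score, const std::tuple<T2 *, int, float *> &valueMat, T3 *output,
                  int M, int headSize, int K, int lds, int ldo) {
    const T2 *value = std::get<0>(valueMat);
    const int ldv = std::get<1>(valueMat);
    const float *scale = std::get<2>(valueMat);
    for (int m = 0; m < M; ++m) {
        for (int h = 0; h < headSize; ++h) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                // The scale varies along the reduction axis, so it is applied per term.
                const float v = static_cast<float>(value[k * ldv + h]) * (scale ? scale[k] : 1.0f);
                acc += static_cast<float>(score[m * lds + k]) * v;
            }
            output[m * ldo + h] = static_cast<T3>(acc);
        }
    }
}
```

Unlike the query-times-key case, the per-row scale here cannot be factored out of the inner sum, which is one reason a fused kernel is a natural home for this routine.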
Main changes: