@abenmao (Contributor) commented Mar 28, 2024

- Transpose the KV cache for long sequences, enabled with the environment variable `ENABLE_KV_TRANS`.
- Tune addreduce between shm and ccl to speed up communication.

| workload | baseline-2s | opt-2s |
| --- | --- | --- |
| llama-2-13b bs24 in512 bf16 (SPR+HBM) | 350ms | 90ms |
| llama-2-70b bs32 in4096 bf16 (SPR) | 1300ms | 640ms |
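
For reference, the speedups implied by the table (baseline latency divided by optimized latency, rounded to one decimal) work out to roughly the following. This is plain arithmetic on the reported numbers, not an additional measurement.

```latex
\begin{align*}
\text{llama-2-13b: } \frac{350\,\text{ms}}{90\,\text{ms}} &\approx 3.9\times \\
\text{llama-2-70b: } \frac{1300\,\text{ms}}{640\,\text{ms}} &\approx 2.0\times
\end{align*}
```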

@abenmao force-pushed the perf/comm/kvcache branch from 4cfb6d9 to 99731f4 on March 28, 2024 12:42
@abenmao changed the title from "Add KVCache for long sequence && tuned comm for faster Addreduce" to "Add KVCache trans for long sequence && tuned comm for faster Addreduce" on Mar 28, 2024

/**
* Tensor specially designed for KV Cache
* Naturally, it could be represented in the shape of [seq_length][batch_size][head_num][head_size]
Contributor

Please also modify the comments here?

Contributor Author

Done. The new layout is disabled by default.
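
To make the layout discussion in this thread concrete, here is a minimal sketch of how a flat offset could be computed for the shape quoted in the comment, `[seq_length][batch_size][head_num][head_size]`, next to one possible transposed layout. The transposed shape `[batch_size][head_num][seq_length][head_size]` is an assumption for illustration; the thread does not state the actual new layout. `KVShape`, `offsetSeqMajor`, and `offsetHeadMajor` are hypothetical names, not identifiers from this PR.

```cpp
#include <cstddef>

// Hypothetical helper type: dimensions of a KV cache tensor.
struct KVShape {
    std::size_t seqLen;
    std::size_t batchSize;
    std::size_t headNum;
    std::size_t headSize;
};

// Flat offset of element (s, b, h, d) in the layout quoted in the code comment:
// [seq_length][batch_size][head_num][head_size].
inline std::size_t offsetSeqMajor(const KVShape &sh, std::size_t s, std::size_t b,
                                  std::size_t h, std::size_t d) {
    return ((s * sh.batchSize + b) * sh.headNum + h) * sh.headSize + d;
}

// ASSUMPTION: a transposed layout [batch_size][head_num][seq_length][head_size].
// This shape is illustrative only and is not confirmed by the PR.
inline std::size_t offsetHeadMajor(const KVShape &sh, std::size_t s, std::size_t b,
                                   std::size_t h, std::size_t d) {
    return ((b * sh.headNum + h) * sh.seqLen + s) * sh.headSize + d;
}
```

Under these formulas, consecutive tokens of the same (batch, head) pair are `batchSize * headNum * headSize` elements apart in the first layout and `headSize` elements apart in the assumed transposed one, so a per-head scan over the sequence touches contiguous memory only in the latter. Whether that is the effect this PR relies on is not stated in the conversation.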

static int kvTrans = -1;
if (kvTrans == -1) {
kvTrans = (getenv("ENABLE_KV_TRANS") ? atoi(getenv("ENABLE_KV_TRANS")) : 0);
// if (kvTrans == 1)
Contributor

Please remove it if it is not used.

Contributor Author

Added some comments.
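
As a side note on the snippet reviewed in this thread, the environment lookup could be sketched so that `getenv` is called once per read and the flag is cached on first use. This is only an illustrative sketch: `envFlag` and `kvTransEnabled` are hypothetical names, and treating the value `1` as "enabled" is an assumption based on the commented-out `kvTrans == 1` check, not on the merged code.

```cpp
#include <cstdlib>

// Hypothetical helper: read an integer flag from the environment.
// Returns `fallback` when the variable is not set.
inline int envFlag(const char *name, int fallback) {
    const char *value = std::getenv(name);
    return value ? std::atoi(value) : fallback;
}

// Hypothetical wrapper: true when ENABLE_KV_TRANS is set to 1.
// The function-local static is initialized once, on the first call,
// so later changes to the environment are not observed.
inline bool kvTransEnabled() {
    static const int kvTrans = envFlag("ENABLE_KV_TRANS", 0);
    return kvTrans == 1; // ASSUMPTION: 1 means enabled; default 0 means disabled
}
```

With this shape, running a program as `ENABLE_KV_TRANS=1 ./app` would turn the flag on, and leaving the variable unset keeps the default of disabled, which matches the "disabled by default" behavior mentioned earlier in the review.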

return catMlp == 1;
}

bool tunedComm() {
Contributor

"Tuned communication" is not easy to understand. Could you add a comment?

Contributor Author

Added.

@abenmao force-pushed the perf/comm/kvcache branch 2 times, most recently from 9dd2a0f to fc3e8ff on March 29, 2024 13:55
@abenmao force-pushed the perf/comm/kvcache branch from fc3e8ff to 5da8280 on March 30, 2024 04:25
@pujiang2018 merged commit bb83063 into intel:main on Apr 1, 2024
2 participants