[Misc]: Debugging the paged attention issue for customized LLMs #9231

Open
protossw512 opened this issue Oct 10, 2024 · 0 comments
protossw512 commented Oct 10, 2024

Anything you want to discuss about vllm.

I am trying to add a proprietary model (a fairly standard LLM with an HF `model.py`) to vLLM. I finished the porting script, but the generation results do not match HF and quickly go wrong, so I started comparing the outputs of the vLLM model and the HF model side by side.
I noticed that everything matches except the output of the `flash_attn_with_kvcache` call. This is strange, because the result of the `flash_attn_varlen_func` call does match the HF output. Say the prompt is ABC and the HF model completes it with DEF. If you run the prompt through vLLM, the first generated token is D, but what follows is random. If you change the prompt to ABCD, vLLM generates E correctly, again followed by random output.
The q/k/v tensors passed into `FlashAttentionImpl.forward()` all match the HF values, so it seems to me that something is wrong with the KV cache stored by paged attention.

Any ideas on how to debug this? Currently I am having difficulty slicing the correct indices out of the paged attention data (`kv_cache`). I know I can use `block_tables` to select the block, but I am not sure how to select the relevant slots inside a block. And suppose I do find that the `kv_cache` values differ from HF's `past_key_values`, what would the possible causes be?
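For reference, here is a minimal sketch of how I currently understand the paged layout and would try to slice it. I am assuming a kv_cache shape of `[2, num_blocks, block_size, num_kv_heads, head_size]` (key cache at index 0, value cache at index 1), which is what I observe with the flash-attn backend; the shapes, the `block_table` contents, and the `gather_keys` helper below are my own assumptions for illustration and may differ across vLLM versions and backends:

```python
import numpy as np

# Toy paged KV cache. Assumed layout (verify against what you see in a
# debugger): kv_cache[2, num_blocks, block_size, num_kv_heads, head_size].
num_blocks, block_size, num_kv_heads, head_size = 8, 4, 2, 16
kv_cache = np.random.randn(2, num_blocks, block_size, num_kv_heads, head_size)

# Hypothetical block table for one sequence of 10 tokens: physical block
# ids listed in logical order, so logical block 0 lives in physical block 5.
block_table = [5, 2, 0]
seq_len = 10

def gather_keys(kv_cache, block_table, seq_len, block_size):
    """Reassemble a contiguous [seq_len, num_kv_heads, head_size] key tensor
    from the paged cache, one token position at a time."""
    keys = []
    for pos in range(seq_len):
        block = block_table[pos // block_size]  # which physical block
        offset = pos % block_size               # slot within that block
        keys.append(kv_cache[0, block, offset])
    return np.stack(keys)

keys = gather_keys(kv_cache, block_table, seq_len, block_size)
assert keys.shape == (seq_len, num_kv_heads, head_size)
```

If this mapping is right, the gathered `keys` tensor should be directly comparable (position by position) against the corresponding layer of HF's `past_key_values`. Is the `pos // block_size` / `pos % block_size` arithmetic the correct way to index inside a block?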

Any help would be much appreciated.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.