[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model #4799
I made a pass. I think once this PR adds unit tests for both the Triton and PagedAttention kernels, it should be good to go. You might also need to run clang-format to fix the merge conflict.
Thanks @simon-mo for the review. I'll add the missing unit tests today.
…nterface (e.g., unittest)
…g beta state version warning
I have tested the PR locally as well.
Phi-3-small's SPECIAL_TOKENS('<|******|>') will cause guided_grammar to crash.
server: client:
#5068 adds a test case.
…-Small model (vllm-project#4799) Co-authored-by: beagleski <yunanzhang@microsoft.com> Co-authored-by: bapatra <bapatra@microsoft.com> Co-authored-by: Barun Patra <codedecde@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>
This is joint work between Microsoft GenAI @linxihui, @beagleski, and vLLM @zhuohan123, @simon-mo @youkaichao.