[Tracking Issue][Help Wanted]: FlashInfer backend improvements #8786

Open · 7 tasks
comaniac opened this issue Sep 24, 2024 · 0 comments
Labels: help wanted, misc

comaniac (Collaborator) commented Sep 24, 2024

This issue tracks the progress and roadmap of integrating new FlashInfer kernels and enabling more vLLM features in the FlashInfer backend. The items listed in Milestone 1 are part of the re-arch effort #8779.

If you would like a feature in the FlashInfer backend that is not listed here, or you are interested in picking up an item, please feel free to comment.

Milestone 1

Milestone 2

  • Support chunked prefill.
  • Support FP8 E4M3 kv-cache with kv scale (see the sketch after this list).
  • Support batch expansion in speculative decoding.
  • Integrate RaggedTensor kernel.
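
To make the kv-scale item above concrete, here is a minimal standalone PyTorch sketch of per-tensor FP8 E4M3 quantization of KV-cache entries. It is illustrative only: the helper names (`quantize_kv`, `dequantize_kv`, `kv_scale`) and the per-tensor calibration are assumptions for this sketch, not vLLM's or FlashInfer's actual API.

```python
# Hypothetical sketch of per-tensor FP8 E4M3 KV-cache scaling.
# quantize_kv/dequantize_kv and kv_scale are illustrative names, not
# the real vLLM/FlashInfer API. Requires PyTorch >= 2.1 for float8 dtypes.
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3


def quantize_kv(kv: torch.Tensor, kv_scale: float) -> torch.Tensor:
    # Divide by the scale so values fit E4M3's dynamic range, then cast.
    return (kv / kv_scale).to(torch.float8_e4m3fn)


def dequantize_kv(kv_fp8: torch.Tensor, kv_scale: float,
                  dtype: torch.dtype = torch.float16) -> torch.Tensor:
    # Upcast back to the compute dtype and re-apply the scale.
    return kv_fp8.to(dtype) * kv_scale


# [num_tokens, num_kv_heads, head_dim] KV entries in half precision.
kv = torch.randn(16, 8, 64, dtype=torch.float16)
# A simple per-tensor scale calibrated from the observed absolute maximum.
kv_scale = kv.abs().max().item() / FP8_E4M3_MAX
kv_fp8 = quantize_kv(kv, kv_scale)             # stored in the cache, 1 byte/elem
kv_restored = dequantize_kv(kv_fp8, kv_scale)  # used at attention time
print("max abs round-trip error:", (kv_restored - kv).abs().max().item())
```

In practice the scale would come from a calibration pass or a quantized checkpoint, and an FP8-aware attention kernel would fold the scale in directly rather than round-tripping through PyTorch as above.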

cc @yzh119 @LiuXiaoxuanPKU @raywanb @simon-mo @WoosukKwon @pavanimajety
