- Alpha release for vLLM V1 architecture, ETA 1/23-1/24
- Pending V1 items
  - Performance numbers @ywang96 @robertgshaw2-redhat @WoosukKwon
  - Documentation @WoosukKwon
  - Blog post @WoosukKwon
- Other Pending PRs
  - [core] add wake_up doc and some sanity check #12361 (see the sleep/wake_up sketch after this list)
  - [torch.compile] decouple compile sizes and cudagraph sizes #12243 (see the compilation config sketch after this list)
  - Revert "[core] separate builder init and builder prepare for each batch" #12377
  - [perf] fix perf regression from #12253 #12380
  - [Bugfix][Kernel] Fix CUDA 11.8 being broken by FA3 build #12375
  - [Bugfix][Kernel] FA3 Fix - RuntimeError: This flash attention build only supports pack_gqa (for build size reasons). #12405
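
For context on the wake_up item (#12361): vLLM exposes a sleep/wake_up API that frees GPU memory while the engine is idle and restores it before the next run. A minimal usage sketch, assuming an engine constructed with `enable_sleep_mode=True` (the model name is just a placeholder):

```python
from vllm import LLM

# Sleep mode must be enabled at construction time.
llm = LLM(model="facebook/opt-125m", enable_sleep_mode=True)

print(llm.generate(["Hello, my name is"])[0].outputs[0].text)

# Level 1 offloads model weights to CPU and discards the KV cache,
# freeing GPU memory while the engine is idle.
llm.sleep(level=1)

# wake_up() restores the weights before the engine is used again;
# #12361 adds documentation and a sanity check around this path.
llm.wake_up()

print(llm.generate(["The capital of France is"])[0].outputs[0].text)
```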
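
For context on the compile sizes item (#12243): after that change, the batch sizes torch.compile specializes for and the batch sizes captured as CUDA graphs are configured independently. A hedged sketch of the decoupled configuration; the field names (`compile_sizes`, `cudagraph_capture_sizes`) reflect vLLM's `CompilationConfig` at the time and may differ across versions:

```python
from vllm import LLM

llm = LLM(
    model="facebook/opt-125m",  # placeholder model
    compilation_config={
        "level": 3,                               # piecewise compilation
        "compile_sizes": [1, 8],                  # sizes compiled with specialized kernels
        "cudagraph_capture_sizes": [1, 2, 4, 8],  # sizes captured as CUDA graphs
    },
)
```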