- Port vllm/main features to ROCm
- Support Llama/Llama-2 models for v0.2.x
- Support SqueezeLLM
- Support YaRN
- Merge into upstream vllm ([Continuation] Merge EmbeddedLLM/vllm-rocm into vLLM main vllm-project/vllm#1836)
- Look into supporting multi-LoRA on ROCm (Add multi-LoRA support vllm-project/vllm#1804); see the multi-LoRA sketch after this checklist
- Support GGML Kernel (GGUF Quantization on ROCm) (https://github.com/EmbeddedLLM/vllm/tree/ggml-rocm) ([Kernel][Hardware][AMD] Add support for GGUF quantization on ROCm vllm-project/vllm#10254); see the GGUF sketch after this checklist
- Prompt caching via LMCache (https://github.com/LMCache/LMCache)
- Add ROCm support to torchac_cuda (https://github.com/EmbeddedLLM/torchac_rocm)
- Validate rocTX usage in Python (skip for now)
- Support AQLM Kernel (https://github.com/EmbeddedLLM/vllm/tree/aqlm-rocm)
- Upstream Cross-Attention kernel to support the Llama 3.2 Vision Model
  - BLOCKER: when passing text-only input, the LLM engine crashes.
- Upstream new features
  - Add context parallelism support through Star-Attention
- Benchmarks
  - Real-world distribution benchmark (https://blog.vllm.ai/2024/10/23/vllm-serving-amd.html)
  - Benchmark GGUF support
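
For the multi-LoRA item above, the sketch below shows how the upstream multi-LoRA API is typically exercised, which is what would need validating on ROCm. It is a minimal sketch: the base model, adapter name, and adapter path are placeholders, and it assumes the `enable_lora` engine flag and `LoRARequest` interface introduced by vllm-project/vllm#1804.

```python
# Minimal multi-LoRA sketch (model name and adapter path are placeholders).
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora switches on LoRA adapter loading in the engine.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# Each request can carry its own adapter via LoRARequest(name, id, path).
outputs = llm.generate(
    ["Generate a SQL query listing all users created in 2023."],
    sampling_params,
    lora_request=LoRARequest("sql_adapter", 1, "/path/to/sql-lora"),
)
print(outputs[0].outputs[0].text)
```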
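
Likewise, for the GGUF quantization item, the sketch below shows how a GGUF checkpoint is loaded through vLLM and could be used to smoke-test the ROCm kernel path. The file path and tokenizer repo are placeholders, and it assumes the upstream behaviour where a local `.gguf` file is passed as the model while the original Hugging Face tokenizer is supplied separately.

```python
# Minimal GGUF loading sketch (file path and tokenizer repo are placeholders).
from vllm import LLM, SamplingParams

llm = LLM(
    model="./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",   # local GGUF checkpoint
    tokenizer="TinyLlama/TinyLlama-1.1B-Chat-v1.0",   # tokenizer of the original model
)

outputs = llm.generate(
    ["What does ROCm stand for?"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```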
Interesting works and features regarding:
- long-context mechanisms on vLLM:
  - [Bugfix] Fix evict v2 with long context length (vllm-project/vllm#5411)
  - [Model] Implement DualChunkAttention for Qwen2 Models (vllm-project/vllm#6139)
- compute kernels on vLLM:
- quantization schemes on vLLM:
- Disaggregated prefill feature on vLLM:
- vLLM v0.7.0 tracker ([Release]: v0.7.0 Release Tracker vllm-project/vllm#11218)