Open
Labels: enhancement (New feature or request)
WIP project roadmap for LoRAX. We'll continue to update this over time.
v0.10
- Speculative decoding adapters
- AQLM
v0.11
- Prefix caching
- BERT support
- Embedding endpoint
- Embedding adapters
- Classification adapters
Previous Releases
v0.9
- Adapter memory pool
Backlog
Models
- Llama
- Mistral
- GPT2
- Qwen
- Mixtral
- Phi
- Bloom
- BERT
- Stable Diffusion
Adapters
- LoRA
- Classification Head
- Embedding MLP
- MoLoRA
- Medusa
- TART
- LCM-LoRA
- LoRA blending (multiple LoRAs per request)
- LongLoRA
Throughput / Latency
- Paged Attention v2
- Lookahead Decoding
- SGMV with variable ranks
- SGMV with tensor parallelism
Quantization
- bitsandbytes
- GPTQ
- AWQ
Usability
- Prebuilt server wheels
- SkyPilot usage guide
- Example notebooks