1. [ ] Reduce the latency of LoRA operators (per LoRAX feedback, the LoRA operators introduce ~20% overhead); see the benchmark sketch below.
2. [ ] Fix the numerical issue in LoRA operators at large batch sizes.
3. [ ] Use FP8 tensor cores for LoRA operators.
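
To make the first item concrete, below is a minimal, hypothetical micro-benchmark sketch (not this project's actual kernels) showing how the LoRA overhead can be estimated: it times the base GEMM alone versus the base GEMM plus a naive LoRA update `y = x @ W + scaling * (x @ A) @ B`. All shapes, the rank, and the timing method are assumptions for illustration.

```python
# Hypothetical micro-benchmark (not part of this repo): estimates the relative
# latency overhead of a naive LoRA update versus the base GEMM alone.
# Shapes, rank, and scaling below are illustrative assumptions.
import torch

def bench(fn, warmup=10, iters=100):
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

if __name__ == "__main__":
    device, dtype = "cuda", torch.float16
    batch, d_in, d_out, rank, scaling = 32, 4096, 4096, 16, 2.0
    x = torch.randn(batch, d_in, device=device, dtype=dtype)
    W = torch.randn(d_in, d_out, device=device, dtype=dtype)   # base weight
    A = torch.randn(d_in, rank, device=device, dtype=dtype)    # LoRA down-projection
    B = torch.randn(rank, d_out, device=device, dtype=dtype)   # LoRA up-projection

    base_ms = bench(lambda: x @ W)
    lora_ms = bench(lambda: x @ W + scaling * ((x @ A) @ B))
    print(f"base: {base_ms:.3f} ms, base+LoRA: {lora_ms:.3f} ms, "
          f"overhead: {100 * (lora_ms / base_ms - 1):.1f}%")
```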