🚀 The feature, motivation and pitch
Support a larger max_num_batched_tokens when LoRA is enabled, to unlock the long-context ability of models with long context windows such as Qwen2 (128k).
Alternatives
No response
Additional context
https://github.com/vllm-project/vllm/blob/main/vllm/config.py#L1379-L1386
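For reference, the linked lines are the scheduler-config validation that currently rejects a large token budget when LoRA serving is enabled. Below is a minimal sketch of how that limit is hit; the model name and values are illustrative assumptions, not a confirmed reproduction:

```python
from vllm import LLM

# Illustrative only: with LoRA enabled, requesting a long-context token
# budget (e.g. 128k for Qwen2) is expected to be rejected by the
# validation in the linked vllm/config.py lines and raise a ValueError.
llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",    # assumed long-context model
    enable_lora=True,
    max_model_len=131072,              # 128k context window
    max_num_batched_tokens=131072,     # exceeds the current LoRA cap
)
```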