
[Feature]: support longer max_num_batched_tokens for lora #7259

@NiuBlibing

Description

🚀 The feature, motivation and pitch

Support a larger max_num_batched_tokens when LoRA is enabled, to unlock the long-context ability of models with large context windows such as Qwen2 (128k).

Alternatives

No response

Additional context

https://github.com/vllm-project/vllm/blob/main/vllm/config.py#L1379-L1386
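For reference, the linked lines enforce a cap on max_num_batched_tokens whenever LoRA is enabled. The sketch below paraphrases that check as a standalone function; the helper name verify_lora_with_scheduler is illustrative (the real check lives in the LoRA config validation inside vllm/config.py), and the 65528 limit reflects the file at the time of this issue and may change between versions.

```python
# Hedged sketch of the constraint referenced above: with LoRA enabled, vLLM
# rejects max_num_batched_tokens above 65528 because the custom LoRA CUDA
# kernel cannot address larger token batches. Names here are illustrative.
MAX_LORA_BATCHED_TOKENS = 65528  # limit imposed by the custom LoRA kernel


def verify_lora_with_scheduler(max_num_batched_tokens: int,
                               lora_enabled: bool) -> None:
    """Mirror of the check this issue asks to relax."""
    if lora_enabled and max_num_batched_tokens > MAX_LORA_BATCHED_TOKENS:
        raise ValueError(
            "Due to limitations of the custom LoRA CUDA kernel, "
            f"max_num_batched_tokens must be <= {MAX_LORA_BATCHED_TOKENS} "
            "when LoRA is enabled.")


# Example: scheduling a full 128k-token Qwen2 batch trips this check.
try:
    verify_lora_with_scheduler(max_num_batched_tokens=131072,
                               lora_enabled=True)
except ValueError as e:
    print(e)
```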
