add max model length support on vLLM #1510

lanking520 · 2024-01-24T01:49:57Z

Description

This PR adds max model length support for vLLM. It addresses failures with small models such as Mistral 7B, whose 32k context length can exceed the range that the KV cache is limited to.
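
As a rough illustration of the problem described above, the sketch below caps a model's default context length with a user-supplied value and checks whether one full-length sequence fits in the KV cache. The function names, the block-based capacity formula, and the example numbers are assumptions made for illustration; they are not taken from this PR's diff or from vLLM's source.

```python
from typing import Optional


def effective_max_model_len(derived_len: int, user_len: Optional[int] = None) -> int:
    """Pick the context length to serve with (hypothetical helper).

    derived_len: context length taken from the model config (e.g. 32768).
    user_len: optional user-supplied cap; None means "use the model default".
    """
    if user_len is None:
        return derived_len
    if user_len <= 0:
        raise ValueError("max model length must be positive")
    # Assumption: a cap may only shrink the context, never extend it.
    return min(user_len, derived_len)


def fits_in_kv_cache(max_model_len: int, num_gpu_blocks: int, block_size: int = 16) -> bool:
    """Check that one sequence of max_model_len tokens fits in the KV cache.

    Assumption: capacity is counted as num_gpu_blocks * block_size tokens.
    """
    return max_model_len <= num_gpu_blocks * block_size


if __name__ == "__main__":
    # Illustrative numbers only: 1000 blocks of 16 tokens hold 16000 tokens.
    default_len = effective_max_model_len(32768)
    capped_len = effective_max_model_len(32768, user_len=8192)
    print(default_len, fits_in_kv_cache(default_len, num_gpu_blocks=1000))
    print(capped_len, fits_in_kv_cache(capped_len, num_gpu_blocks=1000))
```

With a cap in place, the 32k default no longer has to fit in the cache; only the smaller configured length does.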

vllm-project/vllm#2418

add max model length support on vLLM

2de6af6

lanking520 requested review from zachgk, frankfliu and a team as code owners January 24, 2024 01:49

zachgk approved these changes Jan 24, 2024

lanking520 merged commit 24b8de4 into deepjavalibrary:master Jan 24, 2024
7 of 8 checks passed

sindhuvahinis pushed a commit to sindhuvahinis/djl-serving that referenced this pull request Jan 26, 2024

add max model length support on vLLM (deepjavalibrary#1510)

6635076
