Issues: vllm-project/vllm
[Usage]: Can only run tensor parallel=2 once; subsequent runs stall on 2x Ada A6000 48GB GPUs for any model
usage (How to use vllm) · #9920 · opened Nov 1, 2024 by hengck23
[BUG]: How to use vLLM to serve local models in an environment without network access
usage (How to use vllm) · #9909 · opened Nov 1, 2024 by immusferr
[Usage]: How to get router_logits in Mixtral MoE?
usage (How to use vllm) · #9905 · opened Nov 1, 2024 by Lan13
[Usage]: Request for Detailed Configuration
usage (How to use vllm) · #9899 · opened Nov 1, 2024 by Playerrrrr
[Usage]: Hidden States not working in Speculative Decode
usage (How to use vllm) · #9879 · opened Oct 31, 2024 by ChiKaWa3077
[Usage]: ValueError: Unexpected weight for Qwen2-VL GPTQ 4-bit custom model.
usage (How to use vllm) · #9832 · opened Oct 30, 2024 by bhavyajoshi-mahindra
[Usage]: With the vLLM engine, inference on an A800 GPU is not as fast as on an RTX 3090; is this normal?
usage (How to use vllm) · #9796 · opened Oct 29, 2024 by sznariOsmosis
[Usage]: Running Phi3.5 on Intel x86 MacBook Pro?
usage (How to use vllm) · #9795 · opened Oct 29, 2024 by neviaumi
[Usage]: prefix caching support for multimodal models
usage (How to use vllm) · #9790 · opened Oct 29, 2024 by mearcstapa-gqz
[Usage]: how to return logits
usage (How to use vllm) · #9784 · opened Oct 29, 2024 by psh0628-eng
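A common answer to the question above: vLLM does not expose raw logits through its public API, but SamplingParams can return per-token log-probabilities, which covers most "return logits" use cases. A minimal sketch, assuming a small placeholder model (facebook/opt-125m stands in for whatever the asker is running):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model for illustration
params = SamplingParams(max_tokens=8, logprobs=5)  # top-5 logprobs per generated token

outputs = llm.generate(["The capital of France is"], params)
for step in outputs[0].outputs[0].logprobs:
    # each step is a dict mapping token id -> Logprob(logprob=..., rank=...)
    print(step)
```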
[Usage]: vLLM For maximally batched use case
usage (How to use vllm) · #9760 · opened Oct 28, 2024 by BoazTouitou97
[Usage]: How to use AsyncLLMEngine in a multithreaded environment
usage (How to use vllm) · #9757 · opened Oct 28, 2024 by pvardanis
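For context on the AsyncLLMEngine question: the engine is asyncio-based rather than thread-safe, so the usual pattern is to keep it on a single event loop (feeding it from other threads, if needed, via asyncio.run_coroutine_threadsafe). A hedged sketch of concurrent requests on one loop, with a placeholder model and a hypothetical run_one helper:

```python
import asyncio
from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="facebook/opt-125m")  # placeholder model
)

async def run_one(prompt: str, request_id: str) -> str:
    params = SamplingParams(max_tokens=16)
    final = None
    # engine.generate(...) is an async generator of partial RequestOutputs
    async for output in engine.generate(prompt, params, request_id):
        final = output
    return final.outputs[0].text

async def main() -> None:
    # concurrent requests are interleaved by the engine's scheduler
    texts = await asyncio.gather(
        run_one("Hello", "req-0"),
        run_one("Bonjour", "req-1"),
    )
    print(texts)

asyncio.run(main())
```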
[Usage]: Can I get the loss of the model directly?
usage (How to use vllm) · #9750 · opened Oct 28, 2024 by Ther-LF
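vLLM has no training-loss API, but the question above can often be answered by scoring the prompt tokens with prompt_logprobs and averaging the negative log-likelihood yourself. A minimal sketch, assuming a placeholder model:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
# prompt_logprobs=0 returns only the logprob of each actual prompt token
params = SamplingParams(max_tokens=1, prompt_logprobs=0)

out = llm.generate(["The quick brown fox"], params)[0]
token_ids = out.prompt_token_ids
# prompt_logprobs[0] is None: the first token has no conditional logprob
nlls = [
    -lp[tid].logprob
    for tid, lp in zip(token_ids[1:], out.prompt_logprobs[1:])
]
print("mean NLL (a per-token 'loss'):", sum(nlls) / len(nlls))
```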
[Usage]: how to get average prompt token length and output token length
usage (How to use vllm) · #9711 · opened Oct 26, 2024 by starrlee356
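For the token-length question above, each RequestOutput already carries the prompt and output token ids, so the averages fall out directly from generate() results. A minimal sketch with a placeholder model:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
outputs = llm.generate(
    ["Hello", "How are you today?"],
    SamplingParams(max_tokens=32),
)

prompt_lens = [len(o.prompt_token_ids) for o in outputs]
output_lens = [len(o.outputs[0].token_ids) for o in outputs]
print("avg prompt tokens:", sum(prompt_lens) / len(prompt_lens))
print("avg output tokens:", sum(output_lens) / len(output_lens))
```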
[Usage]: Are prompts processed sequentially?
usage (How to use vllm) · #9695 · opened Oct 25, 2024 by nishadsinghi
[Usage]: How do I use langchain for tool calls?
usage (How to use vllm) · #9692 · opened Oct 25, 2024 by Jimmy-L99
[Usage]: How to improve throughput with multi-card inference?
usage (How to use vllm) · #9684 · opened Oct 25, 2024 by tensorflowt
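On the multi-card throughput question, the knobs that usually matter are tensor_parallel_size together with memory and batching limits. A hedged sketch of the offline API; the model name is a placeholder and the values are illustrative, not tuned:

```python
from vllm import LLM, SamplingParams

# Shard the model across 2 GPUs; tune the limits for your workload.
llm = LLM(
    model="meta-llama/Llama-2-13b-hf",  # placeholder model
    tensor_parallel_size=2,             # one shard per GPU
    gpu_memory_utilization=0.90,        # fraction of VRAM given to vLLM
    max_num_seqs=256,                   # cap on concurrently batched requests
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```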
[Usage]: Llama-3.1-70B-Instruct best arguments for throughput at scale for multiple users
usage (How to use vllm) · #9658 · opened Oct 24, 2024 by squinn1
[Usage]: Pass multiple LoRA modules through YAML config
usage (How to use vllm) · #9655 · opened Oct 24, 2024 by andreapairon
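For the LoRA question above, the offline API takes adapters per request via LoRARequest; the server-side equivalent is --lora-modules name=path pairs, and whether those can live in a YAML config depends on the CLI config support in the installed version. A minimal sketch of the offline path; the model name and adapter paths are placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)  # placeholder model

# Each adapter gets a name, a unique integer id, and a local path (hypothetical).
sql_lora = LoRARequest("sql-adapter", 1, "/path/to/sql-lora")
chat_lora = LoRARequest("chat-adapter", 2, "/path/to/chat-lora")

out = llm.generate(
    ["Write a SQL query for all users"],
    SamplingParams(max_tokens=64),
    lora_request=sql_lora,  # swap to chat_lora to use the other adapter
)
print(out[0].outputs[0].text)
```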
[Usage]: Multimodal content with benchmark_serving.py
usage (How to use vllm) · #9633 · opened Oct 23, 2024 by khayamgondal
[Usage]: How to add a plugin process to the vLLM process world?
usage (How to use vllm) · #9603 · opened Oct 23, 2024 by chenhongyu2048
[Usage]: How to use pdb using the latest version?
usage (How to use vllm) · #9602 · opened Oct 23, 2024 by sleepwalker2017
[Usage]: OpenAI-Compatible Server online Batch inference
usage (How to use vllm) · #9575 · opened Oct 22, 2024 by cjfcsjt
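On the batch-inference question, the OpenAI-compatible completions endpoint accepts a list of prompts in a single request, and vLLM batches them server-side. A minimal sketch, assuming a server started with `vllm serve <model>` on localhost:8000:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="facebook/opt-125m",            # must match the served model name
    prompt=["Hello", "Bonjour", "Hola"],  # list prompt -> one batched request
    max_tokens=16,
)
for choice in resp.choices:
    print(choice.index, choice.text)
```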
[Usage]: How to interpret the logs when using speculative decoding
usage (How to use vllm) · #9539 · opened Oct 21, 2024 by chenchunhui97