Issues: vllm-project/vllm
[Performance]: Qwen2-VL-7B AWQ model performance
performance
Performance-related issues
#9863
opened Oct 31, 2024 by
zzf2grx
1 task done
[Performance]: How to Improve Performance Under Concurrency
performance
Performance-related issues
#9722
opened Oct 26, 2024 by
ljwps
1 task done
[Performance]: Empirical Measurement of NVLS
performance
Performance-related issues
#9699
opened Oct 25, 2024 by
youkaichao
[Performance]: Low GPU utilization - is it normal?
performance
Performance-related issues
#9651
opened Oct 24, 2024 by
fzyzcjy
1 task done
[Performance]: The performance of version 0.6.3 is weaker than that of version 0.6.2 in stress testing.
performance
Performance-related issues
#9581
opened Oct 22, 2024 by
skylee-01
1 task done
[Performance]: vllm Eagle performance is worse than expected
performance
Performance-related issues
#9565
opened Oct 21, 2024 by
LiuXiaoxuanPKU
1 task done
[Performance]: bitsandbytes quantization slow
performance
Performance-related issues
#9535
opened Oct 20, 2024 by
lance0108
1 task done
[Performance]: attention speed regression 0.6.0 => 0.6.3
performance
Performance-related issues
#9527
opened Oct 19, 2024 by
rayhuang90
1 task done
[Performance]: InternVL multi image speed is not improved compare to original
help wanted
Extra attention is needed
performance
Performance-related issues
#9483
opened Oct 18, 2024 by
luohao123
1 task done
[Performance]: speed regression 0.6.2 => 0.6.3?
performance
Performance-related issues
#9476
opened Oct 17, 2024 by
stas00
[Performance]: vLLM is too slow when there are too many requests
performance
Performance-related issues
#9474
opened Oct 17, 2024 by
lxb0425
1 task done
[Performance]: inference with qwen2.5 using version vLLM 0.6.3 is felt to be slower
performance
Performance-related issues
#9413
opened Oct 16, 2024 by
Jimmy-L99
1 task done
[Performance]: Maximizing the performance of batch inference of big models on vllm 0.6.3
performance
Performance-related issues
#9383
opened Oct 15, 2024 by
Hellisotherpeople
1 task done
Questions about the inference performance of the GPTQ model
performance
Performance-related issues
#9240
opened Oct 10, 2024 by
Rssevenyu
[Performance]: phi 3.5 vision model consuming high CPU RAM and the process getting killed
performance
Performance-related issues
#9190
opened Oct 9, 2024 by
kuladeephx
1 task done
[Performance] In v0.6.2, when tp=1, TPOT becomes very slow for batch sizes of 10 or so. (not happened in v0.5.5)
performance
Performance-related issues
#9113
opened Oct 7, 2024 by
ashgold
1 task done
[Performance]: Transformers 4.45.1 slows down outlines guided decoding
performance
Performance-related issues
#9032
opened Oct 2, 2024 by
joerunde
1 task done
[Performance]: Why is Llama 3.1 405B 5 times faster than 70B on benchmarks?
performance
Performance-related issues
#9022
opened Oct 2, 2024 by
tommy-function
1 task done
[Performance] TTFT regression from v0.5.4 to 0.6.2
performance
Performance-related issues
#8918
opened Sep 27, 2024 by
rickyyx
1 task done
[Performance]: Talk about the model parallelism
performance
Performance-related issues
#8898
opened Sep 27, 2024 by
baifanxxx
1 task done
[Performance]: Slowdown compared to Gradio
performance
Performance-related issues
#8866
opened Sep 26, 2024 by
theoren
1 task done
[Performance]: Analysis of performance dashboard movements
performance
Performance-related issues
#8749
opened Sep 23, 2024 by
njhill
[Performance]: Suitable draft model for llama3.1 8b
performance
Performance-related issues
#8530
opened Sep 17, 2024 by
hustxiayang
1 task done
[Performance]: Moving the initialisation of the v variable in the _fwd_kernel() function has an effect on performance.
performance
Performance-related issues
#8466
opened Sep 13, 2024 by
1392001sai
1 task done
[Performance]: JSONLogitsProcessor repeats the same build_regex_from_schema again and again
performance
Performance-related issues
#8383
opened Sep 11, 2024 by
stas00