This is very useful if you want to serve two models with different params (prompt, temperature, etc.) in some frontend, e.g. OpenWebUI. I use Qwen3 models, and I want to serve the same model both with a `/no_think` prompt and without it (thinking on). Even the recommended temperature and top-p differ between thinking and non-thinking mode.
Currently, if you pass multiple aliases as `--alias X --alias Y`, only the last value `Y` is used. vLLM allows `--served-model-name X Y`: both `X` and `Y` are then returned from the `/v1/models` request, and both can be called.
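A minimal sketch of the vLLM behavior being referenced, assuming a local vLLM server started with something like `vllm serve Qwen/Qwen3-8B --served-model-name qwen3-think qwen3-no-think` (the model path and alias names here are just illustrative):

```python
# Sketch: query an OpenAI-compatible server that exposes one model under
# two served names, then call it by either alias. Assumes the server is
# reachable at http://localhost:8000/v1; names are hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

# Both served names are listed by /v1/models.
for model in client.models.list():
    print(model.id)  # e.g. qwen3-think, qwen3-no-think

# Either alias can be used as the "model" field; a frontend like OpenWebUI
# can then attach different default prompts / sampling params to each.
resp = client.chat.completions.create(
    model="qwen3-no-think",
    messages=[{"role": "user", "content": "/no_think Hello!"}],
    temperature=0.7,
)
print(resp.choices[0].message.content)
```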