Closed
Labels: P1 (issue that should be fixed within a few weeks), bug (something that is supposed to be working, but isn't), llm
Description
What happened + What you expected to happen
vllm-project/vllm#21739 hard-deprecated the disable_log_requests engine argument without keeping backward compatibility at this layer. As a result, Ray Serve LLM does not work with the latest vLLM wheel released for gpt-oss (vLLM 0.10.1). We need to upgrade Ray Serve LLM to use enable_log_requests, which will be introduced after 0.10.1 is officially released; then the Ray nightly will work with gpt-oss deployments.
To work around this issue, you can comment out this part and build Ray from source: https://github.com/ray-project/ray/blob/master/python/ray/llm/_internal/serve/deployments/llm/vllm/vllm_models.py#L100
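For reference, the fix on the Ray side could be a small translation layer over the engine kwargs. The sketch below is hypothetical (translate_engine_kwargs is not an actual Ray helper), and it assumes the new enable_log_requests flag is simply the boolean inverse of the removed disable_log_requests flag:

```python
def translate_engine_kwargs(engine_kwargs: dict) -> dict:
    """Map deprecated vLLM engine kwargs to their replacements.

    Minimal sketch, assuming enable_log_requests (added after vLLM
    0.10.1) is the inverse of the removed disable_log_requests flag.
    """
    kwargs = dict(engine_kwargs)
    if "disable_log_requests" in kwargs:
        # Translate the deprecated flag instead of passing it through,
        # so newer vLLM wheels that reject it still work.
        kwargs["enable_log_requests"] = not kwargs.pop("disable_log_requests")
    return kwargs
```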
Versions / Dependencies
Ray nightly @ commit 9fb05f
vLLM 0.10.1+gptoss
Reproduction script
```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="openai/gpt-oss-20b",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1,
            max_replicas=2,
        )
    ),
    # You can customize the engine arguments (e.g. vLLM engine kwargs)
    engine_kwargs=dict(
        tensor_parallel_size=2,
    ),
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)
```
Issue Severity
Low: It annoys or frustrates me.