[Bugfix] make test_openai_schema.py pass #18224


Merged
merged 3 commits into vllm-project:main from davidxia:patch40 on May 22, 2025

Conversation

davidxia
Contributor

@davidxia davidxia commented May 15, 2025

by filtering out test cases to the `POST /tokenize` endpoint that are known to fail with a reply of HTTP 501 Not Implemented.

Enable the test in CI.

FIX #18162


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label May 15, 2025
@davidxia davidxia marked this pull request as ready for review May 15, 2025 20:53
@davidxia
Contributor Author

I couldn't find an easy way to modify the generated OpenAPI or Swagger UI to not have "type": "file". So instead I decided to skip those test cases.
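For illustration, here is a minimal sketch of how such cases can be skipped inside the test body. The schema URL and the `_has_file_part` helper are assumptions for this sketch, not the exact code in this PR:

import schemathesis

# Assumption: the schema is loaded from the running server's OpenAPI endpoint.
schema = schemathesis.from_uri("http://localhost:8000/openapi.json")

def _has_file_part(obj) -> bool:
    # Recursively check whether a generated request body contains a
    # "file"-typed content part.
    if isinstance(obj, dict):
        return obj.get("type") == "file" or any(
            _has_file_part(v) for v in obj.values())
    if isinstance(obj, list):
        return any(_has_file_part(v) for v in obj)
    return False

@schema.parametrize()
def test_openapi_stateless(case: schemathesis.Case):
    # POST /tokenize cases containing "file"-typed parts are answered with
    # HTTP 501 Not Implemented, which Schemathesis reports as a failure,
    # so skip them instead of failing the run.
    key = (case.operation.method.upper(), case.operation.path)
    if key == ("POST", "/tokenize") and _has_file_part(case.body):
        return
    # No need to verify the SSL certificate for localhost.
    case.call_and_validate(verify=False)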

@DarkLight1337
Member

cc @tarukumar @hmellor

@davidxia
Contributor Author

Is it expected that POST /v1/chat/completions times out? It times out for me locally on an A100 GPU too.

(verbose_name='POST /v1/chat/completions') SUBFAIL entrypoints/openai/test_openai_schema.py::test_openapi_stateless[POST /v1/chat/completions] - schemathesis.exceptions.CheckFailed:

1. Response timeout

The server failed to respond within the specified limit of 10000.00ms

Reproduce with:

    curl -X POST -H 'Content-Type: application/json' -d '{"messages": [{"content": "\u00f6\u001f\u00d2?\u00fa", "role": "tool", "tool_call_id": ""}, {"content": [{"text": "", "type": "text"}, {"text": "", "type": "text"}, {"text": "", "type": "text", "": null}], "role": "developer"}]}' --insecure http://localhost:34335/v1/chat/completions

Falsifying example: test_openapi_stateless(
    case=,
)

@hmellor
Member

hmellor commented May 19, 2025

I don't think so. Was the GPU busy during those 10 seconds? And if you increase the timeout, will the test eventually pass?

@davidxia
Contributor Author

I don't think so. Was the GPU busy during those 10 seconds?

I'm using a dedicated GCE VM with no other workloads on it. So I don't think so.

And if you increase the timeout, will the test eventually pass?

Yes, see commit d84a35e.

@hmellor
Member

hmellor commented May 20, 2025

I'm using a dedicated GCE VM with no other workloads on it. So I don't think so.

Sorry, I meant: is vLLM doing things during those 10 seconds? Sounds like it just needed more time.

Comment on lines 98 to 101
key = (
    case.operation.method.upper(),
    case.operation.path,
)
timeout = {
    ("POST", "/v1/chat/completions"): 60,
}.get(key, DEFAULT_TIMEOUT_SECONDS)
Contributor Author


Per-route request timeouts are documented here. Without this I get

schemathesis.exceptions.CheckFailed...The server failed to respond within the specified limit of 30000.00ms
$ pytest tests/entrypoints/openai/test_openai_schema.py -v
/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
=========================================================== test session starts ===========================================================
platform linux -- Python 3.12.3, pytest-8.3.3, pluggy-1.5.0 -- /home/dxia/src/github.com/vllm-project/vllm/.venv/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/home/dxia/src/github.com/vllm-project/vllm/.hypothesis/examples'))
rootdir: /home/dxia/src/github.com/vllm-project/vllm
configfile: pyproject.toml
plugins: mock-3.14.0, anyio-4.6.2.post1, timeout-2.3.1, subtests-0.14.1, shard-0.1.2, buildkite-test-collector-0.1.9, asyncio-0.24.0, forked-1.6.0, hypothesis-6.131.0, schemathesis-3.39.15, rerunfailures-14.0
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 1 item                                                                                                                          
Running 1 items in this shard: tests/entrypoints/openai/test_openai_schema.py::test_openapi_stateless

tests/entrypoints/openai/test_openai_schema.py::test_openapi_stateless 
tests/entrypoints/openai/test_openai_schema.py::test_openapi_stateless[POST /v1/chat/completions] (verbose_name='POST /v1/chat/completions') SUBFAIL [ 50%]
tests/entrypoints/openai/test_openai_schema.py::test_openapi_stateless PASSED                                                       [100%]

================================================================ FAILURES =================================================================
____________________________________ test_openapi_stateless (verbose_name='POST /v1/chat/completions') ____________________________________
  + Exception Group Traceback (most recent call last):
  |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/lazy.py", line 370, in run_subtest
  |     sub_test(**fixtures)
  |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/_hypothesis.py", line 83, in test_openapi_stateless
  |     def test_function(*args: Any, **kwargs: Any) -> Any:
  |                ^^^
  |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/hypothesis/core.py", line 1839, in wrapped_test
  |     raise the_error_hypothesis_found
  | hypothesis.errors.FlakyFailure: Hypothesis test_openapi_stateless(case=) produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)
  | Falsifying example: test_openapi_stateless(
  |     case=,
  | )
  | Failed to reproduce exception. Expected: 
  | case = Case(body={'messages': [{'content': 'K', 'name': ";\x83\U000164edC\U00049bcc,\U0010bb3eõæÃ\x9evUàk?\U000f8fd6\n'Ðß'\U0...09, 'Ó': -1.0091098906725308e+16}], '\x1e=8\U00051cc9Ý': {'e': [None], '\U0003bf24ЧÇ': []}}], '': [-1.192092896e-07]})
  | 
  |     @schema.include(method="POST", path="/v1/chat/completions").parametrize()
  |     @schema.override(headers={"Content-Type": "application/json"})
  |     @settings(deadline=30000)
  |     def test_openapi_stateless(case: schemathesis.Case):
  |         key = (
  |             case.operation.method.upper(),
  |             case.operation.path,
  |         )
  |         timeout = {
  |             ("POST", "/v1/chat/completions"): 30,
  |         }.get(key, DEFAULT_TIMEOUT_SECONDS)
  |     
  |         #No need to verify SSL certificate for localhost
  | >       case.call_and_validate(verify=False, timeout=timeout)
  | 
  | tests/entrypoints/openai/test_openai_schema.py:106: 
  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  | .venv/lib/python3.12/site-packages/schemathesis/models.py:418: in call
  |     response = self.operation.schema.transport.send(
  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  | 
  | self = <schemathesis.transports.RequestsTransport object at 0x7992cd34cb30>
  | case = Case(body={'messages': [{'content': 'K', 'name': ";\x83\U000164edC\U00049bcc,\U0010bb3eõæÃ\x9evUàk?\U000f8fd6\n'Ðß'\U0...09, 'Ó': -1.0091098906725308e+16}], '\x1e=8\U00051cc9Ý': {'e': [None], '\U0003bf24ЧÇ': []}}], '': [-1.192092896e-07]})
  | session = <requests.sessions.Session object at 0x7992c7aa0380>, base_url = None
  | headers = None, params = None, cookies = None
  | kwargs = {'timeout': 30, 'verify': False}
  | requests = <module 'requests' from '/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/requests/__init__.py'>
  | ReadTimeoutError = <class 'urllib3.exceptions.ReadTimeoutError'>
  | 
  |     def send(
  |         self,
  |         case: Case,
  |         *,
  |         session: requests.Session | None = None,
  |         base_url: str | None = None,
  |         headers: dict[str, Any] | None = None,
  |         params: dict[str, Any] | None = None,
  |         cookies: dict[str, Any] | None = None,
  |         **kwargs: Any,
  |     ) -> requests.Response:
  |         import requests
  |         from urllib3.exceptions import ReadTimeoutError
  |     
  |         data = self.serialize_case(case, base_url=base_url, headers=headers, params=params, cookies=cookies)
  |         data.update(kwargs)
  |         data.setdefault("timeout", DEFAULT_RESPONSE_TIMEOUT / 1000)
  |         if session is None:
  |             validate_vanilla_requests_kwargs(data)
  |             session = requests.Session()
  |             close_session = True
  |         else:
  |             close_session = False
  |         verify = data.get("verify", True)
  |         try:
  |             with case.operation.schema.ratelimit():
  |                 response = session.request(**data)  # type: ignore
  |         except (requests.Timeout, requests.ConnectionError) as exc:
  |             if isinstance(exc, requests.ConnectionError):
  |                 if not isinstance(exc.args[0], ReadTimeoutError):
  |                     raise
  |                 req = requests.Request(
  |                     method=data["method"].upper(),
  |                     url=data["url"],
  |                     headers=data["headers"],
  |                     files=data.get("files"),
  |                     data=data.get("data") or {},
  |                     json=data.get("json"),
  |                     params=data.get("params") or {},
  |                     auth=data.get("auth"),
  |                     cookies=data["cookies"],
  |                     hooks=data.get("hooks"),
  |                 )
  |                 request = session.prepare_request(req)
  |             else:
  |                 request = cast(requests.PreparedRequest, exc.request)
  |             timeout = 1000 * data["timeout"]  # It is defined and not empty, since the exception happened
  |             code_message = case._get_code_message(case.operation.schema.code_sample_style, request, verify=verify)
  |             message = f"The server failed to respond within the specified limit of {timeout:.2f}ms"
  | >           raise get_timeout_error(case.operation.verbose_name, timeout)(
  |                 f"\n\n1. {failures.RequestTimeout.title}\n\n{message}\n\n{code_message}",
  |                 context=failures.RequestTimeout(message=message, timeout=timeout),
  |             ) from None
  | E           schemathesis.exceptions.CheckFailed: 
  | E           
  | E           1. Response timeout
  | E           
  | E           The server failed to respond within the specified limit of 30000.00ms
  | E           
  | E           Reproduce with: 
  | E           
  | E               curl -X POST -H 'Content-Type: application/json' -d '{"messages": [{"content": "K", "name": ";\u0083\ud819\udcedC\ud8e6\udfcc,\udbee\udf3e\u00f5\u00e6\u00c3\u009evU\u00e0k?\udba3\udfd6\n'"'"'\u00d0\u00df'"'"'\ud894\udf99", "role": "function", "\u00f1": [[-1.7976931348623157e+308], -15141, false], "\u00e3\ud97c\udf74x\u0084": {}, "\udace\udc0f": [{"\udbec\udc3b\u0017 ": [-1.7976931348623157e+308], "\u009ae\u00c4L\"\u0011\u009c\u00f5\u0014\u00de": [null], "": [[true], []]}, [[]]], "\u0082ti*\u00d1\u00d4\u00b6CAq": [], "\udb60\ude86\u0085(\u00b5\udb4d\udc0c\u0007\u0097\u00ec\u0082\u00e6\udbef\uddb0\u0015": [[-1.7976931348623157e+308], -15141, false], "\u00ce\u00e9K\u00b3\ud97d\udfccN\u00e3\u00caMlm\u007f": [{"\b\udb28\udf9f\u00ab\u00ea\u001emS\u0098u\udbc2\uddbc\u0086\udaeb\udd9e\u0004\ud94f\uddee\u0094\u008b2\uda24\udc2d\u0083\ud868\ude84\u0084": -2.225073858507203e-309, "\u00d3": -1.0091098906725308e+16}], "\u001e=8\ud907\udcc9\u00dd": {"e": [null], "\ud8af\udf24\u00d0\u00a7\u00c7": []}}], "": [-1.192092896e-07]}' --insecure http://localhost:57477/v1/chat/completions
  | 
  | .venv/lib/python3.12/site-packages/schemathesis/transports/__init__.py:202: CheckFailed
  | 
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/hypothesis/core.py", line 1087, in _execute_once_for_engine
    |     result = self.execute_once(data)
    |              ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/hypothesis/core.py", line 1024, in execute_once
    |     result = self.test_runner(data, run)
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/hypothesis/core.py", line 729, in default_executor
    |     return function(data)
    |            ^^^^^^^^^^^^^^
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/hypothesis/core.py", line 996, in run
    |     return test(*args, **kwargs)
    |            ^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/_hypothesis.py", line 83, in test_openapi_stateless
    |     def test_function(*args: Any, **kwargs: Any) -> Any:
    |                ^^^^^^^^^^^^^^^^^^
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/hypothesis/core.py", line 906, in test
    |     result = self.test(*args, **kwargs)
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/_hypothesis.py", line 89, in collecting_wrapper
    |     wrapped_test = hypothesis.seed(seed)(wrapped_test)
    |         ^^^^^^^^^^^^
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/_hypothesis.py", line 86, in collecting_wrapper
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/_hypothesis.py", line 85, in test_function
    |     return test(*args, **kwargs)
    |            ^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/dxia/src/github.com/vllm-project/vllm/tests/entrypoints/openai/test_openai_schema.py", line 106, in test_openapi_stateless
    |     case.call_and_validate(verify=False, timeout=timeout)
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/models.py", line 590, in call_and_validate
    |     response = self.call(base_url, session, headers, **kwargs)
    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/models.py", line 418, in call
    |     response = self.operation.schema.transport.send(
    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/transports/__init__.py", line 202, in send
    |     raise get_timeout_error(case.operation.verbose_name, timeout)(
    | schemathesis.exceptions.CheckFailed: 
    | 
    | 1. Response timeout
    | 
    | The server failed to respond within the specified limit of 30000.00ms
    | 
    | Reproduce with: 
    | 
    |     curl -X POST -H 'Content-Type: application/json' -d '{"messages": [{"content": "K", "name": ";\u0083\ud819\udcedC\ud8e6\udfcc,\udbee\udf3e\u00f5\u00e6\u00c3\u009evU\u00e0k?\udba3\udfd6\n'"'"'\u00d0\u00df'"'"'\ud894\udf99", "role": "function", "\u00f1": [[-1.7976931348623157e+308], -15141, false], "\u00e3\ud97c\udf74x\u0084": {}, "\udace\udc0f": [{"\udbec\udc3b\u0017 ": [-1.7976931348623157e+308], "\u009ae\u00c4L\"\u0011\u009c\u00f5\u0014\u00de": [null], "": [[true], []]}, [[]]], "\u0082ti*\u00d1\u00d4\u00b6CAq": [], "\udb60\ude86\u0085(\u00b5\udb4d\udc0c\u0007\u0097\u00ec\u0082\u00e6\udbef\uddb0\u0015": [[-1.7976931348623157e+308], -15141, false], "\u00ce\u00e9K\u00b3\ud97d\udfccN\u00e3\u00caMlm\u007f": [{"\b\udb28\udf9f\u00ab\u00ea\u001emS\u0098u\udbc2\uddbc\u0086\udaeb\udd9e\u0004\ud94f\uddee\u0094\u008b2\uda24\udc2d\u0083\ud868\ude84\u0084": -2.225073858507203e-309, "\u00d3": -1.0091098906725308e+16}], "\u001e=8\ud907\udcc9\u00dd": {"e": [null], "\ud8af\udf24\u00d0\u00a7\u00c7": []}}], "": [-1.192092896e-07]}' --insecure http://localhost:57477/v1/chat/completions
    | 
    +------------------------------------
---------------------------------------------------------- Captured stdout setup ----------------------------------------------------------
INFO 05-20 12:42:15 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 05-20 12:42:15 [__init__.py:32] name=lora_filesystem_resolver, value=vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
INFO 05-20 12:42:15 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 05-20 12:42:15 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 05-20 12:42:15 [__init__.py:44] plugin lora_filesystem_resolver loaded.
INFO 05-20 12:42:27 [weight_utils.py:291] Using model weights format ['*.safetensors']
INFO 05-20 12:42:27 [weight_utils.py:341] No model.safetensors.index.json found in remote.
INFO 05-20 12:42:32 [__init__.py:248] Automatically detected platform cuda.
INFO 05-20 12:42:37 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 05-20 12:42:37 [__init__.py:32] name=lora_filesystem_resolver, value=vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
INFO 05-20 12:42:37 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 05-20 12:42:37 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 05-20 12:42:37 [__init__.py:44] plugin lora_filesystem_resolver loaded.
INFO 05-20 12:42:39 [api_server.py:1289] vLLM API server version 0.8.5.dev708+g451da4bcb
INFO 05-20 12:42:39 [cli_args.py:300] non-default args: {'port': 57477, 'task': 'generate', 'trust_remote_code': True, 'seed': 0, 'max_model_len': 2048, 'enforce_eager': True, 'limit_mm_per_prompt': {'image': 2}, 'max_num_seqs': 5}
INFO 05-20 12:42:51 [config.py:2112] Chunked prefill is enabled with max_num_batched_tokens=2048.
WARNING 05-20 12:42:51 [cuda.py:87] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
INFO 05-20 12:42:56 [__init__.py:248] Automatically detected platform cuda.
INFO 05-20 12:43:01 [core.py:427] Waiting for init message from front-end.
INFO 05-20 12:43:01 [core.py:61] Initializing a V1 LLM engine (v0.8.5.dev708+g451da4bcb) with config: model='HuggingFaceTB/SmolVLM-256M-Instruct', speculative_config=None, tokenizer='HuggingFaceTB/SmolVLM-256M-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=HuggingFaceTB/SmolVLM-256M-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"compile_sizes": [], "inductor_compile_config": {"enable_auto_functionalized_v2": false}, "cudagraph_capture_sizes": [], "max_capture_size": 0}
INFO 05-20 12:43:01 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 05-20 12:43:01 [__init__.py:32] name=lora_filesystem_resolver, value=vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
INFO 05-20 12:43:01 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 05-20 12:43:01 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 05-20 12:43:01 [__init__.py:44] plugin lora_filesystem_resolver loaded.
WARNING 05-20 12:43:01 [utils.py:2664] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7dfa41c164b0>
INFO 05-20 12:43:02 [parallel_state.py:1079] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
WARNING 05-20 12:43:05 [topk_topp_sampler.py:58] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 05-20 12:43:05 [gpu_model_runner.py:1503] Starting to load model HuggingFaceTB/SmolVLM-256M-Instruct...
INFO 05-20 12:43:05 [cuda.py:216] Using Flash Attention backend on V1 engine.
INFO 05-20 12:43:06 [weight_utils.py:291] Using model weights format ['*.safetensors']
INFO 05-20 12:43:06 [weight_utils.py:341] No model.safetensors.index.json found in remote.
INFO 05-20 12:43:06 [default_loader.py:279] Loading weights took 0.24 seconds
INFO 05-20 12:43:06 [gpu_model_runner.py:1521] Model loading took 0.4899 GiB and 0.755285 seconds
INFO 05-20 12:43:06 [gpu_model_runner.py:1823] Encoder cache will be initialized with a budget of 2048 tokens, and profiled with 2 image items of the maximum feature size.
INFO 05-20 12:43:08 [kv_cache_utils.py:637] GPU KV cache size: 1,567,632 tokens
INFO 05-20 12:43:08 [kv_cache_utils.py:640] Maximum concurrency for 2,048 tokens per request: 765.45x
INFO 05-20 12:43:08 [core.py:163] init engine (profile, create kv cache, warmup model) took 1.55 seconds
INFO 05-20 12:43:08 [loggers.py:134] vllm cache_config_info with initialization after num_gpu_blocks is: 97977
INFO 05-20 12:43:09 [api_server.py:1336] Starting vLLM API server on http://0.0.0.0:57477
INFO 05-20 12:43:09 [launcher.py:28] Available routes are:
INFO 05-20 12:43:09 [launcher.py:36] Route: /openapi.json, Methods: HEAD, GET
INFO 05-20 12:43:09 [launcher.py:36] Route: /docs, Methods: HEAD, GET
INFO 05-20 12:43:09 [launcher.py:36] Route: /docs/oauth2-redirect, Methods: HEAD, GET
INFO 05-20 12:43:09 [launcher.py:36] Route: /redoc, Methods: HEAD, GET
INFO 05-20 12:43:09 [launcher.py:36] Route: /health, Methods: GET
INFO 05-20 12:43:09 [launcher.py:36] Route: /load, Methods: GET
INFO 05-20 12:43:09 [launcher.py:36] Route: /ping, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /ping, Methods: GET
INFO 05-20 12:43:09 [launcher.py:36] Route: /tokenize, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /detokenize, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /v1/models, Methods: GET
INFO 05-20 12:43:09 [launcher.py:36] Route: /version, Methods: GET
INFO 05-20 12:43:09 [launcher.py:36] Route: /v1/chat/completions, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /v1/completions, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /v1/embeddings, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /pooling, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /classify, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /score, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /v1/score, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /v1/audio/transcriptions, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /rerank, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /v1/rerank, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /v2/rerank, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /invocations, Methods: POST
INFO 05-20 12:43:09 [launcher.py:36] Route: /metrics, Methods: GET
INFO:     127.0.0.1:53264 - "GET /health HTTP/1.1" 200 OK
INFO:     127.0.0.1:53272 - "GET /openapi.json HTTP/1.1" 200 OK
---------------------------------------------------------- Captured stderr setup ----------------------------------------------------------
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  4.58it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  4.58it/s]

INFO:     Started server process [219237]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
------------------------------------------------------------ Captured log call ------------------------------------------------------------

============================================================ warnings summary =============================================================
.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
  /home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

.venv/lib/python3.12/site-packages/schemathesis/internal/deprecation.py:6
  /home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/internal/deprecation.py:6: DeprecationWarning: Argument `method` is deprecated and will be removed in Schemathesis 4.0. Use `include` and `exclude` methods instead.
    warnings.warn(

.venv/lib/python3.12/site-packages/schemathesis/internal/deprecation.py:6
  /home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/internal/deprecation.py:6: DeprecationWarning: Argument `endpoint` is deprecated and will be removed in Schemathesis 4.0. Use `include` and `exclude` methods instead.
    warnings.warn(

.venv/lib/python3.12/site-packages/schemathesis/internal/deprecation.py:6
  /home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/internal/deprecation.py:6: DeprecationWarning: Argument `tag` is deprecated and will be removed in Schemathesis 4.0. Use `include` and `exclude` methods instead.
    warnings.warn(

.venv/lib/python3.12/site-packages/schemathesis/internal/deprecation.py:6
  /home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/internal/deprecation.py:6: DeprecationWarning: Argument `operation_id` is deprecated and will be removed in Schemathesis 4.0. Use `include` and `exclude` methods instead.
    warnings.warn(

tests/entrypoints/openai/test_openai_schema.py::test_openapi_stateless
  /home/dxia/src/github.com/vllm-project/vllm/vllm/engine/arg_utils.py:58: DeprecationWarning: Passing a JSON argument as a string containing comma separated key=value pairs is deprecated. This will be removed in v0.10.0. Please use a JSON string instead.
    return cast(T, nullable_kvs(val))

tests/entrypoints/openai/test_openai_schema.py::test_openapi_stateless
  /home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/specs/openapi/references.py:11: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    from jsonschema.exceptions import RefResolutionError

tests/entrypoints/openai/test_openai_schema.py::test_openapi_stateless
  /home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/specs/openapi/references.py:51: DeprecationWarning: jsonschema.RefResolver is deprecated as of v4.18.0, in favor of the https://github.com/python-jsonschema/referencing library, which provides more compliant referencing behavior as well as more flexible APIs for customization. A future release will remove RefResolver. Please file a feature request (on referencing) if you are missing an API for the kind of customization you need.
    class InliningResolver(jsonschema.RefResolver):

tests/entrypoints/openai/test_openai_schema.py::test_openapi_stateless
  /home/dxia/src/github.com/vllm-project/vllm/.venv/lib/python3.12/site-packages/schemathesis/specs/openapi/schemas.py:89: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    SCHEMA_PARSING_ERRORS = (KeyError, AttributeError, jsonschema.exceptions.RefResolutionError)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================= short test summary info =========================================================
(verbose_name='POST /v1/chat/completions') SUBFAIL tests/entrypoints/openai/test_openai_schema.py::test_openapi_stateless[POST /v1/chat/completions] - hypothesis.errors.FlakyFailure: Hypothesis test_openapi_stateless(case=) produces unreliable results: Falsified on the first call but ...
=========================================== 1 failed, 1 passed, 9 warnings in 203.01s (0:03:23) ===========================================

Member


It'd be nice to add a comment explaining that this specific method+path requires a longer timeout.

Contributor Author


yup, done

@davidxia
Contributor Author

davidxia commented May 20, 2025

Sorry I meant is vLLM doing things during those 10 seconds.

Oh, I'm not sure. How can I check? Even when I run this test for only that endpoint with

@schema.include(method="POST", path="/v1/chat/completions").parametrize()
...
def test_openapi_stateless(case: schemathesis.Case):

it still times out without the changes in d84a35e.

@davidxia
Contributor Author

Ready for another review. cc @hmellor

Member

@hmellor hmellor left a comment


LGTM, just a nit about adding a comment and moving some magic numbers to globals

Comment on lines 98 to 101
key = (
    case.operation.method.upper(),
    case.operation.path,
)
timeout = {
    ("POST", "/v1/chat/completions"): 60,
}.get(key, DEFAULT_TIMEOUT_SECONDS)
Member


It'd be nice to add a comment explaining that this specific method+path requires a longer timeout.

@davidxia davidxia force-pushed the patch40 branch 2 times, most recently from f838428 to cba1194 Compare May 22, 2025 12:12
@davidxia
Contributor Author

@hmellor thanks for the review and suggestions. updated!

Member

@hmellor hmellor left a comment


LGTM, thanks for doing this!

@hmellor hmellor enabled auto-merge (squash) May 22, 2025 13:09
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label May 22, 2025
@DarkLight1337
Member

Please merge from main to fix CI

davidxia and others added 3 commits May 22, 2025 10:46
by filtering out test cases to `POST /tokenize` endpoint
that are known to fail with a reply of HTTP 501 Not Implemented.

Enable the test in CI.

FIX vllm-project#18162

Signed-off-by: David Xia <david@davidxia.com>
Increase both Schemathesis and Hypothesis timeouts
(they are configured separately) to 60s to
allow the test to pass comfortably in both CI and in local testing.
We are interested in testing the OpenAPI schema here, not performance.

Signed-off-by: David Xia <david@davidxia.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: David Xia <david@davidxia.com>
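As the second commit above notes, the Hypothesis deadline and the Schemathesis request timeout are separate knobs. A minimal sketch of raising both to 60s, with an assumed schema URL and constant names (not the exact test code):

import schemathesis
from hypothesis import settings

# Assumption: illustrative schema URL and constant names.
schema = schemathesis.from_uri("http://localhost:8000/openapi.json")

HYPOTHESIS_DEADLINE_MS = 60_000   # per-example deadline enforced by Hypothesis
REQUEST_TIMEOUT_SECONDS = 60      # per-request timeout passed through to requests

@schema.parametrize()
@settings(deadline=HYPOTHESIS_DEADLINE_MS)
def test_openapi_stateless(case: schemathesis.Case):
    # Both limits are generous because the goal is to exercise the OpenAPI
    # schema, not to measure latency.
    case.call_and_validate(verify=False, timeout=REQUEST_TIMEOUT_SECONDS)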
auto-merge was automatically disabled May 22, 2025 14:47

Head branch was pushed to by a user without write access

@davidxia
Contributor Author

Please merge from main to fix CI

thanks, done. @hmellor you'll probably need to enable auto-merge again

@hmellor hmellor enabled auto-merge (squash) May 22, 2025 14:54
@hmellor hmellor merged commit 1f3a120 into vllm-project:main May 22, 2025
94 checks passed
@davidxia davidxia deleted the patch40 branch May 22, 2025 18:34
huachenheli pushed a commit to huachenheli/vllm that referenced this pull request May 22, 2025
Signed-off-by: David Xia <david@davidxia.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
Signed-off-by: David Xia <david@davidxia.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
gshtras added a commit to ROCm/vllm that referenced this pull request May 27, 2025
* Add files via upload: Add fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup (vllm-project#18337)

* [Misc] Fix typo (vllm-project#18330)

* Neuron up mistral (vllm-project#18222)

Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>

* fix CUDA_check redefinition in vllm-project#17918 (vllm-project#18287)

Signed-off-by: Lucia Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>

* [neuron] fix authorization issue (vllm-project#18364)

Signed-off-by: Liangfu Chen <liangfc@amazon.com>

* [Misc] Allow `AutoWeightsLoader` to skip loading weights with specific substr in name (vllm-project#18358)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [Core] [Bugfix]: tensor parallel with prompt embeds (vllm-project#18171)

Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>

* [release] Change dockerhub username for TPU release (vllm-project#18389)

* [Bugfix] fix adding bias twice in ipex GPTQ quantization (vllm-project#18363)

Signed-off-by: rand-fly <randfly@outlook.com>

* [doc] update env variable export (vllm-project#18391)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Misc] Add LoRA code owner (vllm-project#18387)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* Update cpu.txt (vllm-project#18398)

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

* [CI] Add mteb testing to test the accuracy of the embedding model (vllm-project#17175)

* [Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (vllm-project#18407)

Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>

* [Misc] refactor prompt embedding examples (vllm-project#18405)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Minor] Rename quantization nvfp4 to modelopt_fp4 (vllm-project#18356)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Model] use AutoWeightsLoader for bloom (vllm-project#18300)

Signed-off-by: calvin chen <120380290@qq.com>

* [Kernel] update comment for KV shape in unified triton attn (vllm-project#18099)

Signed-off-by: haochengxia <xhc_1007@163.com>

* fix:Build torch wheel inline rather than picking from nightly (vllm-project#18351)

Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>

* [TPU] Re-enable the Pallas MoE kernel (vllm-project#18025)

Signed-off-by: Michael Goin <mgoin64@gmail.com>

* [Bugfix] config.head_dim is now explicitly set to None (vllm-project#18432)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

* [Bug] Fix moe_sum signature (vllm-project#18440)

Signed-off-by: Bill Nell <bnell@redhat.com>

* Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (vllm-project#18407)" (vllm-project#18456)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix][Failing Test] Fix nixl connector test when promt size < block size (vllm-project#18429)

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>

* [Misc] MultiConnector._connectors type (vllm-project#18423)

Signed-off-by: nicklucche <nlucches@redhat.com>

* [Frontend] deprecate `--device` arg (vllm-project#18399)

Signed-off-by: Kebe <mail@kebe7jun.com>

* [V1] Fix general plugins not loaded in engine for multiproc (vllm-project#18326)

Signed-off-by: Yong Hoon Shin <yhshin@meta.com>

* [Misc] refactor disaggregated-prefill-v1 example (vllm-project#18474)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Bugfix][Failing Test] Fix test_events.py (vllm-project#18460)

Signed-off-by: rabi <ramishra@redhat.com>

* [MODEL] FalconH1 (vllm-project#18406)

Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae>
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>

* [Doc] fix arg docstring in linear layers (vllm-project#18410)

Signed-off-by: giantcroc <1204449533@qq.com>

* [Bugfix] Reduce moe_sum test size to avoid OOM (vllm-project#18484)

Signed-off-by: Bill Nell <bnell@redhat.com>

* [Build] fix Dockerfile shell (vllm-project#18402)

* [Misc] Update deprecation message for `--enable-reasoning` (vllm-project#18404)

* [ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (vllm-project#17004)

Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>

* Remove incorrect env value

* Revert "[v1] Support multiple KV cache groups in GPU model runner (vllm-project#17945) (vllm-project#18459)

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

* [FEAT][ROCm] Upgrade AITER MLA v1 backend (vllm-project#18338)

Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

* [Bugfix] Consistent ascii handling in tool parsers (vllm-project#17704)

Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>

* [FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) (vllm-project#18500)

Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae>
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>

* [MISC] update project urls in pyproject.toml (vllm-project#18519)

Signed-off-by: Andy Xie <andy.xning@gmail.com>

* [CI] Fix race condition with StatelessProcessGroup.barrier (vllm-project#18506)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* Intialize io_thread_pool attribute in the beginning. (vllm-project#18331)

Signed-off-by: rabi <ramishra@redhat.com>

* [Bugfix] Inconsistent token calculation compared to HF in llava family (vllm-project#18479)

Signed-off-by: jaycha <jaycha@ncsoft.com>

* [BugFix][DP] Send DP wave completion only from `dp_rank==0` (vllm-project#18502)

Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: kourosh hakhamaneshi <kourosh@anyscale.com>

* [Bugfix][Model] Make Olmo2Model weight loading return loaded weights (vllm-project#18504)

Signed-off-by: Shane A <shanea@allenai.org>

* [Bugfix] Fix LoRA test (vllm-project#18518)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* [Doc] Fix invalid JSON in example args (vllm-project#18527)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Neuron] Update Dockerfile.neuron to use latest neuron release (2.23) (vllm-project#18512)

Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>

* Update default neuron config for speculation (vllm-project#18274)

Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com>
Co-authored-by: Aakash Shetty <sheaak@amazon.com>

* Order sequence ids + config update to support specifying custom quantization layers (vllm-project#18279)

Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: Tailin Pan <tailinpa@amazon.com>
Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com>
Co-authored-by: Yishan McNabb <yishanm@amazon.com>
Co-authored-by: Patrick Lange <patlange@amazon.com>
Co-authored-by: Maxwell Goldberg <mgld@amazon.com>
Co-authored-by: Aakash Shetty <sheaak@amazon.com>

* [Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (vllm-project#18526)

Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible (vllm-project#18513)

Signed-off-by: Linkun <github@lkchen.net>

* [CI/Build] Update bamba test model location (vllm-project#18544)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Doc] Support --stream arg in openai_completion_client.py script (vllm-project#18388)

Signed-off-by: googs1025 <googs1025@gmail.com>

* [Bugfix] Use random hidden states in dummy sampler run (vllm-project#18543)

Signed-off-by: Bowen Wang <abmfy@icloud.com>

* [Doc] Add stream flag for chat completion example (vllm-project#18524)

Signed-off-by: calvin chen <120380290@qq.com>

* [BugFix][CPU] Fix x86 SHM distributed module initialization (vllm-project#18536)

Signed-off-by: jiang.li <jiang1.li@intel.com>

* [Misc] improve Automatic Prefix Caching example (vllm-project#18554)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Misc] Call `ndarray.tobytes()` directly instead of `ndarray.data.tobytes()` (vllm-project#18347)

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

* [Bugfix] make `test_openai_schema.py` pass (vllm-project#18224)

Signed-off-by: David Xia <david@davidxia.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Platform] Move platform check to right place (vllm-project#18470)

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

* [Compile][Platform] Make PiecewiseBackend pluggable and extendable (vllm-project#18076)

Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>

* [Build/CI] Fix CUDA 11.8 build (vllm-project#17679)

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

* [Tool] Add NIXL installation script (vllm-project#18172)

Signed-off-by: Linkun <github@lkchen.net>

* [V1][Spec Decode][Bugfix] Load quantize weights for EAGLE (vllm-project#18290)

* [Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (vllm-project#17917)

Signed-off-by: Kai Wu <kaiwu@meta.com>

* [Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (vllm-project#17926)

Signed-off-by: Sanger Steel <sangersteel@gmail.com>

* [AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (vllm-project#18568)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (vllm-project#18569)

Signed-off-by: Chenheli Hua <huachenheli@outlook.com>

* [V1][Spec Decoding] Use model_loader.get_model() to load models (vllm-project#18273)

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

* Enable hybrid attention models for Transformers backend (vllm-project#18494)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs (vllm-project#18482)

Signed-off-by: googs1025 <googs1025@gmail.com>

* [BugFix] Increase TP execute_model timeout (vllm-project#18558)

Signed-off-by: Nick Hill <nhill@redhat.com>

* [Bugfix] Set `KVTransferConfig.engine_id` in post_init (vllm-project#18576)

Signed-off-by: Linkun Chen <github@lkchen.net>

* [Spec Decode] Make EAGLE3 draft token ID mapping optional (vllm-project#18488)

Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

* [Neuron] Remove bypass on EAGLEConfig and add a test (vllm-project#18514)

Signed-off-by: Elaine Zhao <elaineyz@amazon.com>

* [Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (vllm-project#17291)

Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>

* [Misc] Replace `cuda` hard code with `current_platform` (vllm-project#16983)

Signed-off-by: shen-shanshan <467638484@qq.com>

* [Hardware] correct method signatures for HPU,ROCm,XPU (vllm-project#18551)

Signed-off-by: Andy Xie <andy.xning@gmail.com>

* [V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (vllm-project#18034)

Signed-off-by: Ronald Xu <ronaldxu@amazon.com>

* [Feature]Add async tensor parallelism using compilation pass (vllm-project#17882)

Signed-off-by: cascade812 <cascade812@outlook.com>

* [Doc] Update quickstart and install for cu128 using `--torch-backend=auto` (vllm-project#18505)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Feature][V1]: suupports cached_tokens in response usage (vllm-project#18149)

Co-authored-by: simon-mo <xmo@berkeley.edu>

* [Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (vllm-project#18430)

Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Co-authored-by: Yuqi Zhang <yuqizhang@google.com>

* Migrate docs from Sphinx to MkDocs (vllm-project#18145)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (vllm-project#18034)" (vllm-project#18600)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix][Model] Fix baichuan model loader for tp (vllm-project#18597)

Signed-off-by: Mengqing Cao <cmq0113@163.com>

* [V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled (vllm-project#17731)

Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>

* Add myself as docs code owner (vllm-project#18605)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to `requirements/cpu.txt`  (vllm-project#18542)

Signed-off-by: Kay Yan <kay.yan@daocloud.io>

* [CI] fix kv_cache_type argument (vllm-project#18594)

Signed-off-by: Andy Xie <andy.xning@gmail.com>

* [Doc] Fix indent of contributing to vllm (vllm-project#18611)

Signed-off-by: Zerohertz <ohg3417@gmail.com>

* Replace `{func}` with mkdocs style links (vllm-project#18610)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [CI/Build] Fix V1 flag being set in entrypoints tests (vllm-project#18598)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* Fix examples with code blocks in docs (vllm-project#18609)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix] Fix transformers model impl ignored for mixtral quant (vllm-project#18602)

Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>

* Include private attributes in API documentation (vllm-project#18614)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Misc] add Haystack integration (vllm-project#18601)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS (vllm-project#18579)

* [Doc] Fix markdown list indentation for MkDocs rendering (vllm-project#18620)

Signed-off-by: Zerohertz <ohg3417@gmail.com>

* [Doc] Use a different color for the announcement (vllm-project#18616)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* Refactor pplx init logic to make it modular (prepare for deepep) (vllm-project#18200)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* Fix figures in design doc (vllm-project#18612)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Docs] Change mkdocs to not use directory urls (vllm-project#18622)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [v1] Redo "Support multiple KV cache groups in GPU model runner (vllm-project#17945)" (vllm-project#18593)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* [Doc] fix list formatting (vllm-project#18624)

Signed-off-by: David Xia <david@davidxia.com>

* [Doc] Fix top-level API links/docs (vllm-project#18621)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Doc] Avoid documenting dynamic / internal modules (vllm-project#18626)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar (vllm-project#18627)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [V1] Support Deepseek MTP (vllm-project#18435)

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>

* Use prebuilt FlashInfer x86_64 PyTorch 2.7 CUDA 12.8 wheel for CI (vllm-project#18537)

Signed-off-by: Huy Do <huydhn@gmail.com>

* [CI] Enable test_initialization to run on V1 (vllm-project#16736)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Doc] Update references to doc files (vllm-project#18637)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation (vllm-project#18160)

Signed-off-by: Pavani Majety <pmajety@nvidia.com>

* [Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (vllm-project#18454)

Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>

* [Bugfix][Nixl] Fix Preemption Bug (vllm-project#18631)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* config.py: Clarify that only local GGUF checkpoints are supported. (vllm-project#18623)

Signed-off-by: Mathieu Bordere <mathieu@letmetweakit.com>

* FIX MOE issue in AutoRound format (vllm-project#18586)

Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>

* [V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (vllm-project#18424)

Signed-off-by: qizixi <qizixi@meta.com>

* [Frontend] improve vllm serve --help display (vllm-project#18643)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) (vllm-project#18647)

* [V1][Spec Decode] Support multi-layer eagle draft model (vllm-project#18030)

Signed-off-by: qizixi <qizixi@meta.com>

* [Doc] Update README links, mark external links (vllm-project#18635)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [MISC][pre-commit] Add pre-commit check for triton import (vllm-project#17716)

Signed-off-by: Mengqing Cao <cmq0113@163.com>

* [Doc] Fix indentation problems in V0 Paged Attention docs (vllm-project#18659)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Doc] Add community links (vllm-project#18657)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Model] use AutoWeightsLoader for gpt2 (vllm-project#18625)

Signed-off-by: zt2370 <ztang2370@gmail.com>

* [Doc] Reorganize user guide (vllm-project#18661)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [CI/Build] `chmod +x` to `cleanup_pr_body.sh` (vllm-project#18650)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [MISC] typo fix and clean import (vllm-project#18664)

Signed-off-by: Andy Xie <andy.xning@gmail.com>

* [BugFix] Fix import error for fused_moe (vllm-project#18642)

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

* [CI] enforce import regex instead of re (vllm-project#18665)

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

* fix(regression): clone from reference items (vllm-project#18662)

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

* [CI/Build] fix permission denied issue (vllm-project#18645)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding (vllm-project#18668)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

* [V1] Fix _pickle.PicklingError: Can't pickle <class 'transformers_modules.deepseek-ai.DeepSeek-V2-Lite... (vllm-project#18640)

Signed-off-by: Seiji Eicher <seiji@anyscale.com>

* [MISC] correct signature for LoaderFunction (vllm-project#18670)

Signed-off-by: Andy Xie <andy.xning@gmail.com>

* [Misc]Replace `cuda` hard code with `current_platform` in Ray (vllm-project#14668)

Signed-off-by: noemotiovon <757486878@qq.com>

* [Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (vllm-project#18655)

Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>

* [VLM] Initialize video input support for InternVL models (vllm-project#18499)

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

* Speed up the `kernels/quantization/` tests (vllm-project#18669)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [BUGFIX] catch subclass first for try...except (vllm-project#18672)

Signed-off-by: Andy Xie <andy.xning@gmail.com>

* [Misc] Reduce logs on startup (vllm-project#18649)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [doc] fix broken links (vllm-project#18671)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [doc] improve readability (vllm-project#18675)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Bugfix] Fix cpu usage and cache hit stats reporting on cpu environment (vllm-project#18674)

Signed-off-by: zzzyq <zhangyuqi94@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

* [CI/build] fix no regex (vllm-project#18676)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Misc] small improve (vllm-project#18680)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Bugfix] Fix profiling dummy data for Pixtral (vllm-project#18677)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Core][Multimodal] Convert PIL Image to array without data copy when hashing (vllm-project#18682)

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

* [CI/Build][Doc] Update `gte-Qwen2-1.5B-instruct` usage (vllm-project#18683)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>

* [Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example (vllm-project#18644)

Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com>
Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>

* refactor: simplify request handler, use positive condition check for handler assignment (vllm-project#18690)

Signed-off-by: googs1025 <googs1025@gmail.com>

* [Bugfix] Fix the lm_head in gpt_bigcode in lora mode (vllm-project#6357)

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

* [CI] add missing argument (vllm-project#18694)

Signed-off-by: Andy Xie <andy.xning@gmail.com>

* [GH] Add issue template for reporting CI failures (vllm-project#18696)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Doc] Fix issue template format (vllm-project#18699)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Fix Mistral-format models with sliding window (vllm-project#18693)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [CI/Build] Replace `math.isclose` with `pytest.approx` (vllm-project#18703)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [CI] fix dump_input for str type (vllm-project#18697)

Signed-off-by: Andy Xie <andy.xning@gmail.com>

* [Model] Add support for YARN in NemotronNAS models (vllm-project#18427)

Signed-off-by: Nave Assaf <nassaf@nvidia.com>

* [CI/Build] Split pooling and generation extended language models tests in CI (vllm-project#18705)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI (vllm-project#18709)

Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>

* [Misc] add AutoGen integration (vllm-project#18712)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

* [Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM (vllm-project#18701)

* [Doc] Improve API docs (vllm-project#18713)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Doc] Move examples and further reorganize user guide (vllm-project#18666)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Fix Llama GGUF initialization (vllm-project#18717)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs (vllm-project#18608)

* Convert `examples` to `ruff-format` (vllm-project#18400)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Model][Gemma3] Simplify image input validation (vllm-project#18710)

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

* [Misc] improve web section group title display (vllm-project#18684)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [V1][Quantization] Add CUDA graph compatible v1 GGUF support (vllm-project#18646)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>

* [Model][Gemma3] Cast image pixel values already on CPU (vllm-project#18732)

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

* [FEAT] [ROCm] Upgrade AITER Fused MoE kernels. (vllm-project#18271)

Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>

* [Doc] Update OOT model docs (vllm-project#18742)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Doc] Update reproducibility doc and example (vllm-project#18741)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Misc] improve docs (vllm-project#18734)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* feat(rocm-support): support mamba2 on rocm (vllm-project#18565)

Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>

* [Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh (vllm-project#18752)

Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>

* [Doc] cleanup deprecated flag for doc (vllm-project#18715)

Signed-off-by: calvin chen <120380290@qq.com>

* Minor fix about MooncakeStoreConnector (vllm-project#18721)

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* [Build] fix cpu build missing libtbbmalloc.so (vllm-project#18744)

Signed-off-by: Kebe <mail@kebe7jun.com>

* [BUG FIX] minicpm (vllm-project#18739)

Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>

* [Doc]  Convert Sphinx directives ( `{class}`, `{meth}`, `{attr}`, ...) to MkDocs format for better documentation linking (vllm-project#18663)

Signed-off-by: Zerohertz <ohg3417@gmail.com>

* [CI/Build] Remove imports of built-in `re` (vllm-project#18750)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [V1][Metrics] Add API for accessing in-memory Prometheus metrics (vllm-project#17010)

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

* Disable prefix cache by default for benchmark (vllm-project#18639)

Signed-off-by: cascade812 <cascade812@outlook.com>

* optimize get_kv_cache_torch_dtype (vllm-project#18531)

Signed-off-by: idellzheng <idellzheng@tencent.com>

* [Core] Automatically cast multi-modal input dtype (vllm-project#18756)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Mistral tool calling when content is list (vllm-project#18729)

Signed-off-by: mgoin <mgoin64@gmail.com>

---------

Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Nan2018 <nan@protopia.ai>
Signed-off-by: rand-fly <randfly@outlook.com>
Signed-off-by: reidliu41 <reid201711@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: calvin chen <120380290@qq.com>
Signed-off-by: haochengxia <xhc_1007@163.com>
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: rabi <ramishra@redhat.com>
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Signed-off-by: giantcroc <1204449533@qq.com>
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
Signed-off-by: Andy Xie <andy.xning@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: jaycha <jaycha@ncsoft.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Shane A <shanea@allenai.org>
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Signed-off-by: Linkun <github@lkchen.net>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: googs1025 <googs1025@gmail.com>
Signed-off-by: Bowen Wang <abmfy@icloud.com>
Signed-off-by: jiang.li <jiang1.li@intel.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: David Xia <david@davidxia.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: Kai Wu <kaiwu@meta.com>
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Linkun Chen <github@lkchen.net>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Ronald Xu <ronaldxu@amazon.com>
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Mathieu Bordere <mathieu@letmetweakit.com>
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
Signed-off-by: qizixi <qizixi@meta.com>
Signed-off-by: zt2370 <ztang2370@gmail.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: noemotiovon <757486878@qq.com>
Signed-off-by: zzzyq <zhangyuqi94@gmail.com>
Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com>
Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
Signed-off-by: idellzheng <idellzheng@tencent.com>
Co-authored-by: sunyicode0012 <116338547+sunyicode0012@users.noreply.github.com>
Co-authored-by: Gong Shufan <2624542821@qq.com>
Co-authored-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nan Qin <nan@protopia.ai>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Random Fly <renfei8@live.cn>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: 燃 <wulipc@163.com>
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Calvin Chen <45745657+calvin0327@users.noreply.github.com>
Co-authored-by: Percy <xhc_1007@163.com>
Co-authored-by: Dilip Gowda Bhagavan <110233170+dilipgb@users.noreply.github.com>
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: wwl2755 <wangwenlong2755@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Kebe <mail@kebe7jun.com>
Co-authored-by: Yong Hoon Shin <48474650+sarckk@users.noreply.github.com>
Co-authored-by: Rabi Mishra <ramishra@redhat.com>
Co-authored-by: Dhia Eddine Rhaiem <163106757+dhiaEddineRhaiem@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae>
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>
Co-authored-by: GiantCroc <1204449533@qq.com>
Co-authored-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>
Co-authored-by: Hosang <156028780+hyoon1@users.noreply.github.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Sebastian Schoennenbeck <sebastian.schoennenbeck@comma-soft.com>
Co-authored-by: Ning Xie <andy.xning@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: youngrok cha <line0930@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: kourosh hakhamaneshi <kourosh@anyscale.com>
Co-authored-by: Shane A <shanea@allenai.org>
Co-authored-by: aws-elaineyz <elaineyz@amazon.com>
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com>
Co-authored-by: Aakash Shetty <sheaak@amazon.com>
Co-authored-by: Tailin Pan <tailinpa@amazon.com>
Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com>
Co-authored-by: Yishan McNabb <yishanm@amazon.com>
Co-authored-by: Patrick Lange <patlange@amazon.com>
Co-authored-by: Maxwell Goldberg <mgld@amazon.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: lkchen <github@lkchen.net>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: CYJiang <86391540+googs1025@users.noreply.github.com>
Co-authored-by: Bowen Wang <abmfy@icloud.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: David Xia <david@davidxia.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Kai Wu <kaiwu@meta.com>
Co-authored-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Chenheli Hua <huachenheli@outlook.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Teruaki Ishizaki <tell.ishi@gmail.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: RonaldBXu <72748153+RonaldBXu@users.noreply.github.com>
Co-authored-by: cascade <cascade812@outlook.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Yuqi Zhang <zhangyuqi94@gmail.com>
Co-authored-by: Yuqi Zhang <yuqizhang@google.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Kay Yan <kay.yan@daocloud.io>
Co-authored-by: Tristan Leclercq <49700633+tristanleclercq@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Jiayi Yao <82156730+YaoJiayi@users.noreply.github.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
Co-authored-by: Feng XiaoLong <79261065+Crucifixion-Fxl@users.noreply.github.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Mathieu Borderé <mathieu@bordere.org>
Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com>
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com>
Co-authored-by: Yuanhao WU <Nalkey@users.noreply.github.com>
Co-authored-by: ztang2370 <ztang2370@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Co-authored-by: Chenguang Li <757486878@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: AlexZhao <zhaohaidao2008@hotmail.com>
Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
Co-authored-by: Maximilien de Bayser <mbayser@br.ibm.com>
Co-authored-by: Naveassaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Łukasz Durejko <lukasz.durejko@intel.com>
Co-authored-by: dylan <xuhao296@qq.com>
Co-authored-by: almersawi <43927639+almersawi@users.noreply.github.com>
Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
Co-authored-by: Łukasz Durejko <ldurejko@habana.ai>
Co-authored-by: maobaolong <baoloongmao@tencent.com>
Co-authored-by: Shawn Huang <57223022+huangyuxiang03@users.noreply.github.com>
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>
Co-authored-by: chunxiaozheng <55471457+chunxiaozheng@users.noreply.github.com>