
Add vLLM version info to logs and openai API server #3161

Merged: 1 commit into vllm-project:main on Mar 3, 2024

Conversation

jasonacox
Contributor

This PR makes the running vLLM version accessible for operational maintainability via the log file and the API.

  • Logging - openai/api_server.py will log a header "vLLM API server version X.X.X" and llm_engine.py will log "Initializing an LLM engine (vX.X.X)". This helps with forensics and startup validation.
  • Version URL route - This PR also adds /version as a route in the API server, returning the vLLM version as JSON: {"version": vllm.__version__} (see the sketch after this list).
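
For illustration, here is a minimal sketch of what this can look like on a FastAPI-based server. It is not the exact code added by this PR; the app object, handler name, and startup block are illustrative, and it only assumes that vllm.__version__ exposes the installed package version.

# Minimal sketch (not the exact vLLM change): log the running vLLM version at
# startup and expose it through a /version route on a FastAPI app.
import logging

from fastapi import FastAPI
from fastapi.responses import JSONResponse

import vllm  # assumes vLLM is installed; vllm.__version__ holds the package version

logger = logging.getLogger(__name__)
app = FastAPI()


@app.get("/version")
async def show_version():
    # Return the running vLLM version as JSON, e.g. {"version": "0.3.3"}
    return JSONResponse(content={"version": vllm.__version__})


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logger.info("vLLM API server version %s", vllm.__version__)
    # uvicorn.run(app, host="0.0.0.0", port=8000) would then serve the app.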

So why add this? I run multiple OpenAI-compatible API server instances and need a hook to validate version consistency across them. I figure others would find this useful as well.

Example Logs

INFO 03-02 23:59:21 api_server.py:240] vLLM API server version 0.3.3
INFO 03-02 23:59:21 api_server.py:241] args: Namespace(host='0.0.0.0', port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name='mistralai/Mistral-7B-Instruct-v0.1', lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, root_path=None, middleware=[], model='mistralai/Mistral-7B-Instruct-v0.1', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', max_model_len=20000, worker_use_ray=True, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, block_size=16, enable_prefix_caching=False, seed=0, swap_space=4, gpu_memory_utilization=0.85, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', engine_use_ray=False, disable_log_requests=False, max_log_len=None)
2024-03-02 23:59:23,441	INFO worker.py:1724 -- Started a local Ray instance.
INFO 03-02 23:59:24 llm_engine.py:88] Initializing an LLM engine (v0.3.3) with config: model='mistralai/Mistral-7B-Instruct-v0.1', tokenizer='mistralai/Mistral-7B-Instruct-v0.1', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=20000, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)

Example API - http://localhost:8000/version

{
"version": "0.3.3"
}
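
As a usage example, and purely hypothetical (not part of this PR or of vLLM), a small helper like the following could poll several server instances via the new /version route and confirm they all report the same version, which is the consistency check described above:

# Hypothetical helper (not part of vLLM): query several API server instances
# via the /version route and confirm they all report the same vLLM version.
import requests

def check_version_consistency(base_urls):
    versions = {
        url: requests.get(f"{url}/version", timeout=5).json()["version"]
        for url in base_urls
    }
    if len(set(versions.values())) > 1:
        raise RuntimeError(f"vLLM version mismatch across instances: {versions}")
    return versions

# Example: check_version_consistency(["http://host-a:8000", "http://host-b:8000"])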

Add vLLM version to startup log

Fix vllm

Last try

Import version from parent

Import update

Update import order

Add version endpoint and in server start

Revert "Add vLLM version to startup log"

This reverts commit 7318debcba6376975893e607ec1b1c05da85ef90.

Revert "Revert "Add vLLM version to startup log""

This reverts commit 9e6befe49365de154137b048c5030d79b1c83ce4.
@simon-mo (Collaborator) left a comment


Looks great!

@simon-mo merged commit d65fac2 into vllm-project:main on Mar 3, 2024
22 checks passed
dtransposed pushed a commit to afeldman-nm/vllm that referenced this pull request on Mar 26, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request on Sep 6, 2024