
Add vLLM version info to logs and openai API server #3161

Merged: 1 commit into vllm-project:main on Mar 3, 2024

Conversation

jasonacox
Contributor

This PR makes the running vLLM version accessible for operational maintainability via the log file and the API.

  • Logging - openai/api_server.py will log a header "vLLM API server version X.X.X" and llm_engine.py will log "Initializing an LLM engine (vX.X.X)". This helps with forensics and startup validation.
  • Version URL route - This PR also adds /version as a route in the API server, returning the vLLM version as JSON: {"version": vllm.__version__} (see the sketch after this list).
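
For illustration, here is a minimal sketch of what this can look like on a FastAPI-based server. It is not the exact code added by this PR; the app object, handler name, and startup block are illustrative, and it only assumes that vllm.__version__ exposes the installed package version.

# Minimal sketch (not the exact vLLM change): log the running vLLM version at
# startup and expose it through a /version route on a FastAPI app.
import logging

from fastapi import FastAPI
from fastapi.responses import JSONResponse

import vllm  # assumes vLLM is installed; vllm.__version__ holds the package version

logger = logging.getLogger(__name__)
app = FastAPI()


@app.get("/version")
async def show_version():
    # Return the running vLLM version as JSON, e.g. {"version": "0.3.3"}
    return JSONResponse(content={"version": vllm.__version__})


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logger.info("vLLM API server version %s", vllm.__version__)
    # uvicorn.run(app, host="0.0.0.0", port=8000) would then serve the app.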

So why add this? I run multiple OpenAI-compatible API server instances and need a hook to validate version consistency across them. I figure others would find this useful as well.

Example Logs

INFO 03-02 23:59:21 api_server.py:240] vLLM API server version 0.3.3
INFO 03-02 23:59:21 api_server.py:241] args: Namespace(host='0.0.0.0', port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name='mistralai/Mistral-7B-Instruct-v0.1', lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, root_path=None, middleware=[], model='mistralai/Mistral-7B-Instruct-v0.1', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', max_model_len=20000, worker_use_ray=True, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, block_size=16, enable_prefix_caching=False, seed=0, swap_space=4, gpu_memory_utilization=0.85, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', engine_use_ray=False, disable_log_requests=False, max_log_len=None)
2024-03-02 23:59:23,441	INFO worker.py:1724 -- Started a local Ray instance.
INFO 03-02 23:59:24 llm_engine.py:88] Initializing an LLM engine (v0.3.3) with config: model='mistralai/Mistral-7B-Instruct-v0.1', tokenizer='mistralai/Mistral-7B-Instruct-v0.1', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=20000, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)

Example API - http://localhost:8000/version

{
"version": "0.3.3"
}
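
As a usage example, and purely hypothetical (not part of this PR or of vLLM), a small helper like the following could poll several server instances via the new /version route and confirm they all report the same version, which is the consistency check described above:

# Hypothetical helper (not part of vLLM): query several API server instances
# via the /version route and confirm they all report the same vLLM version.
import requests

def check_version_consistency(base_urls):
    versions = {
        url: requests.get(f"{url}/version", timeout=5).json()["version"]
        for url in base_urls
    }
    if len(set(versions.values())) > 1:
        raise RuntimeError(f"vLLM version mismatch across instances: {versions}")
    return versions

# Example: check_version_consistency(["http://host-a:8000", "http://host-b:8000"])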

Add vLLM version to startup log

Fix vllm

Last try

Import version from parent

Import update

Update import order

Add version endpoint and in server start

Revert "Add vLLM version to startup log"

This reverts commit 7318debcba6376975893e607ec1b1c05da85ef90.

Revert "Revert "Add vLLM version to startup log""

This reverts commit 9e6befe49365de154137b048c5030d79b1c83ce4.
@simon-mo (Collaborator) left a comment


Looks great!

@simon-mo merged commit d65fac2 into vllm-project:main on Mar 3, 2024
22 checks passed
dtransposed pushed a commit to afeldman-nm/vllm that referenced this pull request on Mar 26, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request on Sep 6, 2024