Usage Stats Collection #2852
Conversation
Commits:
- [ROCm] Fix build problem resulted from previous commit related to FP8 kv-cache support (vllm-project#2790)
- Add documentation on how to do incremental builds (vllm-project#2796)
- [Ray] Integration compiled DAG off by default (vllm-project#2471)
- Disable custom all reduce by default (vllm-project#2808)
- add usage context
- removed usage_context from Engine_args
- Move IO to another process
- added http request
- [ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (vllm-project#2768)
- Add documentation section about LoRA (vllm-project#2834)
- Refactor 2 awq gemm kernels into m16nXk32 (vllm-project#2723) (Co-authored-by: Chunan Zeng <chunanzeng@Chunans-Air.attlocal.net>)
- Added additional arg for from_engine_args
- comments
vllm/usage/usage_lib.py (Outdated)

```python
_USAGE_STATS_FILE = os.path.join(
    os.path.dirname(os.path.abspath(__file__)),
    'usage_stats.json')  # File path to store usage data locally
```
Since the package can be installed in many different ways/places, it might make more sense to have one clear, documented path, such as in the user's directory. Maybe ~/.config/vllm/usage_stats.json or ~/.cache/vllm/usage_stats.json?
Ah yes! @yhu422 please do ~/.config/vllm/usage_stats.json and respect the XDG_CONFIG_HOME env var.
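A minimal sketch of the requested path resolution, per the XDG Base Directory spec (the helper name is hypothetical, not from this PR):

```python
import os

def _usage_stats_path() -> str:
    # Fall back to ~/.config when XDG_CONFIG_HOME is unset or empty,
    # as the XDG Base Directory spec prescribes.
    config_home = (os.environ.get("XDG_CONFIG_HOME")
                   or os.path.expanduser("~/.config"))
    return os.path.join(config_home, "vllm", "usage_stats.json")
```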
vllm/usage/usage_lib.py (Outdated)

```python
self.vllm_version = pkg_resources.get_distribution("vllm").version
self.model = model
self.log_time = _get_current_timestamp_ns()
self.num_cpu = os.cpu_count()
```
It would be good to get the type of CPU as well, such as its product name, so you can be aware of which ISA extensions are available for performance.
+1!
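A best-effort way to pull the CPU product name (a hedged sketch; the helper is hypothetical and not from this PR):

```python
import platform

def _cpu_product_name() -> str:
    # On Linux, /proc/cpuinfo carries the marketing name
    # (e.g. "Intel(R) Xeon(R) CPU @ 2.20GHz"); elsewhere, fall back
    # to platform.processor().
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return platform.processor() or "unknown"
```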
I made a pass. The current output is as follows:

```json
{
  "provider": "GCP",
  "num_cpu": 24,
  "cpu_type": "Intel(R) Xeon(R) CPU @ 2.20GHz",
  "cpu_family_model_stepping": "6,85,7",
  "total_memory": 101261135872,
  "architecture": "x86_64",
  "platform": "Linux-5.10.0-28-cloud-amd64-x86_64-with-glibc2.31",
  "gpu_count": 2,
  "gpu_type": "NVIDIA L4",
  "gpu_memory_per_device": 23580639232,
  "model_architecture": "LlamaForCausalLM",
  "vllm_version": "0.3.3+cu123",
  "context": "OPENAI_API_SERVER",
  "log_time": 1710635056116821000,
  "source": "production",
  "dtype": "torch.float16",
  "tensor_parallel_size": 2,
  "block_size": 16,
  "gpu_memory_utilization": 0.9,
  "quantization": null,
  "kv_cache_dtype": "auto",
  "enable_lora": false,
  "enable_prefix_caching": false,
  "enforce_eager": false,
  "disable_custom_all_reduce": true
}
```

There are three remaining items before merge (to be completed by @simon-mo).
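Most of the fields above are obtainable with standard APIs; a minimal sketch of collecting a few of them (not the PR's exact code; it assumes torch and psutil are available):

```python
import platform

import psutil
import torch

def _hardware_snapshot() -> dict:
    # Mirrors a handful of the fields shown in the output above.
    snapshot = {
        "num_cpu": psutil.cpu_count(),
        "total_memory": psutil.virtual_memory().total,
        "architecture": platform.machine(),
        "platform": platform.platform(),
    }
    if torch.cuda.is_available():
        snapshot.update({
            "gpu_count": torch.cuda.device_count(),
            "gpu_type": torch.cuda.get_device_name(0),
            "gpu_memory_per_device":
                torch.cuda.get_device_properties(0).total_memory,
        })
    return snapshot
```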
Issue: #1376
Added usage_lib.py: collects information such as GPU type, vLLM version, and cloud provider upon engine initialization. The info is written to a local file and sent to a server. Collection can be disabled through environment variables or a config file.
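One way the opt-out could look: a check that short-circuits collection when an opt-out variable is set (the variable names here are assumptions, not necessarily what the PR uses):

```python
import os

def _usage_stats_enabled() -> bool:
    # Treat any truthy value in either variable as "do not collect".
    # VLLM_NO_USAGE_STATS and DO_NOT_TRACK are assumed names.
    for var in ("VLLM_NO_USAGE_STATS", "DO_NOT_TRACK"):
        if os.environ.get(var, "0") not in ("", "0", "false", "False"):
            return False
    return True
```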
Added arguments to from_engine_args(...) to pass in the entry point.
TODO: Deploy a vector.dev HTTP server to receive usage info.
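Until that server exists, the client side could be as simple as a fire-and-forget POST; a hedged sketch (the endpoint URL is a placeholder, and errors are deliberately swallowed so telemetry can never break the engine):

```python
import json
import urllib.request

_USAGE_STATS_URL = "https://stats.example.invalid/usage"  # placeholder

def _report_usage(payload: dict) -> None:
    # Best-effort POST of the usage record as JSON; failures are ignored.
    try:
        req = urllib.request.Request(
            _USAGE_STATS_URL,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=5)
    except Exception:
        pass
```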