
Usage Stats Collection #2852

Merged: 62 commits, Mar 29, 2024

Conversation

yhu422
Contributor

@yhu422 yhu422 commented Feb 13, 2024

Issue: #1376

Added usage_lib.py: collects information such as GPU type, vLLM version, and cloud provider on engine initialization. The info is written to a local file and sent to a server. Collection can be disabled by setting an environment variable or a config file.

Added an argument to from_engine_args(...) to pass in the entry point.

TODO: Deploy a vector.dev HTTP server to receive usage info.
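The opt-out path described above could be sketched roughly as follows. The variable names (`VLLM_NO_USAGE_STATS`, `DO_NOT_TRACK`) and the opt-out file location are illustrative assumptions, not necessarily what the merged code uses:

```python
import os
from pathlib import Path

# Illustrative names; the actual env vars and path are an assumption here.
_NO_USAGE_ENV_VARS = ("VLLM_NO_USAGE_STATS", "DO_NOT_TRACK")
_DO_NOT_TRACK_FILE = Path.home() / ".config" / "vllm" / "do_not_track"


def usage_stats_enabled() -> bool:
    """Collection is on unless the user opted out via env var or marker file."""
    if any(os.environ.get(v) == "1" for v in _NO_USAGE_ENV_VARS):
        return False
    return not _DO_NOT_TRACK_FILE.exists()
```

Checking both an environment variable and a marker file matches the description that collection "can be disabled through setting environment variables or config file".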

yhu422 and others added 26 commits February 8, 2024 11:44
[ROCm] Fix build problem resulted from previous commit related to FP8 kv-cache support  (vllm-project#2790)

Add documentation on how to do incremental builds (vllm-project#2796)

[Ray] Integration compiled DAG off by default (vllm-project#2471)

Disable custom all reduce by default (vllm-project#2808)

add usage context

removed usage_context from Engine_args

Move IO to another process

added http request

[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (vllm-project#2768)

Add documentation section about LoRA (vllm-project#2834)

Refactor 2 awq gemm kernels into m16nXk32 (vllm-project#2723)

Co-authored-by: Chunan Zeng <chunanzeng@Chunans-Air.attlocal.net>

Added additional arg for from_engine_args

comments
Comment on lines 14 to 16
_USAGE_STATS_FILE = os.path.join(
    os.path.dirname(os.path.abspath(__file__)),
    'usage_stats.json')  # File path to store usage data locally
Collaborator
Since the package can be installed in many different ways and places, it might make more sense to have one clear, documented path, such as in the user's home directory. Maybe ~/.config/vllm/usage_stats.json or ~/.cache/vllm/usage_stats.json?

Collaborator
Ah yes! @yhu422 please do ~/.config/vllm/usage_stats.json and respect XDG_CONFIG_HOME env var.
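Respecting XDG_CONFIG_HOME as suggested could look like this (a minimal sketch, not the merged implementation):

```python
import os
from pathlib import Path


def usage_stats_path() -> Path:
    """Resolve the stats file under XDG_CONFIG_HOME, defaulting to ~/.config."""
    config_home = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(config_home) / "vllm" / "usage_stats.json"
```

Per the XDG Base Directory spec, an unset or empty XDG_CONFIG_HOME falls back to ~/.config, which the `or` handles here.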

vllm/usage/usage_lib.py:
self.vllm_version = pkg_resources.get_distribution("vllm").version
self.model = model
self.log_time = _get_current_timestamp_ns()
self.num_cpu = os.cpu_count()
Collaborator

It would be good to get the type of CPU as well, such as its product name, so you can be aware of which ISA extensions are available for performance.
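One common way to get the CPU product name on Linux is to parse /proc/cpuinfo; a best-effort sketch (illustrative only, not the merged code):

```python
import platform


def get_cpu_model() -> str:
    """Best-effort CPU product name, e.g. 'Intel(R) Xeon(R) CPU @ 2.20GHz'."""
    try:
        with open("/proc/cpuinfo") as f:  # Linux-only source of the model name
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass  # non-Linux platforms, or /proc not mounted
    return platform.processor() or "unknown"
```

The fallback `platform.processor()` can return an empty string on some systems, hence the final `"unknown"` default.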

Collaborator
+1!

@simon-mo
Collaborator

simon-mo commented Mar 17, 2024

I made a pass. The current output is as follows:

{
  "provider": "GCP",
  "num_cpu": 24,
  "cpu_type": "Intel(R) Xeon(R) CPU @ 2.20GHz",
  "cpu_family_model_stepping": "6,85,7",
  "total_memory": 101261135872,
  "architecture": "x86_64",
  "platform": "Linux-5.10.0-28-cloud-amd64-x86_64-with-glibc2.31",
  "gpu_count": 2,
  "gpu_type": "NVIDIA L4",
  "gpu_memory_per_device": 23580639232,
  "model_architecture": "LlamaForCausalLM",
  "vllm_version": "0.3.3+cu123",
  "context": "OPENAI_API_SERVER",
  "log_time": 1710635056116821000,
  "source": "production",
  "dtype": "torch.float16",
  "tensor_parallel_size": 2,
  "block_size": 16,
  "gpu_memory_utilization": 0.9,
  "quantization": null,
  "kv_cache_dtype": "auto",
  "enable_lora": false,
  "enable_prefix_caching": false,
  "enforce_eager": false,
  "disable_custom_all_reduce": true
}

There are three remaining items before merge (to be completed by @simon-mo)

  • Correlation ID for features that aren't available at instantiation time. The idea is to generate a UUID upon the first message, then use the same ID to perform runtime logging if needed (also useful for measuring the runtime of the system).
  • Documentation about the data being sent over the network.
  • On the server side, figure out a way to publish the data.
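The correlation-ID idea in the first item could be sketched like this (a hypothetical helper, not the merged implementation):

```python
import uuid
from typing import Optional

_session_id: Optional[str] = None


def get_usage_session_id() -> str:
    """Lazily create one UUID per process on the first message, so later
    runtime logs can be correlated with the initial usage report."""
    global _session_id
    if _session_id is None:
        _session_id = str(uuid.uuid4())
    return _session_id
```

Generating the ID lazily, rather than at import, matches the "generate uuid upon the first message" description.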

@simon-mo simon-mo added the release-blocker This PR/issue blocks the next release, therefore deserves highest priority label Mar 27, 2024
@simon-mo simon-mo merged commit d8658c8 into vllm-project:main Mar 29, 2024
34 checks passed
xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 31, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Labels: release-blocker
5 participants