This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Add githash to nm-vllm #299

Merged
merged 14 commits into from
Jun 19, 2024

Conversation

dhuangnm
Member

@dhuangnm dhuangnm commented Jun 11, 2024

Add git hash information to nm-vllm:

>>> import vllm
>>> vllm.githash()
'106796861914146372aba9386aeff9361edfb34d'
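
For readers wondering where such a value can come from: the thread does not show how this PR captures the hash, so the following is only a minimal sketch of one common approach, reading `HEAD` with `git rev-parse` at build time. The helper names `read_git_hash` and `is_full_sha` are hypothetical and not part of nm-vllm.

```python
import re
import subprocess

# A full SHA-1 commit hash is 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_full_sha(value):
    """Return True if `value` looks like a full 40-character SHA-1 commit hash."""
    return isinstance(value, str) and _FULL_SHA.fullmatch(value) is not None


def read_git_hash(repo_dir="."):
    """Hypothetical build-time helper: return HEAD's commit hash, or None.

    None is returned when git is not installed, `repo_dir` is missing or is
    not a git checkout, or the output is not a full SHA-1 hash.
    """
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            stderr=subprocess.DEVNULL,
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    value = out.strip()
    return value if is_full_sha(value) else None
```

A build script could write the returned value into the package so that it stays available after installation, when the `.git` directory is no longer present.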

@robertgshaw2-neuralmagic
Collaborator

We might want to add this info to collect_env.py

@dhuangnm
Member Author

dhuangnm commented Jun 11, 2024

We might want to add this info to collect_env.py

Good idea, thanks for the suggestion. I added the githash; however, I found that I cannot run the script from the repo root directory. When I do, I hit the following error:

$ python vllm/collect_env.py 
Collecting environment information...
WARNING 06-11 19:28:07 _custom_ops.py:11] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
Traceback (most recent call last):
  File "/home/dhuang/vllm/collect_env.py", line 735, in <module>
    main()
  File "/home/dhuang/vllm/collect_env.py", line 714, in main
    output = get_pretty_env_info()
  File "/home/dhuang/vllm/collect_env.py", line 709, in get_pretty_env_info
    return pretty_str(get_env_info())
  File "/home/dhuang/vllm/collect_env.py", line 545, in get_env_info
    vllm_git_hash=get_vllm_git_hash(),
  File "/home/dhuang/vllm/collect_env.py", line 144, in get_vllm_git_hash
    return vllm.githash()
AttributeError: module 'vllm' has no attribute 'githash'

I think this is due to the vllm/ folder in the repo, which gets imported instead of the installed package. If I move collect_env.py to a different location and run it from there, it runs fine once the nm-vllm wheel is installed:

$ python collect_env.py 
Collecting environment information...
...
vllm git hash: 106796861914146372aba9386aeff9361edfb34d
Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.19.0-1010-nvidia-lowlatency-x86_64-with-glibc2.35

Solved this issue by making the installed vllm take precedence over the local vllm/ directory.
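
The exact change is not shown in this thread. As a rough sketch of the general technique only, assuming the shadowing comes from the script's own directory sitting first on `sys.path`, that directory can be moved to the end of the search path before `import vllm` runs. The helper names `deprioritize_dir` and `prefer_installed_packages` are hypothetical.

```python
import sys
from pathlib import Path


def deprioritize_dir(directory, path):
    """Return a copy of `path` with entries pointing at `directory` moved to the end."""
    target = Path(directory).resolve()
    kept, moved = [], []
    for entry in path:
        # An empty sys.path entry stands for the current working directory.
        resolved = Path(entry or ".").resolve()
        (moved if resolved == target else kept).append(entry)
    return kept + moved


def prefer_installed_packages(script_file):
    """Hypothetical helper: search installed packages before the script's directory.

    Call this with __file__ at the top of the script, before importing a
    package that also exists as a source folder next to the script.
    """
    sys.path[:] = deprioritize_dir(Path(script_file).parent, sys.path)
```

Moving the entry instead of deleting it keeps the local folder importable as a fallback when no wheel is installed.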

@dhuangnm dhuangnm changed the title [WIP] add githash to nm-vllm Add githash to nm-vllm Jun 11, 2024
@dhuangnm
Member Author

I'm filing this PR as a reference for nm-vllm. Since we're planning to make this change upstream, I'll file another PR against upstream so people can compare the code between the two PRs. There are some slight differences between them.

@bnellnm
Member

bnellnm commented Jun 11, 2024

This looks good to me but the pybind stuff has just been switched over to torch library so that will probably require a few changes/merges.

@dbarbuzzi

dbarbuzzi commented Jun 11, 2024

I found that I cannot run the script from the repo root directory, otherwise I hit following error:

@dhuangnm This is probably fine as the end-user typically would not have the vllm source or run it from that directory. Instead, their support process has you download just that file (collect_env.py) and run it within your environment, which as an end-user would be some environment that has vllm likely installed from PyPI.

They're actually likely to run it from the source directory, since this Python script comes from the repo and they may just check out the repo first and then run it from there. Anyway, the hack allows it to work whether or not it's called from the repo dir.
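
A complementary safeguard, separate from the hack described here, is to have the script tolerate a `vllm` module that lacks `githash` (as in the `AttributeError` above) instead of crashing. This is a minimal sketch: `safe_git_hash` and the `"N/A"` placeholder are assumptions made for illustration, not code from this PR.

```python
from types import SimpleNamespace


def safe_git_hash(module):
    """Return module.githash() when it is available, otherwise the string 'N/A'."""
    getter = getattr(module, "githash", None)
    if not callable(getter):
        # For example, a source checkout that was imported without build metadata.
        return "N/A"
    try:
        return getter()
    except Exception:
        # Reporting tools should keep going even if one field cannot be read.
        return "N/A"


# Stand-ins for an imported module with and without the attribute.
with_hash = SimpleNamespace(githash=lambda: "106796861914146372aba9386aeff9361edfb34d")
without_hash = SimpleNamespace()
print(safe_git_hash(with_hash))
print(safe_git_hash(without_hash))
```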

@dhuangnm
Member Author

This looks good to me but the pybind stuff has just been switched over to torch library so that will probably require a few changes/merges.

Yes, I'll make those changes in the PR against upstream. I'll post that PR shortly.

@dhuangnm
Member Author

Can I get an approval? I reran the failed 38 job and it passes now.

@dhuangnm
Member Author

Thanks, Bill. It looks like I need another approval?

Member

@andy-neuma andy-neuma left a comment

thanks

@dhuangnm
Member Author

It looks like there are two failures due to OOM across the Python versions:

FAILED tests/models_core/test_magic_wand.py::test_magic_wand[5-32-half-model_format_extrablocks0] - torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 896.00 MiB. GPU
FAILED tests/models_core/test_magic_wand.py::test_magic_wand[5-32-half-model_format_extrablocks1] - torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 252.00 MiB. GPU

These are also failing in the nightly, so they don't seem to be caused by this PR.

@dhuangnm dhuangnm merged commit d8da97b into main Jun 19, 2024
33 of 37 checks passed
@dhuangnm dhuangnm deleted the githash branch June 19, 2024 13:13
derekk-nm pushed a commit that referenced this pull request Jun 24, 2024
Add git hash information to nm-vllm:

```
>>> import vllm
>>> vllm.githash()
'106796861914146372aba9386aeff9361edfb34d'
```

---------

Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>