enhance auto-round eval with vllm backend #815
Conversation
Pull Request Overview
This PR enhances the auto-round evaluation functionality by adding support for the vLLM backend for model evaluation. The change allows users to leverage vLLM's optimized inference capabilities when running evaluations on CUDA and HPU devices (a routing sketch follows the change list below).
- Added comprehensive vLLM argument support for model evaluation
- Integrated vLLM backend option into the CLI evaluation workflow
- Added test coverage for the new vLLM evaluation functionality
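For context, a minimal sketch of how the flag and routing described above might look. Only the `--vllm` flag and the `eval_with_vllm` name are taken from the PR description; the other argument names and the function signatures are assumptions, not the actual implementation.

```python
# Sketch only: a CLI flag plus routing to the vLLM evaluation path.
# The real code lives in the CLI entry point and auto_round/script/llm.py;
# helper names and signatures here are illustrative.
import argparse


def build_eval_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser("auto-round eval (sketch)")
    parser.add_argument("--model", type=str, required=True)
    parser.add_argument("--eval", action="store_true", help="run evaluation only")
    parser.add_argument("--vllm", action="store_true", help="evaluate with the vLLM backend")
    parser.add_argument("--tasks", type=str, default="lambada_openai")
    parser.add_argument("--limit", type=int, default=None)
    return parser


def run_eval(args: argparse.Namespace) -> None:
    if args.vllm:
        # Added in this PR; exact signature assumed.
        from auto_round.script.llm import eval_with_vllm

        eval_with_vllm(args)
    else:
        # Existing HF-backend evaluation path (name assumed).
        from auto_round.script.llm import eval as hf_eval

        hf_eval(args)
```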
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| auto_round/script/llm.py | Added vLLM-specific arguments and eval_with_vllm function for vLLM backend evaluation |
| auto_round/main.py | Added vLLM backend routing in the evaluation entry point |
| test/test_cuda/test_vllm.py | Added integration test for vLLM evaluation functionality |
| docs/step_by_step.md | Added documentation note about using --vllm flag |
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Force-pushed from 53633f0 to 04fdb94
How about supporting quantize & eval with vLLM mode at the same time? For example:
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Hi @WeiweiZhang1, loading a model object for evaluation is not supported; only a string is allowed. Please refer to the lm_eval requirement here.
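As background, a minimal sketch of the lm_eval call this constraint comes from: with the vLLM backend, the model is identified by a string inside `model_args`, so a pre-loaded model object (e.g. a GGUF wrapper) cannot be passed. The specific argument values below are illustrative only.

```python
# Sketch: lm_eval resolves the vLLM model from a string, not from a model object.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=facebook/opt-125m,dtype=auto,gpu_memory_utilization=0.8",
    tasks=["lambada_openai"],
    limit=10,
)
print(results["results"])
```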
It's a little difficult to infer the meaning of --vllm; I'd suggest adding an arg lm_eval_backend="hf"/"vllm"/"sglang" (TODO).
I agree; we can update it like that when more backends are added.
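A hypothetical sketch of what that could look like once more backends are added; `lm_eval_backend` and the `sglang` choice are a suggestion from this thread, not part of the PR.

```python
# Hypothetical follow-up: replace the boolean --vllm flag with a backend selector.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--lm_eval_backend",
    type=str,
    default="hf",
    choices=["hf", "vllm", "sglang"],  # sglang is a TODO, not yet implemented
    help="backend used by lm_eval for model evaluation",
)
```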
* support lm_eval vllm backend
---------
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
It was tested on HPU and CUDA. GGUF is not supported since vLLM doesn't allow passing a model object; only a model string is allowed.
Command:
VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_SKIP_WARMUP=true auto-round --model facebook/opt-125m --eval --vllm --tasks lambada_openai --limit 10