
Conversation

@xin3he xin3he commented Sep 11, 2025

This was tested on HPU and CUDA. GGUF is not supported, since vLLM doesn't allow passing a model object; only a model string is allowed.

Command: VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_SKIP_WARMUP=true auto-round --model facebook/opt-125m --eval --vllm --tasks lambada_openai --limit 10
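
For context, the `--vllm` flag essentially hands the model string over to lm_eval's vLLM backend instead of loading a Hugging Face model object. A minimal sketch of that flow, assuming lm_eval's `simple_evaluate` API (argument values are illustrative, not the exact `eval_with_vllm` implementation):

```python
# Illustrative sketch only; the actual logic lives in eval_with_vllm (auto_round/script/llm.py).
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",  # select lm_eval's vLLM backend
    model_args="pretrained=facebook/opt-125m,dtype=auto",  # model is passed as a string, never as an object
    tasks=["lambada_openai"],
    limit=10,  # mirrors --limit 10 from the command above
)
print(results["results"])
```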

@xin3he xin3he requested review from WeiweiZhang1, Copilot, n1ck-guo and wenhuach21 and removed request for n1ck-guo and wenhuach21 September 11, 2025 02:28
Copilot AI left a comment


Pull Request Overview

This PR enhances the auto-round evaluation functionality by adding support for vLLM backend for model evaluation. The change allows users to leverage vLLM's optimized inference capabilities for running evaluations on CUDA and HPU devices.

  • Added comprehensive vLLM argument support for model evaluation
  • Integrated vLLM backend option into the CLI evaluation workflow
  • Added test coverage for the new vLLM evaluation functionality

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File | Description
auto_round/script/llm.py | Added vLLM-specific arguments and an eval_with_vllm function for vLLM backend evaluation
auto_round/main.py | Added vLLM backend routing in the evaluation entry point
test/test_cuda/test_vllm.py | Added an integration test for the vLLM evaluation functionality
docs/step_by_step.md | Added a documentation note about using the --vllm flag


@WeiweiZhang1 WeiweiZhang1 requested review from wenhuach21 and removed request for wenhuach21 September 11, 2025 02:38

WeiweiZhang1 commented Sep 11, 2025

How about supporting quantize & eval with vLLM mode at the same time? For example:
--model /$model_dir/${model}
--iters 0
--scheme "W4A16"
--eval_bs 16
--tasks 'lambada_openai,hellaswag,piqa,winogrande,mmlu'
--format fake
--vllm ## like this


xin3he commented Sep 11, 2025

> How about supporting quantize & eval with vLLM mode at the same time? For example: --model /$model_dir/${model} --iters 0 --scheme "W4A16" --eval_bs 16 --tasks 'lambada_openai,hellaswag,piqa,winogrande,mmlu' --format fake --vllm ## like this

Hi @WeiweiZhang1, loading a model object for evaluation is not supported; only a model string is allowed. Please refer to the lm_eval requirement here:
https://github.com/EleutherAI/lm-evaluation-harness/blob/3bc7cc8a72c66bac8d5b830cb3ccec9a5f691b12/lm_eval/models/vllm_causallms.py#L114
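
To illustrate the constraint (a hedged sketch, not taken verbatim from lm_eval): the harness's vLLM wrapper forwards the pretrained value into vLLM, whose engine accepts only a model name or path, so an in-memory quantized model object cannot be passed through.

```python
# Sketch under the assumption above: vLLM loads weights itself from a name/path string.
from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # works: string identifier resolved by vLLM
# llm = LLM(model=quantized_torch_model)  # not supported: a model object cannot be handed to vLLM
```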

@wenhuach21

It's a little difficult to infer the meaning of --vllm. I'd suggest adding an arg lm_eval_backend="hf"/"vllm"/"sglang" (TODO).

@wenhuach21 wenhuach21 requested a review from yiliu30 September 11, 2025 05:59

xin3he commented Sep 12, 2025

> It's a little difficult to infer the meaning of --vllm. I'd suggest adding an arg lm_eval_backend="hf"/"vllm"/"sglang" (TODO).

I agree; we can update it like that when more backends are added.
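
If that direction is taken later, one possible shape for the flag (purely hypothetical, not part of this PR):

```python
# Hypothetical future CLI option, sketched for discussion only.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--lm_eval_backend",
    default="hf",
    choices=["hf", "vllm", "sglang"],  # sglang would remain a TODO until supported
    help="Backend used by lm_eval when running evaluation.",
)
```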

@xin3he xin3he requested a review from yiliu30 September 12, 2025 06:20
@xin3he xin3he merged commit d74f4bc into main Sep 16, 2025
7 of 8 checks passed
@xin3he xin3he deleted the xinhe/vllm_lm_eval branch September 16, 2025 03:01
xin3he added a commit that referenced this pull request Sep 18, 2025
* support lm_eval vllm backend
---------

Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>