
[Bug]: phi-3 (microsoft/Phi-3-mini-128k-instruct) fails with assert "factor" in rope_scaling #4323

Closed
@pseudotensor

Description

Your current environment

Docker image vllm/vllm-openai:latest (vLLM 0.4.0.post1)

🐛 Describe the bug

docker run -d \
    --runtime=nvidia \
    --gpus '"device=1"' \
    --shm-size=10.24gb \
    -p 5001:5001 \
    -e NCCL_IGNORE_DISABLED_P2P=1 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u `id -u`:`id -g` \
    -v "${HOME}"/.cache:$HOME/.cache/ \
    -v "${HOME}"/.config:$HOME/.config/ \
    -v "${HOME}"/.triton:$HOME/.triton/ \
    --network host \
    vllm/vllm-openai:latest \
        --port=5001 \
        --host=0.0.0.0 \
        --model=microsoft/Phi-3-mini-128k-instruct \
        --seed 1234 \
        --trust-remote-code \
        --tensor-parallel-size=1 \
        --max-num-batched-tokens=131072 \
        --max-log-len=100 \
        --download-dir=$HOME/.cache/huggingface/hub \
        &>> logs.vllm_server.phi3.txt

gives:

(h2ogpt) fsuser@e2e-77-235:~/h2ogpt_ops$ docker logs d7b0c7e07f4d6055cce27ba8e7244860463d89ad36aa7bf2e0e9e13ea7941843
INFO 04-24 06:29:08 api_server.py:149] vLLM API server version 0.4.0.post1
INFO 04-24 06:29:08 api_server.py:150] args: Namespace(host='0.0.0.0', port=5001, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model='microsoft/Phi-3-mini-128k-instruct', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, download_dir='/home/fsuser/.cache/huggingface/hub', load_format='auto', dtype='auto', kv_cache_dtype='auto', max_model_len=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=1234, swap_space=4, gpu_memory_utilization=0.9, forced_num_gpu_blocks=None, max_num_batched_tokens=131072, max_num_seqs=256, max_logprobs=5, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', image_input_type=None, image_token_id=None, image_input_shape=None, image_feature_size=None, scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, engine_use_ray=False, disable_log_requests=False, max_log_len=100)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/vllm/entrypoints/openai/api_server.py", line 157, in <module>
    engine = AsyncLLMEngine.from_engine_args(
  File "/workspace/vllm/engine/async_llm_engine.py", line 331, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/workspace/vllm/engine/arg_utils.py", line 406, in create_engine_config
    model_config = ModelConfig(
  File "/workspace/vllm/config.py", line 125, in __init__
    self.max_model_len = _get_and_verify_max_len(self.hf_text_config,
  File "/workspace/vllm/config.py", line 969, in _get_and_verify_max_len
    assert "factor" in rope_scaling
AssertionError
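
For context on why the assertion fires: `_get_and_verify_max_len` in vllm/config.py (line 969 in 0.4.0.post1) expects the HF config's rope_scaling dict to carry a scalar "factor" key, but microsoft/Phi-3-mini-128k-instruct ships a LongRoPE-style rope_scaling entry (type "su"/"longrope" with "short_factor"/"long_factor" lists), so there is no "factor" key to find. A minimal sketch to confirm what the config actually contains, assuming transformers is installed and the hosted config.json still uses that layout:

# Hedged sketch: inspect the model's rope_scaling block directly.
# Assumption: the hosted config.json still uses LongRoPE-style scaling
# (short_factor/long_factor lists) rather than a scalar "factor".
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
)
print(cfg.rope_scaling.keys())
# Expected (assumption): keys like 'long_factor', 'short_factor', 'type'
# -- no 'factor' key, which is exactly what the assert in vllm/config.py checks for.

Until vLLM's rope_scaling handling understands this scheme, the 128k variant presumably cannot be loaded; the 4k variant (microsoft/Phi-3-mini-4k-instruct) appears to have rope_scaling set to null and so should not hit this code path.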
