
Single node LUMI example - Failed to infer device type #2

Closed
@kaiserdan

Description


Unfortunately, the single-node LUMI example script also does not work as provided: the dependencies from /appl/local/csc/modulefiles/ fail during startup with RuntimeError: Failed to infer device type. The module banner and full traceback are below, followed by a small diagnostic sketch.

NOTE: This module uses Singularity. Some commands execute inside the container
(e.g. python3, pip3).

This module has been installed by CSC.

Documentation: https://docs.csc.fi/apps/pytorch/
Support: https://docs.csc.fi/support/contact/

Starting vLLM process 152953 - logs go to /scratch/project_xxx/xxx/vllm-logs/.log
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.6.dev27+ge461c262.rocm624-py3.10-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line 709, in <module>
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.6.dev27+ge461c262.rocm624-py3.10-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line 675, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.6.dev27+ge461c262.rocm624-py3.10-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line 118, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.6.dev27+ge461c262.rocm624-py3.10-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line 140, in build_async_engine_client_from_engine_args
    engine_config = engine_args.create_engine_config(
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.6.dev27+ge461c262.rocm624-py3.10-linux-x86_64.egg/vllm/engine/arg_utils.py", line 1041, in create_engine_config
    device_config = DeviceConfig(device=self.device)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.6.dev27+ge461c262.rocm624-py3.10-linux-x86_64.egg/vllm/config.py", line 1491, in __init__
    raise RuntimeError("Failed to infer device type")
RuntimeError: Failed to infer device type
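
For context, this RuntimeError is raised when vLLM's DeviceConfig (with the default device="auto") cannot detect any supported accelerator, which usually means the ROCm build of PyTorch inside the container does not see the LUMI GPUs at all. A minimal check, assuming you run it with the module's python3 wrapper inside a Slurm allocation that actually requests GPUs (both assumptions, not something confirmed by the logs above), could look like this:

```python
# Hedged diagnostic sketch (not part of the original report): check whether
# the ROCm build of PyTorch inside the Singularity container sees any GPUs.
# If device_count() is 0, vLLM has nothing to infer a device type from.
import os
import torch

print("torch version:       ", torch.__version__)
print("HIP/ROCm build:      ", torch.version.hip)           # None on a CPU-only or CUDA build
print("devices visible:     ", torch.cuda.device_count())   # 0 => "Failed to infer device type"
print("ROCR_VISIBLE_DEVICES:", os.environ.get("ROCR_VISIBLE_DEVICES"))
print("HIP_VISIBLE_DEVICES: ", os.environ.get("HIP_VISIBLE_DEVICES"))
```

If this prints 0 visible devices, the problem is likely in the Slurm GPU request or in how the module exposes the GPUs to the container, rather than in vLLM itself.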
