
[Misc]: How to use intel-gpu in openvino #7418

Closed · liuxingbin opened this issue Aug 12, 2024 · 7 comments

Comments
@liuxingbin

Anything you want to discuss about vllm.

Hi, I successfully created the OpenVINO env. I am wondering how to use the Intel GPU?

@liuxingbin (Author)

I changed the device in vllm/model_executor/model_loader/openvino.py to 'GPU'.

[screenshot of the code change]
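The change was presumably along these lines (a sketch; the exact line in openvino.py varies by vLLM version, and the variable names and model path here are illustrative):

    import openvino as ov

    ov_core = ov.Core()
    ov_model = ov_core.read_model("model.xml")  # illustrative path
    # Originally vLLM compiles the model for the CPU plugin:
    #   compiled_model = ov_core.compile_model(ov_model, "CPU")
    # The edit hardcodes the device string to the Intel GPU plugin instead:
    compiled_model = ov_core.compile_model(ov_model, "GPU")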

It turns out that:

[rank0]: RuntimeError: Exception from src/inference/src/cpp/core.cpp:104:
[rank0]: Exception from src/inference/src/dev/plugin.cpp:53:
[rank0]: Exception from src/plugins/intel_gpu/src/plugin/program_builder.cpp:246:
[rank0]: Operation: PagedAttentionExtension_39914 of type PagedAttentionExtension(extension) is not supported

@ilya-lavrenov (Contributor)

Hi @liuxingbin,
Intel GPU support via OpenVINO was added in PR #8192.
Please try it out.

@liuxingbin (Author)

Hi,
I tried the PR, but a new error occurred. I used openvino-gpu to run qwen2-0.5b. It turns out:

Traceback (most recent call last):
  File "/workspace/vllm/vllm/worker/openvino_worker.py", line 302, in determine_num_available_blocks
    kv_cache_size = self.profile_run()
  File "/workspace/vllm/vllm/worker/openvino_worker.py", line 549, in profile_run
    model_profile_run()
  File "/workspace/vllm/vllm/worker/openvino_worker.py", line 538, in model_profile_run
    self.model_runner.execute_model(seqs,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/vllm/vllm/worker/openvino_model_runner.py", line 340, in execute_model
    hidden_states = model_executable(**execute_model_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nncf/torch/dynamic_graph/wrappers.py", line 146, in wrapped
    return module_call(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/vllm/vllm/model_executor/model_loader/openvino.py", line 164, in forward
    self.ov_request.wait()
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:245:
Exception from src/bindings/python/src/pyopenvino/core/infer_request.hpp:54:
Caught exception: Check '!exceed_allocatable_mem_size' failed at src/plugins/intel_gpu/src/runtime/ocl/ocl_engine.cpp:139:
[GPU] Exceeded max size of memory object allocation: requested 19914555392 bytes, but max alloc size supported by device is 1073741824 bytes.Please try to reduce batch size or use lower precision.

19914555392 bytes equals ~18.5 GiB, which is strange for a 0.5B model. I tried some workarounds, but they didn't solve the problem.
Any solution or hint?
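As a quick check on the two sizes in the error message above:

    # Sanity-checking the numbers reported by the GPU plugin.
    requested = 19914555392     # bytes the plugin tried to allocate at once
    max_alloc = 1073741824      # device's max single allocation
    print(requested / 1024**3)  # ≈ 18.55 GiB
    print(max_alloc / 1024**3)  # exactly 1.0 GiB (the OpenCL per-allocation cap)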

FYI: I used the GPU (CUDA) build of vLLM to run qwen2-1.5b, and it uses ~8GB according to nvidia-smi.

@sshlyapn (Contributor)

Hi @liuxingbin,
Can you share how you are running vLLM? Did you try setting a lower max_model_len value? We assume there is enough GPU memory to run max_model_len tokens at once, so if the model has a large default max_model_len, it can result in an error due to insufficient GPU memory.
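For instance, a lower limit can be passed when constructing the engine (a sketch; the model name and the 2048 cap are illustrative):

    from vllm import LLM, SamplingParams

    # Cap the context length so the profiling run does not try to allocate
    # buffers for the model's full default context (32K for Qwen2) at once.
    llm = LLM(model="Qwen/Qwen2-0.5B-Instruct", max_model_len=2048)

    outputs = llm.generate(["Hello, my name is"],
                           SamplingParams(max_tokens=32))
    print(outputs[0].outputs[0].text)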

@liuxingbin (Author)

I changed the available GPU memory here, which solved my problem.
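The linked setting isn't visible in the thread. One plausible knob, assuming the environment variables documented for vLLM's OpenVINO backend at the time (verify against your version), is the KV-cache space budget:

    import os

    # Assumed knobs from the OpenVINO backend docs of that era; the exact
    # setting changed above is not shown in the thread.
    os.environ["VLLM_OPENVINO_DEVICE"] = "GPU"       # select the Intel GPU plugin
    os.environ["VLLM_OPENVINO_KVCACHE_SPACE"] = "8"  # GiB reserved for the KV cache

    from vllm import LLM
    llm = LLM(model="Qwen/Qwen2-0.5B-Instruct", max_model_len=2048)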

@github-actions (bot)

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label on Dec 11, 2024
@github-actions (bot)

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions bot closed this as not planned on Jan 11, 2025