OVGenAI [GPU] clFinish, error code: -5 CL_OUT_OF_RESOURCES (Windows) GPT-OSS-20B A770 #58

@savvadesogle

Hi.
I have a problem with GPT-OSS 20B, and we need help from the OpenVINO GenAI team.

openarc add --model-name gpt-oss-20b-int4-ov --model-path T:\models\desmondsow\gpt-oss-20b-int4-ov --engine ovgenai --model-type llm --device GPU.0

✅ Dolphin-X1-8B-int4_asym-awq-ov is working
❌ GPT-OSS 20B not working
❌ Qwen3-14B-int4-ov not working
I already reported the Qwen3 14B case on Discord: it does not work on Windows, but it works on Linux.
https://discord.com/channels/1341627368581628004/1361023199109845282/1458763931588886540

OS: Windows 11
Driver: 8331
GPU: Intel Arc A770 (16 GB)
Models:

  1. https://huggingface.co/desmondsow/gpt-oss-20b-int4-ov
  2. https://huggingface.co/OpenVINO/Qwen3-14B-int4-ov/
[Image: bench output for Dolphin-X1-8B-int4_asym-awq-ov]

I'm using the latest dev build because gpt-oss is not supported on 2025.3.
openvino 2026.0.0.dev20260109
openvino-genai 2026.0.0.0.dev20260109

openvino-tokenizers 2026.0.0.0.dev20260109
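
Since all three OpenVINO packages are dev builds, it can help to confirm that they come from the same nightly. The sketch below reads the installed versions with `importlib.metadata` and extracts the trailing `dev` tag. The helper names `installed_versions` and `build_tag` are illustrative only and are not part of OpenVINO or OpenArc.

```python
from importlib import metadata

# Package names as they appear in the `uv pip list` output of this report.
PACKAGES = ("openvino", "openvino-genai", "openvino-tokenizers")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_tag(version):
    """Return the trailing dev tag (e.g. 'dev20260109'), or None for a release."""
    if version is None:
        return None
    tail = version.rsplit(".", 1)[-1]
    return tail if tail.startswith("dev") else None


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:22} {version}  (build tag: {build_tag(version)})")
```

If the three build tags differ, a mismatched install is worth ruling out before looking at the GPU plugin itself.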

ERROR

[GPU] clFinish, error code: -5 CL_OUT_OF_RESOURCES
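
For reference, `-5` is the value the OpenCL headers assign to `CL_OUT_OF_RESOURCES`. It is a generic status meaning the implementation failed to allocate resources it needs on the device, so by itself it does not prove that VRAM is exhausted. Here is a small sketch for pulling the failing call, code, and name out of a log like the one in this report, for example to compare runs. `parse_cl_error` and the regular expression are assumptions based only on the message format shown here.

```python
import re

# Matches the GPU plugin error line as it appears in this report, e.g.
# "[GPU] clFinish, error code: -5 CL_OUT_OF_RESOURCES".
CL_ERROR_RE = re.compile(
    r"\[GPU\]\s+(?P<call>\w+),\s+error code:\s+(?P<code>-?\d+)\s+(?P<name>CL_\w+)"
)


def parse_cl_error(text):
    """Return (call, code, name) for the first OpenCL error in text, else None."""
    match = CL_ERROR_RE.search(text)
    if match is None:
        return None
    return match["call"], int(match["code"]), match["name"]
```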

(openarc) C:\llm\openarc\201>openarc serve start --host 127.0.0.1
Configuration saved to: C:\llm\openarc\201\openarc_config.json
Starting OpenArc server on 127.0.0.1:8000
2026-01-12 23:14:15,372 - INFO - Launching  127.0.0.1:8000
2026-01-12 23:14:15,373 - INFO - --------------------------------
2026-01-12 23:14:15,373 - INFO - OpenArc endpoints:
2026-01-12 23:14:15,373 - INFO -   - POST   /openarc/load           Load a model
2026-01-12 23:14:15,373 - INFO -   - POST   /openarc/unload         Unload a model
2026-01-12 23:14:15,373 - INFO -   - GET    /openarc/status         Get model status
2026-01-12 23:14:15,374 - INFO - --------------------------------
2026-01-12 23:14:15,374 - INFO - OpenAI compatible endpoints:
2026-01-12 23:14:15,374 - INFO -   - GET    /v1/models
2026-01-12 23:14:15,374 - INFO -   - POST   /v1/chat/completions
2026-01-12 23:14:15,374 - INFO -   - POST   /v1/audio/transcriptions: Whisper only
2026-01-12 23:14:15,374 - INFO -   - POST   /v1/audio/speech: Kokoro only
2026-01-12 23:14:15,375 - INFO -   - POST   /v1/embeddings
2026-01-12 23:14:15,375 - INFO -   - POST   /v1/rerank
C:\llm\openarc\201\.venv\Lib\site-packages\torch\onnx\_internal\registration.py:162: OnnxExporterWarning: Symbolic function 'aten::scaled_dot_product_attention' already registered for opset 14. Replacing the existing function with new function. This is unexpected. Please report it on https://github.com/pytorch/pytorch/issues.
  warnings.warn(
2026-01-12 23:14:26,280 - INFO - Started server process [10640]
2026-01-12 23:14:26,280 - INFO - Waiting for application startup.
2026-01-12 23:14:26,282 - INFO - Application startup complete.
2026-01-12 23:14:26,283 - INFO - Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
2026-01-12 23:14:27,636 - INFO - Request received: POST /openarc/unload from 127.0.0.1
2026-01-12 23:14:27,639 - INFO - Request completed: POST /openarc/unload status=500 duration=0.004s
2026-01-12 23:14:38,689 - INFO - Request received: POST /openarc/load from 127.0.0.1
2026-01-12 23:14:38,699 - INFO - gpt-oss-20b-int4-ov loading...
2026-01-12 23:14:38,699 - INFO - ModelType.LLM on GPU.0 with {}
2026-01-12 23:14:58,995 - INFO - gpt-oss-20b-int4-ov loaded successfully
2026-01-12 23:14:58,997 - INFO - [gpt-oss-20b-int4-ov LLM Worker] Started, waiting for packets...
2026-01-12 23:14:58,998 - INFO - Request completed: POST /openarc/load status=200 duration=20.310s
2026-01-12 23:15:26,869 - INFO - Request received: GET /v1/models from 127.0.0.1
2026-01-12 23:15:26,871 - INFO - Request completed: GET /v1/models status=200 duration=0.002s
2026-01-12 23:15:33,396 - INFO - Request received: POST /openarc/bench from 127.0.0.1
2026-01-12 23:15:36,104 - ERROR - LLM inference failed!
Traceback (most recent call last):
  File "C:\llm\openarc\201\src\server\worker_registry.py", line 87, in infer_llm
    async for item in llm_instance.generate_type(packet.gen_config):
  File "C:\llm\openarc\201\src\engine\ov_genai\llm.py", line 100, in generate_text
    result = await asyncio.to_thread(self.model.generate, prompt_token_ids, generation_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\uuk\AppData\Local\Programs\Python\Python311\Lib\asyncio\threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\uuk\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:224:
Exception from src\plugins\intel_gpu\src\runtime\ocl\ocl_stream.cpp:376:
[GPU] clFinish, error code: -5 CL_OUT_OF_RESOURCES


2026-01-12 23:15:36,111 - ERROR - [gpt-oss-20b-int4-ov LLM Worker] Inference failed, triggering model unload...
Unhandled exception caught in c10/util/AbortHandler.h
00007FF921F3369400007FF921F2ACE0 torch_python.dll!torch::autograd::THPCppFunction_requires_grad [<unknown file> @ <unknown line number>]
00007FF98D7BEE1200007FF98D7BEDF0 ucrtbase.dll!terminate [<unknown file> @ <unknown line number>]
00007FF96A571AB100007FF96A571150 VCRUNTIME140_1.dll!_NLG_Return2 [<unknown file> @ <unknown line number>]
00007FF96A57232F00007FF96A571150 VCRUNTIME140_1.dll!_NLG_Return2 [<unknown file> @ <unknown line number>]
00007FF96A57238900007FF96A571150 VCRUNTIME140_1.dll!_NLG_Return2 [<unknown file> @ <unknown line number>]
00007FF96A57418900007FF96A5740E0 VCRUNTIME140_1.dll!_CxxFrameHandler4 [<unknown file> @ <unknown line number>]
00007FF8F748B69000007FF8F62510A0 openvino_intel_gpu_plugin.dll!create_plugin_engine [<unknown file> @ <unknown line number>]      
00007FF98FF9547F00007FF98FF95350 ntdll.dll!_chkstk [<unknown file> @ <unknown line number>]
00007FF98FF0E88600007FF98FF0DDF0 ntdll.dll!RtlFindCharInUnicodeString [<unknown file> @ <unknown line number>]
00007FF98FF4495500007FF98FF447C0 ntdll.dll!RtlRaiseException [<unknown file> @ <unknown line number>]
00007FF98D24016C00007FF98D240100 KERNELBASE.dll!RaiseException [<unknown file> @ <unknown line number>]
00007FF9893E6BA700007FF9893E6B10 VCRUNTIME140.dll!CxxThrowException [<unknown file> @ <unknown line number>]
00007FF922F5092700007FF922F508E0 openvino.dll!ov::Exception::create [<unknown file> @ <unknown line number>]
00007FF8F74CBD0B00007FF8F62510A0 openvino_intel_gpu_plugin.dll!create_plugin_engine [<unknown file> @ <unknown line number>]      
00007FF96A571080 <unknown symbol address> VCRUNTIME140_1.dll!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF96A57271500007FF96A571150 VCRUNTIME140_1.dll!_NLG_Return2 [<unknown file> @ <unknown line number>]
00007FF98FF94CD600007FF98FF94830 ntdll.dll!RtlCaptureContext2 [<unknown file> @ <unknown line number>]
00007FF8F69D131000007FF8F62510A0 openvino_intel_gpu_plugin.dll!create_plugin_engine [<unknown file> @ <unknown line number>]      
00007FF8F647276100007FF8F62510A0 openvino_intel_gpu_plugin.dll!create_plugin_engine [<unknown file> @ <unknown line number>]      
00007FF8F626F86900007FF8F62510A0 openvino_intel_gpu_plugin.dll!create_plugin_engine [<unknown file> @ <unknown line number>]      
00007FF8F6240599 <unknown symbol address> openvino_intel_gpu_plugin.dll!<unknown symbol> [<unknown file> @ <unknown line number>] 
00007FF8F6241F7E <unknown symbol address> openvino_intel_gpu_plugin.dll!<unknown symbol> [<unknown file> @ <unknown line number>] 
00007FF8F625EA0300007FF8F62510A0 openvino_intel_gpu_plugin.dll!create_plugin_engine [<unknown file> @ <unknown line number>]      
00007FF8F625EF1400007FF8F62510A0 openvino_intel_gpu_plugin.dll!create_plugin_engine [<unknown file> @ <unknown line number>]      
00007FF9232E934800007FF9232E9190 openvino.dll!ov::ISyncInferRequest::~ISyncInferRequest [<unknown file> @ <unknown line number>]  
00007FF8F638586400007FF8F62510A0 openvino_intel_gpu_plugin.dll!create_plugin_engine [<unknown file> @ <unknown line number>]      
00007FF92331C91E00007FF92331C7E0 openvino.dll!ov::IAsyncInferRequest::~IAsyncInferRequest [<unknown file> @ <unknown line number>]
00007FF8F625173400007FF8F62510A0 openvino_intel_gpu_plugin.dll!create_plugin_engine [<unknown file> @ <unknown line number>]      
00007FF9232929A400007FF923292960 openvino.dll!ov::VariableState::~VariableState [<unknown file> @ <unknown line number>]
00007FF9202FB1AF00007FF9202E2720 openvino_genai.dll!ov::genai::StreamerBase::write [<unknown file> @ <unknown line number>]       
00007FF9202D4E5400007FF9202D4870 openvino_genai.dll!ov::genai::ReasoningParser::`default constructor closure' [<unknown file> @ <unknown line number>]
00007FF9203E0E9A00007FF9203E07D0 openvino_genai.dll!ov::genai::LLMPipeline::LLMPipeline [<unknown file> @ <unknown line number>]  
00007FF9203E171B00007FF9203E0F00 openvino_genai.dll!ov::genai::LLMPipeline::~LLMPipeline [<unknown file> @ <unknown line number>] 
00007FF9203E0F2500007FF9203E0F00 openvino_genai.dll!ov::genai::LLMPipeline::~LLMPipeline [<unknown file> @ <unknown line number>] 
00007FF9269906A8 <unknown symbol address> py_openvino_genai.cp311-win_amd64.pyd!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF926990636 <unknown symbol address> py_openvino_genai.cp311-win_amd64.pyd!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF926AC354C <unknown symbol address> _pyopenvino.cp311-win_amd64.pyd!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FF926AD328B00007FF926AD2A10 _pyopenvino.cp311-win_amd64.pyd!PyInit__pyopenvino [<unknown file> @ <unknown line number>]      
00007FF92BEFFDC000007FF92BEFFBA0 python311.dll!PyObject_GenericSetAttrWithDict [<unknown file> @ <unknown line number>]
00007FF92BE98D2C00007FF92BE98CA0 python311.dll!PyObject_SetAttr [<unknown file> @ <unknown line number>]
00007FF92BEC19F000007FF92BEBCC20 python311.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FF92BEF104E00007FF92BEF0D9C python311.dll!Py_BuildValue_SizeT [<unknown file> @ <unknown line number>]
00007FF92BFCF87500007FF92BFCF5F8 python311.dll!PyImport_GetMagicNumber [<unknown file> @ <unknown line number>]
00007FF92BEC097700007FF92BEBCC20 python311.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FF92BEF104E00007FF92BEF0D9C python311.dll!Py_BuildValue_SizeT [<unknown file> @ <unknown line number>]
00007FF92BFCF87500007FF92BFCF5F8 python311.dll!PyImport_GetMagicNumber [<unknown file> @ <unknown line number>]
00007FF96DF958BF00007FF96DF91000 _asyncio.pyd!PyInit__asyncio [<unknown file> @ <unknown line number>]
00007FF96DF9573300007FF96DF91000 _asyncio.pyd!PyInit__asyncio [<unknown file> @ <unknown line number>]
00007FF92BEB83DE00007FF92BEB8060 python311.dll!PyObject_MakeTpCall [<unknown file> @ <unknown line number>]
00007FF92C0BBC2400007FF92C0BBBA4 python311.dll!PyContext_NewHamtForTests [<unknown file> @ <unknown line number>]
00007FF92C0BBED500007FF92C0BBBA4 python311.dll!PyContext_NewHamtForTests [<unknown file> @ <unknown line number>]
00007FF92BEFD2EC00007FF92BEFD208 python311.dll!PyArg_UnpackTuple [<unknown file> @ <unknown line number>]
00007FF92BF6507300007FF92BF65018 python311.dll!PyObject_Call [<unknown file> @ <unknown line number>]
00007FF92BF64D4000007FF92BF648EC python311.dll!PyObject_CallObject [<unknown file> @ <unknown line number>]
00007FF92BEC1F7F00007FF92BEBCC20 python311.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FF92BEBAE6400007FF92BEBACC0 python311.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FF92BF64C6A00007FF92BF648EC python311.dll!PyObject_CallObject [<unknown file> @ <unknown line number>]
00007FF92BEC1F7F00007FF92BEBCC20 python311.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FF92BEBAE6400007FF92BEBACC0 python311.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FF92BF64C6A00007FF92BF648EC python311.dll!PyObject_CallObject [<unknown file> @ <unknown line number>]
00007FF92BEC1F7F00007FF92BEBCC20 python311.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FF92BEBAE6400007FF92BEBACC0 python311.dll!PyFunction_Vectorcall [<unknown file> @ <unknown line number>]
00007FF92BF05C2500007FF92BF03DC4 python311.dll!PyIter_Send [<unknown file> @ <unknown line number>]
00007FF92BF64C6A00007FF92BF648EC python311.dll!PyObject_CallObject [<unknown file> @ <unknown line number>]
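
The traceback passes through OpenArc's worker before reaching `openvino_genai`, so one way to narrow this down is to call OpenVINO GenAI directly, without the server. The following is a minimal sketch, not OpenArc's code. `run_repro` and its `pipeline_cls` parameter are hypothetical names (the latter exists only so the function can be exercised without a GPU). `LLMPipeline(model_path, device)` and `generate(prompt, max_new_tokens=...)` are the usual OpenVINO GenAI Python calls. Note that OpenArc passes token IDs to `generate`, while this sketch uses a plain string prompt.

```python
def run_repro(model_path, device="GPU.0", prompt="Hello",
              max_new_tokens=32, pipeline_cls=None):
    """Load the model with OpenVINO GenAI directly and generate a short reply."""
    if pipeline_cls is None:
        # Imported lazily so the function can be defined without OpenVINO installed.
        import openvino_genai
        pipeline_cls = openvino_genai.LLMPipeline

    # Same device string as in the `openarc add` command of this report.
    pipe = pipeline_cls(model_path, device)
    return pipe.generate(prompt, max_new_tokens=max_new_tokens)
```

Calling `run_repro(r"T:\models\desmondsow\gpt-oss-20b-int4-ov", "GPU.0")` and then the same path with `"CPU"` would indicate whether the failure is specific to the GPU plugin or also reproduces outside OpenArc.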

uv pip list

(openarc) C:\llm\openarc\201>uv pip list
Package                    Version                Editable project location
-------------------------- ---------------------- -------------------------
about-time                 4.2.1
addict                     2.4.0
aiohappyeyeballs           2.6.1
aiohttp                    3.12.14
aiosignal                  1.4.0
alive-progress             3.2.0
annotated-types            0.7.0
anyio                      4.9.0
asttokens                  3.0.0
attrs                      25.3.0
audioread                  3.0.1
autograd                   1.8.0
babel                      2.17.0
blis                       1.3.0
brotli                     1.1.0
catalogue                  2.0.10
certifi                    2025.7.14
cffi                       2.0.0
charset-normalizer         3.4.2
click                      8.2.1
cloudpathlib               0.22.0
cma                        4.2.0
colorama                   0.4.6
comm                       0.2.3
confection                 0.1.5
contourpy                  1.3.2
cryptography               46.0.3
csvw                       3.6.0
curated-tokenizers         0.0.9
curated-transformers       0.1.1
cycler                     0.12.1
cymem                      2.0.11
datasets                   4.0.0
ddgs                       9.6.1
debugpy                    1.8.17
decorator                  5.2.1
deprecated                 1.2.18
dill                       0.3.8
distro                     1.9.0
dlinfo                     2.0.0
docopt                     0.6.2
espeakng-loader            0.2.4
executing                  2.2.1
fastapi                    0.116.1
filelock                   3.18.0
fonttools                  4.58.5
frozenlist                 1.7.0
fsspec                     2025.3.0
grapheme                   0.6.0
griffe                     1.14.0
h11                        0.16.0
h2                         4.3.0
hpack                      4.1.0
httpcore                   1.0.9
httpx                      0.28.1
httpx-sse                  0.4.3
huggingface-hub            0.33.4
hyperframe                 6.1.0
idna                       3.10
iniconfig                  2.3.0
inquirerpy                 0.3.4
ipykernel                  7.0.1
ipython                    9.6.0
ipython-pygments-lexers    1.1.1
ipywidgets                 8.1.7
isodate                    0.7.2
jedi                       0.19.2
jinja2                     3.1.6
jiter                      0.11.0
joblib                     1.5.1
jsonschema                 4.24.0
jsonschema-specifications  2025.4.1
jupyter-client             8.6.3
jupyter-core               5.9.1
jupyterlab-widgets         3.0.15
kiwisolver                 1.4.8
kokoro                     0.9.4
langcodes                  3.5.0
language-data              1.3.0
language-tags              1.2.0
lazy-loader                0.4
librosa                    0.11.0
llvmlite                   0.45.0
loguru                     0.7.3
lxml                       6.0.2
marisa-trie                1.3.1
markdown-it-py             3.0.0
markupsafe                 3.0.2
matplotlib                 3.10.3
matplotlib-inline          0.1.7
mcp                        1.20.0
mdurl                      0.1.2
misaki                     0.9.4
mpmath                     1.3.0
msgpack                    1.1.1
multidict                  6.6.3
multiprocess               0.70.16
murmurhash                 1.0.13
natsort                    8.4.0
nest-asyncio               1.6.0
networkx                   3.4.2
ninja                      1.11.1.4
nncf                       2.17.0
num2words                  0.5.14
numba                      0.62.0
numpy                      2.2.6
onnx                       1.18.0
openai                     2.2.0
openai-agents              0.4.2
openarc                    2.0                    C:\llm\openarc\201
openvino                   2026.0.0.dev20260109
openvino-genai             2026.0.0.0.dev20260109
openvino-telemetry         2025.2.0
openvino-tokenizers        2026.0.0.0.dev20260109
optimum                    1.27.0
optimum-intel              1.25.2
packaging                  25.0
pandas                     2.2.3
parso                      0.8.5
pfzy                       0.3.4
phonemizer-fork            3.3.2
pillow                     11.3.0
pip                        25.2
platformdirs               4.4.0
pluggy                     1.6.0
pooch                      1.8.2
preshed                    3.0.10
primp                      0.15.0
prompt-toolkit             3.0.52
propcache                  0.3.2
protobuf                   6.31.1
psutil                     7.0.0
pure-eval                  0.2.3
pyarrow                    20.0.0
pycparser                  2.23
pydantic                   2.11.7
pydantic-core              2.33.2
pydantic-settings          2.11.0
pydot                      3.0.4
pygments                   2.19.2
pyjwt                      2.10.1
pymoo                      0.6.1.5
pynput                     1.8.1
pyparsing                  3.2.3
pytest                     8.4.2
python-dateutil            2.9.0.post0
python-dotenv              1.2.1
python-multipart           0.0.20
pytz                       2025.2
pywin32                    311
pyyaml                     6.0.2
pyzmq                      27.1.0
rdflib                     7.2.1
referencing                0.36.2
regex                      2024.11.6
requests                   2.32.4
rfc3986                    1.5.0
rich                       14.0.0
rich-click                 1.8.9
rpds-py                    0.26.0
safetensors                0.5.3
scikit-learn               1.7.0
scipy                      1.16.0
segments                   2.3.0
setuptools                 80.9.0
shellingham                1.5.4
six                        1.17.0
smart-open                 7.3.1
smolagents                 1.22.0
sniffio                    1.3.1
socksio                    1.0.0
sounddevice                0.5.2
soundfile                  0.13.1
soxr                       1.0.0
spacy                      3.8.7
spacy-curated-transformers 0.3.1
spacy-legacy               3.0.12
spacy-loggers              1.0.5
srsly                      2.5.1
sse-starlette              3.0.3
stack-data                 0.6.3
starlette                  0.47.1
sympy                      1.14.0
tabulate                   0.9.0
termcolor                  3.1.0
thinc                      8.3.6
threadpoolctl              3.6.0
tokenizers                 0.21.2
torch                      2.8.0+cpu
torchvision                0.23.0+cpu
tornado                    6.5.2
tqdm                       4.67.1
traitlets                  5.14.3
transformers               4.52.4
typer                      0.19.2
types-requests             2.32.4.20250913
typing-extensions          4.14.1
typing-inspection          0.4.1
tzdata                     2025.2
uritemplate                4.2.0
urllib3                    2.5.0
uvicorn                    0.35.0
wasabi                     1.1.3
wcwidth                    0.2.14
weasel                     0.4.1
widgetsnbextension         4.0.14
win32-setctime             1.2.0
wrapt                      1.17.2
xxhash                     3.5.0
yarl                       1.20.1
