Description
Is there an existing issue for the same bug?
- I have checked the existing issues.
RAGFlow workspace code commit ID
Latest v0.17.0
RAGFlow image version
Latest v0.17.0
Other environment information
Windows 11; CPU: i7-14700KF; GPU: RTX 5080; Docker Desktop installed.
I first installed the graphics driver, CUDA Toolkit 12.8, and cuDNN 9.8.1.
I then switched to CUDA 12.6 with the matching cuDNN version, but the same error was still reported.
Actual behavior
Specific error information:
ERROR:root:Fail to bind embedding model: CUDA error: no kernel image is available for execution on the device
2025-03-11 14:57:45 CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-03-11 14:57:45 For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-03-11 14:57:45 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-03-11 14:57:45 Traceback (most recent call last):
2025-03-11 14:57:45   File "/ragflow/rag/svr/task_executor.py", line 519, in do_handle_task
2025-03-11 14:57:45     vts, _ = embedding_model.encode(["ok"])
2025-03-11 14:57:45   File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x7f089fb60430>", line 31, in encode
2025-03-11 14:57:45   File "/ragflow/api/db/services/llm_service.py", line 240, in encode
2025-03-11 14:57:45     embeddings, used_tokens = self.mdl.encode(texts)
2025-03-11 14:57:45   File "<@beartype(rag.llm.embedding_model.DefaultEmbedding.encode) at 0x7f08a1bddf30>", line 31, in encode
2025-03-11 14:57:45   File "/ragflow/rag/llm/embedding_model.py", line 104, in encode
2025-03-11 14:57:45     ress.extend(self._model.encode(texts[i:i + batch_size]).tolist())
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-03-11 14:57:45     return func(*args, **kwargs)
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/FlagEmbedding/flag_models.py", line 96, in encode
2025-03-11 14:57:45     last_hidden_state = self.model(**inputs, return_dict=True).last_hidden_state
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-11 14:57:45     return self._call_impl(*args, **kwargs)
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-11 14:57:45     return forward_call(*args, **kwargs)
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 986, in forward
2025-03-11 14:57:45     extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape)
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 969, in get_extended_attention_mask
2025-03-11 14:57:45     extended_attention_mask = extended_attention_mask.to(dtype=dtype) # fp16 compatibility
2025-03-11 14:57:45 RuntimeError: CUDA error: no kernel image is available for execution on the device
2025-03-11 14:57:45 CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-03-11 14:57:45 For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-03-11 14:57:45 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-03-11 14:57:45
2025-03-11 14:57:45 ERROR:root:handle_task got exception for task {"id": "22c765f2fe4611ef98d00242ac120006", "doc_id": "2ea8b848fe1f11efa1b20242ac120006", "from_page": 0, "to_page": 94, "retry_count": 0, "kb_id": "1ddb13e4fe1f11efbe600242ac120006", "parser_id": "table", "parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 128, "delimiter": "\n!?;\u3002\uff1b\uff01\uff1f", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "graphrag": {"use_graphrag": false}, "pages": []}, "name": "\u4e09\u8f6e\u592e\u7763\u6574\u6539\u8fdb\u5c55.xlsx", "type": "doc", "location": "\u4e09\u8f6e\u592e\u7763\u6574\u6539\u8fdb\u5c55.xlsx", "size": 149379, "tenant_id": "3fcce5fafe1e11efa9770242ac120006", "language": "Chinese", "embd_id": "BAAI/bge-large-zh-v1.5@BAAI", "pagerank": 0, "kb_parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 128, "delimiter": "\n!?;\u3002\uff1b\uff01\uff1f", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": false}}, "img2txt_id": "", "asr_id": "", "llm_id": "deepseek-r1-distill-llama-70b@Tongyi-Qianwen", "update_time": 1741676265665, "task_type": ""}
2025-03-11 14:57:45 Traceback (most recent call last):
2025-03-11 14:57:45   File "/ragflow/rag/svr/task_executor.py", line 662, in handle_task
2025-03-11 14:57:45     do_handle_task(task)
2025-03-11 14:57:45   File "/ragflow/rag/svr/task_executor.py", line 519, in do_handle_task
2025-03-11 14:57:45     vts, _ = embedding_model.encode(["ok"])
2025-03-11 14:57:45   File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x7f089fb60430>", line 31, in encode
2025-03-11 14:57:45   File "/ragflow/api/db/services/llm_service.py", line 240, in encode
2025-03-11 14:57:45     embeddings, used_tokens = self.mdl.encode(texts)
2025-03-11 14:57:45   File "<@beartype(rag.llm.embedding_model.DefaultEmbedding.encode) at 0x7f08a1bddf30>", line 31, in encode
2025-03-11 14:57:45   File "/ragflow/rag/llm/embedding_model.py", line 104, in encode
2025-03-11 14:57:45     ress.extend(self._model.encode(texts[i:i + batch_size]).tolist())
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-03-11 14:57:45     return func(*args, **kwargs)
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/FlagEmbedding/flag_models.py", line 96, in encode
2025-03-11 14:57:45     last_hidden_state = self.model(**inputs, return_dict=True).last_hidden_state
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-11 14:57:45     return self._call_impl(*args, **kwargs)
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-11 14:57:45     return forward_call(*args, **kwargs)
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 986, in forward
2025-03-11 14:57:45     extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape)
2025-03-11 14:57:45   File "/ragflow/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 969, in get_extended_attention_mask
2025-03-11 14:57:45     extended_attention_mask = extended_attention_mask.to(dtype=dtype) # fp16 compatibility
2025-03-11 14:57:45 RuntimeError: CUDA error: no kernel image is available for execution on the device
2025-03-11 14:57:45 CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-03-11 14:57:45 For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-03-11 14:57:45 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
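A possible way to narrow this down (a sketch, not something confirmed in this report): "no kernel image is available" usually means the PyTorch build in use was not compiled for the GPU's compute capability. The script below compares the two. `arch_supported` and `report` are hypothetical helper names, and the compatibility rule is a simplified heuristic: an `sm_XY` binary is assumed to run on devices with the same major version and an equal or higher minor version, and `compute_XY` PTX is assumed to be JIT-compilable on any equal or newer device. Architecture-specific tags with a letter suffix are skipped.

```python
import re


def arch_supported(capability, arch_list):
    """Heuristic: does a build compiled for arch_list ship kernels usable
    on a device with the given (major, minor) compute capability?"""
    cap = tuple(capability)
    for tag in arch_list:
        # Accept plain tags such as "sm_86" or "compute_90" only.
        m = re.fullmatch(r"(sm|compute)_(\d{2,})", tag)
        if m is None:
            continue
        kind, digits = m.group(1), m.group(2)
        major, minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and major == cap[0] and minor <= cap[1]:
            return True  # binary-compatible within the same major version
        if kind == "compute" and (major, minor) <= cap:
            return True  # PTX can be JIT-compiled for a newer device
    return False


def report():
    """Print what the local PyTorch build and GPU report."""
    try:
        import torch
    except ImportError:
        print("torch is not importable in this interpreter")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build")
        return
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
    print("device:", torch.cuda.get_device_name(0), "capability:", cap)
    print("compiled arch list:", archs)
    print("kernels available for this device:", arch_supported(cap, archs))


if __name__ == "__main__":
    report()
```

To be meaningful, this should be run with the interpreter under `/ragflow/.venv` inside the RAGFlow container, since that is the PyTorch build shown in the traceback. If the last line prints `False`, the CUDA Toolkit version installed on the Windows host is unlikely to be the deciding factor, and a PyTorch build that lists the GPU's architecture would be needed.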
Expected behavior
No response
Steps to reproduce
Set up the environment described above and parse a document; the error appears when the embedding model is bound.
Additional information
No response