Open
Description
Self Checks
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (Language Policy).
- Non-English title submissions will be closed directly (Language Policy).
- Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
I don't know
RAGFlow image version
v0.17.2-slim
Other environment information
OS: Ubuntu 20.04
Actual behavior
When I selected a PDF paper and used the OpenAI API embedding model to build my knowledge base, the following error occurred:
TypeError("'<' not supported between instances of 'NoneType' and 'int'")
Expected behavior
Embedding the paper should complete without errors.
Steps to reproduce
The error occurs only with individual documents, not with every upload.
Additional information
No response
Activity
KevinHuSh commented on Mar 24, 2025
Could you paste the backend error logs here?
chenjh356 commented on Mar 24, 2025
2025-03-24 11:03:38,363 INFO 48 172.17.0.6 - - [24/Mar/2025 11:03:38] "GET /v1/document/list?kb_id=f7d2c9ca085b11f09fa30242ac110006&keywords=&page_size=10&page=1 HTTP/1.1" 200 -
2025-03-24 11:03:44,785 INFO 49 task_consumer_0 reported heartbeat: {"name": "task_consumer_0", "now": "2025-03-24T11:03:44.784+08:00", "boot_at": "2025-03-22T15:00:47.022+08:00", "pending": 0, "lag": 0, "done": 249, "failed": 0, "current": {}}
2025-03-24 11:03:52,008 ERROR 48 Fail to get f7d2c9ca085b11f09fa30242ac110006/Thehost genetics in shaping intergenerational microbiomes.pdf
Traceback (most recent call last):
  File "/ragflow/rag/utils/minio_conn.py", line 88, in get
    r = self.conn.get_object(bucket, filename)
  File "/ragflow/.venv/lib/python3.10/site-packages/minio/api.py", line 1244, in get_object
    return self._execute(
  File "/ragflow/.venv/lib/python3.10/site-packages/minio/api.py", line 440, in _execute
    return self._url_open(
  File "/ragflow/.venv/lib/python3.10/site-packages/minio/api.py", line 423, in _url_open
    raise response_error
minio.error.S3Error: S3 operation failed; code: NoSuchKey, message: The specified key does not exist., resource: /f7d2c9ca085b11f09fa30242ac110006/Thehost genetics in shaping intergenerational microbiomes.pdf, request_id: 182F9E1DE27FD23C, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: f7d2c9ca085b11f09fa30242ac110006, object_name: Thehost genetics in shaping intergenerational microbiomes.pdf
2025-03-24 11:03:53,010 ERROR 48 total_page_number
Traceback (most recent call last):
  File "/ragflow/deepdoc/parser/pdf_parser.py", line 958, in total_page_number
    pdf = pdfplumber.open(
  File "/ragflow/.venv/lib/python3.10/site-packages/pdfplumber/pdf.py", line 86, in open
    stream = open(path_or_fp, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'Thehost genetics in shaping intergenerational microbiomes.pdf'
2025-03-24 11:03:53,011 ERROR 48 '<' not supported between instances of 'NoneType' and 'int'
Traceback (most recent call last):
  File "/ragflow/api/apps/document_app.py", line 383, in run
    queue_tasks(doc, bucket, name)
  File "<@beartype(api.db.services.task_service.queue_tasks) at 0x77c5377aa950>", line 69, in queue_tasks
  File "/ragflow/api/db/services/task_service.py", line 223, in queue_tasks
    e = min(e - 1, pages)
TypeError: '<' not supported between instances of 'NoneType' and 'int'
2025-03-24 11:03:53,013 INFO 48 172.17.0.6 - - [24/Mar/2025 11:03:53] "POST /v1/document/run HTTP/1.1" 200 -
2025-03-24 11:03:53,431 INFO 48 172.17.0.6 - - [24/Mar/2025 11:03:53] "GET /v1/document/list?kb_id=f7d2c9ca085b11f09fa30242ac110006&keywords=&page_size=10&page=1 HTTP/1.1" 200 -
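Reading the three tracebacks together, the TypeError looks like a follow-on failure: the PDF could not be fetched from storage, the page count could not be determined, and `min(e - 1, pages)` then compared an integer with `None`. The sketch below reproduces that comparison and shows one way a missing page count could be rejected early. It is only an illustration under those assumptions: `clamp_page_range` and `unguarded_error_message` are hypothetical names and not RAGFlow's actual code.

```python
from typing import Optional, Tuple


def clamp_page_range(start: int, end: int, pages: Optional[int]) -> Tuple[int, int]:
    """Hypothetical helper (not RAGFlow's code): clamp a page range to the document length."""
    if pages is None:
        # The page count is missing, e.g. because the PDF could not be read from storage.
        raise ValueError("page count unavailable: the PDF could not be read")
    return max(start, 0), min(end - 1, pages)


def unguarded_error_message() -> str:
    """Reproduce the comparison from the traceback when the page count is None."""
    try:
        min(10 - 1, None)  # same shape as min(e - 1, pages) with pages = None
    except TypeError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(clamp_page_range(0, 13, 100))
    print(unguarded_error_message())
```

With a guard of this kind, the failure would surface as a message about the unreadable file instead of a comparison error further down the call chain.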
chenjh356 commented on Mar 24, 2025
It seems to be a MinIO problem?
S3 operation failed; code: XMinioStorageFull, message: Storage backend has reached its minimum free drive threshold. Please delete a few objects to proceed., resource: /txtxtxtxt1/txtxtxtxt1, request_id: 182F9E5CA4173159, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: txtxtxtxt1, object_name: txtxtxtxt1
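The XMinioStorageFull message says the storage backend is below its minimum free-space threshold, which would also explain why the uploaded PDF was never stored. A quick way to check how much space is left on the drive that holds the MinIO data is sketched below. The 10% default and the path passed in are illustrative assumptions, not MinIO's actual threshold or RAGFlow's volume layout, and `free_fraction` and `has_headroom` are hypothetical helper names.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the fraction of free space on the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_headroom(path: str, min_free_fraction: float = 0.10) -> bool:
    """Check whether free space is at or above an illustrative threshold (assumed 10%)."""
    return free_fraction(path) >= min_free_fraction


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory backing the MinIO volume.
    print(f"free: {free_fraction('.'):.1%}, enough headroom: {has_headroom('.')}")
```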
KevinHuSh commented on Mar 24, 2025
Judging from the error stack, the code might be out of date.
Please pull the nightly version of the Docker image.