[Bug]: embedding error #6418

Open
@chenjh356

Description

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

I don't know

RAGFlow image version

v0.17.2-slim

Other environment information

OS: Ubuntu 20.04

Actual behavior

When I selected a PDF paper and used the OpenAI API embedding model to build my knowledge base, the following error occurred:
TypeError("'<' not supported between instances of 'NoneType' and 'int'")

Expected behavior

Embedding the paper should complete without errors.

Steps to reproduce

The error occurs only with certain individual documents.

Additional information

No response

Activity

KevinHuSh commented on Mar 24, 2025

Collaborator

Could you paste the backend error logs here?

chenjh356 commented on Mar 24, 2025

Author

> Could you paste the backend error logs here?

2025-03-24 11:03:38,363 INFO 48 172.17.0.6 - - [24/Mar/2025 11:03:38] "GET /v1/document/list?kb_id=f7d2c9ca085b11f09fa30242ac110006&keywords=&page_size=10&page=1 HTTP/1.1" 200 -
2025-03-24 11:03:44,785 INFO 49 task_consumer_0 reported heartbeat: {"name": "task_consumer_0", "now": "2025-03-24T11:03:44.784+08:00", "boot_at": "2025-03-22T15:00:47.022+08:00", "pending": 0, "lag": 0, "done": 249, "failed": 0, "current": {}}
2025-03-24 11:03:52,008 ERROR 48 Fail to get f7d2c9ca085b11f09fa30242ac110006/Thehost genetics in shaping intergenerational microbiomes.pdf
Traceback (most recent call last):
  File "/ragflow/rag/utils/minio_conn.py", line 88, in get
    r = self.conn.get_object(bucket, filename)
  File "/ragflow/.venv/lib/python3.10/site-packages/minio/api.py", line 1244, in get_object
    return self._execute(
  File "/ragflow/.venv/lib/python3.10/site-packages/minio/api.py", line 440, in _execute
    return self._url_open(
  File "/ragflow/.venv/lib/python3.10/site-packages/minio/api.py", line 423, in _url_open
    raise response_error
minio.error.S3Error: S3 operation failed; code: NoSuchKey, message: The specified key does not exist., resource: /f7d2c9ca085b11f09fa30242ac110006/Thehost genetics in shaping intergenerational microbiomes.pdf, request_id: 182F9E1DE27FD23C, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: f7d2c9ca085b11f09fa30242ac110006, object_name: Thehost genetics in shaping intergenerational microbiomes.pdf
2025-03-24 11:03:53,010 ERROR 48 total_page_number
Traceback (most recent call last):
  File "/ragflow/deepdoc/parser/pdf_parser.py", line 958, in total_page_number
    pdf = pdfplumber.open(
  File "/ragflow/.venv/lib/python3.10/site-packages/pdfplumber/pdf.py", line 86, in open
    stream = open(path_or_fp, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'Thehost genetics in shaping intergenerational microbiomes.pdf'
2025-03-24 11:03:53,011 ERROR 48 '<' not supported between instances of 'NoneType' and 'int'
Traceback (most recent call last):
  File "/ragflow/api/apps/document_app.py", line 383, in run
    queue_tasks(doc, bucket, name)
  File "<@beartype(api.db.services.task_service.queue_tasks) at 0x77c5377aa950>", line 69, in queue_tasks
  File "/ragflow/api/db/services/task_service.py", line 223, in queue_tasks
    e = min(e - 1, pages)
TypeError: '<' not supported between instances of 'NoneType' and 'int'
2025-03-24 11:03:53,013 INFO 48 172.17.0.6 - - [24/Mar/2025 11:03:53] "POST /v1/document/run HTTP/1.1" 200 -
2025-03-24 11:03:53,431 INFO 48 172.17.0.6 - - [24/Mar/2025 11:03:53] "GET /v1/document/list?kb_id=f7d2c9ca085b11f09fa30242ac110006&keywords=&page_size=10&page=1 HTTP/1.1" 200 -
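
The three errors in this log look like one chain: the file cannot be fetched from MinIO, so the page count cannot be determined, and the `min(e - 1, pages)` call then receives `None` for `pages`. The following is a minimal sketch that reproduces only the final `TypeError` in isolation and shows one possible guard. The function names `clamp_end_page` and `clamp_end_page_checked` are hypothetical and are not taken from RAGFlow's code; only the expression `min(e - 1, pages)` comes from the traceback.

```python
from typing import Optional


def clamp_end_page(end: int, pages: Optional[int]) -> int:
    """Hypothetical stand-in for the failing expression `e = min(e - 1, pages)`."""
    # When pages is None, min() evaluates `None < int` and raises TypeError.
    return min(end - 1, pages)


def clamp_end_page_checked(end: int, pages: Optional[int], doc_name: str) -> int:
    """Same clamp, but fails early with a message that names the real cause."""
    if pages is None:
        # Assumption: a missing page count means the file could not be read.
        raise FileNotFoundError(
            f"cannot count pages: {doc_name!r} could not be read from storage"
        )
    return min(end - 1, pages)


# Reproduce the error message reported in the issue.
try:
    clamp_end_page(12, None)
except TypeError as exc:
    message = str(exc)
    print(message)
```

With a guard of this kind, the user-facing error would point at the missing file rather than at a comparison between `None` and an integer.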

chenjh356 commented on Mar 24, 2025

Author

It seems to be a MinIO problem?
S3 operation failed; code: XMinioStorageFull, message: Storage backend has reached its minimum free drive threshold. Please delete a few objects to proceed., resource: /txtxtxtxt1/txtxtxtxt1, request_id: 182F9E5CA4173159, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: txtxtxtxt1, object_name: txtxtxtxt1

KevinHuSh commented on Mar 24, 2025

Collaborator

Judging from the error stack, your code might be out of date.
Please pull the nightly version of the Docker image.
