[Bug]: pdf ocr failed with msg: Pdf object has no attribute page_images #5816

Open
@l0rraine

Description

Is there an existing issue for the same bug?

  • I have checked the existing issues.

RAGFlow workspace code commit ID

2b09df8

RAGFlow image version

0.17.0

Other environment information

Actual behavior

Parsing a PDF reports the error below. It looks like a coroutine error. After I added `await` and `async` to the related functions, the warning disappeared but the error persists.

2025-03-10 09:30:35,301 WARNING  253312 /home/deepseek/ragflow-master/rag/app/naive.py:138: RuntimeWarning: coroutine 'RAGFlowPdfParser.__images__' was never awaited
  self.__images__(

2025-03-10 09:30:35,314 INFO     253312 set_progress(41997d96fd4f11ef917f89813de1e997), progress: None, progress_msg: 09:30:35 Page(1~13): OCR finished (0.08s)
2025-03-10 09:30:35,406 INFO     253312 OCR(0~12): 0.19s
2025-03-10 09:30:35,516 INFO     253312 set_progress(41997d96fd4f11ef917f89813de1e997), progress: -1, progress_msg: 09:30:35 Page(1~13): [ERROR]Internal server error while chunking: Pdf object has no attribute page_images
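
The warning and the error in the log above may be related: calling an `async` method without `await` only creates a coroutine object, so any attribute the method would have set never gets assigned. The following is a minimal sketch of that mechanism, not RAGFlow's code. `DemoParser`, `load_images`, and the three helper functions are hypothetical names used only for illustration.

```python
import asyncio


class DemoParser:
    """Hypothetical stand-in for a parser; not RAGFlow's actual class."""

    async def load_images(self):
        # The attribute is only assigned when the coroutine body actually runs.
        self.page_images = ["page-1", "page-2"]


def call_without_await():
    parser = DemoParser()
    # This only creates a coroutine object; the body never runs, and Python
    # emits "RuntimeWarning: coroutine ... was never awaited".
    parser.load_images()
    return hasattr(parser, "page_images")


async def call_with_await():
    parser = DemoParser()
    await parser.load_images()  # the body runs, so the attribute is set
    return hasattr(parser, "page_images")


def call_from_sync_code():
    parser = DemoParser()
    # A synchronous caller can drive the coroutine to completion instead.
    asyncio.run(parser.load_images())
    return hasattr(parser, "page_images")
```

In this sketch, `call_without_await()` leaves `page_images` unset, while the other two helpers set it. If the same thing is happening here, every caller in the chain has to await the coroutine (or run it on an event loop). Marking one function `async` without awaiting it further up just moves the unawaited call up a level.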

Expected behavior

No response

Steps to reproduce

Occurs with any PDF that requires OCR.

Additional information

No response

Labels

🐞 bug: Something isn't working
