Dify doesn't seem to support parsing image-only PDF files #11063
Comments
Hey @Tuyohai! I'm here to assist you with any bugs, questions, or contributions for Dify. Let's tackle this issue together!

I found a similar unsolved discussion about Dify's ability to parse images in PDFs and embed them into vector databases [1]. There are currently no existing solutions or built-in OCR capabilities in Dify for extracting text from images within PDFs. The current PDF processing in Dify uses the …

To extend Dify to support OCR for image-only PDFs, you might need to integrate an OCR library such as Tesseract, or a similar tool that can extract text from images.
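One way such an integration could be structured is a per-page fallback: use the PDF's embedded text layer when it has content, and send the page to OCR only when it is empty or nearly empty (as with scanned, image-only pages). The sketch below is a hypothetical helper, not Dify's code; `extract_pdf_text`, `extract_text`, `ocr_page` and the `min_chars` threshold are all illustrative assumptions. In practice `extract_text` might wrap a PDF library's text-layer extraction, and `ocr_page` might render the page to an image and pass it to Tesseract (for example via `pytesseract.image_to_string`).

```python
from typing import Callable, Iterable, List, TypeVar

Page = TypeVar("Page")


def extract_pdf_text(
    pages: Iterable[Page],
    extract_text: Callable[[Page], str],
    ocr_page: Callable[[Page], str],
    min_chars: int = 20,
) -> List[str]:
    """Hypothetical helper: return one text string per page.

    Uses the embedded text layer when it has at least `min_chars`
    characters; otherwise falls back to the OCR callable.
    """
    results: List[str] = []
    for page in pages:
        text = extract_text(page).strip()
        if len(text) < min_chars:
            # Little or no embedded text: the page is probably a scanned
            # image, so hand it to the OCR backend instead.
            text = ocr_page(page).strip()
        results.append(text)
    return results


# Usage with stub pages standing in for real PDF pages (assumption):
# each dict holds the text-layer content and what OCR would return.
pages = [
    {"text": "Invoice 2024: total due 100 EUR", "image_text": ""},
    {"text": "", "image_text": "Scanned contract page"},
]
texts = extract_pdf_text(
    pages,
    extract_text=lambda p: p["text"],
    ocr_page=lambda p: p["image_text"],
)
print(texts)
```

Falling back per page rather than per document means a PDF that mixes digital pages and scanned pages only pays the OCR cost for the scanned ones.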
Could OCR support be added, either by deploying an OCR model or by using a third-party service such as an LLM service?
#10829: we have a tool pending review.
Hi, @Tuyohai. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.
Issue Summary:
Next Steps:
Thank you for your understanding and contribution!
Self Checks
Dify version
0.11.2
Cloud or Self Hosted
Cloud
Steps to reproduce
Upload an image-only PDF to the knowledge base.
✔️ Expected Behavior
The text contained in the images is extracted.
❌ Actual Behavior
The parsed result is empty.