
[Bug]: RAGFLOW almost perfect … but it CRASHES on GPU #5521

Open
@jlostanau

Is there an existing issue for the same bug?

  • I have checked the existing issues.

RAGFlow workspace code commit ID

null

RAGFlow image version

RAGFlow v0.16.0

Other environment information

CPU: 4
Memory: 16GB
GPU: T4 with 16GB VRAM

Actual behavior

First of all, I want to thank everyone involved in this project, which I believe has a lot of potential and could become a reference for open-source enterprise RAG solutions.

I have been running several tests with RAGFlow because I intend to deploy it in a production environment for a company. In this first phase of testing, I am focusing on document loading and the time it takes to process documents.

For reference, Google's NotebookLM loads documents and extracts data from PDFs almost immediately, which is understandable given the infrastructure behind it: thousands of GPUs doing this work in seconds. That shapes users' expectations of how long loading and parsing a document should take. Unfortunately, reaching that level with a self-hosted solution like RAGFlow is difficult without spending large amounts of money on rented GPUs.

The point, however, is that RAGFlow can reach a loading speed that users find reasonable, given the hardware available to us. For these tests, I have been using an AWS g4dn.xlarge instance with the following hardware:

CPU: 4
Memory: 16GB
GPU: T4 with 16GB VRAM

For the loading tests, I used a 73-page, 17 MB PDF of legal regulations.

RAGFlow GPU performance in RAGFlow v0.16

During loading, the logs show GPU-usage messages from several functions, which confirms that the GPU is actually being used.

Image

GPU memory consumption is also stable, with no peaks, holding at a constant 2360 MB.

Image

For this file, the entire extraction process took 20 minutes and 36 seconds, which I consider too long as a user. With a queue of similarly sized documents, the wait would be considerable, especially since the GPU is underused: only 2360 MB of the available 15360 MB is consumed.
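To put the underutilization in numbers, here is a minimal Python sketch that computes the share of GPU memory in use from the two figures reported above (2360 MB used out of 15360 MB available, as shown in the screenshots). It only does the arithmetic; it does not query the GPU.

```python
def vram_utilization(used_mb: float, total_mb: float) -> float:
    """Return GPU memory in use as a percentage of the total."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return 100.0 * used_mb / total_mb


# Figures observed while parsing with RAGFlow v0.16 on the T4
used_mb = 2360
total_mb = 15360

print(f"{vram_utilization(used_mb, total_mb):.1f}% of VRAM in use")
# prints "15.4% of VRAM in use"
```

In other words, roughly 85% of the card's memory sat idle during the 20-minute parse.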

Image

RAGFlow GPU performance in the RAGFlow nightly build as of March 1, 2025

Using the code as of March 1, I built a GPU image and noticed a very substantial improvement.

This time it makes much more intensive use of the GPU's capacity, with memory usage peaking at 10265 MB.

Image

In the logs, many of the details shown in v0.16 no longer appear.

Image

But the most notable change is the extraction time: the process now takes only 5 minutes and 16 seconds, a 74% reduction that saves 15 minutes and 20 seconds!!
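As a quick check of the numbers above, this Python sketch converts both reported durations to seconds and computes the time saved, the percentage reduction, and the resulting speedup. The only inputs are the two timings from this issue.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds duration to total seconds."""
    return minutes * 60 + seconds


before = to_seconds(20, 36)  # RAGFlow v0.16
after = to_seconds(5, 16)    # nightly build as of March 1, 2025

saved = before - after
reduction_pct = 100.0 * saved / before
speedup = before / after

print(f"Saved {saved // 60} min {saved % 60} s "
      f"({reduction_pct:.0f}% less time, {speedup:.1f}x faster)")
# prints "Saved 15 min 20 s (74% less time, 3.9x faster)"
```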

Image

With this reduction, RAGFlow could become very competitive as an enterprise-level RAG tool, given the modest hardware and GPU being used.

But here comes the problem: after the extractions, when I make a query, the server CRASHES and stops responding.

Image

It also CRASHES when I try to load a new document.

Image

The application logs show only some libraries being loaded before the CRASH.

Image

Image

I have run some tests, and the CRASH occurs in either of these two sequences:

Scenario 1:

  • A small file is loaded
  • Queries are made with the assistant
  • Loading another file is attempted
  • CRASH!!

Scenario 2:

  • A large file is loaded
  • Queries are made with the assistant
  • CRASH!!

From what I have found online, the problem could be caused by pdfplumber or by some other part of the code not releasing memory after parsing documents.

I even manually applied the following fixes, which are still pending approval, but the CRASH still occurs.

Image

I am not a programmer, but I am an enthusiastic RAGFlow user. I hope you will consider looking into this issue; fixing it would help take this tool to the next level, since it could be used professionally in many fields.

Thank you very much.

Expected behavior

No response

Steps to reproduce

On a GPU system:

Scenario 1:

- A small file is loaded
- Queries are made with the assistant
- Loading another file is attempted
- CRASH!!

Scenario 2:

- A large file is loaded
- Queries are made with the assistant
- CRASH!!

Additional information

No response

Metadata

Assignees

No one assigned

    Labels

    🐞 bug: Something isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
