perf: optimize figure parser #7392
@liuzhenghua
@asiroliu It's a simple Word document serving as an operation manual, containing around 90 screenshots. Sorry, I can't provide the file because it contains sensitive company data.
@liuzhenghua
@asiroliu Sorry about my previous response. The document I mentioned belongs to the company and can't be shared. You'll need to create a Microsoft Word document yourself, with some text and 90 or more images.
@asiroliu
@asiroliu My local version is 0.17.2. When the log message "Visual model detected. Attempting to enhance figure extraction" appears, I debugged it and found that the 90 images in the document are sent to the VL model one at a time in a single queue, which is what makes processing so slow. You observed a similar processing time in your test, but that might be due to one or more of the following reasons:
Got it, I'll verify this later per your suggestions.
What problem does this PR solve?
When parsing documents that contain images, the current code calls the VL model sequentially on a single thread, which makes parsing extremely slow (e.g., parsing a Word document with dozens of images takes over 20 minutes).
This PR calls the VL model from multiple threads instead, which brings parsing speed up to an acceptable level.
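The PR's diff is not included here, so the following is only a minimal sketch of the general technique, assuming each VL model request is an independent, I/O-bound call. `describe_figure`, `describe_figures_sequential`, `describe_figures_parallel`, and the default `max_workers=8` are hypothetical names and values chosen for illustration, not the project's actual code; `time.sleep` merely simulates network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def describe_figure(image: bytes) -> str:
    """Hypothetical stand-in for one VL model request.

    A real call would wait on the network; time.sleep simulates that latency.
    """
    time.sleep(0.05)
    return f"description of {len(image)}-byte image"


def describe_figures_sequential(
    images: Sequence[bytes],
    describe: Callable[[bytes], str] = describe_figure,
) -> List[str]:
    # Baseline: one request at a time, so total time grows as n * latency.
    return [describe(img) for img in images]


def describe_figures_parallel(
    images: Sequence[bytes],
    describe: Callable[[bytes], str] = describe_figure,
    max_workers: int = 8,  # hypothetical default; tune to provider limits
) -> List[str]:
    # The requests are I/O-bound, so threads overlap the waiting even with the GIL.
    # executor.map yields results in input order, keeping each description
    # aligned with its figure.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(describe, images))


if __name__ == "__main__":
    figures = [b"x" * (i + 1) for i in range(16)]

    start = time.perf_counter()
    seq = describe_figures_sequential(figures)
    t_seq = time.perf_counter() - start

    start = time.perf_counter()
    par = describe_figures_parallel(figures)
    t_par = time.perf_counter() - start

    assert seq == par  # same descriptions, same order
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

Because `executor.map` preserves input order, the parallel version returns exactly what the sequential loop would, only faster when each call spends most of its time waiting. In a real deployment the worker count would likely need to respect the VL provider's rate limits; that constraint is an assumption here, not something the PR states.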
Type of change