perf: optimize figure parser #7392

liuzhenghua · 2025-04-28T10:26:00Z

What problem does this PR solve?

When parsing documents that contain images, the current code calls the VL model from a single thread, which makes parsing extremely slow (e.g., parsing a Word document with dozens of images takes over 20 minutes).

Calling the VL model from multiple threads brings the parsing time down to an acceptable level.
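
As a rough illustration of the approach described above (not the code in this PR), the sketch below fans blocking per-image calls out over a thread pool. `describe_image` is a hypothetical stand-in for the VL model request, and `max_workers=10` is an assumed value, not one taken from the change.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def describe_image(image: bytes) -> str:
    """Hypothetical stand-in for one blocking VL model call."""
    time.sleep(0.05)  # simulate network/model latency
    return f"description of {len(image)}-byte image"


def describe_all(images, max_workers=10):
    """Describe images concurrently; results keep the input order."""
    if not images:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `images`,
        # even if individual calls finish out of order.
        return list(pool.map(describe_image, images))


if __name__ == "__main__":
    fake_images = [b"x" * n for n in range(1, 21)]
    start = time.perf_counter()
    descriptions = describe_all(fake_images)
    print(len(descriptions), "images in", round(time.perf_counter() - start, 2), "s")
```

Threads suit this case because each call mostly waits on I/O, so the waits overlap. The pool size would need to respect whatever rate limits the model endpoint enforces.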

Type of change

Performance Improvement

asiroliu · 2025-04-29T11:11:20Z

@liuzhenghua
Could you provide a test Word document?

liuzhenghua · 2025-04-29T13:49:46Z

> @liuzhenghua Could you provide a test Word document?

@asiroliu It’s a simple Word document serving as an operation manual, containing around 90 screenshots. Sorry, I can’t provide the file as it contains sensitive company data.

asiroliu · 2025-04-30T03:28:49Z

@liuzhenghua
Thank you for your response. Based on your suggestion, I can use the following search terms on Google to find a comparable document:

filetype:doc OR filetype:docx "User Manual" OR "Instruction Manual" OR "Operation Guide"

liuzhenghua · 2025-04-30T03:49:19Z

> @liuzhenghua Thank you for your response. Based on your suggestion, I can use the following search terms on Google to find the relevant documentation:
> filetype:doc OR filetype:docx "User Manual" OR "Instruction Manual" OR "Operation Guide"

@asiroliu Sorry for my previous response — the document I mentioned belongs to the company and can't be shared. You'll need to create a Microsoft Word document yourself, including some text and around 90+ images.

asiroliu · 2025-04-30T04:00:05Z

@liuzhenghua
I've compared the multi-image document parsing performance between the nightly build and your latest commit. There doesn't appear to be any noticeable efficiency improvement.

Test Word document: https://irtfweb.ifa.hawaii.edu/~tcs3/oldstuff/osprey/userman.doc
nightly (78b00d61fd59, 2025_04_29): 207 secs
your latest commit (3281d47): 215 secs

liuzhenghua · 2025-04-30T06:52:36Z

> https://irtfweb.ifa.hawaii.edu/~tcs3/oldstuff/osprey/userman.doc

@asiroliu
The test document you used didn't contain any images. The one I tested had around 90+ images. Before the optimization, it took 20 minutes to parse the images and another 20 minutes to upload them to MinIO. After the changes, both steps now only take 2 minutes each.

liuzhenghua · 2025-04-30T07:01:18Z

@asiroliu My local version is 0.17.2. When the log message "Visual model detected. Attempting to enhance figure extraction" appears, I debugged and found that it processes the 90 images in the document by calling the VL model one by one in a single queue, which leads to a long processing time.

You observed a similar processing time in your test, but that might be due to one or more of the following reasons:

- Your document contains very few images.
- You haven't configured a VL model.
- Version 0.18.0 doesn't have this issue.
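
To rule out the first reason before benchmarking, one possible check is to count the images embedded in the test file. The sketch below assumes a `.docx` file, which is a ZIP archive that normally stores embedded media under `word/media/`. It does not work on legacy `.doc` files, which are not ZIP archives. The helper name `count_docx_images` is hypothetical.

```python
import zipfile


def count_docx_images(path):
    """Count files stored under word/media/ in a .docx archive."""
    with zipfile.ZipFile(path) as zf:
        return sum(
            1
            for name in zf.namelist()
            # Skip directory entries; count only actual media files.
            if name.startswith("word/media/") and not name.endswith("/")
        )
```

A result near zero would suggest the document is not a useful benchmark for the figure parser.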

asiroliu · 2025-04-30T07:08:32Z

Got it, I'll verify this later per your suggestions.

Modify the figure parser to use multithreading

5d1bf50

dosubot bot added the size:S label (this PR changes 10-29 lines, ignoring generated files) Apr 28, 2025

yingfeng added the ci label (Continuous Integration) Apr 28, 2025

perf: modify the minio to use multithreading

5973469

dosubot bot added the size:M label (this PR changes 30-99 lines, ignoring generated files) and removed the size:S label Apr 28, 2025

fix: resolve format issue

3281d47

KevinHuSh requested a review from asiroliu April 29, 2025 02:28
