Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fix & refactor & docs: update ocr logic and installation guides (#88)
* refactor(extract_pdf): When converting a PDF to a list of images, do not perform a BGR channel conversion upfront.
* feat(self_modify): refine text and formula detection box updating logic

  Update the logic for merging and refining detection boxes in the self_modify module. Replace hardcoded checks with dynamic calculations for determining overlapping regions, resulting in more accurate detection box merging when formulae are identified within texts.
* fix(pdf_extract): optimize batch size and worker count for DataLoader

  Reduce the batch size from 128 to 64 and set the number of workers to 0 in the DataLoader to improve stability and performance on systems with limited resources.

  refactor(pdf_extract): refactor ocr and table recognition logic

  Refactor the OCR and table recognition logic to enhance readability and maintainability. This includes adjusting formula recognition coordinates relative to the cropped image and streamlining the handling of OCR results and table recognition.
* refactor(pdf_extract): optimize image processing and table recognition
  - Rename loop variable 'idx' to 'pdf_idx' for clarity.
  - Adjust image pasting and coordinate handling during OCR processing.
  - Add comments for improved code understanding.
  - Ensure proper rendering of images during PDF visualization.
  - Refactor logging and utility imports in self_modify module.

  The changes include improvements to image processing routines, better variable naming, and streamlined table recognition logic. The visualization process has also been tweaked to handle images more accurately. Additionally, redundant logging and utility imports have been cleaned up in the self_modify module to declutter the codebase.
* refactor(pdf_extract): remove hardcoded paste values in crop_img function

  The crop_img function now accepts `crop_paste_x` and `crop_paste_y` as parameters instead of using hardcoded values. This makes the function more flexible and easier to adjust for different use cases.
* fix(extract_pdf): prevent overscaling of large images

  Adjust the condition so that images are not enlarged beyond a width or height of 9000 pixels. This avoids unnecessary resource consumption and potential performance issues when processing scaled images.
* docs: update installation guides and requirements
  - Update the installation guides for macOS and Windows with new commands and simplified dependency installation.
  - Add a new installation guide for Linux.
  - Modify requirements for CPU and GPU environments, including updates to `unimernet`, `matplotlib`, and `paddlepaddle`.
  - Provide precompiled wheels for `detectron2` in the installation process.
* docs(windows_en): update config guidance for Windows
* Update func description in self_modify.py
* Change parameter name in pdf_extract.py, update padding size in OCR
* Update some instructions in Install_in_Windows_en.md
* Update some instructions in Install_in_Windows_zh_cn.md
* Update README.md
* Update README-zh_CN.md

---------

Co-authored-by: Fan Wu <34300920+wufan-tb@users.noreply.github.com>
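The self_modify change replaces hardcoded checks with computed overlap between text and formula boxes. The commit does not spell out the exact rule, so the following is only one plausible version, assuming boxes are `(xmin, ymin, xmax, ymax)` tuples and using a hypothetical 0.5 containment threshold: measure what fraction of a formula box lies inside a text box, and expand the text box to cover the formula when that fraction is high enough. `intersection_area`, `overlap_ratio`, and `merge_formula_into_text` are illustrative names, not functions from the repository.

```python
def intersection_area(a, b):
    """Area of the overlap between two (xmin, ymin, xmax, ymax) boxes, 0 if disjoint."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, width) * max(0, height)


def overlap_ratio(inner, outer):
    """Fraction of `inner`'s area that lies inside `outer` (0.0 for a degenerate box)."""
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    if area <= 0:
        return 0.0
    return intersection_area(inner, outer) / area


def merge_formula_into_text(text_box, formula_box, threshold=0.5):
    """Expand text_box to the union with formula_box if enough of the formula overlaps it.

    Hypothetical helper: the 0.5 threshold is an assumed default, not a value
    taken from the source.
    """
    if overlap_ratio(formula_box, text_box) >= threshold:
        return (
            min(text_box[0], formula_box[0]),
            min(text_box[1], formula_box[1]),
            max(text_box[2], formula_box[2]),
            max(text_box[3], formula_box[3]),
        )
    return text_box


if __name__ == "__main__":
    text = (0, 0, 100, 20)
    print(merge_formula_into_text(text, (90, 0, 120, 20)))  # (0, 0, 100, 20): only 1/3 overlaps
    print(merge_formula_into_text(text, (80, 0, 110, 20)))  # (0, 0, 110, 20): 2/3 overlaps
```

Measuring the ratio against the formula box rather than the text box keeps a small inline formula from being ignored merely because it covers a tiny share of a long text line.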
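The crop_img change and the coordinate adjustment described in the commit amount to bookkeeping between two frames: the page, and a padded canvas onto which a cropped region is pasted at offset (`crop_paste_x`, `crop_paste_y`). The sketch below shows only that arithmetic, with no image library involved. `CropInfo`, `plan_crop`, `to_canvas_coords`, and `to_page_coords` are hypothetical names rather than the repository's API, and the assumption that the same padding is added on both sides of each axis is made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropInfo:
    """Geometry of a page region pasted onto a padded canvas (hypothetical)."""
    xmin: int
    ymin: int
    paste_x: int
    paste_y: int
    canvas_width: int
    canvas_height: int


def plan_crop(box, crop_paste_x=0, crop_paste_y=0):
    """Compute the padded canvas for box = (xmin, ymin, xmax, ymax) in page pixels.

    Assumption: the same padding is added on both sides of each axis.
    """
    xmin, ymin, xmax, ymax = (int(v) for v in box)
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("box must have positive width and height")
    return CropInfo(
        xmin=xmin,
        ymin=ymin,
        paste_x=crop_paste_x,
        paste_y=crop_paste_y,
        canvas_width=(xmax - xmin) + 2 * crop_paste_x,
        canvas_height=(ymax - ymin) + 2 * crop_paste_y,
    )


def to_canvas_coords(info, x, y):
    """Page point -> canvas point, e.g. to express a formula box relative to the crop."""
    return x - info.xmin + info.paste_x, y - info.ymin + info.paste_y


def to_page_coords(info, x, y):
    """Canvas point -> page point, e.g. to place an OCR result back on the page."""
    return x - info.paste_x + info.xmin, y - info.paste_y + info.ymin


if __name__ == "__main__":
    info = plan_crop((100, 200, 300, 250), crop_paste_x=50, crop_paste_y=50)
    print(info.canvas_width, info.canvas_height)  # 300 150
    print(to_canvas_coords(info, 150, 220))       # (100, 70)
    print(to_page_coords(info, 100, 70))          # (150, 220)
```

Because both mappings read the offsets from the same record, changing the padding size only touches the call to `plan_crop`, and the round trip between frames stays consistent.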
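The overscaling fix in extract_pdf can be read as a cap checked before a page is rasterized. A minimal sketch, assuming page sizes are given in PDF points (72 points per inch) and assuming that a page which would exceed the limit falls back to a scale of 1.0 instead of being downscaled to fit. Only the 9000-pixel limit comes from the commit; `choose_render_scale` and its 200 dpi default are hypothetical.

```python
def choose_render_scale(page_width_pt, page_height_pt, dpi=200, max_side=9000):
    """Return the zoom factor for rasterizing a PDF page (hypothetical helper).

    PDF user space uses 72 points per inch, so rendering at `dpi` means
    scaling by dpi / 72. If that would push either side past `max_side`
    pixels, fall back to 1.0 (assumed policy) so the page is not enlarged.
    """
    scale = dpi / 72
    width_px = page_width_pt * scale
    height_px = page_height_pt * scale
    if width_px > max_side or height_px > max_side:
        return 1.0
    return scale


if __name__ == "__main__":
    # A4 page (595 x 842 pt): about 1653 x 2339 px at 200 dpi, well under the cap.
    print(choose_render_scale(595, 842))
    # A 4000 x 3000 pt poster would be about 11111 px wide, so it is not enlarged.
    print(choose_render_scale(4000, 3000))  # 1.0
```

Note that the fallback only prevents enlargement: a page that is already wider than 9000 points would still render above the limit at scale 1.0.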
Showing 13 changed files with 239 additions and 145 deletions.