fix & refactor & docs:update ocr logic and installation guides #88

Merged
merged 18 commits into from
Aug 14, 2024

Conversation

myhloli
Collaborator

@myhloli myhloli commented Aug 12, 2024

  • refactor(extract_pdf): When converting a PDF to a list of images, don’t perform a BGR channel conversion upfront.
  • refactor(self_modify): refine text and formula detection box updating logic
  • refactor(pdf_extract): optimize image processing and table recognition
  • refactor(pdf_extract): remove hardcoded paste values in crop_img function
  • fix(extract_pdf): prevent overscaling of large images
  • fix(pdf_extract): optimize batch size and worker count for DataLoader
  • docs: update installation guides and requirements

myhloli and others added 18 commits August 7, 2024 18:54
…not perform a BGR channel conversion upfront.
Update the logic for merging and refining detection boxes in self_modify module.
Replace hardcoded checks with dynamic calculations for determining overlapping regions,
resulting in more accurate detection box merging when formulae are identified within texts.
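
As a rough illustration of the kind of overlap calculation described above (not the code in the self_modify module), the sketch below cuts the horizontal extent of each overlapping formula box out of a text box. The names `Box`, `vertical_overlap_ratio`, and `split_text_box`, and the `min_ratio` threshold of 0.5, are assumptions made for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), hypothetical layout


def vertical_overlap_ratio(text: Box, formula: Box) -> float:
    """Fraction of the formula's height that lies inside the text box's vertical span."""
    top = max(text[1], formula[1])
    bottom = min(text[3], formula[3])
    height = formula[3] - formula[1]
    if height <= 0:
        return 0.0
    return max(0.0, bottom - top) / height


def split_text_box(text: Box, formulas: List[Box], min_ratio: float = 0.5) -> List[Box]:
    """Return the fragments of a text box left after removing overlapping formulas."""
    # Keep only formulas that overlap the text line both vertically and horizontally.
    hits = sorted(
        f for f in formulas
        if vertical_overlap_ratio(text, f) >= min_ratio
        and f[0] < text[2]
        and f[2] > text[0]
    )
    pieces: List[Box] = []
    cursor = text[0]  # left edge of the next candidate text fragment
    for f in hits:
        if f[0] > cursor:
            pieces.append((cursor, text[1], f[0], text[3]))
        cursor = max(cursor, f[2])
    if cursor < text[2]:
        pieces.append((cursor, text[1], text[2], text[3]))
    return pieces
```

With a text box `(0, 0, 100, 10)` and a formula at `(40, 1, 60, 9)`, this yields one fragment on each side of the formula. The thresholds are computed from the boxes themselves rather than fixed pixel values, which is the general idea the commit message points at.
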
Reduce the batch size from 128 to 64 and set the number of workers to 0 in the DataLoader to improve stability and performance on systems with limited resources.

refactor(pdf_extract): refactor ocr and table recognition logic

Refactor the OCR and table recognition logic to enhance readability and maintainability. This includes the adjustment of formula recognition coordinates relative to the cropped
image and streamlining the process for handling OCR results and table recognition.

- Rename loop variable 'idx' to 'pdf_idx' for clarity.
- Adjust image pasting and coordinate handling during OCR processing.
- Add comments for improved code understanding.
- Ensure proper rendering of images during PDF visualization.
- Refactor logging and utility imports in self_modify module.

The changes include improvements to image processing routines, better variable naming,
and streamlined table recognition logic. Also, the visualization process has been tweaked
to handle images more accurately. Additionally, redundant logging and utility imports have been cleaned up in the self_modify module to declutter the codebase.
refactor ocr and table recognition logic
…tion

The crop_img function now accepts `crop_paste_x` and `crop_paste_y` as parameters
instead of using hardcoded values. This change makes the function more flexible and easier to adjust for different use cases.
refactor(pdf_extract): remove hardcoded paste values in crop_img function
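
To make the effect of the two parameters concrete, here is a minimal geometry-only sketch, assuming the crop is pasted onto a larger blank canvas at offset (`crop_paste_x`, `crop_paste_y`) with the same margin on every side. `CropPlan`, `plan_crop`, and `to_page_coords` are hypothetical helpers written for this example; only the parameter names come from the commit message, and the real `crop_img` may differ.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class CropPlan:
    crop_box: Tuple[int, int, int, int]  # region cut from the page (xmin, ymin, xmax, ymax)
    canvas_size: Tuple[int, int]         # (width, height) of the padded canvas
    paste_at: Tuple[int, int]            # top-left corner of the crop on the canvas


def plan_crop(box: Sequence[float], crop_paste_x: int = 0, crop_paste_y: int = 0) -> CropPlan:
    """Compute the canvas size and paste position for a padded crop (assumed symmetric margins)."""
    xmin, ymin, xmax, ymax = (int(v) for v in box)
    width = (xmax - xmin) + 2 * crop_paste_x
    height = (ymax - ymin) + 2 * crop_paste_y
    return CropPlan((xmin, ymin, xmax, ymax), (width, height), (crop_paste_x, crop_paste_y))


def to_page_coords(plan: CropPlan, x: float, y: float) -> Tuple[float, float]:
    """Map a point found on the padded canvas back to page coordinates."""
    paste_x, paste_y = plan.paste_at
    return (x - paste_x + plan.crop_box[0], y - paste_y + plan.crop_box[1])
```

For example, `plan_crop((100, 200, 300, 260), 50, 50)` describes a 300 by 160 canvas, and a point at `(50, 50)` on that canvas maps back to `(100, 200)` on the page. Passing the offsets as arguments means the same inverse mapping works for any margin the caller picks.
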
Adjust the condition so that images are not enlarged beyond a width or
height of 9000 pixels. This avoids unnecessary resource consumption and
potential performance issues when handling scaled images.
fix(extract_pdf): prevent overscaling of large images
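
The size guard can be sketched as a pure function. This is an illustration under stated assumptions, not the project's code: the name `choose_render_scale`, the default of 200 DPI, and the fallback to a scale of 1.0 are all assumed here; only the 9000-pixel limit comes from the commit message.

```python
def choose_render_scale(
    width_pt: float,
    height_pt: float,
    dpi: int = 200,        # assumed default, not taken from the PR
    max_side: int = 9000,  # pixel limit named in the commit message
) -> float:
    """Pick a zoom factor for rasterizing a page without exceeding max_side pixels."""
    scale = dpi / 72  # PDF page sizes are expressed in points, 72 per inch
    if width_pt * scale > max_side or height_pt * scale > max_side:
        # Assumed fallback: render at native size instead of enlarging.
        return 1.0
    return scale
```

An A4 page (595 by 842 points) stays well under the limit and keeps the full `dpi / 72` zoom, while a 4000 by 3000 point page would exceed 9000 pixels at that zoom and falls back to 1.0.
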
- Update the installation guides for macOS and Windows with new commands and simplified dependency installation.
- Add new installation guide for Linux.
- Modify requirements for CPU and GPU environments, including updates to
  `unimernet`, `matplotlib`, and `paddlepaddle`.
- Provide precompiled wheels for `detectron2` in the installation process.
docs: update installation guides and requirements
@wufan-tb wufan-tb merged commit 74a5e17 into dev Aug 14, 2024
@myhloli myhloli deleted the xiaomeng_dev branch August 15, 2024 11:46