Refactor: remove analysis scripts #305

Merged (10 commits) on Jun 25, 2024

Conversation

@christinestraub commented Dec 15, 2023

Summary

This PR is the first part of the "layout analysis" refactor, which moves that code from the unstructured-inference repo to the unstructured repo. It removes the "layout analysis" code from unstructured-inference and works together with the companion refactor PR in unstructured, Unstructured-IO/unstructured#2273.

This PR also adds a few more test cases for layoutelement.py to bring coverage above 95%.

…notate all types of layout elements (extracted, inferred, and final) except OCR'd elements
@christinestraub changed the title from "Refactor/remove analysis scripts" to "Refactor: remove analysis scripts" on Dec 15, 2023

@scanny left a comment

LGTM, just one suggested change but approving in advance :)

unstructured_inference/utils.py (outdated, resolved)
github-merge-queue bot pushed a commit to Unstructured-IO/unstructured that referenced this pull request Dec 19, 2023
### Summary
This PR is the second part of the "layout analysis" refactor, which moves
that code from the unstructured-inference repo to the unstructured repo.
The first part is done in
Unstructured-IO/unstructured-inference#305. This
PR adds logic to support annotating `inferred` and `extracted` elements.

### Testing

```
PYTHONPATH=. python examples/layout-analysis/visualization.py <file_path> <strategy> <document_type>
```
e.g.
```
PYTHONPATH=. python examples/layout-analysis/visualization.py example-docs/layout-parser-paper-fast.pdf hi_res pdf
```
# Conflicts:
#	CHANGELOG.md
#	test_unstructured_inference/test_elements.py
#	unstructured_inference/__version__.py
#	unstructured_inference/utils.py
@christinestraub merged commit e2a6757 into main on Jun 25, 2024
5 checks passed
@christinestraub deleted the refactor/remove-analysis-scripts branch on June 25, 2024 17:40