feat: new partitioning brick that calls the document image analysis API #68

Merged
LaverdeS merged 14 commits into main from CORE-270/partition-pdf on Nov 16, 2022

@LaverdeS LaverdeS commented Nov 13, 2022

Adds a new partitioning brick, partition_pdf, that calls the document image analysis API to segment a PDF.

@MthwRobinson MthwRobinson left a comment

Functionality looks great! I liked the healthcheck too. Just a couple of minor comments on the code. Could you also add documentation for this brick to bricks.rst under docs? You can run make html under the docs folder to regenerate the docs, once you've installed docs/requirements.txt.

CHANGELOG.md (resolved)
unstructured/nlp/partition.py (resolved)
if not token:
    healthcheck_response = requests.get(url=f"{url}healthcheck")

    if healthcheck_response.status_code == 200:
To avoid the nested ifs, can we check for != 200 instead?
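
For illustration, here is a minimal sketch of what checking for != 200 first could look like. It is not the code from this PR: the helper name `check_api_health`, the injected `http_get` callable, the `FakeResponse` stand-in, and the choice to raise `ValueError` on failure are all assumptions made so the example runs without network access. In the real brick, `requests.get` would presumably be passed in or called directly.

```python
from typing import Any, Callable, Optional


class FakeResponse:
    """Stand-in for an HTTP response; only status_code is used (hypothetical)."""

    def __init__(self, status_code: int) -> None:
        self.status_code = status_code


def check_api_health(
    url: str, token: Optional[str], http_get: Callable[..., Any]
) -> None:
    """Fail early if the unauthenticated healthcheck does not return 200.

    Hypothetical helper: the name, signature, and error type are assumptions.
    """
    if token:
        # As in the quoted diff, the healthcheck only runs when no token is set.
        return

    response = http_get(url=f"{url}healthcheck")

    # Guard clause: handle the failure case first, so the success path
    # continues below without an extra level of indentation.
    if response.status_code != 200:
        raise ValueError(
            f"healthcheck returned status {response.status_code}"
        )


# Usage with a fake transport; a real caller might pass requests.get instead.
check_api_health("https://example.invalid/", None, lambda url: FakeResponse(200))
```

The point of the inversion is only structural: the error branch exits immediately, and whatever follows a successful healthcheck sits at the outer indentation level.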

@MthwRobinson MthwRobinson left a comment

One minor doc nit, otherwise LGTM! Could you also add a description to the PR before merging?

@@ -174,6 +174,23 @@ Examples:

# Returns False because the text is more than 1% caps
exceeds_cap_ratio(example_2, threshold=0.01)


``partition_pfd``
Suggested change
``partition_pfd``
``partition_pdf``

``partition_pfd``
---------------------

The ``partition_pdf`` function calls the document image analysis API. The intent of the
Suggested change
The ``partition_pdf`` function calls the document image analysis API. The intent of the
The ``partition_pdf`` function segments a PDF document using the document image analysis API. The intent of the

@LaverdeS LaverdeS merged commit baa15d0 into main Nov 16, 2022
@LaverdeS LaverdeS deleted the CORE-270/partition-pdf branch November 16, 2022 16:48