feat: new partitioning brick that calls the document image analysis API #68
Functionality looks great! I liked the healthcheck too. Just a couple of minor comments on the code. Could you also add documentation for this brick to `bricks.rst` under `docs`? You can run `make html` under the `docs` folder to regenerate the docs, once you've installed `docs/requirements.txt`.
unstructured/nlp/partition.py
Outdated
if not token:
    healthcheck_response = requests.get(url=f"{url}healthcheck")

    if healthcheck_response.status_code == 200:
To avoid the nested `if`s, can we check for `!= 200` instead?
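A minimal sketch of what that inversion could look like, not the code actually merged in this PR. The helper name `healthcheck_error`, the injected `http_get` callable, and the returned error message are all hypothetical and exist only to keep the example self-contained. Only the `if not token:` branch and the `f"{url}healthcheck"` request come from the diff above.

```python
from typing import Callable, Optional


def healthcheck_error(url: str, token: Optional[str], http_get: Callable) -> Optional[str]:
    """Return an error message if the healthcheck fails, else None (hypothetical helper)."""
    if not token:
        healthcheck_response = http_get(url=f"{url}healthcheck")

        # Guard clause: deal with the failure case first and leave early.
        if healthcheck_response.status_code != 200:
            return f"healthcheck returned {healthcheck_response.status_code}"

        # Whatever previously lived inside `if ... == 200:` would continue here,
        # one indentation level shallower.

    return None
```

In the real function, `requests.get` could be passed as `http_get`. How the failure should be reported (a return value, an exception, a log message) is an assumption here and depends on what the rest of the function does.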
One minor doc nit, otherwise LGTM! Could you also add a description to the PR before merging?
docs/source/bricks.rst
Outdated
@@ -174,6 +174,23 @@ Examples:

  # Returns False because the text is more than 1% caps
  exceeds_cap_ratio(example_2, threshold=0.01)


``partition_pfd``
- ``partition_pfd``
+ ``partition_pdf``
docs/source/bricks.rst
Outdated
``partition_pfd``
---------------------

The ``partition_pdf`` function calls the document image analysis API. The intent of the
- The ``partition_pdf`` function calls the document image analysis API. The intent of the
+ The ``partition_pdf`` function segments a PDF document using the document image analysis API. The intent of the
Add a new partitioning brick, `partition_pdf`, that calls the document image analysis API to segment a PDF.