feat: update auto.partition() function to recognize Unstructured json #337
unstructured/partition/json.py
Outdated
if file_text_whitespace_removed[0] == "[" and file_text_whitespace_removed[-1] == "]":
    if len(file_text_whitespace_removed) == 2 or \
            (file_text_whitespace_removed[1] == "{" and file_text_whitespace_removed[-2] == "}"):
This is not a good way to verify if text is indeed parsable as JSON. Consider e.g.:
    try:
        unstructured_json = json.loads(file_text)
    except json.JSONDecodeError:
        raise ...
    return dict_to_elements(unstructured_json)
I originally added these conditions as an optimization, to raise quickly on an invalid input format, since we call json.loads in elements_from_json anyway.
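For anyone following the thread, here is a minimal sketch of the trade-off being discussed. The function names `looks_like_json_list` and `parses_as_json_list` are hypothetical and not part of the PR, and the way whitespace is stripped is an assumption, since the diff does not show how `file_text_whitespace_removed` is built. It only illustrates that a bracket check can accept text that `json.loads` rejects:

```python
import json


def looks_like_json_list(text: str) -> bool:
    """Bracket heuristic in the spirit of the reviewed diff (hypothetical helper).

    Assumption: whitespace is removed by splitting and re-joining the text.
    """
    stripped = "".join(text.split())
    if len(stripped) < 2:
        return False
    if stripped[0] == "[" and stripped[-1] == "]":
        # Either an empty list, or a list that starts and ends with an object.
        return len(stripped) == 2 or (stripped[1] == "{" and stripped[-2] == "}")
    return False


def parses_as_json_list(text: str) -> bool:
    """Validate by actually parsing, as the reviewer suggests (hypothetical helper)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, list)


valid = '[{"text": "hello"}]'
malformed = '[{"text": }]'  # brackets line up, but the object body is invalid

print(looks_like_json_list(valid), parses_as_json_list(valid))
print(looks_like_json_list(malformed), parses_as_json_list(malformed))
```

The heuristic is cheap but passes the malformed input, so the parse would still fail later. Parsing once and catching `json.JSONDecodeError` gives a definite answer in a single step.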
…eat/auto-partition
#275