unstructured/CHANGELOG.md at 3c19c7cd8af3defcc3a12312306f90b4a14c1e7d · siddartha-RE/unstructured · GitHub

0.3.0-dev8

Implement staging brick for Argilla. Converts lists of Text elements to argilla dataset classes.
Removing the local PDF parsing code and any dependencies and tests.
Reorganizes the staging bricks in the unstructured.partition module
Allow entities to be passed into the Datasaur staging brick
Added HTML escapes to the replace_unicode_quotes brick
Fix bad responses in partition_pdf to raise ValueError
Adds partition_html for partitioning HTML documents.

0.2.6

Small change to how _read is placed within the inheritance structure since it doesn't really apply to pdf
Add partitioning brick for calling the document image analysis API

0.2.5

Update python requirement to >=3.7

0.2.4

Add alternative way of importing Final to support google colab

0.2.3

Add cleaning bricks for removing prefixes and postfixes
Add cleaning bricks for extracting text before and after a pattern

0.2.2

Add staging brick for Datasaur

0.2.1

Added brick to convert an ISD dictionary to a list of elements
Update PDFDocument to use the from_file method
Added staging brick for CSV format for ISD (Initial Structured Data) format.
Added staging brick for separating text into attention window size chunks for transformers.
Added staging brick for LabelBox.
Added ability to upload LabelStudio predictions
Added utility function for JSONL reading and writing
Added staging brick for CSV format for Prodigy
Added staging brick for Prodigy
Added ability to upload LabelStudio annotations
Added text_field and id_field to stage_for_label_studio signature

0.2.0

Initial release of unstructured