CHANGELOG.md

0.3.0-dev8

  • Implement staging brick for Argilla. Converts lists of Text elements to Argilla dataset classes.
  • Remove the local PDF parsing code along with its dependencies and tests.
  • Reorganizes the staging bricks in the unstructured.partition module
  • Allow entities to be passed into the Datasaur staging brick
  • Added HTML escapes to the replace_unicode_quotes brick
  • Make partition_pdf raise a ValueError on bad responses
  • Adds partition_html for partitioning HTML documents.

0.2.6

  • Small change to where _read sits in the inheritance structure, since it doesn't really apply to PDF
  • Add partitioning brick for calling the document image analysis API

0.2.5

  • Update python requirement to >=3.7

0.2.4

  • Add alternative way of importing Final to support Google Colab
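
A common way to make a Final import work across environments is a guarded import with a fallback. The sketch below assumes the fallback is typing_extensions, which this changelog does not confirm; MAX_CHUNK_SIZE is a hypothetical constant used only to show the annotation.

```python
# Guarded import: prefer the standard library, fall back to the backport.
# ASSUMPTION: typing_extensions as the fallback is not confirmed by the changelog.
try:
    from typing import Final
except ImportError:  # older interpreters without typing.Final
    from typing_extensions import Final

# Hypothetical constant, shown only to illustrate the annotation.
MAX_CHUNK_SIZE: Final[int] = 512
```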

0.2.3

  • Add cleaning bricks for removing prefixes and postfixes
  • Add cleaning bricks for extracting text before and after a pattern
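
The kind of cleaning these bricks perform can be sketched with the standard re module. The function names and signatures below are hypothetical stand-ins for illustration, not the library's actual API.

```python
import re


def strip_prefix(text: str, pattern: str, ignore_case: bool = False) -> str:
    """Remove a regex-matched prefix from the start of text (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(rf"^(?:{pattern})", "", text, flags=flags).lstrip()


def strip_postfix(text: str, pattern: str, ignore_case: bool = False) -> str:
    """Remove a regex-matched postfix from the end of text (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(rf"(?:{pattern})$", "", text, flags=flags).rstrip()


def text_after(text: str, pattern: str) -> str:
    """Return the text following the first match of pattern (hypothetical helper)."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"Pattern {pattern!r} not found in text.")
    return text[match.end():].lstrip()


def text_before(text: str, pattern: str) -> str:
    """Return the text preceding the first match of pattern (hypothetical helper)."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"Pattern {pattern!r} not found in text.")
    return text[:match.start()].rstrip()
```

For example, strip_prefix("SUMMARY: A great year", r"SUMMARY:") leaves "A great year", and text_before("Here I am! STOP Look at me", r"STOP") leaves "Here I am!".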

0.2.2

  • Add staging brick for Datasaur

0.2.1

  • Added brick to convert an ISD dictionary to a list of elements
  • Update PDFDocument to use the from_file method
  • Added staging brick for CSV output of the ISD (Initial Structured Data) format.
  • Added staging brick for separating text into attention window size chunks for transformers.
  • Added staging brick for LabelBox.
  • Added ability to upload LabelStudio predictions
  • Added utility function for JSONL reading and writing
  • Added staging brick for CSV format for Prodigy
  • Added staging brick for Prodigy
  • Added ability to upload LabelStudio annotations
  • Added text_field and id_field to stage_for_label_studio signature
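
JSONL stores one JSON object per line, so a reading and writing utility like the one listed above can be sketched with the standard json module. The names write_jsonl and read_jsonl are hypothetical and may not match the library's own helpers.

```python
import json
from typing import Any, Dict, List


def write_jsonl(records: List[Dict[str, Any]], path: str) -> None:
    """Write each record as one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```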

0.2.0

  • Initial release of unstructured