
increase coverage #2

@bertsky

  • Confidence (unfortunately, Textract's single Confidence value conflates what PAGE keeps apart as Coords/@conf and TextEquiv/@conf)
  • TextType (HANDWRITING → @production=handwritten-printscript|handwritten-cursive, PRINTED → @production=printed)
  • support tables:
    • top-level TableRegion for TABLE block
    • recursive TextRegion for CELL block (i.e. ColumnIndex → Roles/TableCellRole/@columnIndex, RowIndex → Roles/TableCellRole/@rowIndex)
    • recursive TextRegion for MERGED_CELL block (i.e. ColumnSpan → Roles/TableCellRole/@colSpan, RowSpan → Roles/TableCellRole/@rowSpan) – diverging recursion between Textract and PAGE?
    • recursive TextRegion for TABLE_TITLE and TABLE_FOOTER blocks (i.e. Roles/TableCellRole/@header... or via ReadingOrder)
    • EntityTypes=STRUCTURED_TABLE|SEMI_STRUCTURED_TABLE (unclear how to represent in PAGE), TABLE_TITLE|TABLE_SECTION_TITLE|TABLE_FOOTER|TABLE_SUMMARY|COLUMN_HEADER (unclear how these look and compare with the actual recursive BlockType)?
    • also via ordered groups in ReadingOrder?
    • unclear: LineItemGroup and LineItems
  • PageClassification/PageType (unclear, but probably Page/@type)
  • support forms:
    • BlockType=KEY_VALUE_SET and EntityTypes=KEY|VALUE → unclear how to represent: TableRegion or recursive TextRegion? Labels/Label?
    • register KEY_VALUE_SET
    • represent in PAGE
  • support checkboxes within tables or forms:
    • BlockType=SELECTION_ELEMENT and SelectionStatus=SELECTED|NOT_SELECTED → unclear how to represent
    • register SELECTION_ELEMENT
    • represent in PAGE
  • ignore query type
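A minimal sketch of the Confidence mapping, assuming blocks are the plain dicts of Textract's JSON response: Textract reports Confidence as a percentage (0 to 100), while PAGE @conf is a fraction (0 to 1). Because there is only one value per block, it can only be copied to both Coords/@conf and TextEquiv/@conf. The function names and the `"Coords/@conf"` style keys are hypothetical, not an existing API.

```python
from typing import Optional


def page_conf(textract_confidence: Optional[float]) -> Optional[float]:
    """Map a Textract Confidence (percent, 0..100) to a PAGE @conf (0..1).

    Returns None when the block carries no Confidence, so the caller can
    omit the attribute instead of writing a made-up value.
    """
    if textract_confidence is None:
        return None
    # Clamp to [0, 1] in case of values slightly outside the documented range.
    return min(max(textract_confidence / 100.0, 0.0), 1.0)


def conf_attributes(block: dict) -> dict:
    """Return the same confidence for both PAGE attributes.

    Textract has a single Confidence per block, so geometry (Coords/@conf)
    and text (TextEquiv/@conf) necessarily share it.
    """
    conf = page_conf(block.get("Confidence"))
    if conf is None:
        return {}
    return {"Coords/@conf": conf, "TextEquiv/@conf": conf}


word = {"BlockType": "WORD", "Text": "Invoice", "Confidence": 99.5}
print(conf_attributes(word))
```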
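For TextType, Textract only distinguishes HANDWRITING from PRINTED, so the choice between handwritten-printscript and handwritten-cursive cannot come from the data. The sketch below assumes a hypothetical converter option `handwriting` that picks one of the two, defaulting (arbitrarily) to handwritten-cursive.

```python
from typing import Optional

# The two handwritten PAGE @production values that Textract's HANDWRITING
# could correspond to.
HANDWRITTEN = ("handwritten-cursive", "handwritten-printscript")


def page_production(block: dict,
                    handwriting: str = "handwritten-cursive") -> Optional[str]:
    """Map a Textract TextType to a PAGE @production value.

    `handwriting` is a hypothetical converter option: Textract cannot tell
    cursive from printscript, so the caller has to pick one.
    """
    if handwriting not in HANDWRITTEN:
        raise ValueError(f"unsupported handwriting value: {handwriting!r}")
    text_type = block.get("TextType")
    if text_type == "PRINTED":
        return "printed"
    if text_type == "HANDWRITING":
        return handwriting
    # No (or unknown) TextType: leave @production unset.
    return None


print(page_production({"BlockType": "WORD", "Text": "Dear", "TextType": "HANDWRITING"}))
```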
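One possible way to derive the TableCellRole attributes, sketched under these assumptions: the TABLE block lists its cells under a CHILD relationship and its merged cells under a MERGED_CELL relationship; Textract's RowIndex/ColumnIndex are 1-based while PAGE @rowIndex/@columnIndex are 0-based; and the COLUMN_HEADER entity type is taken as @header. Each MERGED_CELL is emitted as a single cell carrying its span and the CELLs it covers are dropped, which is one untested answer to the diverging-recursion question. `table_cell_roles`, `_role` and the sample blocks are hypothetical.

```python
from typing import Dict, List


def child_ids(block: dict, rel_type: str) -> List[str]:
    """Collect the Ids of all relationships of the given Type."""
    return [i for rel in block.get("Relationships", [])
            if rel["Type"] == rel_type for i in rel["Ids"]]


def _role(cell: dict) -> dict:
    """TableCellRole-like attributes for one CELL or MERGED_CELL block."""
    return {
        "id": cell["Id"],
        "rowIndex": cell["RowIndex"] - 1,        # Textract 1-based -> PAGE 0-based
        "columnIndex": cell["ColumnIndex"] - 1,
        "rowSpan": cell.get("RowSpan", 1),
        "colSpan": cell.get("ColumnSpan", 1),
        "header": "COLUMN_HEADER" in cell.get("EntityTypes", []),
    }


def table_cell_roles(table: dict, blocks_by_id: Dict[str, dict]) -> List[dict]:
    """Flatten a Textract TABLE into one role per PAGE cell, in grid order."""
    roles = []
    covered = set()
    # Merged cells first, remembering which plain cells they absorb.
    for merged_id in child_ids(table, "MERGED_CELL"):
        merged = blocks_by_id[merged_id]
        covered.update(child_ids(merged, "CHILD"))
        roles.append(_role(merged))
    for cell_id in child_ids(table, "CHILD"):
        if cell_id not in covered:
            roles.append(_role(blocks_by_id[cell_id]))
    return sorted(roles, key=lambda r: (r["rowIndex"], r["columnIndex"]))


# Made-up 2x2 table: header row, second row merged into one cell.
blocks = [
    {"Id": "t", "BlockType": "TABLE", "Relationships": [
        {"Type": "CHILD", "Ids": ["c11", "c12", "c21", "c22"]},
        {"Type": "MERGED_CELL", "Ids": ["m2"]}]},
    {"Id": "c11", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 1, "EntityTypes": ["COLUMN_HEADER"]},
    {"Id": "c12", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "RowSpan": 1, "ColumnSpan": 1, "EntityTypes": ["COLUMN_HEADER"]},
    {"Id": "c21", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 1},
    {"Id": "c22", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "RowSpan": 1, "ColumnSpan": 1},
    {"Id": "m2", "BlockType": "MERGED_CELL", "RowIndex": 2, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["c21", "c22"]}]},
]
by_id = {b["Id"]: b for b in blocks}
for role in table_cell_roles(by_id["t"], by_id):
    print(role)
```

Flattening merged cells this way keeps PAGE free of nested cell regions; keeping the Textract nesting instead would need a separate rule for which level carries the span.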
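For forms, pairing a KEY with its VALUE only needs the VALUE relationship on the KEY block; the text of each side comes from its CHILD blocks. This sketch assumes that shape and renders SELECTION_ELEMENT children as `[x]` / `[ ]`, a made-up placeholder convention, since the PAGE representation of checkboxes is still open. It only extracts the pairs; where they end up in PAGE (TableRegion, recursive TextRegion or Labels/Label) is left undecided. All names and sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def ids_of(block: dict, rel_type: str) -> List[str]:
    """Collect the Ids of all relationships of the given Type."""
    return [i for rel in block.get("Relationships", [])
            if rel["Type"] == rel_type for i in rel["Ids"]]


def block_text(block: dict, by_id: Dict[str, dict]) -> str:
    """Join child WORDs; render checkboxes with a placeholder marker."""
    parts = []
    for child_id in ids_of(block, "CHILD"):
        child = by_id[child_id]
        if child["BlockType"] == "WORD":
            parts.append(child["Text"])
        elif child["BlockType"] == "SELECTION_ELEMENT":
            # Placeholder convention only, not anything defined by PAGE.
            parts.append("[x]" if child["SelectionStatus"] == "SELECTED" else "[ ]")
    return " ".join(parts)


def key_value_pairs(blocks: List[dict]) -> List[Tuple[str, str]]:
    """Return (key text, value text) for every KEY_VALUE_SET with EntityTypes KEY."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = []
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue  # VALUE blocks are reached through their KEY
        values = [by_id[i] for i in ids_of(block, "VALUE")]
        value_text = " ".join(block_text(v, by_id) for v in values)
        pairs.append((block_text(block, by_id), value_text))
    return pairs


blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Paid:"},
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
]
print(key_value_pairs(blocks))
```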
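Registering the new block types and ignoring queries could be a simple dispatch table. In this hypothetical sketch, "query type" is assumed to mean the QUERY and QUERY_RESULT block types; both map to `None` and are skipped, while any BlockType without an entry is collected rather than silently dropped, so gaps in coverage become visible. The handlers only return strings standing in for the PAGE elements they would create.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical registry: BlockType -> handler. None means "deliberately ignored".
Handler = Optional[Callable[[dict], str]]

HANDLERS: Dict[str, Handler] = {
    "LINE": lambda b: f"TextLine {b.get('Text', '')!r}",
    "WORD": lambda b: f"Word {b.get('Text', '')!r}",
    "KEY_VALUE_SET": lambda b: "form " + "|".join(b.get("EntityTypes", [])),
    "SELECTION_ELEMENT": lambda b: f"checkbox {b.get('SelectionStatus')}",
    # Query blocks hold questions and their answers, not page layout: skip.
    "QUERY": None,
    "QUERY_RESULT": None,
}


def convert(blocks: List[dict]) -> Tuple[List[str], Set[str]]:
    """Dispatch each block; collect BlockTypes that have no registration."""
    converted: List[str] = []
    unregistered: Set[str] = set()
    for block in blocks:
        block_type = block["BlockType"]
        if block_type not in HANDLERS:
            unregistered.add(block_type)
            continue
        handler = HANDLERS[block_type]
        if handler is not None:
            converted.append(handler(block))
    return converted, unregistered


sample = [
    {"BlockType": "WORD", "Text": "Total"},
    {"BlockType": "QUERY", "Query": {"Text": "What is the total?"}},
    {"BlockType": "SELECTION_ELEMENT", "SelectionStatus": "NOT_SELECTED"},
    {"BlockType": "SIGNATURE"},
]
converted, unregistered = convert(sample)
print(converted, unregistered)
```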
