
first pass get_text method for ALTO XML input #252

@rlskoeser


Update the get_text method of the ALTO FileInput class to return lines of text in document order:

  • for each XML document in the zipfile:
    initialize it as an ALTO xmlobject, then
    • for each text block in the XML document:
      • for each text line in the text block, return its string contents
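The loop described above could be sketched roughly as follows. This is only an illustration: it swaps in the standard library's xml.etree.ElementTree for the xmlobject mapping the issue mentions, so it does not reflect the project's actual FileInput API. The function name iter_alto_lines, the decision to skip non-.xml zip members, and joining a line's String CONTENT values with single spaces are all assumptions.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Iterator


def local_name(tag: str) -> str:
    # Strip any "{namespace}" prefix so the same code handles different ALTO versions.
    return tag.rsplit("}", 1)[-1]


def iter_alto_lines(zip_source) -> Iterator[str]:
    """Hypothetical helper: yield the text of every TextLine, in document order,
    for each XML document in the zip (a path or a file-like object)."""
    with zipfile.ZipFile(zip_source) as archive:
        # Members are visited in archive order; non-XML members are skipped (assumption).
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue
            root = ET.fromstring(archive.read(name))
            # iter() walks the tree depth-first, i.e. in document order.
            for block in root.iter():
                if local_name(block.tag) != "TextBlock":
                    continue
                for line in block:
                    if local_name(line.tag) != "TextLine":
                        continue
                    # Assumption: a line's text is its String CONTENT values joined by spaces.
                    yield " ".join(
                        el.get("CONTENT", "")
                        for el in line
                        if local_name(el.tag) == "String"
                    )


if __name__ == "__main__":
    # Build a one-page ALTO document in memory to exercise the helper.
    page = (
        b'<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#"><Layout><Page>'
        b"<PrintSpace><TextBlock>"
        b'<TextLine><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>'
        b'<TextLine><String CONTENT="second"/></TextLine>'
        b"</TextBlock></PrintSpace></Page></Layout></alto>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("page1.xml", page)
    buffer.seek(0)
    for text in iter_alto_lines(buffer):
        print(text)
```

Making the helper a generator keeps memory use flat for large zipfiles; a get_text method built this way could either return the generator directly or join its output.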

Once basic text iteration is working, sort text lines by their VPOS (vertical position) attribute.
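For the VPOS follow-up, one possible approach for a single ALTO page is sketched below, again using xml.etree.ElementTree rather than xmlobject. The helper names (local_name, line_text, lines_by_vpos) are hypothetical, as is the rule that lines without a VPOS attribute are placed after all positioned lines.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    # Strip any "{namespace}" prefix from an element tag.
    return tag.rsplit("}", 1)[-1]


def line_text(line: ET.Element) -> str:
    # Join the CONTENT attribute of each String child with single spaces.
    return " ".join(
        el.get("CONTENT", "") for el in line if local_name(el.tag) == "String"
    )


def lines_by_vpos(xml_bytes: bytes) -> List[str]:
    """Hypothetical helper: return one page's text lines ordered top to bottom by VPOS."""
    root = ET.fromstring(xml_bytes)
    lines = [el for el in root.iter() if local_name(el.tag) == "TextLine"]

    def vpos(line: ET.Element) -> float:
        # Assumption: lines with no VPOS attribute sort after all positioned lines.
        value = line.get("VPOS")
        return float(value) if value is not None else float("inf")

    # sorted() is stable, so lines that share a VPOS keep their document order.
    return [line_text(line) for line in sorted(lines, key=vpos)]
```

Sorting on VPOS alone will interleave lines from side-by-side columns, so multi-column pages may need an additional rule (for example, sorting within each text block) before this is used as-is.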
