
Fix TEI-XML parsing so lines with multiple <lb> tags are handled correctly #248

@laurejt

Description


Acceptance Testing Notes

Download this sample TEI file that contains tricky edge cases for testing: sample_tei_with_footnotes.xml
Run Corpus Builder in the app to build a sentence corpus for this sample TEI file.

  • Verify that all sentences have correct line numbers, except:
    - the one sentence on page 20, whose line-beginning (<lb>) tag has no line number (n) attribute
    - the one sentence on page 22, which has no line-beginning tag at all.
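The expected exceptions follow from how a sentence gets its line number: it takes the number of the <lb> that precedes the sentence's start, so there is nothing to report when that <lb> lacks an n attribute or when there is no <lb> at all. A minimal sketch of that lookup in Python, assuming the page text has already been split into (line number, text) segments, with None standing in for a missing n attribute or for text before any <lb>. The names build_index and line_number_at are hypothetical and are not taken from Corpus Builder.

```python
from bisect import bisect_right


def build_index(segments):
    """Join (line_number, text) segments with newlines and record where each starts.

    Hypothetical helper: line numbers are kept as the raw n attribute strings,
    or None when the segment has no usable <lb> line number.
    """
    starts, numbers, parts = [], [], []
    offset = 0
    for number, text in segments:
        starts.append(offset)
        numbers.append(number)
        parts.append(text)
        offset += len(text) + 1  # +1 for the "\n" separator added by join()
    return "\n".join(parts), starts, numbers


def line_number_at(starts, numbers, char_offset):
    """Return the line number of the segment containing char_offset, or None."""
    i = bisect_right(starts, char_offset) - 1
    return numbers[i] if i >= 0 else None
```

A sentence starting at character offset k in the joined text would then get line_number_at(starts, numbers, k), which yields None exactly in the two cases the acceptance notes exclude.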

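The current TEI-XML parsing generates incorrect text when a single line of XML contains multiple <lb> tags: the line break marked by the second tag is lost, so the last word before it is fused with the first word after it.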

For example, the XML line

<lb n="28"/>Das bisher Entwickelte gilt unter der Voraussetzung, daß im Fortgang der<lb n="29"/>Accumulation <hi rendition="i">das Verhältniß zwischen der Masse der Produktionsmittel und

is incorrectly parsed as (note "derAccumulation"):

Das bisher Entwickelte gilt unter der Voraussetzung, daß im Fortgang derAccumulation das Verhältniß zwischen der Masse der Produktionsmittel und
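One way to avoid the concatenation is to walk the element tree in document order and start a new line at every <lb>, taking the line number from its n attribute, rather than treating the XML source line as the unit. A minimal sketch using Python's standard xml.etree.ElementTree, assuming the page content is available as a single element; the function names (local_name, split_lines, to_plain_text) are hypothetical and this is not the project's actual parser.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a "{namespace}" prefix (e.g. the TEI namespace) from an element tag."""
    return tag.rsplit("}", 1)[-1]


def split_lines(element):
    """Split an element's text into (line_number, text) pairs, one per <lb>.

    A new pair starts at every <lb>, including <lb> tags nested inside inline
    markup such as <hi>. The line number is the <lb>'s n attribute (None if it
    has none); text before the first <lb> is also numbered None.
    """
    lines = [[None, []]]

    def walk(el):
        if local_name(el.tag) == "lb":
            # Start a new line; the text after <lb/> is its tail, handled by the parent.
            lines.append([el.get("n"), []])
        elif el.text:
            lines[-1][1].append(el.text)
        for child in el:
            walk(child)
            # A child's tail belongs to whichever line is current after the child.
            if child.tail:
                lines[-1][1].append(child.tail)

    walk(element)
    result = [(n, "".join(parts)) for n, parts in lines]
    # Drop the leading placeholder if nothing but whitespace preceded the first <lb>.
    if result[0][0] is None and not result[0][1].strip():
        result = result[1:]
    return result


def to_plain_text(segments):
    # Join with a newline so the words on either side of each <lb> stay separate.
    return "\n".join(text.strip() for _, text in segments)
```

Applied to the example line above (wrapped in a <p> and with the <hi> closed so it parses), this yields line 28 ending in "der" and line 29 starting with "Accumulation", and the joined text keeps the two words apart. Joining with a newline rather than an empty string is the essential choice; a space would work equally well if downstream sentence splitting does not need the line breaks.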

Metadata


Labels: 👇this sprint (Work scheduled for the current sprint)
