xlsx with new Lines (CR+LF) : Parsing error #392

@olivbrau

Hi, I'm trying to open an xlsx file.
I get this error: Current state not START_ELEMENT, END_ELEMENT or ENTITY_REFERENCE
The error comes from OPCPackage.extractFormat():
In the while loop, the instruction if ("numFmt".equals(reader.getLocalName())) {
fails because the reader's current token type is CHARACTERS, and getLocalName() throws an exception in that state.
This happens because my xlsx file has new lines (CR+LF) in styles.xml. In the loop that runs after <cellXfs> has been found, insideCellXfs is true, but getLocalName() cannot be called while the reader is on a CHARACTERS token (the new line).
I hope I'm clear in spite of my bad English.
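
To illustrate the kind of guard I mean, here is a minimal sketch using plain StAX (javax.xml.stream). It is not fastexcel's actual code: the class StylesWhitespaceSketch and the helper readCellXfNumFmtIds are hypothetical names, and the sample XML is a stripped-down stand-in for styles.xml. The point is only that getLocalName() is called after checking the event type, so whitespace between tags is skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StylesWhitespaceSketch {

    // Hypothetical helper (not part of fastexcel): collects the numFmtId
    // attribute of every <xf> element found inside <cellXfs>.
    static List<String> readCellXfNumFmtIds(String stylesXml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(stylesXml));
        List<String> ids = new ArrayList<>();
        boolean insideCellXfs = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                // Safe: the reader is positioned on an element.
                String name = reader.getLocalName();
                if ("cellXfs".equals(name)) {
                    insideCellXfs = true;
                } else if (insideCellXfs && "xf".equals(name)) {
                    ids.add(reader.getAttributeValue(null, "numFmtId"));
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if ("cellXfs".equals(reader.getLocalName())) {
                    insideCellXfs = false;
                }
            }
            // Any other event (CHARACTERS for the CR+LF, comments, ...) is
            // ignored, so getLocalName() is never called on it.
        }
        reader.close();
        return ids;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Simplified stand-in for styles.xml with CR+LF after each tag.
        String xml = "<styleSheet>\r\n<cellXfs count=\"2\">\r\n"
                + "<xf numFmtId=\"0\"/>\r\n<xf numFmtId=\"14\"/>\r\n"
                + "</cellXfs>\r\n</styleSheet>";
        System.out.println(readCellXfNumFmtIds(xml)); // prints [0, 14]
    }
}
```

An equivalent guard would be to test reader.isStartElement() before the existing getLocalName() comparison; which form fits best depends on how the real loop is structured.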

To reproduce this error, take a valid xlsx file and put CR+LF after each tag in styles.xml.
Excel doesn't put CRLF in the XML it creates, but my xlsx file comes from another tool that does. I think the xlsx format doesn't forbid this, so fastexcel should handle this case.

Another side effect of adding CRLF after each tag appears when fastexcel reads sharedStrings.xml:
instead of keeping only the text in the <t> tags, it retrieves all the text between the <si> tags, including all the CRLFs. As a consequence, the returned strings are wrong (but it doesn't crash the reading, unlike the styles.xml problem described above).
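
For the shared strings, the behaviour I would expect can be sketched like this, again with plain StAX and not with fastexcel's real code. The names SharedStringsSketch and readSharedStrings are hypothetical. Text is appended only while the reader is inside a <t> element, so the CR+LF between tags never reaches the result. As a simplification, every <t> inside an <si> is concatenated (which covers rich-text runs), and phonetic runs are not treated specially.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SharedStringsSketch {

    // Hypothetical helper (not part of fastexcel): returns one string per <si>,
    // built only from the text found inside its <t> elements.
    static List<String> readSharedStrings(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> strings = new ArrayList<>();
        StringBuilder current = null; // non-null while inside an <si>
        boolean insideT = false;
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    if ("si".equals(reader.getLocalName())) {
                        current = new StringBuilder();
                    } else if (current != null && "t".equals(reader.getLocalName())) {
                        insideT = true;
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    // Whitespace between tags arrives here too, but it is
                    // dropped because insideT is false at that point.
                    if (insideT) {
                        current.append(reader.getText());
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if ("t".equals(reader.getLocalName())) {
                        insideT = false;
                    } else if (current != null && "si".equals(reader.getLocalName())) {
                        strings.add(current.toString());
                        current = null;
                    }
                    break;
                default:
                    break; // comments, processing instructions, etc.
            }
        }
        reader.close();
        return strings;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Simplified stand-in for sharedStrings.xml with CR+LF after each tag.
        String xml = "<sst>\r\n<si>\r\n<t>Hello</t>\r\n</si>\r\n"
                + "<si>\r\n<r><t>Foo</t></r>\r\n<r><t>Bar</t></r>\r\n</si>\r\n</sst>";
        System.out.println(readSharedStrings(xml)); // prints [Hello, FooBar]
    }
}
```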
