
pxx-13 Duplicate errors due to long element names #19

Open
@mluis7

UPDATE:
Reported as a libxml2 bug, since xmllint shows the same behavior:
https://gitlab.gnome.org/GNOME/libxml2/-/issues/715

Long element names may cause duplicate-path errors and wrong element counts while parsing:

ERROR: duplicated path: /xbrli:xbrl/in-bse-cg:AggregateValueOfSecurityProvidedDuringSixMonthsOfSecurityInConnectionWithLoanOrAnyOtherD
...
ERROR: 0 elements found with /xbrli:xbrl/in-bse-cg:AggregateAmountAdvancedDuringSixMonthsOfAnyLoanOrAnyOtherFormOfDebtToPromoterOrAnyOtherE xpath expression.
Original xpath: /xbrli:xbrl/in-bse-cg:AggregateAmountAdvancedDuringSixMonthsOfAnyLoanOrAnyOtherFormOfDebtToPromoterOrAnyOtherE

Cause:
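tree.getpath(ele) returns a truncated path when an element name is long. In the errors above, each path is cut at 110 characters. The cut appears to apply to the prefixed element name rather than to the whole path: in both errors the last step is 98 characters long (in-bse-cg: plus 88 characters of the local name). Different long names that share their leading characters therefore collapse into the same path.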
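A check along the following lines could flag a truncated path before it is used as a key or evaluated as an XPath expression. This is only a sketch and not this project's code: the function names are hypothetical and the sample element name is made up. It compares the last step of the path with the qualified name the element is expected to have.

```python
import re


def last_step_name(xpath):
    """Return the name of the last location step, minus any [n] predicate."""
    step = xpath.rstrip("/").rsplit("/", 1)[-1]
    return re.sub(r"\[\d+\]$", "", step)


def looks_truncated(xpath, prefix, local_name):
    """True if the last step is a strict prefix of the expected qualified name."""
    expected = f"{prefix}:{local_name}" if prefix else local_name
    got = last_step_name(xpath)
    return got != expected and expected.startswith(got)


# Made-up sample: a long local name and a path whose last step was cut short.
name = "a" + "0123456789" * 15
cut_path = "/root/ns:" + name[:95]
full_path = "/root/ns:" + name

print(looks_truncated(cut_path, "ns", name))   # True
print(looks_truncated(full_path, "ns", name))  # False
```

With lxml, the prefix and local name would come from the element itself (for example ele.prefix and etree.QName(ele).localname), so the check does not depend on any particular length limit.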

Possible fix:
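tree.getelementpath(ele), which returns the complete element name, but in Clark notation ({namespace-uri}local-name) and relative to the root instead of as a prefixed XPath expression.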
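Since getelementpath() returns names in {namespace-uri}local-name form and relative to the root (see the reproduction output below), its result would need to be converted before it can replace the prefixed path from getpath(). A minimal sketch of such a conversion, using plain string handling, might look like this. The helper names, the uri-to-prefix mapping and the sample path are assumptions made for illustration.

```python
import re

# Matches one Clark-notation step: {uri}local, stopping at '/' or a predicate.
_CLARK_STEP = re.compile(r"\{([^}]*)\}([^/\[\]]+)")


def clark_to_prefixed(element_path, prefixes):
    """Rewrite every {uri}local step as prefix:local.

    prefixes maps namespace URI -> prefix; a URI without an entry
    raises KeyError instead of silently producing a wrong path.
    """
    def repl(match):
        uri, local = match.group(1), match.group(2)
        return f"{prefixes[uri]}:{local}"

    return _CLARK_STEP.sub(repl, element_path)


def absolute_xpath(root_name, element_path, prefixes):
    """Prepend the (already prefixed) root name to a root-relative path."""
    rel = clark_to_prefixed(element_path, prefixes)
    # lxml is expected to return "." when the element is the root itself.
    return f"/{root_name}" if rel == "." else f"/{root_name}/{rel}"


ns = {"http://example.com/ns": "ns"}
print(absolute_xpath("root", "{http://example.com/ns}item[2]", ns))
# /root/ns:item[2]
```

With lxml, the mapping could be derived by inverting an element's nsmap (prefix to URI) and skipping the default namespace, whose prefix is None and which has no prefixed form in XPath 1.0.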

Reproduce
Given an XML file (tmp.xml) with a namespaced element whose name is long:

<root xmlns:ns="http://example.com/ns">
    <ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x>1</ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x>
</root>
>>> from lxml import etree

>>> xtree = etree.parse('tmp.xml')
>>> root = xtree.getroot()

Truncated path with tree.getpath():

>>> xtree.getpath(root[0])
'/root/ns:a0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123'

Complete path with tree.getelementpath():

>>> xtree.getelementpath(root[0])
'{http://example.com/ns}a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x'

While preparing an lxml bug report, I found that this is a libxml2 bug: xmllint shows the same behavior.

xmllint --shell tmp.xml
/ > du
/
root
  ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x
/ > setrootns
/ > setns default=
/ > setns ns=http://example.com/ns
/ > whereis //ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x
/root/ns:a0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123

Labels: bug (Something isn't working)
