UPDATE: Reported as a libxml2 bug, since xmllint shows the same behavior:
https://gitlab.gnome.org/GNOME/libxml2/-/issues/715
Long element names may cause some duplicate and count errors while parsing
ERROR: duplicated path: /xbrli:xbrl/in-bse-cg:AggregateValueOfSecurityProvidedDuringSixMonthsOfSecurityInConnectionWithLoanOrAnyOtherD
...
ERROR: 0 elements found with /xbrli:xbrl/in-bse-cg:AggregateAmountAdvancedDuringSixMonthsOfAnyLoanOrAnyOtherFormOfDebtToPromoterOrAnyOtherE xpath expression.
Original xpath: /xbrli:xbrl/in-bse-cg:AggregateAmountAdvancedDuringSixMonthsOfAnyLoanOrAnyOtherFormOfDebtToPromoterOrAnyOtherE
Cause: tree.getpath(ele) truncates long element names, so the returned path is cut short (at 110 characters in the errors above).
Possible fix: use tree.getelementpath(ele), which returns the complete path in {namespace}tag notation.
Reproduce
Given an XML file tmp.xml with namespaces:
<root xmlns:ns="http://example.com/ns">
<ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x>1</ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x>
</root>
>>> from lxml import etree
>>> xtree = etree.parse('tmp.xml')
>>> root = xtree.getroot()
Truncated path with tree.getpath():
>>> xtree.getpath(root.getchildren()[0])
'/root/ns:a0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123'
Complete path with tree.getelementpath():
>>> xtree.getelementpath(root.getchildren()[0])
'{http://example.com/ns}a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x'
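A minimal sketch of how the complete path could be used as a lookup key, assuming the caller can work with ElementPath expressions instead of XPath. The document is built in memory here rather than read from tmp.xml, and `long_name` is a made-up element name chosen only to be longer than the truncation point.

```python
from lxml import etree

# Hypothetical element name: 151 characters, well past the point where
# getpath() was observed to cut names off.
long_name = "a" + "0123456789" * 15
xml = (
    '<root xmlns:ns="http://example.com/ns">'
    f"<ns:{long_name}>1</ns:{long_name}>"
    "</root>"
)
tree = etree.ElementTree(etree.fromstring(xml))
element = tree.getroot()[0]

# getelementpath() returns an ElementPath expression in Clark notation,
# i.e. "{namespace-uri}localname", so no prefix mapping is needed.
element_path = tree.getelementpath(element)

# The expression can be handed back to find() to locate the element again.
found = tree.find(element_path)

print(element_path)
print(found is not None and found.text == element.text)
```

Note that the result is an ElementPath expression, not an XPath expression, so it is meant for find()/findall() rather than xpath().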
While trying to report this as an lxml bug, I found that it is a libxml2 bug, as xmllint shows the same behavior:
xmllint --shell tmp.xml
/ > du
/
root
ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x
/ > setrootns
/ > setns default=
/ > setns ns=http://example.com/ns
/ > whereis //ns:a01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678z01234567890123456789012345678x
/root/ns:a0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
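Since the truncation comes from libxml2, whether it shows up may depend on the libxml2 version that lxml is built against. The following is a rough sketch of a check for a given installation: it asks whether the path returned by getpath() resolves back to the element it was generated from. The helper name `getpath_roundtrips` and the sample document are illustrative assumptions, not part of lxml.

```python
from lxml import etree


def getpath_roundtrips(tree, element, namespaces):
    """Return True if tree.getpath(element) selects exactly that element."""
    path = tree.getpath(element)
    try:
        matches = tree.xpath(path, namespaces=namespaces)
    except etree.XPathError:
        # A path that cannot be evaluated does not round-trip.
        return False
    return len(matches) == 1 and matches[0] is element


# Hypothetical sample: one short and one very long element name.
long_name = "a" + "0123456789" * 15
root = etree.fromstring(
    '<root xmlns:ns="http://example.com/ns">'
    "<ns:short>1</ns:short>"
    f"<ns:{long_name}>2</ns:{long_name}>"
    "</root>"
)
tree = etree.ElementTree(root)
ns = {"ns": "http://example.com/ns"}

print("libxml2 version:", etree.LIBXML_VERSION)
for child in root:
    local_name = etree.QName(child).localname
    print(len(local_name), getpath_roundtrips(tree, child, ns))
```

On an affected build, the long element is expected to report False because the truncated path matches nothing; the short element should report True either way.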