Make sel.xpath('.') work the same for text elements #130

Gallaecio · 2018-12-18T05:54:06Z

Given:

>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
...         <body>
...             <h1>Hello, Parsel!</h1>
...         </body>
...         </html>""")

For text, you get:

>>> subsel = sel.css('h1::text')
>>> subsel
[<Selector xpath=u'descendant-or-self::h1/text()' data=u'Hello, Parsel!'>]
>>> subsubsel = subsel.xpath('.')
>>> subsubsel
[]

However, regular elements work as you would expect:

>>> subsel = sel.css('h1')
>>> subsel
[<Selector xpath=u'descendant-or-self::h1' data=u'<h1>Hello, Parsel!</h1>'>]
>>> subsubsel = subsel.xpath('.')
>>> subsubsel
[<Selector xpath='.' data=u'<h1>Hello, Parsel!</h1>'>]

I believe text nodes should behave the same way: '.' should select them when they are the current node.


redapple · 2018-12-18T09:53:23Z

Hey @Gallaecio, I'd like to see this too.
I believe the limitation comes from lxml rather than from libxml2 or parsel. lxml returns text nodes as strings that do not accept further XPath calls; the most you can do is call .getparent() on the "smart strings" results, and note that "smart_strings" are disabled by default in parsel. libxml2 itself does allow XPath operations on text nodes:

>>> import libxml2
>>> doc = libxml2.htmlParseDoc('''<html>
... <head>
... <meta charset="UTF-8">
... <title>Title of the document</title>
... </head>
... 
... <body>
... Content of the document......
... </body>
... 
... </html>''', 'ascii')
>>> doc
<xmlDoc (None) object at 0x7ff070272680>
>>> ctxt = doc.xpathNewContext()
>>> res = ctxt.xpathEval("//text()")
>>> res
[<xmlNode (text) object at 0x7ff0702a2560>, <xmlNode (text) object at 0x7ff071d95320>]
>>> res[0].get_content()
'Title of the document'
>>> for t in res:
...     print(t.xpathEval("parent::*"))
... 
[<xmlNode (title) object at 0x7ff07025e7e8>]
[<xmlNode (body) object at 0x7ff07025e878>]
>>>
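
For comparison, here is a minimal sketch of the lxml side, assuming Python 3 and a reasonably recent lxml. The variable names are illustrative and do not come from parsel's code. It shows that a text() result is a str subclass with getparent() but no xpath() method, and that it becomes a plain str once smart_strings is turned off:

```python
from lxml import etree

root = etree.HTML("<html><body><h1>Hello, Parsel!</h1></body></html>")

# Default behaviour: text() results are "smart strings", i.e. str
# subclasses that remember which element they came from.
smart = root.xpath("//h1/text()")[0]
parent_tag = smart.getparent().tag    # the owning element is still reachable
smart_has_xpath = hasattr(smart, "xpath")  # but no XPath can be run from the text

# With smart_strings=False the result is a plain str, so even the link
# back to the parent element is gone.
find_text = etree.XPath("//h1/text()", smart_strings=False)
plain = find_text(root)[0]
plain_has_parent = hasattr(plain, "getparent")

print(parent_tag, smart_has_xpath, type(plain) is str, plain_has_parent)
```

Either way there is no node object left for '.' to be evaluated against, which would explain the empty result in the original report.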

If you know Cython, supporting XPath calls on text nodes could be a nice addition to lxml.

redapple · 2018-12-18T10:39:11Z

Related: https://bugs.launchpad.net/lxml/+bug/996134

Gallaecio added the enhancement label Jul 11, 2019

Gallaecio added the upstream issue label Aug 22, 2019
