SelectorList.drop() removing elements doesn't work as expected #297
Comments
Could someone help me understand why this is happening?
I've figured out why this is happening. If you perform a drop operation on a Selector that's been created from JSON in Scrapy, it cannot correctly handle the DOM. However, if you extract the HTML text from the JSON and reconstruct the Selector, this issue does not occur. This seems to be a bug in Parsel's Selector implementation.
When you call .xpath() on a text-type selector, the resulting nodes appear to be built from a fresh copy parsed from the text, rather than from the original root node. As a result, calling .drop() on them doesn't affect the content of the original HTML tree. This happens mostly when jmespath and xpath are used in combination, and it is quite subtle. To make .drop() take effect, we first need to call .xpath(".") to generate a new HtmlSelector. Only then does .drop() work as expected on it. This behavior is not intuitive and can easily lead to confusion or unexpected results. I believe it would be beneficial to either adjust this behavior or clarify it in the documentation.
Refs #298
I'm trying to remove the 'style' tag from the element using
selector.xpath(".//script|.//style").drop()
However, even after executing this line of code, the 'style' element still exists in the DOM. Here's the URL:
https://newsinfo.eastmoney.com/kuaixun/v2/api/content/getnews?newsid=202406083099747443&newstype=1